
Create a Real Time Voice Translator using Python
Creating a real-time voice translator using Python involves several steps: speech recognition, translation, and text-to-speech synthesis. This simple example covers the first two, using the Google Cloud Speech-to-Text and Translation APIs. Here’s a step-by-step guide:
Note: This example uses Google Cloud services, and you’ll need to set up a Google Cloud account and obtain API credentials. Google Cloud offers a free tier with limited usage.
- Install Required Libraries: Install the necessary Python libraries using pip:
pip install pyaudio google-cloud-speech google-cloud-translate
- Set Up Google Cloud Services:
- Create a Google Cloud project and enable the Speech-to-Text and Translation API.
- Create service account credentials and save them as a JSON file.
- Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of your JSON credentials file:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/credentials.json"
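Before running the translator, it can help to confirm that the credentials are where the client libraries expect them. The following is a minimal sketch, not part of the Google Cloud libraries: `check_credentials` is a hypothetical helper, and the list of expected fields is an assumption based on the usual layout of a service account key file.

```python
import json
import os


def check_credentials(env=None):
    """Return a list of problems with the credentials setup (empty if none found)."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set"]
    if not os.path.isfile(path):
        return [f"credentials file not found: {path}"]
    try:
        with open(path, encoding="utf-8") as f:
            info = json.load(f)
    except (OSError, ValueError) as exc:
        return [f"credentials file is not readable JSON: {exc}"]
    if not isinstance(info, dict):
        return ["credentials file does not contain a JSON object"]
    problems = []
    # Assumption: service account key files normally carry these fields.
    for field in ("type", "project_id", "private_key", "client_email"):
        if field not in info:
            problems.append(f"credentials file is missing '{field}'")
    return problems


if __name__ == "__main__":
    for problem in check_credentials():
        print("Problem:", problem)
```

An empty list only means the file looks plausible; it does not prove that the key is valid or that the APIs are enabled for the project.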
- Python Code: Here’s a Python script for a real-time voice translator:
import os
import pyaudio
import wave
from google.cloud import speech_v1p1beta1 as speech
from google.cloud import translate_v2 as translate
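
def record_audio():
    """Record 16 kHz mono audio from the default microphone until Ctrl+C."""
    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16, channels=1, rate=16000,
                        input=True, frames_per_buffer=1024)
    frames = []
    print("Recording... Press Ctrl+C to stop.")
    try:
        while True:
            # Do not raise if the input buffer overflows; keep recording.
            data = stream.read(1024, exception_on_overflow=False)
            frames.append(data)
    except KeyboardInterrupt:
        pass
    finally:
        # Release the audio device even if recording fails.
        stream.stop_stream()
        stream.close()
        audio.terminate()
    print("Finished recording.")
    return frames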

def save_audio(frames, filename):
    """Write the recorded frames to a 16-bit, 16 kHz mono WAV file."""
    with wave.open(filename, 'wb') as wf:
        wf.setnchannels(1)
        # Module-level helper; avoids opening a second PyAudio instance.
        wf.setsampwidth(pyaudio.get_sample_size(pyaudio.paInt16))
        wf.setframerate(16000)
        wf.writeframes(b''.join(frames))
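
def transcribe_audio(audio_file):
    """Return the English transcript of a WAV file, or "" if nothing was recognized."""
    client = speech.SpeechClient()
    # Use a separate name for the file handle so the path is not shadowed.
    with open(audio_file, 'rb') as f:
        content = f.read()
    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    response = client.recognize(config=config, audio=audio)
    # Each result covers one consecutive portion of the audio; the list is
    # empty when no speech was recognized.
    return " ".join(
        r.alternatives[0].transcript.strip()
        for r in response.results
        if r.alternatives
    )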
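
def translate_text(text, target_language):
    """Translate text into target_language (a language code such as "fr")."""
    client = translate.Client()
    # format_="text" requests plain text; the default HTML mode can return
    # entities such as &#39; in the translated string.
    translation = client.translate(text, target_language=target_language,
                                   format_="text")
    return translation["translatedText"]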

def main():
    frames = record_audio()
    audio_file = "audio.wav"
    save_audio(frames, audio_file)
    try:
        transcript = transcribe_audio(audio_file)
    finally:
        # Remove the temporary recording even if transcription fails.
        os.remove(audio_file)
    if not transcript:
        print("No speech was recognized.")
        return
    print(f"Transcription: {transcript}")
    target_language = "fr"  # Change this to your desired target language code
    translated_text = translate_text(transcript, target_language)
    print(f"Translation ({target_language}): {translated_text}")


if __name__ == "__main__":
    main()
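The script writes the recording to audio.wav only to read it back immediately. As a possible alternative, the frames can be packed into WAV bytes in memory. This is a sketch using only the standard library; `frames_to_wav_bytes` is a hypothetical helper name, and its defaults assume the 16 kHz, 16-bit mono format used by the recording function.

```python
import io
import wave


def frames_to_wav_bytes(frames, sample_rate=16000, sample_width=2, channels=1):
    """Pack raw PCM frames into WAV-formatted bytes without touching the disk."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # 2 bytes per sample for 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(b"".join(frames))
    # The wave module does not close a file object it did not open itself,
    # so the buffer is still readable here.
    return buffer.getvalue()
```

The returned bytes could then be passed as the `content` of `speech.RecognitionAudio`, which would make the temporary file and the `os.remove` call unnecessary.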
- Usage:
- Run the Python script.
- When "Recording..." appears, speak into your microphone, then press Ctrl+C to stop recording.
- The script will transcribe the recording and print the translation in the target language.
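Because recording continues until you press Ctrl+C, it is easy to capture more audio than a single synchronous `recognize` request accepts. The sketch below estimates the recording length from the raw frames. The helper names are hypothetical, and the 60-second default is an assumption that should be checked against the current Speech-to-Text quota documentation.

```python
def recording_seconds(frames, sample_rate=16000, sample_width=2, channels=1):
    """Estimate the duration of raw PCM frames from their total byte count."""
    total_bytes = sum(len(frame) for frame in frames)
    return total_bytes / (sample_rate * sample_width * channels)


def fits_synchronous_request(frames, limit_seconds=60.0):
    """Return True if the recording is short enough for one synchronous request.

    limit_seconds is an assumed limit, not a value read from the API.
    """
    return recording_seconds(frames) <= limit_seconds
```

For example, 16 buffers of 1024 16-bit samples amount to about one second of audio at 16 kHz. Longer recordings would need to be split up or sent with one of the API's other recognition methods.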
This is a basic example: it records first and translates afterwards rather than translating continuously. You can extend it to handle real-time (streaming) translation and text-to-speech synthesis as needed. You may also want to add error handling and a user interface for a more user-friendly experience.
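One simple way to move toward real-time behavior is to process the audio in short chunks while recording continues, instead of waiting for the whole recording. The following is a minimal sketch of the chunking step only. `chunk_frames` and `frames_for_seconds` are hypothetical helpers, and the defaults assume the 16 kHz sample rate and 1024-sample buffers used in the script.

```python
def frames_for_seconds(seconds, sample_rate=16000, frames_per_buffer=1024):
    """Return how many audio buffers make up roughly `seconds` of audio."""
    return max(1, round(seconds * sample_rate / frames_per_buffer))


def chunk_frames(frame_iter, frames_per_chunk):
    """Group an iterable of audio buffers into chunks of frames_per_chunk buffers."""
    chunk = []
    for frame in frame_iter:
        chunk.append(frame)
        if len(chunk) == frames_per_chunk:
            yield b"".join(chunk)
            chunk = []
    if chunk:
        # Emit whatever is left when the input ends mid-chunk.
        yield b"".join(chunk)
```

Each chunk could be transcribed and translated as soon as it is produced. Cutting at fixed intervals can split a word in two, so this is only a starting point; the Speech-to-Text client also provides a streaming recognition method that is designed for continuous audio.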