Create a Real Time Voice Translator using Python

A real-time voice translator in Python chains three stages: speech recognition, translation, and text-to-speech synthesis. This simple example covers the first two, using the Google Cloud Speech-to-Text and Cloud Translation APIs. Here’s a step-by-step guide:

Note: This example uses Google Cloud services, and you’ll need to set up a Google Cloud account and obtain API credentials. Google Cloud offers a free tier with limited usage.

  1. Install Required Libraries: Install the necessary Python libraries using pip:
   pip install pyaudio google-cloud-speech google-cloud-translate
  2. Set Up Google Cloud Services:
  • Create a Google Cloud project and enable both the Speech-to-Text API and the Cloud Translation API.
  • Create service account credentials and save them as a JSON file.
  • Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of your JSON credentials file:
   export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/credentials.json"
  3. Python Code: Here’s a Python script for a real-time voice translator:
   import os
   import pyaudio
   import wave
   from google.cloud import speech_v1p1beta1 as speech
   from google.cloud import translate_v2 as translate

   def record_audio():
       audio = pyaudio.PyAudio()

       stream = audio.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
       frames = []

       print("Recording... Press Ctrl+C to stop.")

       try:
           while True:
               # Do not raise if the input buffer overflows while the loop is busy
               data = stream.read(1024, exception_on_overflow=False)
               frames.append(data)
       except KeyboardInterrupt:
           pass

       print("Finished recording.")
       stream.stop_stream()
       stream.close()
       audio.terminate()

       return frames

   def save_audio(frames, filename):
       # Write the recorded buffers as a 16 kHz, 16-bit mono WAV file
       with wave.open(filename, 'wb') as wf:
           wf.setnchannels(1)
           # Module-level helper, so no extra PyAudio instance is left open
           wf.setsampwidth(pyaudio.get_sample_size(pyaudio.paInt16))
           wf.setframerate(16000)
           wf.writeframes(b''.join(frames))

   def transcribe_audio(audio_path):
       client = speech.SpeechClient()
       # Read the recorded WAV file into memory
       with open(audio_path, 'rb') as f:
           content = f.read()
       audio = speech.RecognitionAudio(content=content)
       config = speech.RecognitionConfig(
           encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
           sample_rate_hertz=16000,
           language_code="en-US",
       )
       response = client.recognize(config=config, audio=audio)

       # Each result covers a consecutive portion of the audio.
       # An empty result list means no speech was recognized.
       return " ".join(
           result.alternatives[0].transcript.strip()
           for result in response.results
           if result.alternatives
       )

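   def translate_text(text, target_language):
       client = translate.Client()
       # format_="text" requests plain text, so apostrophes are not returned as HTML entities
       translation = client.translate(text, target_language=target_language, format_="text")
       return translation["translatedText"]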

   def main():
       frames = record_audio()
       audio_file = "audio.wav"
       save_audio(frames, audio_file)

       try:
           transcript = transcribe_audio(audio_file)
           if not transcript:
               print("No speech was recognized.")
               return
           print(f"Transcription: {transcript}")

           target_language = "fr"  # Change this to your desired target language code
           translated_text = translate_text(transcript, target_language)
           print(f"Translation ({target_language}): {translated_text}")
       finally:
           # Delete the temporary recording even if an API call fails
           os.remove(audio_file)

   if __name__ == "__main__":
       main()
  4. Usage:
  • Run the Python script.
  • Speak into your microphone once "Recording..." appears, then press Ctrl+C to stop recording.
  • The script then transcribes the recording and prints the translation in the target language.
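Before running the script, it can help to confirm that the credentials step took effect, since a missing or mistyped path is an easy mistake to make. The following is a minimal sketch using only the standard library. The helper name `check_credentials` is hypothetical and not part of any Google library, and it only checks that the variable points at a readable JSON file, not that the key is valid.

```python
import json
import os


def check_credentials(env=None):
    """Return (ok, message) describing whether the credentials variable looks usable."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return False, f"No file found at {path}"
    try:
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        return False, f"Could not read {path} as JSON: {exc}"
    if not isinstance(data, dict):
        return False, f"{path} does not contain a JSON object"
    return True, f"Credentials file found at {path}"


if __name__ == "__main__":
    ok, message = check_credentials()
    print(message)
```

If the check fails, re-run the export command from step 2 in the same terminal session you use to start the script.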

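This is a basic example: it records until you stop it and only then transcribes and translates, so it is not yet truly real-time. Note also that the synchronous recognize call is intended for short recordings (about one minute at most). You can extend the script with streaming recognition and text-to-speech synthesis as needed, and you may want to add error handling and a user interface for a more user-friendly experience.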
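One way to move toward real-time behavior is to process the microphone input in short chunks instead of waiting for the whole recording. The sketch below groups the 1024-sample buffers from the script into chunks of a few seconds and passes each chunk through a transcribe step and a translate step. The names `group_buffers` and `translate_stream` are hypothetical, and `transcribe` and `translate` are stand-in callables that you would replace with calls to the Google Cloud clients. Cutting at fixed intervals can split words in half, so treat this as an illustration of the control flow rather than a finished design. Speech-to-Text also provides a streaming recognition API that is likely the better fit for production use.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

RATE = 16000              # samples per second, as in the script
FRAMES_PER_BUFFER = 1024  # samples per buffer, as in the script


def group_buffers(buffers: Iterable[bytes], chunk_seconds: float) -> Iterator[bytes]:
    """Yield audio in chunks of roughly chunk_seconds each."""
    buffers_per_chunk = max(1, int(chunk_seconds * RATE / FRAMES_PER_BUFFER))
    pending: List[bytes] = []
    for buf in buffers:
        pending.append(buf)
        if len(pending) == buffers_per_chunk:
            yield b"".join(pending)
            pending = []
    if pending:
        # Emit whatever is left when the input ends
        yield b"".join(pending)


def translate_stream(
    buffers: Iterable[bytes],
    transcribe: Callable[[bytes], str],
    translate: Callable[[str], str],
    chunk_seconds: float = 5.0,
) -> Iterator[Tuple[str, str]]:
    """Yield (transcript, translation) for each chunk that contains speech."""
    for chunk in group_buffers(buffers, chunk_seconds):
        transcript = transcribe(chunk)
        if transcript:  # skip chunks where nothing was recognized
            yield transcript, translate(transcript)


if __name__ == "__main__":
    # Stand-ins so the sketch runs without a microphone or API access
    silent = [b"\x00\x00" * FRAMES_PER_BUFFER for _ in range(40)]
    for original, translated in translate_stream(
        silent, lambda chunk: "hello", str.upper, chunk_seconds=1.0
    ):
        print(original, "->", translated)
```

In a real version, `buffers` would be a generator that keeps calling `stream.read(1024)`, so each translation appears a few seconds after the words are spoken.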
