Google has announced “Translatotron”, its first direct speech-to-speech translation system, which can convert speech from one language into another while preserving the speaker’s voice and tempo.
“Translatotron” is based on a sequence-to-sequence network that takes source spectrograms (visual representations of how a signal’s frequency content changes over time) as input and generates spectrograms of the translated content in the target language, Ye Jia and Ron Weiss, software engineers at Google Artificial Intelligence (AI), wrote in a blog post on Wednesday.
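For readers unfamiliar with spectrograms, the sketch below shows one common way to compute one from raw audio samples using a short-time Fourier transform in NumPy. It is only an illustration of the kind of input described above, not Google's preprocessing: the function name `magnitude_spectrogram`, the frame length, the hop size and the synthetic 1 kHz test tone are all assumptions chosen for the example.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Return a (num_frames, frame_len // 2 + 1) magnitude spectrogram.

    Hypothetical helper for illustration; frame_len and hop are arbitrary.
    """
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    num_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    # One real-input FFT per frame; keep magnitudes only.
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic input: one second of a 1 kHz sine tone sampled at 8 kHz (assumed values).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = magnitude_spectrogram(tone)

# Find the frequency bin with the most energy, averaged over time.
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 256
print(spec.shape, peak_hz)
```

Each row of `spec` is one short slice of time and each column is a frequency band, so the array can be rendered as an image. A sequence-to-sequence model of the kind described would consume such a time-by-frequency array for the source speech and emit another for the target speech.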