Facebook has introduced SeamlessM4T, the first all-in-one multimodal and multilingual AI translation model, which lets people communicate across languages through speech and text. SeamlessM4T supports five tasks: speech recognition for nearly 100 languages; speech-to-text translation for nearly 100 input and output languages; speech-to-speech translation from nearly 100 input languages into 36 output languages (including English); text-to-text translation for nearly 100 languages; and text-to-speech translation from nearly 100 input languages into 35 output languages (including English).
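
The five capabilities listed above differ only in the input modality, the output modality, and whether the language changes. The sketch below is one way to model that as a lookup table. It is an illustration rather than SeamlessM4T's actual interface: the `Modality` enum, the `pick_task` helper, and the short task codes are names assumed for this example.

```python
from enum import Enum


class Modality(Enum):
    SPEECH = "speech"
    TEXT = "text"


# Hypothetical task codes keyed by
# (input modality, output modality, whether the language changes).
TASKS = {
    (Modality.SPEECH, Modality.TEXT, False): "ASR",   # speech recognition
    (Modality.SPEECH, Modality.TEXT, True): "S2TT",   # speech-to-text translation
    (Modality.SPEECH, Modality.SPEECH, True): "S2ST", # speech-to-speech translation
    (Modality.TEXT, Modality.TEXT, True): "T2TT",     # text-to-text translation
    (Modality.TEXT, Modality.SPEECH, True): "T2ST",   # text-to-speech translation
}


def pick_task(src, tgt, translate=True):
    """Return the task code for a request, or raise if it is not one of the five."""
    try:
        return TASKS[(src, tgt, translate)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {src.value} -> {tgt.value}, "
            f"translate={translate}"
        ) from None


if __name__ == "__main__":
    print(pick_task(Modality.SPEECH, Modality.SPEECH))
    print(pick_task(Modality.SPEECH, Modality.TEXT, translate=False))
```

Keeping the combinations in a single table mirrors the point of an all-in-one model: one entry point for every task, with the request itself deciding which task runs.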

SeamlessM4T is available under a research license for researchers and developers to build on. Facebook is also sharing the metadata of SeamlessAlign, the largest open multimodal translation dataset to date, which contains 270,000 hours of mined speech and text alignments.

With SeamlessM4T, Facebook aims to make communication across languages effortless.

Building a universal language translator, like the Babel Fish from The Hitchhiker’s Guide to the Galaxy, is challenging because existing speech-to-speech and speech-to-text systems cover only a small fraction of the world’s languages. SeamlessM4T’s single-system approach also reduces errors and delays compared with chaining separate models, which improves the efficiency and quality of translation. This enables people who speak different languages to communicate more effectively.
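
A toy calculation shows why chaining separate models can compound errors and delays. It assumes that each stage fails independently of the others and that stages run one after another. The stage names, accuracies, and latencies are invented for illustration and are not measurements of SeamlessM4T or any real system.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Stage:
    """One model in a translation pipeline (all values are hypothetical)."""
    name: str
    accuracy: float    # probability that the stage's output is correct, 0..1
    latency_ms: float  # delay the stage adds to the pipeline


def pipeline_accuracy(stages):
    # Under the independence assumption, the final output is correct
    # only if every stage is correct, so the probabilities multiply.
    return prod(s.accuracy for s in stages)


def pipeline_latency_ms(stages):
    # Stages run sequentially, so their delays add up.
    return sum(s.latency_ms for s in stages)


# A cascaded speech-to-speech system: recognize, translate, then synthesize.
cascade = [
    Stage("speech recognition", 0.9, 300.0),
    Stage("text translation", 0.9, 200.0),
    Stage("speech synthesis", 0.9, 250.0),
]

# A single model handling the whole request (illustrative numbers only).
single = [Stage("direct speech-to-speech", 0.9, 400.0)]

if __name__ == "__main__":
    for label, stages in (("cascade", cascade), ("single", single)):
        print(f"{label}: accuracy={pipeline_accuracy(stages):.3f}, "
              f"latency={pipeline_latency_ms(stages):.0f} ms")
```

Under these assumptions, three stages that are each 90% accurate yield a correct result only about 73% of the time. Real systems are more nuanced, since errors are rarely independent and a single model is not guaranteed to match the accuracy of each specialized stage.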

SeamlessM4T builds on previous advances by Facebook and others toward a universal translator. Last year, Facebook released No Language Left Behind (NLLB), a text-to-text machine translation model that supports 200 languages and has since been integrated into Wikipedia as a translation provider. Facebook also shared a demo of its Universal Speech Translator, the first direct speech-to-speech translation system for Hokkien, a language without a widely used writing system. Earlier this year, Facebook revealed Massively Multilingual Speech, which provides speech recognition, language identification, and speech synthesis technology for more than 1,100 languages.

SeamlessM4T draws on insights from all of these projects to deliver multilingual, multimodal translation from a single model, built on a wide range of spoken data sources and achieving state-of-the-art results.