Tuesday, September 26, 2023

SeamlessM4T, a multimodal AI model for speech and text translations

Language is the medium we use to express our thoughts, feelings, and emotions. However, because so many different languages are spoken worldwide, people who do not share a language often struggle to communicate, which can lead to misunderstandings. Learning a new language is costly and time-consuming, and even fluent speakers can cause confusion by accidentally mixing languages. Large technology companies such as Google and Apple are working to solve this problem.

To address this problem, Meta Platforms, Inc. (Meta) introduced SeamlessM4T, which it describes as the first all-in-one multimodal and multilingual AI translation model. It lets people communicate easily across languages through both speech and text.

SeamlessM4T builds on Meta's earlier research projects, including No Language Left Behind (NLLB), Universal Speech Translator, SpeechMatrix, and Massively Multilingual Speech (MMS), to provide multilingual and multimodal translation from a single model.

SeamlessM4T supports automatic speech recognition for nearly 100 languages; speech-to-text translation for nearly 100 input and output languages; speech-to-speech translation from nearly 100 input languages into 35 output languages plus English; text-to-text translation for nearly 100 languages; and text-to-speech translation from nearly 100 input languages into 35 output languages plus English. Because a single model handles all of these tasks, it is a significant step toward the AI community's goal of building universal multitask systems.

Because SeamlessM4T performs these tasks in a single system rather than chaining separate models together, Meta says it reduces errors and delays, improving both the efficiency and the quality of translation. This helps people who speak different languages communicate with each other more effectively.

Meta has released SeamlessM4T publicly under a research license, allowing researchers and developers to build on this work. Meta is also releasing the metadata of SeamlessAlign, the largest open multimodal translation dataset to date, totaling 270,000 hours of mined speech and text alignments.

Meta's speech and text translation work can help many people: SeamlessM4T itself covers nearly 100 languages, while Meta's related Massively Multilingual Speech project extends speech recognition and speech synthesis to more than 1,100 languages. With SeamlessM4T, communicating with people who speak different languages becomes much easier.