How to make friends in the Metaverse? Meta releases cross-language speech model XLS-R, supporting barrier-free dialogue in 128 languages

After changing its name to Meta, Facebook’s vision of the Metaverse is taking shape. This time, the company has set its sights on social interaction in the Metaverse.


Meta releases speech processing model XLS-R

Recently, Meta officially released XLS-R, a new self-supervised model for a variety of speech tasks. XLS-R is reportedly trained on massive amounts of public data (ten times more than before), which more than doubles the language coverage of earlier multilingual models. XLS-R currently supports a total of 128 languages.

Meta believes that voice communication is the most natural form of human interaction. “As voice technology develops, we will be able to interact with our devices and the virtual world of the future directly through dialogue, merging the virtual experience with the real world.”

This coincides with Zuckerberg’s earlier statement that “the company’s business will be Metaverse first.” Zuckerberg had previously outlined his plans for the Metaverse: a digital world built on top of our own, encompassing virtual and augmented reality. “We believe that the Metaverse will be the successor to the mobile Internet.”

As an indispensable part of social life in the Metaverse, XLS-R can help people with different native languages converse without barriers.

It is worth mentioning that, in order to achieve broad speech understanding across many languages with a single model, Meta fine-tuned XLS-R for tasks such as speech recognition, speech translation, and language identification. According to reports, XLS-R achieved strong results on the BABEL, CommonVoice, and VoxPopuli speech recognition benchmarks, the CoVoST-2 foreign-language-to-English translation benchmark, and the VoxLingua107 language identification benchmark.
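Speech recognition models fine-tuned in this style typically attach a CTC (connectionist temporal classification) head and decode its per-frame predictions greedily. The sketch below is illustrative only: the token IDs and vocabulary are toy values, not XLS-R’s actual vocabulary. It shows the standard collapse-repeats-then-drop-blanks decoding step:

```python
# Minimal sketch of greedy CTC decoding, the decoding step commonly used
# with CTC-fine-tuned speech models. Toy vocabulary; illustrative only.

BLANK = 0  # CTC blank token ID (assumed convention here)

def ctc_greedy_decode(frame_ids, id_to_char):
    """Collapse repeated per-frame predictions, then drop blank tokens."""
    decoded = []
    prev = None
    for t in frame_ids:
        if t != prev and t != BLANK:
            decoded.append(id_to_char[t])
        prev = t
    return "".join(decoded)

# Toy vocabulary and a sequence of per-frame argmax predictions.
vocab = {1: "h", 2: "i", 3: " "}
frames = [1, 1, 0, 2, 2, 2, 0, 0, 3, 1, 0, 2]
print(ctc_greedy_decode(frames, vocab))  # -> "hi hi"
```

Real decoding often adds a language model and beam search on top of this greedy step, but the collapse-and-drop logic is the core of CTC inference.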

To lower the barrier to access as much as possible, Meta and Hugging Face have jointly released the model itself, which is fully open through the fairseq GitHub repo.

Trial address:

How XLS-R works

According to reports, XLS-R was trained on more than 436,000 hours of public speech recordings using wav2vec 2.0, a self-supervised method for learning speech representations. That is ten times the training data of XLSR-53, the strongest such model when it was released last year. Drawing on multiple sources of speech data, from meeting recordings to audiobooks, XLS-R expands language support to 128 languages, covering nearly 2.5 times as many languages as the previous model.
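wav2vec 2.0’s self-supervised objective masks spans of the audio and trains the model to identify the true quantized latent for each masked position among a set of distractors, an InfoNCE-style contrastive loss. The sketch below is a simplified illustration with toy vectors (it omits the paper’s diversity term and quantization machinery):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def contrastive_loss(context, positive, distractors, temperature=0.1):
    """InfoNCE-style loss: the context vector at a masked position should
    be most similar to the true quantized target, not the distractors."""
    sims = [cosine(context, positive)] + [cosine(context, d) for d in distractors]
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract max for numerical stability
    denom = sum(math.exp(l - m) for l in logits)
    # Negative log-probability of the positive among all candidates.
    return -(logits[0] - m - math.log(denom))
```

When the context vector points toward the true target, the loss is near zero; when it points toward a distractor instead, the loss is large, which is what drives the representation learning.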

As the largest model Meta has ever built, XLS-R contains over 2 billion parameters and outperforms comparable models. Meta said it turns out that more parameters capture the variety of languages in the dataset more fully. In addition, Meta found that for single-language pre-training, larger models also outperformed smaller ones.

Meta evaluated XLS-R on four major multilingual speech recognition benchmarks and found that it outperformed previous models across 37 languages. The specific test setup selected 5 languages from BABEL, 10 from CommonVoice, 8 from MLS, and 14 from VoxPopuli.


Word error rate (WER) benchmark results on BABEL. XLS-R is a significant improvement over the previous generation of models.
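Word error rate is the word-level edit distance between the model’s transcript and a reference transcript, divided by the reference length (lower is better). A minimal self-contained sketch of the computation (not Meta’s evaluation code):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion -> ~0.167
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why benchmark tables report it as a percentage rather than a bounded score.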

In addition, Meta evaluated speech translation models, i.e., models that translate audio recordings directly into another language. To build a single set of models capable of performing multiple tasks, Meta fine-tuned XLS-R simultaneously on several different translation directions of the CoVoST-2 benchmark, enabling it to translate between English and up to 21 other languages.

Significant performance gains were achieved when using XLS-R to encode languages other than English, a major breakthrough in the field of multilingual speech representation. According to Meta, XLS-R achieves significant improvements on low-resource language pairs, such as Indonesian-to-English translation, where BLEU accuracy doubles on average. A higher BLEU score means the model’s automatic translations overlap more closely with human translations of the same content, which represents a big step forward in improving the model’s spoken-language translation ability.


Automatic speech translation accuracy measured by the BLEU metric, where higher values indicate better accuracy, when translating speech recordings into English from high-resource languages (e.g., French, German) and medium-resource languages (e.g., Russian, Turkish).
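BLEU scores an automatic translation by how much its n-grams overlap with a human reference, combined with a brevity penalty for translations that are too short. The sketch below is a simplified single-reference, sentence-level version without smoothing; real evaluations typically use a standard tool such as sacreBLEU:

```python
import math
from collections import Counter

def bleu(reference, hypothesis, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) with a brevity penalty. No smoothing."""
    ref, hyp = reference.split(), hypothesis.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        total = sum(hyp_ngrams.values())
        if total == 0:
            return 0.0
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        if clipped == 0:
            return 0.0  # no smoothing: any zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A perfect match scores 1.0; published results usually scale this to 0–100, so “BLEU doubling” on a low-resource pair means the overlap with human references roughly doubled.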

Meta believes that XLS-R proves that scaling up cross-language pre-training can further improve understanding performance in low-resource languages. Not only does it improve speech recognition, it also more than doubles the accuracy of foreign-language-to-English speech translation.

“XLS-R is an important step toward our goal of understanding speech in many different languages with a single model, and it represents our best effort so far to advance multilingual pre-training with public data. We strongly believe this is the right direction of exploration: it will allow machine learning applications to better understand all human speech, promote follow-up research, and greatly lower the barrier to using speech technology around the world, especially in underserved communities. We will continue to develop new methods, expand the model’s language understanding through low-supervision learning, gradually extend its coverage to more than 7,000 of the world’s languages, and keep the algorithm continuously updated,” Meta said.

Posted by: CoinYuppie. Reprinted with attribution to:
