Among the features in the iOS 15 update, one that has attracted a lot of attention is the built-in image translation.
Users can select the text to be translated directly in a photo and view the translation result. Apple is also said to have updated its translation model to improve quality.
This is exciting. If the emergence of neural machine translation a few years ago was a nightmare for many human translators, then this system-level image translation feature might well keep many professional translation apps awake at night.
Take people like me who frequently read English material. I often run into PDF documents and images that require a third-party app to photograph and translate. If iOS 15 can recognize images directly, doesn’t that cut out several steps? A godsend for the lazy.
The question, then, is whether it actually works. Is the image translation in iOS 15 strong enough to replace professional translation software? After all, producing natural, faithful Chinese has long been a hard problem in NLP. Can a product with Silicon Valley genes really deliver authentic Chinese–English translation in both directions?
In the spirit of testing claims through practice, we prepared a set of exam questions and picked Youdao Dictionary, an app with a strong reputation and a large user base, for a head-to-head evaluation, to see the true level of iOS 15’s image translation.
Putting It to the Test: The Three Steps of Image Translation
There are many professional benchmarks for translation, with detailed standards and specifications covering short, medium, and long sentences. As ordinary users, however, we decided to evaluate based on the concrete scenarios and steps in which people actually use image translation day to day.
Generally speaking, for the text in a picture to be accurately translated, at least three abilities are required:
The first step: sharp eyes, “seeing clearly”.
For image translation to meet users’ needs, the first test is not NLP but OCR. Only accurate recognition lays the foundation for the subsequent translation. The key evaluation metric for this ability is character (word) accuracy.
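As a rough illustration of the metric (our own sketch, not necessarily how either vendor computes it), character accuracy can be derived from the edit distance between the OCR output and a reference transcription:

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between reference and hypothesis strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def char_accuracy(ref: str, hyp: str) -> float:
    """Character accuracy = 1 - edit_distance / reference length."""
    return max(0.0, 1 - edit_distance(ref, hyp) / len(ref)) if ref else 1.0
```

For example, recognizing “don’t” as “don” costs two character edits against a five-character reference, so the accuracy of that word alone would be 60%.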
Operationally, Apple’s iOS 15 feature is built in: you select the text in the photo and view the translation directly. Youdao Dictionary requires opening the photo-translation feature inside the app, so the former is more convenient. But when it comes to the recognition step itself, iOS 15 stumbles a bit.
We prepared a short English sentence, a long English sentence, and a long Chinese sentence. The results show little difference between Apple and Youdao in English word accuracy.
For example, Youdao recognized the original text “Do me a favor, can you look for my credit card. I don’t find it.” with 100% accuracy.
The result from iOS 15 was: Do me a favor., can you look for my credit card, I don find it.
Although Apple recognized “don’t” as “don”, this does not hinder reading, and the accuracy is still acceptable.
Switching to the long English sentence in the picture below, Youdao’s recognition result was:
One bad chapter doesn’t mean my story is over until you find a new chapter which you think it’s right, a character accuracy of 98.96%.
The result from iOS 15 was:
One bad chapter t mean my story is over until you find chapter which you think it’l S right.
Recognizing “it’s right” as “it’l S right” may affect the subsequent semantic understanding.
It was the Chinese character-accuracy test that opened a gap between Youdao and Apple. Take the picture below:
Youdao recognized it 100% correctly. Apple’s iOS 15 missed the character for “rain” and the measure word in “a series”, and dropped the three characters meaning “ancestral memorial” in the penultimate paragraph, which directly hurts readability and the user’s comprehension.
Overall, the two are close on English accuracy, with Youdao slightly ahead. On Chinese character accuracy, Youdao achieved over 90% while Apple’s iOS 15 managed only 79%, a clear advantage for Youdao.
The reason for this gap may be that Youdao Dictionary has accumulated far more experience in image translation.
Youdao began developing image translation features five years ago and later supplied these capabilities to many mainstream phone manufacturers through Youdao Zhiyun. Its users shoot photos under all kinds of lighting conditions and usage scenarios, so it has accumulated a large training corpus; through continuous iteration of its paragraph analysis, text detection, image tilt-angle detection, and language detection algorithms, its OCR capability has naturally received targeted optimization.
In addition, as a Chinese company, Youdao has a deeper understanding of its native language, while Apple’s iOS 15 has only just begun to be widely deployed. Some shortfall in Chinese recognition in real-world scenarios is understandable.
The second step: a mind like a mirror, “understanding”.
Once the text in the image has been recognized, neural machine translation is needed to convert it into the target language. Chinese and English are both highly expressive languages, which places correspondingly high demands on comprehension.
So we chose two finer points to examine:
One is tense.
The original Chinese sentence, roughly “Yes, we’re going out today”, carries the sense of a plan.
Youdao translated it as: Yes. we’re going out today;
Apple’s translation was: Yes. go out today.
Clearly, Youdao used “we’re going out”, the present-continuous form English uses for plans and arrangements, and thus captured the intent of the original more accurately. Apple’s translation fails to convey that planned state.
The second is singular versus plural.
The singular and plural forms of an English word can carry quite different meanings; if they are not rendered correctly, the translation may drift from the original.
For example, for “1200 square”, Youdao Dictionary translated “1200 square” while Apple’s iOS 15 produced “1200 squares”.
In the singular, “square” here refers to the unit of area; Apple’s plural rendering could easily mislead readers.
Of course, in terms of overall comprehension, both Youdao’s and Apple’s Chinese–English translation can meet basic reading needs.
For example, this long sentence:
He puts down $20,000 as a deposit on the beautiful $200,000 villa believing that his investment would increase over time.
Youdao’s translation: He paid a deposit of US$20,000 for a beautiful villa worth US$200,000, and believes that his investment will increase over time.
Translation of iOS15: He saved $20,000 as a deposit for this beautiful $200,000 villa, and believes that his investment will increase over time.
At this point, both Youdao and iOS 15 show solid comprehension in image translation into Chinese. Gaps remain in the finer points of word usage and idiom, and behind them lie differences in corpus accumulation, model selection, and performance optimization.
The third step: a silver tongue, “speaking like a native”.
For translation into Chinese, the gold standard for many people is “faithfulness, expressiveness, and elegance”: the translation must be accurate and not deviate from the original; it must be fluent and idiomatic; and it should also be elegant, authentic, and literary.
Can neural machine translation, as it stands today, meet that standard? Youdao and Apple, two translation platforms with different linguistic genes, each have their own stumbling blocks.
Let’s start with a short question:
Original: You charged me 80 yuan;
Youdao translation: You charged me ¥80;
iOS15 translation: You received me 80 yuan.
The idiomatic English verb for collecting money is “charge”, so Youdao’s version matches natural English usage. Apple translates the Chinese verb for “receive money” literally as “received”, which is not idiomatic.
Try another long sentence:
Original: After the accident, I felt myself another person.
Youdao Translation: After the accident, I felt like I was a different person;
iOS15 translation: After the accident, I felt like another person.
Apple renders “another person” literally rather than conveying a change of mentality, which invites ambiguity. Youdao’s “a different person” is more accurate and more colloquial.
Of course, Youdao can also err on the side of overly literal translation. In the picture below, for example, the original text reads: In conclusion, drawing on the electronic media or printed books might be a good approach to understand different places or countries.
Youdao translated as: In short, using electronic media or printed books may be a good way to understand different places or countries;
Apple translated as: In short, using electronic media or printed books may be a good way to learn about different places or countries.
By adjusting the word order, iOS 15 produces a more natural and appropriate rendering, while Youdao’s version reads like a phrase-by-phrase literal mapping.
That said, our test set leaned toward everyday travel and cultural-exchange scenarios; the handling of proper nouns and technical terms needs further investigation.
Also, because iOS 15’s lower accuracy in the initial OCR step directly degrades the downstream text understanding, some of Apple’s translation results are not meaningful as references, and its translation ability should not be judged from those cases alone.
The evaluation shows that image translation which achieves “faithfulness, expressiveness, and elegance” depends on the integration of multiple technologies: OCR, word segmentation, semantic understanding, context memory, topic extraction, and more.
So the fledgling system-level image translation in Apple’s iOS 15 still has a long way to go before it can replace professional translation software. Youdao, for its part, also exhibits some problems common to machine translation; as professional translation software, it can keep strengthening its professional moat.
This raises a question worth pondering: with all this AI, why is neural machine translation still no match for human translators?
The gap between ideal and reality: neural machine translation dances in shackles
When neural networks were first introduced to machine translation, they were hailed as an unbeatable weapon. A few years on, the technology is indeed far better than traditional statistical machine translation, yet it still falls well short of human translators.
Take this head-to-head between Apple iOS 15 and Youdao Dictionary: both have shortcomings. Broadly, the reasons fall into several areas:
1. The OOV (Out of Vocabulary) problem is difficult to solve.
Machine translation models based on deep neural networks require massive amounts of training data. When data is scarce, the word vectors for rarely seen words are of low quality, and in real applications too many out-of-vocabulary words lead to mistranslations and omissions. Some vertical domains still have scarce data and thin corpora; Chinese in particular, with tens of thousands of characters, many of them rare, suffers degraded model performance and translation quality as a result.
Solving this ultimately comes down to the unglamorous work of accumulating data. According to Youdao Dictionary’s engineers, there is no silver bullet for Chinese recognition: you can only keep accumulating data and iterating the algorithms, which Youdao has done for years.
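One common engineering mitigation for OOV across the field (the article does not say which products use it) is subword segmentation such as byte-pair encoding, which breaks rare words into frequent fragments so the model never sees a truly unknown token. A toy sketch of learning BPE merges from a corpus:

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn byte-pair-encoding merges from a toy corpus.

    Rare words end up segmented into frequent subword pieces,
    so the vocabulary has no true out-of-vocabulary gaps."""
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for pair in zip(word, word[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges
```

Real systems train tens of thousands of merges on billions of words; the point is only that “unseen word” becomes “seen pieces”.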
2. Algorithm optimization and innovation await breakthroughs.
Different languages and cultures differ in expression, logical structure, information redundancy, and grammar; there is plenty of “information asymmetry” between them, so mistranslation during “encoding and decoding” is hardly surprising.
The book “Cultural Translation Compendium” describes a translation as a blend of “original text + source cultural background + translation + target cultural background + the original author’s temperament and style + the translator’s temperament and style”.
Grasping these “hidden attributes” of culture, temperament, and style can only come through technical iteration and innovation. For example, Youdao lets users supply custom dictionaries that precisely adjust local outputs of the neural machine translation model, solving the proper-noun problem;
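The article does not describe how Youdao implements its custom dictionaries, but the simplest form of the idea is a user glossary applied around the model. A minimal sketch (the function and behavior are our own illustration, not Youdao’s API):

```python
def apply_glossary(translation: str, glossary: dict) -> str:
    """Force preferred renderings for proper nouns as a post-processing step.

    Longer source terms are replaced first, so an entry like
    'New York Times' wins over a shorter entry like 'New York'."""
    for term in sorted(glossary, key=len, reverse=True):
        translation = translation.replace(term, glossary[term])
    return translation
```

Production systems typically go further, e.g. constraining the decoder itself so the glossary term appears with correct surrounding grammar, rather than patching the output string afterwards.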
The industry has also begun experimenting with multimodal translation, using other features in the picture to aid text understanding. If the machine sees only the word GATE, it may simply translate it as “door”; but if the picture shows a ticket, or the background is an airport, then “boarding gate” is the more appropriate rendering.
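The GATE example can be caricatured in a few lines: imagine an upstream image classifier has already produced scene labels, and the translator uses them to pick a word sense. The label sets and renderings below are invented purely for illustration, not any product’s actual logic:

```python
# Toy word-sense disambiguation for "GATE" using visual context.
AIRPORT_HINTS = {"airport", "boarding pass", "ticket", "departure board"}

def translate_gate(scene_labels):
    """Pick the Chinese rendering of 'GATE' from image scene labels."""
    if set(scene_labels) & AIRPORT_HINTS:
        return "登机口"  # boarding gate
    return "门"          # door
```

Real multimodal models fuse image and text features inside the network rather than through hand-written rules, but the intuition is the same: the picture disambiguates the word.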
3. There are no shortcuts in adapting to niche scenarios.
As machine translation has become popular, users have raised finer-grained demands on quality: in image translation, a few recognition errors up front can cascade downstream; web-page translation must not only be correct but also preserve the page’s original layout and style as much as possible; in document translation, names of people, places, and organizations, and technical terms, may appear many times and must stay consistent across the document; and on lower-end devices the experience must remain fast and smooth. Each scenario poses problems that demand targeted optimization.
Take image translation. Text recognition in natural scenes is complicated: methods that work well in the laboratory meet users who photograph all sorts of strange things under all sorts of lighting. After recognition, the system must decide which words belong to which sentence, which sentences form a paragraph, and how the translated result should be laid out. Youdao is understood to have spent a long time optimizing here, strengthening the robustness of its translation model on the algorithm side, so that even when some unavoidable recognition errors occur in real environments, performance remains stable.
Seen this way, for a new application such as image translation to transform the user experience, laboratory innovation is not enough: the concrete technical problems encountered in real scenarios must be noticed and solved, and there is no shortcut to polishing the final experience.
From the “witchcraft” that human translators once feared would take their jobs to an everyday tool in reading, study, and entertainment, neural machine translation has landed in the real world faster than many people imagined.
For practical technologies like this, application breakthroughs are more immediate and more urgent than theoretical ones. That is why we chose to talk about the “small” feature of image translation now.
As global exchange gradually restarts, understanding cross-language information in a more natural, real-time way becomes ever more necessary. Image translation has real value for travel, professional reading, and accessible Internet use. Herein also lies the value of Apple, Youdao, and other industry players: through data, interaction, and feedback from real environments, they iterate on and advance neural machine translation.
For now, it seems too early for the system-level image translation of Apple’s iOS 15 to replace professional translation software. In fact, the two will foreseeably not replace each other; rather, they will complement each other across their respective scenarios and levels of demand, each playing to its strengths.
Every practice from industry pushes the technology forward. Grain by grain the sand builds into a tower, and one day humanity will climb over the “Tower of Babel” that blocks communication between languages.
Posted by: CoinYuppie. Reprinted with attribution to: https://coinyuppie.com/ios15-launches-the-image-translation-function-can-it-replace-professional-translation-software/