AI imitates your handwriting from just one word.
Write a single word on paper, and the AI can imitate your handwriting just by looking at it, with flawless-looking results.
Facebook recently announced a new image AI, TextStyleBrush, which can copy and reproduce the text style in images.
With this technology, you only need to enter one word as a “reference,” and the AI can imitate your writing style throughout an entire article with a single click. The effect is striking.
In addition, you can use it to replace text in different scenes (such as on posters, trash cans, and road signs). In the figure below, the left side shows the original scene images, with the words marked in blue rectangles; the right side shows the images after text replacement.
As the figure shows, the AI can handle fonts of nearly every style. In the figure below, each image pair shows the input source style on the left and the new content (a string) rendered in that style on the right. The fonts on the two sides look virtually identical. Compared with the source images, the output looks slightly blurry, but in most cases the technique works very well.
Compared with other handwriting-imitating AI, TextStyleBrush is more powerful: it analyzes text style at a finer granularity, so handwriting can be imitated across various viewing angles and backgrounds.
The following figure shows the process of replacing the word “Soya” on a sauce bottle with “Tea”:
This powerful imitation tool is TextStyleBrush, launched by Facebook AI. You only need to enter one word to reproduce the handwriting faithfully. The technique works on a principle similar to the style-brush tool in a word-processing app: it separates a text’s content from its style.
Only one word is needed to copy the style of the text in a photo. With this AI model, you can edit and replace text in images.
Unlike most AI systems, TextStyleBrush is the first self-supervised AI model that can replace text in both handwriting and scene images from a single example word.
In the future, it could unlock new possibilities in areas such as personalized messaging and captions, for example realistic language translation in augmented reality (AR).
By publishing the capabilities, methods, and results of this research, the researchers hope to spur dialogue and follow-up work on detecting potential misuse of this type of technology, such as deepfake text attacks, a major challenge in the field of artificial intelligence.
Since TextStyleBrush could also be used to create misleading images, Facebook’s CTO stated on his personal social media account that the company is releasing only the paper and datasets, not the code. He added that, as with their approach to deepfakes, they believe sharing the research and datasets will help build detection systems and prevent attacks in advance.
TextStyleBrush: an AI that learns representations of text style
The use of AI to generate images has been developing at an astonishing pace. Generative techniques can reproduce historical scenes or turn photos into paintings in the style of Van Gogh. Now, Facebook AI has built an AI that can replace both scene text and handwritten text styles with only a single word as input.
Although most AI systems can perform well-defined, specialized tasks, building one flexible enough to understand the nuances of text and handwriting in real-world scenes is a major challenge. It must understand a huge range of text styles, covering not only different fonts and writing styles but also different transformations, such as rotation, curved text, and image noise.
Facebook AI proposed the TextStyleBrush (TSB) architecture. It is trained in a self-supervised manner, with no target-style supervision; only the original style images are used, and the framework discovers the true style of a picture automatically. During training, it assumes that each word box has a ground-truth label (the text appearing in the box); at inference, it takes a single source-style image and new content (a string) and generates a new image in the source style with the target content.
The generator architecture is based on the StyleGAN2 model, which has two important limitations for this task:
First, StyleGAN2 is an unconditional model: it generates images by sampling a random latent vector. TextStyleBrush, however, must generate an image of a specified string.
Second, StyleGAN2 offers no control over the style of the generated text image. Text style involves global information (such as the color palette and spatial transformation) combined with fine-scale information (such as subtle variations within individual handwriting).
The researchers address these limitations by conditioning the generator on content and style representations. The multi-scale nature of text style is handled by extracting layer-specific style information and injecting it into each layer of the generator. In addition to generating the target image in the desired style, the generator also produces a soft mask image marking the foreground (text) pixels. In this way, the generator can control both the low-resolution and high-resolution details of the text to match the desired input style.
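The per-layer style injection can be sketched as an AdaIN-style modulation. This is a simplification (StyleGAN2 actually modulates convolution weights), and all names and shapes below are illustrative, not from the paper:

```python
import numpy as np

def inject_style(features, scale, bias, eps=1e-8):
    """Modulate one generator layer: normalize each channel of the
    feature map, then apply a per-channel scale and bias derived from
    that layer's style vector."""
    mu = features.mean(axis=(1, 2), keepdims=True)
    sd = features.std(axis=(1, 2), keepdims=True)
    normalized = (features - mu) / (sd + eps)
    return scale[:, None, None] * normalized + bias[:, None, None]

# Layer-specific styles: coarse layers carry global style (palette,
# geometry), fine layers carry small-scale pen-stroke detail.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8, 8))        # (channels, height, width)
coarse_scale, coarse_bias = np.full(4, 2.0), np.full(4, 0.5)
out = inject_style(features, coarse_scale, coarse_bias)
```

After modulation, each channel's statistics follow the injected style vector rather than the incoming features, which is how one style can be imposed at every scale of the generator.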
The study also introduced a new self-supervised training criterion that uses a typeface classifier, a text recognizer, and an adversarial discriminator to preserve the source style and target content. The researchers evaluate how well the generator captures the style of the input text with a pretrained font-classification network, and evaluate the content of the generated image with a pretrained text-recognition network to measure how well the generator captures the target content. Together, these signals make effective self-supervised training possible.
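A minimal sketch of how these supervision signals combine into one training objective. Random linear maps stand in for the pretrained typeface classifier and text recognizer; all function names and weights are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the pretrained networks: random projections of a
# flattened image vector, purely for illustration.
W_font = rng.normal(size=(8, 16))
W_reco = rng.normal(size=(8, 16))

def font_features(img_vec):
    return W_font @ img_vec          # "typeface classifier" features

def recognizer_logits(img_vec):
    return W_reco @ img_vec          # "text recognizer" output

def style_loss(generated, source):
    """The generated image should match the source's typeface features."""
    return float(np.mean((font_features(generated) - font_features(source)) ** 2))

def content_loss(generated, target_logits):
    """The recognizer, run on the generated image, should read the target string."""
    return float(np.mean((recognizer_logits(generated) - target_logits) ** 2))

def total_loss(generated, source, target_logits, adv_loss,
               w_style=1.0, w_content=1.0, w_adv=0.1):
    """Weighted sum of style, content, and adversarial terms."""
    return (w_style * style_loss(generated, source)
            + w_content * content_loss(generated, target_logits)
            + w_adv * adv_loss)
```

The key property is that no ground-truth image of the target string in the source style is ever needed: the pretrained networks judge style and content separately.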
Table 2 reports ablation experiments evaluating different loss functions, style-feature extensions, and the role of masks when training TSB. The results show that the MSE (mean squared error) of the images generated by TextStyleBrush drops substantially, while both PSNR (peak signal-to-noise ratio) and SSIM (structural similarity) improve.
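For reference, the three reported metrics can be computed as follows. The SSIM here is a simplified single-window version using global statistics, whereas the standard metric averages over a sliding window:

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images with values in [0, 1]."""
    return float(np.mean((a - b) ** 2))

def psnr(a, b, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    err = mse(a, b)
    if err == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / err)

def ssim_global(a, b, max_val=1.0):
    """Simplified SSIM over global statistics (no sliding window)."""
    c1 = (0.01 * max_val) ** 2
    c2 = (0.03 * max_val) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(num / den)
```

For example, a uniform error of 0.1 on a [0, 1] image gives MSE = 0.01 and PSNR = 20 dB; identical images give SSIM = 1.0.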
Table 3 shows text-recognition accuracy measured on images from three datasets. TSB performs best, with recognition accuracies of 97.2% on IC13, 97.6% on IC15, and 95.0% on TextVQA.
Table 4 gives a quantitative comparison on generated handwritten text, pitting TSB against the state-of-the-art method of Davis et al., which was designed specifically for generating handwritten text. A lower FID score indicates better generation quality, and TSB clearly outperforms the previous work.
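FID measures the Fréchet distance between Gaussians fitted to features of real and generated images. A sketch under a diagonal-covariance simplification (the standard FID uses full covariance matrices, via a matrix square root, over Inception-network features):

```python
import numpy as np

def fid_diagonal(feats_real, feats_gen):
    """Frechet distance between two Gaussians with diagonal covariances,
    fitted to rows of (n_samples, n_features) feature arrays."""
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    v1, v2 = feats_real.var(axis=0), feats_gen.var(axis=0)
    # ||mu1 - mu2||^2 + sum over dims of (v1 + v2 - 2 * sqrt(v1 * v2))
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.sum(v1 + v2 - 2.0 * np.sqrt(v1 * v2)))
```

Identical feature distributions give a score of 0; shifting every generated feature by a constant adds the squared shift per dimension, which is why lower scores mean the generated samples are statistically closer to the real ones.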
TextStyleBrush shows that AI can recognize text more flexibly and accurately than before. The technique still has limitations, however, such as an inability to imitate text written in colored characters or on metal surfaces. Facebook hopes this research can continue to expand, breaking down barriers in translation, creative self-expression, and deepfake-detection research.
Posted by: CoinYuppie. Reprinted with attribution to: https://coinyuppie.com/just-one-word-can-imitate-your-handwriting-the-ai-%e2%80%8b%e2%80%8bof-facebook-is-so-powerful-that-it-dare-not-open-source-code/