First, take a row of anime wives' faces. One click, and they all smile:
Or, for the pet owners out there: turn dogs into wolves, and tigers into cats:
Hold on, wait a minute, let me run through that: tiger -> cat -> dog -> wolf... doesn't that make tiger == wolf?
By this point, most netizens have probably stopped being surprised by this sort of thing; after all, even Su Daqiang can be turned into Daniel Wu:
Yes, as you guessed: today AI Technology Review introduces a GAN model that learns to edit face attributes by manipulating latent-space semantics, L2M-GAN.
The model was proposed by Prof. Lu Zhiwu's lab at the Gaoling School of Artificial Intelligence, Renmin University of China, and the paper has been accepted as an Oral at CVPR 2021 under the title "L2M-GAN: Learning to Manipulate Latent Space Semantics for Facial Attribute Editing".
The goal of face attribute editing is to manipulate the semantic attributes of real face images. It has a wide range of real-world applications, such as entertainment, assisted psychotherapy, and data augmentation. With the development of deep generative models, most recent work is based on GANs (Generative Adversarial Networks). A major challenge for existing face attribute editing models is satisfying two requirements simultaneously:
(1) correctly modifying the desired attribute, and (2) preserving all other, irrelevant information. However, because of the many relationships among attributes, and between attributes and identity information, modifying one attribute is likely to inadvertently change other features, which makes it difficult to satisfy both requirements at once.
To satisfy these two requirements, some recent approaches use spatial attention. Such approaches assume that each attribute has a corresponding local region to which edits of the image can be restricted. They learn to model this region with an attention module in the network, and once the region is determined, they use masking and residual summation to confine the edit to that region.
However, this assumption does not hold for all attributes: attributes such as gender or smile correspond to regions that cover essentially the whole face and overlap with the regions of other attributes, so this type of model does not work well when manipulating them. Another line of work focuses on decomposing the latent vectors in the latent space learned by a GAN to obtain attribute-related vectors. Given a pre-trained GAN model, these methods learn sub-mappings that map the original latent vector to a vector expressing the desired attributes.
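As a rough illustration (not the paper's own code), such a sub-mapping can be as simple as moving the latent code along a learned semantic direction, in the style of InterFaceGAN; all names and shapes below are hypothetical:

```python
import numpy as np

def edit_latent(w, direction, alpha):
    """Shift a latent code along a learned semantic direction.

    w:         latent code, shape (d,)
    direction: attribute direction found by, e.g., a linear probe (assumed unit-norm)
    alpha:     edit strength; its sign toggles the attribute on or off
    """
    return w + alpha * direction

# Toy usage with a random code and a made-up "smile" direction.
rng = np.random.default_rng(0)
w = rng.standard_normal(512)
smile_dir = np.zeros(512)
smile_dir[0] = 1.0  # pretend the probe found this axis
w_smile = edit_latent(w, smile_dir, alpha=3.0)
```

Because the edit is a single linear step in a fixed, pre-trained latent space, it cannot repair any entanglement already baked into that space, which is exactly the limitation discussed next.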
However, such methods still suffer from two problems:
(1) They rely on the latent space of a pre-trained GAN and do not retrain the model. A latent space that is not retrained end-to-end may be sub-optimal.
(2) They tend to decouple only the few labels provided by the dataset, but there is much more information not covered by these predefined labels that also needs to be decoupled, such as illumination and identity.
To overcome these limitations, this paper proposes a new latent-space decomposition model, L2M-GAN.
The model is trained end-to-end and learns to explicitly decompose the latent vector into attribute-relevant and attribute-irrelevant parts, decoupling the information of the target attribute from all other information. Like previous approaches of this kind, we decouple the latent variables based on attribute labels; the difference is that we explicitly decompose them into attribute-relevant and attribute-irrelevant vectors, rather than merely decoupling two predefined attributes.
Before introducing our method, we define the concept of a "domain": a combination of values of certain attributes. For example, editing a combination of two binary attributes yields four "domains", one per combination of values. Given an input image and its corresponding domain, as well as a target domain, our goal is to synthesize an image belonging to the target domain while preserving the domain-irrelevant information of the input image.
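The domain count is just the product of the attribute value sets; a tiny sketch (the attribute names here are illustrative, not the paper's exact example):

```python
from itertools import product

# Hypothetical attribute set: two binary attributes -> 2 x 2 = 4 domains.
attributes = {"smile": [0, 1], "gender": [0, 1]}

# Each domain is one combination of attribute values.
domains = [dict(zip(attributes, values))
           for values in product(*attributes.values())]

print(len(domains))  # -> 4
```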
As shown in the figure below, our proposed model consists of three components: a style encoder, a style converter, and a generator.
In a multi-task learning setup, our style encoder has one output branch per domain. For ease of exposition, only one domain's output is shown in the figure above.
The style converter is a key component of L2M-GAN and consists of two parts: a decomposer and a domain converter. The decomposer separates the domain-irrelevant (attribute-irrelevant) vector from the original latent vector, and the domain-relevant (attribute-relevant) vector is then obtained by subtraction. Our goal is to transform the target attribute into the target domain while leaving all other, irrelevant information unmodified; this happens if and only if the attribute-irrelevant vector is perpendicular both to the attribute-relevant vector and to its transformed counterpart.
For this reason, we introduce a perpendicular loss to constrain these vectors. It is worth noting that while previous approaches use a perpendicular loss to decouple two attributes, L2M-GAN uses this loss to separate the attribute-relevant information from all other, irrelevant information. This is crucial for preserving other information during attribute editing, since the remaining attribute labels do not cover all irrelevant information. After obtaining the domain-relevant vector, L2M-GAN transforms it with the domain converter into a domain-relevant vector representing the target domain; adding this to the domain-irrelevant vector yields the edited latent vector.
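A minimal NumPy sketch of the decomposition and the perpendicular loss; the toy decomposer and all names are hypothetical (the real decomposer is a learned network):

```python
import numpy as np

def decompose(w, decomposer):
    """Split a latent code into attribute-irrelevant and attribute-relevant parts."""
    z_irr = decomposer(w)   # attribute-irrelevant part, predicted by a network
    z_rel = w - z_irr       # attribute-relevant part, obtained by subtraction
    return z_irr, z_rel

def perpendicular_loss(a, b, eps=1e-8):
    """Squared cosine similarity: zero exactly when a and b are perpendicular."""
    cos = (a * b).sum(-1) / (np.linalg.norm(a, axis=-1)
                             * np.linalg.norm(b, axis=-1) + eps)
    return float((cos ** 2).mean())

# By construction, any decomposition satisfies z_irr + z_rel == w ...
w = np.random.randn(4, 8)
z_irr, z_rel = decompose(w, lambda x: 0.5 * x)
# ... and the loss vanishes for orthogonal vectors.
loss = perpendicular_loss(np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]))
```

In training, this loss would be applied between the attribute-irrelevant vector and both the original and the transformed attribute-relevant vectors.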
The generator takes the input image and the edited latent code as input and generates the target-domain image, which contains the target-domain information together with the input image's domain-irrelevant information. As in StarGAN V2, our generator uses Adaptive Instance Normalization (AdaIN) to fuse the style information carried by the latent code into the input image.
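A minimal sketch of the AdaIN step in NumPy; in the actual generator the scale and bias come from a learned affine layer applied to the style code, so the per-channel parameters below stand in for that:

```python
import numpy as np

def adain(content, style_scale, style_bias, eps=1e-5):
    """Adaptive Instance Normalization.

    content:     feature map, shape (C, H, W)
    style_scale: per-channel scale, shape (C,), derived from the style code
    style_bias:  per-channel bias,  shape (C,), derived from the style code
    """
    mu = content.mean(axis=(1, 2), keepdims=True)      # per-channel mean
    sigma = content.std(axis=(1, 2), keepdims=True)    # per-channel std
    normalized = (content - mu) / (sigma + eps)        # wipe the content's own style
    return style_scale[:, None, None] * normalized + style_bias[:, None, None]

x = np.random.randn(3, 8, 8)
y = adain(x, np.ones(3), np.zeros(3))  # identity style: just normalized features
```

The normalization removes the content's own channel statistics, so whatever statistics the style parameters inject fully determine the output style.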
We conduct experiments on the widely used CelebA-HQ dataset, splitting it into 27,176 training images and 2,824 test images based on the CelebA split and the correspondence between CelebA and CelebA-HQ images.
We compare our method with several other state-of-the-art methods. The experiments in the main text use the attribute "smile"; results for further attributes are given in the appendix. Notably, "smile" is the most challenging of the 40 labels provided by the dataset: it involves multiple parts of the face at once, and adding or removing a smile requires the model to have a high-level semantic understanding of the input image, so that it can modify several facial components simultaneously without changing anything else.
As the visualization results show, StarGAN and CycleGAN tend to generate blurry, distorted results around the mouth, and thus fail to edit the attribute properly in most generated images.
PA-GAN is a spatial-attention-based approach, so it better preserves some irrelevant information, such as the background; but the "smile" attribute, for which a definite modification region is hard to define, is usually under-modified and thus not edited correctly. InterFaceGAN* can generate high-quality images but still falls short on some details, such as the eyes and mouth. It also sometimes modifies the identity of the input image, because it only considers decoupling between attributes, not other information such as identity.
For quantitative results, we mainly use FID to evaluate the quality of the synthesized images and attribute-manipulation accuracy to evaluate the correctness of the edits. Our method surpasses all recent methods on both metrics, except that PA-GAN attains a lower (better) FID on smile removal; PA-GAN achieves that image quality at the cost of insufficient modification.
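For reference, FID (Fréchet Inception Distance; lower is better) compares Gaussian fits of the Inception features of real and generated images:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the mean and covariance of the real and generated feature distributions, respectively.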
In addition to the above results, our model shows further capabilities, including control over the strength of the edited attribute, simultaneous modification of multiple attributes, and transfer to unseen images.
Because the trained latent space is continuous and carries learned semantic information, as we linearly interpolate the transformation, the semantic information of the target domain in the synthesized images gradually increases while that of the original domain gradually decreases; and because of the perpendicularity constraint on the attribute-irrelevant vector, this process does not change other, irrelevant information. This process can be expressed as follows.
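In hypothetical notation (with $z_{\mathrm{irr}}$ the attribute-irrelevant vector, $z_{\mathrm{rel}}$ and $z'_{\mathrm{rel}}$ the attribute-relevant vectors before and after the domain transformation), a linear interpolation of this kind takes the form:

```latex
w(\alpha) = z_{\mathrm{irr}} + (1-\alpha)\, z_{\mathrm{rel}} + \alpha\, z'_{\mathrm{rel}},
\qquad \alpha \in [0, 1]
```

At $\alpha = 0$ the edited code reduces to the original latent code, and at $\alpha = 1$ the attribute is fully transformed into the target domain.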
We can control the strength of the edited attribute in the synthesized images by adjusting this interpolation hyperparameter.
In addition, our L2M-GAN model uses StarGAN V2 as its backbone network, so it can naturally perform multi-attribute editing.
We also tested the generalization ability of our model on images outside the dataset. After training on the real-face dataset CelebA-HQ, our model can be tested directly on an anime dataset with a quite different distribution, still achieving good attribute editing and synthesizing high-quality images.
Also, to further validate the effectiveness of our model structure, we also train on the non-face animal dataset AFHQ. As can be seen from the visualization results, our model achieves good attribute editing and generated image quality on the non-face dataset as well. This further validates the effectiveness and generalization of our approach.
We have proposed a new face attribute editing model based on latent-space decomposition.
Our proposed L2M-GAN is the first end-to-end face attribute editing model based on latent-space decomposition, and it can effectively edit both local and global attributes. This is made possible by the proposed style converter, which decomposes the latent vector into attribute-relevant and attribute-irrelevant parts and imposes perpendicularity constraints on the vectors before and after transformation. Extensive experiments demonstrate significant improvements of L2M-GAN over other existing methods.
In addition, the code for this paper is open source; feel free to try it out and leave a star~
Open source link: https://github.com/rucmlcv/L2M-GAN
Address of the paper:
Posted by: CoinYuppie. Reprinted with attribution to: https://coinyuppie.com/one-click-to-make-secondary-wives-smile-and-turn-cat-faces-into-dog-faces-this-cvpr-paper-is-really-interesting/