Meta has trained a super-large protein structure prediction model with 15 billion parameters, whose inference speed is roughly ten times faster than AlphaFold2's.
Just a few days ago, Meta officially announced ESMFold, the protein structure prediction model with the most parameters and the largest scale to date, and some researchers even claimed that the model is big enough and good enough to crush AlphaFold2, which Google's DeepMind launched in 2021.
▲ESMFold and corresponding author Alexander Rives of Meta AI
The news genuinely shocked both academia and industry. Bear in mind that models of this scale demand serious money, whether for training or for use; if models were instead getting smaller and smaller, perhaps they would not need ever more powerful compute chips (of course, that is not the case). Even Yann LeCun tweeted his endorsement of ESMFold, calling it "super-fast and accurate".
Predicting protein structures from amino acid sequences has long been a major challenge in the natural sciences. Among evolution-based algorithms, AlphaFold2 is arguably the most successful solution to this problem so far: it achieves a breakthrough by training end-to-end neural networks on multiple sequence inputs (alignments of evolutionarily homologous sequences) and optional structural templates, greatly accelerating the construction of a "Metaverse" of life.
▲The evolution of protein prediction AI large model
Meta's ESMFold model, by contrast, requires only a single sequence as input. The team behind it is led by Alexander Rives, a senior research scientist at Meta AI (formerly Facebook AI), and focuses on unsupervised representation learning over large-scale protein sequence and structure data. Rives himself is also a co-founder of Fate Therapeutics, Syros Pharma, and Kallyope, a technology entrepreneur through and through.
Can ESMFold really crush AlphaFold2? Let us first review what protein structure prediction is, and then analyze the network architecture of ESMFold in depth.

01 What is protein structure prediction
First, protein structure refers to the spatial structure of a protein molecule. A protein is a linear chain of amino acids that must fold into a specific spatial structure to acquire its physiological activity and biological function.
▲The quaternary structure of protein
The molecular structure of proteins can be divided into four levels to describe the characteristics of their different levels:
Protein primary structure: The linear amino acid sequence that makes up the polypeptide chain of a protein.
Protein secondary structure: stable local structures formed by hydrogen bonds between the backbone C=O and N-H groups of different amino acids, mainly α-helices and β-sheets.
Protein tertiary structure: The three-dimensional structure of a protein molecule formed by the arrangement of multiple secondary structural elements in three-dimensional space.
Protein quaternary structure: describes how multiple polypeptide chains (subunits) interact to form a functional protein complex.
What we call protein structure prediction refers to predicting the three-dimensional structure of a protein from its amino acid sequence; that is, predicting a protein's folding and its secondary, tertiary, and quaternary structure from its primary structure.
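As a concrete illustration (the helper name and example sequence below are my own, not from ESMFold), the primary structure handed to any predictor is just a string over the 20 standard one-letter amino-acid codes:

```python
# The 20 standard amino acids, one-letter codes.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

def is_valid_primary_structure(seq: str) -> bool:
    """Check that a sequence uses only the 20 standard residue codes."""
    return len(seq) > 0 and set(seq.upper()) <= STANDARD_AA

# A short real example: the human insulin B chain (30 residues).
insulin_b = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"
print(is_valid_primary_structure(insulin_b))  # True
print(len(insulin_b))                         # 30
```

Everything a single-sequence model like ESMFold sees is contained in such a string; the entire 3D structure must be inferred from it.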
In the CASP14 protein structure prediction competition, AlphaFold2 from DeepMind (owned by Google) predicted most protein structures to within roughly the width of a single atom of the true structure, approaching the accuracy of complex instruments such as cryo-electron microscopy. This huge advance was selected as one of the top ten scientific breakthroughs of 2021 by both Nature and Science.
The number of conformations a protein can fold into grows astronomically with the number and order of its amino acids, making it difficult to predict protein structures accurately by conventional means. Experimental methods such as cryo-EM, for example, have so far resolved only around 100,000 protein structures.
▲ Cryo-EM and its images
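A back-of-the-envelope calculation shows why brute-force search fails. The "3 conformations per residue" figure below is a deliberately conservative, Levinthal-style illustrative assumption, not a measured value:

```python
# Levinthal-style counting argument: even if each residue could adopt
# only 3 backbone conformations independently, a modest 100-residue
# protein already has 3**100 candidate folds -- far beyond anything
# that could be enumerated, let alone the ~100,000 structures solved
# experimentally to date.

conformations_per_residue = 3   # conservative illustrative assumption
residues = 100

candidate_folds = conformations_per_residue ** residues
print(candidate_folds)            # on the order of 5e47
print(len(str(candidate_folds)))  # 48 digits
```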
Therefore, using AI methods to accelerate the analysis of protein structure and analyze its composition and function has become an important work in the biological and medical fields .
02 The "magic" of ESMFold
ESMFold's accuracy is comparable to that of AlphaFold2 and RoseTTAFold, which predict protein structures from multiple sequence inputs. ESMFold's outstanding advantage, however, is that its computational speed is an order of magnitude faster than AlphaFold2's, letting it explore the structural space of proteins on far more practical timescales.
AlphaFold2 and RoseTTAFold have previously achieved breakthrough success on atomic-resolution protein structure prediction, but they rely on multiple sequence alignments (MSAs) and templates of similar protein structures to reach optimal performance.
▲ESMFold model has higher speed than AlphaFold2
ESMFold uses the information and representations learned by ESM-2 to perform end-to-end 3D structure prediction from only a single sequence as input (AlphaFold2 requires multi-sequence input), which makes it convenient for researchers to scale the model anywhere from millions to billions of parameters. Importantly, prediction accuracy improves continuously as the model grows; in short, bigger is more accurate.
▲The ESM-2 model increases in accuracy with the increase of the parameter quantity
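For reference, the open-source fair-esm package (pip install fair-esm) exposes this single-sequence interface through the documented `esm.pretrained.esmfold_v1` and `infer_pdb` calls. The sketch below shows the shape of that workflow; the multi-gigabyte weight download and GPU requirement mean the heavy call sits behind a `__main__` guard, and this should be read as an untested outline rather than production code:

```python
def predict_structure(sequence: str) -> str:
    """Predict a 3D structure from a single sequence; returns PDB-format text."""
    import torch
    import esm  # pip install fair-esm

    model = esm.pretrained.esmfold_v1()  # downloads multi-GB weights on first use
    model = model.eval()
    if torch.cuda.is_available():
        model = model.cuda()  # CPU also works, just far slower
    with torch.no_grad():
        # Single sequence in, PDB text out -- no MSA, no templates.
        return model.infer_pdb(sequence)

if __name__ == "__main__":
    pdb_text = predict_structure("FVNQHLCGSHLVEALYLVCGERGFFYTPKT")
    with open("prediction.pdb", "w") as handle:
        handle.write(pdb_text)
```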
Since ESMFold's prediction speed is an order of magnitude faster than other existing atomic-resolution structure predictors, it can help rapidly build protein structure databases. Using ESMFold, one million predicted structures can be computed quickly, covering distinct subsets of protein sequence space, most of which have no annotated structure or function.
Moreover, most of ESMFold's high-confidence predictions have low similarity to known experimental structures, indicating the structural novelty of the proteins uncovered by AI computation.
Notably, many high-confidence structures also share low sequence similarity with those in UniRef90, indicating that the model has generalization capabilities beyond its training dataset, enabling structure-based prediction of protein function.
Based on this, the researchers believe that ESMFold can help understand protein structures that go beyond what is currently known.
▲ESMFold’s prediction accuracy is significantly better than AlphaFold2 in single sequence input
ESMFold is both fast and accurate, and with single-sequence input its accuracy is significantly better than AlphaFold2's. We must also recognize, however, that with multi-sequence input ESMFold's accuracy still falls slightly short of AlphaFold2's.
03 ESMFold network structure
Like AlphaFold2, the ESMFold architecture can be divided into four parts: the data parsing part, the encoder (Folding Trunk), the decoder (Structure Module), and the recycling part (Recycling).
A key difference between ESMFold and AlphaFold2 is the use of language model representations to remove the requirement for explicit homology sequences (in the form of MSAs) as input .
The language model's representations provide the backbone input to ESMFold's folding module. ESMFold simplifies AlphaFold2's Evoformer by replacing the computationally expensive block that processes the MSA with a Transformer block that processes a single sequence. This simplification means that ESMFold is much faster than MSA-based models.
▲Comparison between ESMFold and AlphaFold2
Using MSAs and templates, as AlphaFold2 and RoseTTAFold do, creates two bottlenecks.
First, MSAs and templates must be retrieved and aligned, a CPU-bound step. Second, rather than working on 2D sequence embedding states, AlphaFold2 and RoseTTAFold operate on 3D internal states over the MSA using axial attention, which is computationally expensive even on GPUs.
In contrast, ESMFold is a fully end-to-end sequence structure predictor that runs entirely on GPUs without access to any external databases.
For example, on a single NVIDIA V100 GPU, ESMFold predicts the structure of a 384-residue protein in 14.2 seconds, six times faster than a single AlphaFold2 model; on shorter sequences the speedup reaches roughly 60x.
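Taking the quoted 14.2 s per 384-residue protein at face value, a rough throughput estimate shows how millions of predictions become feasible. The 128-GPU cluster size below is a hypothetical assumption for illustration:

```python
# Throughput arithmetic from the quoted single-GPU timing.
seconds_per_protein = 14.2   # quoted V100 time for a 384-residue protein
gpus = 128                   # hypothetical cluster size

proteins_per_day_per_gpu = 24 * 3600 / seconds_per_protein
total_per_day = proteins_per_day_per_gpu * gpus
days_for_one_million = 1_000_000 / total_per_day

print(round(proteins_per_day_per_gpu))  # ~6085 structures per GPU per day
print(round(days_for_one_million, 1))   # ~1.3 days for one million structures
```

Even granting that real proteins vary in length, the order of magnitude explains why a million-structure database is a matter of days rather than years.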
The order-of-magnitude improvement in speed is a unique advantage of ESMFold over AlphaFold2 , allowing us to build a large number of predictive structures on a shorter timescale than existing methods. This is especially important given the scale of available sequence data.
For example, the initial version of the AlphaFold2 protein structure database was published with about 360,000 predicted structures, and as of July 2022 contained about 995,000 predictions, which is orders of magnitude smaller than many current protein sequence databases.
04 In-depth analysis of the data parsing part and the Folding Trunk
The data parsing part is used for parsing the input sequence and database, providing input to the encoder.
In the AlphaFold2 model, the data parsing part uses both an amino acid sequence database and a structure database, for aligning similar sequences and matching structural templates, respectively.
Bioinformatics rests on the assumption that similar sequences imply similar structures, and similar structures imply similar functions. It is generally believed that similar sequences or similar structures give rise to similar functional domains.
1) The sequence database is used for Multiple Sequence Alignment (MSA), that is, the sequence database is searched for database sequences that are close to the input sequence.
2) The structure database is used for structure matching to find known structural templates that are close to the structure of the input sequence.
The results of the sequence alignment and structural alignment are then transmitted as input to the encoder part.
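To make the sequence-search step concrete, here is a toy, hypothetical version. Real pipelines use profile-based tools such as JackHMMER or HHblits over huge databases and handle insertions and gaps; this sketch only ranks equal-length strings by percent identity:

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences match."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical query and miniature "database".
query = "MKTAYIAKQR"
database = ["MKTAYIAKQR", "MKSAYLAKQR", "GGGGGGGGGG"]

# Rank database sequences by similarity to the query, best first.
hits = sorted(database, key=lambda s: percent_identity(query, s), reverse=True)
print([round(percent_identity(query, s), 1) for s in hits])  # [1.0, 0.8, 0.0]
```

The top-ranked hits are what an MSA stacks up; ESMFold's point is that its language model has internalized this evolutionary signal, so the search can be skipped entirely.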
▲The structure comparison of ESMFold Folding Block and AlphaFold2 Evoformer
The encoder is the Folding Trunk, 48 blocks in total. As noted above, the key difference from AlphaFold2 is that ESMFold's language-model representations eliminate the need for explicit homologous sequences (in the form of MSAs) as input: the computationally expensive MSA-processing module of the Evoformer is replaced by a Transformer module that processes a single sequence, which is why the Folding Trunk is significantly faster than its MSA-based counterparts.
As a large model for protein structure prediction, ESMFold delivers accurate atomic-resolution predictions with inference roughly an order of magnitude faster than AlphaFold2. In practice the speed advantage is even more pronounced, because ESMFold also eliminates the computational cost of searching sequence databases to construct the MSA.
▲ESMFold is used to explore the metagenome structure space
The inference speed advantage makes it possible to efficiently map the structural space of large metagenomic sequence databases based on computation.
In addition to being used to identify distant homologies, ESMFold can also be used to make fast and accurate structure predictions and obtain millions of predicted structures on practical time scales, further aiding the discovery of new protein structures and functions. This is equivalent to using AI computing to build a “Metaverse” of life.
A 15-billion-parameter model, ten times the speed: although Meta's ESMFold does not fully "crush" AlphaFold2 on accuracy, speed is a formidable weapon in its own right, and the model will do much to advance protein structure analysis and prediction and the construction of large-scale metagenomic structure databases.
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).