98.5% of human proteins have now been predicted by AI

AlphaFold2 has predicted the structures of 98.5% of human proteins!

DeepMind has also turned the predictions into a data set that is completely free and open!

Only a week after open-sourcing AlphaFold2, DeepMind released the AlphaFold data set, shaking the research community once again!


Of all the amino acid residues predicted in the data set, 58% were predicted with confidence, and 35.7% with high confidence.

Before this, decades of experimental work had covered only 17% of the amino acid residues in human protein sequences.
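To make figures like these concrete, here is a minimal Python sketch of how per-residue confidence scores (pLDDT, on a 0 to 100 scale) can be sorted into confidence bands and turned into percentages. The cut-offs of 90, 70 and 50 are the bands commonly used for AlphaFold predictions and are an assumption here, since the article gives only the percentages. The function names and the example scores are made up for illustration.

```python
from collections import Counter

BANDS = ("very high", "confident", "low", "very low")


def confidence_band(plddt: float) -> str:
    """Map one pLDDT score (0-100) to a confidence band.

    The thresholds 90 / 70 / 50 are assumed, not taken from the article.
    """
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def band_fractions(scores):
    """Return the fraction of residues that falls into each band."""
    if not scores:
        raise ValueError("need at least one score")
    counts = Counter(confidence_band(s) for s in scores)
    return {band: counts[band] / len(scores) for band in BANDS}


def fraction_above(scores, threshold):
    """Fraction of residues whose pLDDT is strictly above the threshold."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(1 for s in scores if s > threshold) / len(scores)


# Hypothetical scores for an eight-residue stretch (not real data).
example_scores = [96.2, 91.0, 88.5, 74.1, 63.0, 41.7, 92.3, 55.5]

if __name__ == "__main__":
    print(band_fractions(example_scores))
    print("above 70:", fraction_above(example_scores, 70))
    print("above 90:", fraction_above(example_scores, 90))
```

Under this reading, a statistic such as "58% predicted with confidence" counts every residue above the lower cut-off, so it includes the residues in the highest band as well.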

In addition to the human proteome, the data set also includes the proteome data of 20 organisms commonly used in scientific research, such as Escherichia coli, Drosophila, and mice, with a total of more than 350,000 protein structures.

Most importantly, all of this is free and open! The data set is hosted by the European Bioinformatics Institute (EMBL-EBI).


“This is the most important data set since the mapping of the human genome.” This assessment comes from Ewan Birney, who led the follow-up project to the Human Genome Project, the Encyclopedia of DNA Elements (ENCODE).

DeepMind founder Demis Hassabis published an article on the official website entitled “Putting the power of AlphaFold into the world’s hands”, and also shared his excitement on Twitter:

This is the day I have dreamed of all my life, and it is the reason DeepMind was founded: to use AI to advance science and benefit humanity.


The flip side of benefiting humanity as a whole is a huge impact on people currently working in structural biology.

Some feel hopeless about racing against AI.


Others complained that, with the results released for free, they can no longer apply for funding to do the same work.


But some people have put forward a different view: the 21st century is not just the century of biology, but also the century of synthetic biology!


Zhihu user @sorrySorui, who has worked in a structural biology laboratory, also thinks that AlphaFold saves researchers a great deal of time and energy.


He believes that the results obtained using AlphaFold can help further research such as drug design.


So which of the predicted proteins could open up new research directions?

Several key predictions

AlphaFold2 produced about 350,000 predicted structures. In the paper, DeepMind highlights three representative predictions, all of them made from scratch.

Although these predictions still have to be verified experimentally, they already give biologists many useful leads.

1. Glucose-6-phosphatase: a possible new protein gating mechanism

This is a membrane-bound enzyme that catalyzes the last step of glucose synthesis and is essential for maintaining blood sugar levels. No experimental structure of this protein existed before. AlphaFold predicts its structure with very high confidence and gives a nine-helix topology.


DeepMind found that in this predicted structure a glutamate residue can stabilize the binding site in a closed conformation, which suggests a possible gating function. This mechanism had not been described before.

2. Diacylglycerol O-acyltransferase 2 (DGAT2): looking for the binding site of an inhibitor

This enzyme is involved in storing excess metabolic energy as fat. DGAT2 is one of the two essential acyltransferases that catalyze the final acyl addition in this pathway. Previous studies have shown that inhibiting DGAT2 can improve liver function in mouse models of liver disease.

With AlphaFold’s highly confident predicted structure (median pLDDT 95.9), researchers can look for the site where the protein binds the inhibitor.


3. Wolframin: Looking for the causes of genetic diseases

Wolframin is a transmembrane protein located in the endoplasmic reticulum (ER) and is linked to the genetic disease Wolfram syndrome. Wolfram syndrome is a neurodegenerative disease characterized by early-onset diabetes, progressive vision and hearing loss, and early death.


Although AlphaFold’s confidence in the full prediction is lower (median pLDDT 81.7), the prediction can still be used to identify specific regions of the protein structure and yield useful results.

For example, a recent evolutionary analysis studied a region of Wolframin, and AlphaFold’s predictions largely support their conclusions.

AlphaFold’s predictions also indicate that disulfide cross-links may form in the protein, which is notable because Wolfram syndrome patients lack cysteines in Wolframin. Analyses like this help us understand the mechanism behind this genetic disease.
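The median pLDDT values quoted for these three proteins can be recomputed from a structure file. AlphaFold writes the per-residue pLDDT into the B-factor column of its PDB-format output, so one value per residue can be read from the alpha-carbon (CA) atom records. The following Python sketch assumes standard fixed-column PDB text; the function names are made up for illustration, and `make_ca_line` is a hypothetical helper that only fabricates a tiny test input (the three scores are not real data).

```python
import statistics


def plddt_per_residue(pdb_text: str) -> list:
    """Collect one pLDDT value per residue from PDB-format text.

    Assumes the B-factor field (columns 61-66) holds the pLDDT and uses
    the CA atom (atom name in columns 13-16) as one record per residue.
    """
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def median_plddt(pdb_text: str) -> float:
    """Median pLDDT over all residues found in the text."""
    scores = plddt_per_residue(pdb_text)
    if not scores:
        raise ValueError("no CA atom records found")
    return statistics.median(scores)


def make_ca_line(serial: int, res_seq: int, plddt: float) -> str:
    """Hypothetical helper: build one fixed-column CA record for the demo."""
    return (
        f"ATOM  {serial:>5}  CA  ALA A{res_seq:>4}    "
        f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{plddt:>6.2f}"
    )


# Three fabricated residues, only to exercise the parser.
sample_pdb = "\n".join(
    make_ca_line(i + 1, i + 1, score)
    for i, score in enumerate([95.9, 81.7, 60.0])
)

if __name__ == "__main__":
    print("median pLDDT:", median_plddt(sample_pdb))
```

With a real file, the text would come from reading a model downloaded from the AlphaFold data set instead of `sample_pdb`.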


Accelerating the treatment of diseases such as cancer and HIV

Currently, there are approximately 365,000 structural predictions in the AlphaFold database.

The researchers said they will keep expanding the scope of the predictions and expect to raise the number of predicted structures to 130 million by the end of this year.

That would be about half of all known proteins.

Results this striking also led Google CEO Sundar Pichai to speak up for AlphaFold once again:

The AlphaFold database shows the great potential of AI to accelerate scientific progress. It can greatly improve our understanding of protein structure and the human proteome overnight.


For proteins, structure determines function. By studying a protein’s structure, scientists can learn more about its function and mechanism.

For example, they can work out how a protein interacts with other chemicals and where those reactions take place.

This helps scientists understand how a mutation changes a protein’s function, so that they can further investigate cancer, HIV, and genetic diseases.

In addition, AlphaFold2 brings prediction accuracy to the atomic level.

In other words, researchers can now determine the active sites of enzymes more quickly and accurately, which is also of great significance for drug development.

Edith Heard, head of the European Molecular Biology Laboratory (EMBL), said:

We believe this has a transformative effect on understanding how living organisms work.

Mohammed AlQuraishi, a computational biologist at Columbia University, said that the field of protein structure prediction used to spend a great deal of time and effort on basic groundwork, and that researchers can now focus more on studying the protein structures themselves.

We used to do research that relied on amino acid sequences, and now we can start directly from the structure of the protein.

In fact, some research teams working with DeepMind have accelerated the research process through AlphaFold.

For example, the Drugs for Neglected Diseases initiative (DNDi) said that AlphaFold2 has advanced its research into drugs for tropical diseases.

The Centre for Enzyme Innovation (CEI) at the University of Portsmouth also stated that it is using AlphaFold2 to develop new enzymes that can break down polluting single-use plastics.

Marcelo Sousa, a biochemist at the University of Colorado Boulder, used AlphaFold to make protein structure models and conduct a study on antibiotics.

A team from the University of California, San Francisco said that AlphaFold2 can help them better understand the biological mechanism of SARS-CoV-2.


The great success of AlphaFold2 is inseparable from research in proteomics.

The proteome refers to all the proteins expressed by a genome, cell, tissue, or organism at a specific time.

In the 1990s, when the Human Genome Project began to take shape, scientists realized that knowing the base sequence of genes was not enough. They also had to understand the proteins that genes produce.

As a result, Australian geneticist Marc Wilkins proposed the idea of deciphering the human proteome.

In 2001, the same year the draft of the human genome was released, the Human Proteome Organization (HUPO) was formally established.

It was not until 2014 that teams from the Technical University of Munich and Johns Hopkins University finally published first drafts of the human proteome.

Since then, human proteome databases have been steadily improved. AlphaFold used the UniProt database, which has the most extensive and comprehensive annotation information.

For more technical details of AlphaFold, please refer to the link below:

“The Secret of AlphaFold2’s Success: Attention Mechanism Replaces Convolutional Networks, Improves Prediction Accuracy by Over 30%”

Paper address: https://www.nature.com/articles/s41586-021-03828-1

Data set: https://alphafold.ebi.ac.uk

Zhihu answer (used with authorization), by @sorrySorui: https://www.zhihu.com/question/474094187/answer/2014736529

Posted by: CoinYuppie. Reprinted with attribution to: https://coinyuppie.com/98-5-of-human-proteins-are-all-predicted-by-ai/