
Google creates artificial intelligence to decipher the secrets of human DNA.

New tool analyzes how hidden parts of the genome influence genes and may help to understand the origin of thousands of diseases.

Representation of DNA strands (Photo: Generated by AI/DALL-E)

247 - Google on Wednesday (28) announced AlphaGenome, a new artificial intelligence tool for analyzing the human genome, with a focus on how long stretches of DNA influence gene regulation within cells. The initiative seeks to go beyond simply reading the genetic code and to investigate the mechanisms that control when, where and how genes are activated or silenced in the organism, Folha de São Paulo reports.

During the presentation of AlphaGenome in the journal Nature, Google DeepMind's vice president of research, Pushmeet Kohli, emphasized that the complete sequencing of the human genome, finished in 2003, was only the first step. "Decoding the entire human genome in 2003 gave us the book of life, but reading it remains a challenge," he stated. According to him, although the genetic text is available (approximately 3 billion nucleotide pairs), understanding its "grammar" remains one of the great frontiers of science. "We have the text, but understanding the grammar and how it governs life constitutes the next great frontier of research," said Kohli.

Most human DNA does not directly code for proteins. Only about 2% of the sequence performs this function, which is essential to living organisms. The remaining 98% plays a complex regulatory role, acting like a conductor that coordinates, protects, and adjusts gene expression in each cell. It is precisely in these regions that many disease-associated variants are concentrated, and it is this territory that AlphaGenome aims to explore.

The new model complements other tools developed by Google's artificial intelligence laboratory: AlphaMissense, focused on analyzing DNA coding sequences; AlphaProteo, dedicated to protein design; and AlphaFold, which predicts protein structures and whose developers shared the 2024 Nobel Prize in Chemistry. In the case of AlphaGenome, the innovation lies in its ability to analyze long DNA sequences and predict how each nucleotide pair influences different biological processes within the cell.

Based on deep learning techniques, the system was trained on data from large public consortia that carried out experimental measurements on hundreds of types of human and mouse cells and tissues. This foundation allowed the model to learn complex patterns of genetic regulation and apply this knowledge in an integrated way.

Before AlphaGenome, models capable of studying regulatory regions of DNA already existed, but they faced technical limitations. Researchers had to choose between analyzing long sequences at lower precision and focusing on shorter stretches at finer resolution. According to Žiga Avsec, one of the study's co-authors, fully understanding the regulatory environment of a gene requires analyzing sequences that can reach up to one million nucleotide pairs. The new tool seeks to overcome this dilemma by combining length and precision.

Another distinguishing feature of AlphaGenome is its ability to simultaneously model the influence of DNA on 11 distinct biological processes. Until now, researchers had to resort to different models to obtain this type of integrated analysis. According to Natasha Latysheva, also a co-author of the study published in Nature, the tool represents a significant advance. "It can accelerate our understanding of the genome by helping to map the location of functional elements and determine their roles at the molecular level," she stated.

Kohli also highlighted the collaborative nature of the project. "We hope that researchers will enrich it with more data and modalities," he said. According to Google, AlphaGenome has already been tested by approximately 3,000 scientists from 160 countries and is now available as open source for use in non-commercial research.

Experts outside the project view the model positively, but with caution. Ben Lehner, head of generative and synthetic genomics at the Wellcome Sanger Institute in Cambridge, stated that the tool is very effective, although it still has limitations. "Accurately identifying the differences in our genomes that make us more or less susceptible to developing thousands of diseases is a crucial step towards better treatments," he noted. At the same time, he cautioned that "AI models are only as good as the data used to train them," emphasizing that many available datasets are still small and poorly standardized.

A similar assessment was made by Robert Goldstone, head of genomics at the Francis Crick Institute. He believes that AlphaGenome should not be seen as a definitive answer to all questions in biology, since gene expression also depends on complex environmental factors. Even so, he considered the tool essential for advancing the field. According to Goldstone, it will allow scientists to "programmatically study and simulate the genetic basis of complex diseases," expanding the possibilities for research and understanding of how the human genome works.
