Technology sequence analysis

The Telomere-to-Telomere Consortium is sequencing whole chromosomes. Credit: Sumner/SPL

From gene editing to protein-structure determination to quantum computing, here are seven technologies that are likely to have an impact on science in the year ahead.

Roughly one-tenth of the human genome remained uncharted when genomics researchers Karen Miga at the University of California, Santa Cruz, and Adam Phillippy at the National Human Genome Research Institute in Bethesda, Maryland, launched the Telomere-to-Telomere (T2T) consortium in 2019. In a preprint published in May last year, the consortium reported the first end-to-end sequence of the human genome, adding nearly 200 million new base pairs to the widely used human consensus genome sequence known as GRCh38, and writing the final chapter of the Human Genome Project¹.

First released in 2013, GRCh38 has been a valuable tool - a scaffold on which to map sequencing reads. But it was incomplete, largely because the widely used sequencing technology developed by Illumina, in San Diego, California, produces reads that are accurate, but short. They are not long enough to unambiguously map highly repetitive genomic sequences, including the telomeres that cap chromosome ends and the centromeres that coordinate the partitioning of newly replicated DNA during cell division.

Long-read sequencing technologies proved to be the game-changer. Developed by Pacific Biosciences in Menlo Park, California, and Oxford Nanopore Technologies (ONT) in Oxford, UK, these technologies can sequence tens or even hundreds of thousands of bases in a single read, but - at least at the outset - not without errors. By the time the T2T team reconstructed²,³ their first individual chromosomes - X and 8 - in 2020, however, Pacific Biosciences’ sequencing had advanced to the extent that T2T scientists could detect tiny variations in long stretches of repeated sequences. These subtle ‘fingerprints’ made long repetitive chromosome segments tractable, and the rest of the genome quickly fell into line. The ONT platform also captures many modifications to DNA that modulate gene expression, and T2T was able to map these ‘epigenetic tags’ genome-wide as well⁴.

The genome T2T solved was from a cell line that contains two identical sets of chromosomes. Normal diploid human genomes contain two versions of each chromosome, and researchers are now working on ‘phasing’ strategies that can confidently assign each sequence to the appropriate chromosome copy. “We’re already getting some pretty phenomenal phased assemblies,” says Miga.

This diploid assembly work is being conducted in collaboration with T2T’s partner organization, the Human Pangenome Reference Consortium, which aspires to produce a more representative genome map, based on hundreds of donors from around the world. “We’re aiming to capture an average of 97% of human allelic diversity,” says Erich Jarvis, one of the consortium’s lead investigators and a geneticist at the Rockefeller University in New York City. As chair of the Vertebrate Genomes Project, Jarvis also hopes to leverage these complete genome assembly capabilities to generate full sequences for every vertebrate species on Earth. “I think within the next 10 years, we’re going to be doing telomere-to-telomere genomes routinely,” he says.

Artificial intelligence powers protein-folding predictions

Major experimental and computational advances in the past two years have given researchers complementary tools for determining protein structures with unprecedented speed and resolution. The AlphaFold2 structure-prediction algorithm, developed by Alphabet subsidiary DeepMind in London, relies on ‘deep learning’ strategies to extrapolate the shape of a folded protein from its amino acid sequence⁵. Following a decisive victory at the 2020 Critical Assessment of protein Structure Prediction competition, in which computational biologists test their structure-prediction algorithms head-to-head, AlphaFold2’s reputation - and adoption - has soared.

“For some of the structures, the predictions are almost eerily good,” says Janet Thornton, senior scientist and former director of the European Bioinformatics Institute in Hinxton, UK. Since its public release last July, AlphaFold2 has been applied to proteomes, to determine the structures of all the proteins expressed in humans⁶ and in 20 model organisms (see Nature 595, 635; 2021), as well as nearly 440,000 proteins in the Swiss-Prot database, greatly increasing the number of proteins for which high-confidence modelling data are available.