In 2003, after nearly $3 billion in funding and 13 years of painstaking research, scientists with the Human Genome Project (HGP) announced they had finally mapped the first human genome sequence. This was a momentous breakthrough that would revolutionize genomics. However, neither the initial draft nor the updates that followed were 100% complete. Now, scientists with the Telomere-to-Telomere (T2T) Consortium claim they’ve filled in the remaining 8% of the human genome that was missing.
“The Telomere-to-Telomere (T2T) Consortium has finished the first truly complete 3.055 billion base pair (bp) sequence of a human genome, representing the largest improvement to the human reference genome since its initial release,” wrote the scientists in a paper posted on the preprint server bioRxiv, meaning it has yet to be peer-reviewed.
The first truly complete genome of a vertebrate
The genome is the sum of all the nuclear DNA and mitochondrial DNA (mtDNA) sequences in a cell. It contains all the instructions a living being needs to survive and replicate, and consists of chemical building blocks or “bases” (G, A, T, and C), whose order encodes biological information.
In diploid organisms, such as humans, the size of the genome is considered to be the total number of bases in one copy of its nuclear DNA. Humans and other mammals carry two copies of almost all of their DNA. For instance, we have pairs of chromosomes, with one chromosome of each pair inherited from each parent. A reference genome, however, records the bases of only one copy of each chromosome pair. A person’s actual genome is roughly six billion bases in size, but a single “representative” copy of the human genome is about three billion bases in size.
Because the human genome is so large, its bases cannot be read in order, end to end, in one single step. To sequence the genome, HGP scientists first broke the DNA into smaller pieces and subjected each piece to chemical reactions that allowed the identity and order of its bases to be deduced. The sequences of these pieces were then stitched back together, by matching their overlapping ends, to deduce the sequence of the starting genome.
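To make the break-and-reassemble idea concrete, here is a minimal Python sketch. It is a toy, not the HGP's actual pipeline: the `greedy_assemble` and `overlap` functions, the 17-base "genome" and the three pieces are all made up for illustration, and real assemblers must also cope with sequencing errors and billions of bases.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest overlap."""
    # Drop exact duplicates and fragments fully contained in another one.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing overlaps any more; keep the pieces as separate contigs
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical toy "genome" and three overlapping pieces cut from it.
genome = "ATGGCGTGCAATCCTGA"
pieces = ["AATCCTGA", "ATGGCGTGC", "GTGCAATCC"]
contigs = greedy_assemble(pieces)
print(contigs)  # ['ATGGCGTGCAATCCTGA']
```

Here the pieces overlap in only one sensible way, so the original sequence comes back intact. The approach gets into trouble when many pieces look alike, which is exactly what happens in repetitive DNA.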
Although genome sequencing technology has advanced considerably since the HGP announced the first draft of the human genome in 2001, a complete sequence of the entire genome was never achieved. Around 8% of the genome was missing, corresponding to regions where the DNA sequence is made up of long repeating patterns. Some of these repeating patterns, such as those found in the centromeres of chromosomes (the ‘knot’ that holds the two halves of a duplicated chromosome together), play important biological roles, but standard technology hasn’t been able to decode them properly.
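A small Python sketch shows why long repeats defeat short pieces. The two sequences below are hypothetical: they differ only in how many times the unit "AT" repeats. Cut into 6-base reads, they yield exactly the same set of distinct reads, so the reads alone cannot reveal how long the repeat is. Reads long enough to span the shorter repeat and both of its flanks tell the two apart.

```python
def read_set(genome: str, read_len: int) -> set[str]:
    """All distinct reads of a given length that can be cut from a sequence."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}


# Hypothetical sequences: same flanks, different number of "AT" repeats.
five_copies = "GCA" + "AT" * 5 + "CGG"
eight_copies = "GCA" + "AT" * 8 + "CGG"

# Short reads: the two sequences are indistinguishable.
print(read_set(five_copies, 6) == read_set(eight_copies, 6))    # True

# Long reads (longer than the shorter repeat array): the difference shows up.
print(read_set(five_copies, 14) == read_set(eight_copies, 14))  # False
```

Real centromeric repeats run to millions of bases, so the same ambiguity arises at a vastly larger scale.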
Using revolutionary new technology, scientists affiliated with T2T now claim that they’ve filled these gaps.
“You’re just trying to dig into this final unknown of the human genome,” Karen Miga, a researcher at the University of California, Santa Cruz, who co-led the international consortium, told STAT News. “It’s just never been done before and the reason it hasn’t been done before is because it’s hard.”
According to Miga and colleagues, the genome breakthrough was made possible thanks to new DNA sequencing technologies developed by Pacific Biosciences in California and Oxford Nanopore in the UK. These technologies do not cut the DNA into tiny pieces for later assembly, which can result in errors. Instead, Oxford Nanopore tech runs the DNA molecule through a nanoscopic hole, resulting in a long sequence. Meanwhile, lasers developed by Pacific Biosciences read the same DNA sequence again and again, which makes the readout far more accurate than previous technology.
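The benefit of reading the same stretch of DNA repeatedly can be illustrated with a simple majority vote. The sketch below is an assumption-laden toy, not Pacific Biosciences' actual algorithm: the `consensus` function and the five "passes" are invented for illustration, every pass is assumed to have the same length, and only single-base substitution errors are modeled (real reads also contain insertions and deletions, which require alignment first).

```python
from collections import Counter


def consensus(passes: list[str]) -> str:
    """Pick the most common base at each position across repeated reads."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("this toy example expects passes of equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


# Hypothetical true sequence and five noisy passes over it,
# each pass (except the first) carrying one error at a different position.
truth = "ACGTTGCA"
passes = [
    "ACGTTGCA",
    "ACCTTGCA",
    "ACGTTGGA",
    "TCGTTGCA",
    "ACGTAGCA",
]
print(consensus(passes))  # ACGTTGCA
```

Because the errors land in different places on each pass, the correct base outvotes the mistakes at every position.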
The two technologies complemented each other to reveal the missing parts of the genome that had been eluding scientists for almost two decades. According to T2T, the number of DNA bases has increased from 2.92 billion to 3.05 billion, a 4.5% improvement. However, the number of genes only increased by 0.4%, to 19,969. That’s because the vast majority of DNA sequences do not code for proteins, and many instead help regulate the expression and activity of genes.
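The 4.5% figure follows directly from the two base counts quoted above, as this short Python check shows (the rounded values of 2.92 and 3.05 billion are taken from the text; the function name is just illustrative).

```python
def percent_increase(old: int, new: int) -> float:
    """Relative growth from `old` to `new`, expressed in percent."""
    return (new - old) / old * 100


old_bases = 2_920_000_000  # previous reference, as quoted in the article
new_bases = 3_050_000_000  # T2T assembly, as quoted in the article

print(new_bases - old_bases)                                # 130000000
print(round(percent_increase(old_bases, new_bases), 1))     # 4.5
```

In other words, roughly 130 million bases were added to the reference.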
“The complete, telomere-to-telomere assembly of a human genome marks a new era of genomics where no region of the genome is beyond reach. Prior updates to the human reference genome have been incremental and the high cost of switching to a new assembly has outweighed the marginal gains for many researchers. In contrast, the T2T-CHM13 assembly presented here includes five entirely new chromosome arms and is the single largest addition of new content to the human genome in the past 20 years,” wrote the researchers.
“This 8% of the genome has not been overlooked due to its lack of importance, but rather due to technological limitations. High accuracy long-read sequencing has finally removed this technological barrier, enabling comprehensive studies of genomic variation across the entire human genome. Such studies will necessarily require a complete and accurate human reference genome, ultimately driving adoption of the T2T-CHM13 assembly presented here,” they added.
The genome that the researchers sequenced didn’t come from a person but rather from a hydatidiform mole, a rare mass or growth that forms inside the womb (uterus) at the beginning of a pregnancy. This tissue forms when sperm fertilizes an egg with no nucleus, so it carries only the father’s set of 23 chromosomes, like the single set in a gamete (sperm or egg), rather than the two different sets, 46 chromosomes in all, found in an ordinary human cell. Having just one version of each chromosome makes the computational effort simpler but may constitute a limitation.
We will find out more once the paper is peer-reviewed and properly scrutinized by the international scientific community. If the findings hold water, they may mark a new age of genomics — one where no nook or cranny of DNA is left unexplored.