After over two decades, the human genome sequence is finally complete.
The holdout? The Y chromosome. Although far smaller than the other 23 chromosomes, Y is a genetic contortionist, carrying multiple strange structures that are notoriously difficult to disentangle, and not for lack of effort. As one of the two sex chromosomes—X being the other—Y houses genes essential for producing sperm and for human reproduction.
Now, two international teams have finally cracked the enigmatic Y genetic code. Both of their papers were recently published in Nature. One paper, from the Telomere-to-Telomere (T2T) consortium, used state-of-the-art genetic sequencing technologies to read over 62 million base pairs (pairs of DNA letters) in the human Y chromosome, finally producing a reference human genome that contains all 24 chromosomes.
Yet one male can hardly represent the diversity of our species’ genetics. Another study assembled Y chromosomes from 43 biologically male individuals spanning five continents, extensively covering most of the deepest-rooted human Y lineages.
The newly-assembled dataset “provides the most comprehensive view of genetic variation…across over 180,000 years of human Y chromosome evolution,” said the authors.
So why should we care? For one, this marks a monumental step in deciphering our genetic landscape. Although the human genome was first sequenced two decades ago, nearly 50 percent of the Y’s genetic letters remained elusive. As sequencing and analysis methods improve, we’ll likely fill in more gaps.
For another, scientists now have a valuable resource for analyzing Y chromosome evolution and behavior. As the weird one in the chromosomal pack, it seems to have shrunk over millennia, shedding genetic material like a particularly aggressive spring cleaning. Why this happened and what its consequences were remain mysterious, leading some to speculate that Y is degenerating.
The clarity and variation in the new datasets now offer a road map for further research. Fertility aside, the Y chromosome has also been linked to a number of health issues, such as bladder cancer.
“Just a few years ago, half of the human Y chromosome was missing [from the reference]—the challenging, complex satellite areas,” said Dr. Monika Cechova at the University of California, Santa Cruz, who worked on a full Y sequence. “Back then we didn’t even know if it could be sequenced, it was so puzzling. This is really a huge shift in what’s possible.”
The Hitchhiker’s Guide to the Genomic Galaxy
When scientists talk about the human genome, they usually mean a reference genome. First constructed decades ago, the first draft was a triumph. But with eight percent of DNA letters missing, it was far from perfect.
The missing chunks have repercussions for diagnosis and research into some of the most troubling diseases of our time: cancer, heart disease, diabetes, dementia, and other brain disorders. They also limit our ability to detect rare but devastating disorders and, in turn, to use genetic editing tools to treat them.
The rise of large-scale sequencing and analysis allows scientists to hunt down groups of genes that could up the chance of getting a certain disorder. This is often done by comparing a patient’s genome to the reference genome—the current “dictionary” of our DNA letters, called GRCh38 (catchy, I know).
Back in March, Dr. Adam Phillippy, a lead researcher for one of the papers, released the most complete sequence of the human genome to date—except for the Y chromosome. Over 50 percent of its sequences were represented by gaps.
Y is a conundrum in biology. Unlike other chromosome pairs, in which the two partners are nearly identical in size, Y is significantly smaller than its X counterpart. The current reasoning is that Y has gradually shrunk to one-sixth the size of X and contains just half the genes of its counterpart. Why and how this happened remains a mystery, yet Y still packs a hefty biological punch in its small package, containing genes that determine biological males and are essential for sperm production.
The evolutionary shrinkage of Y led some researchers to initially discard it completely during genome sequencing. If the chromosome contained a graveyard of genes set to mutate out of their biological functions, why bother?
Then there’s the technological hurdle. “The Y really is as weird, and as interesting, as we thought,” said Dr. Jenny Graves at La Trobe University, who wasn’t involved in either study.
Y is different from other chromosomes because it contains strange genetic sequences. One example: palindromes, like “Was it a car or a cat I saw?” The sentence reads the same forwards and backwards. Similarly, the genetic letters (A, T, C, and G) on one strand of a stretch of the Y chromosome read exactly the same as those on the opposite strand in reverse.
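To make the palindrome idea concrete, here is a minimal Python sketch. It assumes the standard base pairing (A with T, C with G) and treats a stretch of DNA as a palindrome when it equals its own reverse complement, meaning the opposite strand read backwards. The function names are illustrative and do not come from either study, and real Y chromosome palindromes are vastly longer than this toy sequence.

```python
# Illustrative sketch only: function names are hypothetical, not from the studies.
# Standard base pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of `seq`, read in reverse."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_dna_palindrome(seq: str) -> bool:
    """True when a sequence reads the same as its reverse complement."""
    return seq.upper() == reverse_complement(seq)


if __name__ == "__main__":
    for seq in ("GAATTC", "GATTACA"):
        print(seq, is_dna_palindrome(seq))
```

In this toy case, GAATTC counts as a palindrome because its opposite strand, read backwards, is also GAATTC, while GATTACA does not.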
When sequencing genomes, scientists need to patch together snippets of genetic material. Ones that read similarly in both directions make it extremely difficult to piece together the genomic puzzle. Y is also repetitive, with large segments that regurgitate the same few DNA letters.
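A toy Python sketch can show why repetition causes trouble. It assumes a short made-up sequence with a repeated "CA" stretch in the middle and simply lists every position where a snippet (a "read") could fit. This is a simplified illustration with hypothetical names, not the assembly method used in either study.

```python
# Illustrative sketch only: the sequence and function name are made up.
def placements(reference: str, read: str) -> list[int]:
    """List every start position where `read` matches `reference` exactly."""
    last_start = len(reference) - len(read)
    return [i for i in range(last_start + 1) if reference.startswith(read, i)]


if __name__ == "__main__":
    # A unique left flank, a repetitive core, and a unique right flank.
    reference = "GGTT" + "CA" * 5 + "TTGG"
    for read in ("TTCA", "CACA"):
        spots = placements(reference, read)
        print(read, "fits at", len(spots), "position(s):", spots)
```

A snippet drawn from the unique flank fits in exactly one place, while one drawn from the repeat fits in several, so an assembler cannot tell where it belongs. Longer reads that span the whole repeat and reach into the unique flanks resolve that ambiguity.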
In the new study, the team tapped into recent techniques that can read longer DNA sequences spanning over a million base pairs, and a highly efficient computational assembly strategy analogous to mapping a metropolitan subway system. Overall, the herculean work mapped over 62 million DNA letters from the Y chromosome.
Meanwhile, a second collaboration took the Y sequencing challenge global. Headed by Dr. Charles Lee, the director of the Jackson Laboratory of Genomic Medicine, the team broadened its scope to 21 distinct populations around the world covering five continents.
These men were all part of the 1000 Genomes Project. Launched in 2008, the project is going strong with an open-sourced database available for anyone to analyze. The team selected 43 males, half with largely African ancestry. Altogether, the most recent common ancestor among the group was estimated to reach back 182,900 years, going much farther back in time than the current GRCh38 reference genome.
Only some parts of Ys varied in both their genetics and epigenetics—the regulation of genes—among the individuals. But the changes were surprisingly large. Some had genes that reversed their order in 14 different places, affecting half of the Y chromosome genetic structure. The Y chromosome also has a tendency to copy itself: one gene that’s known for producing sperm, TSPY, had over 10 more copies in one individual than in another. Other genetic breaks also popped up, yet the Y remained functional and resilient to evolutionary forces.
To Dr. Mark Jobling at the University of Leicester, who was not involved in either work, these results “confirm that gene content of the Y chromosome is essentially conserved.” As for the Y chromosome withering away, “The idea that the Y is still degenerating and destined to disappear is really scotched by this,” he said.
For now, the studies can’t yet link Y chromosome changes to specific diseases. But they provide the first comprehensive resource that opens the gate to genetic research, therapies, and synthetic biology. The team already knows, for example, that the regions housing TSPY genes are sites of frequent DNA alterations.
“It’s a bit like high-definition TV—we could see the picture before, and these studies bring it into super-sharp focus,” said Jobling.
For the authors, the studies are “a long-awaited yet crucial milestone towards understanding the full extent of human genetic variation.” It’s a “starting point” to finally decipher the mysterious roles of Y, the enigmatic chromosome that has shaped human evolution as well as its own, and likely will for generations to come.
Image Credit: N. Hanacek/NIST