Though the human Y chromosome is a "tiny" guy among the chromosomes, it has long been regarded as a notoriously difficult target to sequence and assemble because of its complex repeat structures, including tandem repeats, long palindromes, and segmental duplications. Not surprisingly, then, more than half of the Y chromosome has been missing from the GRCh38 reference, and it had remained the last riddle among human chromosomes to be solved.
A milestone study recently published in Nature by the Telomere-to-Telomere (T2T) consortium (Rhie A et al., Nature, 2023) presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y). It corrects multiple errors in the GRCh38-Y reference and adds over 30 Mb of sequence, predominantly from the heterochromatic region of the Yq arm (Yqh; Figure 1). In doing so, it unmasks the complete ampliconic structures of TSPY, DAZ, and RBMY and uncovers 110 genes, 42 of which are predicted to be protein-coding, mostly from the TSPY gene family. Nine protein-coding ampliconic gene families are known to have expanded specifically on the Y; they are expressed in the testis and function in spermatogenesis and fertility. These findings are particularly intriguing from the perspective of evolution and sex determination, since the Y chromosome has been lost altogether in certain species, and this latest study could help us understand whether the human Y may follow a similar biological fate.
Combined with the prior assembly of the CHM13 genome and the mapping of available population variation, clinical variants, and functional genomics data, the study provides a complete and comprehensive reference sequence for all 24 human chromosomes. Notably, both short- and long-read sequencing technologies were used in this study to resolve the entire human Y chromosome, from its overall framework down to its fine details, highlighting the power of this “short + long” strategy for future sequencing projects. Long reads span the complicated tandem repeat regions and resolve copy number in highly amplified regions, while short reads polish genomic details such as SNPs and other small sequence variants.
The more we dig, the more we understand these scientific mysteries, while at the same time raising more “Why”s to be answered in the Omic-verse of Biology.