Shilpa Garg from the University of Copenhagen, Denmark, presented in the Translational Genomics Virtual Distinguished Lecture Series
Efficient, high-resolution bioinformatic approaches for integrative sequencing analysis of complex diseases
Shilpa Garg, PhD
Tuesday, May 25, 2021
Reconstructing the complete phased sequences of every chromosome copy in human and non-human species is important for medical genetics. Unprecedented advances in sequencing technologies have opened up new avenues for reconstructing these phased sequences, which would enable a deeper understanding of the molecular, cellular and developmental processes underlying complex diseases. Despite these sequencing innovations, the highly polymorphic and gene-dense human leukocyte antigen (HLA) region is not yet fully phased in the reference genome. The reference genome also still contains gaps in multi-megabase repetitive regions, so the annotation of novel expression and methylation results is incomplete and inaccurate, which affects the interpretation of the molecular genetics and epigenetics of diseases. There is a pressing need for streamlined, production-level, easy-to-use computational approaches that can reconstruct high-quality chromosome-scale phased sequences and that can be applied to hundreds of human genomes. In this talk, first, I will present an efficient combinatorial phasing model that leverages a new long-range, strand-specific technology and long reads to generate chromosome-scale phasing. Second, I will present an efficient algorithm for accurate haplotype-resolved assembly of human individuals. This method takes advantage of a new long, accurate data type (PacBio HiFi) and long-range Hi-C data. For the first time, we can generate accurate chromosome-scale phased assemblies with a base-level accuracy of Q50 and a continuity of 25 Mb within 24 hours per sample, thereby setting a milestone for the genomics community. Third, I will present a generalized graph-based method for phased assembly of related individuals. This graph framework provides a compact representation for encoding various data types and can be applied to genomes of any complexity, with varying heterozygosity rates and repeat content.
Finally, I will present the importance of haplotype-resolved assemblies to various medical applications.