The Program in Bioinformatics and Integrative Biology (BIB) was established in 2008 to address one of the most dynamic and central areas in biomedical research—the ever-increasing quantity of molecular information available to scientists. Our mission is to develop and explore computational and quantitative approaches and tools to help the biomedical research community maximize their understanding of the growing volume and complexity of biomedical big data.

Recently Published

  • The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models

    Rozowsky, Joel; Gao, Jiahao; Borsari, Beatrice; Yang, Yucheng T; Galeev, Timur; Gürsoy, Gamze; Epstein, Charles B; Xiong, Kun; Xu, Jinrui; Li, Tianxiao; et al. (2023-03-30)
    Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.
  • Toward a comprehensive catalog of regulatory elements

    Fan, Kaili; Pfister, Edith; Weng, Zhiping (2023-03-19)
    Regulatory elements are the genomic regions that interact with transcription factors to control cell-type-specific gene expression in different cellular environments. A precise and complete catalog of functional elements encoded by the human genome is key to understanding mammalian gene regulation. Here, we review the current state of regulatory element annotation. We first provide an overview of assays for characterizing functional elements, including genome, epigenome, transcriptome, three-dimensional chromatin interaction, and functional validation assays. We then discuss computational methods for defining regulatory elements, including peak-calling and other statistical modeling methods. Finally, we introduce several high-quality lists of regulatory element annotations and suggest potential future directions.
  • Leveraging Base Pair Mammalian Constraint to Understand Genetic Variation and Human Disease [preprint]

    Sullivan, Patrick F; Meadows, Jennifer R S; Gazal, Steven; Phan, BaDoi N; Li, Xue; Genereux, Diane P; Dong, Michael X; Bianchi, Matteo; Andrews, Gregory; Sakthikumar, Sharadha; et al. (2023-03-10)
    Although thousands of genomic regions have been associated with heritable human diseases, attempts to elucidate biological mechanisms are impeded by a general inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function that is agnostic to cell type or disease mechanism. Here, single base phyloP scores from the whole genome alignment of 240 placental mammals identified 3.5% of the human genome as significantly constrained, and likely functional. We compared these scores to large-scale genome annotation, genome-wide association studies (GWAS), copy number variation, clinical genetics findings, and cancer data sets. Evolutionarily constrained positions are enriched for variants explaining common disease heritability (more than any other functional annotation). Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be further explored and linked to disease.
  • Downregulation of Hsp90 and the antimicrobial peptide Mtk suppresses poly(GR)-induced neurotoxicity in C9ORF72-ALS/FTD

    Lee, Soojin; Jun, Yong-Woo; Linares, Gabriel R; Butler, Brandon; Yuva-Adyemir, Yeliz; Moore, Jill; Krishnan, Gopinath; Ruiz-Juarez, Bryan; Santana, Manuel; Pons, Marine; et al. (2023-03-10)
    GGGGCC repeat expansion in the C9ORF72 gene is the most common genetic cause of amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD). Repeat RNAs can be translated into dipeptide repeat proteins, including poly(GR), whose mechanisms of action remain largely unknown. In an RNA-seq analysis of poly(GR) toxicity in Drosophila, we found that several antimicrobial peptide genes, such as metchnikowin (Mtk), and heat shock protein (Hsp) genes are activated. Mtk knockdown in the fly eye or in all neurons suppresses poly(GR) neurotoxicity. These findings suggest a cell-autonomous role of Mtk in neurodegeneration. Hsp90 knockdown partially rescues both poly(GR) toxicity in flies and neurodegeneration in C9ORF72 motor neurons derived from induced pluripotent stem cells (iPSCs). Topoisomerase II (TopoII) regulates poly(GR)-induced upregulation of Hsp90 and Mtk. TopoII knockdown also suppresses poly(GR) toxicity in Drosophila and improves survival of C9ORF72 iPSC-derived motor neurons. These results suggest potential novel therapeutic targets for C9ORF72-ALS/FTD.
  • Cross-ancestry, cell-type-informed atlas of gene, isoform, and splicing regulation in the developing human brain [preprint]

    Wen, Cindy; Margolis, Michael; Dai, Rujia; Zhang, Pan; Przytycki, Pawel F; Vo, Daniel D; Bhattacharya, Arujun; Kim, Minsoo; Matoba, Nana; Tsai, Ellen; et al. (2023-03-06)
    Genomic regulatory elements active in the developing human brain are notably enriched in genetic risk for neuropsychiatric disorders, including autism spectrum disorder (ASD), schizophrenia, and bipolar disorder. However, prioritizing the specific risk genes and candidate molecular mechanisms underlying these genetic enrichments has been hindered by the lack of a single unified large-scale gene regulatory atlas of human brain development. Here, we uniformly process and systematically characterize gene, isoform, and splicing quantitative trait loci (xQTLs) in 672 fetal brain samples from unique subjects across multiple ancestral populations. We identify 15,752 genes harboring a significant xQTL and map 3,739 eQTLs to a specific cellular context. We observe a striking drop in gene expression and splicing heritability as the human brain develops. Isoform-level regulation, particularly in the second trimester, mediates the greatest proportion of heritability across multiple psychiatric GWAS, compared with eQTLs. Via colocalization and TWAS, we prioritize biological mechanisms for ~60% of GWAS loci across five neuropsychiatric disorders, nearly two-fold that observed in the adult brain. Finally, we build a comprehensive set of developmentally regulated gene and isoform co-expression networks capturing unique genetic enrichments across disorders. Together, this work provides a comprehensive view of genetic regulation across human brain development as well as the stage- and cell type-informed mechanistic underpinnings of neuropsychiatric disorders.
  • oFlowSeq: a quantitative approach to identify protein coding mutations affecting cell type enrichment using mosaic CRISPR-Cas9 edited cerebral organoids

    Dawes, Pepper; Murray, Liam F; Olson, Meagan N; Barton, Nathaniel J; Smullen, Molly; Suresh, Madhusoodhanan; Yan, Guang; Zhang, Yucheng; Fernandez-Fontaine, Aria; English, Jay; et al. (2023-03-06)
    Cerebral organoids are composed of diverse cell types found in the developing human brain and can be leveraged in the identification of critical cell types perturbed by genetic risk variants in common neuropsychiatric disorders. There is great interest in developing high-throughput technologies to associate genetic variants with cell types. Here, we describe a high-throughput, quantitative approach (oFlowSeq) by utilizing CRISPR-Cas9, FACS sorting, and next-generation sequencing. Using oFlowSeq, we found that deleterious mutations in autism-associated gene KCTD13 resulted in increased proportions of Nestin+ cells and decreased proportions of TRA-1-60+ cells within mosaic cerebral organoids. We further identified that a locus-wide CRISPR-Cas9 survey of another 18 genes in the 16p11.2 locus resulted in most genes with > 2% maximum editing efficiencies for short and long indels, suggesting a high feasibility for an unbiased, locus-wide experiment using oFlowSeq. Our approach presents a novel method to identify genotype-to-cell type imbalances in an unbiased, high-throughput, quantitative manner.
  • Epigenetic and chromosomal features drive transposon insertion in Drosophila melanogaster

    Cao, Jichuan; Yu, Tianxiong; Xu, Bo; Hu, Zhongren; Zhang, Xiao-Ou; Theurkauf, William E; Weng, Zhiping (2023-02-10)
    Transposons are mobile genetic elements prevalent in the genomes of most species. The distribution of transposons within a genome reflects the actions of two opposing processes: initial insertion site selection, and selective pressure from the host. By analyzing whole-genome sequencing data from transposon-activated Drosophila melanogaster, we identified 43 316 de novo and 237 germline insertions from four long-terminal-repeat (LTR) transposons, one LINE transposon (I-element), and one DNA transposon (P-element). We found that all transposon types favored insertion into promoters de novo, but otherwise displayed distinct insertion patterns. De novo and germline P-element insertions preferred replication origins, often landing in a narrow region around transcription start sites and in regions of high chromatin accessibility. De novo LTR transposon insertions preferred regions with high H3K36me3, promoters and exons of active genes; within genes, LTR insertion frequency correlated with gene expression. De novo I-element insertion density increased with distance from the centromere. Germline I-element and LTR transposon insertions were depleted in promoters and exons, suggesting strong selective pressure to remove transposons from functional elements. Transposon movement is associated with genome evolution and disease; therefore, our results can improve our understanding of genome and disease biology.
  • CRISPR-induced exon skipping of β-catenin reveals tumorigenic mutants driving distinct subtypes of liver cancer

    Mou, Haiwei; Eskiocak, Onur; Özler, Kadir A; Gorman, Megan; Yue, Junjiayu; Jin, Ying; Wang, Zhikai; Gao, Ya; Janowitz, Tobias; Meyer, Hannah V; et al. (2023-02-08)
    CRISPR/Cas9-driven cancer modeling studies are based on the disruption of tumor suppressor genes by small insertions or deletions (indels) that lead to frame-shift mutations. In addition, CRISPR/Cas9 is widely used to define the significance of cancer oncogenes and genetic dependencies in loss-of-function studies. However, how CRISPR/Cas9 influences gain-of-function oncogenic mutations is elusive. Here, we demonstrate that single guide RNA targeting exon 3 of Ctnnb1 (encoding β-catenin) results in exon skipping and generates gain-of-function isoforms in vivo. CRISPR/Cas9-mediated exon skipping of Ctnnb1 induces liver tumor formation in synergy with YAPS127A in mice. We define two distinct exon skipping-induced tumor subtypes with different histological and transcriptional features. Notably, ectopic expression of two exon-skipped β-catenin transcript isoforms together with YAPS127A phenocopies the two distinct subtypes of liver cancer. Moreover, we identify similar CTNNB1 exon-skipping events in patients with hepatocellular carcinoma. Collectively, our findings advance our understanding of β-catenin-related tumorigenesis and reveal that CRISPR/Cas9 can be repurposed, in vivo, to study gain-of-function mutations of oncogenes in cancer. © 2023 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.
  • FAVOR: functional annotation of variants online resource and annotator for variation across the human genome

    Zhou, Hufeng; Arapoglou, Theodore; Li, Xihao; Li, Zilin; Zheng, Xiuwen; Moore, Jill; Asok, Abhijith; Kumar, Sushant; Blue, Elizabeth E; Buyske, Steven; et al. (2022-11-09)
    Large biobank-scale whole genome sequencing (WGS) studies are rapidly identifying a multitude of coding and non-coding variants. They provide an unprecedented resource for illuminating the genetic basis of human diseases. Variant functional annotations play a critical role in WGS analysis, result interpretation, and prioritization of disease- or trait-associated causal variants. Existing functional annotation databases have limited scope to perform online queries and functionally annotate the genotype data of large biobank-scale WGS studies. We develop the Functional Annotation of Variants Online Resources (FAVOR) to meet these pressing needs. FAVOR provides a comprehensive multi-faceted variant functional annotation online portal that summarizes and visualizes findings of all possible nine billion single nucleotide variants (SNVs) across the genome. It allows for rapid variant-, gene- and region-level queries of variant functional annotations. FAVOR integrates variant functional information from multiple sources to describe the functional characteristics of variants and facilitates prioritizing plausible causal variants influencing human phenotypes. Furthermore, we provide a scalable annotation tool, FAVORannotator, to functionally annotate large-scale WGS studies and efficiently store the genotype and their variant functional annotation data in a single file using the annotated Genomic Data Structure (aGDS) format, making downstream analysis more convenient. FAVOR and FAVORannotator are available at https://favor.genohub.org.
  • An Early Islet Transcriptional Signature Is Associated With Local Inflammation in Autoimmune Diabetes

    Derr, Alan G; Arowosegbe, Adediwura; Satish, Basanthi; Redick, Sambra D; Qaisar, Natasha; Guo, Zhiru; Vanderleeden, Emma; Trombly, Melanie I; Baer, Christina E; Harlan, David M; et al. (2022-11-08)
    Identifying the early islet cellular processes of autoimmune type 1 diabetes (T1D) in humans is challenging given the absence of symptoms during this period and the inaccessibility of the pancreas for sampling. In this article, we study temporal events in pancreatic islets in LEW.1WR1 rats, in which autoimmune diabetes can be induced with virus infection, by performing transcriptional analysis of islets harvested during the prediabetic period. Single-cell RNA-sequencing and differential expression analyses of islets from prediabetic rats reveal subsets of β- and α-cells under stress as evidenced by heightened expression, over time, of a transcriptional signature characterized by interferon-stimulated genes, chemokines including Cxcl10, major histocompatibility class I, and genes for the ubiquitin-proteasome system. Mononuclear phagocytes show increased expression of inflammatory markers. RNA-in situ hybridization of rat pancreatic tissue defines the spatial distribution of Cxcl10+ β- and α-cells and their association with CD8+ T cell infiltration, a hallmark of insulitis and islet destruction. Our studies define early islet transcriptional events during immune cell recruitment to islets and reveal spatial associations between stressed β- and α-cells and immune cells. Insights into such early processes can assist in the development of therapeutic and prevention strategies for T1D.
  • The transcription factor TCFL5 responds to A-MYB to elaborate the male meiotic program in mice

    Cecchini, Katharine; Biasini, Adriano; Yu, Tianxiong; Säflund, Martin; Mou, Haiwei; Arif, Amena; Eghbali, Atiyeh; Colpan, Cansu; Gainetdinov, Ildar; de Rooij, Dirk G; et al. (2022-11-01)
    In male mice, the transcription factors STRA8 and MEIOSIN initiate meiosis I. We report that STRA8/MEIOSIN activates the transcription factors A-MYB and TCFL5, which together reprogram gene expression after spermatogonia enter into meiosis. TCFL5 promotes transcription of genes required for meiosis, mRNA turnover, miR-34/449 production, meiotic exit, and spermiogenesis. This transcriptional architecture is conserved in rhesus macaque, suggesting TCFL5 plays a central role in meiosis and spermiogenesis in placental mammals. Tcfl5em1/em1 mutants are sterile, and spermatogenesis arrests at the mid- or late-pachytene stage of meiosis. Moreover, Tcfl5+/em1 mutants produce fewer motile sperm.
  • A-MYB/TCFL5 regulatory architecture ensures the production of pachytene piRNAs in placental mammals

    Yu, Tianxiong; Biasini, Adriano; Cecchini, Katharine; Saflund, Martin; Mou, Haiwei; Arif, Amena; Eghbali, Atiyeh; de Rooij, Dirk; Weng, Zhiping; Zamore, Phillip D; et al. (2022-10-14)
    In male mice, the transcription factor A-MYB initiates the transcription of pachytene piRNA genes during meiosis. Here, we report that A-MYB activates the transcription factor Tcfl5 produced in pachytene spermatocytes. Subsequently, A-MYB and TCFL5 reciprocally reinforce their own transcription to establish a positive feedback circuit that triggers pachytene piRNA production. TCFL5 regulates the expression of genes required for piRNA maturation and promotes transcription of evolutionarily young pachytene piRNA genes, whereas A-MYB activates the transcription of older pachytene piRNA genes. Intriguingly, pachytene piRNAs from TCFL5-dependent young loci initiate the production of piRNAs from A-MYB-dependent older loci, ensuring the self-propagation of pachytene piRNAs. A-MYB and TCFL5 act via a set of incoherent feedforward loops that drive regulation of gene expression by pachytene piRNAs during spermatogenesis. This regulatory architecture is conserved in rhesus macaque, suggesting that it was present in the last common ancestor of placental mammals.
  • Flnc: Machine Learning Improves the Identification of Novel Long Noncoding RNAs from Stand-Alone RNA-Seq Data

    Li, Zixiu; Zhou, Peng; Kwon, Euijin; Fitzgerald, Katherine A; Weng, Zhiping; Zhou, Chan (2022-10-13)
    Long noncoding RNAs (lncRNAs) play critical regulatory roles in human development and disease. Although there are over 100,000 samples with available RNA sequencing (RNA-seq) data, many lncRNAs have yet to be annotated. The conventional approach to identifying novel lncRNAs from RNA-seq data is to find transcripts without coding potential, but this approach has a false discovery rate of 30-75%. Other existing methods either identify only multi-exon lncRNAs, missing single-exon lncRNAs, or require transcriptional initiation profiling data (such as H3K4me3 ChIP-seq data), which is unavailable for many samples with RNA-seq data. Because of these limitations, current methods cannot accurately identify novel lncRNAs from existing RNA-seq data. To address this problem, we have developed software, Flnc, to accurately identify both novel and annotated full-length lncRNAs, including single-exon lncRNAs, directly from RNA-seq data without requiring transcriptional initiation profiles. Flnc integrates machine learning models built by incorporating four types of features: transcript length, promoter signature, multiple exons, and genomic location. Flnc achieves state-of-the-art prediction power with an AUROC score over 0.92. Flnc significantly improves the prediction accuracy from less than 50% using the conventional approach to over 85%. Flnc is available via GitHub.
  • Constructing, validating, and updating machine learning models to predict survival in children with Ebola Virus Disease

    Genisca, Alicia E; Butler, Kelsey; Gainey, Monique; Chu, Tzu-Chun; Huang, Lawrence; Mbong, Eta N; Kennedy, Stephen B; Laghari, Razia; Nganga, Fiston; Muhayangabo, Rigobert F; et al. (2022-10-12)
    Background: Ebola Virus Disease (EVD) causes high case fatality rates (CFRs) in young children, yet there are limited data focusing on predicting mortality in pediatric patients. Here we present machine learning-derived prognostic models to predict clinical outcomes in children infected with Ebola virus. Methods: Using retrospective data from the Ebola Data Platform, we investigated children with EVD from the West African EVD outbreak in 2014-2016. Elastic net regularization was used to create a prognostic model for EVD mortality. In addition to external validation with data from the 2018-2020 EVD epidemic in the Democratic Republic of the Congo (DRC), we updated the model using selected serum biomarkers. Findings: Pediatric EVD mortality was significantly associated with younger age, lower PCR cycle threshold (Ct) values, unexplained bleeding, respiratory distress, bone/muscle pain, anorexia, dysphagia, and diarrhea. These variables were combined to develop the newly described EVD Prognosis in Children (EPiC) predictive model. The area under the receiver operating characteristic curve (AUC) for EPiC was 0.77 (95% CI: 0.74-0.81) in the West Africa derivation dataset and 0.76 (95% CI: 0.64-0.88) in the DRC validation dataset. Updating the model with peak aspartate aminotransferase (AST) or creatine kinase (CK) measured within the first 48 hours after admission increased the AUC to 0.90 (0.77-1.00) and 0.87 (0.74-1.00), respectively. Conclusion: The novel EPiC prognostic model that incorporates clinical information and commonly used biochemical tests, such as AST and CK, can be used to predict mortality in children with EVD.
  • Comparison of Rapid Antigen Tests' Performance Between Delta and Omicron Variants of SARS-CoV-2 : A Secondary Analysis From a Serial Home Self-testing Study

    Soni, Apurv; Herbert, Carly; Filippaios, Andreas; Broach, John; Colubri, Andres; Fahey, Nisha; Woods, Kelsey; Nanavati, Janvi; Wright, Colton; Orwig, Taylor; et al. (2022-10-11)
    Background: It is important to document the performance of rapid antigen tests (Ag-RDTs) in detecting SARS-CoV-2 variants. Objective: To compare the performance of Ag-RDTs in detecting the Delta (B.1.617.2) and Omicron (B.1.1.529) variants of SARS-CoV-2. Design: Secondary analysis of a prospective cohort study that enrolled participants between 18 October 2021 and 24 January 2022. Participants did Ag-RDTs and collected samples for reverse transcriptase polymerase chain reaction (RT-PCR) testing every 48 hours for 15 days. Setting: The parent study enrolled participants throughout the mainland United States through a digital platform. All participants self-collected anterior nasal swabs for rapid antigen testing and RT-PCR testing. All Ag-RDTs were completed at home, whereas nasal swabs for RT-PCR were shipped to a central laboratory. Participants: Of 7349 participants enrolled in the parent study, 5779 asymptomatic persons who tested negative for SARS-CoV-2 on day 1 of the study were eligible for this substudy. Measurements: Sensitivity of Ag-RDTs on the same day as the first positive (index) RT-PCR result and 48 hours after the first positive RT-PCR result. Results: A total of 207 participants were positive on RT-PCR (58 Delta, 149 Omicron). Differences in sensitivity between variants were not statistically significant (same day: Delta, 15.5% [95% CI, 6.2% to 24.8%] vs. Omicron, 22.1% [CI, 15.5% to 28.8%]; at 48 hours: Delta, 44.8% [CI, 32.0% to 57.6%] vs. Omicron, 49.7% [CI, 41.6% to 57.6%]). Among 109 participants who had RT-PCR-positive results for 48 hours, rapid antigen sensitivity did not differ significantly between Delta- and Omicron-infected participants (48-hour sensitivity: Delta, 81.5% [CI, 66.8% to 96.1%] vs. Omicron, 78.0% [CI, 69.1% to 87.0%]). Only 7.2% of the 69 participants with RT-PCR-positive results for shorter than 48 hours tested positive by Ag-RDT within 1 week; those with Delta infections remained consistently negative on Ag-RDTs. 
    Limitation: A testing frequency of 48 hours does not allow a finer temporal resolution of the analysis of test performance, and the results of Ag-RDTs are based on self-report. Conclusion: The performance of Ag-RDTs in persons infected with the SARS-CoV-2 Omicron variant is not inferior to that in persons with Delta infections. Serial testing improved the sensitivity of Ag-RDTs for both variants. The performance of rapid antigen testing varies on the basis of duration of RT-PCR positivity. Primary funding source: National Heart, Lung, and Blood Institute of the National Institutes of Health.
  • Three-dimensional genome re-wiring in loci with Human Accelerated Regions [preprint]

    Keough, Kathleen C.; Whalen, Sean; Inoue, Fumitaka; Przytycki, Pawel F.; Fair, Tyler; Deng, Chengyu; Steyert, Marilyn; Ryu, Hane; Lindblad-Toh, Kerstin; Karlsson, Elinor; et al. (2022-10-05)
    Human Accelerated Regions (HARs) are conserved genomic loci that evolved at an accelerated rate in the human lineage and may underlie human-specific traits. We generated HARs and chimpanzee accelerated regions with the largest alignment of mammalian genomes to date. To facilitate exploration of accelerated evolution in other lineages, we implemented an open-source Nextflow pipeline that runs on any computing platform. Combining deep-learning with chromatin capture experiments in human and chimpanzee neural progenitor cells, we discovered a significant enrichment of HARs in topologically associating domains (TADs) containing human-specific genomic variants that change three-dimensional (3D) genome organization. Differential gene expression between humans and chimpanzees at these loci in multiple cell types suggests rewiring of regulatory interactions between HARs and neurodevelopmental genes. Thus, comparative genomics together with models of 3D genome folding revealed enhancer hijacking as an explanation for the rapid evolution of HARs. One-Sentence Summary: Human-specific changes to 3D genome organization may have contributed to rapid evolution of mammalian-conserved loci in the human genome.
  • Evolution of the ancestral mammalian karyotype and syntenic regions

    Damas, Joana; Corbo, Marco; Kim, Jaebum; Turner-Maier, Jason; Farré, Marta; Larkin, Denis M; Ryder, Oliver A; Steiner, Cynthia; Houck, Marlys L; Hall, Shaune; et al. (2022-09-26)
    Decrypting the rearrangements that drive mammalian chromosome evolution is critical to understanding the molecular bases of speciation, adaptation, and disease susceptibility. Using 8 scaffolded and 26 chromosome-scale genome assemblies representing 23/26 mammal orders, we computationally reconstructed ancestral karyotypes and syntenic relationships at 16 nodes along the mammalian phylogeny. Three different reference genomes (human, sloth, and cattle) representing phylogenetically distinct mammalian superorders were used to assess reference bias in the reconstructed ancestral karyotypes and to expand the number of clades with reconstructed genomes. The mammalian ancestor likely had 19 pairs of autosomes, with nine of the smallest chromosomes shared with the common ancestor of all amniotes (three still conserved in extant mammals), demonstrating a striking conservation of synteny for ∼320 My of vertebrate evolution. The numbers and types of chromosome rearrangements were classified for transitions between the ancestral mammalian karyotype, descendent ancestors, and extant species. For example, 94 inversions, 16 fissions, and 14 fusions that occurred over 53 My differentiated the therian from the descendent eutherian ancestor. The highest breakpoint rate was observed between the mammalian and therian ancestors (3.9 breakpoints/My). Reconstructed mammalian ancestor chromosomes were found to have distinct evolutionary histories reflected in their rates and types of rearrangements. The distributions of genes, repetitive elements, topologically associating domains, and actively transcribed regions in multispecies homologous synteny blocks and evolutionary breakpoint regions indicate that purifying selection acted over millions of years of vertebrate evolution to maintain syntenic relationships of developmentally important genes and regulatory landscapes of gene-dense chromosomes.
  • Multimodal surveillance of SARS-CoV-2 at a university enables development of a robust outbreak response framework

    Petros, Brittany A; Paull, Jillian S; Tomkins-Tinch, Christopher H; Loftness, Bryn C; DeRuff, Katherine C; Nair, Parvathy; Gionet, Gabrielle L; Benz, Aaron; Brock-Fisher, Taylor; Hughes, Michael; et al. (2022-09-19)
    Background: Universities are vulnerable to infectious disease outbreaks, making them ideal environments to study transmission dynamics and evaluate mitigation and surveillance measures. Here, we analyze multimodal COVID-19-associated data collected during the 2020-2021 academic year at Colorado Mesa University and introduce a SARS-CoV-2 surveillance and response framework. Methods: We analyzed epidemiological and sociobehavioral data (demographics, contact tracing, and WiFi-based co-location data) alongside pathogen surveillance data (wastewater and diagnostic testing, and viral genomic sequencing of wastewater and clinical specimens) to characterize outbreak dynamics and inform policy. We applied relative risk, multiple linear regression, and social network assortativity to identify attributes or behaviors associated with contracting SARS-CoV-2. To characterize SARS-CoV-2 transmission, we used viral sequencing, phylogenomic tools, and functional assays. Findings: Athletes, particularly those on high-contact teams, had the highest risk of testing positive. On average, individuals who tested positive had more contacts and longer interaction durations than individuals who never tested positive. The distribution of contacts per individual was overdispersed, although not as overdispersed as the distribution of phylogenomic descendants. Corroboration via technical replicates was essential for identification of wastewater mutations. Conclusions: Based on our findings, we formulate a framework that combines tools into an integrated disease surveillance program that can be implemented in other congregate settings with limited resources. Funding: This work was supported by the National Science Foundation, the Hertz Foundation, the National Institutes of Health, the Centers for Disease Control and Prevention, the Massachusetts Consortium on Pathogen Readiness, the Howard Hughes Medical Institute, the Flu Lab, and the Audacious Project.
  • The genomic landscape of canine osteosarcoma cell lines reveals conserved structural complexity and pathway alterations

    Megquier, Kate; Turner-Maier, Jason; Morrill, Kathleen; Li, Xue; Johnson, Jeremy; Karlsson, Elinor K; London, Cheryl A; Gardner, Heather L (2022-09-13)
    The characterization of immortalized canine osteosarcoma (OS) cell lines used for research has historically been based on phenotypic features such as cellular morphology and expression of bone specific markers. With the increasing use of these cell lines to investigate novel therapeutic approaches prior to in vivo translation, a much more detailed understanding regarding the genomic landscape of these lines is required to ensure accurate interpretation of findings. Here we report the first whole genome characterization of eight canine OS cell lines, including single nucleotide variants, copy number variants and other structural variants. Many alterations previously characterized in primary canine OS tissue were observed in these cell lines, including TP53 mutations, MYC copy number gains, loss of CDKN2A, PTEN, DLG2, MAGI2, and RB1 and structural variants involving SETD2, DLG2 and DMD. These data provide a new framework for understanding how best to incorporate in vitro findings generated using these cell lines into the design of future clinical studies involving dogs with spontaneous OS.
  • GLH/VASA helicases promote germ granule formation to ensure the fidelity of piRNA-mediated transcriptome surveillance

    Chen, Wenjun; Brown, Jordan S; He, Tao; Wu, Wei-Sheng; Tu, Shikui; Weng, Zhiping; Zhang, Donglei; Lee, Heng-Chi (2022-09-09)
    piRNAs function as guardians of the genome by silencing non-self nucleic acids and transposable elements in animals. Many piRNA factors are enriched in perinuclear germ granules, but whether their localization is required for piRNA biogenesis or function is not known. Here we show that GLH/VASA helicase mutants exhibit defects in forming perinuclear condensates containing PIWI and other small RNA cofactors. These mutant animals produce largely normal levels of piRNA but are defective in triggering piRNA silencing. Strikingly, while many piRNA targets are activated in GLH mutants, we observe that hundreds of endogenous genes are aberrantly silenced by piRNAs. This defect in self versus non-self recognition is also observed in other mutants where perinuclear germ granules are disrupted. Together, our results argue that perinuclear germ granules function critically to promote the fidelity of piRNA-based transcriptome surveillance in C. elegans and preserve self versus non-self distinction.
