    • Broad host range of SARS-CoV-2 predicted by comparative and structural analysis of ACE2 in vertebrates

      Damas, Joana; Karlsson, Elinor K.; Lewin, Harris A. (2020-08-21)
      The novel coronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the cause of COVID-19. The main receptor of SARS-CoV-2, angiotensin I converting enzyme 2 (ACE2), is now undergoing extensive scrutiny to understand the routes of transmission and sensitivity in different species. Here, we utilized a unique dataset of ACE2 sequences from 410 vertebrate species, including 252 mammals, to study the conservation of ACE2 and its potential to be used as a receptor by SARS-CoV-2. We designed a five-category binding score based on the conservation properties of 25 amino acids important for the binding between ACE2 and the SARS-CoV-2 spike protein. Only mammals fell into the medium to very high categories and only catarrhine primates into the very high category, suggesting that they are at high risk for SARS-CoV-2 infection. We employed a protein structural analysis to qualitatively assess whether amino acid changes at variable residues would be likely to disrupt ACE2/SARS-CoV-2 spike protein binding and found the number of predicted unfavorable changes significantly correlated with the binding score. Extending this analysis to human population data, we found only rare (frequency < 0.001) variants in 10/25 binding sites. In addition, we found significant signals of selection and accelerated evolution in the ACE2 coding sequence across all mammals, and specific to the bat lineage. Our results, if confirmed by additional experimental data, may lead to the identification of intermediate host species for SARS-CoV-2, guide the selection of animal models of COVID-19, and assist the conservation of animals both in native habitats and in human care.
    • Broad Host Range of SARS-CoV-2 Predicted by Comparative and Structural Analysis of ACE2 in Vertebrates [preprint]

      Damas, Joana; Karlsson, Elinor K.; Lewin, Harris A. (2020-04-18)
      The novel coronavirus SARS-CoV-2 is the cause of Coronavirus Disease-2019 (COVID-19). The main receptor of SARS-CoV-2, angiotensin I converting enzyme 2 (ACE2), is now undergoing extensive scrutiny to understand the routes of transmission and sensitivity in different species. Here, we utilized a unique dataset of 410 vertebrates, including 252 mammals, to study cross-species conservation of ACE2 and its likelihood to function as a SARS-CoV-2 receptor. We designed a five-category ranking score based on the conservation properties of 25 amino acids important for the binding between receptor and virus, classifying all species from very high to very low. Only mammals fell into the medium to very high categories, and only catarrhine primates in the very high category, suggesting that they are at high risk for SARS-CoV-2 infection. We employed a protein structural analysis to qualitatively assess whether amino acid changes at variable residues would be likely to disrupt ACE2/SARS-CoV-2 binding, and found the number of predicted unfavorable changes significantly correlated with the binding score. Extending this analysis to human population data, we found only rare (
    • Comparative Analysis of Immune Cells Reveals a Conserved Regulatory Lexicon

      Donnard, Elisa; Vangala, Pranitha; Afik, Shaked; McCauley, Sean M.; Nowosielska, Anetta; Kucukural, Alper; Tabak, Barbara; Zhu, Xiaopeng; Diehl, William E.; McDonel, Patrick; et al. (2018-03-28)
      Most well-characterized enhancers are deeply conserved. In contrast, genome-wide comparative studies of steady-state systems showed that only a small fraction of active enhancers are conserved. To better understand conservation of enhancer activity, we used a comparative genomics approach that integrates temporal expression and epigenetic profiles in an innate immune system. We found that gene expression programs diverge among mildly induced genes, while being highly conserved for strongly induced genes. The fraction of conserved enhancers varies greatly across gene expression programs, with induced genes and early-response genes, in particular, being regulated by a higher fraction of conserved enhancers. Clustering of conserved accessible DNA sequences within enhancers resulted in over 60 sequence motifs including motifs for known factors, as well as many with unknown function. We further show that the number of instances of these motifs is a strong predictor of the responsiveness of a gene to pathogen detection.
    • Comparative Pangenomics of the Mammalian Gut Commensal Bifidobacterium longum

      Albert, Korin; Rani, Asha; Sela, David (2019-12-18)
      Bifidobacterium longum colonizes mammalian gastrointestinal tracts, where it can metabolize host-indigestible oligosaccharides. Although B. longum strains are currently segregated into three subspecies that reflect common metabolic capacities and genetic similarity, heterogeneity within subspecies suggests that these taxonomic boundaries may not be completely resolved. To address this, the B. longum pangenome was analyzed from representative strains isolated from a diverse set of sources. The analysis showed that the B. longum pangenome is open and contains almost 17,000 genes, with over 85% of genes found in ≤28 of 191 strains. B. longum genomes share a small core gene set of only ~500 genes, or ~3% of the total pangenome. Although the individual B. longum subspecies pangenomes share similar relative abundances of clusters of orthologous groups, strains show inter- and intrasubspecies differences with respect to carbohydrate utilization gene content and growth phenotypes.
    • Darwinian genomics and diversity in the tree of life

      Stephan, Taylorlyn; Karlsson, Elinor K. (2022-01-25)
      Genomics encompasses the entire tree of life, both extinct and extant, and the evolutionary processes that shape this diversity. To date, genomic research has focused on humans, a small number of agricultural species, and established laboratory models. Fewer than 18,000 of approximately 2,000,000 eukaryotic species (<1%) have a representative genome sequence in GenBank, and only a fraction of these have ancillary information on genome structure, genetic variation, gene expression, epigenetic modifications, and population diversity. This imbalance reflects a perception that human studies are paramount in disease research. Yet understanding how genomes work, and how genetic variation shapes phenotypes, requires a broad view that embraces the vast diversity of life. We have the technology to collect massive and exquisitely detailed datasets about the world, but expertise is siloed into distinct fields. A new approach, integrating comparative genomics with cell and evolutionary biology, ecology, archaeology, anthropology, and conservation biology, is essential for understanding and protecting ourselves and our world. Here, we describe potential for scientific discovery when comparative genomics works in close collaboration with a broad range of fields as well as the technical, scientific, and social constraints that must be addressed.
    • Evolutionary analysis across mammals reveals distinct classes of long noncoding RNAs [preprint]

      Chen, Jenny; Shishkin, Alexander A.; Zhu, Xiaopeng; Kadri, Sabah; Maza, Itay; Hanna, Jacob H.; Regev, Aviv; Garber, Manuel (2015-11-11)
      BACKGROUND: Recent advances in transcriptome sequencing have enabled the discovery of thousands of long non-coding RNAs (lncRNAs) across multitudes of species. Though several lncRNAs have been shown to play important roles in diverse biological processes, the functions and mechanisms of most lncRNAs remain unknown. Two significant obstacles lie between transcriptome sequencing and functional characterization of lncRNAs: 1) identifying truly noncoding genes from de novo reconstructed transcriptomes, and 2) prioritizing hundreds of resulting putative lncRNAs from each sample for downstream experimental interrogation. RESULTS: We present slncky, a computational lncRNA discovery tool that produces a high-quality set of lncRNAs from RNA-Sequencing data and further prioritizes lncRNAs by characterizing selective constraint as a proxy for function. Our filtering pipeline is comparable to manual curation efforts and more sensitive than previously published approaches. Further, we develop, for the first time, a sensitive alignment pipeline for aligning lncRNA loci and propose new evolutionary metrics relevant for both sequence and transcript evolution. Our analysis reveals that selection acts in several distinct patterns, and uncovers two notable classes of lncRNAs: one showing strong purifying selection at RNA sequence and another where constraint is restricted to the regulation but not the sequence of the transcript. CONCLUSION: Our novel comparative methods for lncRNAs reveal 233 constrained lncRNAs out of tens of thousands of currently annotated transcripts, which we believe should be prioritized for further interrogation. To aid in their analysis we provide the slncky Evolution Browser as a resource for experimentalists.
    • How to make a rodent giant: Genomic basis and tradeoffs of gigantism in the capybara, the world's largest rodent

      Herrera-Alvarez, Santiago; Karlsson, Elinor K.; Ryder, Oliver A.; Lindblad-Toh, Kerstin; Crawford, Andrew J. (2020-11-10)
      Gigantism results when one lineage within a clade evolves extremely large body size relative to its small-bodied ancestors, a common phenomenon in animals. Theory predicts that the evolution of giants should be constrained by two tradeoffs. First, because body size is negatively correlated with population size, purifying selection is expected to be less efficient in species of large body size, leading to increased mutational load. Second, gigantism is achieved through generating a higher number of cells along with higher rates of cell proliferation, thus increasing the likelihood of cancer. To explore the genetic basis of gigantism in rodents and uncover genomic signatures of gigantism-related tradeoffs, we assembled a draft genome of the capybara (Hydrochoerus hydrochaeris), the world's largest living rodent. We found that the genome-wide ratio of non-synonymous to synonymous mutations (omega) is elevated in the capybara relative to other rodents, likely caused by a generation-time effect and consistent with a nearly-neutral model of molecular evolution. A genome-wide scan for adaptive protein evolution in the capybara highlighted several genes controlling post-natal bone growth regulation and musculoskeletal development, which are relevant to anatomical and developmental modifications for an increase in overall body size. Capybara-specific gene-family expansions included a putative novel anticancer adaptation that involves T cell-mediated tumor suppression, offering a potential resolution to the increased cancer risk in this lineage. Our comparative genomic results uncovered the signature of an intragenomic conflict where the evolution of gigantism in the capybara involved selection on genes and pathways that are directly linked to cancer.
    • The comparative genomics of Bifidobacterium callitrichos reflects dietary carbohydrate utilization within the common marmoset gut

      Albert, Korin; Rani, Asha; Sela, David A. (2018-06-15)
      Bifidobacterium is a diverse genus of anaerobic, saccharolytic bacteria that colonize many animals, notably humans and other mammals. The presence of these bacteria in the gastrointestinal tract represents a potential coevolution between the gut microbiome and its mammalian host mediated by diet. To study the relationship between bifidobacterial gut symbionts and host nutrition, we analyzed the genomes of two Bifidobacterium strains isolated from the feces of a common marmoset (Callithrix jacchus), a primate species studied for its ability to subsist on host-indigestible carbohydrates. Whole genome sequencing identified these isolates as unique strains of Bifidobacterium callitrichos. All three strains, including these isolates and the previously described type strain, contain genes that may enable utilization of marmoset dietary substrates. These include genes predicted to contribute to galactose, arabinose, and trehalose metabolic pathways. In addition, significant genomic differences between strains suggest that bifidobacteria possess distinct roles in carbohydrate metabolism within the same host. Thus, bifidobacteria utilize dietary components specific to their host, both humans and non-human primates alike. Comparative genomics suggests conservation of possible coevolutionary relationships within the primate clade.
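The B. longum pangenome summary above (a small core shared by all strains, and most genes found in only a minority of strains) rests on a simple partition of a gene presence/absence matrix. The sketch below illustrates that partition under stated assumptions: "core" is taken to mean present in every strain, "rare" means present in at most a chosen number of strains (the hypothetical `rare_max` parameter), and the toy strains and gene IDs are invented. It is not the authors' pipeline, which the abstract does not name.

```python
# Minimal sketch of a pangenome partition from presence/absence data.
# ASSUMPTIONS (hypothetical, not from the paper): "core" = present in every
# strain; "rare" = present in at most `rare_max` strains; toy data invented.
from typing import Dict, Set


def summarize_pangenome(
    presence: Dict[str, Set[str]],  # strain name -> set of gene-cluster IDs
    rare_max: int,
) -> Dict[str, float]:
    n_strains = len(presence)
    # Count how many strains carry each gene cluster.
    counts: Dict[str, int] = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    total = len(counts)  # pangenome size = all distinct gene clusters
    core = sum(1 for c in counts.values() if c == n_strains)
    rare = sum(1 for c in counts.values() if c <= rare_max)
    return {
        "pangenome_size": total,
        "core_genes": core,
        "core_fraction": core / total,
        "rare_fraction": rare / total,
    }


# Toy data (hypothetical): three strains, five gene clusters.
toy = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g5"},
}
summary = summarize_pangenome(toy, rare_max=1)
print(summary)
```

Applied to the figures quoted in the abstract, the same core fraction is roughly 500 / 17,000 ≈ 0.029, which matches the reported ~3%. In practice, many pangenome tools let the core threshold fall slightly below 100% of strains to tolerate assembly gaps; the strict definition here is a simplifying choice.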
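The capybara entry above reports an elevated genome-wide omega, the ratio of nonsynonymous to synonymous substitution rates (dN/dS). As a hedged illustration of what that ratio measures, the sketch computes omega for one pairwise comparison from already-counted sites and differences, applying a Jukes-Cantor correction as in the last step of the Nei-Gojobori method. The counts are hypothetical, the codon-counting step is assumed to be done elsewhere, and this is not the method used in the capybara study.

```python
# Minimal sketch: omega (dN/dS) from pre-counted sites and differences,
# with a Jukes-Cantor multiple-hit correction (Nei-Gojobori style).
# ASSUMPTION: site/difference counts come from a codon alignment computed
# elsewhere; the numbers below are hypothetical.
import math


def jukes_cantor(p: float) -> float:
    """Correct a raw proportion of differences for multiple substitutions."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction requires 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega(n_sites: float, s_sites: float, n_diffs: float, s_diffs: float) -> float:
    """Return dN/dS given nonsynonymous/synonymous site and difference counts."""
    d_n = jukes_cantor(n_diffs / n_sites)
    d_s = jukes_cantor(s_diffs / s_sites)
    if d_s == 0.0:
        raise ZeroDivisionError("no synonymous divergence; omega is undefined")
    return d_n / d_s


# Hypothetical counts for a single pairwise comparison.
w = omega(n_sites=750.0, s_sites=250.0, n_diffs=15.0, s_diffs=25.0)
print(round(w, 3))
```

Values of omega below 1 are conventionally read as purifying selection, values near 1 as neutral evolution, and values above 1 as positive selection. A genome-wide rise in omega, as described for the capybara, is therefore consistent with weaker purifying selection under a nearly neutral model.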