    • A benchmark testing ground for integrating homology modeling and protein docking

      Bohnuud, Tanggis; Luo, Lingqi; Wodak, Shoshana J.; Bonvin, Alexandre M. J. J.; Weng, Zhiping; Vajda, Sandor; Schueler-Furman, Ora; Kozakov, Dima (2016-05-12)
      Protein docking procedures carry out the task of predicting the structure of a protein-protein complex starting from the known structures of the individual protein components. More often than not, however, the structure of one or both components is not known, but can be derived by homology modeling on the basis of known structures of related proteins deposited in the Protein Data Bank (PDB). Thus, the problem is to develop methods that optimally integrate homology modeling and docking with the goal of predicting the structure of a complex directly from the amino acid sequences of its component proteins. One possibility is to use the best available homology modeling and docking methods. However, the models built for the individual subunits often differ to a significant degree from the bound conformation in the complex, often much more so than the differences observed between free and bound structures of the same protein, and therefore additional conformational adjustments, at both the backbone and side chain levels, need to be modeled to achieve an accurate docking prediction. In particular, even homology models of overall good accuracy frequently include localized errors that unfavorably impact docking results. The predicted reliability of the different regions in the model can also serve as a useful input for the docking calculations. Here we present a benchmark dataset that should help to explore and solve combined modeling and docking problems. This dataset comprises a subset of the experimentally solved 'target' complexes from the widely used Docking Benchmark from the Weng Lab (excluding antibody-antigen complexes). This subset is extended to include the structures from the PDB related to those of the individual components of each complex, and hence represent potential templates for investigating and benchmarking integrated homology modeling and docking approaches. Template sets can be dynamically customized by specifying ranges in sequence similarity and in PDB release dates, or using other filtering options, such as excluding sets of specific structures from the template list. Multiple sequence alignments, as well as structural alignments of the templates to their corresponding subunits in the target are also provided. The resource is accessible online or can be downloaded at http://cluspro.org/benchmark, and is updated on a weekly basis in synchrony with new PDB releases.
    • A comprehensive genomic history of extinct and living elephants

      Palkopoulou, Eleftheria; Karlsson, Elinor K.; Reich, David (2018-03-13)
      Elephantids are the world's most iconic megafaunal family, yet there is no comprehensive genomic assessment of their relationships. We report a total of 14 genomes, including 2 from the American mastodon, which is an extinct elephantid relative, and 12 spanning all three extant and three extinct elephantid species including an approximately 120,000-y-old straight-tusked elephant, a Columbian mammoth, and woolly mammoths. Earlier genetic studies modeled elephantid evolution via simple bifurcating trees, but here we show that interspecies hybridization has been a recurrent feature of elephantid evolution. We found that the genetic makeup of the straight-tusked elephant, previously placed as a sister group to African forest elephants based on lower coverage data, in fact comprises three major components. Most of the straight-tusked elephant's ancestry derives from a lineage related to the ancestor of African elephants while its remaining ancestry consists of a large contribution from a lineage related to forest elephants and another related to mammoths. Columbian and woolly mammoths also showed evidence of interbreeding, likely following a latitudinal cline across North America. While hybridization events have shaped elephantid history in profound ways, isolation also appears to have played an important role. Our data reveal nearly complete isolation between the ancestors of the African forest and savanna elephants for approximately 500,000 y, providing compelling justification for the conservation of forest and savanna elephants as separate species.
    • A deep sequencing tool for partitioning clearance rates following antimalarial treatment in polyclonal infections

      Mideo, Nicole; Bailey, Jeffrey A.; Hathaway, Nicholas J.; Ngasala, Billy; Saunders, David L.; Lon, Chanthap; Kharabora, Oksana; Jamnik, Andrew; Balasubramanian, Sujata; Bjorkman, Anders; et al. (2016-01-27)
      BACKGROUND AND OBJECTIVES: Current tools struggle to detect drug-resistant malaria parasites when infections contain multiple parasite clones, which is the norm in high transmission settings in Africa. Our aim was to develop and apply an approach for detecting resistance that overcomes the challenges of polyclonal infections without requiring a genetic marker for resistance. METHODOLOGY: Clinical samples from patients treated with artemisinin combination therapy were collected from Tanzania and Cambodia. By deeply sequencing a hypervariable locus, we quantified the relative abundance of parasite subpopulations (defined by haplotypes of that locus) within infections and revealed evolutionary dynamics during treatment. Slow clearance is a phenotypic, clinical marker of artemisinin resistance; we analyzed variation in clearance rates within infections by fitting parasite clearance curves to subpopulation data. RESULTS: In Tanzania, we found substantial variation in clearance rates within individual patients. Some parasite subpopulations cleared as slowly as resistant parasites observed in Cambodia. We evaluated possible explanations for these data, including resistance to drugs. Assuming slow clearance was a stable phenotype of subpopulations, simulations predicted that modest increases in their frequency could substantially increase time to cure. CONCLUSIONS AND IMPLICATIONS: By characterizing parasite subpopulations within patients, our method can detect rare, slow clearing parasites in vivo whose phenotypic effects would otherwise be masked. Since our approach can be applied to polyclonal infections even when the genetics underlying resistance are unknown, it could aid in monitoring the emergence of artemisinin resistance. Our application to Tanzanian samples uncovers rare subpopulations with worrying phenotypes for closer examination.
    • A flexible docking approach for prediction of T cell receptor-peptide-MHC complexes

      Pierce, Brian G.; Weng, Zhiping (2013-01-01)
      T cell receptors (TCRs) are immune proteins that specifically bind to antigenic molecules, which are often foreign peptides presented by major histocompatibility complex proteins (pMHCs), playing a key role in the cellular immune response. To advance our understanding and modeling of this dynamic immunological event, we assembled a protein-protein docking benchmark consisting of 20 structures of crystallized TCR/pMHC complexes for which unbound structures exist for both TCR and pMHC. We used our benchmark to compare predictive performance using several flexible and rigid backbone TCR/pMHC docking protocols. Our flexible TCR docking algorithm, TCRFlexDock, improved predictive success over the fixed backbone protocol, leading to near-native predictions for 80% of the TCR/pMHC cases among the top 10 models, and 100% of the cases in the top 30 models. We then applied TCRFlexDock to predict the two distinct docking modes recently described for a single TCR bound to two different antigens, and tested several protein modeling scoring functions for prediction of TCR/pMHC binding affinities. This algorithm and benchmark should enable future efforts to predict and design uncharacterized TCR/pMHC complexes.
    • A generalized framework for computational design and mutational scanning of T-cell receptor binding interfaces

      Riley, Timothy P.; Ayres, Cory M.; Hellman, Lance M.; Singh, Nishant K.; Cosiano, Michael; Cimons, Jennifer M.; Anderson, Michael J.; Piepenbrink, Kurt H.; Pierce, Brian G.; Weng, Zhiping; et al. (2016-12-01)
      T-cell receptors (TCRs) have emerged as a new class of therapeutics, most prominently for cancer where they are the key components of new cellular therapies as well as soluble biologics. Many studies have generated high affinity TCRs in order to enhance sensitivity. Recent outcomes, however, have suggested that fine manipulation of TCR binding, with an emphasis on specificity, may be more valuable than large affinity increments. Structure-guided design is ideally suited for this role, and here we studied the generality of structure-guided design as applied to TCRs. We found that a previous approach, which successfully optimized the binding of a therapeutic TCR, had poor accuracy when applied to a broader set of TCR interfaces. We thus sought to develop a more general purpose TCR design framework. After assembling a large dataset of experimental data spanning multiple interfaces, we trained a new scoring function that accounted for unique features of each interface. Together with other improvements, such as explicit inclusion of molecular flexibility, this permitted the design of new affinity-enhancing mutations in multiple TCRs, including those not used in training. Our approach also captured the impacts of mutations and substitutions in the peptide/MHC ligand, and recapitulated recent findings regarding TCR specificity, indicating utility in more general mutational scanning of TCR-pMHC interfaces.
    • A machine learning approach for the prediction of protein surface loop flexibility

      Hwang, Howook; Vreven, Thom; Whitfield, Troy W.; Wiehe, Kevin; Weng, Zhiping (2011-08-01)
      Proteins often undergo conformational changes when binding to each other. A major fraction of backbone conformational changes involves motion on the protein surface, particularly in loops. Accounting for the motion of protein surface loops represents a challenge for protein-protein docking algorithms. A first step in addressing this challenge is to distinguish protein surface loops that are likely to undergo backbone conformational changes upon protein-protein binding (mobile loops) from those that are not (stationary loops). In this study, we developed a machine learning strategy based on support vector machines (SVMs). Our SVM uses three features of loop residues in the unbound protein structures (Ramachandran angles, crystallographic B-factors, and relative accessible surface area) to distinguish mobile loops from stationary ones. This method yields an average prediction accuracy of 75.3% compared with a random prediction accuracy of 50%, and an average of 0.79 area under the receiver operating characteristic (ROC) curve using cross-validation. Testing the method on an independent dataset, we obtained a prediction accuracy of 70.5%. Finally, we applied the method to 11 complexes that involve members from the Ras superfamily and achieved prediction accuracy of 92.8% for the Ras superfamily proteins and 74.4% for their binding partners.
    • A Sex Chromosome piRNA Promotes Robust Dosage Compensation and Sex Determination in C. elegans

      Tang, Wen; Seth, Meetu; Tu, Shikui; Shen, En-Zhi; Li, Qian; Shirayama, Masaki; Weng, Zhiping; Mello, Craig C. (2018-03-26)
      In metazoans, Piwi-related Argonaute proteins engage piRNAs (Piwi-interacting small RNAs) to defend the genome against invasive nucleic acids, such as transposable elements. Yet many organisms, including worms and humans, express thousands of piRNAs that do not target transposons, suggesting that piRNA function extends beyond genome defense. Here, we show that the X chromosome-derived piRNA 21ux-1 downregulates XOL-1 (XO Lethal), a master regulator of X chromosome dosage compensation and sex determination in Caenorhabditis elegans. Mutations in 21ux-1 and several Piwi-pathway components sensitize hermaphrodites to dosage compensation and sex determination defects. We show that the piRNA pathway also targets xol-1 in C. briggsae, a nematode species related to C. elegans. Our findings reveal physiologically important piRNA-mRNA interactions, raising the possibility that piRNAs function broadly to ensure robust gene expression and germline development.
    • A structure-based benchmark for protein-protein binding affinity

      Kastritis, Panagiotis L.; Moal, Iain H.; Hwang, Howook; Weng, Zhiping; Bates, Paul A.; Bonvin, Alexandre M. J. J.; Janin, Joel (2011-03-01)
      We have assembled a nonredundant set of 144 protein-protein complexes that have high-resolution structures available for both the complexes and their unbound components, and for which dissociation constants have been measured by biophysical methods. The set is diverse in terms of the biological functions it represents, with complexes that involve G-proteins and receptor extracellular domains, as well as antigen/antibody, enzyme/inhibitor, and enzyme/substrate complexes. It is also diverse in terms of the partners' affinity for each other, with K_d ranging between 10^-5 and 10^-14 M. Nine pairs of entries represent closely related complexes that have a similar structure, but a very different affinity, each pair comprising a cognate and a noncognate assembly. The unbound structures of the component proteins being available, conformation changes can be assessed. They are significant in most of the complexes, and large movements or disorder-to-order transitions are frequently observed. The set may be used to benchmark biophysical models aiming to relate affinity to structure in protein-protein interactions, taking into account the reactants and the conformation changes that accompany the association reaction, instead of just the final product.
    • A systems level approach to temporal expression dynamics in Drosophila reveals clusters of long term memory genes

      Bozler, Julianna; Kacsoh, Balint Z.; Chen, Hao; Theurkauf, William E.; Weng, Zhiping; Bosco, Giovanni (2017-10-30)
      The ability to integrate experiential information and recall it in the form of memory is observed in a wide range of taxa, and is a hallmark of highly derived nervous systems. Storage of past experiences is critical for adaptive behaviors that anticipate both adverse and positive environmental factors. The process of memory formation and consolidation involves many synchronized biological events including gene transcription, protein modification, and intracellular trafficking. However, many of these molecular mechanisms remain elusive. With Drosophila as a model system we use a nonassociative memory paradigm and a systems level approach to uncover novel transcriptional patterns. RNA sequencing of Drosophila heads during and after memory formation identified a number of novel memory genes. Tracking the dynamic expression of these genes over time revealed complex gene networks involved in long term memory. In particular, this study focuses on two functional gene clusters of signal peptides and proteases. Bioinformatics network analysis and prediction in combination with high-throughput RNA sequencing identified previously unknown memory genes, which when genetically knocked down resulted in behaviorally validated memory defects.
    • AAV-delivered suppressor tRNA overcomes a nonsense mutation in mice

      Wang, Jiaming; Zhang, Yue; Mendonca, Craig A.; Yukselen, Onur; Muneeruddin, Khaja; Ren, Lingzhi; Liang, Jialing; Zhou, Chen; Xie, Jun; Li, Jia; et al. (2022-03-23)
      Gene therapy is a potentially curative medicine for many currently untreatable diseases, and recombinant adeno-associated virus (rAAV) is the most successful gene delivery vehicle for in vivo applications (1-3). However, rAAV-based gene therapy suffers from several limitations, such as constrained DNA cargo size and toxicities caused by non-physiological expression of a transgene (4-6). Here we show that rAAV delivery of a suppressor tRNA (rAAV.sup-tRNA) safely and efficiently rescued a genetic disease in a mouse model carrying a nonsense mutation, and effects lasted for more than 6 months after a single treatment. Mechanistically, this was achieved through a synergistic effect of premature stop codon readthrough and inhibition of nonsense-mediated mRNA decay. rAAV.sup-tRNA had a limited effect on global readthrough at normal stop codons and did not perturb endogenous tRNA homeostasis, as determined by ribosome profiling and tRNA sequencing, respectively. By optimizing the AAV capsid and the route of administration, therapeutic efficacy in various target tissues was achieved, including liver, heart, skeletal muscle and brain. This study demonstrates the feasibility of developing a toolbox of AAV-delivered nonsense suppressor tRNAs operating on premature termination codons (AAV-NoSTOP) to rescue pathogenic nonsense mutations and restore gene function under endogenous regulation. As nonsense mutations account for 11% of pathogenic mutations, AAV-NoSTOP can benefit a large number of patients. AAV-NoSTOP obviates the need to deliver a full-length protein-coding gene that may exceed the rAAV packaging limit, elicit adverse immune responses or cause transgene-related toxicities. It therefore represents a valuable addition to gene therapeutics.
    • ACT: aggregation and correlation toolbox for analyses of genome tracks

      Jee, Justin; Rozowsky, Joel; Yip, Kevin Y.; Lochovsky, Lucas; Bjornson, Robert; Zhong, Guoneng; Zhang, Zhengdong; Fu, Yutao; Wang, Jie; Weng, Zhiping; et al. (2011-04-15)
      We have implemented aggregation and correlation toolbox (ACT), an efficient, multifaceted toolbox for analyzing continuous signal and discrete region tracks from high-throughput genomic experiments, such as RNA-seq or ChIP-chip signal profiles from the ENCODE and modENCODE projects, or lists of single nucleotide polymorphisms from the 1000 genomes project. It is able to generate aggregate profiles of a given track around a set of specified anchor points, such as transcription start sites. It is also able to correlate related tracks and analyze them for saturation, i.e., how much of a certain feature is covered with each new experiment. The ACT site contains downloadable code in a variety of formats, interactive web servers (for use on small quantities of data), example datasets, documentation and a gallery of outputs. Here, we explain the components of the toolbox in more detail and apply them in various contexts. AVAILABILITY: ACT is available at http://act.gersteinlab.org CONTACT: pi@gersteinlab.org.
    • Adaptation to P element transposon invasion in Drosophila melanogaster

      Khurana, Jaspreet S.; Wang, Jie; Xu, Jia; Koppetsch, Birgit S.; Thomson, Travis; Nowosielska, Anetta; Li, Chengjian; Zamore, Phillip D.; Weng, Zhiping; Theurkauf, William E. (2011-12-23)
      Transposons evolve rapidly and can mobilize and trigger genetic instability. Piwi-interacting RNAs (piRNAs) silence these genome pathogens, but it is unclear how the piRNA pathway adapts to invasion of new transposons. In Drosophila, piRNAs are encoded by heterochromatic clusters and maternally deposited in the embryo. Paternally inherited P element transposons thus escape silencing and trigger a hybrid sterility syndrome termed P-M hybrid dysgenesis. We show that P-M hybrid dysgenesis activates both P elements and resident transposons and disrupts the piRNA biogenesis machinery. As dysgenic hybrids age, however, fertility is restored, P elements are silenced, and P element piRNAs are produced de novo. In addition, the piRNA biogenesis machinery assembles, and resident elements are silenced. Significantly, resident transposons insert into piRNA clusters, and these new insertions are transmitted to progeny, produce novel piRNAs, and are associated with reduced transposition. P element invasion thus triggers heritable changes in genome structure that appear to enhance transposon silencing.
    • Adaptive Evolution Leads to Cross-Species Incompatibility in the piRNA Transposon Silencing Machinery

      Parhad, Swapnil S.; Tu, Shikui; Weng, Zhiping; Theurkauf, William E. (2017-10-09)
      Reproductive isolation defines species divergence and is linked to adaptive evolution of hybrid incompatibility genes. Hybrids between Drosophila melanogaster and Drosophila simulans are sterile and phenocopy mutations in the PIWI interacting RNA (piRNA) pathway, which silences transposons and shows pervasive adaptive evolution. Drosophila rhino and deadlock encode rapidly evolving components of a complex that binds to piRNA clusters. We show that Rhino and Deadlock interact and co-localize in simulans and melanogaster, but simulans Rhino does not bind melanogaster Deadlock, due to substitutions in the rapidly evolving Shadow domain. Significantly, a chimera expressing the simulans Shadow domain in a melanogaster Rhino backbone fails to support piRNA production, disrupts binding to piRNA clusters, and leads to ectopic localization to bulk heterochromatin. Fusing melanogaster Deadlock to simulans Rhino, by contrast, restores localization to clusters. Deadlock binding thus directs Rhino to piRNA clusters, and Rhino-Deadlock co-evolution has produced cross-species incompatibilities, which may contribute to reproductive isolation.
    • Adenovirus-Mediated Somatic Genome Editing of Pten by CRISPR/Cas9 in Mouse Liver in Spite of Cas9-Specific Immune Responses

      Wang, Dan; Mou, Haiwei; Li, Shaoyong; Li, Yingxiang; Hough, Soren; Tran, Karen; Li, Jia; Yin, Hao; Anderson, Daniel G.; Sontheimer, Erik J.; et al. (2015-07-01)
      CRISPR/Cas9 derived from the bacterial adaptive immunity pathway is a powerful tool for genome editing, but the safety profiles of in vivo delivered Cas9 (including host immune responses to the bacterial Cas9 protein) have not been comprehensively investigated in model organisms. Nonalcoholic steatohepatitis (NASH) is a prevalent human liver disease characterized by excessive fat accumulation in the liver. In this study, we used adenovirus (Ad) vector to deliver a Streptococcus pyogenes-derived Cas9 system (SpCas9) targeting Pten, a gene involved in NASH and a negative regulator of the PI3K-AKT pathway, in mouse liver. We found that the Ad vector mediated efficient Pten gene editing even in the presence of typical Ad vector-associated immunotoxicity in the liver. Four months after vector infusion, mice receiving the Pten gene-editing Ad vector showed massive hepatomegaly and features of NASH, consistent with the phenotypes following Cre-loxP-induced Pten deficiency in mouse liver. We also detected induction of humoral immunity against SpCas9 and the potential presence of an SpCas9-specific cellular immune response. Our findings provide a strategy to model human liver diseases in mice and highlight the importance of considering Cas9-specific immune responses in future translational studies involving in vivo delivery of CRISPR/Cas9.
    • America's lost dogs

      Goodman, Linda; Karlsson, Elinor K. (2018-07-05)
      Few traces remain of the domesticated dogs that populated the Americas before the arrival of Europeans in the 15th century. On page 81 of this issue, Ní Leathlobhair et al. (1) shed light on the origins of the elusive precontact dog population through genetic analysis of ancient and modern dogs. Building on earlier work, they show that American dogs alive today have almost no ancestry from precontact dogs, a monophyletic lineage descended from Arctic dogs that accompanied human migrations from Asia. Instead, the authors found that their closest remaining relative is a global transmissible cancer carrying the DNA of a long-deceased dog. It remains unclear why precontact dogs survived and thrived for thousands of years in the Americas only to swiftly and almost completely disappear with the arrival of Europeans.
    • Analysis of Microarray and RNA-seq Expression Profiling Data

      Hung, Jui-Hung; Weng, Zhiping (2016-08-29)
      Gene expression profiling refers to the simultaneous measurement of the expression levels of a large number of genes (often all genes in a genome), typically in multiple experiments spanning a variety of cell types, treatments, or environmental conditions. Expression profiling is accomplished by assaying mRNA levels with microarrays or next-generation sequencing technologies (RNA-seq). This introduction describes normalization and analysis of data generated from microarray or RNA-seq experiments.
    • Analysis of the Human Mucosal Response to Cholera Reveals Sustained Activation of Innate Immune Signaling Pathways

      Bourque, Daniel L.; Genereux, Diane P.; Karlsson, Elinor K.; Qadri, Firdausi; Harris, Jason B. (2018-01-22)
      To better understand the innate immune response to Vibrio cholerae infection, we tracked gene expression in the duodenal mucosa of 11 Bangladeshi adults with cholera, using biopsy specimens obtained immediately after rehydration and 30 and 180 days later. We identified differentially expressed genes and performed an analysis to predict differentially regulated pathways and upstream regulators. During acute cholera, there was a broad increase in the expression of genes associated with innate immunity, including activation of the NF-kappaB, mitogen-activated protein kinase (MAPK), and Toll-like receptor (TLR)-mediated signaling pathways, which, unexpectedly, persisted even 30 days after infection. Focusing on early differences in gene expression, we identified 37 genes that were differentially expressed on days 2 and 30 across the 11 participants. These genes included the endosomal Toll-like receptor gene TLR8, which was expressed in lamina propria cells. Underscoring a potential role for endosomal TLR-mediated signaling in vivo, our pathway analysis found that interferon regulatory factor 7 and beta 1 and alpha 2 interferons were among the top upstream regulators activated during cholera. Among the innate immune effectors, we found that the gene for DUOX2, an NADPH oxidase involved in the maintenance of intestinal homeostasis, was upregulated in intestinal epithelial cells during cholera. Notably, the observed increases in DUOX2 and TLR8 expression were also modeled in vitro when Caco-2 or THP-1 cells, respectively, were stimulated with live V. cholerae but not with heat-killed organisms or cholera toxin alone. These previously unidentified features of the innate immune response to V. cholerae extend our understanding of the mucosal immune signaling pathways and effectors activated in vivo following cholera.
    • Analyzing Microarray Data

      Hung, Jui-Hung; Weng, Zhiping (2016-08-29)
      Because there is no widely used software for analyzing RNA-seq data that has a graphical user interface, this protocol provides an example of analyzing microarray data using Babelomics. This analysis entails performing quantile normalization and then detecting differentially expressed genes associated with the transgenesis of a human oncogene c-Myc in mice. Finally, hierarchical clustering is performed on the differentially expressed genes using the Cluster program, and the results are visualized using TreeView.
    • Ancestry-inclusive dog genomics challenges popular breed stereotypes

      Morrill, Kathleen; Li, Xue; McClure, Jesse; Logan, Brittney; Gao, Mingshi; Dong, Yinan; Carmichael, Elena; White, Michelle E.; Weng, Zhiping; Colubri, Andres; et al. (2022-04-29)
      Behavioral genetics in dogs has focused on modern breeds, which are isolated subgroups with distinctive physical and, purportedly, behavioral characteristics. We interrogated breed stereotypes by surveying owners of 18,385 purebred and mixed-breed dogs and genotyping 2155 dogs. Most behavioral traits are heritable [heritability (h^2) > 25%], and admixture patterns in mixed-breed dogs reveal breed propensities. Breed explains just 9% of behavioral variation in individuals. Genome-wide association analyses identify 11 loci that are significantly associated with behavior, and characteristic breed behaviors exhibit genetic complexity. Behavioral loci are not unusually differentiated in breeds, but breed propensities align, albeit weakly, with ancestral function. We propose that behaviors perceived as characteristic of modern breeds derive from thousands of years of polygenic adaptation that predates breed formation, with modern breeds distinguished primarily by aesthetic traits.
    • Antisense piRNA amplification, but not piRNA production or nuage assembly, requires the Tudor-domain protein Qin

      Zhang, Zhao; Koppetsch, Birgit S.; Wang, Jie; Tipping, Cindy; Weng, Zhiping; Theurkauf, William E.; Zamore, Phillip D. (2014-03-18)