    • Analysis, Visualization, and Machine Learning of Epigenomic Data

      Purcaro, Michael J. (2017-12-12)
      The goal of the Encyclopedia of DNA Elements (ENCODE) project has been to characterize all the functional elements of the human genome. These elements include expressed transcripts and genomic regions bound by transcription factors (TFs), occupied by nucleosomes, occupied by nucleosomes with modified histones, or hypersensitive to DNase I cleavage. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is an experimental technique for detecting TF binding in living cells, and the genomic regions bound by TFs are called ChIP-seq peaks. ENCODE has performed and compiled results from tens of thousands of experiments, including ChIP-seq, DNase, RNA-seq and Hi-C. These efforts have culminated in two web-based resources from our lab, Factorbook and SCREEN, for the exploration of epigenomic data for both human and mouse. Factorbook is a peak-centric resource presenting data such as motif enrichment and histone modification profiles for transcription factor binding sites computed from ENCODE ChIP-seq data. SCREEN provides an encyclopedia of ~2 million regulatory elements, including promoters and enhancers, identified using ENCODE ChIP-seq and DNase data, with an extensive UI for searching and visualization. While we have successfully utilized the thousands of available ENCODE ChIP-seq experiments to build the Encyclopedia and visualizers, we have also struggled with the practical and theoretical inability to assay every possible experiment on every possible biosample under every conceivable biological scenario. We have used machine learning techniques to predict TF binding sites and enhancer locations, and demonstrate that machine learning is critical to help decipher functional regions of the genome.
    • CBFbeta-SMMHC Inhibition Triggers Apoptosis by Disrupting MYC Chromatin Dynamics in Acute Myeloid Leukemia

      Pulikkan, John A.; Hegde, Mahesh; Ahmad, Hafiz Mohd; Belaghzal, Houda; Illendula, Anuradha; Yu, Jun; O'Hagan, Kelsey; Ou, Jianhong; Muller-Tidow, Carsten; Wolfe, Scot A.; et al. (2018-06-28)
      The fusion oncoprotein CBFbeta-SMMHC, expressed in leukemia cases with chromosome 16 inversion, drives leukemia development and maintenance by altering the activity of the transcription factor RUNX1. Here, we demonstrate that CBFbeta-SMMHC maintains cell viability by neutralizing RUNX1-mediated repression of MYC expression. Upon pharmacologic inhibition of the CBFbeta-SMMHC/RUNX1 interaction, RUNX1 shows increased binding at three MYC distal enhancers, where it represses MYC expression by mediating the replacement of the SWI/SNF complex component BRG1 with the polycomb-repressive complex component RING1B, leading to apoptosis. Combining the CBFbeta-SMMHC inhibitor with the BET inhibitor JQ1 eliminates inv(16) leukemia in human cells and a mouse model. Enhancer-interaction analysis indicated that the three enhancers are physically connected with the MYC promoter, and genome-editing analysis demonstrated that they are functionally implicated in deregulation of MYC expression. This study reveals a mechanism whereby CBFbeta-SMMHC drives leukemia maintenance and suggests that inhibitors targeting chromatin activity may prove effective in inv(16) leukemia therapy.
    • Defining a Registry of Candidate Regulatory Elements to Interpret Disease Associated Genetic Variation

      Moore, Jill E. (2017-10-10)
      Over the last decade there has been a great effort to annotate noncoding regions of the genome, particularly those that regulate gene expression. These regulatory elements contain binding sites for transcription factors (TFs), which interact with one another and transcriptional machinery to initiate, enhance, or repress gene expression. The Encyclopedia of DNA Elements (ENCODE) consortium has generated thousands of epigenomic datasets, such as DNase-seq and ChIP-seq experiments, with the goal of defining such regions. By integrating these assays, we developed the Registry of candidate Regulatory Elements (cREs), a collection of putative regulatory regions across human and mouse. In total, we identified over 1.3M human and 400k mouse cREs, each annotated with cell type-specific signatures (e.g. promoter-like, enhancer-like) in over 400 human and 100 mouse biosamples. We then demonstrated the biological utility of these regions by analyzing cell type enrichments for genetic variants reported by genome-wide association studies (GWAS). To search and visualize these cREs, we developed the online database SCREEN (search candidate regulatory elements by ENCODE). After defining cREs, we next sought to determine their potential gene targets. To compare target gene prediction methods, we developed a comprehensive benchmark of enhancer-gene links by curating ChIA-PET, Hi-C and eQTL datasets. We then used this benchmark to evaluate unsupervised linking approaches such as the correlation of epigenomic signal. We determined that these methods have low overall performance and do not outperform simply selecting the closest gene. We then developed a supervised Random Forest model which had notably better performance than unsupervised methods. We demonstrated that this model can be applied across cell types and can be used to predict target genes for GWAS-associated variants. Finally, we used the Registry of cREs to annotate variants associated with psychiatric disorders.
      We found that these "psych SNPs" are enriched in cREs active in brain tissue and likely target genes involved in neural development pathways. We also demonstrated that psych SNPs overlap binding sites for TFs involved in neural and immune pathways. Finally, by identifying psych SNPs with allele imbalance in chromatin accessibility, we highlighted specific cases of psych SNPs altering TF binding motifs, resulting in the disruption of TF binding. Overall, we demonstrated that our collection of putative regulatory regions, the Registry of cREs, can be used to understand the potential biological function of noncoding variation and develop hypotheses for future testing.
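The "closest gene" baseline mentioned in the abstract above can be illustrated with a small sketch. This is not the thesis's benchmark code: the function name `closest_gene`, the input layout (a position-sorted list of TSS coordinates on one chromosome), and the use of the enhancer midpoint are all assumptions made for illustration.

```python
from bisect import bisect_left

def closest_gene(enhancer_mid, tss_list):
    """Return the gene whose TSS lies nearest to an enhancer midpoint.

    tss_list is a hypothetical input: (position, gene_name) tuples for a
    single chromosome, sorted by position. Ties go to the upstream TSS.
    """
    if not tss_list:
        raise ValueError("tss_list must contain at least one TSS")
    positions = [pos for pos, _ in tss_list]
    # Index of the first TSS at or after the midpoint.
    i = bisect_left(positions, enhancer_mid)
    # Only the neighbours on either side of that index can be nearest.
    candidates = tss_list[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda t: abs(t[0] - enhancer_mid))[1]

# Toy coordinates, invented for the example.
tss = [(100, "geneA"), (500, "geneB"), (900, "geneC")]
print(closest_gene(450, tss))
```

A baseline of this kind needs no epigenomic signal at all, which is why it is a useful reference point when judging correlation-based or supervised linking methods.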
    • Robust Identification of Developmentally Active Endothelial Enhancers in Zebrafish Using FANS-Assisted ATAC-Seq

      Quillien, Aurelie; Abdalla, Mary; Yu, Jun; Ou, Jianhong; Zhu, Lihua (Julie); Lawson, Nathan D. (2017-07-18)
      Identification of tissue-specific and developmentally active enhancers provides insights into mechanisms that control gene expression during embryogenesis. However, robust detection of these regulatory elements remains challenging, especially in vertebrate genomes. Here, we apply fluorescent-activated nuclei sorting (FANS) followed by Assay for Transposase-Accessible Chromatin with high-throughput sequencing (ATAC-seq) to identify developmentally active endothelial enhancers in the zebrafish genome. ATAC-seq of nuclei from Tg(fli1a:egfp)(y1) transgenic embryos revealed expected patterns of nucleosomal positioning at transcriptional start sites throughout the genome and association with active histone modifications. Comparison of ATAC-seq from GFP-positive and -negative nuclei identified more than 5,000 open elements specific to endothelial cells. These elements flanked genes that are functionally important for vascular development and that displayed endothelial-specific gene expression. Importantly, a majority of tested elements drove endothelial gene expression in zebrafish embryos. Thus, FANS-assisted ATAC-seq using transgenic zebrafish embryos provides a robust approach for genome-wide identification of active tissue-specific enhancer elements.
    • TALE and NF-Y co-occupancy marks enhancers of developmental control genes during zygotic genome activation in zebrafish [preprint]

      Stanney, William J. III; Ladam, Franck; Donaldson, Ian J.; Parsons, Teagan J.; Maehr, Rene; Bobola, Nicoletta; Sagerstrom, Charles G. (2019-07-31)
      Animal embryogenesis is initiated by maternal factors, but zygotic genome activation (ZGA) shifts control to the embryo at early blastula stages. ZGA is thought to be mediated by specialized maternally deposited transcription factors (TFs), but here we demonstrate that NF-Y and TALE – TFs with known later roles in embryogenesis – co-occupy unique genomic elements at zebrafish ZGA. We show that these elements are selectively associated with early-expressed genes involved in transcriptional regulation and possess enhancer activity in vivo. In contrast, we find that elements individually occupied by either NF-Y or TALE are associated with genes acting later in development – such that NF-Y controls a cilia gene expression program while TALE TFs control expression of hox genes. We conclude that NF-Y and TALE have a shared role at ZGA, but separate roles later during development, demonstrating that combinations of known TFs can regulate subsets of key developmental genes at vertebrate ZGA.
    • The 3D Genome as Moderator of Chromosomal Communication

      Dekker, Job; Mirny, Leonid A. (2016-03-10)
      Proper expression of genes requires communication with their regulatory elements that can be located elsewhere along the chromosome. The physics of chromatin fibers imposes a range of constraints on such communication. The molecular and biophysical mechanisms by which chromosomal communication is established, or prevented, have become a topic of intense study, and important roles for the spatial organization of chromosomes are being discovered. Here we present a view of the interphase 3D genome characterized by extensive physical compartmentalization and insulation on the one hand and facilitated long-range interactions on the other. We propose the existence of topological machines dedicated to set up and to exploit a 3D genome organization to both promote and censor communication along and between chromosomes.
    • The TALE Factors and Nuclear Factor Y Cooperate to Drive Transcription at Zygotic Genome Activation

      Stanney, William J. III (2019-08-06)
      The TALE factors, comprising the pbx and prep/meis gene families, are transcription factors (TFs) vital to the proper formation of anterior anatomical structures during embryonic development. Although best understood as essential cofactors for tissue-specific TFs such as the hox genes during segmentation, the TALE factors also form complexes with nuclear factor Y (NFY) in the early zygote. In zebrafish, Pbx4, Prep1, and NFY are maternally deposited and can access their DNA binding sites in compact chromatin. Our results suggest that TALE/NFY complexes have a unique role in early embryonic development which is distinct from each factor’s independent functions at later stages. To characterize these TALE/NFY complexes, we employed high-throughput transcriptomic and genomic techniques in zebrafish embryos. Using dominant negatives to disrupt the function of each factor, we find that they display similar, but not identical, loss-of-function phenotypes and co-regulate genes involved in transcription regulation and embryonic development. Independently, the TALE factors regulate homeobox genes and NFY governs cilia-related genes. ChIP-seq analysis at zygotic genome activation reveals that the TALE factors occupy DECA sites adjacent to CCAAT boxes near genes expressed early in development and involved with transcription regulation. Finally, DNA elements containing TALE and NFY binding sites drive reporter gene expression in transgenic zebrafish, and disruption of TALE/NFY binding via mutation or dominant negatives eliminates this expression. Taken together, these data suggest that the TALE factors and NFY cooperate to regulate a set of development and transcription control genes in early zygotic development but also have independent roles after gastrulation.
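The observation in the abstract above, that TALE-bound DECA sites lie adjacent to CCAAT boxes, can be illustrated with a minimal motif-scanning sketch. It is not the analysis used in the thesis. The DECA consensus `TGATTGACAG`, the exact-match search (real analyses typically use position weight matrices), the 20 bp `max_gap` window, and all function names are assumptions chosen for illustration.

```python
import re

DECA = "TGATTGACAG"  # assumed DECA consensus; not stated in the abstract
CCAAT = "CCAAT"      # core of the CCAAT box recognised by NFY

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_sites(seq, motif):
    """Sorted 0-based start positions of exact motif matches on either strand."""
    seq = seq.upper()
    hits = set()
    for pattern in {motif, revcomp(motif)}:
        # The lookahead lets overlapping occurrences be reported too.
        for m in re.finditer(f"(?={pattern})", seq):
            hits.add(m.start())
    return sorted(hits)

def deca_near_ccaat(seq, max_gap=20):
    """Pairs (deca_start, ccaat_start) separated by at most max_gap bases.

    The gap is measured between the nearest motif edges; overlapping
    motifs are skipped. max_gap is an arbitrary illustrative window.
    """
    pairs = []
    ccaat_sites = find_sites(seq, CCAAT)
    for d in find_sites(seq, DECA):
        d_end = d + len(DECA)
        for c in ccaat_sites:
            c_end = c + len(CCAAT)
            gap = max(c - d_end, d - c_end)
            if 0 <= gap <= max_gap:
                pairs.append((d, c))
    return pairs

# Toy sequence, invented for the example: DECA, a 5 bp spacer, then CCAAT.
toy = "AAAA" + DECA + "GGGGG" + CCAAT + "TTTT"
print(deca_near_ccaat(toy))
```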