dc.contributor.author: Pratt, Henry E.
dc.contributor.author: Andrews, Gregory
dc.contributor.author: Phalke, Nishigandha
dc.contributor.author: Purcaro, Michael J.
dc.contributor.author: van der Velde, Arjan
dc.contributor.author: Moore, Jill E.
dc.contributor.author: Weng, Zhiping
dc.date: 2022-08-11T08:08:28.000
dc.date.accessioned: 2022-08-23T15:56:03Z
dc.date.available: 2022-08-23T15:56:03Z
dc.date.issued: 2021-10-12
dc.date.submitted: 2021-12-01
dc.identifier.citation: bioRxiv 2021.10.11.463518; doi: https://doi.org/10.1101/2021.10.11.463518
dc.identifier.doi: 10.1101/2021.10.11.463518
dc.identifier.uri: http://hdl.handle.net/20.500.14038/29887
dc.description: This article is a preprint. Preprints are preliminary reports of work that have not been certified by peer review.
dc.description.abstract: The human genome contains roughly 1,600 transcription factors (TFs) (1), DNA-binding proteins recognizing characteristic sequence motifs to exert regulatory effects on gene expression. The binding specificities of these factors have been profiled both in vitro, using techniques such as HT-SELEX (2), and in vivo, using techniques including ChIP-seq (3, 4). We previously developed Factorbook, a TF-centric database of annotations, motifs, and integrative analyses based on ChIP-seq data from Phase II of the ENCODE Project. Here we present an update to Factorbook which significantly expands the breadth of cell type and TF coverage. The update includes an expanded motif catalog derived from thousands of ENCODE Phase II and III ChIP-seq experiments and HT-SELEX experiments; this motif catalog is integrated with the ENCODE registry of candidate cis-regulatory elements to annotate a comprehensive collection of genome-wide candidate TF binding sites. The database also offers novel tools for applying the motif models within machine learning frameworks and using these models for integrative analysis, including annotation of variants and disease and trait heritability. We will continue to expand the resource as ENCODE Phase IV data are released.
dc.language.iso: en_US
dc.relation: Now published in Nucleic Acids Research, doi: 10.1093/nar/gkab1039 (http://dx.doi.org/10.1093/nar/gkab1039)
dc.rights: The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject: Bioinformatics
dc.subject: Factorbook
dc.subject: transcription factors
dc.subject: ENCODE
dc.subject: Amino Acids, Peptides, and Proteins
dc.subject: Computational Biology
dc.title: Factorbook: an Updated Catalog of Transcription Factor Motifs and Candidate Regulatory Motif Sites [preprint]
dc.type: Preprint
dc.source.journaltitle: bioRxiv
dc.identifier.legacyfulltext: https://escholarship.umassmed.edu/cgi/viewcontent.cgi?article=3125&context=faculty_pubs&unstamped=1
dc.identifier.legacycoverpage: https://escholarship.umassmed.edu/faculty_pubs/2093
dc.identifier.contextkey: 26419987
refterms.dateFOA: 2022-08-23T15:56:03Z
dc.identifier.submissionpath: faculty_pubs/2093
dc.contributor.department: Graduate School of Biomedical Sciences
dc.contributor.department: Program in Bioinformatics and Integrative Biology


Files in this item

Name: 2021.10.11.463518v1.full.pdf
Size: 2.222Mb
Format: PDF
