A novel NLP-based method and algorithm to discover RNA-binding protein (RBP) motifs, contexts, binding preferences, and interactions [preprint]
Elhajjajy, Shaimae I; Weng, Zhiping
Abstract
RNA-binding proteins (RBPs) are essential modulators in the regulation of mRNA processing. The binding patterns, interactions, and functions of most RBPs are not well characterized. Previous studies have shown that motif context is an important contributor to RBP binding specificity, but its precise role remains unclear. Despite recent computational advances in predicting RBP binding, existing methods are challenging to interpret and largely lack a categorical focus on RBP motif contexts and RBP-RBP interactions. There remains a need for interpretable predictive models that disambiguate the contextual determinants of RBP binding specificity. Here, we present a novel and comprehensive pipeline to address these knowledge gaps. We devise a Natural Language Processing-based decomposition method that deconstructs sequences into entities consisting of a central target k-mer and its flanking regions, then use this representation to formulate the RBP binding prediction task as a weakly supervised Multiple Instance Learning problem. To interpret our predictions, we introduce a deterministic motif discovery algorithm designed to handle our data structure, recapitulating the established motifs of numerous RBPs as validation. Importantly, we characterize the binding motifs and binding contexts for 71 RBPs, many of them novel. Finally, through feature integration, transitive inference, and a new cross-prediction approach, we propose novel cooperative and competitive RBP-RBP interaction partners and hypothesize their potential regulatory functions. In summary, we present a complete computational strategy for investigating the contextual determinants of specific RBP binding, and we demonstrate the significance of our findings in delineating RBP binding patterns, interactions, and functions.
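To make the data representation concrete, here is a minimal, hypothetical sketch of decomposing a sequence into (left flank, central target k-mer, right flank) entities and treating those entities as one Multiple Instance Learning bag. The abstract does not specify the parameters, so this sketch assumes a stride-1 sliding window, a fixed flank length, flanks truncated (not padded) at the sequence ends, and max-pooling as the bag-level aggregation (the standard MIL assumption that a bag is positive if any instance is). The function names decompose and bag_score are illustrative only and are not the authors' code.

```python
from typing import List, Tuple

# Hypothetical entity type: (left_flank, target_kmer, right_flank)
Entity = Tuple[str, str, str]


def decompose(seq: str, k: int, flank: int) -> List[Entity]:
    """Split seq into one entity per k-mer position (stride 1, assumed).

    Flanks are truncated at the sequence ends; this is an assumption,
    since the original method may pad or discard edge positions instead.
    """
    if k < 1 or flank < 0:
        raise ValueError("k must be >= 1 and flank must be >= 0")
    entities: List[Entity] = []
    for i in range(len(seq) - k + 1):
        left = seq[max(0, i - flank):i]      # up to `flank` bases upstream
        target = seq[i:i + k]                # the central target k-mer
        right = seq[i + k:i + k + flank]     # up to `flank` bases downstream
        entities.append((left, target, right))
    return entities


def bag_score(instance_scores: List[float]) -> float:
    """Aggregate instance scores into a bag score by max-pooling.

    Under weak supervision only the whole sequence (the bag) carries a
    label; max-pooling encodes the assumption that one positive instance
    is enough to make the bag positive.
    """
    if not instance_scores:
        raise ValueError("a bag must contain at least one instance")
    return max(instance_scores)


if __name__ == "__main__":
    for entity in decompose("ACGUACGU", k=3, flank=2):
        print(entity)
    print(bag_score([0.1, 0.9, 0.2]))
```

For the 8-nt example with k=3 and flank=2, the sketch yields six entities, starting with ("", "ACG", "UA") and ending with ("UA", "CGU", ""). Max-pooling is only one possible aggregator; attention-based or mean pooling are common alternatives in MIL, and the abstract does not say which one the authors use.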
Source
Elhajjajy SI, Weng Z. A novel NLP-based method and algorithm to discover RNA-binding protein (RBP) motifs, contexts, binding preferences, and interactions. bioRxiv [Preprint]. 2025 Jan 22:2025.01.20.631609. doi: 10.1101/2025.01.20.631609. PMID: 39896518; PMCID: PMC11785142.
Notes
This article is a preprint. Preprints are preliminary reports of work that have not been certified by peer review.