Publication

A novel NLP-based method and algorithm to discover RNA-binding protein (RBP) motifs, contexts, binding preferences, and interactions [preprint]

Elhajjajy, Shaimae I
Weng, Zhiping
Document Type
Preprint
Publication Date
2025-01-22
Abstract

RNA-binding proteins (RBPs) are essential modulators in the regulation of mRNA processing. The binding patterns, interactions, and functions of most RBPs are not well characterized. Previous studies have shown that motif context is an important contributor to RBP binding specificity, but its precise role remains unclear. Despite recent computational advances in predicting RBP binding, existing methods are challenging to interpret and largely lack a categorical focus on RBP motif contexts and RBP-RBP interactions. There remains a need for interpretable predictive models to disambiguate the contextual determinants of RBP binding specificity. Here, we present a novel and comprehensive pipeline to address these knowledge gaps. We devise a Natural Language Processing-based decomposition method to deconstruct sequences into entities consisting of a central target k-mer and its flanking regions, then use this representation to formulate the RBP binding prediction task as a weakly supervised Multiple Instance Learning problem. To interpret our predictions, we introduce a deterministic motif discovery algorithm designed to handle our data structure, and we validate it by recapitulating the established motifs of numerous RBPs. Importantly, we characterize the binding motifs and binding contexts for 71 RBPs, many of which are novel. Finally, through feature integration, transitive inference, and a new cross-prediction approach, we propose novel cooperative and competitive RBP-RBP interaction partners and hypothesize their potential regulatory functions. In summary, we present a complete computational strategy for investigating the contextual determinants of specific RBP binding, and we demonstrate the significance of our findings in delineating RBP binding patterns, interactions, and functions.
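As a rough illustration of the decomposition idea described in the abstract, the sketch below splits an RNA sequence into overlapping entities, each a central k-mer with its left and right flanks, and groups them into a bag that carries a single sequence-level label, as in a Multiple Instance Learning setup. This is not the authors' implementation: the names `Instance`, `decompose`, and `make_bag`, the sliding-window scheme, and the default values of `k` and `flank` are all assumptions chosen for illustration.

```python
from typing import List, NamedTuple, Tuple


class Instance(NamedTuple):
    """One entity: a central k-mer plus its flanking context (hypothetical type)."""
    left: str
    kmer: str
    right: str


def decompose(seq: str, k: int = 5, flank: int = 3) -> List[Instance]:
    """Slide a window of width k + 2*flank over seq, one nucleotide at a time.

    Each window is split into (left flank, central k-mer, right flank).
    The window scheme and default sizes are assumptions, not taken from the paper.
    """
    if k <= 0 or flank < 0:
        raise ValueError("k must be positive and flank non-negative")
    width = k + 2 * flank
    instances = []
    # range() is empty when the sequence is shorter than one window.
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        instances.append(
            Instance(window[:flank], window[flank:flank + k], window[flank + k:])
        )
    return instances


def make_bag(seq: str, label: int, k: int = 5, flank: int = 3) -> Tuple[List[Instance], int]:
    """Pair all instances of a sequence with one sequence-level (bag) label.

    In weakly supervised MIL, only the bag is labeled (bound / not bound);
    individual instances carry no label of their own.
    """
    return decompose(seq, k, flank), label


if __name__ == "__main__":
    bag, label = make_bag("ACGUACGUAC", label=1, k=4, flank=2)
    for inst in bag:
        print(inst.left, inst.kmer, inst.right)
```

Under these assumptions, a classifier would score each instance and aggregate the scores into one bag-level prediction, so that high-scoring instances point to candidate k-mers and contexts.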

Source

Elhajjajy SI, Weng Z. A novel NLP-based method and algorithm to discover RNA-binding protein (RBP) motifs, contexts, binding preferences, and interactions. bioRxiv [Preprint]. 2025 Jan 22:2025.01.20.631609. doi: 10.1101/2025.01.20.631609. PMID: 39896518; PMCID: PMC11785142.

DOI
10.1101/2025.01.20.631609
PubMed ID
39896518
Notes

This article is a preprint. Preprints are preliminary reports of work that have not been certified by peer review.
