Asymmetric trichotomous data partitioning enables development of predictive machine learning models using limited siRNA efficacy datasets [preprint]
UMass Chan Affiliations
Morningside Graduate School of Biomedical Sciences
RNA Therapeutics Institute
Document Type
Preprint
Publication Date
2022-07-10
Abstract
Chemically modified small interfering RNAs (siRNAs) are promising therapeutics guiding sequence-specific silencing of disease genes. However, identifying chemically modified siRNA sequences that effectively silence target genes is a challenge. Such determinations necessitate computational algorithms. Machine Learning (ML) is a powerful predictive approach for tackling biological problems, but typically requires datasets significantly larger than most available siRNA datasets. Here, we describe a framework for applying ML to a small dataset (356 modified sequences) for siRNA efficacy prediction. To overcome noise and biological limitations in siRNA datasets, we apply a trichotomous (using two thresholds) partitioning approach, producing several combinations of classification threshold pairs. We then test the effects of different thresholds on random forest (RF) ML model performance using a novel evaluation metric accounting for class imbalances. We identify thresholds yielding a model with high predictive power outperforming a simple linear classification model generated from the same data. Using a novel method to extract model features, we observe target site base preferences consistent with current understanding of the siRNA-mediated silencing mechanism, with RF providing higher resolution than the linear model. This framework applies to any classification challenge involving small biological datasets, providing an opportunity to develop high-performing design algorithms for oligonucleotide therapies.
Source
Asymmetric trichotomous data partitioning enables development of predictive machine learning models using limited siRNA efficacy datasets. Kathryn R. Monopoli, Dmitry Korkin, Anastasia Khvorova. bioRxiv 2022.07.08.499317; doi: https://doi.org/10.1101/2022.07.08.499317
DOI
10.1101/2022.07.08.499317
Permanent Link to this Item
http://hdl.handle.net/20.500.14038/51495
Notes
This article is a preprint. Preprints are preliminary reports of work that have not been certified by peer review.
Rights
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. Attribution-NonCommercial 4.0 International
Distribution License
http://creativecommons.org/licenses/by-nc/4.0/
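The two-threshold (trichotomous) partitioning mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the class labels, the example values, and the choice to drop the middle class before training are all assumptions made for illustration, since the record does not specify them.

```python
from itertools import combinations


def trichotomize(values, low, high):
    """Label each value 'low', 'middle', or 'high' using two thresholds.

    The labels are placeholders (assumption); the thresholds need not be
    symmetric around any midpoint.
    """
    if low >= high:
        raise ValueError("low threshold must be below high threshold")
    labels = []
    for v in values:
        if v < low:
            labels.append("low")
        elif v > high:
            labels.append("high")
        else:
            labels.append("middle")
    return labels


def threshold_pairs(candidates):
    """Enumerate every (low, high) pair with low < high from candidate cut points."""
    return list(combinations(sorted(set(candidates)), 2))


def binary_training_set(values, low, high):
    """Keep only the two extreme classes (assumption: the middle class is set aside)."""
    labels = trichotomize(values, low, high)
    return [(v, lab) for v, lab in zip(values, labels) if lab != "middle"]


# Hypothetical efficacy measurements, not data from the preprint.
example_values = [5, 20, 35, 50, 80, 95]
example_labels = trichotomize(example_values, 25, 60)
example_pairs = threshold_pairs([25, 40, 60])
```

Each pair returned by `threshold_pairs` would define one candidate partition; a classifier could then be trained and scored per pair to compare thresholds.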