Authors: Li, Zixiu; Zhou, Peng; Kwon, Euijin; Fitzgerald, Katherine A; Weng, Zhiping; Zhou, Chan
Dates: 2022-12-13; 2022-12-13; 2022-10-13
Citation: Li Z, Zhou P, Kwon E, Fitzgerald KA, Weng Z, Zhou C. Flnc: Machine Learning Improves the Identification of Novel Long Noncoding RNAs from Stand-Alone RNA-Seq Data. Noncoding RNA. 2022 Oct 13;8(5):70. doi: 10.3390/ncrna8050070. PMID: 36287122; PMCID: PMC9607125.
ISSN: 2311-553X
DOI: 10.3390/ncrna8050070
PMID: 36287122
URI: https://hdl.handle.net/20.500.14038/51448

Abstract: Long noncoding RNAs (lncRNAs) play critical regulatory roles in human development and disease. Although there are over 100,000 samples with available RNA sequencing (RNA-seq) data, many lncRNAs have yet to be annotated. The conventional approach to identifying novel lncRNAs from RNA-seq data is to find transcripts without coding potential, but this approach has a false discovery rate of 30-75%. Other existing methods either identify only multi-exon lncRNAs, missing single-exon lncRNAs, or require transcriptional initiation profiling data (such as H3K4me3 ChIP-seq data), which are unavailable for many samples with RNA-seq data. Because of these limitations, current methods cannot accurately identify novel lncRNAs from existing RNA-seq data. To address this problem, we have developed software, Flnc, to accurately identify both novel and annotated full-length lncRNAs, including single-exon lncRNAs, directly from RNA-seq data without requiring transcriptional initiation profiles. Flnc integrates machine learning models built by incorporating four types of features: transcript length, promoter signature, multiple exons, and genomic location. Flnc achieves state-of-the-art prediction power with an AUROC score over 0.92. Flnc significantly improves the prediction accuracy from less than 50% using the conventional approach to over 85%. Flnc is available on GitHub.

Language: en
Rights: Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
License: Attribution 4.0 International
License URI: http://creativecommons.org/licenses/by/4.0/
Keywords: RNA-seq; lncRNA; machine learning; tool; unannotated
Title: Flnc: Machine Learning Improves the Identification of Novel Long Noncoding RNAs from Stand-Alone RNA-Seq Data
Type: Journal Article
Journal: Non-coding RNA