Challenges in identifying mRNA transcript starts and ends from long-read sequencing data [preprint]
Calvo-Roitberg, Ezequiel ; Daniels, Rachel F ; Pai, Athma A
Abstract
Long-read sequencing (LRS) technologies have the potential to revolutionize scientific discoveries in RNA biology, especially by enabling the comprehensive identification and quantification of full-length mRNA isoforms. However, inherently high error rates make the analysis of long-read sequencing data challenging. While these error rates have been characterized for sequence and splice site identification, it is still unclear how accurately LRS reads represent transcript start and end sites. Here, we systematically assess the variability and accuracy of mRNA terminal ends identified by LRS reads across multiple sequencing platforms. We find substantial inconsistencies in both the start and end coordinates of LRS reads spanning a gene, such that LRS reads often fail to accurately recapitulate annotated or empirically derived terminal ends of mRNA molecules. To address this challenge, we introduce an approach to condition reads based on empirically derived terminal ends and identify a subset of reads that are more likely to represent full-length transcripts. Our approach can improve transcriptome analyses by enhancing the fidelity of transcript terminal end identification, but may result in lower power to quantify genes or discover novel isoforms. Thus, it is necessary to be cautious when selecting sequencing approaches and/or interpreting data from long-read RNA sequencing.
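The idea of conditioning reads on empirically derived terminal ends can be illustrated with a minimal sketch. This is not the authors' implementation: the `Read` type, the `select_full_length` function, the site lists, and the 50-nucleotide window are all hypothetical choices made for illustration. The sketch keeps a read only if both its 5' end and its 3' end fall within the window of some reference start site and end site, respectively.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class Read:
    """Hypothetical read record; coordinates are already strand-resolved."""
    name: str
    five_prime: int   # genomic position of the read's 5' end
    three_prime: int  # genomic position of the read's 3' end


def near_any(position: int, sites: Sequence[int], window: int) -> bool:
    """True if position lies within `window` nt of at least one site."""
    return any(abs(position - site) <= window for site in sites)


def select_full_length(
    reads: Iterable[Read],
    start_sites: Sequence[int],
    end_sites: Sequence[int],
    window: int = 50,  # assumed tolerance, not a value from the paper
) -> List[Read]:
    """Keep reads whose two ends both match a reference terminal site."""
    return [
        read
        for read in reads
        if near_any(read.five_prime, start_sites, window)
        and near_any(read.three_prime, end_sites, window)
    ]


# Toy data: one gene with a single reference start and end site.
reads = [
    Read("a", 1000, 5000),  # both ends close to the reference sites
    Read("b", 1400, 5010),  # 5' end truncated
    Read("c", 1010, 4200),  # 3' end truncated
]
kept = select_full_length(reads, start_sites=[1005], end_sites=[4990])
kept_names = [read.name for read in kept]
```

In this toy input only one of the three reads survives, which mirrors the trade-off noted in the abstract: filtering improves confidence in terminal ends but reduces the number of reads available for quantification or isoform discovery.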
Source
Calvo-Roitberg E, Daniels RF, Pai AA. Challenges in identifying mRNA transcript starts and ends from long-read sequencing data. bioRxiv [Preprint]. 2023 Jul 27:2023.07.26.550536. doi: 10.1101/2023.07.26.550536. Update in: Genome Res. 2024 Nov 20;34(11):1719-1734. doi: 10.1101/gr.279559.124. PMID: 37546743; PMCID: PMC10402045.
Notes
This article is a preprint. Preprints are preliminary reports of work that have not been certified by peer review.
Related Resources
Now published in Genome Research, doi: 10.1101/gr.279559.124