Publication

Challenges in identifying mRNA transcript starts and ends from long-read sequencing data [preprint]

Calvo-Roitberg, Ezequiel
Daniels, Rachel F
Pai, Athma A
Document Type
Preprint
Publication Date
2023-07-27
Abstract

Long-read sequencing (LRS) technologies have the potential to revolutionize scientific discoveries in RNA biology, especially by enabling the comprehensive identification and quantification of full-length mRNA isoforms. However, inherently high error rates make the analysis of long-read sequencing data challenging. While these error rates have been characterized for sequence and splice site identification, it is still unclear how accurately LRS reads represent transcript start and end sites. Here, we systematically assess the variability and accuracy of mRNA terminal ends identified by LRS reads across multiple sequencing platforms. We find substantial inconsistencies in both the start and end coordinates of LRS reads spanning a gene, such that LRS reads often fail to accurately recapitulate annotated or empirically derived terminal ends of mRNA molecules. To address this challenge, we introduce an approach to condition reads based on empirically derived terminal ends and identify a subset of reads that are more likely to represent full-length transcripts. Our approach can improve transcriptome analyses by enhancing the fidelity of transcript terminal end identification, but may result in lower power to quantify genes or discover novel isoforms. Thus, it is necessary to be cautious when selecting sequencing approaches and/or interpreting data from long-read RNA sequencing.
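
The idea of conditioning reads on empirically derived terminal ends can be illustrated with a minimal sketch. This is not the authors' implementation: the `Read` type, the function names `near_any` and `condition_reads`, the plus-strand-only coordinates, and the 50-base window are all assumptions made for illustration. The sketch keeps only reads whose start and end both fall close to a known start site and end site, respectively.

```python
from bisect import bisect_left
from typing import List, NamedTuple, Sequence


class Read(NamedTuple):
    """Hypothetical aligned read on the plus strand (illustrative only)."""
    name: str
    start: int  # genomic coordinate of the read's 5' end
    end: int    # genomic coordinate of the read's 3' end


def near_any(pos: int, sorted_sites: Sequence[int], window: int) -> bool:
    """Return True if pos lies within `window` bases of any site.

    `sorted_sites` must be sorted ascending; only the two sites that
    flank `pos` need to be checked.
    """
    i = bisect_left(sorted_sites, pos)
    flanking = sorted_sites[max(i - 1, 0): i + 1]
    return any(abs(pos - site) <= window for site in flanking)


def condition_reads(
    reads: Sequence[Read],
    start_sites: Sequence[int],
    end_sites: Sequence[int],
    window: int = 50,  # assumed tolerance, not a value from the paper
) -> List[Read]:
    """Keep reads whose start AND end are both near a reference site."""
    starts = sorted(start_sites)
    ends = sorted(end_sites)
    return [
        r for r in reads
        if near_any(r.start, starts, window) and near_any(r.end, ends, window)
    ]
```

A filter of this kind makes the trade-off described in the abstract concrete: any read that does not reach both reference ends is discarded, so fewer reads remain for gene quantification or novel isoform discovery.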

Source

Calvo-Roitberg E, Daniels RF, Pai AA. Challenges in identifying mRNA transcript starts and ends from long-read sequencing data. bioRxiv [Preprint]. 2023 Jul 27:2023.07.26.550536. doi: 10.1101/2023.07.26.550536. Update in: Genome Res. 2024 Nov 20;34(11):1719-1734. doi: 10.1101/gr.279559.124. PMID: 37546743; PMCID: PMC10402045.

DOI
10.1101/2023.07.26.550536
10.1038/nbt.4259
10.1038/s41592-023-01908-w
10.1101/2021.04.21.440736
10.1101/2023.04.27.538568
PubMed ID
37546743
Notes

This article is a preprint. Preprints are preliminary reports of work that have not been certified by peer review.

Related Resources

Now published in Genome Research doi: 10.1101/gr.279559.124

Rights
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.