dc.contributor.author: Zhang, Wen
dc.contributor.author: Zhu, Xiaopeng
dc.contributor.author: Fu, Yu
dc.contributor.author: Tsuji, Junko
dc.contributor.author: Weng, Zhiping
dc.date: 2022-08-11T08:07:58.000
dc.date.accessioned: 2022-08-23T15:37:57Z
dc.date.available: 2022-08-23T15:37:57Z
dc.date.issued: 2017-12-01
dc.date.submitted: 2018-02-16
dc.identifier.citation: BMC Bioinformatics. 2017 Dec 1;18(Suppl 13):464. doi: 10.1186/s12859-017-1875-6. Link to article on publisher's site: https://doi.org/10.1186/s12859-017-1875-6
dc.identifier.issn: 1471-2105 (Linking)
dc.identifier.doi: 10.1186/s12859-017-1875-6
dc.identifier.pmid: 29219070
dc.identifier.uri: http://hdl.handle.net/20.500.14038/25834
dc.description.abstract: BACKGROUND: Alternative splicing is the critical process in a single gene coding, which removes introns and joins exons, and splicing branchpoints are indicators for the alternative splicing. Wet experiments have identified a great number of human splicing branchpoints, but many branchpoints are still unknown. In order to guide wet experiments, we develop computational methods to predict human splicing branchpoints. RESULTS: Considering the fact that an intron may have multiple branchpoints, we transform the branchpoint prediction as the multi-label learning problem, and attempt to predict branchpoint sites from intron sequences. First, we investigate a variety of intron sequence-derived features, such as sparse profile, dinucleotide profile, position weight matrix profile, Markov motif profile and polypyrimidine tract profile. Second, we consider several multi-label learning methods: partial least squares regression, canonical correlation analysis and regularized canonical correlation analysis, and use them as the basic classification engines. Third, we propose two ensemble learning schemes which integrate different features and different classifiers to build ensemble learning systems for the branchpoint prediction. One is the genetic algorithm-based weighted average ensemble method; the other is the logistic regression-based ensemble method. CONCLUSIONS: In the computational experiments, two ensemble learning methods outperform benchmark branchpoint prediction methods, and can produce high-accuracy results on the benchmark dataset.
dc.language.iso: en_US
dc.relation: Link to Article in PubMed: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&list_uids=29219070&dopt=Abstract
dc.rights: © The Author(s). 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Genetic algorithm
dc.subject: Human splicing branchpoint
dc.subject: Logistic regression
dc.subject: Multi-label learning
dc.subject: Bioinformatics
dc.subject: Computational Biology
dc.subject: Genetic Phenomena
dc.subject: Genetics and Genomics
dc.subject: Integrative Biology
dc.title: Predicting human splicing branchpoints by combining sequence-derived features and multi-label learning methods
dc.type: Journal Article
dc.source.journaltitle: BMC bioinformatics
dc.source.volume: 18
dc.source.issue: Suppl 13
dc.identifier.legacyfulltext: https://escholarship.umassmed.edu/cgi/viewcontent.cgi?article=1135&context=bioinformatics_pubs&unstamped=1
dc.identifier.legacycoverpage: https://escholarship.umassmed.edu/bioinformatics_pubs/126
dc.identifier.contextkey: 11573986
refterms.dateFOA: 2022-08-23T15:37:57Z
dc.identifier.submissionpath: bioinformatics_pubs/126
dc.contributor.department: Department of Biochemistry and Molecular Pharmacology
dc.contributor.department: Program in Bioinformatics and Integrative Biology
dc.source.pages: 464
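
Illustrative note (not part of the deposited record): the abstract names a "sparse profile" feature and a weighted average ensemble but does not define either. The sketch below assumes the sparse profile is a one-hot encoding of each nucleotide, and that the ensemble takes a normalized weighted mean of per-position scores from several base predictors. The names `sparse_profile` and `weighted_average` are hypothetical, and the weights are passed in by hand rather than found by a genetic algorithm as in the article.

```python
from typing import Dict, List, Sequence

# Assumed one-hot code per nucleotide; the record itself does not define the encoding.
ONE_HOT: Dict[str, List[int]] = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def sparse_profile(seq: str) -> List[int]:
    """Concatenate a 4-element one-hot vector for every nucleotide in seq."""
    vec: List[int] = []
    for base in seq.upper():
        # Bases outside A/C/G/T (for example N) map to all zeros.
        vec.extend(ONE_HOT.get(base, [0, 0, 0, 0]))
    return vec


def weighted_average(
    scores: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Combine per-position scores from several base predictors.

    scores[k][i] is the score predictor k assigns to intron position i.
    The weights are assumed inputs; how they are optimized is out of scope here.
    """
    if not scores or len(scores) != len(weights):
        raise ValueError("one weight per base predictor is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_pos = len(scores[0])
    if any(len(s) != n_pos for s in scores):
        raise ValueError("all predictors must score the same positions")
    return [
        sum(w * s[i] for w, s in zip(weights, scores)) / total
        for i in range(n_pos)
    ]
```

For example, `sparse_profile("ACGT")` yields a 16-element vector, and `weighted_average([[0.0, 1.0], [1.0, 0.0]], [1, 3])` weights the second predictor three times as heavily as the first.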


Files in this item

Name: s12859_017_1875_6.pdf
Size: 690.9Kb
Format: PDF
