Evaluation of preprocessing, mapping and postprocessing algorithms for analyzing whole genome bisulfite sequencing data
dc.contributor.author | Tsuji, Junko | |
dc.contributor.author | Weng, Zhiping | |
dc.date | 2022-08-11T08:07:59.000 | |
dc.date.accessioned | 2022-08-23T15:38:24Z | |
dc.date.available | 2022-08-23T15:38:24Z | |
dc.date.issued | 2015-12-01 | |
dc.date.submitted | 2016-01-07 | |
dc.identifier.citation | Brief Bioinform. 2015 Dec 1. pii: bbv103. Link to article on publisher's site: http://dx.doi.org/10.1093/bib/bbv103 | |
dc.identifier.issn | 1467-5463 (Linking) | |
dc.identifier.doi | 10.1093/bib/bbv103 | |
dc.identifier.pmid | 26628557 | |
dc.identifier.uri | http://hdl.handle.net/20.500.14038/25927 | |
dc.description.abstract | Cytosine methylation regulates many biological processes such as gene expression, chromatin structure and chromosome stability. The whole genome bisulfite sequencing (WGBS) technique measures the methylation level at each cytosine throughout the genome. There are an increasing number of publicly available pipelines for analyzing WGBS data, reflecting many choices of read mapping algorithms as well as preprocessing and postprocessing methods. We simulated single-end and paired-end reads based on three experimental data sets, and comprehensively evaluated 192 combinations of three preprocessing, five postprocessing and five widely used read mapping algorithms. We also compared paired-end data with single-end data at the same sequencing depth for performance of read mapping and methylation level estimation. Bismark and LAST were the most robust mapping algorithms. We found that Mott trimming and quality filtering individually improved the performance of both read mapping and methylation level estimation, but combining them did not lead to further improvement. Furthermore, we confirmed that paired-end sequencing reduced error rate and enhanced sensitivity for both read mapping and methylation level estimation, especially for short reads and in repetitive regions of the human genome. | |
dc.language.iso | en_US | |
dc.relation | Link to Article in PubMed: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&list_uids=26628557&dopt=Abstract | |
dc.relation.url | http://dx.doi.org/10.1093/bib/bbv103 | |
dc.subject | DNA methylation | |
dc.subject | WGBS analysis step evaluation | |
dc.subject | WGBS mapping software | |
dc.subject | read quality trimming | |
dc.subject | whole genome bisulfite sequencing | |
dc.subject | Bioinformatics | |
dc.subject | Computational Biology | |
dc.subject | Genomics | |
dc.title | Evaluation of preprocessing, mapping and postprocessing algorithms for analyzing whole genome bisulfite sequencing data | |
dc.type | Journal Article | |
dc.source.journaltitle | Briefings in bioinformatics | |
dc.identifier.legacycoverpage | https://escholarship.umassmed.edu/bioinformatics_pubs/68 | |
dc.identifier.contextkey | 7992035 | |
dc.identifier.submissionpath | bioinformatics_pubs/68 | |
dc.contributor.department | Program in Bioinformatics and Integrative Biology |