dc.contributor.authorYukselen, Onur
dc.contributor.authorTurkyilmaz, Osman
dc.contributor.authorOzturk, Ahmet R.
dc.contributor.authorGarber, Manuel
dc.contributor.authorKucukural, Alper
dc.date2022-08-11T08:09:56.000
dc.date.accessioned2022-08-23T16:49:15Z
dc.date.available2022-08-23T16:49:15Z
dc.date.issued2020-04-19
dc.date.submitted2020-05-08
dc.identifier.citationYukselen O, Turkyilmaz O, Ozturk AR, Garber M, Kucukural A. DolphinNext: a distributed data processing platform for high throughput genomics. BMC Genomics. 2020 Apr 19;21(1):310. doi: 10.1186/s12864-020-6714-x. PMID: 32306927; PMCID: PMC7168977. Link to article on publisher's site: https://doi.org/10.1186/s12864-020-6714-x
dc.identifier.issn1471-2164 (Linking)
dc.identifier.doi10.1186/s12864-020-6714-x
dc.identifier.pmid32306927
dc.identifier.urihttp://hdl.handle.net/20.500.14038/41424
dc.description.abstractBACKGROUND: The emergence of high throughput technologies that produce vast amounts of genomic data, such as next-generation sequencing (NGS), is transforming biological research. The dramatic increase in the volume of data, the variety and continuous change of data processing tools, algorithms and databases make analysis the main bottleneck for scientific discovery. The processing of high throughput datasets typically involves many different computational programs, each of which performs a specific step in a pipeline. Given the wide range of applications and organizational infrastructures, there is a great need for highly parallel, flexible, portable, and reproducible data processing frameworks. Several platforms currently exist for the design and execution of complex pipelines. Unfortunately, current platforms lack the necessary combination of parallelism, portability, flexibility and/or reproducibility that is required by the current research environment. To address these shortcomings, workflow frameworks that provide a platform to develop and share portable pipelines have recently arisen. We complement these new platforms by providing a graphical user interface to create, maintain, and execute complex pipelines. Such a platform will simplify robust and reproducible workflow creation for non-technical users as well as provide a robust platform to maintain pipelines for large organizations. RESULTS: To simplify development, maintenance, and execution of complex pipelines we created DolphinNext. DolphinNext facilitates building and deployment of complex pipelines using a modular approach implemented in a graphical interface that relies on the powerful Nextflow workflow framework by providing: 1. A drag and drop user interface that visualizes pipelines and allows users to create pipelines without familiarity with the underlying programming languages. 2. Modules to execute and monitor pipelines in distributed computing environments such as high-performance clusters and/or cloud. 3. Reproducible pipelines with version tracking and stand-alone versions that can be run independently. 4. Modular process design with process revisioning support to increase reusability and pipeline development efficiency. 5. Pipeline sharing with GitHub and automated testing. 6. Extensive reports with R Markdown and Shiny support for interactive data visualization and analysis. CONCLUSION: DolphinNext is a flexible, intuitive, web-based data processing and analysis platform that enables creating, deploying, sharing, and executing complex Nextflow pipelines with extensive revisioning and interactive reporting to enhance reproducible results.
dc.language.isoen_US
dc.relationLink to Article in PubMed: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&list_uids=32306927&dopt=Abstract
dc.rights© The Author(s). 2020 Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/
dc.subjectBig data processing
dc.subjectGenome analysis
dc.subjectPipeline
dc.subjectSequencing
dc.subjectWorkflow
dc.subjectUMCCTS funding
dc.subjectBioinformatics
dc.subjectComputational Biology
dc.subjectComputer Sciences
dc.subjectGenomics
dc.subjectIntegrative Biology
dc.subjectSystems Biology
dc.titleDolphinNext: a distributed data processing platform for high throughput genomics
dc.typeJournal Article
dc.source.journaltitleBMC genomics
dc.source.volume21
dc.source.issue1
dc.identifier.legacyfulltexthttps://escholarship.umassmed.edu/cgi/viewcontent.cgi?article=5224&context=oapubs&unstamped=1
dc.identifier.legacycoverpagehttps://escholarship.umassmed.edu/oapubs/4205
dc.identifier.contextkey17677181
refterms.dateFOA2022-08-23T16:49:15Z
dc.identifier.submissionpathoapubs/4205
dc.contributor.departmentGarber Lab
dc.contributor.departmentProgram in Bioinformatics and Integrative Biology
dc.contributor.departmentProgram in Molecular Medicine
dc.contributor.departmentRNA Therapeutics Institute
dc.contributor.departmentBioinformatics Core
dc.source.pages310


Files in this item

Name: document.pdf
Size: 1.737Mb
Format: PDF
