dc.contributor.author: Armstrong, Joel
dc.contributor.author: Karlsson, Elinor K
dc.contributor.author: Zhang, Guojie
dc.contributor.author: Paten, Benedict
dc.date: 2022-08-11T08:08:26.000
dc.date.accessioned: 2022-08-23T15:55:10Z
dc.date.available: 2022-08-23T15:55:10Z
dc.date.issued: 2020-11-11
dc.date.submitted: 2021-02-05
dc.identifier.citation: Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q, Xie D, Feng S, Stiller J, Genereux D, Johnson J, Marinescu VD, Alföldi J, Harris RS, Lindblad-Toh K, Haussler D, Karlsson E, Jarvis ED, Zhang G, Paten B. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature. 2020 Nov;587(7833):246-251. doi: 10.1038/s41586-020-2871-y. Epub 2020 Nov 11. PMID: 33177663; PMCID: PMC7673649. Link to article on publisher's site: https://doi.org/10.1038/s41586-020-2871-y
dc.identifier.issn: 0028-0836 (Linking)
dc.identifier.doi: 10.1038/s41586-020-2871-y
dc.identifier.pmid: 33177663
dc.identifier.uri: http://hdl.handle.net/20.500.14038/29693
dc.description: Full author list omitted for brevity. For the full list of authors, see article.
dc.description.abstract: New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies(1-3). For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database(4) increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies(5) are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus(6), a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far.
dc.language.iso: en_US
dc.relation: Link to Article in PubMed: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&list_uids=33177663&dopt=Abstract
dc.rights: © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Comparative genomics
dc.subject: Genome informatics
dc.subject: Phylogen
dc.subject: Software
dc.subject: Bioinformatics
dc.subject: Computational Biology
dc.subject: Ecology and Evolutionary Biology
dc.subject: Genomics
dc.title: Progressive Cactus is a multiple-genome aligner for the thousand-genome era
dc.type: Journal Article
dc.source.journaltitle: Nature
dc.source.volume: 587
dc.source.issue: 7833
dc.identifier.legacyfulltext: https://escholarship.umassmed.edu/cgi/viewcontent.cgi?article=2924&context=faculty_pubs&unstamped=1
dc.identifier.legacycoverpage: https://escholarship.umassmed.edu/faculty_pubs/1905
dc.identifier.contextkey: 21481839
refterms.dateFOA: 2022-08-23T15:55:10Z
dc.identifier.submissionpath: faculty_pubs/1905
dc.contributor.department: Program in Bioinformatics and Integrative Biology
dc.contributor.department: Program in Molecular Medicine
dc.source.pages: 246-251


Files in this item

Name: s41586_020_2871_y.pdf
Size: 2.313Mb
Format: PDF

