A suite of computational tools to interrogate sequence data with local haplotype analysis within complex Plasmodium infections and other microbial mixtures
dc.contributor.advisor | Jeffrey Bailey | |
dc.contributor.author | Hathaway, Nicholas J | |
dc.date | 2022-08-11T08:08:46.000 | |
dc.date.accessioned | 2022-08-23T16:07:59Z | |
dc.date.available | 2022-08-23T16:07:59Z | |
dc.date.issued | 2018-03-19 | |
dc.date.submitted | 2018-05-04 | |
dc.identifier.doi | 10.13028/M2039K | |
dc.identifier.uri | http://hdl.handle.net/20.500.14038/32357 | |
dc.description.abstract | The rapid development of DNA sequencing technologies has opened new avenues of research, including the investigation of population structure within infectious diseases, both within patients and between populations. To take advantage of these technological advances and the new types of data they generate, we need novel bioinformatics tools that do not succumb to artifacts introduced during data generation and therefore provide accurate and precise results. To achieve this goal, I have created several tools. First, SeekDeep is a pipeline for analyzing targeted amplicon sequencing datasets from various technologies. It achieves 1-base resolution even at low frequencies and read depths, which allows accurate comparison between samples and the detection of important SNPs. Next, PathWeaver is a local haplotype assembler designed for complex infections and for highly variable genomic regions with poor reference mapping. PathWeaver creates highly accurate haplotypes without generating chimeric assemblies. Applying PathWeaver to VAR2CSA, the key protein in Plasmodium falciparum pregnancy-associated malaria, revealed population substructure within the protein's key binding domain that is present globally, and it also confirmed copy number variation. Finally, the program Carmen uses PathWeaver to augment the results of targeted amplicon approaches by reporting where and when local haplotypes have previously been found. These rigorously tested tools enable the analysis of local haplotype data from various technologies and approaches, providing accurate, precise, and easily accessible results. | |
dc.language.iso | en_US | |
dc.publisher | University of Massachusetts Medical School | |
dc.rights | Copyright is held by the author, with all rights reserved. | |
dc.subject | plasmodium | |
dc.subject | sequencing | |
dc.subject | infectious diseases | |
dc.subject | targeted amplicon | |
dc.subject | Bioinformatics | |
dc.subject | Computational Biology | |
dc.subject | Parasitic Diseases | |
dc.title | A suite of computational tools to interrogate sequence data with local haplotype analysis within complex Plasmodium infections and other microbial mixtures | |
dc.type | Doctoral Dissertation | |
dc.identifier.legacyfulltext | https://escholarship.umassmed.edu/cgi/viewcontent.cgi?article=1976&context=gsbs_diss&unstamped=1 | |
dc.identifier.legacycoverpage | https://escholarship.umassmed.edu/gsbs_diss/970 | |
dc.legacy.embargo | 2019-05-04T00:00:00-07:00 | |
dc.identifier.contextkey | 12075262 | |
refterms.dateFOA | 2022-08-25T04:29:56Z | |
html.description.abstract | <p>The rapid development of DNA sequencing technologies has opened new avenues of research, including the investigation of population structure within infectious diseases, both within patients and between populations. To take advantage of these technological advances and the new types of data they generate, we need novel bioinformatics tools that do not succumb to artifacts introduced during data generation and therefore provide accurate and precise results. To achieve this goal, I have created several tools.</p> <p>First, SeekDeep is a pipeline for analyzing targeted amplicon sequencing datasets from various technologies. It achieves 1-base resolution even at low frequencies and read depths, which allows accurate comparison between samples and the detection of important SNPs. Next, PathWeaver is a local haplotype assembler designed for complex infections and for highly variable genomic regions with poor reference mapping. PathWeaver creates highly accurate haplotypes without generating chimeric assemblies. Applying PathWeaver to VAR2CSA, the key protein in <em>Plasmodium falciparum</em> pregnancy-associated malaria, revealed population substructure within the protein's key binding domain that is present globally, and it also confirmed copy number variation. Finally, the program Carmen uses PathWeaver to augment the results of targeted amplicon approaches by reporting where and when local haplotypes have previously been found.</p> <p>These rigorously tested tools enable the analysis of local haplotype data from various technologies and approaches, providing accurate, precise, and easily accessible results.</p> | |
dc.identifier.submissionpath | gsbs_diss/970 | |
dc.contributor.department | Department of Bioinformatics and Integrative Biology | |
dc.identifier.orcid | 0000-0001-9639-2894 |