Genomics and Computational Biology
http://hdl.handle.net/20.500.14038/194
2024-03-29T04:48:38Z
Investigating the etiologies of non-malarial febrile illness in Senegal using metagenomic sequencing
http://hdl.handle.net/20.500.14038/53099
Levine, Zoë C; Sene, Aita; Mkandawire, Winnie; Deme, Awa B; Ndiaye, Tolla; Sy, Mouhamad; Gaye, Amy; Diedhiou, Younouss; Mbaye, Amadou M; Ndiaye, Ibrahima M; Gomis, Jules; Ndiop, Médoune; Sene, Doudou; Faye Paye, Marietou; MacInnis, Bronwyn L; Schaffner, Stephen F; Park, Daniel J; Badiane, Aida S; Colubri, Andres; Ndiaye, Mouhamadou; Sy, Ngayo; Sabeti, Pardis C; Ndiaye, Daouda; Siddle, Katherine J
The worldwide decline in malaria incidence is revealing the extensive burden of non-malarial febrile illness (NMFI), which remains poorly understood and difficult to diagnose. To characterize NMFI in Senegal, we collected venous blood and clinical metadata in a cross-sectional study of febrile patients and healthy controls in an area of low malaria burden. Using 16S and untargeted sequencing, we detected viral, bacterial, or eukaryotic pathogens in 23% (38/163) of NMFI cases. Bacteria were the most common pathogens, with relapsing fever Borrelia and spotted fever Rickettsia found in 15.5% and 3.8% of cases, respectively. Four viral pathogens were found in a total of 7 febrile cases (3.5%). Sequencing also detected undiagnosed Plasmodium infections, including one putative P. ovale infection. We developed a logistic regression model that can distinguish Borrelia from other NMFIs with similar presentations based on symptoms and vital signs (F1 score: 0.823). These results highlight the challenge and importance of improving diagnostics, especially for Borrelia, to support diagnosis and surveillance.
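The classifier's performance is summarized as an F1 score, the harmonic mean of precision and recall for the positive class. As a minimal sketch (not the authors' code), the function below computes F1 from binary labels, assuming 1 marks a Borrelia case and 0 another NMFI; the labels in the example are hypothetical and are not study data.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2PR / (P + R) for the positive class (assumed: 1 = Borrelia, 0 = other NMFI)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for illustration only (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(f1_score(y_true, y_pred))  # → 0.75 (precision 3/4, recall 3/4)
```

Because F1 ignores true negatives, it is a reasonable choice when the positive class (here, Borrelia) is a minority of the febrile cases being classified.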
2024-01-25T00:00:00Z
Dog size and patterns of disease history across the canine age spectrum: Results from the Dog Aging Project
http://hdl.handle.net/20.500.14038/53103
Nam, Yunbi; White, Michelle; Karlsson, Elinor K; Creevy, Kate E; Promislow, Daniel E L; McClelland, Robyn L
Age in dogs is associated with the risk of many diseases, and body size is a major factor in that risk. However, size-related patterns are complex: although smaller dogs tend to live longer, some diseases are more prevalent among them. In this study we seek to quantify how the pattern of disease history varies across the spectrum of dog size, dog age, and their interaction. Using owner-reported disease history from a large number of companion dogs enrolled in the Dog Aging Project, we investigate how body size, as measured by weight, is associated with the lifetime prevalence of reported conditions, and how that association changes across age, for various disease categories. We found significant positive associations between dog size and the lifetime prevalence of skin, bone/orthopedic, gastrointestinal, ear/nose/throat, cancer/tumor, brain/neurologic, endocrine, and infectious diseases. In contrast, dog size was negatively associated with the lifetime prevalence of ocular, cardiac, liver/pancreas, and respiratory disease categories. Kidney/urinary disease prevalence did not vary by size. We also found that the association between age and lifetime disease prevalence varied by dog size for many conditions, including ocular, cardiac, orthopedic, ear/nose/throat, and cancer. Controlling for sex, purebred vs. mixed-breed status, and geographic region made little difference in any disease category we studied. For most disease categories, our results align with the shorter lifespan of larger dogs and suggest potential avenues for further examination.
2024-01-17T00:00:00Z
A Burden of Rare Copy Number Variants in Obsessive-Compulsive Disorder [preprint]
http://hdl.handle.net/20.500.14038/53105
Halvorsen, Matthew; de Schipper, Elles; Boberg, Julia; Strom, Nora; Hagen, Kristen; Lindblad-Toh, Kerstin; Karlsson, Elinor K; Pedersen, Nancy; Bulik, Cynthia; Fundín, Bengt; Landén, Mikael; Kvale, Gerd; Hansen, Bjarne; Haavik, Jan; Mattheisen, Manuel; Rück, Christian; Mataix-Cols, David; Crowley, James
Current genetic research on obsessive-compulsive disorder (OCD) supports contributions to risk from common single nucleotide variants (SNVs), along with rare coding SNVs and small insertions/deletions (indels). The contribution to OCD risk from large, rare copy number variants (CNVs), however, has not been formally assessed at a similar scale. Here we describe an analysis of rare CNVs called from genotype array data in 2,248 deeply phenotyped OCD cases and 3,608 unaffected controls from Sweden and Norway. We found that, in general, cases carry an elevated burden of large (>30 kb, at least 15 probes) CNVs (OR=1.12, P=1.77×10⁻³). The excess rate of these CNVs in cases versus controls was around 0.07 (95% CI 0.02-0.11, P=2.58×10⁻³). This signal was largely driven by CNVs overlapping protein-coding regions (OR=1.19, P=3.08×10⁻⁴), particularly deletions affecting loss-of-function-intolerant genes (pLI>0.995, OR=4.12, P=2.54×10⁻⁵). We did not identify any specific locus where CNV burden was associated with OCD case status at genome-wide significance, but we noted non-random recurrence of CNV deletions in cases (permutation P=2.60×10⁻³). Among cases with sufficient clinical data (n=1,612), carriers of neurodevelopmental duplications were more likely to have comorbid autism (P<0.001), and carriers of deletions overlapping neurodevelopmental genes had lower treatment response (P=0.02). The results demonstrate a contribution of large, rare CNVs to OCD risk and suggest that studies of rare coding variation in OCD would have more power to identify risk genes if this class of variation were incorporated into formal tests.
This article is a preprint. Preprints are preliminary reports of work that have not been certified by peer review.
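The burden results are reported as odds ratios. The simplest unadjusted version of that quantity is a 2×2 carrier odds ratio with a Woolf (log-scale) 95% confidence interval, sketched below. The study's actual burden test may use a different model (for example, regression on CNV counts with covariates); only the cohort sizes in the example come from the abstract, and the carrier counts are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_woolf(
    case_carriers: int,
    n_cases: int,
    control_carriers: int,
    n_controls: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Unadjusted carrier odds ratio with a Woolf (log-scale) confidence interval."""
    a = case_carriers                  # cases carrying a qualifying CNV
    b = n_cases - case_carriers        # cases without one
    c = control_carriers               # controls carrying a qualifying CNV
    d = n_controls - control_carriers  # controls without one
    if min(a, b, c, d) <= 0:
        raise ValueError("every 2x2 cell must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Cohort sizes from the abstract; carrier counts (40 and 20) are hypothetical.
print(odds_ratio_woolf(40, 2248, 20, 3608))
```

The interval is computed on the log scale because log(OR) is approximately normally distributed, which is why the bounds are asymmetric around the point estimate.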
2024-01-03T00:00:00Z
Mutational spectrum and phenotypic variability of Duchenne muscular dystrophy and related disorders in a Bangladeshi population
http://hdl.handle.net/20.500.14038/52872
Sarker, Shaoli; Eshaque, Tamannyat Binte; Soorajkumar, Anjana; Nassir, Nasna; Zehra, Binte; Kanta, Shayla Imam; Rahaman, Md Atikur; Islam, Amirul; Akter, Shimu; Ali, Mohammad Kawsar; Mim, Rabeya Akter; Uddin, K M Furkan; Chowdhury, Mohammod Shah Jahan; Shams, Nusrat; Baqui, Md Abdul; Lim, Elaine T; Akter, Hosneara; Woodbury-Smith, Marc; Uddin, Mohammed
Duchenne muscular dystrophy (DMD) is a severe, rare neuromuscular disorder caused by mutations in the X-linked dystrophin gene. Several mutations have been identified, yet characterizing the full mutational spectrum and its phenotypic consequences will require genotyping across different populations. To this end, we undertook the first detailed genotype and phenotype characterization of DMD in the Bangladeshi population. We investigated the rare mutational and phenotypic spectrum of the DMD gene in 36 Bangladeshi participants with suspected DMD using an affordable diagnostic strategy: initial screening for exonic deletions in the DMD gene by multiplex PCR, followed by whole exome sequencing of PCR-negative patients. Deletion mapping identified two critical hotspot regions in the DMD gene (near the proximal and distal ends, spanning exons 8-17 and exons 45-53, respectively) that accounted for 95% (21/22) of the deletions in this cohort. From our exome analysis, we detected two novel pathogenic hemizygous mutations in exons 21 and 42 of the DMD gene, as well as novel pathogenic recessive and loss-of-function variants in four additional genes: SGCD, DYSF, COL6A3, and DOK7. Our phenotypic analysis showed that participants with suspected DMD presented diverse phenotypes depending on the location of the mutation and which gene was affected. Our study provides new ethnicity-specific insights into both the clinical and genetic aspects of DMD.
2023-12-06T00:00:00Z
An encyclopedia of enhancer-gene regulatory interactions in the human genome [preprint]
http://hdl.handle.net/20.500.14038/52868
Gschwind, Andreas R; Mualim, Kristy S; Karbalayghareh, Alireza; Sheth, Maya U; Dey, Kushal K; Jagoda, Evelyn; Nurtdinov, Ramil N; Xi, Wang; Tan, Anthony S; Jones, Hank; Ma, X Rosa; Yao, David; Nasser, Joseph; Avsec, Žiga; James, Benjamin T; Shamim, Muhammad S; Durand, Neva C; Rao, Suhas S P; Mahajan, Ragini; Doughty, Benjamin R; Andreeva, Kalina; Ulirsch, Jacob C; Fan, Kaili; Perez, Elizabeth M; Nguyen, Tri C; Kelley, David R; Finucane, Hilary K; Moore, Jill E; Weng, Zhiping; Kellis, Manolis; Bassik, Michael C; Price, Alkes L; Beer, Michael A; Guigó, Roderic; Stamatoyannopoulos, John A; Lieberman Aiden, Erez; Greenleaf, William J; Leslie, Christina S; Steinmetz, Lars M; Kundaje, Anshul; Engreitz, Jesse M
Identifying transcriptional enhancers and their target genes is essential for understanding gene regulation and the impact of human genetic variation on disease [1-6]. Here we create and evaluate a resource of >13 million enhancer-gene regulatory interactions across 352 cell types and tissues, by integrating predictive models, measurements of chromatin state and 3D contacts, and large-scale genetic perturbations generated by the ENCODE Consortium [7]. We first create a systematic benchmarking pipeline to compare predictive models, assembling a dataset of 10,411 element-gene pairs measured in CRISPR perturbation experiments, >30,000 fine-mapped eQTLs, and 569 fine-mapped GWAS variants linked to a likely causal gene. Using this framework, we develop a new predictive model, ENCODE-rE2G, that achieves state-of-the-art performance across multiple prediction tasks, demonstrating a strategy involving iterative perturbations and supervised machine learning to build increasingly accurate predictive models of enhancer regulation. Using the ENCODE-rE2G model, we build an encyclopedia of enhancer-gene regulatory interactions in the human genome, which reveals global properties of enhancer networks, identifies differences in the functions of genes that have more or less complex regulatory landscapes, and improves analyses that link noncoding variants to target genes and cell types for common, complex diseases. By interpreting the model, we find evidence that, beyond enhancer activity and 3D enhancer-promoter contacts, additional features guide enhancer-promoter communication, including promoter class and enhancer-enhancer synergy. Altogether, these genome-wide maps of enhancer-gene regulatory interactions, benchmarking software, predictive models, and insights about enhancer function provide a valuable resource for future studies of gene regulation and human genetics.
This article is a preprint. Preprints are preliminary reports of work that have not been certified by peer review.
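The benchmarking step compares models by how well their scores rank element-gene pairs with known CRISPR outcomes. One common summary of such a ranking is average precision, an estimate of the area under the precision-recall curve. The sketch below assumes that metric and hypothetical scores; it is not the ENCODE-rE2G benchmarking code.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean precision at the rank of each positive pair (1 = CRISPR-validated effect).

    Pairs are ranked by descending score; ties keep input order, a simplification.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_positive = sum(labels)
    if n_positive == 0:
        raise ValueError("need at least one positive element-gene pair")
    ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(ranking, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank  # precision among the top `rank` pairs
    return precision_sum / n_positive


# Hypothetical model scores for five element-gene pairs and their CRISPR labels.
print(average_precision([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 0, 0]))
```

Here the two positives sit at ranks 1 and 3, so the result is (1/1 + 2/3) / 2, roughly 0.83; a model that ranked both positives first would score 1.0.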
2023-11-13T00:00:00Z
Single-cell transcriptomic and genomic changes in the aging human brain [preprint]
http://hdl.handle.net/20.500.14038/52869
Jeffries, Ailsa M; Yu, Tianxiong; Ziegenfuss, Jennifer S; Tolles, Allie K; Kim, Yerin; Weng, Zhiping; Lodato, Michael A
Aging brings dysregulation of various processes across organs and tissues, often stemming from stochastic damage to individual cells over time. Here, we used a combination of single-nucleus RNA sequencing and single-cell whole-genome sequencing to identify transcriptomic and genomic changes in the prefrontal cortex of the human brain across the life span, from infancy to centenarian age. We identified infant-specific cell clusters enriched for the expression of neurodevelopmental genes, as well as a common down-regulation during aging, across cell types, of cell-essential homeostatic genes that function in ribosomes, transport, and metabolism. In contrast, expression of neuron-specific genes generally remains stable throughout life. We observed decreased expression of specific DNA repair genes with aging, including genes that mutation signature analysis implicates in generating brain somatic mutations. Furthermore, we detected gene-length-specific somatic mutation rates that shape the transcriptomic landscape of the aged human brain. These findings elucidate critical aspects of human brain aging, shedding light on transcriptomic and genomic dynamics.
This article is a preprint. Preprints are preliminary reports of work that have not been certified by peer review.
2023-11-07T00:00:00Z
Beyond genome-wide association studies: Investigating the role of noncoding regulatory elements in primary sclerosing cholangitis
http://hdl.handle.net/20.500.14038/52649
Pratt, Henry E; Wu, Tong; Elhajjajy, Shaimae I; Zhou, Jeffrey Y.; Fitzgerald, Kate; Fazzio, Tom; Weng, Zhiping; Pratt, Daniel S
Background: Genome-wide association studies (GWAS) have identified 30 risk loci for primary sclerosing cholangitis (PSC). Variants within these loci lie predominantly in noncoding regions of DNA, making their mechanisms of conferring risk hard to define. Epigenomic studies have shown that noncoding variants broadly affect regulatory element activity, but the possible association of noncoding PSC variants with regulatory element activity has not been studied. We aimed to (1) determine whether noncoding risk variants in PSC affect regulatory element function and (2) if so, assess the role these regulatory elements play in explaining the genetic risk for PSC.
Methods: Available epigenomic datasets were integrated to build a comprehensive atlas of cell type-specific regulatory elements, emphasizing PSC-relevant cell types. RNA-seq and ATAC-seq were performed on peripheral CD4+ T cells from 10 PSC patients and 11 healthy controls. Computational techniques were used to (1) study the enrichment of PSC-risk variants within regulatory elements, (2) correlate risk genotype with differences in regulatory element activity, and (3) identify regulatory elements differentially active and genes differentially expressed between PSC patients and controls.
Results: Noncoding PSC-risk variants are strongly enriched within immune-specific enhancers, particularly those involved in the T-cell response to antigenic stimulation. In total, 250 differentially expressed genes and >10,000 differentially active regulatory elements were identified between patients and controls.
Conclusions: Mechanistic effects are proposed for variants at 6 PSC-risk loci where genotype was linked with differential T-cell regulatory element activity. Regulatory elements are shown to play a key role in PSC pathophysiology.
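The enrichment of PSC-risk variants within regulatory elements can be illustrated with a simple fold-enrichment ratio: the fraction of risk variants that fall inside elements divided by the fraction of background variants that do. The sketch below is deliberately simplified (one chromosome, hypothetical coordinates, no linkage-disequilibrium correction or significance test), and the abstract does not specify the authors' exact enrichment method.

```python
import bisect
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def merge_intervals(intervals: Sequence[Interval]) -> List[Interval]:
    """Sort and merge overlapping elements so each position maps to at most one interval."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fraction_inside(positions: Sequence[int], merged: List[Interval]) -> float:
    starts = [s for s, _ in merged]
    inside = 0
    for pos in positions:
        i = bisect.bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if i >= 0 and pos < merged[i][1]:
            inside += 1
    return inside / len(positions)


def fold_enrichment(
    risk: Sequence[int], background: Sequence[int], elements: Sequence[Interval]
) -> float:
    merged = merge_intervals(elements)
    background_fraction = fraction_inside(background, merged)
    if background_fraction == 0:
        raise ValueError("no background variants fall in the elements")
    return fraction_inside(risk, merged) / background_fraction


# Hypothetical regulatory elements and variant positions.
elements = [(100, 200), (150, 300), (1000, 1100)]
risk_variants = [120, 250, 1050, 5000]
background_variants = [50, 150, 400, 600, 800, 1200, 2000, 3000]
print(fold_enrichment(risk_variants, background_variants, elements))  # → 6.0
```

Three of four risk variants (0.75) but only one of eight background variants (0.125) fall in elements, giving 6.0. Merging first matters because the binary search only checks the nearest preceding interval.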
2023-09-27T00:00:00Z
Reliable multiplex generation of pooled induced pluripotent stem cells
http://hdl.handle.net/20.500.14038/52637
Smullen, Molly; Olson, Meagan N; Reichert, Julia M; Dawes, Pepper; Murray, Liam F; Baer, Christina E; Wang, Qi; Readhead, Benjamin; Church, George M; Lim, Elaine T; Chan, Yingleong
Reprogramming somatic cells into induced pluripotent stem cells (iPSCs) enables the study of biological systems in vitro. To increase the throughput of reprogramming, we present induction of pluripotency from pooled cells (iPPC), an efficient, scalable, and reliable reprogramming procedure. Using our deconvolution algorithm, which employs pooled sequencing of single-nucleotide polymorphisms (SNPs), we accurately estimated individual donor proportions in the pooled iPSCs. With iPPC, we concurrently reprogrammed over one hundred donor lymphoblastoid cell lines (LCLs) into iPSCs and found strong correlations of individual donors' reprogramming ability across multiple experiments. Individual donors' reprogramming ability remains consistent across both same-day replicates and multiple experimental runs, and the expression of certain immunoglobulin precursor genes may affect reprogramming ability. The pooled iPSCs were also able to differentiate into cerebral organoids. Our procedure provides a multiplex framework for using pooled libraries of donor iPSCs in downstream research and in the investigation of in vitro phenotypes.
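Estimating donor proportions from pooled SNP sequencing can be framed as constrained least squares: if donor k makes up a fraction w_k of the pool and carries d_kj copies of the alternate allele at SNP j, the expected pooled alternate-allele fraction at that SNP is the sum over k of w_k · d_kj / 2. The sketch below solves for w by projected gradient descent onto the probability simplex. This is an assumed, generic formulation rather than the authors' algorithm; it also assumes equal DNA per cell and unbiased allele sequencing, and the genotypes are hypothetical.

```python
from typing import List, Sequence


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection onto {w : w_i >= 0, sum(w) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t  # keep the threshold from the last index that stays positive
    return [max(x - theta, 0.0) for x in v]


def estimate_donor_proportions(
    dosages: Sequence[Sequence[int]],      # dosages[snp][donor], ALT copies in {0, 1, 2}
    pooled_alt_fraction: Sequence[float],  # observed ALT fraction per SNP in the pool
    iterations: int = 1000,
) -> List[float]:
    n_snps, n_donors = len(dosages), len(dosages[0])
    g = [[d / 2.0 for d in row] for row in dosages]  # per-donor expected ALT fraction
    frob_sq = sum(x * x for row in g for x in row)
    if frob_sq == 0:
        raise ValueError("no informative SNPs (all donors homozygous reference)")
    step = 1.0 / frob_sq  # ||G||_F^2 bounds the largest eigenvalue of G^T G
    w = [1.0 / n_donors] * n_donors
    for _ in range(iterations):
        residual = [
            sum(g[j][k] * w[k] for k in range(n_donors)) - pooled_alt_fraction[j]
            for j in range(n_snps)
        ]
        grad = [sum(g[j][k] * residual[j] for j in range(n_snps)) for k in range(n_donors)]
        w = project_to_simplex([w[k] - step * grad[k] for k in range(n_donors)])
    return w


# Hypothetical genotypes for 3 donors at 8 SNPs (rows = SNPs, columns = donors).
dosages = [
    [2, 0, 0], [0, 2, 0], [1, 1, 0], [0, 0, 2],
    [2, 0, 1], [1, 1, 0], [0, 2, 1], [1, 1, 2],
]
true_w = [0.5, 0.3, 0.2]
pooled = [sum(d / 2 * w for d, w in zip(row, true_w)) for row in dosages]
print([round(x, 3) for x in estimate_donor_proportions(dosages, pooled)])  # → [0.5, 0.3, 0.2]
```

The example's pooled fractions are generated without noise, so the estimate recovers the input proportions; real read counts add sampling noise, which is one reason such methods use many SNPs.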
2023-08-31T00:00:00Z
Improving diagnosis of non-malarial fevers in Senegal: Borrelia and the contribution of tick-borne bacteria [preprint]
http://hdl.handle.net/20.500.14038/52658
Levine, Zoë C; Sene, Aita; Mkandawire, Winnie; Deme, Awa B; Ndiaye, Tolla; Sy, Mouhamad; Gaye, Amy; Diedhiou, Younouss; Mbaye, Amadou M; Ndiaye, Ibrahima; Gomis, Jules; Ndiop, Médoune; Sene, Doudou; Paye, Marietou Faye; MacInnis, Bronwyn; Schaffner, Stephen F; Park, Daniel J; Badiane, Aida S; Colubri, Andres; Ndiaye, Mouhamadou; Sy, Ngayo; Sabeti, Pardis C; Ndiaye, Daouda; Siddle, Katherine J
The worldwide decline in malaria incidence is revealing the extensive burden of non-malarial febrile illness (NMFI), which remains poorly understood and difficult to diagnose. To characterize NMFI in Senegal, we collected venous blood and clinical metadata from febrile patients and healthy controls in an area of low malaria burden. Using 16S and unbiased sequencing, we detected viral, bacterial, or eukaryotic pathogens in 29% of NMFI cases. Bacteria were the most common pathogens, with relapsing fever Borrelia and spotted fever Rickettsia found in 15% and 3.7% of cases, respectively. Four viral pathogens were found in a total of 7 febrile cases (3.5%). Sequencing also detected undiagnosed Plasmodium infections, including one putative P. ovale infection. We developed a logistic regression model to distinguish Borrelia from other NMFIs with similar presentations based on symptoms and vital signs. These results highlight the challenge and importance of improving diagnostics, especially for Borrelia, to support diagnosis and surveillance.
This article is a preprint. Preprints are preliminary reports of work that have not been certified by peer review.
2023-08-25T00:00:00Z
Using evolutionary constraint to define novel candidate driver genes in medulloblastoma
http://hdl.handle.net/20.500.14038/52664
Roy, Ananya; Sakthikumar, Sharadha; Kozyrev, Sergey V; Nordin, Jessika; Pensch, Raphaela; Mäkeläinen, Suvi; Pettersson, Mats; Karlsson, Elinor K; Lindblad-Toh, Kerstin; Forsberg-Nilsson, Karin
Current knowledge of cancer genomics remains biased against noncoding mutations. To systematically search for regulatory noncoding mutations, we assessed mutations at conserved positions in the genome, under the assumption that these are more likely to be functional than mutations at positions with low conservation. To this end, we used whole-genome sequencing data from the International Cancer Genome Consortium and combined them with evolutionary constraint inferred from 240 mammals to identify genes enriched in noncoding constraint mutations (NCCMs), that is, mutations likely to be regulatory in nature. We compared medulloblastoma (MB), which is malignant, with pilocytic astrocytoma (PA), a primarily benign tumor, and found markedly different NCCM frequencies between the two, consistent with the fact that malignant cancers tend to have more mutations. In PA, a high NCCM frequency affects only the BRAF locus, which is the most commonly mutated gene in PA. In contrast, in MB, >500 genes have high levels of NCCMs. Intriguingly, several loci with NCCMs in MB are associated with different ages of onset, such as the HOXB cluster in young MB patients. In adult patients, NCCMs occurred in loci such as WASF-2/AHDC1/FGR. One of these NCCMs led to increased expression of the SRC kinase FGR and augmented responsiveness of MB cells to dasatinib, a SRC kinase inhibitor. Our analysis thus points to different molecular pathways in different patient groups. These newly identified candidate driver mutations may aid in patient stratification in MB and could be valuable for future selection of personalized treatment options.
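Identifying NCCMs amounts to keeping somatic mutations that are noncoding and sit at positions whose conservation score passes a threshold, then counting, per gene, how many patients carry one. The sketch below assumes a per-base constraint score (for example, phyloP computed from a 240-mammal alignment), a hypothetical threshold, and hypothetical gene labels; how the study assigns mutations to genes and tests enrichment is not described in the abstract.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SomaticMutation:
    sample: str        # patient / tumor identifier
    gene: str          # gene the mutation is assigned to (assignment rule assumed)
    coding: bool       # True if the mutation falls in a protein-coding exon
    constraint: float  # per-base conservation score at the mutated position


def patients_with_nccm_per_gene(
    mutations: Iterable[SomaticMutation], min_constraint: float
) -> Counter:
    """Count, per gene, the distinct patients carrying at least one NCCM."""
    carriers = {
        (m.gene, m.sample)
        for m in mutations
        if not m.coding and m.constraint >= min_constraint
    }
    return Counter(gene for gene, _ in carriers)


# Hypothetical mutations and threshold, for illustration only.
mutations = [
    SomaticMutation("patient1", "GENE_A", False, 3.1),
    SomaticMutation("patient1", "GENE_A", False, 5.0),  # same patient: counted once
    SomaticMutation("patient2", "GENE_A", False, 2.5),
    SomaticMutation("patient3", "GENE_A", True, 4.0),   # coding: excluded
    SomaticMutation("patient1", "GENE_B", False, 0.4),  # weakly constrained: excluded
    SomaticMutation("patient2", "GENE_B", False, 2.2),
]
counts = patients_with_nccm_per_gene(mutations, min_constraint=2.0)
print(counts.most_common())  # → [('GENE_A', 2), ('GENE_B', 1)]
```

Counting patients rather than raw mutations keeps a single hypermutated tumor from dominating a gene's tally; a real analysis would still compare these counts with the local background mutation rate.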
2023-08-07T00:00:00Z