• Characterizing Long COVID: Deep Phenotype of a Complex Condition

      Deer, Rachel R.; Liu, Feifan; Haendel, Melissa A.; Robinson, Peter N. (2021-12-01)
      BACKGROUND: Numerous publications describe the clinical manifestations of post-acute sequelae of SARS-CoV-2 (PASC or "long COVID"), but they are difficult to integrate because of heterogeneous methods and the lack of a standard for denoting the many phenotypic manifestations. Patient-led studies are of particular importance for understanding the natural history of COVID-19, but integration is hampered because they often use different terms to describe the same symptom or condition. This significant disparity in patient versus clinical characterization motivated the proposed ontological approach to specifying manifestations, which will improve capture and integration of future long COVID studies. METHODS: The Human Phenotype Ontology (HPO) is a widely used standard for exchange and analysis of phenotypic abnormalities in human disease but has not yet been applied to the analysis of COVID-19. FINDINGS: We identified 303 articles published before April 29, 2021, curated 59 relevant manuscripts that described clinical manifestations in 81 cohorts three weeks or more following acute COVID-19, and mapped 287 unique clinical findings to HPO terms. We present layperson synonyms and definitions that can be used to link patient self-report questionnaires to standard medical terminology. Long COVID clinical manifestations are not assessed consistently across studies, and most manifestations have been reported with a wide range of synonyms by different authors. Across at least 10 cohorts, authors reported 31 unique clinical features corresponding to HPO terms; the most commonly reported feature was Fatigue (median 45.1%) and the least commonly reported was Nausea (median 3.9%), but the reported percentages varied widely between studies. INTERPRETATION: Translating long COVID manifestations into computable HPO terms will improve analysis, data capture, and classification of long COVID patients.
If researchers, clinicians, and patients share a common language, then studies can be compared/pooled more effectively. Furthermore, mapping lay terminology to HPO will help patients assist clinicians and researchers in creating phenotypic characterizations that are computationally accessible, thereby improving the stratification, diagnosis, and treatment of long COVID. FUNDING: U24TR002306; UL1TR001439; P30AG024832; GBMF4552; R01HG010067; UL1TR002535; K23HL128909; UL1TR002389; K99GM145411.
    • Characterizing Long COVID: Deep Phenotype of a Complex Condition [preprint]

      Deer, Rachel R.; Liu, Feifan; Haendel, Melissa; Robinson, Peter N. (2021-06-29)
      Importance: Since late 2019, the novel coronavirus SARS-CoV-2 has given rise to a global pandemic and introduced many health challenges with economic, social, and political consequences. In addition to a complex acute presentation that can affect multiple organ systems, there is mounting evidence of various persistent long-term sequelae. The worldwide scientific community is characterizing a diverse range of seemingly common long-term outcomes associated with SARS-CoV-2 infection, but the underlying assumptions in these studies vary widely, making comparisons difficult. Numerous publications describe the clinical manifestations of post-acute sequelae of SARS-CoV-2 infection (PASC or “long COVID”), but they are difficult to integrate because of heterogeneous methods and the lack of a standard for denoting the many phenotypic manifestations of long COVID. Observations: We identified 303 articles published before April 29, 2021, curated 59 relevant manuscripts that described clinical manifestations in 81 cohorts of individuals three weeks or more following acute COVID-19, and mapped 287 unique clinical findings to Human Phenotype Ontology (HPO) terms. Conclusions and Relevance: Patients and clinicians often use different terms to describe the same symptom or condition. Addressing the heterogeneous and inconsistent language used to describe the clinical manifestations of long COVID, combined with the lack of standardized terminologies for long COVID, will provide a necessary foundation for comparison and meta-analysis of different studies. Translating long COVID manifestations into computable HPO terms will improve the analysis, data capture, and classification of long COVID patients. If researchers, clinicians, and patients share a common language, then studies can be compared or pooled more effectively.
Furthermore, mapping lay terminology to HPO for long COVID manifestations will help patients assist clinicians and researchers in creating phenotypic characterizations that are computationally accessible, which may improve the stratification and thereby diagnosis and treatment of long COVID.
    • Who has long-COVID? A big data approach [preprint]

      Pfaff, Emily R.; Girvin, Andrew T.; Bennett, Tellen D.; Bhatia, Abhishek; Brooks, Ian M.; Deer, Rachel R.; Dekermanjian, Jonathan P.; Jolley, Sarah Elizabeth; Kahn, Michael G.; Kostka, Kristin; et al. (2021-10-22)
      Background: Post-acute sequelae of SARS-CoV-2 infection (PASC), otherwise known as long-COVID, have severely impacted recovery from the pandemic for patients and society alike. This new disease is characterized by evolving, heterogeneous symptoms, making it challenging to derive an unambiguous long-COVID definition. Electronic health record (EHR) studies are a critical element of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, which is addressing the urgent need to understand PASC, accurately identify who has PASC, and identify treatments. Methods: Using the National COVID Cohort Collaborative’s (N3C) EHR repository, we developed XGBoost machine learning (ML) models to identify potential long-COVID patients. We examined demographics, healthcare utilization, diagnoses, and medications for 97,995 adult COVID-19 patients. We used these features and 597 long-COVID clinic patients to train three ML models to identify potential long-COVID patients among (1) all COVID-19 patients, (2) patients hospitalized with COVID-19, and (3) patients who had COVID-19 but were not hospitalized. Findings: Our models identified potential long-COVID patients with high accuracy, achieving areas under the receiver operator characteristic curve of 0.91 (all patients), 0.90 (hospitalized), and 0.85 (non-hospitalized). Important features include rate of healthcare utilization, patient age, dyspnea, and other diagnosis and medication information available within the EHR. Applying the “all patients” model to the larger N3C cohort identified 100,263 potential long-COVID patients. Interpretation: Patients flagged by our models can be interpreted as “patients likely to be referred to or seek care at a long-COVID specialty clinic,” an essential proxy for long-COVID diagnosis in the current absence of a definition. We also achieve the urgent goal of identifying potential long-COVID patients for clinical trials.
As more data sources are identified, the models can be retrained and tuned based on study needs. Funding: This study was funded by NCATS and NIH through the RECOVER Initiative.