Authors
Antony, Blessy
Blau, Hannah
Casiraghi, Elena
Loomba, Johanna J
Callahan, Tiffany J
Laraway, Bryan J
Wilkins, Kenneth J
Antonescu, Corneliu C
Valentini, Giorgio
Williams, Andrew E
Robinson, Peter N
Reese, Justin T
Murali, T M
UMass Chan Affiliations
Center for Clinical and Translational Science
Document Type
Journal Article
Publication Date
2023-09-04
Abstract
Background: The cause and symptoms of long COVID are poorly understood. It is challenging to predict whether a given COVID-19 patient will develop long COVID in the future.
Methods: We used electronic health record (EHR) data from the National COVID Cohort Collaborative to predict the incidence of long COVID. We trained two machine learning (ML) models - logistic regression (LR) and random forest (RF). Features used to train predictors included symptoms and drugs ordered during acute infection, measures of COVID-19 treatment, pre-COVID comorbidities, and demographic information. We assigned the 'long COVID' label to patients diagnosed with the U09.9 ICD10-CM code. The cohorts included patients with (a) EHRs reported from data partners using U09.9 ICD10-CM code and (b) at least one EHR in each feature category. We analysed three cohorts: all patients (n = 2,190,579; diagnosed with long COVID = 17,036), inpatients (149,319; 3,295), and outpatients (2,041,260; 13,741).
Findings: LR and RF models yielded median AUROC of 0.76 and 0.75, respectively. Ablation study revealed that drugs had the highest influence on the prediction task. The SHAP method identified age, gender, cough, fatigue, albuterol, obesity, diabetes, and chronic lung disease as explanatory features. Models trained on data from one N3C partner and tested on data from the other partners had average AUROC of 0.75.
Interpretation: ML-based classification using EHR information from the acute infection period is effective in predicting long COVID. SHAP methods identified important features for prediction. Cross-site analysis demonstrated the generalizability of the proposed methodology.
Funding: NCATS U24 TR002306, NCATS UL1 TR003015, Axle Informatics Subcontract: NCATS-P00438-B, NIH/NIDDK/OD, PSR2015-1720GVALE_01, G43C22001320007, and Director, Office of Science, Office of Basic Energy Sciences of the U.S. Department of Energy Contract No. DE-AC02-05CH11231.
Source
Antony B, Blau H, Casiraghi E, Loomba JJ, Callahan TJ, Laraway BJ, Wilkins KJ, Antonescu CC, Valentini G, Williams AE, Robinson PN, Reese JT, Murali TM; N3C consortium. Predictive models of long COVID. EBioMedicine. 2023 Oct;96:104777. doi: 10.1016/j.ebiom.2023.104777. Epub 2023 Sep 4. PMID: 37672869; PMCID: PMC10494314.
DOI
10.1016/j.ebiom.2023.104777
Permanent Link to this Item
http://hdl.handle.net/20.500.14038/52963
PubMed ID
37672869
Funding and Acknowledgements
The UMass Center for Clinical and Translational Science (UMCCTS), UL1TR001453, provided data for this study.
Rights
Copyright © 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).; Attribution 4.0 International
Distribution License
http://creativecommons.org/licenses/by/4.0/
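Worked example (not part of the original record): the abstract reports cohort sizes and long COVID counts but not the resulting label prevalence, which is useful context when reading the AUROC figures. The sketch below derives it from the reported counts only. The names `cohorts` and `prevalence` are illustrative, and the check that the inpatient and outpatient cohorts sum to the full cohort is an arithmetic observation on the reported numbers, not a statement made by the authors.

```python
# Cohort sizes and long COVID (U09.9) counts as reported in the abstract.
# Tuple layout (assumed for this sketch): (patients, diagnosed with long COVID).
cohorts = {
    "all": (2_190_579, 17_036),
    "inpatients": (149_319, 3_295),
    "outpatients": (2_041_260, 13_741),
}


def prevalence(n_patients, n_long_covid):
    """Fraction of a cohort carrying the long COVID label."""
    return n_long_covid / n_patients


# Arithmetic check: the inpatient and outpatient counts add up to the full cohort.
total_n = cohorts["inpatients"][0] + cohorts["outpatients"][0]
total_pos = cohorts["inpatients"][1] + cohorts["outpatients"][1]
assert (total_n, total_pos) == cohorts["all"]

# Label prevalence per cohort, printed as a percentage.
rates = {name: prevalence(n, pos) for name, (n, pos) in cohorts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2%}")
```

By this calculation the positive label is rare in every cohort (under 1% overall and roughly 2% among inpatients), so the classification task described in the abstract is strongly imbalanced.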