Publication

Knowledge-Guided Machine Learning for Precision Medicine: New Methods for Robust Biomarker Discovery and Interpretable Risk Stratification from High-Dimensional Omics Data

Faculty Advisor
Chan Zhou
Academic Program
Population Health Sciences
Document Type
Doctoral Dissertation
Publication Date
2025-11-11
Embargo Expiration Date
2027-11-11
Abstract

Precision medicine aims to integrate individual molecular information with clinical profiles to improve disease prediction and treatment. However, biomedical omics data are typically high-dimensional yet limited in sample size, with far more features than samples (p ≫ n), which creates major analytical challenges. Naïve application of "black box" machine learning models to such data can lead to overfitting and statistical instability. Together with their inherent lack of biological interpretability, this limits the value of such models for clinical translation.

This work explores the value of integrating biological domain knowledge into machine learning models as a structural constraint to address these challenges. This knowledge-guided paradigm enhances data efficiency, improves statistical robustness, and ensures mechanistic interpretability. To validate this approach, two knowledge-guided computational methods were developed for two critical tasks in precision medicine: biomarker discovery and disease risk prediction.

For biomarker discovery, BAMBI (Biostatistical and Artificial-intelligence Methods for Biomarker Identification) was developed. By combining biologically informed statistical filtering with robust machine learning-based feature selection, BAMBI identifies parsimonious yet highly predictive RNA biomarker panels, outperforming existing methods in small-sample scenarios.
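The two-stage idea behind BAMBI (filter candidate features first, then select a small, stable panel from the survivors) can be illustrated with a minimal sketch. It is not BAMBI itself: every function name and threshold below is hypothetical, the "biologically informed" filter is replaced by a plain Welch t-test, and the machine learning selector is replaced by stability selection over stratified half-subsamples that re-ranks the surviving features by the same statistic. The sketch assumes binary labels (1 = case, 0 = control) and at least four samples per class.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t-statistic (unequal variances); needs >= 2 values per group."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # zero within-group spread: perfect separation if the means differ
        return math.copysign(math.inf, diff) if diff != 0 else 0.0
    return diff / se


def split_by_class(X, y, rows, j):
    """Values of feature j among the given rows, split into (cases, controls)."""
    cases = [X[i][j] for i in rows if y[i] == 1]
    ctrls = [X[i][j] for i in rows if y[i] == 0]
    return cases, ctrls


def filter_features(X, y, min_abs_t):
    """Stage 1 (hypothetical): univariate filter keeping features with |t| >= min_abs_t."""
    rows = range(len(y))
    return [j for j in range(len(X[0]))
            if abs(welch_t(*split_by_class(X, y, rows, j))) >= min_abs_t]


def stability_select(X, y, candidates, top_k, n_rounds=100, threshold=0.6, seed=0):
    """Stage 2 (hypothetical): stability selection over stratified half-subsamples.

    In each round, rank candidates by |t| on a random half of each class and
    record the top_k; keep features chosen in at least `threshold` of rounds.
    Each class needs >= 4 samples so every half-subsample has >= 2 per class.
    """
    rng = random.Random(seed)
    cases_idx = [i for i, label in enumerate(y) if label == 1]
    ctrls_idx = [i for i, label in enumerate(y) if label == 0]
    counts = {j: 0 for j in candidates}
    for _ in range(n_rounds):
        rows = (rng.sample(cases_idx, len(cases_idx) // 2)
                + rng.sample(ctrls_idx, len(ctrls_idx) // 2))
        ranked = sorted(candidates,
                        key=lambda j: abs(welch_t(*split_by_class(X, y, rows, j))),
                        reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1
    return sorted(j for j in candidates if counts[j] / n_rounds >= threshold)


if __name__ == "__main__":
    rng = random.Random(42)
    y = [1] * 10 + [0] * 10
    # features 0 and 1 carry a class signal; features 2-4 are pure noise
    X = [[rng.gauss(2.0 * label, 0.5), rng.gauss(-2.0 * label, 0.5)]
         + [rng.gauss(0.0, 1.0) for _ in range(3)] for label in y]
    candidates = filter_features(X, y, min_abs_t=3.0)
    panel = stability_select(X, y, candidates, top_k=2)
    print("filtered:", candidates, "panel:", panel)
```

Keeping only features that are selected repeatedly across subsamples is one common way to make a panel less sensitive to which few samples happen to be in a small cohort; a real pipeline would substitute its own filter and selector at each stage.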

For disease risk stratification, a pathway-guided graph neural network (GNN) was constructed to predict 10-year cardiovascular disease risk. By embedding pathway structures into the model architecture and incorporating a demographic-guided attention mechanism, this method substantially outperformed the Framingham Risk Score in balanced accuracy and sensitivity while identifying biologically relevant central genes as key risk drivers.
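The idea of embedding pathway structure and demographic-guided attention into a model can be sketched in a few lines. This is an illustrative assumption, not the dissertation's architecture: genes carry a single scalar expression value, each pathway is supplied as a gene list plus within-pathway edges, message passing is a parameter-free mean aggregation, and the attention logits are a linear function of demographic covariates with made-up weights. Learned transforms, multi-dimensional embeddings and training are omitted, and all names are hypothetical.

```python
import math


def message_pass(h, edges):
    """One round of mean-aggregation message passing on an undirected graph.

    h maps gene -> scalar feature; edges is a list of (gene, gene) pairs whose
    endpoints must both be keys of h. Each gene's new value is the mean of its
    own value and its neighbours' values (a parameter-free, GCN-like update).
    """
    nbrs = {g: [] for g in h}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    return {g: (h[g] + sum(h[n] for n in nbrs[g])) / (1 + len(nbrs[g])) for g in h}


def softmax(logits):
    """Numerically stable softmax over a dict of scores."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def pathway_risk(h, pathways, pathway_edges, demo, attn_w, bias=0.0):
    """Hypothetical pathway-constrained risk score.

    1. Message passing runs separately inside each pathway's subgraph, so
       information only flows along edges belonging to that pathway.
    2. Updated gene values are mean-pooled into one score per pathway.
    3. Attention over pathways is conditioned on demographic covariates:
       logit_p = dot(demo, attn_w[p]), alpha = softmax(logits).
    4. The attention-weighted sum plus bias is passed through a sigmoid.
    Returns (risk probability, attention weight per pathway).
    """
    pooled = {}
    for p, genes in pathways.items():
        updated = message_pass({g: h[g] for g in genes}, pathway_edges.get(p, []))
        pooled[p] = sum(updated.values()) / len(updated)
    logits = {p: sum(d * w for d, w in zip(demo, attn_w[p])) for p in pathways}
    alpha = softmax(logits)
    z = bias + sum(alpha[p] * pooled[p] for p in pathways)
    return 1.0 / (1.0 + math.exp(-z)), alpha


if __name__ == "__main__":
    expr = {"A": 1.2, "B": 0.4, "C": -0.3, "D": 0.9}      # toy expression values
    pathways = {"P1": ["A", "B", "C"], "P2": ["C", "D"]}  # gene C is shared
    edges = {"P1": [("A", "B"), ("B", "C")], "P2": [("C", "D")]}
    demo = [0.6, 1.0]                                     # e.g. scaled age, sex
    attn_w = {"P1": [0.5, -0.2], "P2": [1.0, 0.3]}        # made-up weights
    risk, alpha = pathway_risk(expr, pathways, edges, demo, attn_w)
    print(f"risk={risk:.3f}", {p: round(a, 3) for p, a in alpha.items()})
```

Returning the attention weights alongside the score lets one see which pathways dominate an individual prediction, and restricting edges to within-pathway subgraphs is one simple way to make the graph structure follow curated biology rather than being learned freely.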

This work demonstrates that integrating biological knowledge into machine learning is a powerful strategy for translating complex omics data into robust, interpretable, and clinically actionable insights.

DOI
10.13028/1p8j-dt69
Rights
Copyright © 2025 Peng Zhou