From Data to Decision: Explainable Risk Prediction for Cardiovascular Diseases Using Multicenter Patient Records

Authors

  • Jiaming Ou

DOI:

https://doi.org/10.61173/c0yt3b70

Keywords:

Cardiovascular disease prediction, Lasso logistic regression, Random forest, Machine learning

Abstract

Cardiovascular disease (CVD) remains one of the leading causes of mortality worldwide, which makes early risk prediction critical for reducing fatality rates. This study uses a publicly available CVD dataset to develop and compare three supervised learning models for assessing individual disease risk: Lasso-regularized logistic regression, random forest, and a Stacking ensemble. Preprocessing included constructing interaction terms and dummy-encoding categorical variables to improve feature representation and model expressiveness. All three models performed well, with the Stacking ensemble achieving the highest accuracy (90.00%), ahead of random forest (89.44%) and logistic regression (87.78%). Feature importance analysis identifies exercise-induced ST depression (Oldpeak), the slope of the peak exercise ST segment (ST_slope), and maximum heart rate achieved during exercise (MaxHR) as the most influential predictors. These findings support the effectiveness of machine learning for CVD risk assessment and highlight the value of feature engineering and model ensembling in improving predictive accuracy. The study provides a framework for clinical decision support that could enable earlier interventions and improved patient outcomes.
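
The modeling pipeline summarized in the abstract can be illustrated with a minimal sketch. This is not the author's code: it assumes scikit-learn, uses synthetic stand-in data rather than the study's dataset, and picks an arbitrary interaction term (Oldpeak × MaxHR), an integer coding for ST_slope, a logistic-regression meta-learner, and default-like hyperparameters, none of which are specified in the abstract. The accuracy it prints has no relation to the figures reported above.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 600

# Synthetic stand-ins for three predictors named in the abstract (NOT patient data).
oldpeak = rng.gamma(shape=1.5, scale=0.8, size=n)   # exercise-induced ST depression
max_hr = rng.normal(140, 25, size=n)                # maximum heart rate
st_slope = rng.integers(0, 3, size=n)               # assumed coding: 0=up, 1=flat, 2=down

# Hypothetical label rule, used only so the models have a signal to learn.
logit = 1.2 * oldpeak - 0.03 * (max_hr - 140) + 0.8 * (st_slope == 1) - 1.0
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Feature engineering: one illustrative interaction term (an assumption).
interaction = oldpeak * max_hr
X = np.column_stack([oldpeak, max_hr, interaction, st_slope])

# Scale numeric columns; dummy-encode the categorical column, dropping one level.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), [0, 1, 2]),
    ("cat", OneHotEncoder(drop="first"), [3]),
])

# Base learners: L1-penalized (Lasso) logistic regression and a random forest.
lasso_lr = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Stacking: out-of-fold predictions of the base learners feed a meta-learner.
stack = StackingClassifier(
    estimators=[
        ("lasso_lr", make_pipeline(preprocess, lasso_lr)),
        ("rf", make_pipeline(preprocess, forest)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"Stacking accuracy on synthetic hold-out data: {accuracy:.3f}")
```

Wrapping the preprocessing inside each base pipeline means scaling and encoding are refit within every cross-validation fold, which avoids leaking hold-out statistics into the meta-learner's training inputs.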

 

Published

2025-08-26

Section

Articles