Construction and Analysis of a Risk Prediction Model for Heart Disease Based on Binary Logit Regression and Random Forest Model

Authors

  • Ailin Du

DOI:

https://doi.org/10.61173/ns18c775

Keywords:

Random Forest Model, Binary Logit Regression, Cardiovascular prediction, Heart Disease Prediction

Abstract

Heart disease (HD) is one of the most serious health problems worldwide. Early prediction can help prevent heart attacks and thereby reduce HD mortality. Previous studies have successfully predicted HD using either random forest models or binary logit models, and they inspired this research, which uses the two models jointly to make more reliable and stable predictions of HD. The data are the annual health-status survey of more than 400,000 adults collected by the U.S. Centers for Disease Control and Prevention (CDC), taken from a public Kaggle dataset (2022). The study concludes that stroke is the most significant factor affecting HD. The risk factor Stroke stood out in the visual analysis and in the predictions of both models. In the random forest model, Stroke ranked among the top four factors, and the model reached a prediction accuracy of 89%. In the binary logit model, Stroke ranked first, and the model reached a prediction accuracy of 91.36%. Both models achieved relatively high accuracy and both identified Stroke as a significant influencing factor, which supports the reliability of the results. This research provides a theoretical basis for early clinical screening of HD and offers a more reliable prediction approach, which is expected to help reduce HD mortality in the future.
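As a rough illustration of how a binary logit model can rank risk factors, the sketch below fits a logistic regression by plain gradient descent and orders the factors by coefficient size. It is not the author's code or data: the rows are synthetic, the factor names other than Stroke (Smoking, Diabetes) are hypothetical, and the "true" weights are assumptions chosen only so that the example has something to recover. The random forest half of the study is not covered here.

```python
import math
import random

random.seed(0)  # fixed seed so the synthetic data are reproducible

# Only "Stroke" appears in the abstract; the other names are hypothetical.
FEATURES = ["Stroke", "Smoking", "Diabetes"]
# Assumed effect sizes used to generate synthetic labels (not study results).
TRUE_WEIGHTS = [2.0, 0.5, 0.8]
TRUE_INTERCEPT = -2.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_row():
    """Draw one synthetic respondent: binary risk factors and an HD label."""
    x = [1.0 if random.random() < 0.3 else 0.0 for _ in FEATURES]
    p = sigmoid(TRUE_INTERCEPT + sum(w * v for w, v in zip(TRUE_WEIGHTS, x)))
    y = 1 if random.random() < p else 0
    return x, y


def fit_logit(rows, lr=1.0, epochs=500):
    """Fit a binary logit model by batch gradient descent on the log-loss."""
    k = len(rows[0][0])
    w = [0.0] * k
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * k
        grad_b = 0.0
        for x, y in rows:
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            grad_b += err
            for j in range(k):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / n
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return w, b


def predict(w, b, x):
    """Classify as HD (1) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) >= 0.5 else 0


data = [make_row() for _ in range(1000)]
weights, intercept = fit_logit(data)

# Rank factors by absolute coefficient; exp(coefficient) is the odds ratio.
ranked = sorted(zip(FEATURES, weights), key=lambda t: abs(t[1]), reverse=True)
accuracy = sum(predict(weights, intercept, x) == y for x, y in data) / len(data)

for name, w in ranked:
    print(f"{name}: coefficient={w:.2f}, odds ratio={math.exp(w):.2f}")
print(f"training accuracy: {accuracy:.3f}")
```

Ranking by absolute coefficient is only comparable across factors here because all inputs are binary indicators on the same scale; with mixed continuous and categorical inputs, standardized coefficients or significance tests would be the more usual basis for a ranking.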

Published

2025-10-23

Section

Articles