Analysis of Diabetes Prediction Models Based on XGBoost and LightGBM

Authors

  • Weihao Zhou

DOI:

https://doi.org/10.61173/bzjqa936

Keywords:

XGBoost, LightGBM, diabetes predictions

Abstract

Predicting diabetes accurately and efficiently is increasingly important as its incidence rises alongside economic development. However, research that directly compares eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM) for diabetes prediction, and that then compares the results again after improving the data, remains limited. This study compares the performance of XGBoost and LightGBM in diabetes prediction and offers suggestions on which algorithm is better suited to predicting diabetes mellitus. Using a dataset collected from Kaggle, it builds a prediction model with each algorithm and compares the results. It then optimizes the data used to build the models and compares the two algorithms a second time to gain further insight. On the whole data, XGBoost reaches a test-set accuracy of 97.2% and a test-set precision of 97.2%, while LightGBM reaches 97.3% and 97.3%. On the optimized data, XGBoost reaches a test-set accuracy of 97.3% and a test-set precision of 97.4%, while LightGBM reaches 94.9% and 94.8%. In conclusion, LightGBM has a slight advantage over XGBoost on the whole data, whereas XGBoost performs better on the optimized data. Because each algorithm has advantages under different conditions, the choice of algorithm for diabetes prediction should depend on the specific circumstances.
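
The comparison above rests on two test-set metrics, accuracy and precision. The following is a minimal sketch of how such a side-by-side comparison could be computed from predicted labels. It is not the author's code: the function names, the toy labels, and the model names "model_a" and "model_b" are hypothetical, and it assumes binary labels with precision measured on the positive class (1), which the abstract does not specify.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """TP / (TP + FP) for the positive class; 0.0 if nothing is predicted positive."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0
    return tp / (tp + fp)


def compare_models(
    y_true: Sequence[int], predictions: Dict[str, Sequence[int]]
) -> Dict[str, Dict[str, float]]:
    """Compute both metrics for each model's test-set predictions."""
    return {
        name: {
            "accuracy": accuracy(y_true, y_pred),
            "precision": precision(y_true, y_pred),
        }
        for name, y_pred in predictions.items()
    }


# Toy, made-up labels for illustration only (not the Kaggle data).
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
toy_predictions = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 1],  # hypothetical output of one classifier
    "model_b": [1, 0, 1, 1, 0, 1, 1, 0],  # hypothetical output of another
}
results = compare_models(y_test, toy_predictions)
for name, metrics in results.items():
    print(name, metrics)
```

In practice, the prediction lists would come from the fitted XGBoost and LightGBM classifiers applied to the same held-out test set, so that both models are scored on identical samples.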

Published

2025-12-19

Section

Articles