E-Commerce Customer Churn Prediction and Key Determinant Investigation Based on Machine Learning Algorithms
DOI: https://doi.org/10.61173/caqwqt36
Keywords: Customer churn prediction, XGBoost, E-commerce, Class imbalance, Feature importance
Abstract
E-commerce customer data are typically high-dimensional, class-imbalanced, and contain a large number of missing values. To address these challenges, this paper constructs a high-accuracy, interpretable customer churn prediction model intended to strengthen the customer retention capabilities of e-commerce enterprises. The model is built on the XGBoost algorithm: missing values are handled by median imputation, class imbalance is alleviated with oversampling, and multi-dimensional feature selection is used to improve interpretability. A comparison of three models (logistic regression, decision tree, and XGBoost) shows that XGBoost substantially outperforms the other two even without tuning. After hyperparameter optimization, the model achieves an accuracy of 98.49% and an area under the curve (AUC) of 0.99 on the test set, indicating strong generalization ability. Feature importance analysis indicates that “customer tenure” and “whether the customer has complained” are the core factors influencing churn. The study thus provides a robust and interpretable solution for early warning of e-commerce customer churn, with strong practical value.
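The two preprocessing steps named in the abstract, median imputation and oversampling, can be illustrated with a minimal standard-library sketch. The helper names `impute_median` and `random_oversample`, the toy columns, and the choice of plain random oversampling are assumptions made for illustration; the abstract does not state which oversampling technique the study used.

```python
import random
from statistics import median


def impute_median(rows):
    """Replace None entries in each column with the median of that column's observed values."""
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        medians.append(median(observed))
    return [
        [medians[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are equally frequent."""
    rng = random.Random(seed)
    by_class = {}
    for r, y in zip(rows, labels):
        by_class.setdefault(y, []).append(r)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for r in members + extra:
            out_rows.append(r)
            out_labels.append(y)
    return out_rows, out_labels


# Hypothetical toy data: columns are (tenure_months, complained); label 1 = churned.
X = [[1, 1], [None, 0], [24, 0], [36, None], [12, 0], [2, 1]]
y = [1, 0, 0, 0, 0, 1]

X_imputed = impute_median(X)                     # fill missing values column by column
X_bal, y_bal = random_oversample(X_imputed, y)   # balance churned vs. retained rows
```

In a real pipeline, the medians would normally be computed and the oversampling applied on the training split only, so that the test-set metrics are not inflated by leakage.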