Multi-factor Selection and Stock Picking Strategy Based on Random Forest and LightGBM Algorithms
DOI:
https://doi.org/10.61173/w66yzp49
Keywords:
Random Forest, LightGBM, Multi-factor Stock Selection, CSI 300 Index, Backtesting Analysis
Abstract
This study uses the Random Forest (RF) and Light Gradient Boosting Machine (LightGBM) algorithms to construct a multi-factor stock selection strategy. Focusing on the constituent stocks of the China Securities Index 300 (CSI 300), it combines fundamental and technical factors and applies machine learning methods to test factor validity and optimize factor weights. The study first constructs and preprocesses the factor system. A Random Forest model is then used for binary classification of stock returns; on the test set it achieves an accuracy of 81.7% and an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.88, demonstrating strong classification capability. Next, a LightGBM regression model is used to predict stock market capitalization, yielding a goodness-of-fit of 0.91, and the model is then used to screen for a portfolio of undervalued stocks. For strategy backtesting, four high-importance factors were weighted and combined. Over the backtesting period from August 2022 to August 2025, the strategy achieved a total return of 28.68% and an annualized return of 9.07%, with an excess return of 29.65% relative to the benchmark index, a maximum drawdown of 18.39%, and a Sharpe Ratio of 0.310. These results indicate that quantitative strategies combining multi-factor models with machine learning are effective and practical to a certain extent in the A-share market.
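To make the backtest statistics quoted in the abstract concrete, the following Python sketch shows one common way to compute total return, annualized return, maximum drawdown, and the Sharpe Ratio from a net-asset-value (NAV) series. It is illustrative only: the abstract does not state which conventions were used, so the function names, the 252-trading-day year, the zero default risk-free rate, and the sample NAV values are all assumptions rather than the study's actual implementation or data.

```python
import math
from typing import Sequence


def total_return(nav: Sequence[float]) -> float:
    """Cumulative return over the whole period."""
    return nav[-1] / nav[0] - 1.0


def annualized_return(nav: Sequence[float], periods_per_year: int = 252) -> float:
    """Geometric annualized return (assumes one NAV value per period)."""
    n_periods = len(nav) - 1
    return (nav[-1] / nav[0]) ** (periods_per_year / n_periods) - 1.0


def max_drawdown(nav: Sequence[float]) -> float:
    """Largest peak-to-trough decline, as a positive fraction."""
    peak = nav[0]
    worst = 0.0
    for value in nav:
        peak = max(peak, value)                  # running maximum so far
        worst = max(worst, 1.0 - value / peak)   # decline from that peak
    return worst


def sharpe_ratio(
    nav: Sequence[float],
    risk_free_rate: float = 0.0,   # annual rate; zero is an assumption
    periods_per_year: int = 252,   # trading days per year; an assumption
) -> float:
    """Annualized Sharpe Ratio from per-period simple returns."""
    returns = [nav[i] / nav[i - 1] - 1.0 for i in range(1, len(nav))]
    rf_per_period = risk_free_rate / periods_per_year
    excess = [r - rf_per_period for r in returns]
    mean = sum(excess) / len(excess)
    # Sample variance (n - 1 in the denominator).
    variance = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    if variance == 0.0:
        return float("nan")  # undefined when returns never vary
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)


# Hypothetical NAV path, not data from the study.
sample_nav = [1.00, 1.10, 0.99, 1.21]
metrics = {
    "total_return": total_return(sample_nav),
    "annualized_return": annualized_return(sample_nav, periods_per_year=3),
    "max_drawdown": max_drawdown(sample_nav),
    "sharpe_ratio": sharpe_ratio(sample_nav, periods_per_year=3),
}
```

With daily NAV data, the default `periods_per_year=252` would apply; the toy example passes `periods_per_year=3` only so that its three periods span exactly one "year". Other conventions, such as a non-zero risk-free rate or calendar-day annualization, would give somewhat different values.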