Multi-factor Selection and Stock Picking Strategy Based on Random Forest and LightGBM Algorithms

Authors

  • Pengjie Yuan

DOI:

https://doi.org/10.61173/w66yzp49

Keywords:

Random Forest, LightGBM, Multi-factor Stock Selection, CSI 300 Index, Backtesting Analysis

Abstract

This study uses the Random Forest (RF) and Light Gradient Boosting Machine (LightGBM) algorithms to construct a multi-factor stock selection strategy. Focusing on the constituent stocks of the China Securities Index 300 (CSI 300), it combines fundamental and technical factors and applies machine learning methods to test factor validity and optimize factor weights. The study first constructs and preprocesses the factor system. A Random Forest model then performs binary classification of stock returns, achieving an accuracy of 81.7% and an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.88 on the test set, which indicates strong classification capability. Next, a LightGBM regression model predicts stock market capitalization with a goodness-of-fit of 0.91, and the model is used to screen for a portfolio of undervalued stocks. For strategy backtesting, four high-importance factors were weighted and combined into a composite factor. Over the backtesting period from August 2022 to August 2025, the strategy achieved a total return of 28.68% and an annualized return of 9.07%, an excess return of 29.65% relative to the benchmark index. The maximum drawdown was 18.39% and the Sharpe Ratio was 0.310. The results indicate that quantitative strategies combining multi-factor models with machine learning can be effective and practical in the A-share market.
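The backtest statistics quoted in the abstract (total return, annualized return, maximum drawdown, Sharpe Ratio) can be illustrated with a minimal sketch. Everything below is an assumption for illustration and not the author's implementation: the function names, the 252-period year, the zero default risk-free rate, and the tiny net-asset-value series `nav` are hypothetical, and the abstract does not state which conventions the study used.

```python
import math

def total_return(nav):
    """Cumulative return over the whole series of net asset values."""
    return nav[-1] / nav[0] - 1.0

def annualized_return(nav, periods_per_year=252):
    """Geometric annualized return; 252 periods per year is an assumed convention."""
    n_periods = len(nav) - 1
    return (nav[-1] / nav[0]) ** (periods_per_year / n_periods) - 1.0

def max_drawdown(nav):
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak = nav[0]
    worst = 0.0
    for value in nav:
        peak = max(peak, value)
        worst = max(worst, 1.0 - value / peak)
    return worst

def sharpe_ratio(nav, risk_free_annual=0.0, periods_per_year=252):
    """Annualized Sharpe Ratio from per-period simple returns (sample std. dev.)."""
    returns = [b / a - 1.0 for a, b in zip(nav, nav[1:])]
    rf_per_period = risk_free_annual / periods_per_year
    excess = [r - rf_per_period for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)

# Hypothetical net asset values, not data from the study.
nav = [1.00, 1.10, 0.99, 1.21]
print(f"total return:  {total_return(nav):.2%}")
print(f"max drawdown:  {max_drawdown(nav):.2%}")
print(f"Sharpe Ratio:  {sharpe_ratio(nav):.3f}")
```

Reported figures depend on such conventions (trading days per year, risk-free rate, arithmetic versus geometric excess return), so a sketch like this would not necessarily reproduce the numbers in the abstract even with the original data.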

Published

2025-12-19

Section

Articles