An Empirical Study Based on Machine Learning and LSTM in Stock Prediction

Authors

  • Zhongyu Wang

DOI:

https://doi.org/10.61173/yhe54f71

Keywords:

LSTM, stock prediction, Machine Learning

Abstract

This study presents a comprehensive comparison of traditional machine learning methods and Long Short-Term Memory (LSTM) networks for stock return prediction. Using the full Kaggle market data set (the Stock Price Prediction Challenge), we build an integrated forecasting pipeline. From 45 stocks and three major indexes in this data set, we engineer an extensive feature set and apply rigorous model validation. Gradient Boosting outperforms the other traditional methods and the LSTM, achieving the best training performance: a mean squared error (MSE) of 0.000135, an R² of 0.027195, and a mean absolute percentage error (MAPE) of 155.85%. Contrary to the initial assumption, all models exhibited severe overfitting: performance dropped significantly on the validation set, which poses a major challenge for practical use. The findings indicate that, although the models do not yield practically useful, accurate return predictions from price information alone, they provide strong comparison benchmarks and methodological guidance for future studies that incorporate other data and stronger regularisation.
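As a reading aid for the three metrics quoted in the abstract, the following is a minimal sketch of how MSE, R² and MAPE are conventionally computed. It is not the author's pipeline: the function names and the five toy daily returns are hypothetical and chosen only for illustration. It may help explain how a very small MSE can coexist with a very large MAPE when the targets are returns close to zero.

```python
# Illustrative only: toy values, not data or results from the study.

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (undefined if any target is 0)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical daily returns (assumption): small magnitudes, one value near zero.
y_true = [0.010, -0.020, 0.001, 0.015, -0.005]
y_pred = [0.004, -0.006, 0.003, 0.005, -0.001]

mse_val = mse(y_true, y_pred)
r2_val = r2(y_true, y_pred)
mape_val = mape(y_true, y_pred)

print(f"MSE  = {mse_val:.6f}")
print(f"R2   = {r2_val:.4f}")
print(f"MAPE = {mape_val:.2f}%")
```

On these toy numbers the MSE is below 1e-4 while the MAPE exceeds 90%, because MAPE divides each error by a target that is itself close to zero. For this reason a MAPE above 100% on return data does not by itself contradict a small MSE.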

Published

2025-12-19

Section

Articles