Abstract:
This study analyzes the Vietnamese stock market using statistical and machine learning models. In the dataset, all features have a positive linear relationship with the VN Index but differ in scale and degree of skewness. The Augmented Dickey-Fuller (ADF) unit root test was used to determine whether each variable was stationary, and most variables became stationary after first differencing. A short-run model was estimated by ordinary least squares (OLS), in which only three variables were statistically significant: CPI, the exchange rate, and the S&P 500 index. The ARDL bounds test indicated a long-run relationship among the variables under consideration, with only three variables statistically significant: CPI, GDP, and the S&P 500 index. Decision tree, random forest, and XGBoost models were then used to study the short-run and long-run relationships. The random forest model performed best in the short run, while the XGBoost model performed best in the long run. The three variables that were significant in the OLS model were also the three most influential variables in the random forest model, and the three that were significant in the ARDL model were the three most influential in the XGBoost model.