Video Game Sales Success Using Random Forest Based Machine Learning Techniques
Marwa Al-Hadi *
Department of Computer Science, Faculty of Computer and IT, Sana’a University, Sana’a, Yemen.
Hiba ALMarwi
Department of Computer Science, Faculty of Computer and IT, Sana’a University, Sana’a, Yemen.
Abdulrahman Alsabri
Department of Information Systems, Faculty of Computer and IT, Sana’a University, Sana’a, Yemen.
Idrees Hajar
Department of Computer Science, Faculty of Computer and IT, Sana’a University, Sana’a, Yemen.
Ahmed Al-Kataby
Department of Computer Science, Faculty of Computer and IT, Sana’a University, Sana’a, Yemen.
Hussam Al-Maswari
Department of Computer Science, Faculty of Computer and IT, Sana’a University, Sana’a, Yemen.
Ali Almansor
Department of Computer Science, Faculty of Computer and IT, Sana’a University, Sana’a, Yemen.
Ashraf Alshujaa
Department of Computer Science, Faculty of Computer and IT, Sana’a University, Sana’a, Yemen.
Ibrahim Al-Zubaidi
Department of Computer Science, Faculty of Computer and IT, Sana’a University, Sana’a, Yemen.
Osama Dammag
Department of Computer Science, Faculty of Computer and IT, Sana’a University, Sana’a, Yemen.
Ahmed Alghawri
Department of Computer Science, Faculty of Computer and IT, Sana’a University, Sana’a, Yemen.
Abdullah Amer
Department of Computer Science, Faculty of Computer Science and Information Technology, Aden University, Aden, Yemen.
*Author to whom correspondence should be addressed.
Abstract
Background: Accurately predicting video game sales remains a challenging task due to the complex interaction of multiple factors, including game genre, platform, publisher reputation, release timing, and market competition. Traditional forecasting approaches often rely on historical averages or manual analysis, which may fail to capture nonlinear relationships within large and heterogeneous datasets.
Aims: This study aims to develop a machine learning-based model for predicting video game sales success and to evaluate its effectiveness in classifying games into high- and low-sales categories.
Study Design: An experimental study based on supervised machine learning classification techniques.
Methodology: A publicly available dataset containing video game attributes, including genre, platform, publisher, and global sales, was used. Data preprocessing comprised data cleaning, removal of irrelevant features, categorical encoding, and normalization. Class imbalance was addressed through oversampling. The chi-square test was applied for feature selection to identify the most relevant predictors. The problem was formulated as a binary classification task by defining a target variable that separates games into high- and low-sales categories. The proposed model was evaluated and compared with baseline classifiers under the same experimental conditions.
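The labelling, oversampling, and chi-square steps described in the Methodology can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The toy records, the median sales cut-off, and the helper names (`label_sales`, `random_oversample`, `chi_square`) are all hypothetical, because the abstract gives neither the threshold nor the exact oversampling variant. Here the chi-square scores are computed on the labelled rows *before* oversampling so that duplicated rows do not inflate the statistic. The abstract does not state which order was used. Library equivalents exist, such as scikit-learn's `SelectKBest` with `chi2`, but the abstract does not say which tools were used.

```python
import random
import statistics
from collections import Counter, defaultdict

def label_sales(global_sales, threshold):
    """Binary target: 1 = high sales (strictly above threshold), 0 = low sales."""
    return [1 if s > threshold else 0 for s in global_sales]

def random_oversample(rows, labels, seed=0):
    """Plain random oversampling: duplicate randomly drawn rows of each smaller
    class until every class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

def chi_square(feature_values, labels):
    """Pearson chi-square statistic of a categorical feature against the target,
    computed from the feature-by-class contingency table."""
    n = len(labels)
    observed = defaultdict(int)
    for v, y in zip(feature_values, labels):
        observed[(v, y)] += 1
    feat_tot, class_tot = Counter(feature_values), Counter(labels)
    stat = 0.0
    for v in feat_tot:
        for y in class_tot:
            expected = feat_tot[v] * class_tot[y] / n
            stat += (observed[(v, y)] - expected) ** 2 / expected
    return stat

# Hypothetical toy records: (genre, platform, global sales in millions of units).
games = [
    ("Action", "PS4", 5.2), ("Action", "PS4", 3.1), ("Sports", "Wii", 0.4),
    ("Puzzle", "DS", 0.2), ("Action", "XOne", 1.8), ("Sports", "PS4", 0.3),
    ("Puzzle", "Wii", 0.1),
]
rows = [(g, p) for g, p, _ in games]
sales = [s for _, _, s in games]

# Assumed cut-off: the median of global sales (the abstract does not specify one).
threshold = statistics.median(sales)
labels = label_sales(sales, threshold)

# Score each feature on the original labelled rows, then rank by the raw statistic.
scores = {
    name: chi_square([r[i] for r in rows], labels)
    for i, name in enumerate(["genre", "platform"])
}
ranked = sorted(scores, key=scores.get, reverse=True)

# Balance the classes for training after feature scoring.
bal_rows, bal_labels = random_oversample(rows, labels)
print(ranked, Counter(labels), Counter(bal_labels))
```

Ranking by the raw statistic is a simplification. Because `platform` has more categories than `genre`, comparing p-values, which account for degrees of freedom, would be a fairer criterion on real data.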
Results: The proposed machine learning model achieved an accuracy of 78%, with balanced precision and recall. In the comparative evaluation, it outperformed the baseline models, including Support Vector Machine and Logistic Regression, on all performance metrics, and it was particularly effective at identifying high-sales games.
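For reference, the metrics named in the Results can be computed from true and predicted labels as shown below. This is a minimal sketch in which "high sales" (label 1) is assumed to be the positive class. The ten labels are hypothetical and are not the study's predictions, so the output does not reproduce the reported 78% accuracy.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall and F1 for the positive (high-sales) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical labels for ten games (1 = high sales, 0 = low sales).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Applying the same function to each classifier's predictions on a shared test split is one way to make the comparison with the baselines like-for-like.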
Conclusion: Machine learning techniques provide an effective approach for predicting video game sales success and can support data-driven decision-making in the gaming industry. The proposed framework improves classification performance through structured preprocessing and feature selection; however, further studies incorporating additional external factors may enhance prediction accuracy.
Keywords: Machine learning, random forest, video game sales prediction, classification, data preprocessing, imbalanced data