Horse Racing Outcome Prediction — Deep Neural Network vs Random Forest

Course STAT4012 – Statistical Machine Learning

Completed on May 07, 2021

Project type Group project

This group project analyzes Hong Kong horse racing data and compares several machine learning methods for predicting race outcomes. The work combines data preparation, feature engineering, model comparison, and result interpretation.

Highlights

Worked with a dataset containing more than 130,000 Hong Kong horse racing records.
Compared deep learning and classical machine learning models on the same prediction task.
Examined which features contributed most strongly to predictive performance.

Methods

Built a dataset combining horse demographics, race characteristics, jockey and trainer information, track conditions, and sectional time variables.
Evaluated Deep Neural Network, Random Forest, XGBoost, and Logistic Regression models.
Compared models on out-of-sample accuracy and on the feature importance patterns each model produced.
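
The summary does not say how predictive performance was scored, so the following is only a minimal sketch of one plausible race-level metric: for each held-out race, take the runner with the highest predicted win probability and check whether it won. The function name `top_pick_accuracy`, the model labels, and the toy predictions are all hypothetical and are not taken from the project.

```python
from collections import defaultdict


def top_pick_accuracy(rows):
    """Share of races in which the highest-probability runner won.

    rows: iterable of (race_id, predicted_win_prob, won) tuples,
    one tuple per runner in a held-out race (assumed layout).
    """
    races = defaultdict(list)
    for race_id, prob, won in rows:
        races[race_id].append((prob, won))

    hits = 0
    for runners in races.values():
        # The model's pick is the runner with the largest predicted probability.
        _, picked_runner_won = max(runners, key=lambda r: r[0])
        hits += int(picked_runner_won)
    return hits / len(races)


# Toy held-out predictions for two hypothetical models (invented numbers).
predictions = {
    "model_a": [
        (1, 0.6, True), (1, 0.4, False),
        (2, 0.3, False), (2, 0.7, True),
        (3, 0.8, False), (3, 0.2, True),
    ],
    "model_b": [
        (1, 0.9, True), (1, 0.1, False),
        (2, 0.6, False), (2, 0.4, True),
        (3, 0.7, False), (3, 0.3, True),
    ],
}

# Score every model on the same races so the numbers are comparable.
scores = {name: top_pick_accuracy(rows) for name, rows in predictions.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Scoring every model on the same held-out races is what makes the comparison fair. A per-runner accuracy would be a reasonable alternative, but in this setting it can be dominated by the many non-winners in each race.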

Findings

Random Forest achieved the strongest out-of-sample accuracy among the main models tested.
Deep Neural Networks were competitive but required more tuning and model adjustment.
Odds-related and horse-form variables contributed substantially to predictive performance.
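
The summary does not name the importance measure behind this finding. As an illustration only, permutation importance is one model-agnostic way to estimate it: shuffle one feature column at a time and record how much accuracy drops. In the sketch below, the feature names (`win_odds`, `draw`), the threshold rule standing in for a fitted model, and the data are all hypothetical.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows where the prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical features per runner: [win_odds, draw].
feature_names = ["win_odds", "draw"]
X = [[2.5, 1], [3.0, 7], [4.2, 3], [1.8, 10],
     [12.0, 2], [25.0, 5], [8.5, 9], [40.0, 4]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = placed, 0 = did not place (toy labels)


def toy_model(row):
    # Stand-in for a fitted model: short-priced runners are predicted to place.
    return 1 if row[0] < 5.0 else 0


importances = permutation_importance(toy_model, X, y)
for name, value in zip(feature_names, importances):
    print(f"{name}: {value:.3f}")
```

Because the toy rule ignores `draw`, shuffling that column never changes a prediction and its importance is exactly zero, while shuffling `win_odds` lowers accuracy. With real models the measure should be computed on held-out data, and correlated features can share or mask each other's importance.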

Resources

GitHub Repository