Abstract:
Flight delays pose a significant challenge to the growth and efficiency of the US airline industry, and the problem extends to the broader global airline sector. This thesis aims to predict aircraft arrival delays using machine learning models trained on Kaggle datasets. After reorganizing and cleaning the data, we enriched the dataset with weather data from the arrival airport at the time of arrival in order to improve predictive accuracy. We compare four machine learning methods (Random Forest, CatBoost, Gradient Boosting, and AdaBoost), using GridSearchCV to test numerous hyperparameter combinations for each. Random Forest emerged as the most promising method, with 83% accuracy and an F1-score of 0.56. Gradient Boosting, CatBoost, and AdaBoost achieved accuracies of 81.1%, 81%, and 72.6% and F1-scores of 0.47, 0.46, and 0.45, respectively. However, the lack of access to in-flight traffic data, en route weather at different flight levels, and aircraft-specific details limits the models, and better data acquisition is needed to support real-time arrival predictions and further refine the models. Because aircraft delays, which are primarily influenced by air traffic and weather, continue to present challenges, specialized forecasting models and the integration of advanced technologies are imperative for improving air travel efficiency.
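
The gap between the reported accuracies and F1-scores is typical when one class is much rarer than the other. The following minimal sketch illustrates how the two metrics are computed from a binary confusion matrix. It assumes that delay prediction is framed as binary classification with "delayed" as the positive class; the function name `binary_metrics` and the counts are hypothetical and chosen only for illustration; they are not the thesis data or results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are counts for the positive class (assumed here: "delayed").
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for 1,000 flights, 200 of which are delayed
# (illustrative only, NOT taken from the thesis).
m = binary_metrics(tp=100, fp=50, fn=100, tn=750)
print({name: round(value, 3) for name, value in m.items()})
```

Because on-time flights dominate in this illustration, correct on-time predictions raise accuracy, while F1 depends only on how well the delayed class is recovered. This is why both metrics are worth reporting side by side.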