Predicting hotel cancellations using machine learning
Abstract
Room cancellations are a major challenge for the hotel industry, since the number of guests affects the whole operational setup. The purpose of the thesis is to predict hotel cancellations using machine learning and to analyse which factors have the most influence. Broadly speaking, machine learning can be summarized as an interdisciplinary science that uses computers to solve a given problem by finding patterns in, and learning from, existing data. It draws on theory from, among other fields, probability, statistics, optimization, algorithms and computer science. Predicting cancellations is a binary classification problem, as the two possible outcomes are cancellation and non-cancellation. Classification in statistics is the process of determining which class a given input belongs to, in other words predicting a qualitative outcome variable. Data was provided by a hotel in the Gothenburg area, and the machine learning algorithms used in the thesis were Random Forest, XGBoost and Logit. Random Forest and XGBoost are tree-based models, which create decision trees in order to make predictions; in a classification problem these are referred to as classification trees. The aim of a classification tree is to determine a qualitative outcome variable by making step-wise binary splits, where the different outcomes are denoted as classes. The logit model, or logistic regression, is a form of binary regression and is used as a reference model in this thesis. Our main findings indicate that Random Forest is the best-performing model on the hotel data, with an accuracy close to 80%. Lead time, a numeric variable that represents the number of days between the date the reservation was made and the day of arrival, was the most influential variable in the Random Forest model. Adding weather data marginally improved the accuracy of predicting hotel cancellations for all models.
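To make the idea of a step-wise binary split concrete, the following is a minimal sketch of how a classification tree might choose a single split on lead time. It is an illustration only, not the thesis's implementation: the Gini impurity criterion is an assumption (the abstract does not say which criterion was used), the function names `gini` and `best_split` are hypothetical, and the eight bookings are made-up toy data rather than the hotel's data.

```python
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 class labels (0.0 means a pure node)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)  # share of class 1 (cancelled)
    return 2.0 * p * (1.0 - p)


def best_split(values: List[float], labels: List[int]) -> Tuple[Optional[float], float]:
    """Find the threshold t for the split 'value <= t' with the lowest
    size-weighted Gini impurity. Returns (threshold, weighted_impurity);
    the threshold is None if there are fewer than two distinct values."""
    n = len(values)
    best_threshold, best_score = None, float("inf")
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Made-up toy data (assumption): lead time in days, and whether the
# booking was cancelled (1) or not (0).
lead_time = [2, 5, 9, 14, 30, 45, 60, 90]
cancelled = [0, 0, 0, 0, 1, 1, 0, 1]

threshold, impurity = best_split(lead_time, cancelled)
print(f"split at lead time <= {threshold} days, weighted Gini = {impurity:.4f}")
```

On this toy data the chosen split separates short lead times, none of which were cancelled, from long ones. A full classification tree would repeat the same search within each resulting group, and tree ensembles such as Random Forest and XGBoost combine many such trees.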
Degree
Student essay
Date
2022-02-18
Author
Gartvall, Enok
Skånhagen, Oscar
Series/Report no.
202202:181
Uppsats
Language
eng