Predicting hotel cancellations using machine learning

Abstract
Room cancellations are a major challenge for the hotel industry, since the number of guests affects the whole operational setup. The purpose of the thesis is to predict hotel cancellations using machine learning and to analyse which factors have the most influence. Broadly speaking, machine learning can be summarized as an interdisciplinary science of using computers to solve a given problem by finding patterns in, and learning from, existing data. Machine learning draws on theory from, among others, probability, statistics, optimization, algorithms and computer science. The problem of predicting cancellations is a binary classification problem, as the two possible outcomes are cancellation or non-cancellation. Classification in statistics is the process of determining which class a given input belongs to, in other words predicting a qualitative outcome variable. Data was provided by a hotel in the Gothenburg area, and the machine learning algorithms used in the thesis were Random Forest, XGBoost and Logit. Random Forest and XGBoost are tree-based models, which create decision trees in order to make predictions; in a classification problem these are referred to as classification trees. The aim of a classification tree is to determine a qualitative outcome variable by making step-wise binary splits, where the different outcomes are denoted as classes. The logit model, or logistic regression, is a form of binary regression which is used as a reference model in this thesis. Our main findings indicate that Random Forest is the best performing model on the hotel data, with an accuracy close to 80%. Leadtime, a numeric variable that represents the number of days between the day the hotel reservation was made and the day of arrival, was the most influential variable in the Random Forest model. Adding weather data marginally improved the accuracy of predicting hotel cancellations for all models.
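
The step-wise binary splits described in the abstract can be illustrated with a minimal sketch of how a classification tree might choose a single split on a lead-time variable. Everything here is an assumption for illustration: the toy data, the function names `gini` and `best_split`, and the use of Gini impurity as the split criterion. The record does not state which criterion or implementation the thesis used.

```python
# Illustrative sketch only: toy data, not the thesis data or the authors' code.

def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = cancelled)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best split `value <= threshold`."""
    best = (None, float("inf"))
    candidates = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        # Impurity of the two child nodes, weighted by their sizes.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical lead times (days before arrival) and cancellation labels.
lead_time = [1, 3, 5, 8, 30, 45, 60, 90]
cancelled = [0, 0, 0, 0, 1, 1, 0, 1]

threshold, impurity = best_split(lead_time, cancelled)
print(f"split at lead_time <= {threshold}, weighted Gini = {impurity}")
```

A full tree would repeat this search within each resulting subset, and ensemble methods such as Random Forest combine many such trees; this sketch covers only one split on one variable.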
Degree level
Student essay
URL:
http://hdl.handle.net/2077/70742
Collections
  • Kandidatuppsatser / Institutionen för nationalekonomi och statistik
File(s)
Thesis frame (666.6Kb)
Date
2022-02-18
Authors
Gartvall, Enok
Skånhagen, Oscar
Series/report no.
202202:181
Uppsats
Language
eng

DSpace software copyright © 2002-2016 DuraSpace
gup@ub.gu.se | Technical support
Theme by Atmire NV