Modelling rare events using non-parametric machine learning classifiers - Under what circumstances are support vector machines preferable to conventional parametric classifiers?
Modelling "rare events" using machine learning methods: under what circumstances is it more appropriate to apply SVMs than conventional classification methods?
Abstract
Rare event modelling is an important topic in quantitative social science research. However, although traditional classifiers based on generalized linear models (GLMs), such as logistic regression, may produce biased results when events are rare, the social science community has devoted little attention to methodological studies aimed at alleviating this bias, and even fewer studies have considered using machine learning methods to tackle the analytical problems posed by rare events.
In this thesis, I compared the classification performance of support vector machines (SVMs), a family of machine learning classification algorithms, with that of GLMs in the presence of imbalanced classes and rare events. The results show that standard SVMs classify no better than traditional GLMs. Moreover, standard SVMs tend to have low sensitivity (i.e. they identify few of the actual events), which makes them unsuitable for rare event modelling. Although cost-sensitive SVMs can identify more rare events, they tend to overfit as events become rarer. Finally, an empirical analysis of the Militarized Interstate Dispute (MID) data suggests that the probabilistic outputs produced by Platt scaling are unreliable. For these reasons, the results of this study do not support a wider application of SVMs in rare event modelling.
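The abstract's three technical terms (sensitivity, cost-sensitive SVMs and Platt scaling) can be illustrated with a small pure-Python sketch. It is not the code used in the thesis: the simulated data, the linear kernel, the subgradient solver, the "balanced" costs (inversely proportional to class frequency) and all function names (`sensitivity`, `train_linear_svm`, `platt_scaling`, etc.) are hypothetical choices made for illustration. Sensitivity is TP / (TP + FN); a cost-sensitive SVM multiplies each observation's hinge loss by a class-specific cost, so errors on the rare class weigh more; Platt scaling maps an SVM decision value f to P(y = 1 | f) = 1 / (1 + exp(A·f + B)), with A and B fitted by maximum likelihood. For brevity, everything is fitted and evaluated on the same simulated sample, whereas Platt's procedure is normally applied to held-out or cross-validated scores.

```python
import math
import random


def sensitivity(y_true, y_pred):
    """Share of actual events (label 1) flagged as events: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def train_linear_svm(X, y, c_pos=1.0, c_neg=1.0, lam=0.01, epochs=200, lr=0.05):
    """Hypothetical linear SVM fitted by full-batch subgradient descent on
    lam/2 * ||w||^2 + (1/n) * sum_i c_{y_i} * max(0, 1 - s_i * (w.x_i + b)), s_i = +1/-1.
    c_pos == c_neg gives a standard SVM; c_pos > c_neg gives a cost-sensitive SVM."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            s, c = (1, c_pos) if yi == 1 else (-1, c_neg)
            if s * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:  # hinge term active
                for j in range(d):
                    grad_w[j] -= c * s * xi[j] / n
                grad_b -= c * s / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision_scores(X, w, b):
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]


def _sigmoid(z):
    # Numerically stable logistic function (never overflows math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def platt_scaling(scores, y, epochs=500, lr=0.5):
    """Fit P(y=1 | f) = 1 / (1 + exp(A*f + B)) by gradient descent on the log loss,
    using Platt's smoothed targets; returns a score -> probability function."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    t_pos, t_neg = (n_pos + 1) / (n_pos + 2), 1 / (n_neg + 2)
    targets = [t_pos if yi == 1 else t_neg for yi in y]
    a, b = 0.0, math.log((n_neg + 1) / (n_pos + 1))  # start at the base rate
    for _ in range(epochs):
        # d(log loss)/d(A*f + B) = target - predicted probability
        resid = [t - _sigmoid(-(a * f + b)) for f, t in zip(scores, targets)]
        a -= lr * sum(r * f for r, f in zip(resid, scores)) / len(scores)
        b -= lr * sum(resid) / len(scores)
    return lambda f: _sigmoid(-(a * f + b))


rng = random.Random(0)
# Hypothetical data: 950 non-events around (0, 0), 50 rare events around (1.5, 1.5).
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(950)]
X += [[rng.gauss(1.5, 1), rng.gauss(1.5, 1)] for _ in range(50)]
y = [0] * 950 + [1] * 50

# Standard SVM (equal costs) vs cost-sensitive SVM (costs n / (2 * n_class)).
w_std, b_std = train_linear_svm(X, y)
w_cs, b_cs = train_linear_svm(X, y, c_pos=len(y) / (2 * 50), c_neg=len(y) / (2 * 950))
pred_std = [1 if f >= 0 else 0 for f in decision_scores(X, w_std, b_std)]
pred_cs = [1 if f >= 0 else 0 for f in decision_scores(X, w_cs, b_cs)]
sens_std, sens_cs = sensitivity(y, pred_std), sensitivity(y, pred_cs)
print("standard SVM:       accuracy", accuracy(y, pred_std), "sensitivity", sens_std)
print("cost-sensitive SVM: accuracy", accuracy(y, pred_cs), "sensitivity", sens_cs)

# Platt scaling turns the cost-sensitive SVM's decision values into probabilities.
scores_cs = decision_scores(X, w_cs, b_cs)
to_prob = platt_scaling(scores_cs, y)
probs = [to_prob(f) for f in scores_cs]
```

With this setup one would expect the cost-sensitive model to flag considerably more of the 50 simulated events than the standard one, at the price of more false positives. The exact figures depend on the seed and solver settings, and in-sample numbers say nothing about the overfitting the abstract reports as events become rarer.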
Degree
Student essay
Date
2021-04-06
Author
Ma, Lukas
Series/Report no.
202104:61
Uppsats
Language
eng