Text analysis for email multi label classification
Abstract
This master’s thesis studies a multi-label text classification task on a small data
set of short bilingual texts (emails in English and Swedish). The data set comprises
5800 emails distributed among 107 classes, with the special case that the majority
of the emails contain both languages at once. To handle this task, several models
were employed: Support Vector Machines (SVM), Gated Recurrent Units (GRU),
Convolutional Neural Networks (CNN), Quasi-Recurrent Neural Networks (QRNN), and
Transformers. The experiments demonstrate that, in terms of weighted-average F1
score, the SVM outperforms the other models with a score of 0.96, followed by the
CNN with 0.89 and the QRNN with 0.80.
Degree
Student essay
Date
2020-07-08
Author
Paniskaki, Kyriaki
Harsha Kadam, Sanjit
Keywords
natural language processing
machine learning
multi label text classification
deep neural networks
bilingual texts
emails
short texts
Series/Report no.
CSE 20-14
Language
eng