Machine Learning for Detecting Hate Speech in Low Resource Languages
Abstract
This work examines the role of both cross-lingual zero-shot learning and data augmentation
in detecting online hate speech in low-resource settings. When labeled data is scarce,
the proposed solutions are to train on a language with more resources or to create
synthetic data points. Cross-lingual zero-shot results suggest that some knowledge
transfer occurs. However, the results appear to depend strongly on the specific training
data set selected. This is further supported by cross-data-set experiments within a
single language, where results also fluctuated with the training data even though no
cross-lingual transfer was involved. Meanwhile, data augmentation methods show an
improvement, especially when little data is available. The work also presents a detailed
discussion of how the proposed data augmentation techniques affect the data.
Examination level
Student essay
Date
2020-07-08
Authors
Rodriguez, David
Saynova, Denitsa
Keywords
machine learning
natural language processing
BERT
cross-lingual zero-shot learning
data augmentation
hate speech
classification
Twitter
Series/report no.
CSE 20-16
Language
eng