Machine Learning for Detecting Hate Speech in Low Resource Languages
| dc.contributor.author | Rodriguez, David | |
| dc.contributor.author | Saynova, Denitsa | |
| dc.contributor.department | Göteborgs universitet/Institutionen för data- och informationsteknik | swe | 
| dc.contributor.department | University of Gothenburg/Department of Computer Science and Engineering | eng | 
| dc.date.accessioned | 2020-07-08T11:39:08Z | |
| dc.date.available | 2020-07-08T11:39:08Z | |
| dc.date.issued | 2020-07-08 | |
| dc.description.abstract | This work examines the role of cross-lingual zero-shot learning and data augmentation in detecting online hate speech in low-resource settings. When labeled data is scarce, the two proposed solutions are to train on a language with more resources or to create synthetic data points. Cross-lingual zero-shot results suggest that some knowledge transfer occurs, but the results appear to depend heavily on the specific training data set selected. Cross-data-set experiments within a single language support this: results also fluctuated with the training data even when no cross-lingual transfer was involved. Data augmentation methods, meanwhile, show an improvement, especially when little data is available. The work also presents a detailed discussion of how the proposed data augmentation techniques affect the data. | sv | 
| dc.identifier.uri | http://hdl.handle.net/2077/65590 | |
| dc.language.iso | eng | sv | 
| dc.relation.ispartofseries | CSE 20-16 | sv | 
| dc.setspec.uppsok | Technology | |
| dc.subject | machine learning | sv | 
| dc.subject | natural language processing | sv | 
| dc.subject | BERT | sv | 
| dc.subject | cross-lingual zero-shot learning | sv | 
| dc.subject | data augmentation | sv | 
| dc.subject | hate speech | sv | 
| dc.subject | classification | sv | 
| dc.title | Machine Learning for Detecting Hate Speech in Low Resource Languages | sv | 
| dc.title.alternative | Machine Learning for Detecting Hate Speech in Low Resource Languages | sv | 
| dc.type | text | |
| dc.type.degree | Student essay | |
| dc.type.uppsok | H2 | |
Files
Original bundle
- Name: gupea_2077_65590_1.pdf
- Size: 6.4 MB
- Format: Adobe Portable Document Format
- Description: Master thesis