  •   Home
  • Student essays / Studentuppsatser
  • Department of Philosophy, Linguistics and Theory of Science / Institutionen för filosofi, lingvistik och vetenskapsteori
  • Master
Exploit Unlabeled Data with Language Model for Text Classification. Comparison of four unsupervised learning models

Abstract
In a setting where Semi-Supervised Learning (SSL) can be used to exploit unlabeled data, this paper shows that a Language Model (LM) outperforms three other models in text classification: one based on Term Frequency-Inverse Document Frequency (Tf-idf) and two based on pre-trained word vectors. The experimental results show that the LM outperforms these three unsupervised learning models on both the easy task and the difficult task, where the difficult task consists of imbalanced data. To investigate not only how the LM outperforms the other models but also how to maximize its performance when only a small quantity of labeled data is available, this paper suggests two techniques for improving the performance of the LM in neural networks: (1) obtaining information from the neural network layers and (2) employing a proper evaluation for trained neural network models. Finally, this paper explores scenarios where SSL is not available and only Transfer Learning (TL) can be used to exploit unlabeled data. With two types of TL, Self-Taught Learning and Multi-Task Learning, the experimental results show that exploiting a dataset with a wider domain benefits the performance of the LM.
Degree
Student essay
URI
http://hdl.handle.net/2077/58035
Collections
  • Master
View/Open
Master's thesis in language technology (1.188Mb)
Date
2018-10-29
Author
Yang, Sung-Min
Keywords
Text classification
Semi-supervised learning
Unsupervised learning
Transfer learning
Natural Language Processing
Publication type
H2
Language
eng

DSpace software copyright © 2002-2016  DuraSpace
Contact Us | Send Feedback
Theme by 
Atmire NV
 

 
