  • Student essays / Studentuppsatser
  • Department of Computer Science and Engineering / Institutionen för data- och informationsteknik
  • Master's theses

Detection of software incidents from large log material with the use of unsupervised machine learning

Abstract
Computer systems generate log files that record the operations they perform. This information supports error and failure detection and debugging, so anomalies in a system can be spotted through the logs it produces. Anomaly detection can be treated as a binary classification of log files into two classes: anomalous and non-anomalous. Because of the sheer volume of data and the complexity of the task, manual inspection is not feasible, which creates the need for automation. Centiro, a Swedish software company, has decided to follow a machine learning approach to automate software incident detection. In this thesis, we apply four machine learning models to detect anomalies: Local Outlier Factor (LOF), Isolation Forest (IF), Principal Component Analysis (PCA), and an LSTM-Autoencoder. We use four publicly available datasets as well as a dataset gathered from the logs of the company's computer systems. Preprocessing the data and selecting appropriate features had to be performed carefully for the models to be implemented successfully. Precision, Recall, and F-Score were used as evaluation metrics to measure the performance of the models on the different datasets. The LSTM-Autoencoder showed the best and most stable overall performance on the publicly available datasets, so we applied it to the company's data to detect possible software incidents.
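The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes that label 1 marks an anomalous log sequence and label 0 a non-anomalous one, and that "F-Score" means the balanced F1 score; the abstract confirms neither. The function name and the example labels are hypothetical and are not taken from the thesis.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return precision, recall and F1 for the anomalous class (label 1).

    Assumption: F-Score is taken to be F1 (beta = 1).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed anomalies

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical ground truth and detector output for eight log sequences.
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because anomalies are usually rare in log data, plain accuracy would be dominated by the non-anomalous class, which is one common reason to report precision and recall on the anomalous class instead.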
Examination level
Student essay
URL:
https://hdl.handle.net/2077/72192
Collections
  • Master's theses
File(s)
CSE 22-08 Anastasiadis Lenart.pdf (3.997Mb)
Date
2022-06-20
Authors
ANASTASIADIS, DIMITRIOS
LENART, JAKUB
Keywords
binary classification
log
anomaly detection
machine learning
Local Outlier Factor
Isolation Forest
PCA
LSTM-Autoencoder
Language
eng

DSpace software copyright © 2002-2016  DuraSpace
gup@ub.gu.se | Technical support
Theme by 
Atmire NV
 

 
