Compressed Machine Learning on Time Series Data

Finger, Felix; Gocht, Nathalie

Efficient compression through clustering using candidate selection and the application of machine learning on compressed data

Abstract

The abundance of time-related data across many fields has led to substantial interest in the analysis of time series. This interest meets growing challenges in storing and processing data: while data is collected at an exponentially growing rate, advancements in processing units are slowing down. Researchers are therefore actively seeking more efficient means of storing and processing data. This can be especially difficult for time series due to their various shapes and scales. In this thesis, we present two variants for optimising a Greedy Clustering algorithm used for lossy time series compression. This study investigates whether the efficient but lossy compression preserves the characteristics of the time series sufficiently to allow time series prediction and anomaly detection. The two variants for performance optimisation, Greedy SF and Greedy SAX, are based on novel lookup methods for cluster candidate selection that use statistical features of time series and extracted SAX substrings. Furthermore, we extended the clustering to process time series with different value ranges, which allows the compression of time series with various scales. To validate the end-to-end pipeline, including compression and prediction, we carry out a performance evaluation. To further analyse applicability, we benchmark it comprehensively against a pipeline that uses an autoencoder for compression and a stacked LSTM for prediction.
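
For readers unfamiliar with SAX (Symbolic Aggregate approXimation), the following is a minimal sketch of the standard discretisation step that turns a numeric series into a short string. It is not the Greedy SAX lookup method of the thesis, whose details are not given in this abstract. The function name `sax_word` and its parameters `n_segments` and `alphabet_size` are assumptions chosen for illustration.

```python
from bisect import bisect_left
from statistics import NormalDist, mean, pstdev


def sax_word(series, n_segments, alphabet_size):
    """Return a SAX word for a numeric sequence (illustrative sketch only)."""
    if len(series) % n_segments != 0:
        raise ValueError("len(series) must be divisible by n_segments")

    # 1. z-normalise, so that series with different value ranges
    #    are mapped onto a common scale.
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]

    # 2. Piecewise Aggregate Approximation: average equal-length segments.
    seg = len(z) // n_segments
    paa = [mean(z[i * seg:(i + 1) * seg]) for i in range(n_segments)]

    # 3. Breakpoints that split the standard normal distribution into
    #    alphabet_size equiprobable regions.
    cuts = [NormalDist().inv_cdf(k / alphabet_size)
            for k in range(1, alphabet_size)]

    # 4. Map each segment mean to the letter of the region it falls in.
    return "".join(chr(ord("a") + bisect_left(cuts, v)) for v in paa)


ramp = [1, 2, 3, 4, 5, 6, 7, 8]
print(sax_word(ramp, n_segments=4, alphabet_size=4))  # abcd
# A shifted and rescaled copy of the same shape yields the same word.
print(sax_word([10 * x + 100 for x in ramp], n_segments=4, alphabet_size=4))  # abcd
```

Because of the z-normalisation in step 1, two series with the same shape but different scales receive the same word, which is one plausible reason such strings are useful as lookup keys when selecting cluster candidates.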

Degree

Student essay

Date

2020-07-08

Author

Finger, Felix

Gocht, Nathalie

Keywords

time series clustering

large scale data

machine learning

prediction

anomaly detection

compression

Series/Report no.

CSE 20-13

Language

eng
