Online detekce anomálií v časových řadách

Pajurek Tomáš

Online Anomaly Detection in Time-Series

Document type

master's thesis

Author

Pajurek Tomáš

Supervisor

Borovička Tomáš

Reviewer

Vašata Daniel

Field of study

Knowledge Engineering

Study programme

Informatics

Degree-granting institution

Department of Applied Mathematics

Rights

A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one's own expense. Use of the thesis must comply with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and citation ethics http://knihovny.cvut.cz/vychova/vskp.html

Abstract

Methods for online anomaly detection are designed to reveal anomalies in a continuous stream of data rather than in a static dataset. These methods can adapt to changes in the underlying characteristics of the stream that may occur over time (concept drift). This thesis reviews four methods suitable for online anomaly detection in time series (moving average, local outlier factor, isolation forest, hierarchical temporal memory) and several concept drift detection methods, including some novel approaches. A general framework is proposed that allows various anomaly detection methods and concept drift detection methods to be combined orthogonally. Experiments were run for all reviewed methods on five real-world datasets and one artificial dataset. The experiments examined the properties of the individual methods and compared their performance against one another. The results show that no method is superior to the others on all datasets in terms of the F1 score adapted for anomaly detection (harmonic mean of recall and false positive rate) and AUC. In the majority of cases, an optimal method setting with an F1 score above 85% and AUC above 90% was found.
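To illustrate the simplest of the reviewed method families, a moving-average detector for a data stream might be sketched as follows. This is a minimal sketch and not the implementation from the thesis: the names `MovingAverageDetector` and `flag_anomalies` and the parameters `window`, `threshold`, and `min_points` are assumptions chosen for illustration, and no concept drift detection is included.

```python
from collections import deque
from math import sqrt


class MovingAverageDetector:
    """Flags a point when it lies far from the mean of a sliding window.

    Hypothetical illustration only; parameter names and defaults are assumptions.
    """

    def __init__(self, window=20, threshold=3.0, min_points=5):
        self.values = deque(maxlen=window)  # most recent points, oldest dropped first
        self.threshold = threshold          # allowed distance in standard deviations
        self.min_points = min_points        # warm-up: no scoring before this many points

    def update(self, x):
        """Score x against the current window, then add x to the window."""
        is_anomaly = False
        if len(self.values) >= self.min_points:
            n = len(self.values)
            mean = sum(self.values) / n
            std = sqrt(sum((v - mean) ** 2 for v in self.values) / n)
            # With zero spread there is no scale to compare against, so skip scoring.
            if std > 0:
                is_anomaly = abs(x - mean) > self.threshold * std
        self.values.append(x)
        return is_anomaly


def flag_anomalies(stream, window=20, threshold=3.0, min_points=5):
    """Process the stream one point at a time and return indices of flagged points."""
    detector = MovingAverageDetector(window, threshold, min_points)
    return [i for i, x in enumerate(stream) if detector.update(x)]
```

Because every point is appended to the window after scoring, the window statistics follow gradual changes in the stream. A flagged point also enters the window and temporarily inflates the standard deviation, which is one reason a separate concept drift detector can be useful alongside such a method.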