Klasifikace na temporálních relačních datech

Mück Petr

Temporal Relational Classification

dc.contributor.advisor	Motl Jan
dc.contributor.author	Mück Petr
dc.date.accessioned	2018-06-08T08:03:09Z
dc.date.available	2018-06-08T08:03:09Z
dc.date.issued	2018-06-08
dc.identifier	KOS-784265590305
dc.identifier.uri	http://hdl.handle.net/10467/76368
dc.description.abstract	Tato práce se zabývá možnostmi klasifikace temporálních dat. V práci implementuji agregační model, který je schopen pracovat s relačními daty, jejichž záznamy jsou pro určitou entitu ve vztahu n:1 pro daný čas predikce třídy, a pomocí agregačních funkcí (průměr, minimum a maximum) agreguje hodnoty atributů do jednoho záznamu pro každou entitu. Dále se v práci zabývám možnostmi optimalizace délky historie použité v agregaci pro zlepšení kvality predikce, protože nedávná data mohou být relevantnější než ta starší. Závislost mezi agregovanými atributy zdrojových dat v určité délce historie a cílovou třídou v čase poté hodnotím pomocí měr Chi2, vzájemné informace a Cohenova koeficientu kappa po aplikaci klasifikátoru Gaussovský naivní Bayes. Nejlepší dosažené hodnoty kappa poté tam, kde je to možné, porovnávám s již existujícími klasifikačními algoritmy pro časové řady: se skrytým Markovovým modelem a s algoritmem ARIMA. Nejlepší zjištěné délky historie jsou nakonec použity v klasifikačním algoritmu náhodný les a je zjištěn jejich vliv na úspěšnost klasifikace. Provedeným výzkumem jsem zjistil, že klasifikace s optimalizovanou délkou historie dosahuje na šesti z deseti testovaných datasetů v průměru o 33.57% vyšších hodnot kappa než klasifikace s agregací přes celou délku historie. U zbylých čtyř testovaných datasetů nedochází k žádné výrazné změně. Agregační model dosahoval v porovnání s algoritmy ARIMA a skrytý Markovův model lepších výsledků, testy ale nebyly příliš rozsáhlé, protože většina datasetů použitých v práci neobsahuje pro jednu entitu více historických bodů ke klasifikaci, a není tedy příliš vhodná pro standardní algoritmy časových řad. Závěrem práce tedy je, že agregační model ve většině případů dosahuje lepších výsledků s optimalizovanou délkou historie než s celou historií.	cze
dc.description.abstract	This thesis explores options for classifying temporal data. I implement an aggregation model that works with relational data in which the records of a given entity are in an n:1 relationship with the class predicted at a given prediction time; using the aggregation functions average, minimum and maximum, the model aggregates the attribute values into a single record per entity. The thesis further examines how to optimize the length of history used in the aggregation to improve prediction quality, because recent data may be more relevant than older data. I then evaluate the dependency between the attribute values aggregated over a given history length and the target class over time using Chi2, mutual information, and Cohen's kappa after applying a Gaussian naive Bayes classifier. Where the datasets allow it, the best kappa values obtained are compared with existing time series classification algorithms: a hidden Markov model and ARIMA. Finally, the best history lengths found are used in a random forest classifier to determine how the optimization affects classification performance. On six of the ten tested datasets, classification with the optimized history length achieves kappa values that are on average 33.57% higher than classification with aggregation over the whole history; for the remaining four datasets there is no significant change. The aggregation model outperformed ARIMA and the hidden Markov model, but these comparisons were not very extensive, because most datasets used in the thesis do not contain more than one historical classification point per entity and are therefore poorly suited to standard time series algorithms. The thesis concludes that, in most cases, the aggregation model achieves better results with an optimized history length than with the whole history.	eng
dc.language.iso	CZE
dc.publisher	České vysoké učení technické v Praze. Výpočetní a informační centrum.	cze
dc.publisher	Czech Technical University in Prague. Computing and Information Centre.	eng
dc.rights	A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one's own expense. The use of the thesis must comply with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and the citation ethics http://knihovny.cvut.cz/vychova/vskp.html	eng
dc.rights	Vysokoškolská závěrečná práce je dílo chráněné autorským zákonem. Je možné pořizovat z něj na své náklady a pro svoji osobní potřebu výpisy, opisy a rozmnoženiny. Jeho využití musí být v souladu s autorským zákonem http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf a citační etikou http://knihovny.cvut.cz/vychova/vskp.html	cze
dc.subject	klasifikace,temporální data,relační data,vícerozměrná data,extrakce příznaků,optimalizace,historie,Chi2,vzájemná informace,Cohenovo Kappa,naivní Bayes,náhodný les	cze
dc.subject	classification,temporal data,relational data,multivariate data,feature engineering,optimization,history,Chi2,mutual information,Cohen's kappa,naive Bayes,random forest	eng
dc.title	Klasifikace na temporálních relačních datech	cze
dc.title	Temporal Relational Classification	eng
dc.type	diplomová práce	cze
dc.type	master thesis	eng
dc.date.accepted
dc.contributor.referee	Surynek Pavel
theses.degree.discipline	Znalostní inženýrství (Knowledge Engineering)	cze
theses.degree.grantor	katedra aplikované matematiky (Department of Applied Mathematics)	cze
theses.degree.programme	Informatika (Informatics)	cze
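The aggregation step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. The sketch makes three assumptions:

- Records arrive as (entity_id, timestamp, value) tuples.
- The history window is [prediction_time - history_length, prediction_time).
- An entity with no records in the window is simply omitted.

The function names aggregate_history and cohen_kappa are hypothetical. Cohen's kappa is included because the abstract uses it to compare history lengths.

```python
import math
from collections import defaultdict
from statistics import mean


def aggregate_history(records, prediction_time, history_length):
    """Aggregate each entity's records into (mean, min, max).

    records: iterable of (entity_id, timestamp, value) tuples; many records
    may belong to one entity (the n:1 relationship).
    Only records with prediction_time - history_length <= timestamp
    < prediction_time are used (an assumed window definition).
    Entities with no records in the window are omitted from the result.
    """
    window_start = prediction_time - history_length
    values_by_entity = defaultdict(list)
    for entity_id, timestamp, value in records:
        if window_start <= timestamp < prediction_time:
            values_by_entity[entity_id].append(value)
    return {
        entity_id: (mean(values), min(values), max(values))
        for entity_id, values in values_by_entity.items()
    }


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa between two equal-length label lists."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    # Observed agreement: share of positions where the labels match.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected chance agreement from the marginal label frequencies.
    expected = sum(
        (y_true.count(label) / n) * (y_pred.count(label) / n)
        for label in set(y_true) | set(y_pred)
    )
    if expected == 1.0:
        return math.nan  # both lists use one identical label: kappa is 0/0
    return (observed - expected) / (1 - expected)


# Usage: two entities, prediction at time 10, comparing two history lengths.
records = [("a", 1, 10.0), ("a", 5, 20.0), ("a", 9, 30.0),
           ("b", 2, 4.0), ("b", 8, 6.0)]
print(aggregate_history(records, prediction_time=10, history_length=10))
# {'a': (20.0, 10.0, 30.0), 'b': (5.0, 4.0, 6.0)}
print(aggregate_history(records, prediction_time=10, history_length=3))
# {'a': (30.0, 30.0, 30.0), 'b': (6.0, 6.0, 6.0)}
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
# 0.5
```

In the thesis, kappa is computed after a Gaussian naive Bayes classifier is applied to the aggregated features. That classifier step and the search over candidate history lengths are left out of this sketch. A shorter window keeps only recent values, which is the effect the history-length optimization exploits.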

Files in this item

Name:: F8-DP-2018-Muck-Petr-thesis.pdf
Size:: 4.617Mb
Format:: PDF
Description:: PLNY_TEXT (full text)

Name:: F8-DP-2018-posudek-Motl_Jan.pdf
Size:: 144.6Kb
Format:: PDF
Description:: POSUDEK (review)

Name:: F8-DP-2018-posudek-Surynek_Pav ...
Size:: 142.6Kb
Format:: PDF
Description:: POSUDEK (review)

This item appears in the following collections

Master's theses - 18105 [195]
