Porovnání národních časových řad vývoje COVID-19

Michael Kolínský

Comparison of national COVID-19 time series

Document type

bachelor thesis

Author

Michael Kolínský

Supervisor

Dedecius Kamil

Thesis reviewer

Žemlička Radomír

Field of study

Theoretical Computer Science

Study programme

Informatics 2009

Degree-granting institution

Department of Theoretical Computer Science

Rights

A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis may be made for personal use only and at one's own expense. Use of the thesis must comply with the Copyright Act (http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf) and with citation ethics (http://knihovny.cvut.cz/vychova/vskp.html).

Abstract

This thesis analyses national time series of the daily number of people newly infected with COVID-19. The data are taken from the World Health Organization. In the preprocessing phase, the national series are scaled by the population of each country. Their dimension is then reduced using Piecewise Aggregate Approximation, and all components of the series except the trend are removed. The thesis defines four measures of time series (dis)similarity: Dynamic Time Warping (DTW), Edit Distance with Real Penalty (ERP), Longest Common Subsequence Similarity (LCSS), and the discrete Fréchet distance. In the next phase, the preprocessed data are clustered using agglomerative hierarchical clustering with average linkage, applied with each of the defined measures. In the last phase, the number of clusters is chosen for each measure using the dendrogram. The thesis concludes with plots of the resulting clusters, which are discussed together with the properties of the distance measures used.
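
To make the preprocessing and comparison steps named in the abstract more concrete, the following is a minimal Python sketch of population scaling, Piecewise Aggregate Approximation, and Dynamic Time Warping. It is an illustration only and is not taken from the thesis: the function names `per_100k`, `paa`, and `dtw`, the scaling to cases per 100,000 inhabitants, and the use of the absolute difference as the local DTW cost are all assumptions made here.

```python
import math


def per_100k(cases, population):
    """Scale daily case counts to cases per 100,000 inhabitants (assumed unit)."""
    return [c * 100_000 / population for c in cases]


def paa(series, segments):
    """Piecewise Aggregate Approximation: replace the series by the mean
    of each of `segments` consecutive, nearly equal-length chunks."""
    n = len(series)
    if not 1 <= segments <= n:
        raise ValueError("segments must be between 1 and len(series)")
    reduced = []
    for k in range(segments):
        start = k * n // segments
        end = (k + 1) * n // segments
        chunk = series[start:end]
        reduced.append(sum(chunk) / len(chunk))
    return reduced


def dtw(a, b):
    """Dynamic Time Warping distance with |x - y| as the local cost
    (an assumption; other local costs are possible)."""
    prev = [math.inf] * (len(b) + 1)
    prev[0] = 0.0
    for x in a:
        cur = [math.inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Extend the cheapest of the three admissible warping steps.
            cur[j] = abs(x - y) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(b)]
```

A pairwise matrix of such distances between countries could then be passed to an agglomerative hierarchical clustering routine with average linkage, which is the clustering setup the abstract describes.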