Částečně řízené učení milionů astronomických spekter

Palička Andrej

Semi-Supervised Learning of Millions of Astronomical Spectra

Type of document

Master's thesis

Author

Palička Andrej

Supervisor

Škoda Petr

Opponent

Šimeček Ivan

Field of study

Knowledge Engineering

Study program

Informatics

Institutions assigning rank

18101

Defended

2016-06-16

Rights

A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis may be made for personal use only and at one's own expense. Use of the thesis must comply with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and with citation ethics http://www.cvut.cz/sites/default/files/content/d1dc93cd-5894-4521-b799-c7e715d3c59e/cs/20160901-metodicky-pokyn-c-12009-o-dodrzovani-etickych-principu-pri-priprave-vysokoskolskych.pdf

Abstract

We use semi-supervised learning to detect emission spectra in an archive from the LAMOST observatory, using the massively parallel Spark environment. We implemented a preprocessing application that takes the raw spectra and applies a series of transformations so that the data can be used for training models. We also implemented two graph-based semi-supervised algorithms, Label Propagation and Label Spreading, which we use to fit models that then classify the spectra. We applied these algorithms to a subsample of the archive containing one million spectra.
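
To illustrate the general idea behind graph-based Label Propagation, the following is a minimal single-machine sketch in plain Python. It is not the thesis implementation, which runs on Spark over LAMOST spectra. The RBF affinity, the `gamma` value, the function names and the one-dimensional toy points are all assumptions chosen for illustration. Labels spread from the few labeled points to their neighbours through a row-normalised affinity matrix, while the known labels are clamped after every iteration.

```python
import math


def rbf_affinity(points, gamma=1.0):
    """Dense affinity matrix W[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(points)
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


def label_propagation(points, labels, n_classes, gamma=1.0, n_iter=200):
    """Propagate labels over the affinity graph.

    labels[i] is a class index, or -1 when point i is unlabeled.
    Returns one predicted class index per point.
    """
    n = len(points)
    w = rbf_affinity(points, gamma)
    # Row-normalise W into a transition matrix T = D^-1 W.
    t = [[value / sum(row) for value in row] for row in w]

    def one_hot(label):
        return [1.0 if label == c else 0.0 for c in range(n_classes)]

    # Labeled points start one-hot, unlabeled points start uniform.
    y = [
        one_hot(labels[i]) if labels[i] >= 0 else [1.0 / n_classes] * n_classes
        for i in range(n)
    ]
    for _ in range(n_iter):
        # One propagation step: Y <- T Y.
        y_next = [
            [sum(t[i][k] * y[k][c] for k in range(n)) for c in range(n_classes)]
            for i in range(n)
        ]
        # Clamp the labeled points back to their known labels.
        for i in range(n):
            if labels[i] >= 0:
                y_next[i] = one_hot(labels[i])
        y = y_next
    return [max(range(n_classes), key=lambda c: row[c]) for row in y]


# Hypothetical toy data: two well-separated clusters, one labeled point each.
toy_points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
toy_labels = [0, -1, -1, 1, -1, -1]
predicted = label_propagation(toy_points, toy_labels, n_classes=2)
print(predicted)  # [0, 0, 0, 1, 1, 1]
```

Label Spreading follows the same pattern but, in its usual formulation, uses a symmetrically normalised affinity matrix and blends the propagated labels with the initial ones instead of clamping them. The dense matrix used in this sketch grows quadratically with the number of points, so it would not scale to a million spectra as written.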