Predictor Factory: Učení z relačních dat
Predictor Factory: Learning from Relational Data
Type of document
disertační prácedoctoral thesis
Author
Jan Motl
Supervisor
Kordík Pavel
Opponent
Kliegr Tomáš
Field of study
InformatikaStudy program
InformatikaInstitutions assigning rank
katedra aplikované matematikyRights
A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one?s own expense. The use of thesis should be in compliance with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and the citation ethics http://knihovny.cvut.cz/vychova/vskp.htmlVysokoškolská závěrečná práce je dílo chráněné autorským zákonem. Je možné pořizovat z něj na své náklady a pro svoji osobní potřebu výpisy, opisy a rozmnoženiny. Jeho využití musí být v souladu s autorským zákonem http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf a citační etikou http://knihovny.cvut.cz/vychova/vskp.html
Metadata
Show full item recordAbstract
Propositionalization algorithms transform relational data into a single table with features, which can be used for classification or regression with conventional machine learning tools. However, the contemporary propositionalization algorithms were not designed to work on changing data and suffered from the production of many irrelevant and redundant features. We altered the propositionalization to work on temporal data and introduced meta-learning, which predicts, which features will be relevant and unique. To test Predictor Factory, our implementation of propositionalization, we have created a repository of relational datasets and implemented a scalable algorithm for relationship discovery between tables in the dataset. The implementations were open-sourced and applied to real-world banking, government, marketing, medicine, and telecommunication problems. Propositionalization algorithms transform relational data into a single table with features, which can be used for classification or regression with conventional machine learning tools. However, the contemporary propositionalization algorithms were not designed to work on changing data and suffered from the production of many irrelevant and redundant features. We altered the propositionalization to work on temporal data and introduced meta-learning, which predicts, which features will be relevant and unique. To test Predictor Factory, our implementation of propositionalization, we have created a repository of relational datasets and implemented a scalable algorithm for relationship discovery between tables in the dataset. The implementations were open-sourced and applied to real-world banking, government, marketing, medicine, and telecommunication problems.