Sociopath: automatická extrakce informací o kulturních událostech
Sociopath: Automatic Local Events Extractor
dc.contributor.advisor | Drchal Jan | |
dc.contributor.author | Alperovich Galina | |
dc.date.accessioned | 2017-06-07T16:22:03Z | |
dc.date.available | 2017-06-07T16:22:03Z | |
dc.date.issued | 2017-05-26 | |
dc.identifier | KOS-695600328705 | |
dc.identifier.uri | http://hdl.handle.net/10467/70498 | |
dc.description.abstract | The Internet is a large data source that is mostly unstructured from a semantic point of view. Despite many attempts to unify the way information is presented, there is still no general format for it. A computer program can easily read a Web page as HTML code, but it is hard for it to understand the meaning and extract the semantic structure, which makes automatic information extraction a challenging problem. Automatic extraction of information from Web pages is a common task in data mining. It is used in many modern services and depends strongly on the structure of the page and the properties of the content itself. This thesis focuses on extracting information about local social events from the Web. Social events include cultural events, sports events, and other activities. One of the biggest problems in the field of Web extraction is collecting training data. We present an approach that uses Microdata semantic markup to collect a labeled training dataset automatically. We built a system that automatically collects training samples with comprehensive features, including visual, textual, spatial, and DOM-related ones. The thesis also covers techniques for data processing and cleaning, and for building a classification model for every extracted event component. | eng |
dc.language.iso | ENG | |
dc.publisher | České vysoké učení technické v Praze. Vypočetní a informační centrum. | cze |
dc.publisher | Czech Technical University in Prague. Computing and Information Centre. | eng |
dc.rights | A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one's own expense. Use of the thesis must comply with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and the citation ethics http://knihovny.cvut.cz/vychova/vskp.html | eng |
dc.rights | Vysokoškolská závěrečná práce je dílo chráněné autorským zákonem. Je možné pořizovat z něj na své náklady a pro svoji osobní potřebu výpisy, opisy a rozmnoženiny. Jeho využití musí být v souladu s autorským zákonem http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf a citační etikou http://knihovny.cvut.cz/vychova/vskp.html | cze |
dc.subject | information extraction, web extraction, microdata, event extraction, machine learning | eng |
dc.title | Sociopath: automatická extrakce informací o kulturních událostech | cze |
dc.title | Sociopath: Automatic Local Events Extractor | eng |
dc.type | diplomová práce | cze |
dc.type | master thesis | eng |
dc.date.accepted | ||
dc.contributor.referee | Šourek Gustav | |
theses.degree.discipline | Artificial Intelligence | eng |
theses.degree.grantor | Department of Computer Science | eng |
theses.degree.programme | Open Informatics | eng |
This record appears in the following collections
- Master's theses - 13136 [892]