
Sociopath: Automatic Local Events Extractor



dc.contributor.advisor: Drchal Jan
dc.contributor.author: Alperovich Galina
dc.date.accessioned: 2017-06-07T16:22:03Z
dc.date.available: 2017-06-07T16:22:03Z
dc.date.issued: 2017-05-26
dc.identifier: KOS-695600328705
dc.identifier.uri: http://hdl.handle.net/10467/70498
dc.description.abstract [eng]: The Internet is a large data source that is mostly unstructured from a semantic point of view. Although there have been many attempts to unify the way information is presented, there is still no general format for it. A computer program can easily read a Web page as HTML code, but it is hard for it to understand the meaning and extract the semantic structure. This makes automatic information extraction a challenging problem. Automatic extraction of information from Web pages is a common task in data mining. It is used in many modern services and is strongly related to the structure of the web page and the properties of the content itself. The thesis focuses on Web information extraction about local social events. Social events include various cultural events, sports events, and other activities. One of the biggest problems in the Web extraction field is collecting training data. In this thesis, we present an approach that uses Microdata semantic markup to automatically collect a labeled training dataset. We built a system that automatically collects training samples with comprehensive features, including visual, textual, spatial, and DOM-related ones. The thesis also covers various techniques for data processing and cleaning, and for building a classification model for every extracted event component.
dc.language.iso: ENG
dc.publisher [cze]: České vysoké učení technické v Praze. Výpočetní a informační centrum.
dc.publisher [eng]: Czech Technical University in Prague. Computing and Information Centre.
dc.rights [eng]: A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one's own expense. The use of the thesis should be in compliance with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and the citation ethics http://knihovny.cvut.cz/vychova/vskp.html
dc.rights [cze]: Vysokoškolská závěrečná práce je dílo chráněné autorským zákonem. Je možné pořizovat z něj na své náklady a pro svoji osobní potřebu výpisy, opisy a rozmnoženiny. Jeho využití musí být v souladu s autorským zákonem http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf a citační etikou http://knihovny.cvut.cz/vychova/vskp.html
dc.subject [cze]: information extraction, web extraction, microdata, event extraction, machine learning
dc.subject [eng]: information extraction, web extraction, microdata, event extraction, machine learning
dc.title [cze]: Sociopath: automatická extrakce informací o kulturních událostech
dc.title [eng]: Sociopath: Automatic Local Events Extractor
dc.type [cze]: diplomová práce
dc.type [eng]: master thesis
dc.date.accepted:
dc.contributor.referee: Šourek Gustav
theses.degree.discipline [cze]: Umělá inteligence (Artificial Intelligence)
theses.degree.grantor [cze]: katedra počítačů (Department of Computer Science)
theses.degree.programme [cze]: Otevřená informatika (Open Informatics)






