Metody document retrieval nad českými texty vhodné pro zpracování dlouhých vstupů

Alexander Gažo

Algorithms for Document Retrieval in Czech Language Supporting Long Inputs

dc.contributor.advisor	Drchal Jan
dc.contributor.author	Alexander Gažo
dc.date.accessioned	2021-08-31T22:51:35Z
dc.date.available	2021-08-31T22:51:35Z
dc.date.issued	2021-08-31
dc.identifier	KOS-1089438967905
dc.identifier.uri	http://hdl.handle.net/10467/97063
dc.description.abstract	The document retrieval task is the well-studied problem of finding the subset of documents relevant to a given search query. Recent advances in Natural Language Processing (NLP), namely the transformer architecture (Vaswani et al., 2017) and the BERT model (Devlin et al., 2018), provide a new approach to document retrieval. The document retrieval in this thesis is motivated by the Czech fact-checking task, an important challenge in the modern world. In this thesis, we apply recent research results to the transformer's attention mechanism (Bahdanau et al., 2015), decreasing its space and time complexity and thereby allowing longer input sequences (documents). We then study whether processing whole articles, rather than only their paragraphs, improves the performance of the retrieval models.	eng
dc.publisher	Czech Technical University in Prague. Computing and Information Centre.	eng
dc.rights	A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one's own expense. The use of the thesis must comply with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and the citation ethics http://knihovny.cvut.cz/vychova/vskp.html	eng
dc.subject	document retrieval	eng
dc.subject	fact-checking	eng
dc.subject	long-inputs	eng
dc.subject	Czech language	eng
dc.subject	NLP	eng
dc.subject	BERT	eng
dc.subject	TFIDF	eng
dc.title	Metody document retrieval nad českými texty vhodné pro zpracování dlouhých vstupů	cze
dc.title	Algorithms for Document Retrieval in Czech Language Supporting Long Inputs	eng
dc.type	master thesis	eng
dc.contributor.referee	Kordík Pavel
theses.degree.discipline	Artificial Intelligence	eng
theses.degree.grantor	Department of Computer Science	eng
theses.degree.programme	Open Informatics	eng
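The abstract defines document retrieval as finding the subset of documents relevant to a query, and TF-IDF is listed among the subject keywords. As a rough illustration only, the sketch below ranks documents by the cosine similarity of their TF-IDF vectors to the query. It is not the thesis's implementation: the whitespace tokenizer, the smoothed IDF formula (the same form scikit-learn's TfidfVectorizer uses by default) and the names tokenize, weigh, build_tfidf and retrieve are assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and split on whitespace only; a real Czech
    # pipeline would also need lemmatization or stemming (not shown here).
    return text.lower().split()


def normalize(vec: dict[str, float]) -> dict[str, float]:
    # Scale a sparse vector to unit L2 length so a dot product is a cosine.
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


def weigh(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Term frequency (count / length) times IDF, then L2-normalized.
    tf = Counter(tokens)
    return normalize({t: (c / len(tokens)) * idf[t] for t, c in tf.items()})


def build_tfidf(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [weigh(toks, idf) for toks in tokenized], idf


def retrieve(query: str, vectors: list[dict[str, float]],
             idf: dict[str, float], k: int = 3) -> list[int]:
    # Query terms never seen in the collection are ignored.
    q = weigh([t for t in tokenize(query) if t in idf], idf)
    # Cosine similarity between the query and every document vector.
    scores = [(sum(w * doc.get(t, 0.0) for t, w in q.items()), i)
              for i, doc in enumerate(vectors)]
    # Highest score first; ties broken by document index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    # Return indices of the top-k documents that share at least one term.
    return [i for s, i in scores[:k] if s > 0]


docs = [
    "Praha je hlavní město České republiky",
    "Brno je druhé největší město",
    "Vltava protéká Prahou",
]
vectors, idf = build_tfidf(docs)
print(retrieve("hlavní město", vectors, idf))  # → [0, 1]
print(retrieve("Praha", vectors, idf))  # → [0]
```

The query "Praha" does not match the inflected form "Prahou" in the third document, which is why exact-match term weighting over Czech text usually needs lemmatization or stemming; this is a general property of the technique, not a result reported in the thesis.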

Files in this item

Name: F3-DP-2021-Gazo-Alexander-Algo ...
Size: 3.381 MB
Format: PDF
Description: Full text

Name: F3-DP-2021-Gazo-Alexander-pril ...
Size: 1.088 MB
Format: Unknown
Description: Attachment

Name: F3-DP-2021-posudek-Drchal_Jan.pdf
Size: 207.5 kB
Format: PDF
Description: Review

Name: F3-DP-2021-posudek-Kordik_Pavel.pdf
Size: 130.2 kB
Format: PDF
Description: Review

This item appears in the following collections

Master's theses - 13136 [892]
