Natural Language Generation from Knowledge Bases

The main goal of this master's thesis is to create a machine-learning-based tool that verbalizes data: given input in the form of RDF triples, it should produce a corresponding text in natural language (English) that is grammatically correct and fluent, contains all the information in the input data, and adds no information beyond it. The thesis first examines the publicly available datasets and then turns to the architectures of statistical machine learning models and their possible use for natural language generation. It also covers numerical representations of text, text generation with machine learning models, and optimization algorithms for training these models. The next part proposes two different approaches to the task and examines each of them. All proposed systems are evaluated with automatic metrics, and the best-performing ones are then evaluated manually by humans. The final part of the thesis is devoted to implementing the resulting application and deploying it to production.

Keywords

RDF Triple, Machine Learning, Natural Language Generation, LSTM, Transformer, T5, RoBERTa

Permanent link

http://hdl.handle.net/10467/95427

Rights/License

A university thesis is a work protected by the Copyright Act of the Czech Republic. Extracts, copies, and transcripts of the thesis may be made for personal use only and at one's own expense. Any use of the thesis must comply with the Copyright Act, as amended.

Collections

Master Theses - 13136

Natural Language Generation from Knowledge-Base Triples