Implementace Q* algoritmu v Julia

Jiří Klubal

Implementation of Q* algorithm in Julia

Document type

bachelor thesis

Author

Jiří Klubal

Supervisor

Kalvoda Tomáš

Reviewer

Klouda Karel

Field of study

Artificial Intelligence 2021

Study programme

Informatics

Degree-granting institution

Department of Applied Mathematics

Rights

A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis may be made for personal use only and at one's own expense. Any use of the thesis must comply with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and with citation ethics http://knihovny.cvut.cz/vychova/vskp.html

Abstract

This bachelor thesis introduces the Q* algorithm together with the A* algorithm on which it is based. It explains a system for automatically obtaining heuristic functions using reinforcement learning methods, specifically the Deep Q-learning and DAVI algorithms and two proposed variants, DQVI and naive DQVI. All algorithms are implemented in the Julia language. Pancake sorting is chosen as the example problem, and a virtual interface is designed for it. All algorithms and the interface are implemented so that they can be easily modified and reused for other problems or specifications beyond the scope of this thesis. Neural networks are trained on a problem of ten pancakes to approximate the heuristic function for the A* and Q* algorithms, and the resulting networks are compared with one another. The DQVI algorithm learns more than five times faster than the DAVI or naive DQVI algorithm, but it is less efficient after training for the same number of iterations. The Deep Q-learning algorithm achieves good results, but it often diverges. The A* and Q* algorithms are then compared with each other for selected heuristic functions. On the problem studied in this thesis, the A* algorithm is faster and more accurate than the Q* algorithm, which, however, requires fewer neural network evaluations.