Algoritmy pro optimální exploraci a exploitaci v dynamických prostředí

Marek Petr

Intelligent Algorithms for Optimal Exploration and Exploitation in Dynamic Environments

Document type

bachelor thesis

Author

Marek Petr

Supervisor

Kopřiva Štěpán

Opponent

Selecký Martin

Field of study

Computer and Information Science

Study programme

Open Informatics

Degree-granting institution

Department of Cybernetics

Rights

A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis may be made for personal use only and at one's own expense. Use of the thesis must comply with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and the citation ethics http://www.cvut.cz/sites/default/files/content/d1dc93cd-5894-4521-b799-c7e715d3c59e/cs/20160901-metodicky-pokyn-c-12009-o-dodrzovani-etickych-principu-pri-priprave-vysokoskolskych.pdf

Abstract

In this thesis we study algorithms that balance exploration (searching the environment for an optimal strategy) and exploitation (using the strategy that yields the greatest benefit). The exploration vs. exploitation dilemma is a central aspect of many problems in the theory of reinforcement learning, and one of its simplest instances is the multi-armed bandit problem, which we use to model a specific environment: a simple online game. The main objective of the thesis is to develop an algorithm that optimally balances exploration and exploitation and thus yields the best possible profit. In the case of a simple online game, the algorithm is able to find the most enjoyable variant of the game. To that end, we have designed Speeder, a game with variable game parameters, and a custom algorithm called UCBCustom, which is tailored to this environment and whose effectiveness is evaluated on Speeder. The algorithm is able to dynamically re-evaluate the benefits of the parameters at runtime. The thesis also describes how prices of features in a freemium game can be used in place of game parameters. Through testing, we show that UCBCustom is able to identify one of the more enjoyable variants of the game even after a limited number of plays.
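To illustrate the multi-armed bandit setting mentioned in the abstract, the following is a minimal sketch of the textbook UCB1 rule, not of the thesis's UCBCustom algorithm, whose details are not given here. Each arm stands for one game variant. The function names, the reward probabilities and the idea of a reward of 1 when a player enjoys a variant are all assumptions made for illustration.

```python
import math
import random


def ucb1(pull, n_arms, n_rounds):
    """Textbook UCB1 (not UCBCustom): returns per-arm pull counts and mean rewards.

    `pull(arm)` is a hypothetical callback returning a reward in [0, 1].
    """
    counts = [0] * n_arms    # how many times each arm was played
    means = [0.0] * n_arms   # running mean reward of each arm
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            # Initialisation: play every arm once.
            arm = t - 1
        else:
            # Exploitation term (mean) plus exploration bonus that shrinks
            # as an arm is played more often.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


def make_bernoulli_bandit(probs, seed=0):
    """Hypothetical environment: arm i pays 1 with probability probs[i], else 0."""
    rng = random.Random(seed)

    def pull(arm):
        return 1.0 if rng.random() < probs[arm] else 0.0

    return pull


if __name__ == "__main__":
    # Three assumed game variants with made-up chances of being enjoyed.
    pull = make_bernoulli_bandit([0.2, 0.5, 0.8], seed=0)
    counts, means = ucb1(pull, n_arms=3, n_rounds=2000)
    print("plays per variant:", counts)
    print("estimated enjoyment:", [round(m, 2) for m in means])
```

The exploration bonus makes rarely played variants look temporarily attractive, so every variant keeps being sampled occasionally while most plays go to the variant with the highest estimated reward.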