Solving Partially Observable Stochastic Games Using a Limited Horizon

Matěj Veselý

Finite-horizon Approximation of Partially Observable Stochastic Games

Document type

bachelor thesis

Author

Matěj Veselý

Supervisor

Bošanský Branislav

Reviewer

Čermák Jiří

Field of study

Foundations of Artificial Intelligence and Computer Science

Study programme

Open Informatics

Degree-granting institution

Department of Cybernetics

Rights

A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one's own expense. The use of the thesis should be in compliance with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and the citation ethics http://knihovny.cvut.cz/vychova/vskp.html


Abstract


In this work, we implement three partially observable stochastic games with possibly infinite horizons in the OpenSpiel framework. We then compute strategies for these games by approximating them as finite games. Finally, we evaluate the quality of the computed strategies and assess how different value estimates for states beyond the horizon affect both that quality and the convergence of the algorithms used. The games are Pursuit Evasion, Search Game and Patrolling Game. Each models a security defence scenario in which one agent tries to prevent another agent from performing some activity: in Pursuit Evasion, one player tries to catch the other player in a given area; in Search Game, one player tries to prevent the other player from moving through zones; and in Patrolling Game, one player tries to defend a graph against an attack by the other player. To approximate an infinite game by a finite one, we limit the length of the game. This introduces new terminal states corresponding to the states in which the maximum length is reached, and the continuation of the infinite game beyond the horizon is represented by the rewards the players receive in these new terminal states. These rewards correspond to value estimates of the states beyond the horizon. To compute strategies, we use the MCCFR and IS-MCTS algorithms.
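The horizon-limiting step described in the abstract can be illustrated with a small, self-contained Python sketch. It is not the thesis's OpenSpiel implementation: the state interface (`is_terminal`, `returns`, `outcomes`), the toy game `ToyAttackState` and the wrapper `TruncatedState` are hypothetical names introduced only for illustration. The toy game also contains chance events only (no player decisions), so it shows the truncation itself rather than MCCFR or IS-MCTS. Every non-terminal state reached at depth `horizon` becomes a new terminal state whose payoff is supplied by a value estimate.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical minimal state interface (an assumption, NOT the OpenSpiel API):
#   is_terminal() -> bool, returns() -> float, outcomes() -> [(prob, next_state)]


@dataclass(frozen=True)
class ToyAttackState:
    """Toy game with a possibly infinite horizon. Each round the attacker is
    caught with probability p_catch (payoff -1), succeeds with probability
    p_success (payoff +1), or the game continues unchanged."""
    p_catch: float
    p_success: float
    result: int = 0  # 0 = ongoing, -1 = caught, +1 = attack succeeded

    def is_terminal(self) -> bool:
        return self.result != 0

    def returns(self) -> float:
        # Payoff to the attacker; the defender receives the negation (zero-sum).
        return float(self.result)

    def outcomes(self) -> List[Tuple[float, "ToyAttackState"]]:
        p_continue = 1.0 - self.p_catch - self.p_success
        return [
            (self.p_catch, ToyAttackState(self.p_catch, self.p_success, -1)),
            (self.p_success, ToyAttackState(self.p_catch, self.p_success, +1)),
            (p_continue, self),  # nothing happened; the game goes on
        ]


@dataclass(frozen=True)
class TruncatedState:
    """Wraps a state and makes every state at depth >= horizon terminal."""
    inner: ToyAttackState
    depth: int
    horizon: int
    estimate: Callable[[ToyAttackState], float]  # value estimate beyond horizon

    def is_terminal(self) -> bool:
        return self.inner.is_terminal() or self.depth >= self.horizon

    def returns(self) -> float:
        if self.inner.is_terminal():
            return self.inner.returns()  # genuine terminal state of the game
        return self.estimate(self.inner)  # new terminal state at the horizon

    def outcomes(self) -> List[Tuple[float, "TruncatedState"]]:
        return [(p, TruncatedState(s, self.depth + 1, self.horizon, self.estimate))
                for p, s in self.inner.outcomes()]


def expected_value(state) -> float:
    """Exact expected payoff to the attacker in the (finite) truncated tree."""
    if state.is_terminal():
        return state.returns()
    return sum(p * expected_value(child) for p, child in state.outcomes())


if __name__ == "__main__":
    game = ToyAttackState(p_catch=0.2, p_success=0.1)
    for guess in (-1.0, 0.0, 1.0):
        values = [expected_value(TruncatedState(game, 0, h, lambda s, g=guess: g))
                  for h in (1, 5, 20)]
        print(guess, [round(v, 4) for v in values])
```

For this toy game the infinite-horizon value to the attacker is (p_success - p_catch) / (p_catch + p_success) = -1/3 with the parameters above. Truncating at horizon h with a constant estimate g gives -1/3 + 0.7^h (g + 1/3), so the influence of the estimate decays geometrically as the horizon grows, and an exact estimate (g = -1/3) removes the truncation error entirely.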