Řízení robotů pomocí strukturovaného hlubokého učení

Teymur Azayev

Robotic control with deep-learned structured policies

dc.contributor.advisor	Zimmermann Karel
dc.contributor.author	Teymur Azayev
dc.date.accessioned	2022-12-09T14:19:16Z
dc.date.available	2022-12-09T14:19:16Z
dc.date.issued	2022-11-01
dc.identifier	KOS-917342279105
dc.identifier.uri	http://hdl.handle.net/10467/105236
dc.description.abstract	Metody řízené daty pro robotické řízení si v posledním desetiletí neustále získávají popularitu a ukazují slibné výsledky pro složité, vysoce rozměrné morfologie robotů. Takové metody obvykle zahrnují učení aproximátoru parametrických funkcí (řídicí pravidlo), který mapuje sadu senzorických vstupů na akční výstupy. Nejoblíbenějším typem takových aproximátorů je neuronová síť Multi-layer perceptron (MLP), skládající se z několika lineárních vrstev oddělených nelinearitou. Ve spojení s výkonnými algoritmy na principu pokus-omyl, jako je posilované učení, jsme schopni se naučit řídicí pravidla, která mohou maximalizovat dané funkce odměny (nákladů) téměř v jakékoli doméně. Zatímco většina výzkumu je zaměřena na algoritmy učení, zlepšování efektivity vzorků a adaptaci mezi simulátorem a realitou, menší pozornost je věnována samotnému aproximátoru funkcí, který představuje řídicí pravidlo. V této práci ukazujeme, že pro některé robotické morfologie může řízení pomocí monolitické MLP nebo rekurentní neuronové sítě (RNN) vést k problémům během fáze učení a také k celkově špatným výsledkům, zejména pro strukturované úkoly, jako je lokomoce.	cze
dc.description.abstract	Data-driven methods for robotic control have been steadily gaining popularity over the past decades, showing high-performing results for complex, high-dimensional robot morphologies. Such methods usually entail learning a parametric function approximator (policy/control law) that maps a set of sensory inputs to action outputs. The most popular class of such function approximators is the multi-layer perceptron (MLP) neural network, consisting of several linear layers separated by a non-linearity. In conjunction with powerful trial-and-error algorithms such as reinforcement learning, we are able to learn control policies that can maximize a given reward (cost) function in almost any domain. While most research is focused on learning algorithms, improving sample efficiency, and sim-to-real adaptation, less attention is paid to the actual function approximator that represents the control law. In this thesis, we show that for some robotic morphologies, using a monolithic MLP or recurrent neural network (RNN) can lead to issues during the learning phase as well as to poor overall results, especially for structured tasks such as locomotion. In our first published work, we show this in experiments on learning adaptive locomotion behaviors for legged hexapod robots and show that it can be effective to partition the control problem into several discrete tasks, learning an optimal policy for each one and then learning to switch between them. In subsequent work, we build well-explainable structural elements into the overall architectural design while preserving end-to-end training. This is done by starting with an initial hand-designed algorithm and successively replacing various heuristic decision points with neural network modules that can then be trained using black-box optimization methods.
In our third published work, we show yet another way in which we can structure the control policy as a hybrid of hand-designed knowledge and learnable elements, giving a sample-efficient and more interpretable architecture that can be used to learn autonomous flipper control for articulated tracked robots. We verified our experiments in this work on a real platform in conjunction with a full navigation stack, and deployed part of our algorithm in the DARPA SubT urban competition with good results. Finally, we describe various methods of simulation-to-real robot policy transfer and discuss how these methods relate to a theoretical Bayesian approach in a hierarchy of agents of increasing learning complexity.	eng
dc.publisher	České vysoké učení technické v Praze. Výpočetní a informační centrum.	cze
dc.publisher	Czech Technical University in Prague. Computing and Information Centre.	eng
dc.rights	A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one's own expense. The use of the thesis should be in compliance with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and the citation ethics http://knihovny.cvut.cz/vychova/vskp.html	eng
dc.rights	Vysokoškolská závěrečná práce je dílo chráněné autorským zákonem. Je možné pořizovat z něj na své náklady a pro svoji osobní potřebu výpisy, opisy a rozmnoženiny. Jeho využití musí být v souladu s autorským zákonem http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf a citační etikou http://knihovny.cvut.cz/vychova/vskp.html	cze
dc.subject	Strukturovaná řídicí pravidla	cze
dc.subject	neuronové sítě	cze
dc.subject	posilované učení	cze
dc.subject	robotika	cze
dc.subject	structured policies	eng
dc.subject	neural networks	eng
dc.subject	reinforcement learning	eng
dc.subject	robotics	eng
dc.title	Řízení robotů pomocí strukturovaného hlubokého učení	cze
dc.title	Robotic control with deep-learned structured policies	eng
dc.type	disertační práce	cze
dc.type	doctoral thesis	eng
dc.contributor.referee	Walas Krzysztof
theses.degree.discipline	Umělá inteligence a biokybernetika	cze
theses.degree.grantor	katedra kybernetiky	cze
theses.degree.programme	Elektrotechnika a informatika	cze

Files in this item

Name: F3-D-2022-Azayev-Teymur-PHD_TH ...
Size: 12.03Mb
Format: PDF
Description: PLNY_TEXT

This item appears in the following collections

Doctoral theses - 13000 [700]