Detekce a lokalizace netexturovaných objektů pomocí hlubokých neuronových sítí

Haluza Pavel

Detection and Localization of Texture-less Objects with Deep Neural Networks

Type of document

bachelor thesis

Author

Haluza Pavel

Supervisor

Hodaň Tomáš

Opponent

Kämäräinen Joni

Field of study

Robotics

Study program

Cybernetics and Robotics

Institutions assigning rank

Department of Cybernetics

Defended

2017-06-19

Rights

A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one's own expense. Use of the thesis must comply with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and with citation ethics http://knihovny.cvut.cz/vychova/vskp.html

Abstract

This thesis studies Faster R-CNN, a state-of-the-art method for object detection in RGB images, and proposes an extension of it to RGB-D images. Solutions to the following problems are proposed and evaluated: filling in missing values in depth images, depth encoding (raw depth vs. surface normals), extending the CNN architecture to accept the extra depth information, and initializing the weights of the extended network. The best overall results were achieved with a network that accepts an extra depth channel, pre-processed by an iterative median filter to fill in the missing values, and whose depth weights in the first convolutional layer are initialized with the mean of the color weights pretrained on ImageNet. However, the improvement over the original method using only RGB channels is marginal (mAP increased by 1-2 %), which suggests that the depth information needs to be incorporated in a different way.
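
The two ingredients of the best-performing configuration can be illustrated with a minimal NumPy sketch. It is not the thesis implementation. The function names `fill_missing_depth` and `add_depth_channel` are hypothetical, and the sketch assumes that missing depth values are encoded as 0, that the median filter uses a 3x3 window, that first-layer weights are stored as (out_channels, in_channels, kernel_h, kernel_w), and that depth is appended as a fourth input channel.

```python
import numpy as np


def fill_missing_depth(depth, missing=0.0, window=3, max_iters=100):
    """Iteratively replace missing pixels with the median of their valid neighbours.

    Assumption: `missing` marks pixels without a depth reading.
    """
    filled = depth.astype(np.float64).copy()
    r = window // 2
    for _ in range(max_iters):
        holes = np.argwhere(filled == missing)
        if holes.size == 0:
            break  # every pixel has a value
        snapshot = filled.copy()  # read only values from the previous pass
        changed = False
        for y, x in holes:
            patch = snapshot[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            valid = patch[patch != missing]
            if valid.size:
                filled[y, x] = np.median(valid)
                changed = True
        if not changed:
            break  # no valid pixels left to propagate from
    return filled


def add_depth_channel(rgb_weights):
    """Extend first-layer weights from 3 to 4 input channels.

    Assumption: layout is (out_channels, 3, kernel_h, kernel_w). The new depth
    channel is initialized with the per-filter mean of the three color channels.
    """
    depth_weights = rgb_weights.mean(axis=1, keepdims=True)
    return np.concatenate([rgb_weights, depth_weights], axis=1)


if __name__ == "__main__":
    depth = np.array([[2.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0]])
    print(fill_missing_depth(depth))

    rgb_weights = np.random.default_rng(0).normal(size=(64, 3, 7, 7))
    print(add_depth_channel(rgb_weights).shape)
```

Each pass fills only the holes that border at least one valid pixel, so larger holes close from the boundary inwards over several iterations. Initializing the depth filters with the mean of the pretrained color filters gives the new channel the same spatial structure as the existing filters, rather than starting it from random values.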