Rozvrhování výpočtů inference neuronových sítí na vestavném hardware

Eldar Iosip

Computation scheduling in neural network inference on embedded hardware

Typ dokumentu

diplomová práce
master thesis

Autor

Eldar Iosip

Vedoucí práce

Sojka Michal

Oponent práce

Pošík Petr

Studijní obor

Softwarové inženýrství

Studijní program

Otevřená informatika

Instituce přidělující hodnost

katedra počítačů

Práva

A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one?s own expense. The use of thesis should be in compliance with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and the citation ethics http://knihovny.cvut.cz/vychova/vskp.html
Vysokoškolská závěrečná práce je dílo chráněné autorským zákonem. Je možné pořizovat z něj na své náklady a pro svoji osobní potřebu výpisy, opisy a rozmnoženiny. Jeho využití musí být v souladu s autorským zákonem http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf a citační etikou http://knihovny.cvut.cz/vychova/vskp.html

Metadata

Zobrazit celý záznam

Abstrakt

Cílem této práce je prozkoumat state- of-the-art způsoby detekce objektů po- mocí konvolučních neuronových sítí, využívaných v oblasti autonomního řízení. Proto aby běh na vestavěných systémech byl dostatečně optimalizo- ván, je nutné rozumět struktuře sítě a způsobu, jak se provádí její výpočet pomocí konkrétní knihovny. Hlavním cílem této práce je porovnat něko- lik dostupných knihoven pro oblast strojového učení a popsat nezdokumen- tovanou vnitřní architekturu knihovny TensorFlow, aby bylo možné na základě těchto znalostí upravovat vykonávané části kódu za účelem lepšího rozvrho- vání jednotlivých procesů. Aby bylo možné porovnávat výsledky budoucích optimalizací na cílové platformě NVI- DIA Jetson Tegra X2, je představen jednoduchý benchmark a je popsán postup, jak vyčítat spotřebu energie a tepelný profil čipů na desce.

This thesis aims to examine the state-of-the-art solution of using con- volutional neural networks to address the problem of object detection, during the autonomous driving. The effective execution of these solutions involves an in-depth understanding of used frame- work architectures. The main goal of the thesis is to compare several ma- chine learning frameworks and provide a comprehensive description of the nondocumented internal architecture of the TensorFlow machine learning framework to allow future researches to introduce modifications regarding scheduling mechanisms. To properly evaluate future modifications on the target platform NVIDIA Tegra X2, the thesis introduces the benchmark and provides an instruction how to read power consumption and temperature of board components.