Trénink neuronové sítě a nediferencovatelné objektivní funkce

Yash Patel

Neural Network Training and Non-Differentiable Objective Functions

dc.contributor.advisor	Matas Jiří
dc.contributor.author	Yash Patel
dc.date.accessioned	2023-05-24T10:19:11Z
dc.date.available	2023-05-24T10:19:11Z
dc.date.issued	2023-04-15
dc.identifier	KOS-1092112206505
dc.identifier.uri	http://hdl.handle.net/10467/108276
dc.description.abstract	Mnoho důležitých úkolů počítačového vidění je přirozeně formulováno tak, aby měly nediferencovatelný cíl. Proto standardní, dominantní trénovací postup neuronové sítě není použitelný, protože zpětné šíření vyžaduje gradienty cíle vzhledem k výstupu modelu. Většina metod hlubokého učení obchází problém neoptimálně použitím proxy ztráty pro trénink, která byla původně navržena pro jiný úkol a není přizpůsobena specifikům cíle. Funkce proxy ztráty se mohou, ale nemusí dobře shodovat s původním nediferencovatelným cílem. Pro nový úkol musí být navržena vhodná proxy ztráta, což nemusí být proveditelné pro laika. Tato práce přináší čtyři hlavní příspěvky k překlenutí propasti mezi nediferencovatelným cílem a funkcí ztráty tréninku. Ztrátovou funkci v celé práci označujeme jako náhradní ztrátu, pokud se jedná o diferencovatelnou aproximaci nediferencovatelného cíle. Všimněte si, že termíny cíl a evaluační metrika používáme zaměnitelně. Nejprve navrhujeme přístup k učení diferencovatelného surogátu rozložitelné a nediferencovatelné vyhodnocovací metriky. Surogát se učí společně s modelem specifickým pro úkol střídavým způsobem. Tento přístup je ověřen na dvou praktických úlohách rozpoznávání a detekce textu scény, kde se náhradník učí aproximaci vzdálenosti úprav, respektive průniku přes sjednocení. V nastavení po ladění, kde je model trénovaný se ztrátou proxy dále trénován s naučeným náhradníkem na stejných datech, navrhovaná metoda ukazuje relativní zlepšení až o 39 % na celkové vzdálenosti úprav pro rozpoznání textu scény a 4,25 % na skóre F1 pro detekci textu scény. Za druhé navrhujeme vylepšenou verzi tréninku s naučeným náhradníkem, kde jsou odfiltrovány tréninkové vzorky, které jsou pro náhradníka těžké. Tento přístup je ověřen pro rozpoznávání textu scény. Překonává náš předchozí přístup a dosahuje průměrného zlepšení o 11,2 % na celkové vzdálenosti úprav a snížení chyb o 9,5 % na přesnost v několika oblíbených benchmarcích. Všimněte si, že dvě navrhované metody pro naučení náhradníka a trénink s náhradníkem nevytvářejí žádné předpoklady o daném úkolu a mohou být potenciálně rozšířeny na nové úkoly. Za třetí, pro recall@k, nerozložitelnou a nediferencovatelnou vyhodnocovací metriku, navrhujeme ručně vytvořenou náhradu, která zahrnuje navrhování diferencovatelných verzí operací třídění a počítání. Je také navržena účinná kombinační technika pro učení metriky, která míchá skóre podobnosti namísto vektorů pro vkládání. Navrhovaná náhrada dosahuje nejmodernějších výsledků na několika metrických učeních a srovnávacích testech vyhledávání na úrovni instancí v kombinaci se školením na velkých dávkách. Dále v kombinaci s klasifikátorem kNN slouží také jako účinný nástroj pro jemnozrnné rozpoznávání, kde překonává přímé klasifikační metody. Za čtvrté navrhujeme ztrátovou funkci nazvanou Extended SupCon, která společně trénuje parametry klasifikátoru a páteře pro řízenou kontrastní klasifikaci. Navrhovaný přístup těží z robustnosti kontrastivního učení a zachovává pravděpodobnostní interpretaci jako soft-max predikci. Empirické výsledky ukazují účinnost našeho přístupu v náročných podmínkách, jako je třídní nerovnováha, korupce štítků a školení s málo označenými údaji. Celkově přínosy této práce činí trénování neuronových sítí škálovatelnějším – na nové úkoly téměř bezpracně, když je vyhodnocovací metrika rozložitelná, což výzkumníkům pomůže s novými úkoly.	cze
dc.description.abstract	Many important computer vision tasks are naturally formulated to have a non-differentiable objective. Therefore, the standard, dominant training procedure of a neural network is not applicable since back-propagation requires the gradients of the objective with respect to the output of the model. Most deep learning methods side-step the problem sub-optimally by using a proxy loss for training, which was originally designed for another task and is not tailored to the specifics of the objective. The proxy loss functions may or may not align well with the original non-differentiable objective. An appropriate proxy has to be designed for a novel task, which may not be feasible for a non-specialist. This thesis makes four main contributions toward bridging the gap between the non-differentiable objective and the training loss function. Throughout the thesis, we refer to a loss function as a surrogate loss if it is a differentiable approximation of the non-differentiable objective. Note that we use the terms objective and evaluation metric interchangeably. First, we propose an approach for learning a differentiable surrogate of a decomposable and non-differentiable evaluation metric. The surrogate is learned jointly with the task-specific model in an alternating manner. The approach is validated on two practical tasks of scene text recognition and detection, where the surrogate learns an approximation of edit distance and intersection-over-union, respectively. In a post-tuning setup, where a model trained with the proxy loss is trained further with the learned surrogate on the same data, the proposed method shows a relative improvement of up to 39% on the total edit distance for scene text recognition and 4.25% on F1 score for scene text detection. Second, we propose an improved version of training with the learned surrogate in which the training samples that are hard for the surrogate are filtered out. This approach is validated for scene text recognition. It outperforms our previous approach and attains an average improvement of 11.2% on total edit distance and an error reduction of 9.5% on accuracy on several popular benchmarks. Note that the two proposed methods for learning a surrogate and training with the surrogate do not make any assumptions about the task at hand and can be potentially extended to novel tasks. Third, for recall@k, a non-decomposable and non-differentiable evaluation metric, we propose a hand-crafted surrogate that involves designing differentiable versions of sorting and counting operations. An efficient mixup technique for metric learning is also proposed that mixes the similarity scores instead of the embedding vectors. The proposed surrogate attains state-of-the-art results on several metric learning and instance-level search benchmarks when combined with training on large batches. Further, when combined with the kNN classifier, it also serves as an effective tool for fine-grained recognition, where it outperforms direct classification methods. Fourth, we propose a loss function termed Extended SupCon that jointly trains the classifier and backbone parameters for supervised contrastive classification. The proposed approach benefits from the robustness of contrastive learning and maintains the probabilistic interpretation like a soft-max prediction. Empirical results show the efficacy of our approach under challenging settings such as class imbalance, label corruption, and training with little labeled data. Overall, the contributions of this thesis make the training of neural networks more scalable to new tasks, in a nearly labor-free manner when the evaluation metric is decomposable, which will help researchers with novel tasks. For non-decomposable evaluation metrics, the differentiable components developed for the recall@k surrogate, such as sorting and counting, can also be used for creating new surrogates.	eng
dc.publisher	České vysoké učení technické v Praze. Výpočetní a informační centrum.	cze
dc.publisher	Czech Technical University in Prague. Computing and Information Centre.	eng
dc.rights	A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one's own expense. The use of the thesis should be in compliance with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and the citation ethics http://knihovny.cvut.cz/vychova/vskp.html	eng
dc.rights	Vysokoškolská závěrečná práce je dílo chráněné autorským zákonem. Je možné pořizovat z něj na své náklady a pro svoji osobní potřebu výpisy, opisy a rozmnoženiny. Jeho využití musí být v souladu s autorským zákonem http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf a citační etikou http://knihovny.cvut.cz/vychova/vskp.html	cze
dc.subject	Trénování neuronových sítí	cze
dc.subject	nediferencovatelný cíl	cze
dc.subject	učení náhradní ztráty	cze
dc.subject	rozpoznávání textu scény	cze
dc.subject	detekce textu scény	cze
dc.subject	komprese obrazu	cze
dc.subject	metrické učení	cze
dc.subject	vyhledávání obrazu	cze
dc.subject	vyhledávání na úrovni instancí	cze
dc.subject	řízené kontrastní učení	cze
dc.subject	klasifikace obrazu	cze
dc.subject	jemná klasifikace	cze
dc.subject	Neural Network Training	eng
dc.subject	Non-Differentiable Objective	eng
dc.subject	Learning Surrogate Loss	eng
dc.subject	Scene Text Recognition	eng
dc.subject	Scene Text Detection	eng
dc.subject	Image Compression	eng
dc.subject	Metric Learning	eng
dc.subject	Image Retrieval	eng
dc.subject	Instance Level Search	eng
dc.subject	Supervised Contrastive Learning	eng
dc.subject	Image Classification	eng
dc.subject	Fine-Grained Classification	eng
dc.title	Trénink neuronové sítě a nediferencovatelné objektivní funkce	cze
dc.title	Neural Network Training and Non-Differentiable Objective Functions	eng
dc.type	disertační práce	cze
dc.type	doctoral thesis	eng
dc.contributor.referee	Langs Georg
theses.degree.discipline	Informatics - Department of Cybernetics	cze
theses.degree.grantor	katedra kybernetiky	cze
theses.degree.programme	Computer Science	cze

Files in this record

Name: F3-D-2023-Patel-Yash-PhD_Disse ...
Size: 14.28Mb
Format: PDF
Description: PLNY_TEXT

Name: F3-D-2023-Patel-Yash-PhD_Disse ...
Size: 14.26Mb
Format: PDF
Description: PLNY_TEXT

This record appears in the following collections

Doctoral theses - 13000 [700]
