High-Resolution Images and Knowledge Distillation in Deep Metric Learning with Vision Transformers
Document type
master thesis
Author
Yongpan Fu
Supervisor
Tolias Georgios
Reviewer
Psomas Bill
Branch of study
Computer Vision and Image Processing
Study programme
Open Informatics
Degree-granting institution
Department of Cybernetics
Rights
A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one's own expense. The use of the thesis should be in compliance with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and the citation ethics http://knihovny.cvut.cz/vychova/vskp.html
Abstract
This thesis investigates the trade-off between performance and complexity with respect to various image resolutions using Vision Transformers in deep metric learning. The objective of metric learning is to use deep neural networks to embed images into representative vectors such that images of the same class cluster together in the feature space while maintaining separation between different classes. Vision Transformers, among various deep architectures, have proven efficient at extracting high-level semantics from diverse image content and are thus employed as the primary models. Nevertheless, high performance is often accompanied by substantial complexity. Knowledge distillation is utilized as an optimization technique to enhance the performance of cost-effective models under the guidance of more complex models. Moreover, image resolution significantly affects the model's performance. Therefore, this thesis examines the performance/complexity trade-off in asymmetric metric learning, where images are processed at different resolutions. The term resolution refers to either the input image resolution or the resolution of the patches that are separately processed at the processing stage of Vision Transformers.
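The embedding-level knowledge distillation described in the abstract can be sketched in plain Python: a lightweight "student" embedding is nudged toward a fixed "teacher" embedding by gradient descent on their squared distance, so that both map images into a compatible space for asymmetric retrieval. This is a minimal toy illustration, not code from the thesis; the function names, vectors, and learning rate are all illustrative assumptions.

```python
import math

def cosine(a, b):
    # cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def distill_step(student, teacher, lr=0.1):
    # one gradient step on the MSE loss ||s - t||^2;
    # its gradient w.r.t. the student is 2 * (s - t)
    return [s - lr * 2.0 * (s - t) for s, t in zip(student, teacher)]

# toy 2-D embeddings (hypothetical values, for illustration only)
teacher_emb = [1.0, 0.0]   # embedding from the large, high-resolution model
student_emb = [0.0, 1.0]   # embedding from the cheap, low-resolution model

for _ in range(20):
    student_emb = distill_step(student_emb, teacher_emb)

# similarity approaches 1.0 as the student aligns with the teacher
print(cosine(student_emb, teacher_emb))
```

In the asymmetric setting this alignment is what allows queries embedded by the cheap student to be matched against a gallery embedded by the expensive teacher.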
Collections
- Master theses - 13133 [474]