Reprezentace a generování adversariálních řetězců

Marek Galovič

Representation Learning and Adversarial Sample Generation for Strings

Document type

bachelor thesis

Author

Marek Galovič

Supervisor

Bošanský Branislav

Reviewer

Šmídl Václav

Field of study

Fundamentals of Artificial Intelligence and Computer Science

Study program

Open Informatics

Degree-granting institution

Department of Cybernetics

Rights

A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one's own expense. Use of the thesis must comply with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and with citation ethics http://knihovny.cvut.cz/vychova/vskp.html

Abstract

In malware behavioral analysis, the list of files that a sample accesses or creates is often a strong predictive feature for classifying the examined file as malicious or benign. However, malware authors try to avoid detection by generating random filenames or by modifying existing filenames in new versions of the malware. These changes represent real-world adversarial examples against the detection classifier. The goal of this work is to learn latent representations of character sequences, generate realistic adversarial examples, and improve the classifier's robustness against adversarial attacks. To obtain fixed-size vector representations of character sequences, we developed a recurrent autoencoder architecture that achieves high sample reconstruction accuracy. Using gradient-based adversarial attacks in the latent representation space, we were able to generate realistic adversarial examples in the input space and use these adversarial examples to improve the classifier's robustness. Additionally, we showed that latent representations obtained using variational autoencoders improve adversarial robustness without the need for adversarial training.
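
As a rough illustration of what a gradient-based attack in a latent space can look like, the following minimal sketch takes one signed-gradient step on a latent vector so that a classifier's loss increases. It is not the thesis's implementation: the logistic-regression classifier, the weights `w` and `b`, the latent vector `z`, the step size `eps`, and the function names are all hypothetical and chosen only for this example. The autoencoder that would encode a filename to `z` and decode the perturbed vector back into a string is omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(w, b, z):
    """Probability that latent vector z belongs to class 1 (e.g. malicious)."""
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


def latent_sign_attack(w, b, z, y, eps):
    """Take one signed-gradient step on z that increases the loss for label y.

    Assumes a logistic-regression classifier with cross-entropy loss, for
    which the gradient of the loss with respect to z is (p - y) * w.
    """
    p = predict(w, b, z)
    grad = [(p - y) * wi for wi in w]
    # Sign of each gradient component: -1, 0, or +1.
    signs = [(g > 0) - (g < 0) for g in grad]
    return [zi + eps * s for zi, s in zip(z, signs)]


# Hypothetical classifier parameters and latent code (illustrative values only).
w, b = [2.0, -1.0, 0.5], 0.0
z = [1.0, 0.0, 1.0]

z_adv = latent_sign_attack(w, b, z, y=1, eps=1.0)
print(predict(w, b, z) > 0.5)      # original latent code is classified as class 1
print(predict(w, b, z_adv) > 0.5)  # perturbed latent code is no longer class 1
```

In a setup like the one the abstract describes, the perturbed latent vector would then be passed through the decoder to obtain a candidate adversarial string; a smaller `eps` keeps the decoded string closer to the original.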