Generating Malware Family Signatures from Behavioral Graphs using Unsupervised Learning

Generování signatur malwarových rodin z behaviorálních grafů pomocí nesupervizovaného učení

Authors

Zvara, Tomáš

Supervisors

Jureček, Martin

Reviewers

Lórencz, Róbert

Publisher

České vysoké učení technické v Praze
Czech Technical University in Prague

Files

Full Text (1.59 MB)

Review (45.24 KB)

Review (42.58 KB)

Abstract

The behavioral shield is a component of Avast antivirus software responsible for monitoring the system and identifying suspicious behavior of running processes. Process behavior is captured in the form of behavioral graphs. Ongoing internal research studies the use of deep learning models called graph neural networks to enable machine learning on these graphs. This thesis studies three different graph embeddings produced by existing graph neural network models and verifies whether the embedded representations make it possible to distinguish the malicious behavior of individual malware families. The structure of the embedding spaces is analyzed using well-known clustering methods, namely k-means, DBSCAN, and agglomerative clustering. The clustering results are evaluated with intrinsic and extrinsic measures. The hypothesis is that the resulting clusters represent the behavior of individual malware families and can therefore be used to create behavioral signatures that detect them. However, the experiments show that the applied clustering methods produce low-quality clusters that do not separate the graphs of the individual families. Two factors primarily cause this. First, the behavioral graphs do not capture family behavior well enough to distinguish the families. Second, the provided malware family labels are of low quality.
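The abstract does not say which intrinsic and extrinsic measures were used. As a rough illustration only, the sketch below computes one common measure of each kind for an already clustered set of embeddings: the mean silhouette coefficient (intrinsic, uses only geometry) and cluster purity (extrinsic, uses family labels). The function names `mean_silhouette` and `purity`, the two-dimensional toy points, and the family labels "A" and "B" are all hypothetical and are not taken from the thesis.

```python
from collections import Counter
from math import dist


def purity(cluster_ids, family_labels):
    """Extrinsic measure: share of samples that carry the majority
    family label of the cluster they were assigned to."""
    by_cluster = {}
    for cluster, family in zip(cluster_ids, family_labels):
        by_cluster.setdefault(cluster, []).append(family)
    majority_total = sum(
        Counter(families).most_common(1)[0][1]
        for families in by_cluster.values()
    )
    return majority_total / len(cluster_ids)


def mean_silhouette(points, cluster_ids):
    """Intrinsic measure: mean silhouette coefficient with Euclidean
    distance. Assumes at least two distinct clusters."""
    clusters = {}
    for point, cluster in zip(points, cluster_ids):
        clusters.setdefault(cluster, []).append(point)
    scores = []
    for point, cluster in zip(points, cluster_ids):
        own = clusters[cluster]
        if len(own) == 1:
            # Convention: a singleton cluster contributes a score of 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from the point to itself is 0, hence len - 1).
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != cluster
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: four 2-D "embeddings", two clusters, and
# family labels that only partly agree with the clustering.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
cluster_ids = [0, 0, 1, 1]
families = ["A", "A", "B", "A"]

print("purity:", purity(cluster_ids, families))
print("mean silhouette:", mean_silhouette(points, cluster_ids))
```

In this toy case the clusters are geometrically well separated, so the silhouette is high, while the purity is lower because one cluster mixes two families. A gap of this kind is one way that low-quality labels, the second factor named in the abstract, could show up in an evaluation.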

Keywords

malware, malware strain, malware behavioral detection, graph neural network, clustering, behavioral analysis, behavioral graph

Permanent link

http://hdl.handle.net/10467/101098

Rights/License

A university thesis is a work protected by the Copyright Act of the Czech Republic. Extracts, copies, and transcripts of the thesis may be made for personal use only and at one's own expense. Use of the thesis must comply with the Copyright Act as amended.

Collections

Master Theses - 18106
