Analýza chování Malware v síti pomocí grafu

Šmolík Daniel

Graph-Based Analysis of Malware Network Behaviors

Type of document

bachelor thesis

Author

Šmolík Daniel

Supervisor

García Sebastián

Opponent

Catania Carlos

Field of study

Informatics and Computer Science

Study program

Open Informatics

Institutions assigning rank

Department of Cybernetics

Rights

A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one's own expense. Use of the thesis must comply with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and with citation ethics http://knihovny.cvut.cz/vychova/vskp.html

Abstract

There are many malware families, each with its own distinguishing features. This work focuses on detecting malicious behavior from the outgoing network communication of a host. Our hypothesis is that this malicious communication exhibits sequential behavioral patterns. We present a new graph representation of outgoing communication that uses (IP address, port, protocol) triplets as vertices. Two vertices are connected by an edge if they appear one after the other in the record of the inspected host's outgoing communication. We believe this representation can help both programs and human analysts detect the patterns. The Random Forest algorithm was used for prediction. Testing was done on datasets of normal users, infected hosts, and normal users whose computers were later infected. We were able to detect malicious communication with up to 97% accuracy.
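
The graph representation described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation. The names `Endpoint` and `build_transition_graph`, the sample flows, the choice of directed edges, and the edge counts are all assumptions made for illustration. The abstract only states that vertices are (IP address, port, protocol) triplets and that an edge joins two triplets that follow one another in the host's communication record.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Endpoint(NamedTuple):
    """One graph vertex: a destination (IP address, port, protocol) triplet."""
    ip: str
    port: int
    protocol: str


def build_transition_graph(flows: Iterable[Endpoint]) -> Counter:
    """Return directed edges between consecutive triplets, with counts.

    Assumption: edges are directed and weighted by how often the
    transition occurs; the abstract does not specify either detail.
    """
    edges: Counter = Counter()
    previous = None
    for current in flows:
        if previous is not None:
            edges[(previous, current)] += 1
        previous = current
    return edges


# Hypothetical outgoing flows of one inspected host, in chronological order.
flows = [
    Endpoint("192.0.2.10", 53, "udp"),
    Endpoint("198.51.100.7", 443, "tcp"),
    Endpoint("192.0.2.10", 53, "udp"),
    Endpoint("198.51.100.7", 443, "tcp"),
]

graph = build_transition_graph(flows)
vertices = {node for edge in graph for node in edge}

for (src, dst), count in graph.items():
    print(f"{src.ip}:{src.port}/{src.protocol} -> "
          f"{dst.ip}:{dst.port}/{dst.protocol} (x{count})")
```

Repeated transitions, such as the alternating DNS and HTTPS destinations above, show up as edges with higher counts. Sequential patterns of this kind are the sort of structure that features for a classifier could, under these assumptions, be derived from.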