Network Traffic Representations for Adaptive Intrusion Detection
Type of document
doctoral thesis
Author
Bartoš, Karel
Supervisor
Rehák, Martin
Pevný, Tomáš
Field of study
Information Science and Computer Engineering
Study program
Electrical Engineering and Information Technology
Institutions assigning rank
České vysoké učení technické v Praze. Fakulta elektrotechnická. Katedra počítačů
Abstract
New and unseen polymorphic malware, zero-day attacks, or other types of advanced persistent
threats are usually not detected by traditional security systems. This poses a challenge to
the network security industry, as the number and variability of attacks have been increasing. In
this thesis, we propose three key approaches, each dealing with this challenge at a different level
of abstraction.
In order to cope with the increasing volume of network traffic, we propose an adaptive sampling
method based on two concepts that mitigate the negative impact of sampling on the
raw input data: (i) Features used by the analytic algorithms are extracted before the sampling
and attached to the surviving flows. The surviving flows thus carry the representation of the
original statistical distribution in these attached features. (ii) Adaptive sampling deliberately
skews the distribution of the surviving data to over-represent rare flows or flows with
rare feature values. This preserves the variability of the data and is critical for the analysis of
malicious traffic, such as the detection of stealthy, hidden threats. Our approach has been extensively
validated on standard NetFlow data, as well as on HTTP proxy logs that approximate
the use case of enriched IPFIX for network forensics.
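The two sampling concepts can be illustrated with a minimal Python sketch. It is not the algorithm from the thesis: the keep-probability rule, the function name `adaptive_sample`, the `base_rate` parameter, and the attached field names `orig_count` and `keep_prob` are all assumptions made for illustration. The sketch first computes a statistic on the full, unsampled data, then keeps each flow with a probability that shrinks as its feature value becomes more common, and attaches the pre-sampling statistic to every surviving flow.

```python
import random
from collections import Counter


def adaptive_sample(flows, key, base_rate=0.1, seed=0):
    """Keep rare values of `key` with higher probability than common ones.

    Hypothetical rule: every distinct value gets the same expected budget
    of surviving flows, so values rarer than the budget always survive.
    """
    if not flows:
        return []
    rng = random.Random(seed)
    # (i) Statistics are computed on the full data, before any sampling.
    counts = Counter(flow[key] for flow in flows)
    budget = base_rate * len(flows) / len(counts)
    survivors = []
    for flow in flows:
        n = counts[flow[key]]
        # (ii) The keep probability is skewed toward rare feature values.
        keep_prob = min(1.0, budget / n)
        if rng.random() < keep_prob:
            # Attach the pre-sampling statistic to the surviving flow.
            survivors.append({**flow, "orig_count": n, "keep_prob": keep_prob})
    return survivors


# Toy data: 1000 flows to a common port and 5 flows to a rare one.
flows = [{"dst_port": 80} for _ in range(1000)]
flows += [{"dst_port": 6667} for _ in range(5)]
sample = adaptive_sample(flows, "dst_port")
rare_kept = sum(1 for f in sample if f["dst_port"] == 6667)
common_kept = sum(1 for f in sample if f["dst_port"] == 80)
print(rare_kept, common_kept)
```

In this toy setting all five rare flows survive, while only a small fraction of the common flows do. Because each survivor carries `orig_count`, a downstream algorithm can still see how frequent the value was in the original traffic.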
Next, we propose a novel representation and classification system designed to detect both
known and previously unseen security threats. The classifiers use a statistical feature representation
computed from the network traffic and learn to recognize malicious behavior. The
representation is designed and optimized to be invariant to the most common changes in malware
behavior. This is achieved in part by a feature histogram constructed for each group of
network connections (flows) and in part by a feature self-similarity matrix computed for each
group. The parameters of the representation (histogram bins) are optimized and learned from
the training samples along with the classifiers. The proposed approach was deployed on large
corporate networks, where it detected 2,090 new variants of malware with 90% precision.
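The following Python sketch shows why a histogram and a self-similarity matrix can yield an invariant representation. It rests on several assumptions that are not taken from the thesis: the bin edges are fixed by hand rather than learned, the self-similarity matrix is taken to be the matrix of absolute pairwise differences, and the function names are hypothetical. Under these assumptions, the histogram does not depend on the order of flows in a group, and the histogram of pairwise differences does not change when every feature value is shifted by the same constant.

```python
from bisect import bisect_right


def histogram(values, edges):
    """Normalized histogram with len(edges) + 1 bins split at `edges`."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def self_similarity(values):
    """Matrix of absolute pairwise differences (an assumed definition)."""
    return [[abs(a - b) for b in values] for a in values]


def represent(values, value_edges, diff_edges):
    """Return (histogram of values, histogram of self-similarity entries)."""
    matrix = self_similarity(values)
    diffs = [d for row in matrix for d in row]
    return histogram(values, value_edges), histogram(diffs, diff_edges)


# Hand-picked bin edges; in the thesis the bins are learned from training data.
value_edges = [200, 1000]
diff_edges = [10, 100, 1000]

group = [100, 120, 500]               # one feature over a group of flows
shifted = [v + 1000 for v in group]   # same behavior, shifted feature values

_, diff_hist = represent(group, value_edges, diff_edges)
_, diff_hist_shifted = represent(shifted, value_edges, diff_edges)
print(diff_hist == diff_hist_shifted)
```

The histogram of raw values changes under the shift, while the histogram built from the self-similarity matrix stays the same, which is one plausible reading of how the two parts complement each other.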
Finally, we propose a distributed and self-organized mechanism for the collaboration of multiple
heterogeneous detection systems. The mechanism is based on a game-theoretical approach
that optimizes the behavior of each detection system with respect to other systems in highly
dynamic environments. The game-theoretical model specializes the detection systems on specific
types of malicious behavior to collaboratively cover a wider range of attack classes. According
to our experimental evaluation on real network traffic, the proposed mechanism shows clear
improvements caused by mutual specialization of the individual detection systems.
All three approaches can be combined into a unified collaborative fusion system that analyzes
the input network traffic at different levels of abstraction. The benefits of such a combination were
demonstrated in the final experiment, where we combined the proposed adaptive sampling with
a collaborative mechanism for detection systems deployed in multiple networks.