Network Traffic Representations for Adaptive Intrusion Detection

Bartoš, Karel

Type of document

Doctoral thesis

Author

Bartoš, Karel

Supervisor

Rehák, Martin

Pevný, Tomáš

Field of study

Information Science and Computer Engineering

Study program

Electrical Engineering and Information Technology

Institutions assigning rank

České vysoké učení technické v Praze. Fakulta elektrotechnická. Katedra počítačů

Abstract

New and previously unseen polymorphic malware, zero-day attacks, and other types of advanced persistent threats are usually not detected by traditional security systems. This is a challenge for the network security industry, as the number and variability of attacks have been increasing. In this thesis, we propose three key approaches, each dealing with this challenge at a different level of abstraction. In order to cope with the increasing volume of network traffic, we propose an adaptive sampling method based on two concepts that mitigate the negative impact of sampling on the raw input data: (i) Features used by the analytic algorithms are extracted before the sampling and attached to the surviving flows. The surviving flows thus carry the representation of the original statistical distribution in these attached features. (ii) Adaptive sampling deliberately skews the distribution of the surviving data to over-represent rare flows or flows with rare feature values. This preserves the variability of the data and is critical for the analysis of malicious traffic, such as the detection of stealthy, hidden threats. Our approach has been extensively validated on standard NetFlow data, as well as on HTTP proxy logs that approximate the use case of enriched IPFIX for network forensics. Next, we propose a novel representation and classification system designed to detect both known and previously unseen security threats. The classifiers use a statistical feature representation computed from the network traffic and learn to recognize malicious behavior. The representation is designed and optimized to be invariant to the most common changes of malware behavior. This is achieved in part by a feature histogram constructed for each group of network connections (flows) and in part by a feature self-similarity matrix computed for each group. The parameters of the representation (histogram bins) are optimized and learned from the training samples along with the classifiers.
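As a rough illustration of why a histogram and a self-similarity matrix can make a representation invariant, the sketch below computes both for one feature of one group of flows. It is not the thesis implementation: the function names, the fixed bin edges, and the use of absolute differences as the similarity measure are all assumptions made for this example (in the thesis the bins are learned).

```python
from bisect import bisect_right


def feature_histogram(values, bin_edges):
    """Normalized histogram of one feature over a group of flows.

    bin_edges is a sorted list of inner bin boundaries (hypothetical,
    fixed by hand here rather than learned).
    """
    counts = [0] * (len(bin_edges) + 1)
    for v in values:
        counts[bisect_right(bin_edges, v)] += 1
    total = len(values)
    return [c / total for c in counts] if total else counts


def self_similarity(values):
    """Matrix of pairwise absolute differences between flows in a group.

    Assumption: absolute difference stands in for the similarity measure.
    """
    return [[abs(a - b) for b in values] for a in values]


# Hypothetical feature: bytes transferred by each flow in one group.
group = [100, 250, 250, 900]

# The histogram ignores the order of the flows within the group.
hist = feature_histogram(group, bin_edges=[200, 500])
hist_reordered = feature_histogram(list(reversed(group)), bin_edges=[200, 500])

# The pairwise-difference matrix ignores a constant shift of all values.
shifted = [v + 50 for v in group]
same_structure = self_similarity(group) == self_similarity(shifted)
```

In this toy setup, reordering the flows leaves the histogram unchanged, and adding a constant to every value leaves the pairwise-difference matrix unchanged. These are the kinds of behavior changes a representation of this sort can absorb.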
The proposed approach was deployed on large corporate networks, where it detected 2,090 new variants of malware with 90% precision. Finally, we propose a distributed and self-organized mechanism for the collaboration of multiple heterogeneous detection systems. The mechanism is based on a game-theoretical approach that optimizes the behavior of each detection system with respect to other systems in highly dynamic environments. The game-theoretical model specializes the detection systems on specific types of malicious behaviors to collaboratively cover a wider range of attack classes. According to our experimental evaluation on real network traffic, the proposed mechanism shows clear improvements caused by mutual specialization of individual detection systems. All three approaches can be combined into a unified collaborative fusion system, analyzing the input network traffic at different levels of abstraction. The benefits of such a combination were demonstrated in the final experiment, where we combined the proposed adaptive sampling with a collaborative mechanism for detection systems deployed in multiple networks.
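The two adaptive sampling concepts named in the abstract, extracting features before sampling and over-representing rare flows, can be sketched as follows. This is a minimal illustration under stated assumptions, not the method evaluated in the thesis: the flow fields (`src`), the attached feature (`src_flow_count`), and the keep probability `min(1, threshold / count)` are hypothetical choices.

```python
import random
from collections import Counter


def attach_features(flows):
    """Compute per-source flow counts on the full, unsampled data and
    attach them to every flow, so survivors keep the original statistics."""
    per_source = Counter(f["src"] for f in flows)
    return [dict(f, src_flow_count=per_source[f["src"]]) for f in flows]


def keep_probability(count, threshold):
    """Always keep flows from rare sources; thin out frequent ones.

    Assumption: a simple inverse-frequency rule, chosen for illustration.
    """
    return min(1.0, threshold / count)


def adaptive_sample(flows, threshold, rng):
    """Sample after feature extraction, skewing survivors toward rare flows."""
    enriched = attach_features(flows)
    return [
        f for f in enriched
        if rng.random() < keep_probability(f["src_flow_count"], threshold)
    ]


# Hypothetical traffic: one busy source and one rare source.
flows = [{"src": "10.0.0.1"} for _ in range(1000)]
flows += [{"src": "10.0.0.9"} for _ in range(2)]

sample = adaptive_sample(flows, threshold=5, rng=random.Random(0))
rare_kept = [f for f in sample if f["src"] == "10.0.0.9"]
busy_kept = [f for f in sample if f["src"] == "10.0.0.1"]
```

Both flows from the rare source survive, while the busy source is thinned heavily. Each surviving busy flow still carries `src_flow_count == 1000`, so a downstream detector sees the original volume even though most of those flows were dropped.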