Semi-supervised learning for malware detection
Publisher
České vysoké učení technické v Praze
Czech Technical University in Prague
Abstract
Machine learning is currently not widely used for malware detection. One reason is that labelling malware and benign files, which machine learning requires, is a very expensive process. This thesis focuses on malware detection using semi-supervised learning, a category of machine learning in which both labelled and unlabelled samples are used to train the model. Information extracted from executable files in the PE format was used for training. The thesis shows that semi-supervised learning can achieve better accuracy than a purely supervised approach.