Strojové učení k ochraně před kybernetickými útoky na webu

František Střasák

Should I click on a link? Machine Learning to Protect from Cyber Attacks on the Web

dc.contributor.advisor	García Sebastián
dc.contributor.author	František Střasák
dc.date.accessioned	2020-06-15T22:51:28Z
dc.date.available	2020-06-15T22:51:28Z
dc.date.issued	2020-06-15
dc.identifier	KOS-860412734205
dc.identifier.uri	http://hdl.handle.net/10467/88037
dc.description.abstract	Detekce nebezpečných webových stránek dnes představuje velkou výzvu, neboť techniky, které jsou v útocích využívány, jsou velmi rozmanité, pokročilé a nebezpečné. Nebezpečná webová stránka může infikovat uživatelovo zařízení či ukrást jeho citlivá data. Nejrozšířenějším zástupcem nebezpečných stránek jsou tzv. evil twin webové stránky, které používají phishingové praktiky ke krádeži citlivých dat. Evil twin webové stránky se snaží co nejvěrněji napodobit vzhled reálné webové stránky, zmást uživatele a přesvědčit ho k zadání citlivých údajů, těmi jsou například přihlašovací údaje. Uživatelé, kteří ověřují autenticitu webové stránky pouze podle vzhledu, mohou být tedy velmi jednoduše podvedeni. Častou technikou k detekci nebezpečných stránek jsou blacklisty, jež obsahují list nebezpečných URL. Problém ale nastává, když se objeví nová URL, která v blacklistu není obsažena, a tím pádem nemůže být detekována. Možným řešením tohoto problému je detekce založená na analýze URL, při jejíž použití bylo v minulosti během několika výzkumů dosaženo uspokojivých výsledků. Webová stránka ale obsahuje více informací než pouze URL. V této práci jsou předkládány nové metody pro detekci nebezpečných a evil twin stránek, jež jsou založeny na analýze chování, obsahu a struktuře webové stránky. Data o chování a obsahu webové stránky jsou získána z analýzy vytvořené pomocí urlscan.io. Tato analýza ukazuje komplexní popis webové stránky v mnoha směrech. Data o struktuře webové stránky jsou brána ze zdrojového kódu HTML. První část této práce je věnována obecně detekci nebezpečných stránek a druhá část je zaměřena pouze na detekci evil twin webových stránek. Pro oba problémy byly vytvořeny datasety, které jsou veřejně přístupné a mohou být použity pro další výzkum. Z výsledků výzkumu této diplomové práce vyplývá, že data založená na obsahu, chování a struktuře webové stránky hrají důležitou roli při detekci kybernetických útoků. 
Na základě metod této diplomové práce bylo dosaženo přesnosti 92.69% pro detekci nebezpečných stránek a 95.28% pro detekci evil twin stránek.	cze
dc.description.abstract	The detection of unsafe websites is a challenging task for the security community because the attack techniques involved are varied, advanced, and dangerous. There are many types of unsafe websites that can infect users' devices or steal their sensitive data. The most prevalent type is the evil twin website, which uses phishing techniques to steal sensitive data and credentials from users. Evil twin websites are clones that imitate real websites to trick users into using them. Users who judge the authenticity of a website by its look alone can therefore be defrauded into entering sensitive information on the evil twin website. To detect these unsafe websites, previous studies have mainly used blacklists, but blacklists need constant updates whenever a new URL appears, so the approach does not protect against new and current threats. Another common solution is to detect the website by analyzing the URL string, which may show satisfactory results under certain conditions. However, the complexity of domain names and URL parameters makes this approach error-prone as well. Since websites offer much more information than a URL alone, this thesis proposes novel methods to detect unsafe and evil twin websites based on analysis of the behavior, content, and structure of websites. The structure refers to the HTML structure; the content and the behavior refer to a large group of features extracted from the urlscan.io service, which provides a complex description of websites. To better detect unsafe websites, this thesis is divided into two main parts. The first part focuses on the detection of unsafe websites in general using different sets of features. The second part concentrates specifically on the detection of evil twin websites. For both problems we created and published our own datasets, which can be useful for the whole community. 
This thesis presents evidence that features from the content, behavior, and structure of websites play an essential role in detecting cyber attacks on the web. The results show that our models are able to distinguish between unsafe and legitimate websites with an accuracy of 92.69% and between evil twin and legitimate websites with an accuracy of 95.28%. Detecting unsafe websites is a hard problem because they keep evolving, but we believe that this thesis advances research on detecting this threat.	eng
dc.publisher	České vysoké učení technické v Praze. Výpočetní a informační centrum.	cze
dc.publisher	Czech Technical University in Prague. Computing and Information Centre.	eng
dc.rights	A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one's own expense. The use of the thesis should be in compliance with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and the citation ethics http://knihovny.cvut.cz/vychova/vskp.html	eng
dc.rights	Vysokoškolská závěrečná práce je dílo chráněné autorským zákonem. Je možné pořizovat z něj na své náklady a pro svoji osobní potřebu výpisy, opisy a rozmnoženiny. Jeho využití musí být v souladu s autorským zákonem http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf a citační etikou http://knihovny.cvut.cz/vychova/vskp.html	cze
dc.subject	Nebezpečná webová stránka	cze
dc.subject	Evil Twin	cze
dc.subject	Strojové učení	cze
dc.subject	Phishing	cze
dc.subject	Unsafe Websites	eng
dc.subject	Evil Twin Websites	eng
dc.subject	Machine Learning	eng
dc.subject	Phishing	eng
dc.title	Strojové učení k ochraně před kybernetickými útoky na webu	cze
dc.title	Should I click on a link? Machine Learning to Protect from Cyber Attacks on the Web	eng
dc.type	diplomová práce	cze
dc.type	master thesis	eng
dc.contributor.referee	Catania Carlos
theses.degree.discipline	Umělá inteligence	cze
theses.degree.grantor	katedra počítačů	cze
theses.degree.programme	Otevřená informatika	cze

Files in this item

Name: F3-DP-2020-Strasak-Frantisek-S ...
Size: 6.444Mb
Format: PDF
Description: PLNY_TEXT

Name: F3-DP-2020-Strasak-Frantisek-p ...
Size: 4.740Mb
Format: Unknown
Description: PRILOHA

Name: F3-DP-2020-posudek-Garcia_Seba ...
Size: 207.7Kb
Format: PDF
Description: POSUDEK

Name: F3-DP-2020-posudek-Catania_Car ...
Size: 111Kb
Format: PDF
Description: POSUDEK

This item appears in the following collections

Diplomové práce - 13136 [833]