Detekce phishingu pomocí strojového učení

Jan Koníř

Machine Learning-Based Phishing Detection

dc.contributor.advisor	Krátká Eliška
dc.contributor.author	Jan Koníř
dc.date.accessioned	2025-06-21T21:54:35Z
dc.date.available	2025-06-21T21:54:35Z
dc.date.issued	2025-06-20
dc.identifier	KOS-1248989749605
dc.identifier.uri	http://hdl.handle.net/10467/124349
dc.description.abstract	Práce zkoumá využití strojového učení pro statickou detekci phishingových útoků se zaměřením na phishingové e-maily jako vybraný vektor útoku. Vzhledem k nedostatku veřejně dostupných aktuálních datasetů byl vytvořen vlastní dataset obsahující označené vzorky reálných phishingových pokusů a legitimních e-mailů. V jazyce Python byl vyvinut nástroj pro zřetězené zpracování, který umožňuje různé kombinace metod pro předzpracování textu a extrakci příznaků. Tyto metody zahrnují tokenizaci, stemming a odstranění stop slov, jakož i lexikální, statistické a sémantické techniky extrakce příznaků. Klasifikátor SVM byl natrénován pomocí tohoto datasetu a jeho výkon byl vyhodnocen v sérii strukturovaných experimentů. V závěru této práce jsou diskutována zjištění z experimentů a navrženy různé směry budoucí práce v oblasti detekce phishingu pomocí strojového učení.	cze
dc.description.abstract	This thesis explores the use of machine learning for the static detection of phishing attacks, with a focus on phishing emails as the selected attack vector. Due to the lack of publicly available up-to-date datasets, a custom dataset was created, containing labeled samples of real-world phishing attempts and legitimate emails. A modular Python preprocessing pipeline was developed to allow various combinations of text preprocessing and feature extraction methods. These include tokenization, stemming, and stopword removal, as well as lexical, statistical, and semantic feature extraction techniques. A support vector machine (SVM) classifier was trained on this dataset, and its performance was evaluated in a series of structured experiments. Finally, this thesis discusses the findings from the experiments and suggests various directions for future work in machine learning-based phishing detection.	eng
dc.publisher	České vysoké učení technické v Praze. Výpočetní a informační centrum.	cze
dc.publisher	Czech Technical University in Prague. Computing and Information Centre.	eng
dc.rights	A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one's own expense. The use of the thesis should be in compliance with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and the citation ethics http://knihovny.cvut.cz/vychova/vskp.html	eng
dc.rights	Vysokoškolská závěrečná práce je dílo chráněné autorským zákonem. Je možné pořizovat z něj na své náklady a pro svoji osobní potřebu výpisy, opisy a rozmnoženiny. Jeho využití musí být v souladu s autorským zákonem http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf a citační etikou http://knihovny.cvut.cz/vychova/vskp.html	cze
dc.subject	phishing	cze
dc.subject	detekce phishingu	cze
dc.subject	sociální inženýrství	cze
dc.subject	strojové učení	cze
dc.subject	SVM	cze
dc.subject	zpracování přirozeného jazyka	cze
dc.subject	phishing	eng
dc.subject	phishing detection	eng
dc.subject	social engineering	eng
dc.subject	machine learning	eng
dc.subject	SVM	eng
dc.subject	natural language processing	eng
dc.title	Detekce phishingu pomocí strojového učení	cze
dc.title	Machine Learning-Based Phishing Detection	eng
dc.type	bakalářská práce	cze
dc.type	bachelor thesis	eng
dc.contributor.referee	Trummová Ivana
theses.degree.discipline	Informační bezpečnost 2021	cze
theses.degree.grantor	katedra informační bezpečnosti	cze
theses.degree.programme	Informatika	cze

This item appears in the following Collection(s)

Bakalářské práce - 18106 [64]