Speaker Identification from Acoustic Signal

This thesis describes a Speaker Identification system, a type of Speaker Recognition system. The theoretical part covers the different configurations, acoustic analysis, and methods, with the main attention given to GMM-UBM with i-vectors and to current baseline deep learning methods, namely X-vector and ECAPA-TDNN. The practical part implements a Speaker Identification pipeline for a specific task assigned by MAMA AI, using the open-source speech toolkit SpeechBrain. Pre-trained models were evaluated on various test cases, and the best-performing model was selected. That model was then evaluated under varying scenarios: different lengths of audio samples, noisy data (non-intelligible and intelligible), various languages, and artificially generated audio samples. The selected model, ECAPA-TDNN, performed very well in all of these scenarios, with the IR (%) never falling below 70% (apart from the final experiment, where IR values were lower but still favorable given the experimental circumstances), and it was chosen for use in the final Speaker Identification pipeline.

Keywords

Speaker Identification, Speaker Recognition, Artificial Intelligence, Deep Learning, Python, Speech Analysis, X-vector, ECAPA-TDNN, SpeechBrain

Permanent link

http://hdl.handle.net/10467/107196

Rights/License

A university thesis is a work protected by the Copyright Act of the Czech Republic. Extracts, copies, and transcripts of the thesis may be made for personal use only and at one's own expense. Use of the thesis must comply with the Copyright Act, as amended.

Collections

Bachelor Theses - 13133

Speaker Identification from Acoustic Signal