Identifikace mluvčího na bázi hlubokých neuronových sítí

Martin Šubert

Speaker Identification Based on Deep Neural Networks

Document type

bachelor thesis

Author

Martin Šubert

Supervisor

Pollák Petr

Reviewer

Rajnoha Josef

Field of study

Multimedia Technology

Study programme

Communications, Multimedia and Electronics

Degree-granting institution

Department of Radioelectronics

Rights

A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one's own expense. Use of the thesis must comply with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and with citation ethics http://knihovny.cvut.cz/vychova/vskp.html

Abstract

This thesis deals with methods for speaker verification and identification. The main focus is on methods based on GMMs and i-vectors, while methods using deep neural networks (DNNs) are described at the theoretical level. A GMM/i-vector system was implemented, including the use of LDA and PLDA in the score computation to improve the accuracy of identification and verification. The implementation uses the KALDI tools, which are designed directly for speech recognition and speaker recognition tasks. The practical part focuses mainly on testing the influence of the number and distribution of speakers and utterances across the individual training and test sets. Testing was carried out on the GLOBALPHONE database, which contains utterances in several world languages. The results indicate that the verification and identification error decreases as the number of utterances used for the reference and test sets grows. The implementation serves as the basis for a DNN-based system; a very commonly used configuration applies neural networks indirectly, to compute the features, followed by i-vector based identification. The concrete outcome is a sample script (recipe) following the KALDI convention, which can be used for a subsequent implementation of a system with DNNs.