Rozpoznávání spojité řeči s pokročilými strukturami hlubokých neuronových sítí

Martin Šubert

Continuous Speech Recognition using Advanced Deep Neural Networks

dc.contributor.advisor	Pollák Petr
dc.contributor.author	Martin Šubert
dc.date.accessioned	2021-06-09T22:53:07Z
dc.date.available	2021-06-09T22:53:07Z
dc.date.issued	2021-06-09
dc.identifier	KOS-1064879351205
dc.identifier.uri	http://hdl.handle.net/10467/94803
dc.description.abstract	Tato diplomová práce prezentuje systémy automatického rozpoznávání řeči založené na hlubokých neuronových sítích (DNN) s pokročilými strukturami a jejich implementace pro anglický a český jazyk s použitím Kaldi nástrojů. Experimenty používají nejnovější verzi Kaldi DNN nnet3 nástrojů, které podporují zmíněné pokročilé DNN struktury. Byly použity dva nnet3 modely - standardní nnet3 model a tzv. chain model, který byl vytvořen jako součást nnet3 nástrojů za účelem snížit čas dekódování. Implementace pracuje s DNN-HMM architekturou se dvěma typy neuronových sítí, TDNN a LSTM, při použití nnet3 i chain modelů. Navíc byla použita CNN neuronová síť za použití chain modelu. Experimenty porovnávají přesnost DNN-HMM modelu se standardním GMM-HMM přístupem, kdy nejlepší přesnost byla dosažena pomocí TDNN sítě s WER 2.69 % pro anglický jazyk, což bylo zhruba 5% zlepšení oproti standardnímu GMM-HMM modelu. Nejlepší výsledek pro český jazyk byl také dosažen s použitím TDNN sítě s hodnotou WER 10.78 % (přibližně 9% zlepšení oproti GMM-HMM modelu). Druhá část experimentů porovnávala zlepšení DNN-HMM systémů vycházejících z GMM-HMM modelů natrénovaných s trigramovým a čtyřgramovým jazykovým modelem, a dále se slovníkem s a bez pravděpodobnostmi ticha a výslovnosti. Poslední část experimentů porovnává přesnost a celkovou dobu zpracování za použití chain modelů a standardních nnet3 modelů. Zlepšení přesnosti lze vidět u TDNN i u LSTM-TDNN chain modelu, kdy větší přínos zaznamenala TDNN struktura. Doba dekódování poklesla u obou DNN struktur, kdy chain LSTM-TDNN model dekódoval téměř devětkrát rychleji než jeho implementace pomocí standardních nnet3 modelů. U LSTM-TDNN chain modelu navíc došlo ke snížení doby trénování, kdy byl model natrénován zhruba o 20 % rychleji oproti standardnímu nnet3 modelu. Nicméně u TDNN chain modelu byla doba trénování více než dvojnásobná oproti standardní nnet3 variantě.	cze
dc.description.abstract	This thesis presents automatic speech recognition (ASR) systems based on deep neural networks (DNN) with advanced structures and their implementations for English and Czech using the Kaldi toolkit. The experiments use the newest Kaldi DNN setup, nnet3, which supports these advanced DNN types. Two kinds of nnet3 models are used: the standard nnet3 models and the chain models, which are implemented as part of the nnet3 setup with the aim of reducing decoding time. The implementations use the DNN-HMM architecture with two neural network types, Time Delay Neural Network (TDNN) and Long Short-Term Memory (LSTM), for both nnet3 and chain models. Additionally, a Convolutional Neural Network (CNN) structure is used with the chain model. The experiments compare the accuracy of the DNN-HMM ASR models with the standard GMM-HMM approach. The best accuracy for English was achieved with the TDNN network, with a Word Error Rate (WER) of 2.69 %, an improvement of about 5 % over the standard GMM-HMM model. The best result for Czech was also achieved with the TDNN network, with a WER of 10.78 % (an improvement of approximately 9 % over the GMM-HMM model). The second part of the experiments compared the improvement of DNN-HMM systems built on GMM-HMM models trained with trigram and four-gram language models (LM), and with a dictionary with and without silence and pronunciation probabilities. Finally, the accuracy and the overall processing time of the chain models were compared with those of the standard nnet3 models. Accuracy improved for both the TDNN and the LSTM-TDNN chain models, with the larger gain for the TDNN chain model. Decoding time decreased for both chain models; the LSTM-TDNN chain model decoded almost nine times faster than its standard nnet3 implementation. The LSTM-TDNN chain model also reduced training time, training about 20 % faster than the standard nnet3 model. The TDNN chain model, however, took more than twice as long to train as its nnet3 variant.	eng
dc.publisher	České vysoké učení technické v Praze. Výpočetní a informační centrum.	cze
dc.publisher	Czech Technical University in Prague. Computing and Information Centre.	eng
dc.rights	A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one's own expense. The use of the thesis should be in compliance with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and the citation ethics http://knihovny.cvut.cz/vychova/vskp.html	eng
dc.rights	Vysokoškolská závěrečná práce je dílo chráněné autorským zákonem. Je možné pořizovat z něj na své náklady a pro svoji osobní potřebu výpisy, opisy a rozmnoženiny. Jeho využití musí být v souladu s autorským zákonem http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf a citační etikou http://knihovny.cvut.cz/vychova/vskp.html	cze
dc.subject	Hluboké neuronové sítě	cze
dc.subject	rozpoznávání řeči	cze
dc.subject	DNN-HMM	cze
dc.subject	Kaldi	cze
dc.subject	nnet3	cze
dc.subject	chain modely	cze
dc.subject	Deep neural networks	eng
dc.subject	speech recognition	eng
dc.subject	DNN-HMM	eng
dc.subject	Kaldi	eng
dc.subject	nnet3	eng
dc.subject	chain models	eng
dc.title	Rozpoznávání spojité řeči s pokročilými strukturami hlubokých neuronových sítí	cze
dc.title	Continuous Speech Recognition using Advanced Deep Neural Networks	eng
dc.type	diplomová práce	cze
dc.type	master thesis	eng
dc.contributor.referee	Mizera Petr
theses.degree.discipline	Audiovizuální technika a zpracování signálů	cze
theses.degree.grantor	katedra radioelektroniky	cze
theses.degree.programme	Elektronika a komunikace	cze

Files in this item

Name: F3-DP-2021-Subert-Martin-Conti ...
Size: 13.53Mb
Format: PDF
Description: PLNY_TEXT

Name: F3-DP-2021-Subert-Martin-prilo ...
Size: 4.976Mb
Format: Unknown
Description: PRILOHA

Name: F3-DP-2021-posudek-Mizera_Petr.pdf
Size: 104.5Kb
Format: PDF
Description: POSUDEK

Name: F3-DP-2021-posudek-Pollak_Petr.pdf
Size: 851.5Kb
Format: PDF
Description: POSUDEK

This item appears in the following Collection(s)

Diplomové práce - 13137 [271]
