Robust recognition of strongly distorted speech
Document type
Doctoral thesis
Author
Borský, Michal
Supervisor
Pollák, Petr
Field of study
Theoretical Electrical Engineering
Study programme
Electrical Engineering and Informatics
Degree-granting institution
České vysoké učení technické v Praze. Fakulta elektrotechnická. Katedra teorie obvodů
Abstract
Automatic speech recognition (ASR) systems have become a part of our daily lives.
People often rely on virtual personal assistants in smartphones, use their voice to control
intelligent devices in cars and smart homes, or communicate with automatic dialogue
systems in call centres. Since these systems often suffer from a performance drop in
realistic acoustic conditions, which are characterized by strong distortions, a large portion
of research must still focus on robust front-end algorithms and acoustic modelling
methods for distorted speech recognition. This thesis focuses on such compensation
methods, working at the level of front-end processing and acoustic modelling, whose aim is
to compensate for the degradation introduced by a distant microphone, noisy environments
and lossy compression.
The techniques studied in this thesis for noisy and distant speech recognition comprised
front-end noise suppression, feature normalization, acoustic model adaptation
and discriminative training. These techniques were evaluated in
three different car conditions and two different public environments. The experiments
showed that extended spectral subtraction can bring a significant improvement even
for state-of-the-art systems in public environments with strong noise and for
far-distance microphone recordings.
The evaluation of compressed speech recognition examined the degrading effects of
lossy compression on the fundamental frequency, formants and smoothed LPC spectrum, and
on the standard MFCC and PLP features used for ASR. Low-pass filtering and areas
of very low energy in the spectrogram were identified as the two main causes of degradation.
The practical experiments evaluated the contributions of specific feature extraction setups,
combinations of normalization and compensation techniques, supervised and unsupervised
adaptation, discriminative training methods and, finally, matched training. The
largest contributions were gained from the application of adaptation techniques, subspace
GMM and discriminative training.
A novel algorithm named spectrally selective dithering (SSD), which compensates for the
effect of spectral valleys, was proposed in this thesis. The contribution of this algorithm
was verified for both GMM-HMM and DNN-HMM speech recognition systems for Czech
and English, and for a GMM-HMM system for German. The practical experiments showed
that the proposed algorithm can lower the word error rate (WER) for all languages with
GMM-HMM systems. For DNN-HMM systems, a significant improvement was achieved only for Czech.
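
The abstract reports results in terms of WER without defining it. As a minimal sketch, assuming the conventional definition (word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words), it could be computed as follows. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not the scoring setup used in the thesis.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenization is a plain
    whitespace split, which is an assumption of this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("turn on the radio", "turn off the radio"))
```

Because insertions are counted against the reference length, the value can exceed 1.0 when the hypothesis is much longer than the reference.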