Prosody Utilization in Continuous Speech Recognition
Document type: doctoral thesis
Field of study: Theoretical Electrical Engineering
Study programme: Electrical Engineering and Informatics
Degree-granting institution: České vysoké učení technické v Praze (Czech Technical University in Prague), Faculty of Electrical Engineering, Department of Circuit Theory.
This doctoral thesis addresses the use of prosody in automatic recognition of continuous speech. Although automatic speech recognition (ASR) systems have improved immensely over the last several decades, they still make little use of prosody, one of the most important information-carrying aspects of speech. Studies on other languages have already shown that prosody can benefit ASR, and this thesis investigates the corresponding potential for Czech. The research activities fall into three main areas: a) pitch detection algorithms (PDAs) as a necessary prerequisite for prosodic feature extraction, b) the Czech lexical stress system as a potential acoustic cue for word boundary detection (and its use in ASR), and c) classification of sentence/phrase modality in Czech based purely on the acoustic signal.
First, in the field of pitch detection algorithms, a framework for their evaluation and comparison is presented. Several new evaluation criteria are proposed as an extension of existing ones, and the metrics are evaluated on four speech databases with reference pitch. Besides pure comparison, several modifications of existing PDA methods are presented. In particular, the transition probability function used in PDA post-processing is investigated in terms of the candidate distance measure, and a new temporal-forgetting principle for speech is introduced as a time-domain extension of the method.
Czech, a fixed-stress language with lexical stress on the first syllable, is known to have weak acoustic correlates of lexical stress. Nevertheless, methods for detecting stressed syllables or stress-group boundaries from the speech signal were investigated. A system combining sophisticated feature extraction with statistical machine learning methods to model these phenomena in Czech is presented.
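The abstract does not spell out the PDA post-processing, but a common form of it is a dynamic-programming (Viterbi) search over per-frame pitch candidates, where a transition cost (the negative log of a transition probability) penalizes jumps between candidates in consecutive frames. The sketch below assumes that setup and uses an octave distance |log2(f_t / f_(t-1))| as the candidate distance measure. The names `transition_cost` and `viterbi_track`, the penalty values, and the candidate format are hypothetical; this is not the thesis's own transition function, nor its temporal-forgetting extension.

```python
import math
from typing import List, Tuple

# Assumed input format: one list of (f0_hz, local_cost) candidates per frame,
# with at least one candidate per frame; f0_hz == 0.0 marks "unvoiced".
Frame = List[Tuple[float, float]]


def transition_cost(f_prev: float, f_cur: float,
                    voicing_penalty: float = 1.0,
                    octave_weight: float = 1.0) -> float:
    """Hypothetical cost of moving from one pitch candidate to the next."""
    if f_prev == 0.0 and f_cur == 0.0:
        return 0.0                      # staying unvoiced is free
    if f_prev == 0.0 or f_cur == 0.0:
        return voicing_penalty          # voiced <-> unvoiced switch
    # Distance in octaves, so a jump from 100 Hz to 200 Hz costs octave_weight.
    return octave_weight * abs(math.log2(f_cur / f_prev))


def viterbi_track(frames: List[Frame]) -> List[float]:
    """Return the F0 path with the minimum total (local + transition) cost."""
    if not frames:
        return []
    # cost[i]: best accumulated cost of any path ending at candidate i.
    cost = [local for _, local in frames[0]]
    back: List[List[int]] = []          # back-pointers for frames 1..T-1
    for t in range(1, len(frames)):
        prev, cur = frames[t - 1], frames[t]
        new_cost: List[float] = []
        ptr: List[int] = []
        for f_cur, local in cur:
            # Best predecessor for this candidate.
            best_j = min(range(len(prev)),
                         key=lambda j: cost[j] + transition_cost(prev[j][0], f_cur))
            new_cost.append(cost[best_j]
                            + transition_cost(prev[best_j][0], f_cur) + local)
            ptr.append(best_j)
        cost = new_cost
        back.append(ptr)
    # Backtrack from the cheapest final candidate.
    i = min(range(len(cost)), key=cost.__getitem__)
    path = [i]
    for ptr in reversed(back):
        i = ptr[i]
        path.append(i)
    path.reverse()
    return [frames[t][path[t]][0] for t in range(len(frames))]
```

With frames `[[(100, 0.1), (200, 0.5)], [(200, 0.2), (102, 0.3)], [(104, 0.1), (208, 0.6)]]`, the middle frame's locally cheaper 200 Hz candidate is an octave error; the transition cost steers the search to the smooth track 100, 102, 104 Hz instead. Measuring distance in octaves rather than in hertz makes a doubling cost the same at any pitch height.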
Detected stress-group boundaries can, in most cases, be mapped to word boundaries, which can then be used for prosodic evaluation of ASR hypotheses. A metric for such a prosodic score is proposed; it can be used directly in prosodic N-best evaluation or in ASR error detection. An ASR lattice rescoring algorithm for Czech is also presented. Finally, detection of Czech phrase modality from the acoustic signal is covered. Together with an existing phrase boundary detector, such a system can serve as a punctuation module for a Czech dictation ASR system, or support the natural language processing (NLP) part of a Czech dialogue system.
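To illustrate how word boundaries derived from stress-group detection might feed into N-best rescoring, here is a minimal sketch. The prosodic score it uses is an assumption for illustration only: an F1 agreement between detected boundary times and each hypothesis's word-start times, within a tolerance. It is combined log-linearly with the ASR score. This is not the metric proposed in the thesis. `Hypothesis`, `boundary_agreement`, `rescore_nbest`, the 50 ms tolerance, and the weight are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Hypothesis:
    words: List[str]
    word_starts: List[float]   # start time of each word, in seconds
    asr_score: float           # acoustic + language-model log score (higher is better)


def boundary_agreement(detected: Sequence[float], word_starts: Sequence[float],
                       tol: float = 0.05) -> float:
    """Hypothetical prosodic score: F1 between detected stress-group
    boundaries and hypothesis word starts, matched within +/- tol seconds."""
    if not detected or not word_starts:
        return 0.0
    # Word starts confirmed by some detected boundary (precision side).
    hits_words = sum(any(abs(w - d) <= tol for d in detected) for w in word_starts)
    # Detected boundaries explained by some word start (recall side).
    hits_detected = sum(any(abs(d - w) <= tol for w in word_starts) for d in detected)
    precision = hits_words / len(word_starts)
    recall = hits_detected / len(detected)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rescore_nbest(nbest: Sequence[Hypothesis], detected: Sequence[float],
                  weight: float = 5.0) -> List[Hypothesis]:
    """Re-rank hypotheses by asr_score + weight * prosodic score (best first)."""
    return sorted(nbest,
                  key=lambda h: h.asr_score + weight * boundary_agreement(detected, h.word_starts),
                  reverse=True)
```

Because Czech clitics and prepositions are usually grouped with a neighbouring word into a single stress group, a perfect one-to-one match between stress-group and word boundaries should not be expected. The tolerance and the weight would need tuning on held-out data.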