dc.contributor.advisor: Hanžl, Václav
dc.contributor.author: Bartošek, Jan
dc.date.accessioned: 2016-11-14T14:14:50Z
dc.date.available: 2016-11-14T14:14:50Z
dc.date.issued: 2016
dc.identifier.uri: http://hdl.handle.net/10467/66684
dc.description.abstract: This doctoral thesis addresses the use of prosody in automatic recognition of continuous speech. Although automatic speech recognition (ASR) systems have improved immensely over the last several decades, they still make little use of prosody, one of the most important carriers of information in speech. Results for other languages have already shown that prosody can benefit ASR, and this thesis investigates the potential of prosody for Czech. The research is divided into three main areas: a) pitch detection algorithms (PDA) as a necessary prerequisite for prosodic feature extraction, b) the Czech lexical stress system as a potential acoustic cue for word boundary detection (and its use in ASR), and c) classification of sentence/phrase modality in Czech based purely on the acoustic signal. First, for the field of pitch detection algorithms, a framework for their evaluation and comparison is presented. Several new evaluation criteria are proposed as an extension of the existing ones, together with an evaluation of the metrics over four speech pitch reference databases. Besides the comparison itself, a few modifications of existing PDA methods are presented. Namely, the transition probability function in PDA post-processing is investigated in terms of the candidate distance measure, and a new temporal-forgetting principle for speech is introduced as an extension of the method in the time domain. Czech, a fixed-stress language with lexical stress on the first syllable, is known to have weak acoustic correlates of lexical stress. Nevertheless, methods for detecting stressed syllables or stress-group boundaries from the speech signal were investigated. A system that combines sophisticated feature extraction with statistical machine learning methods to model these phenomena in Czech is presented.
Detected stress-group boundaries can, in most cases, be mapped to word boundaries, which can then be used for prosodic evaluation of ASR hypotheses. A metric for such a prosodic score, which can be used directly in prosodic N-best evaluation or ASR error detection, is proposed. An ASR lattice rescoring algorithm for Czech is also presented. Finally, Czech phrase modality detection from the acoustic signal is covered; together with an existing phrase boundary detector, such a system can serve as a punctuation module for a Czech dictation ASR system, or support the natural language processing (NLP) part of a Czech dialogue system. (en)
dc.language.iso: en (en)
dc.subject: Prosody (en)
dc.subject: speech technology (en)
dc.subject: ASR (en)
dc.subject: F0 (en)
dc.subject: pitch (en)
dc.subject: lexical stress (en)
dc.subject: stress group (en)
dc.subject: modality (en)
dc.subject: melodeme (en)
dc.subject: prosodic hypothesis scoring (en)
dc.title: Prosody Utilization in Continuous Speech Recognition (cze)
dc.type: disertační práce [doctoral thesis] (cze)
dc.description.department: Department of Circuit Theory
theses.degree.discipline: Theoretical Electrical Engineering
theses.degree.grantor: Czech Technical University in Prague. Faculty of Electrical Engineering. Department of Circuit Theory.
theses.degree.programme: Electrical Engineering and Informatics

