Prosody Utilization in Continuous Speech Recognition
Type of document
doctoral thesis
Author
Bartošek, Jan
Supervisor
Hanžl, Václav
Field of study
Theoretical Electrical Engineering
Study program
Electrical Engineering and Informatics
Institutions assigning rank
České vysoké učení technické v Praze. Fakulta elektrotechnická. Katedra teorie obvodů.
Abstract
This doctoral thesis addresses the use of prosody in automatic recognition
of continuous speech. Even though automatic speech recognition (ASR) systems have
improved immensely over the last several decades, they still fail to make use of one of
the most important carriers of information in speech: prosody. Evidence from other
languages already shows that prosody can benefit ASR, and this thesis investigates
the potential of prosody usage for Czech.
The research activities can be divided into three main areas: a) pitch detection algorithms
(PDA) as a necessary prerequisite for prosodic feature extraction, b) the Czech lexical
stress system as a potential cue in the acoustic signal for word boundary detection (and its
usage in ASR), and c) classification of sentence/phrase modality in Czech based purely on
the acoustic signal.
First, in the field of pitch detection algorithms, a framework for their evaluation and
comparison is presented. Several new evaluation criteria are proposed as an extension to
existing ones, together with an evaluation of the metrics over four speech pitch reference
databases. Beyond pure comparison, a few modifications of existing PDA methods are presented.
Namely, the transition probability function in PDA post-processing is investigated in terms
of the candidate distance measure, and a new temporal-forgetting principle for speech is
introduced as an extension of the method in the time domain.
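To illustrate what a transition cost in PDA post-processing can look like, here is a minimal
sketch in Python. It is not the method from the thesis: it assumes each frame already comes
with a list of (frequency, strength) pitch candidates, uses the distance in octaves as one
possible candidate distance measure, and ignores unvoiced frames and the temporal-forgetting
principle. The names track_pitch and jump_weight are hypothetical.

```python
import math


def track_pitch(candidates, jump_weight=1.0):
    """Pick one pitch candidate per frame by dynamic programming.

    candidates: list of frames; each frame is a non-empty list of
                (f0_hz, strength) pairs with strength in (0, 1].
    jump_weight: assumed penalty per octave of pitch change between
                 neighbouring frames (a hypothetical tuning parameter).
    Returns the chosen f0 value for every frame.
    """
    # Local cost of a candidate is -log(strength); lower is better.
    prev_cost = [-math.log(s) for _, s in candidates[0]]
    backpointers = []

    for t in range(1, len(candidates)):
        cur_cost, cur_back = [], []
        for f0, strength in candidates[t]:
            best_k, best = 0, math.inf
            for k, (prev_f0, _) in enumerate(candidates[t - 1]):
                # Transition cost: candidate distance measured in octaves.
                cost = prev_cost[k] + jump_weight * abs(math.log2(f0 / prev_f0))
                if cost < best:
                    best_k, best = k, cost
            cur_cost.append(best - math.log(strength))
            cur_back.append(best_k)
        backpointers.append(cur_back)
        prev_cost = cur_cost

    # Backtrace from the cheapest final candidate.
    j = min(range(len(prev_cost)), key=prev_cost.__getitem__)
    path = [j]
    for cur_back in reversed(backpointers):
        j = cur_back[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j][0] for t, j in enumerate(path)]


frames = [
    [(100.0, 0.9), (200.0, 0.5)],
    [(200.0, 0.6), (101.0, 0.55)],  # the octave error is locally stronger here
    [(102.0, 0.9), (204.0, 0.4)],
]
contour = track_pitch(frames)
```

In the middle frame the 200 Hz candidate has the higher strength, but the octave penalty makes
the smooth 100, 101, 102 Hz contour cheaper overall, so the sketch avoids the octave jump.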
Czech, as a fixed-stress language with lexical stress on the first syllable, is known to have
weak acoustic correlates of lexical stress. Nevertheless, methods for detecting stressed
syllables or stress-group boundaries from the speech signal were investigated. A system
with sophisticated feature extraction followed by statistical machine learning methods
that model these phenomena in Czech is presented. Detected stress-group boundaries
can (in most cases) be mapped to word boundaries, which can be used for prosodic
evaluation of ASR hypotheses. A metric for such a prosodic score, which can be directly
used in prosodic N-best evaluation or ASR error detection, is proposed. An ASR lattice
rescoring algorithm for Czech is also presented.
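The idea of prosodic N-best evaluation can be sketched as follows. This is an illustrative
assumption, not the metric proposed in the thesis: the prosodic score here is simply the
fraction of detected stress-group boundaries that lie within a tolerance of some word
boundary in the hypothesis, and it is added to the ASR score with a weight. The names
boundary_match, rescore_nbest, weight and tol are hypothetical.

```python
def boundary_match(hyp_bounds, detected_bounds, tol=0.05):
    """Fraction of detected boundaries matched by a hypothesis word boundary.

    Times are in seconds; tol is an assumed matching tolerance.
    """
    if not detected_bounds:
        return 0.0
    hits = sum(
        any(abs(d - h) <= tol for h in hyp_bounds) for d in detected_bounds
    )
    return hits / len(detected_bounds)


def rescore_nbest(nbest, detected_bounds, weight=1.0, tol=0.05):
    """Re-rank an N-best list using a prosodic score.

    nbest: list of (text, asr_score, word_boundary_times); higher score is better.
    Returns (text, combined_score) pairs sorted from best to worst.
    """
    scored = [
        (text, asr + weight * boundary_match(bounds, detected_bounds, tol))
        for text, asr, bounds in nbest
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical data: two hypotheses and three detected stress-group boundaries.
nbest = [
    ("hypothesis a", -10.0, [0.30, 0.80]),
    ("hypothesis b", -10.2, [0.41, 0.90, 1.30]),
]
detected = [0.40, 0.92, 1.31]
ranked = rescore_nbest(nbest, detected)
```

With these made-up numbers, the second hypothesis has the slightly worse ASR score but its
word boundaries agree with all detected boundaries, so it moves to the top of the list.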
Finally, Czech phrase modality detection from the acoustic signal is covered. Together with
an existing phrase boundary detector, such a system can serve as a punctuation module for
a Czech dictation ASR system, or in a Czech dialogue system to support its natural language
processing (NLP) part.
Collections
- Doctoral theses - 13000 [721]