This doctoral thesis deals with the use of prosody in automatic recognition of continuous
speech. Even though automatic speech recognition (ASR) systems have improved immensely
over the last several decades, they still fail to exploit one of the most important
aspects of the information conveyed by speech, namely prosody. Studies on other languages
have already demonstrated the benefits of using prosody in ASR; this doctoral thesis
investigates that potential for Czech.
The research activities can be divided into three main areas: a) pitch detection algorithms
(PDA), a necessary prerequisite for prosodic feature extraction; b) the Czech lexical
stress system as a potential acoustic cue for word boundary detection (and its use in
ASR); and c) classification of sentence/phrase modality in Czech based purely on the
acoustic signal.
Firstly, in the field of pitch detection algorithms, a framework for their evaluation and
comparison is presented. Several new evaluation criteria are proposed as an extension of
existing ones, and the metrics are evaluated on four speech pitch reference databases.
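As a point of reference for the evaluation framework, the sketch below computes two criteria that are widely used in the PDA literature: gross pitch error (GPE) and voicing decision error (VDE). These are existing criteria, not the new ones proposed in the thesis. The function name `pitch_errors`, the 20 % gross-error threshold and the convention that unvoiced frames carry F0 = 0 are assumptions of this sketch.

```python
import numpy as np


def pitch_errors(f0_ref, f0_est, gross_threshold: float = 0.2):
    """Return (GPE, VDE) for two frame-aligned F0 tracks in Hz.

    Frames with F0 == 0 are treated as unvoiced (assumed convention).
    """
    ref = np.asarray(f0_ref, dtype=float)
    est = np.asarray(f0_est, dtype=float)
    if ref.shape != est.shape:
        raise ValueError("F0 tracks must be frame-aligned")

    ref_voiced = ref > 0
    est_voiced = est > 0

    # VDE: fraction of all frames whose voiced/unvoiced decision differs
    vde = np.mean(ref_voiced != est_voiced)

    # GPE: among frames voiced in both tracks, fraction whose relative
    # deviation from the reference exceeds the threshold
    both = ref_voiced & est_voiced
    if not both.any():
        gpe = 0.0
    else:
        rel_dev = np.abs(est[both] - ref[both]) / ref[both]
        gpe = np.mean(rel_dev > gross_threshold)
    return float(gpe), float(vde)
```

For example, `pitch_errors([0, 100, 100, 200, 200], [0, 100, 150, 0, 205])` counts one voicing mismatch out of five frames and one gross error (150 Hz against 100 Hz) out of three jointly voiced frames.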
Besides pure comparison, a few modifications of existing PDA methods are presented.
Namely, a transition probability function in PDA post-processing is investigated in terms
of the candidate distance measure, and a new temporal-forgetting principle for speech is
introduced as a time-domain extension of the method.
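PDA post-processing of this kind is often implemented as dynamic programming (a Viterbi search) over per-frame F0 candidates, where the transition cost depends on a distance between candidates in neighbouring frames. The following is a minimal sketch under that assumption only: `viterbi_pitch`, `log_distance` (a distance in octaves) and the additive cost model are hypothetical, the distance function is pluggable so that alternative measures can be compared, and the temporal-forgetting principle is not modelled.

```python
import math
from typing import Callable, Sequence


def log_distance(f_a: float, f_b: float) -> float:
    """Candidate distance in octaves (one possible choice, assumed here)."""
    return abs(math.log2(f_a / f_b))


def viterbi_pitch(candidates: Sequence[Sequence[float]],
                  local_costs: Sequence[Sequence[float]],
                  distance: Callable[[float, float], float] = log_distance,
                  weight: float = 1.0) -> list[float]:
    """Select one F0 candidate per frame, minimising the sum of local costs
    plus weight * distance between candidates chosen in adjacent frames."""
    cost = list(local_costs[0])          # best accumulated cost per candidate
    backpointers: list[list[int]] = []   # best predecessor index, per frame
    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        new_cost, pointers = [], []
        for j, f_cur in enumerate(candidates[t]):
            # Accumulated cost of reaching f_cur from each previous candidate
            arrivals = [cost[i] + weight * distance(prev[i], f_cur)
                        for i in range(len(prev))]
            best_i = min(range(len(prev)), key=arrivals.__getitem__)
            new_cost.append(arrivals[best_i] + local_costs[t][j])
            pointers.append(best_i)
        cost = new_cost
        backpointers.append(pointers)

    # Backtrack from the cheapest candidate in the last frame
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j] for t, j in enumerate(path)]
```

With candidates `[[100, 200], [202, 101], [99, 198]]` and local costs that slightly favour the octave error 202 Hz in the middle frame, the transition term keeps the track at 100, 101, 99 Hz; setting `weight=0` removes the smoothing and the locally cheapest 202 Hz is chosen instead.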
Czech, a fixed-stress language with lexical stress on the first syllable, is known to have
weak acoustic correlates of lexical stress. Nevertheless, methods for detecting stressed
syllables or stress-group boundaries from the speech signal were investigated. A system
combining sophisticated feature extraction with statistical machine learning methods to
model these phenomena in Czech is presented. Detected stress-group boundaries can (in most
cases) be mapped to word boundaries, which can then be used for prosodic evaluation of ASR
hypotheses. A metric for such a prosodic score, which can be used directly in prosodic
N-best evaluation or ASR error detection, is proposed. An ASR lattice rescoring algorithm
for Czech is also presented.
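The thesis's own prosodic score is not specified here, so the sketch below only illustrates the general idea of prosodic N-best rescoring under stated assumptions. The hypothetical `prosodic_score` measures the fraction of detected stress-group boundaries that coincide, within a time tolerance, with a word start in the hypothesis. This direction is chosen because a stress group may span several words (a monosyllabic preposition, for instance, forms one stress group with the following word), so not every word boundary has to be a detected boundary. The `Hypothesis` class, the 50 ms tolerance and the linear combination weight are all illustrative.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Hypothesis:
    """One N-best entry (hypothetical structure)."""
    words: list[str]
    word_starts: list[float]  # start time of each word in seconds
    asr_score: float          # combined acoustic + language-model log score


def prosodic_score(hyp: Hypothesis, detected_boundaries: Sequence[float],
                   tolerance: float = 0.05) -> float:
    """Fraction of detected stress-group boundaries lying within `tolerance`
    seconds of some word start in the hypothesis (hypothetical metric)."""
    if len(detected_boundaries) == 0:
        return 1.0  # nothing detected, so nothing contradicts the hypothesis
    matched = sum(
        1 for b in detected_boundaries
        if any(abs(b - s) <= tolerance for s in hyp.word_starts)
    )
    return matched / len(detected_boundaries)


def rescore_nbest(nbest: Sequence[Hypothesis],
                  detected_boundaries: Sequence[float],
                  weight: float = 5.0,
                  tolerance: float = 0.05) -> list[Hypothesis]:
    """Re-rank hypotheses by ASR score plus a weighted prosodic score,
    best first."""
    return sorted(
        nbest,
        key=lambda h: h.asr_score
        + weight * prosodic_score(h, detected_boundaries, tolerance),
        reverse=True,
    )
```

For instance, with detected boundaries at 0.0 s and 0.62 s, a hypothesis whose words start at 0.0 s and 0.60 s explains both boundaries and can overtake a hypothesis with a slightly better ASR score whose word starts explain only one.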
Detection of Czech phrase modality from the acoustic signal is also covered. Together with
an existing phrase boundary detector, such a system can serve as a punctuation module for
a Czech dictation ASR system, or support the natural language processing (NLP) part of a
Czech dialogue system.
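Czech yes/no questions are commonly described as ending in a rising terminal melody (melodeme), whereas declarative sentences typically end with a falling one. The toy heuristic below uses only that cue: it fits a line to the last voiced F0 values expressed in semitones and labels the utterance by the slope. It is not the classifier developed in the thesis. The function names, the 10 ms frame shift, the 30-frame window and the threshold of 10 semitones per second are assumptions, and wh-questions, which usually end with a falling melody, would be labelled as statements.

```python
import numpy as np


def final_slope_semitones(f0, frame_shift: float = 0.01,
                          n_last: int = 30) -> float:
    """Slope (semitones per second) of a line fitted to the last voiced F0
    values. Unvoiced frames (F0 == 0) are dropped and the remaining frames
    are treated as consecutive, a simplification that ignores pauses."""
    f0 = np.asarray(f0, dtype=float)
    tail = f0[f0 > 0][-n_last:]
    if tail.size < 2:
        return 0.0  # too few voiced frames to estimate a slope
    # Semitones relative to the first frame of the analysis window
    semitones = 12.0 * np.log2(tail / tail[0])
    t = np.arange(tail.size) * frame_shift
    slope, _intercept = np.polyfit(t, semitones, 1)
    return float(slope)


def classify_modality(f0, threshold: float = 10.0, **kwargs) -> str:
    """Toy rule: a clearly rising final contour suggests a yes/no question."""
    if final_slope_semitones(f0, **kwargs) > threshold:
        return "question"
    return "statement"
```

A contour rising linearly from 120 Hz to 200 Hz over 50 frames gives a final slope of roughly 16 semitones per second and is labelled a question; the same contour reversed is labelled a statement.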
en
dc.language.iso
en
en
dc.subject
Prosody
en
dc.subject
speech technology
en
dc.subject
ASR
en
dc.subject
F0
en
dc.subject
pitch
en
dc.subject
lexical stress
en
dc.subject
stress group
en
dc.subject
modality
en
dc.subject
melodeme
en
dc.subject
prosodic hypothesis scoring
en
dc.title
Prosody Utilization in Continuous Speech Recognition
cze
dc.type
disertační práce (doctoral thesis)
cze
dc.description.department
Katedra teorie obvodů
theses.degree.discipline
Teoretická elektrotechnika (Theoretical Electrical Engineering)
theses.degree.grantor
České vysoké učení technické v Praze. Fakulta elektrotechnická. Katedra teorie obvodů.