Theoretical analysis of diversity in an ensemble of automatic speech recognition systems

Audhkhasi, Kartik; Zavou, Andreas M.; Georgiou, Panayiotis G.; Narayanan, Shrikanth S.

doi:10.1109/TASLP.2014.2303295

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14279/9622

Other contributor:	Ζαβού, Αντρέας
Major Field of Science:	Engineering and Technology
Field Category:	Electrical Engineering - Electronic Engineering - Information Engineering
Keywords:	Ambiguity decomposition; Automatic speech recognition; Discriminative training; Diversity; Ensemble methods; ROVER; System combination
Publication date:	1-Mar-2014
Source:	IEEE Transactions on Audio, Speech and Language Processing, 2014, vol. 22, no. 3, pp. 711-726
Volume:	22
Issue:	3
Start page:	711
End page:	726
Journal:	IEEE Transactions on Audio, Speech and Language Processing
Abstract:	Diversity or complementarity of automatic speech recognition (ASR) systems is crucial for achieving a reduction in word error rate (WER) upon fusion using the ROVER algorithm. We present a theoretical proof explaining this often-observed link between ASR system diversity and ROVER performance. This is in contrast to many previous works that have only presented empirical evidence for this link or have focused on designing diverse ASR systems using intuitive algorithmic modifications. We prove that the WER of the ROVER output approximately decomposes into a difference of the average WER of the individual ASR systems and the average WER of the ASR systems with respect to the ROVER output. We refer to the latter quantity as the diversity of the ASR system ensemble because it measures the spread of the ASR hypotheses about the ROVER hypothesis. This result explains the trade-off between the WER of the individual systems and the diversity of the ensemble. We support this result through ROVER experiments using multiple ASR systems trained on standard data sets with the Kaldi toolkit. We use the proposed theorem to explain the lower WERs obtained by ASR confidence-weighted ROVER as compared to word frequency-based ROVER. We also quantify the reduction in ROVER WER with increasing diversity of the N-best list. We finally present a simple discriminative framework for jointly training multiple diverse acoustic models (AMs) based on the proposed theorem. Our framework generalizes and provides a theoretical basis for some recent intuitive modifications to well-known discriminative training criteria for training diverse AMs.
URI:	https://hdl.handle.net/20.500.14279/9622
ISSN:	1558-7924
Rights:	© IEEE
Type:	Article
Affiliation:	University of Southern California; Cyprus University of Technology
Publication Type:	Peer Reviewed
Appears in collections:	Articles
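The abstract's central claim, that ROVER WER is approximately the average WER of the individual systems minus their diversity (the average WER of each system measured against the ROVER output), can be checked numerically on toy data. The following is a minimal Python sketch, not the authors' code, under several simplifying assumptions: the system outputs are taken as already aligned slot by slot (real ROVER first builds a word transition network by iterative alignment), voting is plain word-frequency voting with ties broken by system order, and WER is word-level Levenshtein distance divided by the reference length. The function names `word_errors`, `wer`, `majority_vote` and `decomposition` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def word_errors(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # hyp word h is an insertion
                curr[j - 1] + 1,         # ref word r is deleted
                prev[j - 1] + (h != r),  # match (0) or substitution (1)
            )
        prev = curr
    return prev[-1]


def wer(hyp: Sequence[str], ref: Sequence[str]) -> float:
    """Word error rate of hyp with respect to ref."""
    return word_errors(hyp, ref) / len(ref)


def majority_vote(hyps: List[List[str]]) -> List[str]:
    """Frequency-based voting per slot on hypotheses assumed to be pre-aligned."""
    assert len({len(h) for h in hyps}) == 1, "toy sketch: hypotheses must be pre-aligned"
    # Counter.most_common orders equal counts by first occurrence,
    # so a tie goes to the word from the earliest system.
    return [Counter(slot).most_common(1)[0][0] for slot in zip(*hyps)]


def decomposition(hyps: List[List[str]], ref: List[str]) -> Tuple[float, float, float, float]:
    """Return (ROVER WER, average system WER, diversity, average WER minus diversity)."""
    fused = majority_vote(hyps)
    n = len(hyps)
    avg_wer = sum(wer(h, ref) for h in hyps) / n
    diversity = sum(wer(h, fused) for h in hyps) / n  # spread of systems about the ROVER output
    return wer(fused, ref), avg_wer, diversity, avg_wer - diversity


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyps = [s.split() for s in ("the cat sat on a mat",
                                "the bat sat on the mat",
                                "a cat sat in the mat")]
    r, a, d, x = decomposition(hyps, ref)
    print(f"ROVER WER={r:.3f} avg WER={a:.3f} diversity={d:.3f} avg-diversity={x:.3f}")
    # -> ROVER WER=0.000 avg WER=0.222 diversity=0.222 avg-diversity=0.000

    r, a, d, x = decomposition([["x", "q", "z"], ["x", "q", "z"], ["x", "y", "z"]],
                               ["x", "y", "z"])
    print(f"ROVER WER={r:.3f} avg WER={a:.3f} diversity={d:.3f} avg-diversity={x:.3f}")
    # -> ROVER WER=0.333 avg WER=0.222 diversity=0.111 avg-diversity=0.111
```

In the first toy call the systems make different errors, the vote recovers the reference, and both sides equal 0. In the second, two of the three systems share the same error, so the vote is wrong and the difference (0.111) underestimates the ROVER WER (0.333), a reminder that the decomposition is approximate; the conditions under which the paper's approximation is tight are not reproduced in this sketch.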


Scopus citations: 14 (checked on 9 Nov 2023)
Web of Science citations: 12 (last week: 0; last month: 0; checked on 29 Oct 2023)
Page views: 377 (last week: 0; last month: 4; checked on 22 Dec 2024)