Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14279/9622
DC Field | Value | Language
dc.contributor.author | Audhkhasi, Kartik | -
dc.contributor.author | Zavou, Andreas M. | -
dc.contributor.author | Georgiou, Panayiotis G. | -
dc.contributor.author | Narayanan, Shrikanth S. | -
dc.contributor.other | Ζαβού, Αντρέας | -
dc.date.accessioned | 2017-02-13T11:25:51Z | -
dc.date.available | 2017-02-13T11:25:51Z | -
dc.date.issued | 2014-03-01 | -
dc.identifier.citation | IEEE Transactions on Audio, Speech and Language Processing, 2014, vol. 22, no. 3, pp. 711-726 | en_US
dc.identifier.issn | 15587924 | -
dc.identifier.uri | https://hdl.handle.net/20.500.14279/9622 | -
dc.description.abstract | Diversity or complementarity of automatic speech recognition (ASR) systems is crucial for achieving a reduction in word error rate (WER) upon fusion using the ROVER algorithm. We present a theoretical proof explaining this often-observed link between ASR system diversity and ROVER performance. This is in contrast to many previous works that have only presented empirical evidence for this link or have focused on designing diverse ASR systems using intuitive algorithmic modifications. We prove that the WER of the ROVER output approximately decomposes into a difference of the average WER of the individual ASR systems and the average WER of the ASR systems with respect to the ROVER output. We refer to the latter quantity as the diversity of the ASR system ensemble because it measures the spread of the ASR hypotheses about the ROVER hypothesis. This result explains the trade-off between the WER of the individual systems and the diversity of the ensemble. We support this result through ROVER experiments using multiple ASR systems trained on standard data sets with the Kaldi toolkit. We use the proposed theorem to explain the lower WERs obtained by ASR confidence-weighted ROVER as compared to word frequency-based ROVER. We also quantify the reduction in ROVER WER with increasing diversity of the N-best list. We finally present a simple discriminative framework for jointly training multiple diverse acoustic models (AMs) based on the proposed theorem. Our framework generalizes and provides a theoretical basis for some recent intuitive modifications to well-known discriminative training criteria for training diverse AMs. | en_US
dc.format | pdf | en_US
dc.language.iso | en | en_US
dc.relation.ispartof | IEEE Transactions on Audio, Speech and Language Processing | en_US
dc.rights | © IEEE | en_US
dc.subject | Ambiguity decomposition | en_US
dc.subject | Automatic speech recognition | en_US
dc.subject | Discriminative training | en_US
dc.subject | Diversity | en_US
dc.subject | Ensemble methods | en_US
dc.subject | ROVER | en_US
dc.subject | System combination | en_US
dc.title | Theoretical analysis of diversity in an ensemble of automatic speech recognition systems | en_US
dc.type | Article | en_US
dc.collaboration | University of Southern California | en_US
dc.collaboration | Cyprus University of Technology | en_US
dc.subject.category | Electrical Engineering - Electronic Engineering - Information Engineering | en_US
dc.journals | Subscription | en_US
dc.country | United States | en_US
dc.country | Cyprus | en_US
dc.subject.field | Engineering and Technology | en_US
dc.publication | Peer Reviewed | en_US
dc.identifier.doi | 10.1109/TASLP.2014.2303295 | en_US
dc.relation.issue | 3 | en_US
dc.relation.volume | 22 | en_US
cut.common.academicyear | 2013-2014 | en_US
dc.identifier.spage | 711 | en_US
dc.identifier.epage | 726 | en_US
item.grantfulltext | none | -
item.openairecristype | http://purl.org/coar/resource_type/c_6501 | -
item.fulltext | No Fulltext | -
item.languageiso639-1 | en | -
item.cerifentitytype | Publications | -
item.openairetype | article | -
crisitem.journal.journalissn | 2329-9304 | -
crisitem.journal.publisher | IEEE | -
Appears in Collections: Άρθρα/Articles
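The abstract states that the WER of the combined output approximately decomposes into the average WER of the individual systems minus their average WER measured against the combined output (the diversity). The sketch below illustrates those three quantities on toy data. It rests on assumptions that are not taken from the paper: `majority_vote` is a hypothetical position-wise vote over equal-length hypotheses standing in for ROVER (which aligns hypotheses into a word transition network first), and the function names and sentences are invented for illustration.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate of hyp measured against ref."""
    return edit_distance(ref, hyp) / len(ref)


def majority_vote(hyps):
    """Hypothetical stand-in for ROVER: position-wise vote.

    Assumes all hypotheses have the same length, so no alignment is needed.
    """
    return [Counter(words).most_common(1)[0][0] for words in zip(*hyps)]


def decomposition_terms(ref, hyps):
    """Return the three quantities named in the abstract for one utterance."""
    combined = majority_vote(hyps)
    avg_wer = sum(wer(ref, h) for h in hyps) / len(hyps)
    # Diversity: average WER of each system relative to the combined output.
    diversity = sum(wer(combined, h) for h in hyps) / len(hyps)
    return {"combined_wer": wer(ref, combined),
            "avg_wer": avg_wer,
            "diversity": diversity}


# Toy data (invented): each system makes one error at a different position.
reference = "the cat sat on the mat".split()
hypotheses = [
    "the cat sat on a mat".split(),
    "the bat sat on the mat".split(),
    "the cat sat in the mat".split(),
]
terms = decomposition_terms(reference, hypotheses)
print(terms)
```

In this toy case the errors fall at different positions, so the vote recovers the reference: the average individual WER and the diversity are equal and their difference matches the combined WER of zero. If all three systems made the same error, the diversity would be zero and the combined output would gain nothing over the individual systems, which is the trade-off the abstract describes.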
Items in KTISIS are protected by copyright, with all rights reserved, unless otherwise indicated.