Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14279/27107
Title: Stochastic Transformer Networks With Linear Competing Units: Application To End-to-End SL Translation
Authors: Voskou, Andreas
Panousis, Konstantinos P. 
Kosmopoulos, Dimitrios I. 
Metaxas, Dimitris 
Chatzis, Sotirios P. 
Major Field of Science: Engineering and Technology
Field Category: Other Engineering and Technologies
Keywords: Memory management;Stochastic processes;Gesture recognition;Benchmark testing;Assistive technologies;Machine learning architectures and formulations;Representation learning;Vision + language
Issue Date: 10-Oct-2021
Source: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 11946-11955
Start page: 11946
End page: 11955
Project: aRTIFICIAL iNTELLIGENCE for the Deaf (aiD) 
Conference: IEEE/CVF International Conference on Computer Vision (ICCV) 
Abstract: Automating sign language translation (SLT) is a challenging real-world application. Despite its societal importance, though, research progress in the field remains rather limited. Crucially, existing methods that yield viable performance necessitate the availability of laborious-to-obtain gloss sequence groundtruth. In this paper, we attenuate this need by introducing an end-to-end SLT model that does not entail explicit use of glosses; the model only needs text groundtruth. This is in stark contrast to existing end-to-end models that use gloss sequence groundtruth, either as a modality recognized at an intermediate model stage, or as a parallel output process jointly trained with the SLT model. Our approach constitutes a Transformer network with a novel type of layer that combines: (i) local winner-takes-all (LWTA) layers with stochastic winner sampling, instead of conventional ReLU layers; (ii) stochastic weights with posterior distributions estimated via variational inference; and (iii) a weight compression technique at inference time that exploits the estimated posterior variance to perform massive, almost lossless compression. We demonstrate that our approach reaches the currently best reported BLEU-4 score on the PHOENIX 2014T benchmark, without making use of glosses for model training, and with a memory footprint reduced by more than 70%.
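The stochastic LWTA mechanism described in the abstract can be illustrated with a minimal sketch: hidden units are grouped into small blocks of competitors, and instead of a deterministic argmax (or a ReLU applied unit-wise), one winner per block is *sampled* from a softmax over the block's activations; losers output zero. The function below is a hypothetical NumPy illustration under those assumptions (names like `stochastic_lwta` and `block_size` are ours, not from the paper), not the authors' implementation:

```python
import numpy as np

def stochastic_lwta(x, block_size=2, rng=None):
    """Stochastic local winner-takes-all activation (illustrative sketch).

    Units are grouped into blocks of `block_size` competitors; within each
    block exactly one "winner" passes its value through and the rest are
    zeroed. The winner is sampled from a softmax over the block's
    activations rather than chosen deterministically.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch, features = x.shape
    assert features % block_size == 0, "features must divide into blocks"
    blocks = x.reshape(batch, features // block_size, block_size)

    # Competition probabilities: softmax over each block's units.
    z = blocks - blocks.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)

    # Sample one winner index per block via the Gumbel-max trick.
    gumbel = -np.log(-np.log(rng.uniform(size=probs.shape)))
    winners = (np.log(probs) + gumbel).argmax(axis=-1)

    # One-hot mask: the winner passes through, losers output zero.
    mask = np.eye(block_size)[winners]
    return (blocks * mask).reshape(batch, features)
```

In a trainable model the hard sample would typically be relaxed (e.g. Gumbel-softmax) so that gradients flow through the winner selection; the hard-sampling version above only shows the inference-time behavior.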
URI: https://hdl.handle.net/20.500.14279/27107
DOI: 10.1109/ICCV48922.2021.01173
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International
Type: Conference Papers
Affiliation: Cyprus University of Technology 
University of Patras 
Rutgers University 
Appears in Collections: Conference papers or poster or presentation

Files in This Item:
File: Stochastic Transformer Networks.pdf — Size: 690.84 kB — Format: Adobe PDF

SCOPUS Citations: 17 (checked on 6 Nov 2023)



This item is licensed under a Creative Commons License.