Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14279/27107
DC Field | Value | Language
dc.contributor.author | Voskou, Andreas | -
dc.contributor.author | Panousis, Konstantinos P. | -
dc.contributor.author | Kosmopoulos, Dimitrios I. | -
dc.contributor.author | Metaxas, Dimitris | -
dc.contributor.author | Chatzis, Sotirios P. | -
dc.date.accessioned | 2022-12-21T11:18:53Z | -
dc.date.available | 2022-12-21T11:18:53Z | -
dc.date.issued | 2021-10-10 | -
dc.identifier.citation | Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 11946-11955 | en_US
dc.identifier.uri | https://hdl.handle.net/20.500.14279/27107 | -
dc.description.abstract | Automating sign language translation (SLT) is a challenging real-world application. Despite its societal importance, however, research progress in the field remains limited. Crucially, existing methods that yield viable performance require laborious-to-obtain gloss-sequence ground truth. In this paper, we attenuate this need by introducing an end-to-end SLT model that does not entail explicit use of glosses; the model needs only text ground truth. This is in stark contrast to existing end-to-end models that use gloss-sequence ground truth, either as a modality recognized at an intermediate model stage or as a parallel output process trained jointly with the SLT model. Our approach constitutes a Transformer network with a novel type of layer that combines: (i) local winner-takes-all (LWTA) layers with stochastic winner sampling, instead of conventional ReLU layers, (ii) stochastic weights with posterior distributions estimated via variational inference, and (iii) a weight compression technique at inference time that exploits the estimated posterior variance to perform massive, almost lossless compression. We demonstrate that our approach reaches the best currently reported BLEU-4 score on the PHOENIX 2014T benchmark, without using glosses for model training and with a memory footprint reduced by more than 70%. | en_US
dc.language.iso | en | en_US
dc.relation | aRTIFICIAL iNTELLIGENCE for the Deaf (aiD) | en_US
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | *
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | *
dc.subject | Memory management | en_US
dc.subject | Stochastic processes | en_US
dc.subject | Gesture recognition | en_US
dc.subject | Benchmark testing | en_US
dc.subject | Assistive technologies | en_US
dc.subject | Machine learning architectures and formulations | en_US
dc.subject | Representation learning | en_US
dc.subject | Vision + language | en_US
dc.title | Stochastic Transformer Networks With Linear Competing Units: Application To End-to-End SL Translation | en_US
dc.type | Conference Papers | en_US
dc.collaboration | Cyprus University of Technology | en_US
dc.collaboration | University of Patras | en_US
dc.collaboration | Rutgers University | en_US
dc.subject.category | Other Engineering and Technologies | en_US
dc.journals | Open Access | en_US
dc.country | Cyprus | en_US
dc.country | Greece | en_US
dc.country | United States | en_US
dc.subject.field | Engineering and Technology | en_US
dc.publication | Peer Reviewed | en_US
dc.relation.conference | IEEE/CVF International Conference on Computer Vision (ICCV) | en_US
dc.identifier.doi | 10.1109/ICCV48922.2021.01173 | en_US
cut.common.academicyear | 2021-2022 | en_US
dc.identifier.spage | 11946 | en_US
dc.identifier.epage | 11955 | en_US
item.openairecristype | http://purl.org/coar/resource_type/c_c94f | -
item.grantfulltext | open | -
item.cerifentitytype | Publications | -
item.fulltext | With Fulltext | -
item.languageiso639-1 | en | -
item.openairetype | conferenceObject | -
crisitem.author.dept | Department of Electrical Engineering, Computer Engineering and Informatics | -
crisitem.author.faculty | Faculty of Engineering and Technology | -
crisitem.author.orcid | 0000-0002-4956-4013 | -
crisitem.author.parentorg | Faculty of Engineering and Technology | -
crisitem.project.funder | EC Joint Research Centre | -
crisitem.project.fundingProgram | H2020 | -
crisitem.project.openAire | info:eu-repo/grantAgreement/EC/H2020/872139 | -
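Note on the method described in the abstract: the model replaces ReLU activations with local winner-takes-all (LWTA) blocks whose winner is sampled stochastically, on top of weights treated variationally. The sketch below is a minimal, illustrative reading of the stochastic winner-sampling step only; the function name, block size, and the NumPy/Gumbel-max formulation are assumptions for illustration, not the authors' implementation, and a training-time version would additionally require a differentiable relaxation that is not shown here.

    import numpy as np

    def stochastic_lwta(x, units_per_block=2, rng=None):
        """Stochastic Local Winner-Takes-All (LWTA) activation -- illustrative sketch.

        x: array of shape (batch, features), with features divisible by
        units_per_block. Linear units inside each block compete: one winner
        per block is sampled from a softmax over the block's activations,
        and the losing units are zeroed out.
        """
        rng = np.random.default_rng() if rng is None else rng
        batch, features = x.shape
        blocks = features // units_per_block
        xb = x.reshape(batch, blocks, units_per_block)

        # Competition probabilities within each block (softmax over the block).
        e = np.exp(xb - xb.max(axis=-1, keepdims=True))
        p = e / e.sum(axis=-1, keepdims=True)

        # Sample one winner per block via the Gumbel-max trick.
        winners = (np.log(p) + rng.gumbel(size=p.shape)).argmax(axis=-1)

        # Pass the winning unit's linear output through; zero out the rest.
        mask = np.zeros_like(xb)
        np.put_along_axis(mask, winners[..., None], 1.0, axis=-1)
        return (xb * mask).reshape(batch, features)

For the inference-time compression, the abstract states only that it exploits the estimated posterior variance; one common strategy in variational networks, possibly related, is to prune or quantize weights with a low posterior mean-to-standard-deviation ratio, but the exact criterion used in the paper is not specified in this record.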
Appears in Collections: Conference papers or poster or presentation (Δημοσιεύσεις σε συνέδρια)
Files in This Item:
File | Description | Size | Format
Stochastic Transformer Networks.pdf | | 690.84 kB | Adobe PDF

Scopus citations: 17 (checked on Nov 6, 2023)
Page views: 201 (0 last week, 11 last month; checked on May 20, 2024)
Downloads: 54 (checked on May 20, 2024)

This item is licensed under a Creative Commons License.