An automated integrated speech and face image analysis system for the identification of human emotions

Loizou, Christos P.

doi:10.1016/j.specom.2021.04.001

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14279/22655

Title:	An automated integrated speech and face image analysis system for the identification of human emotions
Authors:	Loizou, Christos P.
Major Field of Science:	Natural Sciences
Field Category:	Computer and Information Sciences
Keywords:	Speech analysis;Face analysis;Human emotions detection;Statistical analysis;Classification analysis
Issue Date:	Jun-2021
Source:	Speech Communication, 2021, vol. 130, pp. 15-26
Volume:	130
Start page:	15
End page:	26
Journal:	Speech Communication
Abstract:	Objective: Human interactions are related to speech and facial characteristics. It has been suggested that speech signals and/or images of facial expressions may reveal human emotions and that both interact in the verification of a person's identity. The present study proposes and evaluates an automated integrated speech signal and facial image analysis system for the identification of seven different human emotions (Normal (N), Happy (H), Sad (S), Disgust (D), Fear (F), Anger (A), and Surprise (Su)). Methods: Speech recordings and face images from 7,441 subjects aged 20 to 74 were collected, normalized and filtered. From these recordings, 55 speech signal features and 61 different face image texture features were extracted. Statistical and model multi-classification analyses were performed to select the features able to distinguish, with statistical significance, between the seven aforementioned human emotions (N, H, S, D, F, A and Su). The selected features, alone or in combination with the age and gender of the sample investigated, were used to build two learning-based classifiers, and the classifiers' accuracy was computed. Results: For each of the above-mentioned human emotions, statistically significantly different speech and face image features were identified that may be used to distinguish between the aforementioned groups (N vs H, N vs S, N vs D, N vs F, N vs A, N vs Su). Using solely the statistically significant speech and image features identified, an overall percentage of correct classification (%CC) score of 93% was achieved. Conclusions: A significant number of speech and face image features have been derived from continuous speech and face images. Features were identified that were able to distinguish between seven different emotional human states. This study lays the basis for the development of an integrated system for the identification of emotional states from automatic analysis of free speech and face images. Future work will investigate the development and integration of the proposed method into a mobile device.
URI:	https://hdl.handle.net/20.500.14279/22655
ISSN:	01676393
DOI:	10.1016/j.specom.2021.04.001
Rights:	© Elsevier Attribution-NonCommercial-NoDerivatives 4.0 International
Type:	Article
Affiliation:	Cyprus University of Technology
Publication Type:	Peer Reviewed
Appears in Collections:	Άρθρα/Articles


SCOPUS™ Citations:	6 (checked on Nov 9, 2023)
WEB OF SCIENCE™ Citations:	4 (last week: 0; last month: 0; checked on Oct 29, 2023)
Page view(s):	386 (last week: 4; last month: 26; checked on Apr 15, 2025)

This item is licensed under a Creative Commons License
