Title: An automated integrated speech and face image analysis system for the identification of human emotions
Authors: Loizou, Christos P.
Major Field of Science: Natural Sciences
Field Category: Computer and Information Sciences
Keywords: Speech analysis;Face analysis;Human emotions detection;Statistical analysis;Classification analysis
Issue Date: Jun-2021
Source: Speech Communication, 2021, vol. 130, pp. 15-26
Volume: 130
Start page: 15
End page: 26
Journal: Speech Communication 
Abstract: Objective: Human interactions are related to speech and facial characteristics. It has been suggested that speech signals and/or images of facial expressions may reveal human emotions and that both interact in the verification of a person's identity. The present study proposes and evaluates an automated integrated speech signal and facial image analysis system for the identification of seven different human emotions (Normal (N), Happy (H), Sad (S), Disgust (D), Fear (F), Anger (A), and Surprise (Su)). Methods: Speech recordings and face images from 7,441 subjects aged 20 to 74 were collected, normalized and filtered. From these recordings, 55 speech signal features and 61 face image texture features were extracted. Statistical and model multi-classification analyses were performed to select the features able to distinguish, with statistical significance, between the seven aforementioned human emotions (N, H, S, D, F, A and Su). The selected features, alone or in combination with the age and gender of the subjects investigated, were used to build two learning-based classifiers, and the classifiers' accuracy was computed. Results: For each of the above-mentioned human emotions, statistically significantly different speech and face image features were identified that may be used to distinguish between the aforementioned groups (N vs H, N vs S, N vs D, N vs F, N vs A, N vs Su). Using solely the statistically significant speech and image features identified, an overall percentage of correct classification (%CC) score of 93% was achieved. Conclusions: A significant number of speech and face image features have been derived from continuous speech and face images. Features were identified that were able to distinguish between seven different human emotional states. This study lays the basis for the development of an integrated system for the identification of emotional states from the automatic analysis of free speech and face images.
Future work will investigate the development and integration of the proposed method into a mobile device.
ISSN: 0167-6393
DOI: 10.1016/j.specom.2021.04.001
Rights: © Elsevier
Attribution-NonCommercial-NoDerivatives 4.0 International
Type: Article
Affiliation: Cyprus University of Technology
Appears in Collections: Άρθρα/Articles

This item is licensed under a Creative Commons License.