Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.14279/22655
Title: An automated integrated speech and face image analysis system for the identification of human emotions
Authors: Loizou, Christos P.
Major Field of Science: Natural Sciences
Field Category: Computer and Information Sciences
Keywords: Speech analysis; Face analysis; Human emotions detection; Statistical analysis; Classification analysis
Issue Date: Jun-2021
Source: Speech Communication, 2021, vol. 130, pp. 15-26
Volume: 130
Start page: 15
End page: 26
Journal: Speech Communication
Abstract: Objective: Human interactions are related to speech and facial characteristics. It has been suggested that speech signals and/or images of facial expressions may reveal human emotions and that both interact for the verification of a person's identity. The present study proposes and evaluates an automated integrated speech signal and facial image analysis system for the identification of seven different human emotions (Normal (N), Happy (H), Sad (S), Disgust (D), Fear (F), Anger (A), and Surprise (Su)). Methods: Speech recordings and face images from 7,441 subjects aged 20≤age≤74 were collected, normalized and filtered. From these recordings, 55 speech signal features and 61 different face image texture features were extracted. Statistical and model multi-classification analyses were performed to select the features able to distinguish, with statistical significance, between the seven aforementioned human emotions (N, H, S, D, F, A and Su). The selected features, alone or in combination with the age and gender of the sample investigated, were used to build two learning-based classifiers, and the classifiers' accuracy was computed. Results: For each of the above-mentioned human emotions, statistically significantly different speech and face image features were identified that may be used to distinguish between the aforementioned groups (N vs H, N vs S, N vs D, N vs F, N vs A, N vs Su). Using solely the statistically significant speech and image features identified, an overall percentage of correct classification (%CC) score of 93% was achieved. Conclusions: A significant number of speech and face image features have been derived from continuous speech and face images. Features were identified that were able to distinguish between seven different human emotional states. This study lays the basis for the development of an integrated system for the identification of emotional states from automatic analysis of free speech and face images. Future work will investigate the development and integration of the proposed method into a mobile device.
URI: https://hdl.handle.net/20.500.14279/22655
ISSN: 0167-6393
DOI: 10.1016/j.specom.2021.04.001
Rights: © Elsevier Attribution-NonCommercial-NoDerivatives 4.0 International
Type: Article
Affiliation: Cyprus University of Technology
Publication Type: Peer Reviewed
Appears in Collections: Άρθρα/Articles
This item is licensed under a Creative Commons License