Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.14279/33221
Title: Visual Perception of Obstacles: Do Humans and Machines Focus on the Same Image Features?
Authors: Kyriakides, Constantinos; Thoma, Marios; Theodosiou, Zenonas; Partaourides, Harris; Michael, Loizos; Lanitis, Andreas
Major Field of Science: Social Sciences
Field Category: Media and Communications
Keywords: Obstacle Recognition; Deep Learning Algorithms; Explainability; Eye Tracking; Heatmaps
Issue Date: 2024
Source: Proceedings of the International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2024, vol. 4, pp. 357-364
Volume: 4
Start page: 357
End page: 364
Conference: 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications
Abstract: Contemporary cities are fractured by a growing number of barriers, such as on-going construction and infrastructure damages, which endanger pedestrian safety. Automated detection and recognition of such barriers from visual data has been of particular concern to the research community in recent years. Deep Learning (DL) algorithms are now the dominant approach in visual data analysis, achieving excellent results in a wide range of applications, including obstacle detection. However, explaining the underlying operations of DL models remains a key challenge in gaining significant understanding on how they arrive at their decisions. The use of heatmaps that highlight the focal points in input images that helped the models reach their predictions has emerged as a form of post-hoc explainability for such models. In an effort to gain insights into the learning process of DL models, we studied the similarities between heatmaps generated by a number of architectures trained to detect obstacles on sidewalks in images collected via smartphones, and eye-tracking heatmaps generated by humans as they detect the corresponding obstacles on the same data. Our findings indicate that the focus points of humans more closely align with those of a Vision Transformer architecture, as opposed to the other network architectures we examined in our experiments.
URI: https://hdl.handle.net/20.500.14279/33221
ISBN: 9789897586798
DOI: 10.5220/0012453500003660
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International
Type: Book Chapter
Affiliation: CYENS - Centre of Excellence; Open University Cyprus; Cyprus University of Technology; AI Cyprus Ethical Novelties
Funding: Directorate General for European Programmes, Coordination and Development Funding sponsor Framework Programme
Publication Type: Peer Reviewed
Appears in Collections: Κεφάλαια βιβλίων/Book chapters
This item is licensed under a Creative Commons License