Mining online political opinion surveys for suspect entries: An interdisciplinary comparison
Journal
Journal of Innovation in Digital Ecosystems
Date Issued
December 2016
DOI
10.1016/j.jides.2016.11.003
Abstract
Filtering data generated by so-called Voting Advice Applications (VAAs) in order to remove
entries that exhibit unrealistic behavior (i.e., cannot correspond to a real political view)
is of primary importance. If such entries are significantly present in VAA-generated
datasets, they can render conclusions drawn from VAA data analysis invalid. In this work
we investigate approaches that can be used for automating the process of identifying
entries that appear to be suspicious in terms of a user’s answer patterns. We utilize
two unsupervised data mining techniques and compare their performance against a
well-established psychometric approach. Our results suggest that the performance of data
mining approaches is comparable to that of approaches drawing on psychometric theory, at a fraction
of the complexity. More specifically, our simulations show that data mining techniques as
well as psychometric approaches can be used to identify truly ‘rogue’ data (i.e., completely
random data injected into the dataset under investigation). However, when analysing
real datasets the performance of all approaches dropped considerably. This suggests that
‘suspect’ entries are neither random nor clustered. This finding poses some limitations on
the use of unsupervised techniques, suggesting that the latter can only complement rather
than substitute for existing methods of identifying suspicious entries.
File(s)
Name
1-s2.0-S2352664516300256-main.pdf
Size
470.69 KB
Format
Adobe PDF
Checksum (MD5)
268b6549e604d475fd034fd134e22a50

