Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14279/12362
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Tsapatsoulis, Nicolas | - |
| dc.contributor.author | Djouvas, Constantinos | - |
| dc.date.accessioned | 2018-07-25T07:07:22Z | - |
| dc.date.available | 2018-07-25T07:07:22Z | - |
| dc.date.issued | 2017-07 | - |
| dc.identifier.citation | 12th International Workshop on Semantic and Social Media Adaptation and Personalization, 2017, Bratislava, Slovakia, 9-10 July | en_US |
| dc.identifier.uri | https://hdl.handle.net/20.500.14279/12362 | - |
| dc.description.abstract | Sentiment analysis of Twitter data has become a research trend over the last decade. Thanks to the Twitter API, massive amounts of tweets relating to a topic of interest can be collected in real time. Sentiment analysis of these tweets can be used for social sensing and opinion mining. For instance, election forecasting is a primary area in which sentiment analysis of tweets has been extensively applied over the last few years. Sentiment analysis of Twitter data presents important challenges compared to the similar task of text classification. Tweets are limited to 140 characters; thus, the conveyed message is compressed and often context-dependent. Tweets are informal and unstructured, usually lacking grammatical soundness and a standard lexicon. On the other hand, tweets are usually annotated by their authors regarding their topic and sentiment with the aid of hashtags and emoticons. Identifying appropriate features for sentiment analysis of tweets remains an open research area, since text indexing methods face the sparseness problem while POS tagging methods fail due to the lack of grammatical structure in tweets. Character-based features, i.e., n-grams of characters, are currently becoming popular because they are language-independent. However, their effectiveness remains quite low. In this paper, we argue that the tokens used by humans for sentiment analysis of tweets are probably the best feature set one can use for that purpose. We compare several automatically extracted features with the features (tokens) used by humans for tweet classification, under a machine learning framework. The results show that the manually indicated tokens combined with a Decision Tree classifier outperform any other combination of feature set and classification algorithm. The manually annotated dataset used in our experiments is publicly available for anyone who wishes to use it. | en_US |
| dc.format | pdf | en_US |
| dc.language.iso | en | en_US |
| dc.rights | © 2017 IEEE. | en_US |
| dc.subject | Feature extraction | en_US |
| dc.subject | Machine learning | en_US |
| dc.subject | Sentiment analysis | en_US |
| dc.subject | Tweet annotation | en_US |
| dc.subject | Tweet classification | en_US |
| dc.title | Feature extraction for tweet classification: do the humans perform better? | en_US |
| dc.type | Conference Papers | en_US |
| dc.doi | https://doi.org/10.1109/SMAP.2017.8022667 | en_US |
| dc.collaboration | Cyprus University of Technology | en_US |
| dc.subject.category | Computer and Information Sciences | en_US |
| dc.country | Cyprus | en_US |
| dc.subject.field | Natural Sciences | en_US |
| dc.publication | Peer Reviewed | en_US |
| cut.common.academicyear | 2016-2017 | en_US |
| item.languageiso639-1 | en | - |
| item.cerifentitytype | Publications | - |
| item.fulltext | No Fulltext | - |
| item.grantfulltext | none | - |
| item.openairetype | conferenceObject | - |
| item.openairecristype | http://purl.org/coar/resource_type/c_c94f | - |
| crisitem.author.dept | Department of Communication and Marketing | - |
| crisitem.author.dept | Department of Communication and Internet Studies | - |
| crisitem.author.faculty | Faculty of Communication and Media Studies | - |
| crisitem.author.faculty | Faculty of Communication and Media Studies | - |
| crisitem.author.orcid | 0000-0002-6739-8602 | - |
| crisitem.author.orcid | 0000-0003-1215-7294 | - |
| crisitem.author.parentorg | Faculty of Communication and Media Studies | - |
| crisitem.author.parentorg | Faculty of Communication and Media Studies | - |
Appears in Collections: Δημοσιεύσεις σε συνέδρια / Conference papers or poster or presentation

Items in KTISIS are protected by copyright, with all rights reserved, unless otherwise indicated.