Unsupervised Approach to Expand Emotion Lexicon (2012-2013)

Overview

In recent years, web users have been interacting and exchanging opinions on the Internet, producing large quantities of data. Analyzing this data would be useful for many applications in the e-commerce and e-tourism sectors. Most sentiment analysis applications use a dedicated lexicon to identify the sentiment of individual words and then generalize to the emotion of the whole document. However, manually constructing a sentiment lexicon poses several challenges, such as how to classify words that take on different senses in different contexts. Additionally, crowdsourcing the large amount of work to many annotators requires a method to detect and discard malicious or erroneous annotations. Therefore, we seek to create a sentiment lexicon using an unsupervised approach.

The National Research Council of Canada (NRC) word-emotion association lexicon, which was created by crowdsourcing on Mechanical Turk, has about 14,200 word types and 24,000 human-annotated word senses ({“anger”, “anticipation”, “disgust”, “fear”, “joy”, “negative”, “positive”, “sadness”, “surprise”, “trust”}). Our goal is to increase the number of word types covered by the NRC lexicon; each of these new “target words” would have corresponding word senses found by an unsupervised corpus-based approach. Our method is to use the frequencies of bigrams, trigrams, four-grams and five-grams provided by Google n-grams, together with additional variables relating to the context word types, to calculate the appropriate word senses (emotions and sentiments) of a given target word. We aim to tune our variables using a training dataset and then apply the resulting approach to generate a new sentiment lexicon. Finally, we will test the quality of the new lexicon in an opinion mining task with a benchmark testing dataset to assess how our lexicon benefits the task.
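
As a rough illustration of the corpus-based idea, the Python sketch below scores a target word by summing the counts of n-grams in which it co-occurs with words already present in a seed lexicon. It is only a minimal sketch under stated assumptions, not the scoring function used in this project: the n-gram counts and seed entries are invented toy data (not real Google n-gram or NRC values), and the names emotion_scores, assign_labels and the 0.25 threshold are hypothetical.

```python
from collections import defaultdict

# Toy stand-in for Google n-gram counts (invented numbers, for illustration only).
NGRAM_COUNTS = {
    ("delightful", "and", "happy"): 120,
    ("delightful", "sunny", "day"): 80,
    ("a", "delightful", "surprise"): 60,
    ("delightful", "or", "awful"): 10,
}

# Toy seed lexicon in the style of a word-emotion lexicon (invented entries).
SEED_LEXICON = {
    "happy": {"joy", "positive"},
    "sunny": {"joy", "positive"},
    "surprise": {"surprise"},
    "awful": {"fear", "negative"},
}


def emotion_scores(target, ngram_counts, seed_lexicon):
    """Return, per label, the share of the target's n-gram mass that
    co-occurs with a seed word carrying that label (assumed scoring rule)."""
    scores = defaultdict(float)
    total = 0
    for ngram, count in ngram_counts.items():
        if target not in ngram:
            continue
        total += count
        # Context words are the other distinct words in the n-gram.
        for word in set(ngram) - {target}:
            for label in seed_lexicon.get(word, ()):
                scores[label] += count
    if total == 0:
        return {}
    return {label: value / total for label, value in scores.items()}


def assign_labels(scores, threshold=0.25):
    """Keep labels whose normalized score reaches the (assumed) threshold."""
    return {label for label, value in scores.items() if value >= threshold}


if __name__ == "__main__":
    scores = emotion_scores("delightful", NGRAM_COUNTS, SEED_LEXICON)
    print(sorted(assign_labels(scores)))
```

In this sketch the threshold plays the role of one of the variables that would be tuned on a training dataset; a real system would also need to decide how to weight n-gram length and context position, which the toy scoring rule ignores.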

Publications

    1. Jessica Perrie, Aminul Islam, Evangelos Milios, Vlado Keselj, “Using Google n-grams to Expand Word-emotion Association Lexicon”, in Proceedings of the 14th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2013, LNCS 7817, Springer, pp. 137-148, Samos, Greece, March 2013. [available at Springer Link]