OpinionLexicon

OpinionLexicon#

A dictionary for product reviews 🛒, comprising positive and negative words curated for informal language.#

OpinionLexicon summary

Composition:

Approximately 6.7k lexicons (incl. 221 n-grams)
Includes misspelt spellings that frequently appear in informal settings (product reviews)
Binary Class (positive, negative)

Creation Methodology:

Manually tagged an initial seed list of positive and negative adjectives.
Then leveraged the WordNet to expand this list via an iterative, bootstrapping process.
By repeatedly searching for synonyms and antonyms of 30 seed words in WordNet, the authors were able to determine the semantic orientations for a large number of opinion words

Evaluation: Hu and Liu (2004) evaluated their sentiment dictionary on customer reviews of 5 electronics products — 2 digital cameras, 1 DVD player, 1 MP3 player, and 1 cellular phone—from Amazon and Cnet, by manually labelling each review sentence as positive or negative. Sentences were predicted as positive or negative using a “bag-of-words” approach: if positive/negative lexicon words existed, the review was positive/negative. For sentences with multiple opinion words, the average orientation was used. The authors also accounted for negation words (i.e “no” or “not”), reversing the sentiment orientation when such words appeared within 5 words of an opinion word. The dictionary achieved 84% accuracy in predicting sentiment orientations, demonstrating its effectiveness at determining sentiment.

Usage Guidance: Excellent domain-specific lexicon for analysing sentiment in product reviews. Access processed dictionary via sentibank.archive.load().dict("OpinionLexicon_v2004")

📋 Introduction#

Hu and Liu (2004) created a sentiment dictionary specifically tailored for the product review domain. Their dictionary focuses on extracting opinion words - adjectives commonly used to express sentiments.

📚 Original Dictionary#

Creation of Seed Adjectives

To determine the ‘semantic orientation’ (or sentiment polarity) of opinion words, Hu and Liu (2004) first manually created a small list of 30 seed adjectives and tagged them as having either a positive or negative orientation. For example, “great” and “fantastic” were tagged as positive adjectives, while “bad” and “dull” were tagged as negative. Based on the seed list, Hu and Liu (2004) then used a bootstrapping technique leveraging WordNet to expand this seed list and determine the orientation of additional adjectives.

Expansion to Opinion Words (OrientationSearch)

Hu and Liu (2004) leveraged the structure of WordNet, where adjectives are organised into bipolar clusters, to iteratively expand their seed list of opinion words using a function called OrientationSearch. The authors assumed that because adjectives generally share the orientation of their synonyms and the opposite orientation of their antonyms, the orientation of a new adjective can be predicted by checking if its synonyms or antonyms appear in the seed list with known orientations.

In each OrientationSearch call, if the adjective had a synonym in the seed list, it was assigned the same orientation as that synonym. If it had an antonym in the seed list, it was assigned the opposite orientation. The newly oriented adjective was then added to the seed list for subsequent searches.

By repeatedly applying OrientationSearch, Hu and Liu (2004) were able to leverage the initial seed list of 30 adjectives to expand it significantly. Their final expanded opinion word list contained 2,006 positive and 4,783 negative words by utilising the synonyms and antonyms in WordNet.

from sentibank import archive

load = archive.load()
OpinionLexicon = load.origin("OpinionLexicon_v2004")

OpinionLexicon (Hu and Liu, 2004)
opinion words	semantic orientation
Loading... (need help?)

🧹 Processed Dictionary#

First-Pass Processing#

The original OpinionLexicon dictionary contained three duplicates - “envious”, “enviously”, and “enviousness.” These words can convey both positive and negative sentiment depending on the context. As the orientation of these ambiguous terms depends on their usage, the duplicate entries were simply removed, resulting in a processed dictionary of 6,786 entries.

It is notable that the authors intentionally included misspelt words in their dictionary. They observed such informal spellings frequently appear in social media content. By incorporating commonly used misspellings, the dictionary is tailored to analyse such informal texts.