VADER#
📋 Introduction#
VADER stands for Valence Aware Dictionary and sEntiment Reasoner (Hutto and Gilbert 2014). It is a rule-based sentiment analysis algorithm, aimed particularly at social media text, that relies on a lexicon sensitive to both sentiment polarity and intensity (valence).
Although it was designed for scoring the sentiment of social media text, VADER achieved the highest F1 scores across all four test genres – namely Social Media Text (Tweets), Movie Reviews, Amazon Product Reviews, and NY Times Editorials – when compared against seven other dictionary-based methods. Notably, VADER performed exceptionally well in the social media genre, even outperforming individual human raters (see Table 4, Hutto and Gilbert 2014, p.223).
📚 Original Dictionary#
The construction and validation of the VADER dictionary (ver. 2014) involved the following steps.
| Step | Description |
|---|---|
| 1 | Construct a sentiment word-bank inspired by established sources (LIWC, ANEW, and Harvard_GI). |
| 2 | Incorporate 9,000 lexical feature candidates common to sentiment expression in microblogs, including: (i) Western-style emoticons (e.g. “D=”) from Wikipedia; (ii) sentiment-related acronyms and initialisms (e.g. “LOL”) from Wikipedia; and (iii) commonly used slang from Internet Slang. |
| 3 | Assess the general applicability of each candidate using the Wisdom-of-the-Crowd (WotC) approach via Amazon Mechanical Turk[1]. Ten independent raters (or “turkers”) rated each candidate on a scale from -4 (“Extremely Negative”) to 4 (“Extremely Positive”). |
To produce reliable Wisdom-of-the-Crowd sentiment intensity ratings, Hutto and Gilbert (2014) implemented several quality control processes. To name a few: (i) raters were prescreened for English comprehension; (ii) raters had to complete sentiment rating training and score at least 90% when matching known ratings; (iii) batches contained “golden items” to check rating consistency; and (iv) bonus incentives were given for matching group means (for further details, see Hutto and Gilbert 2014, pp.219-221).
Once every candidate had been rated on the [-4, 4] scale, Hutto and Gilbert (2014) computed summary statistics from the aggregate ratings of the ten independent raters. The final lexicon consisted of 7,517 entries, each with a non-zero mean valence score and a standard deviation below 2.5.
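The retention rule described above – keep an entry only if its mean rating is non-zero and its standard deviation is below 2.5 – can be sketched as follows. The terms and ratings here are invented for illustration and are not the published data:

```python
from statistics import mean, stdev

# Hypothetical raw ratings: each candidate term mapped to the ten
# turker scores it received on the [-4, 4] scale (illustrative values).
raw_ratings = {
    "lol": [2, 3, 2, 1, 2, 3, 2, 2, 1, 2],        # kept: non-zero mean, low std
    "meh": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],        # dropped: zero mean
    "d=":  [-3, -4, -2, 4, -3, -4, -3, 3, -4, -2],  # dropped: std >= 2.5
}

# Keep only entries with a non-zero mean and standard deviation below 2.5.
lexicon = {
    term: (mean(scores), stdev(scores))
    for term, scores in raw_ratings.items()
    if mean(scores) != 0 and stdev(scores) < 2.5
}
```

Under this rule, only `"lol"` survives: `"meh"` is filtered out for its zero mean, and `"d="` for its high rater disagreement.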
```python
from sentibank import archive

load = archive.load()
vader = load.origin("VADER_v2014")
```
*(Interactive preview of the loaded dictionary, with columns `SentimentExpression`, `mean`, `std`, and `sample`.)*
🧹 Processed Dictionary#
First-Pass Processing#
The original VADER lexicon contained 15 duplicate entries. For most duplicates, such as “fav” and “lmao”, we averaged the sample scores. However, the entries “d:” and “d=” had contradictory valence ratings:

- “d:” had scores of 1.2 (positive) and -2.9 (negative)
- “d=” had scores of 1.5 (positive) and -3.0 (negative)

Since both terms lean toward negative sentiment polarity, we removed the positive scores for “d:” (“D:”) and “d=” (“D=”) from the final lexicon. In general, averaging the duplicate ratings produced a unified score; for clear polarity conflicts like “d:” and “d=”, however, we deferred to the predominant negative rating based on the term’s apparent sentiment leaning.
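This resolution policy – average ordinary duplicates, but keep only the negative rating for the hand-identified polarity conflicts – can be sketched as follows. The scores below are illustrative, not the published values:

```python
from collections import defaultdict

# Hypothetical duplicate rows as (term, mean_score) pairs.
rows = [
    ("fav", 2.0), ("fav", 2.4),    # ordinary duplicate  -> average
    ("d:", 1.2), ("d:", -2.9),     # polarity conflict   -> keep negative
]

# Entries resolved by hand to their negative rating.
POLARITY_CONFLICTS = {"d:", "d="}

grouped = defaultdict(list)
for term, score in rows:
    grouped[term].append(score)

resolved = {}
for term, scores in grouped.items():
    if term in POLARITY_CONFLICTS:
        resolved[term] = min(scores)               # defer to the negative rating
    else:
        resolved[term] = sum(scores) / len(scores)  # unified average score
```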
We also removed redundant uppercase emoticons and substituted two uncommon variants with their lowercase forms, since VADER lowercases tokens before comparing them against the lexicon:

- We removed 7 uppercase emoticons (“-:O”, “(:O”, “:-D”, “:D”, “;D”, “=-D”, “=D”) because their lowercase versions (“:-o”, “(:o”, “:-d”, “:d”, “;d”, “=-d”, “=d”) already existed.
- We substituted “:-Þ” and “:Þ” with their lowercase versions “:-þ” and “:þ” for consistency.
As a result, the first-pass processed VADER dictionary contains 7,495 entries instead of the original 7,517.
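The lowercase normalization above can be sketched as a single pass over the lexicon: drop an uppercase entry when its lowercase twin already exists, and otherwise re-key it under its lowercase form. The toy lexicon and scores below are invented for illustration:

```python
# Toy lexicon mapping emoticons/terms to mean valence scores (illustrative).
lexicon = {
    ":-d": 1.8,   # lowercase twin of ":-D"
    ":-D": 1.8,   # redundant uppercase variant -> dropped
    ":-Þ": 1.0,   # no lowercase twin           -> re-keyed as ":-þ"
    "lol": 1.9,
}

cleaned = {}
for term, score in lexicon.items():
    lowered = term.lower()
    if term != lowered and lowered in lexicon:
        continue              # lowercase version already present: skip duplicate
    cleaned[lowered] = score  # store under the lowercase key VADER looks up
```

After the pass, every key is lowercase and no score is stored twice.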
Note
We are performing an ongoing quality check of the abbreviations, slang, acronyms, and initialisms in the lexicon. For example, future versions may remove terms like “aug-00” that do not qualify as accepted forms of shorthand.
Contributions to this preliminary lexicon refinement are welcome. If you would like to provide suggestions, please open an issue here.
Note#
[1] Amazon Mechanical Turk (AMT) is a micro-labor website enabling researchers to outsource human intelligence tasks (HITs), such as image annotation, surveys, and data validation. To maintain data quality, Hutto and Gilbert (2014) implemented quality control processes (see 3.1.1 Screening, Training, Selecting, and Data Quality Checking Crowd-Sourced Evaluations and Validations, Hutto and Gilbert 2014, p.220).