Discrete Emotions Dictionary (DED)

Discrete Emotions Dictionary (DED)#

A sentiment lexicon crafted to precisely capture non-overlapping emotional categories in political communication 🗳️.#

DED summary

Composition:

113 unigram entries
Four discrete emotional categories: anger, anxiety, sadness, and optimism

Creation Methodology:

Initial lexicon compilation involved gathering words from four existing dictionaries, discarding those unique to any dictionary
Manual refinement confirmed each word’s clear expression of the target emotion using intuition, thesauri, and political news example
Contextual validity was assessed by analysing lexicon word usage from 40,000 New York Times article

Evaluation: Fioroni et al. (2022) highlighted that many existing dictionaries lack thorough validation. Their study rigorously tested validity using four validation methods:

Contextual Validation: Before finalising the lexicon, contextual validity was assessed by analysing real-world usage of the words in 40,000 New York Times articles spanning 2000 to 2016. For each word, 50 random example sentences were evaluated by the authors to determine if the word expressed the intended emotion. Words with accuracy below 85% based on the initial 50 sentence samples were removed from the lexicon. For words close to the 85% threshold, a resampling process was used - 50 new randomly selected sentences underwent a subsequent evaluation. If the word’s validity remained below 85% after resampling, it was removed. If accuracy increased above 85%, the word was retained.
Conceptual Validation: On a corpus of 225,729 U.S. presidential campaign articles (from 1980 to 2016, across 17 different newspapers), conceptual validity was tested by examining if longitudinal emotion trends matched expectations based on historical events. These expectations are based in part on the authors’ own sense of trends, but on prior scholarly work on the emotional content of these presidential campaigns. For example, the authors expected anger to peak in 2016, while optimism peaked in 2008 during Obama’s campaign centred on hope. Results were conceptually in line with the authors’ expectations, provided evidence of validity (for further discussion, please refer to Fioroni et al., 2022, pp.7-12).
Discriminant Validation: Discriminant validity tested if the encoded emotions were distinct rather than overlapping. The mean correlation between DED’s emotions was 0.14, indicating they captured discrete affective dimensions. Low inter-emotion correlation provided evidence that each lexical category conveys distinct affective signals, rather than redundancy. This statistical discrimination between encoded emotions further validated the refined lexicon.
Benchmark Validation: The authors utilised a crowdsourced benchmark dataset of 1,600 news article sentences labelled via Amazon Mechanical Turk. The 1,600 sentences were extracted from campaign news articles, comprising 80 examples for each emotion category: anger, anxiety, sadness, optimism and no emotion. In this context, “example” denotes instances. An additional 1,200 sentences were randomly sampled for a comprehensive evaluation.

Crowd workers then evaluated sentences for the presence of anger, anxiety, sadness, optimism or no emotion. All workers received the same first two sentences^[1] as a quality check, followed by five randomly drawn sentences from the dataset. The authors only used data from workers who correctly coded the first two sentences. Each sentence was categorised based on majority vote across five crowd ratings. This benchmark enabled comparison of precision and recall between DED and LIWC using real-world labelled data.

DED showed higher precision than LIWC, precisely identifying expressed emotions when detected. This aligns with DED’s design goal of precisely targeting discrete emotions. However, LIWC had better recall, capturing a wider range of emotional expressions. The authors speculate this wider recall stems from DED’s precision approach overlooking ambiguous expressions spanning multiple emotions. This precision-recall tradeoff highlights DED’s conservative stance - its narrow targeting sacrifices recall for utmost precision. So while DED sometimes misses emotional content, researchers can have high confidence in the emotions it does detect. By comparing crowd-sourced labels, benchmark validation revealed the nuanced performance profile resulting from DED’s design priorities of discrete emotion precision over broad recall.

Usage Guidance: Tailored for sentiment analysis in political communication, DED excels in precisely encoding discrete emotions. Ideal for nuanced analysis of political news content. Consider pairing with domain-specific lexicons for a comprehensive sentiment assessment in the realm of political discourse. Access the lexicon via `sentibank.archive.load().dict(“DED_v2022”).

📋 Introduction#

The Discrete Emotions Dictionary (DED) aimed to reliably distinguish between discrete, non-overlapping emotional categories relevant to political communication (Fioroni et al., 2022). While related lexicons exist, the authors found none sufficiently captured discrete emotions. DED offered improved performance over LIWC for political news content by narrowly targeting distinct affective states.

“Discrete emotion” refers to ‘unique, categorical mental states’, differentiable to each other (Fioroni et al., 2022, p.2). DED takes a conservative approach - its narrow focus seeks to precisely identify emotions while minimising overlap between categories.

📚 Original Dictionary#

DED was designed to address limitations of current sentiment dictionaries by more reliably capturing and differentiating four discrete emotions relevant to political communications: anger, anxiety (interchangeable emotion with fear), optimism, and sadness. The selection of these distinct affective states draws from existing literature on emotionality in political communication research (for an in-depth discussion on prior works, please refer to Fioroni et al., 2022, pp.3-4).

DED was developed and validated in three main steps: 1. Lexicon compilation; 2. Emotion-specific refinement; and 3. Contextual validation.

Lexicon compilation

The authors first compiled an initial list of words appearing in more than one of the four existing dictionaries - The General Inquirer, Regressive Imagery Dictionary (RID), Linguistic Inquiry and Word Count (LIWC) and Lexicoder Sentiment Dictionary (LSD). Any words that appear uniquely in any of the existing dictionaries were discarded. This was due to DED prioritising discrete, non-overlapping terms. The first step left the authors with 184 anger, 39 anxiety, 31 sadness and 11 optimism lexicons.

Emotion-specific refinement

The authors then manually refined this list by confirming each word clearly and unambiguously expressed the target emotion based on intuition, thesauri, and political news examples. The goal was to capture expressed emotions, not reader evocations. For instance, “angrily” directly conveys anger while “war” may or may not evoke emotions depending on the reader. This led to a reduction in the number of lexicons within the ‘anger’ category and a modest increase in the lexicon count across the other three categories, resulting in 71 for ‘anger,’ 57 for ‘anxiety,’ 36 for ‘sadness,’ and 20 for ‘optimism.’

Contextual validation

Finally, the authors tested contextual validity by analysing usage of lexicon words in a random sample of 50 sentences per word drawn from 40,000 NYT articles. Expert coders rated each sentence for whether the word accurately and unambiguously expressed its intended emotion. Validity percentages were calculated based on sentence agreement. Words with at least 85% validity remained in the lexicon. Borderline words near 85% underwent resampling and re-evaluation until reaching the threshold. This iterative, usage-based process ensured the final lexicon reflected language genuinely appearing in political news to express discrete emotions, rather than ambiguous or theoretical terms. The final dictionary comprised 113 entries, encompassing 52 anger, 37 anxiety, 15 sadness and 9 optimism lexicons.

from sentibank import archive 

load = archive.load()
DED = load.origin("DED_v2022")

DED (Fioroni et al., 2022)
words	discrete emotions
Loading... (need help?)

Processed Dictionary 🧹#

From the original word lists, no notable changes were made.

Note#

^[1]The first two sentences were: (i) “President Trump tweeted that “It’s like the wheel, there is nothing better… I know tech better than anyone and technology on a Border is only effective in conjunction with a Wall”; and (ii) GOP lawmakers are angry over Trump’s decision to withdraw troops to make way for a Turkish offensive against Kurdish allies”