Discrete Emotions Dictionary (DED)#
A sentiment lexicon crafted to precisely capture non-overlapping emotional categories in political communication 🗳️.
DED summary
Composition:
113 unigram entries
Four discrete emotional categories: anger, anxiety, sadness, and optimism
Creation Methodology:
Initial lexicon compilation gathered words from four existing dictionaries and discarded any word that appeared in only one of them
Manual refinement confirmed that each word clearly expressed the target emotion, using intuition, thesauri, and political news examples
Contextual validity was assessed by analysing how lexicon words were used in 40,000 New York Times articles
Evaluation: Fioroni et al. (2022) highlighted that many existing dictionaries lack thorough validation. Their study rigorously tested validity using four validation methods:
Contextual Validation: Before finalising the lexicon, the authors assessed contextual validity by analysing real-world usage of the words in 40,000 New York Times articles spanning 2000 to 2016. For each word, the authors evaluated 50 random example sentences to determine whether the word expressed the intended emotion. Words with accuracy below 85% on the initial 50-sentence sample were removed from the lexicon. Words close to the 85% threshold were resampled: 50 new randomly selected sentences were evaluated. If the word's validity remained below 85% after resampling, it was removed. If accuracy rose above 85%, the word was retained.
Conceptual Validation: On a corpus of 225,729 U.S. presidential campaign articles (from 1980 to 2016, across 17 different newspapers), conceptual validity was tested by examining whether longitudinal emotion trends matched expectations based on historical events. These expectations are based in part on the authors' own sense of trends, but also on prior scholarly work on the emotional content of these presidential campaigns. For example, the authors expected anger to peak in 2016 and optimism to peak in 2008, during Obama's campaign centred on hope. Results were in line with the authors' expectations, providing evidence of validity (for further discussion, please refer to Fioroni et al., 2022, pp.7-12).
Discriminant Validation: Discriminant validity tested whether the encoded emotions were distinct rather than overlapping. The mean correlation between DED's emotions was 0.14, indicating that they captured discrete affective dimensions. This low inter-emotion correlation provided evidence that each lexical category conveys a distinct affective signal rather than redundant information, further validating the refined lexicon.
Benchmark Validation: The authors used a crowdsourced benchmark dataset of 1,600 news article sentences labelled via Amazon Mechanical Turk. The sentences were extracted from campaign news articles and included 80 examples for each category: anger, anxiety, sadness, optimism and no emotion. In this context, "example" denotes an instance. An additional 1,200 sentences were randomly sampled for a comprehensive evaluation.
Crowd workers then evaluated sentences for the presence of anger, anxiety, sadness, optimism or no emotion. All workers received the same first two sentences[1] as a quality check, followed by five randomly drawn sentences from the dataset. The authors only used data from workers who correctly coded the first two sentences. Each sentence was categorised by majority vote across five crowd ratings. This benchmark enabled a comparison of precision and recall between DED and LIWC on real-world labelled data.
DED showed higher precision than LIWC: when it detected an emotion, that emotion was more often the one expressed. This aligns with DED's design goal of precisely targeting discrete emotions. However, LIWC had better recall, capturing a wider range of emotional expressions. The authors speculate that the recall gap stems from DED's precision-oriented approach overlooking ambiguous expressions that span multiple emotions. The trade-off reflects DED's conservative stance: its narrow targeting sacrifices recall for precision. So while DED sometimes misses emotional content, researchers can have high confidence in the emotions it does detect.
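The benchmark procedure can be illustrated with a small sketch: take the majority label across the crowd ratings for each sentence, then compute per-emotion precision and recall for a dictionary's predictions. This is not the authors' code. The function names, the toy ratings, and the `predicted` labels are hypothetical, and the tie-breaking rule (first label seen wins) is an assumption, since the source does not say how ties were handled.

```python
from collections import Counter

def majority_label(ratings):
    """Most common label among the crowd ratings for one sentence.
    Ties go to the label seen first (an assumption, not from the source)."""
    return Counter(ratings).most_common(1)[0][0]

def precision_recall(gold, predicted, emotion):
    """Precision and recall of `predicted` against `gold` for one emotion."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == emotion and p == emotion for g, p in pairs)
    fp = sum(g != emotion and p == emotion for g, p in pairs)
    fn = sum(g == emotion and p != emotion for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical crowd ratings: five ratings for each of four sentences.
crowd = [
    ["anger", "anger", "anger", "sadness", "no emotion"],
    ["anger", "anger", "anxiety", "anger", "anger"],
    ["optimism", "optimism", "optimism", "no emotion", "no emotion"],
    ["no emotion", "no emotion", "no emotion", "no emotion", "anger"],
]
gold = [majority_label(r) for r in crowd]

# Hypothetical dictionary output for the same four sentences.
predicted = ["anger", "no emotion", "optimism", "no emotion"]

precision, recall = precision_recall(gold, predicted, "anger")
print(precision, recall)
```

In this toy case the dictionary never labels a sentence as anger wrongly but misses one angry sentence, which is the high-precision, lower-recall profile described above.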
Usage Guidance: DED is tailored for sentiment analysis in political communication, where it excels at precisely encoding discrete emotions. This makes it well suited to nuanced analysis of political news content. Consider pairing it with domain-specific lexicons for a more comprehensive sentiment assessment of political discourse. Access the lexicon via `sentibank.archive.load().dict("DED_v2022")`.
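As a rough illustration of dictionary-based emotion coding, the sketch below counts lexicon matches per emotion in a sentence. It does not call sentibank. `toy_ded` is a hypothetical stand-in for the loaded lexicon, its word-to-emotion layout is an assumption about the data's shape, and apart from "angrily" the entries are illustrative rather than confirmed DED words.

```python
import re
from collections import Counter

# Hypothetical stand-in for the loaded lexicon. The word -> emotion layout
# and these entries are assumptions made for illustration only.
toy_ded = {
    "angrily": "anger",
    "worried": "anxiety",
    "grief": "sadness",
    "hopeful": "optimism",
}

def count_emotions(text, lexicon):
    """Count lexicon matches per emotion among the lowercased word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(lexicon[t] for t in tokens if t in lexicon)

counts = count_emotions(
    "Voters were worried, but the candidate stayed hopeful.", toy_ded
)
print(dict(counts))
```

Because DED contains only unigrams, simple token matching of this kind is enough; no phrase handling is needed.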
Introduction#
The Discrete Emotions Dictionary (DED) aims to reliably distinguish between discrete, non-overlapping emotional categories relevant to political communication (Fioroni et al., 2022). While related lexicons exist, the authors found that none sufficiently captured discrete emotions. By narrowly targeting distinct affective states, DED achieves higher precision than LIWC on political news content, at some cost in recall.
"Discrete emotion" refers to "unique, categorical mental states" that are differentiable from one another (Fioroni et al., 2022, p.2). DED takes a conservative approach: its narrow focus seeks to precisely identify emotions while minimising overlap between categories.
Original Dictionary#
DED was designed to address limitations of current sentiment dictionaries by more reliably capturing and differentiating four discrete emotions relevant to political communication: anger, anxiety (treated as interchangeable with fear), optimism, and sadness. The selection of these distinct affective states draws on existing literature on emotionality in political communication research (for an in-depth discussion of prior work, please refer to Fioroni et al., 2022, pp.3-4).
DED was developed and validated in three main steps: 1. Lexicon compilation; 2. Emotion-specific refinement; and 3. Contextual validation.
Lexicon compilation
The authors first compiled an initial list of words appearing in more than one of four existing dictionaries: the General Inquirer, the Regressive Imagery Dictionary (RID), Linguistic Inquiry and Word Count (LIWC) and the Lexicoder Sentiment Dictionary (LSD). Any word that appeared in only one of these dictionaries was discarded, because DED prioritises discrete, non-overlapping terms. This first step left the authors with 184 anger, 39 anxiety, 31 sadness and 11 optimism words.
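The compilation rule (keep a word only if it appears in more than one source dictionary) can be sketched as follows. The word lists are hypothetical toy data for a single emotion category, and `compile_candidates` is an illustrative helper, not the authors' code.

```python
from collections import Counter

# Hypothetical toy word lists standing in for one emotion category in each
# source dictionary; the real lists are far larger.
source_lists = {
    "General Inquirer": {"angry", "furious", "hostile"},
    "RID": {"angry", "rage", "hostile"},
    "LIWC": {"angry", "furious", "annoyed"},
    "LSD": {"angry", "outrage"},
}

def compile_candidates(lists, min_sources=2):
    """Return words that occur in at least `min_sources` of the word lists."""
    counts = Counter(word for words in lists.values() for word in set(words))
    return sorted(word for word, n in counts.items() if n >= min_sources)

candidates = compile_candidates(source_lists)
print(candidates)
```

Words found in only one list ("rage", "annoyed", "outrage" in this toy data) are dropped at this stage.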
Emotion-specific refinement
The authors then manually refined this list by confirming that each word clearly and unambiguously expressed the target emotion, based on intuition, thesauri, and political news examples. The goal was to capture expressed emotions, not emotions evoked in the reader. For instance, "angrily" directly conveys anger, while "war" may or may not evoke emotions depending on the reader. This step reduced the number of words in the "anger" category and modestly increased the word count in the other three categories, resulting in 71 words for "anger", 57 for "anxiety", 36 for "sadness", and 20 for "optimism".
Contextual validation
Finally, the authors tested contextual validity by analysing how lexicon words were used in a random sample of 50 sentences per word drawn from 40,000 NYT articles. Expert coders rated each sentence for whether the word accurately and unambiguously expressed its intended emotion, and validity percentages were calculated from these sentence-level ratings. Words with at least 85% validity remained in the lexicon. Borderline words near 85% were re-evaluated on a fresh sample of 50 sentences and kept only if they then met the threshold. This usage-based process ensured that the final lexicon reflected language genuinely appearing in political news to express discrete emotions, rather than ambiguous or theoretical terms. The final dictionary comprised 113 entries: 52 anger, 37 anxiety, 15 sadness and 9 optimism words.
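The retention rule can be expressed as a short sketch. The function names and the `margin` that defines a "borderline" word are assumptions, since the source only says such words were "close to" the 85% threshold; the sample judgements are toy data.

```python
def validity(judgements):
    """Share of sampled sentences judged to express the intended emotion."""
    return sum(judgements) / len(judgements)

def keep_word(first_sample, resample=None, threshold=0.85, margin=0.05):
    """Decide whether a word stays in the lexicon.

    `margin` (how far below the threshold still counts as borderline) is an
    assumption; the source does not give a number.
    """
    score = validity(first_sample)
    if score >= threshold:
        return True
    borderline = score >= threshold - margin
    if borderline and resample is not None:
        # Borderline words get one fresh sample and must meet the threshold.
        return validity(resample) >= threshold
    return False

# Toy judgements: True means the sentence expressed the intended emotion.
clear_pass = [True] * 45 + [False] * 5   # 90% valid
borderline = [True] * 41 + [False] * 9   # 82% valid
fresh_good = [True] * 44 + [False] * 6   # 88% valid on resampling
clear_fail = [True] * 30 + [False] * 20  # 60% valid

print(keep_word(clear_pass))
print(keep_word(borderline, resample=fresh_good))
print(keep_word(clear_fail))
```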
from sentibank import archive

# Load the archive, then fetch DED in its original form
load = archive.load()
DED = load.origin("DED_v2022")
The resulting table has two columns: words and discrete emotions.
Processed Dictionary 🧹#
From the original word lists, no notable changes were made.
Note#
[1] The first two sentences were: (i) "President Trump tweeted that 'It's like the wheel, there is nothing better… I know tech better than anyone and technology on a Border is only effective in conjunction with a Wall'"; and (ii) "GOP lawmakers are angry over Trump's decision to withdraw troops to make way for a Turkish offensive against Kurdish allies".