Norms of Valence, Arousal and Dominance (NoVAD)
Broad-coverage Valence, Arousal, and Dominance ratings for nearly 14,000 common English lemmas.
NoVAD summary
Composition:
Approximately 14k unigrams
Continuous scoring metrics ranging [1, 9] across three dimensions (valence, arousal, dominance)
Note that both processed dictionaries rescale the values to [-4, 4] and incorporate only the valence and arousal dimensions.
Creation Methodology:
13,915 common words were selected from ANEW, Category Norms, and SUBTLEX-US
Employed Amazon Mechanical Turk for dimension-specific ratings
Each list rated by 20 participants, ensuring a comprehensive dataset (a total of ~1.09 million ratings gathered from 1,827 participants)
Evaluation: Warriner, Kuperman and Brysbaert (2013) validated NoVAD using three methods:
Connotative Validity: Ratings from a demographically diverse group of 1,827 native English speakers helped validate the connotative accuracy of the norms. Approximately 1.09 million ratings were gathered. The participants spanned a wide range of ages (16-87 years) and education levels, and were 60% female and 40% male.
Discriminant Validity: Pearson's correlations between dimensions showed low correlations for arousal/valence and arousal/dominance. However, dominance and valence were highly correlated (0.974).
Concordance Validity: Correlations with 6 existing dictionaries were examined (Warriner, Kuperman and Brysbaert, 2013, p.1197). Valence ratings were highly consistent across studies and languages (correlations ranging from 0.847 to 0.953). Arousal (0.635-0.759) and dominance (0.774-0.833) showed lower correlations, indicating less consistency across studies.
Usage Guidance: Useful for studying emotions and psychology in large texts and datasets, providing granular affective ratings to enrich language analysis. Access via sentibank.archive.load().dict("NoVAD_v2013_norm") for a two-dimensional vector of valence and arousal scores, or sentibank.archive.load().dict("NoVAD_v2013_boosted") for condensed ratings in which the valence score is adjusted based on arousal.
Introduction
While the ANEW lexicon provided norms for over 1,000 words, its coverage was too limited for large psycholinguistic studies. To address this, Warriner, Kuperman and Brysbaert (2013) created Norms of Valence, Arousal and Dominance (NoVAD), compiling affective ratings for nearly 14,000 English lemmas. Their expanded lexicon enabled the study of emotion and language in much larger texts and datasets.
Original Dictionary
The stimulus words were compiled from three sources: 1,029 words from ANEW (Bradley and Lang, 1999), rated on pleasure, arousal, and dominance scales (1-9); 1,060 words from Category Norms (Van Overschelde, Rawson and Dunlosky, 2004) across 60 categories; and 30,000 lemmas (nouns, verbs, adjectives) from SUBTLEX-US (Brysbaert and New, 2009), a corpus of movie subtitles. Only word lemmas were included, since emotional values were expected to generalise to inflected forms.
Warriner, Kuperman and Brysbaert (2013) then selected the highest-frequency words that, according to Kuperman, Stadthagen-Gonzalez and Brysbaert (2012), are known to at least 70% of the population. This helped ensure valid ratings, as unfamiliar words receive less consistent affective judgments. It is worth noting that Warriner and Kuperman (2014) raised concerns that ANEW introduces a distributional bias relative to natural language, because ANEW's words were compiled from prior affective studies. They argued that the NoVAD dictionary mitigates this bias, as a word set selected solely on frequency better represents organic language use. The final set contained 13,915 words: 22.5% adjectives, 63.5% nouns, 12.6% verbs, and 1.4% other/unspecified.
Using Amazon Mechanical Turk, the authors collected ratings on three dimensions using a 9-point scale, similar to ANEW. Unlike ANEW, however, each assignment asked participants to rate a word list on only one dimension. There were 43 word lists in total. Each list contained approximately 350 words: 50 ANEW words (10 'calibrators' and 40 'controls') plus 300 non-ANEW words. The 10 calibrator words were pre-selected from ANEW separately for each of the three dimensions (valence, arousal, dominance) and were identical across all lists. The calibrator words for each dimension were as follows (in increasing order of ratings):
Valence: 'jail' (1.91), 'invader' (2.23), 'insecure' (2.30), 'industry' (5.07), 'icebox' (5.67), 'hat' (5.69), 'grin' (7.66), 'kitten' (7.58), 'joke' (7.88), and 'free' (8.25).
Arousal: 'statue' (2.82), 'rock' (3.14), 'sad' (3.49), 'cat' (4.50), 'curious' (5.74), 'robber' (6.20), 'shotgun' (6.55), 'assault' (6.80), 'thrill' (7.19), and 'sex' (7.60).
Dominance: 'lightning' (4.00), 'mildew' (4.19), 'waterfall' (5.34), 'wealthy' (6.11), 'lighthouse' (6.24), 'honey' (6.39), 'treat' (6.66), 'mighty' (6.85), 'admired' (6.94), and 'liberty' (7.04).
Calibrators were shown first to illustrate the range of the rating scale, helping participants anchor their judgments. The remaining ANEW words (excluding the 30 calibrator words) were randomly divided into sets of 40. These words served as controls for estimating correlations between each participant's data and the ANEW norms.
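The list layout described above (10 calibrators, 40 controls and 300 new words, with calibrators first) can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function build_rating_list and the placeholder control and new words are hypothetical, and shuffling the non-calibrator words is an assumption, since the source only states that the calibrators were shown first.

```python
import random

def build_rating_list(calibrators, controls, new_words, seed=None):
    """Hypothetical helper: assemble one ~350-word list with calibrators first.

    Assumption: controls and new words are shuffled together after the calibrators.
    """
    if len(calibrators) != 10:
        raise ValueError("expected 10 calibrator words")
    rng = random.Random(seed)
    rest = list(controls) + list(new_words)
    rng.shuffle(rest)
    # Calibrators always open the list, in the order they are given
    return list(calibrators) + rest

valence_calibrators = ["jail", "invader", "insecure", "industry", "icebox",
                       "hat", "grin", "kitten", "joke", "free"]
controls = [f"control_{i}" for i in range(40)]   # placeholder ANEW control words
new_words = [f"word_{i}" for i in range(300)]    # placeholder non-ANEW words

word_list = build_rating_list(valence_calibrators, controls, new_words, seed=1)
print(len(word_list))  # 350
```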
Each list was rated by 20 participants, gathering ~1.09 million ratings (from a total of 1,827 participants). Around 3% of responses were removed due to missing data, lack of variability, or too few ratings. Over 87% of words received 18-30 ratings per dimension. The final dataset consisted of 303,539 valence samples (95% of original), 339,323 arousal samples (89%), and 281,735 dominance samples (74%)[1].
The authors first calculated means and standard deviations for all ratings. They then reversed the 9% of participant ratings that showed a negative correlation with the mean ratings of the words. This addressed participants who appear to have used the scale backwards: for example, a participant might have indicated that a negative word like 'jail' made them very happy. Flipping such ratings brought them in line with the emotions most people associate with those words, after which the means and standard deviations were recalculated.
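A minimal sketch of the reversal step, assuming ratings are stored as a hypothetical mapping from participant ID to {word: rating} on the 1 to 9 scale. A participant's ratings are mirrored (1 becomes 9, 2 becomes 8, and so on) when they correlate negatively with the word means. The helper names and the toy data are illustrative only, and details such as whether a participant's own ratings are excluded from the means are not specified in the source (here they are included).

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series has no variability."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def word_means(ratings):
    """Mean rating per word across all participants."""
    scores = {}
    for person in ratings.values():
        for word, score in person.items():
            scores.setdefault(word, []).append(score)
    return {word: mean(values) for word, values in scores.items()}

def reverse_inverted_raters(ratings, low=1, high=9):
    """Mirror the ratings of participants who correlate negatively with the word means."""
    means = word_means(ratings)
    corrected, flipped = {}, []
    for pid, person in ratings.items():
        words = list(person)
        r = pearson([person[w] for w in words], [means[w] for w in words])
        if r < 0:
            # Mirror the scale: 1 <-> 9, 2 <-> 8, ..., 5 stays 5
            corrected[pid] = {w: low + high - s for w, s in person.items()}
            flipped.append(pid)
        else:
            corrected[pid] = dict(person)
    # Means are recalculated after the reversals
    return corrected, word_means(corrected), flipped

# Toy data: p3 appears to have used the scale backwards
ratings = {
    "p1": {"jail": 2, "hat": 5, "free": 8},
    "p2": {"jail": 1, "hat": 6, "free": 9},
    "p3": {"jail": 8, "hat": 5, "free": 2},
}
corrected, new_means, flipped = reverse_inverted_raters(ratings)
print(flipped)          # ['p3']
print(corrected["p3"])  # {'jail': 2, 'hat': 5, 'free': 8}
```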
```python
from sentibank import archive

load = archive.load()
# Load the original NoVAD ratings (Warriner, Kuperman and Brysbaert, 2013)
NoVAD = load.origin("NoVAD_v2013")
```
Note
The ratings cover valence (V), arousal (A), and dominance (D). For each dimension, three statistics are reported per word: the mean (Mean), the standard deviation (SD), and the number of contributing ratings (Rat). Columns ending in .Sum are computed over all raters; group-level statistics use the suffixes .M (male), .F (female), .O (older), .Y (younger), .H (high education), and .L (low education).
Word | V.Mean.Sum | V.SD.Sum | V.Rat.Sum | A.Mean.Sum | A.SD.Sum | A.Rat.Sum | D.Mean.Sum | D.SD.Sum | D.Rat.Sum | V.Mean.M | V.SD.M | V.Rat.M | V.Mean.F | V.SD.F | V.Rat.F | A.Mean.M | A.SD.M | A.Rat.M | A.Mean.F | A.SD.F | A.Rat.F | D.Mean.M | D.SD.M | D.Rat.M | D.Mean.F | D.SD.F | D.Rat.F | V.Mean.Y | V.SD.Y | V.Rat.Y | V.Mean.O | V.SD.O | V.Rat.O | A.Mean.Y | A.SD.Y | A.Rat.Y | A.Mean.O | A.SD.O | A.Rat.O | D.Mean.Y | D.SD.Y | D.Rat.Y | D.Mean.O | D.SD.O | D.Rat.O | V.Mean.L | V.SD.L | V.Rat.L | V.Mean.H | V.SD.H | V.Rat.H | A.Mean.L | A.SD.L | A.Rat.L | A.Mean.H | A.SD.H | A.Rat.H | D.Mean.L | D.SD.L | D.Rat.L | D.Mean.H | D.SD.H | D.Rat.H |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
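The column names above follow a <dimension>.<statistic>.<group> pattern, which can be decoded programmatically. The sketch below assumes every column (other than Word) follows that pattern; the helper names parse_column and column_name are hypothetical and not part of sentibank.

```python
from itertools import product

DIMENSIONS = {"V": "valence", "A": "arousal", "D": "dominance"}
STATISTICS = {"Mean": "mean", "SD": "standard deviation", "Rat": "number of ratings"}
GROUPS = {"Sum": "all raters", "M": "male", "F": "female", "Y": "younger",
          "O": "older", "L": "low education", "H": "high education"}

def parse_column(name):
    """Split a column such as 'A.SD.F' into (dimension, statistic, group)."""
    dim, stat, group = name.split(".")
    return DIMENSIONS[dim], STATISTICS[stat], GROUPS[group]

def column_name(dim, stat, group):
    """Build a column name from its three short codes."""
    return f"{dim}.{stat}.{group}"

# 3 dimensions x 7 groups x 3 statistics = 63 rating columns
all_columns = [column_name(d, s, g) for d, g, s in product(DIMENSIONS, GROUPS, STATISTICS)]
print(len(all_columns))        # 63
print(parse_column("A.SD.F"))  # ('arousal', 'standard deviation', 'female')
```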
Processed Dictionary
First-Pass Processing
Two different processed dictionaries have been compiled for NoVAD:
NoVAD_v2013_norm represents sentiment as a two-dimensional vector of valence and arousal. This approach aligns with Warriner and Kuperman's (2014) assertion that an accurate portrayal of sentiment 'requires a bidimensional perspective' (p.16).
NoVAD_v2013_boosted takes a more experimental approach, condensing sentiment into a single score by adjusting valence intensity according to arousal levels.
Creating NoVAD_v2013_norm followed a simple process. The original valence and arousal scores, which range from 1 to 9, were standardised with min-max scaling, resulting in scores ranging from -4 to 4. The scaled scores were then stored as a two-element list, with valence as the first element and arousal as the second.
The creation of NoVAD_v2013_boosted involved nuanced adjustments to the valence dimension, informed primarily by Warriner and Kuperman (2014). Earlier, Warriner, Kuperman and Brysbaert (2013) had found that for both the arousal/valence and arousal/dominance relationships, quadratic (non-linear) fits explained more variance than linear ones[2]. This suggests potential threshold effects, where the influence of arousal on valence or dominance is more pronounced at specific arousal levels. Building on this, Warriner and Kuperman (2014) characterised affect using only the valence and arousal dimensions, highlighting that 'higher arousal [accompanies] extreme values of valence' (Warriner and Kuperman, 2014, p.10).
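To make the linear-versus-quadratic comparison concrete, the sketch below fits both models and reports the share of variance each explains (R²). It uses synthetic U-shaped data invented for illustration, not NoVAD ratings, and relies on NumPy's polyfit/polyval; the helper r_squared is hypothetical.

```python
import numpy as np

def r_squared(x, y, degree):
    """Share of variance in y explained by a least-squares polynomial of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    fitted = np.polyval(coeffs, x)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data for illustration only (not NoVAD ratings): arousal rises
# towards both ends of the 1-9 valence scale, i.e. a U-shaped relationship
rng = np.random.default_rng(0)
valence = rng.uniform(1, 9, size=500)
arousal = 3 + 0.25 * (valence - 5) ** 2 + rng.normal(0, 0.5, size=500)

r2_linear = r_squared(valence, arousal, 1)
r2_quadratic = r_squared(valence, arousal, 2)
print(f"linear R^2 = {r2_linear:.3f}, quadratic R^2 = {r2_quadratic:.3f}")
```

Because the quadratic model nests the linear one, its R² can never be lower on the same data; what matters is the size of the gain, which a formal comparison of the nested models (for example an F-test) would assess.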
Warriner and Kuperman's (2014) observation that arousal modulates valence suggests treating arousal as a degree modifier for valence. The concept of a degree modifier is borrowed from Hutto and Gilbert (2014), who found that certain words modify the degree of sentiment by 0.293 on average. This value was derived by 'controlling the specific grammatical or syntactical feature presented as an independent variable' (p.222). Similar to Hutto and Gilbert's (2014) hierarchical application of 0.293 with weighted differences, NoVAD_v2013_boosted systematically integrates the degree-modifier effect of arousal on valence scores, based on Warriner and Kuperman's (2014) chi-squared test analysis.
Specifically, Warriner and Kuperman (2014) used a chi-squared test to analyse word types across 100 bins. These bins were formed by the intersection of 10 arousal deciles (A1-A10, with A1 being the lowest arousal percentile and A10 the highest) and 10 valence deciles (V1-V10, with V1 being the most negative percentile, V5 neutral, and V10 most positive). As seen in Fig. 2 (p.11), this analysis revealed distinct patterns, demonstrating that arousal strongly modulates the distribution of word types over valence ratings (pp.10-11).
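The binning and residual analysis can be sketched as below: each word receives an arousal decile (A1-A10) and a valence decile (V1-V10), the 10 × 10 table of counts is formed, and each bin gets a Pearson residual, (observed - expected) / sqrt(expected). This is one common residual choice; the exact residual type used by Warriner and Kuperman (2014) is not stated here, and ties are broken by input order. The helper names and the 20-word toy data are hypothetical.

```python
from collections import Counter
from math import sqrt

def decile_labels(scores, prefix):
    """Label each score by decile: prefix + '1' (lowest 10%) up to prefix + '10' (highest)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [None] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = f"{prefix}{rank * 10 // len(scores) + 1}"
    return labels

def pearson_residuals(row_labels, col_labels):
    """(observed - expected) / sqrt(expected) for every (row, column) bin."""
    n = len(row_labels)
    observed = Counter(zip(row_labels, col_labels))
    row_totals, col_totals = Counter(row_labels), Counter(col_labels)
    return {
        (r, c): (observed[(r, c)] - row_totals[r] * col_totals[c] / n)
        / sqrt(row_totals[r] * col_totals[c] / n)
        for r in row_totals
        for c in col_totals
    }

# Toy data (not NoVAD): 20 words whose arousal rises towards both valence extremes
valence = list(range(20))
arousal = [abs(v - 9.5) for v in valence]
a_dec = decile_labels(arousal, "A")
v_dec = decile_labels(valence, "V")
residuals = pearson_residuals(a_dec, v_dec)
print(round(residuals[("A10", "V1")], 3))  # positive: more words than expected
print(round(residuals[("A1", "V1")], 3))   # negative: fewer words than expected
```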
Given these results, we employ a decile-based hierarchy to systematically enhance or dampen sentiment scores. For extreme valence values, we progressively intensify the score based on the arousal decile, amplifying positivity/negativity. Conversely, for neutral valence values, we systematically diminish the score based on the arousal decile, introducing a damping effect to neutrality.
The initial valence scores were standardised to the range -4 to 4 through min-max scaling. Subsequently, the polarity of the following word groups was intensified:
Region [A9:A10, V1] showed chi-square residuals ~15. This region contains highly aroused (top 20% of arousal), very negative (bottom 10% of valence) words such as 'abuse' and 'pigheaded'. We intensify their negativity by subtracting 0.293. 646 words belong to this lexical space.
Region [A8, V1] showed residuals ~10. We intensify the negativity of these less aroused (between the top 30% and top 20% of arousal) but still very negative (bottom 10% of valence) words by subtracting 0.278 (= 0.95 × 0.293). 241 words belong to this lexical space.
Region [A10, V10] showed residuals ~10. We intensify the positivity of these highly aroused (top 10% of arousal), very positive (top 10% of valence) words such as 'enthusiastic' and 'orgasm' by adding 0.278. 239 words belong to this lexical space.
Region [A7:A10, V2] showed residuals ~5. We intensify the negativity of these moderately aroused (top 40% of arousal) and quite negative (between the bottom 10% and bottom 20% of valence) words by subtracting 0.264 (= 0.9 × 0.293). 797 words belong to this lexical space.
Region [A7:A9, V10] showed residuals ~5. We intensify the positivity of these moderately aroused (between the top 40% and top 10% of arousal) yet very positive (top 10% of valence) words by adding 0.264. 506 words belong to this lexical space.
The valence of neutral words was also dampened:
Region [A1:A4, V4:V7] showed residuals ~5. This is a large lexical space of calm, relatively neutral words such as 'foam' and 'northern'. Compared with the highly aroused but relatively neutral words in region [A10, V4:V7], such as 'emotional' and 'premonition', these words are more genuinely neutral.
We dampen the valence of these words toward neutrality by adding 0.264 to negative scores and subtracting 0.264 from positive scores, leaving unchanged any scores already between -0.264 and 0.264. 1,256 words belong to this lexical space.
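The region rules listed above can be condensed into a single function. This is a minimal sketch, not sentibank's implementation: it assumes the scaled valence (-4 to 4) and both deciles (integers 1 to 10) are already computed, the function name boost_valence is hypothetical, and scores exactly at ±0.264 in the neutral region are assumed to be left unchanged.

```python
BOOST = 0.293                     # degree-modifier increment (Hutto and Gilbert, 2014)
STRONG = round(0.95 * BOOST, 3)   # 0.278
MODERATE = round(0.9 * BOOST, 3)  # 0.264

def boost_valence(valence, a, v):
    """Adjust a scaled valence score (-4 to 4) given its arousal decile `a`
    and valence decile `v`, both integers from 1 (lowest) to 10 (highest)."""
    if v == 1 and a >= 9:             # [A9:A10, V1]
        return valence - BOOST
    if v == 1 and a == 8:             # [A8, V1]
        return valence - STRONG
    if v == 10 and a == 10:           # [A10, V10]
        return valence + STRONG
    if v == 2 and a >= 7:             # [A7:A10, V2]
        return valence - MODERATE
    if v == 10 and 7 <= a <= 9:       # [A7:A9, V10]
        return valence + MODERATE
    if 4 <= v <= 7 and a <= 4:        # [A1:A4, V4:V7]: dampen toward 0
        if abs(valence) <= MODERATE:  # already near neutral: left unchanged
            return valence
        return valence - MODERATE if valence > 0 else valence + MODERATE
    return valence                    # all other bins keep the scaled valence

print(round(boost_valence(-3.5, 10, 1), 3))  # -3.793
print(round(boost_valence(1.0, 2, 5), 3))    # 0.736
print(boost_valence(0.1, 2, 5))              # 0.1
```

Because the regions do not overlap, the order of the checks does not change the result; each word falls into at most one rule.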
More negative (1,684) than positive (745) words were intensified, potentially mitigating the positivity bias discussed in psychological research (Warriner and Kuperman, 2014, pp. 2-5). Studies note that negative words are less common (Clark and Clark, 1977) and are thus perceived as more potent (Rozin and Royzman, 2001; Baumeister et al., 2001), whereas positive words appear more frequently and convey less information (Garcia et al., 2012). By intensifying more negative words, NoVAD_v2013_boosted may better reflect how people naturally react to negativity, giving weight to the higher informativeness of comparatively rare negative words.
In total, 2,429 positive or negative words were intensified, 1,256 neutral words were dampened, and 9,280 kept their original scaled valence. The resulting scores ranged from -4.033 to 3.794 and were rescaled back to the range -4 to 4.
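The final rescaling step can be sketched as a plain min-max rescale that maps the observed extremes (-4.033 and 3.794) onto -4 and 4. Whether NoVAD_v2013_boosted rescales exactly this way is an assumption; the helper rescale and the middle two sample scores are hypothetical.

```python
def rescale(scores, new_min=-4.0, new_max=4.0):
    """Min-max rescale a list of scores so its extremes land on new_min and new_max."""
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [(s - lo) / span * (new_max - new_min) + new_min for s in scores]

# Illustrative boosted scores that include the reported extremes
boosted = [-4.033, -0.5, 0.0, 3.794]
rescaled = rescale(boosted)
print(rescaled[0], rescaled[-1])  # -4.0 4.0
print(round(rescaled[2], 3))      # where a score of 0 ends up
```

Because the input range is slightly asymmetric, a plain linear rescale moves a score of 0 to about 0.122; an alternative that preserves the neutral point would scale negative and positive scores separately.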
Note
[1] Valence ratings were relatively consistent across participants, while arousal and dominance ratings were more variable. For valence, scores were symmetrical around the median, and more extremely positive or negative words showed smaller rating variability than valence-neutral words. The same held for dominance. In contrast, for arousal, calmer words elicited more consistent ratings than exciting words. Based on these findings, the authors noted that rating variability for valence and dominance depends on magnitude (strength), whereas for arousal it depends on polarity.
[2] This observation is supported by the positive correlation between valence and arousal for positive words (mean valence rating > 6) and, conversely, the negative correlation between valence and arousal for negative words (mean valence rating < 4). The relationship between arousal and dominance likewise follows a U-shaped pattern, as shown by the positive correlation between dominance and arousal for high-dominance words (mean rating > 6) and the negative correlation between dominance and arousal for low-dominance words (mean rating < 4).