Norms of Valence, Arousal and Dominance (NoVAD)

Broad-coverage Valence, Arousal and Dominance ratings for nearly 14,000 common English lemmas 📖.

NoVAD summary

Composition:

  • Approximately 14k unigrams

  • Continuous scores ranging from 1 to 9 on three dimensions (valence, arousal, dominance)

  • Note that both processed dictionaries rescale the values to the range [-4, 4] and incorporate only the valence and arousal dimensions

Creation Methodology:

  • 13,915 common words were selected from ANEW, Category Norms, and SUBTLEX-US

  • Employed Amazon Mechanical Turk for dimension-specific ratings

  • Each list was rated by 20 participants (a total of ~1.09 million ratings gathered from 1,827 participants)

Evaluation: Warriner, Kuperman and Brysbaert (2013) validated NoVAD using three methods:

  1. Connotative Validity: Ratings from a demographically diverse group of 1,827 native English speakers helped validate the connotative accuracy of the norms, with approximately 1.09 million ratings gathered. Participants spanned a wide range of ages (16-87 years) and education levels, with a gender split of 60% female and 40% male.

  2. Discriminant Validity: Pearson’s correlations between dimensions showed that arousal correlated only weakly with valence and with dominance, whereas dominance and valence were highly correlated (0.974).

  3. Concordance Validity: Correlations with 6 existing dictionaries were examined (Warriner, Kuperman and Brysbaert, 2013, p.1197). Valence ratings were highly consistent across studies and languages (correlations ranging from 0.847 to 0.953). Arousal (0.635-0.759) and dominance (0.774-0.833) correlated less strongly with the existing norms, indicating lower agreement across studies.

Usage Guidance: Useful for studying emotion and psychology in large texts and datasets, providing granular affective ratings to enrich language analysis. Access via sentibank.archive.load().dict("NoVAD_v2013_norm") for a two-dimensional vector of valence and arousal scores, or sentibank.archive.load().dict("NoVAD_v2013_boosted") for a single condensed score in which valence is adjusted based on arousal.

📋 Introduction

While the ANEW lexicon provided norms for over 1,000 words, its scale was too limited for large psycholinguistic studies. To address this, Warriner, Kuperman and Brysbaert (2013) created Norms of Valence, Arousal and Dominance (NoVAD), compiling affective ratings for nearly 14,000 English lemmas. Their expanded lexicon enabled the study of emotion and language in much larger texts and datasets.

📚 Original Dictionary

The stimulus words were compiled from three sources: 1,029 words from ANEW (Bradley and Lang, 1999), rated on pleasure, arousal, and dominance scales (1-9); 1,060 words from Category Norms (Van Overschelde, Rawson and Dunlosky, 2004) across 60 categories; and 30,000 lemmas (nouns, verbs, adjectives) from SUBTLEX-US (Brysbaert and New, 2009), a corpus of movie subtitles. Only word lemmas were included since emotional values were expected to generalise to inflected forms.

Warriner, Kuperman and Brysbaert (2013) then selected the highest-frequency words that were known to at least 70% of the population according to Kuperman, Stadthagen-Gonzalez and Brysbaert (2012). This helped ensure valid ratings, as unfamiliar words receive less consistent affective judgments. Notably, Warriner and Kuperman (2014) raised concerns that ANEW introduces a distributional bias relative to natural language, because ANEW’s words were compiled from prior affective studies. They argued that the NoVAD dictionary mitigates this bias, since a word set selected on frequency alone better represents organic language use. The final set contained 13,915 words: 22.5% adjectives, 63.5% nouns, 12.6% verbs, and 1.4% other/unspecified.
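The selection criterion can be sketched as a filter followed by a frequency sort. This is a minimal illustration, not the authors’ procedure. The `Candidate` record, the `select_stimuli` helper and the sample values are hypothetical, and only the 70% threshold comes from the text.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    lemma: str
    freq_per_million: float  # corpus frequency (hypothetical values)
    known_proportion: float  # share of people who know the word, 0..1


def select_stimuli(candidates, min_known=0.70, max_words=None):
    """Keep words known to at least `min_known` of people, most frequent first."""
    known = [c for c in candidates if c.known_proportion >= min_known]
    known.sort(key=lambda c: c.freq_per_million, reverse=True)
    return known if max_words is None else known[:max_words]


# Hypothetical values for illustration only
pool = [
    Candidate("mildew", 0.6, 0.93),
    Candidate("house", 210.0, 0.99),
    Candidate("apophasis", 0.1, 0.12),
]
print([c.lemma for c in select_stimuli(pool)])
```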

Using Amazon Mechanical Turk, the authors collected ratings on three dimensions using a 9-point scale, as in ANEW. Unlike ANEW, however, each assignment asked participants to rate a word list on only one dimension. There were 43 word lists in total. Each list contained approximately 350 words: 50 ANEW words (10 ‘calibrators’ and 40 ‘controls’) plus 300 non-ANEW words. The 10 calibrator words were pre-selected from ANEW separately for each of the three dimensions (valence, arousal, dominance) and were identical across all lists. The calibrator words for each dimension were as follows (in increasing order of ratings):

  • Valence: “jail” (1.91), “invader” (2.23), “insecure” (2.30), “industry” (5.07), “icebox” (5.67), “hat” (5.69), “grin” (7.66), “kitten” (7.58), “joke” (7.88), and “free” (8.25).

  • Arousal: “statue” (2.82), “rock” (3.14), “sad” (3.49), “cat” (4.50), “curious” (5.74), “robber” (6.20), “shotgun” (6.55), “assault” (6.80), “thrill” (7.19), and “sex” (7.60).

  • Dominance: “lightning” (4.00), “mildew” (4.19), “waterfall” (5.34), “wealthy” (6.11), “lighthouse” (6.24), “honey” (6.39), “treat” (6.66), “mighty” (6.85), “admired” (6.94), and “liberty” (7.04).

Calibrators were shown first to illustrate the range of the rating scale, helping participants understand the range of feelings it covers. The remaining ANEW words (i.e. excluding the 30 calibrator words) were randomly divided into sets of 40. These words served as controls for estimating the correlation between each participant’s ratings and the ANEW norms.

Each list was rated by 20 participants, gathering ~1.09 million ratings (from a total of 1,827 participants). Around 3% of responses were removed due to missing data, lack of variability, or too few ratings. Over 87% of words received 18-30 ratings per dimension. The final dataset consisted of 303,539 valence samples (95% of original), 339,323 arousal samples (89%), and 281,735 dominance samples (74%)[1].

The authors first calculated the mean and standard deviation of each word’s ratings. They then reversed the ratings of participants whose responses correlated negatively with the mean word ratings; this affected 9% of participant ratings. The reversal addressed cases where a participant’s ratings ran counter to typical responses, for example a participant indicating that a negative word like “jail” made them very happy. Such ratings were flipped so that the overall pattern of responses matched the emotions typically associated with those words. The means and standard deviations were then recalculated after the reversals.

```python
from sentibank import archive

# Load the original (unprocessed) NoVAD ratings
load = archive.load()
NoVAD = load.origin("NoVAD_v2013")
```

Note

The ratings cover the valence (V), arousal (A) and dominance (D) dimensions, each described by three statistics per word: the overall mean (Mean.Sum), the standard deviation (SD.Sum) and the number of contributing ratings (Rat.Sum). The same statistics are also reported for demographic groups, indicated by the suffixes .M (male), .F (female), .O (older), .Y (younger), .H (high education) and .L (low education).

NoVAD (Warriner, Kuperman and Brysbaert, 2013; Warriner and Kuperman, 2014)
Word V.Mean.Sum V.SD.Sum V.Rat.Sum A.Mean.Sum A.SD.Sum A.Rat.Sum D.Mean.Sum D.SD.Sum D.Rat.Sum V.Mean.M V.SD.M V.Rat.M V.Mean.F V.SD.F V.Rat.F A.Mean.M A.SD.M A.Rat.M A.Mean.F A.SD.F A.Rat.F D.Mean.M D.SD.M D.Rat.M D.Mean.F D.SD.F D.Rat.F V.Mean.Y V.SD.Y V.Rat.Y V.Mean.O V.SD.O V.Rat.O A.Mean.Y A.SD.Y A.Rat.Y A.Mean.O A.SD.O A.Rat.O D.Mean.Y D.SD.Y D.Rat.Y D.Mean.O D.SD.O D.Rat.O V.Mean.L V.SD.L V.Rat.L V.Mean.H V.SD.H V.Rat.H A.Mean.L A.SD.L A.Rat.L A.Mean.H A.SD.H A.Rat.H D.Mean.L D.SD.L D.Rat.L D.Mean.H D.SD.H D.Rat.H
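The column names in the table header follow a `DIMENSION.STATISTIC.GROUP` pattern, so they can be decoded mechanically. The dictionaries and helpers below are a hypothetical sketch built from the Note above. They are not part of the sentibank API.

```python
# Codes taken from the column names listed in the NoVAD table header
DIMENSIONS = {"V": "valence", "A": "arousal", "D": "dominance"}
STATISTICS = {"Mean": "mean", "SD": "standard deviation", "Rat": "number of ratings"}
GROUPS = {"Sum": "all participants", "M": "male", "F": "female",
          "Y": "younger", "O": "older", "L": "low education", "H": "high education"}


def parse_column(name):
    """Split a column name such as 'A.SD.F' into (dimension, statistic, group)."""
    dim, stat, group = name.split(".")
    return DIMENSIONS[dim], STATISTICS[stat], GROUPS[group]


def columns_for(dimension, group="Sum"):
    """Mean, SD and count column names for one dimension code and group code."""
    return [f"{dimension}.{stat}.{group}" for stat in STATISTICS]


print(parse_column("A.SD.F"))
print(columns_for("V", "F"))
```

Assuming `load.origin("NoVAD_v2013")` returns a pandas DataFrame with these column names, `NoVAD[["Word"] + columns_for("V", "F")]` would select the female valence statistics.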

Processed Dictionary

First-Pass Processing

Two different processed dictionaries have been compiled for NoVAD:

  1. NoVAD_v2013_norm represents sentiment as a two-dimensional vector of valence and arousal. This approach aligns with Warriner and Kuperman’s (2014) assertion that an accurate portrayal of sentiment ‘requires a bidimensional perspective’ (p.16).

  2. NoVAD_v2013_boosted takes a more experimental approach. It seeks to condense sentiment into a single score by adjusting valence intensity based on arousal levels.

Creating NoVAD_v2013_norm followed a simple process. The original valence and arousal scores, which range from 1 to 9, were rescaled to the range -4 to 4 using min-max scaling. The scaled scores were then stored as a list, with the first element corresponding to valence and the second to arousal.
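This is a minimal sketch of the `NoVAD_v2013_norm` entry format. It assumes the min-max scaling uses the theoretical scale bounds 1 and 9 rather than the observed minimum and maximum, in which case the mapping reduces to subtracting 5. `minmax_scale` and `norm_entry` are hypothetical names, not sentibank functions.

```python
def minmax_scale(x, old_min=1.0, old_max=9.0, new_min=-4.0, new_max=4.0):
    """Linearly map x from [old_min, old_max] onto [new_min, new_max]."""
    return new_min + (x - old_min) * (new_max - new_min) / (old_max - old_min)


def norm_entry(valence, arousal):
    """Hypothetical NoVAD_v2013_norm-style entry: [valence, arousal] on the [-4, 4] scale."""
    return [minmax_scale(valence), minmax_scale(arousal)]


print(norm_entry(7.5, 3.0))  # [2.5, -2.0]
```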

Creating NoVAD_v2013_boosted involved more nuanced adjustments to the valence dimension, informed primarily by Warriner and Kuperman (2014). Earlier, Warriner, Kuperman and Brysbaert (2013) had found that for both the arousal/valence and arousal/dominance relationships, quadratic (non-linear) fits explained more variance than linear ones[2]. This suggests potential threshold effects, where the influence of arousal on valence or dominance is more pronounced at certain arousal levels. Building on this, Warriner and Kuperman (2014) characterised affect using only the valence and arousal dimensions. They highlighted that “higher arousal [accompanies] extreme values of valence” (Warriner and Kuperman, 2014, p.10).
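To make "quadratic relationships explained more variance" concrete, the sketch below fits degree-1 and degree-2 polynomials with NumPy and compares their R². The data are synthetic and U-shaped by construction. They only illustrate the comparison and are not the NoVAD ratings.

```python
import numpy as np


def r_squared(x, y, degree):
    """Share of the variance in y explained by a least-squares polynomial fit in x."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - np.mean(y)) ** 2)


# Synthetic U-shaped toy data (NOT the NoVAD ratings):
# arousal rises towards both ends of the valence scale.
rng = np.random.default_rng(0)
valence = rng.uniform(1, 9, size=500)
arousal = 3 + 0.15 * (valence - 5) ** 2 + rng.normal(0, 0.5, size=500)

print(f"linear R^2:    {r_squared(valence, arousal, 1):.3f}")
print(f"quadratic R^2: {r_squared(valence, arousal, 2):.3f}")
```

Because the quadratic model contains the linear one, its R² on the same data can never be lower. The substantive question is how much additional variance the quadratic term explains.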

Warriner and Kuperman’s (2014) observation that arousal modulates valence suggests treating arousal as a degree modifier for valence. The notion of a degree modifier comes from Hutto and Gilbert (2014), who found that degree-modifying words change sentiment intensity by 0.293 on average. This value was derived by ‘controlling the specific grammatical or syntactical feature presented as an independent variable’ (p.222). Hutto and Gilbert (2014) apply 0.293 hierarchically with weighted differences: the modifier’s effect is scaled by 0.95 or 0.9 as it sits further from the sentiment word. In a similar way, NoVAD_v2013_boosted systematically applies the degree-modifier effect of arousal to valence scores, with the weighting based on Warriner and Kuperman’s (2014) chi-squared analysis.

Specifically, Warriner and Kuperman (2014) used a chi-squared test to analyse the distribution of word types across 100 bins. The bins were formed by crossing 10 arousal deciles (A1-A10, where A1 is the lowest-arousal decile and A10 the highest) with 10 valence deciles (V1-V10, where V1 is the most negative decile, V5 neutral, and V10 the most positive). As seen in their Fig. 2 (p.11), this analysis revealed distinct patterns, demonstrating that arousal strongly modulates the distribution of word types over valence ratings (pp.10-11).
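The 10 × 10 decile grid and its chi-squared residuals can be sketched as below, under two assumptions. Deciles are assigned by rank over the whole lexicon, and the residuals are Pearson residuals (O - E) / √E; Warriner and Kuperman (2014) may have used a different residual definition. The data are synthetic, so the numbers will not match their Fig. 2.

```python
import numpy as np


def decile(values):
    """Decile index 1..10 for each value (1 = lowest 10%), computed from ranks."""
    values = np.asarray(values)
    ranks = np.argsort(np.argsort(values))  # 0 = smallest value
    return ranks * 10 // len(values) + 1


def chi_square_residuals(valence, arousal):
    """Observed counts and Pearson residuals (O - E) / sqrt(E) on the 10 x 10 grid.

    Rows are arousal deciles A1..A10; columns are valence deciles V1..V10.
    """
    a, v = decile(arousal), decile(valence)
    observed = np.zeros((10, 10))
    np.add.at(observed, (a - 1, v - 1), 1)  # count words per (A, V) bin
    expected = (observed.sum(axis=1, keepdims=True)
                * observed.sum(axis=0, keepdims=True) / observed.sum())
    return observed, (observed - expected) / np.sqrt(expected)


# Synthetic U-shaped toy data (NOT the NoVAD ratings)
rng = np.random.default_rng(1)
valence = rng.uniform(1, 9, size=1000)
arousal = 3 + 0.15 * (valence - 5) ** 2 + rng.normal(0, 0.3, size=1000)

observed, residuals = chi_square_residuals(valence, arousal)
print(np.round(residuals[9], 1))  # row A10, the most arousing decile
```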

Given these results, we employ a decile-based hierarchy to systematically intensify or dampen sentiment scores. For extreme valence values, we progressively intensify the score based on the arousal decile, amplifying positivity or negativity. Conversely, for neutral valence values, we systematically diminish the score based on the arousal decile, pulling it towards neutrality.

The initial valence scores were first rescaled to the range -4 to 4 through min-max scaling. The polarity of the following word groups was then intensified:

  • Region [A9:A10, V1] showed chi-squared residuals of ~15. It contains highly arousing (top 20%), very negative (bottom 10%) words such as “abuse” and “pigheaded”. We intensify their negativity by subtracting 0.293. 646 words belong to this lexical space.

  • Region [A8, V1] showed residuals of ~10. We intensify the negativity of these less arousing (top 30-20%) but still very negative (bottom 10%) words by subtracting 0.278 (= 0.95 × 0.293). 241 words belong to this lexical space.

  • Region [A10, V10] showed residuals of ~10. We intensify the positivity of these highly arousing (top 10%), very positive (top 10%) words such as “enthusiastic” and “orgasm” by adding 0.278. 239 words belong to this lexical space.

  • Region [A7:A10, V2] showed residuals of ~5. We intensify the negativity of these moderately arousing (top 40%) but still quite negative (bottom 10-20%) words by subtracting 0.264 (= 0.9 × 0.293). 797 words belong to this lexical space.

  • Region [A7:A9, V10] showed residuals of ~5. We intensify the positivity of these moderately arousing (top 40-10%) yet very positive (top 10%) words by adding 0.264. 506 words belong to this lexical space.

The valence of neutral words was also dampened:

  • Region [A1:A4, V4:V7] showed residuals of ~5. This is a large lexical space of calm, relatively neutral words such as “foam” and “northern”. Compared with words in region [A10, V4:V7], which are highly arousing but relatively neutral (e.g. “emotional” and “premonition”), these words are more truly neutral.

  • We dampen the valence of these words towards neutrality, adding 0.264 to negative scores and subtracting 0.264 from positive scores, except for scores already between -0.264 and 0.264. 1,256 words belong to this lexical space.
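The region rules above can be collected into one function. This is a sketch of the rules as described in the text, not sentibank’s source code. `boost_valence` is a hypothetical name; it assumes valence has already been scaled to [-4, 4] and takes arousal and valence deciles (1-10) computed over the whole lexicon. The regions are disjoint, so at most one rule applies to each word.

```python
STRONG = 0.293                    # Hutto and Gilbert's (2014) average degree-modifier effect
MEDIUM = round(0.95 * STRONG, 3)  # 0.278
WEAK = round(0.9 * STRONG, 3)     # 0.264


def boost_valence(v, a_dec, v_dec):
    """Adjust scaled valence v (in [-4, 4]) given arousal decile a_dec and valence decile v_dec."""
    if v_dec == 1 and a_dec >= 9:          # [A9:A10, V1]
        return v - STRONG
    if v_dec == 1 and a_dec == 8:          # [A8, V1]
        return v - MEDIUM
    if v_dec == 10 and a_dec == 10:        # [A10, V10]
        return v + MEDIUM
    if v_dec == 2 and a_dec >= 7:          # [A7:A10, V2]
        return v - WEAK
    if v_dec == 10 and 7 <= a_dec <= 9:    # [A7:A9, V10]
        return v + WEAK
    if 4 <= v_dec <= 7 and a_dec <= 4:     # [A1:A4, V4:V7]: dampen towards 0
        if v < -WEAK:
            return v + WEAK
        if v > WEAK:
            return v - WEAK
    return v                               # all other words keep their scaled valence


print(round(boost_valence(-3.5, 10, 1), 3))  # very negative, highly arousing word
print(round(boost_valence(1.0, 2, 6), 3))    # calm, mildly positive word
```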

More negatives (1,684) than positives (745) were intensified, which may partly offset the positivity bias widely discussed in psychological research (Warriner and Kuperman, 2014, pp. 2-5). Studies note that negative words are less common (Clark and Clark, 1977) and are therefore perceived as more potent (Rozin and Royzman, 2001; Baumeister et al., 2001), whereas positive words appear more frequently and convey less information (Garcia et al., 2012). By intensifying negatives, NoVAD_v2013_boosted may better reflect how people naturally react to negativity, with the larger share of intensified negatives giving rare negative words weight in line with their higher informativeness.

In total, 2,429 positive and negative words were intensified, 1,256 neutral words were dampened, and 9,280 words kept their original scaled valence. The adjusted scores ranged from -4.033 to 3.794 and were then rescaled back to the range -4 to 4.
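The final step can be sketched as a min-max rescaling over the observed range of the adjusted scores, assuming plain min-max scaling is what "rescaled back to -4 to 4" means. `rescale` is a hypothetical helper. Only the endpoints -4.033 and 3.794 come from the text; the middle values are illustrative.

```python
def rescale(scores, new_min=-4.0, new_max=4.0):
    """Min-max rescale so the observed minimum and maximum land on new_min and new_max."""
    lo, hi = min(scores), max(scores)
    return [new_min + (s - lo) * (new_max - new_min) / (hi - lo) for s in scores]


adjusted = [-4.033, -0.5, 0.0, 3.794]  # endpoints from the text; middle values illustrative
print([round(s, 3) for s in rescale(adjusted)])
```

Under this assumption, the mapping is not centred on zero. Because the adjusted range is asymmetric, a score of exactly 0 moves to about 0.12.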

Note

[1] Valence ratings were relatively consistent across participants, whereas arousal and dominance ratings were more variable. Valence scores were symmetrical around the median, and extremely positive or negative words showed smaller rating variability than valence-neutral words; the same held for dominance. For arousal, by contrast, calm words elicited more consistent ratings than exciting words. Based on these findings, the authors commented that rating variability for valence and dominance depends on magnitude (strength), whereas for arousal it depends on polarity.

[2] This observation is supported by the positive correlation between valence and arousal for positive words (mean valence rating > 6) and, conversely, the negative correlation between valence and arousal for negative words (mean valence rating < 4). The relationship between arousal and dominance likewise exhibits a U-shaped pattern, as evidenced by the positive correlation between dominance and arousal for high-dominance words (mean rating > 6) and the negative correlation for low-dominance words (mean rating < 4).