Analyse Dictionary

The analyze().dictionary method provides insights into the structure and content of a sentiment lexicon. Here's a quick example using the MASTER_v2022 dictionary:

from sentibank import archive
from sentibank.utils import analyze

analyze = analyze()
analyze.dictionary(dictionary="MASTER_v2022")

{
    'Dictionary Type': 'categorical (multi-label)',
    'Sentiment Score': {
        'labels': [
            'Negative',
            'Uncertainty',
            'Constraining',
            'Positive',
            'Litigious',
            'Weak_Modal',
            'Strong_Modal'
        ],
        'label frequency': {
            'Negative': 2355,
            'Litigious': 905,
            'Positive': 354,
            'Uncertainty': 297,
            'Constraining': 184,
            'Weak_Modal': 27,
            'Strong_Modal': 19
        },
        'multi label frequency': {
            ('Negative', 'Litigious'): 154,
            ('Negative', 'Uncertainty'): 40,
            ('Negative', 'Constraining'): 31,
            ('Uncertainty', 'Weak_Modal'): 27,
            ('Litigious', 'Constraining'): 9,
            ('Negative', 'Litigious', 'Constraining'): 6,
            ('Uncertainty', 'Constraining'): 5,
            ('Positive', 'Strong_Modal'): 4,
            ('Weak_Modal', 'Constraining'): 3,
            ('Uncertainty', 'Weak_Modal', 'Constraining'): 3,
            ('Positive', 'Litigious'): 1
        }
    },
    'Sentiment Lexicon': {
        'general': {
            'nouns': 1650,
            'verbs': 1422,
            'adjectives': 513,
            'adverbs': 262,
            'prepositions': 16,
            'miscellaneous': 12,
            'determiners': 1,
            'conjunctions': 0,
            'pronouns': 0,
            'numerals': 0,
            'particles': 0,
            'emos': 0
        },
        'granular': {
            'NN (noun, singular or mass)': 830,
            'NNS (noun, plural)': 741,
            'JJ (adjective)': 495,
            'VBG (verb, gerund or present participle)': 434,
            'VBD (verb, past tense)': 341,
            'VB (verb, base form)': 338,
            'RB (adverb)': 257,
            'VBN (verb, past participle)': 152,
            'VBZ (verb, 3rd person singular present)': 150,
            'NNP (noun, proper singular)': 79,
            'IN (conjunction, subordinating or preposition)': 16,
            'JJS (adjective, superlative)': 12,
            'VBP (verb, non-3rd person singular present)': 7,
            'MD (verb, modal auxiliary)': 7,
            'JJR (adjective, comparative)': 6,
            'RBR (adverb, comparative)': 4,
            'FW (foreign word)': 2,
            'WRB (wh-adverb)': 2,
            'RBS (adverb, superlative)': 1,
            'UH (interjection)': 1,
            'WDT (wh-determiner)': 1
        },
        'misc': ['MD', 'FW', 'UH', 'WRB']
    }
}

The output summarises the dictionary's sentiment scores and lexicon structure. Other sentiment dictionaries can be explored and analysed in the same way.
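Because the summary is a nested dictionary of counts, its figures are easy to post-process. The minimal sketch below copies the label frequencies from the MASTER_v2022 output above into a plain dict (rather than calling sentibank, so it makes no assumption about the method's return type) and computes each label's share of the total and how many labels are needed to cover 90% of it. The helpers label_shares and labels_covering are hypothetical and not part of sentibank. Since words in a multi-label dictionary can carry more than one label, the percentages are taken over the total label count, which is not necessarily the number of words.

```python
# Label frequencies copied from the MASTER_v2022 output above.
label_frequency = {
    "Negative": 2355,
    "Litigious": 905,
    "Positive": 354,
    "Uncertainty": 297,
    "Constraining": 184,
    "Weak_Modal": 27,
    "Strong_Modal": 19,
}


def label_shares(freq):
    """Return each label's share of the total label count, in percent (hypothetical helper)."""
    total = sum(freq.values())
    return {label: round(100 * n / total, 1) for label, n in freq.items()}


def labels_covering(freq, threshold):
    """Return the most frequent labels whose counts together reach
    `threshold` (a fraction between 0 and 1) of the total label count."""
    total = sum(freq.values())
    covered, running = [], 0
    # Walk labels from most to least frequent until the threshold is reached.
    for label, n in sorted(freq.items(), key=lambda item: item[1], reverse=True):
        covered.append(label)
        running += n
        if running / total >= threshold:
            break
    return covered


shares = label_shares(label_frequency)
print(shares["Negative"])                     # 56.9
print(labels_covering(label_frequency, 0.9))  # ['Negative', 'Litigious', 'Positive', 'Uncertainty']
```

On these figures, Negative alone accounts for roughly 57% of the label count and the four most frequent labels cover more than 90% of it, which makes the skew of the MASTER_v2022 label distribution explicit.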

Utilising analyze.dictionary

analyze.dictionary reports both holistic sentiment statistics and a detailed breakdown by lexical category. Together, these show both the forest and the trees, from overall sentiment trends down to word-type composition. The summary contains the following:

  • Dictionary Type: Indicates whether sentiment is expressed as labels (discrete/categorical) or as scores (continuous). The possible types are categorical, discrete, continuous, categorical (multi-label), and discrete (multi-label).

  • Sentiment Score: Distribution statistics for the sentiment labels or scores. For labels, it reports how often each label occurs in the dictionary and, for multi-label dictionaries, how often labels appear together. For scores, it summarises the overall sentiment distribution, such as frequency, mean, median, range, and standard deviation.

  • Sentiment Lexicon: Breaks down the lexicon by part of speech (POS), giving frequency counts for categories such as nouns, adjectives, verbs, emoticons, and more. Useful for understanding how the lexicon is composed.

    • General POS Tags: A general overview using a simplified Universal POS tagset influenced by NLTK. The categories are adjectives, adverbs, conjunctions, determiners, emos (emoticons and emojis), nouns, numerals, particles, prepositions, pronouns, verbs, and miscellaneous.

    • Granular POS Tags: A finer-grained lexical breakdown using the OntoNotes (v5.0) tagset, distinguishing singular and plural nouns, comparative and superlative adjectives, verb tenses, and more. Useful for deeper lexical analysis.

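    • Miscellaneous POS: Lists the granular tags, such as rare or unknown ones, that do not map onto any general category; their entries are counted as miscellaneous in the general breakdown.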
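For score-based (continuous) dictionaries, the Sentiment Score section reports distribution statistics instead of label counts. The sketch below illustrates the kind of statistics listed above on a small word-to-score mapping. The words, scores, and the summarise_scores helper are all hypothetical and are not sentibank's implementation; in particular, reading "frequency" as a count per distinct score and using the population standard deviation are assumptions.

```python
import statistics
from collections import Counter

# Hypothetical word -> score mapping; the words and values are made up
# for illustration and do not come from any sentibank dictionary.
toy_lexicon = {
    "excellent": 3.0,
    "good": 1.5,
    "nice": 1.5,
    "fine": 0.5,
    "poor": -1.5,
    "terrible": -3.5,
}


def summarise_scores(lexicon):
    """Summarise a word -> score mapping (hypothetical helper, not part of sentibank)."""
    scores = list(lexicon.values())
    return {
        # How many words carry each distinct score (assumed meaning of "frequency").
        "frequency": Counter(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "range": (min(scores), max(scores)),
        # Population standard deviation; whether sentibank uses the
        # population or the sample formula is an assumption here.
        "std": statistics.pstdev(scores),
    }


summary = summarise_scores(toy_lexicon)
print(summary)
```

Swapping statistics.pstdev for statistics.stdev gives the sample standard deviation instead; for large lexicons the two differ very little.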
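In the MASTER_v2022 output above, the general counts are exactly the sums of the granular ones (for example, NN + NNS + NNP = 830 + 741 + 79 = 1650 nouns), and the four tags listed under misc (MD, FW, UH, WRB) account for the 12 miscellaneous entries. The sketch below reproduces that roll-up. The GENERAL_OF mapping and the roll_up helper are assumptions reconstructed from these counts, covering only the tags that appear in this example; sentibank's own mapping may differ.

```python
from collections import Counter

# Granular counts copied from the MASTER_v2022 output above (tag -> count).
granular = {
    "NN": 830, "NNS": 741, "JJ": 495, "VBG": 434, "VBD": 341, "VB": 338,
    "RB": 257, "VBN": 152, "VBZ": 150, "NNP": 79, "IN": 16, "JJS": 12,
    "VBP": 7, "MD": 7, "JJR": 6, "RBR": 4, "FW": 2, "WRB": 2,
    "RBS": 1, "UH": 1, "WDT": 1,
}

# Assumed granular -> general mapping, reconstructed from the example counts.
GENERAL_OF = {
    "NN": "nouns", "NNS": "nouns", "NNP": "nouns",
    "VB": "verbs", "VBD": "verbs", "VBG": "verbs",
    "VBN": "verbs", "VBP": "verbs", "VBZ": "verbs",
    "JJ": "adjectives", "JJR": "adjectives", "JJS": "adjectives",
    "RB": "adverbs", "RBR": "adverbs", "RBS": "adverbs",
    "IN": "prepositions",
    "WDT": "determiners",
}


def roll_up(granular_counts):
    """Aggregate granular tag counts into general categories.

    Tags with no general category are counted as 'miscellaneous' and
    also collected in a list, mirroring the 'misc' key of the summary.
    """
    general = Counter()
    misc = []
    for tag, count in granular_counts.items():
        category = GENERAL_OF.get(tag)
        if category is None:
            general["miscellaneous"] += count
            misc.append(tag)
        else:
            general[category] += count
    return dict(general), misc


general, misc = roll_up(granular)
print(general)
print(misc)
```

Running the roll-up and comparing it against the printed general counts is a quick consistency check when reading the summary for any dictionary.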

Warning

Please note that the input to analyze.dictionary must be either a predefined dictionary identifier or a processed dictionary loaded with sentibank.archive.load().dict. Also note that the SenticNet_v{year}_attributes dictionaries are currently not compatible with analyze.dictionary.