General Inquirer

Lexicon categorising words across multiple psycholinguistic 🧠 dimensions including semantics, values, motivations and much more.

Summary

Composition:

  • Approximately 11.8k unigram lexicon entries

  • Each entry is “mapped” to one or more of 182 semantic and linguistic categories

  • Multi-class, multi-label (NOTE: for simplicity of application, the processed dictionary currently labels entries with only ‘Positive’ or ‘Negative’ tags.)

Creation Methodology:

  • Manual dictionary construction starting in the 1960s, drawing on multiple sources including Harvard IV-4, Lasswell Values and social cognition research

    • Harvard IV-4: Developed in the 1960s and 1970s to represent concepts from major social science theories at the time, including the work of theorists like Lasswell, Bales, Parsons and Osgood

    • Lasswell Values: Created by Lasswell and colleagues to represent values across domains such as power and affection.

  • The main aim was to create valid representations of specific constructs so experts would agree on category entries.

  • Several iterations and expansions over decades, with rigorous multi-cycle human validation

Evaluation: The General Inquirer documentation does not specify the exact evaluation methods. It does state that ‘for some studies, overall Inquirer codings correlated with those of expert manual coders about as well as expert coders correlated with one another’. This suggests that the Inquirer's agreement with experts was reasonably strong, in the same range as inter-coder reliability.

Usage Guidance: Provides broad psycholinguistic assessment. Combine with modern lexicons for sentiment tasks. Access via sentibank.archive.load().dict("HarvardGI_v2000")
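As a minimal sketch of how a Positive/Negative dictionary of this kind might be applied, the example below counts labelled tokens and derives a simple polarity score. The toy_lexicon is a hypothetical stand-in with illustrative entries; the exact structure returned by the sentibank call above is an assumption here, not something this page confirms.

```python
from collections import Counter
import re

# Hypothetical stand-in for the processed dictionary. In practice the mapping
# might come from sentibank.archive.load().dict("HarvardGI_v2000"); the word ->
# label shape and these entries are assumptions for illustration only.
toy_lexicon = {
    "good": "Positive",
    "happy": "Positive",
    "bad": "Negative",
    "terrible": "Negative",
}

def label_counts(text, lexicon):
    """Count how many tokens in the text carry each label in the lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(lexicon[t] for t in tokens if t in lexicon)

def polarity(text, lexicon):
    """Return (positive - negative) / matched tokens, or 0.0 if nothing matches."""
    counts = label_counts(text, lexicon)
    matched = counts["Positive"] + counts["Negative"]
    if matched == 0:
        return 0.0
    return (counts["Positive"] - counts["Negative"]) / matched

counts = label_counts("A good day turned bad, then terrible.", toy_lexicon)
score = polarity("A good day turned bad, then terrible.", toy_lexicon)
```

Normalising by the number of matched tokens keeps the score in the range -1 to 1 regardless of text length; other normalisations are equally plausible.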

📋 Introduction

Arising from the work of Stone et al. (1962), the Harvard General Inquirer serves as a fundamental ‘mapping tool’, categorizing each entry based on three primary sources: Harvard IV-4, Lasswell Value, and Semin and Fiedler (1988). The categories provide rich semantic classifications of words that can enable more detailed text analysis. With over 180 categories, the Inquirer dictionaries offer lexical resources beyond basic sentiment or topical classification.

📚 Original Dictionary

Harvard IV-4

The Harvard IV-4 categories cover a wide range of lexical classifications including semantic dimensions, social institutions, roles and relations, cognition, communication, and more. These move beyond basic sentiment to enable more nuanced text analysis.

Category | Description | Sub-categories
1. Osgood Dimensions | Words categorised along semantic differential dimensions | Pstv, Ngtv, Strong, Weak, Active, Passive
2. Pleasure/Pain | Words related to emotions, feelings and arousal | Pleasur, Pain, Feel, Arousal, EMOT, Virtue, Vice
3. Over/Understatement | Exaggerated versus tentative language | Ovrst, Undrst
4. Institutions | Language associated with specific social institutions | Academ, Doctrin, Econ@, Exch, Exprsv, Sports, Arts, Legal, Milit, Polit@, Relig
5. Social Relations | Words about social roles, groups and interactions | Role, COLL, Work, Ritual, SocRel
6. Social Categories | Lexical categories based on social/demographic groups | Race, Kin@, MALE, Female, NonAdlt, HU, ANI
7. Places | Words referring to locations and spatial relationships | Place, Social, Region, Route, Aquatic, Land, Sky
8. Objects | Lexical classifications of objects and artefacts | Object, Tool, Food, Vehicle, BldgPt, ComnObj, NatObj, BodyPt
9. Communication | Words related to communication modes and media | ComForm, Say
10. Motivation | Language related to needs, goals and achievement | Need, Goal, Try, Means, Persist, Complet, Fail
11. Processes | Words associated with processes, change and movement | NatrPro, Begin, Vary, Increas, Decreas, Finish, Stay, Rise, Exert, Fetch, Travel, Fall
12. Cognition | Lexical categories related to thought, knowledge and reasoning | Think, Know, Causal, Ought, Perceiv, Compare, Eval@, Solve, Abs@, Quality, Quan, FREQ, DIST, Time@, Space, Rel, COLOR
13. Pronouns | References to 1st, 2nd and 3rd person points of view | Self, Our, You, Name
14. Yes/No | Words indicating agreement, disagreement, negation and interjection | Yes, No, Negate, Intrj
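Because each word can carry several of these tags at once, a text can be summarised as a profile of tag counts. The sketch below illustrates the idea with a hypothetical word-to-tags mapping: the tag names are taken from the table above, but the word assignments are invented for illustration and are not read from the real dictionary.

```python
from collections import Counter
import re

# Hypothetical multi-label mapping: each word maps to a set of category tags.
# Tag names come from the table above; the word-to-tag assignments are
# illustrative assumptions only.
toy_categories = {
    "love": {"Pstv", "Pleasur"},
    "attack": {"Ngtv", "Strong", "Active"},
    "school": {"Academ", "Place"},
}

def category_profile(text, categories):
    """Count, per tag, how many tokens in the text carry that tag."""
    profile = Counter()
    for token in re.findall(r"[a-z]+", text.lower()):
        # A token contributes one count to every tag it is mapped to.
        profile.update(categories.get(token, ()))
    return profile

profile = category_profile(
    "They love school but attack the school rules.", toy_categories
)
```

Here "school" occurs twice, so both of its tags are counted twice, while tags that never occur simply read as zero.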

New categories based on social cognition

These categories draw on social cognition research and focus specifically on verbs and adjectives: verbs are classified as interpretative, descriptive or state verbs, and adjectives as interpersonal or individual. These categories reveal perspective and social dynamics.

Category | Description | Sub-categories
15. Verb Types | Classification of verbs based on interpretive explanation of an action, description of an action, and description of mental/emotional states | IAV, DAV, SV
16. Adjective Types | Categorization of adjectives as interpersonal vs individual | IPadj, IndAdj

Lasswell Value

The Lasswell value categories provide a sociological classification of language into four deference domains (power, rectitude, respect and affection) and four welfare domains (wealth, well-being, enlightenment and skill). This provides resources for understanding values and motivations.

Category | Description | Sub-categories
17. Power | Words related to influence, authority and politics | PowGain, PowLoss, PowEnds, PowAren, PowCon, PowCoop, PowAuPt, PowPt, PowDoct, PowAuth, PowOth, PowTot
18. Rectitude | Moral and ethical language | RcEthic, RcRelig, RcGain, RcLoss, RcEnds, RcTot
19. Respect | Words associated with status, honour, recognition and prestige | RspGain, RspLoss, RspOth, RspTot
20. Affection | Language related to relationships and emotions, with a particular focus on love and friendship | AffGain, AffLoss, AffPt, AffOth, AffTot
21. Wealth | Economically oriented words | WltPt, WltTran, WltOth, WltTot
22. Well-Being | Words related to health, safety and security | WlbGain, WlbLoss, WlbPhys, WlbPsyc, WlbPt, WlbTot
23. Enlightenment | Words associated with knowledge, insight and understanding of personal and cultural relations | EnlGain, EnlLoss, EnlEnds, EnlPt, EnlOth, EnlTot
24. Skill | Language related to competencies and professions, with a particular focus on the arts | SklAsth, SklPt, SklOth, SklTot
25. Other | Additional Lasswell lexical categories | TrnGain, TrnLoss, TranLw, MeansLw, EndsLw, ArenaLw, PtLw, Nation, Anomie, NegAff, PosAff, SureLw, If, NotLw, TimeSpc, FormLw
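The sub-category names in the Lasswell table share a prefix per domain (Pow, Rc, Rsp, Aff, Wlt, Wlb, Enl, Skl), which suggests one way to roll tags up to their domain. The sketch below is an assumption about how such a roll-up could be done, not part of the General Inquirer or sentibank; lasswell_domain is a hypothetical helper and is only meant for tags listed in this table.

```python
# Prefix-to-domain mapping read off the sub-category names in the Lasswell table.
# The deference/welfare grouping follows the description preceding the table.
LASSWELL_PREFIXES = {
    "Pow": ("Power", "deference"),
    "Rc": ("Rectitude", "deference"),
    "Rsp": ("Respect", "deference"),
    "Aff": ("Affection", "deference"),
    "Wlt": ("Wealth", "welfare"),
    "Wlb": ("Well-Being", "welfare"),
    "Enl": ("Enlightenment", "welfare"),
    "Skl": ("Skill", "welfare"),
}

def lasswell_domain(tag):
    """Return (domain, group) for a Lasswell tag, or ("Other", None) if no prefix matches.

    Intended only for tags from the Lasswell table; tags from other tables
    that happen to share a prefix would be mapped incorrectly.
    """
    for prefix, domain in LASSWELL_PREFIXES.items():
        if tag.startswith(prefix):
            return domain
    return ("Other", None)

examples = {tag: lasswell_domain(tag) for tag in ["PowGain", "WlbPsyc", "EndsLw", "NegAff"]}
```

Tags in row 25, such as EndsLw or NegAff, match none of the eight prefixes and therefore fall through to "Other".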

from sentibank import archive

# Load the archive, then fetch the original (unprocessed) General Inquirer dictionary
load = archive.load()
harvard_gi = load.origin("HarvardGI_v2000")
General Inquirer (table output; columns):
Entry Source Positiv Negativ Pstv Affil Ngtv Hostile Strong Power Weak Submit Active Passive Pleasur Pain Feel Arousal EMOT Virtue Vice Ovrst Undrst Academ Doctrin Econ@ Exch ECON Exprsv Legal Milit Polit@ POLIT Relig Role COLL Work Ritual SocRel Race Kin@ MALE Female Nonadlt HU ANI PLACE Social Region Route Aquatic Land Sky Object Tool Food Vehicle BldgPt ComnObj NatObj BodyPt ComForm COM Say Need Goal Try Means Persist Complet Fail NatrPro Begin Vary Increas Decreas Finish Stay Rise Exert Fetch Travel Fall Think Know Causal Ought Perceiv Compare Eval@ EVAL Solve Abs@ ABS Quality Quan NUMB ORD CARD FREQ DIST Time@ TIME Space POS DIM Rel COLOR Self Our You Name Yes No Negate Intrj IAV DAV SV IPadj IndAdj PowGain PowLoss PowEnds PowAren PowCon PowCoop PowAuPt PowPt PowDoct PowAuth PowOth PowTot RcEthic RcRelig RcGain RcLoss RcEnds RcTot RspGain RspLoss RspOth RspTot AffGain AffLoss AffPt AffOth AffTot WltPt WltTran WltOth WltTot WlbGain WlbLoss WlbPhys WlbPsyc WlbPt WlbTot EnlGain EnlLoss EnlEnds EnlPt EnlOth EnlTot SklAsth SklPt SklOth SklTot TrnGain TrnLoss TranLw MeansLw EndsLw ArenaLw PtLw Nation Anomie NegAff PosAff SureLw If NotLw TimeSpc FormLw Othtags Defined

🧹 Processed Dictionary

First-Pass Processing

Drawing on the three resources, the General Inquirer introduced a new category called ‘Valence’, which provides positive and negative classifications to enable simple sentiment analysis. The Positive category contains 1,915 entries compiled from the source dictionaries, while the Negative category has 2,291 entries. By aggregating multiple existing lexicons, the Valence category offers broad coverage for classifying sentiment.

Of the 4,206 entries, 381 unigrams appeared more than once[1]. Most duplicates carried consistent sentiment labels, allowing them to be consolidated into a single entry. However, 16 unigrams with conflicting labels[2] were deleted because their sentiment depends on context. Consequently, the refined dictionary comprises 3,610 entries.
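The consolidation rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual processing script: consolidate is a hypothetical helper, entries are assumed to be (unigram, label) pairs, and the toy rows are invented for the example.

```python
from collections import defaultdict

def consolidate(entries):
    """Merge duplicate unigrams; drop any unigram whose labels disagree.

    `entries` is an iterable of (unigram, label) pairs.
    Returns (merged dict, sorted list of dropped unigrams).
    """
    labels = defaultdict(set)
    for word, label in entries:
        labels[word.lower()].add(label)
    # Exactly one distinct label: keep a single consolidated entry.
    merged = {w: next(iter(ls)) for w, ls in labels.items() if len(ls) == 1}
    # More than one distinct label: treat as ambiguous and remove.
    dropped = sorted(w for w, ls in labels.items() if len(ls) > 1)
    return merged, dropped

# Illustrative rows only; these label assignments are assumptions and are
# not copied from the real dictionary.
toy_entries = [
    ("fine", "Positive"),
    ("fine", "Negative"),
    ("brave", "Positive"),
    ("brave", "Positive"),
    ("cruel", "Negative"),
]
merged, dropped = consolidate(toy_entries)
```

Collecting labels into a set per unigram makes the two cases easy to tell apart: a set of size one is a consistent duplicate, and anything larger is a conflict.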

Note

[1] Below is the frequency table showing the distribution for duplicate entries.

Duplicates | Unigrams
2 | 255
3 | 77
4 | 32
5 | 12
6 | 3
7 | 2

[2] The entries with conflicting labels were: ‘arrest’, ‘board’, ‘deal’, ‘even’, ‘fine’, ‘fun’, ‘hand’, ‘help’, ‘hit’, ‘laugh’, ‘make’, ‘matter’, ‘mind’, ‘order’, ‘particular’ and ‘pass’.
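The counts reported in this section can be cross-checked with a few lines of arithmetic. The sketch below assumes that the ‘Duplicates’ column in note [1] gives the number of copies of each unigram; under that reading the figures reconcile exactly with the stated totals.

```python
# Frequency table from note [1]: number of copies -> number of unigrams.
duplicates = {2: 255, 3: 77, 4: 32, 5: 12, 6: 3, 7: 2}

total_entries = 1915 + 2291  # Positive + Negative entries
dup_unigrams = sum(duplicates.values())  # unigrams that appear more than once
dup_rows = sum(copies * n for copies, n in duplicates.items())  # rows they occupy

# Collapse every duplicated unigram to a single row.
unique_unigrams = total_entries - dup_rows + dup_unigrams

conflicting = 16  # unigrams removed for conflicting labels, note [2]
final_entries = unique_unigrams - conflicting

print(total_entries, dup_unigrams, dup_rows, unique_unigrams, final_entries)
# prints: 4206 381 961 3626 3610
```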