About#

We introduce πŸ—ƒοΈ sentibank, a large-scale unified database consolidating 15 original sentiment dictionaries and 43 preprocessed dictionaries, spanning 7 genres and 6 domains.#

Overview#

Sentiment Analysis#

Sentiment analysis is the automated process of identifying and extracting subjective information such as opinions, emotions, and attitudes from textual data. It has become an increasingly critical technique across many social science domains such as business, politics, and economics.

While deep learning models have excelled in achieving high accuracy, often surpassing simpler lexicon models in sentiment analysis tasks (Al-Qablan et al. 2023), their inherently opaque nature poses challenges for applications in high-stakes domains like government policy making or mental health diagnosis, where transparent and interpretable decision-making is crucial (Rudin 2019). Recognising the continued importance of rule-based sentiment analysis, particularly in computational social science fields where interpretability is paramount, improving rule-based SA remains vital.

sentibank#

Rule-based sentiment analysis relies on expert-curated lexicons containing words with pre-assigned sentiment scores, as human expertise is required to accurately annotate the sentiment of words across contexts. However, effectively applying these lexicons in rule-based systems faces several challenges:

  • πŸ” Fragmented resources requiring laborious integration

  • 🌐 Lack of verified, high-quality lexicons spanning domains

  • πŸ”“ Inaccessibility limiting transparency and advancement

By consolidating lexicons into an open and public platform, sentibank enables users to tap into a vast knowledge base of sentiments with ease. Here’s some of the key capabilities:

  • πŸ—ƒοΈ 15+ (and counting) sentiment dictionaries spanning domains and use cases

  • 🧠 Curation of dictionaries provided by leading experts in sentiment analysis

  • πŸ“– Access original lexicons and preprocessed versions

  • ✏️ Customise existing dictionaries or contribute new ones

  • πŸš€ Production-ready for integration into analyses