Aigents+

A lexicon optimised for social media posts in the cryptocurrency 🪙 domain.

Summary

Composition:

  • Approximately 12k n-gram lexicon entries (8.2k negative and 3.8k positive).

  • Binary Class (positive, negative)

Creation Methodology:

  • Collected 100,000 posts (tweets and subreddit submissions) over a six-month period (July to December 2021).

  • From these posts, the authors selected n-grams associated with positive and negative sentiment to create the ‘Aigents’ dictionary, containing 8.2k negative and 3.8k positive entries.

  • The ‘Aigents’ dictionary was updated and fine-tuned ‘to get in sync with cryptocurrency domain terminology and jargon’ (Raheman et al., 2022, p.7), resulting in the ‘Aigents+’ dictionary.

Evaluation: Aigents+ was evaluated by comparing 21 different models (including 15 BERT-based models), using the Pearson correlation coefficient across 490 reference posts. A simple rule-based approach using the Aigents+ dictionary (0.57) outperformed finBERT (0.32), a BERT model pre-trained on the financial domain.
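As a rough illustration of the metric mentioned above, the following sketch computes a Pearson correlation coefficient between human ratings and model scores. The `pearson` helper and the sample numbers are hypothetical and made up for this example; they are not data or code from Raheman et al. (2022).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (unnormalised; the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented numbers for illustration only (NOT taken from the paper).
human_ratings = [1.0, -1.0, 0.5, -0.5, 0.0]
model_scores = [0.8, -0.6, 0.1, -0.4, 0.2]

r = pearson(human_ratings, model_scores)
print(f"Pearson r = {r:.2f}")
```

A value close to 1 would mean the model's scores rise and fall with the human ratings; a value close to 0 would mean little linear relationship.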

Usage Guidance: Useful for sentiment analysis of informal text in the cryptocurrency domain. Access via sentibank.archive.load().dict("Aigents+_v2022")
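To make the usage more concrete, here is a minimal sketch of one possible rule-based scorer. It is not the authors' actual logic. The `toy_lexicon` entries are invented stand-ins, and the sketch assumes the loaded dictionary can be treated as a mapping from n-gram to +1 (positive) or -1 (negative); the real structure returned by `sentibank.archive.load().dict("Aigents+_v2022")` may differ.

```python
import re

# Toy stand-in for the loaded dictionary. These entries are invented for
# illustration and are NOT taken from Aigents+.
toy_lexicon = {
    "to the moon": 1,
    "bullish": 1,
    "rug pull": -1,
    "scam": -1,
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def score(text, lexicon, max_n=3):
    """Average label of matched n-grams, in [-1, 1]; 0.0 if nothing matches."""
    tokens = tokenize(text)
    total, matches, i = 0, 0, 0
    while i < len(tokens):
        # Try the longest n-gram first so "to the moon" wins over shorter spans.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            ngram = " ".join(tokens[i:i + n])
            if ngram in lexicon:
                total += lexicon[ngram]
                matches += 1
                i += n  # skip past the matched n-gram
                break
        else:
            i += 1  # no n-gram starts at this token
    return total / matches if matches else 0.0


print(score("BTC going to the moon, so bullish!", toy_lexicon))
print(score("This looks like a rug pull scam", toy_lexicon))
```

Matching the longest n-gram first is a design choice made for this sketch, so that multi-word jargon is not broken up into its individual words.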

📋 Introduction

Raheman et al. (2022) studied how different sentiment metrics correlate with Bitcoin price movements. They suggested that social media feeds can influence cryptocurrency markets because many active traders publish technical analyses and market commentary online; when influential voices share such analyses, they sway broader opinion, which is then reflected in price movements. To track these relationships, Raheman et al. (2022) created a cryptocurrency-specific dictionary called Aigents+.

📚 Original Dictionary

ver.2022 (original)

Though not explicitly stated, we can reasonably assume that the initial Aigents dictionary was generated by automatically extracting frequently occurring n-grams from a corpus of 100,000 unlabelled social media posts (spanning 77 Twitter and Reddit sources, from July to December 2021). However, it is unclear how the authors assigned sentiment labels to the lexicon entries. This initial unsupervised lexicon provided broad coverage, laying the foundation for the fine-tuned Aigents+.
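The assumed extraction step could look roughly like the sketch below, which counts n-grams across posts and keeps those that occur often enough. This only illustrates the assumption made above, not the authors' documented procedure; the function names, the `max_n` and `min_count` parameters, and the sample posts are all hypothetical. Sentiment labelling is deliberately left out, since the source does not describe it.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of the token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(posts, max_n=3, min_count=2):
    """Count 1..max_n-grams over all posts; keep those seen >= min_count times."""
    counts = Counter()
    for post in posts:
        tokens = re.findall(r"[a-z0-9']+", post.lower())
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Invented posts for illustration only.
posts = [
    "bitcoin to the moon",
    "eth to the moon soon",
    "bitcoin dump incoming",
]

candidates = frequent_ngrams(posts)
for gram, count in sorted(candidates.items()):
    print(gram, count)
```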

ver.2022 (fine-tuned)

To further fine-tune Aigents to the cryptocurrency domain, Raheman et al. (2022) had two independent raters manually annotate a sample of 490 tweets from 5 random public feeds on a scale from -1 to 1.

By comparing Aigents’ predictions against this “ground truth” sample, the authors identified terms that were frequently misaligned by more than 0.5. The lexicon was then refined by revising entries to better align with the nuanced vocabulary and jargon of cryptocurrency discussion.
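One way such a comparison could be carried out is sketched below: posts whose predicted score differs from the reference rating by more than a threshold are flagged, and the lexicon terms matched in those posts are counted to build a review queue. This is an assumed reading of the procedure, not the authors' published method; the `misaligned_terms` helper and the sample data are hypothetical.

```python
from collections import Counter


def misaligned_terms(samples, threshold=0.5):
    """Count lexicon terms appearing in posts whose prediction is off.

    samples: iterable of (matched_terms, predicted_score, reference_score).
    Returns (term, count) pairs, most frequent first.
    """
    counts = Counter()
    for terms, predicted, reference in samples:
        if abs(predicted - reference) > threshold:
            # Count each term at most once per flagged post.
            counts.update(set(terms))
    return counts.most_common()


# Invented samples for illustration only.
samples = [
    (["moon", "dump"], 0.0, 1.0),   # off by 1.0, flagged
    (["dump"], -1.0, 0.5),          # off by 1.5, flagged
    (["bullish"], 1.0, 0.8),        # off by 0.2, not flagged
]

review_queue = misaligned_terms(samples)
print(review_queue)
```

Terms near the top of the queue would be natural candidates for manual review and relabelling.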

from sentibank import archive

# Initialise the sentibank loader
load = archive.load()
# Load the original Aigents+ lexicon
aigents = load.origin("Aigents+_v2022")
Aigents+ (Raheman et al., 2022)
lexicon label

🧹 Processed Dictionary

No notable changes were made to the original CSV.

Note

As the methodology used to label the roughly 12,000 lexicon entries remains unspecified, rigorous examination of the sentiment labels is critical. We welcome your help refining this preliminary lexicon through thoughtful contributions. Please share any insights on potentially mislabelled or questionable terms by opening an issue here.