📋 Introduction

📚 Original Dictionary

🧹 Processed Dictionary

The original SO-CAL dictionary contained entries written as regex-style patterns (e.g. “[#can#]_not_(put)_#NP?#_down”), chosen to enhance the capabilities of the associated SO-CAL algorithm. However, to maximise reusability outside that algorithm, the 177 entries that rely on it were removed.
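As a rough illustration of this filtering step, the sketch below drops entries that contain pattern markup. The source does not state how the 177 entries were identified, so the character set in `PATTERN_MARKUP` and the helper name `is_pattern_entry` are assumptions made purely for illustration.

```python
import re

# Assumption: algorithm-reliant entries can be recognised by markup characters
# such as brackets, parentheses, '#', '?' or '|'. The actual criterion used to
# select the 177 removed entries is not given in the source.
PATTERN_MARKUP = re.compile(r"[\[\]()#?|]")


def is_pattern_entry(term: str) -> bool:
    """Return True if the term contains any of the assumed markup characters."""
    return bool(PATTERN_MARKUP.search(term))


# Toy entries, not taken from the real dictionary (except the quoted pattern).
entries = ["good", "[#can#]_not_(put)_#NP?#_down", "well-made"]
plain = [term for term in entries if not is_pattern_entry(term)]
print(plain)  # ['good', 'well-made']
```

A character-based check like this is deliberately conservative; a real pipeline would likely need to be validated against the full list of removed entries.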

This yielded 5,920 terms, 269 of which appeared more than once. Of these, 179 had identical sentiment scores across their duplicates, indicating consistency. Among the remaining 90 duplicated words, 83 agreed on polarity (positive/negative), so their scores were averaged. However, 7 words (“upset”, “restraint”, “better”, “rival”, “righteous”, “boast”, “smash”) had duplicates with scores of conflicting polarity. As these were ambiguous, they were removed entirely, leaving 5,913 terms. Prior to inclusion, the scores were min-max scaled to the range [-4, 4].
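The duplicate handling and rescaling described above can be sketched roughly as follows. This is a minimal illustration, not the code used to build the dictionary: the function names, the toy data, and the reading of "conflicting" as "mixed positive and negative scores" are assumptions, as is applying the scaling after duplicates are resolved.

```python
from collections import defaultdict
from statistics import mean


def resolve_duplicates(entries):
    """Collapse (term, score) pairs into one score per term.

    Terms whose scores all share a polarity are averaged (identical scores
    average to themselves); terms with both positive and negative scores
    are treated as ambiguous and dropped.
    """
    grouped = defaultdict(list)
    for term, score in entries:
        grouped[term].append(score)

    resolved, removed = {}, []
    for term, scores in grouped.items():
        has_positive = any(s > 0 for s in scores)
        has_negative = any(s < 0 for s in scores)
        if has_positive and has_negative:
            removed.append(term)  # conflicting polarity: remove entirely
        else:
            resolved[term] = mean(scores)
    return resolved, removed


def min_max_scale(scores, low=-4.0, high=4.0):
    """Linearly map scores so the minimum becomes `low` and the maximum `high`."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        raise ValueError("cannot scale: all scores are identical")
    return {t: low + (s - lo) * (high - low) / (hi - lo) for t, s in scores.items()}


# Toy data for illustration only; these are not real SO-CAL scores.
raw = [("good", 3), ("good", 3), ("fine", 1), ("fine", 2),
       ("upset", -2), ("upset", 1), ("awful", -5)]
resolved, removed = resolve_duplicates(raw)
scaled = min_max_scale(resolved)
print(removed)  # ['upset']
print(scaled)   # {'good': 4.0, 'fine': 2.5, 'awful': -4.0}
```

Note that plain min-max scaling maps the observed minimum and maximum to the interval endpoints, so a raw score of zero only stays at zero when the raw range is symmetric around it.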