Imagine you have a popular blog and you want to know the mood of all the comments on a specific post.
We could easily set up a neural network with Brain.js (I wrote an article about that some time ago), but we have an easier alternative for sentiment analysis: the AFINN Dictionary.

The AFINN Dictionary is a list of English terms (though you can easily find counterparts for other languages), each manually rated for valence by Finn Årup Nielsen with an integer between -5 and +5.
When a term has a high rating (for instance, “outstanding” is rated 5), it’s extremely positive.
Conversely, when a term has a low rating, it’s extremely negative.

So if the sentiment analysis of a given text returns a positive score (say 5, 10, or 2), the text has a positive mood.
Otherwise, it has a negative one.
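
In other words, we’ll sum each word’s rating and look at the sign of the total. Here’s a quick sketch of the idea (the numbers are hypothetical word ratings):

// If the sum of the word ratings is positive, the mood is positive
const ratings = [0, 0, 5, 0];                  // hypothetical AFINN ratings for each word
const mood = ratings.reduce((a, b) => a + b);  // => 5, so the mood is positive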

Building the Analyzer

First of all, we need the AFINN Dictionary. I’ve made a JSON version of it which is accessible here.
As you can see, it’s a huge object where each key is a term and each value is its rating. Here’s a sneak peek:

{
  "abandon": -2,
  "abandoned": -2,
  "abandons": -2,
  "abducted": -2,
  "abduction": -2,
  "abductions": -2,
  "abhor": -3,
  "abhorred": -3,
  "abhorrent": -3,
  "abhors": -3,
  "abilities": 2,
  "ability": 2,
  "aboard": 1,
  ...
}
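
In Node.js we can load this JSON file with a simple require (which is exactly what we’ll do in the final code below, assuming it’s saved locally as AFINN.json):

const AFINN = require("./AFINN.json");

console.log(AFINN["outstanding"]); // => 5
console.log(AFINN["abhor"]);       // => -3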

Now we need to create the analyzer itself.
The algorithm is pretty simple: we just need to tokenize our input text and associate each word with its corresponding AFINN value.

So, first of all, let’s create a function which will tokenize our text:

function tokenize(text) {
  return text
          .toLowerCase()
          .split(" ");
}

Bonus tip: I am a huge RegEx fan, but I’ve noticed that splitting a large text using a RegEx is really slow. For that reason, in the function above I’m splitting on a whitespace character, which is a bit less elegant but works a lot faster! (benchmark)
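
For reference, the RegEx-based alternative might look like this (a hypothetical tokenizeWithRegex helper that splits on any run of whitespace):

// More robust against tabs, newlines and repeated spaces, but slower on large texts
function tokenizeWithRegex(text) {
  return text
          .toLowerCase()
          .split(/\s+/);
}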

Next, we need to delete every non-word character. For instance, now we may have an array which looks like this:

["I", "am", "so", "happy!"]

But as you can see, the "happy!" string contains an exclamation point, which needs to be deleted. Otherwise, we won’t be able to find the word “happy” inside our AFINN object:

function deleteUselessChars(word) {
  return word.replace(/[^\w]/g, "");
}

I know we previously said that RegExes are slow. But here comes an interesting concept in computer science: Clean Code.
We could have written the function above using multiple replace statements, but how dirty would that solution be?
Unlike the tokenize function, here we’re deliberately adopting a less performant solution in order to gain maintainability, readability and cleaner code… and it’s totally worth it!

If you’re not familiar with RegEx, inside our word.replace statement we’re simply telling it to match every non-word character and delete it.
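
Just to give an idea, the “dirty” alternative with multiple replace statements might look something like this (a hypothetical sketch that only handles a few punctuation marks):

// Chaining one replace per punctuation mark: verbose, incomplete and hard to maintain
function deleteUselessCharsVerbose(word) {
  return word
          .replace("!", "")
          .replace("?", "")
          .replace(".", "")
          .replace(",", "");
}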

Great, now we need to assign a rating to each word using the AFINN Dictionary. Here comes another question: should we use loops, recursion, or higher-order functions (the awesome .map method)?

The answer is: it depends.
Loops are extremely efficient, but difficult to read and maintain.
Recursion is easier to read and maintain, but has some known memory and performance issues.
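
For instance, a hypothetical loop-based version of the word-rating step might look like this:

// A loop-based sketch: fast, but with more bookkeeping to read through
function rateWordsWithLoop(words) {
  const ratings = [];
  for (let i = 0; i < words.length; i++) {
    ratings.push(words[i] in AFINN ? AFINN[words[i]] : 0);
  }
  return ratings;
}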

I’d personally go with the .map method for three reasons:

  1. It helps us adopt a functional approach to problems.
  2. It’s memory-safe.
  3. It makes code easier to read.

Again, we’re not adopting the fastest solution for a reason: writing a better codebase.

function rateWord(word) {
  return (word in AFINN) ? AFINN[word] : 0;
}

Hey hey hey, what’s happening here?! We’re just declaring a simple rateWord function that checks whether the given word is part of the AFINN Dictionary. If it is, we take its rating from the Dictionary; otherwise, we assign it a rating of 0.
So what do we have now?

tokenize("I am so happy today! Everything is fine.")
  .map(deleteUselessChars)
  .map(rateWord);

We’re now able to take advantage of Functors and chain multiple map calls in order to retrieve an array of values, one for each word in our text:

const res = tokenize("I am so happy today! Everything is fine.")
              .map(deleteUselessChars)
              .map(rateWord);

console.log(res); // => [ 0, 0, 0, 3, 0, 0, 0, 2 ]

So now we have an array of integers that can be easily summed... and that sum will be our result!

function sum(x, y) {
  return x + y;
}

And that’s actually the last function that we need. Let’s put everything together using the reduce method:

const AFINN = require("./AFINN.json");

function tokenize(text) {
  return text
          .toLowerCase()
          .split(" ");
}

function deleteUselessChars(word) {
  return word.replace(/[^\w]/g, "");
}

function rateWord(word) {
  return (word in AFINN) ? AFINN[word] : 0;
}

function sum(x, y) {
  return x + y;
}

function analyze(text) {
  return tokenize(text)
          .map(deleteUselessChars)
          .map(rateWord)
          .reduce(sum);
}

And that’s all! Let’s make some quick sentiment analysis:

analyze("I feel so grateful for my life!"); // => 3
analyze("Today is the worst day ever."); // => -3
analyze("You stupid!"); // => -2
analyze("It's ok."); // => 0
analyze("I wanna shoot somebody in Call Of Duty."); // => -1
analyze("Love is old, love is new, love is all, love is you."); // => 12

It works perfectly!
So what did we learn from building this simple sentiment analyzer?

A functional approach helps us write more maintainable and testable code.
When you’re working as a programmer, you spend most of your time reading existing code. Wasting time reading badly written code makes you less productive and will easily lead you to introduce bugs into your codebase.
Every single function that we wrote today is a pure function which can be easily tested in a deterministic way.
The code flow is easy to follow and extend, which will make us more productive whenever we need to make changes or add new features to the existing codebase.
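
For instance, a minimal test sketch could use Node’s built-in assert module (assuming we exported analyze from a hypothetical analyzer.js module), reusing the results we saw above:

const assert = require("assert");
const { analyze } = require("./analyzer"); // hypothetical module exporting analyze

assert.strictEqual(analyze("I feel so grateful for my life!"), 3);
assert.strictEqual(analyze("Today is the worst day ever."), -3);

console.log("All tests passed!");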

Did you like this article? Consider becoming a Patron!

This article is CC0 1.0 (Public Domain) licensed.