Emotion analysis of text documents is an emerging area of interest, and the visualization of emotions is closely related. Emotion analysis is the next step after sentiment analysis. In many respects, sentiment analysis is easier – there’s a single dimension of sentiment ranging from positive to negative. Emotion is more difficult because there are many emotions, and the first challenge is to define the range of emotions for the particular analysis. Pixar’s Inside Out settled on five emotions: joy, sadness, anger, fear and disgust.
But there are more. Surprise was considered but dropped from Inside Out early on. Plutchik argues for eight, adding trust and anticipation on top of Ekman’s six (anger, disgust, joy, fear, sadness and surprise). There are also alternative taxonomies of emotion (e.g. pleasure/pain, excited/calm, etc). And emotions are not mutually exclusive – which is part of the plot line of Inside Out as sadness becomes mixed with other emotions. Here’s Plutchik’s Wheel of Emotions – with 8 different emotions, various degrees of emotion and also items between emotions:
There is also the challenge of building an emotion lexicon, that is, a long list of words and their associated emotions. Saif Mohammad uses a crowdsourcing approach to tag more than 10,000 words with Plutchik’s eight emotions.
Given a text corpus and an emotion lexicon, scores can then be created for different texts or different characters in a text. For example, Scharl et al. created radar plots to show the emotions associated with characters from Game of Thrones, and Saif Mohammad profiles various texts, including love letters, hate mail and Hamlet, using bar and line charts.
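As a rough illustration of lexicon-based scoring, here is a minimal Python sketch. The tiny lexicon, the tokenizer and the function name `emotion_scores` are all hypothetical stand-ins for illustration; they are not Saif Mohammad's lexicon or the method used in the works cited above.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (assumption): word -> set of Plutchik emotions.
# A real lexicon would hold thousands of crowdsourced entries.
TOY_LEXICON = {
    "love": {"joy", "trust"},
    "death": {"fear", "sadness", "anger"},
    "money": {"anticipation", "joy", "trust"},
    "lying": {"anger", "disgust"},
}

def emotion_scores(text, lexicon):
    """Count, per emotion, how many tokens in the text carry that emotion."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        # Words missing from the lexicon contribute nothing.
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    return counts

scores = emotion_scores("Love of money; fear of death.", TOY_LEXICON)
```

Per-emotion counts like these are the kind of numbers that could feed a radar plot or bar chart. Note that the word "fear" itself scores nothing here simply because the toy lexicon does not list it.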
However, it’s useful to consider the words themselves and how they relate to emotions. One approach is to consider the emotional intensity associated with a word – for example, terror is a more intense version of anxiety in the emotion of fear, as shown in the Atlas of Emotions.
But suppose you want to understand how a given word is associated with multiple emotions? Words such as death, money or freedom have complex associations. Given a set of words and their associations to emotions, this becomes a set visualization problem. Venn diagrams (and Euler diagrams) are a common type of set visualization.
For emotion words, each word belongs to some combination of eight emotions. However, representing all the possible combinations of eight sets is difficult with a traditional Venn diagram (below is a beautiful Venn diagram with seven sets – drag to flip it over). These high-order Venn diagrams are hard to read: to work out the set membership at any given point, you have to trace around complex looping shapes. Color does not help much either: for a strange bluish-greenish-greyish shade in the middle, it is not obvious which of the colors at the perimeter it is made up of:
Instead of a Venn layout, we can use a graph-based layout. In this case, each set category is a big anchor around the perimeter, and each emotion word is a node linked to the sets that it belongs to. Using a force-directed layout, each item ends up close to the sets that it belongs to, as discussed in the paper Anchored Maps. This approach can work well with a small number of items. With a larger number of items, a few problems emerge, including: a) the graph lines overlap and become difficult to visually untangle; and b) items with completely different memberships can end up in the same location (e.g. with anchors set in a square, an item ends up in the middle if it belongs to all four corners of the square, or to any two diagonally opposite corners).
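Problem (b) can be reproduced with a very simplified model. The sketch below assumes every link pulls with equal strength and has zero rest length, so an item settles at the centroid of its anchors; this is an illustrative simplification, not the Anchored Maps algorithm, and the names `anchor_positions` and `rest_position` are hypothetical.

```python
import math

def anchor_positions(names, radius=1.0):
    """Place anchors evenly around a circle, starting at angle 0."""
    n = len(names)
    return {
        name: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, name in enumerate(names)
    }

def rest_position(membership, anchors):
    """Centroid of the anchors an item belongs to (simplified equilibrium)."""
    xs = [anchors[m][0] for m in membership]
    ys = [anchors[m][1] for m in membership]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Four anchors on the corners of a square; A and C are diagonally opposite.
anchors = anchor_positions(["A", "B", "C", "D"])
all_four = rest_position({"A", "B", "C", "D"}, anchors)
diagonal = rest_position({"A", "C"}, anchors)
adjacent = rest_position({"A", "B"}, anchors)
```

Under this assumption, `all_four` and `diagonal` both land at the center of the square (up to floating-point error) even though their memberships differ, while `adjacent` sits on the edge between A and B. Position alone therefore cannot encode membership, which is why extra cues are needed.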
So, we adapt the anchored map approach for visualizing emotion words. We start by setting the eight Plutchik emotions as anchors around the perimeter. Then we use a force-directed graph to position each word according to its emotions (plus collision detection, so that words don’t overlap each other). Next we use color to indicate set membership, using the same color scheme as Plutchik – words that combine several emotions get the average color of the corresponding emotions. Finally, we add font attributes to indicate set membership: a bouncy baseline for joy, w i d e l y s p a c e d letters for trust, underline for fear, an exclamation mark (!) for surprise, light-weight letters for sadness, italic for disgust, blackletter for anger and SMALL CAPS for anticipation:
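The color-averaging step can be sketched in a few lines of Python. The RGB values below are rough placeholders chosen for illustration (an assumption, not the exact colors of Plutchik's wheel or of the visualization), and `blend` is a hypothetical helper name.

```python
# Illustrative RGB values only (assumption): approximate hues, not the
# exact colors used in Plutchik's wheel or in the visualization.
EMOTION_RGB = {
    "anger": (255, 0, 0),
    "anticipation": (255, 165, 0),
    "joy": (255, 255, 0),
    "trust": (144, 238, 144),
    "fear": (0, 128, 0),
    "surprise": (0, 191, 255),
    "sadness": (0, 0, 255),
    "disgust": (128, 0, 128),
}

def blend(emotions, palette=EMOTION_RGB):
    """Average the RGB channels of the given emotions (integer division)."""
    emotions = list(emotions)
    n = len(emotions)
    channels = zip(*(palette[e] for e in emotions))
    return tuple(sum(channel) // n for channel in channels)

anger_disgust = blend(["anger", "disgust"])
all_eight = blend(EMOTION_RGB)
```

With these placeholder values, blending two neighboring emotions still gives a recognizable in-between hue, but blending all eight collapses toward a dull grey-brown, so averaged color carries less and less information as memberships grow.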
In the above visualization, clusters of words are immediately visible. For example, around the anchor word joy are emotion words such as love, daughter, special, beautiful and so on. We can see that there are many words around the anchor word t r u s t but few around anger or disgust.
You can also see the graph lines underneath connecting words back to their target emotions. For example, halfway between anger and disgust are lying and angry – words associated with both anger and disgust. Three different cues tell you that lying and angry are associated with both emotions: 1) the graph lines underneath; 2) the color is magenta – halfway between red (anger) and purple (disgust); and 3) the font is both blackletter (anger) and italic (disgust).
These words that belong to multiple sets are where things get interesting. Near the middle of this plot are words that are all variants of muddy reddish-brownish-greenish colors. Color isn’t particularly effective when trying to communicate eight different dimensions. However, font attributes are useful at two levels of understanding relationships between words and the emotions they are associated with:
1) The variation in font attributes makes it very obvious when two words have the same set membership – they have the same format. If the formats are different then the words have different memberships. For example, ANXIOUS and ESCAPE share one membership, while S W E E T ! and D E A L ! share another that differs from that of ANXIOUS and ESCAPE.
2) Furthermore, with some cognitive effort, you can decode the membership of any word. ANXIOUS and ESCAPE belong to anticipation (small caps) and fear (underline).
S W E E T ! and D E A L ! belong to surprise (exclamation mark), anticipation (small caps), joy (bouncing baseline) and trust (wide spacing).
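Both levels of reading can be mimicked with a small lookup table. In this Python sketch the emotion-to-attribute pairs follow the description above, while the attribute labels and the `encode`/`decode` helpers are hypothetical names introduced only for illustration.

```python
# Emotion -> font attribute, following the pairing described in the post.
# The attribute labels themselves are hypothetical names (assumption).
FONT_ATTRIBUTE = {
    "joy": "bouncy baseline",
    "trust": "wide spacing",
    "fear": "underline",
    "surprise": "exclamation mark",
    "sadness": "light weight",
    "disgust": "italic",
    "anger": "blackletter",
    "anticipation": "small caps",
}
# Reverse lookup: font attribute -> emotion.
EMOTION_FOR = {attr: emotion for emotion, attr in FONT_ATTRIBUTE.items()}

def encode(emotions):
    """Set of font attributes used to draw a word with these emotions."""
    return frozenset(FONT_ATTRIBUTE[e] for e in emotions)

def decode(attributes):
    """Recover a word's emotions from the font attributes it is drawn with."""
    return frozenset(EMOTION_FOR[a] for a in attributes)

anxious = encode({"anticipation", "fear"})
escape = encode({"fear", "anticipation"})
sweet = encode({"surprise", "anticipation", "joy", "trust"})
```

Because each emotion maps to exactly one attribute, two words share a format exactly when they share a membership (`anxious == escape`, `anxious != sweet`), and `decode` recovers the emotions from the format alone.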
While the top 250 words make for a readable graph at blog size, Saif Mohammad’s original analysis has 10,000+ words, of which 4,463 have at least one emotion associated with them. Below is an image of the same graph with all 4,463 emotion words. Click for the full-size image to zoom in. Clusters are still visible, font differentiation is still identifiable, and individual words can be visually decoded if needed.
There’s more discussion about set visualization, labels and font attributes in this recent paper: Typographic Sets: Labeled Set Elements with Font Attributes. The emotion word visualization in that paper uses color to represent membership in two additional sets, positive sentiment and negative sentiment, pushing the number of uniquely indicated sets up to 10. This means that each word in the visualization indicates 11 different data attributes: the literal word itself, two sentiments and eight emotions.