The 19 Dimensional Word Cloud of Pokémon

If you want to catch all 719 Pokémon, the serious gamer needs to know their different skills and abilities. At a high-level, Pokémon classify into 18 different types, such as fire, water or grass. For example, Pikachu is an electric type, Charmander is fire, Squirtle is water and so on. This is important because type determines a Pokémon’s advantage in battle, e.g water-type do well against fire-type.

pokemonplastic

Pokémon may have more than one skill type, for example, Charizard is both fire and flying; while  Gengar is both ghost and poison. This makes Pokémon interesting to the visualization researcher: there are 18 different categories and all those different categories can be combined in different ways. So, an interesting visualization question is how to represent all those different possible combinations?

Visualizing all the combinations of types of Pokémon

Of course, the Pokémon community has made various tables and diagrams to represent this information. Given that there are cute images of each Pokémon, one fun way is to organize these images. Furthermore, the color of a Pokémon is usually related to its type, for example fire type Pokémon tend to be red, water type tend to be blue. Here’s an awesome poster of all Pokémon organized by color. (original by Marie Chelxie Gomez, via polygon):

pokemon_bymariechelxiegomez

Pokemon sorted by color, and color is related to type.

But the above image doesn’t make explicit the types nor the combinations of types. Here’s a great cross-tabulation of all the pairwise combinations of Pokémon posted on reddit a few years ago (http://bit.ly/2hxOcpd):
pokemonmatrix_

Pokémon organized into a table by combination of type. Click for big.

In this case, the combinations are depicted. But you have to zoom in: there’s a lot of empty space since there are a lot of type-combinations that don’t exist. Also, if you don’t happen to know the names that correspond to each picture, you can’t identify a Pokémon by its image (for example, I don’t know all the Pokémon, but my son does). So how could you show all the text, have a readable layout, and indicate all the different types?

Word Clouds

Word clouds are pretty efficient at one thing: packing a lot of words into a tight space. Here’s a word cloud of the 151 first generation Pokémon made with Wordle:
pokemonwordle2
Unfortunately, word clouds have lots of problems. The visualization community isn’t fond of word clouds. In the most typical way that they are used, they convey only the data of the word itself, and usually indicate one more data attribute by setting the size of the word to the frequency of that word in some document. Usually, the position, the color and the orientation are random and convey no information.

“Wordles [i.e. word clouds] are driven by a single minded fetish for filling space.”
– André Skupin, Mapping Text, 2010.
So, instead, consider the opposite question:

How many different dimensions of data could be represented in the words?
This is an interesting question. The answer you get today will depend who you ask:

Visualization Researcher: 5-6 In visualization, one can refer to the standard visualization attributes – perhaps 6  or so are commonly used: x position, y position, size, hue and brightness, plus the word itself. In fact, you can look at all the text visualizations at textvis.lnu.se – if you should happen to count up all the examples you’ll find x,y,size,hue and intensity account for more than 95% of the encodings used on text – and in most cases, only 0-2 of these are actually used (see Table 2 here).

Cartographer:
6-8. Cartographers come from a different perspective and have been using visual attributes to add information to labels for many centuries: italics for water, heavy text for large cities, s p a c e d  o u t text mountain ranges, and so on. One of my favorites is an Ordnance Survey map from the 1920’s:
5d-townlabels

Town labels indicate more than five dimensions of data (click for big).

Each city label indicates: 1) the name of the city via text; 2-3) latitude and longitude via x,y position; 4) town vs. village via uppercase/lowercase; 5) county towns via italics; 6) population category via font size; and 7) country via font-family (serif for U.K. slab-serif or serif variant for Scotland). That’s impressive.

Let’s go further. Certainly more than 10 could be done. How about 15? Or 20?

Why? Sometimes its a good to explore possibilities. Even if the result isn’t pragmatic, it forces new ideas and force new strategies to be considered. Some of these might even be useful in some other context in the future.
The challenge is that each visual attribute needs to be able to be combined with another attribute. For example, one dimension of data might be set to size and another set to shape. However, as size decreases, all the shapes end up being ambiguous dots. So, we need to find a lot of different visual attributes that can work together.

16 Types of Pokémon (first generation)

Let’s consider a set-type visualization looking at Pokémon. In the first version of Pokémon, there are 151 different Pokemon of 16 different skill types (aka first generation Pokémon).

X,Y Layout

Here’s a quick visualization of the 151 Pokémon Generation 1 arranged using a graph layout. The large underlying words indicate the type:
pokemon-plaingraph

First generation Pokémon, arranged by type (shown underneath in large type).

Each Pokémon is placed in proximity to its type(s). Pokémon out near the perimeter belong only to the one type they are close to. Those belonging to more than one type are placed in between the types they belong to. This uses only two dimensions (the spatial dimensions: x,y). Unfortunately, it is highly ambiguous for Pokémon near the center: you can’t tell which types they belong too. If you added lines, it would help, but there could be too many lines all criss-crossing making it difficult to distinguish.

Color

Instead, we can use some other visual attributes to identify type. Many Pokémon guides use color: it can be fairly intuitive, e.g. green for grass type, red for fire type. Here’s the same plot using type colors from Bulbapedia:
pokemon-coloronly

First generation Pokémon indicating type by color. Some colors are ambiguous.

This works OK for the Pokémon around the perimeter but not the Pokemon of multiple type in the interior. Pokémon of more than one type can  end up with muddy, hard to distinguish, hard to decode colors, e.g.:
Purple + green = greyish brown
Orange + blue = brownish grey
And so on.
So, when you look at Dodrio, you see its greyish purple – is that Flying+Dragon? Or is it Ghost+Normal? Or something else?
The reason for this problem is that the original palette of 16 colors is being used to encode 16 separate categories. Attempting to combine these colors results in 128 possible colors (16×16/2). Unfortunately, humans are not good at readily identifying 128 different colors (e.g. see Colin Ware‘s or Tamara Munzner‘s books on visualization). Another way to think about color is as a three dimensional space, such as a combination of red, green and blue; or as a combination of hue, brightness and saturation. Trying to squish 16 different dimensions into a 3 dimensional space is problematic at best.

16 Different Visual Attributes

What’s needed are 16 different dimensions of visual attributes, all of which can be combined together in any order, and unambiguously deciphered. Since we need so many different visual attributes, we need to consider many possibilities, including common visual attributes (e.g. rotation, scale, texture, motion, shape, shadow); and font-specific attributes too (e.g. e.g. bold, italic, case, underline, shifting baseline, punctuation, serif style, outline).  Some of these have to be discarded, for example: 1) shadows on text reduces text legibility – since we’re using text we’d rather not make it illegible; 2) shape isn’t easy to combine with text so skip that; 3) motion attributes such as blink or wobble are so visually dominant they can be annoying, so we’d rather not use them.
Here’s a grid showing 16 different variations of labels across the top row and first column. the middle of the grid shows all 128 pair combinations. Each cell is uniquely different from its neighbors. With some cognitive effort, the viewer can determine what attribute is different in each case:
18_visual_attributes

Many different visual attributes for labels, and all the pair combinations (click for big).

The attributes used are plain serif, upper case, shifting baseline, surround quotes, tracking (i.e. spacing), exclamation mark, underline, boxy version of font, bold, narrow version of font, italic, deep brackets on serifs of font, wide serif version of font, low x-height version of font, outline version of font, tall stretched version, rotated, horizontal stripe texture, vertical stripe texture.
The same approach can be applied to the Pokémon visualization:
pokemongen1-nice2

151 first generation Pokemon with type indicated by unique visual attributes (click for big).

Each type now has a specific visual attribute associated with it. Small caps for fighting, slightly rotated text for flying, italics for poison, and so on. Now it is possible to create some of the interesting combinations, for example:
pokemon-samplecombos
In each case you can see how the different attributes can be combined: Parasect gets the combination of the narrow font for Bug and the wide brackets for Grass (wide brackets, i.e. the fat parts on the letter like the bottom of the r). Kabutops gets the blocky font for Rock and the low-x-height font for Water.  Mr. Mime is the only Pokemon that gets the combination of vertical stripes and horizontal stripes to end up with plaid.
Note that Ice was originally a wide serif which seemed hard to see, so a bumpy edge was added for Ice as well to further differentiate it.

18 Types of Pokémon

The visualization above has only 16 different types. Pokémon aficionados know that two more types were introduced with the second generation of Pokémon, for a total of 18 different types (and more than 300 possible two-way combinations). You may have even picked out Magneton and Magnemite have underlines underneath their labels in the plot above, even though there is no underline showing in the legend (underline is for steel type which didn’t exist in first generation Pokémon but was retroactively added). Here’s the same visualization, now showing all 719+ Pokémon across 18 types. Click for a big version.

pokemonall2

All Pokémon by Type. Click for big.

So, with 18 different visual attributes, plus the text itself, this word cloud represents 19 different data dimensions.

Questions?

So this is a new kind of strange visualization of Pokémon. There may be many questions:
Pokémon  Questions
Variants. Some Pokémon may can have different combinations. For example, the Pokémon Rotom can be Electric+Fire; or Electric+Flying; or Electric+Ghost and so on. Since I’m not a Pokémon expert, I wasn’t expecting this (the last time I played was on a GameBoy Color). I considered representing a single Rotom with attributes for every possible combination – which isn’t correct; so instead, Rotom occurs multiple times in the visualization, each with some appended text to indicate which flavor (e.g. Rotum-EFr for electric-fire variant, Rotum-EFl for the electric-flying variant)

Data Errors
. As mentioned, I’m not a Pokémon expert. I just took data from Bulbapedia used it as is. I don’t know why some Pokémon are a single type such as fire and some have two types, one of which is normal, such as Bibarel, which is listed as water+normal. To me, it seems that water+normal should be the same as water? The visualization just draws the data, and no guarantees that I cut and paste the data correctly.
Visualization Questions
24 Dimensions: In addition to the 19 visual dimensions listed above; each label also has position and color. Position uses x,y spatial location for 2 more dimensions, and color uses variations in hue, brightness and saturation (or variation in  red, green and blue) for another 3 dimensions. That’s on the order of 24 visual dimensions. But from a data perspective, it’s only 19 hence the title is 19 dimensions.
Many-way combinations: In the visualization, the fonts can be assembled in any combination. In the case of Pokémon, it turns out that any single Pokémon can only belong to at most two different types. From a combinatorics perspective, with only 1 way and 2 way combinations there are only a few hundred possible type combinations. However, from a visualization perspective, this palette of 18 visual attributes can be combined in any combination: 2 way, 3 way, 5 way combinations. If Pokémon version 11 has new characters with 5-way combinations, this particular visualization will accommodate it: all the millions of possible font-combinations can be constructed using this approach.
Does it Work? In order to understand which types any particular Pokémon belongs to, it takes some cognitive effort to decode it. A thorough evaluation would require user studies and they have not been done. From a design perspective, I was unsatisfied with some of the font variants: for example the wide serif variant didn’t stand out. So, to enhance the differentiation, I added a bumpy edge to the wide serif (i.e. the strange font for the Ice type Pokémon). Dark type use vertically oriented text, which really jumps out and isn’t particularly easy to read. Electric and Normal use punctuation (exclamation for Electric, surrounding dashes for Normal), which seem a bit arbitrary, although they might be detectable without actually reading the text. And so on.
Typography Questions 
Different font per Pokemon: I was asked: “Why didn’t you use different font types for each Pokémon type?”. Font type is similar to color. You could use Old English for Fighting type and Comic Sans for Psychic, but there’s no good way to combine those fonts together (e.g. what do you get when you combine Old English + Comic Sans?). When you have 18 different fonts that get combined together you won’t necessarily end up with something that’s easily distinguishable from all the other font combinations (e.g. Slab Serif + Script look different than Bodoni + Varsity). And, even if they are distinguishable, it will be difficult to visually assess which fonts a particular font was made out of.
Instead, the approach used here has visually distinct typographic attributes: tall brackets and low-x-heights are separate, can be combined, and still understood as the combination of those two separate things.
Many font variants: You won’t find fonts with variable widths, variable x-heights, variable bracket sizes, and variable serif widths in a commercial-off-the-shelf font family. Ideally, the concept of multiple-master fonts should have made this easy, but that doesn’t exist with current fonts used in browsers. Instead, I used a parametric font generator, in this case from Prototypo.io. In this case, you start with a font and lots of parameters, such as x-height, weight, italic slope angle, serif width, bracket height, and so on. To get the variants I needed, I started with a basic serif font, then created 7 different base types (heavy weight, italic, boxy, narrow, low-x-height, wide-serif, and tall-brackets) and then all pairwise combinations (e.g. heavy+italic, boxy+narrow, etc) to create a total 29 fonts.

Note that a careful selection of attribute must be considered. Sans-serif is not one of the base types because attributes such as wide-serifs or tall-brackets can’t be combined with sans-serif – only one or the other can be represented at a time. However, if all the base types contain serifs, then all the serif combinations are supported.

So What?

From a visualization perspective, this is a useful thought experiment to see what happens when you attempt to use 18-24 different visual attributes all at once – it suggests that we can certainly go well beyond 5-10 attributes. There is lots more research that can be done.

From a typography perspective, it’s a useful thought experiment to think about why multiple-master fonts or parametric fonts may have uses in data visualization in the future, and what sort of technical enhancements might be needed to support this: generating all possible permutations and combinations of a font is not a feasible approach to meet the needs of very high dimensionality.

From a Pokémon perspective: bring it on. I want to see next generation Pokémon that have more than 2 types. How about an evolution of Charizard that includes ghost type; or Dark steel version of Pikachu? The visualization is ready.

P.S. Happy Birthday A. Sorry this is a bit late:-)

 

Advertisements

About richardbrath

Richard is a long time visualization designer and researcher. Professionally, I am one of the partners of Uncharted Software Inc. I am also pursuing a part-time PhD in data visualization at LSBU. The opinions on this blog are related to my personal interests in data visualization, particularly around research interests related to my PhD work- this blog is about exploratory aspects of data visualization not proven principles.
This entry was posted in Data Visualization, Pokemon, Text Visualization. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s