Variable Fonts vs. Parametric Fonts and Data Visualization

I’ve typically been using ready-made fonts to create font-based visualizations in this blog. However, sticking to ready-made fonts means that you get only one slope angle for italic, a few levels of weight, maybe a condensed version or maybe not. Instead, the ability to quantitatively adjust font parameters such as weight, width or slope angle sounds far more enticing for visualization.

The good news is that fonts can be manipulated by the font user, and there are two different technical ways to achieve this.

Parametric fonts are programmatically defined fonts: numeric parameters such as x-height, stroke width and letter width can be set, and a new font is then generated from those parameters. One early example of parametric fonts is METAFONT by Don Knuth, who enthusiastically claims:

“Infinitely many alphabets can be generated by the programs in this book. All you have to do is define the 62 parameters and fire up the METAFONT system, then presto – out comes a new font of type!” – introduction to Computer Modern Typefaces, 1986.

Knuth has a lot of low-level parameters, each affecting only a few characters, such as dot-size (for the dots on i and j), ess (for the stroke breadth on s), beak, apex correction and so on. Here’s a snapshot showing some of his parameters, which looks much like an “anatomy of type” illustration:

knuthSomeParameters

Some of the font parameters in Knuth’s METAFONT.

Prototypo.io is a modern incarnation of parametric fonts, with click-and-drag sliders, interactive previews and feedback when you go beyond the expected parameter ranges, all in a browser. The starting point is a ready-made font with 30 or so adjustable parameters, such as x-height, width, thickness, slant, serif width, serif height, bracket curve and so on. You can quickly create a low-x-height, heavyweight, tightly-spaced, flared bell-bottom font. However, pushing some parameters into the red zone results in a fun font that isn’t particularly legible (e.g. a, 5, 0 are filled in):

PrototypoBellBottom2.PNG

Manipulating Prototypo serif parameters into the red zone to create a heavy bell-bottomed font.

 

Quite a few of the font parameters are mutually independent, so they can be combined to create combinations in a visualization that represent different data variables. Here’s one of Prototypo’s fonts where I’ve created eight variants, including boxy, heavyweight, condensed, bulgy serifs, wide serifs, low x-height and the default plain. All the pairwise combinations are shown below:

PrototypoPairwiseCombos.png

Pairwise combinations of eight different font variants.

The parameters have ranges of values, so instead of only binary settings (e.g. normal weight and heavyweight; normal x-height and low x-height; etc.), a range of levels can be used. Here’s a font with five different levels of x-height and five different levels of curviness (think of curviness as an ordering from angular to rounded to boxy):

PrototypoXheightVsCurviness

A font with 5 levels of x-height and 5 levels of curviness.

One problem with parametric fonts and browsers today is that each parametric variant needs to be saved in a font file: the 5 x 5 x-height by curviness font requires creating and saving 25 font variants, which is tedious; and then all these variants need to be loaded into the browser, which uses a lot of memory and can be slow. If we also want to add 5 weights, 5 oblique angles and 5 widths, we’d need to generate and save 5 x 5 x 5 x 5 x 5 = 3,125 font variants. This is problematic for visualizations that might want to encode 5 different data attributes, each with 5 different levels, into labels.
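
To make that combinatorics concrete, here’s a minimal sketch of what loading one file per variant would look like in the browser, using the standard FontFace API. The parameter names and the file-naming scheme are hypothetical.

```typescript
// A minimal sketch of the per-variant font-file explosion.
// Five parameters, each with five discrete levels, means 5^5 separate font files.
const parameters = ["xheight", "curviness", "weight", "slant", "width"]; // hypothetical parameter names
const levelsPerParameter = 5;
const variantCount = Math.pow(levelsPerParameter, parameters.length); // 3125

// Loading them into a browser means one FontFace object (and one fetch) per file:
async function loadAllVariants(): Promise<void> {
  for (let i = 0; i < variantCount; i++) {
    // hypothetical file-naming scheme: variant-0.woff2 ... variant-3124.woff2
    const face = new FontFace(`Variant${i}`, `url(fonts/variant-${i}.woff2)`);
    document.fonts.add(await face.load());
  }
}
```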

Variable Fonts, a new standard in OpenType 1.8, provide an alternative way for font users to interact with font parameters. Variable fonts provide linear interpolation between defined extremes created by the type designer. Here’s a font with variations in width, weight and optical sizing (from John Hudson’s Variable Fonts blog post). Green letters are defined by the font designer; orange letters are interpolated from these.

VariableFonts.png

Variable font illustration indicating interpolated fonts (orange) from defined fonts (green).

Variable fonts are very new, so it’s exciting to watch the standard evolve, including browser support starting to appear.
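
In contrast to the per-file approach, a single variable font can be driven directly from data in a browser that supports it. Here’s a minimal sketch, assuming a variable font with ‘wght’ (weight) and ‘wdth’ (width) axes has been loaded under the hypothetical family name "MyVariableFont"; the axis ranges are assumptions about what the designer defined.

```typescript
// Minimal sketch: map data values onto variable-font axes for a single label element.
function styleLabel(el: HTMLElement, weightValue: number, widthValue: number): void {
  // Clamp to the (assumed) axis ranges defined by the type designer.
  const wght = Math.max(100, Math.min(900, weightValue));
  const wdth = Math.max(50, Math.min(200, widthValue));
  el.style.fontFamily = "MyVariableFont";
  // Any point along each axis can be requested; no per-variant font file is needed.
  el.style.setProperty("font-variation-settings", `'wght' ${wght}, 'wdth' ${wdth}`);
}
```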

Parametric and Variable Fonts in Visualization

Parametric fonts and variable fonts are interesting for data visualization for a few reasons:

  • You can potentially access low-level parameters, such as x-height, that ready-made fonts don’t normally expose.
  • You have quantitative access to each dimension in a font: you’re not limited to only a few weights or two widths.
  • And these parameters can be combined together in new ways.

So what? Here’s a quick snap of the introduction of Mr. Weston from Chapter 2 of Jane Austen’s Emma. Interesting uncommon words have a high x-height, boring frequent words have a low x-height, and in-between words have an in-between x-height:

xHeight-EmmaWeston.PNG

Words with higher x-heights are less common words in English.

Not that this quick example is particularly readable, but it seems reminiscent of some of the formatting in Ronell’s deconstructivist Telephone Book:

RonellTelephoneBook.png

Font size varied to create a rhythm separate from words in Ronell’s Telephone Book.

Ronell’s formatting isn’t constrained to whole words but runs across sequences of letters, creating a rhythm separate from the text. In general, being able to tweak and adjust font attributes opens up new creative possibilities and new kinds of data visualizations. Like Ronell, data visualization could apply font attributes such as x-height, width, curviness or others to subsets of words. Suppose you had a dataset that indicated which letters in English words are silent – you could then draw words such that the silent letters are shown differently, say, with a low x-height:

 

 

SilentLetters2.PNG

Silent letters indicated with low x-height letters.
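
As a rough illustration of how such per-letter styling could be wired up in a browser, here’s a minimal sketch. It assumes a low-x-height variant has been loaded under the hypothetical family name "MyFont-LowX" alongside a regular "MyFont", and that the silent-letter positions come from a (hypothetical) dataset.

```typescript
// Minimal sketch: render a word with its silent letters in a different font variant.
function renderWordWithSilentLetters(word: string, silentIndices: Set<number>): HTMLSpanElement {
  const container = document.createElement("span");
  container.style.fontFamily = "MyFont";
  [...word].forEach((letter, i) => {
    const span = document.createElement("span");
    span.textContent = letter;
    if (silentIndices.has(i)) {
      // silent letters get the low-x-height variant
      span.style.fontFamily = "MyFont-LowX";
    }
    container.appendChild(span);
  });
  return container;
}

// e.g. the "k", "g" and "h" in "knight" (indices 0, 3, 4) are silent
document.body.appendChild(renderWordWithSilentLetters("knight", new Set([0, 3, 4])));
```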

Those are just a couple of quick examples – many of the other font visualization examples in this blog could be adapted to better utilize some of these font parameters. And, of course, there are many other visualization possibilities yet to be considered.

Parametric vs. Variable Fonts?

So which is better – parametric or variable? I have used Prototypo and created a few variants and a few visualizations. I haven’t used variable fonts yet – but I like the specifications.

Both parametric and variable fonts have limitations. As discussed earlier, with parametric fonts in current technologies, a font file needs to be generated for each variant required, which means managing a lot of font files and dealing with inefficient use of application memory.

With variable fonts, however, the font user has to rely on the variants created by the font designer. If the designer created no condensed/expanded variant, then that axis will not be available to manipulate. Linear interpolation of shapes can also run into issues: consider these three widths of Gill Sans (normal, condensed and extra condensed) – the a and g completely change shape, the openings on S, 3 and 5 get bigger, at some point the edges of the M start to slope, and so on. These sorts of sharp changes presumably won’t be captured in variable fonts. In theory, a parametric font might have rules to accommodate this, but that depends on how complex the parametric design rules are.

gillsans.png

Gill Sans in 3 widths. Note how heights and letter shapes change with increasing narrowness.

I’m looking forward to experimenting more with OpenType Variable Fonts: they could make font manipulation in visualization much easier to do. I’m hoping that variable fonts won’t go the way of multiple master fonts. A couple of things need to happen to give variable fonts a solid foundation. First, we’ll need browser and application support – and there is already some indication that there will be browser support in Chrome. Then, we’ll need to see font families created that support variable fonts. Ideally, these won’t be restricted to typical attributes such as weight or width; hopefully we’ll see variants with multiple x-heights, serif styles, slopes or other parameters. Then, on the data visualization side, we’ll need to invent new types of useful visualizations that use these new font capabilities.

Here’s hoping that variable fonts will become a well supported standard.


The Origin of Thematic Maps — and the problem with base maps

Why is there such a big gap between thematic maps and label maps? Both types of maps show data about places. Thematic maps typically use lots of color to show data about places, whereas label maps use a lot of labels to indicate the names of places – plus typographic formats such as bold, italics, caps and so on to show extra data about places. Compare these two US maps (both from the National Atlas of the United States of America [1, 2]):

US-choroplethMap

Left: Choropleth map of US counties with color indicating presidential vote in 2012; Right: Atlas map of US with various text labels and formats. Inset shows city labels using size and italics to indicate additional data.

On the left, counties are color-coded, indicating one data attribute per county (and tiny counties may not be visible). On the right, cities are indicated with text, plus the population is indicated by text size, plus italics are used to indicate whether the city is the capital of a state or country.

Obviously, the two maps serve different purposes, but in both cases additional data about places is being encoded into visual attributes. The difference between thematic and labelled maps is entrenched in our thinking about maps. In cartography textbooks (e.g. Tyner, Brewer, etc.), thematic maps are discussed in completely different chapters from labels: text labels aren’t considered to be thematic.

Why?

Perhaps it is useful to look back in time to figure out where this split first occurred. Thematic maps have been around for a very long time. Here’s a pair of maps from the 1850s. On the left is a thematic map by Minard (link) with circle sizes indicating shipments per port. On the right is a contemporary map from an atlas by Heinrich Kiepert (link) where city labels convey information via text, font size, bold, underline and capitalization.

Thematic_MinardvsKiepert

The first Choropleth Map

The earliest choropleth maps (according to Michael Friendly) are from Charles Dupin in 1819 (almost 200 years ago!), with an example shown on the left below (link). Simple grey shading applied across almost equally sized regions makes for a great image showing a broad dark band across the center of France. Again, on the right is a contemporary map, this time by Carey and Buchon (link), and again this map has variation in typography such as spacing, capitalization, italics and size.

Thematic_DupinVsCarey

Crome’s Neue Carte von Europa

So where did Dupin get the idea for a thematic map? A big influence on Dupin was the German researcher August Crome. Below is Crome’s Neue Carte von Europa from 1782. This map shows where various commodities are produced across Europe.

Thematic_Crome
You can see that Crome starts with a base map that has the labeling conventions of the time, for example italics for rivers, all caps for country names, and color to denote country boundaries. Then he adds on top of this map all the content related to his thematic investigation: different kinds of commodities. He displays these as symbols and codes (only a small portion of the legend and map is shown – the original map is here).

Base Map Pain

However, Crome can’t differentiate these symbols and codes from the base map using color, font size, case and italics – because those have already been used in the base map. Even if he did use them, they wouldn’t stand out: they would just be confused with the base map’s use of those same attributes.

Anyone who’s designed a map knows the pain of base maps: it’s really hard to make your data stand out when the base map is an already noisy and colorful Internet map or satellite image. And when you’re designing a thematic map, it’s nice to have patterns in your data visually pop out. So Crome is backed into a corner and uses different symbols for commodities as well as pairs of letters. However, all of these effectively require perception of different shapes, and different shapes don’t visually pop out (i.e. shape is not preattentive; e.g. Scholarpedia on visual search, or Bertin).

So Crome’s proto-thematic map was highly popular, but there are no patterns that you can see at a glance – you have to inspect it closely and read all the labels. Dupin, instead, starts with a much simpler base map – outlines of regions – and his dataset is simpler too – just a single variable. As a result, he is able to use an attribute such as brightness or color. He adds labels too, but his labels are simple plain text, and the labels are easily dropped by later map-makers.

What if…

Could Dupin have used text and typographic formats instead, like the other contemporary label-based maps of the time? It’s an interesting hypothetical question. Bold type has strong preattentive properties (e.g. Strobelt et al). But Dupin might not have known about or had access to bold type: it was invented around the same time as his map, on the other side of the English Channel (1820s). And the first boldfaces were not available in the range of different weights that Dupin would have needed. Similarly, italics of varying slopes, or different styles of underlines, wouldn’t have been available to him. As a result, Dupin and his engraver used intensity, which was available to them – launching the split between thematic maps and label maps.

Here’s a thematic map using font weight (more examples of typographic thematic maps are in the paper just published for ICC, available here):

CartogramByFontWeight

Six different levels of font weight are used to convey data.
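
A minimal sketch of how a label’s weight could be driven from data, assuming a font family that offers six usable weights (the specific weight numbers and quantization are assumptions):

```typescript
// Minimal sketch: quantize a data value into one of six font weights for a map label.
const weights = [200, 300, 400, 500, 700, 900]; // six levels, lightest to heaviest

function weightForValue(value: number, min: number, max: number): number {
  const t = (value - min) / (max - min);                                   // normalize to 0..1
  const bin = Math.min(weights.length - 1, Math.floor(t * weights.length)); // 0..5
  return weights[bin];
}

function styleMapLabel(el: HTMLElement, value: number, min: number, max: number): void {
  el.style.fontWeight = String(weightForValue(value, min, max));
}
```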

I wonder what Crome and Dupin would have thought?

 


Top NY Picks: Me vs. Reviews

I have a few upcoming speaking appearances, including the International Cartographic Conference (DC, July 6), Information Visualization 2017 (London, July 13) and the Strata Data Conference (NYC, Sept 26-28).

Since NYC is a city I often visit, I thought I’d make a list of places I like and compare it to the ratings on a trip review website, in this case TripAdvisor. In this example, each row lists a place I like and a few words from a recent review. Then, added typographic symbols indicate the mean rating and one standard deviation. With the added bolding, it forms a kind of box plot made of text. What you’ll notice is that some of the things I like are also liked by reviewers (e.g. MOMA, NYPL) but some are not liked so well by reviewers (Lever House, Citicorp Center):

RichardNYCfaves

To provide a bit more context as to why this may be, the underlines indicate the number of reviewers. There are more than 18,000 reviews of Rockefeller Plaza – seriously? – what does the 18,001st reviewer really think they are adding? At the other end, Lever House and Citicorp aren’t written up by many people – perhaps they don’t score highly because no one has told visitors to look at these places, so they don’t. Some of the reviewers who do take the time to review them are dismissive.
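
Here’s a minimal sketch of how such a row could be assembled: compute the mean and standard deviation of a place’s ratings, then mark those positions along a fixed-width run of text. The marker characters, the 1-5 rating scale and the row width are assumptions, not the exact formatting used in the figure.

```typescript
// Minimal sketch of a "box plot made of text" for one row.
function textBoxPlotRow(place: string, snippet: string, ratings: number[], width = 40): string {
  const mean = ratings.reduce((s, r) => s + r, 0) / ratings.length;
  const sd = Math.sqrt(ratings.reduce((s, r) => s + (r - mean) ** 2, 0) / ratings.length);
  const pos = (rating: number) => Math.round(((rating - 1) / 4) * (width - 1)); // 1..5 scale -> 0..width-1

  const chars = snippet.padEnd(width, " ").slice(0, width).split("");
  chars[pos(Math.max(1, mean - sd))] = "[";  // one standard deviation below the mean
  chars[pos(mean)] = "|";                     // the mean rating
  chars[pos(Math.min(5, mean + sd))] = "]";  // one standard deviation above the mean
  return `${place}: ${chars.join("")}`;
}

// e.g. console.log(textBoxPlotRow("MOMA", "wonderful collection, crowded on weekends", [5, 5, 4, 5, 3]));
```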

This visualization is interesting for the discrepancy between the review site and the personal favorites. It is also interesting for the insight into the herd mentality of notching one’s name on a deemed cultural landmark (18,001!) versus stopping to smell the other roses that everyone else walks past. And that’s what makes cities like New York great: there are millions of interesting places in every nook, plaza and market waiting to be discovered and appreciated.


50 categoric colors, inspired by telephone wires


25 pair color-coded telephone wires

Using colors to represent different categories of data is challenging to scale past ten unique colors. Many authors and researchers recommend against it. ColorBrewer2 doesn’t go past 12 unique colors. D3.js offers three variants of color scales with 20 unique colors. But what if you want more? Suppose you have a line chart with 50 lines? Or some kind of choropleth map with more than 20 categories?

Dashes Don’t Work

With line charts, one obvious answer is to alter the line style, for example some continuous lines and some dashed lines. In this way a color can be reused: once in a normal style, again as a dash variant, again as a dotted variant. I used to use this approach, but found out that users don’t like dashed lines. Why? Dashes might have a gap at a turning point: many financial users want certainty at the local highs and lows. Dashes are also confusable with gaps: in real data there may be a gap, and you want to depict that gap explicitly because a gap in data is meaningful.

Cartographic Lines

As with many things in data visualization, researchers want to invent new techniques. But first it may be more constructive to see how other people have solved the problem.

I like to look at cartographers, because there are 500 years of printed maps to consider for inspiration. There are many variants on line styles: dots, dashes, combinations of dots and dashes, and pairs of lines:

Many different simple geometric glyphs can be embedded in lines to differentiate between them: dots, circles, stars, diamonds, boxes, and so on.

Charts, of course, borrow the idea, as seen in Excel’s charts.

These glyph-based line styles could be combined with various colors to create a wide number of different lines, e.g., orange line with stars, orange line with circles, blue line with stars, blue line with circles, and so on.

But lines with glyphs create visual noise. The high point of a line is confounded with a star, making the highest point a few pixels higher. And the entire scene becomes cluttered with circles, dots, stars and diamonds. One suspects that cognitive load increases, but independent studies should be done to confirm the hypothesis.

25 Pair Color-coded Telephone Wires

Instead of glyphs, this post is focused on alternating colors on lines, borrowing an idea from telephone wires.

Interestingly, wire-based telephony had the same problem as line charts. You need to get a pair of wires to each household. Those wires need to be bundled together back to the telephone exchange. You need to be able to visually distinguish between the wires when you open up the bundle at any point. You could determine this interactively by successively testing each wire, but that would be slow and cumbersome. And there are definitely more than five houses per exchange, so you need some kind of categorization scheme to differentiate among many different lines.

I remember as a kid in our suburban house we had open floor joists in the basement, and my dad had wired up phone jacks in each room (this is pre-wireless technology). Unlike 120-volt power wires (black and white), each telephone wire was predominantly one color, say red, with a second color as a small dash every half centimetre, say yellow. So you could have red with a bit of green, or orange with a bit of yellow, green with white, and so on. The telephone standard supports 25 pairs of wires, which results in 50 unique color codes. For example, in the top image in this post, the yellow wire with bits of blue can be clearly distinguished from the blue wire with bits of yellow paired beside it.

This color patterning means that a wire can be visually traced through a spaghetti jumble of wires or decoded at either end without needing to see the middle. Presumably the colors were chosen and standardized to meet the needs of telephone repair, for example, under poor lighting conditions. And, given the standard has been around for many decades with worldwide use, it is probably safe to assume that it has some degree of effectiveness.
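
Here’s a minimal sketch of generating 50 dominant/secondary combinations from two groups of five colors, in the spirit of the 25-pair scheme. The hex values are placeholders, not the standard’s actual colors, which (as noted below) would need tweaking for screen use anyway.

```typescript
// Minimal sketch: two groups of five colors yield 25 pairs, and each pair yields
// two wire codes (A-with-bits-of-B and B-with-bits-of-A), for 50 codes in total.
const groupA = ["#e8e8e8", "#d62728", "#202020", "#e6c700", "#7b2d8b"]; // "major" colors (placeholders)
const groupB = ["#1f77b4", "#ff7f0e", "#2ca02c", "#8c564b", "#708090"]; // "minor" colors (placeholders)

interface WireCode { dominant: string; secondary: string; }

const codes: WireCode[] = [];
for (const a of groupA) {
  for (const b of groupB) {
    codes.push({ dominant: a, secondary: b }); // e.g. yellow with bits of blue
    codes.push({ dominant: b, secondary: a }); // e.g. blue with bits of yellow
  }
}
// codes.length === 50 distinct, mutually decodable line styles
```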

Using the 25 Pair Color Code with Visualization

What does this mean for data visualization? Using the same approach, line charts could be created with 50 uniquely identifiable lines. The clutter associated with glyphs on lines does not occur, and the objection to gaps associated with dash styles is no longer an issue.

Areas could be filled with 50 uniquely identifiable color combinations, and the pattern orientation, size and glyph remain open either to express other data or to allow for aesthetics. Points, such as in a scatterplot, won’t work unless the points are fairly large; and given the lack of association between adjacent points, the approach might not work well perceptually.

Color-coded lines are easy to implement in SVG (and D3). A line can be plotted twice: the under-line in the dominant color, followed by a second line drawn over top with the secondary color and a dash array. Similarly, for areas, SVG patterns can be created.
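
Here’s a minimal sketch of the two-pass approach in plain SVG via the DOM (D3 works the same way): the path is drawn once in the dominant color, then cloned and drawn again on top with the secondary color and a dash array. The stroke width and dash pattern are arbitrary choices.

```typescript
// Minimal sketch of the two-pass "telephone wire" line style in SVG.
const SVG_NS = "http://www.w3.org/2000/svg";

function appendWireLine(svg: SVGSVGElement, pathData: string,
                        dominant: string, secondary: string): void {
  const base = document.createElementNS(SVG_NS, "path");
  base.setAttribute("d", pathData);
  base.setAttribute("fill", "none");
  base.setAttribute("stroke", dominant);
  base.setAttribute("stroke-width", "3");
  svg.appendChild(base);

  const over = base.cloneNode() as SVGPathElement; // same geometry, drawn on top
  over.setAttribute("stroke", secondary);
  over.setAttribute("stroke-dasharray", "4 12");   // short dash of the secondary color, long gap
  svg.appendChild(over);
}

// e.g. a yellow line with bits of blue:
// appendWireLine(svgElement, "M0,80 L50,40 L100,60 L150,20", "gold", "steelblue");
```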

Note that the colors in the telephone wire standard include colors such as black and white. Given that visualizations are typically on a white or black background, the initial colors need to be tweaked for visualization. Perhaps pink instead of white when used on a white background.

So here are 50 lines in a pseudo-random line chart, where each line is colored using a 25-pair color encoding:

25pairColorLines2

A couple of things worth noticing:

  1. The approach does work in that there are no gaps and each line is clearly and uniquely encoded. Yellow with a bit of green hits the lowest low on this chart.
  2. A line chart with 50 lines is very crowded. You can see macro-patterns of sorts, with the density of lines starting a bit higher and trending down.
  3. You can identify where lines appear and reappear: the purple line with a bit of pink at the top left reappears at the top again on the right side. You can even visually trace a line, but that requires considerably more effort, particularly through an area of congestion.

Does 25 pair color coding really work? The result above seems promising but inconclusive. More tests would need to be done. And more importantly, what are the tasks that a user might need to do on a line chart with 50 lines? Tracing is an interesting task to consider. What are the other tasks?


Readability


and what it means for data visualization

Type can be legible, but still unreadable. Consider this image:

The letters are perfectly legible, but the text, upside down and mirrored, is unreadable.

Beautiful Helvetica, mirrored and rotated 180 degrees. Two highly common words from the English language. Letters are perfectly legible and even turn into other letters: p becomes b, m turns into a strange w. But it’s unreadable (thanks truetypestories).

Legibility is concerned with the clear delineation of the individual letterforms and their separability from one another.

  • Are the individual letters clearly designed?
    For example, is the opening of a c sufficient?
  • Are the letters clearly distinguishable from one another?
    For example, Helvetica uppercase I, lowercase l and numeral 1 are extremely similar.
  • Is there potential for letters to run together to be mistaken for a different letter?
    E.g. rn in Helvetica could be mistaken for an m, particularly if a drop shadow fills the gap between the r and n.
  • Do the proportions between letters make a potential letter ambiguous?
    A letter with high x-height may not have much variation to distinguish between an h and an n (again see Helvetica).

Legibility is very much in the domain of the font designer, and is concerned with the shapes of letters, spacing, and consistency across the design.

Readability, however, goes beyond type design. Readability is a comprehension issue concerned with ease of reading lines and paragraphs. Readability can be affected by many factors:

  • Line length: paragraphs that are very wide or very narrow are harder to read.
  • Spacing, kerning and leading: spacing between letters and lines, and tuning these spaces. For example, spread letters too far  a p a r t  and words break apart.
  • Font weight: text that is too heavy or too light can be more difficult to read. Note that most fonts with variable weight have a “book” weight.
  • X-height: a font with a high x-height may increase legibility of words at a distance on signage, but may be more difficult to read for long paragraphs.
  • Uppercase: all uppercase is more difficult to read than type set in mixed case. This is NOT an endorsement of readability based on word shape; rather, all uppercase has no ascenders or descenders, meaning that there is less shape differentiation between letterforms.

Readability is also related to cultural conventions. For example, in languages with longer/shorter words, optimal paragraph widths may be longer/shorter. Font choice is related to readability. Fonts that are more familiar are easier to read:

  • Blackletter is difficult to read these days because it is uncommon, but it was used regularly in Germanic countries until the early 20th century.
  • Baskerville was considered difficult to read when it was introduced (claims that it hurt the eyes), but it would likely go unnoticed today.
  • Neue Swift is a modern font, designed by Gerard Unger in 1985. Gerard says: “When I first released Swift, people criticized it as hard to read with many angry angles: now it is a standard used in many newspapers, dictionaries and other major works.” (presentation by Gerard at University of Reading, 2016).
  • Note that there is an ongoing discussion as to whether sans serif or serif fonts are easier to read. In practice, for long printed texts the convention tends towards serif fonts, while on mobile screens the convention currently tends towards sans serif (perhaps this may change with more devices at higher resolutions). Or perhaps the notion is that sans serif is better for short bursts (headlines, narrow mobile devices) versus serifs for wide lines (Williams).

So what does readability mean for visualization? The visualization programmer has control over choice of font, spacing, weight, shadows, and so on – so readability should be considered. Furthermore, techniques that change font weights or other attributes in running text  m a y  negatively impact  r e a d a b i l i t y,  particularly if multiple different attribute adjustments co-occur within a text (Carl Dair).

There are also cases where readability is not an issue. Short snippets of text, such as headlines or text specifically designed for skimming, are examples. Dictionaries, for instance, often use a wide mix of typographic techniques to differentiate elements within each entry, making it easy to skip to the parts of a definition of interest.

Furthermore, a visualization may deliberately interrupt readability, given the appropriate application. The ideal exemplar here is Tall Man lettering, used to differentiate among similar-sounding medication names.

(For more info and examples see, e.g., Victoria Squire et al., Getting It Right with Type; Beier, Reading Letters: Designing for Legibility; Walter Tracy, Letters of Credit; Isabel Gauthier et al., Font Tuning Associated with Expertise in Letter Perception.)

 


The 19 Dimensional Word Cloud of Pokémon

If you want to catch all 719 Pokémon, you need to know their different skills and abilities. At a high level, Pokémon are classified into 18 different types, such as fire, water or grass. For example, Pikachu is an electric type, Charmander is fire, Squirtle is water and so on. This is important because type determines a Pokémon’s advantage in battle, e.g. water-types do well against fire-types.

pokemonplastic

Pokémon may have more than one type: for example, Charizard is both fire and flying, while Gengar is both ghost and poison. This makes Pokémon interesting to the visualization researcher: there are 18 different categories, and those categories can be combined in different ways. So an interesting visualization question is how to represent all those possible combinations.

Visualizing all the combinations of types of Pokémon

Of course, the Pokémon community has made various tables and diagrams to represent this information. Given that there are cute images of each Pokémon, one fun way is to organize these images. Furthermore, the color of a Pokémon is usually related to its type; for example, fire-type Pokémon tend to be red, water-types tend to be blue. Here’s an awesome poster of all Pokémon organized by color (original by Marie Chelxie Gomez, via Polygon):

pokemon_bymariechelxiegomez

Pokemon sorted by color, and color is related to type.

But the above image doesn’t make explicit the types nor the combinations of types. Here’s a great cross-tabulation of all the pairwise combinations of Pokémon posted on reddit a few years ago (http://bit.ly/2hxOcpd):
pokemonmatrix_

Pokémon organized into a table by combination of type. Click for big.

In this case, the combinations are depicted. But you have to zoom in: there’s a lot of empty space, since there are a lot of type combinations that don’t exist. Also, if you don’t happen to know the names that correspond to each picture, you can’t identify a Pokémon by its image (for example, I don’t know all the Pokémon, but my son does). So how could you show all the text, have a readable layout, and indicate all the different types?

Word Clouds

Word clouds are pretty efficient at one thing: packing a lot of words into a tight space. Here’s a word cloud of the 151 first generation Pokémon made with Wordle:
pokemonwordle2
Unfortunately, word clouds have lots of problems. The visualization community isn’t fond of them. In the most typical usage, they convey only the words themselves, plus one more data attribute by setting the size of each word to its frequency in some document. Usually, the position, color and orientation are random and convey no information.

“Wordles [i.e. word clouds] are driven by a single minded fetish for filling space.”
– André Skupin, Mapping Text, 2010.
So, instead, consider the opposite question:

How many different dimensions of data could be represented in the words?
This is an interesting question. The answer you get today will depend on who you ask:

Visualization Researcher: 5-6. In visualization, one can refer to the standard visualization attributes – perhaps 6 or so are commonly used: x position, y position, size, hue and brightness, plus the word itself. In fact, you can look at all the text visualizations at textvis.lnu.se – if you count up all the examples you’ll find x, y, size, hue and intensity account for more than 95% of the encodings used on text – and in most cases, only 0-2 of these are actually used (see Table 2 here).

Cartographer: 6-8. Cartographers come from a different perspective and have been using visual attributes to add information to labels for many centuries: italics for water, heavy text for large cities, s p a c e d  o u t  text for mountain ranges, and so on. One of my favorites is an Ordnance Survey map from the 1920s:
5d-townlabels

Town labels indicate more than five dimensions of data (click for big).

Each city label indicates: 1) the name of the city via the text itself; 2-3) latitude and longitude via x,y position; 4) town vs. village via uppercase/lowercase; 5) county towns via italics; 6) population category via font size; and 7) country via font family (serif for the U.K., slab-serif or serif variant for Scotland). That’s impressive.
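
Here’s a minimal sketch of the same idea in code: one absolutely positioned label whose typographic attributes are each driven by a different field of the town record. The field names, thresholds and font families are hypothetical stand-ins, not the Ordnance Survey’s actual conventions.

```typescript
// Minimal sketch: each data field drives a different typographic attribute of one label.
interface Town {
  name: string; x: number; y: number;        // name + projected position
  isTown: boolean;                            // town vs. village -> upper vs. lower case
  isCountyTown: boolean;                      // county towns -> italics
  populationCategory: number;                 // 0..3 -> font size
  country: "england" | "scotland";            // country -> font family
}

function townLabel(t: Town): HTMLSpanElement {
  const span = document.createElement("span");
  span.textContent = t.isTown ? t.name.toUpperCase() : t.name.toLowerCase();
  span.style.position = "absolute";
  span.style.left = `${t.x}px`;
  span.style.top = `${t.y}px`;
  span.style.fontStyle = t.isCountyTown ? "italic" : "normal";
  span.style.fontSize = `${9 + 3 * t.populationCategory}px`;
  span.style.fontFamily = t.country === "scotland" ? "'Roboto Slab', serif" : "Georgia, serif";
  return span;
}
```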

Let’s go further. Certainly more than 10 could be done. How about 15? Or 20?

Why? Sometimes it’s good to explore possibilities. Even if the result isn’t pragmatic, it forces new ideas and new strategies to be considered. Some of these might even be useful in some other context in the future.
The challenge is that each visual attribute needs to be able to be combined with another attribute. For example, one dimension of data might be set to size and another set to shape. However, as size decreases, all the shapes end up being ambiguous dots. So, we need to find a lot of different visual attributes that can work together.

16 Types of Pokémon (first generation)

Let’s consider a set-type visualization looking at Pokémon. In the first version of Pokémon, there are 151 different Pokemon of 16 different skill types (aka first generation Pokémon).

X,Y Layout

Here’s a quick visualization of the 151 Pokémon Generation 1 arranged using a graph layout. The large underlying words indicate the type:
pokemon-plaingraph

First generation Pokémon, arranged by type (shown underneath in large type).

Each Pokémon is placed in proximity to its type(s). Pokémon out near the perimeter belong only to the one type they are close to. Those belonging to more than one type are placed in between the types they belong to. This uses only two dimensions (the spatial dimensions: x,y). Unfortunately, it is highly ambiguous for Pokémon near the center: you can’t tell which types they belong to. Adding lines would help, but there could be too many lines criss-crossing, making them difficult to distinguish.

Color

Instead, we can use some other visual attributes to identify type. Many Pokémon guides use color: it can be fairly intuitive, e.g. green for grass type, red for fire type. Here’s the same plot using type colors from Bulbapedia:
pokemon-coloronly

First generation Pokémon indicating type by color. Some colors are ambiguous.

This works OK for the Pokémon around the perimeter, but not for the Pokémon of multiple types in the interior. Pokémon of more than one type can end up with muddy, hard-to-distinguish, hard-to-decode colors, e.g.:
Purple + green = greyish brown
Orange + blue = brownish grey
And so on.
So, when you look at Dodrio, you see it’s greyish purple – is that Flying+Dragon? Or is it Ghost+Normal? Or something else?
The reason for this problem is that the original palette of 16 colors is being used to encode 16 separate categories. Attempting to combine these colors pairwise results in on the order of 128 blended colors (16×16/2). Unfortunately, humans are not good at readily identifying 128 different colors (e.g. see Colin Ware’s or Tamara Munzner’s books on visualization). Another way to think about color is as a three-dimensional space, such as a combination of red, green and blue, or a combination of hue, brightness and saturation. Trying to squish 16 different dimensions into a 3-dimensional space is problematic at best.
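
A minimal sketch of why the blends get muddy: averaging two RGB colors pulls the result toward the middle of the color cube, where many different pairs collapse onto similar greys and browns.

```typescript
// Minimal sketch: averaging two hex colors channel by channel.
function blend(hexA: string, hexB: string): string {
  const toRgb = (h: string) => [1, 3, 5].map(i => parseInt(h.slice(i, i + 2), 16));
  const [a, b] = [toRgb(hexA), toRgb(hexB)];
  const mix = a.map((v, i) => Math.round((v + b[i]) / 2));
  return "#" + mix.map(v => v.toString(16).padStart(2, "0")).join("");
}

blend("#800080", "#008000"); // purple + green -> "#404040", a dark grey
blend("#ff8000", "#0080ff"); // orange + blue  -> "#808080", a similar mid grey
```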

16 Different Visual Attributes

What’s needed are 16 different dimensions of visual attributes, all of which can be combined together in any order and unambiguously deciphered. Since we need so many different visual attributes, we need to consider many possibilities, including common visual attributes (e.g. rotation, scale, texture, motion, shape, shadow) and font-specific attributes too (e.g. bold, italic, case, underline, shifting baseline, punctuation, serif style, outline). Some of these have to be discarded, for example: 1) shadows on text reduce legibility – since we’re using text we’d rather not make it illegible; 2) shape isn’t easy to combine with text, so skip that; 3) motion attributes such as blink or wobble are so visually dominant they can be annoying, so we’d rather not use them.
Here’s a grid showing 16 different variations of labels across the top row and first column. The middle of the grid shows all 128 pair combinations. Each cell is uniquely different from its neighbors. With some cognitive effort, the viewer can determine which attribute is different in each case:
18_visual_attributes

Many different visual attributes for labels, and all the pair combinations (click for big).

The attributes used are: plain serif, upper case, shifting baseline, surrounding quotes, tracking (i.e. spacing), exclamation mark, underline, boxy version of the font, bold, narrow version, italic, deep brackets on the serifs, wide-serif version, low-x-height version, outline version, tall stretched version, rotated, horizontal stripe texture, and vertical stripe texture.
The same approach can be applied to the Pokémon visualization:
pokemongen1-nice2

151 first generation Pokemon with type indicated by unique visual attributes (click for big).

Each type now has a specific visual attribute associated with it. Small caps for fighting, slightly rotated text for flying, italics for poison, and so on. Now it is possible to create some of the interesting combinations, for example:
pokemon-samplecombos
In each case you can see how the different attributes combine: Parasect gets the combination of the narrow font for Bug and the wide brackets for Grass (wide brackets, i.e. the fat parts on the letter, like the bottom of the r). Kabutops gets the blocky font for Rock and the low-x-height font for Water. Mr. Mime is the only Pokémon that gets the combination of vertical stripes and horizontal stripes, ending up with plaid.
Note that Ice was originally just a wide serif, which seemed hard to see, so a bumpy edge was added to further differentiate it.

18 Types of Pokémon

The visualization above has only 16 different types. Pokémon aficionados know that two more types were introduced with the second generation of Pokémon, for a total of 18 different types (and more than 300 possible two-way combinations). You may even have picked out that Magneton and Magnemite have underlines underneath their labels in the plot above, even though there is no underline in the legend (underline is for the Steel type, which didn’t exist in first generation Pokémon but was retroactively added). Here’s the same visualization, now showing all 719+ Pokémon across 18 types. Click for a big version.

pokemonall2

All Pokémon by Type. Click for big.

So, with 18 different visual attributes, plus the text itself, this word cloud represents 19 different data dimensions.

Questions?

So this is a new kind of strange visualization of Pokémon. There may be many questions:
Pokémon Questions
Variants. Some Pokémon can have different type combinations. For example, the Pokémon Rotom can be Electric+Fire, or Electric+Flying, or Electric+Ghost and so on. Since I’m not a Pokémon expert, I wasn’t expecting this (the last time I played was on a Game Boy Color). I considered representing a single Rotom with attributes for every possible combination – which isn’t correct; so instead, Rotom occurs multiple times in the visualization, each with some appended text to indicate which flavor (e.g. Rotum-EFr for the electric-fire variant, Rotum-EFl for the electric-flying variant).

Data Errors. As mentioned, I’m not a Pokémon expert. I just took data from Bulbapedia and used it as-is. I don’t know why some Pokémon are a single type such as fire, and some have two types, one of which is normal, such as Bibarel, which is listed as water+normal. To me, it seems that water+normal should be the same as water? The visualization just draws the data, and there are no guarantees that I cut and pasted the data correctly.
Visualization Questions
24 Dimensions: In addition to the 19 data dimensions listed above, each label also has position and color. Position uses x,y spatial location for 2 more dimensions, and color uses variations in hue, brightness and saturation (or red, green and blue) for another 3 dimensions. That’s on the order of 24 visual dimensions. But from a data perspective, only 19 are used, hence the title says 19 dimensions.
Many-way combinations: In the visualization, the fonts can be assembled in any combination. In the case of Pokémon, it turns out that any single Pokémon can only belong to at most two different types. From a combinatorics perspective, with only 1 way and 2 way combinations there are only a few hundred possible type combinations. However, from a visualization perspective, this palette of 18 visual attributes can be combined in any combination: 2 way, 3 way, 5 way combinations. If Pokémon version 11 has new characters with 5-way combinations, this particular visualization will accommodate it: all the millions of possible font-combinations can be constructed using this approach.
Does it Work? In order to understand which types any particular Pokémon belongs to, it takes some cognitive effort to decode the label. A thorough evaluation would require user studies, and those have not been done. From a design perspective, I was unsatisfied with some of the font variants: for example, the wide-serif variant didn’t stand out. So, to enhance the differentiation, I added a bumpy edge to the wide serif (i.e. the strange font for the Ice-type Pokémon). Dark types use vertically oriented text, which really jumps out but isn’t particularly easy to read. Electric and Normal use punctuation (an exclamation mark for Electric, surrounding dashes for Normal), which seems a bit arbitrary, although the marks might be detectable without actually reading the text. And so on.
Typography Questions 
Different font per Pokémon: I was asked: “Why didn’t you use a different font for each Pokémon type?” Font choice is similar to color. You could use Old English for the Fighting type and Comic Sans for Psychic, but there’s no good way to combine those fonts together (e.g. what do you get when you combine Old English + Comic Sans?). When you have 18 different fonts being combined, you won’t necessarily end up with something that’s easily distinguishable from all the other font combinations (e.g. would Slab Serif + Script look different from Bodoni + Varsity?). And even if they are distinguishable, it will be difficult to visually assess which fonts a particular combination was made out of.
Instead, the approach used here has visually distinct typographic attributes: tall brackets and low-x-heights are separate, can be combined, and still understood as the combination of those two separate things.
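
Here’s a minimal sketch of that compositing idea in code: each type contributes an independent style fragment, and a multi-type Pokémon’s label simply applies all of its fragments. The specific attribute-to-type assignments below are illustrative and don’t exactly match the figures above.

```typescript
// Minimal sketch: independent typographic attributes stacked on one label.
type StyleFragment = Partial<CSSStyleDeclaration>;

const typeStyles: Record<string, StyleFragment> = {
  fighting: { fontVariant: "small-caps" },
  poison:   { fontStyle: "italic" },
  flying:   { transform: "rotate(-8deg)", display: "inline-block" },
  steel:    { textDecoration: "underline" },
  fire:     { fontWeight: "700" },
};

function labelFor(name: string, types: string[]): HTMLSpanElement {
  const span = document.createElement("span");
  span.textContent = name;
  for (const t of types) {
    // Fragments combine cleanly because each one touches different CSS properties.
    Object.assign(span.style, typeStyles[t] ?? {});
  }
  return span;
}

// e.g. labelFor("Charizard", ["fire", "flying"]) gets both bold and a slight rotation.
```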
Many font variants: You won’t find fonts with variable widths, variable x-heights, variable bracket sizes, and variable serif widths in a commercial off-the-shelf font family. Ideally, the concept of multiple-master fonts should have made this easy, but that doesn’t exist with current fonts used in browsers. Instead, I used a parametric font generator, in this case Prototypo.io: you start with a font and lots of parameters, such as x-height, weight, italic slope angle, serif width, bracket height, and so on. To get the variants I needed, I started with a basic serif font, then created 7 different base types (heavy weight, italic, boxy, narrow, low-x-height, wide-serif, and tall-brackets) and then all pairwise combinations (e.g. heavy+italic, boxy+narrow, etc.) to create a total of 29 fonts.
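
Enumerating those variants makes the count concrete: presumably the plain default, plus the seven single-attribute base types, plus all 21 pairwise combinations, which gives 1 + 7 + 21 = 29 files to export from the parametric tool. A minimal sketch:

```typescript
// Minimal sketch: enumerate the plain font, the base types, and all pairwise combinations.
const baseTypes = ["heavy", "italic", "boxy", "narrow", "lowx", "wideserif", "tallbrackets"];

const variants: string[] = ["plain"];
for (let i = 0; i < baseTypes.length; i++) {
  variants.push(baseTypes[i]);
  for (let j = i + 1; j < baseTypes.length; j++) {
    variants.push(`${baseTypes[i]}+${baseTypes[j]}`); // e.g. "heavy+italic"
  }
}
// variants.length === 29; each entry corresponds to one exported font file
```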

Note that the attributes must be selected carefully. Sans serif is not one of the base types because attributes such as wide serifs or tall brackets can’t be combined with sans serif – only one or the other can be represented at a time. However, if all the base types contain serifs, then all the serif combinations are supported.

So What?

From a visualization perspective, this is a useful thought experiment to see what happens when you attempt to use 18-24 different visual attributes all at once – it suggests that we can certainly go well beyond 5-10 attributes. There is lots more research to be done.

From a typography perspective, it’s a useful thought experiment to think about why multiple-master fonts or parametric fonts may have uses in data visualization in the future, and what sort of technical enhancements might be needed to support this: generating all possible permutations and combinations of a font is not a feasible approach to meet the needs of very high dimensionality.

From a Pokémon perspective: bring it on. I want to see next-generation Pokémon that have more than 2 types. How about an evolution of Charizard that includes the Ghost type, or a Dark+Steel version of Pikachu? The visualization is ready.

P.S. Happy Birthday A. Sorry this is a bit late:-)

 


Papers, Rejections and Critiques

There have been a few recent blog posts after VisWeek about paper rejections, such as Niklas Elmqvist’s Dealing with Rejection and Robert Kosara’s related Dealing with Paper Rejections. Rejections are certainly a painful part of the academic review process, and I’ve had my share of them.

A paper review is a criticism of a work, but it’s not a dialogue. One of the painful aspects of a paper rejection is that you don’t get to address your reviewer: perhaps they misunderstood some aspect of the work, or missed a key point. Or maybe they have valid criticisms about some aspects of your work, but you can’t get more details from them, which could be really useful in improving the work. Or they may have uncovered some relevant confounding factors that you hadn’t addressed, or identified some relevant prior work. In all of these scenarios you’re stuck without being able to engage your reviewer in more dialogue – and the review process is not the place for discussion, as pointed out by Elmqvist and Kosara.

Critiques are very similar to the criticism in paper reviews – however, critiques are about back-and-forth dialogue, ideally in a face-to-face setting. Critique originates in the 18th-century Enlightenment, when scholars and the bourgeoisie were struggling against the absolutists in church and state. Critique is a distinct public discourse based on rational judgement (Eagleton). It is a public exchange of opinion that is open to debate, attempts to convince, and invites contradiction (Hohendahl). This dialogue is really useful to explore, investigate, deliberate, explain, expand, probe, refine and revise ideas. The feedback is quick and many different aspects can be considered.

coffeehouse1

Early critical debate.

While conference papers are a good opportunity to get feedback after a paper has been written, a critique can be used to collect a lot of detailed feedback from a variety of peers and experts before a paper is written. This can help with authoring a better paper, and it can help guide better research by stimulating ideas, framing research, exposing assumptions and so on.

Furthermore, critiques from experts (peers, expert users, etc.) represent a form of evaluation. Critiques are used often in design education (and medical education), where there is a lot of complexity and many trade-off decisions. Unlike a traditional time-and-error test, a critique is wide-ranging across the broad design space and can uncover various unforeseen issues. Just like rejection, critiques can be difficult and painful for the person receiving them as issues are exposed. However, the dialogue allows that person to engage with the critic: to go deeper, to debate, to contradict, to understand, to accept, to learn. Note that formal critiques are the primary form of evaluation in design fields such as architecture.

archiculture3

Snapshots of critiques in architecture (from film Archiculture via Arbuckle Industries).

Interestingly, some types of conferences are more open towards critical discussion than others. Marquee conferences (VisWeek, CHI, etc.) are so big and diverse that it might be hard to generate much interest in a specific topic; everyone is hurrying from session to session and you don’t necessarily get great group dialogue. Instead, many smaller workshops and smaller conferences are much better for engaging in dialogue directly related to a research topic. Acceptance rates tend to be higher at these more narrowly focused venues, and all attendees at these small workshops are focused on similar research and therefore more willing to engage in critical discussion. In other words, smaller conferences and workshops may be better than large conferences for critique.

There are other ways of engaging in critiques as well, such as reaching out to experts via email, blogs, Skype, doctoral colloquiums and other means. “Do-it-yourself critiques” and other aspects of critiques are discussed more in my recent paper from the BELIV workshop at VisWeek back in October; or, for the abbreviated version, here are the slides.

 
