Text Visualization and Code Editors

Long before I started investigating typographic attributes for data visualization, Aaron Marcus and Ron Baecker were doing the same thing for software code. Back in the mid-1980s there were no integrated development environments, no integrated debugging tools and so on. My first coding environment in the mid-80s consisted of green text on a black background: 24 x 80 characters at maybe 72 dpi.


Old school: low resolution green-screen display.

If you wanted to better understand what was happening across a few blocks of code, there was no windowing system: you had to flip back and forth between code sections and carry a lot of detail in short-term memory – which is difficult.

So, instead, we printed it out. High-speed line printers churned out large amounts of text on fan-fold paper. You can cross-reference a few hundred lines of code on paper much more easily than 24 lines on screen. And, on paper, you can use a highlighter to mark variable declarations, underline conditionals, add squiggles or boxes around code and so on.

Baecker and Marcus realized that the then-new Apple LaserWriter offered great resolution (300 dpi) and excellent typographic capabilities. It could automate the task of differentiating types of statements using typographic attributes, instead of manually marking up the printout after the fact. They itemized the typographic attributes available, wrote routines to format and print code, did numerous studies and created recommendations through the 1980s (e.g. their CHI paper) – all before the academic research field of data visualization got started. Here’s some code from their book Human Factors and Typography for More Readable Programs, formatted following their recommendations.


1980s laser-printer code formatting using font weight, size, italics, shading, even small caps.

It’s a great example of visualization using typography. Italics, bold, background shading, a slightly larger size for the function name, even small caps for the constant.

Over time, display resolutions got better, code-writing software got better, and desktop typography got better. Leading code editors nowadays do syntax highlighting reminiscent of Baecker and Marcus’s work – notice the many typographic attributes at work:


Snapshot of WebStorm – syntax highlighting varies underline, italics, bold and color to aid understanding.

While rich typographic formatting of in-line contextual highlights is the norm in software editors, typographic formatting to convey data is rare in data visualization. Two years ago I went through all the text visualizations then listed at the Text Visualization Browser and cataloged which visual attributes were used to add data to text. Out of 249 visualizations then listed:

  • 40 didn’t have any text in the main visualization at all – they were visualizations such as point clouds, graphs, etc.
  • 103 used plain text without indicating any additional information, not even color coding – essentially plain labels.
  • Only 106 out of 249 (43%) used any kind of additional formatting(!) And what kinds of formatting did they use? Almost entirely size and color (often both). Only very rarely were typographic attributes – such as bold, italics, caps or underlines – used, as indicated in this table:


Notice, for example, that varying font intensity is more common (occurring 9 times) than varying font weight (occurring 6 times). This is a bad design choice: fonts come in a wide variety of weights, which remain more legible than text of varying intensity:


Changing brightness reduces text legibility…


while changing weight maintains text legibility.

What makes this particularly interesting is that the researchers designing and coding these visualizations were spending long hours staring at code editors full of typographic cues, like the WebStorm snapshot — but then didn’t use any typographic cues in their own visualizations! Possibly they had become so accustomed to the typographic manipulation that they were no longer consciously aware of it; or perhaps, because it was situated in a different (non-visualization) context, the mental connection to visualization was never made (visualization research rarely talks about typography).

This is just one example of a kind of “design blindness” (like change blindness, but without the change – i.e. missing something that is clearly visible). What other cues are UI designers seeing but completely missing?


Visualizations of Many Variables

Underlying many of the posts on typographic visualization in this blog is the notion of visualizing many variables. Many modern visualizations tackle only a few variables beyond x,y layout. Treemaps use color and size; choropleth maps use color; tag clouds use size. Even Hans Rosling’s engaging TED talk is a bubble plot with only size, color and animation. Yes, parallel coordinate plots scale up to a dozen or more variables, but interactive techniques are often required to reveal the crooked path that corresponds to one particular item — it’s not feasible to trace it through a mass of cluttered lines. Furthermore, most visualization techniques work with homogeneous data, not messy heterogeneous data: different scales, different entities, differing quality, with no easy way to computationally join the disparate data together.

Tufte’s Envisioning Information has quite a few examples of the latter, and many more variants are available on the Internet. Maps with many layers, exploded or cutaway drawings, dance notation, graphic timetables and many other kinds of detailed charts:

Bertin considered these types of visualizations graphical inventories – not for visual perception of patterns, but rather a means to graphically organize a lot of related information. And so inventory-style visualizations are not commonly used today, because most visualization tools focus on easy, clutter-free perception of patterns.

But can inventory visualizations also be used for rapid preattentive perception of patterns? At the outset they may seem cluttered, but they can still use attributes such as color to draw attention quickly, while still providing all the detailed context. Presumably this use of visualization as inventory-plus-preattentive-highlights already occurs in real-time operational settings such as factories, pipelines and electric grids: the blinking box demands attention, but all the rich graphical detail around the blinking box indicates other assets that can be manipulated to rectify the situation. Something to consider for 2018.

Posted in Data Visualization | Leave a comment

Stem & Leaf Bigrams

Stem & leaf plots are quirky alphanumeric data visualizations. In typical usage they show distributions of numeric values. Here’s a simple dataset of heights of my family on the left, transformed into a stem and leaf plot on the right:


The resulting stem & leaf plot, at a macro level, shows a distribution. At a micro level, you can easily read off the minimum value (152) and the maximum value (187), and make a reasonable guess at the median value – or, in this case, read it directly: the median (172) is explicitly indicated with an added underline.
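The construction is mechanical enough to sketch in a few lines of Python – a minimal sketch, where the heights are stand-ins chosen to match the minimum, maximum and median above, not the actual data:

    from collections import defaultdict

    # Hypothetical heights in cm: only the minimum (152), maximum (187)
    # and median (172) match values readable from the plot above.
    heights = [152, 160, 168, 172, 175, 180, 187]

    stems = defaultdict(list)
    for h in sorted(heights):
        stems[h // 10].append(h % 10)  # stem = leading digits, leaf = last digit

    for stem in sorted(stems):
        print(f"{stem} | {''.join(str(leaf) for leaf in stems[stem])}")

Each printed row is one stem with its leaves packed side by side – the row lengths give the histogram-like distribution at the macro level, while the digits preserve the exact values at the micro level.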

Character-based Stem & Leaf Plots

So can stem & leaf plots be extended to plot text-based data?

Consider a simple example based on letter pairs. Bigrams are pairs of letters; analyzed across a large body of text, certain letter pairs turn out to be common in certain languages. This is useful for applications such as automatic language detection and for cryptography.

Since bigrams are pairs of letters, one can split a bigram into a first letter (the stem) and a second letter (the leaf). The resulting stem & leaf bigram plot for all the letter pairs that occur with frequency greater than 0.5% in the English language is shown here:


The stem (left side) is the leading character of the bigram; the leaves (right side) are all the trailing letters, with font weight indicating frequency. You can see that many of the most common English bigrams start with E. And you can see that ER, IN and TH are among the most frequent bigrams.
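Here’s a sketch of how the underlying grouping might be computed (corpus.txt is a placeholder for any large English text; in the actual plot, each leaf’s frequency would drive its font weight):

    from collections import Counter
    import re

    def bigram_stems(text, threshold=0.005):
        """Group letter pairs above a frequency threshold by leading letter."""
        letters = re.sub(r"[^a-z]", "", text.lower())
        counts = Counter(letters[i:i + 2] for i in range(len(letters) - 1))
        total = sum(counts.values())
        stems = {}
        # Iterate in descending frequency so each stem's leaves stay ordered.
        for pair, n in sorted(counts.items(), key=lambda kv: -kv[1]):
            if n / total >= threshold:
                stems.setdefault(pair[0], []).append((pair[1], n / total))
        return dict(sorted(stems.items()))

    text = open("corpus.txt").read()  # hypothetical corpus file
    for stem, leaves in bigram_stems(text).items():
        print(stem, "|", " ".join(second for second, freq in leaves))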

Unlike a stem & leaf plot based on numeric values, the trailing characters may also be of interest. So we shift the stem to the trailing character in a second stem & leaf plot, added on the right:


In the second stem & leaf plot, the stem (now on the far right) indicates the trailing character, and the leaves indicate the leading character. E is also the most common trailing letter. However, it is now a bit more visible that a trailing N occurs at fairly high frequency.

Stem & Leaf Trigrams

And the approach can be extended to trigrams. The plot below is centered on the second letter of high frequency English language trigrams:


The trigrams are ordered from the center out – so in the top row, the top trigram is HAT, followed by EAR, followed by WAS. The most common trigram is THE, and there appear to be many trigrams with T at the center. (Note: the dataset is from Practical Cryptography, http://bit.ly/1wf1Lgc, which didn’t consider word spaces.)

Word Bigrams as Stem & Leaf Plots

Text-based stem & leaf plots can go beyond characters as units and extend to words and phrases. I wrote a previous blog post regarding characters and adjectives from Grimms’ Fairy Tales, which was essentially a stem & leaf plot.

Another variant is to use people’s names as bigrams: the typical forename-surname pair is a word bigram. The forename-surname convention, common in the West, can be used to construct textual stem & leaf plots. Here are some of the families that were passengers on the Titanic. Surnames form the central stems – first class on the left half of this diagram (in a fancy serif font), third class on the right half (in a plain sans serif font):


In this example, leaves on the left side of the stems are women (italic); leaves on the right are men. Children are indicated in ALLCAPS. Those who perished are bold. For example, top left is Ethel Fortune, a first class adult woman who survived. In the same family are Mark and Charles Fortune, first class adult men who died. Among the first class, the women (along the far left) are almost entirely non-bold, meaning they survived. Among the first class men, there are many more bold entries (died), although many of the survivors are in allcaps (children). Thus we see, among the first class, that many women and children survived. In the right half of the plot, however, the third class is almost entirely bold: most of the third class perished – regardless of whether they were women or children.

Phrases as Stem & Leaf Plots

As a final example, we expand the approach to phrases. This example is based on the book of Psalms from the Bible, which has a number of repetitious structures. Here, we identify some of the most common phrases, indicated along the horizontal grey line. Above the grey line are the leaves that precede the common phrase; below the grey line are the leaves that follow it.


Interestingly, the phrase “I will praise thee” is common, but there is no commonality among the preceding and following phrases. However, “O give thanks unto the Lord” is typically preceded by “Praise ye the Lord” and followed by “for he is good”.
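Here’s a sketch of the underlying analysis – counting fixed-length word n-grams to find the most repeated phrases, then tallying what precedes and follows each occurrence (the four-word phrase length and the file name are assumptions):

    from collections import Counter

    def phrase_stems(words, n=4, top=5, context=4):
        """Find frequent n-word phrases and the phrases around each occurrence."""
        grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
        counts = Counter(grams)
        result = {}
        for phrase, _ in counts.most_common(top):
            before, after = Counter(), Counter()
            for i, g in enumerate(grams):
                if g == phrase:
                    before[" ".join(words[max(0, i - context):i])] += 1
                    after[" ".join(words[i + n:i + n + context])] += 1
            result[" ".join(phrase)] = (before.most_common(3), after.most_common(3))
        return result

    words = open("psalms.txt").read().lower().split()  # hypothetical file
    for phrase, (before, after) in phrase_stems(words).items():
        print(phrase, "| preceded by:", before, "| followed by:", after)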






Lost Works of Jacques Bertin on Typography

It’s been 50 years since Jacques Bertin’s Sémiologie Graphique was published. Bertin looms large in the history of both data visualization and cartography. Before we had textbooks on data visualization by Munzner (Visualization Analysis and Design), Ware (Information Visualization: Perception for Design) or even Spence (Information Visualization: Design for Interaction), Bertin provided the theoretical foundation that much of visualization relies on today. Sémiologie Graphique structures the design space of visualization with the now familiar concepts of marks (point, line, area); visual attributes (only six in his version); type of perception (quantitative, ordered, categoric and associative); and layout. Beyond these, Bertin also considered many other aspects, such as spatial separation (to form small multiples) or different use cases of visualization, including communication, analysis and inventory.

However, one aspect of Bertin’s work never made the translation from the French original to English: typography! Strangely, out of 450+ pages, only the 4 pages on typography were not translated. In these 4 pages Bertin discusses the importance of the literal information represented by text. He notes that text is often the only encoding commonly accessible to both the textual/verbal system of encoding information and the visualization system of encoding data.

Furthermore, Bertin points out that text is not selective. In other words, text is not preattentive: patterns do not automatically pop out. If I ask you to find the word “six” in the first paragraph, you need to scan linearly through the text – the benefits of visualization do not occur. Typographers would agree: they put significant effort into making text appear uniform, with no visual anomalies standing out, carefully tweaking letterforms and kerning pairs to achieve this effect.

However, typographers also understand the need to make some words visually pop out from surrounding text, and therefore provide forms of emphasis such as italics and bold. Bertin points out that these forms of emphasis are available and discusses them in the context of the technology of his time: pencil, pen, professional printers (which would have used phototypesetting in the late 60s) and dry-transfer lettering (e.g. Letraset). And he nicely itemizes the attributes of typography, available on page 415 of the French edition of Sémiologie Graphique:


Bertin’s font attributes included letter forms, font family, font width, spacing, size, weight, case, and slope (italic).

So, why was Bertin wildly successful, but his commentary on typography so minimal that it was dropped from the English translation? Good question!

One answer is that even though Bertin indicates the potential of text to represent data beyond the literal text, he says the incremental addition of text only helps low-level elementary reading, not the higher-level visual perception of patterns. However, Bertin was writing in the late 1960s: 15-20 years before Edward Tufte popularized the notion that a visualization can be read at many different levels depending on the task, which Tufte calls micro/macro reading (Tufte: Envisioning Information).


Another answer is that even though Bertin acknowledges typographic attributes such as individual letterforms, typeface, width, spacing, size, weight, case and italic, he doesn’t provide any examples of their use. On the other hand, he provides hundreds if not thousands of examples of the other six visual attributes (size, orientation, hue, brightness, texture, shape), making sure that his core concepts are well explained and illustrated. A parallel can be seen in open-source visualization libraries: there were many different open-source visualization alternatives in the early 2010s. However, Mike Bostock not only provided a well-organized library with D3.js, he also provided a lot of compelling examples of visualizations implemented in D3, with source code. Mike made it far easier to adapt and extend D3’s model by starting with examples, rather than requiring the extra effort to learn some other library and then figure out how to create those examples.

There are other possible reasons, but the unfortunate reality is that Bertin’s typographic insights were side-stepped and never exposed to the English-language research community. Bertin also wrote a follow-on article specifically on the visual attributes of type in 1980 (Classification typographique : Voulez-vous jouer avec mon A, doi: 10.3406/colan.1980.1369) – but again, no examples and no translation (it does provide a better organization of the typographic attributes).

In Sémiologie Graphique, Bertin made 100 different visualizations of a dataset indicating three major occupations across 90 departments in France. None use typographic attributes (although a few use simple plain labels). I decided to make one typographic example – here’s Bertin’s ternary plot (p. 115) where the bubbles have been replaced with text, sized in proportion to population and colored based on occupation proportions. You can choose to focus on the macro patterns (e.g. most districts have an agricultural bias, and most of the agricultural districts tend to have smaller populations); or you can focus on the micro details (e.g. district P (Paris) has the largest population and no agriculture; district 32 has the highest proportion of people employed in agriculture). (If you want more non-Bertin examples, see many of my recent posts on typographic visualizations.)


Ternary plot, based on Bertin, using alphanumeric codes instead of dots.

According to Google Scholar, there are 6004 citations for Bertin’s Sémiologie Graphique (across the French, German and English editions); while there is only ONE citation for Classification typographique : Voulez-vous jouer avec mon A. Perhaps this short article and references will help Bertin’s ideas of typography get more recognition and citations in future visualization and cartography research work.



Patent Visualization and Litigation Ratios

Earlier this week, Scott Langevin and I were fortunate to speak at the Strata Big Data Conference in NYC. The topic was Text Analytics and New Visualization Techniques. The talk discussed some of the examples on this blog and my research, and additionally showed these techniques applied as a front end to big data and text analytics in some large-scale, real-world applications from Uncharted.

One example was an extension to a visualization of patents. Understanding patent activity is of interest: patent activity is a leading indicator of new commercial opportunity and of areas where new skills and expertise are required. Patent litigation, in turn, is an indicator of problem areas where people need to be more diligent in research and more careful in crafting patents.

At Uncharted, we created a visualization of all the patents granted since 1982 as a massive graph. Patent applications refer to earlier patents; from these references, we can build all the connections between patents into a massive graph. Then we use a hierarchical graph layout technique so that patents that are highly interconnected are drawn close to each other (described here). The result is a visualization where each patent is a small transparent orange dot and the links between them are thin transparent blue lines (images courtesy Uncharted Software, used with permission).


Graph of all patents since 1982.

The graph layout nicely clusters patents into visible communities. The graph is labeled using two or three distinctive terms from the most heavily cited patent in each community.

As an interactive application, the viewer can zoom in to successively lower and lower levels to see sub-communities and sub-sub-communities. There are also additional features such as search, color-coding, trend analysis and so on. All these features are used to aid the viewer in the deep analysis of IP topics, growth areas, problem areas and so on. In this post, we’ll just look at one feature regarding litigation. In this next image, patents with litigation are colored with purple dots (labels turned off, so you can see all the dots).


Communities of patents, with purple dots indicating patents with litigation.

Clearly, there are various communities that have significant patent litigation. But the ratio of litigated patents to uncontested patents in each community is not clearly distinguishable. While each individual patent is visible as a dot, what’s needed is some way to indicate summary metrics for each community.

Rather than adding extra visual elements that clutter the screen, we can re-use a scene element that already exists at the aggregate community level — in this case, the labels. Following the techniques discussed in this blog, we use the oblique angle of the text to indicate the litigation ratio: text with a steep italic slope indicates communities with high litigation, upright text indicates normal litigation, and text with a reverse slope indicates little or no litigation.


Label oblique angle indicates ratio of patents under litigation in each community.
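Here’s a sketch of the idea, emitting SVG labels whose skew encodes the ratio (the mapping range, the “normal” threshold and the label text are all assumptions for illustration, not Uncharted’s actual implementation):

    def community_label(text, x, y, ratio, norm=0.05, max_slant=25.0):
        """Emit an SVG label whose slant encodes a community's litigation ratio.

        ratio: fraction of the community's patents under litigation.
        norm: assumed 'normal' ratio, drawn upright; twice norm (or more)
        gets the full forward slant, zero gets the full reverse slant.
        """
        t = max(-1.0, min(1.0, (ratio - norm) / norm))
        slant = t * max_slant
        # Negative skewX leans glyph tops to the right (an italic-like slope);
        # skewing inside the translate keeps the label anchored at (x, y).
        return (f'<g transform="translate({x},{y}) skewX({-slant:.1f})">'
                f'<text>{text}</text></g>')

    print(community_label("wireless handoff protocols", 120, 48, ratio=0.09))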

This is useful to know in advance when crafting a new patent related to a particular community: more care is likely required in a community that already has many disputes.

There are a half dozen other examples of text analytics and visualization in the full set of Strata slides, available here or at Strata.





Variable Fonts vs. Parametric Fonts and Data Visualization

I’ve typically used ready-made fonts to create the font-based visualizations in this blog. However, sticking to ready-made fonts means you get only one slope angle for italics, a few levels of weight, and maybe a condensed version, maybe not. Instead, the ability to quantitatively adjust font parameters such as weight, width or slope angle sounds far more enticing for visualization.

The good news is that fonts can be manipulated by the font user, and there are two different technical ways to achieve this.

Parametric fonts are programmatically defined fonts. Parameters such as x-height, stroke width and letter width are numeric values that can be set, with a new font then generated from those parameters. One early example of parametric fonts is METAFONT by Don Knuth, who enthusiastically claims:

“Infinitely many alphabets can be generated by the programs in this book. All you have to do is define the 62 parameters and fire up the METAFONT system, then presto – out comes a new font of type!” – introduction to Computer Modern Typefaces, 1986.

Knuth has a lot of low-level parameters, many of which affect only a few characters, such as dot-size (for the dots on i and j), ess (for the stroke breadth on s), beak, apex correction and so on. Here’s a snapshot showing some of his parameters, which looks much like an “anatomy of type” illustration:


Some of the font parameters in Knuth’s METAFONT.

Prototypo.io is a modern incarnation of parametric fonts, with click-and-drag sliders, interactive previews and feedback when you go beyond the expected parameter ranges, all in a browser. The starting point is a ready-made font with manipulation of 30-ish parameters, such as x-height, width, thickness, slant, serif width, serif height, bracket curve and so on. Quickly you can create a low-x-height, heavyweight, tightly-spaced, flared bell-bottom font. However, shifting some parameters into the red zone results in a fun font that isn’t particularly legible (e.g. a, 5 and 0 are filled in):


Manipulating Prototypo serif parameters into the red zone to create a heavy bell-bottomed font.


Quite a few of the font parameters are independent of one another, so they can be combined in a visualization to represent different data variables. Here’s one of Prototypo’s fonts where I’ve created 8 variants – boxy, heavyweight, condensed, bulgy serifs, wide serifs, low x-height and default plain – with all the pairwise combinations shown below:


Pairwise combinations of eight different font variants.

The parameters have ranges of values, so instead of only binary settings (e.g. normal weight vs. heavyweight; normal x-height vs. low x-height; etc.), a range of levels can be used. Here’s a font with five different levels of x-height and five different levels of curviness (think of curviness as an ordering from angular to rounded to boxy):


A font with 5 levels of x-height and 5 levels of curviness.

One problem with parametric fonts and browsers today is that each parametric variant needs to be saved in a font file: the 5 x 5 x-height-by-curviness font requires creating and saving 25 font variants, which is tedious; and then all these variants need to be loaded into the browser, which uses a lot of memory and can be slow. If we also wanted to add 5 weights, 5 oblique angles and 5 widths, we’d need to generate and save 5 x 5 x 5 x 5 x 5 = 3,125 font variants. This is problematic for visualizations that might want to encode 5 different data attributes, each with 5 levels, into labels.
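The bookkeeping for that hypothetical five-axis, five-level family makes the explosion concrete (the axis names here are just placeholders):

    from itertools import product

    # Hypothetical axes: five levels on each of five parameters.
    axes = ["weight", "width", "slant", "xheight", "curviness"]
    levels = range(1, 6)

    files = ["-".join(f"{a}{v}" for a, v in zip(axes, combo)) + ".woff"
             for combo in product(levels, repeat=len(axes))]

    print(len(files))  # 3125 font files to generate, save and load
    print(files[0])    # weight1-width1-slant1-xheight1-curviness1.woff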

Variable fonts, a new standard within OpenType 1.8, provide an alternative way for font users to interact with font parameters. Variable fonts provide linear interpolation between defined extremes created by the type designer. Here’s a font with variations in width, weight and optical sizing (from John Hudson’s variable fonts blog post). Green letters are defined by the font designer; orange letters are interpolated from them.


Variable font illustration indicating interpolated fonts (orange) from defined fonts (green).
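Conceptually, the interpolation is just a weighted blend of corresponding outline points in the designer’s masters. A minimal sketch of that idea, not any real font library’s API:

    def interpolate_outline(master_a, master_b, t):
        """Blend corresponding glyph control points between two masters.

        Variable fonts require compatible outlines: the same points, in the
        same order, in every master. t=0 gives master_a, t=1 gives master_b,
        and anything in between is an interpolated instance.
        """
        return [((1 - t) * xa + t * xb, (1 - t) * ya + t * yb)
                for (xa, ya), (xb, yb) in zip(master_a, master_b)]

    # Toy example: one vertical stem getting heavier between masters.
    light_stem = [(10, 0), (20, 0), (20, 100), (10, 100)]
    bold_stem = [(5, 0), (35, 0), (35, 100), (5, 100)]
    print(interpolate_outline(light_stem, bold_stem, 0.5))
    # [(7.5, 0.0), (27.5, 0.0), (27.5, 100.0), (7.5, 100.0)]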

Variable fonts are very new, so it’s exciting to watch the evolution, including browser support starting to appear.

Parametric and Variable Fonts in Visualization

Parametric fonts and variable fonts are interesting for data visualization for a few reasons:

  • You can potentially access low-level parameters, such as x-height, that you can’t normally access.
  • You have quantitative access to each dimension in a font: you’re not limited to only a few weights or two widths.
  • And these parameters can be combined together in new ways.

So what? Here’s a quick snap of the introduction of Mr. Weston from Chapter 2 of Jane Austen’s Emma. Interesting uncommon words have a high x-height, boring frequent words have a low x-height, and in-between words have an in-between x-height:


Words with higher x-heights are less common words in English.
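The encoding step might look like this sketch, mapping a word’s corpus-frequency rank to one of three x-height parameters (the rank thresholds, the x-height values and the tiny frequency table are all made up for illustration):

    def xheight_for(word, rank, common=1000, rare=10000):
        """Pick a parametric x-height from a word's corpus-frequency rank."""
        r = rank.get(word.lower())
        if r is None or r > rare:
            return 1.0   # interesting uncommon word: tall x-height
        if r <= common:
            return 0.4   # boring frequent word: low x-height
        return 0.7       # in-between word: in-between x-height

    # rank would come from a corpus frequency list; these values are made up.
    rank = {"the": 1, "was": 30, "emma": 4500, "vicarage": 24000}
    for w in "the vicarage was emma".split():
        print(w, xheight_for(w, rank))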

Not that this quick example is particularly readable, but it seems reminiscent of some of the formatting in Ronell’s deconstructivist Telephone Book:


Font size varied to create a rhythm separate from words in Ronell’s Telephone Book.

Ronell’s formatting isn’t constrained to whole words but runs across sequences of letters, creating a rhythm separate from the text. In general, being able to tweak and adjust font attributes opens up new creative possibilities and new kinds of data visualizations. Like Ronell, data visualization could apply font attributes such as x-height, width, curviness or others to subsets of words. Suppose you had a dataset indicating which letters in English words are silent – you could then draw words such that the silent letters are shown differently, say, with a low x-height:




Silent letters indicated with low x-height letters.

Those are just a couple of quick examples – many of the other font visualization examples in this blog could be adapted to better utilize some of these font parameters. And, of course, there are many other visualization possibilities yet to be considered.

Parametric vs. Variable Fonts?

So which is better – parametric or variable? I have used Prototypo to create a few variants and a few visualizations. I haven’t used variable fonts yet – but I like the specification.

Both parametric and variable fonts have limitations. As discussed earlier, with parametric fonts in current technologies, a font file needs to be generated for each variant required, which means managing a lot of font files and dealing with inefficient use of application memory.

With variable fonts, however, the font user has to rely on the variants created by the font designer. If no condensed/expanded variant was created by the designer, then that axis will not be available to manipulate. Linear interpolation of shapes can also run into issues: consider these three widths of Gill Sans (normal, condensed and extra condensed) – the a and g completely change shape, the openings on S, 3 and 5 get bigger, at some point the edges of the M start to slope, and so on. These sorts of sharp changes presumably won’t be captured in variable fonts. In theory, a parametric font might have rules to accommodate this, but that depends on how complex the parametric design rules are.


Gill Sans in 3 widths. Note how heights and letter shapes change with increasing narrowness.

I’m looking forward to experimenting more with OpenType variable fonts: they could make font manipulation in visualization much easier. I’m hoping variable fonts won’t go the way of multiple master fonts. A couple of things need to happen to give us a solid foundation for variable fonts. First, we’ll need browser and application support – and there is already some indication that there will be browser support in Chrome. Then, we’ll need to see font families created that support variable fonts. Ideally, these won’t be restricted to typical attributes such as weight or width; hopefully we’ll see variants with multiple x-heights, serif styles, slopes and other parameters. Then, on the data visualization side, we’ll need to invent new types of useful visualizations that use these new font capabilities.

Here’s hoping that variable fonts will become a well supported standard.


The Origin of Thematic Maps — and the problem with base maps

Why is there such a big gap between thematic maps and label maps? Both types of maps show data about places. Thematic maps typically use lots of color to show data about places, whereas label maps use a lot of labels to indicate the names of places – plus typographic formats such as bold, italics, caps and so on to show extra data about those places. Compare these two US maps (both from the National Atlas of the United States of America 1,2):


Left: Choropleth map of US counties with color indicating presidential vote in 2012; Right: Atlas map of US with various text labels and formats. Inset shows city labels using size and italics to indicate additional data.

On the left, counties are color-coded, indicating one data attribute per county (and tiny counties may not be visible). On the right, cities are indicated with text, population is indicated by text size, and italics indicate whether the city is the capital of a state or country.

Obviously, the two maps serve different purposes, but in both cases additional data about places is encoded into visual attributes. The difference between thematic and labelled maps is entrenched in our thinking about maps. In cartography textbooks (e.g. Tyner, Brewer), thematic maps are discussed in completely different chapters than labels: text labels aren’t considered thematic.


Perhaps it is useful to look back in time to figure out where this split first occurred. Thematic maps have been around for a very long time. Here’s a pair of maps from the 1850s. On the left is a thematic map by Minard (link) with circle sizes indicating shipments per port. On the right is a contemporary map from an atlas by Heinrich Kiepert (link) where city labels indicate information via text, font size, bold, underline and capitalization.


The First Choropleth Map

The earliest choropleth maps (according to Michael Friendly) are from Charles Dupin in 1819 (almost 200 years ago!), with an example shown on the left below (link). Simple grey shading applied across almost equally sized regions makes for a great image, showing a broad dark band across the center of France. Again, on the right is a contemporary map, this time by Carey and Buchon (link), and again this map varies typography, including spacing, capitalization, italics and size.


Crome’s Neue Carte von Europa

So where did Dupin get the idea for a thematic map? A big influence on Dupin was the German researcher August Crome. Below is Crome’s Neue Carte von Europa from 1782. This map shows where various commodities are produced across Europe.

You can see that Crome starts with a base map that has the labeling conventions of the time, for example italics for rivers, all caps for country names, and color to denote country boundaries. Then he adds on top of this map all the content related to his thematic investigation: different kinds of commodities. He displays these as symbols and letter codes (only a small portion of the legend and map is shown – the original map is here).

Base Map Pain

However, Crome can’t differentiate these symbols and codes from the base map using color, font size, case or italics, because those attributes have already been used in the base map. Even if he were to use them, they wouldn’t stand out: those formats would just be confused with the base map’s use of the same attributes.

Anyone who’s designed a map knows the pain of base maps: it’s really hard to make your data stand out when the base map is an already noisy and colorful Internet map or satellite image. And when you’re designing a thematic map, it’s nice to have the patterns in your data visually pop out. So Crome is backed into a corner and uses different symbols for commodities, as well as pairs of letters. However, all of these require perception of different shapes, and different shapes don’t visually pop out (i.e. shape is not preattentive; e.g. Scholarpedia on visual search, or Bertin).

So Crome’s proto-thematic map was highly popular, but there are no patterns to see: you have to inspect it closely and read all the labels. Dupin, instead, starts with a much simpler base map – outlines of regions – and his dataset is simpler too: just a single variable. As a result, he is able to use an attribute such as brightness or color. He adds labels too, but his labels are simple plain text, and they were easily skipped by later map makers.

What if…

Could Dupin have used text and typographic formats instead, like the other contemporary label-based maps of the time? It’s an interesting hypothetical question. Bold type has strong preattentive properties (e.g. Strobelt et al). But Dupin might not have known about or had access to bold type: it was invented around the same time as his map, on the other side of the English Channel (in the 1820s). And the first boldfaces were not available in the range of different weights that Dupin would have needed. Similarly, italics of varying slopes, or different styles of underlines, wouldn’t have been available to him. As a result, Dupin and his engraver used intensity, which was available to them – launching the split between thematic maps and label maps.

Here’s a thematic map using font weight instead (more examples of typographic thematic maps are in the paper just published for ICC, available here):


Six different levels of font weight are used to convey data.
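As a sketch of that encoding step (the six weight values follow the CSS 100–900 convention; the equal-interval breaks are an assumption):

    import bisect

    WEIGHTS = (100, 200, 300, 500, 700, 900)  # six levels, light to black

    def weight_for(value, vmin, vmax):
        """Quantize a data value into one of six font weights."""
        t = (value - vmin) / (vmax - vmin)
        t = min(max(t, 0.0), 1.0 - 1e-9)  # clamp into [0, 1)
        breaks = [i / len(WEIGHTS) for i in range(1, len(WEIGHTS))]
        return WEIGHTS[bisect.bisect_right(breaks, t)]

    for v in (0.02, 0.35, 0.5, 0.97):
        print(v, weight_for(v, 0.0, 1.0))  # 100, 300, 500, 900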

I wonder what Crome and Dupin would have thought?

