In Defence of Data-Dense Visualizations

I’ve done a couple of presentations with content from my book Visualizing with Text to grad classes in the last month. In both cases, a few people expressed concern about the complexity of some of the visualizations, particularly the data-dense ones. This is an argument I have heard on occasion throughout my career, as I am often involved in the design and development of data-dense visualizations for domain experts. Data-dense displays are not uncommon in domain applications – consider a few examples:

Figure 1: Some data-dense visualizations. Clockwise from top left, map of Abu Kebir, 1918; Earth Scientist’s Periodic Table of the elements and their ions, 2013; Financial trading floor desk, 2012; NYISO’s video wall of electrical grid, 2014.

These visualizations are packed with data. The map has many layers: roads, railways, rivers, canals, drains, buildings, labels, textures, etc. This periodic table has icons, symbols, numbers, names, pathways, ions, solutes, charges, etc. The trading desk shows a wide variety of information associated with securities: timeseries, events, color-coded news, internal data. The electric grid shows hundreds of entities: transformers, capacitors, hydro dams, wind farms, interchanges, power corridors of different voltages, status, thresholds, etc.

And, if you look at the Uncharted research website, you’ll see many more data-dense visualizations that our company has worked on.

Data density slows down understanding

The essence of the data-density criticism is that, with a greater number of data points and/or multi-variate data, the viewer can become confused as to where to look. There may be many different visual patterns competing for attention. Should the viewer be focusing on local patterns in a subset of the display, or macro patterns across the entire display? Should the viewer be attending to the color, or the size, or the labels? If each represents different data, then the semantics differ depending on the visual channel that the viewer is attending to. Worse, if the viewer starts to move back and forth between many different channels, they may forget the encodings and become more confused.

Some people have become conditioned to think of data visualization as what they see in the popular press: visualizations made for communication, or straightforward visualization types of the sort you might get with a library such as D3 or Vega:

Figure 2: Some common visualization techniques, not particularly data dense compared to Figure 1.

Viewers may be conditioned to think of these as encompassing all (or most) visualization types, with many articles organizing visualization into a limited palette of visualization layouts: periodic tables, chart choosers, lists, galleries, zoos, and more. These visualizations typically don’t have many data points (20-200), and typically show only a couple of variables. Data is homogeneous – you’re not looking at multiple datasets with different types of entities jumbled together. Answers are easier, because there really aren’t many different dimensions or layers to be considered.

But not all problems are simple.

Complex problems may need complex data

The images in Figure 1 show that multi-metric, data-dense visual representations exist in practice – in both historic visualizations and modern interactive visualizations. These complex visualizations bring together multiple datasets in layers, in many windows, in large displays – i.e. into data-dense representations.

This extra data is required because multiple answers may be possible. If the price of a stock goes down, it may be due to the overall market going down, a competitor dropping their prices, poor sales in the company’s earnings, a negative news story about the company, or other factors. The extra visual context facilitates reasoning across the many plausible causes to assess the situation. In the stock example, multiple causes can be true at once: an expert needs to see all of them and determine which are most relevant to the current situation.

In addition to quantitative data, there may be other facts and evidence: qualitative data, news, videos, and so on. There may be multiple perspectives to consider. There may be different time horizons to consider. (For example, the stock market collapse in 2008 was triggered by the collapse of Lehman Brothers on September 15, 2008; but months before, Bear Stearns (a competitor) had been acquired when it ran into funding issues, and even earlier some mortgage origination companies went bankrupt.)

More generally, wicked problems are not easily solvable. The problem can be framed in more than one way, different stakeholders have different timelines, constraints and resources are subject to change, and there is no singular definitive answer.

As a colleague tells me: complex problems have easy-to-explain wrong answers.

The Value of Data Density

Communication: There are many different reasons for creating visualizations. The low density visualizations in Figure 2 may be part of narrative visualizations, for explaining the results of an analysis. Data has been distilled to a few key facts.

Dashboard: Or, simple low-density visualizations may be part of an overview dashboard, with many small visualizations, each of which provides an overview of a different process and can typically be clicked on for more detailed analysis. These overviews only need to provide a sense of status: if there are any issues, then the viewer has workflows to access more detail.

Beyond the communication and dashboard uses, there are many other uses for visualizations, where density may be valuable:

Organization: For example, the map and the periodic table in Figure 1 organize large amounts of data. These many layers of data allow cross referencing between many different types of information. On the map, the user may need to know the location of buildings (objectives), roads (connections), canals and railroads (obstructions) in order to plan a route.

Monitoring: The market data terminal and the electric grid operations wall in Figure 1 provide real-time monitoring across many data streams. Many heterogeneous datasets come together into a single display. Time is of the essence in real-time operations. Detailed data can’t be hidden a few clicks away: all key information must be designed and organized for quick scanning and immediate access.

Analysis: Knowledge maps and network visualizations are often about analysis of complex data. SciMaps.org has 100 knowledge maps, each collecting and visually representing many facets of a particular corpus; such as Figure 3 left, an interactive visualization of the edit history of Wikipedia articles by Wattenberg & Viégas. Visual Complexity has 1000 network visualizations, such as Figure 3 right, a dynamic visualization of social networks indicating people, links, activity, postings, sequence and message age by Offenhuber & Donath. Both of these are visualizations about text over time: edits, exchanges, persons. In both cases there are many dimensions to understand and comprehend.

Figure 3: Knowledge maps and network visualizations.

Exploration: Data-dense visualizations aren’t limited to domain experts attempting to understand complex datasets for their jobs. In Dear Data, Giorgia Lupi and Stefanie Posavec create some awesome multi-variate data-dense visualizations of mundane day-to-day data. Why? Exploratory data analysis needs to consider lots of different data – the exploration is required to consider, assess, investigate, compare, understand and comprehend many different data elements. To do exploratory data analysis with only a well-organized quantitative dataset may miss much relevant data (e.g. see Data Feminism). Lupi and Posavec show by example that many different attributes can be extracted from everyday life and then made explicitly visible in an initial exploratory view.

Figure 4: Dear Data. Left, Lupi’s visualization on doors. Right: Posavec’s visualization on clocks.

Data Density and Visualizing with Text

The objective of the book is to define the design space of visualizing with text for all kinds of visualizations, simple or complex. Section 2 of the book deals with simple labels in visualizations such as scatterplots, bar charts and line charts: text can be used to make simple visualizations more effective. Section 3 of the book goes further, using multiple visual attributes to convey more data (Figure 5).

Figure 5. Some more complex visualizations from Visualizing with Text.

The Future of Data Density

Data density is likely to become a bigger issue in the next decade. Greater awareness of bias in data makes it more important to represent more datasets in a visualization. Analysis of richer data types – such as text, video and imagery – will likely necessitate new ways to layer in additional visual representations. Bigger data will have even more variables, requiring more ways to show more data, or risk summarizing too much useful detail out of the data. Specific application areas, such as cybersecurity, fake news, and phishing, must deal with ever more complex attacks, which implies more nuanced analyses based on more complex data.

Data density will become increasingly important to future visualization and visualization research.

Posted in Data Visualization, Text Visualization | Leave a comment

Visualizing with Text: author’s copy and new content

I just received my author’s copy of Visualizing with Text this morning! It’s awesome to finally hold the book after 2 years of writing (and the start of this blog 7 years ago!):

Here’s the book with the nice glossy cover.

Flipping through triggers some memories, like finding this user study on charts from 1961 comparing labels vs. legends! (Can you BELIV that there were user studies 59 years ago, before VisWeek? See Sarbaugh et al.: Comprehension of Graphs):

Label or legend? An experiment from 1961.

A larger effort specifically for the book is captured on this page. Since the book defines a design space for visualizing with text, I felt compelled to demonstrate the flexibility of the design space by creating many different visualizations of one document: here are 14 different visualizations of Alice’s Adventures in Wonderland:

14 different visualizations of Alice’s Adventures in Wonderland.

And, here’s a page on very different uses of visualizations (beyond using visualization for preattentive perception of patterns). On the left is a system diagram of a power grid (an inventory use that organizes all the assets in the grid, courtesy of ISO New England). Top right is an infographic by Nigel Holmes of a graph, where the edges are literal text implicating individuals (a communication use that distills days of testimony down to select statements, courtesy of Nigel):

Different uses of visualizations.

“Preview” is now working on the CRC Press site, and “Look Inside” is now working on Amazon.

Posted in Graph Visualization, Text Visualization | Leave a comment

Visualizing with Text – high-res figures now on-line

Visualizing with Text releases any day now: I hope to have my copy before the end of VisWeek. I’ve finally posted all the figures that I authored on-line with a CC-BY-SA 4.0 license. There are 158 high-resolution images and diagrams from the book in the PDF file. These may be a nice complement to the eBook or physical book, as some of the text may be too small to be readable in some of the larger visualizations. Since my figures are all released with a CC-BY-SA license, they can be reused, for example, for teaching, or mixed up into a collage, or whatever.

Some of the figures from Visualizing with Text.

There are another 99 figures that are not mine – I’ve included links to online versions of these images, where available, on the last page.

And links to many of the external figures.

Sometimes people ask me which of the visualizations I like the best. The answer varies over time, although I am currently biased towards the text-dense multi-variate visualizations designed for a large screen, such as these ones (Figures 6.19, 8.10, 9.8, 10.11, 11.3, 12.11) – see the PDF for high-res versions:

Some text-dense visualizations.

Why? Viscerally, I like the rich texture of shapes, colors, and structure where multiple patterns appear – visualization should support representing complexity and afford multiple interpretations. In my day-to-day work, I often design visualizations for financial market professionals: they don’t necessarily make money if they have the same ideas and same thesis as everyone else. Data-dense visualizations that prompt multiple hypotheses can be a good thing. (See also Barbara Tversky’s keynote at Visualization Psychology earlier today!)

I also think these dense visualizations push the boundaries of the design space of visualization and of text-visualization. Perceptually, multi-variate data can be a challenge. Data-dense visualizations can be a challenge. The linearity of text (i.e. you have to read words in some order) vs. the volume of information is a challenge: what happens to the global pattern? what happens if “overview first” doesn’t necessarily provide a macro-pattern?

A couple of these visualizations I just presented for the first time yesterday at the Visualization for Digital Humanities workshop in a paper titled Literal Encoding: Text is a first-class data encoding.

Posted in Data Visualization, Design Space, Text Visualization | Tagged , , , | Leave a comment

The transformation of Isotype

Over on EagerEyes, Robert Kosara recently asked: what happened to ISOTYPE? It’s a good question, with a multi-faceted answer. Here are a few facets, mostly focused on inherent limitations of the Isotype design approach:

1. What’s an Isotype line chart?

Isotype examples are typically bar charts and maps, as they show counts of things – i.e. what we now call unit visualizations. Extending the approach to other types of visualizations is non-trivial. Scatterplots shouldn’t be too hard (e.g. fruit, animals), but what about line charts? Timeseries data is important to plot for many analyses, yet what’s the Isotype answer for line charts? Typically Isotype reduces the timeseries down to a few time periods and draws them as its typical stacked bars. As such, Isotype is an approach to creating charts, but not a comprehensive system for all types of visualizations.

Some have tried: for example, see the Agricultural Outlook Charts from the USDA in the 1950s. Some of the bar charts in these publications are heavily influenced by Isotype, such as the coins in Figure 1, left. However, the line chart in Figure 1, right, struggles with icons: instead, the icons are limited to identifying the line and indicating the trend, with the horse representationally and quantitatively heading downhill.

Figure 1: Isotype is not well suited to a line chart with many time intervals.

Figure 2 shows another 1950s publication heavily influenced by Isotype: Midcentury White House Conference on Children and Youth, A Chart Book. On the left is a bar chart that could almost have been lifted straight out of Isotype, wonderfully clear. On the right is a pie chart infused with Isotype icons, where only by luck do the thin wedges of the pie fit the smaller icons assigned to them (and, strictly speaking, the pie segment with “both parents” should repeat that icon throughout its area, but that wouldn’t quite work either).

Figure 2: Isotype is also not well suited to this pie chart.

2. What’s the icon for GDP or CPI?

Icons for concrete real-world objects can be easy to design, as the real-world object can be the basis for the icon, e.g. people, fruit, animals, tractors, and so on. It gets trickier when some of those categories are visually similar: Isotype never created separate icons for wheat, barley and rye, for example. And Tufte’s log animal chart has three rather similar-looking small furry mammals, which can only be definitively decoded by referring to the labelled scatterplot on the previous page (exercise for the reader to now go find the previous page :-)

After that, icons get difficult to design for abstract concepts. What would the icons be for financial asset classes such as stock, bond, credit default swap, collateralized debt obligation, repo, option, future and forward? Even concrete real-world entities can be misinterpreted, such as the famous dogcow.

Making simple, expressive icons requires design effort. Gerd Arntz’s wonderfully expressive icons helped drive the success of Isotype, but they are beyond the design capabilities of the average non-designer. Arntz could create an icon for the abstract concept of unemployment with a human figure, looking down, at rest, hands in pockets: it’s brilliant, but not easy to design, especially with such clean, clear graphical shapes that can be easily printed.

3. And what about the axes (and the values)?

Perhaps the most audacious move of Isotype is the removal of the numeric axes. Isotype charts are beautiful with their clean depiction of icon stacks and graphical cues. Removing the numeric axes is brilliant because you can easily and visually compare ratios of different stacks, e.g. one stack of icons is twice as long as another.

But what if you want to know the values?

It’s a common task to want to know what number a stack of icons represents. Unfortunately, Isotype makes it hard. You have to first count the number of icons. Then you have to find the legend, which tells you how many items one icon represents. So, for example, if I look at Eheschliessungen in Deutschland (Die Bunte Welt, 1928, page 42), I see that in 1919–1922 there are 8 marriage icons and each icon represents 400,000 marriages, so 8 × 400,000 = 3.2 million marriages. That’s math. That’s cognitive effort.

Furthermore, when the design system uses a small number of icons, it’s not very precise. Isotype tends to use full icons or nice fractions such as 1/4 or 1/2 of an icon. The prior stack of 8 icons could easily represent 3.1 million or 3.3 million.

If the chart had a numeric axis, you could just scan it and estimate the number directly – much easier. Or you could put the number directly in the chart. In Figure 3, the same marriage chart from Isotype is replicated with US data in the Midcentury White House Conference on Children and Youth, with the addition of quantitative values at the end of each icon stack:

Figure 3: Isotype has only the icons and the legend: if you want to know the value, you can estimate the value by counting icons and multiplying. In this derivative of Isotype, you can just read the number.

4. Good Isotype is hard

Often simple designs are the result of hard work. Simplicity takes effort. In The Transformer: Principles of Making Isotype Charts (Hyphen Press, 2009), Marie Neurath’s first-hand account describes the design task of transforming data into an Isotype representation (what we might now refer to as encoding). Marie explains a myriad of design decisions made in different charts to get the desired reading of the result. For example, coffins are replaced with tombstones to address the issue of relative size of adjacent icons and potential misinterpretation. Or, the width of an adjacent bar is doubled so that relative proportions can be perceived. And so on. These are non-obvious design solutions, arrived at through a design process, that may seem obvious in retrospect. (Unfortunately, the images’ copyright status is uncertain.)

Has Isotype really disappeared?

The prior four points focus on limitations that make it hard for Isotype to extend more generally across data visualization. I don’t even address points such as modernism (which Robert Kosara has previously talked about) or technical changes (by the late 1950s phototypesetting became the norm, but the technology tended to soften edges and fine detail, so crisp icons may have been difficult to reproduce – I discuss some of these other factors, which also limit the use of text, in my forthcoming book Visualizing with Text).

That said, I believe that unit visualizations have inherited the legacy of Isotype. And there are some fantastic unit visualizations:

  1. Photos: Instead of the effort of designing icons, why not use photographs of the entities of interest? A favourite is Münzkabinett’s interactive piles of coins (Gortana et al).
  2. Shapes: Less expressive than photos, simple shapes are a good choice for unit visualizations with hundreds of thousands of items, such as SandDance.
  3. Labels: I’m personally interested in the use of labels in unit visualizations, such as this example of the passengers on the Titanic. I’m not the first; there are more compelling examples such as Maya Lin’s Vietnam Veterans Memorial, which in turn was based on earlier lists of casualties.
  4. Physical units: Physical visualizations are well suited to using units. These include specifically designed physical unit visualizations using concrete scales, as well as examples in the real world where the units can be perceived individually or as part of a whole, such as the fields of WWI crosses.
Posted in bar chart, Data Visualization, Isotype, Line Chart, Pie Chart | Tagged , , | Leave a comment

Visualizing with Text – Interactive Demos

Over the summer, eight of the examples from Visualizing with Text have been re-implemented as Observable notebooks! These can be interacted with, code is visible, and code can be edited live! All the demos can be accessed from the supplementary website.

They are all text-intensive visualizations with some interesting datasets, such as countries in the world where more people are dying than being born, countries with big jumps in Covid unemployment, and coroners’ reports indicating many ways to die in Georgian London.

Manipulating Code

For the rest of this post, I’ll focus on one demo: word skimming. This demo processes paragraphs of text, ranks words based on frequencies compared to a large corpus (Wikipedia), then weights the words so that least common words are formatted to stand-out. This formatting facilitates perception of uncommon words, which is a taught strategy for skimming text:

While skim formatting may seem strange in prose on a computer screen, the technique has existed for centuries, in instruction manuals, advertisements, comic books, and even software code editors, as shown in some of the examples in the book.

In this particular demo, the viewer can easily cut and paste a completely different text to format for skimming. They can also select from a few different formats, such as adjusting font weight, font width, or x-height. Here’s the same demo using the opening paragraphs from The Decline and Fall of the Roman Empire and formatted using both font weight and x-height to indicate uncommon words, such as valor, emperors, Trajan and Hadrian:

Variable width and variable x-height are uncommon in most prose text formatting. This demo uses the relatively new technology of variable fonts. With variable fonts, the font designer can expose properties, such as weight, width and x-height as quantitative parameters. These parameters are then adjusted based on a simple algorithm which splits apart words (using the JavaScript library compromise), calculates word rank (based on word lists from Wikipedia), and then adjusts the variable font parameters accordingly.
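The core of such a pipeline can be sketched in a few lines of plain JavaScript. (The names wordRanks, weightFor and formatWord, the tiny rank table, and the thresholds below are hypothetical illustrations, not the notebook’s actual code.)

```javascript
// Assume wordRanks maps lowercased words to their frequency rank in a
// reference corpus such as Wikipedia: higher rank = less common.
// (Tiny illustrative table; a real one would have thousands of entries.)
const wordRanks = new Map([
  ["the", 1], ["of", 2], ["valor", 18000], ["emperors", 12000],
]);

// Map a word's corpus rank to a variable-font weight: common words stay
// light, uncommon words become heavy and stand out for skimming.
function weightFor(word) {
  const rank = wordRanks.get(word.toLowerCase()) ?? 50000; // unseen = rare
  const clamped = Math.min(rank, 50000);
  return Math.round(100 + 800 * (clamped / 50000)); // weight in [100, 900]
}

// Apply the weight to a word's element via a variable-font axis.
function formatWord(span, word) {
  span.style.setProperty("font-variation-settings", `"wght" ${weightFor(word)}`);
}
```

The same weightFor mapping could drive any exposed axis – width, or a custom x-height axis – which is essentially what the demo’s format drop-down switches between.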

Since this example is implemented in Observable with all code editable, it’s straight-forward to modify the code and try other visual attributes that aren’t used in the demo code. For example, here’s the opening paragraph from Moby Dick, with uncommon words highlighted in red:

To highlight the words in red, a single line of code can be added (without worrying about adapting the drop-down menu, configuring d3 scales, etc.). In this case, the following was added to the update cell:

// elements[0..n] holds one HTML element per word; d.rank is that word's corpus rank
elements[i].style.setProperty("color", (d.rank > 5000) ? "red" : "black");

Essentially, if the rank value is over 5000, color the word red, else color the word black.

Instead of a red/black toggle, we could easily add a d3 scale and use a color ramp, so that colors range from black, through blue and purple, to red. (Note: no SVG is used – this is all HTML. D3 is only used in this code to facilitate scaling numeric values into ranges that suit the variable font used.) In the snapshot below, a color scale is used in addition to font weight and x-height on text from Frankenstein:
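As a sketch of that idea, here is a hand-rolled piecewise black→blue→red ramp over word rank (the thresholds and colors are illustrative, not from the demo); d3.scaleLinear().domain([0, 20000, 50000]).range(["black", "blue", "red"]) would interpolate the same way:

```javascript
// Map a word's corpus rank to a color: common words stay black,
// mid-rank words shade toward blue, rare words toward red.
// Thresholds (20000, 50000) are illustrative only.
function colorForRank(rank) {
  const lerp = (a, b, t) => Math.round(a + (b - a) * t);
  const mix = (c1, c2, t) =>
    `rgb(${lerp(c1[0], c2[0], t)},${lerp(c1[1], c2[1], t)},${lerp(c1[2], c2[2], t)})`;
  const black = [0, 0, 0], blue = [0, 0, 255], red = [255, 0, 0];
  if (rank <= 20000) return mix(black, blue, rank / 20000);
  return mix(blue, red, Math.min((rank - 20000) / 30000, 1));
}
```

Each word’s color would then be set just like the red/black toggle above, e.g. elements[i].style.setProperty("color", colorForRank(d.rank)).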

While feasible, this is not necessarily recommended. The many changes of hue, weight and x-height create many distractors, which reduce the readability of the sequential text. As a design effect, it may be an objective to create disruptive formats, such as Ben Fry’s wonderful Frankenfont.

On the other hand, if the goal is skimming, then it is desirable to easily read the immediate surrounding context to aid comprehension. Changes across many formats reduce typographic consistency, requiring greater attention to decipher the surrounding text. Typographers discuss this as readability, which is described briefly in my book and in more detail in typography books. One must take care when manipulating typographic attributes not to create Frankenstein paragraphs, unless that suits the particular task or objective.

Posted in compromisejs, Data Visualization, Observable Notebook, Text Skimming, Text Visualization, Variable Fonts | Tagged , , , | Leave a comment

“…and what is the use of a book without pictures or conversations?”

My book Visualizing with Text is nearing publication (early November!). One goal of the book is to appeal to both researchers (with structured text and logical arguments) and designers (with many examples and pictures). I really like books with lots of images. Like the quote from Alice’s Adventures in Wonderland in the title, I prefer well-illustrated books. And I don’t like reading about visualizations where you only get little teeny snapshots: after all, the primary subject is visualization!

Text is good to explain, structure and provide context, but pictures are important to a designer: to see how the conceptual description is actualized; to see all the little design decisions that contribute to the whole; to see the anomalies and things that could be improved; to provoke design inspiration; and so on. Arguably, Jacques Bertin was successful with Semiology of Graphics 50 years ago because he presented a theory supported with many examples and illustrations. My ideal goal is a 50/50 split between text and images. But measuring the ratio of text to images is a bit tricky:

Counting images

It turns out that the publisher and I count images differently. We can easily agree on page count (274), but my publisher currently counts 146 illustrations and I count 250 images (+/- a few to be sorted out). What? How can we differ so much?

I think the publisher is counting image captions. On the other hand, I’m counting the individual images that might be referred to in a single caption. For example, here’s Figure 1.3: it has a single caption but two different images – on the left a book from 1497, on the right a different book from 1589. I count this as two images (they come from two completely different sources).

Then, there are parts of the book where I discuss using visualization techniques inline with text. Since I’m advocating inline visualization, these are simply text richly formatted within the flow of the prose. In this case, there is no image caption, so they don’t count at all in the publisher’s tally, whereas I count each as one image (I had to write the code to crunch the data to format the text).

But there are more nuances. What about a couple of words formatted within written text simply to facilitate cross-referencing to an image? I don’t count those. What about an image that’s a composite of many teeny snapshots? I count those as one image. And what about a table with some visualization formatting applied to each cell based on data attributes? That might count as a table and not an illustration – but again, I had to write the code to crunch the data to format the text, so I count it as an image.

So, overall, I get to around 250 images, out of 274 pages. Given that the Table of Contents, Preface and Index don’t have images, that gets close to the 50/50 split.

Types of Pictures

Returning to the topic of pictures, a few other stats are useful. The pictures split approximately 50/50 between my own and pictures of other visualizations. I think it’s good to provide a grounding based on real-world precedent. For the pictures from other sources, I’ve tried to include URLs, so readers and educators can easily find the online versions where available.

With regards to my pictures, some are new unique visualizations, and some are sequences, such as showing the same visualization using different attributes, or zoomed in, and so on. There are about 80 different examples of visualizations with text that I created for the book. Some have been published before, on this blog or in research papers.

But I wanted to create some new content (why buy a book with only old pictures?). So there are new, unpublished visualizations making their first appearance in the book, including scatterplots of cars, an adjacency matrix of dialogue, a syntax diagram, a massive textual stem & leaf diagram, assorted tables with visualization characteristics, a data comic, some expressive lengthening examples, and a topic model visualization. Here are eight examples taken from the draft:

Larger versions of these pictures will be available when finalized and placed on the publisher’s website with a CC license shortly before the book is released.

Posted in Data Visualization, Text Visualization | Tagged , , , , | Leave a comment

Which Font Should I use in my Visualization?

Yesterday the Data Visualization Society hosted a Fireside Chat regarding typography and visualization, which was fun to participate in. There were too many questions to answer them all. One question with many variants was: “Which font should I use in my visualization?” The answers noted that there isn’t any one font – it depends on the use. In this post, I’ll list a few that I tend to use and why, plus a few caveats.

Small text

For things like tick labels or labels in the plot, I tend to use a font that will be robust on screen at a small size: it needs to be legible. This is not the place for a “display font” (fine serifs, funky letterforms). Use a workhorse font, such as the sans serifs heavily used in mobile design: Roboto, Source Sans Pro or Segoe. A very close second is a slab serif. Slab serifs are chunky, so they can work well at small sizes. Two that I like are Rockwell and Roboto Slab.

Top 250 words associated with one or more emotions.

Data driven text

I like to use data-driven text in visualizations. Like labels in maps, type can express data values not only by varying size, but also by varying attributes such as font weight, width, typeface and so on. Much of this blog has examples of data driven text, such as the emotion word graph above, as well as my upcoming book Visualizing with Text. Here’s a sample of type attributes that can be data driven:

Data driven font attributes

Even though the row “Typeface” shows some rather funky fonts, for data driven fonts, I tend to stick to a small number of different typefaces that can be readily distinguished. Readily distinguished means that each font should look different from the other fonts used but still work at small sizes. Again this rules out display fonts. I might use a mix such as a sans serif with a high x-height (e.g. Source Sans Pro), a slab serif (e.g. Roboto Slab), a serif with a low x-height and humanist letter forms (e.g. Garamond; or maybe a high stress serif, such as Bodoni), a blackletter font (no current favorite, avoid ornate ones, Lucida Blackletter is OK), and maybe a handletter font (again, avoid ornate ones, I like Tekton Pro: verticals are vertical and not sloped). Here’s a snapshot so you can see how different some of these fonts can be:

Examples of different fonts for categoric encoding.

When encoding quantitative values into text, the most common approach in maps is to use small variations in size, or variation in font weight. For weight encoding, you need a font with a large range from lightweight to heavyweight; again, Source Sans Pro and Roboto offer a wide range of weights, as do many variable fonts. Some fonts also offer variation in width – in this case I might use Saira, which has many weights and many widths, but there may be better variable font choices now. Variable fonts are also better suited for the web: instead of downloading all 36 weight and width combinations as separate files, a single variable font can be downloaded and then configured in CSS.
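As a sketch of this kind of weight encoding, the snippet below linearly maps a data value onto a weight range and emits the corresponding CSS declaration for a variable font. The 'wght' and 'wdth' tags are the registered OpenType variation axes; the particular ranges are my assumptions, chosen to stay legible at the light end:

```python
def weight_for(value, vmin, vmax, wmin=200, wmax=900):
    """Linearly map a quantitative value onto a font-weight range."""
    t = (value - vmin) / (vmax - vmin)
    return round(wmin + t * (wmax - wmin))

def variation_css(weight, width=100):
    """CSS declaration for a variable font, using the registered
    'wght' (weight) and 'wdth' (width) variation axes."""
    return f"font-variation-settings: 'wght' {weight}, 'wdth' {width};"
```

With a variable font such as Saira, one declaration per data point replaces dozens of static font files.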

Titles and Subtitles

Titles and subtitles are generally larger. This gives you more options. Often titles and subtitles may contain a sentence or two. Readability is a consideration and serifs are often considered more readable. I tend to like to use slab serifs (e.g. Roboto Slab) or a geometric sans (e.g. Gill Sans or Lato) for titles. Geometric sans tend to use simple geometric forms, such as a perfect circle for the letter o, which tends to make them wider than other sans serifs, which is why I don’t use geometric sans within the visualization.

Caveats?

There are always caveats. If you’re creating a visualization where the labels use codes, such as airline flights (e.g. AC123), bonds (e.g. IBM2.5-250515), airline reservation codes, etc., make sure that the numbers and letters are clearly distinct – for example, O0 or I1l may look too similar (e.g. in Titillium Web). This is a real problem in many displays, such as air traffic control, electric grid operations, financial market screens, and just about any modern app where items refer to ID codes. The font B612 was specifically designed to maximize these differences while remaining usable at small sizes in visual displays. Also note that many monospaced fonts are designed to accentuate these differences, such as Inconsolata.
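As a quick check when choosing a font for ID codes, it can help to flag the characters that fall into commonly confused glyph groups and then inspect those glyphs in the candidate font. This is an illustrative sketch (the groups below are common examples, not an exhaustive standard):

```python
# Glyph groups that look alike in many typefaces; codes containing
# these characters deserve a font (e.g. B612, Inconsolata) that
# clearly distinguishes them. Illustrative, not exhaustive.
CONFUSABLE_GROUPS = [set("O0"), set("Il1"), set("S5"), set("B8"), set("Z2")]

def risky_chars(code):
    """Characters in an ID code that belong to an easily confused group."""
    return {c for c in code if any(c in g for g in CONFUSABLE_GROUPS)}
```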

Posted in Data Visualization

Designing a Book Cover (or the long history of text on paths)

Note: I will be speaking at the Data Visualization Society (DVS) Fireside Chat on Typography for Data Visualization on Wednesday June 24th.

After two years, my book Visualizing with Text is getting close to publication. Finally, it is time to design the cover! About a year ago, I designed a placeholder “cover image”. It was procrastination: I should have been writing content, researching, tracking down copyrights and preparing images.

I decided the initial placeholder image should be something that indicated both the history of representations that manipulate text and the modern, new visualizations I was creating, inspired by some of those historic images. The book has a lot of different visualizations, so I thought of a potential collage, perhaps focusing on a set of images from just one or two techniques. I’d always received strong positive feedback whenever I showed text-on-a-path for social media visualization, so I focused on that technique. Furthermore, showing conversational text as text-on-a-path has a long history, so there were lots of fun images available to use, ranging from medieval paintings and comics through to my own visualizations. Then I made a quick placeholder image with some text, images and an axis:

The placeholder book cover.

With the interior of the book submitted in April, it became time to focus on other aspects of the book, such as endorsements, cleaning up images, cleaning up code, and the real cover! One of the early reviewers of a draft version of the book was John D. Berry, a typographer and designer I’d met at a conference during my PhD. John graciously offered to create the cover and I jumped at the opportunity to work with him – I really like the modernist design sensibilities in his portfolio, and I welcomed the chance to collaborate with a designer whose expertise far surpasses my own. We would need to follow the publisher’s template, given the book is one in the AK Peters Visualization Series edited by Tamara Munzner (which I am honoured to be a part of).

AK Peters Visualization Series

John created many different book covers for consideration: some based on content from the book, some based on contemporary typographic art, and some using historic images. John recommended an abstract approach, suggestive of the interior content and using large images, so that the cover might stand out both on a shelf in a bookstore and online in a browser. That matched my own preference for high-contrast, clean modernist designs.

Potential book covers.

I really liked some of the covers based on contemporary typographic art. But we didn’t have much time, nor the budget to license one of these, so we decided to explore the historic image route.

I had provided John with a few dozen historic text-on-path/spoken-text images, plus a few variants of my text-on-path visualizations of social media and news headlines. Historic images included late-Gothic scenes with banderoles (a scroll extending from a character, indicating spoken text), such as the monks (above left), colorful paintings, and many examples in block-books from the mid-1400s:

There are very many examples of text on path over centuries.

I’ve also used comic book examples a number of times in my analysis, as comic artists use type expressively: twisting it along paths, varying font styles and so on. Looking back through comics history, there are great examples of text at all angles in bubbles in the work of early caricaturists such as Thomas Rowlandson – for instance, the example I’d used in my placeholder, as well as the hundreds of others Rowlandson produced. John explored the Rowlandson images and found these emotional characters:

Rowlandson’s characters’ strong reactions!

At one point, I riffed on one of John’s designs and my original placeholder design to create an over-the-top collage: many different examples of text on path from many different time periods, pasted over top of maps from many time periods. My mockup stretched across both covers. But there are some aspects of cover design that it doesn’t really address: at the end of it all, the cover needs to be meaningful at postage-stamp size for the person browsing books online. The trouble with big collages is that they invite long viewing but don’t necessarily provide a quick answer at a glance. More effort is required to decode the mix of elements, separate foreground from background, and so on. In design, a poor result can be a good thing – it means we don’t want to explore further in that direction.

Over the top collage

John went in a different direction, discarding the strictly linear layout and taking into account all the required design elements, and came up with a much stronger design. Each image is much more tightly cropped, retaining just enough of each. It plays with the spoken Gothic/Georgian text rising from the bottom, going through the title box, wherein it transforms into the new, colorized social media text emanating above. The title box “Visualizing with Text” transforms the input (historic representations of spoken text) into output (new visualizations of social media text). Much like how I put typography to work in my book, John put the title block to work.

With the x-axis gone, the bottom portion of the cover is free to express something different: in this case, a new visualization from the book showing the stems of root words, a kind of foundational language inspection underlying the speakers above. Perhaps Rowlandson’s two fearful characters should be afraid of both the title and the foundations.

Close to final cover.

Readers may also notice that John changed the font in the title. The prior books in the series use ITC American Typewriter, a font that hints at typewriters, which, in turn, hints at the monospaced fonts prevalent in computer code (and thus books about computer science). John and I wanted something punchier. The challenge with many typewriter fonts is that they tend to be fairly lightweight: note how difficult it is to get Courier to stand out on a slide with mixed fonts. John instead recommended Dattilo, a newer, heavier-weight font with a typewriter feel (“dattilografia” is Italian for “typewriting”), i.e. the same typewriter spirit, but heavy.

Overall, we ended up with a meaningful, punchy cover that hopefully engages the casual viewer browsing a book website. Maybe they will judge this book by its cover?

Posted in Data Visualization, Line Chart, Microtext, Text Visualization

Text and Visualization Workshop at ESAD Valence

I had the good fortune to be invited to speak at a workshop late last year at the ESAD design school in Valence, France:

ESAD Valence. It’s an awesome roof!

The workshop was titled Sous le texte la carte: la visualisation du texte en cartographie (“Beneath the text, the map: text visualization in cartography”). Although the title focused on text and cartography, the presentations were a bit broader, extending to visualization and other applications.

Before the start of the workshop, I was invited to a design review for a variety of student projects using interactive type. I was expecting to see some videos or maybe some Processing sketches: instead, it was all HTML5 + JavaScript. As it was explained to me later, there are no jobs for Processing – employers want JavaScript – so the school has shifted much of its interactive typography teaching to JavaScript. Projects experimented with techniques such as interleaved text, animated blurs, superimposed scrolling text, and interactive hierarchies within dynamic layouts.

Interactive type projects by students at ESAD Valence.

As for the workshop itself, there were a number of good presentations; however, my French isn’t great, so I wasn’t able to follow the discussions closely. Here’s a great slide regarding typography on historic French maps by Jean-Luc Arnaud (http://www.cartomundi.fr/site/#): note the use of different sizes, all-caps/lowercase and italics to create an ordering of labels for use on different maps.

Labels for maps varying in size, capitalization and italics.

Jean-Luc also presented some of his contemporary typographic maps. Not quite like the Axis Maps typographic maps that some readers may be familiar with, these maps superimpose text over other text and don’t repeat labels:

Small portion of one of Jean-Luc Arnaud’s typographic maps.

This was followed by a highly interesting presentation by Anais Déal on the use of standardized symbols on shipping navigation maps. Since these are important navigational aids, one would hope that the international symbols would be consistently implemented by the various national map makers. Unfortunately, they are not. Here are some examples:

Standard international symbols on marine navigation maps don’t quite follow the standards.

Sophie Boiron and Pierre Huyghebaert showed some heavily labelled historic maps, followed by this fantastic typographic map they created. At a distance, it’s a map of Brussels (left). Zoomed way in, each block is a sentence of text (right):

Boiron and Huyghebaert’s thematic map, with each polygon made of a descriptive sentence.

I find this example particularly compelling from a text visualization perspective. One can imagine using the same technique with choropleth maps, cartograms, treemaps, hierarchical pies or any space filling visualization technique. At a macro level, the areas are highly visible and you can use color to indicate a thematic variable. At a micro level, you’ve got detailed text — not just labels, but the opportunity for explanations, descriptions, details and even a few icons.

Antoine Gelgon and Pierre Huyghebaert presented an extremely detailed analysis of the variation in the lettering of the famous Belgian comic Gaston, going deep into the technical constraints of the pens used, touch-ups with whiteout, and so on. Then, most interestingly, they created a parametric font following the same approach as Don Knuth’s Metafont. The result is a set of tweakable parameters to create computer-generated hand-lettered text for future comics and, presumably, merchandise:

Gelgon and Huyghebaert’s parametric font for recreating lettering for the comic Gaston.

The final presentation addressed the very important topic of type legibility in visualizations and, more broadly, user interface design. Specifically, the design task was to revise the font used in displays in aircraft and air traffic control systems. The presentation showed a number of interfaces with various issues, such as low contrast, glare and other real-world operational problems with the existing displays:

Left: visual display in cockpit under ideal conditions. Right: same display with glare.

Furthermore, the existing font had the potential for confusion, as the displays often had codes combining alphabetic and numeric characters. Through detailed user testing, the design team identified the most confusable glyphs (e.g. B/8) and iteratively designed a new font to minimize these issues, suitable for use on industrial display screens even at low pixel density. The result is the font B612. A subset of the font is freely available for download (e.g. on Google Fonts).

Left: example glyph confusion matrix. Right: example design adjustments to reduce confusion between similar shapes.

All in all, a workshop highly relevant to visualization, covering text visualization issues ranging from interaction techniques and novel layouts to parametric text and type legibility. And Valence is a pretty town, worth adding as a stop if you’re visiting southern France. Here are a couple of tourist photos of the Saturday morning market and a typographic sculpture:


Posted in Data Visualization, Design Space, Legibility, parametric fonts, Text Visualization, Thematic Map

Organizing a visualization book

I’d previously created a book with David Jonker, Graph Analysis and Visualization, in 2015. It was a lot of work. With lots of visuals and text, a word processor is pretty good for seeing a page or two, but you don’t see the whole thing. To get a sense of the book, I printed out a rough draft and taped it up on the wall of my basement. It helped a lot in figuring out how to move things around.

Similarly, over the course of my thesis and my upcoming book, Visualizing with Text, due in October 2020, I wanted a better sense of how everything fit together, not just page-by-page views. This time, however, I invested in a 32″ 4K monitor. I could look at, and read, 12 pages at a time. That was good for working on chapters and sections, to see how groups of images worked together. An unexpected side effect was that this large monitor let me sketch out many different alternatives on screen, to help bring together and organize many aspects of the work.

Paper Outlines and Sketches

Before reworking everything on screen or in printouts, the process often starts with some lists and diagrams on paper. I don’t have many of the rough scribbles, as loose paper tends to get thrown away. Here are a few sheets that haven’t hit the trash yet:

Reorganization notes on paper.

Organizing the Design Space

The crux of the book (and the earlier thesis) relies on creating an organized design space for all the bits and pieces of text used within visualization. Over the last 7 years, pieces emerged by looking at historic examples, talking with experts in different fields, creating bits of code and noting what worked or failed, and so on. The organization of these pieces into a whole was emergent: it was not a linear process. There were false starts, things that sort of worked but not quite, and, even once the organization got close to its final form, many tweaks and variants. I spent many weekends over many years on a few diagrams that organized everything about text and visualization. In effect, these iterative diagrams represent a research-through-design process. The effort for these diagrams surpassed the writing effort associated with a chapter.

Here’s a diagram of the many iterations, as a timeline, where you can see some historic starting points, successive iterations, and a few dead ends. Near the end, the diagrams become bigger and more complex: more ideas can be explored on a 4K screen than by paging through many smaller views.

Iterations of the design space diagram.

One recent dead end is labelled “everything” in the above diagram. It attempts to fuse the entire process into a single diagram. The left page in the photo has notes regarding text interaction, the research sources, and their relation to the everything diagram. In making it, I realized that some elements in the diagram are less researched and less examined than others (e.g. interaction, cognition). Adding these other pieces to the book would have meant another 40-80 pages and possibly two years of new research; working with the editors, we agreed that these were out of scope for this book (though the exercise helped organize the related content and spurred a few enhancements to parts of the book).

Organizing the Chapters

The design space is the first third of the book. The rest is all about new kinds of text visualizations, heavily illustrated with example visualizations I’ve created. In late 2019, I had a lot of the content, but I didn’t like how it was fitting together and felt there were still gaps. Some of the content was orphaned, some was duplicative, and so on. Furthermore, some examples were throwaway and could have been better constructed to link to broad themes in the book.

At this point, I decided to take one image from each of the examples, create a map of the existing book, and then scribble over top where things should move into different groupings: items to remove, items to add, items that were missing.

Chapter rework sketch.

The lower half of the above screenshot represents all the examples in the final 8 chapters. In the upper half are some reference images that organize, structure and introduce these 8 chapters. The references and the content are interdependent: adding, removing or moving examples changes the chapters and changes the introduction. The organization also implies things about the design space: the book evolved into Visualizing with Text instead of Text in Visualization because, through these design processes, I realized that the design space was bigger than the traditional palette of what most people think of as visualization today.

This process was stressful, because it meant re-writing sections that had already been written, and there was a looming deadline. But, in the end, I am much happier with the result. And I think the extra pixels helped with this reorganization more effectively than rearranging pages or post-it notes on a wall.

Posted in Data Visualization, Text Visualization