New InfoTypography research validates and quantifies text variables in visualization

A decade ago, I had a hypothesis that text could be integrated within visualization, such as the use of typographic variables to encode data. That is, boldness, oblique angle, underlines, or even x-height might be useful to convey additional data beyond the text itself. Given 500+ years of typography and cartography, I was able to find many examples, some of which are published in Visualizing with Text, and some of which are discussed in blog posts on this site, such as, The Maligned Underline; Italics and Obliques; Font Weight; Typeface; Modified Fonts for Visualization (x-height); and so on.

BUT! The historic examples provide evidence of typographic visualization; however, they don’t quantify how to use type. What amount of boldness? How much difference in boldness is noticeable? Typographers have heuristics for these – for example, when typographers create a font with 5 weights, the amount of ink increases exponentially with each level. Yet, more recently, variable fonts have become popular, meaning that instead of 5 weights, the typographer can define the minimum weight, the maximum weight and allow the font-user to pick any level in between, effectively providing hundreds of weights. There are now hundreds of variable fonts (e.g. see v-fonts.com), some of which have an incredible range of a single variable, such as the weights for this Clarendon; some have many variables, such as weight, width, oblique angle, x-height, contrast, and more for Roboto or Amstelvar; and some just have fun variables, such as the yeast, gravity and temperature of Cheee (try it!).
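The exponential heuristic mentioned above can be sketched in a few lines. This is only an illustration: the function name, the 100 to 900 range and the idea of treating the numbers as stem thickness are my assumptions, not values taken from any particular font.

```python
def geometric_weights(min_weight: float, max_weight: float, levels: int) -> list[float]:
    """Return `levels` weights from min to max, each a constant ratio above the last."""
    if levels < 2:
        raise ValueError("need at least two levels")
    if min_weight <= 0 or max_weight <= 0:
        raise ValueError("weights must be positive")
    # Constant ratio between neighbouring levels, i.e. exponential growth.
    ratio = (max_weight / min_weight) ** (1 / (levels - 1))
    return [min_weight * ratio ** i for i in range(levels)]


# Hypothetical range; a real variable font publishes its own axis limits.
for level, weight in enumerate(geometric_weights(100, 900, 5), start=1):
    print(f"level {level}: {weight:.0f}")
```

With a variable font, the same function could return 10 or 50 levels instead of 5; whether a reader can tell them apart is a separate, perceptual question.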

So, how should one use the variables in a variable font in visualization?

InfoTypography Experiment

Hot off the press is new research Perception of Letter Glyph Parameters for InfoTypography, by Johannes Lang and Miguel Nacenta. This is experimental research, where humans need to make estimations regarding typographic attributes. Overall, seven different typographic attributes were experimented with (weight, width, contrast, x-height, slant, serifs, and aperture (opening and junction)):

Type attributes used in the experiment.

Two different experiments required humans to make estimations on the text. In one experiment, they needed to match samples to measure how closely humans could estimate the typographic variable in question; in another experiment they had to assess which of two words had greater weight (or width, or contrast, etc.).

When these tests are repeated many times, with many subjects, enough data can be collected to measure the difference between the actual values and the estimated values. This data can be plotted, for example showing the range of the font variable on the x-axis and the amount of error on the y-axis. Then different regression models are fit (i.e. curved lines on the plot), which in turn helps us understand how accurately humans perceive variation in these typographic attributes:

Without going into full details, essentially the subjects had low rates of error with font-weight – the experimental dots (red and blue) are all very close to zero. The horizontal grey line with black diamonds at the bottom of the plot indicates that many levels of weight are distinguishable (note the incredibly wide range of weights in the font tested in the prior image). Also note the slight increasing slope on font weight and the increasing distance between the black diamonds, meaning greater variation in weight is required with the heaviest-weight fonts.
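To make the error-versus-value idea concrete, here is a minimal sketch that computes absolute estimation error per trial and fits a line through it. The trial numbers are made-up placeholders, not data from the paper, and an ordinary least-squares straight line is only a stand-in for the regression models the authors actually fit.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Placeholder trials: (true parameter value, participant's estimate), on a 0..1 scale.
trials = [(0.1, 0.12), (0.3, 0.27), (0.5, 0.55), (0.7, 0.62), (0.9, 1.02)]
true_values = [true for true, _ in trials]
abs_errors = [abs(estimate - true) for true, estimate in trials]

slope, intercept = fit_line(true_values, abs_errors)
print(f"error = {slope:.3f} * value + {intercept:.3f}")
```

A positive slope would correspond to the pattern described for weight: larger differences are needed at the heavy end of the range.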

Other variables do well such as width, x-Height and slant. To make these variables more comparable, the authors combine all the lines into a single plot:

Weight clearly performs best (red line at the bottom), with the next best performers being width, x-Height and slant (both left and right slant combined into a single blue line). Slant has a very interesting fitted curve: note how it performs well near vertical, but not at vertical nor far out from vertical.

A key takeaway from the paper’s discussion is this validation of visualizing with text:

We interpret our results as supportive of Brath and Banissi’s vision of varied and widespread infotypographic applications. Several of the parameters offer substantial ranges of discriminability for categorical and continuous mapping of information attributes.

Lang and Nacenta, Perception of Letter Glyph Parameters for InfoTypography.

What does this mean for data visualization?

It is exciting to see these real results from perceptual studies. For me, there are a few surprises:

  1. There are more levels of weight, slant, x-Height and width that were perceivable than I have estimated in my book Visualizing with Text. This might be due, in part, to their tested font which has a bigger range for weight, width, slant, etc., than most variable fonts. But even with that caveat, the range is larger than I anticipated. This is promising for visualization – instead of a few levels of a quantitative variable encoded, more levels may actually be perceived.
  2. Perception of slant and its behaviour near vertical is unexpected. I would have expected it to be most discriminable right at vertical (0 on the plot).
  3. Yay for x-Height. I’d always thought x-Height would have good discriminability (with caveats for numbers, uppercase, and some lowercase letters). The experimental results are encouraging for further experimentation. There are still a few more caveats though, e.g. a very high x-Height n is confusable with h; a very low x-Height e may be illegible or confusable with c. More x-Height experimentation and more x-Height visualizations need to be tried out (e.g. Text Skimming > pick x-Height, or Weight & x-Height), e.g.:

Also note that these experiments focused on one typographic variable at a time. Combining multiple typographic variables simultaneously will change perception performance and have potential issues with separability, but that’s for future experimenters to evaluate.

Finally, Nacenta has also made a microsite to go along with the research paper, which you can find here. Click one of the big buttons.

Posted in Data Visualization, parametric fonts, Variable Fonts

Rethinking Flowcharts

Flowcharts are ubiquitous. There are incredibly amusing flowcharts on almost any topic, copied across blogs and other websites, such as this fun one about how to leave a dinner party:

Should I stay or should I say bye?

There are so many fun flowcharts. But you’ll notice many of these fun flowcharts are just simple hierarchies, they branch but there’s no or minimal merging. They are essentially decision trees. Here’s a fun one for choosing a science fiction or fantasy book from NPR.

Need a book to read? Follow this handy flowchart.

More than 100 years of flowcharts

Flowcharts are far more powerful than the fun and games of figuring out which book you’ll end up at. They can document complex processes. Historically, flowcharts have been around for a long time. Wikipedia claims the first structured method was documented in 1921 as Gilbreth’s process charts – although many earlier examples can be found. I don’t see any reference to the inventor of flowcharts on datavis.ca. Here’s three flow sheets from 1909/1910, showing branches, merges, and backloops [1,2,3]:

Some flowcharts from 1909/1910. Splits, merges, backflows and lots of labels.

And here’s a really interesting example reprinted in Brinton in 1914. There’s much more text along the lines, many parallel lines, and lines that flow through nodes, sometimes connecting with other lines or sometimes not intersecting. The flow of an order through many steps can be visually traced:

Orders flowing through various departments from 1914.

Some awesome flowcharts

Flowcharts are simple to make — anyone could make reasonable flowcharts for publication with a typewriter, so there are many examples to find across the Internet.

What’s interesting, for me, is the combination of the chart and the text. The chart is essentially a graph (aka, a network of nodes and links). But the text can range from simple labels to much longer questions or statements (and it’s those statements that can be fun). Here’s a great flowchart for teaching mass communications from 1977 by Sue Scott Sampson (from before the Internet when social media and blogs weren’t available:-):

Flow chart for mass communications from 1977. Organized into sections, diverging paths, etc.

This chart effectively summarizes a 150 page book. More than a summary, it compactly sequences tasks and alternatives. It has a simple top to bottom flow – objectives set out at the top, goals at the bottom. It is split into major sections (horizontal bands), with major steps discussed in a couple sentences and simple steps expressed in a word or two. It uses underlines for titles that correspond to major headings in the book. It uses italics for optional activities. It’s impressive!

Below is another very detailed flowchart from a technical manual for the BevMax2. The BevMax2 is a vending machine with a glass front and visible dispenser that picks the bottle from any shelf and delivers it to the customer:

Apparently, there are quite a few things that can go wrong with the dispenser that moves the bottles up/down/left/right/tilts/turns (as well as the coin dispenser, compressor, etc). I’ve taken the liberty of compositing all the flow charts from 16 pages into one image:

Is something wrong with your vending machine? These flowcharts capture all diagnoses and workflow to fix them.

While these flowcharts may look daunting, each deals with a particular problem that can be resolved within 20 or fewer steps, such as “Picker cup not working”, “X-axis yellow light on/off”, or “Coins rejected”. The flowcharts on the right are essentially sequential (e.g. the 6 steps to ensure that coins are not rejected), whereas the flowcharts on the left have more complex steps in assessing and fixing problems such as the picker cup.

More importantly, these diagrams itemize most everything that can go wrong with your BevMax2, they provide diagnostic steps to collect information to characterize the problem, and prescriptive steps to fix the problem. Diagnostic and prescriptive analytics are core to data analysis. Expert systems and AI approaches can also do these analytics, but flowcharts show the process and the reasoning — presumably there is a role for flowcharts in AI explainability.

Flowcharts vs graphs

In the visualization world, flowcharts don’t come to the fore. Graph visualization, i.e. drawing of networks of nodes and edges, is common. But most graph visualization is text-light – all the network and hierarchy examples in the D3.js and Vega galleries have, at most, minimal labels. The same goes for freeware point and click tools such as Gephi or Cytoscape. There are a few tools, such as yEd, Concept Draw, and so on, that do a nice job of laying out flowcharts, such as supporting text in nodes, using color and so forth – but not whole sentences.

This is strange: it’s as if flowcharts and graphs are distant cousins, not the same thing. Flowcharts emphasize clear layouts and drawing nodes as boxes with readable text. Color on flowcharts, if used, indicates different categories. Graphs emphasize drawing lots of nodes, usually as circles. Attributes of the circles and lines, such as color and size, are used to indicate quantitative (or categoric) data. Labels are minimal – a word or two.

Why are flowcharts text-centric and graph visualization data-centric? Flowcharts are used in designing a process, and visualization is used to monitor processes – therefore – in process monitoring visualization both flowcharts and real-time graph visualization must be combined, right? Looking at process monitoring visualization, it seems like the emphasis is on visualizing the graph with flowchart-like-symbols and colors from data, but minimal text:

Google image search SCADA / industrial process control visualization.

In industrial process control, presumably the emphasis is a visual overview of real-time system health. Ease of visually scanning is important, sure, but what if the operator has to visually correlate between an alert system (text) and the system diagram (graph), requiring cross-referencing which can be slow? Or, what if the operator needs to drill-down into the subcomponents in one part of a graph – say to a particular region or particular equipment – where those details may be far less familiar to the operator and may require look-up to a separate document? That separate document may, in turn, have a flowchart in a different orientation/different symbols/different labels than the system visualization. This will result in slower decision making and increase potential for error.

Why not combine flowcharts and graph visualization?

Posted in Data Visualization, Flowchart, Graph Visualization

Maps Leaking Typography into Visualization

I enjoy typography and cartography. Cartographic labels show more than just the name of the place, such as using font weight to indicate population in a town, or spacing to indicate the extents of a mountain range (previous post). It was these insights that provided the starting point and justification for my thesis and eventually my recent book Visualizing with Text.

I’ve written previously about these cartographic uses of typography (link). At the same time, some visualizations and infographics have their origins in maps, and sometimes these typographic features have leaked through from maps. Here’s a few examples:

Charles Booth’s Map of Poverty

Some examples are heavily labelled maps which then have thematic coloring applied, thus retaining all the original encodings in the typography. Charles Booth’s poverty map of London (1890) is a beautiful example:

Tiny portion of Booth’s poverty map. Color yellow-red-blue-black indicates poverty, overtop a high detailed street map.

Booth’s map was made after thematic mapping with minimal labels (such as choropleth maps) had already been established by Dupin and followed by others. So why did Booth add thematic colors over a heavily-labelled street map, creating so much clutter?

Booth was interested in “facts and figures to combat the conjecture, prejudice and potential social unrest.” (lse) (which is also very relevant to 2022). Using a detailed underlying map allows the viewer to see Booth’s data, building-by-building, block-by-block, parish-by-parish. The granularity makes the fine-grain data collection indisputable. Furthermore, the detail labels, whether highstreets (heavy serif), side streets (light serif), landmarks such as railways and churches (light sans), neighbourhoods (heavy all caps serif, e.g. BLACKWELL), regions (outline all caps drop-shadow spaced serif, e.g. GEORGE IN THE EAST), and parishes (dark black all caps, e.g. St. MATTHEW), allow for detailed navigation and inspection of the survey.

Of course, Booth’s maps still worked at a zoomed out level (like a choropleth map) to show broad patterns of wealth (in West London) to poverty (in East London):

Booth’s map at a distance: wealth in the west, poverty in the east. Many small blocks indicate detail on close reading.

However, Booth’s map goes far beyond an overview analysis. The great detail – and labels – enable fine-grain analysis and reasoning. Any contemporary of Booth could view the map and use their own local knowledge to confirm Booth’s facts. They could place stories from the press in context and determine whether the press reporting aligned with the characteristics of poverty. They could consider more detailed hypotheses – for example, are sidestreets slightly more poor than adjacent highstreets? Or, are indirect streets more poor than straight streets? Is there a relationship between railway lines and poverty? Does poverty align with parishes? These would be much more difficult or impossible to consider in a stripped down choropleth map.

Album de Statistique Graphique Maps

The Album de Statistique Graphique is a collection of statistical atlases from France in the late 1800’s / early 1900’s (which I’ve previously discussed and see also Michael Friendly). These atlases presented facts and figures, such as crop types, transportation flow, population movement, employment and so on. With the primary focus on summary statistics (and the French heritage of Dupin’s thematic maps), these maps and charts are often data-dense, but also stripped of extraneous details from the underlying maps. Here’s a map of modes of travel in France from Paris to various cities in 1765:

Travel time and mode to/from Paris in 1765.

This map has been stripped of detail, leaving the primary story of modes (color e.g. carriage, coach, water coach, etc.), time (line thickness) and place (map and labels). Note, however, some remaining elements from map conventions. Cities are indicated as labels associated with dots: font-size, caps and italics remain, facilitating quick skimming to principal cities (i.e. heavy-weight all caps). Distances on each path segment are aligned with the path, as they would on a roadmap. Time, a non-geographic measurement, is presented as blue text, in circles, to further differentiate from city or distance labels. The viewer can quickly read this map for insights such as the fast time from DUNKERQUE to PARIS vs. the slow time from Calais; or that all routes to BASLE are slow; while at the same time able to see intermediary towns and distances on closer inspection.

Interestingly, when the Album presents movement in Paris, the underlying base map is not stripped away but includes streets and street names (although not at the level of Booth’s poverty map):

Travel in Paris by tram, railway or boat (1889).

Album de Statistique Graphique Charts

The Albums’ authors also leak these cartographic labelling principles onto other visualization types. Here’s an awesome bar chart where numbers for major values on axes are bolded:

Major labels in bold.

And here’s a dual-axis line chart from the Album, with series labelled directly along the lines, the same way that a cartographer would label a river:

Labels follow lines, like river labels on a map. Playfair did this too.

Curved and angled labels appear throughout the albums. Modern information graphics and statistical chart designers would highly recommend against text rotated off horizontal (e.g. Wallgren et al), as rotated text is more difficult to read. Yet, from a cartographic perspective, text aligns with the graphical features that they are related to, such as this example of radar plots in the Album:

Radar plots with text on angles and curved. Why?

Why is the text rotated? Rotated and angled text is directly associated with the feature that it is labelling. Horizontal text associated with an angled feature requires the reader to properly associate the horizontal text with the appropriate feature – does it correspond to the angled line, or the arc, or something else? This becomes even more of an issue as the plot becomes more dense, such as this example of quantities as bubbles over time in a polar layout (1) – some red bubbles are very close to others – the text on arcs aligned to the bubbles is unambiguous:

Johnston’s Elevation of Plants

The final example is also the oldest. From the Physical Atlas of Natural Phenomena by Alexander Keith Johnston, various charts and maps are shown. The Distribution of Plants in a Vertical Direction is presented as both a simple stacked bar chart (top right) and as more representational mountains (center):

Both present similar data – the vertical bands of plants by altitude in different regions of the world. The stacked bar chart makes a slight modification to its topic by using triangles instead of rectangular bars, and shows the corresponding regions of climate in different mountainous zones around the world. While stripped down visually, it retains some typographic formatting such as rotated labels and different fonts for different categories of information.

The larger representational mountains, for some reason have fewer climatic zones, but far more rich data encodings:

Closeup of Vertical Distribution of Plants

In this example, there is typographic variation used to indicate different types of features. Plants are indicated in bold italics (e.g. Bananas, Orchids, Chesnut, Maize), boundaries in plain italic (e.g. Upper limit of Tropical Zone), and locations and altitudes in non-italic (e.g. Djuwahir, Walloong Pass into Tibet, 6,000ft).

Even more interesting in this particular example is the use of texture to indicate type of plants: palm trees at the low tropical elevations, deciduous trees above, rectangles with stripes to indicate cultivated fields of crops, conifers and so on. Without reading the labels, the viewer can understand the content. This kind of texture as indicator will later appear more symbolically in Isotype. But it also raises the question (and provides a hint) of how and where texture might be better used in modern visualizations.

Visualizing with Text Footnote

I recently found this text-based cartogram of Olympic medals from Bloomberg News:

Gold medal counts.

It’s showing medal counts from the Olympics, using country codes – much like I had done in my cartograms in Visualizing with Text – plus the typographic attribute bold to indicate countries where a medal had been won. Bubble size is used (instead of a typographic attribute) to great effect. The associated country is still obvious at the center of the bubble, and the biggest bubbles are most salient. Thanks Bloomberg (Jin Wu, Cedric Sam, Pablo Robles, Jane Pong, Adrian Leung and Alex Tribou).

Posted in Data Visualization, Isotype, Line Chart, texture, Thematic Map

Text Visualizations and Proto-Text Visualizations

Michael Friendly recently sent me this Common Sense Revolution visualization by Scott Sørli plotting a timeseries from 1985-2007 of welfare income for a single person in Ontario; and the names of all the homeless who died on the streets of Toronto over the same time period. An inverse correlation is strongly apparent, implying a potential causal relation between the welfare amount and the homeless deaths. While the deaths could have been a simple line chart or bar chart, stacked names much more strongly indicate that we’re dealing with people. And more so than a stack of people icons, these are named people: real people with real given names, real surnames and presumably families and connections in their communities, such as Floyd Anderson, Cheryl Lynn Gunn or Norma/n Lewis. And, disappointingly, there are quite a few John Does and Jane Does, where presumably the investigators did not have enough resources to track down the real names of the deceased homeless person.

It’s also a reminder that text visualizations have a long history. In my book, I do look at a lot of historical text visualizations – as a basis for creating a framework for considering the many ways data can be encoded into text. And then given the framework, I create many visualizations.

But it’s also highly useful and relevant to continue to look at historic examples, to see techniques, combinations, and methods that may inspire or inform future visualizations and creative works. I recently found a copy of Language & Structure in North America (November 4-30, 1975, Richard Kostelanetz curator). Here’s a few interesting snaps of visualization-like uses of text from the 1970’s:

Textual flow chart, polar plot and vectors from the 1970’s.

Leftmost is a portion of George Maciunas’s The history of Fluxus, a text-centric flow chart organized by time indicating historical art movements leading up to Fluxus. The polar plot is an analytical diagram by Agnes Denes titled Studies of Time/ Exploration of Time Aspects, plotting concepts vs time past/present/future further organized by dimensions such as memory – a priori knowledge, and reproductive – modification. Noise Text #1 by Ascher/Straus is a result of a series of transformations on texts into what appears to be a set of textual vectors.

Visualizing prosody isn’t new. Here’s a great example from 1969 by Ernest/Marion Robson, using letter width to indicate duration, and font-weight to indicate intensity as well as a baseline shift. Not surprisingly, the encoding is very similar to the example visualizations which I’d created as these are connotative mappings. I like their much more dramatic variation in width and use of all caps, overplotting, and use of leaders (….) and whitespace (from Introduction to Transwhichics, DuFour Editions, PA, 1969):

Prosody indicated with width, weight and shift by Ernest/Marion 1969.

And here’s a very interesting creation of a 3D visualization based on an analysis of syllables per unit measure from Yeats by Beth Learn 1975 (Timeslide Over/Time):

The final two examples are generative works, creating new text from pre-existing work. On the left, a receipt is used as the basis for constraining words by Karen Shaw titled $8.40 (1975) (did not find a good link for Karen). Each line item on the receipt sets the cost per word, where each letter has a unique cost. Words are then stacked into two alternative poems:

On the right, John Perreault, Goddess, 1969, uses parentheses to mark words within larger words or spanning across words, e.g. “(Eve)n in(to t)h(in)e own (so)ft-(con)che(d ear):” thereby creating alternative readings.
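The receipt constraint in Shaw’s piece is easy to sketch as code. I’m assuming here that letters are priced a=1 cent through z=26 cents; the post only says each letter has a unique cost, so treat that pricing, the function names and the tiny vocabulary as hypothetical.

```python
def word_cost(word: str) -> int:
    """Sum of letter prices in cents, with a=1 ... z=26 (an assumed pricing)."""
    return sum(ord(ch) - ord("a") + 1 for ch in word.lower() if "a" <= ch <= "z")


def words_for_price(price_cents: int, vocabulary: list[str]) -> list[str]:
    """All vocabulary words whose letters add up to exactly the line-item price."""
    return [word for word in vocabulary if word_cost(word) == price_cents]


# Hypothetical vocabulary and line-item price, for illustration only.
vocabulary = ["cab", "bad", "poem", "receipt"]
print(words_for_price(6, vocabulary))
```

Each line item on a receipt would then yield a list of candidate words, and picking one word per line gives one of many possible poems.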

Creating and understanding alternative texts becomes more important with an increase in computational textual analytics. Whether overlaying analyses such as attention or assessing generative text sequences, these artistic approaches hint at some possibilities for visualizing text.

Posted in Data Visualization, Design Space, Text Visualization

Showing risks, rights & freedoms in visualizations

The tragic events in Ukraine have left me wondering how quantitative visualizations miss showing complex issues such as human rights. One aspect of this conflict mentioned by various media outlets as well as elected officials is that the flow of funds to purchase commodities, particularly oil, helps fund the military ambitions of the state. While Russia’s human rights record is terrible, many other oil-exporting nations also have serious human rights issues. How might difficult concepts such as political risk and human rights be shown in a visualization about oil?

In visualization, a quick solution would be to find a metric which encodes risk, rights and freedoms. A metric is needed because:
a. Visualizations encode quantitative (and categorical) data, not unmeasured data;
b. You can’t manage what you can’t measure.

These are commonly-held wisdoms in visualization and management consulting. But is this the right approach? Consider a treemap of oil exports from countries (showing only countries with more than 100,000 barrels per day):

Treemap with size indicating oil exports by country, color indicates a measure of political risk.

The primary encoding of the treemap is oil exports by size. Saudi Arabia is the largest, but also Russia, Iran, Iraq, UAE, Kuwait, Nigeria and Canada are large as well – each exporting more than 1.5m barrels per day. At $100/barrel, that’s more than $150m/day. The dollar amounts are enormous, creating enormous opportunities for sovereign governments to use some portion of that money for state activities.
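The back-of-envelope arithmetic above is simple enough to check in code. The function name is mine, and the flat $100/barrel price is the same round assumption used in the text, not a market figure.

```python
def daily_revenue_usd(barrels_per_day: int, price_per_barrel_usd: float) -> float:
    """Gross export revenue per day, ignoring costs, discounts and grade differences."""
    return barrels_per_day * price_per_barrel_usd


revenue = daily_revenue_usd(1_500_000, 100)
print(f"${revenue / 1e6:.0f}m per day")
```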

Not all countries are bad actors. Color in this treemap indicates political risk, as indicated by a risk rating. However, this particular risk rating doesn’t rate some countries such as Norway and Mexico – presumably the level of risk is not similar between these countries.

Thus, we might look for a metric with better coverage. The treemap below uses the Corruption Perception Index (from Transparency International) for color:

Treemap with size indicating oil exports by country, color indicates Corruption Perception Index.

In this example there is coverage across all countries. Russia, Iran, Iraq and many others look bad, Libya, South Sudan and Venezuela worse (although this data has not been updated in response to the invasion of Ukraine). The color scale is a diverging scale, copied from a map on the Wikipedia article indicating Corruption Perception Index. Unfortunately, this creates green for countries implying good scores – including for some countries with poor human rights records.

Therefore, we might try to keep searching for a metric (and a color scale) that better captures what we think this metric should show. This search for metrics is an attempt to capture our real-world knowledge of risks and rights abuses of different countries, but we’re also in danger of simply looking for metrics that confirm our biases. Here’s a nicer version of the treemap, perhaps a bit closer to our expectations, using the Global Peace Index and the inferno color scale:

Treemap with size indicating oil exports by country, color indicates Global Peace Index.

All of these indexes attempt to capture complex multi-variate data. For example, an American viewer may object to the Peace Index categorizing the United States at the same level as Algeria. If no single metric captures these issues, one might turn to a visualization technique that instead shows many variables, such as parallel coordinates. But creating a much more complex visualization misses the simple immediacy of the treemap – and ignores that all these size-based visualizations (bar charts, pie charts, treemaps, sunbursts, area charts, etc) are highly prevalent and will continue to be popular.

What to do?

Annotations in areas

Many visualizations use size to draw attention to larger objects: bar charts, pie charts, maps, treemaps, etc. In all the treemaps above, Saudi Arabia and Russia are large, Gabon and Vietnam are not. Presumably, the largest exporters should have more scrutiny, not just a larger size.

Interestingly in cartographic maps – such as a roadmap, Google map, etc – large areas end up with more labels. Why shouldn’t visualizations do the same? After all, the largest areas are the items with much larger values, and thus perhaps deserve more attention than the tiny items. Here’s the treemap visualization again, this time with the opening paragraph or lede sentences from Human Rights Watch country pages:

Treemap with size indicating oil exports by country, color indicates Global Peace Index, and prose text indicates some human rights abuses from Human Rights Watch.

In this example, the treemap remains and the color coding remains. Large blocks also have additional text that can be directly read if of interest. Saudi Arabia’s human rights record indicates issues with official accountability for the murder of Jamal Khashoggi; Russia’s record indicates it is the most repressive since the Soviet era (and this is text from before the attack on Ukraine); UAE detains dissidents even after completing their sentences (and UAE is positively biased on both the peace index and corruption index). Even large exporting countries with generally good records, such as Canada and USA, now have enough space to indicate rights issues such as the rights of Indigenous peoples in Canada, or poverty and inequality in USA.
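The idea that bigger blocks earn more text can be sketched as a character budget per rectangle. This is a rough sketch, not how the treemap above was built: the function names, the 0.5 em average glyph width and the 1.2 em line height are my assumptions, and real text layout would measure actual glyphs.

```python
def char_budget(width_px: float, height_px: float, font_px: float = 12.0,
                glyph_em: float = 0.5, line_em: float = 1.2) -> int:
    """Rough number of characters that fit in a rectangle (assumed glyph metrics)."""
    chars_per_line = int(width_px // (font_px * glyph_em))
    lines = int(height_px // (font_px * line_em))
    return max(chars_per_line, 0) * max(lines, 0)


def fit_annotation(text: str, width_px: float, height_px: float) -> str:
    """Return the text, clipped at a word boundary if it exceeds the budget."""
    budget = char_budget(width_px, height_px)
    if len(text) <= budget:
        return text
    if budget < 2:
        return ""  # too small for anything beyond the country label
    clipped = text[:budget - 1]
    if " " in clipped:
        clipped = clipped.rsplit(" ", 1)[0]  # drop the trailing partial word
    return clipped + "…"


# Hypothetical block sizes: a large exporter and a small one.
lede = "Placeholder lede sentence standing in for a country summary paragraph."
print(fit_annotation(lede, 400, 200))
print(fit_annotation(lede, 120, 30))
```

Small blocks fall back to just their label, while the largest exporters get a sentence or more.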

The different kinds of rights issues not visible with a singular metric have the opportunity to become directly visible with the addition of annotations. There is space to shine a light on the details behind the largest exporters. Income inequality and Indigenous issues are human rights issues as are other repressions, but the viewer can make a more informed comparison about the instances, breadth, severity and cruelty of the largest exporters. Abstract concepts such as peace and corruption are made more concrete with instances and examples.

This example helps to turn the concept of a generic commodity (oil) into a more uncomfortable question about where the money goes after you pay to fill up your vehicle, turn on your stove, or take another flight.

Posted in Alphanumeric Chart, Annotation, Data Visualization, Treemap | Leave a comment

Two Y-Axes may be Appropriate!

Many visualization people are against dual-axis charts. There are research papers that recommend against, websites that recommend against, and some tools don’t support it. Hadley Wickham, the author of the popular R visualization package ggplot2, does not support dual axes charts. Critics point out that they can be misunderstood and the viewer may attend to the wrong visual elements, such as line crossings or relative position between lines.

Instead, consider this informative dual-axis chart showing cigarette sales and lung cancer mortality from Our World in Data:

The chart clearly shows:
– Two data sets: cigarette sales and lung cancer deaths
– Rising and falling trends in each are highly visible
– Labels and color-coding clearly distinguish the data, lines, axes and tick labels.
– Annotations indicate key events
A viewer can, at a glance, see that the shape of lung cancer deaths roughly mirrors the shape of cigarette sales, with roughly a 20 year lag.

Critics don’t like multiple y-axis charts for many reasons. However, in this chart, many of these problems have been addressed. Here’s a few issues with dual axis charts:
Axes can be confused, but that is less likely here due to color-coding and titles at the top of each axis.
Line crossings are visually salient, but this chart does not draw attention to the line crossing, instead annotations draw attention to other events.
Comparisons can be gamed: for example, by tweaking the start and end timeframe and the relative scales of the axes, one can manipulate where crossings occur or the slopes of lines. Here, a shared zero baseline and similar peak heights indicate that the chart isn’t being gamed.

Detractors might also suggest a derived chart could be used, such as a rolling correlation between the values — however, that assumes the viewer understands a rolling correlation, and further the base data is lost (if not shown), or a more complicated set of cross-references between charts is required. This chart provides the base data, and allows the viewer to make perceptual comparisons between the series.
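For readers unfamiliar with the rolling correlation that detractors propose, here is a minimal sketch in plain Python. The function names and the sample values are hypothetical, chosen only for illustration; they are not the data behind the chart. Note that the output is a list of correlation values only, which is exactly why the base data is lost unless it is shown separately.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (None if undefined)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a flat window has no defined correlation
    return cov / sqrt(var_x * var_y)

def rolling_correlation(a, b, window):
    """Correlation of a and b over each trailing window of `window` points."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return [pearson(a[i - window:i], b[i - window:i])
            for i in range(window, len(a) + 1)]

# Hypothetical values, for illustration only.
series_a = [1, 2, 3, 4, 5, 4, 3, 2]
series_b = [2, 4, 6, 8, 10, 12, 14, 16]
correlations = rolling_correlation(series_a, series_b, window=4)
```

The result has one value per window position, so a viewer sees when the two series move together or apart, but no longer sees the levels of either series.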

I’ve actively created visualization tools with multiple y-axes – in use by hundreds of thousands of users(!). And I’ve written a research paper (Y2Y) on dual axis charts (together with Eugene Sorenson and Craig Hagerman). Here’s some more evidence and reasoning in support of cases for multiple y-axes charts.

A. Financial Services Software

In financial services software, multiple y-axes charts are common. In Refinitiv’s Eikon, the Bloomberg Terminal, or Cosaic timeseries charts, the financial professional can create charts not only with 2 y-axes, but more:

Financial charts with 2, 3 and 4 y-axes.

B. Financial Services Publications

In financial services publications, multiple y-axes charts are common. We looked at 25 publications by financial firms, such as banks, mutual funds, advisory firms and central banks (pg 14-16 PDF). Across 925 pages there were 1305 charts, of which 944 were timeseries charts. Out of those 944 timeseries charts, 179 were dual axes charts. That is: 19% of the timeseries charts use dual axes. Here’s a few examples:

We found hundreds of dual-axis charts in financial publications.

Clearly financial professionals are comfortable with the use of dual axis charts in their communications, and not concerned with mis-interpretation of these charts. You can also find examples in financial news, such as The Economist and the Financial Times:

Sample dual y-axis charts in The Economist and Financial Times.

C. Example Use Case for a Dual Y-Axis Chart

We provided a few specific examples. Here is an example of price comparison between two financial securities – the price of oil, and the price of the Canadian dollar, to show how the dual axis chart aids the analytical user. First, here’s a chart of each, side by side:

Oil and the Canadian dollar. Both go up and down. Which is first?

Both start low, go up, then drop back down, even lower than their starting low. This is expected, because Canada produces and exports a lot of oil. But the price of the Canadian dollar isn’t directly linked to the price of oil, and it doesn’t always follow the price of oil either – notice the sharp drop in the price of oil in 2014 whereas the Canadian dollar has a long decline from 2012 to 2015. Questions such as “which series started rising first” cannot be determined by looking at these charts. How about putting the charts together, with a single axis:

A single axis doesn’t work for comparison between the two!

A single axis does not work for comparison, as the Canadian dollar is valued in fractions of a US dollar whereas barrels of oil trade around $40-100 US dollars. Instead, these two series could be normalized to a starting value of 100, and here is the resulting chart with a single normalized axis:

A normalized axis doesn’t quite work either.

A normalized axis is a bit more useful. In absolute terms, over the 8 year period, the price of oil more than doubles, while the Canadian dollar increases perhaps 30%. Price of oil is more volatile than the Canadian dollar.
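The normalization step is simple arithmetic: divide every value by the first value and multiply by 100. A minimal sketch, where the function name and the prices are hypothetical and used only for illustration (they are not the data behind the charts):

```python
def normalize_to_100(series):
    """Rescale a series so that its first value becomes 100."""
    start = series[0]
    if start == 0:
        raise ValueError("cannot normalize a series that starts at zero")
    return [100.0 * value / start for value in series]

# Hypothetical prices for illustration only.
oil_usd = [40.0, 60.0, 100.0, 50.0]   # dollars per barrel
cad_usd = [0.80, 0.90, 1.00, 0.75]    # US dollars per Canadian dollar

oil_index = normalize_to_100(oil_usd)
cad_index = normalize_to_100(cad_usd)
```

Both indexed series now start at 100 and can share one axis, but the more volatile series dominates the vertical range and flattens the other.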

It’s still hard to see patterns in the dollar. Both price lines zigzag, but which one leads over the other? Do they always move in the same direction? These are critical questions to commodities traders and currency traders. This information impacts whether they can correctly assess price movement, and whether or not they make or lose money. Here’s the two charts plotted vertically, with aligned time axis:

Time aligned charts. Many of the dips and peaks trend together.

The alignment helps to see that many of the local dips and local peaks share trends, but it’s still hard to see when they might be off by a day or two, or if they always move in tandem. Finally, here’s a dual axis chart:

Dual axis chart. Mostly they trend the same, but note the trend divergence in the yellow highlight.

The similar shapes bring the lines close to each other, which facilitates local visual comparisons. Most of the time in this chart, the series tend to move in the same direction at the same time, but periods of divergence are also visible. For example, in the larger yellow shaded zone in late 2013/early 2014 oil moves up while the Canadian dollar moves down. Similarly, after a strong rally in both in the first few months of 2016, oil stays relatively unchanged while the Canadian dollar starts a downward trend.

The example is discussed in more detail in the research paper, including examples with scatterplots, a horizon chart and rolling correlation charts. A chapter in a forthcoming book by Springer later this year discusses it further, with more examples of other charts and a discussion on automating when to use dual-axis charts.

Making Effective Dual Axis Charts

In our charts as well as the cigarette and cancer chart you’ll notice a number of cues to help the viewer:

  1. Consistent color-coding of the series (i.e. the line in the plot area), the series name (whether in the title, the top of axes, or in a legend), the tick marks, and the tick labels. Basically, if the graphical bit going onto the chart relates to only one of the series, color it to match that series. Color-coded series names at the top of the axes are helpful to the viewer so that they don’t have to glance back and forth between the axes and a separate legend.
  2. Don’t manipulate the vertical ranges to distort the data: both series should have similar tops and similar bottoms; or possibly aligned to make the lines close for local comparison.
  3. Keep the grid lines simple. Don’t try to draw 2 different sets of horizontal grid lines — it creates clutter. Try to align the ticks and gridlines, or if that doesn’t work, only show the axes ticks and labels and skip one or both sets of grid lines.
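Cues 2 and 3 can be sketched in a few lines of Python. This is only one possible approach, with hypothetical function names: each axis gets limits that fit its own series with the same relative padding, so both lines fill the same vertical extent, and both axes get the same number of evenly spaced ticks, so a single set of gridlines serves both.

```python
def padded_range(series, padding=0.05):
    """Axis limits that fit the series with the given relative padding."""
    low, high = min(series), max(series)
    span = high - low
    if span == 0:
        span = abs(high) or 1.0  # flat series: fall back to an arbitrary span
    return low - padding * span, high + padding * span

def aligned_ticks(limits, count=5):
    """`count` evenly spaced tick values from the bottom to the top of an axis."""
    low, high = limits
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]

# Hypothetical series for illustration only.
left_limits = padded_range([40.0, 100.0], padding=0.0)
right_limits = padded_range([0.75, 1.00], padding=0.0)
left_ticks = aligned_ticks(left_limits, count=4)
right_ticks = aligned_ticks(right_limits, count=4)
```

Tick number i sits at the same height on both axes, so the gridlines coincide. The trade-off is that the tick values may not be round numbers; if that hurts readability, showing only axis ticks and labels without gridlines, as suggested above, is the simpler option.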

Most importantly, know your audience. If your audience is unfamiliar with dual-axes charts, consider alternative charts. Or, if using dual-axes charts with an unfamiliar audience, more care is required to draw attention to the meaningful insights the chart shows: such as the use of titles that indicate key insights, or annotations to specific observations that the viewer should attend to.


Visualizing with Text footnote – Snapchat text chart

I’m seeing examples of interesting text visualizations in the wild. These are relevant to my book Visualizing with Text, particularly if I find examples that don’t quite fit. Occasionally, I’ll pop an example into the blog. Today’s example is from Snap Inc.’s 2021 Annual Report. It’s a timeseries chart with the area under the line filled with various Snap projects at each time interval. Conceptually, it fits into Chapter 6: Distributions:


Posted in Data Visualization, Line Chart, Timeseries | Leave a comment

Visualizing with Texture: Lessons from Puzzles

Over the holidays, we put out a couple of jigsaw puzzles, to solve collaboratively or otherwise take a break from holiday mayhem. These are big puzzles: we did a 2000 piece cartoon puzzle and a 1000 piece photograph puzzle.

Completed 2000 piece puzzle.

With such big puzzles, different strategies are used — for example, a color-blind family member is more reliant on texture over color. Strategies for solving a jigsaw puzzle should be interesting to visualization researchers, because many of the tasks to solve a jigsaw puzzle are visualization-like activities:

Search: finding pieces that match a subset of visual criteria, based on shape, color, text, texture, etc.
Locate: finding the place in the portion already solved to insert the new piece.
Identify: find a singular unique piece with unique criteria.

To do these tasks, we might use many different visual properties of the puzzle pieces:

A. Shape: The first step is to find the edge pieces and solve the perimeter. While shape is generally not considered preattentive by visualization researchers – it is for finding border pieces. Puzzle borders are straight, all other puzzle pieces are curvy or jaggy — meaning it’s visually preattentive to quickly find those straight edge pieces in a sea of curvy bits.


B. Similarity: Find pieces that share similar features. For example, in the cartoon, this included:
Text: Puzzle pieces with text on them;
Color: Pieces of a similar color (red), then given many red pieces, subdividing those into bright red (helicopter), soft red (stucco wall), red with brick texture (brick wall);
Stripes: green/chartreuse stripes (an awning); or black lines at regularly spaced intervals (pickets on handrails) – stripes are a type of texture;
Shape: tree branches (brownish branchy shapes) and leaves (a ragged zigzag on light green or dark green);
Blur: Interestingly, there is no blur in the cartoon puzzle, but in the photographic puzzle blur was a useful cue. The photo had a sharp focus at one depth with increasing blurriness at further/nearer depths. This was a useful cue for sorting pieces by depth in the scene.

Similar pieces, e.g. with text, by color and color variation, thin stripes, stripes repeating at regular intervals.

Categorization: Note that this similarity task is not trying to decode the identity of a specific piece, rather it’s a categorization task, similar to unsupervised clustering. In some respects, it differs from current data science approaches to clustering where the number of clusters needs to be defined upfront. Instead, the number of clusters is never defined, and is interactively refined throughout the puzzle-solving process.

C. Alignment: Given the similar pieces, use secondary cues to align pieces, e.g.:
Slope/angle: the green/chartreuse stripes form an awning and the stripes slope at a gradually varying angle: this angle is used to locate adjacent pieces.
Texture orientation: the bricks and mortar are in a regular pattern meaning pieces can be rotated to the correct orientation. Stone courses at regular intervals then help locate adjacent pieces.
Texture spacing: lines that represent a handrail are pickets. Pickets are spaced regularly. Two pieces may be adjacent if the spacing between the pickets within a piece matches the spacing across the two pieces.

Aligning by texture and text: shadow cross-hatch, brick coursing, text, stripes with regular spacing or in perspective.

D. Content inspection: Near the end of the puzzle solve, the puzzle was largely solved except for highly detailed pieces without strong continuity of color/texture/shape between adjacent pieces (e.g. scenes with lots of little people). In this case content analysis was required and consideration of associations, e.g. a crowd of people shouting, a room full of many technical devices and so on.

E. Other strategies: Not all strategies are visual! One person’s strategy was a trial fit: if the color/texture is close, try to jam the piece in. If it doesn’t fit, no need to visually scrutinize the piece.

So What? Puzzle solving uses visual features such as shape, texture, texture orientation, texture pattern regularity, and blur (in addition to color). These tend to be used infrequently in data visualizations, but might have potential to be used more effectively.

Posted in Data Visualization, Search, Shape Visualization, texture, Visual Attributes | Leave a comment

Lyrics as Tiles: Billie Eilish’s Bad Guy coded with Color and Texture

Song lyrics depend heavily on rhythm, syllables and rhyme (in some songs such as pop songs). Some poetry visualizations add white space between words and lines, which can then be filled with various visualization techniques, such as forming links between related words. Instead, if a lyric is considered like a staccato sequence of syllables, the layout is more akin to a set of tiles locked together. Then instead of whitespace, visualization is constrained to the tiles.

Simple tiles with English and phonetic syllables

To start, consider Billie Eilish’s Bad Guy. Similar sounds (e.g. rhyme) don’t visibly pop-out in English text. Our goal is to encode those to make them visible. A simple approach is to convert English words to phonetic alphabet, so that the same sounds have the same phonetic symbol:

Bad Guy as tiles, showing English and phonetic alphabet. Note similar phonetic symbols on rhymes.

You can visually scan the phonetic symbols, but you have to look closely at the letter shapes: rhymes are driven by the vowel sound, which may or may not be at the end of the syllable. Furthermore, in the international alphabet, some vowel sounds are represented by a single symbol and some are represented by two symbols, thus making it difficult to attend to the relevant symbols. With phonetic symbols, sounds are comparable, but don’t visually pop-out.

Color-coded vowel sounds

How to make the sounds visually pop-out? Each syllable is a collection of phonemes for vowels and consonants, typically leading consonant(s), vowel(s), and trailing consonant(s). However, there are ~23 consonant phonemes and 16 vowel phonemes in English. Encodings such as brightness, font-weight, etc., don’t scale well to 16-23 uniquely discernible categories. Color is a possibility, particularly given that some phonemes are similar sounding. Using a confusion matrix, colors can be chosen so that close-sounding sounds have similar colors (although a vowel frontness and vowel origin matrix might be better).
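One way to get similar colors for similar vowel sounds is to place each vowel on a two-dimensional chart and map the position to hue and lightness. The sketch below is an assumption for illustration, not the scheme used for the images in this post: the `VOWEL_POSITION` values are rough guesses at frontness and height for a handful of vowels, and `vowel_color` is a hypothetical helper.

```python
import colorsys

# Rough, illustrative positions (frontness, height), each from 0 to 1.
# Assumed values for demonstration, not measured phonetic data.
VOWEL_POSITION = {
    "i": (1.0, 1.0),
    "ɪ": (0.8, 0.8),
    "ɛ": (0.9, 0.35),
    "æ": (0.9, 0.1),
    "ə": (0.5, 0.5),
    "ʊ": (0.2, 0.8),
    "u": (0.0, 1.0),
    "ɔ": (0.0, 0.35),
    "ɑ": (0.0, 0.0),
}

def vowel_color(vowel, saturation=0.8):
    """Hex color where nearby vowel positions yield nearby hue and lightness."""
    frontness, height = VOWEL_POSITION[vowel]
    hue = 0.8 * frontness             # stop short of 1.0 so hue does not wrap to red
    lightness = 0.35 + 0.35 * height  # higher vowels are lighter
    r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255))
```

With this mapping, neighboring vowels such as i and ɪ land on nearby colors, while distant vowels such as i and ɑ are clearly separated. A confusion matrix, as mentioned above, would need a different approach, such as projecting the pairwise confusions into a color space.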

Here is a variation where each syllable is split into three parts:
– leading consonant sounds in a light italic serif font
– central vowel in a heavyweight sans font, color coded to the vowel sound, with similar sounds in similar colors
– trailing consonant sounds in a heavyweight serif font
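The three-way split can be sketched as follows, assuming each syllable is already available as a string of phonetic symbols. The vowel set is a simplified assumption covering common English vowel symbols, and `split_syllable` is a hypothetical helper, not the code behind the images.

```python
# Simplified set of IPA vowel symbols (plus the length mark) for English.
IPA_VOWELS = set("aeiouɑæʌɔəɛɪʊɜɝɚː")
STRESS_MARKS = "ˈˌ"

def split_syllable(ipa):
    """Split one IPA syllable into (leading consonants, vowel, trailing consonants)."""
    symbols = [ch for ch in ipa if ch not in STRESS_MARKS]
    vowel_positions = [i for i, ch in enumerate(symbols) if ch in IPA_VOWELS]
    if not vowel_positions:
        return "".join(symbols), "", ""  # no vowel symbol, e.g. a syllabic consonant
    first, last = vowel_positions[0], vowel_positions[-1]
    return ("".join(symbols[:first]),          # leading consonant(s)
            "".join(symbols[first:last + 1]),  # vowel, possibly two symbols
            "".join(symbols[last + 1:]))       # trailing consonant(s)

parts = split_syllable("taɪp")  # the word "type"
```

Each of the three parts can then be styled separately, for example with a different font for each part and a color looked up from the vowel.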

Color-coded vowels visually pop out making patterns of same vowel sounds easily seen.

You can easily scan and notice similar vowel sounds in final syllable of each line, plus the trailing consonant – aka the rhymes (e.g. g). You might also notice some other phonetic techniques such as the leading repetition in the chorus mk / mt, or near rhymes such as ˈkrɪmənəl / ˈsɪnɪkəl.

On the other hand, using the phonetic alphabet results in some unfamiliar symbols for most native English speakers, e.g. ʌ for “uh” or ʃ for “sh”.

Color-coded backgrounds

Instead, the tile background can be color-coded and the text switched to English spelling:

Color-coded backgrounds of English words by vowel sound. Color patterns pop, but consonant sounds are lost.

But the sound of the trailing consonant has been lost: guy and type have the same vowel sound, but don’t perfectly rhyme due to differing trailing consonants. Worse, nose, toes, and knows actually do rhyme but are spelled quite differently.

Fun with a polychromatic font

A polychromatic font is a font specifically designed for use with multiple colors. There are a few different fonts that support multiple colors, by providing multiple versions of the font that align overtop each other. Mostly these fonts are available for purchase, not freely available. The example below uses the font Up up and away:

In the example below, the inside color is the vowel sound, the outside color (and the gratuitous 3D) is the final consonant sound. If there is no final consonant, then background color is used:

A riot of color and gratuitous 3D. Fun, but probably not effective visualization.
A closeup of polychromatic lyrics with colors based on vowel and consonant sounds.

This is just for fun – “Hey, I’ve got this great font, let’s try it out and see what happens”. It has long been known that adjacent colors influence the perception of a color. In practice, this would never work perceptually for effective visualization but could make some viscerally-exciting data-driven text. And some of the color combinations aren’t very legible. See Josef Albers’ Interaction of Color for awesome paintings of the effect:

Josef Albers, Interaction of Color.

Textures! (plus color and text)

Finally, we get to a version with a tile where:
– English text is used per tile
– Color indicates the vowel sound
– Texture indicates the final consonant sound (if no consonant, then no texture)

Bad Guy with color for vowel sound and texture for final consonant sound. Common sounds line up in many places.

Since color is dominant, it can be seen that guy and type are the same color and thus the same vowel sound. However, type, with the ending p sound, gains the p texture, thus differentiating it from guy. Tough, rough, e-nough all share the same color with puffed, but the texture change gives away the slightly different sound between puffed and the others.

Colors are created so that similar vowel sounds have similar colors. Likewise for textures, similar consonant sounds attempt to have similar textures. If rhyme is largely based on the vowel and trailing consonant, this color and texture per syllable create visible patterns across the tiles, visually showing rhyming scheme as well as other phonetic devices. Note similarities also at beginning of lines, e.g. Sleepin‘/ Creepin‘, or Own me/ I’ll be/ with me/ If she/ pity.
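The logic behind the tiles can be written down as a small rule: same vowel and same trailing consonant is a perfect rhyme (same color, same texture), same vowel with a different trailing consonant is a near rhyme (same color, different texture). The sketch below assumes syllables are given as (leading, vowel, trailing) tuples; the function name and the phonetic transcriptions are supplied here for illustration.

```python
def rhyme_kind(syllable_a, syllable_b):
    """Classify two syllables given as (leading, vowel, trailing) sound tuples."""
    _, vowel_a, trailing_a = syllable_a
    _, vowel_b, trailing_b = syllable_b
    if vowel_a != vowel_b:
        return "none"  # different color on the tiles
    # Same vowel means same color; texture decides perfect versus near.
    return "perfect" if trailing_a == trailing_b else "near"

# Transcriptions assumed for illustration.
guy = ("ɡ", "aɪ", "")
type_ = ("t", "aɪ", "p")
nose = ("n", "oʊ", "z")
toes = ("t", "oʊ", "z")

guy_vs_type = rhyme_kind(guy, type_)
nose_vs_toes = rhyme_kind(nose, toes)
```

This exact-match rule is deliberately strict. The tiles go further by giving similar sounds similar colors and textures, which lets the eye pick up near matches that an equality test would miss.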

At a high level, sub-columns of same color and same (or similar) trailing consonant visually stand out, revealing some of the textual structure running through sections of the lyrics.

Dancing Queen

Brig really (really) likes Abba. What happens when we use this to visualize Dancing Queen?

Dancing Queen with color for vowel sound and texture for final consonant sound

Many rhyming pairs are immediately apparent: scene / queen; low / go; swing / king; guy / high. And near rhymes stand out too: queen / sweet / teen / beat / rine all share the long E vowel (purple), and flip between a trailing n or t (diamond hatch vs horizontal line). The near match is also apparent in jive / life (both purple but sawtooth vs x texture).

At a more meta-level, Dancing Queen seems to have more of a blue/purple consistency compared to Bad Guy that tends to be purple and punctuated with other distinct colors such as cyan and chartreuse.

grandson: Dirty

What about something that isn’t quite so pop music, less lyric driven? Everything above is focused purely on words, i.e. poetry. Pitch, duration and the many other music variables haven’t been considered, and certainly there are many other music visualization techniques (e.g. Ethan Hine, Brian Cort). A linguistic musician tells me genres may use near rhymes rather than perfect rhymes, or may alter the inflection or pronunciation of words to get rhymes (thanks Craig). So, here’s grandson’s Dirty:

Dirty with color for vowel sound and texture for final consonant sound

It is more difficult to define line length and color appears more random as well. There’s no predominant color across the entire lyrics. Unlike Bad Guy and Dancing Queen, there are no columns of color although there are some localized pockets of color. Perfect rhyming pairs exist, such as silence/ violence; sunset / up yet; neighbor / nature; but don’t prevail. There are some near rhymes too such as so go / to go / do you or floorboard / forewarned. There’s a lot more repetition of singular words such as time, you, love, for. And the tiles also help show near repetition of phrases such as: is it time / is it in / isn’t that; or do you love / do you have.

So perhaps the approach also works, but in this case different aspects of the lyrics are creating different patterns and potentially different or additional elements need to be visualized as well.

Note: A rough implementation of the above is available as an Observable notebook. I had a few challenges with fonts and leveraged Riccardo Scalco’s texture.js to create the many different textures.

Posted in Data Visualization, Font Visualization, SparkWord, Text Skimming, Text Visualization, texture | 2 Comments

Modley’s Pictographs and Graphs

Rudolf Modley was a key figure in the popularization of Isotype in the United States. I’ve previously written about Isotype (e.g. hypothesizing what happened to it, and thematic axes). I recently received Modley and Lowenstein’s book Pictographs and Graphs (1952, Harper & Brothers). In addition to some beautiful pictographic charts, it also includes useful explanations of the design process and rationale used to create these effective and engaging charts. Here’s some insights from 70 years ago:

Insights from Modley

Storytelling. Modley was talking about storytelling with charts a half-century before data journalism: “The pictorial chartmaker is a headline writer among statisticians. If he fails to tell a story, his charts become pointless.” – pg 23.

Pictographs. “Pictorial symbols should be self-explanatory” – pg 25. A worthy goal, but a big challenge for anyone who’s had to try to design an icon for a menu (hamburger icon? gear?) or CPI (inflated $? balloon?)

Comparisons. “Pictographs make comparisons, not flat statements.” – pg 26. A single row of pictographs referring to a single value is pointless. It’s about comparing one value to another. There are quite a few infographics that fall into this category, with “one big number” and associated pictograph, but what’s it compared to? On its own, it’s a single factoid without any potential relative judgment.

Memorable charts. “A good chart may be judged from what the reader remembers the day after he sees it.” – pg 28. Modley sets the stage for this need right at the beginning of the book: on page 2, he describes Mr. Smith consuming information throughout the day – “a flood of varying facts which he must digest and evaluate for himself. Not the least important problem is to retain the essential facts from the wealth of information passing through his mind in one day”.

Personal engagement. “The American development of pictorial statistics has tried to avoid over-standardization of symbols. … it has wanted to bring symbols to life and to adapt them to each new audience. As we have seen in the case of Mr. Smith, his full interest and curiosity are not aroused unless there is some suggestion of his own habits and interests in a graph or illustration.” This is an interesting indication that rather than uniform pictographs used across all charts (perhaps like early Isotype before Gerd Arntz), Modley instead recognizes a requirement for icons intrinsically connected with the subject matter.

Some snapshots

Here’s a couple examples in action from the book:

A couple of charts from Pictographs and Graphs, 1952.

The left image shows the number of women at work – a straight-forward Isotype-like chart with the subtle cue of women’s attire changing with successive rows. This subtle change indicates, minimally, that each row represents different data. Further, the attire change reinforces the time scale by using attire associated with each period.

In the right image, a person is comically attempting to hold a pile of coins. The person is literally staggering under a pile of debt (an idiom made into a visualization!). Note the captions above each column indicating the dollar amount – relative visual comparisons are possible, and the quantitative facts are explicitly depicted as well.

Even better

I understand from Nigel Holmes via Jason Forrest, that this 1952 book reprints only some of the content from Modley’s earlier book from 1937 How to Use Pictorial Statistics (a much more rare book). One day I’ll have to track down an edition.

The rare pictograph book: How to Use Pictorial Statistics, 1937.


Visualizing with Text footnote – 2 letter Scrabble words.

I’m seeing examples of interesting, interactive text visualizations in the wild. These are relevant to my book Visualizing with Text, particularly if I find examples that don’t quite fit. Occasionally, I’ll pop an example into the blog. Today’s example is a blog post by Gideon Golden with both an interactive stem&leaf plot of 2 letter Scrabble words, as well as a table of the same words, organized by first letter and last letter and color-coded by Cmglee:

Posted in Data Visualization, Font Visualization, Isotype | Leave a comment

58 Ways to Visualize Alice in Wonderland (+10 more)

How many ways are there to visualize a book? Bar chart, scatterplot, word cloud… that’s too narrow thinking. And, yes, there are websites showing how academics visualize text. But what happens out in the wild? Artists? School assignments? Professional designers? Statistics researchers?

Ever so curious, I decided to find out. To come up with some kind of method to search broadly, I picked one book, Lewis Carroll’s Alice’s Adventures in Wonderland and decided to find all the possible visualizations that might pop-up on Google/Bing text search, image search, scholar search. I found more than 40!

On the right are little teeny snapshots of the visualizations that I found. I won’t go into details on all of them, just a few highlights in this article, or you can view the video from the presentation I did for the Lewis Carroll Society of North America (lewiscarroll.org).

If you’re interested in more details, you can read the peer-reviewed research paper. Some of the snapshots are cropped – the links to the full-size images are in the sources at the end of this post.

Visualizations 1-5 are from the visualization research community. Visualization #2 is a word cloud – only one word cloud of Alice in Wonderland is shown here even though hundreds exist. For the purposes of this article, I’m interested in different visualization techniques. Visualization #5 is Brad Paley’s TextArc from two decades ago – an early, wonderful, highly interactive visualization.

6-10 are visualizations from the digital humanities for analyzing text. I like #8, lining up adjectives for a character, providing a sense of the character. In this case Alice’s speech is described as soothing, piteous, or melancholy.

Visualizations 11-18 are from natural language processing. Interestingly, visualizations 15-18 have almost no words – even though they’re about a text.

Visualization #19 is a wonderful visualization from an art thesis by Yi-Chia Cheng. Paragraphs are converted to phonetic sounds, shown as symbols using the International Phonetic Alphabet, and stacked into distributions. Distributions can then be created and compared across languages to show how Alice sounds in different languages (see Cheng’s thesis for many more distributions across languages).

What happens when looking a bit further afield than linguistic research and data analysis?

Visualization #20 is an artistic tool for drawing using sentences from text by Travis Kirton. In this case, an artist has drawn a figure of the caterpillar smoking his hookah using the corresponding sentences from Alice – creating a figurative, non-linear reading of that text.

Visualization #21 is digital micrography – that is – text which has been flowed to fit into arbitrary shapes. Lines of text are curved, bent and sized to follow the predominant flow of the shape. This particular example is from the PhD thesis of Ron Maharik, who automated the technique for even complex shapes such as puzzle pieces or floral shapes, such as this tiny portion from Alice (see figure 10.1, page 68 for the full image).

22-25 are timeline visualizations, some showing changes in Alice’s height over time. 23 includes Freudian analysis in relation to Alice’s height changes, mapping Alice’s psychological development over the course of the book.

Visualization #26 shows only a small portion of a small multiple visualization by Claire Wenzel, showing 20 instances of Alice’s dress from across many publications and movies. Who knew Alice had so many dresses? An analysis of the fictional representation of Alice’s dresses over time can provide a view on our own changing society.

Visualizations 27-28 are interactive physical visualizations, with flaps, tabs and pop-ups.

Visualizations 29-41 are even more broad examples from across the Internet. Some are borderline visualizations, but do use visualization techniques. #29 is a list of color-coded places, characters and events. #30 is an infographic providing context to the book as well as content analysis.

#31 is a social network of characters from Alice in Wonderland. Each character is shown with an original illustration from Tenniel. The social network is shown by the lines joining the characters. Along each line is a sentence of text describing the relationship between the characters. Interestingly, this visualization is authored by a costume website – presumably knowing a bit more about the characters and their relationships helps rent more costumes.

#37 is a wonderfully hand-drawn homework assignment, with keywords in heavy marker underlined and rotated as well as lightweight sentences.

#40-41 are unique editions of Alice, with text layout changing, font sizes, caps, etc., modified by the designer in relation to the semantics of the text. Note the call out in #41 overlaying one of Carroll’s logical inversions to form an X.

That’s 41 visualizations. What can be learned from these? In the wild, there’s a lot more text on the visualizations than the research visualizations. And more use of typographic enhancements such as bold, underline, italics and so on.

* * *

These in-the-wild visualizations spurred me to create a number of other visualizations of Alice in Wonderland. Some of these are in my book Visualizing with Text in more detail (Routledge, Amazon, companion site). Large size versions of these images are available in this PDF, CC-license so available to use in teaching, etc. (Also embedded at the end of this post).

Visualization #42 and 43 are sub-word visualizations, indicating properties on syllables.

#44-50 are about words, typically extracted attributes about characters. For example, #49 lists adverbs associated with characters, with font-weight indicating most frequent descriptors – Alice is timid, the Queen is furious, the Hatter is dreadful.

#51-56 are visualizations of phrases and sentences. #52 shows connections of repeated words from the Mad Tea Party. There’s a huge amount of repetition among the characters, reinforcing their position against Alice.

#55 shows the chapter title and a portion of the first sentence for each chapter. Various metrics are shown: the underlying bar indicates the dominant emotion for that chapter, as extracted using natural language processing. Chapter 6, Pig and Pepper, is highly disgusting, whereas Chapter 3, A Caucus-Race and a Long Tale, is measured as sad.

#57 and #58 are visualizations of the entire book. They are readable when printed out as a poster. #57 has large red text under longer paragraphs. The large red text is a capitalized noun and an uncommon verb, adjective or noun in that paragraph, such as: “Rabbit rabbit-hole”, “Mouse lesson-book”, “Bill roof”, “Duchess frying-pan”, “Queen quarrelling”, and so on. The idea is to form large-scale landmarks in the text so that portions of the text are easy to locate. Even larger, behind the text, are the chapter numbers and titles in yellow.

#58 is a version of the entire text of Alice where the text is increased in size if it has been quoted on the Internet. After collecting and processing 200 quotations, the most famous quotes from Alice stand out larger than the surrounding text. You can immediately see the most quotable quotes, and step closer to read the surrounding text. Curious which text is the largest?

  • “Who in the world am I? Ah, that’s the great puzzle!” (Alice, Chapter 2)
  • “We’re all mad here.” (Cheshire cat, Chapter 6)
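The sizing idea behind #58 can be sketched in a few lines of Python: count how many collected quotations contain each sentence, then enlarge the sentence accordingly. This is only a sketch under stated assumptions. The function `font_sizes`, the base size, the linear step per hit, and the two-item quotation list are hypothetical stand-ins, not the processing actually used on the 200 quotations.

```python
def font_sizes(sentences, quotations, base_pt=8.0, step_pt=4.0):
    """Return (sentence, size) pairs; size grows with the number of quotations containing it."""
    sized = []
    for sentence in sentences:
        # Case-insensitive substring match; real quotations would need fuzzier matching.
        hits = sum(sentence.lower() in q.lower() for q in quotations)
        sized.append((sentence, base_pt + step_pt * hits))
    return sized


# Three consecutive sentences from Chapter 6.
sentences = ["We're all mad here.", "I'm mad.", "You're mad."]

# Hypothetical stand-in for the collected quotations (assumption, not the real dataset).
quotations = [
    "We're all mad here.",
    "We're all mad here. I'm mad. You're mad.",
]

sized = font_sizes(sentences, quotations)
for sentence, size in sized:
    print(f"{size:>5.1f}pt  {sentence}")
```

With these inputs the first sentence appears in both quotations and the other two in one each, so the first is set largest. A real version would also have to handle punctuation differences and misquotations.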

Sometimes it’s important to think outside of the box of word clouds and bar charts: there is so much more possible and feasible.

Addendum

Yes, there are more, so I see from responses on Twitter and elsewhere. 59-61 are some NLP visualizations: 59 creates little squares, one per sentence, with brightness by sentence length; 60 transforms words to a vector space and plots them; 61 isn’t quite a word cloud. 62-64 are more artistically driven: 62 animates sentences, 63 shows punctuation only, and 64 is words inside large words which in turn form a rabbit. I would not have thought one could quite manage to get the layout of words to clearly form letters of larger words, but apparently it’s quite feasible.

A Wonderland of Data Visualization

I did a presentation for the Lewis Carroll Society of North America (LCSNA) titled “A Wonderland of Data Visualization.” This presentation is more accessible to a wider audience and should be available on YouTube under the LCSNA channel.

LCSNA is aware of additional visualizations: 65 is a set of interconnected bar charts comparing content from the original Under Ground vs. Alice’s Adventures in Wonderland – note that the first 5 chapters are largely the same, the latter chapters are largely new content. 66 is a similar analysis presented as a table, which is a type of visualization. 67 is another variation on a timeline indicating Alice’s height chart, in this example using Tenniel’s original illustrations and a related table. 68 is also a timeline in the wild, in this example, a highly illustrated timeline with short captions.

  1. Davies, J. Word Tree [with Alice in Wonderland]. https://www.jasondavies.com/wordtree/?source=alice-in-wonderland.txt&prefix=dear (original WordTree by Wattenberg M, Viégas FB. The word tree, an interactive visual concordance. IEEE transactions on visualization and computer graphics. 2008 Oct 24;14(6):1221-8.)
  2. Wolfram. Word Cloud examples [using Alice in Wonderland]. www.wolfram.com/language/11/new-visualization-domains/oriented-word-clouds.html. (original Milgram, S. and D. Jodelet. “Psychological maps of Paris”, Environmental Psychology, 1976.)
  3. Semantic Knowledge. Gephi GEXF Exports in Tropes. https://www.semantic-knowledge.com/doc/V81/text-analysis/gephi-gexf-exports.htm (created using Gephi, Bastian, M.; Heymann, S.; Jacomy, M.. “Gephi: An Open Source Software for Exploring and Manipulating Networks.” International AAAI Conference on Web and Social Media, North America, 2009)
  4. Tanahashi, Yuzuru, and Kwan-Liu Ma. “Design considerations for optimizing storyline visualizations.” IEEE Transactions on Visualization and Computer Graphics 18.12 (2012): 2679-2688.
  5. Paley WB. TextArc: Showing word frequency and distribution in text. Poster at IEEE Symposium on Information Visualization. 2002.
  6. Juxta. Alice: Wonderland vs. Underground. juxtacommons.org/shares/GJm4O9. See also juxtasoftware.org, and Dana Wheeles. “Scholar’s Lab Presentation: Using Juxta Commons in the Classroom”. https://scholarslab.lib.virginia.edu/blog/scholars-lab-presentation-using-juxta-commons-in-the-classroom/.
  7. Senghor, L. Alice’s Adventures After Wonderland: Visualizing Alice in the Digital Era. Visual Learning: Transforming the Liberal Arts Conference, 2018. See also: slideplayer.com/slide/3575003 and kateogorman.org/text-analysis/voyant-tools
  8. Hrdličková, J. A Corpus Stylistic Perspective on Lewis Carroll’s Alice’s Adventures in Wonderland, Thesis, Department of English Language and Didactics, Univerzita Karlova v Praze, 2015. https://dspace.cuni.cz/handle/20.500.11956/84093
  9. Ibid.
  10. Ibid.
  11. Brennan, J.R, Dyer C., Kuncoro A., Hale JT. Localizing syntactic predictions using recurrent neural network grammars, Neuropsychologia, Volume 146, 2020, 107479, ISSN 0028-3932.
  12. Jettka, D, and Stührenberg M. “Visualization of concurrent markup: From trees to graphs, from 2D to 3D.” In Proceedings of Balisage: The Markup Conference 2011. Balisage, vol. 7 (2011).
  13. Thys, F. AI in wonderland. SAS blogs. 2017 Jun 23. blogs.sas.com/content/sascom/2017/06/23/ai-in-wonderland
  14. Ibid.
  15. Agarwal A, Corvalan A, Jensen J, Rambow O. Social network analysis of alice in wonderland. In NAACL-HLT 2012 Workshop on computational linguistics for literature 2012 Jun (pp. 88-96).
  16. Zhu X. Persistent homology: An introduction and a new text representation for natural language processing. In Twenty-Third International Joint Conference on Artificial Intelligence 2013.
  17. Langit, L. Visualizing Alice in Wonderland – Wolfram Alpha Pro. 2012 Feb 12. lynnlangit.com/2012/02/12/visulizing-alice-inwonderlandwolframalphapro/
  18. Maharjan, S., Kar, S., Montes, M., González, F., Solorio, T. (2018). Letting Emotions Flow: Success Prediction by Modeling the Flow of Emotions in Books. 259-265. 10.18653/v1/N18-2042.
  19. Cheng, Y. Down the Rabbit Hole: Visualizing Linguistic Distance And Relationships With Alice in Wonderland, MFA thesis, Northeastern University, Nov 2019. repository.library.northeastern.edu/files/neu:m0455c25q
  20. Kirton, T. Artistic Canvas for Gestural and Non-linear Typography. Eurographics, 2011.
  21. Maharik, R. Digital Micrography. MSc Thesis, University of British Columbia, 2011. https://open.library.ubc.ca/soa/cIRcle/collections/ubctheses/24/items/1.0052114
  22. Bakht, S., Chin, R., Harris, S., Hoetzlein, R., Alice Adaptation Project Analysis. 2008. english236-w2008.pbworks.com/w/page/19019830/Alice%20Analysis
  23. Vera, L. Alice in Wonderland, from a Freudian perspective, visual.ly/community/Infographics/entertainment/alice-wonderland-freudian-perspective
  24. Neel, H. Alice’s Adventures in Wonderland Timeline. 2018. prezi.com/w5kk-wlz9mkn/alices-adventures-in-wonderlandtimeline/
  25. Padilla, C., Páez, D., Wolf J, Alice in Wonderland: Growing up & down, La Loma GbR, 2009, laloma.info/projects/alice
  26. Wenzei, C. Alice through the ages. SCAD. portfolios.scad.edu/gallery/51823709/Alice-Timeline-Project
  27. Arndt, E. Alice’s Adventures in Wonderland Unit Study. 2012. confessionsofahomeschooler.com/blog/2012/04/alices-adventures-in-wonderland-unit-study.html
  28. Carroll, L. Alice’s Adventures in Wonderland Carousel Book. Pan Macmillan. 2016.
  29. Arterberry, A. Alice in Wonderland Infographic. 2013. 111booksfor2011.wordpress.com/tag/alice-in-wonderland-infographic/
  30. DeReign, S. The Real-Life Girl Who Inspired Alice in Wonderland. 2016. coursehero.com/blog/2016/05/25/the-real-life-girl-who-inspired-alice-in-wonderland/
  31. Kemke, K., Schwartz, E. Alice’s Adventures in Wonderland Character Guide. halloweencostumes.com/alice-in-wonderland-costumes.html
  32. Alice in Wonderland Character Map. coursehero.com/lit/Alice-in-Wonderland/character-map/
  33. Paez, D.M. Alice’s Adventures in Wonderland: Beyond a children’s story… aliceandlewiscarroll.weebly.com/analysis.html
  34. Shairrick, C. Alice’s Adventures in Wonderland, 2016. mindmeister.com/633491270/alice-s-adventures-in-wonderland
  35. Parfitt, G.. “Alice’s Adventures in Wonderland Themes: Dreams and Reality.” LitCharts. LitCharts LLC, 25 Nov 2013. https://www.litcharts.com/lit/alice-s-adventures-in-wonderland/themes/dreams-and-reality
  36. Parfitt, G. “Alice’s Adventures in Wonderland Themes: Theme Wheel.” LitCharts. LitCharts LLC, 25 Nov 2013. Web. https://www.litcharts.com/lit/alice-s-adventures-in-wonderland/chart-board-visualization
  37. Thompson, S. H., LAB1 Q1 Book Project, 2016. https://syrenahbookproject.weebly.com/plot–conflict.html
  38. Rais, M, and Sari LL., Comparison between Coraline and Alice in The Wonderland, 2016. slideshare.net/meiiiiillliiiinaa/comparison-between-coraline-and-alice-in-the-wonderland
  39. Kamak, M. Visualization of Alice in Wonderland, 2019 behance.net/gallery/70097119/Visualization-of-Alice-in-Wonderland
  40. Kusama Y, Posavec S. Lewis Carroll’s Alice’s Adventures in Wonderland. Penguin. 2012.
  41. Giuditta D. Alice in Wonderland. Rhode Island School of Design Portfolios. 2013 Feb 14.
  42. Brath, R. Alice Neologisms. Also in Visualizing with Text, AKPeters, 2021.
  43. “. Alice Prosody. Slightly different version in Visualizing with Text, AKPeters, 2021.
  44. “. Alice Character Emotions. Also in Visualizing with Text, AKPeters, 2021.
  45. “. Alice Character Frequency. Also in Visualizing with Text, AKPeters, 2021.
  46. “. Alice Character Sentiment. Also in Visualizing with Text, AKPeters, 2021.
  47. “. Alice Character Timelines.
  48. “. Alice Top Bigrams. Different version in Visualizing with Text, AKPeters, 2021.
  49. “. Alice Character Adverbs. Different version with Grimm in Visualizing with Text, AKPeters, 2021.
  50. “. Alice Character Ranking.
  51. “. Alice Word Sequences. Also in Visualizing with Text, AKPeters, 2021.
  52. “. Alice Repeated Word Pairs and Phrases.
  53. “. Alice Aligned Repetition.
  54. “. Alice Aligned Repetition 2.
  55. “. Alice Top Emotions, Sentiment & Annotations per Chapter.
  56. “. Dialogue from one character to another in Alice in Wonderland. Also in Visualizing with Text, AKPeters, 2021.
  57. “. Full text of Alice, skim formatted with enlarged landmark text.
  58. “. Full text of Alice, with most popular quotations successively enlarged.
  59. Huang, Shanfan. Text Visualization of Alice in Wonderland, 2016. http://shanfan.github.io/Alice/
  60. Crump, Matt. Semantic Librarian, 2019. https://semanticlibrarian.shinyapps.io/alice/
  61. Vallandingham, Jim. Text Vis Starter Kit, 2016, https://github.com/vlandham/text-vis-starter
  62. Vallandingham, Jim. Text Vis Starter Kit, 2016, https://github.com/vlandham/text-vis-starter
  63. Rougeux, Nicholas. Between the Words, Exploring the punctuation in literary classics, 2016. https://www.c82.net/work/?id=347
  64. Gassner, Peter. Alice in Wunderland Nach dem Buch von Lewis Carroll, Nov 1 2021. @grossbart https://twitter.com/grossbart/status/1455234909832892421
  65. Demakos, Matt. From Under Ground to Wonderland, Knight Letter, 2012, Spring 2012 Volume II Issue 18 Number 88 page 17. https://archive.org/details/knightletterno8818lewi/page/16/mode/2up
  66. Demakos, Matt. From Under Ground to Wonderland Part II, Knight Letter, 2012, Winter 2012, Volume II Issue 19, Number 89 page 12. https://archive.org/details/knightletterno8919lewi/page/12/mode/2up
  67. Chang, Howard. Alice in Wonderland in China. Presentation at Spring 2017 Meeting of the Lewis Carroll Society of North America, at San Francisco Public Library, 2017. https://www.youtube.com/watch?v=NKm4_i6-LTo
  68. Van Sandwyk, Charles. Alice’s Accurate Chart of Wonderland: Twice Tested with Up-to-date Corrections. In Lewis Carroll’s Alice in Wonderland with original watercolour by Van Sandwyk. London Folio Society. 2016. https://www.abebooks.com/signed/Alice-Wonderland-Original-Watercolour-Sandwyk-Carroll/30247622756/bd
