We create visualizations to help viewers make visual inferences, and different visualizations suit different inferences. Some visualizations afford more perceptual inferences than comparable alternatives: their specific configuration lets additional inferences be observed directly, without additional cognitive load (see, e.g., Gem Stapleton et al., "Effective Representation of Information: Generalizing Free Rides", 2016).
Here’s an example from 1940: a bar chart in which both the length and the width of each bar encode data:
The length of each bar (horizontal) is the percent increase in income in that industry. Manufacturing has the biggest increase in income (18%); Contract Construction is second at 13%.
The width of each bar (vertical) is the relative size of that industry. Manufacturing is wide: it is the biggest industry, accounting for about 23% of all industry. Contract Construction is narrow, perhaps the third-smallest industry at around 3-4%.
What’s really interesting is that the area of each bar is highly meaningful: percent increase × size of industry is proportional to the total income gained in that industry. For example, the areas of Transportation and Contract Construction are perceptually quite similar. This can be checked arithmetically: Transportation, at a 7% increase × 7% industry size (7 × 7 = 49), has a total income gain similar to Contract Construction at a 13% increase × 3.5% industry size (13 × 3.5 = 45.5). Likewise, Mining at a 9% increase × 3% industry size (27) has about the same total income gain as Agriculture at a 3.5% increase × 8% industry size (28).
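The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, assuming the approximate chart readings quoted in the text and assuming "industry size" means each industry's share of national income before the increase; under that assumption, length × width gives the income gain in percentage points of total national income. The `industries` dictionary and the `bar_area` helper are hypothetical names for illustration.

```python
# Approximate readings from the chart, as quoted in the text.
# Assumption: "size" is each industry's percent share of national income.
# name: (percent increase in income, percent share of all industry)
industries = {
    "Manufacturing": (18.0, 23.0),
    "Contract Construction": (13.0, 3.5),
    "Transportation": (7.0, 7.0),
    "Mining": (9.0, 3.0),
    "Agriculture": (3.5, 8.0),
}


def bar_area(pct_increase: float, pct_share: float) -> float:
    """Hypothetical helper: bar length x bar width.

    With both inputs in percent, dividing by 100 gives the industry's
    income gain in percentage points of total national income.
    """
    return pct_increase * pct_share / 100.0


areas = {name: bar_area(*dims) for name, dims in industries.items()}

# List industries from largest to smallest total income gain.
for name, area in sorted(areas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22} {area:.3f} points")

# The two pairs that look alike in the chart also come out alike numerically.
print(areas["Transportation"] / areas["Contract Construction"])  # about 1.08
print(areas["Mining"] / areas["Agriculture"])  # about 0.96
```

Manufacturing comes out far ahead (about 4.14 points), matching its visibly dominant bar area, while each look-alike pair differs by less than 10%.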
This meaningful area is the free-ride. Perceptually, one can directly observe and compare relative areas. Total income gain hasn’t been explicitly encoded; it is a by-product of the choice to encode percent increase as length and industry size as width. If the viewer is potentially interested in total income gain in addition to percent increase and relative size, this is a useful encoding. Total income gain might be very important in government policy, for example, since the additional taxes generated are roughly proportional to it.
A more common design choice today might be a treemap that uses area to show one variable (relative industry size) and color to show the second (percent increase), like this:
In the treemap, size and color are explicit, but there’s no free-ride. Area and color don’t combine into a perceivable quantity: the similarity in total income gain between Transportation and Contract Construction is not obvious, nor is the similarity between Mining and Agriculture. The area encodes relative size, but the length and width of each box carry no meaning. The color encodes percent change, but color isn’t effective for comparing relative quantities. If total income gain is a desirable insight, the treemap fails.
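To make the contrast concrete, here is a minimal sketch of one widely used treemap layout, the squarified algorithm of Bruls, Huizing and van Wijk (2000). It is an assumption that the treemap in this post used this layout; the five-industry subset, the 100 × 60 canvas and the function names are hypothetical. Each tile's area matches the industry's share, but its width and height are chosen only to keep tiles close to square, so neither dimension carries percent increase, which is left to color.

```python
# Hypothetical subset: (name, percent share of all industry, percent increase),
# using the approximate values quoted in the text, sorted by share descending.
industries = [
    ("Manufacturing", 23.0, 18.0),
    ("Agriculture", 8.0, 3.5),
    ("Transportation", 7.0, 7.0),
    ("Contract Construction", 3.5, 13.0),
    ("Mining", 3.0, 9.0),
]


def worst(row, side):
    """Worst (largest) aspect ratio of a row of areas laid along `side`."""
    s = sum(row)
    return max(max(side * side * r / (s * s), (s * s) / (side * side * r)) for r in row)


def squarify(areas, x, y, dx, dy):
    """Squarified treemap layout. `areas` must be positive, sorted descending,
    and sum to dx * dy. Returns a list of (x, y, width, height) tuples."""
    rects, rest = [], list(areas)
    while rest:
        side = min(dx, dy)
        row = [rest[0]]
        # Grow the row while doing so does not worsen its worst aspect ratio.
        while len(row) < len(rest) and worst(row + [rest[len(row)]], side) <= worst(row, side):
            row.append(rest[len(row)])
        rest = rest[len(row):]
        s = sum(row)
        if dx >= dy:  # place the row as a column on the left of the free space
            w = s / dy
            yy = y
            for r in row:
                rects.append((x, yy, w, r / w))
                yy += r / w
            x, dx = x + w, dx - w
        else:  # place the row as a strip along the top of the free space
            h = s / dx
            xx = x
            for r in row:
                rects.append((xx, y, r / h, h))
                xx += r / h
            y, dy = y + h, dy - h
    return rects


W, H = 100.0, 60.0
total = sum(share for _, share, _ in industries)
scaled = [share / total * W * H for _, share, _ in industries]
tiles = squarify(scaled, 0.0, 0.0, W, H)

for (name, share, pct), (x, y, w, h) in zip(industries, tiles):
    # Tile dimensions come from the layout; percent increase only drives color.
    print(f"{name:<22} {w:5.1f} x {h:5.1f}  area {w * h:7.1f}  color <- {pct}%")
```

Transportation gets a tile twice the area of Contract Construction (a 7% vs. 3.5% share) even though their total income gains are nearly equal, and nothing in the tile geometry reveals that.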
Edward Tufte (1983) discusses multifunctioning graphical elements, which doesn’t quite align with the idea of a free-ride. Johanna Drucker (2014) frames this notion as generative: a representation that produces knowledge, as opposed to one that simply displays data. But I like the definition of a free-ride, which succinctly explains the perceptual benefit created by the choice of representation. See Gem’s paper for an example applied to Euler diagrams.
Visualization designers need to consider the free-rides and other perceptual inferences that different visualization alternatives provide, and choose among them based on how well those inferences suit the viewers’ task.
The chart "Percent Increase in National Income by Industry" is from page 178 of the book How to Chart: Facts from Figures with Graphs, by Walter Weld, 1960. Walter didn’t particularly like this chart, partly because it has neither a legend nor an axis for the widths. Personally, I have seen this type of bar chart used effectively in financial services.