In the last month, I’ve given a couple of presentations to grad classes with content from my book Visualizing with Text. In both cases, a couple of people expressed concern about the complexity of some of the visualizations, particularly the data-dense ones. This is an argument that I have heard on occasion throughout my career, as I am often involved in the design and development of data-dense visualizations for domain experts. Data-dense displays are not uncommon in domain applications – consider a few examples:
These visualizations are packed with data. The map has many layers: roads, railways, rivers, canals, drains, buildings, labels, textures, etc. This periodic table has icons, symbols, numbers, names, pathways, ions, solutes, charges, etc. The trading desk shows a wide variety of information associated with securities: timeseries, events, color-coded news, internal data. The electric grid shows hundreds of entities: transformers, capacitors, hydro dams, wind farms, interchanges, power corridors of different voltages, status, thresholds, etc.
And, if you look at the Uncharted research website, you’ll see many more data-dense visualizations that our company has worked on.
Data density slows down understanding
The essence of the data-density criticism is that, with a greater number of data points and/or multi-variate data, the viewer can become confused as to where to look. There may be many different visual patterns competing for attention. Should the viewer be focusing on local patterns in a subset of the display, or macro patterns across the entire display? Should the viewer be attending to the color, or the size, or the labels? If each represents different data, then the semantics will differ depending on the visual channel that the viewer is attending to. Worse, if the viewer starts to move back and forth between many different channels, they may forget the encodings and become more confused.
Some people have become conditioned to think of data visualization as they might see in the popular press, visualizations made for communication, or visualizations that utilize straightforward visualization types that you might get with a library such as D3 or Vega:
Viewers may be conditioned to think of these as encompassing all (or most) visualization types, with many articles organizing visualization into a limited palette of visualization layouts: periodic tables, chart choosers, lists, galleries, zoos, and more. These visualizations typically don’t have many data points (20-200), and typically show only a couple of variables. Data is homogeneous – you’re not looking at multiple datasets with different types of entities jumbled together. Answers are easier, because there really aren’t many different dimensions or layers to be considered.
But not all problems are simple.
Complex problems may need complex data
The images in Figure 1 show that multi-metric, data-dense visual representations exist in practice – in both historic visualizations and modern interactive visualizations. These complex visualizations bring together multiple datasets in layers, in many windows, in large displays – i.e. into data-dense representations.
This extra data is required, because there may be multiple answers possible. If the price of a stock goes down, it may be due to the overall market going down, to a competitor dropping their prices, to poor sales data in the company’s earnings, to a negative news story about the company, or other factors. The extra visual context facilitates reasoning across the many plausible causes to assess the situation. In the stock example, multiple causes can be true at once: an expert needs to see all and determine which are most relevant to the current situation.
In addition to quantitative data, there may be other facts and evidence: qualitative data, news, videos, and so on. There may be multiple perspectives to consider. There may be different time horizons to consider. (For example, the stock market collapse in 2008 was triggered by the collapse of Lehman Brothers on September 15, 2008; but months before, Bear Stearns (a competitor) was acquired when it ran into funding issues, and even earlier some mortgage origination companies went bankrupt.)
More generally, wicked problems are not easily solvable. The problem can be framed in more than one way, different stakeholders have different timelines, constraints and resources are subject to change, and there is no singular definitive answer.
As a colleague tells me: Complex problems have easy-to-explain wrong answers.
The Value of Data Density
Communication: There are many different reasons for creating visualizations. The low-density visualizations in Figure 2 may be part of narrative visualizations, for explaining the results of an analysis. Data has been distilled to a few key facts.
Dashboard: Or, simple low-density visualizations may be part of an overview dashboard, with many small visualizations, each of which provides an overview of a different process and can typically be clicked on for more detailed analysis. These overviews only need to provide a sense of status: if there are any issues, then the viewer has workflows to access more detail.
Beyond the communication and dashboard uses, there are many other uses for visualizations, where density may be valuable:
Organization: For example, the map and the periodic table in Figure 1 organize large amounts of data. These many layers of data allow cross referencing between many different types of information. On the map, the user may need to know the location of buildings (objectives), roads (connections), canals and railroads (obstructions) in order to plan a route.
Monitoring: The market data terminal and the electric grid operations wall in Figure 1 provide real-time monitoring across many data streams. Many heterogeneous datasets come together into a single display. Time is of the essence in real-time operations. Detailed data can’t be hidden a few clicks away: all key information must be designed and organized for quick scanning and immediate access.
Analysis: Knowledge maps and network visualizations are often about analysis of complex data. SciMaps.org has 100 knowledge maps, each collecting and visually representing many facets of a particular corpus; such as Figure 3 left, an interactive visualization of the edit history of Wikipedia articles by Wattenberg & Viégas. Visual Complexity has 1000 network visualizations, such as Figure 3 right, a dynamic visualization of social networks indicating people, links, activity, postings, sequence and message age by Offenhuber & Donath. Both of these are visualizations about text over time: edits, exchanges, persons. In both cases there are many dimensions to understand and comprehend.
Exploration: Data-dense visualizations aren’t limited to domain experts attempting to understand complex datasets for their jobs. In Dear Data, Giorgia Lupi and Stefanie Posavec create some awesome multi-variate data-dense visualizations of mundane day-to-day data. Why? Exploratory data analysis needs to consider lots of different data – the exploration is required to consider, assess, investigate, compare, understand and comprehend many different data elements. Doing exploratory data analysis with only a well-organized quantitative dataset may miss much relevant data (e.g. see Data Feminism). Lupi and Posavec show by example that many different attributes can be extracted from everyday life and then made explicitly visible for an initial exploratory view.
Data Density and Visualizing with Text
The objective of the book is to define the design space of Visualizing with Text for all kinds of visualizations, simple or complex. Section 2 in the book deals with simple labels in familiar charts, such as scatterplots, bar charts and line charts: text can be used to make simple visualizations more effective. Section 3 in the book goes further, using multiple visual attributes of text to convey more data (Figure 5).
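As a rough illustration of what “multiple visual attributes” can mean for text, here is a minimal Python sketch that encodes three variables on a single label: font weight for a quantity, color for a category, and italics for a flag. This is not code from the book; the function names, the color lookup and the sample rows are all made up for this example.

```python
from html import escape

# Hypothetical category-to-color lookup (an assumption for illustration only).
CATEGORY_COLORS = {"a": "#1f77b4", "b": "#d62728"}


def font_weight(value, lo, hi):
    """Map a number in [lo, hi] linearly onto CSS font weights 100..900."""
    if hi <= lo:
        return 400  # no spread in the data: fall back to regular weight
    t = min(1.0, max(0.0, (value - lo) / (hi - lo)))
    return 100 + 100 * round(t * 8)


def styled_label(text, magnitude, category, flagged, lo, hi):
    """Return an HTML span whose text styling carries three data variables."""
    style = (
        f"font-weight:{font_weight(magnitude, lo, hi)};"  # quantity
        f"color:{CATEGORY_COLORS[category]};"             # category
        f"font-style:{'italic' if flagged else 'normal'}"  # boolean flag
    )
    return f'<span style="{style}">{escape(text)}</span>'


# Made-up sample rows: (label, magnitude, category, flagged).
rows = [
    ("Alpha", 10, "a", False),
    ("Beta", 55, "b", True),
    ("Gamma", 100, "a", False),
]
lo = min(r[1] for r in rows)
hi = max(r[1] for r in rows)
html_labels = [styled_label(t, m, c, f, lo, hi) for t, m, c, f in rows]

for line in html_labels:
    print(line)
```

Each label then carries its literal text plus three more data values, which is the sense in which text can raise data density without adding more marks to the display.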
The Future of Data Density
Data density is likely to become a bigger issue in the next decade. Greater awareness of bias in data makes it more important to represent more datasets in a visualization. Analysis of richer data types – such as text, video and imagery – will likely necessitate new ways to layer in additional visual representations. Bigger data will have even more variables, requiring more ways to show more data, or else risk summarizing too much useful detail out of the data. Specific visualization applications, such as cybersecurity, fake news, and phishing, need to deal with ever more complex attacks, which implies more nuanced analyses based on more complex data.
Data density will become increasingly important to future visualization and visualization research.