Early in my career, I’d create data visualizations and, without fail, my manager would ask: “So, what’s the story here?” In data visualization the objective isn’t the visualization – it’s the insight gained from it.
Visualizations don’t announce their insights. Whether it’s a dashboard with a couple of bar charts, a massively complex visualization of billions of tweets, or a hairball graph, there are many possible insights. Narrative visualization is the addition of a story to a visualization, to explain it and to highlight specific insights. The NYTimes and The Guardian create human-authored narratives to explain insights. But visualizations with human-authored narratives like the Times’ are a lot of work. An office worker without a graphics team might instead add a paragraph or two on top of a visualization, maybe with a link or two that pivots the view.
Alternatively, data-driven Natural Language Generation (NLG) can sidestep visualization entirely. The approach uses data and advanced analytics to algorithmically derive the insights, then assembles those insights into computer-generated text. Some of the results are impressive, generating not just insights but interesting stories.
But automated insights wrapped in natural language lose all the contextual data. One alternative is to simply add a visualization beside an NLG paragraph, but that requires the reader to do all the work, cross-referencing back and forth between the paragraphs and the visualization.
Why not automate insights, and put them directly in the charts?
Visualization libraries such as Semiotic have built-in code-driven annotations, so it would be feasible to automatically generate the insight, map that to some kind of annotation, then plot the annotation. This sounds great, but before we can do that we need to know:
What are the kinds of insights that work well with annotations?
This is something we’ve done in a number of different ways on different projects over the years. Looking back, there are some common patterns, such as insights about a specific data point, about the plot area, or about event data:
Insights about data points
Scrutinizing specific data points is a common task in data visualization. These points might be the extremes (to identify the leaders and laggards); outliers (to validate that the data isn’t erroneous); or benchmarks that help orient the viewer in the data (like a landmark). Labeling points is straightforward in a variety of different visualizations, as these diagrams suggest:
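Deriving these point-level insights automatically is simple to sketch. The following is a minimal example that finds the extremes and any statistical outliers in a series and maps each to an annotation spec; the spec format (type/x/y/label) is a made-up stand-in for whatever a given library, such as Semiotic, actually expects:

```python
# Sketch: derive point-level insights (extremes, outliers) and map them to
# hypothetical annotation specs for a charting library to render.
from statistics import mean, stdev

def point_annotations(series, z_threshold=2.0):
    """Return annotation specs for the max, min, and any outlier points."""
    hi, lo = max(series), min(series)
    specs = [
        {"type": "label", "x": series.index(hi), "y": hi, "label": f"High: {hi}"},
        {"type": "label", "x": series.index(lo), "y": lo, "label": f"Low: {lo}"},
    ]
    # Flag points more than z_threshold standard deviations from the mean.
    mu, sigma = mean(series), stdev(series)
    for x, y in enumerate(series):
        if sigma and abs(y - mu) / sigma > z_threshold:
            specs.append({"type": "label", "x": x, "y": y, "label": f"Outlier: {y}"})
    return specs

print(point_annotations([3, 4, 5, 4, 42, 5, 3]))
```

The z-score cutoff is one arbitrary choice of outlier test; any detector that returns indices would plug in the same way.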
Insights about the plot area
In other cases, the desired insights are at an aggregate level. For example, understanding the range of the data is a common task: how big is the difference between the biggest and smallest data point? On a stock chart, there’s a big difference between a stock with a 2% range and one with an 80% range, and a corresponding difference in how an investor responds to that magnitude.
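A range insight like this is easy to compute and to gate, so that only ranges big enough to matter get promoted to an annotation. A minimal sketch, where the 10% cutoff and the band-style spec are both assumptions:

```python
# Sketch: compute the range of a series as a percentage and decide whether
# it merits an annotation; the cutoff and spec shape are hypothetical.
def range_insight(series, min_pct=10.0):
    lo, hi = min(series), max(series)
    pct = (hi - lo) / lo * 100 if lo else float("inf")
    if pct < min_pct:
        return None  # too small a range to be worth calling out
    return {"type": "band", "y0": lo, "y1": hi, "label": f"{pct:.0f}% range"}

print(range_insight([100, 104, 98, 180]))   # wide range: annotated
print(range_insight([100, 101, 100, 102]))  # narrow range: suppressed
```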
Trend is related, and there are many ways to measure it, such as average, last-minus-first, regression, moving average, curve fitting, and so on.
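Any of these measures can drive an annotation. Here is a sketch using a least-squares slope, mapped to a hypothetical trendline spec; last-minus-first or a moving average would slot into the same shape:

```python
# Sketch: measure trend with an ordinary least-squares slope and map it to
# a hypothetical trendline annotation spec.
def trend_annotation(series):
    n = len(series)
    xs = range(n)
    mx, my = sum(xs) / n, sum(series) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, series))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    direction = "rising" if slope > 0 else "falling" if slope < 0 else "flat"
    return {"type": "trendline", "start": intercept,
            "end": intercept + slope * (n - 1), "label": direction}

print(trend_annotation([1, 2, 2, 3, 5]))
```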
Sometimes the challenge isn’t the data but the semantics of the plot: scatterplots can be challenging because the sweet spot requires some cognitive effort to determine the meaningful combinations of coordinates. Instead, highlighting areas on the plot or using contours is effective.
Another pattern, common in sports commentary, is the threshold: such as the sports superstar approaching the all-time record. This is easily translated into a visual annotation such as a line:
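The threshold pattern also automates cleanly: check the current value against the record, and only emit the line annotation when the story is live. A minimal sketch, where the 5% “within reach” margin is an assumption:

```python
# Sketch: emit a threshold-line annotation only when the subject is within
# reach of (or past) the record; the margin is an assumed editorial choice.
def threshold_annotation(current, record, margin=0.05):
    if current >= record:
        return {"type": "line", "y": record, "label": "New record!"}
    if current >= record * (1 - margin):
        gap = record - current
        return {"type": "line", "y": record, "label": f"{gap} from the record"}
    return None  # not close enough to be a story yet

print(threshold_annotation(98, 100))
```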
Insights associated with an event
Sometimes the insight is an event that already has associated commentary, such as a news story, a tweet, or a pivotal event. In these cases, a narrative snippet may already exist, such as a news headline, and it can be depicted directly as a textual annotation:
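Since the commentary already exists, the automation here is just a join between the event feed and the plotted data. A sketch with made-up keys and headlines:

```python
# Sketch: join time-series points to existing commentary (headlines keyed
# by the same x-value) and emit text annotations. All data is made up.
def event_annotations(points, headlines):
    """points: {x: y}; headlines: {x: text}. Keep only events on the plot."""
    return [
        {"type": "text", "x": x, "y": points[x], "label": text}
        for x, text in headlines.items()
        if x in points
    ]

points = {"day1": 10, "day2": 7}
headlines = {"day2": "Press release issued", "day9": "Off the chart"}
print(event_annotations(points, headlines))
```

Events without a matching data point are dropped rather than plotted, which keeps annotations anchored to visible context.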
Annotations explicitly label insights directly on a plot, with the full context visible. The viewer gains the benefit of the insight while remaining fully informed by the context, able to ask critical questions or otherwise probe the data. It’s this ability to understand the authored insight and then derive our own insights that makes narrative visualization so compelling.
The above patterns are just a start. What is the catalog of all the insightful patterns that go with visualizations? Do they work across the wide variety of esoteric visualizations? And, more importantly, which insights are meaningful? There are many possible insights, so from an automation standpoint, which should be promoted to a visible annotation?