In text visualization and visual text analytics, search is a significant application. Tag clouds, for example, originated as a form of faceted search.
What is search?
But first, take a step back and consider what search really is. I like Card and Pirolli’s sense-making loop:
A key takeaway from this analysis is that “search” is really made up of many different tasks. As such, there are likely many different user interface components that address these different tasks. If you look at various search interfaces, you’ll see these components working together. Amazon has a search box, of course, plus facets for refinement (in the left sidebar) and a hierarchy of departments (in the center, for browsing down through the hierarchy). Or consider a news portal such as Google or Bing: elements include the search box and facets, as well as individual stories with headlines, a lead sentence or paragraph, and possibly a photo. Textually, things are happening at the level of individual words and phrases, lines (like a headline or a keyword in context), and paragraphs.
A quick review of text visualization
There is a nice repository of text visualizations at textvis.lnu.se. In mid-January 2016, it listed 250 text visualizations dating from 1976 to 2015. Looking through them, you can tally whether each one depicts text at the level of words, sentences, paragraphs, full documents, or individual characters. Some don’t have any text at all. Here are the results:
|Representation||Number of Visualizations||Percent of Visualizations|
*totals do not add up to 100%: some visualizations use multiple techniques.
There’s a big discrepancy here. More than 80% of text visualizations depict text at the level of words. Textual representations of lines (sentences), paragraphs and documents are uncommon; all together they are less common than no text at all. Slicing it another way, by the visual representation of text, we find:
This table shows (again) 40/250 with no text. The next number is interesting: 103/250 are just plain text, i.e. a label on a graph or a line of text. Plain text uses no color, no size, no enhancement, no additional encoding. This is, in some cases, a missed opportunity.
Of the remaining visualizations, one third are tag clouds. Tag clouds are popular on the web and popular in research too. There are tag clouds in bars and tag clouds in graphs.
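To put the counts quoted above on a common footing, here is a small Python sketch that converts them to percentages. It uses only the numbers stated in the text (250 total, 40 with no text, 103 plain text). It assumes these categories do not overlap, which the text does not confirm, and it treats "one third" as an approximation, so the tag cloud figure is an estimate rather than a count from the repository.

```python
# Counts quoted in the text (visualizations listed in mid-January 2016).
TOTAL = 250
NO_TEXT = 40
PLAIN_TEXT = 103

def share(count, total=TOTAL):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)

# Assumption: "no text" and "plain text" are mutually exclusive categories,
# so whatever is left encodes text with some visual attribute.
remaining = TOTAL - NO_TEXT - PLAIN_TEXT

# "One third" is approximate in the text, so this is only an estimate.
tag_clouds_estimate = remaining / 3

print(f"no text:    {NO_TEXT} ({share(NO_TEXT)}%)")
print(f"plain text: {PLAIN_TEXT} ({share(PLAIN_TEXT)}%)")
print(f"remaining:  {remaining} ({share(remaining)}%)")
print(f"tag clouds: roughly {round(tag_clouds_estimate)}")
```

Under that assumption, plain text alone accounts for about two in five of the listed visualizations, which is roughly the same share as everything that encodes text visually.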
It seems that there should be a lot of opportunity beyond words, tag clouds and facets in search interfaces and text visualization. From a search perspective, visualizing the text of lines, paragraphs and documents should be considered. A focus on words removes them from their context. Loss of context means that the semantics of those words is lost: homonyms, ambiguity, sarcasm and other shades of meaning can no longer be resolved when words are split up.
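The loss of context can be illustrated with a minimal Python sketch. It contrasts a bare word count (the kind of input a tag cloud is built from) with a simple keyword-in-context listing. The function names, the sample sentences and the three-word window are all assumptions made for illustration, not taken from any particular tool.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_counts(text):
    """Word frequencies: the context-free view a tag cloud is built from."""
    return Counter(tokenize(text))

def keyword_in_context(text, keyword, width=3):
    """List each occurrence of keyword with `width` words on either side.

    This simple version ignores sentence boundaries.
    """
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Hypothetical sample text containing a homonym.
sample = (
    "She sat on the bank of the river. "
    "Later that week the bank raised its interest rates."
)

# The count alone cannot tell the two meanings apart.
print("count of 'bank':", word_counts(sample)["bank"])

# The surrounding words make each meaning recoverable.
for line in keyword_in_context(sample, "bank"):
    print(line)
```

The word count reports two occurrences of "bank" and nothing more, while the keyword-in-context lines keep enough neighboring words to separate the riverbank from the financial institution.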
My current research on search and text visualization is in this article: Font attributes enrich knowledge maps and information retrieval. It includes a few different visualizations at various levels of text, from words through to paragraphs.
More importantly, there is still a lot of design, experimentation and research to be done on new text visualizations that address the many tasks associated with search, including searching, filtering, skimming, reading, extracting, connecting, schematizing, assembling and storytelling.