Text Visualization and Search

When considering text visualization or visual text analytics, search has to be considered a significant application. Tag clouds originated as a form of faceted search.

What is search?

But first, take a step back and consider what search really is. I like Card and Pirolli’s sense-making loop:

Sensemaking Loop (from Card and Pirolli)

A key takeaway from this analysis is that “search” is really made up of many different tasks. As such, there are likely many different user interface components that address these different tasks. If you take a look at various search interfaces you’ll see all these components working together. Amazon has a search box of course, plus facets for refinement (in the left sidebar), plus a hierarchy of departments (in the center, for browsing down through the hierarchy). Or, if you consider a news portal, such as Google or Bing, elements include the search box and facets, as well as individual stories with headlines, a lead sentence or paragraph, and possibly a photo. Textually, there are things happening at the level of individual words and phrases, lines (like the headline or a keyword in context), and paragraphs.

A quick review of text visualization

There is a nice repository of text visualizations at textvis.lnu.se. In mid-January 2016, it listed 250 text visualizations dating from 1976 to 2015. Looking through these visualizations, you can tally whether each one depicts text at the level of individual characters, words, lines (sentences), paragraphs or full documents. Some don’t have any text at all. Here are the results:

Representation        Number of Visualizations    Percent of Visualizations*
None                  40                          19.1
Character/Syllable    2                           1.0
Word                  173                         82.8
Line                  19                          9.1
Paragraph             15                          7.2
Document              5                           2.4

*Totals do not add up to 100%: some visualizations use multiple techniques.

There’s a big discrepancy here. More than 80% of text visualizations depict text at the level of words. Textual representations of lines (sentences), paragraphs and documents are uncommon: all together, they are less common than no text at all. Slicing it another way, by the visual representation of text, we find:

Text Visualization    Number
No Text               40
Plain Text            103
Tag Cloud             39
Other                 68

This table shows (again) that 40 of the 250 have no text. The next number is interesting: 103 of the 250 are just plain text, i.e. a label on a graph or a line of text. Plain text uses no color, no size, no enhancement, no additional encoding. This is, in some cases, a missed opportunity.

Of the remaining 107, about one third (39) are tag clouds. Tag clouds are popular on the web and in research too: there are tag clouds in bars and tag clouds in graphs.
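To make the arithmetic behind these shares explicit, here is a minimal sketch that recomputes them from the counts in the table of visual representations. The counts are copied from that table; the helper name `share` is just an illustrative choice, not part of any tool used for the original tally.

```python
# Counts copied from the table of visual representations above.
counts = {"No Text": 40, "Plain Text": 103, "Tag Cloud": 39, "Other": 68}

total = sum(counts.values())


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# "The remaining" = everything that is neither text-free nor plain text.
remaining = total - counts["No Text"] - counts["Plain Text"]
tag_cloud_share = share(counts["Tag Cloud"], remaining)

for name, n in counts.items():
    print(f"{name:<10} {n:>3}  {share(n, total):>5}%")
print(f"Tag clouds among the remaining {remaining}: {tag_cloud_share}%")
```

The four categories sum to the 250 listed visualizations, and tag clouds come out at a little over a third of the visualizations that do something more than plain text.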

So what?

It seems that there should be a lot of opportunity beyond words, tag clouds and facets in search interfaces and text visualization. From a search perspective, visualizing the text of lines, paragraphs and documents should be considered. A focus on words removes them from their context. Loss of context means that the semantics of those words is lost: homonyms can no longer be told apart, and ambiguity, sarcasm and other shades of meaning disappear when text is split into isolated words.
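As a small illustration of what context adds, here is a rough keyword-in-context (KWIC) sketch. The sample sentences and the function name `kwic` are made up for this example; it is only meant to show how a line-level view separates two senses of a word that a word-level view would merge into one entry.

```python
import re


def kwic(text, keyword, width=3):
    """Return each occurrence of keyword with up to `width` words either side."""
    words = re.findall(r"[\w']+", text)
    hits = []
    for i, w in enumerate(words):
        if w.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append(f"{left} [{w}] {right}".strip())
    return hits


# Invented sample text: the same word used in two different senses.
sample = ("She sat on the bank of the river. "
          "He opened an account at the bank downtown.")

lines = kwic(sample, "bank")
for line in lines:
    print(line)
```

A tag cloud built from the same text would show “bank” once with a count of two. The two KWIC lines keep enough of the surrounding words to tell the riverbank from the financial institution.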

My current research on search and text visualization is in this article: Font attributes enrich knowledge maps and information retrieval. It includes a few visualizations at different levels of text, from words through to paragraphs.

More importantly, there is still a lot of design, experimentation and research to be done on new text visualizations that address the many tasks associated with search, including searching, filtering, skimming, reading, extracting, connecting, schematizing, assembling and storytelling.


About richardbrath

Richard is a long-time visualization designer and researcher. Professionally, I am one of the partners of Uncharted Software Inc. I am also pursuing a part-time PhD in data visualization at LSBU. The opinions on this blog are related to my personal interests in data visualization, particularly research interests related to my PhD work. This blog is about exploratory aspects of data visualization, not proven principles.
This entry was posted in Data Visualization, Search, Text Visualization.