Visualizing with Text

Richard Brath. Visualizing with Text. A K Peters Visualization Series, CRC Press, 2020.

Companion Site

Introduction

Visualizing with Text is a new book that uncovers the rich palette of text elements usable in visualizations, from simple labels through to documents. Drawing on multidisciplinary research spanning fields including visualization, typography, and cartography, it builds a solid foundation for the design space of text in visualization. The book illustrates many new kinds of visualizations, including microtext lines, skim formatting, and typographic sets that solve some of the shortcomings of well-known visualization techniques, and introduces new techniques such as quantitative encoding and SparkWords.

Publisher’s Site with description, table of contents, and reviews:

“A treasure trove of inspiring ideas.”
Ben Shneiderman, University of Maryland

“A delightful invitation to follow Richard Brath.”
Nigel Holmes, Graphic Designer

“Well illustrated and clearly described.”
John D. Berry, Typographer

“A better semantic understanding of data.”
Vidya Setlur, Research Scientist

And more reviews:

“Revel in the new sandbox of text.”
Review by Kaiser Fung, JunkCharts

“- Zoom with one’s attention.”
Review by Alec Barrett, Data Visualization Society

Author’s figures, free to reuse with attribution under a CC-BY-SA-4.0 license (PDF). Contains 158 images and diagrams by the author, and references to 99 other historic images, most including links.

Video Presentation at UBC (and slides)

Buy the book at Amazon, B&N, Powells, Indigo, BookCity, Waterstones, or contact your independent bookseller.

Interactive Demos and Code

Visualizing with Text isn’t about code: it’s a new way to think about the many different ways that text can be used to convey data in visualizations. It provides a framework for the many different parameters that can be used when designing visualizations that use text. And it provides about 80 different visualization examples using this text visualization framework.

Some of the examples in the book were created with Python, Excel, Word, Illustrator, SVG, or HTML+CSS and a couple are just mockups — but most of the examples were created with JavaScript and D3.js. A few of these examples have been re-implemented in Observable notebooks and appear below. These can be interacted with, the code can be edited, and examples copied into new notebooks or web pages:

Chapter 5: Labels
Scatterplots typically use dots. Letters, codes or words can be used instead, such as this labelled scatterplot of countries using 2-letter ISO codes (based on figure 5.4).

Chapter 6: Distributions
Distributions are counts of things. Those things can be represented with text instead, such as this example textual stem & leaf plot of characters and associated adjectives extracted from fictional stories (based on figure 6.12).
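
The idea of a textual stem & leaf plot can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the Observable notebook: the function name `text_stem_and_leaf` and the sample pairs are invented for this example, and the real demo extracts its characters and adjectives from stories with NLP tooling.

```python
from collections import defaultdict

def text_stem_and_leaf(pairs):
    """Group leaves (adjectives) under their stem (character name)."""
    rows = defaultdict(list)
    for stem, leaf in pairs:
        rows[stem].append(leaf)
    width = max(len(stem) for stem in rows)
    # Longest rows first, so the row length reads like a bar in a bar chart.
    ordered = sorted(rows.items(), key=lambda item: (-len(item[1]), item[0]))
    return [f"{stem.rjust(width)} | {' '.join(sorted(leaves))}"
            for stem, leaves in ordered]

# Hypothetical sample data, invented for illustration only.
sample = [
    ("wolf", "big"), ("wolf", "bad"), ("wolf", "hungry"),
    ("queen", "wicked"), ("queen", "vain"),
    ("prince", "brave"),
]

for line in text_stem_and_leaf(sample):
    print(line)
```

Each row's length still shows the count, as a bar would, while the words themselves stay readable.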

Chapter 7: Lines
Line charts work well until there are 10 or more lines. Using text can help identify, disambiguate, and facilitate comparison across many lines, such as this unemployment chart with microtext lines (based on figure 7.9).

Chapter 8: Categories
Set representations (e.g. Venn diagrams) struggle to show more than 4 sets. Typographic indication of set membership can help readers notice differences in membership, as in this graph layout of Pokémon and 16 skill types (figure 8.18).

Chapter 9: Ordered Data
Many visualizations show ordered data, such as choropleth maps that use color to show magnitude. But these maps have problems, which can be solved using a labelled cartogram with ordered data instead (figure 9.5).

Chapter 10: Ratios
Quantitative data supports relative comparison when encoded as lengths. Embedding lengths in text via formats facilitates comparison and provides detailed textual context (figure 10.12).
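
One way to embed a length in text is to format only a proportional share of a label's characters. The sketch below assumes HTML output and an underline as the format; `proportional_format` is a hypothetical helper written for this page, not the book's implementation. It also assumes the ratio is already normalized to the range 0 to 1.

```python
from html import escape

def proportional_format(label, ratio, tag="u"):
    """Wrap the leading share of a label's characters in an HTML tag."""
    ratio = min(max(ratio, 0.0), 1.0)  # clamp to [0, 1]
    cut = round(len(label) * ratio)    # number of characters to format
    head, tail = label[:cut], label[cut:]
    formatted = f"<{tag}>{escape(head)}</{tag}>" if head else ""
    return formatted + escape(tail)

print(proportional_format("Canada", 0.5))
print(proportional_format("Canada", 1.0, tag="b"))
```

Formatted lengths are only directly comparable across labels that occupy the same width, so a real design would likely pad or justify labels to a common length.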

Chapter 11: Prose
Text for reading can be enhanced with typographic visualization to facilitate skimming, pronunciation, spelling and so on. This text skimming example processes prose to make uncommon words more salient (section 11.2).
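
A rough sketch of skim formatting: words that are missing from a list of common words are wrapped in a bold tag. The word list here is a tiny hypothetical stand-in; a real implementation would more likely rank words against a large language-wide frequency list and might vary the weight by rarity. The name `emphasize_uncommon` is an assumption for illustration, and the input is assumed to be plain text with no markup.

```python
import re

# Tiny hypothetical list of common words, for illustration only.
COMMON_WORDS = {"the", "a", "and", "of", "to", "in", "was", "it", "on", "cat", "sat"}

def emphasize_uncommon(text, common_words=COMMON_WORDS, tag="b"):
    """Wrap every word not found in common_words in an HTML tag."""
    def mark(match):
        word = match.group(0)
        if word.lower() in common_words:
            return word
        return f"<{tag}>{word}</{tag}>"
    # Words are runs of letters and apostrophes; punctuation is left untouched.
    return re.sub(r"[A-Za-z']+", mark, text)

print(emphasize_uncommon("The cat sat on the escritoire."))
```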

Chapter 12: SparkWords
Words in running text can be formatted to convey data related to the words, such as quantitative values associated with word lists.
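
As a hedged illustration of the idea, the sketch below maps a quantitative value for each word onto a CSS font weight. The function name `sparkwords`, the sample values, and the choice of font weight as the encoding are all assumptions made for this example. The book's SparkWords may use other typographic attributes, and intermediate weights only render distinctly with a font that supplies them, such as a variable font.

```python
from html import escape

def sparkwords(values, min_weight=100, max_weight=900):
    """Return an inline HTML word list whose font weights encode the values."""
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    parts = []
    for word, value in values.items():
        share = (value - lo) / span
        weight = round(min_weight + share * (max_weight - min_weight))
        parts.append(f'<span style="font-weight:{weight}">{escape(word)}</span>')
    return ", ".join(parts)

# Hypothetical values, invented for illustration only.
print(sparkwords({"Alpha": 0, "Beta": 50, "Gamma": 100}))
```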

Visualizing with Text and Teaching

Visualizing with Text can be used in courses in computer science, graphic design, typography, cartography, data journalism, humanities and other areas. The book and examples can be used in teaching. One potential approach to use in teaching is to define, design, implement and evaluate a novel textually-oriented visualization:

1. Identify a text-intensive visualization or application. Chapter 1 references many text-heavy visualizations both old and contemporary, and these occur in other sections as well, such as the examples in Chapters 3, 7, 9 and 12. After reviewing Chapter 1, students, teachers and researchers can find other text-intensive applications and examples to consider as the basis for a project, whether generalized as a computational technique that works across datasets, as a singular dataset for purposes such as the design of an infographic, or as the basis for a story for data journalism.

Museum websites, davidrumsey and archive.org may provide inspiration. Data Is Plural provides many intriguing datasets begging to be visualized, in addition to many other sources of data such as UNdata, OECD, data.gov, NYC open data, Project Gutenberg and APIs to textual data from social media, review sites, academic abstracts, patents and so on. See the figure list at the beginning of the book for sites with historic image collections.

Given a sample visualization, application or dataset, it is necessary to define the goals and uses. With historic examples, the uses may be described directly, whereas datasets are open-ended and the researcher will need to decide on the goals.

Chapter 4 explicitly sizes the economic market opportunity for some broad applications areas: courses with an entrepreneurial component could also attempt to quantify the application opportunity.

2. Create designs for a potential text-oriented visualization. Given an application, data and goals, consider various design possibilities for the visualization. Consider using different subsets of text that can be extracted, different scopes (from characters to paragraphs), different data types and so forth, so that designs aren’t too narrow. Chapter 4 has some examples plotting multiple designs that show different kinds of information from the same dataset.

Consider using the Five Design-Sheet Methodology (Roberts, Headleand & Ritsos, 2015, pdf) if the researchers are unfamiliar with design methods. If coming from the computer science side, the design can also include technical data-processing considerations, for example which NLP approaches will be used to extract the relevant information from the raw data.

Designs should also be critically aware of what data they have or do not have, and of biases that the design may introduce (see Data Feminism, D’Ignazio and Klein, 2020, MIT). How might the design induce a negative feedback loop if used repeatedly (see Weapons of Math Destruction, 2016, Crown)?

3. Implementation. Take the designs and make them work with real data. For some graphic design courses or data journalism courses, this might include processing the data using Google Sheets and then using design tools to lay out the visualization. Programming courses can use libraries such as D3.js to do the visualization and NLP tools such as spaCy, compromise.js or NLTK to process the text. Consider using Observable as a way to quickly and incrementally implement the design and uncover issues not foreseen at design-time (e.g. data quality issues, unexpected NLP results, slow processing time, layout problems, and so on). You can reuse the example code above, either as a starting point or by cutting and pasting the parts you need.

4. Evaluation. How should a new text visualization be evaluated? It should depend on the initial goals set back in the initial step and how well the design and implementation meet those goals. Munzner’s nested model of design and validation (2009, 2012, 2015, 2015 book) provides a useful framework for assessing the overall implementation and each step in this outline. I personally find evaluation by critique highly useful (2016), particularly when used throughout all the steps in this process.

Please cite the book if you’re publishing research informed by the book or the sample code.

Visualizing with Text in Industry

My long history in the design and development of visualizations in industrial applications such as financial markets, supply chain, health care, call centre, web analytics and so on made me realize there was a gap in research knowledge regarding the use of text in visualizations, and spurred the origin of this research, investigated over the course of a PhD degree. This research, in turn, is being used to aid the design and development of new visualizations. Feel free to use any of the ideas and/or sample code in new commercial uses. If you’re involved in large scale corporate implementations pushing the boundaries of NLP, text analytics and visualization, reach out to us at Uncharted. If you do use some of the ideas, please cite this book.

Visualizing with Text Speaking Engagements

I have given presentations regarding aspects of Visualizing with Text, for example:

General overview, e.g. at University of British Columbia

Text analytics, e.g. Strata Data Conference

Cartography design, e.g. at sous le texte la carte at ESAD Valence

Critical interaction design, e.g. at Indiana University

Big data and global policies, e.g. at Yale University

I can accept a small number of speaking opportunities; contact me if interested.

The Making of Visualizing with Text

The origins of Visualizing with Text go back to 2009, when I was asked to give a keynote speech for IV2009. At that time, I started wondering about all the different areas in the visualization research community that were either ignored or seemed poorly defined. For the keynote, I discussed shape, a visual attribute that people talked about but that meant many different things.

A few years later, in 2013, I was having a discussion with Ebad Banissi about these poorly defined areas in visualization, such as shape, texture and text. Ebad thoughtfully replied that these were big and complex subjects worthy of a PhD, so I took him up on the offer and started a part-time PhD while working at Uncharted full-time. Having previously done a part-time Master’s degree, I knew that part-time independent research can be difficult to manage, so I came up with a few strategies to keep the research moving forwards. One strategy was to publish monthly blog posts about the research, as a way to force me to complete at least some research worth discussing every month. These blog posts continued after the PhD, when Enrico Bertini and Tamara Munzner encouraged me to adapt and extend the research into this book. I engaged with CRC Press, which also created some blog posts regarding the book.

All the blog posts capture bits of the research process and book-making process. These might be of interest to future writers interested in making a book, future students struggling with structuring their PhD, or the curious who wonder how some of these ideas evolved. Here are links to relevant blog posts over the last seven years.

Foundations

The design space of visualization is fundamentally what the research and book are about. The conceptualization of the design space regarding text evolved during the research from initially only focusing on typographic attributes (2013-2014), then formally adding in text scope (2015-2016), data type (2016), layout (2017), literal data (2018) and interaction (2019). Posts related to these foundational topics:

  1. The Table of Visual Attributes, showing visual attributes (hue, size, etc) that encode data as identified by researchers. This was the starting point for all this work. Note, the thesis has a more up-to-date version.
  2. Encoding semantic data in text was an early recognition that the choice of font, layout, and so forth can also encode semantics associated with the text via formats.
  3. Why visualize with fonts now? An early discussion on why it’s feasible and useful to use typographic attributes in visualization.
  4. Font legibility. If you encode with text, first you must make sure that text is legible! Closely related, but different, is readability.
  5. Parametric fonts. This very first example of a parametric font encoding data was sketched in a notebook, then hand-crafted in Illustrator to try out the idea. This evolved to using the highly parameterized fonts from Prototypo to create large font families to experiment with programmatic encoding into type attributes such as x-height and serif-width. In some of the examples above, variable fonts are used.
  6. The scope of text represented in visualizations was considered in a post in early 2016, where existing text visualizations at textvis.lnu.se were analyzed.
  7. Noticing a difference is different than decoding a difference. For some visualization tasks, it is important to easily notice the difference, whereas decoding can be done as a separate task, if needed.
  8. The first blog post that brought together an initial version of the design space was in 2016, coincident with a successful journal article in She Ji aimed at the broader design community.
  9. Bertin, back in 1967, actually wrote about using typography in visualization, and again in 1980!

Historic Visualization Examples

Old pre-computer visualizations are relevant to visualization research. I looked at many different examples to see what the range of possibilities was, and then used these examples to frame the design space:

  1. Historic examples of font attributes such as Cyclopaedia (a favorite!) and Cryes of London.
  2. Historic posters and a patent using formats to aid text skimming.
  3. Font italics and obliques. They are not the same: upright italics are feasible.
  4. Font weight. The history of bold, plus some historic and new examples.
  5. Font underlines are explored in the post Maligned Underline. Bad idea or good?
  6. Alphanumeric financial charts are something I know professionally. They evolved pre-computer and continue to evolve. Most people don’t know them.
  7. Thematic maps and their problems.
  8. Visualizations that organize many different variables of data into an inventory.
  9. Code editors have used typographic visualization since the early 1980s.
  10. Color in text visualizations: foreground, background and chromatic fonts.
  11. Album de Statistique Graphique and curvy text on paths or around bubbles.
  12. Isotype legacies: too much text removed?
  13. Metabolic pathways: flowcharts packed with text.
  14. Old maps layer in a lot of data via typographic attributes.

New Visualization Examples

Given a new design space, it should be feasible to use these techniques for different tasks:

  1. Equal area (labelled) cartograms, a very early design idea, created as a reaction to problems with choropleth maps. The first ones were rectangular. Later ones tried other layout techniques, such as a force-directed graph.
  2. Embedding quantitative values into text was a challenge. An early example used position to encode quantitative data. This was later extended to show proportions, e.g. using bold or underlines.
  3. Text skimming by weighting uncommon words (an early version, with a lot of italics). Then re-implemented so that I could process full books and export to PDF.
  4. A Venn diagram enhanced with typographic attributes.
  5. A set diagram using a graph to show emotion words or Pokémon.
  6. Microtext line charts.
  7. Topic words on a giant graph of patents by Uncharted.
  8. Textual stem & leaf plots of bigrams and word stems.
  9. Narrative table of contents, which became infused with diagrams and snapshots.
  10. A dozen variants of Bertin’s dataset.
  11. Data comic with lots of text.
  12. Sparkwords: text formatted with added data represented in-line in prose.
  13. Headlines on a scatterplot.

Evaluation and Feedback

During the PhD, I kept looking for novel ways to collect incremental feedback from many people to help guide the research and avoid dead-ends, or at least help prune the research into an achievable PhD. Many experts in various domains provided great feedback.

  1. I made a survey to capture some feedback from people after some interim PhD status presentations that I did for a few organizations such as LSBU, UOIT, and Uncharted. The survey still works, but I’m not checking the results any longer:-)
  2. TypeCon was the very first non-visualization conference where I spoke to experts in another domain about intermediate research results. It was a big test to validate outside the narrow community of vis researchers and I met a great group of awesome typographers such as John D. Berry, Gerry Leonidas, Richard Hunt, Nick Shinn, Rob McKaughan, and David Addey. I’ve since talked to experts in design, cartography, etc.
  3. Critiques are an evaluation technique based on dialogue, used in design communities, and relevant to visualization design and evaluation.

Process

Starting near zero and completing a PhD and then a book in 8 years. What was the process? There were a few blog posts:

  1. How to do a part-time PhD in 5 years.
  2. How to organize a visualization book.
  3. Designing a book cover.
  4. Figuring out image counts in a book.
  5. Making interactive demos in Observable.

Acknowledgements

I reside in Toronto, which is on the lands of the Anishinaabe, Haudenosaunee, Huron-Wendat (Wyandot) and Mississaugas of the New Credit territory.

Uncharted Software was highly supportive of my research, such as time to attend conferences and meetings; access to staff for presentations, evaluations, feedback and critiques; access to various information visualization resources such as historic publications and access to some customers; assistance with collecting some datasets such as social media datasets; and so on.

Collaborative discussions have included a variety of people from various communities including information visualization: William Wright, Ben Shneiderman, Jock MacKinlay, Jeff Heer, Bob Spence, Tamara Munzner, Colin Ware, Chris Collins, Katy Börner, John Stasko, Enrico Bertini, David Jonker, Fanny Chevalier, Jason Forrest, Eugene Sorenson, and more; typography: Gerard Unger, Gerry Leonidas, Fiona Ross, Paul Luna, Keith Tam, Matthew Lickiss, Richard Hunt, John Berry, Nick Shinn, Steve Ross and Eric Kindel; information design and infographics: Nigel Holmes, Isabel Meirelles, Martin Keohan, and Ray Vella; cartography: Cynthia Brewer, Alan MacEachren, Francis Harvey, Alexander Savelyev, and Judith Tyner; HCI and Text Analytics: Marti Hearst, Eli Blevis, James Hodson, Craig Hagerman, Scott Langevin, Vidya Setlur and Lynn Cherny. And there are many others not mentioned – thank you for taking the time to discuss.

I also acknowledge that some data citations could be improved, and that some visualizations use encodings that reinforce stereotype biases. My intent is to use encodings that are automatically understood (and that therefore, unfortunately, reinforce the stereotype), while at the same time specifically drawing attention to the bias indicated in the data, such as gender and ethnicity (e.g. US Senate visualization Figure 8.6) or gender and class (e.g. Titanic passenger visualization Figure 8.14).