Lyrics as Tiles: Billie Eilish’s Bad Guy coded with Color and Texture

Song lyrics depend heavily on rhythm, syllables and, in genres such as pop, rhyme. Some poetry visualizations add white space between words and lines, which can then be filled with various visualization techniques, such as links between related words. If a lyric is instead treated as a staccato sequence of syllables, the layout is more akin to a set of tiles locked together. Then, instead of using whitespace, the visualization is constrained to the tiles.

Simple tiles with English and phonetic syllables

To start, consider Billie Eilish’s Bad Guy. Similar sounds (e.g. rhyme) don’t visibly pop out in English text. Our goal is to encode those sounds to make them visible. A simple approach is to convert English words to a phonetic alphabet, so that the same sounds have the same phonetic symbol:

Bad Guy as tiles, showing English and phonetic alphabet. Note similar phonetic symbols on rhymes.

You can visually scan the phonetic symbols, but you have to look closely at the letter shapes. Rhymes are driven by the vowel sound, which may or may not be at the end of the syllable. Furthermore, in the International Phonetic Alphabet, some vowel sounds are represented by a single symbol and some by two symbols, making it difficult to attend to the relevant symbols. With phonetic symbols, sounds are comparable, but don’t visually pop out.

Color-coded vowel sounds

How can the sounds be made to visually pop out? Each syllable is a collection of phonemes for vowels and consonants, typically leading consonant(s), vowel(s), and trailing consonant(s). However, there are ~23 consonant phonemes and 16 vowel phonemes in English. Encodings such as brightness, font-weight, etc., don’t scale well to 16-23 uniquely discernible categories. Color is a possibility, particularly given that some phonemes are similar sounding. Using a confusion matrix, colors can be chosen so that close-sounding sounds have similar colors (although a vowel frontness and vowel origin matrix might be better).
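
As a rough illustration of "similar sounds get similar colors", here is a minimal Python sketch. It does not use a confusion matrix: it swaps in a much simpler stand-in, a hand-ordered list of vowels where neighbors are assumed to sound alike, and spreads hues along that list. The ARPAbet-style vowel labels, the ordering, and the function name vowel_colors are all my assumptions for illustration.

```python
import colorsys

# Assumed ordering of ARPAbet-style vowel labels, roughly front-to-back,
# so that neighbors in the list are (arguably) similar sounding.
VOWEL_ORDER = ["IY", "IH", "EY", "EH", "AE", "AH", "ER", "AA",
               "AY", "AW", "AO", "OY", "OW", "UH", "UW"]

def vowel_colors(order=VOWEL_ORDER, hue_span=0.8, lightness=0.6, saturation=0.7):
    """Assign each vowel a hex color; vowels adjacent in `order` get adjacent hues.

    hue_span < 1 stops short of the full color wheel so that the first and
    last vowels in the list do not end up with near-identical hues.
    """
    colors = {}
    n = len(order)
    for i, vowel in enumerate(order):
        hue = hue_span * i / (n - 1)
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colors[vowel] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255))
    return colors

colors = vowel_colors()
```

A one-dimensional ordering is a crude proxy for a similarity matrix, but it is enough to get neighboring vowels into neighboring hues.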

Here is a variation where each syllable is split into three parts:
– leading consonant in a light italic serif font
– central vowel in a heavyweight sans font, color coded to the vowel sound, with similar sounds in similar colors
– trailing consonant in a heavyweight serif font
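
The three-way split above can be sketched in a few lines of Python. This assumes each syllable arrives as a list of ARPAbet-style phoneme labels (my assumption; the tiles in this post use the phonetic alphabet), and split_syllable is a hypothetical helper name.

```python
# ARPAbet-style vowel labels (an assumed notation for this sketch).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}

def split_syllable(phonemes):
    """Split one syllable into (leading consonants, vowel(s), trailing consonants)."""
    vowel_positions = [i for i, p in enumerate(phonemes) if p in VOWELS]
    if not vowel_positions:
        # No vowel found: treat everything as leading consonants.
        return tuple(phonemes), (), ()
    first, last = vowel_positions[0], vowel_positions[-1]
    return (tuple(phonemes[:first]),
            tuple(phonemes[first:last + 1]),
            tuple(phonemes[last + 1:]))

guy = split_syllable(["G", "AY"])        # no trailing consonant
type_ = split_syllable(["T", "AY", "P"])  # same vowel, trailing P
```

Each of the three parts can then be styled separately: font for the consonants, color for the vowel.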

Color-coded vowels visually pop out, making patterns of the same vowel sound easy to see.

You can easily scan and notice similar vowel sounds in the final syllable of each line, plus the trailing consonant – aka the rhymes (e.g. g). You might also notice some other phonetic techniques such as the leading repetition in the chorus mk / mt, or near rhymes such as ˈkrɪmənəl / ˈsɪnɪkəl.

On the other hand, using the phonetic alphabet results in some symbols that are unfamiliar to most native English speakers, e.g. ʌ for “uh” or ʃ for “sh”.

Color-coded backgrounds

Instead, the tile background can be color-coded and the text switched to English spelling:

Color-coded backgrounds of English words by vowel sound. Color patterns pop, but consonant sounds are lost.

But the sound of the trailing consonant has been lost: guy and type have the same vowel sound, but don’t perfectly rhyme due to differing trailing consonants. Worse, nose, toes, and knows actually do rhyme but are spelled quite differently.

Fun with a polychromatic font

A polychromatic font is a font specifically designed for use with multiple colors. A few fonts support multiple colors by providing multiple versions of the font that align on top of each other. Most of these fonts are available for purchase rather than free. The example below uses the font Up up and away:

In the example below, the inside color is the vowel sound and the outside color (and the gratuitous 3D) is the final consonant sound. If there is no final consonant, then the background color is used:

A riot of color and gratuitous 3D. Fun, but probably not effective visualization.
A closeup of polychromatic lyrics with colors based on vowel and consonant sounds.

This is just for fun – “Hey, I’ve got this great font, let’s try it out and see what happens”. It has long been known that adjacent colors influence the perception of a color. In practice, this would never work perceptually for effective visualization, but it could make some viscerally exciting data-driven text. And some of the color combinations aren’t very legible. See Josef Albers’ Interaction of Color for awesome paintings of the effect:

Josef Albers, color interaction.

Textures! (plus color and text)

Finally, we get to a version with a tile where:
– English text is used per tile
– Color indicates the vowel sound
– Texture indicates the final consonant sound (if no consonant, then no texture)

Bad Guy with color for vowel sound and texture for final consonant sound. Common sounds line up in many places.

Since color is dominant, it can be seen that guy and type are the same color and thus the same vowel sound. However, type, with the ending p sound, gains the p texture, thus differentiating it from guy. Tough, rough, e-nough all share the same color with puffed, but the texture change gives away the slightly different ending of puffed compared to the others.

Colors are created so that similar vowel sounds have similar colors. Likewise for textures, similar consonant sounds attempt to have similar textures. If rhyme is largely based on the vowel and trailing consonant, this color and texture per syllable create visible patterns across the tiles, visually showing the rhyming scheme as well as other phonetic devices. Note similarities also at the beginning of lines, e.g. Sleepin’ / Creepin’, or Own me / I’ll be / with me / If she / pity.
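
If rhyme is taken to be "same vowel plus same trailing consonant", the patterns in the tiles can also be found programmatically. Here is a minimal Python sketch; the tile tuples, the ARPAbet-style labels and the helper names are my assumptions for illustration, not taken from the notebook.

```python
from collections import defaultdict

def group_tiles(tiles, key):
    """Group tile texts by key(vowel, coda); keep only groups with 2+ tiles."""
    groups = defaultdict(list)
    for text, vowel, coda in tiles:
        groups[key(vowel, coda)].append(text)
    return {k: v for k, v in groups.items() if len(v) > 1}

def perfect_rhymes(tiles):
    # Same color AND same texture: vowel and trailing consonant both match.
    return group_tiles(tiles, lambda vowel, coda: (vowel, coda))

def same_vowel(tiles):
    # Same color only: the vowel matches, the trailing consonant may differ.
    return group_tiles(tiles, lambda vowel, coda: vowel)

# (text, vowel, trailing consonant) per tile; "" means no trailing consonant.
tiles = [("guy", "AY", ""), ("type", "AY", "P"),
         ("nose", "OW", "Z"), ("toes", "OW", "Z"), ("knows", "OW", "Z")]

rhymes = perfect_rhymes(tiles)
vowels = same_vowel(tiles)
```

Here nose, toes and knows land in one perfect-rhyme group, while guy and type only share a vowel group, which mirrors the same-color, different-texture case described above.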

At a high level, sub-columns of the same color and the same (or similar) trailing consonant visually stand out, revealing some of the textual structure running through sections of the lyrics.

Dancing Queen

Brig really (really) likes Abba. What happens when we use this to visualize Dancing Queen?

Dancing Queen with color for vowel sound and texture for final consonant sound

Many rhyming pairs are immediately apparent: scene / queen; low / go; swing / king; guy / high. And near rhymes stand out too: queen / sweet / teen / beat / rine all share the long E vowel (purple), and flip between a trailing n or t (diamond hatch vs horizontal line). The near match is also apparent in jive / life (both purple but sawtooth vs x texture).

At a more meta-level, Dancing Queen seems to have more of a blue/purple consistency compared to Bad Guy, which tends to be purple and punctuated with other distinct colors such as cyan and chartreuse.

grandson: Dirty

What about something that isn’t quite so pop music, less lyric driven? Everything above is focused purely on words, i.e. poetry. Pitch, duration and the many other music variables haven’t been considered, and certainly there are many other music visualization techniques (e.g. Ethan Hine, Brian Cort). A linguistic musician tells me genres may use near rhymes rather than perfect rhymes, or may alter the inflection or pronunciation of words to get rhymes (thanks Craig). So, here’s grandson’s Dirty:

Dirty with color for vowel sound and texture for final consonant sound

It is more difficult to define line length, and color appears more random as well. There’s no predominant color across the entire lyrics. Unlike Bad Guy and Dancing Queen, there are no columns of color, although there are some localized pockets of color. Perfect rhyming pairs exist, such as silence / violence; sunset / up yet; neighbor / nature; but they don’t prevail. There are some near rhymes too, such as so go / to go / do you or floorboard / forewarned. There’s a lot more repetition of single words such as time, you, love, for. And the tiles also help show near repetition of phrases such as: is it time / is it in / isn’t that; or do you love / do you have.

So perhaps the approach also works here, but in this case different aspects of the lyrics are creating different patterns, and potentially different or additional elements need to be visualized as well.

Note: A rough implementation of the above is available as an Observable notebook. I had a few challenges with fonts and leveraged Riccardo Scalco’s textures.js to create the many different textures.


Modley’s Pictographs and Graphs

Rudolf Modley was a key figure in the popularization of Isotype in the United States. I’ve previously written about Isotype (e.g. hypothesizing what happened to it, and thematic axes). I recently received Modley and Lowenstein’s book Pictographs and Graphs (1952, Harper & Brothers). In addition to some beautiful pictographic charts, it also includes useful explanations of the design process and rationale used to create these effective and engaging charts. Here are some insights from 70 years ago:

Insights from Modley

Storytelling. Modley was talking about storytelling with charts a half-century before data journalism: “The pictorial chartmaker is a headline writer among statisticians. If he fails to tell a story, his charts become pointless.” – pg 23.

Pictographs. “Pictorial symbols should be self-explanatory” – pg 25. A worthy goal, but a big challenge for anyone who’s had to try to design an icon for a menu (hamburger icon? gear?) or CPI (inflated $? balloon?)

Comparisons. “Pictographs make comparisons, not flat statements.” – pg 26. A single row of pictographs referring to a single value is pointless. It’s about comparing one value to another. There are quite a few infographics that fall into this category, with “one big number” and an associated pictograph, but what’s it compared to? On its own, it’s a single factoid without any potential for relative judgment.

Memorable charts. “A good chart may be judged from what the reader remembers the day after he sees it.” – pg 28. Modley sets the stage for this need right at the beginning of the book: on page 2, he describes Mr. Smith consuming information throughout the day – “a flood of varying facts which he must digest and evaluate for himself. Not the least important problem is to retain the essential facts from the wealth of information passing through his mind in one day”.

Personal engagement. “The American development of pictorial statistics has tried to avoid over-standardization of symbols. … it has wanted to bring symbols to life and to adapt them to each new audience. As we have seen in the case of Mr. Smith, his full interest and curiosity are not aroused unless there is some suggestion of his own habits and interests in a graph or illustration.” This is an interesting indication that rather than uniform pictographs used across all charts (perhaps like early Isotype before Gerd Arntz), Modley instead recognizes a requirement for icons intrinsically connected with the subject matter.

Some snapshots

Here are a couple of examples in action from the book:

A couple of charts from Pictographs and Graphs, 1952.

The left image shows the number of women at work – a straightforward Isotype-like chart with the subtle cue of women’s attire changing with successive rows. This subtle change indicates, minimally, that each row represents different data. Further, the attire change reinforces the time scale by using attire associated with each period.

In the right image, a person is comically attempting to hold a pile of coins. The person is literally staggering under a pile of debt (an idiom made into a visualization!). Note the captions above each column indicating the dollar amount – relative visual comparisons are possible, and the quantitative facts are explicitly depicted as well.

Even better

I understand from Nigel Holmes, via Jason Forrest, that this 1952 book reprints only some of the content from Modley’s earlier 1937 book, How to Use Pictorial Statistics (a much rarer book). One day I’ll have to track down an edition.

The rare pictograph book: How to Use Pictorial Statistics, 1937.


Visualizing with Text footnote – 2 letter Scrabble words.

I’m seeing examples of interesting, interactive text visualizations in the wild. These are relevant to my book Visualizing with Text, particularly if I find examples that don’t quite fit. Occasionally, I’ll pop an example into the blog. Today’s example is a blog post by Gideon Golden with both an interactive stem-and-leaf plot of 2 letter Scrabble words, as well as a table of the same words, organized by first letter and last letter and color-coded by Cmglee:


58 Ways to Visualize Alice in Wonderland (+10 more)

How many ways are there to visualize a book? Bar chart, scatterplot, word cloud… that’s too narrow thinking. And, yes, there are websites showing how academics visualize text. But what happens out in the wild? Artists? School assignments? Professional designers? Statistics researchers?

Ever so curious, I decided to find out. To come up with some kind of method to search broadly, I picked one book, Lewis Carroll’s Alice’s Adventures in Wonderland, and decided to find all the possible visualizations that might pop up on Google/Bing text search, image search and scholar search. I found more than 40!

On the right are little teeny snapshots of the visualizations that I found. I won’t go into details on all of them, just a few highlights in this article.

If you’re interested in more details, you can read the peer-reviewed research paper. Some of the snapshots are cropped – the links to the full-size images are in the sources at the end of this post.

Visualizations 1-5 are from the visualization research community. Visualization #2 is a word cloud – only one word cloud of Alice in Wonderland is shown here even though hundreds exist. For the purposes of this article, I’m interested in different visualization techniques. Visualization #5 is Brad Paley’s TextArc from two decades ago – an early, wonderful, highly interactive visualization.

6-10 are visualizations from the digital humanities for analyzing text. I like #8, lining up adjectives for a character, providing a sense of the character. In this case Alice’s speech is described as soothing, piteous, or melancholy.

Visualizations 11-18 are from natural language processing. Interestingly, visualizations 15-18 have almost no words – even though they’re about a text.

Visualization #19 is a wonderful visualization from an art thesis by Yi-Chia Cheng. Paragraphs are converted to phonetic sounds, shown as symbols using the International Phonetic Alphabet, and stacked into distributions. Distributions can then be created and compared across languages to show how Alice sounds in different languages (see Cheng’s thesis for many more distributions across languages).

What happens when looking a bit further afield than linguistic research and data analysis?

Visualization #20 is an artistic tool for drawing using sentences from text by Travis Kirton. In this case, an artist has drawn a figure of the caterpillar smoking his hookah using the corresponding sentences from Alice – creating a figurative, non-linear reading of that text.

Visualization #21 is digital micrography – that is, text which has been flowed to fit into arbitrary shapes. Lines of text are curved, bent and sized to follow the predominant flow of the shape. This particular example is from the thesis of Ron Maharik, who automated the technique for even complex shapes such as puzzle pieces or floral shapes, such as this tiny portion from Alice (see figure 10.1, page 68 for the full image).

22-25 are timeline visualizations, some showing changes in Alice’s height over time. 23 includes Freudian analysis in relation to Alice’s height changes, mapping Alice’s psychological development over the course of the book.

Visualization #26 shows only a small portion of a small multiple visualization by Claire Wenzel, showing 20 instances of Alice’s dress from across many publications and movies. Who knew Alice had so many dresses? An analysis of the fictional representation of Alice’s dresses over time can provide a view on our own changing society.

Visualizations 27-28 are interactive physical visualizations, with flaps, tabs and pop-ups.

Visualizations 29-41 are even broader examples from across the Internet. Some are borderline visualizations, but do use visualization techniques. #29 is a list of color-coded places, characters and events. #30 is an infographic providing context to the book as well as content analysis.

#31 is a social network of characters from Alice in Wonderland. Each character is shown with an original illustration from Tenniel. The social network is shown by the lines joining the characters. Along each line is a sentence of text describing the relationship between the characters. Interestingly, this visualization is authored by a costume website: presumably knowing a bit more about the characters and their relationships helps rent more costumes.

#37 is a wonderfully hand-drawn homework assignment, with keywords in heavy marker underlined and rotated as well as lightweight sentences.

#40-41 are unique editions of Alice, with text layout changing, font sizes, caps, etc., modified by the designer in relation to the semantics of the text. Note the call out in #41 overlaying one of Carroll’s logical inversions to form an X.

That’s 41 visualizations. What can be learned from these? The in-the-wild visualizations carry a lot more text than the research visualizations, and make more use of typographic enhancements such as bold, underline, italics and so on.

* * *

These in-the-wild visualizations spurred me to create a number of other visualizations of Alice in Wonderland. Some of these are in my book Visualizing with Text in more detail (Routledge, Amazon, companion site). Large size versions of these images are available in this PDF, CC-license so available to use in teaching, etc. (Also embedded at the end of this post).

Visualizations #42 and #43 are sub-word visualizations, indicating properties on syllables.

#44-50 are about words, typically extracted attributes about characters. For example, #49 lists adverbs associated with characters, with font-weight indicating most frequent descriptors – Alice is timid, the Queen is furious, the Hatter is dreadful.

#51-56 are visualizations of phrases and sentences. #52 shows connections of repeated words from the Mad Tea Party. There’s a huge amount of repetition among the characters, reinforcing their position against Alice.

#55 shows the chapter title and a portion of the first sentence for each chapter. Various metrics are shown: the underlying bar indicates the dominant emotion for that chapter as extracted using natural language processing. Chapter 6, Pig and Pepper, is highly disgusting, whereas Chapter 3, A Caucus-Race and a Long Tale, is measured as sad.

#57 and #58 are visualizations of the entire book. They would be readable if printed out as a poster. #57 has large red text under longer paragraphs. The large red text is a capitalized noun and an uncommon verb, adjective or noun in that paragraph – such as: “Rabbit rabbit-hole”, “Mouse lesson-book”, “Bill roof”, “Duchess frying-pan”, “Queen quarrelling”, and so on. The idea is to form large-scale landmarks in the text to easily locate portions of the text. Even larger behind the text are the chapter numbers and titles in yellow.

#58 is a version of the entire text of Alice where the text is increased in size if it has been quoted on the Internet. After collecting and processing 200 quotations, the most famous quotes from Alice stand out larger than the surrounding text. You can immediately see the most quotable quotes, and step closer to read the surrounding text. Curious which text is the largest?

  • “Who in the world am I? Ah, that’s the great puzzle!” (Alice, Chapter 2)
  • “We’re all mad here.” (Cheshire cat, Chapter 6)
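
The sizing idea behind #58 can be approximated in a few lines. This is a minimal Python sketch under my own assumptions, not the pipeline I actually used: a sentence counts as "quoted" if it appears verbatim (ignoring case) inside a collected quotation, and the point sizes and the name quote_sizes are placeholders.

```python
def quote_sizes(sentences, quotations, base_pt=6.0, step_pt=2.0, max_pt=30.0):
    """Return a font size per sentence: larger the more often it is quoted.

    A sentence is counted once for each quotation that contains it,
    compared case-insensitively; the size is capped at max_pt.
    """
    sizes = []
    for sentence in sentences:
        needle = sentence.lower()
        hits = sum(1 for quote in quotations if needle in quote.lower())
        sizes.append(min(base_pt + step_pt * hits, max_pt))
    return sizes

sentences = ["We're all mad here.", "Alice sighed wearily."]
quotations = ["We're all mad here. I'm mad. You're mad.",
              "we're all mad here."]
sizes = quote_sizes(sentences, quotations)
```

Real quotations are often misquoted or trimmed, so exact substring matching would miss many of them; some fuzzier matching would be needed in practice.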

Sometimes it’s important to think outside of the box of word clouds and bar charts: there is so much more possible and feasible.

Addendum

Yes, there are more, so I see from responses on Twitter and elsewhere. 59-61 are some NLP visualizations: 59 creates little squares, one per sentence, with brightness by sentence length; 60 transforms words to a vector space and plots them; 61 isn’t quite a word cloud. 62-64 are more artistically driven: 62 animates sentences, 63 shows punctuation only, and 64 is words inside large words which in turn form a rabbit. I would not have thought one could quite manage to get the layout of words to clearly form letters of larger words – apparently it’s quite feasible.

A Wonderland of Data Visualization

I did a presentation for the Lewis Carroll Society of North America (LCSNA) titled “A Wonderland of Data Visualization.” This presentation is more accessible to a wider audience and should be available on YouTube on the LCSNA channel.

LCSNA is aware of additional visualizations: 65 is a set of interconnected bar charts comparing content from the original Under Ground vs. Alice’s Adventures in Wonderland – note that the first 5 chapters are largely the same, the latter chapters are largely new content. 66 is a similar analysis presented as a table, which is a type of visualization. 67 is another variation on a timeline indicating Alice’s height chart, in this example using Tenniel’s original illustrations and a related table. 68 is also a timeline in the wild, in this example, a highly illustrated timeline with short captions.

  1. Davies, J. Word Tree [with Alice in Wonderland]. https://www.jasondavies.com/wordtree/?source=alice-in-wonderland.txt&prefix=dear (original WordTree by Wattenberg M, Viégas FB. The word tree, an interactive visual concordance. IEEE transactions on visualization and computer graphics. 2008 Oct 24;14(6):1221-8.)
  2. Wolfram. Word Cloud examples [using Alice in Wonderland]. www.wolfram.com/language/11/new-visualization-domains/oriented-word-clouds.html. (original Milgram, S. and D. Jodelet. “Psychological maps of Paris”, Environmental Psychology, 1976,)
  3. Semantic Knowledge. Gephi GEXF Exports in Tropes. https://www.semantic-knowledge.com/doc/V81/text-analysis/gephi-gexf-exports.htm (created using Gephi, Bastian, M.; Heymann, S.; Jacomy, M.. “Gephi: An Open Source Software for Exploring and Manipulating Networks.” International AAAI Conference on Web and Social Media, North America, 2009)
  4. Tanahashi, Yuzuru, and Kwan-Liu Ma. “Design considerations for optimizing storyline visualizations.” IEEE Transactions on Visualization and Computer Graphics 18.12 (2012): 2679-2688.
  5. Paley WB. TextArc: Showing word frequency and distribution in text. Poster at IEEE Symposium on Information Visualization. 2002.
  6. Juxta. Alice: Wonderland vs. Underground. juxtacommons.org/shares/GJm4O9. See also juxtasoftware.org, and Dana Wheeles. “Scholar’s Lab Presentation: Using Juxta Commons in the Classroom”. https://scholarslab.lib.virginia.edu/blog/scholars-lab-presentation-using-juxta-commons-in-the-classroom/.
  7. Senghor, L. Alice’s Adventures After Wonderland: Visualizing Alice in the Digital Era. Visual Learning: Transforming the Liberal Arts Conference, 2018. See also: slideplayer.com/slide/3575003 and kateogorman.org/text-analysis/voyant-tools
  8. Hrdličková, J. A Corpus Stylistic Perspective on Lewis Carroll’s Alice’s Adventures in Wonderland, Thesis, Department of English Language and Didactics, Univerzita Karlova v Praze, 2015. https://dspace.cuni.cz/handle/20.500.11956/84093
  9. Ibid.
  10. Ibid.
  11. Brennan, J.R, Dyer C., Kuncoro A., Hale JT. Localizing syntactic predictions using recurrent neural network grammars, Neuropsychologia, Volume 146, 2020, 107479, ISSN 0028-3932.
  12. Jettka, D, and Stührenberg M. “Visualization of concurrent markup: From trees to graphs, from 2D to 3D.” In Proceedings of Balisage: The Markup Conference 2011. Balisage, vol. 7 (2011).
  13. Thys, F. AI in wonderland. SAS blogs. 2017 Jun 23. blogs.sas.com/content/sascom/2017/06/23/ai-in-wonderland
  14. Ibid.
  15. Agarwal A, Corvalan A, Jensen J, Rambow O. Social network analysis of alice in wonderland. In NAACL-HLT 2012 Workshop on computational linguistics for literature 2012 Jun (pp. 88-96).
  16. Zhu X. Persistent homology: An introduction and a new text representation for natural language processing. In Twenty-Third International Joint Conference on Artificial Intelligence 2013.
  17. Langit, L. Visualizing Alice in Wonderland – Wolfram Alpha Pro. 2012 Feb 12. lynnlangit.com/2012/02/12/visulizing-alice-inwonderlandwolframalphapro/
  18. Maharjan, S., Kar, S., Montes, M., González, F., Solorio, T. (2018). Letting Emotions Flow: Success Prediction by Modeling the Flow of Emotions in Books. 259-265. 10.18653/v1/N18-2042.
  19. Cheng, Y. Down the Rabbit Hole: Visualizing Linguistic Distance And Relationships With Alice in Wonderland, MFA thesis, Northeastern University, Nov 2019. repository.library.northeastern.edu/files/neu:m0455c25q
  20. Kirton, T. Artistic Canvas for Gestural and Non-linear Typography. Eurographics, 2011.
  21. Maharik, R. Digital Micrography. MSc Thesis, University of British Columbia, 2011. https://open.library.ubc.ca/soa/cIRcle/collections/ubctheses/24/items/1.0052114
  22. Bakht, S., Chin, R., Harris, S., Hoetzlein, R., Alice Adaptation Project Analysis. 2008. english236-w2008.pbworks.com/w/page/19019830/Alice%20Analysis
  23. Vera, L. Alice in Wonderland, from a Freudian perspective, visual.ly/community/Infographics/entertainment/alice-wonderland-freudian-perspective
  24. Neel, H. Alice’s Adventures in Wonderland Timeline. 2018. prezi.com/w5kk-wlz9mkn/alices-adventures-in-wonderlandtimeline/
  25. Padilla, C., Páez, D., Wolf J, Alice in Wonderland: Growing up & down, La Loma GbR, 2009, laloma.info/projects/alice
  26. Wenzei, C. Alice through the ages. SCAD. portfolios.scad.edu/gallery/51823709/Alice-Timeline-Project
  27. Arndt, E. Alice’s Adventures in Wonderland Unit Study. 2012. confessionsofahomeschooler.com/blog/2012/04/alices-adventures-in-wonderland-unit-study.html
  28. Carroll, L. Alice’s Adventures in Wonderland Carousel Book. Pan Macmillan. 2016.
  29. Arterberry, A. Alice in Wonderland Infographic. 2013. 111booksfor2011.wordpress.com/tag/alice-in-wonderland-infographic/
  30. DeReign, S. The Real-Life Girl Who Inspired Alice in Wonderland. 2016. coursehero.com/blog/2016/05/25/the-real-life-girl-who-inspired-alice-in-wonderland/
  31. Kemke, K., Schwartz, E. Alice’s Adventures in Wonderland Character Guide. halloweencostumes.com/alice-in-wonderland-costumes.html
  32. Alice in Wonderland Character Map. coursehero.com/lit/Alice-in-Wonderland/character-map/
  33. Paez, D.M. Alice’s Adventures in Wonderland: Beyond a children’s story… aliceandlewiscarroll.weebly.com/analysis.html
  34. Shairrick, C. Alice’s Adventures in Wonderland, 2016. mindmeister.com/633491270/alice-s-adventures-in-wonderland
  35. Parfitt, G.. “Alice’s Adventures in Wonderland Themes: Dreams and Reality.” LitCharts. LitCharts LLC, 25 Nov 2013. https://www.litcharts.com/lit/alice-s-adventures-in-wonderland/themes/dreams-and-reality
  36. Parfitt, G. “Alice’s Adventures in Wonderland Themes: Theme Wheel.” LitCharts. LitCharts LLC, 25 Nov 2013. Web. https://www.litcharts.com/lit/alice-s-adventures-in-wonderland/chart-board-visualization
  37. Thompson, S. H., LAB1 Q1 Book Project, 2016. https://syrenahbookproject.weebly.com/plot–conflict.html
  38. Rais, M, and Sari LL., Comparison between Coraline and Alice in The Wonderland, 2016. slideshare.net/meiiiiillliiiinaa/comparison-between-coraline-and-alice-in-the-wonderland
  39. Kamak, M. Visualization of Alice in Wonderland, 2019 behance.net/gallery/70097119/Visualization-of-Alice-in-Wonderland
  40. Kusama Y, Posavec S. Lewis Carroll’s Alice’s Adventures in Wonderland. Penguin. 2012.
  41. Giuditta D. Alice in Wonderland. Rhode Island School of Design Portfolios. 2013 Feb 14.
  42. Brath, R. Alice Neologisms. Also in Visualizing with Text, AKPeters, 2021.
  43. “. Alice Prosody. Slightly different version in Visualizing with Text, AKPeters, 2021.
  44. “. Alice Character Emotions. Also in Visualizing with Text, AKPeters, 2021.
  45. “. Alice Character Frequency. Also in Visualizing with Text, AKPeters, 2021.
  46. “. Alice Character Sentiment. Also in Visualizing with Text, AKPeters, 2021.
  47. “. Alice Character Timelines.
  48. “. Alice Top Bigrams. Different version in Visualizing with Text, AKPeters, 2021.
  49. “. Alice Character Adverbs. Different version with Grimm in Visualizing with Text, AKPeters, 2021.
  50. “. Alice Character Ranking.
  51. “. Alice Word Sequences. Also in Visualizing with Text, AKPeters, 2021.
  52. “. Alice Repeated Word Pairs and Phrases.
  53. “. Alice Aligned Repetition.
  54. “. Alice Aligned Repetition 2.
  55. “. Alice Top Emotions, Sentiment & Annotations per Chapter.
  56. “. Dialogue from one character to another in Alice in Wonderland. Also in Visualizing with Text, AKPeters, 2021.
  57. “. Full text of Alice, skim formatted with enlarged landmark text.
  58. “. Full text of Alice, with most popular quotations successively enlarged.
  59. Huang, Shanfan. Text Visualization of Alice in Wonderland, 2016. http://shanfan.github.io/Alice/
  60. Crump, Matt. Semantic Librarian, 2019. https://semanticlibrarian.shinyapps.io/alice/
  61. Vallandingham, Jim. Text Vis Starter Kit, 2016, https://github.com/vlandham/text-vis-starter
  62. Vallandingham, Jim. Text Vis Starter Kit, 2016, https://github.com/vlandham/text-vis-starter
  63. Rougeux, Nicholas. Between the Words, Exploring the punctuation in literary classics, 2016. https://www.c82.net/work/?id=347
  64. Gassner, Peter. Alice in Wunderland Nach dem Buch von Lewis Carroll, Nov 1 2021. @grossbart https://twitter.com/grossbart/status/1455234909832892421
  65. Demakos, Matt. From Under Ground to Wonderland, Knight Letter, 2012, Spring 2012 Volume II Issue 18 Number 88 page 17. https://archive.org/details/knightletterno8818lewi/page/16/mode/2up
  66. Demakos, Matt. From Under Ground to Wonderland Part II, Knight Letter, 2012, Winter 2012, Volume II Issue 19, Number 89 page 12. https://archive.org/details/knightletterno8919lewi/page/12/mode/2up
  67. Chang, Howard. Alice in Wonderland in China. Presentation at Spring 2017 Meeting of the Lewis Carroll Society of North America, at San Francisco Public Library, 2017. https://www.youtube.com/watch?v=NKm4_i6-LTo
  68. Van Sandwyk, Charles. Alice’s Accurate Chart of Wonderland: Twice Tested with Up-to-date Corrections. In Lewis Carroll’s Alice in Wonderland with original watercolour by Van Sandwyk. London Folio Society. 2016. https://www.abebooks.com/signed/Alice-Wonderland-Original-Watercolour-Sandwyk-Carroll/30247622756/bd


Dashed and patterned lines for visualization (aka 1D texture)

Bertin and Texture

Jacques Bertin discusses texture in visualization, in his landmark book Semiology of Graphics (1967 original French edition, 1983 English translation). Much of modern visualization theory and implementation hinges on Bertin’s framework, including the notion of visual attributes, marks and layouts (for example, Wilkinson’s Grammar of Graphics, or Bostock’s D3).

Bertin talks about texture as a visual attribute and shows quite a few examples of texture applied to points, lines and areas. Bertin’s work dates from the heyday of Letraset – transfer film used by graphic designers to create professional graphics. Letraset provided professional fonts, graphics and textures that could be applied by any designer to create professional graphics, from glossy magazine ads, to black-and-white screen graphics in newspapers, down to small-scale zines – it was the democratization of desktop publishing before computer-based desktop publishing appeared in the mid-1980s. Letraset and their competitors offered a wide variety of ready-made textures:

Samples from 300+ ready-made textures from Letraset (1,2,3,4).

And these kinds of textures appear in Bertin’s lines and maps and substantially influenced his work. Note how textures can be combined together to represent multiple variables:

Bertin uses textures in many visualizations, to represent quantitative values (left), or layer multiple categories (center and right).

But what is texture? What are the variables that make up texture? What are the parameters for use? Did Bertin articulate all the possibilities, or are there more?

This is too big of a question for a single blog post, so instead, consider a very narrow use of texture as applied in only one dimension – that is – a texture applied to a line. Texture constrained to a line is very limited, particularly if the line is understood to have no width (at least for this blog post).

Dash Patterns: Length, Gaps and Rhythm

Like Bertin, we can think of texture as a sequence of on/off values applied to a line to create a sequence of dashes. The regularity of the sequence allows the viewer to distinguish between a line with a short dash pattern and big gaps and a line with a long dash pattern and small gaps, such as A50 and F90 in the image below. The wide range of variation indicates that these lines could be used to show quite a few different categories within one visualization. Also, both the dash length and the proportion of ink could be used to represent quantitative values associated with the line, e.g. less ink for lower contour lines, more ink for higher contour lines; short dashes for low certainty, long dashes for high certainty, etc.

Dashes varying in length, gap proportion, rhythm, and randomness. Link to code using D3 and dasharray.
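To make the relationship between dash length and ink concrete, here is a minimal Python sketch that turns a dash length and an ink proportion into a dasharray-style string. The function names are hypothetical, and it is only an assumption that the labels in the image (A50, F90, ...) encode ink in this way; this is not the code behind the image.

```python
def dash_and_gap(dash_length, ink):
    """Return (dash, gap) so that dash / (dash + gap) equals the ink proportion."""
    if dash_length <= 0 or not 0 < ink <= 1:
        raise ValueError("dash_length must be positive and ink must be in (0, 1]")
    # ink = dash / (dash + gap)  =>  gap = dash * (1 - ink) / ink
    gap = dash_length * (1 - ink) / ink
    return dash_length, gap


def svg_dasharray(dash_length, ink):
    """Format the dash and gap as a 'dash gap' string, as used by SVG stroke-dasharray."""
    dash, gap = dash_and_gap(dash_length, ink)
    return f"{dash:g} {gap:g}"


# A short dash with half the line inked, and a long dash with 90% of the line inked.
print(svg_dasharray(2, 0.5))
print(svg_dasharray(18, 0.9))
```

Keeping dash length and ink as two separate inputs means each one can be driven by a different data value, as suggested above.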

In the bottom half of the image, other variants are considered. Rhythm can vary: on the left, the lines have a consistent dash-gap rhythm ABABAB, as seen in line G0. In the center column, a dot is added into the gap: the line now has the rhythm ABCBABCB, such as G1. It has the same ink as its counterpart in the 0 column, and is clearly differentiated. The right column has two dots (rhythm ABCDCBABCDCBA – which is clearly easier to describe as a visual line shown in G2 rather than as an alphabetic sequence). Lines with none, one, or two dots are clearly differentiated, allowing for many more possible combinations of dash sequences to be used.
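One way to add dots while keeping the ink constant is to shorten the dash by the total length of the dots. The sketch below assumes that approach; the function name, parameters and example lengths are hypothetical and are not taken from the code linked above.

```python
def with_dots(dash, gap, dots, dot_len, inner_gap=None):
    """Insert `dots` dots into the gap of a dash/gap pattern, keeping period and ink.

    Returns alternating on/off lengths for one period, starting with the dash.
    `inner_gap` is the space between neighbouring dots; by default all gaps are equal.
    """
    if dots == 0:
        return [dash, gap]
    if inner_gap is None:
        inner_gap = gap / (dots + 1)
    new_dash = dash - dots * dot_len                 # ink moved from the dash into the dots
    outer_gap = (gap - (dots - 1) * inner_gap) / 2   # space between dash and nearest dot
    if new_dash <= 0 or outer_gap <= 0:
        raise ValueError("dots do not fit into this dash/gap pattern")
    pattern = [new_dash, outer_gap]
    for i in range(dots):
        pattern.append(dot_len)
        pattern.append(inner_gap if i < dots - 1 else outer_gap)
    return pattern


def ink(pattern):
    """Proportion of the period that is inked (even positions are 'on')."""
    return sum(pattern[0::2]) / sum(pattern)


print(with_dots(12, 8, 0, 2))               # rhythm AB
print(with_dots(12, 8, 1, 2))               # rhythm ABCB
print(with_dots(12, 8, 2, 2, inner_gap=2))  # rhythm ABCDCB
```

All three patterns share the same period and the same ink, so they differ only in rhythm.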

Randomizing Lengths and Gaps

In the bottom half of the image, within each column the regularity of the line pattern is perturbed. That is, with each successive line a bit more randomness is added. In G0 the dash lengths and gaps are consistent, in H0 the difference is almost unnoticeable, and by K0 there are some noticeable differences in dash lengths and gaps. G0 is the same line as E70. The K0 line is still more similar to E70 than to any other line, even though its dash lengths and gaps have been modified more heavily. The ability to make minor adjustments to dash lengths and gaps while retaining the same line identity is a property exploited by old mapmakers – essentially it is preferable for the dashed line to be solid at a corner, not a gap:

Dashed lines are solid at corners and gap over other lines.

As seen on this Ordnance Survey map from 1900, the dashes are solid at corners. The draftsman (drafts-person) has made minor adjustments to the intervals to retain the dash style while making the point of change in line direction visually explicit. Another interesting adjustment is that the gaps in the dashed lines occur where a dashed line crosses another line, as can be seen on the dotted line crossing lines in the far left and far right in the above thumbnails. This facilitates visual separation of the lines, not allowing them to become an intersecting blob.

These kinds of adjustments are critical to making dashed lines effective in visualization. Consider a timeseries chart using a dashed line. If the gap occurs at a high point or low point, then it is impossible to determine the value from visually reading the chart, significantly reducing the effectiveness of the chart:

Left chart from 1912 shows a dashed line with every high point and every low point solid. Right chart from Excel: some high points and low points fall in gaps, making it impossible to visually determine the value.

In the left chart, the drafts-person has a solid dash at every corner, whereas the Excel chart on the right has arbitrary dash lengths and gaps leading to corners not being explicitly visible. Since the corners on a line chart represent the actual data points, the arbitrary rendering results in data points not being visible! Yes, it is feasible in Excel to also plot the data points (as boxes, circles, stars, etc), but that only clutters the chart and breaks the rhythm of the line style — the drafts-people didn’t need to add extra marks for each point.

Given that a data visualization is supposed to show data, it is strange that the line style pattern overrides the visibility of the datapoints. Therefore, implementing effective dashed lines for visualizations isn’t as simple as defining a dasharray in SVG or D3. A lower-level dashed line module would be needed to make sure solid dashes occur at corners. This is non-trivial: what if there are multiple sharp angles in a short sequence (e.g. on the 18th day in the 1912 chart above), or multiple line crossings?
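As a starting point, here is a minimal Python sketch of one possible approach: stretch or shrink the dash pattern on each segment of a polyline so that every segment starts and ends with half a dash, which makes each vertex land inside a solid dash. The function names and default lengths are hypothetical. It does not address the hard cases raised above: very short segments get correspondingly tiny dashes, and line crossings are ignored.

```python
import math


def corner_safe_dashes(points, dash=8.0, gap=4.0):
    """Return alternating on/off lengths along a polyline, solid at every vertex."""
    pattern = []  # on, off, on, ... measured along the path
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg == 0:
            continue
        # Fit a whole number of dash+gap periods into this segment.
        n = max(1, round(seg / (dash + gap)))
        scale = seg / (n * (dash + gap))
        d, g = dash * scale, gap * scale
        # Half dash, then n gaps separated by full dashes, then half dash.
        local = [d / 2] + [g, d] * (n - 1) + [g, d / 2]
        if pattern:
            pattern[-1] += local[0]  # merge the two half dashes across the corner
            pattern.extend(local[1:])
        else:
            pattern.extend(local)
    return pattern


def is_inked(pattern, distance):
    """True if the point at `distance` along the path falls on a dash."""
    pos, on = 0.0, True
    for length in pattern:
        if pos <= distance < pos + length:
            return on
        pos += length
        on = not on
    return on


path = [(0, 0), (30, 0), (30, 25)]
dashes = corner_safe_dashes(path)
print(dashes)
print(is_inked(dashes, 30.0))  # the vertex at (30, 0) is 30 units along the path
```

Because the returned lengths cover the whole path, they could be passed as a single dasharray for that path. The small per-segment changes in dash length are the same kind of adjustment the drafts-people made by hand.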

Showing Rhythm with Transparency, Saturation, Brightness or Hue

In Bertin’s example, texture is binary: it’s on or off. Much more is possible using computer graphics as opposed to transfer film. Transparency, saturation, brightness or hue can be modified with the regularity of the dash sequence:

Texture patterns represented with varying gradients and color. Link.

In the top block, L50 – Q90, gradient transparency varies from solid to transparent to create the same pattern as previously seen in the dashes on lines A50 – F90. The lines in the 50 column use a linear gradient that goes smoothly from min to max and back to min, whereas in column 70 the gradient drops off more quickly than it rises, and in column 90, the gradients have a sharp drop off. Interestingly, the non-equal application of the gradient appears to give the lines the illusion of motion, like motion blur in photographs.

In the bottom block, Rs – Wh, the same patterns are used with attributes of color. In the first column, only saturation is changed (from orange to grey), which is difficult to perceive as there is little contrast between the two colors. In the second column, only brightness is changed, with the alternating pattern of light orange and dark orange clearly visible. And the third column changes hue, resulting in repeating rainbows along the lines. Again, attempting to use SVG and D3 to implement these is difficult and the approach used in my quick code won’t work with curving lines.

More importantly, these examples suggest that texture is operating at a different level than visual attributes such as length, transparency, hue and so forth, as the texture can be represented in any of these other variables. So Bertin’s notion of texture and how texture fits into the formal definitions of visual attributes of data visualization may be more nuanced than the current models used in visualization research.

Conjunction of Textures

Bertin uses multiple overlapping textures to convey multiple variables in his maps: textures of small dense dots, with large coarse dots, with lines oriented in one direction, and lines oriented in a different direction. Can multiple dash patterns be combined on one-dimensional lines? Yes:


In the top row, the line X1 has a very short pattern with a lot of white space. Then, on the left, a line Y2 also has a short pattern and an interval that is a multiple of the interval for X1. The two patterns can be interleaved to form the line XY12. The right image from Bertin back at the top of this article uses this interleaving approach: it uses carefully designed textures such that dots fit neatly between the diagonal lines.

On the right, the first line X1 is the same short dash pattern, while the second line Y7 has a long dash pattern: there is no way to combine the two patterns without intersection. Here the patterns are combined to alternate on/off to create a reverse video effect (i.e. the patterns are XORed together).
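Both combinations can be sketched by treating each pattern as a row of on/off cells. The Python below is a minimal illustration: the dash lengths standing in for X1, Y2 and Y7 are made-up values, not the ones used in the image, and the function names are hypothetical.

```python
from itertools import groupby


def rasterize(dasharray, length, offset=0):
    """Expand integer on/off run lengths into one boolean per unit along the line."""
    period = sum(dasharray)
    if period <= 0 or len(dasharray) % 2:
        raise ValueError("dasharray needs an even number of runs and a positive period")
    one_period = []
    for i, run in enumerate(dasharray):
        one_period.extend([i % 2 == 0] * run)  # even positions are dashes
    return [one_period[(i - offset) % period] for i in range(length)]


def interleave(a, b):
    """Combine two patterns that never overlap (logical OR)."""
    if any(x and y for x, y in zip(a, b)):
        raise ValueError("patterns overlap; they cannot be interleaved")
    return [x or y for x, y in zip(a, b)]


def xor(a, b):
    """Combine two overlapping patterns so that one cuts holes in the other."""
    return [x != y for x, y in zip(a, b)]


def run_lengths(cells):
    """Collapse cells back into (on, length) runs."""
    return [(on, len(list(group))) for on, group in groupby(cells)]


x1 = rasterize([1, 7], 16)            # very short dash, lots of white space
y2 = rasterize([2, 14], 16, offset=3) # short dash, interval twice that of x1
y7 = rasterize([12, 4], 16)           # long dash

print(run_lengths(interleave(x1, y2)))
print(run_lengths(xor(x1, y7)))
```

In the XOR case the short dashes of the first pattern show up as gaps inside the long dashes of the second, which is the reverse video effect described above.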

This kind of approach might be desirable for charts with many different dimensions, for example, a census chart plotting unemployment for the overall population vs. minorities, and for all ages vs. those under age 25. Lines could then be solid for overall and short-dashed for minorities; solid for all ages (the same data as overall) and long-dashed for under 25; and the combination, minorities under 25, drawn with the conjunction of the two dash patterns.

So What?

If dash patterns are problematic, then why use them at all? Sometimes there may be a need to use many lines, more than can be comfortably differentiated using color. Dashed lines are also common on line charts to show predicted data, or on maps to show unpaved paths: semantically, a dashed line can effectively convey that the data is uncertain. Dashed lines have real uses.

Texture is an under-explored area of data visualization. Historical charts and maps do show that these can be effectively used. Visualization tools and libraries, however, use dashes arbitrarily and don’t take into account how to draw dashes to suit perceptual needs. Furthermore, our definition of the design space around texture may be somewhat lacking – perhaps some future grad student will want to take on some of these issues.

Posted in Data Visualization, texture | Leave a comment

Visualizing with Text: Endorsements and more examples

Thanks everyone who’s bought a copy of Visualizing with Text. I hope you’re enjoying it.

I really appreciate some of the examples I’m seeing in the wild. Here are some fantastic examples from Georgios Karamanis. I like the shifted + bold text indicating voting topics at the UN, and word-pairs describing makeup. And Georgios provides the code, so if you’re into R and want to see how to implement some of the text visualization techniques from the book, see his GitHub.

Examples of visualizing with text by Georgios Karamanis.

Also exciting to see an endorsement from Michael Friendly, and many others on Twitter. Thanks for the posts.

Photo of Michael Friendly visually endorsing Visualizing with Text.

I’ve talked about the book internally at our company (Uncharted) and have been pleasantly surprised to see some of the ideas weaving their way into some of our visual analytics, such as a button indicating the color legend within the button glyph; or a technique for interactively labelling neighbourhoods while zooming around a massive network.

While I can’t really do a book tour during Covid, I did a talk at Naomi Robbins’ Data Visualization NYC Meetup and an interview with Lee Feinberg’s Analytic Stories. Looking back at the videos, I see I may have talked over a couple of people – sorry! Happy to follow up.

Also, I noticed some Tweets regarding Typograms, or more generally laying out type to fit into shapes such as Aaron Kuehn’s beautifully typographic anatomical posters. It’s a technique discussed in the book, such as Automatic Typographic Maps, which in turn were based on Axis Maps; Jean-Luc Arnaud’s typographic maps or Kate McLean’s smell maps.

This technique of fitting type onto lines or into shapes has been going on for centuries: I like Calligrammes from the early 20th century, medieval monks speaking in scrolls (on the cover of the book!), text set into the shape of an axe in a book from 1530, or awesome psychedelic posters, such as Wes Wilson’s posters from the 1960’s. For any visualization researchers interested in algorithms for fitting text into complex shapes, see Ron Maharik’s Digital Micrography research and PhD thesis.

Posted in Data Visualization, Microtext, Text Visualization, Thematic Map | Leave a comment

Visualizing with SVG before D3: Timeseries

News headlines about the GameStop price swings this week reminded me of some old SVG visualizations of stock bubbles and crashes that I’ve done. SVG was around before D3. I generated SVG visualizations in the mid-2000’s well before D3.js. It was painful in comparison, but at the time it was a lot of fun.

The objective was experimentation: what could be done with a scalable vector graphics library? As such, it wasn’t constrained to screen resolution and screen dimensions (which at that time was typically 1600 x 1200) – rather you could do much more detail, import into Illustrator and print things at very high resolution with lots of detail, lots of transparency, and tiny text.

Analyzing timeseries

Here’s an example: Microsoft’s daily stock price from late 1992 to early 2004.

MSFT daily stock price 1992-2004 with different markers and shadings.

That’s 12 years of daily data with about 250 trading days per year resulting in a 3000 px wide visualization — which now draws just fine on my 4k display. With SVG, it is easy to layer in many different analyses creating different marks, lines and areas. Here’s a closeup of Lucent’s daily stock price during the Internet bubble:

Lucent stock price closeup of 1997-2001.

Many different graphical marks indicate data and derived indicators:

  • The blue line indicates the daily price.
  • The small green/yellow/orange/red bars behind the blue price line indicate the monthly price move from the beginning to the end of the month, colored by whether the price increases (green) or decreases (red). They align neatly with the monthly grid, filling in stripes within the grid.
  • The fat green/yellow/orange line behind that indicates yearly change in price. (There are also light grey boxes behind that, which align with the thicker yearly grid.)
  • The green/purple circles indicate successive high/low points. Values for the highs and lows are indicated in text as well as the date. All successive low points are connected by straight purple lines, and all successive high points are connected by straight green lines. The zone between the green/purple lines forms an envelope around the price range.
  • The many orange lines are moving averages, each with a different time period, forming a guilloché. When moving averages start to cross, it is an indicator of a change in trend, e.g. from up-trend to down-trend (as many stock traders know). In 1997, the averages start to converge but then the trend continues. However, in 2000, the moving averages successively start to cross, and by June most of the moving averages have crossed, before the much steeper crash.
  • The large arcs and corresponding fills indicate major trends. Up-trends run from the trend low to the trend high (e.g. 3/19/1997 at $10.75 to 12/8/1999 at $71.31: up almost 7x!) Down-trends run from the trend high to the trend low (e.g. June 5, 2000 at $56.23 to a point outside the closeup: 9/27/2002 at $0.77: down to just over 1% of the starting price, ouch!) This is what a bubble looks like and how it plays out.
  • The filled shadows in pink and blue indicate the 52-week low price and 52-week high price. On the first two-thirds of this chart, the pink shadow is predominant as the stock keeps going up with a few “lakes” of blue filled in the occasional dips. In the last third of the chart, it’s almost entirely under the blue shadow as the stock tanks.
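The moving-average crossings in the list above can be sketched in a few lines of Python. This is a generic illustration with a made-up price series and hypothetical function names; it is not the code that generated the SVG charts.

```python
def moving_average(values, window):
    """Simple trailing moving average; None until a full window is available."""
    out, total = [], 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            total -= values[i - window]
        out.append(total / window if i >= window - 1 else None)
    return out


def crossings(fast, slow):
    """Return (index, direction) wherever the fast average crosses the slow one."""
    events, previous = [], None
    for i, (f, s) in enumerate(zip(fast, slow)):
        if f is None or s is None:
            continue
        sign = (f > s) - (f < s)  # +1 if fast is above, -1 if below, 0 if equal
        if previous is not None and sign != 0 and sign != previous:
            events.append((i, "up" if sign > 0 else "down"))
        if sign != 0:
            previous = sign
    return events


# Made-up prices: a steady rise followed by a steady fall.
prices = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
fast = moving_average(prices, 2)
slow = moving_average(prices, 4)
print(crossings(fast, slow))
```

In this toy series the price peaks at index 5 but the crossing is only reported at index 7, which shows the lag inherent in moving averages. Plotting many windows at once, as in the chart, shows the crossings cascading from short to long periods.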

Why is it important to have all these different markers, bars, lines, arcs, text, and so on?

There are many techniques to analyze timeseries data – moving averages, standard deviations, envelopes, and so on. In finance, one can explicitly study timeseries analysis. And more broadly, these analyses apply beyond stock prices to electrical grid loads, network utilization, software performance, automobile diagnostics, and so on.

Long timeseries

SVG is scalable, so I applied it to longer and longer timeseries. Here’s the Dow Jones Industrial Average(R), from when I re-ran the code in 2010 to compare the 2008 financial crisis to the 1929 crash:

Dow Jones Industrial Average 1896-2010 daily: 28500 days, every point plotted.

So, 114 years of daily data results in a plot more than 28000 pixels wide. That’s more than my 4K screen can display in its entirety at full resolution. But paper can. The visualization uses a log scale: you can see that the 2008 crash at the top right is far smaller than the crash of 1929. That is, from the index peak in 2007 around 14000 to a low of 6500 in 2009, the index lost a bit more than half of its value; whereas in 1929 the index peaked at 381 in September, then dropped to 44 in 1932, down 88% from its peak! There was a lot of pain in 2007-8, but the massive intervention in the markets by the Federal Reserve and central banks helped stave off a much bigger crash and avoid a much bigger, longer recession.
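The arithmetic behind that comparison is easy to check. The Python sketch below uses the rounded index levels quoted above (so the results are approximate) and shows why a log scale makes the two crashes directly comparable: equal percentage moves cover equal vertical distances. The function names are hypothetical.

```python
import math


def decline(peak, trough):
    """Fraction of value lost from peak to trough."""
    return 1 - trough / peak


def log_height(peak, trough):
    """Vertical distance between peak and trough on a log10 axis."""
    return math.log10(peak / trough)


crash_1929 = (381, 44)      # peak in 1929, low in 1932
crash_2008 = (14000, 6500)  # approximate peak in 2007, low in 2009

print(f"1929: lost {decline(*crash_1929):.0%}")  # roughly 88%
print(f"2008: lost {decline(*crash_2008):.0%}")  # roughly 54%

# On a log axis, the 1929 drop is nearly three times as tall as the 2008 drop,
# even though the 2008 drop is far larger in index points.
print(log_height(*crash_1929) / log_height(*crash_2008))
```

On a linear axis the 2008 drop of about 7500 points would dwarf the 1929 drop of 337 points, which is the opposite of the relative loss.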

There’s a few other things going on with this experimental visualization. Here’s a closeup of the 1950s:

Closeup of Dow Jones Industrial Average during the 1950’s

In addition to the many different shadings and markers, the grid lines also participate in indicating data range: rather than extend across the display, the grid lines are localized to the line +/- a range. Axis labels also follow data-driven rules. The price labels in this snapshot follow the grid (note the stock market high of 381 in 1929 wasn’t surpassed until 1954, some 25 years later). The date labels are completely data driven – indicating dates on the top side of the line when the price hits new highs, and on the bottom side of the line when the price hits new lows.

Why is it important to have such long timeseries in so much detail?

Our firm has clients with financial timeseries that are more than 200 years long. Seeing all the detail is important. The current GameStop bubble is not unique: there have been many, many more, going back to railroad mania in the 1840s or the South Sea bubble in the 1720s. Different bubbles will play out in different ways: having detail allows for comparison to prior bubbles for insight into the current bubble. Recoveries from recessions will be different for different sectors. Some market experts will use this information to inform their portfolio strategies in response to GameStop, or in response to Covid, or in response to an election cycle.

And why the variations in grids, arcs and areas?

D3 is great and much can be done out-of-box. But, when you just use the standard examples applied to different data, you might not be indicating the things that matter to the end user (i.e. the purpose is insight, not pictures). As such, it’s important to use the toolset to experiment with the underlying graphics to highlight the core insights.

I’ve done much more experimentation with SVG before D3, but those will be for some future blog posts.

Posted in Data Visualization, Line Chart, Timeseries | Leave a comment

Visualizing Causes of Death in Georgian London

LondonLives.org is a collection of 240,000 historic manuscripts from eighteenth century London. These have been collected, organized and analyzed, such as Sharon Howard’s summary of 2894 coroner inquests from London 1760-1799 with each case including subjects, verdicts and causes of death as well as links back to original handwritten manuscripts. The dataset is fascinating: How might this data be visualized?

Each row in the summary dataset indicates subjects (e.g. Susanna Thompson, Sarah Cox), verbs (e.g. crushed, falling) and objects (e.g. chimney). These can be extracted using natural language processing (although, note that Howard cautions that the causes of death are not fully accurate, nor is my natural language processing). These extracted words can be assembled into a hierarchy, for example verb, object, subject:
– Crushed
  – Chimney
    – Susanna Thompson, Sarah Cox

There are many, many different techniques for visualizing hierarchies (e.g. treevis.net):
– Treemaps and sunbursts focus on accurate representation of values using area, meaning that fitting legible text can be difficult.
– Graph representations, such as nodes and links, emphasize the structure, again potentially making it difficult to fit all labels legibly.
– Org-charts are designed to fit text, but the left-right/up-down layout can result in layouts that become very wide or very deep, making it difficult to fit an org-chart with 3000 items to a rectangular screen and display all the text at a legible size.
– More generally, most quantitative visualizations aren’t designed for representing large amounts of text.

Instead, typographers and publishers have techniques for depicting textual hierarchies not shown in any visualization compilation. Indexes and dictionaries are designed to show a large number of words in hierarchies in a very dense format. Dictionaries use a wide variety of formatting, for example, using very heavy-weight text to make the defined word stand out. Alphabetic ordering facilitates search. This supports quick non-linear skimming to the word of interest; then different formats are used within the definition (e.g. italics, small caps, etc.), facilitating jumping to the type of data of interest. Size of entries is relevant: words that have many meanings have longer entries. In effect, indexes and dictionaries are visualizations, although constrained to printed pages bound in a book. With larger screens, indexes can be set out to be entirely visible on a single screen or two.

Returning to the coroner inquests data, consider an index-like visualization. Organizing the hierarchy by verb, then object, then subject results in a list of subjects under an object, which in turn are listed under a verb. Here’s all the ways that people died under the verb CRUSHED:

Before government building codes and safety standards, many people died by being crushed.

Each object is listed in a separate coloured section. Each section starts with the object in bold (e.g. chimney), and has a list of named subjects in a narrow brown font (e.g. Christiana Jorden and 3 others). Individual names are links to the original manuscripts, e.g. Sarah Cox:

Given some unusual objects, a sample cause of death is shown in italics, (e.g. crushed by falling chimney). More frequent causes result in taller sections, for example being crushed by a chimney was more common than being crushed by a scaffold; while being crushed by a house or a theatre crowd was more common than chimneys.

Also, I was curious about gender: were women more prone to certain causes of death than men (assuming that women’s deaths were investigated equally to men’s deaths)? After each object a pair of numbers indicates the gender ratio, e.g. 1/3 indicates 1 man and 3 women were crushed by chimneys. To facilitate skimming for this ratio, the background colour of each section is shaded light blue to light red. This colour scale isn’t meaningful when viewing causes with only a couple deaths but provides some macro-scale patterns when viewing the larger dataset.
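The index structure and the gender shading can be sketched in Python as follows. The rows are placeholders that only mirror the 1/3 chimney example; they are not records from the dataset. The tuple layout, function names and the two end-point colours are all assumptions made for illustration, and the actual visualization is HTML/CSS.

```python
from collections import defaultdict

# Placeholder rows: (verb, object, subject, gender). Not real records.
rows = [
    ("crushed", "chimney", "Person A", "f"),
    ("crushed", "chimney", "Person B", "f"),
    ("crushed", "chimney", "Person C", "f"),
    ("crushed", "chimney", "Person D", "m"),
    ("crushed", "scaffold", "Person E", "m"),
]


def build_index(rows):
    """Group subjects under object, and objects under verb."""
    index = defaultdict(lambda: defaultdict(list))
    for verb, obj, subject, gender in rows:
        index[verb][obj].append((subject, gender))
    return index


def gender_counts(entries):
    """Return (men, women) for one object section."""
    men = sum(1 for _, gender in entries if gender == "m")
    women = sum(1 for _, gender in entries if gender == "f")
    return men, women


def shade(men, women, blue=(200, 220, 255), red=(255, 205, 205)):
    """Interpolate a background colour from light blue (all men) to light red (all women)."""
    share = women / (men + women) if men + women else 0.5
    return tuple(round(b + share * (r - b)) for b, r in zip(blue, red))


index = build_index(rows)
for verb in sorted(index):
    print(verb.upper())
    # More frequent objects first, since they produce the taller sections.
    for obj in sorted(index[verb], key=lambda o: -len(index[verb][o])):
        men, women = gender_counts(index[verb][obj])
        names = ", ".join(name for name, _ in index[verb][obj])
        print(f"  {obj} {men}/{women} rgb{shade(men, women)}: {names}")
```

With only one or two deaths in a section the colour swings to an extreme, which is the caveat noted above about the scale not being meaningful for rare causes.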

Here’s the 2894 inquests presented in an index layout, as a simple interactive HTML/CSS visualization. On a 4K screen all text is legible and readable:

Causes of death in Georgian London. The biggest blocks correspond to the most frequent causes, e.g. drowned, hanged.

Like a dictionary, the biggest blocks have the most entries: the biggest blocks are DROWNED (in column 3 in the image above) and HANGED (column 6). Both of these tend to be light blue (more men than women). The largest light red block (more women than men) is under BURNT (in the first column).

In the full visualization, the colour of subjects’ names corresponds to the verdict: brown is for deaths deemed accidents, red for homicides, green for suicides, etc. These can be interactively filtered. For example, the most common method of suicide was hanging (biggest box); generally more male suicides are recorded than female (generally more blue than red); and many methods were used by both genders (e.g. drowning, poisoning, cutting), although gun shots are almost entirely male.

So what?

This example is from my recent book Visualizing with Text, including a comparison to a treemap. It’s a good illustration of a visualization that provides high-level perceptual patterns (large coloured blocks) and low-level details (words and phrases) within the same visualization – which Edward Tufte refers to as micro/macro readings. The book review of Visualizing with Text by Alec Barrett succinctly summarizes the benefit as being able to “zoom with one’s attention.”

Thanks to LondonLives.org and Sharon Howard for collecting, organizing and summarizing these historic documents.

Posted in Alphanumeric Chart, Data Visualization, Text Skimming, Text Visualization | Leave a comment

In Defence of Data-Dense Visualizations

I’ve done a couple of presentations with content from my book Visualizing with Text to grad classes in the last month. In both cases, a couple of people expressed concern regarding the complexity of some visualizations, being particularly data dense. This is an argument that I have heard on occasion throughout my career as I am often involved in the design and development of data-dense visualization for domain experts. Data-dense displays are not uncommon in domain applications – consider a few examples:

Figure 1: Some data-dense visualizations. Clockwise from top left, map of Abu Kebir, 1918; Earth Scientist’s Periodic Table of the elements and their ions, 2013; Financial trading floor desk, 2012; NYISO’s video wall of electrical grid, 2014.

These visualizations are packed with data. The map has many layers: roads, railways, rivers, canals, drains, buildings, labels, textures, etc. This periodic table has icons, symbols, numbers, names, pathways, ions, solutes, charges, etc. The trading desk shows a wide variety of information associated with securities: timeseries, events, color-coded news, internal data. The electric grid shows hundreds of entities: transformers, capacitors, hydro dams, wind farms, interchanges, power corridors of different voltages, status, thresholds, etc.

And, if you look at the Uncharted research website, you’ll see many more data-dense visualizations that our company has worked on.

Data density slows down understanding

The essence of the data-density criticism is that, with a greater number of data points and/or multi-variate data, the viewer can become confused as to where to look. There may be many different visual patterns competing for attention. Should the viewer be focusing on local patterns in a subset of the display, or macro patterns across the entire display? Should the viewer be attending to the color, or the size, or the labels? If each represents different data then the semantics will be different, based on the visual channel that the viewer is attending to. Worse, if the viewer starts to move back and forth between many different channels, they may forget the encodings and become more confused.

Some people have become conditioned to think of data visualization as they might see in the popular press, visualizations made for communication, or visualizations that utilize straightforward visualization types that you might get with a library such as D3 or Vega:

Figure 2: Some common visualization techniques, not particularly data dense compared to Figure 1.

Viewers may be conditioned to think of these as encompassing all (or most) visualization types, with many articles organizing visualization into a limited palette of visualization layouts: periodic tables, chart choosers, lists, galleries, zoos, and more. These visualizations typically don’t have many data points (20-200), and typically show only a couple of variables. Data is homogeneous – you’re not looking at multiple datasets with different types of entities jumbled together. Answers are easier, because there really aren’t many different dimensions or layers to be considered.

But not all problems are simple.

Complex problems may need complex data

The images in Figure 1 show that multi-metric, data-dense visual representations exist in practice – in both historic visualizations and modern interactive visualizations. These complex visualizations bring together multiple datasets in layers, in many windows, in large displays – i.e. into data-dense representations.

This extra data is required, because there may be multiple answers possible. If the price of a stock goes down, it may be due to the overall market going down, to a competitor dropping their prices, to poor sales data in the company’s earnings, to a negative news story about the company, or other factors. The extra visual context facilitates reasoning across the many plausible causes to assess the situation. In the stock example, multiple causes can be true at once: an expert needs to see all and determine which are most relevant to the current situation.

In addition to quantitative data, there may be other facts and evidence: qualitative data, news, videos, and so on. There may be multiple perspectives to consider. There may be different time horizons to consider. (For example, the stock market collapse in 2008 was triggered by the collapse of Lehman Brothers on September 15, 2008; but months before Bear Stearns (a competitor) was acquired when it ran into funding issues, and even earlier some mortgage origination companies went bankrupt.)

More generally, wicked problems are not easily solvable. The problem can be framed in more than one way, different stakeholders have different timelines, constraints and resources are subject to change, and there is no singular definitive answer.

As a colleague tells me: Complex problems have easy to explain wrong answers.

The Value of Data Density

Communication: There are many different reasons for creating visualizations. The low density visualizations in Figure 2 may be part of narrative visualizations, for explaining the results of an analysis. Data has been distilled to a few key facts.

Dashboard: Or, simple low-density visualizations may be part of an overview dashboard, with many small visualizations, each of which provides an overview of a different process, and can typically be clicked on for more detailed analysis. These overviews only need to provide a sense of status: if there are any issues then the viewer has workflows to access more detail.

Beyond the communication and dashboard uses, there are many other uses for visualizations, where density may be valuable:

Organization: For example, the map and the periodic table in Figure 1 organize large amounts of data. These many layers of data allow cross referencing between many different types of information. On the map, the user may need to know the location of buildings (objectives), roads (connections), canals and railroads (obstructions) in order to plan a route.

Monitoring: The market data terminal and the electric grid operations wall in Figure 1 provide real-time monitoring across many data streams. Many heterogeneous datasets come together into a single display. Time is of the essence in real-time operations. Detailed data can’t be hidden a few clicks away: all key information must be designed and organized for quick scanning and immediate access.

Analysis: Knowledge maps and network visualizations are often about analysis of complex data. SciMaps.org has 100 knowledge maps, each collecting and visually representing many facets of a particular corpus; such as Figure 3 left, an interactive visualization of the edit history of Wikipedia articles by Wattenberg & Viégas. Visual Complexity has 1000 network visualizations, such as Figure 3 right, a dynamic visualization of social networks indicating people, links, activity, postings, sequence and message age by Offenhuber & Donath. Both of these are visualizations about text over time: edits, exchanges, persons. In both cases there are many dimensions to understand and comprehend.

Figure 3: Knowledge maps and network visualizations.

Exploration: Data-dense visualizations aren’t limited to domain experts attempting to understand complex datasets for their jobs. In Dear Data, Giorgia Lupi and Stefanie Posavec create some awesome multi-variate data-dense visualizations of mundane day-to-day data. Why? Exploratory data analysis needs to consider lots of different data – the exploration is required to consider, assess, investigate, compare, understand and comprehend many different data elements. Doing exploratory data analysis with only a well-organized quantitative dataset may miss much relevant data (e.g. see Data Feminism). Lupi and Posavec show by example that many different attributes can be extracted from everyday life and then made explicitly visible for an initial exploratory view.

Figure 4: Dear Data. Left, Lupi’s visualization on doors. Right: Posavec’s visualization on clocks.

Data Density and Visualizing with Text

The objective of the book is to define the design space of Visualizing with Text for all kinds of visualizations, simple or complex. Section 2 in the book deals with simple labels on charts such as scatterplots, bar charts and line charts: text can be used to make simple visualizations more effective. Section 3 in the book goes further, using multiple visual attributes to convey more data (Figure 5).

Figure 5: Some more complex visualizations from Visualizing with Text.

The Future of Data Density

Data density is likely to become a bigger issue in the next decade. Greater awareness of bias in data makes it more important to represent more datasets in a visualization. Analysis of richer data types – such as text, video and imagery – will likely necessitate new ways to layer in additional visual representations. Bigger data will have even more variables, requiring more ways to show more data, or risk summarizing too much useful detail out of the data. Specific visualization applications, such as cybersecurity, fake news, and phishing, need to deal with ever more complex attacks, which implies more nuanced analyses based on more complex data.

Data density will become increasingly important to future visualization and visualization research.

Posted in Data Visualization, Text Visualization | 1 Comment

Visualizing with Text: author’s copy and new content

I just received my author’s copy of Visualizing with Text this morning! It’s awesome to finally hold the book after 2 years of writing (and the start of this blog 7 years ago!):

Here’s the book with the nice glossy cover.

Flipping through triggers some memories, like finding this user study on charts from 1961 comparing labels vs. legends! (Can you BELIV that there were user studies 59 years ago, before VisWeek? see Sarbaugh et al: Comprehension of Graphs):

Label or legend? An experiment from 1961.

Or this page, which captures a larger effort made specifically for the book. Since the book defines a design space for visualizing with text, I felt compelled to demonstrate the flexibility of the design space by creating many different visualizations of one document: here are 14 different visualizations of Alice’s Adventures in Wonderland:

14 different visualizations of Alice’s Adventures in Wonderland.

And, here’s a page on very different uses of visualizations (beyond using visualization for preattentive perception of patterns). On the left is a system diagram of a power grid (an inventory use that organizes all the assets in the grid, courtesy of ISO New England). Top right is an infographic by Nigel Holmes of a graph, where the edges are literal text implicating individuals (a communication use that distills days of testimony down to select statements, courtesy of Nigel):

Different uses of visualizations.

“Preview” is now working on the CRC Press site, and “Look Inside” is now working on Amazon.

Posted in Graph Visualization, Text Visualization | 1 Comment

Visualizing with Text – high-res figures now on-line

Visualizing with Text releases any day now: I hope to have my copy before the end of VisWeek. I’ve finally posted all the figures that I authored on-line with a CC-BY-SA 4.0 license. There are 158 high-resolution images and diagrams from the book in the PDF file. These may be a nice complement to the eBook or physical book, as some of the text may be too small to read in some of the larger visualizations. Under that license the figures can be reused, for example, for teaching or mixed up into a collage or whatever.

Some of the figures from Visualizing with Text.

There’s another 99 figures that are not mine – I’ve included links to online versions of these images where available on the last page.

And links to many of the external figures.

Sometimes people ask me which of the visualizations I like the best. The answer varies over time, although I am currently biased towards the text-dense multi-variate visualizations designed for a large screen, such as these ones (Figures 6.19, 8.10, 9.8, 10.11, 11.3, 12.11) – see the PDF for high-res versions:

Some text-dense visualizations.

Why? Viscerally, I like the rich texture of shapes, colors, and structure where multiple patterns appear – visualization should be supportive of representing complexity and affording multiple interpretations. In my day-to-day work, I often design visualizations for financial market professionals: they don’t necessarily make money if they have the same ideas and same thesis as everyone else. Data-dense visualizations that prompt multiple hypotheses can be a good thing. (See also Barbara Tversky’s keynote at Visualization Psychology earlier today!)

I also think these dense visualizations push the boundaries of the design space of visualization and of text-visualization. Perceptually, multi-variate data can be a challenge. Data-dense visualizations can be a challenge. The linearity of text (i.e. you have to read words in some order) vs. the volume of information is a challenge: What happens to the global pattern? What happens if “overview first” doesn’t necessarily provide a macro-pattern?

A couple of these visualizations I just presented for the first time yesterday at the Visualization for Digital Humanities workshop in a paper titled Literal Encoding: Text is a first-class data encoding.

Posted in Data Visualization, Design Space, Text Visualization