Visualizing with Text – Interactive Demos

Over the summer, eight of the examples from Visualizing with Text were re-implemented as Observable notebooks! Each one is interactive, with the code visible and editable live. All the demos can be accessed from the supplementary website.

They are all text-intensive visualizations with some interesting datasets, such as countries in the world where more people are dying than being born, countries with big jumps in Covid unemployment, and coroners’ reports indicating many ways to die in Georgian London.

Manipulating Code

For the rest of this post, I’ll focus on one demo: word skimming. This demo processes paragraphs of text, ranks words based on their frequencies in a large corpus (Wikipedia), then weights the words so that the least common words are formatted to stand out. This formatting facilitates perception of uncommon words, which is a taught strategy for skimming text:

While skim formatting may seem strange in prose on a computer screen, the technique has existed for centuries, in instruction manuals, advertisements, comic books, and even software code editors, as shown in some of the examples in the book.

In this particular demo, the viewer can easily cut and paste a completely different text to format for skimming. They can also select from a few different formats, such as adjusting font weight, font width, or x-height. Here’s the same demo using the opening paragraphs from The Decline and Fall of the Roman Empire and formatted using both font weight and x-height to indicate uncommon words, such as valor, emperors, Trajan and Hadrian:

Variable width and variable x-height are uncommon in most prose text formatting. This demo uses the relatively new technology of variable fonts. With variable fonts, the font designer can expose properties, such as weight, width and x-height as quantitative parameters. These parameters are then adjusted based on a simple algorithm which splits apart words (using the JavaScript library compromise), calculates word rank (based on word lists from Wikipedia), and then adjusts the variable font parameters accordingly.
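The pipeline described above can be sketched in a few lines of plain JavaScript. This is a minimal sketch, not the notebook's actual code: the regular-expression tokenizer stands in for compromise, the tiny RANKS table stands in for the Wikipedia word list, and the 300 to 900 weight range, the log scaling and all function names are illustrative assumptions.

```javascript
// Hypothetical rank table: 1 = most frequent word. The real demo uses
// word lists derived from Wikipedia; these few entries are placeholders.
const RANKS = new Map([
  ["the", 1], ["of", 2], ["and", 3], ["in", 4], ["was", 10],
  ["empire", 3000], ["valor", 20000],
]);
const UNKNOWN_RANK = 100000; // assumption: unlisted words are treated as rare

// Split text into alternating word and non-word runs
// (a simple stand-in for the compromise library).
function tokenize(text) {
  return text.match(/[A-Za-z']+|[^A-Za-z']+/g) || [];
}

// Map a rank to a variable-font weight on a log scale, clamped to [minW, maxW].
function rankToWeight(rank, minW = 300, maxW = 900, maxRank = UNKNOWN_RANK) {
  const t = Math.log(Math.max(rank, 1)) / Math.log(maxRank);
  return Math.round(minW + Math.min(t, 1) * (maxW - minW));
}

// Wrap each word in a span whose weight axis reflects how uncommon it is.
// Note: non-word runs are passed through without HTML escaping.
function skimFormat(text) {
  return tokenize(text).map((tok) => {
    if (!/^[A-Za-z']+$/.test(tok)) return tok; // keep spaces and punctuation
    const rank = RANKS.get(tok.toLowerCase()) ?? UNKNOWN_RANK;
    return `<span style="font-variation-settings:'wght' ${rankToWeight(rank)}">${tok}</span>`;
  }).join("");
}
```

Calling skimFormat on a paragraph returns an HTML string that can be assigned to an element's innerHTML. The same pattern would extend to width or x-height if the chosen font exposes those axes.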

Since this example is implemented in Observable with all code editable, it’s straightforward to modify the code and try other visual attributes that aren’t used in the demo code. For example, here’s the opening paragraph from Moby Dick, with uncommon words highlighted in red:

To highlight the words in red, a single line of code can be added (without worrying about adapting the drop-down menu, configuring d3 scales, etc.). In this case, the following was added to the update cell:

// elements[0..n] holds one DOM element per word; d.rank is the frequency rank of the current word:
elements[i].style.setProperty("color", d.rank > 5000 ? "red" : "black");

Essentially, if the rank value is over 5000, color the word red, else color the word black.

Instead of a red/black toggle, we could easily add a d3 scale and use a color ramp, so that colors range from black, through blue and purple, to red. (Note that no SVG is used; this is all HTML. D3 is only used in this code to scale numeric values into ranges that suit the variable font.) In the snapshot below, text from Frankenstein uses a color scale in addition to font weight and x-height:
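As a rough illustration of such a ramp, here is a minimal sketch in plain JavaScript that interpolates between black, blue, purple and red. The function name rankToColor, the exact RGB stops and the saturation rank of 20000 are assumptions made for this example. In the notebook itself, a d3 linear scale with several color stops in its range would likely do the same job.

```javascript
// Color stops from common (black) to rare (red), as [r, g, b].
const STOPS = [
  [0, 0, 0],     // black
  [0, 0, 255],   // blue
  [128, 0, 128], // purple
  [255, 0, 0],   // red
];

// Map a word rank to an rgb() string by interpolating between the stops.
// maxRank (assumed to be 20000) is the rank at which the ramp saturates at red.
function rankToColor(rank, maxRank = 20000) {
  const t = Math.min(Math.max(rank / maxRank, 0), 1);    // normalize to [0, 1]
  const pos = t * (STOPS.length - 1);                    // position along the ramp
  const i = Math.min(Math.floor(pos), STOPS.length - 2); // index of the segment start
  const f = pos - i;                                     // fraction within the segment
  const a = STOPS[i];
  const b = STOPS[i + 1];
  const mix = a.map((v, k) => Math.round(v + f * (b[k] - v)));
  return `rgb(${mix.join(", ")})`;
}
```

In the update cell, the red/black line would then become elements[i].style.setProperty("color", rankToColor(d.rank)).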

While feasible, this is not necessarily recommended. The many changes of hue, weight and x-height create many distractors, which reduce the readability of the sequential text. Then again, creating a disruptive format may be the design objective, as in Frankenfont, the wonderful encoding by Ben Fry.

On the other hand, if the goal is skimming, then it is desirable to easily read the immediate surrounding context to aid comprehension. Changing many formats at once reduces typographic consistency, requiring greater attention to decipher the surrounding text. Typographers discuss this as readability, which is described briefly in my book and in more detail in typography books. One must take care when manipulating typographic attributes to not create Frankenstein paragraphs unless it suits the particular task or objective.

Posted in compromisejs, Data Visualization, Observable Notebook, Text Skimming, Text Visualization, Variable Fonts

“…and what is the use of a book without pictures or conversations?”

My book Visualizing with Text is nearing publication (early November!). One goal of the book is to appeal to both researchers (with structured text with logical arguments), and designers (with many examples and pictures). I really like books with lots of images. Like the quote from Alice’s Adventures in Wonderland, in the title, I prefer well illustrated books. And I don’t like reading about visualizations where you only get little teeny snapshots: after all the primary subject is visualization!

Text is good to explain, structure and provide context, but pictures are important to a designer: to see how the conceptual description is actualized; to see all the little design decisions that contribute to the whole; to see the anomalies and things that could be improved; to provoke design inspiration; and so on. Arguably, Jacques Bertin was successful with Semiology of Graphics 50 years ago because he presented a theory supported with many examples and illustrations. My ideal goal is to have a 50/50 split between text and images. But, measuring text to images is a bit tricky:

Counting images

It turns out that the publisher and I count images differently. We can easily agree on page count (274), but my publisher currently counts 146 illustrations and I count 250 images (+/- a few to be sorted out). What? How can we differ so much?

I think the publisher is counting image captions. On the other hand, I’m counting the individual images that might be referred to in a single caption. For example, figure 1.3 has a single caption but two different images: left is a book from 1497, right is a different book from 1589. I count this as two images (the images come from two completely different sources).

Then, there are parts of the book where I discuss using visualization techniques inline with text. Since I’m advocating inline visualization, these are simply text richly formatted within the flow of the prose. In this case, there is no image caption, so the publisher doesn’t count them at all, whereas I count each as one image (I had to write the code to crunch the data to format the text).

But there are more nuances. What about a couple of words formatted within written text, simply to facilitate cross-referencing to an image? I don’t count those. And what about an image that’s a composite of many teeny snapshots? I count those as one image. And what about a table that’s got some visualization formatting associated with each cell based on data attributes? That might count as a table and not an illustration. Again, I had to write the code to crunch the data to format the text, so again I count that as an image.

So, overall, I get to around 250 images, out of 274 pages. Given that the Table of Contents, Preface and Index don’t have images, that gets close to the 50/50 split.

Types of Pictures

Returning to the topic of pictures, a few other stats are useful. The pictures split approximately 50/50 between pictures that are my own vs. pictures of other visualizations. I think it’s good to provide a grounding based on real-world precedents. For the pictures from other sources, I’ve tried to include URLs to them, so readers and educators can easily find the online versions where available.

With regards to my pictures, some are new unique visualizations, and some are sequences of pictures, such as showing the same visualization using different attributes, or zoomed in, and so on. There are about 80 different examples of visualizations with text that I created in the book. Some have been published before, on this blog or in research papers.

But I wanted to create some new content (why buy a book with old pictures?). So there are new, unpublished visualizations that will be making their first appearance in the book, including scatterplots of cars, an adjacency matrix of dialogue, a syntax diagram, a massive textual stem & leaf diagram, assorted tables with visualization characteristics, a data comic, some expressive lengthening examples, and a topic model visualization. Here are eight examples taken from the draft:

Larger versions of these pictures will be available when finalized and placed on the publisher’s website with a CC license shortly before the book is released.

Posted in Data Visualization, Text Visualization

Which Font Should I use in my Visualization?

Yesterday the Data Visualization Society hosted a Fireside Chat regarding typography and visualization, which was fun to participate in. There were too many questions to answer them all. One question with many variants was: “Which font should I use in my visualization?” The answers given noted that there isn’t any one font: it depends on the use. In this post, I’ll list a few that I tend to use and why, plus a few caveats.

Small text

For things like tick labels or labels in the plot, I tend to use a font that will be robust on the screen at a small size: it needs to be legible. This is not the place for a “display font” (fine serifs, funky letterforms). Use a workhorse font, such as the sans serifs heavily used in mobile design: Roboto, Source Sans Pro or Segoe. A very close second is a slab serif font. Slab serifs are chunky serifs, so they can work well at small sizes. Two that I like are Rockwell and Roboto Slab.

Top 250 words associated with one or more emotions.

Data driven text

I like to use data-driven text in visualizations. Like labels in maps, type can express data values not only by varying size, but also by varying attributes such as font weight, width, typeface and so on. Much of this blog has examples of data driven text, such as the emotion word graph above, as well as my upcoming book Visualizing with Text. Here’s a sample of type attributes that can be data driven:

Data driven font attributes

Even though the row “Typeface” shows some rather funky fonts, for data driven fonts, I tend to stick to a small number of different typefaces that can be readily distinguished. Readily distinguished means that each font should look different from the other fonts used but still work at small sizes. Again this rules out display fonts. I might use a mix such as a sans serif with a high x-height (e.g. Source Sans Pro), a slab serif (e.g. Roboto Slab), a serif with a low x-height and humanist letter forms (e.g. Garamond; or maybe a high stress serif, such as Bodoni), a blackletter font (no current favorite, avoid ornate ones, Lucida Blackletter is OK), and maybe a handletter font (again, avoid ornate ones, I like Tekton Pro: verticals are vertical and not sloped). Here’s a snapshot so you can see how different some of these fonts can be:

Examples of different fonts for categoric encoding.

When encoding quantitative values into text, the most common approach in maps is to use small variation in size, or variation in font weight. You need to use a font with a large variation in weight from lightweight to heavyweight. Again, Source Sans Pro and Roboto offer a wide range of weights. Variable fonts often offer a wide range of weights. Some fonts also offer variation in widths – in this case I might use Saira which has many weights and many widths, but there may be better variable font choices now. Variable fonts are also better suited for web: instead of downloading the 36 weight and width combinations, a single font can be downloaded then configured in CSS.
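To make the last point concrete, here is a minimal sketch of generating the CSS for one label from two data values. The helper names scale and fontSettings are hypothetical, and the axis ranges (100 to 900 for weight, 50 to 125 for width) are assumptions: the real limits depend on the variable font chosen, so check what the font actually supports.

```javascript
// Linearly map value from the domain [d0, d1] to the range [r0, r1], clamping.
function scale(value, [d0, d1], [r0, r1]) {
  const t = Math.min(Math.max((value - d0) / (d1 - d0), 0), 1);
  return r0 + t * (r1 - r0);
}

// Build a CSS declaration that encodes two data values in a single label:
// value a drives the weight axis and value b drives the width axis.
function fontSettings(a, b, opts = {}) {
  const {
    domainA = [0, 1],
    domainB = [0, 1],
    wght = [100, 900], // assumed weight axis range
    wdth = [50, 125],  // assumed width axis range
  } = opts;
  const w = Math.round(scale(a, domainA, wght));
  const x = Math.round(scale(b, domainB, wdth));
  return `font-variation-settings: "wght" ${w}, "wdth" ${x};`;
}
```

The returned string can be placed in a style attribute or a stylesheet rule for the label. Because a single variable font file covers the whole range, no additional font downloads are needed as the values change.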

Titles and Subtitles

Titles and subtitles are generally larger. This gives you more options. Often titles and subtitles may contain a sentence or two. Readability is a consideration and serifs are often considered more readable. I tend to like to use slab serifs (e.g. Roboto Slab) or a geometric sans (e.g. Gill Sans or Lato) for titles. Geometric sans tend to use simple geometric forms, such as a perfect circle for the letter o, which tends to make them wider than other sans serifs, which is why I don’t use geometric sans within the visualization.

Caveats?

There are always caveats. If you’re creating a visualization where the labels use codes, such as airline flights (e.g. AC123), bonds (e.g. IBM2.5-250515), airline reservation codes, etc., make sure that the numbers and letters are clearly distinct: for example, O0 or I1l may look too similar in some fonts (e.g. Titillium Web). This is a real problem in many displays such as air traffic control, electric grid operations, financial market screens, and just about any modern app where items refer to ID codes. The font B612 was specifically designed to maximize these differences and to be usable at small sizes in visual displays. Also note that many monospaced fonts are designed to accentuate these differences, such as Inconsolata.
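One quick way to find out whether a dataset is exposed to this problem is to scan its labels for confusable characters before picking a font. The sketch below is only an illustration: the function name confusableGlyphs and the list of groups are assumptions, and a real list would be tuned to the candidate font.

```javascript
// Groups of glyphs that are easily confused in many fonts.
// O/0 and I/1/l are the examples named above; B/8 is another commonly confused pair.
const CONFUSABLE_GROUPS = ["O0", "I1l", "B8"];

// Return, for each group, the members that actually occur somewhere in the
// labels, keeping only groups where at least two different members appear.
function confusableGlyphs(labels) {
  const seen = new Set(labels.join("")); // every character used by any label
  return CONFUSABLE_GROUPS
    .map((group) => [...group].filter((ch) => seen.has(ch)).join(""))
    .filter((found) => found.length >= 2);
}
```

For the two example codes above, AC123 and IBM2.5-250515, this reports that both I and 1 are in use, which is a cue to test those glyphs in the candidate font at the intended size.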

Posted in Data Visualization

Designing a Book Cover (or the long history of text on paths)

Note: I will be speaking at the Data Visualization Society (DVS) Fireside Chat on Typography for Data Visualization on Wednesday June 24th.

After two years, my book Visualizing with Text is getting close to publication. Finally, it is time to design the cover! About a year ago, I designed a placeholder “cover image”. It was procrastination: I should have been writing content, researching, tracking down copyrights and preparing images.

I decided that the initial placeholder image should indicate both the history of representations that manipulated text and the modern, new visualizations that I was creating, inspired by some of these historic images. The book has a lot of different visualizations, so I thought of a potential collage, perhaps focusing on a set of images from just one or two techniques. I’d always received strong positive feedback every time I showed text-on-a-path for social media visualization, so I focused on that technique. Furthermore, showing conversational text as text-on-a-path has a long history, so there were lots of fun images available to use, ranging from medieval paintings and comics through to my visualizations. Then I made a quick placeholder image with some text, images and an axis:

The placeholder book cover.

With the interior of the book submitted in April, it became time to focus on other aspects of the book, such as endorsements, cleaning up images, cleaning up code, and the real cover! One of the early reviewers of a draft version of the book was John D. Berry, a typographer and designer I’d met at a conference during my PhD. John graciously offered to create the cover and I jumped at the opportunity to work with him – I really like John’s modernist design sensibilities in his portfolio and I like the opportunity to collaborate with other designers with expertise in areas that far surpass my own. We would need to follow the publisher’s template, given the book is one in the AK Peters Visualization Series edited by Tamara Munzner (which I am honoured to be a part of).

AK Peters Visualization Series

John created many different book covers for consideration, some based on content from the book, some based on contemporary typographic art, and some using historic images. John recommended an abstract approach, suggestive of the interior content, using large images, so that it might stand out both on a shelf in a bookstore and online in a browser. That matched with my own preferences for high-contrast, clean modernist designs.

Potential book covers.

I really liked some of the covers based on contemporary typographic art. But we didn’t have much time, nor did we have budget to get license rights for one of these, so we decided to explore the historic image route.

I had provided John with a few dozen historic text-on-path/spoken text images, plus a few variants of my text on path visualizations of social media and news headlines. Historic images included late-Gothic scenes with banderoles (a scroll extending from a character indicating spoken text), such as the monks (above left), colorful paintings, and many examples in block-books from the mid-1400s:

There are very many examples of text on path over centuries.

I’ve also used comic book examples a number of times in my analysis, as comic artists use type expressively, twisting it along paths, varying font styles and so on. Looking backwards in comics, there are great examples of text at all angles in bubbles in the work of early caricaturists such as Thomas Rowlandson, including the example I’d used in my placeholder, as well as the hundreds of others that Rowlandson produced. John explored the Rowlandson images and found these emotional characters:

Rowlandson’s characters’ strong reactions!

At one point, I riffed on one of John’s designs and the above original design to create an over-the-top collage: many different examples of text on path, from many different time periods, then pasted over top maps from many time periods. My mockup stretched across both covers. But there are some aspects of cover design that it doesn’t really address: at the end of it all, it needs to be meaningful at a postage stamp size for the person browsing books online. The trouble with big collages is that they invite long viewing but don’t necessarily provide a quick answer at a glance. More effort is required to decode the mix of elements, separate foreground from background, and so on. In design, a poor result can be a good thing: it means we don’t want to explore further in that direction.

Over the top collage

John went in a different direction, discarding the strictly linear layout, taking into account all the required design elements, and came up with a much stronger design. Each image is much more tightly cropped, retaining just enough of each. It plays with the spoken Gothic / Georgian text rising from the bottom, going through the title box, wherein it transforms into the new, colorized social media text emanating above. The title box “Visualizing with Text” transforms the input (historic representations of spoken text) into output (new visualizations of social media text). Much like how I put typography to work in my book, John put the title block to work.

As the x-axis can disappear, now the bottom portion of the book is free to express something different, in this case, a new visualization from the book viewing the stems of root words, a kind of foundational language inspection underlying the speakers above. Perhaps Rowlandson’s two fearful characters should be afraid of both the title and the foundations.

Close to final cover.

Readers may also notice that John changed the font in the title. The prior books in the series use ITC American Typewriter, a font that hints at typewriters, which, in turn, hints at the monospaced fonts prevalent in computer code (and thus books about computer science). John and I wanted something punchier. The challenge with many typewriter fonts is that they tend to be fairly lightweight: note how it’s difficult to get Courier to stand out on a slide with mixed fonts. John instead recommended Dattilo, a newer, heavier weight font with a typewriter feel (“dattilografia” is Italian for “typewriting”), i.e. the same spirit of typewriter but heavy.

Overall, we end up with a meaningful punchy cover, that hopefully engages the casual web viewer when browsing a book website. Maybe they will judge this book by its cover?

Posted in Data Visualization, Line Chart, Microtext, Text Visualization

Text and Visualization Workshop at ESAD Valence

I had the good fortune to be invited to speak at a workshop late last year at the ESAD design school in Valence, France:

ESAD Valence. It’s an awesome roof!

The workshop was titled sous le texte la carte: La visualisation du texte en cartographie. Although the title focused on text and cartography, the presentations were a bit broader, extending to visualization and other applications.

Before the start of the workshop, I was invited to a design review for a variety of student projects using interactive type. I was expecting to see some videos or maybe some Processing: instead, it was all HTML5 + JavaScript. As explained to me later: there are no jobs for Processing, and all the employers want JavaScript, so they have shifted a lot of the interactive typography to JavaScript now. Projects experimented with techniques such as interleaved text, animated blurs, superimposed scrolling text, interactive hierarchies, and so on within dynamic layouts.

Interactive type projects by students at ESAD Valence.

With regards to the workshop, there were a number of good presentations. However, my French isn’t great, so I wasn’t able to follow the discussions closely. Here’s a great slide regarding typography on historic French maps by Jean-Luc Arnaud (http://www.cartomundi.fr/site/#): note the use of different sizes, all caps/lowercase and italics to create an ordering of labels for use on different maps.

Labels for maps varying in size, capitalization and italics.

Jean-Luc also presented some of his contemporary typographic maps. Not quite like the Axis Maps that some readers may be familiar with, these maps superimpose text over other text and don’t repeat labels:

Small portion of one of Jean-Luc Arnaud’s typographic maps.

This was followed by a highly interesting presentation on the use of standardized symbols on shipping navigation maps by Anais Déal. Being important navigational aids, one would hope that these international symbols would be consistently implemented by various national map makers. Unfortunately, they are not. Here are some examples:

Standard international symbols on marine navigation maps don’t quite follow the standards.

Sophie Boiron and Pierre Huyghebaert showed some historic heavily labelled maps and then showed this fantastic typographic map they created. At a distance, it’s a map of Brussels (left). Zoomed way in, each block is a sentence of text (right):

Boiron and Huyghebaert’s thematic map, with each polygon made of a descriptive sentence.

I find this example particularly compelling from a text visualization perspective. One can imagine using the same technique with choropleth maps, cartograms, treemaps, hierarchical pies or any space filling visualization technique. At a macro level, the areas are highly visible and you can use color to indicate a thematic variable. At a micro level, you’ve got detailed text — not just labels, but the opportunity for explanations, descriptions, details and even a few icons.

Antoine Gelgon and Pierre Huyghebaert presented an extremely detailed analysis of all the variation in the lettering of the famous Belgian comic Gaston, going deep into the technical constraints of the pens that were used, touch-ups with whiteout and so on. Then, super interesting, they created a parametric font following the same approach as Don Knuth’s Metafont. The result is a variety of tweakable parameters to create computer-generated hand-lettered text for future comics and presumably merchandise:

Gelgon and Huyghebaert’s parametric font for recreating lettering for the comic Gaston.

The final presentation addressed the very important topic of type legibility in visualizations and, more broadly, user interface design. Specifically, the design task was to revise the font used in displays in aircraft and air traffic control systems. The presentation showed a number of interfaces with various issues, such as low contrast, glare and other real-world operational issues with the existing displays:

Left: visual display in cockpit under ideal conditions. Right: same display with glare.

Furthermore, the existing font had the potential for confusion as the displays often had codes that combined alphabetic characters with numeric characters. With detailed user testing, the design team identified the most confusable glyphs (e.g. B/8) and iteratively designed a new font to minimize these issues, suitable for use on industrial display screens even with low pixel density. The result is the font B612. A subset of the font is freely available for download (e.g. from Google Fonts).

Left: example glyph confusion matrix. Right: example design adjustments to reduce confusion between similar shapes.

All in all, a workshop highly relevant to visualization, dealing with text visualization issues ranging from interaction techniques and novel layouts to parametric text and type legibility. And Valence is a pretty town, worth adding as a stop if you’re visiting southern France. Here’s a couple of tourist photos of the market on Saturday morning and a typographic sculpture:


Posted in Data Visualization, Design Space, Legibility, parametric fonts, Text Visualization, Thematic Map

Organizing a visualization book

I’d previously created a book, with David Jonker, regarding Graph Analysis and Visualization in 2015. It was a lot of work. With lots of visuals and text, a word processor is pretty good to see a page or two, but you don’t see the whole thing. To get a sense of the book, I printed out a rough draft and taped it up on the wall of my basement. It helped a lot in terms of figuring out how to move things around.

Similarly, over the course of my thesis and my upcoming book, Visualizing with Text, due in October 2020, I wanted to get a better sense of how everything fit together, not just page by page views. However, this time I invested in a 32″ 4K monitor. I could look at, and read, 12 pages at a time. That was good for working on chapters and sections, to see how groups of images worked together. An unexpected side effect was that this large monitor allowed me to sketch out many different alternatives on the screen to help bring together and organize many aspects of the work.

Paper Outlines and Sketches

Before reworking everything on screen or in printouts, the process often starts with some lists and diagrams on paper. I don’t have many of the rough scribbles, as the loose paper tends to get thrown away. Here are a few sheets that haven’t hit the trash yet:

Visualizing_with_Type_Reorg_Notes.jpg

Organizing the Design Space

The crux of the book (and the earlier thesis) relies on creating an organized design space for all the bits and pieces of text being used within visualization. Over the last 7 years, pieces emerged by looking at historic examples, talking with experts in different fields, creating bits of code and noting what worked or failed, and so on. The organization of these different pieces into a whole was emergent: it was not a linear process. There were false starts, things that sort-of worked but not quite, and even when the organization got close to the final form, many tweaks and variants. I spent many weekends over many years on a few diagrams that organized everything text and visualization. In effect, these iterative diagrams represent a research through design process. The effort for these diagrams surpassed the writing effort associated with a chapter.

Here’s a diagram of the many iterations, as a timeline, where you can see some historic starting points, successive iterations, and a few dead-ends. You can see near the end, the diagrams become bigger and more complex: more ideas can be explored on a 4K screen rather than paging through many screens.

Visualizing_with_Type_Design_Space

One recent dead-end is labelled “everything” in the above diagram. It attempts to fuse the entire process into a single diagram. The left page in the photo has notes regarding text interaction, the research sources and relation to the everything diagram. In doing so, I realized that some elements in the diagram are less researched and less examined than others (e.g. interaction, cognition). Attempting to add these other pieces into the book would have added another 40-80 pages and possibly two years of new research: working with editors, we agreed that these were out-of-scope for this book (but it helped organize the related content and spurred a few enhancements to parts of the book).

Organizing the Chapters

The design space is the first third of the book. The rest of the book is all about new kinds of text visualizations, heavily illustrated with example visualizations that I’ve created. In late 2019, I had a lot of the content, but I didn’t like how it was fitting together and felt that there were still some gaps. Some of the content was orphaned, some was duplicative, and so on. Furthermore, some examples were throwaway and could have been better constructed to link to broad themes in the book.

At this point, I decided to take one image from each of the examples, create a map of the existing book, and then scribble over top where things should move into different groupings, items to remove, items to add, items that were missing.

Visualizing_with_Type_Chapter_Rework2.jpg

The lower half of the above screenshot represents all the examples in the final 8 chapters. In the upper half are some reference images that organize, structure and introduce these 8 chapters. The references and the content are interdependent: adding / removing / moving examples changes the chapters and changes the introduction. And the organization implies aspects about the design space: the book evolved into Visualizing with Text instead of Text in Visualization, because through these design processes I realized that the design space was bigger than the traditional palette of what most people think of as visualization today.

This process was stressful, because it meant re-writing sections that had already been written, and there was a looming deadline. But, in the end, I am much happier with the result. And I think the extra pixels helped with this reorganization more effectively than rearranging pages or post-it notes on a wall.

Posted in Data Visualization, Text Visualization

Shapes or Alphabetic Point Marks?

In some visualizations, such as scatterplots, a visualization designer might use different shapes to encode categoric data. Abstract shapes such as circles and squares can be used, but in practice, many visualization systems have a limited number of shapes (e.g. 9 in Excel, 10 in Tableau, 7 in D3.js). What if you need more?

Pictographic icons can be used, but are difficult to design for abstract concepts (e.g. GDP, CPI, or a list of cities); are not intrinsically orderable; and may be ambiguous (e.g. see Clarus the dog-cow, an early Mac icon). More importantly, the difference between two pictographs might be subtle and require close inspection.

Wouldn’t it be nice to have a ready-made set of 25 or so simple but very different shapes available to use?

Many categoric shapes, same aspect ratio, same area

What are the design criteria for these shapes?

  • Same area. You don’t want some to be big, and some small: If there are two clusters, each with 10 items but different shapes, you want the total ink to be the same.
  • Square aspect ratio. You don’t want some shapes to be really long, some to be really tall. You still want to be able to quickly scan and find a minimum or a maximum without being fooled by shapes that are stretched out.
  • Different. You want these shapes to be different, because they’re encoding categoric data. Each category is different. So, how do you get a bunch of shapes that are maximally different?

The last criterion is hard to solve for. It asks “What is shape?” The answer is longer than a blog post. But you want variation in tangible shape-like attributes such as curvature, angle, convexity, orientation, corners and so on.
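The first two criteria, unlike the third, are easy to check in code. The following is a minimal sketch, assuming each shape is a simple polygon given as an array of [x, y] vertices. The function names are illustrative, and curved shapes would first need to be approximated by polygons.

```javascript
// Area of a simple polygon given as [[x, y], ...] (shoelace formula).
function polygonArea(pts) {
  let sum = 0;
  for (let i = 0; i < pts.length; i++) {
    const [x1, y1] = pts[i];
    const [x2, y2] = pts[(i + 1) % pts.length];
    sum += x1 * y2 - x2 * y1;
  }
  return Math.abs(sum) / 2;
}

// Width divided by height of the bounding box; 1 means a square aspect ratio.
function aspectRatio(pts) {
  const xs = pts.map((p) => p[0]);
  const ys = pts.map((p) => p[1]);
  return (Math.max(...xs) - Math.min(...xs)) / (Math.max(...ys) - Math.min(...ys));
}

// Uniformly scale a polygon about the origin so its area equals targetArea.
// Uniform scaling leaves the aspect ratio unchanged.
function normalizeArea(pts, targetArea = 1) {
  const k = Math.sqrt(targetArea / polygonArea(pts));
  return pts.map(([x, y]) => [x * k, y * k]);
}
```

Running every candidate shape through normalizeArea and rejecting those whose aspectRatio strays far from 1 would enforce the first two criteria. How different the shapes look from one another still has to be judged by eye.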

Procedural Shapes

One approach is to procedurally generate a bunch of different shapes. This sounds like a good idea, until you try to generate 25 unique shapes. Here’s a naive set of 18 procedural shapes. It starts with a square (bottom left) and replaces corners of the square with a diagonal edge, a radius, and so on:

Yes, all these shapes are different, but they’re underwhelming. They are all arbitrary – and other than the square, none of them look like anything. And they aren’t that different – no convexity, all smoothish edges, and so on. They all look like bits of wood left on the floor of the woodshop. They aren’t recognizable or nameable.

Nameable Shapes

Perhaps another criterion (an unproven hypothesis) is that we’d prefer the shapes to be recognizable and nameable. Think about color – we tend to use colors such as red, blue, orange, green, black in visualizations. We tend not to use colors such as burnt umber, raw sienna, charcoal, chartreuse; nor patterns such as plaid, houndstooth and polkadots. Things that we are more familiar with are easier to recognize and differentiate: we already have a slot for them in long term verbal memory. So, for nameable shapes, ideally we’d like abstract shapes, so they are not too finicky, complex and difficult to use at small sizes. But we do want them to correspond to nameable things, so they need to be really simple and different.

So here are 27 highly differentiated, nameable shapes, all with roughly the same aspect ratio and area:

They seem more different than the procedural shapes. The nameability may be a bit dubious: the top row is more nameable than the bottom row.

Alphanumeric Shapes

Having worked with text and visualization for the last 6 years, I now find it obvious that another set of 26 squarish, similar-area, nameable shapes is the Latin uppercase characters:

These are Source Code Pro – a fixed width font – so the area should be highly similar between each glyph. And uppercase so they are all the same height (except for the Q in this font). And having been tuned over 2000 years, perhaps they have naturally evolved to be maximally different? Furthermore, since we read millions of letters, we have highly tuned our visual systems to recognize them.

Which one to use?

Alphabetic shapes or nameable shapes? Which to use? We could subject them to tests, to make sure that they work at small sizes and remain clearly different:

The green shapes aren’t quite as robust – the rounded rectangle and the square are too similar. Some fine tuning may be required.

Ideally, it would be great to run some usability studies to see which work better.

Thoughts? I’m also curious as to what you might name the green shapes, feel free to name them all in the comments.

More info: For a more in-depth look at some really interesting glyph research, take a look at Eamonn Maguire’s PhD thesis and Rita Borgo et al.’s state of the art report on glyphs.

Posted in Alphanumeric Chart, Data Visualization, Shape Visualization

Awesome periodic table with aligned bars per cell

Periodic tables of the atoms are great visualizations. Much has been written about Mendeleev’s periodic table and other tables that organize atomic data. The periodic table is a powerful tool because the elements are organized and aligned by commonalities, enabling prediction of unknown elements in the early usage of periodic tables.

While looking at various tables regarding use of text to visualize data in tables, I stumbled across this periodic table by Henry Hubbard and William Meggers (1963) at the Smithsonian:

Periodic_Table_Hubbard_Meggers_1963_Smithsonian

Data dense periodic table from Hubbard and Meggers 1963.

Most periodic tables show only a few attributes per element, such as the atomic symbol, the atomic number, and the name. But there are many more data attributes per element such as expansivity, compressibility, ionization potential, atomic weight, isotopes, crystal form, orbits, magnetism, state at room temperature, melting point, boiling point, atomic radius, and so on. What’s really interesting in Hubbard and Meggers’ table is that they pack all of this information into each cell using various visual cues, as shown in this blurry legend from an earlier edition:

Periodic_Table_Hubbard_Meggers_Legend

Each table cell is packed with data.

Cells have text and numbers like many modern periodic tables, but they also have bars around the perimeter and triangular markers indicating quantitative values, plus dots, symbols and diagrams. One may wonder:

Why is the quantitative data represented as bars around the cell, and not just numerical data?

Recall that the periodic table is organized so that rows and columns organize elements by commonality. By using bars, visual comparisons can be made along a row or column. Here is a redrawn simplification of the first column from this chart:

Periodic_Table_Hubbard_Meggers_Col1_redraw

Closeup drawing from Column I showing bars and triangles around the perimeter and an overlay line showing trend.

This redrawn closeup focuses on the quantitative graphics around the perimeter of the cells. For example, the bottom bar on a cell shows the ionization potential in bright orange. A viewer visually attending to these orange bars can compare this quantity within a column by scanning vertically. In effect, this creates an embedded bar chart that spans the cells, as shown by the overlaid dashed orange line. The ionization potential is highest for Hydrogen (H) at the top of the column, then decreases down successive elements in the column to Cesium (Cs). The next element in the column, Francium (Fr), has no bar, presumably because this value had not been measured when this chart was published; however, by observing the trend, one might predict the value for Francium.
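To make the idea of predicting from the trend concrete, here is a naive sketch that fits a straight line through the alkali metals and extends it one row down. The ionization energies are approximate modern values in electron volts that I have supplied for illustration; they are not read off the Hubbard and Meggers chart. Leaving hydrogen out of the fit and using the period number as the x-axis are also assumptions.

```python
# Approximate first ionization energies (eV) for the alkali metals in
# column I. Illustrative values, not taken from the 1963 chart.
alkali = [
    ("Li", 2, 5.39),
    ("Na", 3, 5.14),
    ("K", 4, 4.34),
    ("Rb", 5, 4.18),
    ("Cs", 6, 3.89),
]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

periods = [period for _, period, _ in alkali]
energies = [energy for _, _, energy in alkali]

slope, intercept = fit_line(periods, energies)

# Francium sits in period 7, one row below Cesium.
francium_estimate = slope * 7 + intercept
print(f"slope per period: {slope:.2f} eV")
print(f"naive estimate for Fr: {francium_estimate:.2f} eV")
```

A straight line is only the crudest reading of the trend. A viewer scanning the bars by eye might just as reasonably see the curve flattening out and guess a value closer to Cesium.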

Similarly, the top bar per cell can be visually scanned to show a trend (as shown by the overlaid dashed green line). In addition to the four perimeter bars around the cell, there are also tiny triangles that float along each edge, showing other quantitative variables. For example, the triangle on the right edge indicates specific heat by its vertical position. These can similarly be compared across cells.

Note that horizontally oriented bars better facilitate comparison within a column than across a row. That is, horizontally oriented bars share a common baseline along the left edge of the column. A common baseline allows for more accurate comparisons of quantities than bars that do not share a common baseline (Cleveland and McGill 1984, or Heer and Bostock 2010). It is unknown how Hubbard and Meggers specifically chose which variables to place horizontally and which to place vertically to facilitate columnar comparison and row-based comparisons.

The notion of creating these aligned marks in the context of other data seems to be an interesting idea for both packing a lot of data into the visualization and organizing the data to facilitate visual comparisons and projections.

Posted in bar chart, Data Visualization

Revisiting Maps for Inspiration

I write a lot about typography and visualization. It all started with critically looking at maps and noticing differences between modern visualization and old maps. I did a PhD looking at typography, text and visualization. (Stay tuned, there will even be a book in late 2020 about visualizing with text – with many new visualizations beyond what I had in my thesis!)

Back to maps. I was invited to speak at ESAD Valence about visualization and I decided to take a break from book writing and revisit the original inspiration: maps. Cartography has different rules than visualization, a much longer history, and many different techniques readily visible. So, I cobbled together some of my favorite maps to talk about and point out some observations.

Gough Map, 1360

The Gough map is a wonderful medieval hand-drawn map. Rivers are diagrammatic, starting as bullets and flowing in almost straight lines. The iconography for towns varies from simple sheds, to an added cathedral tower, to a cluster of small buildings, to the walled city of London. Typographically, it’s interesting for its ordering of labels. While most towns are labelled in brown, London is literally labelled in gold. Distances between towns are labelled in red, and counties are labelled in red with boxes (e.g. Suffolk).

Map_Gough_1360.png

The Gough map. London is literally labelled in gold.

Munster’s Geographia Universalis, 1540

Skipping ahead two centuries, Munster’s maps from Geographia Universalis (1540) are interesting maps at the transition to the printing press. Like the medieval Gough map, rivers, mountains and towns are highly stylized forms and pictographs, which are combined together with typographically differentiated text in italics, caps and roman. Although the geographic map is a woodcut, the lettering is highly uniform and likely metal type composed together with the woodcut by a form cutter. The resulting aesthetic balances the rougher shapes and textures of the woodcut with the fine metal letters plus some ingenuity by the artisans to get it all to fit together. Towns are consistently horizontal but labels are angled to fit, such as Vincentza turned almost upside down:

Map_Munster_1540_2.png

Munster’s maps: woodcuts plus text.

Willem Janszoon Blaeu, 1629

Engraving enabled much finer detail than was feasible with woodcuts: both the topography and the labels could be engraved in detail. Willem Janszoon Blaeu’s maps have an expanded set of iconography, now reduced even smaller to tents, pyramids and tiny houses. The path of rivers is more accurate and mountains have shading. The engraved text now has more opportunity for variation. River labels more closely align with river courses. Labels corresponding to areas are larger and spacing starts to increase (e.g. D A N). Plus many other text variants (size, case, italics) differentiate between names of towns, cities, provinces and regions.

Map_Blaeu_1629.png

Blaeu’s engravings: more detail and more text variation.

Crome’s Neue Carte von Europa, 1782

Crome creates an early thematic map, Neue Carte von Europa, showing the location of different crops, livestock and minerals in Europe in 1782 (previous post). An even wider range of icons is now required to indicate all the different types of resources: gold, silver, copper, zinc, iron, mercury, marble, fruit, honey, salt, rice, fish, wood, horses, pigs, etc., for a total of 56 different types of commodities. After running out of icons, two-letter codes are used, e.g. Kr for cork, Tb for tobacco, Cr for currants and so on.

Thematic_Crome

Crome’s map filled with icons and alphanumeric codes.

Sherman’s map, 1864

During the U.S. Civil War, General Sherman led his army deep into the Confederacy, far beyond his supply lines. Sherman’s map combines traditional topographic detail with an overlay of resources summarized from the 1860 census. Starting with a base map showing counties, cities, rivers and railroads, an additional 15 variables of census data are added regarding the quantitative resources available: population, livestock, and agriculture. The map provided Sherman with the ability “to act with confidence that insured success.” As an early datamap for analytical and planning purposes, it shows the value of depicting many dimensions of data simultaneously, to aid in trade-off decisions, such as food available, potential resistance and potential supporters.

Map_Sherman_1864

Sherman’s map: 15 quantitative resources per county.

Ordnance Survey, 1921

Modern maps, using printing presses, reached a high point in the early 20th century for the amount of information packed into them. Ordnance Survey maps are a favorite for the amount of information that they pack into each label. In this example from the early 1920s, place names vary capitalization, italics, size, and font family (plus the actual name) to indicate 5 attributes per label (legend here).

Map_Ordnance_Survey_1921.jpg

Ordnance Survey: 5 variables indicated per name.

Stieler’s Atlas, 1924

Similar to the Ordnance Survey, mapmakers on the continent also created maps with high-dimensional labels. Stieler’s maps are typographically interesting as the labels use an ordering of underlines (dot, dash, solid, double solid) to indicate cities with different levels of governance (e.g. capital of a county, province or country). They also use backward italics for water features, curved and spaced text to indicate area features, and so on.

Map_Stielers_Atlas_1925_2.jpg

Reverse italics, multi-level underlines, and more.

FAA Aeronautic Chart, 2019

Here’s a map that’s only a few months old from FAA.gov, and packed with a phenomenal amount of information for pilots. There are many different classes of information, visually distinct from each other. The base map has topographical shading in hilly areas, bright yellow in urban areas. Overlaid are blue and red layers, each with a wealth of information regarding the corresponding airport, runway configuration, airspace, routes, waypoints, radio frequency, visual markers such as stadiums, wind turbines and bridges, and more. Icons and alphanumeric codes are heavily used to compact data for expert users. All text remains legible, with the background/basemap largely being light/bright upon which other layers can be superimposed, and if needed, some text is set with light halos.

Map_FAA_2019_SFO.jpg

Aeronautic chart, packed with relevant data for navigation.

So what?

Even though most people might think of Google Maps these days, with minimal representation of roads and highly undifferentiated labels, the history of maps shows far richer solutions packed with many layers of information. These much richer maps, like the aeronautic chart and Sherman’s map, show that there are uses and applications where people need more information than only a couple of classes of information within one visualization. And all the examples here show how all this extra data can be communicated with labels, symbols, lines, layers and more.

So, where and when could scatterplots, timeseries charts and treemaps add many layers to increase their information content and aid new analytical uses?

Posted in Data Visualization

Bertin’s Reorderable Matrix

I recently had the opportunity to attend a workshop at ESAD Valence. To my surprise, in their collection, they have original parts from one of Bertin’s reorderable matrices!

Bertin_Matrix_Blocks_Box.PNG

I had the opportunity to use the rebuilt matrix at VisWeek in Paris 2014. I’ve simulated the matrix using Excel macros and Excel conditional formats. Essentially the reorderable matrix is a physical visualization that takes a table of structured data and enables resorting of rows and columns based on data values to reveal clusters. Each block shows data on its top surface, representing a numeric value that varies from the lowest value (full white) to the highest value (full black), with various textures in between. The user can then shuffle (i.e. reorder) full rows or full columns to regroup the data based on values so that clusters visually appear (Bertin called the process diagonalization, see the video). It’s a human-powered physical clustering algorithm.
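For a feel of what the shuffling achieves, here is a small sketch that reorders rows and columns automatically. It uses a simple barycenter heuristic (sort each row by the average position of its dark cells, then do the same for columns, and repeat). This is a swapped-in stand-in for the manual process, not Bertin’s method, and the function names and the tiny 4×4 table are made up for illustration.

```python
def barycenter(values):
    """Weighted average position of the values along a row or column."""
    total = sum(values)
    if total == 0:
        return 0.0
    return sum(i * v for i, v in enumerate(values)) / total

def reorder(matrix, rounds=5):
    """Return row and column orders that pull similar cells together."""
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    for _ in range(rounds):
        # Move whole rows, judged by where their dark cells currently sit.
        rows.sort(key=lambda r: barycenter([matrix[r][c] for c in cols]))
        # Then move whole columns in the same way.
        cols.sort(key=lambda c: barycenter([matrix[r][c] for r in rows]))
    return rows, cols

def apply_order(matrix, rows, cols):
    """Build the reordered table from the chosen row and column orders."""
    return [[matrix[r][c] for c in cols] for r in rows]

# Hypothetical table: 1 is a black block, 0 is a white block.
table = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]

row_order, col_order = reorder(table)
for row in apply_order(table, row_order, col_order):
    print(row)
```

In this toy table the checkerboard collapses into two solid blocks along the diagonal, which is the kind of pattern the physical matrix is meant to reveal. Real data is messier, and a human moving rods can use judgment that this heuristic lacks.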

This particular version is made with tiny plastic blocks, about the size of Lego 1×1 bricks, and they sound the same as Lego when they jostle in the big bag of bricks (Bertin called them dominoes). I arranged a few on a desk into a matrix (the connecting rods weren’t available). You can see how patterns of all black, textured, and partially textured surfaces are highly visible:

Bertin_Matrix_Top.PNG

One really interesting aspect that I noticed is the colored edge stripe on some of the bricks, seen in the picture below (and quite noticeable in the bag where you can see some blocks have bright stripes in green, blue, yellow, orange, etc). I asked, but it was uncertain what their purpose was. The stripes are always on the sides where the rods go in; never the top. I’m guessing that it is some kind of recording system. Perhaps the user would draw a stripe across a row of bricks, maybe as a way to record the state. Since these colors were on the sides of the blocks, they wouldn’t be visible from above and therefore not interfere with patterns and clusters being created.

Another interesting aspect is that both the tops and bottoms of the blocks have the black-to-white texture patterns. We speculated that the blocks were reused from analysis to analysis, and it was easy to code both sides of the blocks. But, maybe there’s more. It would be feasible to re-order a matrix, make some kind of intervention, collect more data, then color the new state on the bottom of the blocks. Then a user could flip over the entire matrix, to see if the pattern had changed in some way. Again, speculation on my part.

The Lego-like aspect also suggests to me that a reorderable matrix could potentially be constructed out of standard Lego blocks today: a 1×1 with holes on both sides, rods, and tiles in assorted shades of grey. And then concepts about data clustering could be taught in grade school.

Bertin-Lego.PNG

Posted in Bertin, Data Visualization