Album de Statistique Graphique

The Album de Statistique Graphique is a set of annual publications of data visualizations produced in France in the late 1800s. I first heard about them from Michael Friendly a decade ago and have been on the lookout for them ever since. Over the course of my thesis I did find a couple of copies in research libraries, but those libraries required signing agreements that I would not share the photos (why do libraries do this?).

Now, finally, they are on-line, easily accessible, in high quality scans courtesy of David Rumsey (thank you!). And they are amazing! You can access all of them with a search query.

While I would like to systematically review the Album, that would be a significant multi-year project given the depth of data, the quality of the visualizations, and the time period they are situated in. (It would probably be similar in scale and scope to Sandra Rendgen’s new book, The Minard System, which I haven’t read yet but which is high on my reading list.)

So, instead, here are a few quick snaps for inspiration.

1. Whole pages

Each page is remarkable, with many different visualization techniques. A page is a full composition, with detailed titles, annotations, visual legends and narrative legends accompanying every visualization. Not only are the visualizations unique, they are authoritative. This particular set of half pies on a map of France from 1882 feels familiar, resembling some of the charts in Bertin’s Sémiologie Graphique 80 years later:


2. Legends

Legends are used to explain many visualizations; presumably some of these visualization techniques were quite new in the 1880s. This particular legend (on the left) acts as both a legend and a summary of the full dataset. These pies are interesting too, using hue to create a top-level hierarchy and brightness as a secondary level. These pies also work nicely as nodes in a graph (something to consider in the ongoing pie chart debates):


3. Overlapping Colors

With computer-based visualization, it’s really easy to create data-driven geometry, and if that geometry overlaps, it is also easy to use transparency to help differentiate between the overlapping colors. However, blended colors are non-intuitive. We know theoretically that yellow occurs when red and green overlap, but decoding that requires some mental effort. In the Album, this particular radar plot has a filled red object and a filled yellow object: where they overlap, the fill alternates in horizontal stripes of red and yellow, which is much easier to decode (look closely at the image).


4. Multigraph Lines

For those who think Sankey diagrams are great, here’s an awesome graph visualization. A multigraph is a node-link diagram where there can be more than one link between nodes. In this railway traffic diagram, there are many lines between each pair of towns. Four color variants indicate the type of traffic, and the side of the center line indicates the direction of travel. Line width indicates traffic volume. Corners and joins are neatly beveled. Types of traffic, mainlines, local traffic and high-volume anomalies are all clearly visible. (Btw, I like many kinds of graph visualizations.)


5. Curvy Text

And finally, text on a path. Text annotations follow gridlines and objects, so that text doesn’t disrupt the overall visualization patterns, but still provides relevant information directly in context. Text doesn’t overlap the objects: it curves gently along paths, or angles out if the curves are too tight.


That’s a really quick take on a few examples. I didn’t even get into discussing the average circles on the radar plots or the red overlay text on the multigraph or other interesting aspects. Far better to browse the full set of images over at David Rumsey and draw your own insights.




Posted in Data Visualization, Graph Visualization, Shape Visualization

Using Font Attributes with D3.js

Most of the typographic visualization examples on this site are created in D3.js, using SVG. There are tons of examples of D3 on the web used to create bars, dots, etc., but not many online examples where font attributes are manipulated using D3. So, to help people create data-driven font attributes using D3.js, here’s a collection of examples. A running version is over on CodePen, which creates a simple little visualization like this (on Chrome):


And here are some pointers on how to make this work.

Color “fill”

The simplest example is to use color with text. It uses the same attribute as any other colored element in D3, i.e. the fill attribute. A trivial example sets the color with a function, in this case looking it up in a separate list of colors:

 .attr("fill", function(d,i) {return clrs[i];})

Font Weight (aka Bold): “font-weight”

Font-weight is almost as simple as color. The font-weight attribute takes a numeric value between 100 and 900.

.attr("font-weight",function(d,i) {return i*100+100;})

For default web-safe fonts, such as serif or sans-serif, there are usually only 2 weights available: plain and bold, which correspond to the numeric values 400 and 700. If you want to use a font with many more weights, you first have to load the font including all the target weights. There are many commercial fonts with many weights. Google Fonts has some free fonts in many weights, such as Saira and Roboto. The font needs to be loaded in the <head> section of the HTML; for example, this loads all 9 weights of the font Saira:

<link href=",200,300,400,500,600,700,800,900" rel="stylesheet">
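The snippet above derives the weight from the index. With real data, one option is to map each value linearly onto the 100 to 900 range and then snap it to the nearest weight the loaded font actually provides. A minimal sketch in plain JavaScript, assuming a linear mapping; the helper `weightFor`, its parameters and the `d.value` field in the usage comment are hypothetical, not part of D3:

```javascript
// Map a data value onto a font-weight between 100 and 900.
// Assumes a linear mapping over [min, max] and that the font provides
// every weight listed in `available` (by default, all nine).
function weightFor(value, min, max, available = [100, 200, 300, 400, 500, 600, 700, 800, 900]) {
  // Normalise to 0..1, clamping values outside the domain.
  const t = max === min ? 0 : Math.min(1, Math.max(0, (value - min) / (max - min)));
  const target = 100 + t * 800;
  // Snap to the closest weight that the loaded font actually offers.
  let best = available[0];
  for (const w of available) {
    if (Math.abs(w - target) < Math.abs(best - target)) best = w;
  }
  return best;
}

// Usage inside a D3 chain (not run here):
// .attr("font-weight", function(d) { return weightFor(d.value, 0, 100); })
```

Passing a shorter `available` list, such as `[400, 700]`, keeps web-safe fonts from being asked for weights they don't have.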

Font Oblique Angle (aka Italic): “skewX()”

Font oblique angle is a mechanical skew of the font. It’s not the same as true italics. For data-driven purposes, the oblique angle of the font can be manipulated by applying the transformation skewX() to a group for each element. This is a bit more effort than the simple color or weight. First, the visual element has to be created at the group level. Then the transform attribute is applied, setting both the x,y location and the skewX:

var obliquetext = svg.selectAll("obtext").data(data).enter().append("g")
  .attr("transform", function (d,i)
     {return " translate(" + (i*40+100) + ",55)" +
             " skewX(" + (i*10-40) + ")"; });

Then, with the group transform set, append a text element and the text will be appropriately skewed to represent oblique text:

obliquetext.append("text")
  .attr("font-family", "Saira")
  .text(function(d) {return d;});

One benefit of this approach is that any angle can be created, and oblique variants can be generated for fonts that have no italics or obliques. For example, a sloped blackletter can be created, although typographers might not be enthusiastic about this approach, as legibility and readability will be impacted.

Case and Small Caps: toUpperCase(), toLowerCase() and “font-variant”

Case can be manipulated directly in JavaScript using the functions toUpperCase() and toLowerCase(). Small caps can be accessed by setting the SVG attribute “font-variant” to “small-caps” or “normal”. A simple example chooses between small caps and normal lower-case letters:

.attr("font-variant", function(d,i) {return i<5 ? "small-caps" : "normal"; })

These can be combined with <tspan> so that parts of words can be set in upper-case, small caps or lower-case, as shown in the CodePen example and below:

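var casetext = svg.selectAll("casetext").data(data).enter().append("text")
  .attr("x", function(d,i) {return i*40;})
  .attr("font-family", "Saira");
// first part of each word: lower-case for the first five items, upper-case after
casetext.append("tspan")
  .text(function(d,i) {return i<5 ?
          d.substring(0,i).toLowerCase() :
          d.substring(0,i%5).toUpperCase(); });
// remainder of each word: normal lower-case, or small caps after the first five
casetext.append("tspan")
  .attr("font-variant", function(d,i) {return i<5 ? "normal" : "small-caps";})
  .text(function(d,i) {return d.substring(i%5,10).toLowerCase();});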

Typeface: “font-family”

Changing fonts is easy using the attribute “font-family”. For simple use cases, you can use web-safe fonts, which don’t need to be loaded into the browser, for example:

.attr("font-family", function(d,i) {return i<5 ? "serif" : "sans-serif"; })

For more variation in fonts, you can load lots of different fonts into your page and then access them in SVG. The first step is to find and load fonts: see Google Fonts for a large variety of free webfonts with easy cut-and-paste code to add to the <head> section of your page. Here’s an example of a dozen fonts loaded into the page:

<link href="|Arima+Madurai|Arvo|Henny+Penny|Indie+Flower|Libre+Baskerville|Pirata+One|Poiret+One|Sancreek|Satisfy|Share+Tech+Mono|Smokum|Snowburst+One|Special+Elite" rel="stylesheet">

Then you can access those fonts in your JavaScript, e.g.:

.attr("font-family", function(d,i) {return i<5 ? "Arvo" : "Sancreek"; })

Note that fonts with spaces in their names have a plus symbol in the link string, e.g. “Henny+Penny”, but when you specify that font as an attribute, you need to use the space, e.g. “Henny Penny”.

Underline: “text-decoration”

Underlines in SVG are disappointing: only a single underline is available, with no dashes, no wavy styles and no double lines. Blah. You could make your own underlines (it’s just a line added to text), but then you have to worry about interference with descenders, getting the right lengths and so on. Not for this post. For simple underlines, set the attribute “text-decoration” to “underline”. For this post, I tried to make underlines of different lengths by applying underlines to portions of text using <tspan>, and was surprised that different browsers gave different results. Use at your own risk.

Font width: via “font-family”

Font width could be manipulated the same way that oblique text is created above – using scale() instead of skewX() – but scaling text is highly frowned upon by typographers (see Lupton’s type crimes, and more generally a really nice introduction to typography).

Instead, we can use a well-designed font that comes in lots of different widths. Saira, at Google Fonts, comes in 4 widths. So you just need to load all the width variants, then access them using “font-family”. Here’s a trivial example:

.attr("font-family", function(d,i) {return i<5 ? "Saira" : "Saira Condensed"; })

Btw, thanks Omnibus Type for making a well-designed open-source font super-family in a variety of widths and weights freely available.

Spacing (aka Tracking): “letter-spacing”

The spacing between letters can be adjusted, and this is often done on maps to indicate features that span a large area, such as R o c k y  M o u n t a i n s. It is easily accessed in SVG via the attribute “letter-spacing”. Set the value in type coordinates using fractions of an em: 0em is the default spacing. Negative values such as -0.1em pull the letters tighter together; positive values such as 0.5em space them further apart. Note the browser inconsistency: Chrome seems to handle it OK, while Firefox and IE do not.

.attr("letter-spacing", function(d,i) {return (i*0.05 - 0.1) + "em"; })

Outlines: “stroke” and “stroke-width”

I don’t like to use outlines on text, but you can use them. Outlines generally don’t work well on thin or lightweight fonts; you need to start with a fairly heavyweight font without much detail (e.g. Saira Extra Bold, Arial Black, Source Code Pro Black, etc.). Then it’s just a matter of setting the attribute “stroke” to a color such as black, “stroke-width” to some very small value and “fill” to none. Notice how it has a very limited effective range, as shown in the example: the stroke is almost invisible on Abe, and at the other end, the insides of the a in Ian and the e in Gem just turn into blobs.

Baseline Shift: “dy”

You can shift letters up and down using the attribute dy. (If you want to put text on a path, see the earlier post regarding microline text.) You can provide a list of values to dy, and each successive character will take each successive value in the list. Here’s a simple example:

.attr("dy", "0 0.2em -0.1em")
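Note that the values in a dy list are relative: each one moves the current text position relative to the previous character, so the shifts accumulate along the string (and, since SVG's y axis points down, positive values move characters down). If the data gives an absolute baseline offset for each character instead, one way to get the dy list is to take successive differences. A minimal sketch, assuming offsets in em units; `dyFromOffsets` is a hypothetical helper:

```javascript
// Convert absolute baseline offsets (one per character, in em) into the
// relative shifts expected by an SVG dy list: each dy value moves the
// current text position relative to the previous character.
function dyFromOffsets(offsets) {
  let previous = 0;
  return offsets
    .map(function (offset) {
      const delta = offset - previous;
      previous = offset;
      // Round to 3 decimals to avoid tails such as 0.30000000000000004.
      return Number(delta.toFixed(3)) + "em";
    })
    .join(" ");
}

// Usage inside a D3 chain (not run here):
// .attr("dy", dyFromOffsets([0, 0.2, 0.1]))
```

For example, absolute offsets of 0, 0.2 and 0.1 become the dy list "0em 0.2em -0.1em".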


All of the above can be used together in any combination. See the CodePen examples.

Variable Fonts:

In theory, variable fonts can make other unique features of fonts available to D3. I haven’t figured out how to do this directly in a line of SVG (i.e. by setting an attribute), and presumably need to dig into the style attribute or CSS to connect these together. Maybe someone else will do this and post examples?
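One possible route, untested here: CSS has a font-variation-settings property, and D3 can set CSS properties with selection.style() rather than selection.attr(). Assuming the browser honors that property on SVG text and a variable font has been loaded, a small helper could build the property value from data. The helper name and the axis values are hypothetical; which axes a font supports (e.g. wght, wdth) and their ranges depend on the font.

```javascript
// Build a CSS font-variation-settings value from an object of axis -> value.
// Registered axes use lowercase tags such as "wght" and "wdth".
// This sketch only accepts four alphanumeric characters per tag.
function variationSettings(axes) {
  return Object.keys(axes)
    .map(function (tag) {
      if (!/^[A-Za-z0-9]{4}$/.test(tag)) {
        throw new Error("axis tags must be 4 characters: " + tag);
      }
      return '"' + tag + '" ' + axes[tag];
    })
    .join(", ");
}

// Possible D3 usage (not run here; d.value is a hypothetical field):
// .style("font-variation-settings", function(d) {
//   return variationSettings({ wght: 300 + d.value * 4, wdth: 100 });
// })
```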




Posted in Data Visualization, Font Visualization, Text Visualization

Text in Visualization – thesis on-line

You can now find my full thesis on-line. Instead of reading the whole thesis to learn about the design space of text in visualization, you can find a two-page overview that summarizes the entire thesis on pages v-vi.

The first half of the thesis (page v on the left) methodically defines the design space by reviewing many examples. The second half of the thesis (page vi on the right) then tests the breadth of the design space by creating many different kinds of extended and novel visualizations and provides general critiques. If you want to drill down into any area, little blue subscripts are links to the corresponding chapters.

For readers of the blog, you’ll find more detail on many items previously discussed here.

Posted in Data Visualization

Word Stems Visualized

In this blog there have been many posts of words visualized where differences are accentuated and encoded using bold, italics, underlines, etc. But what if you want to visualize the similarities?

Stemming is a basic task in a lot of text analytics where the same semantic word has variant spellings, for example, to indicate different verb tenses (e.g. swim, swam, swimming). But there are also interesting derivations with different meanings, e.g. swim, swimmer, swimmable. So, how could you visualize these to focus on the commonality across the word roots, not the differences?

Stem & leaf plots are possible, but previous examples shown here create lists of leaves, not well suited to comparing syllables. Word trees have been discussed here before, but word trees put a big gap between different parts of text and may vary sizes and weights of different chunks of text in the tree.

Here are six English word sets, each with a common five-letter root:


Visually, there are six root-word plots here. Each has the common root word running vertically along the left side of the plot as a stem (e.g. night-, ortho-, micro-, etc.) and affixes branching out horizontally along the right side of the plot. Each left root plus right affix forms a full word (e.g. nightcap, nightclub, nightfall, etc.). When there are common intermediate syllables, these span across the common words (e.g. graph in orthographic, orthographical, orthography).
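The grouping in these plots resembles a compressed prefix tree (a radix tree) hung off a fixed root: shared intermediate chunks become spans and the remaining endings become leaves. Below is a minimal sketch of computing that structure, offered as an illustration rather than the code behind these plots. Two assumptions are worth flagging: it splits on letters rather than syllables, and the hypothetical `minShared` threshold (default 3), which stops single shared letters such as the c in cap/club from being drawn as spans, is an arbitrary heuristic.

```javascript
function makeNode() {
  return { children: new Map(), end: false };
}

// Character trie of the given strings; `end` marks where a word finishes.
function buildTrie(strings) {
  const root = makeNode();
  for (const s of strings) {
    let node = root;
    for (const ch of s) {
      if (!node.children.has(ch)) node.children.set(ch, makeNode());
      node = node.children.get(ch);
    }
    node.end = true;
  }
  return root;
}

// Collapse chains of single-child, non-terminal nodes into one chunk.
function compress(label, node) {
  let chunk = label;
  while (node.children.size === 1 && !node.end) {
    const [ch, child] = node.children.entries().next().value;
    chunk += ch;
    node = child;
  }
  const children = [];
  for (const [ch, child] of node.children) children.push(compress(ch, child));
  return { label: chunk, end: node.end, children: children };
}

// Push shared chunks shorter than minShared letters down into their children.
function mergeShort(node, minShared) {
  const pending = node.children.map(function (c) { return mergeShort(c, minShared); });
  const out = [];
  while (pending.length > 0) {
    const c = pending.shift();
    if (!c.end && c.children.length > 0 && c.label.length < minShared) {
      // Prepend the short chunk to each child and re-check them, in order.
      pending.unshift(...c.children.map(function (g) {
        return { label: c.label + g.label, end: g.end, children: g.children };
      }));
    } else {
      out.push(c);
    }
  }
  return { label: node.label, end: node.end, children: out };
}

// Root word on the left, radix tree of the remaining letters on the right.
function rootWordPlot(root, words, minShared = 3) {
  const suffixes = words
    .filter(function (w) { return w.startsWith(root); })
    .map(function (w) { return w.slice(root.length); });
  const tree = compress("", buildTrie(suffixes));
  // An empty top-level chunk means the suffixes share no leading letters.
  const top = tree.label === "" ? tree.children : [tree];
  return mergeShort({ label: root, end: tree.label === "" && tree.end, children: top }, minShared);
}

// Flatten to "root-chunk-affix" strings, one per word, for inspection.
function paths(node, prefix = "") {
  const here = prefix ? prefix + "-" + node.label : node.label;
  const out = node.end ? [here] : [];
  for (const child of node.children) out.push(...paths(child, here));
  return out;
}
```

For the astro- words, this yields astro-log-er, astro-log-ical, astro-log-y, astro-nom-er, astro-nom-ical and astro-nom-y: the same letter-based (rather than syllable-based) split that the plot shows.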

Visually scanning any group means that the affixes can be easily compared. For example, night- with -cap, -club, -fall, -ie and -ingale forms derived words with completely different affixes, all referring to completely different objects. Under micro-, microbe and microbiology have much more commonality in meaning, being about tiny life forms or the study thereof, although their meaning is very different from the words microchip and microcosm. Under astro- there are two very different branches of study, namely astrologer, astrological, astrology versus astronomer, astronomical and astronomy.

You can see that the first column, showing words starting with night- and stand-, has root words that are independent words used to form compound words. Visually, these compound words don’t have additional shared syllables: the prefix is being used to create new words that define unique objects. However, the other two columns have Greek prefixes (ortho-, micro-, chrom-, astro-), none of which are independent words. And in each of these, there are common syllables indicating subsets of related words. At the same time, these common prefixes can be used to create new, highly different words that deviate more from the others, such as astroturf or microchip.

From a design standpoint, the visual layout borrows from stem & leaf plots, with additional intermediate grouping and only singular leaves (so, not much like a stem and leaf plot :-). Design-wise, it also seems problematic that words don’t quite split on syllables: for example, the plot shows astro-nom-er whereas it should be as-tron-o-mer.

Technically, it is not easy to create text of different sizes and widths that all visually appear to have similar stroke weights. Ideally, a font based purely on strokes rather than fills would work well for this. The early vector-based computer fonts by Allen Hershey would be great (I used them once upon a time on an old Tektronix 4014). However, their obscure format isn’t readily adaptable to modern font standards. Please, Frank Grießhammer, I hope you can find the time to release the Hershey fonts in OTF format! This is one example of a real-world application for vector fonts.


Posted in Alphanumeric Chart, Data Visualization, Text Visualization

Why history matters in data visualization

In any thesis or academic peer-reviewed paper, positioning your work in context of prior research is paramount to show your unique contribution and how your work “stands on the shoulders of giants”[ref].

My recent PhD thesis goes beyond the typical references of the last 10-20 years in my field (data visualization), and even beyond the origins of my field (arguably, the foundations were set 50 years ago by Jacques Bertin [ref]). I look beyond the field to other old domains such as cartography, typography and the arts.


A lot of what we are “inventing” in visualization has precedents in history and in other domains. If there are precedents, maybe something was learned in the other field that we can leverage? Here are a few examples:

Sunburst chart
(aka hierarchical pie chart, concentric chart)

John Stasko and Eugene Zhang did this great visualization of a sunburst chart back in 2000. It’s a great approach to intuitively show hierarchical data, and you can find many great implementations in D3 these days too:


But there are earlier precedents. I particularly like this one: A Zoological Chart from Fike’s Concentric Charts of the Sciences, from 1890 (110 years earlier than Sunburst):


There are some really interesting details in this pre-sunburst chart. Text rotates to best fit each segment, and is spaced out to fill wide wedges and set tight in narrow wedges. There are great little images out at the edge of the hierarchy, presumably a great way to engage bored students. Delicate colors don’t fight with the text. And, particularly interesting, the chart is padded with empty slots so that each circle is complete, not ragged like most sunburst charts.
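Fike’s padding can be imitated on top of an ordinary sunburst layout. The sketch below is an illustration under stated assumptions, not a reconstruction of Fike’s method or of any D3 API: it extends every shallow leaf with empty placeholder nodes (`name: null`) down to the deepest level, then sizes each wedge by its number of leaves (placeholders included), which is just one possible sizing choice. The node shape `{ name, children? }` and the function names are hypothetical.

```javascript
// Depth of the deepest leaf below `node` (a leaf has depth 0).
function depthOf(node) {
  const kids = node.children || [];
  return kids.length === 0 ? 0 : 1 + Math.max(...kids.map(depthOf));
}

// Return a padded copy of the tree; the input is not modified.
function pad(node, depth, deepest) {
  let kids = node.children || [];
  if (kids.length === 0 && depth < deepest) kids = [{ name: null }];
  return { name: node.name, children: kids.map(function (k) { return pad(k, depth + 1, deepest); }) };
}

function leafCount(node) {
  if (node.children.length === 0) return 1;
  return node.children.reduce(function (sum, k) { return sum + leafCount(k); }, 0);
}

// Give each node an angular extent (radians) proportional to its leaf
// count; the ring index is the node's depth (0 = centre).
function layout(node, start, end, depth, out) {
  out.push({ name: node.name, depth: depth, start: start, end: end });
  const total = leafCount(node);
  let angle = start;
  for (const k of node.children) {
    const span = ((end - start) * leafCount(k)) / total;
    layout(k, angle, angle + span, depth + 1, out);
    angle += span;
  }
  return out;
}

function paddedSunburst(tree) {
  return layout(pad(tree, 0, depthOf(tree)), 0, 2 * Math.PI, 0, []);
}
```

Without the padding, a tree whose "birds" branch stops one level early would leave a gap in the outer ring; with it, the outer ring gets an empty slot in that position and closes the full circle.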

Word Trees

Word trees are awesome. The examples by Martin Wattenberg and Fernanda Viégas are viscerally and intellectually engaging, with wonderful examples from classic texts. There are not as many examples in D3.js, and, unfortunately, IBM’s Many Eyes implementation no longer exists:


But there are interesting earlier examples. How about this example from 1541 in a text by Loys Vasse?

It’s a sentence that’s been structurally split into a tree. It’s quite similar to the WordTree, in that sentences can be split apart into trees, whether representing repetition across many sentences (as in WordTree) or logically structuring content (as in Vasse’s example). In fact, this hierarchical structuring of text persisted for hundreds of years in print documents. We can see examples nearly 200 years later in Chambers’ Cyclopedia in 1720:


Interestingly, the approach is not strictly limited to trees, but can be generalized to draw sentences as directed acyclic graphs, such as this example (again from Vasse):

So what?

Why do we care about these old examples? They aren’t interactive, they don’t dynamically update to different content and they were certainly difficult to create in their old technologies.

They are important because they show other approaches for solving similar problems.

In the early 2000s I had a particularly vexing project where we needed to show a hierarchy. Through the design process, tree maps and sunbursts were both rejected by the client, as were other representations such as a graph, a file structure, a radial graph, and so on. All were “too complicated”. This was pre-D3, so lots of prototyping code was being written (and discarded). Instead, we revved a sunburst with padding, so that the chart was always fully circular, not ragged. The client loved it. Two years later, I saw Fike’s Concentric Charts and was impressed that Fike had found a similar solution 115 years earlier. If I’d been aware of Fike’s example, we might have reached the solution faster with less code.

Similarly, the old word trees hint at other potential uses for Word Trees. And so on.


If we assume that old techniques are interesting, then what? How do we find these old examples? You can’t find “concentric charts” via Google Search if you don’t know the search term. And since Fike’s concentric charts predate the Internet (and have a very tiny Internet footprint), even searching for “concentric charts” doesn’t return these vintage results. So far, browsing is the best answer that I have: on-line, via museum websites, library websites, antique prints, blogs, etc., but also in the real world, in museums, art galleries and libraries.

Let me know if you find any more great charts by Fike: despite the plural “charts” in Fike’s title, the above chart is the only example I’ve found.

Posted in Data Visualization

Microtext Line Charts: Sample Code

I’ve presented Microtext Line Charts a number of times. There is a lot of interest and a lot of questions. The questions generally come in two flavors:

  1. How do you implement this?
  2. What happens if:
    • there are more data points in the line
    • the lines cross each other more frequently
    • the lines have sharp corners rather than interpolated bends
    • the text is a bit larger (or bit smaller)
    • the text is differentiated using caps; or italics; or weight; or etc.
    • the text has a halo, doesn’t use color, uses different sizes on different lines, etc.
    • the text animates with each successive update
    • data values are placed in the lines at a high point or a low point
    • the text is shifted so that it is less likely to overlap
    • the text is a narrative explanation instead of labels
    • etc.!

Short answer to both questions: here’s a link to an interactive example on CodePen. Try it out, copy it, make changes, run evaluations. It uses random fonts, colors and data. It has buttons to turn the underlying lines on and off and to change the text size. If you do use the technique, I would appreciate acknowledgement (e.g. refer to this post, or cite this research paper).

For those asking how the code works: essentially, D3 is a library that manipulates SVG, and SVG has built-in text-on-path functionality. D3 makes lines for line charts, and these lines can be used as paths. For SVG text, you can add text to a textPath element and then associate the textPath with the line. From the SVG reference:

In addition to text drawn in a straight line, SVG also includes the ability to place text along the shape of a ‘path’ element. To specify that a block of text is to be rendered along the shape of a ‘path’, include the given text within a ‘textPath’ element which includes an xlink:href attribute with an IRI reference to a ‘path’ element.
— W3 SVG Specification, Text On A Path
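To make the quoted mechanism concrete without a browser, here is a minimal sketch that writes the SVG markup directly as a string: a path with an id, plus a text element whose textPath points at that id via xlink:href. The function name, sizes and data are hypothetical, and the path is built from straight segments; in the CodePen example, D3 generates the line that serves as the path.

```javascript
// Minimal XML escaping for label text and attribute values.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// points: array of [x, y] pairs in SVG user units (hypothetical data).
function microtextLineSvg(points, label, id, width = 400, height = 200) {
  // Straight segments: "M x,y L x,y ..." (similar to what a linear D3 line produces).
  const d = points
    .map(function (p, i) { return (i === 0 ? "M" : "L") + p[0] + "," + p[1]; })
    .join(" ");
  return [
    '<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"' +
      ' width="' + width + '" height="' + height + '">',
    '  <path id="' + escapeXml(id) + '" d="' + d + '" fill="none" stroke="#ccc"/>',
    '  <text font-size="8">',
    '    <textPath xlink:href="#' + escapeXml(id) + '">' + escapeXml(label) + "</textPath>",
    "  </text>",
    "</svg>"
  ].join("\n");
}
```

In D3, the same structure comes from appending a path with an id, then appending a text element, then a textPath inside it whose href points at that id.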

For those asking all the other questions, click the link to the sample code. Each time you refresh the page, the random data will be different – more points, fewer points, more volatility, less volatility, different colors. And you can modify the code from there.


For example, white halos around the text *could* be added by changing the stroke outline of the text (bad idea! The stroke width will eat into the fill of the letterform, reducing text legibility in a representation where legibility is already challenged by overlapping text); or, better, the halo could be added by making a second copy of the text with a white fill and a fat white stroke, placed under the other text.
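A minimal sketch of the second-copy approach, assuming the labels already follow a path with a known id; the helper, its parameters and the default halo width are hypothetical. It only builds the markup, relying on the fact that SVG paints later elements on top, so the halo copy must come first.

```javascript
function escapeText(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Emit the same label twice along one path: a white, thickly stroked copy
// underneath, then the normal unstroked copy on top.
function haloTextOnPath(pathId, label, color, haloWidth = 3) {
  const text = escapeText(label);
  function copy(paint) {
    return '<text font-size="8" ' + paint + '><textPath xlink:href="#' + pathId + '">' +
      text + "</textPath></text>";
  }
  return [
    // Underneath: white fill plus a wide white stroke forms the halo.
    copy('fill="white" stroke="white" stroke-width="' + haloWidth + '" stroke-linejoin="round"'),
    // On top: the normal text, with no stroke eating into the letterforms.
    copy('fill="' + color + '"')
  ].join("\n");
}
```

In a D3 version, the equivalent is appending the halo text elements before the colored ones.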

Instead of one long string of microtext, individual pieces of microtext could be placed along the line, then nudged left or right (dx) to reduce collisions. Similarly, text labels for the high point or low point of a line could be moved to that point on each line by shifting their left/right position along the path.
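For the second idea, one way to find where a line’s high point sits along its path is to add up segment lengths until the highest vertex is reached, then use that distance (or its percentage of the total length) as the textPath startOffset, perhaps with text-anchor set to middle so the label centers on the point. A minimal sketch, assuming straight segments in screen coordinates, where a smaller y is higher on the chart; `offsetToHighPoint` is a hypothetical helper. For interpolated curves, straight-line distances will be slightly off; in the browser, SVG path elements also offer getTotalLength() and getPointAtLength() for measuring the actual curve.

```javascript
// Distance along a polyline to its visually highest vertex (smallest y),
// returned both as an absolute length and as a percentage of the total.
function offsetToHighPoint(points) {
  let total = 0;
  let best = { index: 0, at: 0 };
  for (let i = 1; i < points.length; i++) {
    const dx = points[i][0] - points[i - 1][0];
    const dy = points[i][1] - points[i - 1][1];
    total += Math.hypot(dx, dy);
    if (points[i][1] < points[best.index][1]) best = { index: i, at: total };
  }
  return { length: best.at, percent: total === 0 ? 0 : (100 * best.at) / total };
}

// Possible use (not run here): set the textPath "startOffset" attribute
// to offsetToHighPoint(points).percent + "%".
```

A low-point label works the same way, with the comparison flipped to look for the largest y.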

And so on.

Posted in Data Visualization, Line Chart, Microtext, Text Visualization

Successful PhD Defense!

I recently successfully defended my PhD. Yay! It took almost 3 hours, as there were many questions. There have since been many congratulations and questions from others. The most common question is:

How did you complete a part-time PhD in 5 years?

This is a really good question. I had previously completed a part-time master’s degree in the 1990s, which unfortunately took me 6 years to do. When doing any kind of independent research, it’s easy to fall into a hole where you get side-tracked on something unimportant, over-work some code more than necessary, design a poor experiment, complete a task without being aware of prior work, and so on. When I started the PhD, I specifically made a list of things to avoid or improve so that I wouldn’t fall into the same traps as before.

  1. Meet with your supervisor frequently. It’s easy to have scheduling conflicts, but in the days of Skype, web meetings, Slack, etc., it’s pretty easy to reschedule and do live meetings. My supervisor and I both agreed on meeting at least once a month, and we’d reschedule as needed so the meetings didn’t get missed. This is really important to avoid the above pitfalls.
  2. Lots of small tasks instead of really big tasks. Decomposing a big research project into small tasks is a good idea regardless of the circumstances. However, when part-time, this is really important. Small tasks can be chunked into a weekend or two.
  3. Know your limitations. You’re not on campus, you don’t have the same access to resources, and you don’t have the same access to big blocks of time. I would have liked to do an evaluation study, but it had more overhead (e.g. experiment design, ethics committee), I had less access to students, and it would have been a big task. Instead, I did a number of small surveys.
  4. Submit, submit, submit. Submit posters, talks, papers, journal articles and so on. The submission process means that you have to organize your ideas, perform some focused research and analyse results, all of which are good. Then, you get reviews. Sometimes these are disappointing rejections (I got a -3 on a 1-5 score range on one paper), but there are lots of good nuggets of useful information in each rejection.
  5. Workshops. Instead of really big conferences, workshops and side-conferences are a great venue to get feedback on work in progress. Workshop papers are smaller in scale, making it easier to do the work and write the paper than for a big conference. The workshops also provide a more collaborative environment to get feedback from your direct peers who are specifically interested in your topic, as opposed to the mega-conference where questions can be somewhat random. If you do a really good job on a workshop paper, you might get invited to submit to a journal too. Overall, I had 12 peer-reviewed publications during my PhD vs. 2 for my master’s (which took longer).
  6. Solicit cross-disciplinary feedback. Whatever you’re working on likely has applications across domains, or at least there are different groups of stakeholders. Directly approach those different stakeholders and get their input. They have different viewpoints. In my case, I reached out to typographers and cartographers a couple of years into my thesis, and both these groups helped identify significant gaps in my work. I might have been able to get away without their feedback, since my thesis reviewers were neither typographers nor cartographers, but incorporating their feedback made for a much stronger, much more defensible thesis.
  7. Background. Too many papers that I review seem to be missing relevant related research. Google Scholar has made searching through a lot of current peer-reviewed research relatively easy. But don’t stop there: there is likely older relevant research that can also be found: many of the world’s largest libraries and museums are online, old websites and old texts can be found on, and so on.
  8. Blog! I used the blog as a means of forcing myself to always write about something related to my research (ahem, I did not always achieve one post per month). It’s great to get feedback from the Internet at large and see what resonates. I thought my posting about Pokemon would have more reposts than it did, and my discussion regarding 500 years of separation got more reposts than expected.
  9. Time-outs. Unplanned events always occur and need to be accommodated. My wife’s stepfather passed away. My mom sold her house and downsized. I had pneumonia for a couple of months. You have to take some time out, but then you need mechanisms to get started again so that you don’t lose momentum. Always having a paper submitted somewhere means you’ll get a response. Having commitments such as supervisor meetings or blog posts to do gets you back on track.

In case it’s not obvious yet, rapid iteration with frequent feedback is at the core of almost all the above tasks. Essentially, it’s about putting in place mechanisms to keep you on track, guided and focused. It has worked fairly well for me so far; now I just need to do the “minor revisions” and stay focused on getting those done.

Posted in Data Visualization