Microtext Line Charts

Tangled Lines

Line charts are a staple of data visualization. They’ve existed at least since William Playfair and possibly earlier. Like many charts, they can be very powerful, but they also have limitations. One limitation is the number of lines that can be displayed. One line works well: you can see trend, volatility, highs, lows, reversals. Two lines provide an opportunity for comparison. Five lines might be getting crowded. Ten lines and you’re starting to run out of colors. But what if the task is to compare across a peer group of 30 or 40 items? Lines get jumbled, there aren’t enough discrete colors, and legends can’t clearly distinguish between them. Consider this example looking at unemployment across 37 countries from the OECD: which country had the lowest unemployment in 2010?


Tooltips are an obvious way to solve this, but tooltips have problems – they are much slower than just shifting visual attention. And tooltips don’t work on hardcopy, nor in PowerPoint, nor do they work during a presentation unless you’re the person holding the mouse.

The Limits of Small Multiples

There are other visual ways to solve this, for example, sparklines, small multiples (i.e. separating each line into its own chart), horizon charts and so on. Each of these techniques creates a lot of separate charts. For example, with small multiples, you can see trend, but it’s very difficult to compare magnitude when all the charts are at different scales. Here is a subset of 16 of the 37 countries – is Denmark higher or lower than Estonia at the end?


Of course, the small multiples could be made at a common scale, so that magnitudes can be compared. Here’s the same subset, all sharing the same vertical scale:


Now, it is easier to tell that Estonia is higher than Denmark at the end. But all of Austria’s performance is squished into a few vertical pixels, making it difficult to get a sense of any of the detail of Austria’s trend. And it’s still very difficult to answer a question such as which country has the lowest unemployment in 2010. In order to do this comparison, you have to make a note of a particular point on each chart and rely on short term memory to keep track of all the different values. The benefit of direct visual inference, which was possible when the lines were superimposed in a single chart, has been lost.

Superimposition vs. Juxtaposition

I think of this as the superimposition/juxtaposition tradeoff. With superimposition, everything is overlaid in one space at high resolution, enabling local comparisons between entities – but points can be occluded, lines tangled, and individual elements difficult to identify. Juxtaposition pulls everything apart into separate little charts – but you sacrifice a huge amount of resolution, as each data subset is forced into 1/(number of items) of the space, and you have to rely much more on short term visual memory. A lot of popular visualization approaches today go for juxtaposition, e.g. dashboards.

Bringing Text More Directly into the Chart

Following the theme of this blog, consider how text can be integrated more directly with the chart. There are many possibilities:

1. End Labels

You could remove the legend and place labels directly next to each line, using some collision detection to make sure labels are spaced apart. This is useful if you want to compare two lines at the end – e.g. does Spain or Greece have higher unemployment in 2014? But it is less useful the further away from the end you move into the chart, especially if there are a lot of crossovers making it difficult to visually trace the lines – e.g. how were Spain and Greece doing back in 2006?
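The collision detection for end labels can be very simple. Here is a minimal sketch in Python, not the code behind the chart: it assumes each label has a desired vertical position (say, the pixel position of the last point on its line) and that all labels have the same height. The function name `spread_labels`, its parameters and the sample positions are made up for illustration.

```python
def spread_labels(positions, min_gap):
    """Push overlapping labels apart so neighbours are at least min_gap apart.

    positions: desired label positions, one per line (hypothetical pixel values).
    Labels are visited from smallest to largest position and are only ever
    pushed toward larger values, which is the simplest possible strategy.
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    adjusted = list(positions)
    previous = None
    for i in order:
        if previous is not None and adjusted[i] - previous < min_gap:
            adjusted[i] = previous + min_gap
        previous = adjusted[i]
    return adjusted


# Illustrative positions only, not values from the unemployment data.
desired = [100, 104, 130, 131, 160]
print(spread_labels(desired, min_gap=12))  # [100, 112, 130, 142, 160]
```

A single pass like this keeps the label order the same as the line order, but it lets labels drift away from their lines in crowded areas; a real chart would probably want a leader line or a two-sided adjustment.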


2. River Labels

Why do the labels need to be outside the chart? Instead, consider moving the label directly into the chart. Labels can be directly associated with the path of a line, just like river labels on a map. A little bit of collision detection can be used again to reduce label overlap. Yes, Greece had slightly higher unemployment than Spain back in 2006.


3. Microtext Lines

Consider the line again. Why do we need both lines and labels? The entire line can be replaced with a continuous string of small-sized labels. You lose a little bit of accuracy, but each line is identifiable and can be re-found after passing through a congested area. E.g. What happened to Estonia’s unemployment rate before, during and after the crisis of 07-08?
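Replacing a line with a repeated label comes down to walking along the line and dropping one character at a time. The following is a rough sketch under simplifying assumptions, not the code used for the chart: the line is a list of (x, y) points, every character advances by the same fixed distance (as if the font were monospaced), and each character takes the angle of the segment it sits on. The name `text_along_polyline` and its parameters are hypothetical.

```python
import math


def text_along_polyline(points, label, advance):
    """Place a repeated label along a polyline, one character at a time.

    Returns a list of (character, x, y, angle_in_degrees) tuples.
    advance is the assumed fixed distance between consecutive characters.
    """
    text = label + " "          # a space separates the repetitions
    placed = []
    offset = 0.0                # distance into the current segment of the next character
    k = 0                       # index of the next character to place
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        if seg_len == 0:
            continue            # skip duplicate points
        angle = math.degrees(math.atan2(y1 - y0, x1 - x0))
        d = offset
        while d <= seg_len:
            t = d / seg_len
            placed.append((text[k % len(text)],
                           x0 + t * (x1 - x0),
                           y0 + t * (y1 - y0),
                           angle))
            k += 1
            d += advance
        offset = d - seg_len    # carry the remainder into the next segment
    return placed


glyphs = text_along_polyline([(0, 0), (10, 0), (10, 10)], "AB", advance=5)
for glyph in glyphs:
    print(glyph)
```

In a browser, SVG’s textPath element does this placement for you, including proper character widths, so a sketch like this is mainly useful for understanding what is going on or for drawing environments without text-on-path support.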


In effect the line has become a multifunctioning graphical element: it works as both a line and a label. There are quite a few things going on here, so let’s address them:

  1. Color: The different colors have been maintained: they help differentiate one line from another. If all the text were black, it would be very difficult to read where there is any overlap. Note how the yellow text (Australia) is harder to read than other text. Legibility depends on contrast, and light colors on a white background are difficult to read: Australia should have been a darker shade.
  2. Different font styles: Similarly, different fonts have been used to help differentiate lines. Each font has different widths, weights and spacing, which give it a rhythm and help distinguish it from other lines.
  3. Size: The text on the lines is smaller than the labels at the end. It’s still readable to my eyes, but it raises the question of how small the text can go and still be readable. I printed out this chart so that the font was 5 points (1.7mm) and gave it to a half-dozen 40-60 year olds: 5 out of 6 could read it.
    Microtext is super tiny text, sometimes printed on money as a security feature. It’s used to form lines or areas that can be revealed to be text on close inspection. I asked two teens to read microtext and they could read text down to 1.5 points (about 1/2mm). Here’s a full shot and closeup of an old Canada $2 bill – the even brown texture in the center behind CANADA is actually text as seen in the closeup on the right.
    Historically, you couldn’t use microtext in data visualization because monitors had very poor resolution. Monitors remained at 72-96 dpi for about 20 years – if a font became too small it was too pixelated to be readable. Guidelines in the late 1990’s recommended 12 point font with 8 or 9 point as a minimum.
    But displays with much higher pixel densities finally broke out into the market in the late 2000’s (thanks Steve Jobs), which means that much smaller fonts are technically feasible. Going back to printed maps, guidelines recommended 5 or 6 point font and allowed for minimums down to 3 or 4 point (Robinson et al Elements of Cartography, 1995, or Hodges The Guild Handbook of Scientific Illustration, 2003).
    Side note: talking about font sizes in points on computers is tricky now. A point used to be defined in the physical world as 1/72 of an inch (about 1/3mm). However, in CSS, point sizes are relative to the display device, and a font defined as “13 points” on a stylesheet actually renders physically at approximately 3.5 points on an iPhone.
    The likely answer is that on screen, end-users may need some control over font size, while for print, map conventions are probably reasonable (5-6 point recommended minimum).
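The arithmetic behind these sizes is easy to check. The sketch below uses the physical definition of a point given above (1/72 of an inch) to convert point sizes to millimetres, and to estimate how many pixels a glyph of a given point size gets at a given pixel density. The function names are my own, and the 300 dpi figure is just an assumed stand-in for a high-density display.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # physical (print) points, not CSS points


def points_to_mm(points):
    """Convert a physical point size to millimetres."""
    return points * MM_PER_INCH / POINTS_PER_INCH


def pixels_for_points(points, dpi):
    """Number of device pixels spanned by a physical point size at a given density."""
    return points * dpi / POINTS_PER_INCH


for size in (5, 1.5, 1):
    print(f"{size} pt = {points_to_mm(size):.2f} mm")

# 96 dpi is the older monitor density mentioned above; 300 dpi is an assumed example.
for dpi in (96, 300):
    print(f"5 pt at {dpi} dpi spans about {pixels_for_points(5, dpi):.1f} pixels")
```

At 96 dpi a 5 point font has fewer than 7 pixels to work with, which is why small text was unreadable on older monitors; at a few hundred dpi the same text gets roughly three times as many pixels in each direction.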

There are also some interesting questions about distinguishing lines in congested areas. With maps, there typically aren’t a lot of overlapping things at one location, so maps don’t have to deal with 20 different items all overlapping at once (e.g. typographic maps). As discussed earlier, one problem with superimposed lines is areas of congestion and losing track of lines. The variation in colors and font characteristics helps. Halos (white outlines around text) are commonly used on maps in some software. I tried halos (left) and no-halo (right):

The left image, with halos, is more clear for the words on top, e.g. Brazil, but words at the lowest level are completely obscured. However, in the right image, words at lower levels are still partially visible: the colors and forms can be seen through the gaps between letters at the higher level. Try to follow Finland in the left image and the right image.


Once a decision has been made to use text along a line, there are more opportunities. The text doesn’t necessarily need to be a short label repeated over and over. Phrases, sentences or multiple languages can be used. Here’s the same chart with the microtext in Japanese, Greek, French, Russian, German, Arabic and English (click for bigger version):


Once we start bringing text into the core of the chart, it opens up a lot of new possibilities. What do tweets, news headlines, poetry, table of contents and web pages look like when they become integrated within charts?


Posted in Alphanumeric Chart, Data Visualization, Line Chart, Microtext

Visualizing Emotions

Emotion analysis of text documents is an emerging area of interest. Closely related is the visualization of emotions. Emotion analysis is the next step after sentiment analysis. In many respects, sentiment analysis is easier – there’s a single dimension of sentiment ranging from positive to negative. Emotion is a bit more difficult because there are many emotions and the first challenge is to define the range of emotions for the particular analysis. Pixar’s Inside Out settled for five emotions:

Five personified emotions from Inside Out: Anger, Disgust, Joy, Fear and Sadness (c) Pixar.

But there are more. Surprise was considered but dropped from Inside Out early on. Plutchik argues for eight, adding trust and anticipation on top of Ekman’s six (anger, disgust, joy, fear, sadness and surprise). There are also alternative taxonomies of emotion (e.g. pleasure/pain, excited/calm, etc). And emotions are not mutually exclusive – which is part of the plot line of Inside Out as sadness becomes mixed with other emotions. Here’s Plutchik’s Wheel of Emotions – with 8 different emotions, various degrees of emotion and also items between emotions:

Plutchik’s Wheel Of Emotions.

There is also the challenge of generating an emotion lexicon, that is, a long list of words and their associated emotions. Saif Mohammad uses a crowd-sourced approach to tag more than 10,000 words with Plutchik’s eight emotions.

Given a text corpus and an emotion lexicon, scores can then be created for different texts or different characters in the text. For example, Scharl et al created radar plots to show emotions associated with characters from Game of Thrones, and Saif Mohammad profiles various texts, including love letters, hate mail, and Hamlet, using bar and line charts.

However, it’s useful to consider the words themselves and how they relate to emotions. One approach is to consider the emotional intensity associated with a word – for example, terror is a more intense version of anxiety in the emotion of fear, as shown in the Atlas of Emotions.

But suppose you want to understand how a given word is associated with multiple emotions? Words such as death, money or freedom have complex associations. Given a set of words and their associations to emotions, this becomes a set visualization problem. Venn diagrams (and Euler diagrams) are a common type of set visualization.

For emotion words, each word belongs to some combination of eight emotions. However, representing all the possible combinations of eight sets is difficult to do with a traditional Venn diagram (below is a beautiful Venn diagram with seven sets – drag to flip it over). These high-order Venn diagrams are difficult to understand visually: to tell what the set membership is at any given point, you have to trace around complex looping shapes. Even with a dataset about color, it isn’t obvious which of the colors at the perimeter make up a strange bluish-greenish-greyish shade:

7 way Venn: 128 different set combinations shown by colors by Santiago Ortiz.

Instead of a Venn layout, we can use a graph-based layout. In this case, each set category is a big anchor around the perimeter, and each emotion word is a node linked to the sets that it belongs to. Using a force-directed layout, each item ends up close to the sets that it belongs to, as discussed in the paper Anchored Maps. This approach can work well with a small number of items. When using a larger number of items, a few problems emerge, including: a) all the graph lines overlap and become difficult to visually detangle; and b) items with completely different memberships can end up in the same location (e.g. using anchors set in a square, an item can end up in the middle if its membership corresponds to all four points on the square, or to any two diagonally opposing points on the square).
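Problem (b) is easy to reproduce. As a simplified stand-in for the force-directed layout (an assumption on my part, not the method from the Anchored Maps paper), suppose every item is pulled equally toward each anchor it belongs to, so that it comes to rest at the average position of those anchors. The anchor names and the function `resting_position` are hypothetical.

```python
# Four hypothetical set anchors placed on the corners of a unit square.
ANCHORS = {
    "A": (0.0, 0.0),
    "B": (1.0, 0.0),
    "C": (1.0, 1.0),
    "D": (0.0, 1.0),
}


def resting_position(membership):
    """Average position of the anchors an item belongs to.

    This assumes equal-strength attraction to each anchor and ignores
    collision forces between items.
    """
    xs = [ANCHORS[name][0] for name in membership]
    ys = [ANCHORS[name][1] for name in membership]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# Three completely different memberships, one location.
print(resting_position(["A", "B", "C", "D"]))  # (0.5, 0.5)
print(resting_position(["A", "C"]))            # (0.5, 0.5)
print(resting_position(["B", "D"]))            # (0.5, 0.5)

# An item in two adjacent sets sits on the edge between them instead.
print(resting_position(["A", "B"]))            # (0.5, 0.0)
```

Position alone cannot separate the first three cases, which is why some additional cue is needed on the items themselves.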

So, we adapt the anchored map approach for visualizing emotion words. We start by setting the eight Plutchik emotions as anchors around the perimeter. Then we use a force-directed graph to locate all the words according to their emotions (plus collision detection, so that words don’t overlap each other). Next, we use color to indicate set membership using the same color scheme as Plutchik – words that are a combination of emotions have the average color of the corresponding emotions. Finally, we add font attributes to indicate set membership: a bouncy baseline for joy, w i d e l y  s p a c e d  letters for trust, underline for fear, exclamation mark (!) for surprise, light-weight letters for sadness, italic for disgust, blackletter for anger and SMALL CAPS for anticipation:

Top 250 words associated with one or more emotions.

In the above visualization, clusters of words are immediately visible. For example, around the anchor word joy are emotion words such as love, daughter, special, beautiful and so on. We can see that there are many words around the anchor word  t r u s t  but few around anger or disgust.

You can also see the graph lines underneath connecting words back to their target emotions, for example, half way between anger and disgust are lying and angry – words associated with both anger and disgust. You can tell that both words lying and angry are associated with both anger and disgust by three different cues: 1) the graph lines underneath; 2) the color is magenta – halfway between red (anger) and purple (disgust); and 3) the font is both blackletter (anger) and italic (disgust).

Words that belong to multiple sets are where things get interesting. Near the middle of this plot are words that are all variants of muddy reddish-brownish-greenish colors. Color isn’t particularly effective when trying to communicate eight different dimensions. However, font attributes are useful at two levels of understanding relationships between words and the emotions they are associated with:

1) The variation in font attributes makes it very obvious when two words have the same set membership – they have the same format. If the formats are different then the words have different memberships. For example, ANXIOUS and ESCAPE have the same membership, while S W E E T ! and D E A L ! have the same membership but different from ANXIOUS and ESCAPE.

2) Furthermore, with some cognitive effort, you can decode the membership of any word. ANXIOUS and ESCAPE belong to anticipation (small caps) and fear (underline). S W E E T ! and D E A L ! belong to surprise (exclamation mark), anticipation (small caps), joy (bouncing baseline) and trust (wide spacing).
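Both levels can be written down as a small lookup. The sketch below is not the code behind the visualization: it records the emotion-to-font-cue mapping described above, shows that two words share a format exactly when they share a membership, and averages colors for combined emotions. The function names are hypothetical, and the RGB values for red and purple are assumed for illustration only.

```python
# Font cue used for each of Plutchik's eight emotions, as described in the post.
FONT_CUE = {
    "joy": "bouncy baseline",
    "trust": "wide letter spacing",
    "fear": "underline",
    "surprise": "exclamation mark",
    "sadness": "light weight",
    "disgust": "italic",
    "anger": "blackletter",
    "anticipation": "small caps",
}

# Assumed RGB values; the post names the hues (red, purple) but not exact colors.
COLOR = {"anger": (255, 0, 0), "disgust": (128, 0, 128)}


def cues_for(emotions):
    """Level 1: the set of font cues that makes up a word's format."""
    return {FONT_CUE[e] for e in emotions}


def decode(cues):
    """Level 2: recover the emotions from the font cues seen on a word."""
    reverse = {cue: emotion for emotion, cue in FONT_CUE.items()}
    return {reverse[c] for c in cues}


def average_color(emotions):
    """Average the RGB channels of the given emotions (integer division)."""
    emotions = list(emotions)
    channels = zip(*(COLOR[e] for e in emotions))
    return tuple(sum(c) // len(emotions) for c in channels)


anxious = cues_for(["anticipation", "fear"])
escape = cues_for(["fear", "anticipation"])
sweet = cues_for(["surprise", "anticipation", "joy", "trust"])

print(anxious == escape)   # True: same format, same membership
print(anxious == sweet)    # False: different format, different membership
print(sorted(decode(sweet)))
print(average_color(["anger", "disgust"]))
```

Because each emotion has its own independent cue, the cues combine without interfering with one another, which is exactly what averaging eight colors fails to do.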

While the top 250 words are nice for a readable graph at blog size, Saif Mohammad’s original analysis has 10,000+ words, of which 4,463 have at least one emotion associated with them. Below is an image of the same graph, with all 4,463 emotion words. Click for full size image to zoom in. Clusters are still visible, font differentiation is still identifiable, and individual words can be visually decoded if needed.

4463 words associated with 8 different emotions. Click for larger version.

There’s more discussion about set visualization, labels and font attributes in this recent paper: Typographic Sets: Labeled Set Elements with Font Attributes. The emotion word visualization in this paper uses color to represent membership in the additional sets of positive sentiment and negative sentiment – pushing the number of sets uniquely indicated up to 10 sets. This means that each word in the visualization indicates 11 different data attributes: the literal word itself, two sentiments, and eight emotions.

Posted in Data Visualization, Font Visualization, Graph Visualization, Set Visualization, Text Visualization, Venn Diagram

The Design Space of Typographic Data Visualization

There are many possible new visualizations using typography, some of which I’ve previously discussed in posts on this blog. One way to consider this design space is to decompose it into the different elements that can be used to assemble visualizations. These elements include:

  1. Typographic attributes. This is all the variation within type that can be used to create differentiation and encode information. This includes the literal alphanumeric glyphs as well as font weight, italic, case, typeface (e.g. Helvetica, Times), underline, width, baseline shifts (e.g. superscript), delimiters, x-height, and so on. Of course, other visual attributes such as color, size and outline can also be used.
  2. Data Encoding. Type can encode different kinds of data. Labels on maps use type to indicate different types of data as shown in the example below. The names of areas in this map use type to indicate: a) literal data, such as the name of the town or region; b) categoric data, such as whether the area is a country, province or city; and c) quantities, such as the population.


    Stieler’s Atlas (1920). Labels indicate place (literal text), categorize the type of place (typeface), indicate the level of political administration (underlines ordered dash, single, double), and population size (ordering of case, italics and size). From davidrumsey.com

  3. Scope. Type attributes may extend across a sequence of letters. The scope of the type attributes may be whole words (as on the map); a subset of letters within a word (for example, to indicate silent letters in words such as though and answer); multiple words (e.g. “There goes the HMS Titanic.”); or even lines, paragraphs or portions of a document.

So What?

This creates a multi-dimensional space for design exploration: attribute x data-type x scope, which we can then use to consider some interesting new kinds of visualizations. For example, we could apply literal text to a line in a line chart (alphanumeric text x literal data x sentence). Why bother using a tooltip or creating a visually separate legend, when the content can be directly embedded in the line?

Line chart showing retweets over time for some top tweets about Trump from late Aug 2015.

Or, we can vary a type attribute, such as weight to indicate word frequency. For example, the chart below indicates how frequently adjectives are associated with characters from Grimms’ Fairy Tales (i.e. font weight x quantitative data x word).

Font weight indicates the frequency of adjectives associated with characters from Grimms Fairy Tales. Kings are old, princesses are beautiful and girls are little.
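One simple way to do this kind of encoding is to map counts onto the numeric font-weight scale used in CSS (100 for the lightest, 900 for the heaviest, in steps of 100). The sketch below assumes a linear mapping, which may not be what was used for the chart; the function name `weight_for` and the sample counts are made up.

```python
def weight_for(count, max_count, lightest=100, heaviest=900):
    """Map a frequency onto a CSS-style numeric font weight in steps of 100.

    Assumes a linear scale from 0 (lightest) to max_count (heaviest).
    """
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    fraction = min(max(count / max_count, 0.0), 1.0)
    steps = (heaviest - lightest) // 100
    return lightest + 100 * round(fraction * steps)


# Hypothetical adjective counts for one character, not real data from the tales.
counts = {"old": 40, "good": 20, "rich": 10, "wise": 0}
for word, n in counts.items():
    print(word, weight_for(n, max_count=40))
```

In practice many font families ship only a handful of weights, so the number of distinguishable levels is usually smaller than nine and the scale may need to be binned accordingly.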

How I arrived at this framework, along with many other examples of both historic and new types of visualizations, is discussed in more detail in the journal article Using Typography to Expand the Design Space of Data Visualization (html version, PDF version), which was just published in the open-access journal She Ji: The Journal of Design, Economics and Innovation.

Posted in Data Visualization, Design Space, Font Visualization, Text Visualization

500+ years of increasing separation of text from visualization

In the beginning, typography and infovis were tightly integrated. In this illuminated manuscript from circa 1480, the biblical text and the genealogical tree are interwoven. Text in the visualization is not just simple labels. While bold, italics and sans serif don’t exist yet to create differentiation, here the text varies in color. Some node labels are plain black, some are red, some start with a red initial (and node size, outline color, outline shape all vary too). Similarly, the explanatory text is woven around the graph: it’s not separate from the visualization. These same kinds of relationships between text, typographic attributes and visualization can be seen in other medieval visualizations and tables (e.g. see more examples at Bodleian).

Genealogical tree from late 1400’s. Note graph nodes use of image (people, shield) or text, where text may be black, red or start with a red initial. The nodes can vary in size, color, or shape (circle, crescent, shield). Textual commentary is intertwined throughout. Via Bodleian library.

Step forward a century to the proliferation of the printing press and movable type. With movable type it is easier to set an entire page of type, but more difficult to set type and images together. It’s hard to get the image (an engraving such as a woodcut) to work together with movable type. It’s difficult to configure a page to get the components to lock together, difficult to get the ink to spread evenly, difficult to set everything to the same height. It’s hard to use color – that’s two separate pressings or laborious masking of different areas for ink to be spread. So images start to move into separate blocks or separate pages. Words within images now need to be carved, in reverse. It becomes simpler to create an image without text (or very little) and make reference to the image from the text or with captions.

1573: Image separate from text. from William Bullein’s A Dialogue… Against the Fever Pestilence. Author photo from Bodleian exhibition “Shakespeare’s Dead“.

By the time of the Enlightenment, images have become beautifully engraved plates executed by skilled engravers. Diderot’s famous Encyclopedia (1751-1777) has beautiful images and a wealth of text – completely separated. Plate numbers and key letters on the images provide the sole reference to relate the detailed text pages to the plates, which are in separate sections in the bound volume.

Diderot’s Encyclopedia has great illustrations of various occupations – all neatly labeled, but the viewer has to cross-reference the text to understand.


Bring that forward another century and you see statistical graphics. Like the earlier Enlightenment illustrations, text is separate from charts. Within charts, text is minimal – pushed to the edges (title, axis labels) and maybe the occasional label on a line internally, carefully placed to avoid colliding with a grid line.

1930 book explaining charts. Text is pushed to the periphery of the chart. (T.G. Rose, Business Charts, 1930)

Information visualization utilizes many of the techniques from charting and statistical graphics. In general, most of the text is pushed to the edge in information visualizations. Yes, there are many news infographics where text is integrated into the visualization. And there are text visualizations where the entire visualization is made of text (e.g. tag clouds) or perhaps labels on markers (e.g. graphs). But there are still gaps. From the infographics perspective, text is typically hand-crafted annotations carefully placed around the visualization. From the information visualization perspective, the text is limited, i.e. usually labels. Detailed text might be accessible via a tooltip, but tooltips are slow and if you don’t focus on the particular item then the tooltip content is not available. Detailed text might be visible in another linked panel (think Google finance charts), but this requires cross-referencing back and forth between two different visuals. This cross-referencing is a point of slowness (e.g. see Larkin and Simon’s Why a picture is sometimes worth 10,000 words). In a few instances, a full sentence might make it into an information visualization (e.g. newsmap.jp), but even these have various issues (e.g. newsmap has many headlines too small to read).

Should the medieval visualization be dismissed as an early visualization created with limited tools, or should it be considered an exemplar of how visualization, text, imagery and typographic attributes can all be used together to communicate complex data clearly? Furthermore, the medieval scribe achieved this using only a pen, while we have incredible computing resources. If the medieval example is considered a goal, then the question is:
How can we move towards automated information visualization with rich textual information directly integrated into visualizations?

(This post was inspired by discussions at TDi2016 Reading University and exhibits at Bodleian Library).

Posted in Data Visualization, Text Visualization

Venn Diagrams enhanced with Typographic Attributes

Here is an example visualization illustrating a potential use case where typographic attributes add functionality and usefulness beyond a familiar representation such as a Venn diagram. In the case of set visualization, typography has many different attributes (e.g. weight, italic, case, font family, underlines, capitalization, and so on) that can be combined together to indicate membership in multiple different sets. Furthermore, attributes such as font-family, can be used to indicate membership across different categories within a set, not just binary membership.

U.S. House of Representatives 4-Way Venn Diagram

Below is a 4-way Venn diagram that includes the name of every member of the U.S. House of Representatives. At a high-level there are 4 bubbles indicating 4 different sets: gender (blue/pink); party affiliation (red stripes for Republicans); race (freckle dots for white) and multiple terms (light green):

4-way Venn diagram of U.S. House of Representatives with members indicated in stacks of text.

By using stacks of names, you don’t even need to read the names to make high-level macro-comparisons. It’s easy to visually compare the relative heights of stacks to gauge the approximate number of representatives in each set and set intersection. So, this Venn diagram indicates some discrepancies between the proportions of the elected representatives and the general population. For example, women (in pink) seem to be under-represented, assuming that women make up approximately 50% of the population. There is also a bias in ethnicity. Democrats (top half) have a larger number of ethnic minorities, particularly in the far right column (serving more than a single term), whereas Republicans have few ethnic minorities.

Micro-details. Close-up you can read the name of each member. Unlike a simple Venn diagram, all the individual elements are visible (congress members), and named. Read names directly without relying on tooltips. Search for a specific name (browser search e.g. ctrl+f in Chrome on Windows). Click any name for details. I.e. there are lots of benefits to making names available.

Plus, with all the detailed names, we can also use font attributes to reinforce the Venn memberships, plus add additional data. So, for example, Republicans are right leaning italics and Democrats are left leaning italics. Here’s a closeup:

Closeup of names on Venn diagram. Left leaning indicates Democrats, right leaning indicates Republican. Plain text indicates white ethnicity, etc.

Gender is pink/blue; party affiliation is left leaning for Democrats and right leaning for Republicans. Members serving multiple terms are bold. Those with white ethnic background are in a plain sans-serif font. But there’s more…

Beyond 4 sets. Venn diagrams do have limitations: it is difficult to show more than 4 different sets with them. While feasible, it may be difficult to distinguish set membership by following complex outlines. Instead, additional font attributes can be used. For example, those over 65 years of age are in all caps, and those with higher education are indicated with underlines. Note that there isn’t a separate Venn bubble for these attributes – these represent memberships in 5th and 6th sets.

Beyond binary membership. Venn diagrams also do not distinguish between different categories within a set. For example, for ethnicity, different ethnic backgrounds are differentiated in the data, but a Venn only indicates one category (e.g. white or not-white). In this example, white politicians are in a plain sans serif font, while non-whites have more diversity: a script font for Latino, a serif font for Asian Americans and a block font for African Americans. An additional level of information is revealed.
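Taken together, the encodings described in this post amount to a mapping from a member’s attributes to a text style. Here is a minimal Python sketch of that mapping. It is not the code from the demo: the `Member` record, its field names and the style keys are all hypothetical, and it assumes only the two parties and four ethnicity categories mentioned above.

```python
from dataclasses import dataclass

# Categorical attributes: one value per category, not just in/out of a set.
TYPEFACE = {
    "white": "plain sans-serif",
    "latino": "script",
    "asian american": "serif",
    "african american": "block",
}
SLANT = {"Democrat": "left", "Republican": "right"}
COLOR = {"female": "pink", "male": "blue"}


@dataclass
class Member:
    """Hypothetical record for one representative."""
    name: str
    party: str
    gender: str
    ethnicity: str
    multiple_terms: bool
    over_65: bool
    higher_education: bool


def style_for(member):
    """Translate set memberships into the font attributes described in the post."""
    return {
        "text": member.name.upper() if member.over_65 else member.name,
        "color": COLOR[member.gender],
        "slant": SLANT[member.party],
        "weight": "bold" if member.multiple_terms else "regular",
        "underline": member.higher_education,
        "typeface": TYPEFACE[member.ethnicity],
    }


# A made-up member, not a real representative.
example = Member("Example Member", "Democrat", "female", "latino",
                 multiple_terms=True, over_65=True, higher_education=False)
print(style_for(example))
```

Each attribute uses its own typographic channel, so adding a 5th or 6th set means adding a key to the style rather than drawing another overlapping outline.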

Demo! Here’s the URL for the interactive version: http://codepen.io/Rbrath/full/QEGBOo/

The demo version lets you toggle on/off different ways of showing set membership. Do you notice the difference in font? Toggle any text feature button and notice how the labels adjust appropriately.

About the code and demo. This is the first post with a functioning demo. It is not meant to be an example of good programming – the code is prototype-grade, which means just enough coding to get it running and not bothering to go back and clean it up. It should ideally be more flexible with data, for example, allowing the user to pick and choose which attributes to use for any Venn set. A nice future feature would be a more generalized Venn diagram, capable of using circles or ellipses rather than rounded rectangles. This would require some more intensive computational geometry, which is an exercise left to the reader – I haven’t looked into it, but Jonathan Feinberg’s approach used in Wordle may be a good starting point.

Data is from Measure of America and Wikipedia. Feel free to reuse any concepts. Please cite: Richard Brath. Typographic Sets: Labelled Set Elements with Font Attributes. In SetVR 2016 International Workshop on Set Visualisation and Reasoning (2016).

Posted in Data Visualization, Font Visualization, Set Visualization, Venn Diagram

Noticing a Difference vs. Decoding

I’ve had a number of papers rejected where I’ve varied multiple (font) attributes within a single visualization – jump ahead to fig. 6 for an example. There are members in the visualization community and the typography community who have reservations about varying too many things at once. However, there can be some cases where multiple variations are actually useful to tasks such as noticing similarities and differences between elements, as will be explored in this post.

Noticing a Difference

There have been many psychology experiments looking at preattentive perception. Healey has a great summary. When presented with some kind of visual search task, the requirement is to determine (quickly) whether or not a particular target exists. This is a bit like looking for the thing that is different among all the other things. There are a lot of nuances in this research: for example, some types of visual cues can be perceived more quickly than others. Some types of cues are not symmetric, e.g. finding a Q in a field of O’s is faster than finding an O in a field of Q’s:

Fig. 1. A Q in a field of O’s is easier to find than an O in a field of Q’s.

The same kind of nuances apply to font attributes such as bold or italic as well.  Here’s an interesting example using font weight.  In fig. 2, finding the different name in the first and second examples is easy because the difference in weight between plain and bold in the font family Segoe UI is significant. However in the third column, weights used are plain and light variants in the font family Segoe UI, which don’t have as much differentiation. It’s reasonable to assume that finding the light text using the font Segoe will be slower than finding the bold text.

Fig. 2. Find bold in plain or vice versa is fast; but finding lightweight in plain will be harder.

In addition to the font attribute, the choice of font family may affect the degree of noticeability. Fig. 3 shows italics. Typographers already consider italics a quieter form of emphasis than bold, and Strobelt’s research seems to confirm this. On the left, italics are used in the sans-serif font Segoe UI, wherein the italicized form is very similar in shape to the original font with an oblique skew, i.e. a geometric transformation of the letters* (see footnote). At the far right is an example using Garamond, which, like most serif fonts, supports true italics, wherein the shapes of the italic letters differ from their upright counterparts. Presumably italics set in Garamond pop out more quickly than italics set in a sans-serif. This is purely conjecture and not actually proven. It would be very interesting to test whether the results confirm the hypothesis.

Fig. 3. Presumably, noticing the italics in the sans-serif font Segoe (left) is not as fast and effective as noticing the italics in the serif font Garamond (right).

This can be explored with each font attribute. Fig. 4 shows some examples using font family and case. Noticing blackletter in the middle of sans serif (left top row) is a lot easier than trying to find the serif in the middle of sans serif (right top row).

Fig. 4. Spot the difference in font family (top row) and case (lower row).

Interference among multiple attributes

Search tasks become more complicated when multiple attributes vary across the elements. Some visual attributes can interfere with the ability to quickly detect the target attribute, and searching for a combination of attributes is difficult. The classic example is the interference between shape and color. Fig. 5 asks you to find a specific club in a field of spades. Mixing multiple attributes can make it harder to locate the target.

Fig. 5. Finding a combination of attributes can be difficult.

Difficulty decoding multiple attributes

Another challenge with multiple attributes is the ability to decode. If only a single attribute varies, as in the examples in figs. 1-4, remembering what the attribute means is easy. However, when many different data attributes vary, it can be more difficult to remember, given the limitations of short-term memory. E.g. fig. 5 (right) could be described as a collection of 35 people, with red indicating Republicans, spades indicating men, and underlines indicating high wealth. Seeing a specific marker, e.g. a red club with an underline, requires one to recall each mapping to decode it. With each additional data attribute the cognitive load increases, the time to decode increases and the chance of error increases.

Spotting Differences

However, the task may not be searching and locating targets, nor deciphering the encoding for a particular glyph. Sometimes the task may simply be to assess whether an item is the same or different compared to its neighbors. The alphanumeric map in figure 6 of UK postcodes varies italics, font weight and case.

Fig. 6. UK post code areas indicating data via font weight, italic, and case.

Even without knowing the encodings, we can ask simple questions such as whether a particular location is similar to its neighbors. For example, NG is similar to S and LE (near center top) or CA is similar to LA, DL, TD and FY (near top left).

We may also get a sense of the degree of difference between items. Near center bottom we can see wc. Immediately above is nw, in a slightly heavier font, while to the right is EC, in an upper case font. Above left from wc is HA, varying in both weight and case, indicating more difference than the previous two comparisons. This notion of degree of difference is also completely untested. Some differences may not even be noticed (see research on change blindness).
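
One way to make the "degree of difference" idea concrete is to count how many font attributes differ between two labels. The sketch below is only an illustration: the `Glyph` class, the `difference` function and the weight values are hypothetical stand-ins chosen to mirror the wc / nw / EC / HA comparison, not the encoding actually used in fig. 6, and the count says nothing about whether viewers perceive the differences this way.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Glyph:
    """A map label plus the font attributes used to encode data (hypothetical)."""
    label: str
    weight: int       # assumed numeric font weight, e.g. 400 = regular
    italic: bool
    uppercase: bool


def difference(a: Glyph, b: Glyph) -> int:
    """Count the font attributes (ignoring the label text) that differ."""
    return sum(
        getattr(a, f.name) != getattr(b, f.name)
        for f in fields(a)
        if f.name != "label"
    )


# Hypothetical attribute values mirroring the comparison in the text.
wc = Glyph("wc", weight=400, italic=False, uppercase=False)
nw = Glyph("nw", weight=500, italic=False, uppercase=False)
ec = Glyph("EC", weight=400, italic=False, uppercase=True)
ha = Glyph("HA", weight=600, italic=False, uppercase=True)

for other in (nw, ec, ha):
    print(wc.label, "vs", other.label, "->", difference(wc, other))
```

A simple count treats every attribute as equally salient, which the earlier examples (bold vs. light, true italics vs. oblique) suggest is not the case, so any real measure would likely need per-attribute weights.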

A Difference is Insightful Information

Depending on the task, seeing differences may adequately solve requirements, as illustrated in the previous example. There may be various applications where noticing differences is a relevant task, for example, understanding differences between items in a scatterplot, glyphs on a map, nodes on a graph, or possibly infographics (think Isotype, where pictographs may be composed of a number of elements).

Sometimes there may be additional tasks, such as a task that requires accurate decoding. This can be facilitated in many ways, such as providing a legend or providing interactions such as tooltips.

One could make a similar argument for using interactivity to reveal similarities while keeping all the visual attributes uniform. For example, pointing at NG in figure 6 could be used to highlight the similar S, LE, B and CF – no need to adjust italic, weight and case. While this is feasible, the viewer loses the ability to see patterns serendipitously. By making the differences visible, one can see patterns much more readily than by relying on slow mouse movements across all the possible combinations and permutations. Of course, both techniques (visual encoding and interactive highlighting) could be used together to improve the overall effectiveness.

*Note 1: Segoe UI, unlike many sans-serif fonts, does have italics which vary letter form. Compare Segoe UI lowercase a and l in their plain and italic forms to see the difference. In this example, however, the variation between Segoe plain/italic is not as pronounced as the variation between Garamond plain/italic.

Posted in Alphanumeric Chart, Data Visualization, Font Visualization, Search, Text Visualization, Visual Attributes

Alphanumeric Financial Charts

Financial charting has long used alphanumerics as point indicators in charts. One of the oldest I can find is Hoyle’s Figure Chart (from The Game in Wall Street and How to Play it Successfully: 1898) which essentially plots individual security prices in a matrix organized by time (horizontally) and price (vertically).

An early figure chart (from Hoyle: 1898). Time is implied horizontally, price vertically. A numeric “figure” is recorded for each price that occurs for each day.

This textual representation evolved over the decades. By 1910, Wyckoff (Studies in Tape Reading: 1910) was creating charts where x and y are still time and price, but he was writing down volumes instead of prices, and connecting together subsequent observations with a line.

Wyckoff’s figure chart records rising and falling prices in adjacent columns. For each price level he records the volume figures and connects together the sequence with a line.

By the 1930’s these had evolved into early point and figure charts, such as can be seen in DeVilliers and Taylor (Devilliers and Taylor on Point and Figure Charting: 1933).  Columns use X’s to plot prices and other characters to denote particular price thresholds.

DeVilliers and Taylor’s Point and Figure chart (1933).

These charts look pretty close to modern financial point and figure charts. Now we typically use X’s for a column of rising prices and O’s for a column of falling prices, and other characters may be used to denote particular time thresholds (e.g. 1-9, A-C to indicate the start of each month).

Modern Point and Figure chart, via Wikipedia.
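
The X and O column logic can be sketched in a few lines. This is a simplified illustration, not a reference implementation: the box size, the three-box reversal rule and the function names `point_and_figure` and `render` are assumptions added here (the post itself only describes X’s for rising columns and O’s for falling columns), and month markers are left out.

```python
def point_and_figure(prices, box=1.0, reversal=3):
    """Build point and figure columns as [symbol, lowest_box, highest_box].

    Assumptions: prices are bucketed into boxes of size `box`, and a new
    column starts only after a move of `reversal` boxes against the trend.
    """
    levels = [int(p // box) for p in prices]
    columns = []
    anchor = levels[0]
    for level in levels[1:]:
        if not columns:
            # Wait for the first move of at least one box to pick a direction.
            if level > anchor:
                columns.append(["X", anchor + 1, level])
            elif level < anchor:
                columns.append(["O", level, anchor - 1])
            continue
        symbol, lo, hi = columns[-1]
        if symbol == "X":
            if level > hi:
                columns[-1][2] = level                   # extend rising column
            elif hi - level >= reversal:
                columns.append(["O", level, hi - 1])     # reverse downward
        else:
            if level < lo:
                columns[-1][1] = level                   # extend falling column
            elif level - lo >= reversal:
                columns.append(["X", lo + 1, level])     # reverse upward
    return columns


def render(columns):
    """Draw the columns as text, highest price level on the first row."""
    top = max(hi for _, _, hi in columns)
    bottom = min(lo for _, lo, _ in columns)
    rows = []
    for level in range(top, bottom - 1, -1):
        rows.append("".join(s if lo <= level <= hi else " " for s, lo, hi in columns))
    return "\n".join(rows)


cols = point_and_figure([10, 11, 12, 13, 12, 10, 9, 12, 14])
print(render(cols))
```

Note that time disappears from the horizontal axis: a new column appears only when the price reverses, which is why extra characters are needed to mark the months.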

Other alphanumeric charts evolved along the way as well. Here’s an interesting Depression-era chart plotting a histogram of states based on state unemployment rates. Like Wyckoff, the author seems to be interested in keeping the alphanumerics inside circles. Also, note that standardized 2-letter codes for states did not yet exist – states are numbered instead. (From Willard C. Brinton’s book Graphic Presentation: 1939.)

Distribution Chart made of stacked characters. Note additional information encoded in shading and added markers.

Fast forward to the 1980s, and we have Peter Steidlmayer’s Market Profile (R) charts, which are reminiscent of the alphanumeric distributions seen in the Depression-era chart. In these distributions, the alphanumeric values represent times when a security traded at a specific price. Depending on the timeframe of the chart, different mappings may be used. One common intraday convention is to use characters A-X and a-x to represent half-hour intervals throughout the day, with a split from uppercase to lowercase at noon.

Very basic Market Profile chart
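
The half-hour lettering convention described above can be sketched as follows. This is a rough illustration under stated assumptions: the functions `tpo_letter` and `market_profile`, the `tick` parameter and the sample trades are all made up for this example, and the mapping assumes the 48 half-hour slots are counted from midnight (A-X before noon, a-x from noon onward). Vendors differ in their exact conventions.

```python
from collections import defaultdict


def tpo_letter(hour, minute):
    """Map a time of day to a letter: A-X before noon, a-x from noon onward.

    Assumption: half-hour slots are counted from midnight (slot 0 = 00:00).
    """
    slot = hour * 2 + minute // 30          # 0..47
    if slot < 24:
        return chr(ord("A") + slot)
    return chr(ord("a") + slot - 24)


def market_profile(trades, tick=1.0):
    """Group letters by price level, highest price first.

    `trades` is an iterable of (hour, minute, price); `tick` is an assumed
    price bucket size.
    """
    rows = defaultdict(set)
    for hour, minute, price in trades:
        level = round(price / tick) * tick
        rows[level].add(tpo_letter(hour, minute))
    # Uppercase sorts before lowercase, so sorted() keeps letters chronological.
    return {
        level: "".join(sorted(letters))
        for level, letters in sorted(rows.items(), reverse=True)
    }


# Made-up sample trades, for illustration only.
sample = [(9, 35, 101.0), (9, 50, 100.0), (10, 5, 100.0), (12, 10, 100.0), (12, 40, 101.0)]
for level, letters in market_profile(sample).items():
    print(f"{level:7.1f} {letters}")
```

Each row of letters is one bar of the sideways histogram: the longer the row, the more half-hour periods in which the security traded at that price.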

There are many, many variants of market profile charts now e.g. sierrachart.com, windotrader.com, bluewatertradingsolutions.com, prorealtime.com, cqg.com, etc, etc. Given the many possible data attributes and analytics that one might associate with a character in a chart, it can become a challenge to encode them. As a result, one can find interesting variants. Beyond position, letters and case:

  • color: of the foreground letter or background square
  • bold: to indicate a row or potentially as a highlight to one time interval, e.g. MarketDelta
  • superscripts: e.g. eSignal.
  • added symbols: asterisks, less than, greater than, etc.
  • added shapes: circles and diamonds

Many variants of Market Profile (R) charts by various vendors. Note all the additional information added via foreground/background color, bold, superscript, etc.

Jesse Livermore (How to Trade in Stocks: 1940) created his own variant of alphanumeric charts, stripped down to tracking only the minimums and maximums, discarding the intervening levels and using color and underlines to indicate information.

Livermore strips down charts to a simple table recording only the local minimums and maximums, using different colored text and different colored underlines.
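
The idea of keeping only the turning points and discarding the intervening levels can be sketched as below. This is a simple local-extremes filter written for illustration, not Livermore’s actual recording rules; the function name `local_extremes` and the sample prices are hypothetical.

```python
def local_extremes(prices):
    """Return (index, price, kind) for each strict local minimum or maximum.

    Simplification: a point counts only if it is strictly above or strictly
    below both neighbours, so flat plateaus and the two end points are skipped.
    """
    extremes = []
    for i in range(1, len(prices) - 1):
        prev, cur, nxt = prices[i - 1], prices[i], prices[i + 1]
        if cur > prev and cur > nxt:
            extremes.append((i, cur, "max"))
        elif cur < prev and cur < nxt:
            extremes.append((i, cur, "min"))
    return extremes


# Made-up price series, for illustration only.
for index, price, kind in local_extremes([10, 12, 11, 9, 13, 12]):
    print(index, price, kind)
```

The resulting table is much shorter than the original series, which is what leaves room for color and underlines to carry additional information.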

One interesting discussion point is the actual use of these charts. Whenever I show these charts to the visualization research community, people are aghast and suspicious. There’s so much going on in these charts, so many different things being shown simultaneously, that they don’t believe people actually use them, or they assume that these charts can’t be perceptually efficient.

On the other hand, I’ve talked to people who’ve traded off these charts their entire career. They see patterns and pick out things immediately at very different scales: individual outliers, columns of a particular letter, the shape of a distribution, and so on. Much like an expert chess player, these market participants have learned these charts, know how to interpret them, and use them to make trading decisions.

To be fair, not everyone in the visualization community is shocked: some are genuinely curious. Instead of reducing visualizations down to just one or two attributes, here’s something heavily loaded with a lot of visual attributes. And it’s not a static poster where you have no interaction: these are on computer screens packed with interactive features. In spite of all the computational ability to filter and reduce, here’s a community that has these densely packed charts. People are actually using them to see macro patterns (shapes of distributions) and micro readings (individual characters), but they are also able to attend to intermediate patterns such as particular letters within a distribution. Perhaps they aren’t seeing patterns as fast as preattentive recognition, but they are still seeing patterns quickly with this external cognitive aid. There’s still more that the visualization community needs to understand about expert users.

Posted in Alphanumeric Chart, Data Visualization, Font Visualization