Shapes or Alphabetic Point Marks?

In some visualizations, such as scatterplots, a visualization designer might use different shapes to encode categoric data. Abstract shapes such as circles and squares can be used, but in practice, many visualization systems have a limited number of shapes (e.g. 9 in Excel, 10 in Tableau, 7 in D3.js). What if you need more?

Pictographic icons can be used, but are difficult to design for abstract concepts (e.g. GDP, CPI, or a list of cities); are not intrinsically orderable; and may be ambiguous (e.g. see Clarus the dog-cow, an early Mac icon). More importantly, pictographs can be problematic because the difference between two pictographs might be subtle and require close inspection.

Wouldn’t it be nice to have a ready-made set of 25 or so simple but very different shapes available to use?

Many categoric shapes, same aspect ratio, same area

What are the design criteria for these shapes?

  • Same area. You don’t want some to be big, and some small: If there are two clusters, each with 10 items but different shapes, you want the total ink to be the same.
  • Square aspect ratio. You don’t want some shapes to be really long, some to be really tall. You still want to be able to quickly scan and find a minimum or a maximum without being fooled by shapes that are stretched out.
  • Different. You want these shapes to be different, because they’re encoding categoric data. Each category is different. So, how do you get a bunch of shapes that are maximally different?

The last criterion is hard to solve for. It asks “What is shape?” The answer is longer than a blog post. But you want variation in tangible shape-like attributes such as curvature, angle, convexity, orientation, corners and so on.
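The first two criteria, at least, are easy to check mechanically. Here is a minimal sketch, assuming each shape is given as a simple polygon (a list of x, y vertices). The function names and the three example shapes are made up for illustration; the area uses the standard shoelace formula and the aspect ratio is taken from the bounding box.

```python
from math import sqrt

def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def aspect_ratio(points):
    """Width divided by height of the shape's bounding box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) / (max(ys) - min(ys))

def scale_to_area(points, target_area=1.0):
    """Uniformly scale a polygon so that its area equals target_area."""
    factor = sqrt(target_area / polygon_area(points))
    return [(x * factor, y * factor) for x, y in points]

# Hypothetical example shapes, all drawn inside the same 2 x 2 box.
shapes = {
    "square": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "triangle": [(0, 0), (2, 0), (1, 2)],
    "diamond": [(1, 0), (2, 1), (1, 2), (0, 1)],
}

# Same "ink" for every shape; uniform scaling leaves the aspect ratio unchanged.
normalized = {name: scale_to_area(pts) for name, pts in shapes.items()}

for name, pts in normalized.items():
    print(name, round(polygon_area(pts), 3), round(aspect_ratio(pts), 3))
```

One thing the sketch makes visible: once the areas are equal, the bounding boxes are not. The triangle and diamond end up wider and taller than the square, so "same area" and "same size box" pull against each other a little.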

Procedural Shapes

One approach is to procedurally generate a bunch of different shapes. This sounds like a good idea – until you try to generate 25 unique shapes. Here’s a naive set of 18 procedural shapes. It starts with a square (bottom left) and replaces corners of the square with a diagonal edge, a radius, and so on:

Yes, all these shapes are different, but they’re underwhelming. They are all arbitrary – and other than the square none of them look like anything. And they aren’t that different – no convexity, all smoothish edges, and so on. They all look like bits of wood left on the floor of the woodshop. They aren’t recognizable or nameable.

Nameable Shapes

Perhaps another criterion – an unproven hypothesis – is that we’d prefer the shapes to be recognizable and nameable. Think about color – we tend to use colors such as red, blue, orange, green, black in visualizations. We tend not to use colors such as burnt umber, raw sienna, charcoal, chartreuse; nor patterns such as plaid, houndstooth and polkadots. Things that we are more familiar with are easier to recognize and differentiate: we already have a slot for them in long-term verbal memory. So, for nameable shapes, ideally we’d like abstract shapes, so they are not too finicky, complex and difficult to use at small sizes. But we do want them to correspond to nameable things, so they need to be really simple and different.

So here’s 27 highly differentiated, nameable shapes, all with roughly the same aspect ratio and area:
They seem more different than the procedural shapes. The nameability may be a bit dubious: the top row is more nameable than the bottom row.

Alphanumeric Shapes

Having worked the last 6 years with text and visualization, it now seems obvious that the Latin uppercase characters are another set of 26 squarish, similar-area, nameable shapes:

These are Source Code Pro – a fixed width font – so the area should be highly similar between each glyph. And uppercase so they are all the same height (except for the Q in this font). And having been tuned over 2000 years, perhaps they have naturally evolved to be maximally different? Furthermore, since we read millions of letters, we have highly tuned our visual systems to recognize them.

Which one to use?

Alphabetic shapes or nameable shapes? Which to use? We could subject them to tests, to make sure that they work at small sizes and remain clearly different:

The green shapes aren’t quite as robust – the rounded rectangle and the square are too similar. Some fine tuning may be required.

Ideally, it would be great to run some usability studies to see which work better.

Thoughts? I’m also curious as to what you might name the green shapes, feel free to name them all in the comments.

More info: For a more in-depth look at some really interesting glyph research, take a look at Eamonn Maguire’s PhD thesis and Rita Borgo et al.’s state of the art report on glyphs.

 

Posted in Alphanumeric Chart, Data Visualization, Shape Visualization | Leave a comment

Awesome periodic table with aligned bars per cell

Periodic tables of the atoms are great visualizations. Much has been written about Mendeleev’s periodic table and other tables that organize atomic data. The periodic table is a powerful tool because the elements are organized and aligned by commonalities, enabling prediction of unknown elements in the early usage of periodic tables.

While looking at various tables regarding use of text to visualize data in tables, I stumbled across this periodic table by Henry Hubbard and William Meggers (1963) at the Smithsonian:

Periodic_Table_Hubbard_Meggers_1963_Smithsonian

Data dense periodic table from Hubbard and Meggers 1963.

Most periodic tables show only a few attributes per element, such as the atomic symbol, the atomic number, and the name. But there are many more data attributes per element such as expansivity, compressibility, ionization potential, atomic weight, isotopes, crystal form, orbits, magnetism, state at room temperature, melting point, boiling point, atomic radius, and so on. What’s really interesting in Hubbard and Meggers’ table is that they pack all of this information into each cell using various visual cues, as shown in this blurry legend from an earlier edition:

 

Periodic_Table_Hubbard_Meggers_Legend

Each table cell is packed with data.

Cells have text and numbers like many modern periodic tables, but they also have bars around the perimeter and triangular markers indicating quantitative values, plus dots, symbols and diagrams. One may wonder:

Why is the quantitative data represented as bars around the cell, and not just numerical data?

Recall that the periodic table is organized so that rows and columns organize elements by commonality. By using bars, visual comparisons can be made along a row or column. Here is a redrawn simplification of the first column from this chart:

Periodic_Table_Hubbard_Meggers_Col1_redraw

Closeup drawing from Column I showing bars and triangles around the perimeter and an overlay line showing trend.

This redrawn closeup is focused on the quantitative graphics around the perimeter of the cells. For example, the bottom bar on a cell shows the ionization potential in bright orange. A viewer visually attending to these orange bars can compare this quantity within a column by scanning vertically. In effect, this creates an embedded bar chart that spans across the cells, as shown by the overlaid dashed orange line. It is highest for Hydrogen (H) at the top of the column, then decreases down successive elements in the column to Cesium (Cs). The next element in the column, Francium (Fr), has no bar, as presumably this value had not been measured when this chart was published; however, by observing the trend, one might predict the value for Francium.

Similarly, the top bar per cell can be visually scanned to show a trend (as shown by the overlaid dashed green line). In addition to the four perimeter bars around the cell, there are also tiny triangles that float along each edge, showing other quantitative variables. For example, the triangle on the right edge indicates specific heat by its vertical position. These can similarly be compared across cells.

Note that horizontally oriented bars better facilitate comparison within a column than across a row. That is, horizontally oriented bars share a common baseline along the left edge of the column. A common baseline allows for more accurate comparisons of quantities than bars that do not share a common baseline (Cleveland and McGill 1984, or Heer and Bostock 2010). It is unknown how Hubbard and Meggers specifically chose which variables to place horizontally and which to place vertically to facilitate columnar comparison and row-based comparisons.

The notion of creating these aligned marks in the context of other data seems to be an interesting idea for both packing a lot of data into the visualization while at the same time organizing the data to facilitate visual comparisons and projections.

 

Posted in bar chart, Data Visualization | Leave a comment

Revisiting Maps for Inspiration

I write a lot about typography and visualization. It all started with critically looking at maps and noticing differences between modern visualization and old maps. I did a PhD looking at typography, text and visualization. (Stay tuned, there will even be a book in late 2020 about visualizing with text – with many new visualizations beyond what I had in my thesis!)

Back to maps. I was invited to speak at ESAD Valence about visualization and I decided to take a break from book writing and revisit the original inspiration: maps. Cartography has different rules than visualization, a much longer history, and many different techniques readily visible. So, I cobbled together some of my favorite maps to talk about and point out some observations.

Gough Map, 1360

The Gough map is a wonderful medieval hand-drawn map. Rivers are diagrammatic starting as bullets and flowing in almost straight lines. The iconography for towns varies from simple sheds, to an added cathedral tower, to a cluster of small buildings, to the walled city of London.  Typographically, it’s interesting with an ordering of labels. While most towns are labeled in brown, London is literally labelled in gold. Distances between towns are labelled in red, and counties are labelled in red with boxes (e.g. Suffolk).

Map_Gough_1360.png

The Gough map. London is literally labelled in gold.

Munster’s Geographia Universalis, 1540

Skipping ahead two centuries, Munster’s maps from Geographia Universalis (1540) are interesting maps at the transition to the printing press. Like the medieval Gough map, rivers, mountains and towns are highly stylized forms and pictographs, which are combined with typographically differentiated text in italics, caps and roman. Although the geographic map is a woodcut, the lettering is highly uniform and likely metal type composed together with the woodcut by a form cutter. The resulting aesthetic balances the rougher shapes and textures of the woodcut with the fine metal letters, plus some ingenuity by the artisans to get it all to fit together. Towns are consistently horizontal but labels are angled to fit, such as Vincentza turned almost upside down:

Map_Munster_1540_2.png

Munster’s maps: woodcuts plus text.

Willem Janszoon Blaeu, 1629

Engraving enabled much finer detail than feasible with woodcuts: both the topography and the labels could be engraved in detail. Willem Janszoon Blaeu‘s maps have an expanded set of iconography, now reduced even smaller to tents, pyramids and tiny houses. The path of rivers is more accurate and mountains have shading. The engraved text now has more opportunity for variation. River labels more closely align with river courses. Labels corresponding to areas are larger and spacing starts to increase (e.g. D A N).  Plus many other text variants (size, case, italics) differentiate between names of towns, cities, provinces and regions.

Map_Blaeu_1629.png

Blaeu’s engravings: more detail and more text variation.

Crome’s Neue Carte von Europa, 1782

Crome creates an early thematic map, Neue Carte Von Europa, showing the location of different crops, livestock and minerals in Europe in 1782 (previous post). An even wider range of icons is now required to indicate all the different types of resources: gold, silver, copper, zinc, iron, mercury, marble, fruit, honey, salt, rice, fish, wood, horses, pigs, etc., for a total of 56 different types of commodities. After running out of icons, two letter codes are used, e.g. Kr for cork, Tb for tobacco, Cr for currants and so on.

Thematic_Crome

Crome’s map filled with icons and alphanumeric codes.

Sherman’s map, 1864

During the U.S. Civil War, General Sherman led his army deep into the Confederacy, far beyond his supply lines. Sherman’s map combines traditional topographic detail with an overlay of resources summarized from the 1860 census. Starting with a base map showing counties, cities, rivers and railroads, an additional 15 variables of census data are added regarding the quantitative resources available: population, livestock, and agriculture. The map provided Sherman with the ability “to act with confidence that insured success.” As an early datamap for analytical and planning purposes, it shows the value of depicting many dimensions of data simultaneously, to aid in trade-off decisions, such as food available, potential resistance and potential supporters.

Map_Sherman_1864

Sherman’s map: 15 quantitative resources per county.

Ordnance Survey, 1921

Modern maps, using printing presses, reach a high in the early 20th century for the amount of information packed into them. Ordnance Survey maps are a favorite for the amount of information that they pack into each label. In this example from the early 1920’s, place names vary capitalization, italics, size and font family (plus the actual name) to indicate 5 attributes per label (legend here).

Map_Ordnance_Survey_1921.jpg

Ordnance Survey: 5 variables indicated per name.

Stieler’s Atlas, 1924

Similar to the Ordnance Survey, mapmakers on the continent also created maps with high-dimensional labels. Stieler‘s maps are typographically interesting as the labels use an ordering of underlines (dot, dash, solid, double solid) to indicate cities with different levels of governance (e.g. capital of a county, province or country). They also use backward italics for water features, curved and spaced text to indicate area features, and so on.

Map_Stielers_Atlas_1925_2.jpg

Reverse italics, multi-level underlines, and more.

 

FAA Aeronautic Chart, 2019

Here’s a map that’s only a few months old from FAA.gov, and packed with a phenomenal amount of information for pilots. There are many different classes of information, visually distinct from each other. The base map has topographical shading in hilly areas, bright yellow in urban areas. Overlaid are blue and red layers, each with a wealth of information regarding the corresponding airport, runway configuration, airspace, routes, waypoints, radio frequency, visual markers such as stadiums, wind turbines and bridges, and more. Icons and alphanumeric codes are heavily used to compact data for expert users. All text remains legible, with the background/basemap largely being light/bright upon which other layers can be superimposed, and if needed, some text is set with light halos.

Map_FAA_2019_SFO.jpg

Aeronautic chart, packed with relevant data for navigation.

So what?

Even though most people might think of Google maps these days, with minimal representation of roads and highly undifferentiated labels, the history of maps shows far richer solutions packed with many layers of information. These much richer maps, like the aeronautic chart and Sherman’s map, show that there are uses and applications where people need more information than only a couple classes of information within one visualization. And all the examples here show how all this extra data can be communicated with labels, symbols, lines, layers and more.

So, where and when could scatterplots, timeseries charts and treemaps add many layers to increase their information content and aid new analytical uses?

 

Posted in Data Visualization | Leave a comment

Bertin’s Reorderable Matrix

I recently had the opportunity to attend a workshop at ESAD Valence. To my surprise, in their collection, they have original parts from one of Bertin’s reorderable matrices!

Bertin_Matrix_Blocks_Box.PNG

I had the opportunity to use the rebuilt matrix at VisWeek in Paris 2014. I’ve simulated the matrix using Excel macros and Excel conditional formats. Essentially the reorderable matrix is a physical visualization that takes a table of structured data and enables resorting of rows and columns based on data values to reveal clusters. Each block shows data on the top surface which represents a numeric value varying from the lowest value (full white) to the highest value (full black) and various textures in between. The user can then shuffle (i.e. reorder) full rows or full columns to regroup the data based on values so that clusters visually appear (Bertin called the process diagonalization, see the video). It’s a human-powered physical clustering algorithm.
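For a feel of what the shuffling achieves, here is a small sketch of an automated stand-in. To be clear, this is not Bertin’s procedure (his was done by hand and eye) and it is not my Excel macro: it is the barycenter heuristic borrowed from graph drawing, where each row is repeatedly sorted by the average position of its dark cells, then each column likewise. The function names, the toy data and the labels are all made up for illustration.

```python
def barycenter(values):
    """Weighted mean position of the values; all-zero lines sort last."""
    total = sum(values)
    if total == 0:
        return float("inf")
    return sum(i * v for i, v in enumerate(values)) / total

def reorder(matrix, row_labels, col_labels, passes=5):
    """Permute whole rows and whole columns so that high values cluster."""
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    for _ in range(passes):
        # Move each row toward where its dark cells sit in the current column order.
        rows.sort(key=lambda r: barycenter([matrix[r][c] for c in cols]))
        # Then do the same for each column, using the new row order.
        cols.sort(key=lambda c: barycenter([matrix[r][c] for r in rows]))
    reordered = [[matrix[r][c] for c in cols] for r in rows]
    return reordered, [row_labels[r] for r in rows], [col_labels[c] for c in cols]

# Toy data: 1 is a black block, 0 is a white block. Two clusters are interleaved.
data = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
result, row_order, col_order = reorder(data, ["A", "B", "C", "D"], ["p", "q", "r", "s"])

for label, row in zip(row_order, result):
    print(label, row)
```

On this toy table the black blocks collect into two squares along the diagonal, which is the kind of pattern diagonalization is after. On real data a heuristic like this can stall in a mediocre ordering, which is part of why a person shuffling rows by eye remains interesting.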

This particular version is made with tiny plastic blocks (Bertin called them dominoes), about the size of Lego 1×1 bricks, and they sound the same as Lego when they jostle in the big bag of bricks. I arranged a few on a desk into a matrix (the connecting rods weren’t available). You can see how patterns of all black, textured, and partially textured surfaces are highly visible:

Bertin_Matrix_Top.PNG

One really interesting aspect that I noticed is the colored edge stripe on some of the bricks, seen in the picture below (and quite noticeable in the bag where you can see some blocks have bright stripes in green, blue, yellow, orange, etc). I asked, but it was uncertain what their purpose was. The stripes are always on the sides where the rods go in; never the top. I’m guessing that it is some kind of recording system. Perhaps the user would draw a stripe across a row of bricks, maybe as a way to record the state. Since these colors were on the sides of the blocks, they wouldn’t be visible from above and therefore not interfere with patterns and clusters being created.

Another interesting aspect is that both the tops and bottoms of the blocks have the black-to-white texture patterns. We speculated that the blocks were reused from analysis to analysis, and it was easy to code both sides of the blocks. But, maybe there’s more. It would be feasible to re-order a matrix, take some kind of intervention, collect more data, then color the new state on the bottom of the blocks. Then a user could flip over the entire matrix, to see if the pattern had changed in some way. Again, speculation on my part.

The Lego-like aspect also suggests to me that a reorderable matrix could potentially be constructed out of standard Lego-blocks today: a 1×1 with holes on both sides, rods, and tiles in assorted shades of grey. And then concepts about data clustering could be taught in grade school.

Bertin-Lego.PNG

 

Posted in Bertin, Data Visualization | Leave a comment

Visualizations with perceptual free-rides

We create visualizations to aid viewers in making visual inferences. Different visualizations are suited to different inferences. Some visualizations offer additional perceptual inferences beyond those of comparable visualizations. That is, the specific configuration enables additional inferences to be observed directly, without additional cognitive load (e.g. see Gem Stapleton et al, Effective Representation of Information: Generalizing Free Rides 2016).

Here’s an example from 1940, a bar chart where both bar length and width indicate data:

Walter_Weld__How_to_chart_data_1960_hathitrust2

The length of the bar (horizontally) is the percent increase in income in each industry.  Manufacturing has the biggest increase in income (18%), Contract Construction is second at 13%.

The width of the bar (vertically) is the relative size of that industry: Manufacturing is wide – it’s the biggest industry – it accounts for about 23% of all industry. Contract Construction is narrow, perhaps the third smallest industry, perhaps around 3-4%.

What’s really interesting is that the area represented by each bar is highly meaningful: the percent increase x size of industry = total income gained in that industry. For example, the areas of Transportation and Contract Construction are perceptually quite similar. This can be validated mathematically: Transportation, at 7% increase x 7% industry size, has a similar total income gain to Contract Construction at 13% increase x 3.5% industry size. Likewise, Mining at 9% increase x 3% industry size has about the same total income gain as Agriculture at 3.5% increase x 8% industry size.

This meaningful area is the free-ride. Perceptually, one can directly observe and compare relative areas. Total income gain hasn’t been explicitly encoded; it’s a result of the choice of encoding length and width. If the viewer is potentially interested in total income gain in addition to percent increase and relative size, this is a useful encoding. Total income gain might be very important in government policy, for example, as the total income gain is directly proportional to the taxes generated.
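The area arithmetic can be written out as a quick sketch. The numbers are the approximate values read off the chart as quoted above, not exact figures from the book, and the function name is made up for illustration.

```python
# (percent increase in income, share of all industry in percent),
# approximate readings from the chart as quoted in the text.
industries = {
    "Transportation": (7.0, 7.0),
    "Contract Construction": (13.0, 3.5),
    "Mining": (9.0, 3.0),
    "Agriculture": (3.5, 8.0),
}

def income_gain(percent_increase, industry_share):
    """Bar area: the industry's gain as a percentage of total income."""
    return percent_increase * industry_share / 100.0

gains = {name: income_gain(inc, share) for name, (inc, share) in industries.items()}

for name, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {gain:.3f}% of total income")
```

Transportation and Contract Construction both land near half a percent of total income, and Mining and Agriculture both land a little over a quarter of a percent, which matches what the eye reads from the bar areas.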

A more common design choice these days might be to use a treemap to show one variable (relative industry size) and color to show the second variable (color to indicate percent increase); like this:

Walter_Weld__How_to_chart_data_as_a_treemap

In the treemap, size and color are explicit, but there’s no free-ride. The combination of color and area isn’t a perceivable combination: the similarity in total income between Transport and Construction is not obvious; nor the similarity between Mining and Agriculture. In the treemap, the area encodes relative size, but the length and the width of the boxes are not meaningful. The color encodes percent change, but color isn’t effective for comparing relative quantities. If total income gain is a desirable insight, then the treemap fails.

Edward Tufte (1983) discusses multi-functioning graphic elements, which doesn’t quite align with the idea of a free-ride. Johanna Drucker (2014) discusses this notion as generative: a representation that produces knowledge as opposed to a representation that simply displays data. But I like the definition of a free-ride, which succinctly explains the perceptual benefit created by the choice of representation. See Gem’s paper for an example applied to Euler diagrams.

Visualization designers need to consider the free-rides and other perceptual inferences that different visualization alternatives provide, and choose among visualizations based on how those inferences suit the viewers’ task.

Percent Increase in National Income by Industry is from page 178 in the book How to Chart: Facts from Figures with Graphs, by Walter Weld, 1960. Walter didn’t particularly like this chart, partially because there is no legend nor axis for the widths. Personally, I have seen this type of bar chart used effectively in financial services.

Posted in bar chart, Critique, Data Visualization | 1 Comment

Metabolic Pathways and Visualization Pathways

Metabolic pathway diagrams show series of linked chemical reactions occurring within cells (Wikipedia). These diagrams started more than a half-century ago, such as this example from 1967 in the Smithsonian:

Metabolic6.gif

These diagrams have been continuously expanded over decades as new research identifies new reactions and new connections. A 2017 version at Roche is a massive interactive poster documenting thousands of compounds and reactions:

Metabolic_Roche8

These are extremely interesting visualizations that document the knowledge of a research community showing the connection and flows of chemical reactions.

Could the equivalent exist in data visualization and analytics? The field is growing rapidly and there are many techniques. Like biology, as the visual analytics field grows, it becomes more difficult to keep track of all the evolving techniques. Surely, a similar diagram of data and the many ways it can flow through analytics into visualizations (and other perceptualizations) and interactions – should be feasible and useful for the community. Here’s an attempt to sketch out a bit of it related to data that expresses structures such as hierarchies, graphs or sequences; and corresponding visualization approaches:

Visualization_Pathways_Data_Structures_and_Layout_sketch.png

It’s a bit trickier than biochemical processes as there are many-to-many relationships, potentially making it overloaded with too many connections, so some editorial judgment or process is needed to determine which pathways to show. And, it’s missing so much, e.g. no interactions, many data analytic techniques, and no visual attributes (color, size, icons, etc). And it’s not obvious how to group visualization layouts, e.g. by mark type, by coordinate system, or maybe by the primary structure that they represent?

Perhaps someone else has already created something going down this path? If not, is something like this valuable? Let me know.

Posted in Data Visualization, Design Space, Graph Visualization | Leave a comment

Legacies of Isotype

ISOTYPE was a dramatic reconceptualization of statistical graphics in the 1930’s by Otto and Marie Neurath and their collaborators. Contemporary charts, such as seen in Brinton, were mostly black, simple dots or lines, tiny captions and full of dense grid lines, axes, ticks and labels. Isotype instead was bold; almost always devoid of grid lines, axes and tick marks; minimal bold sans serif text; and usually relied on repetition of expressive icons to convey quantities. Compare the two images below. Isotype evolved at the same time as Modernism, where these same ideas (broadly, “less is more”) were applied to many areas of design including architecture, art, dance, industrial design, etc.

How did Isotype’s visual language become diffused across charts, visualization and interfaces over the next few decades? Here are three ways:

Pictographic Icons

Perhaps the best known feature of Isotype is the use of pictographic icons. Use of pictographic icons to indicate things became increasingly important with post-war globalization. Pictographic icons are recognizable across languages and use less space than long labels. Standardized icons became popular across many areas of society such as highway traffic signs, Olympic symbols, airport signage, warning symbols and so on. And then Mac and Windows used icons as core interaction elements in graphical user interfaces (How many icons are visible on your screen right now? I have more than 125). Here’s a mid-1970’s set of standardized symbols for the US Dept. of Transport:

Isotype_SymbolSigns_CooperHewittOrg_18673291

Standardized icons from mid 1970’s, US Dept. of Transport.

No Grids

The diffusion of Isotype benefited in part from technical changes to printing, moving from metal-based printing (which could handle fine detail) to offset printing (which was based on photographic compositing techniques, and this reduced the ability to use fine details such as thin lines and crisp serifs). As such, thin grid lines and small text were more difficult to use than chunky icons, large patches of color and bold, heavy-weight labels. This lines up well with the design ideology of Isotype. If we look at some charts from the mid-1970’s, we can see the remains of Isotype: few or no grid lines, minimal text, and expressive pictographs:

Isotype_Graphis_Diagram_1976_p24.PNG

Charts from 1975: low on grids, low on text and some icons (Graphis Diagrams, 1976)

Labeled Values

Isotype worked hard to reduce text, but showing the numeric values seems to be important when we look at charts after Isotype. In the prior image, there are explicitly labelled numeric values in all six charts. Presumably viewers want an estimate of numerical quantities corresponding to the visual marks, and they don’t want the cognitive load of counting icons or guessing the area associated with circles, folded corners or the relative width of smoke. Or, perhaps fractions are difficult to express with icons. Regardless, numerical values, either as labels on marks or labels on axes, came back. This was probably one of the first aspects of Isotype that may have slipped: here’s a US Dept Agriculture bar chart from 1950, highly influenced by Isotype:

Isotype_Agricultural_Outlook_Charts_1950_Fuel_p25..PNG

Chart from 1950, highly influenced by Isotype (compare to first pair of images).

It has the icons (although moved to the axis and explicitly labelled), and minimal grids (although an outer frame has been added to the plot area). And it labels the bars. In this chart, like the 1970’s charts, the values are explicitly labelled.

The take-away is that removing value labels completely may have gone a bit too far on Isotype’s part. Even Haroz et al.’s study on “Isotype” charts always included quantities along the y-axis in all test conditions. Either a numeric axis or labelled bars or some numeric guidance on the values seems to be broadly desired. We see these labelled values in many charts, such as many Excel charts that label both the numeric axis and the number value per bar (3 of the 11 quick styles provide both) such as this one:

Isotype_Excel_Chart_with_Axis_and_Labels.PNG

or the USA Today Snapshots (which use many cues from Isotype, including pictographs, minimal text and no grids):

Isotype_USA_Today_Snapshot_Chart_Parks.PNG

Or in the very first bar chart in the very first tutorial of D3.js (“Let’s Make a Bar Chart”):

Isotype_D3js_bars_with_values.PNG

 

 

 

Posted in Data Visualization, Isotype | Leave a comment