Visualizing Quantitative Values in 3D

We’re working on a few data visualization projects at Uncharted using VR, AR and 3D printing. Given the rise of these new techniques, it may be time to dust off 3D data visualization (again). What are the use cases where 3D visualization works? What was difficult with 3D on the desktop that these devices or 3D prints might solve? Yes, 3D has issues such as occlusion, navigation, perspective foreshortening and so on. And 3D is already known to be effective for things that are inherently 3D, such as fluid flow analysis or 3D medical imaging.

For this particular post, I’ll consider some cases where 3D may be effective for visualizing quantities, such as scatterplots, bar charts and surfaces:

1. Length

Length is effective for representing quantities in 2D (Bertin, Mackinlay, Cleveland and McGill, Heer and Bostock, etc., all agree on this). The viewer can make quick comparisons of ratios, for example, to estimate whether one bar is twice as long as another. In 2D, error increases when baselines are not aligned, but lengths are still much more accurate than, say, hue, brightness or area.

Going into 3D perspective, presumably the error in estimating lengths will increase due to perspective distortion. But is it really that much of an error? There are extremely strong visual perspective cues that we use to make judgements in 3D spaces. For example, we know parallel lines converge towards the horizon, such as the edges of a roadway. Regular patterns also help: the regularity of a dashed lane marking in perspective provides a strong cue for estimating distance.

So error will increase in perspective, but lengths in perspective can still be quite accurate. Consider this old “pin map” from Brinton (from 100 years ago!):

[Figure: 3D pin map, from Willard Cope Brinton, Graphic Methods, 1914]

All the pin-stacks are set on a common base. The perspective effect, judging from the base, appears to be not particularly distorted. The consistent size of the round pin-heads further increases confidence that sizes aren’t distorted. A viewer can likely say with high confidence that the height of Boston is around 2.5x that of New York.

Compare this to a contemporary 2D map, using bubbles to indicate quantities:

[Figure: 2D bubble map, Environment America]

A viewer likely has much less certainty comparing the bubble in New Hampshire to the bubble in Rhode Island. New Hampshire’s is bigger, but by how much? 3x? 4x? 5x? Area is less accurate than length.

2. Perspective is just a log transformation

While some people consider perspective to distort data, it acts much like a log transformation applied to the entire scene. Log transformations are common in data visualization; we’re just used to transforming only the plot area, not the entire scene. Here’s a bar chart from the 1970s tilted back in 3D (with a weird bend at the back):

[Figure: tilted 3D bar chart of river flows, The California Water Atlas]

At the front of the scene, i.e. near the base of the chart, we can see more detail than at the back. Small bars are comparable: e.g. in September (far right) the Feather River appears to be 2x the American River, which in turn is perhaps 5x Putah Creek. Large values are also visually comparable to other large bars: for example, in April the Feather River is almost 2x the Yuba River. The perspective effect is much stronger in this example, but the strong grid lines and the vanishing effect on the consistent-width bars are strong cues facilitating estimation: you can see the dip in Putah Creek from Oct – Nov, with values in the low 100s, and the slight dip in the Feather River Mar – May, with values in the mid 10,000s.
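To make this concrete, here’s a toy sketch of the projection math (mine, not from the Atlas): in a perspective view, on-screen size scales with 1/depth, so equal steps into the scene map to ever smaller steps on screen – a monotone compression of the far end of the scene, similar in spirit to a log axis:

// Toy perspective projection: how wide a unit-wide bar appears on screen
// at increasing depths. f is an assumed focal length; it only sets the scale.
var f = 2;
function projectedWidth(width, depth) {
  return f * width / depth;   // the standard perspective divide
}
for (var depth = 1; depth <= 8; depth *= 2) {
  console.log("depth " + depth + ": on-screen width " + projectedWidth(1, depth));
}
// depth 1: 2, depth 2: 1, depth 4: 0.5, depth 8: 0.25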

You can apply the perspective distortion along the x-axis instead. Here’s a timeseries chart with a few years of daily data:

[Figure: tilted 3D timeseries chart, Uncharted Software]

In the foreground, far right, each day is clearly visible. In the background, far left, individual days are not, in effect compressing time for older dates. This time compression is typical in a lot of timeseries analysis: a tabular analysis might provide comparisons such as week-to-date, month-to-date, quarter-to-date or year-to-date.

Essentially, this is a focus+context visualization technique (e.g. see Table Lens or fisheye views). The right side clearly shows the discrete daily movement of the price, with more than 30 times the 2D area of the start of the timeseries. The left side provides the context: daily movement is not clearly visible, but the longer trend and broader vertical range are.

However, perspective provides additional value beyond other focus+context techniques. A table lens or fisheye view is discontinuous in its magnification, adding extra cognitive load as the user switches back and forth between the close-up and the context. Perspective provides a continuous transformation across the display, facilitating continuous comparison between the detail data and the context data.

Trends across the perspective are clearly visible. For example, a straight line could be drawn from the starting point (at $10 in Jan 2009) to the high point (near $27 in May 2011), and this line would pass near many of the other high points in 2009 and 2010. And this straight line would remain a valid straight line regardless of the perspective viewpoint.

3. 3D bars may facilitate comparisons

3D bars are commonly used as an example of where 3D should not be used. Tall 3D bars in the foreground can occlude short 3D bars in the background. Short bars can be more visually salient than their values warrant, because their visible tops give them more graphical area than their height alone. And so on.

But 2D bars also have issues and introduce biases. Here’s a quick example:

[Figure: 3D bar chart vs. 2D bar chart of the same small grid of values]

In 2D, the bars must be oriented either vertically or horizontally. The orientation introduces bias: it is far easier to compare across bars in columns than across bars in rows. In the 3D representation the viewer can compare by row or column. In 3D the viewer can also distinguish between a zero value (the flat bar in cell B2) and a null value (no bar). There are probably a few experiments that could be done here for a keen masters or PhD student.

4. Meshes and Surfaces

Rather than just bars or lines, rectangular meshes are well suited to 3D. When the mesh is spaced at regular intervals, there is a strong perspective cue facilitating comparison across other points on the mesh, and relative heights between points can be assessed. Here are a couple of examples from Brinton’s book:

[Figure: 3D surface examples from Brinton, 1914]

3D surfaces have many modern applications, such as plotting distributions across two variables, evaluating financial derivatives, etc. Here’s an example surface showing the Canadian yield curve (along the right side, i.e. interest rates for one month out to 10 years), and the value of that curve every day over 5 years (left side) (via Uncharted):

[Figure: 3D surface of the Canadian yield curve, 2006–2010]

The huge drop in short term rates in 2008 is immediately visible as interest rates dive during the financial crisis. Areas where the surface is nearly flat or tilted at an angle, and periods where there are curves and kinks, are visible as well. These waves, wobbles and kinks are visible in part due to the consistent grid lines of the mesh and the color applied to the surface, aided by the careful lighting and material configuration in the 3D scene, which creates highlights.

3D Printed Surfaces

Given a data-driven 3D computer-generated surface, why not print it? Here’s the same dataset, as a 3D print (set on a matching laser cut wood box):

[Figure: 3D print of the Canadian yield curve surface, set on a laser cut wood box, Uncharted]

The grid in the 3D print is obtained by changing the print material from transparent plastic to black plastic at regular intervals. While there are no tooltips nor interactive slicing, some other observations are facilitated by a physical object. It’s tactile – you can feel how the shape changes. Some of the sharp ridges and deep crevices are more easily explored as a tactile 3D object. In a physical environment the viewer can easily tumble the object to any orientation without strange keyboard or mouse movements. And one can easily adjust the position of the object relative to physical light sources to see highlights (or not) or otherwise gain insight into the complex shape. And there is a light in the box to illuminate the surface from behind.

There’s more to 3D than just estimating lengths and heights. Perhaps there are many future blog posts to be done on other aspects such as navigating 3D, text in 3D, mental models in 3D and so on.


Generating stories about data with visualization

Early in my career, I’d create data visualizations and without fail, my manager would ask: “So, what’s the story here?” In data visualization the objective isn’t the visualization – it’s the insight gained from the visualization.

Visualizations don’t announce their insights. Whether dashboards with a couple of bar charts, massively complex visualizations of billions of tweets, or hairball graphs, there are many possible insights. Narrative visualization is the addition of a story to a visualization, to explain the visualization and to highlight specific insights. The NYTimes and The Guardian create human-authored narratives to explain insights. But visualizations with human-authored narratives like the Times’ are a lot of work. Instead, an office worker without a graphics team might add a paragraph or two on top of a visualization, maybe with a link or two that pivots the view.

Alternatively, data-driven Natural Language Generation (NLG) can completely sidestep visualization. The approach is to use data and advanced analytics to algorithmically derive the insights, then assemble those insights into computer-generated text. Some of the results are impressive, generating not just insights but interesting stories.

But automated insights wrapped in natural language lose all the contextual data. One alternative is to simply add a visualization beside an NLG paragraph. But that requires the reader to do all the work of cross-referencing back and forth between the paragraphs and the visualization.

Why not automate insights, and put them directly in the charts?

Visualization libraries such as Semiotic have built-in code-driven annotations, so it would be feasible to automatically generate the insight, map that to some kind of annotation, then plot the annotation. This sounds great, but before we can do that we need to know:

What are the kinds of insights that work well with annotations?

This is something that we’ve done in a number of different ways on different projects over the years. Looking back, there are some common patterns, such as insights about a specific data point, about the plot area, or about event data:


Insights about data points

Scrutinizing specific data points is a common task in data visualizations. These data points might be the extremes (to identify the leaders and laggards); or perhaps outliers (to validate that the data isn’t erroneous); or benchmarks that help orient the viewer in the data (like a landmark). Labeling points is straightforward in a variety of different visualizations, as these diagrams suggest:

[Figure: annotation patterns for data points]

 
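As a sketch of how this first pattern could be automated (the function and the annotation object shape are hypothetical, loosely inspired by the annotation objects that libraries like Semiotic accept):

// A minimal sketch: find the extreme data point and emit an annotation
// descriptor that a charting library could render as a label.
function maxAnnotation(points) {   // points: [{x, y}, ...]
  var max = points.reduce(function(a, b) {return b.y > a.y ? b : a;});
  return {type: "xy", x: max.x, y: max.y, label: "High: " + max.y + " at " + max.x};
}
// maxAnnotation([{x: "Jan 2009", y: 10}, {x: "May 2011", y: 27}])
// => {type: "xy", x: "May 2011", y: 27, label: "High: 27 at May 2011"}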

Insights about the plot area

In other cases, the desired insights are at an aggregate level. For example, understanding the range of the data is a common task. How big is the difference between the biggest and smallest data point? On a stock chart there’s a big difference between a stock with a 2% range and a stock with an 80% range, and a big difference in how an investor responds to that magnitude.

Trend is related, and there are many ways to potentially measure trend, such as average, last-first, regression, moving average, curve fitting and so on.
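For example, here’s the simplest of these, last-minus-first, turned into an annotation label (a hypothetical sketch; a real system might swap in a regression slope):

// A minimal sketch: summarize trend as last-minus-first, phrased for a label.
function trendLabel(values) {
  var change = values[values.length - 1] - values[0];
  var pct = Math.abs(100 * change / values[0]).toFixed(1);
  return (change >= 0 ? "Up " : "Down ") + pct + "% over the period";
}
// trendLabel([10, 14, 12, 27]) => "Up 170.0% over the period"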

Sometimes the challenge isn’t the data but the semantics of the plot: scatterplots can be challenging because determining the meaningful combinations among coordinates – the sweet spots – requires some cognitive effort. Highlighting areas on the plot or using contours can be effective instead.

Another pattern, common in sports commentary, is the threshold: for example, the sports superstar approaching the all-time record. This is easily translated into a visual annotation such as a line:

[Figure: annotation patterns for the plot area]

 
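The threshold pattern is similarly mechanical to automate (a hypothetical sketch; the record value would come from reference data):

// A minimal sketch: emit a threshold-line annotation when a series
// approaches a reference value, such as an all-time record.
function thresholdAnnotation(current, record) {
  if (current < 0.9 * record) return null;   // only annotate when close
  return {type: "y-line", y: record, label: "All-time record: " + record};
}
// thresholdAnnotation(98, 100) => {type: "y-line", y: 100, label: "All-time record: 100"}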

Insights associated with an event

Sometimes the insight is an event that already has associated commentary, such as a news story, a tweet, or a pivotal event. In these cases, a narrative snippet may already exist, such as a news headline. This can be depicted directly as a textual annotation:

[Figure: annotation patterns for events]


So what?

Annotations explicitly label insights directly on a plot, with the full context visible. The viewer gains the benefit of the insight. The viewer is also fully informed by the context to ask critical questions or otherwise probe the data. It’s this ability to understand the authored insight but then derive our own insights that makes narrative visualization so compelling.

The above patterns are just a start. What is the catalog of all the insightful patterns that go with visualizations? Do they work across the wide variety of esoteric visualizations? And, more importantly, which insights are meaningful? There are many possible insights, so from an automation standpoint, which insight should be promoted to a visible annotation?


Bertin extended to text (pt 2)

In a previous post, I talked about Bertin’s writings on text attributes in visualization in the classic text Sémiologie Graphique (the French edition only – this material was not carried into the English translation). In particular, I noted that Bertin has been highly influential in the fields of visualization and cartography — not only because he provided a framework for creating visualizations, but also because he created hundreds of examples to illustrate the breadth of possibilities.

Figure 1. Bertin’s dataset of populations in 90 departments.

So now, 50 years after Bertin, I decided to mash up some of the text-based visualization ideas that I’ve been using with Bertin’s original French population dataset.

Bertin originally used a small data set of 90 French departments, with population counts for three different occupations (agriculture, manufacturing, and services). The dataset was small enough for Bertin to publish on half a page (page 100 of the English edition, shown here in Figure 1). The additional columns are totals and ratios.

Bertin takes the dataset and then creates nearly 100 different visualizations: bar charts, scatterplots, ternary plots, parallel coordinate plots, maps, cartograms, and so on (pages 101-137 in the English edition). A small subset is shown in Figure 2 below. But none of them use text.

Figure 2. Just a few of the many different visualizations that Bertin constructs from the same small dataset of populations per department.

I take the same dataset (why not?) and then create a dozen new, text-rich visualizations (shown in Figure 3). Typically, I use the names of the departments in the visualizations. Individual departments can be identified directly in the visualizations: there is no need to cross-reference to tables, no need to rely on interactions.

Figure 3: 12 new text-dense visualizations based on the same dataset as Bertin.

For example, I’ve previously talked about microtext line charts. In the center is a parallel coordinates plot where the lines connecting columns have been replaced with microtext – shown as a much larger image in Figure 4. Color is based on percent of occupation: green for high agriculture, red for high manufacturing and blue for high services. At a macro-level you can see the inverse relationship between agriculture and manufacturing. At a detail level you can trace the lines a bit more easily and directly identify them.

Figure 4: Microtext parallel coordinates chart. Names and codes for each department are along each line. Click for big version.

A full research paper on these visualizations is in a special issue of the journal Cartography and Geographic Information Science (CaGIS), volume 46, issue 2, marking the 50th anniversary of Bertin’s Sémiologie Graphique. The full volume description is here, and this link is a free view of the paper (first 50 viewers only).


Album de Statistique Graphique

The Album de Statistique Graphique is a set of annual publications of data visualizations produced in France in the late 1800s. I first heard about them from Michael Friendly a decade ago and have been on the lookout for them ever since. Over the course of my thesis I did find a couple of copies in research libraries, but those libraries required signing agreements that I would not share the photos (why do libraries do this?).

Now, finally, they are on-line, easily accessible, in high quality scans courtesy of David Rumsey (thank you!). And they are amazing! You can access all of them with a search query.

While I would like to systematically review the Album, that would be a significant multi-year project given the depth of data, the quality of the visualizations, and the time period they are situated in. (Probably of similar scale and scope to Sandra Rendgen’s new book, The Minard System, which I haven’t read yet but which is high on my reading list.)

So, instead, here are a few quick snaps for inspiration.

1. Whole pages

Each page is remarkable, with many different visualization techniques. A page is a full composition: detailed titles, annotations, visual legends and narrative legends accompany every visualization. Not only are the visualizations unique, they are authoritative. This particular set of half pies on a map of France from 1882 feels familiar to some of the charts in Bertin’s Sémiologie Graphique 80 years later:

[Figure: Pourcentage Travailleurs Agricoles, 1882, Album de Statistique Graphique]

2. Legends

Legends are used to explain many visualizations — presumably some of these visualization techniques were quite new in the 1880s. This particular legend (on the left) acts as both a legend and a summary of the full dataset. The pies are interesting too, using hue to create a top-level hierarchy and brightness as a secondary level. These pies also work nicely as nodes in a graph (something to consider in the ongoing pie chart debates):

[Figure: Recettes des Stations, 1882, Album de Statistique Graphique]

3. Overlapping Colors

With computer-based visualization, it’s really easy to create data-driven geometry — and if that geometry is overlapping, it is also easy to use transparency to help differentiate between the overlapping colors. However, overlapping colors are non-intuitive. We know theoretically that yellow occurs when red and green overlap, but it requires some mental effort to decode. In the Album, this particular radar plot has a filled red object and a filled yellow object: where they overlap there are alternating horizontal stripes of red and yellow – much easier to decode if you simply look at the plot more closely.

[Figure: Mouvement des Voyageurs, Exposition Universelle de 1889, Album de Statistique Graphique]

4. Multigraph Lines

For those who think Sankey diagrams are great, here’s an awesome graph visualization. A multigraph is a node-link diagram where there can be more than one link between nodes. In this railway traffic diagram, there are many lines between each town. Four color variants indicate the type of traffic, and the side of the center line indicates direction of travel. Line width indicates traffic volume. Corners and joins are neatly beveled. Types of traffic, mainlines, local traffic and high-volume anomalies are all clearly visible. (Btw, I like many kinds of graph visualizations.)

[Figure: Nombre Quotidien de Trains, 1893, Album de Statistique Graphique]

5. Curvy Text

And finally, text on a path. Text annotations follow gridlines and objects, so that the text doesn’t disrupt the overall visualization patterns but still provides relevant information directly in context. Text doesn’t overlap the objects: it curves gently along paths, or angles out if the curves are too tight.

[Figure: Prix du Transport, 1881, Album de Statistique Graphique]

That’s a really quick take on a few examples. I didn’t even get into the average circles on the radar plots, the red overlay text on the multigraph, or other interesting aspects. Far better to browse the full set of images over at David Rumsey and draw your own insights.

Links:

Album search: https://www.davidrumsey.com/luna/servlet/view/search?sort=Pub_List_No_InitialSort%2CPub_Date%2CPub_List_No%2CSeries_No&q=Album+de+Statistique+Graphique

Recettes des Stations, 1882: https://www.davidrumsey.com/luna/servlet/s/455p92

Prix du Transport, 1881: https://www.davidrumsey.com/luna/servlet/s/2e3v9k

Pourcentage Travailleurs Agricoles, 1882: https://www.davidrumsey.com/luna/servlet/s/wt0024

Exposition Universelle, 1889: https://www.davidrumsey.com/luna/servlet/s/5y7k81

Nombre Quotidien de Trains, 1893: https://www.davidrumsey.com/luna/servlet/s/a1vd9l


Using Font Attributes with D3.js

Most of the typographic visualization examples on this site are created in D3.js, using SVG. There are tons of examples of D3 on the web used to create bars, dots, etc., but not many online examples where font attributes are manipulated using D3. So, to help people create data-driven font attributes using D3.js, here’s a collection of examples. A running version is over on codepen, which creates a simple little visualization like this (on Chrome):

[Figure: text rendered with data-driven font attributes in D3.js]

And here are some pointers on how to make this work.

Color “fill”

The simplest example is to use color with text. It uses the same attribute as any other colored element in D3, i.e. the fill attribute. A trivial example sets the color with a function, in this case from a separate list of colors:

 .attr("fill", function(d,i) {return clrs[i];})

Font Weight (aka Bold): “font-weight”

Font weight is almost as simple as color. The font-weight attribute takes a numeric value between 100 and 900.

.attr("font-weight",function(d,i) {return i*100+100;})

For default web-safe fonts, such as serif or sans-serif, there are usually only 2 weights available: plain and bold, typically set to the numeric values 400 and 700. If you want to use a font with more weights, you first have to load the font including all the target weights. There are many commercial fonts with many weights; Google Fonts has some free fonts in many weights, such as Saira and Roboto. The font needs to be loaded in the <head> section of the HTML, such as this loading of all 9 weights of the font Saira:

<link href="https://fonts.googleapis.com/css?family=Saira:100,200,300,400,500,600,700,800,900" rel="stylesheet">

Font Oblique Angle (aka Italic): “skewX()”

Font oblique angle is a mechanical skew of the font. It’s not the same as true italics. For data-driven purposes, the oblique angle of the font can be manipulated by using the transformation skewX() on a group per element. This is a bit more effort than simple color or weight. First, the visual element has to be created at the group level. Then the transform attribute is applied, setting both the x,y location and skewX:

var obliquetext = svg.selectAll("obtext").data(data).enter().append("g")
  .attr("transform", function (d,i)
     {return " translate(" + (i*40+100) + ",55)" +
             " skewX(" + (i*10-40) + ")"; });

Then, with the group transform set, append a text element and the text will be appropriately skewed to represent oblique text:

obliquetext.append("text")
  .attr("font-family", "Saira")
  .text( function(d) {return d;});

One benefit of this approach is that any angle can be created, and oblique versions of fonts without italics can be created as needed. For example, a sloped blackletter can be created, although typographers might not be enthusiastic about this approach, as legibility and readability will be impacted.

Case and Small Caps: toUpperCase(), toLowerCase() and “font-variant”

Case can be manipulated directly in JavaScript using the functions toUpperCase() and toLowerCase(). Small caps can be accessed by setting the SVG attribute “font-variant” to “small-caps” or “normal”. A simple example chooses between small-caps and normal lower-case letters:

.attr("font-variant", function(d,i) {return i<5 ? "small-caps" : "normal"; })

These can be combined with <tspan> so parts of words can be set in upper-case, small-caps or lower-case, as shown in the codepen example and below:

var casetext = svg.selectAll("casetext").data(data).enter().append("text") 
  .attr("x",function(d,i) {return i*40;})
  .attr("y",75)
  .attr("font-family", "Saira");
casetext.append("tspan")
  .attr("font-variant","small-caps")
  .text( function(d,i) {return i<5 ? 
          d.substring(0,i).toLowerCase() : 
          d.substring(0,i%5).toUpperCase(); })
casetext.append("tspan")
  .attr("font-variant",function(d,i) {return i<5 ? "normal" : "small-caps";})
  .text( function(d,i) {return d.substring(i%5,10).toLowerCase();});

Typeface: “font-family”

Changing fonts is easy using the attribute “font-family”. For simple use cases, you can use web-safe fonts, meaning there is no need to load the font into the browser, for example:

.attr("font-family", function(d,i) {return i<5 ? "serif" : "sans-serif"; })

For more variation in fonts, you can load lots of different fonts into your page and then access them in SVG. The first step is to find and load the fonts: see fonts.google.com for a large variety of free webfonts, with easy cut-and-paste code to add to the <head> section of your page. Here’s an example of a dozen fonts loaded into the page:

<link href="https://fonts.googleapis.com/css?family=Aldrich|Arima+Madurai|Arvo|Henny+Penny|Indie+Flower|Libre+Baskerville|Pirata+One|Poiret+One|Sancreek|Satisfy|Share+Tech+Mono|Smokum|Snowburst+One|Special+Elite" rel="stylesheet">

Then, you can access those fonts in your Javascript, e.g.

.attr("font-family", function(d,i) {return i<5 ? "Arvo" : "Sancreek"; })

Note that fonts with spaces in their names have a plus symbol in the link string, e.g. “Henny+Penny”, but when you specify that font as an attribute, you need to use the space, e.g. “Henny Penny”.
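A tiny illustrative helper for this conversion:

// Convert a Google Fonts link name to the attribute form,
// e.g. "Henny+Penny" -> "Henny Penny".
function fontName(linkName) { return linkName.replace(/\+/g, " "); }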

Underline: “text-decoration”

Underlines in SVG are disappointing: a single solid underline – no dashes, no wavy styles, no double lines. Blah. You could make your own underlines – it’s just a line added to text – but then you have to worry about interference with descenders, getting the right lengths and so on. Not for this post. For simple underlines, set the attribute “text-decoration” to “underline”. For this post, I tried to make underlines of different lengths by applying underlines to portions of text using <tspan>, and was surprised that different browsers gave different results. Use at your own risk.
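For completeness, here’s a minimal fragment in the same style as the other examples (the alternating condition is just illustrative):

.attr("text-decoration", function(d,i) {return i%2 ? "underline" : "none"; })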

Font width: via “font-family”

Font width could be manipulated the same way that oblique text is created above – using scale() instead of skewX() – but scaling text is highly frowned upon by typographers (see Lupton’s type crimes, and more generally a really nice introduction to typography).

Instead, we can use a font that has been well designed with lots of different widths. Saira, at Google Fonts, comes in 4 widths. So, you just need to load all the width variants, then access them using “font-family”. Here’s a trivial example:

.attr("font-family", function(d,i) {return i<5 ? "Saira" : "Saira Condensed"; })

Btw, thanks Omnibus Type for making a well-designed open-source font super-family in a variety of widths and weights freely available.

Spacing (aka Tracking): “letter-spacing”

The spacing between letters can be adjusted; this is often done on maps to indicate features that span a large area, such as  R o c k y  M o u n t a i n s. It is easily accessed in SVG via the attribute “letter-spacing”. Set the value in type coordinates using fractions of an em: 0em is the default spacing. Negative values such as -0.1em will pull the letters tighter together; positive values such as 0.5em will space them further apart. Note browser inconsistency: Chrome seems to do OK, Firefox and IE not.

.attr("letter-spacing", function(d,i) {return return i*.05-.1 + "em"; })

Outlines: “stroke” and “stroke-width”

I don’t like to use outlines on text, but you can use them. Generally, outlines don’t work well on thin or lightweight fonts; you need to start with a fairly heavyweight font without much detail (e.g. Saira Extra Bold, Arial Black, Source Code Pro Black, etc). Then it’s just a matter of setting the attribute “stroke” to a color such as black, “stroke-width” to some very small value, and “fill” to none. Notice how it has a very limited effective range, as shown in the example: the stroke is almost invisible on Abe and, at the other end, the insides of the a in Ian and the e in Gem just turn into blobs.
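Here’s a minimal fragment in the style of the examples above; the 800 weight and 0.5px width are illustrative starting points to tune per font:

.attr("font-family", "Saira")
.attr("font-weight", 800)
.attr("stroke", "black")
.attr("stroke-width", "0.5px")
.attr("fill", "none")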

Baseline Shift: “dy”

You can shift letters up and down using the attribute dy. (If you want to put text on a path, see the earlier post regarding microline text.) You can provide a list of values to dy; each successive character will take each successive value. Here’s a simple example:

.attr("dy", "0 0.2 -0.1")


Combos:

All the above can be used together in any combination. See the codepen examples.

Variable Fonts:

In theory, variable fonts can make other unique features of fonts available to D3. I haven’t figured out how to do this directly in a line of SVG (i.e. by setting an attribute tag); presumably one needs to dig into the style tag or CSS to connect these together. Maybe someone else will do this and post examples?
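One plausible route – an untested sketch, assuming a variable font with a weight axis is loaded on the page – is to set the CSS property font-variation-settings via d3’s style() method:

.style("font-variation-settings", function(d,i) {
  return '"wght" ' + (i*100+100);   // vary the weight axis per element
})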


Text in Visualization – thesis on-line

You can now find my full thesis on-line. Instead of reading the whole thesis to learn about the design space of text in visualization, there is a two-page overview that summarizes the entire thesis on pages v–vi:

[Figure: two-page design-space overview from the thesis, pages v–vi]

The first half of the thesis (page v, on the left) methodically defines the design space by reviewing many examples. The second half (page vi, on the right) then tests the breadth of the design space by creating many different kinds of extended and novel visualizations and provides general critiques. If you want to drill down into any area, the little blue subscripts are links to the corresponding chapters.

For readers of the blog, you’ll find more detail on many items previously discussed here.


Word Stems Visualized

In this blog there have been many posts of words visualized where differences are accentuated and encoded using bold, italics, underlines, etc. But what if you want to visualize the similarities?

Stemming is a basic task in a lot of text analytics, where the same semantic word has variant spellings, for example, to indicate different verb tenses (e.g. swim, swam, swimming). But there are also interesting derivations with different meanings, e.g. swim, swimmer, swimmable. So, how could you visualize these to focus on the commonality across the word roots, not the differences?

Stem & leaf plots are possible, but previous examples shown here create lists of leaves, not well suited to comparing syllables. Word trees have been discussed here before, but word trees put a big gap between different parts of text and may vary sizes and weights of different chunks of text in the tree.

Here are six English word sets, each with a common five letter root:

[Figure: six root-word plots of English word sets, each with a common five letter root]

Visually, there are six root-word plots here. Each has the common root word running vertically along the left side of the plot as a stem (e.g. night-, ortho-, micro-, etc.) and affixes branching out horizontally along the right side. The left root and right affix together indicate full words (e.g. nightcap, nightclub, nightfall, etc.). When there are common intermediate syllables, these span across the common words (e.g. graph in orthographic, orthographical, orthography).
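The grouping behind such a plot is essentially a longest-common-prefix computation. A minimal sketch (the word list beyond the post’s ortho- examples is illustrative):

// A minimal sketch: strip a known root, then bucket the affixes by their
// first few letters to expose shared intermediate syllables.
var root = "ortho";
var words = ["orthodontist", "orthogonal", "orthographic",
             "orthographical", "orthography", "orthopedic"];

function groupAffixes(words, root, chunk) {
  var groups = {};
  words.forEach(function(w) {
    var affix = w.slice(root.length);
    var key = affix.slice(0, chunk);
    (groups[key] = groups[key] || []).push(affix);
  });
  return groups;
}

console.log(groupAffixes(words, root, 5));
// {donti: ["dontist"], gonal: ["gonal"],
//  graph: ["graphic", "graphical", "graphy"], pedic: ["pedic"]}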

Visually scanning any group means that the affixes can be easily compared. For example, night- -cap, -club, -fall, -ie, -ingale are all derived words with completely different affixes, all referring to completely different objects. Under micro-, microbe and microbiology have much more commonality in meaning, being about tiny life forms or the study thereof — although very different in meaning from the words microchip and microcosm. Under astro- there are two very different branches of study, namely astrologer, astrological, astrology versus astronomer, astronomical and astronomy.

You can see that the first column, showing words starting with night- and with stand-, has root words that are independent words used to form compound words. Visually, these compound words don’t have additional shared syllables: the prefix is being used to create new words for unique objects. However, the latter two columns have Greek prefixes (ortho-, micro-, chrom-, astro-) — none of these prefixes are independent words. And in each of these, there are common syllables indicating subsets of related words. At the same time, these common prefixes can be used to create new, highly different words that deviate more from the others, such as astroturf or microchip.

From a design standpoint, the visual layout borrows from stem & leaf plots, with additional intermediate grouping and only singular leaves (so, not much like a stem & leaf plot :-). Design-wise, it also seems problematic that the words don’t split quite on syllables: for example, the plot shows astro-nom-er whereas it should be as-tron-o-mer.

Technically, it is not easy to create text of different sizes and widths that all visually appear to have similar stroke weights. Ideally, a font based purely on strokes rather than fills would work well for this. The early vector-based computer fonts by Allen Hershey would be great (I used them once-upon-a-time on an old Tektronix 4014). However, their obscure format isn’t readily adaptable to modern font standards. Please, Frank Grießhammer, I hope you can find the time to release the Hershey fonts in OTF format! This is one example of a real-world application for vector fonts.
