LondonLives.org is a collection of 240,000 historic manuscripts from eighteenth-century London. These have been collected, organized and analyzed. One example is Sharon Howard’s summary of 2894 coroner inquests from London, 1760-1799, in which each case includes subjects, verdicts and causes of death as well as links back to the original handwritten manuscripts. The dataset is fascinating: how might this data be visualized?
Each row in the summary dataset indicates subjects (e.g. Susanna Thompson, Sarah Cox), verbs (e.g. crushed, falling) and objects (e.g. chimney). These can be extracted using natural language processing (although note that Howard cautions that the causes of death are not fully accurate, nor is my natural language processing). The extracted words can be assembled into a hierarchy, for example verb, object, subject:
– crushed
  – chimney
    – Susanna Thompson, Sarah Cox
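As a rough sketch of that assembly step, the snippet below groups (verb, object, subject) triples into a nested verb, object, subject structure. It assumes the triples have already been extracted; the `rows` list and the `build_hierarchy` helper are hypothetical names for illustration, and the scaffold row uses a placeholder name rather than a real record. This is not necessarily how my own processing was done.

```python
from collections import defaultdict

# Hypothetical input: (verb, object, subject) triples, assumed to be the
# output of an earlier natural language processing pass.
rows = [
    ("crushed", "chimney", "Susanna Thompson"),
    ("crushed", "chimney", "Sarah Cox"),
    ("crushed", "scaffold", "Placeholder Name"),  # placeholder, not a real record
]


def build_hierarchy(rows):
    """Group subjects under their object, and objects under their verb."""
    tree = defaultdict(lambda: defaultdict(list))
    for verb, obj, subject in rows:
        tree[verb][obj].append(subject)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {verb: dict(objects) for verb, objects in tree.items()}


hierarchy = build_hierarchy(rows)
for verb, objects in sorted(hierarchy.items()):
    print(verb.upper())
    for obj, subjects in sorted(objects.items()):
        print(f"  {obj}: {', '.join(subjects)}")
```

Sorting the keys alphabetically at print time mirrors the alphabetic ordering that makes an index easy to search.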
There are many, many different techniques for visualizing hierarchies (e.g. treevis.net):
– Treemaps and sunbursts focus on accurate representation of values using area, meaning that fitting legible text can be difficult.
– Graph representations, such as nodes and links, emphasize the structure, again potentially making it difficult to fit all the labels legibly.
– Org-charts are designed to fit text, but the left-right/up-down layout can result in layouts that become very wide or very deep, making it difficult to fit an org-chart with 3000 items to a rectangular screen and display all the text at a legible size.
– More generally, most quantitative visualizations aren’t designed for representing large amounts of text.
Instead, typographers and publishers have techniques for depicting textual hierarchies not shown in any visualization compilation. Indexes and dictionaries are designed to show a large number of words in hierarchies in a very dense format. Dictionaries use a wide variety of formatting, for example, using very heavy-weight text to make the defined word stand out. Alphabetic ordering facilitates search. This supports quick non-linear skimming to the word of interest; then different formats are used within the definition (e.g. italics, small caps, etc.), facilitating jumping to the type of data of interest. Size of entries is relevant: words that have many meanings have longer entries. In effect, indexes and dictionaries are visualizations, although constrained to printed pages bound in a book. With larger screens, indexes can be set out to be entirely visible on a single screen or two.
Returning to the coroner inquests data, consider an index-like visualization. Organizing the hierarchy by verb, then object, then subject results in a list of subjects under an object, which in turn are listed under a verb. Here’s all the ways that people died under the verb CRUSHED:
Each object is listed in a separate coloured section. Each section starts with the object in bold (e.g. chimney), and has a list of named subjects in a narrow brown font (e.g. Christiana Jorden and 3 others). Individual names are links to the original manuscripts, e.g. Sarah Cox:
Given some unusual objects, a sample cause of death is shown in italics (e.g. crushed by falling chimney). More frequent causes result in taller sections: for example, being crushed by a chimney was more common than being crushed by a scaffold, while being crushed by a house or a theatre crowd was more common than being crushed by a chimney.
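To make the section layout concrete, here is a minimal sketch of how one object section could be emitted as an HTML fragment: the object in bold, a sample cause in italics, and each name as a link. The `render_section` function, the CSS class names and the example.org URLs are all assumptions for illustration; they are not the markup of the actual visualization or real LondonLives links.

```python
from html import escape


def render_section(obj, sample_cause, subjects):
    """Return an HTML fragment for one object section.

    subjects is a list of (name, url) pairs; each name becomes a link.
    """
    links = ", ".join(
        f'<a href="{escape(url, quote=True)}">{escape(name)}</a>'
        for name, url in subjects
    )
    return (
        '<div class="section">'
        f"<b>{escape(obj)}</b> "
        f"<i>{escape(sample_cause)}</i> "
        f'<span class="names">{links}</span>'
        "</div>"
    )


# Placeholder URLs: the real links point at the original manuscripts.
fragment = render_section(
    "chimney",
    "crushed by falling chimney",
    [
        ("Susanna Thompson", "https://example.org/placeholder-1"),
        ("Sarah Cox", "https://example.org/placeholder-2"),
    ],
)
print(fragment)
```

Because the names are ordinary flowing text, a section with more names simply wraps onto more lines, which is what makes frequent causes taller without any explicit size calculation.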
Also, I was curious about gender: were women more prone to certain causes of death than men (assuming that women’s deaths were investigated equally to men’s deaths)? After each object a pair of numbers indicates the gender ratio, e.g. 1/3 indicates 1 man and 3 women were crushed by chimneys. To facilitate skimming for this ratio, the background colour of each section is shaded on a scale from light blue (more men) to light red (more women). This colour scale isn’t meaningful when viewing causes with only a couple of deaths but provides some macro-scale patterns when viewing the larger dataset.
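One way to compute the ratio label and the shading is sketched below. The two endpoint colours and the linear interpolation on the share of women are my assumptions for this example; the actual visualization may use different colours or a different scale.

```python
# Assumed endpoint colours (RGB); chosen for illustration only.
LIGHT_BLUE = (200, 220, 255)  # all men
LIGHT_RED = (255, 205, 205)   # all women


def ratio_label(men, women):
    """Format the men/women pair shown after each object, e.g. '1/3'."""
    return f"{men}/{women}"


def background_colour(men, women):
    """Interpolate linearly from light blue to light red by share of women."""
    total = men + women
    # With no deaths recorded, fall back to the midpoint of the scale.
    share_women = women / total if total else 0.5
    rgb = tuple(
        round(blue + (red - blue) * share_women)
        for blue, red in zip(LIGHT_BLUE, LIGHT_RED)
    )
    return "#{:02x}{:02x}{:02x}".format(*rgb)


print(ratio_label(1, 3), background_colour(1, 3))
```

A section with 1 man and 3 women lands three quarters of the way towards light red, which is also why the scale says little for causes with only one or two deaths: a single case pushes the colour to an extreme.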
Here are the 2894 inquests presented in an index layout, as a simple interactive HTML/CSS visualization. On a 4K screen all text is legible:
Like a dictionary, the biggest blocks have the most entries: the biggest blocks are DROWNED (in column 3 in the image above) and HANGED (column 6). Both of these tend to be light blue (more men than women). The largest light red block (more women than men) is under BURNT (in the first column).
In the full visualization, the colour of subjects’ names corresponds to the verdict: brown is for deaths deemed accidents, red for homicides, green for suicides, etc. These can be interactively filtered. For example, filtering to suicides shows that the most popular method was hanging (biggest box) and that generally more male suicides are recorded than female (generally more blue than red). Many methods were used by both genders (e.g. drowning, poisoning, cutting), although gun shots are almost entirely male.
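The filtering logic itself is simple, and can be sketched as follows. The record fields (`verb`, `verdict`, `gender`) and the handful of records are hypothetical placeholders, not rows from Howard’s dataset, so the counts printed here say nothing about the real data.

```python
from collections import Counter

# Placeholder records for illustration only; not real inquest data.
records = [
    {"name": "Person A", "verb": "hanged", "verdict": "suicide", "gender": "m"},
    {"name": "Person B", "verb": "hanged", "verdict": "suicide", "gender": "f"},
    {"name": "Person C", "verb": "drowned", "verdict": "accident", "gender": "m"},
    {"name": "Person D", "verb": "drowned", "verdict": "suicide", "gender": "m"},
]


def filter_by_verdict(records, verdict):
    """Keep only the records with the given verdict."""
    return [r for r in records if r["verdict"] == verdict]


def count_by_verb(records):
    """Count records per verb; larger counts correspond to bigger blocks."""
    return Counter(r["verb"] for r in records)


suicides = filter_by_verdict(records, "suicide")
verb_counts = count_by_verb(suicides)
gender_counts = Counter(r["gender"] for r in suicides)
print(verb_counts.most_common())
print(gender_counts)
```

In the interactive version the same idea applies, with the filtered-out names hidden rather than removed from a list.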
This example is from my recent book Visualizing with Text, including a comparison to a treemap. It’s a good illustration of a visualization that provides high-level perceptual patterns (large coloured blocks) and low-level details (words and phrases) within the same visualization – which Edward Tufte refers to as micro/macro readings. The book review of Visualizing with Text by Alec Barrett succinctly summarizes the benefit as being able to “zoom with one’s attention.”
Thanks to LondonLives.org and Sharon Howard for collecting, organizing and summarizing these historic documents.