There has been recent discussion on equal area cartograms (aka tile maps), for example, NPR Hexagon Maps. But why are they colored symmetric shapes? For example, these equal area cartograms from UK publications use squares, hexes and circles:
And what other alternatives are there?
Problems with Choropleth Maps and Distorted Cartograms
Choropleth maps are extremely popular (100,000+ on Google search), but have several problems:
- Small areas can be invisible (e.g. Dubai, Singapore are not visible on a world map, Rhode Island may be hardly visible on the U.S. map).
- Perception can be biased by large regions that are more prominent than small regions. This is particularly problematic if the task requires that regions be perceived equally (e.g. comparison of policy effectiveness).
- Just because it’s a map doesn’t mean readers can identify the various shapes. Identifying a selected area or finding a named target can be difficult, e.g. 63% of young adults in the USA could not locate Iraq on a map in a National Geographic survey.
- Data encoding is typically limited to a single value, such as color hue or brightness.
So instead, there are other solutions, such as the cartogram, which varies the size of geographic regions in accordance with the data. These distorted cartograms have other issues, for example:
- The geometry may be stretched and distorted. If 63% of young adults had trouble identifying Iraq in the most familiar map representations, they are going to be even less likely to find it once it’s been squashed and stretched.
- Squashing and stretching sometimes leaves some shapes distorted beyond recognition (see Alaska in the NPR link).
- Sizing shapes runs into potential data problems. What happens to items with a zero value – should they completely disappear? What about items with negative values? Or items with null values?
Therefore, equal area cartograms are an interesting alternative. Suppose you’re interested in questions of policy: you want to see what each country (or state, or county, or town) is doing: land area and population are irrelevant – each entity has its own stance and should be the same size. One could argue the same for other geo-statistics too – GDP, inflation, etc – why should land area or population size be so important as to dominate the visual representation? Here’s an equal-area cartogram from Bloomberg news:
Every geographic entity gets the same size; positional adjustment means that adjacent areas remain adjacent and nothing overlaps. This has several advantages:
- Because each item is the same size, it is perceived with the same weighting: there’s no perceptual bias towards geographies with large areas.
- All locations appear, even those with zero, null or negative values.
- Because the shapes are regular, it’s more likely that a label can fit. A label can help people identify locations without relying on interactions such as tooltips.
So equal area cartograms can be a good choice for addressing some types of questions and data. But there’s more. Here are a few other thoughts:
Beyond symmetric shapes
Most examples of equal area cartograms have nice symmetric shapes: squares, hexes, circles. These symmetric shapes are aesthetically pleasing and they pack together nicely without weird little holes in between them.
But they don’t have to be!
When working with regular shapes that are similar in width and height, there are challenges fitting them together when the original geographies were narrow or wide. When you try to make the United States out of these shapes, there are only 3 states on the west coast and more than a dozen on the east coast. Instead, consider rectangular shapes of equal area. The rectangular proportions can be adjusted in different parts of the country to better fit the space, the area remains the same, and they can still be tiled to fit neatly together. Here’s a map that I put together a few years ago of the United States as equal area rectangles:
Things don’t line up exactly and the north-east uses a callout, but now the designer has more freedom with the layout. (To be fair, others have made differently proportioned equal area shapes.) But there’s still a problem: each rectangle is still showing only a single variable using color. What if you want to show more than one variable?
When working with a cartogram, the labels are desirable since you’re now dealing with a bunch of arbitrary shapes and the viewer needs cues and landmarks to orient themselves around local geographies. Unfortunately, we might just color the shapes and apply a label as a separate afterthought.
Sometimes, though, there may be a need to convey more than a single variable. And it’s really hard to show more than one variable with color – sure it can be forced in, e.g. using hue to convey one variable and brightness to convey a second variable, but that’s still limited to two variables and some people may have challenges perceiving brightness separately from hue (e.g. what is brown? is it a unique hue or a variation of some other hue – and if it is a dark version of a hue, which hue is it?).
Interestingly, fonts have many visual attributes that can be manipulated independently: weight (e.g. bold), angle (e.g. italic), case, underline, font family, condensed/expanded, size, color, etc. Some of these cues are perceptually strong (e.g. bold). Some combinations can be easily parsed separately (e.g. underlines vs. bold). So, here’s an equally weighted cartogram where the country codes display multiple variables:
Using multivariate labels means that complex questions can be asked of this graphic. For example, are there countries with high HIV (all italic) and short lives (all lowercase) even though they have high health expenditures (all bold)? Sadly yes (rwa, zaf – i.e. Rwanda and South Africa). How about countries with long lives (all uppercase) and low health expenditures (lightweight)? Yes (not USA, try south-east Asia).
You’ll notice that all the countries are identified by three letter ISO codes. Every country has an equal number of letters and all the data is encoded in the label. The object showing the equal area is the space associated with the label, not some underlying geometry. Therefore the containing box, or hex, or circle, is no longer required. And the background can be used for something else, in the above example to differentiate between the major continents and island nations. This may help the viewer orient themselves in the scene.
So far, these equal area cartograms show equal areas, multiple variables and mnemonic labels. It’s easy to lay out 50 squares by hand, but this is something a computer algorithm should do. And the layouts shouldn’t need to be a grid. This is something I’m currently investigating. Here are some quick snaps using force-directed layouts on UK postcodes.
There are 5 data attributes per UK postcode area in this map:
- Letters indicate the region, e.g. B for Birmingham, M for Manchester, OX for Oxford and so on.
- Case indicates number of bedrooms per person: uppercase if more than one bedroom per person, lowercase if less than one. Regions around London (nw, wc, se, e) are lowercase – space is at a premium.
- Oblique angle indicates median age: reverse slope indicates a younger age, forward slope (i.e. like typical italics) indicates older age. Old along the south coast, young around London and some of the other big cities.
- Weight indicates population in the area. Cities such as Manchester and Birmingham pop out. London does not, as its population is split across many postcodes.
- Color indicates the occupation with the highest difference vs. national average. Green for agriculture in the west, blue for finance around east London, yellow for mining in the north-east, etc.
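To make the five encodings above concrete, here is a minimal Python sketch of how one postcode area's data might be turned into label styling. It is not the code behind the map: the `PostcodeArea` fields, the `label_style` function, the national figures and the 100 to 900 weight scale are all assumptions for illustration. Only the direction of each encoding and the three example colors come from the list above.

```python
from dataclasses import dataclass

@dataclass
class PostcodeArea:
    code: str                   # postcode area letters, e.g. "OX"
    bedrooms_per_person: float
    median_age: float
    population: int
    occupation_share: dict      # occupation -> share of the local workforce

# Hypothetical national figures, invented for this sketch.
NATIONAL_MEDIAN_AGE = 40.0
NATIONAL_OCCUPATION_SHARE = {"agriculture": 0.01, "finance": 0.04, "mining": 0.005}
# Colors taken from the examples in the text; other occupations fall back to grey.
OCCUPATION_COLOR = {"agriculture": "green", "finance": "blue", "mining": "yellow"}

def label_style(area, max_population):
    """Return the text and font attributes for one postcode label."""
    # Case: uppercase if more than one bedroom per person, lowercase otherwise
    # (the text does not say what happens at exactly 1, so that is an assumption).
    text = area.code.upper() if area.bedrooms_per_person > 1 else area.code.lower()
    # Oblique angle: forward slope for older areas, reverse slope for younger ones.
    slant = "forward" if area.median_age >= NATIONAL_MEDIAN_AGE else "reverse"
    # Weight: population scaled linearly onto 100..900 (assumed scale).
    weight = 100 + 100 * round(8 * area.population / max_population)
    # Color: occupation with the largest positive gap over the national share.
    top = max(
        area.occupation_share,
        key=lambda k: area.occupation_share[k] - NATIONAL_OCCUPATION_SHARE.get(k, 0.0),
    )
    return {
        "text": text,
        "slant": slant,
        "weight": weight,
        "color": OCCUPATION_COLOR.get(top, "grey"),
    }

# Made-up numbers for a London-like area.
example = PostcodeArea("NW", 0.8, 33.0, 500_000,
                       {"agriculture": 0.001, "finance": 0.12, "mining": 0.001})
style = label_style(example, max_population=1_000_000)
```

Because each attribute is computed independently, any one of them can be swapped for a different variable without touching the others, which is the appeal of using font attributes in the first place.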
If you like any of the ideas feel free to use them and send me a comment, email or tweet.
Addendum: Layout Notes
So one question I had was regarding the algorithms to create the layouts.
1. The equal area cartogram of the United States, above, was hand-edited as a quick proof of concept.
2. The global map showing country health expenditures, life expectancy, HIV prevalence was created following a sort of reverse of the bin-packing algorithm (i.e. a greedy algorithm taking out the biggest empty items first):
- Start with a largish grid, e.g. every 2 degrees of latitude and longitude, such that each country is in a unique grid cell (cell shading can also be determined whether the extents of the cell are land or water).
- Walk through all columns and all rows deleting any empty row or empty column.
- Then find the longest partially empty row (or column). For its remaining non-empty cells, inspect adjacent rows (or columns) for empty cells in those positions, and delete the two partial rows together. E.g. say the grid has 20 columns; row 5 is empty from col 1 – 17, and row 7 is empty from col 18 – 20. Delete these two partial rows, which together make up one full row of 20 empty cells.
- Repeat until a stopping criterion is met (e.g. less than 30% of the original area remains).
- In later steps, a deleted row may consist of 3 or 4 partial rows.
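The steps above can be sketched roughly in Python. This is a simplified illustration, not the code used for the map: the function names are made up, locations are assumed to be given as (latitude, longitude) centroids, and the partial-row step is replaced by a much simpler stand-in that only merges two directly adjacent rows when they share no occupied column. The stopping criterion and the column-wise equivalent are left out.

```python
import math

def bin_to_grid(centroids, cell_deg=2.0):
    """Assign each named location to a (row, col) cell; row 0 is the northernmost."""
    cells, used = {}, set()
    for name, (lat, lon) in centroids.items():
        cell = (math.floor((90.0 - lat) / cell_deg),
                math.floor((lon + 180.0) / cell_deg))
        if cell in used:
            # The method needs one location per cell, so a finer grid is required.
            raise ValueError(f"{name} shares a cell; use a smaller cell_deg")
        used.add(cell)
        cells[name] = cell
    return cells

def drop_empty_rows_and_columns(cells):
    """Delete every completely empty row and column by renumbering the rest."""
    rows = sorted({r for r, _ in cells.values()})
    cols = sorted({c for _, c in cells.values()})
    row_index = {r: i for i, r in enumerate(rows)}
    col_index = {c: i for i, c in enumerate(cols)}
    return {n: (row_index[r], col_index[c]) for n, (r, c) in cells.items()}

def merge_disjoint_adjacent_rows(cells):
    """Simplified stand-in for the partial-row step: merge rows r and r+1
    whenever no column is occupied in both of them."""
    cells = dict(cells)
    merged = True
    while merged:
        merged = False
        occupied = {}
        for r, c in cells.values():
            occupied.setdefault(r, set()).add(c)
        for r in sorted(occupied):
            if r + 1 in occupied and occupied[r].isdisjoint(occupied[r + 1]):
                # Shift row r+1 and everything below it up by one row.
                cells = {n: (rr - 1 if rr > r else rr, cc)
                         for n, (rr, cc) in cells.items()}
                merged = True
                break
    return cells

# Toy input with made-up coordinates.
grid = drop_empty_rows_and_columns(
    bin_to_grid({"A": (51.0, 0.0), "B": (51.0, 10.0), "C": (41.0, 0.0)}))
```

Merging only adjacent rows keeps the north-to-south order of every column intact, at the cost of compacting less aggressively than the multi-row deletions described above.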
While I liked the grid-based approach and the resulting strong grid layout, one criticism of the grid-based map was: this looks like a map from a Commodore 64. This motivated a non-grid based layout:
3. The non-overlapping force-directed layout (UK map above) uses a few different techniques:
- First, each point is placed based on its original latitude/longitude.
- A Delaunay triangulation was used to connect adjacent points together into a triangular mesh.
- A force-directed layout was desirable as the connectivity between adjacent points would retain the relative adjacencies between points. A repulsive charge per point pushed points out in dense areas (e.g. London) and edge forces kept the triangles together. The algorithm stops when a relevant criterion is met (an energy threshold).
- A finesse point: the default D3 triangulation ended up with lots of thin triangles around the perimeter as various bays were crossed until it had a convex hull. These long, thin triangles were not desirable as the force step would then pull various peninsulas together (e.g. Cornwall pulled up to Wales, or Canterbury pulled up to Norwich). Therefore the triangulation was modified so that some triangles or edges were not included in the final edge set, based on the lengths of the triangle sides and/or the angles.
- The result of the force-directed layout still had various locations where labels partially overlapped. This is because the force-directed layout is an overall energy system and two points can still be close together given the nuances of the local connections. A final step walks through all the pairwise overlapping bounding boxes and pushes labels apart iteratively (until a stopping criterion is met or a set number of iterations is reached).
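Two of these steps are simple enough to sketch in a few lines of Python. This is not the D3 code used for the UK map, just an illustration under assumptions: the triangulation is taken as already computed (a list of index triples), edges are filtered by length only rather than by angle, all labels are assumed to share one bounding-box size, and the names `prune_long_edges` and `separate_labels` are made up. The force simulation itself is omitted.

```python
import math

def prune_long_edges(points, triangles, max_len):
    """Collect triangulation edges, leaving out any longer than max_len
    (e.g. the long edges that span bays along the convex hull)."""
    edges = set()
    for a, b, c in triangles:
        for i, j in ((a, b), (b, c), (c, a)):
            if math.dist(points[i], points[j]) <= max_len:
                edges.add((min(i, j), max(i, j)))
    return edges

def separate_labels(centers, width, height, max_iters=100, eps=1e-9):
    """Push equal-size label boxes apart until no pair overlaps
    or max_iters passes have been made."""
    pts = [list(p) for p in centers]
    for _ in range(max_iters):
        moved = False
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                dx = pts[j][0] - pts[i][0]
                dy = pts[j][1] - pts[i][1]
                overlap_x = width - abs(dx)
                overlap_y = height - abs(dy)
                if overlap_x <= eps or overlap_y <= eps:
                    continue  # boxes are already apart on at least one axis
                moved = True
                # Separate along the axis that needs the smaller move,
                # shifting each label by half of the overlap.
                if overlap_x < overlap_y:
                    shift = overlap_x / 2 * (1 if dx >= 0 else -1)
                    pts[i][0] -= shift
                    pts[j][0] += shift
                else:
                    shift = overlap_y / 2 * (1 if dy >= 0 else -1)
                    pts[i][1] -= shift
                    pts[j][1] += shift
        if not moved:
            break
    return [tuple(p) for p in pts]

# Toy data: the fourth point is far away, so its edges are dropped.
pts = [(0, 0), (1, 0), (0, 1), (10, 0)]
kept = prune_long_edges(pts, [(0, 1, 2), (1, 3, 2)], max_len=2)
placed = separate_labels([(0, 0), (1, 0)], width=4, height=2)
```

Pushing one pair apart can create a new overlap with a third label, which is why the pass is repeated rather than run once; the iteration cap guards against dense clusters that never fully settle.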