Using colors to represent different categories of data is challenging to scale past ten unique colors. Many authors and researchers recommend against it. ColorBrewer2 doesn’t go past 12 unique colors. D3.js offers three variants of color scales with 20 unique colors. But what if you want more? Suppose you have a line chart with 50 lines, or some kind of choropleth map with more than 20 categories?
Dashes Don’t Work
With line charts, one obvious answer is to alter the line style, for example some continuous lines and some dashed lines. In this way a color can be reused: once in a normal style, again as a dashed variant, and again as a dotted variant. I used to use this approach, but found out that users don’t like dashed lines. Why? Dashes might have a gap at a turning point, and many financial users want certainty at the local highs and lows. Dashes are also confusable with gaps: real data may contain a gap, and you want to depict it explicitly because a gap in data is meaningful.
Like many things in data visualization, researchers want to invent new techniques. But first, it may be more constructive to see how other people solved the problem.
I like to look at cartographers, because there are 500 years of printed maps to consider for inspiration. There are many variants on line styles: dots, dashes, combinations of dots and dashes, and pairs of lines:
Many different simple geometric glyphs can be embedded in lines to differentiate between them: dots, circles, stars, diamonds, boxes, and so on.
Charts, of course, borrow the idea, as seen in Excel’s charts.
These glyph-based line styles could be combined with various colors to create a wide number of different lines, e.g., orange line with stars, orange line with circles, blue line with stars, blue line with circles, and so on.
But lines with glyphs create visual noise. The high point of a line is confounded with a star, making the highest point appear a few pixels higher. And the entire scene becomes cluttered with circles, dots, stars and diamonds. One suspects that cognitive load may be increased, but independent studies would be needed to confirm the hypothesis.
25 Pair Color-coded Telephone Wires
Instead of glyphs, this post is focused on alternating colors on lines, borrowing an idea from telephone wires.
Interestingly, wire-based telephony had the same problem as line charts. You need to get a pair of wires to each household. Those wires need to be bundled together back to the telephone exchange. You need to be able to visually distinguish between the wires when you open up the bundle at any point. You could determine this interactively by testing each wire in turn, but that would be slow and cumbersome. And there are definitely more than 5 houses per exchange, so you need some kind of categorization scheme to differentiate among many different lines.
I remember as a kid in our suburban house we had open floor joists in the basement and my dad had wired up phone jacks in each room (this is pre-wireless technology). Unlike 120 volt power wires (black and white), each telephone wire was predominantly one color, say red, with a second color as a small dash every half centimetre, say yellow. So, you could have red with a bit of green or orange with a bit of yellow, green with white and so on. The telephone standard supports 25 pairs of wires, which results in 50 unique color codes. For example, in the top image in this post, the yellow wire with bits of blue can be clearly distinguished from the blue wire with bits of yellow paired beside it.
This color patterning means that a wire can be visually traced through a spaghetti jumble of wires or decoded at either end without needing to see the middle. Presumably the colors were chosen and standardized to meet the needs of telephone repair, for example, under poor lighting conditions. And, given the standard has been around for many decades with worldwide use, it is probably safe to assume that it has some degree of effectiveness.
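As a rough sketch of how 25 pairs become 50 codes: the commonly documented 25-pair scheme combines five "major" colors with five "minor" colors, and each pair contains one wire in each orientation. The color names below come from that general scheme rather than from this post, and the function name wire_codes is purely illustrative.

```python
from itertools import product

# Assumption: major/minor color names as commonly documented for the
# 25-pair telephone color code; they are not listed in this post.
MAJOR = ["white", "red", "black", "yellow", "violet"]
MINOR = ["blue", "orange", "green", "brown", "slate"]


def wire_codes():
    """Return 50 (base, stripe) tuples: two wires for each major/minor pair."""
    codes = []
    for major, minor in product(MAJOR, MINOR):
        codes.append((major, minor))  # mostly major color, small dashes of minor
        codes.append((minor, major))  # mostly minor color, small dashes of major
    return codes


codes = wire_codes()
print(len(codes))  # 5 major x 5 minor x 2 orientations = 50
print(("yellow", "blue") in codes, ("blue", "yellow") in codes)
```

Because the major and minor sets do not overlap, swapping base and stripe always yields a distinct code, which is why the yellow wire with blue and the blue wire with yellow can sit side by side without confusion.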
Using the 25 Pair Color Code with Visualization
What does this mean for data visualization? Using the same approach, line charts could be created with 50 uniquely identifiable lines. The clutter associated with glyphs on lines does not occur, and the objection to gaps associated with dash styles no longer applies, because the line itself is never broken.
Areas could be filled with 50 uniquely identifiable color combinations, and the pattern orientation, size and glyph remain open to either express other data or allow for aesthetics. Points, such as those in a scatterplot, are harder: the approach won’t work unless the points are fairly large, and even then, given the lack of association between adjacent points, it might not work well perceptually.
Color-coded lines are easy to implement in SVG (and D3). A line can be plotted twice: the under-line in the dominant color, followed by a second line drawn on top with the secondary color and a dash array (stroke-dasharray). Similarly, for areas, SVG patterns can be created.
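Here is a minimal sketch of that two-pass idea, written in Python that simply emits SVG markup as text. The function names, the dash lengths and the stripe sizes are all illustrative assumptions, not values from the telephone standard or from the chart in this post.

```python
def striped_line(points, base, stripe, width=3, dash=(3, 12)):
    """Return SVG markup that draws one polyline twice.

    The first pass is a solid line in the base color; the second pass
    reuses the same points with the stripe color and a dash array, so
    the base color shows through the gaps and the line is never broken.
    """
    pts = " ".join(f"{x:g},{y:g}" for x, y in points)
    dasharray = ",".join(f"{d:g}" for d in dash)  # assumed dash/gap lengths
    common = f'points="{pts}" fill="none" stroke-width="{width:g}"'
    under = f'<polyline {common} stroke="{base}"/>'
    over = f'<polyline {common} stroke="{stripe}" stroke-dasharray="{dasharray}"/>'
    return under + "\n" + over


def striped_pattern(pattern_id, base, stripe, size=8, band=2):
    """Return an SVG <pattern> for area fills: base color with diagonal stripes."""
    return (
        f'<pattern id="{pattern_id}" width="{size}" height="{size}" '
        f'patternUnits="userSpaceOnUse" patternTransform="rotate(45)">'
        f'<rect width="{size}" height="{size}" fill="{base}"/>'
        f'<rect width="{band}" height="{size}" fill="{stripe}"/>'
        f'</pattern>'
    )


def svg_document(body, width=220, height=100):
    """Wrap markup in a minimal standalone SVG document."""
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">\n'
        f'{body}\n</svg>'
    )


points = [(0, 80), (50, 20), (100, 60), (150, 10), (200, 50)]
body = "\n".join([
    f'<defs>{striped_pattern("gold-blue", "gold", "blue")}</defs>',
    '<rect x="205" y="10" width="12" height="80" fill="url(#gold-blue)"/>',
    striped_line(points, "gold", "blue"),
])
print(svg_document(body))
```

Saving the printed output as an .svg file should show a gold line with short blue dashes next to a small swatch filled with the matching pattern. In D3 the same effect would presumably come from appending two paths that share one line generator, with the dash array set only on the second.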
Note that the colors in the telephone wire standard include colors such as black and white. Given that visualizations are typically on a white or black background, the initial colors need to be tweaked for visualization. Perhaps pink instead of white when used on a white background.
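One way to express that tweak is a small lookup keyed by background color. This is only a sketch: the white-to-pink swap is the suggestion from the paragraph above, while the black-to-gray swap for dark backgrounds and the names SUBSTITUTES and adjust_for_background are assumptions added for illustration.

```python
# Substitutions per background color. Only white -> pink comes from the
# text; black -> gray on a black background is an assumed counterpart.
SUBSTITUTES = {
    "white": {"white": "pink"},
    "black": {"black": "gray"},
}


def adjust_for_background(color, background):
    """Swap a color that would vanish into the background; keep all others."""
    return SUBSTITUTES.get(background, {}).get(color, color)


for color in ["white", "black", "blue"]:
    print(color, "->", adjust_for_background(color, "white"))
```

The substitute has to stay distinct from the other 9 colors in use, so any replacement beyond pink would need to be checked against the full palette.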
So here are 50 lines in a pseudo-random line chart, where each line is colored using a 25-pair color encoding:
A couple of things worth noticing:
- The approach does work in that there are no gaps and each line is clearly and uniquely encoded. Yellow with a bit of green hits the lowest low on this chart.
- A line chart with 50 lines is very crowded. You can see sort-of macro-patterns, with a density of lines starting a bit higher and trending down.
- You can identify where lines appear and reappear: for example, the purple line with a bit of pink at the top left reappears at the top again on the right side. You can even visually trace a line, but that requires considerably more effort, particularly through an area of congestion.
Does 25-pair color coding really work? The result above seems promising but inconclusive. More tests would need to be done. And more importantly, what are the tasks that a user might need to do on a 50-line line chart? Tracing is an interesting task to consider. What are the other tasks?