In all the articles I've written here I've covered a fairly broad range of topics related to data visualization: the use of tick marks and labels, data density, the problems with dual-axis charts and much more. I've touched upon the use of color a few times but only in passing. That's because I think, while interesting, the topic can be quite confusing and that makes writing short articles difficult. In this two-part series I'll try to bring together previous advice on the use of color, cover why I think it's a complex topic, define some relevant jargon, and provide links to a few resources that I have found useful. In this part I cover why what we see might not be what we expect to see. In Part 2 I'll look at picking suitable color palettes.
What Color is That?
One morning in February 2015 I awoke and checked what was going on in the world via Twitter. Everyone was talking about a white and gold dress. Or rather they were talking about what looked to me like a white and gold dress. Many felt the same as me but some strange people were claiming it was blue and black. It turns out those strange people were actually right. The viral phenomenon that was "the dress" showcased the peculiarities of our visual system.
Colin Ware describes, on page 69 of Information Visualization (third edition), how "[n]eurons processing visual information in the early stages of the retina and primary visual cortex do not behave like light meters; they act as change meters". One benefit of the complex way our visual system works is that we can usually detect a gray surface as being gray, a white surface as being white and a black surface as being black whether we're in bright sunlight or a dimly lit room and independently of the color of the illuminant. This is called "color constancy" and it shouldn't be too difficult to imagine how this could have been an evolutionary advantage in the past. Information about the light source itself is usually much less important.
To achieve color constancy the brain has to make some educated guesses about the illuminant. Sometimes it gets things wrong. This would appear to be at least part of the reason for the disagreement over the dress.
If you're creating a visual representation of some data it's rare you'll ever have to worry too much about the perceived colors of a dress. But it does still highlight the fact that sometimes we misinterpret color stimuli. Take the simple image below:
If you've never seen this illusion before you may be surprised to learn that the small squares are the same color. You can check this using the eyedropper or color-picker tool of your favorite image editing program. If you're on a Mac it's quicker to use OS X's Digital Color Meter app.
This color contrast illusion can be significant for data visualization: if you're using the same color encoding on two different backgrounds you need to check whether they really look the same. Remember the blocks of color in your key or legend too. If a chart background is, say, light gray then the background in the key should also be light gray and not white or black (we're not talking about natural illuminants here so don't expect your brain to fix it for you).
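The eyedropper check can also be done in code. The sketch below builds a small stand-in for this kind of illusion in memory and samples the two small squares to confirm they hold the same RGB value. The layout, the gray values and the function names (make_illusion, sample, to_ppm) are all assumptions made for illustration; they are not taken from the original image.

```python
# Illustrative values only: the original image's colors are not known.
DARK_BG = (40, 40, 40)
LIGHT_BG = (215, 215, 215)
SQUARE = (128, 128, 128)


def make_illusion(width=200, height=100, square=20):
    """Return rows of RGB tuples: a dark left half and a light right half,
    with one small square of the same gray centered in each half."""
    half = width // 2
    pixels = [
        [DARK_BG if x < half else LIGHT_BG for x in range(width)]
        for _ in range(height)
    ]
    top = (height - square) // 2
    offset = (half - square) // 2
    for left in (offset, half + offset):
        for y in range(top, top + square):
            for x in range(left, left + square):
                pixels[y][x] = SQUARE
    return pixels


def sample(pixels, x, y):
    """Mimic an eyedropper tool: return the RGB value at (x, y)."""
    return pixels[y][x]


def to_ppm(pixels):
    """Serialize the image as plain-text PPM (P3) so it can be viewed."""
    header = f"P3\n{len(pixels[0])} {len(pixels)}\n255\n"
    body = "\n".join(
        " ".join(str(channel) for rgb in row for channel in rgb)
        for row in pixels
    )
    return header + body + "\n"


image = make_illusion()
left_square = sample(image, 50, 50)
right_square = sample(image, 150, 50)
print("Squares identical:", left_square == right_square)
```

Writing the result of to_ppm(image) to a file with a .ppm extension gives an image most viewers can open, so the measured equality can be compared against what the eye reports.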
Not Everyone Has Perfect Color Vision
The color-sensitive cells of the retina are called cones and we (most of us) have three types — millions of each — making us "trichromats". The types are frequently referred to as red, green and blue, though it's more proper to use long (L), medium (M) and short (S), describing the wavelengths of peak sensitivity. Even this is very much a relative designation: L cones are most sensitive to light at around 580 nanometers, M cones to light at around 540 nm and S cones to light at around 450 nm (Ware, page 97). (There's no relation here with the designations for radio waves!)
Color blindness, or color vision deficiency (CVD), in humans is the result of a lack of, or deficiency in, one type of cone cell. It can be acquired or inherited and the latter is fairly common in men (around one in 12 is affected). If it is L or M cones that are lacking the resulting condition is frequently described as red-green color blindness, while blue-yellow color blindness results from defective S cones. In reality, the effect is more nuanced than these common names would suggest and a number of tools have been developed to help trichromats without a CVD ensure their work is accessible to those who do have one. My favorite is Color Oracle. It's a really simple app for Windows, Mac and (some) Linux systems that sits in the notification area (system tray) or menu bar. You click on its icon, select a form of color deficiency and it instantly (temporarily!) changes the colors on the screen to simulate the deficiency.
Problems with M cones are the most common form of CVD; the formal names are deuteranopia when the cones are missing and deuteranomaly when they are present but anomalous. As I've previously mentioned, it's a good reason to avoid using only red and green color encoding in your visualizations. If you do want to use a "traffic light" color scheme then one option is to use a secondary encoding to reinforce the differences, for example a red circle and a green triangle (perhaps with an amber square).
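As a minimal sketch of that secondary encoding, the snippet below pairs each traffic-light status with both a color and a shape and emits the markers as a small SVG legend. The status names, hex colors and function names (marker_svg, legend_svg) are hypothetical choices for illustration; any charting tool that lets you set marker shape as well as color would do the same job.

```python
# Hypothetical mapping: each status gets its own color AND its own shape,
# so the categories stay distinguishable even if the hues are confused.
STYLES = {
    "bad": ("#d62728", "circle"),
    "warning": ("#ffbf00", "square"),
    "good": ("#2ca02c", "triangle"),
}


def marker_svg(status, cx, cy, r=10):
    """Return one SVG element for the status, centered at (cx, cy)."""
    color, shape = STYLES[status]
    if shape == "circle":
        return f'<circle cx="{cx}" cy="{cy}" r="{r}" fill="{color}"/>'
    if shape == "square":
        return (
            f'<rect x="{cx - r}" y="{cy - r}" '
            f'width="{2 * r}" height="{2 * r}" fill="{color}"/>'
        )
    if shape == "triangle":
        points = f"{cx},{cy - r} {cx - r},{cy + r} {cx + r},{cy + r}"
        return f'<polygon points="{points}" fill="{color}"/>'
    raise ValueError(f"unknown shape: {shape}")


def legend_svg(statuses, spacing=40):
    """Lay the markers out in a row inside a standalone SVG document."""
    width = spacing * len(statuses)
    shapes = [
        marker_svg(status, spacing * i + spacing // 2, 20)
        for i, status in enumerate(statuses)
    ]
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="40">'
        + "".join(shapes)
        + "</svg>"
    )


svg = legend_svg(["bad", "warning", "good"])
print(svg)
```

Saving the printed string to a .svg file and viewing it through a CVD simulator is a quick way to confirm that the shapes still separate the categories when the colors do not.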