How much of what we do is based on sound understanding? And how often do we do things without really understanding why? In everything we do, from walking down the street to writing software, sometimes we understand why we do things; other times we follow superstition.
We’re trained to associate cause with effect, even if we don’t understand the connection:
Press button: get food. Hungry. Press button. Food.
At an animal level, the reason is irrelevant. As rational human beings, we’re able to understand the relationship between cause and effect, but that doesn’t mean we always behave rationally. Ever walked around a ladder rather than under it? My wife makes me laugh by saluting magpies. These superstitions are entirely irrational behaviours, but they are mostly just amusing and harmless.
But when it comes to computers, this willingness not to worry about why things work is pervasive. For example, have you ever had a conversation like the following?
Wife: I need to email someone this Word document I just wrote
Me: ok, where did you save it?
Wife: in Word
Me: ok, but is it in your documents folder or on your desktop?
Wife: I clicked File then Save.
Me: ooookaaaay, but what folder did you save it in?
Now, this isn’t an uncommon problem – people use computers all the time without really understanding what’s happening underneath. We learn a set of behaviours that do what we want. Why that set of behaviours works is irrelevant. We have a mental model that describes the world as we see it, and that model is only called into question when we need to do something new or different.
The trouble is, from the outside, it can be very difficult to tell the difference between superstition and understanding. For example, how many Windows users actually understand shortcuts? Ever tried explaining the idea to your Dad? And yet, I bet he can click links on his desktop or in his start menu without any problem. His mental model of the world is sufficient to let him work, but insufficient to let him understand why it works. As soon as you step outside the normal behaviour – for example, trying to explain the difference between deleting the desktop icon and deleting the application – that mental model is challenged and the superstitions are exposed.
For users to not question things and adopt computer superstitions is understandable. But for software developers, it’s frightening. If you don’t understand something someone is explaining, you have to challenge it – either their model of the world is wrong or yours is. Either way, neither of you will really understand until you can agree on a consistent model.
But I’ve worked with developers who effectively become superstitious programmers. They don’t really know why something works, they just know that clicking these buttons does the right thing. Who needs to know how Hibernate works, as long as I know the magic incantations (or can copy&paste from somewhere else) – I can get the job done! But as another developer on the team, without looking closely – can I tell whether you’ve set up the Hibernate mappings because you know how they work, or just copy&pasted them from somewhere else without understanding?
The trouble is, almost all programmers resort to copy&paste&change from time to time. There are some incantations that are just too complex and too rarely used to memorise, so we quite reasonably borrow from somewhere else. But the difference between a developer who uses the incantations to aid his memory and the developer who just blindly copies without thinking is incredibly subtle. It’s easy to be one while thinking you’re the other – especially if it’s something you don’t do all the time.
How many times have you found a fix for something but not understood why it worked? Do you keep investigating or move on? For example, I was investigating a race condition recently and I’d spotted an incorrect implementation of double-checked locking – but I found that fixing it didn’t actually fix the bug. I wasn’t convinced I’d implemented the double-checked lock correctly either, so I replaced it with a method that always locked. What do you know: problem fixed!
Now, given what I know of the bug – that’s not right. Double-checked locking is a performance tweak; done correctly, it shouldn’t change the thread-safety of the solution. The fact that locking in all cases has fixed the bug gives me a hint that there must be something else that isn’t thread-safe. By introducing the lock, the code that runs just after it ends up effectively locked too – a thread is now unlikely to be interrupted by the scheduler at a point where another thread is in the same critical section. That makes the race much harder to hit, but it doesn’t remove it.
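To make the comparison concrete, here is a minimal sketch of the two approaches side by side. This is not the code from the bug I was chasing: `ExpensiveResource` and `LazyHolder` are names I’ve made up for illustration, and I’m assuming Java 5 or later, where a `volatile` field is what makes double-checked locking safe.

```java
// Hypothetical resource that is costly to build; stands in for whatever is lazily created.
final class ExpensiveResource {
    final long createdAt = System.nanoTime();
}

final class LazyHolder {
    // volatile matters: without it another thread may see a half-constructed object.
    private static volatile ExpensiveResource instance;
    private static final Object LOCK = new Object();

    private LazyHolder() {}

    // Double-checked locking: only takes the lock while the instance is still null.
    static ExpensiveResource getDoubleChecked() {
        ExpensiveResource local = instance;      // first check, no lock taken
        if (local == null) {
            synchronized (LOCK) {
                local = instance;                // second check, under the lock
                if (local == null) {
                    local = new ExpensiveResource();
                    instance = local;
                }
            }
        }
        return local;
    }

    // The "always lock" version: simpler, same result, pays for the lock on every call.
    static ExpensiveResource getAlwaysLocked() {
        synchronized (LOCK) {
            if (instance == null) {
                instance = new ExpensiveResource();
            }
            return instance;
        }
    }
}
```

Both methods hand back the same single instance; the only intended difference is how often the lock is taken. So if swapping one for the other changes whether a race shows up, the cause is probably somewhere else.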
After another couple of hours’ investigation, I found the code in question – there was a singleton that was being given a reference to a non-thread-safe object. I could have left my lock in place, since it “fixed” the bug – but by following my intuition that I didn’t understand the reason, I found a much more damaging bug.
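The shape of that kind of bug, and one possible way out, looks roughly like the sketch below. It is an illustration only: `EventLog`, `EventLogDemo` and the choice of an `ArrayList` as the non-thread-safe object are my assumptions here, not the real classes involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical singleton: every thread in the process shares this one instance.
final class EventLog {
    private static final EventLog INSTANCE = new EventLog();

    // volatile so that a list swapped in by setEvents is visible to all threads.
    private volatile List<String> events = Collections.synchronizedList(new ArrayList<>());

    private EventLog() {}

    static EventLog getInstance() {
        return INSTANCE;
    }

    // The bug pattern is storing the caller's list as-is (this.events = source),
    // which shares a plain ArrayList between threads.
    // This version copies the list and wraps the copy in a synchronized view instead.
    void setEvents(List<String> source) {
        this.events = Collections.synchronizedList(new ArrayList<>(source));
    }

    void record(String event) {
        events.add(event);
    }

    int size() {
        return events.size();
    }
}

// Hypothetical helper that hammers the singleton from several threads.
final class EventLogDemo {
    static int recordConcurrently(int threads, int perThread) {
        EventLog log = EventLog.getInstance();
        log.setEvents(new ArrayList<>());        // caller hands over a plain ArrayList
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    log.record("event");
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();                   // wait for every writer to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return log.size();
    }
}
```

With the copy-and-wrap in `setEvents`, no recorded event should go missing however the threads interleave. Store the caller’s `ArrayList` directly and the count can come up short, or the list can throw, but only some of the time, which is exactly why a lock somewhere nearby can appear to “fix” it.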
Understanding is key; be wary of superstitions. If you don’t understand why something works, keep digging. If this means you slow down: good! Going fast when you don’t understand what you’re doing is a recipe for disaster.