I don’t mean, “which programming language should you choose?” I mean, which human language should you use to name your classes, methods, variables and other artifacts?
If you’re thinking, “Why would I use anything other than English?” then don’t worry: this article isn’t for you. This article is for programmers in places where English is not the language of business. In that situation, there is a good argument for coding in English… and a good argument for not doing so. The question being: which argument wins?
The argument for coding entirely in English is that it makes for cleaner code. You can never code entirely in the local language, because your code will always and unavoidably(*) be studded with English terms: from language keywords, from library classes and annotations, from coding conventions like JavaBeans, from XML schemas for configuration data…
Consequently, if you want to code in the local language, you will in fact end up with a mishmash of both languages – often within the space of a single name. These mixed-up names are always ugly and often confusing, particularly when the word orders of the two languages conflict. Naming everything in English makes for cleaner and clearer names.
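To show what such a mixed name looks like, here is a minimal sketch. The class Societe ("company") and its property raisonSociale ("registered company name") are hypothetical names chosen for illustration; only the get/set prefixes are dictated by the JavaBeans convention.

```java
// Hypothetical bean: French business words inside English conventions.
// Word order clashes too: a helper would be "SocieteValidator" in English
// order, but "ValidateurDeSociete" in French order.
final class Societe {
    private String raisonSociale; // registered company name (assumed field)

    // "get" comes from the JavaBeans convention, "RaisonSociale" from the business.
    String getRaisonSociale() {
        return raisonSociale;
    }

    // Same mix for the setter: English prefix, French property name.
    void setRaisonSociale(String raisonSociale) {
        this.raisonSociale = raisonSociale;
    }
}
```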
(*) Sidenote: A few French-speaking colleagues have told me they first learned BASIC with the keywords translated: SI x < 10 ALORS .... And non-English versions of MS Excel ship with all the functions translated (which causes chaos when exchanging between different-language versions, since the function names are not encoded in the file format but remain identified by their textual names). But for enterprise programming we can rule out that kind of thing.
In the domain language
The argument for coding in the local language is that one should code using the language of the domain. The Pragmatic Programmer's Tip 17 says, “Program Close to the Problem Domain”; Agile practitioners advise learning and using the language of the business. By doing this, you avoid both the effort and the risk of misunderstanding that arise from continually translating between the business vocabulary and the vocabulary used in the code.
My answer to this dilemma is: names in code should always use the language of the problem domain, even when the language of the problem domain is not English. The cost of maintaining a separate English-language business vocabulary (if you don't otherwise need one) and incessantly translating to and from it, is far greater than the cost of lost clarity from mixing two languages.
I've tried the all-English approach, on a cross-application component I wrote to manage persons, companies and their addresses. It does indeed make for prettier code, but I started to realise that it was the wrong approach round about the thirtieth or perhaps fortieth time that a fellow programmer came up and asked, “Andrew, if I want the commune of an address, should I use placeName or postTown?”
At first I put the translations in Javadoc comments, but soon realised that (1) no-one read them, (2) the comments added an extra maintenance burden, and above all, (3) it was far simpler to have a method getCommune() than a method getPlaceName() that no-one understood unless they read the Javadoc comment explaining that placeName actually meant commune. I ended up changing all the method names: it hurt for a few days, but at least I won't still be getting phone calls asking the same question, ten years from now.
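The difference between the two approaches can be sketched roughly as follows. The class name Adresse and its constructor are assumptions made for illustration, not the actual component; only the method names getPlaceName() and getCommune() come from the story above.

```java
// Hypothetical address class showing both naming styles side by side.
final class Adresse {
    private final String commune;

    Adresse(String commune) {
        this.commune = commune;
    }

    /**
     * Returns the place name. In business terms, this is the "commune".
     * The translation lives only in this comment, which readers may skip.
     */
    String getPlaceName() {
        return commune;
    }

    // The business term is the method name itself, so no translation is needed.
    String getCommune() {
        return commune;
    }
}
```

Both methods return the same value; the only thing that changes is whether the caller has to read a comment to find the right one.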
Lesson: it's not worth translating business terms into English just for the sake of it.
The need to apply judgement
The trouble with my answer is the difficulty of knowing which is the language of the domain.
- For purely business concepts, like the address example above, it's clear enough, and the business term (e.g. commune) should be conserved.
- At the other extreme, there are times when the problem domain is clearly purely technical: for instance, reading a Spring application context. In that case, there is no need to translate from English, since every programmer would be expected to understand English terms such as "read" and "application context".
- Between the two is a large grey area, where the only answer is to apply judgement.
An example may help illustrate the grey area. Recently I was asked to capitalise and remove accents from surnames before saving person data. The requirement was expressed in French: mettre le nom en majuscule et supprimer les accents. These two conversions are essentially independent of each other, and String.toUpperCase() already has the first covered.
So, with a lot of help from the Internet, I wrote a utility method to remove accents. I named it in English, because at the level of the utility method, the problem domain was the technical one of character set manipulation - and the parallel manipulation of upper-case conversion is named in English, and if I had found such a utility in commons-lang, it also would have been named in English.
Then, in the business code, I simply wrote nom = StringUtil.removeAccents(nom.toUpperCase()) (nom being “surname” in French). I could have written a wrapper function mettreEnMajusculeEtSupprimerAccents(), which would have combined the two conversions and translated between business and technical vocabulary. And perhaps I should have done; but I wasn't convinced that it helped.
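For illustration, the split between the English-named utility and the French-named business code might look something like the sketch below. The body of removeAccents() is an assumption (one common approach, using java.text.Normalizer), and the Personne class is hypothetical; neither is necessarily what the real code looked like.

```java
import java.text.Normalizer;

// Technical utility: named in English, like String.toUpperCase().
final class StringUtil {
    private StringUtil() {
    }

    // Assumed implementation: decompose each accented character into a base
    // letter plus combining marks (NFD), then drop the combining marks.
    // Letters with no canonical decomposition (e.g. "ø") pass through unchanged.
    static String removeAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }
}

// Hypothetical business class: the field keeps its domain name, "nom" (surname).
final class Personne {
    private String nom;

    void setNom(String nom) {
        // The business rule, written directly with the two technical calls.
        this.nom = StringUtil.removeAccents(nom.toUpperCase());
    }

    String getNom() {
        return nom;
    }
}
```

The business term (nom) and the technical terms (removeAccents, toUpperCase) meet on a single line, with no wrapper translating between them.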
The problem of translating between business and technical vocabulary crops up in English-only projects, too. In this case, the translation was a direct translation from one language to the other - there wasn't any conceptual translation. When there's a conceptual translation too, standard advice applies: single level of abstraction, and this from Dan North. Which means, use the business term in business-logic code, and the technical term in technical (e.g. persistence) code.
My advice is:
- Don't translate business terms (but do use English terms if the business uses them)
- Do stick with English for technical code
- In between, you have to second-guess who will be reading your code, and which choice of words will be clearer to them.
I'd be interested to hear any feedback or counter-opinions.