Lines of Code is a Bad Metric, Either Way
I found this passage from the Dropbox post simplistic:
In the process of converting, we shaved off more than 5000 lines of code, a 21% reduction. Granted, many of those lines looked like this:
[lots of lines with only brackets and semi-colons]
Regardless, fewer lines is beneficial for simple reasons — being able to fit more code into a single editor screen, for example.
Measuring reduction in code complexity is of course much harder, but we think the stats above, especially token count, are a good first-order approximation.
It is instructive that the only reason actually given for preferring fewer lines is being able to see more of the code (which may be important for you!). This is the mirror image of the old project manager’s method of measuring programmer productivity by lines of code written. The argument against that method, rightly, was that LOC measures how much you typed, not how much quality code you wrote. Strict use of LOC as a metric can introduce dysfunctional dynamics: people inflate their LOC by not refactoring their code properly, which creates future maintenance problems.
But this doesn’t mean that fewer LOC automatically translates into quality code. It is generally true that a line not written is a line that cannot contain bugs. But when you replace multiple lines of code with a single line, you are sometimes not eliminating those bugs; you are just packing all the potential bugs into one line. For example, if you replace an “if-else” statement with the “?:” operator, you reduce five lines of code to one statement. But you didn’t eliminate any bug. You just folded the lines together.
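To make the if-else point concrete, here is a minimal sketch in Python, whose conditional expression plays the role of the “?” operator. The function names and the shipping rule are hypothetical, invented only for illustration. Assume the intended rule is that orders of 50 or more ship free; both versions carry the same off-by-one mistake.

```python
def shipping_fee_long(order_total: float) -> float:
    # Five-line version. Deliberate bug: the (assumed) rule says orders of
    # 50 or more ship free, but `>` leaves out exactly 50.
    if order_total > 50:
        fee = 0.0
    else:
        fee = 4.99
    return fee


def shipping_fee_short(order_total: float) -> float:
    # One-line version using a conditional expression. The comparison is
    # unchanged, so the bug is unchanged too.
    return 0.0 if order_total > 50 else 4.99
```

An order of exactly 50 is charged 4.99 by both functions. The line count dropped; the defect count did not.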
Another case is where you eliminate intermediate variables and roll them into a final statement, which looks especially neat if you can get a fluent interface going on. The problem is that each section of a chained statement can fail at runtime and so you need to have many more lines simply to ensure that the code works as intended or fails gracefully. Any serious code base will always have a significant percentage of code (and libraries) for error-checking and resilience.
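A rough sketch of that trade-off follows. The `USERS` data and both function names are hypothetical stand-ins, not anything from the Dropbox code. The chained version is one line; the checked version spends most of its lines on what can go wrong.

```python
from typing import Optional

# Hypothetical in-memory data standing in for a real lookup.
USERS = {
    "ann": {"address": {"city": " Pune "}},
    "bob": {"address": None},
}


def city_chained(name: str) -> str:
    # One neat line, but every link can fail at runtime:
    # unknown user (KeyError), address of None (TypeError),
    # missing "city" key (KeyError).
    return USERS[name]["address"]["city"].strip().upper()


def city_checked(name: str) -> Optional[str]:
    # Same result on good data; returns None instead of raising on bad data.
    user = USERS.get(name)
    if user is None:
        return None
    address = user.get("address")
    if not address:
        return None
    city = address.get("city")
    if not isinstance(city, str):
        return None
    return city.strip().upper()
```

For "ann" both return "PUNE". For "bob" the chained version raises a TypeError, while the checked version returns None. Whether None is the right way to fail gracefully depends on the caller; the point is only that the extra lines are doing real work.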
Another instance is when you reduce code by moving duplicated code to common classes or methods. This seems like a sure-fire way of reducing huge chunks of code. But centralized code with global side effects can be dangerous because it can be called from any place in your code base. So you need to have a good state machine (and good scoping) to ensure that the code is not executed at inappropriate times.
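One way to picture the state-machine guard is the sketch below. The `Order` class, its states, and the `charge` routine are all hypothetical, chosen only to show a shared routine with a side effect refusing to run at the wrong time.

```python
from enum import Enum, auto


class State(Enum):
    DRAFT = auto()
    SUBMITTED = auto()
    PAID = auto()


class Order:
    # Allowed transitions; anything not listed here is rejected.
    _ALLOWED = {
        State.DRAFT: {State.SUBMITTED},
        State.SUBMITTED: {State.PAID},
        State.PAID: set(),
    }

    def __init__(self) -> None:
        self.state = State.DRAFT
        self.charges = 0  # stands in for a real side effect

    def _move_to(self, new_state: State) -> None:
        if new_state not in self._ALLOWED[self.state]:
            raise RuntimeError(
                f"cannot go from {self.state.name} to {new_state.name}"
            )
        self.state = new_state

    def submit(self) -> None:
        self._move_to(State.SUBMITTED)

    def charge(self) -> None:
        # Centralized routine reachable from anywhere in the code base.
        # The transition check runs first, so the side effect only
        # happens when the order is in the SUBMITTED state.
        self._move_to(State.PAID)
        self.charges += 1
```

Calling `charge()` on a draft order, or a second time on a paid order, raises RuntimeError and leaves `charges` untouched. The guard costs lines, which is exactly why deduplication alone does not tell you how much code you end up with.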
People also tend to forget that a vast portion of the code is not in the code you write, but in the libraries that you use. One aspect is that the total LOC is way more than the LOC usually counted. But another aspect is that people shouldn’t be counting some of the LOC they write. For example, if you have written a bunch of code that is reusable and tested (both in test and production environments), then for all intents and purposes, that is code equivalent to an external library.
What I am trying to get at is that in the twenty thousand lines of Dropbox code, a huge percentage of the code is already tested and working. Nobody even looks at much of it, because it has a clear API. Unless there is a fundamental change to the architecture or a need for a significant improvement in performance, that code won’t be touched. So why count it? The code you should count is the code that is in play. This figure should be kept small, not only by reducing what is written but also by moving code into libraries that can be tested and forgotten.
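If you did want to count only the code in play, a crude sketch might look like this. The path prefixes and the idea of marking some directories as "settled" are my own assumptions for illustration; nothing here reflects how Dropbox organizes or counts its code.

```python
from typing import Dict, Tuple


def loc(source: str) -> int:
    # Count non-blank lines. Deliberately crude, like any LOC measure.
    return sum(1 for line in source.splitlines() if line.strip())


def loc_in_play(files: Dict[str, str], settled_prefixes: Tuple[str, ...]) -> int:
    # Skip files under prefixes treated as tested-and-forgotten library code.
    return sum(
        loc(text)
        for path, text in files.items()
        if not path.startswith(settled_prefixes)
    )


# Hypothetical two-file project: one settled library file, one active file.
files = {
    "lib/retry.py": "a\nb\n\nc\n",
    "app/view.py": "x\n\ny\n",
}
total = sum(loc(text) for text in files.values())  # 5
in_play = loc_in_play(files, ("lib/",))            # 2
```

Moving code from `app/` to `lib/` lowers the in-play figure without deleting a single line, which is the second way of keeping that number small.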
Published at DZone with permission of Krishna Kumar, DZone MVB.