Eliminating duplication

· Agile Zone ·

The title of this article is Eliminating duplication, and on reading it you have probably guessed that I will be talking about code. I will of course dedicate some space to code-related issues, but not only to them: duplication is a monster with many heads, and it can be found outside of code too.

But let's start with it.

Code, of course

The simplest form of code duplication is a group of lines of code which are copied and pasted. The copies will quickly go out of sync with their original source, resulting in bugs that reappear and in high refactoring costs (because you have to refactor exactly the same code in every place where it was pasted).
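As a minimal sketch of the cure, suppose a tax calculation had been pasted into two functions. All the names and the 22% rate below are hypothetical, chosen only for illustration; the point is that after extraction there is a single place to fix or refactor:

```python
# Hypothetical example: the tax rule used to be pasted into both
# invoice_total() and quote_total(); now it lives in one function.
TAX_RATE = 0.22  # assumed rate, for illustration only


def with_tax(net_amount):
    """The single home of the formerly copied-and-pasted lines."""
    return round(net_amount * (1 + TAX_RATE), 2)


def invoice_total(line_amounts):
    """Total of an invoice, given the net amount of each line."""
    return with_tax(sum(line_amounts))


def quote_total(hours, hourly_rate):
    """Total of a quote, given the hours worked and the hourly rate."""
    return with_tax(hours * hourly_rate)
```

If the rounding rule or the rate changes, only with_tax() has to be touched, and the two callers cannot drift apart.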

But there are also many non-obvious forms of duplication that we should target, like:

  • Calls to an object which are always made together, a sign of an overly fine-grained interface.
  • Needless parallel hierarchies which have to be kept parallel.
  • Multiple implementations of an interface which should actually be Bridges.
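The first item of the list can be sketched as follows. The Connection class and its methods are hypothetical: the assumption is that every caller invokes open() and then authenticate(), so the pair is moved into one coarser-grained method.

```python
class Connection:
    """Hypothetical object whose callers always call open() and then
    authenticate(): the repeated sequence is duplication at every call site."""

    def __init__(self):
        self.log = []  # records the calls, just to make the sketch observable

    def open(self):
        self.log.append("open")

    def authenticate(self, user):
        self.log.append(f"auth:{user}")

    def connect_as(self, user):
        """Coarser-grained method that owns the sequence in one place."""
        self.open()
        self.authenticate(user)
        return self


connection = Connection().connect_as("alice")
```

Callers now express their intent in one call, and a change in the sequence (an extra handshake step, say) is made in a single method.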

Speaking about not actual code, but about comments

Your docblocks (or Javadoc) should not tell me anything that is already in the code.

Comments can often be eliminated by renaming variables and extracting methods and classes. It's more difficult for a method name to get outdated with respect to its own code than for a comment, which is usually filtered out by our minds (which sometimes act like a compiler, stripping everything that is not code).

When you write clean code, comments are duplication and should be deleted.
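A small before-and-after sketch of this idea, with hypothetical names and an assumed 10% discount: the comment in the first version says what the extracted function and the constant say in the second.

```python
# Before: short names, with a comment that duplicates the intent of the code.
def price_before(p, c):
    # apply the discount only to premium customers
    if c["type"] == "premium":
        return p * 0.9
    return p


# After: the names carry the information, so the comment can be deleted.
PREMIUM_DISCOUNT = 0.10  # assumed value, for illustration only


def is_premium(customer):
    return customer["type"] == "premium"


def discounted_price(price, customer):
    if is_premium(customer):
        return price * (1 - PREMIUM_DISCOUNT)
    return price
```

If the rule changes, is_premium() has to change with it or the code stops working, whereas the old comment could have gone stale silently.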

Speaking about not actual code, but how you write it

How many times have I written classdef for a Matlab class this month? Too many, and the next thing I have to do is configure some snippets to generate classes or methods. The same can be done for every language: I use vim, but many editors allow you to define snippets.

How many times have I moved my hands from the keyboard to the mouse? I'm learning or setting up more and more keyboard shortcuts.

How many times have I looked at the keyboard today? Touch typing really helps with speed and accuracy, as I don't have to continuously switch my eyes between screen and keyboard.

And speaking about writing articles: I switched to Dropbox instead of rsync to avoid issuing those commands all the time. Look for your repeated commands in the shell with:

history | awk '{print $2}' | sort | uniq -c | sort -n | tail

and make an alias for them, or find a way to avoid typing them all the time.
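The same tally can be sketched in Python, for example to run it over a saved history file. This is only an illustration: it assumes plain command lines with no leading history number (which is why it takes the first word rather than the second, as the awk call does), and the function name is hypothetical.

```python
from collections import Counter


def most_repeated_commands(history_lines, top=10):
    """Count the first word of each non-empty line, most frequent first."""
    first_words = (line.split()[0] for line in history_lines if line.strip())
    return Counter(first_words).most_common(top)


# Made-up sample lines standing in for a real shell history.
sample_history = [
    "git status",
    "ls -l",
    "git commit -m 'fix typo'",
    "rsync -av articles/ server:articles/",
    "git push",
]
alias_candidates = most_repeated_commands(sample_history, top=3)
```

The commands at the top of the list are the first candidates for an alias.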

However, we're not confined to code

The problem of duplication is always the same: perfect duplicates do not exist. What is replicated multiple times over space or time won't stay identical forever, but will get out of sync or will be duplicated incorrectly.

Duplicated incorrectly? Did my PC have corrupted RAM? No, but when we duplicate things over time by redoing them, we have a natural human tendency not to duplicate them perfectly. That leads us to the next point.

Test automation

Every time you run a manual test, it is prone to being repeated in a different way (unless you are very, very careful in describing the procedure to follow). Worse, that test can be forgotten: if you extend the analogy to test suites, forgotten tests are the manifestation of the inability to duplicate a manual test suite perfectly.

Manual tests are boring to execute, especially in regression settings, and this is already a reason to automate them. Failure to replicate them perfectly is another.
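As a sketch of what automating a manual check can look like, here is a tiny test case written with Python's standard unittest module. The slugify() function is a hypothetical stand-in for whatever you would otherwise verify by hand.

```python
import unittest


def slugify(title):
    """Hypothetical function under test: lowercase words joined by dashes."""
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    """Each manual check becomes a method that runs identically every time."""

    def test_spaces_become_dashes(self):
        self.assertEqual(slugify("Eliminating Duplication"),
                         "eliminating-duplication")

    def test_extra_whitespace_is_collapsed(self):
        self.assertEqual(slugify("  Agile   Zone "), "agile-zone")
```

Saved in a file, these tests can be run with python -m unittest; none of them can be forgotten or executed in a slightly different way the next time.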

Automation in Continuous Integration follows the same logic: the build steps are often so complex that forgetting one of them or making an error would be the norm. So you can completely automate the chain from commit to release (or to deployment), and avoid repeating the same steps by hand every time.
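The idea can be sketched as a list of steps executed always in the same order, stopping at the first failure. The step names and the placeholder actions below are assumptions; in a real build each action would invoke the test runner, the packager and so on.

```python
def run_pipeline(steps):
    """Run each (name, action) pair in order; stop at the first failing step."""
    completed = []
    for name, action in steps:
        if not action():
            raise RuntimeError(f"step failed: {name}")
        completed.append(name)
    return completed


# Hypothetical steps: each action returns True on success.
build_steps = [
    ("test", lambda: True),
    ("package", lambda: True),
    ("deploy", lambda: True),
]
completed_steps = run_pipeline(build_steps)
```

Since the sequence is written down once, no step can be skipped by accident, and a failure prevents the later steps from running.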

Project management tools instead of different sheets and documents

Why do we use online, or at least computer-based, project management tools? To manage everything in one place, of course, but you have to be careful not to introduce duplication between the computer database and your paperwork: they will certainly get out of sync. One of the key practices of the Getting Things Done framework for time management is to manage everything in one place, so that you don't have to search for post-it notes under three different desks, and you can keep a focused mind while you're working, because you know you don't have to hold unrelated things in your brain's RAM.

It's not about having an electronic board with fancy features like easy editing: it's about having a single board which can be accessed everywhere without duplicating it via faxes and prints. For example, I use a paper notepad for my GTD process phase (collecting thoughts and possible TODOs) and .txt files in Dropbox for the GTD plan phase: there is no overlap between them, so I am comfortable even without a central electronic management system.

So if you work with UML diagrams in your office, and you draw them with pencil on paper, why use an online tool? If you have telecommuting colleagues you are forced to, but for a colocated team you can simply prescribe that all diagrams be kept on paper (pencil is editable). No old files in an unreadable format on the server. Agile methodologies are indeed favorable towards paper.

Take-home points

So one of the programmer's jobs is to eliminate duplication: from code, from actions, from management tools, from documentation. Avoiding boring and repetitive tasks often puts the fun back in programming.

Now think about one duplicated thing you did yesterday and today and figure out a way to avoid doing it tomorrow.


Opinions expressed by DZone contributors are their own.
