Your Code Is Redundant, Live With It!
It's time to deal with code redundancy.
This article is about necessary, and unavoidable, code redundancy. We look closer at a model of code redundancy that helps to better understand why source code generators do what they do, and why they are even needed at all.
The code you write in Java, or for that matter, in any other language, is redundant. Not by the definition that says (per Wikipedia page https://en.wikipedia.org/wiki/Redundant_code):
"In computer programming, redundant code is source code or compiled code in a computer program that is unnecessary, such as..."
Your code may also be redundant in this sense, but that is a different story from the one I want to tell here. If it is, fix it and improve your coding skills. But this is probably not the case, because you are a good programmer. The redundancy that is certainly in your code is not necessarily unnecessary. There are different sources of redundancy: some redundancies are necessary; others are unnecessary but unavoidable.
The definition of redundancy we need here is closer to the one used in information theory (per the Wikipedia page https://en.wikipedia.org/wiki/Redundancy_(information_theory)):
"In Information theory, redundancy measures the fractional difference between the entropy H(X) of an ensemble X, and its maximum possible value log(|A_X|)"
OOPS... DO NOT STOP READING!!!
This is a very precise but highly unusable definition for us. Luckily, the page continues and says:
"Informally, it is the amount of wasted "space" used to transmit certain data. Data compression is a way to reduce or eliminate unwanted redundancy."
In other words, some information encoded in some form is redundant if it can be compressed.
For example, downloading the text of the classic English novel Moby Dick and zipping it shrinks it to 40% of its original size. Doing the same with the source code of Apache Commons Lang, we get 20%. This is definitely NOT because of "code in a computer program that is unnecessary". This is some other, "necessary" redundancy. English and other natural languages are redundant, programming languages are redundant, and that is the way it is.
If we analyze this kind of redundancy, we can see that it has six levels. What I write here about the six levels is not a well-known or well-established theory. Feel free to challenge it.
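A rough way to repeat this measurement is sketched below. It is only an illustration: the class name CompressionRatio is made up for this article, and it uses the JDK's Deflater (the algorithm behind zip) directly instead of a zip tool, so the numbers it prints for a given file may differ somewhat from the 40% and 20% quoted above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.Deflater;

/** Hypothetical helper: measures how far a text shrinks when it is deflated. */
final class CompressionRatio {

    /** Returns compressed size divided by original size (0.4 means "shrinks to 40%"). */
    static double ratio(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        if (input.length == 0) {
            return 1.0; // nothing to compress
        }
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end(); // release native resources
        return (double) out.size() / input.length;
    }

    /** Usage: java CompressionRatio.java file1 [file2 ...] */
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            double r = ratio(Files.readString(Path.of(arg)));
            System.out.printf("%s shrinks to %.0f%% of its size%n", arg, r * 100);
        }
    }
}
```

The lower the ratio, the more redundant the input is in the information theory sense.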
This model and categorization are useful to establish a way of thinking about code generation, when to generate code, and why to generate code.
After all, I came up with this model while thinking about the Java::Geci framework and wondering why I had invested a year of hobby time in it when there are so many other code generation tools. This redundancy model spells out the reason that I had only felt before.
Levels of Redundancy
The next question is: are these two (natural languages such as English, and programming languages) the only sources of redundancy? The answer is that we can identify six different levels of redundancy, including those already mentioned. Let's take a closer look.
The first level is the redundancy of English or of any other natural language. This redundancy is natural and we are used to it. It evolved with the language, and it was needed to help understanding in a noisy environment. We do not want to eliminate this redundancy, because if we did, we might end up reading something like binary code. For most of us, this is not really appealing. This is how both human and programmer brains work.
The programming language is also redundant. It is even more redundant than the natural language it is built on. The extra redundancy comes from the very limited number of keywords. That raises the compression rate (the share of the size that compression removes) from 60% up to 80% in the case of Java. Other languages, like Perl, are denser and, alas, less readable. However, this is also a redundancy that we do not want to fight. Decreasing the redundancy that comes from the programming language would certainly decrease readability and thus maintainability.
There is another source of redundancy that is independent of the language: code structure redundancy. For example, when a method has one argument, the code fragments that call this method must also pass one argument. If the method changes to take more arguments, all the places that call it also have to change. This redundancy comes from the program structure. It is not only something we do not want to avoid; it is also impossible to avoid without losing information and, with it, the code structure.
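A minimal illustration of code structure redundancy, using made-up names (Greeter, Callers) that are not from any real code base:

```java
/** Hypothetical example: the declaration fixes the shape of every call. */
final class Greeter {
    // The declaration states the fact once: exactly one String argument.
    static String greet(String name) {
        return "Hello, " + name + "!";
    }
}

final class Callers {
    // Every call site repeats the same fact: one String argument.
    static String first() {
        return Greeter.greet("Alice");
    }

    static String second() {
        return Greeter.greet("Bob");
    }
}
```

If greet were changed to take a second argument, both call sites would stop compiling until they were changed too. The compiler is checking exactly this redundancy.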
We talk about domain-induced redundancy when the business domain can be described in a clear and concise manner but the programming language does not support such a description. A good example is a compiler, a technical domain that most programmers are familiar with. A context-free grammar can be written clearly and concisely in BNF. If we write the parser in Java, it will certainly be longer. Since the BNF form and the Java code mean the same thing, and the Java code is significantly longer, we can be sure that the Java code is redundant from the information theory point of view. That is why we have tools for this domain, such as ANTLR, Yacc, and Lex.
Another example is the fluent API. A fluent API can be programmed by implementing several interfaces that guide the programmer through the possible sequences of chained method calls. Writing and maintaining a fluent API this way is long and hard work. At the same time, the grammar of a fluent API can be neatly described with a regular expression, because fluent APIs are described by finite-state grammars. A regular expression that lists the methods and describes alternatives, sequences, optional calls, and repetitions is more readable, shorter, and less redundant than the Java implementation of the same. That is why we have tools like the Java::Geci fluent API generators, which convert a regular expression of method calls into a fluent API implementation.
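To make the size difference concrete, here is a hand-written recursive-descent evaluator for a toy grammar. Both the grammar and the TinyParser class are invented for this illustration; the three grammar rules in the comment carry the same information as the Java code that follows them.

```java
/**
 * Hypothetical example. The whole grammar fits in three lines:
 *
 *   expr   ::= term   { "+" term }
 *   term   ::= factor { "*" factor }
 *   factor ::= NUMBER | "(" expr ")"
 */
final class TinyParser {
    private final String src;
    private int pos = 0;

    private TinyParser(String src) {
        this.src = src.replace(" ", ""); // ignore spaces
    }

    /** Parses and evaluates the whole input, rejecting trailing garbage. */
    static int evaluate(String source) {
        TinyParser p = new TinyParser(source);
        int value = p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + p.pos);
        }
        return value;
    }

    // expr ::= term { "+" term }
    private int expr() {
        int value = term();
        while (peek() == '+') {
            pos++;
            value += term();
        }
        return value;
    }

    // term ::= factor { "*" factor }
    private int term() {
        int value = factor();
        while (peek() == '*') {
            pos++;
            value *= factor();
        }
        return value;
    }

    // factor ::= NUMBER | "(" expr ")"
    private int factor() {
        if (peek() == '(') {
            pos++;
            int value = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (Character.isDigit(peek())) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a number at position " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }

    // Returns the current character, or '\0' at the end of the input.
    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }
}
```

Every method mirrors one grammar rule, which is why a generator can produce this kind of code from the grammar alone.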
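The sketch below shows what such a hand-written fluent API can look like for a tiny call grammar. The grammar, the interface names, and the Request class are all invented for this illustration. This is not the output of the Java::Geci generators, only an example of the kind of code a regular expression could replace.

```java
// Hypothetical call grammar, written as a regular expression over method names:
//     url header* (get | post)

/** State before any call: only url(...) is allowed. */
interface AfterStart {
    AfterUrl url(String url);
}

/** State after url(...): any number of headers, then one terminal call. */
interface AfterUrl {
    AfterUrl header(String name, String value); // header* : stay in this state
    String get();                               // terminal call
    String post(String body);                   // terminal call
}

final class Request implements AfterStart, AfterUrl {
    private final StringBuilder log = new StringBuilder();

    private Request() {
    }

    /** Entry point: returns the first state of the grammar. */
    static AfterStart start() {
        return new Request();
    }

    @Override
    public AfterUrl url(String url) {
        log.append("url=").append(url);
        return this;
    }

    @Override
    public AfterUrl header(String name, String value) {
        log.append(" header=").append(name).append(':').append(value);
        return this;
    }

    @Override
    public String get() {
        return "GET " + log;
    }

    @Override
    public String post(String body) {
        return "POST " + log + " body=" + body;
    }
}
```

A chain such as Request.start().url("http://example.com").header("Accept", "text/plain").get() compiles, while calling get() before url(...) does not. Each state of the grammar needs its own interface, so the code grows much faster than the one-line regular expression.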
This is an area where decreasing the redundancy can be desirable and may result in easier-to-maintain and more readable code.
4 Language Evolution
Language evolution redundancy is similar to domain-induced redundancy, but it is independent of the actual programming domain. Its source is a weakness of the programming language. For example, Java does not automatically provide getters and setters for fields. If you look at C# or Swift, they do. If we need them in Java, we have to write the code for them. It is boilerplate code, and it is a weakness of the language. Also, in Java, there is no declarative way to define hashCode() methods. A later version of Java may provide something for that issue. Looking at past versions of Java, creating an anonymous class was certainly more redundant than writing a lambda expression. Java evolved, and lambda expressions were introduced into the language.
Language evolution is always a sensitive issue. Some languages move fast and introduce new features. Other languages, like Java, are more relaxed or, we could say, conservative. As Brian Goetz wrote in response to a tweet that was urging new features:
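The following sketch illustrates both points with a made-up Person class: first the boilerplate that has to be written by hand, then the same comparator written as an anonymous class and as a lambda expression.

```java
import java.util.Comparator;
import java.util.Objects;

/** Hypothetical example class: two fields, and a lot of code that adds no information. */
final class Person {
    private String name;
    private int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Boilerplate: these four methods only restate that name and age are properties.
    String getName() { return name; }
    void setName(String name) { this.name = name; }
    int getAge() { return age; }
    void setAge(int age) { this.age = age; }

    // Boilerplate: equality and hash code over all fields, written by hand.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person other = (Person) o;
        return age == other.age && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }
}

final class PersonComparators {
    // Before Java 8: an anonymous class was the only option.
    static final Comparator<Person> BY_AGE_ANONYMOUS = new Comparator<Person>() {
        @Override
        public int compare(Person a, Person b) {
            return Integer.compare(a.getAge(), b.getAge());
        }
    };

    // Since Java 8: the same comparator as a lambda expression.
    static final Comparator<Person> BY_AGE_LAMBDA =
            (a, b) -> Integer.compare(a.getAge(), b.getAge());
}
```

The two comparators behave identically. The difference in length is redundancy that the evolution of the language removed.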
"It depends. Would you rather get the wrong feature sooner, but have to live with it forever?"
@BrianGoetz Replying to @joonaslehtinen and @java 10:43 PM · Sep 16, 2019
The major difference between domain-induced redundancy and language-evolution-caused redundancy is that, while it is impossible to address all programming domains in a general-purpose programming language, language evolution will certainly eliminate the redundancy enforced by the shortcomings of the language. While the language evolves, we have code generators in the IDEs and in programs like Lombok that address these issues.
The last level of redundancy correlates with the classical meaning of code redundancy. This is when the programmer cannot write good enough code and there are unnecessary and excessive code structures, or even copy-pasted code, in the program. The typical example is the "Legend of the sub-par developer". In this case, code generation can be a compromise, but it is usually a bad choice. On a high level, from the project manager's point of view, it may be okay. They care about the cost of the developers and they may decide to hire only cheaper developers. On the programmer level, on the other hand, this is not acceptable. If you have the choice between generating code and writing better code, you have to choose the latter. You must learn and develop yourself so that you can develop better code.
Takeaway
When I first started to write about the Java::Geci framework, somebody commented, "Why another code generation tool?" The question is certainly valid. There are many tools like that, as mentioned in the article.
However, if we look at the code redundancy categorization, we can see that Java::Geci can be used to manage domain-induced redundancy and perhaps language-evolution-caused redundancy. In the case of the latter, there are many competing programs, and Java::Geci cannot compete, for example, with the ease of use of the code generation built into the IDEs.
There are many generators that address some specific domain and manage the extra redundancy using code generation. Java::Geci is the only one, to my knowledge, that provides a general framework that makes creating domain-specific code generators simple.
The redundancy model above helps a lot in recognizing that the real use case is domain-specific generators.
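As a small, made-up illustration of this last kind of redundancy (the Invoice class and the 27% rate are assumptions chosen only for the example), compare the copy-pasted methods with the version in which the rule is written once:

```java
/** Hypothetical example: the same tax rule copy-pasted, and then refactored. */
final class Invoice {
    // Assumed tax rate, for illustration only.
    static final long TAX_PERCENT = 27;

    // Copy-pasted variants: the tax and rounding rule is written out twice.
    static long bookTotalCopied(long netCents) {
        return netCents + (netCents * 27 + 50) / 100;
    }

    static long penTotalCopied(long netCents) {
        return netCents + (netCents * 27 + 50) / 100;
    }

    // Refactored: the rule lives in one place and the callers share it.
    static long withTax(long netCents) {
        return netCents + (netCents * TAX_PERCENT + 50) / 100;
    }

    static long bookTotal(long netCents) {
        return withTax(netCents);
    }

    static long penTotal(long netCents) {
        return withTax(netCents);
    }
}
```

No generator is needed here. Extracting the shared method removes the redundancy at its source.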
Published at DZone with permission of Peter Verhas, DZone MVB. See the original article here.