Hand-written and Generated Code: Never the Twain Shall Meet

DZone 's Guide to

Hand-written and Generated Code: Never the Twain Shall Meet

· Java Zone ·
Free Resource

There are many tools these days that generate code. Before writing such a tool, stop to consider if you really should be generating code in the first place. After all, you're generating code from a model---you might not think of it as a model, but that's what I call it because that's what it is---so if you have enough information to generate the code that realizes the behavior described by the model, you obviously have enough information to emulate the behavior of the model. Byte code has a cost. It causes bloat. Don't produce it if you don't have to. EMF, for example, can emulate an instance of an Ecore model, including a fully functional editor, without generating a single line of code; just trying invoking "Create Dynamic Instance..." on any EClass' pop-up. It's a cool thing.

If you do have a good reason to generate code, keep in mind that humans will read it. Hand writing bad code is unacceptable, but generating bad code is completely inexcusable. Have you ever seen generated code where every referenced class name is fully qualified? It's clearly the simplest way to avoid name collisions, but it seems disrespectful of the human reader. Generating code that isn't of hand-written quality gives generators a bad reputation so focus on creating a thing of beauty.

In the ideal world, generated code would be complete. It would never need to be sullied from its untouched pristine state. Technically, you would not even need to put it under source code control because of course you can always regenerate it. You'd want to be very careful to version the generator in that case though. And keep in mind that if you don't version control the generated code, all your clients will need to install the right version of the generator tools simply to produce a functional code base. Also, it will be more difficult to detect when changes in the generator produces code that's different from what you've been testing. Treating generated code as if it's ephemeral has definite appeal, but is something to consider carefully.

You've probably noticed that the world is typically not quite ideal, and often far from it. So it's often the case that clients need to tailor what's generated. Sometimes that's even the whole point: the generated code is just scaffolding or a starting point from which to hand code a complete application. It's typically important to be able to invoke the generator again if the input model for it changes. Because many generators will simply overwrite any files they generated that last time, keeping hand written changes separate is obviously important in that case. But many generators also support protected regions where users can write their code such that it will not be overwritten. EMF takes this design to the extreme, effectively inverting it, by marking all the regions that the generator may touch. I like to think it's a bright idea.

There are those who believe that one should never modify generated code. I'm not one of those people, though there are clear advantages to avoiding it. For example, it's really easy to see what you've written yourself verses what was generated for you. JDT's support for filters mitigates that advantage by supporting the same thing dynamically, i.e., hides everything marked @generated. More importantly, it's possible to delete all the generated stuff to do a clean sweep. That's probably the strongest reason. On the downside, more classes result in more bloat. Even an empty class will take close to 0.5k. Worse yet, if you can't anticipate which files a user will wish to specialize, you're liable to double the number of classes. For example, in the implementation of MOF that preceded EMF, for every EClass Foo, it would generate FooGen, Foo, FooGenImpl, and FooImpl, where Foo extends FooGen, FooGenImpl implements Foo, and FooImpl extends FooGenImpl and implements Foo. The whole design caused significant bloat and just looked very stilted; even in the public API was very clearly tainted by the fact a generator was being employed. It's import to realize that small droplets of bloat will tend add up...

So while some will argue that when it comes to hand written and generated code, never the twain shall meet. I think it's important to keep in mind that, as with most things in life, there are trade-offs to our design decisions . As such, it's more important to explain and understand all the considerations that should be taken into account when making a choice than it is to decide which specific choice is a best practice in general. After all, EMF's generator model generates both the Ecore model and itself, so we're not actually in a position to delete our generated code. We need it to bootstrap the environment. It's prickly problem.

So while it's often a good practice to separate generated code from hand written code, and it's not necessary to version control generated code in that case, these decisions come at a price.

This is an extract from http://ed-merks.blogspot.com/2008/10/hand-written-and-generated-code-never.html


Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}