Let The Caller Choose
Join the DZone community and get the full member experience.
Join For FreeA programming idiom
I often see in managed runtime environments, such as Java, is to return
a collection or array from a method that is allocated within the
method. Having originally come from a C/C++ language background this
always struck me as a bit strange and lazy. In the C language world
this idiom would likely be a source of memory leaks. I do however
understand how convenient it can be in garbage collected environments. A
good example is the split function on a String as below:
String str = “Some string that I may want to split”; String[] values = str.split(“ ”);
For this a new array has to be allocated
within the method. If the array is small it can be allocated in the
young generation. Due to the round-robin nature of the young generation
it will most likely involve a cache miss. The OS, in addition, must
zero the memory for security before handing it over. What’s worse is
the size of many arrays cannot be determined until the method completes,
so temporary structures may have to be allocated before the final
structure is allocated and copied into. Then after all this, the memory
will eventually need to be garbage collected. This feels like the CPU
is doing a lot of work to make this idiom work.
What is the alternative?
Another idiom is to let the caller allocate the structure and pass it into the method like in the example below:
String str = “Some string that I may want to split”; Collection<String> values = new ArrayList<String>(); str.split(values, “ ”);
With this approach the caller has more
control and flexibility. Not only can the caller reuse the collection
across multiple calls, they can also choose the most appropriate
structure for the end needs. For example, it could be a HashSet if
finding distinct words, an ArrayList for cache friendly performance, or a
LinkedList if memory is a scarce resource. The method can return the
collection allowing a fluent coding pattern.
In the Java world this idiom becomes
critical to performance when large arrays are involved. An array may be
too large to be allocated in the young generation and therefore needs
to be allocated in the old generation. Besides being less efficient and
contended for large object allocation, the large array may cause a
compaction of the old generation resulting in a multi-second
stop-the-world pause. Now that is a good topic for another blog post...
I can hear folk cry out, “but short lived
objects are cheap!” While this is true when compared to longer living
objects, they are relatively cheap but certainly not free. I’d prefer
to use those extra cycles solving the business problem and, in addition,
have the flexibility to choose the most appropriate data structure for
the task at hand.
From http://mechanical-sympathy.blogspot.com/2011/07/let-caller-choose.html
Topics:
Opinions expressed by DZone contributors are their own.
Comments