Memoize Groovy functions with GPars


Once again I looked over the fence into the Clojure garden for inspiration and came back attracted by the memoize function, which, simply put, enables caching of a function's return values. You call a function once - it performs the usual calculation. You call the function a second time with the same argument values - the return value is retrieved from an internal transparent cache without the actual calculation being run. And so you get your result faster. You're trading off memory for performance.

To see a concrete use-case, check out this brief example in Groovy using the experimental GPars memoize implementation:

GParsPool.withPool {
    def urls = ['http://www.dzone.com', 'http://www.theserverside.com', 'http://www.infoq.com']
    Closure download = {url ->
        println "Downloading $url"
        url.toURL().text.toUpperCase()
    }
    Closure cachingDownload = download.memoize()

    println 'Groovy sites today: ' + urls.findAllParallel {url -> cachingDownload(url).contains('GROOVY')}
    println 'Grails sites today: ' + urls.findAllParallel {url -> cachingDownload(url).contains('GRAILS')}
    println 'Griffon sites today: ' + urls.findAllParallel {url -> cachingDownload(url).contains('GRIFFON')}
    println 'Gradle sites today: ' + urls.findAllParallel {url -> cachingDownload(url).contains('GRADLE')}
    println 'Concurrency sites today: ' + urls.findAllParallel {url -> cachingDownload(url).contains('CONCURRENCY')}
    println 'GPars sites today: ' + urls.findAllParallel {url -> cachingDownload(url).contains('GPARS')}
}

Notice that closures inside GParsPool.withPool() blocks are enhanced with a memoize() method, which returns a new closure wrapping the original closure with a cache.
In the example we call the cachingDownload function in several places in the code; however, each unique url gets downloaded only once - the first time it is needed. The values are then cached and available for subsequent calls - and also to all threads, no matter which thread originally came first with a download request for the particular url and had to handle the actual calculation/download.
So, to wrap up, memoize shields a function with a cache of past return values.
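Under the hood, memoize is nothing more than a higher-order function that wraps the original one with a lookup table. Here is a minimal hand-rolled sketch of the idea in Java (the names Memo and memoize are my own; GPars' actual implementation differs in details):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class Memo {
    // Returns a wrapper that consults a thread-safe cache of past results
    // before delegating to the original function.
    static <A, R> Function<A, R> memoize(Function<A, R> f) {
        Map<A, R> cache = new ConcurrentHashMap<>();
        return arg -> cache.computeIfAbsent(arg, f);
    }

    public static void main(String[] args) {
        int[] calls = {0};  // counts how often the real function runs
        Function<String, Integer> length = s -> { calls[0]++; return s.length(); };
        Function<String, Integer> cached = memoize(length);

        System.out.println(cached.apply("groovy")); // computed: 6
        System.out.println(cached.apply("groovy")); // served from the cache: 6
        System.out.println(calls[0]);               // underlying function ran once: 1
    }
}
```

The wrapper has the same signature as the original function, so callers cannot tell the difference - exactly the transparency the GPars example relies on.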

However simple the principle looks at first glance, read on to see some pretty exciting consequences.

Taking off

Chances are pretty high that you, being developers, have seen some of the many variants of Fibonacci number generators before. Here I show the least verbose and least performant, purely recursive, implementation:

Closure fib = {n -> n > 1 ? call(n - 1) + call(n - 2) : n}

It's not only inefficient, it's incredibly inefficient - exponentially so, I would say. Try running the function to give you the 40th or so Fibonacci number and you're done for the day.

Now is the time when your brain starts thinking about ways to optimize the algorithm and turn its unacceptable exponential complexity into a linear one. You may come up with solutions revolving around turning the recursion into a loop, bubbling the n-1 and n-2 values up, or keeping track of the smaller, already calculated Fibonacci numbers somewhere for reuse. These are all good approaches, in which, in a sense, you're going to trade off memory for performance. But wait a moment - didn't I say earlier that trading off memory for performance is exactly what memoize can do?
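For comparison, the hand-optimized loop that such a refactoring typically produces might look like this (a sketch in Java):

```java
public class IterFib {
    // Iterative Fibonacci: linear time, constant memory - the manual
    // version of the memory-for-performance trade-off, bubbling the
    // n-1 and n-2 values up through the loop.
    static long fib(int n) {
        if (n == 0) return 0;
        long prev = 0, curr = 1;          // fib(0) and fib(1)
        for (int i = 2; i <= n; i++) {
            long next = prev + curr;      // fib(i) = fib(i-1) + fib(i-2)
            prev = curr;
            curr = next;
        }
        return curr;
    }

    public static void main(String[] args) {
        System.out.println(fib(10)); // 55
        System.out.println(fib(40)); // 102334155
    }
}
```

It is fast, but the elegant recursive definition of the sequence is gone - which is precisely what memoize lets us keep.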

Check out the modified version of my Fibonacci generator, this time with linear complexity:

Closure fib
fib = {n -> n > 1 ? fib(n - 1) + fib(n - 2) : n}.memoize()

Pick any number and you get the result almost instantly. Subsequent calls to the fib function will leverage the values calculated by earlier calls. The only change? The fib function is now memoized, meaning it keeps a transparent cache of the values calculated so far. The memoize concept allowed me to optimize my code without having to give up the beauty of the original recursive function.
Now, I'm sure the functional guys out there may laugh out loud at me, since this is something they've been doing for ages, but my functional-flavoured-imperative-mind got all excited seeing the elegance of the concept.

Cut down on fat

Since some may argue that we're claiming a bit too much memory for our calculation by remembering all Fibonacci numbers up to the argument value, in a last optimization we'll make an attempt to be nice and grab only as much memory as absolutely necessary - enough to remember two Fibonacci numbers, which is the minimum required to avoid repeated calculations:

Closure fib
fib = {n -> n > 1 ? fib(n - 1) + fib(n - 2) : n}.memoizeAtMost(2)

Well, that's it - linear time complexity, using only two extra memory positions to store intermediate values. I would have a hard time trying to write a considerably more CPU- and memory-efficient imperative algorithm to compete with this clean, recursive, functional code.
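To illustrate what a size-bounded cache along the lines of memoizeAtMost might do, here is a hand-rolled Java sketch using a small least-recently-used map. The names BoundedMemo and memoizeAtMost are my own, and GPars' actual eviction strategy is an implementation detail that may differ; this single-threaded sketch only demonstrates the principle:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class BoundedMemo {
    // Wraps a function with a cache holding at most maxSize entries,
    // evicting the least recently used entry when the cache is full.
    // Single-threaded sketch; the GPars version is thread-safe.
    static <A, R> Function<A, R> memoizeAtMost(int maxSize, Function<A, R> f) {
        Map<A, R> cache = new LinkedHashMap<A, R>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<A, R> eldest) {
                return size() > maxSize;
            }
        };
        return arg -> {
            R cached = cache.get(arg);   // get() refreshes recency (access order)
            if (cached == null) {
                cached = f.apply(arg);
                cache.put(arg, cached);  // may evict the eldest entry
            }
            return cached;
        };
    }

    public static void main(String[] args) {
        // Recursive Fibonacci through the bounded cache: a cache of two
        // recent results is enough to keep the recursion tractable.
        @SuppressWarnings("unchecked")
        Function<Integer, Long>[] fib = new Function[1];
        fib[0] = memoizeAtMost(2,
                n -> n > 1 ? fib[0].apply(n - 1) + fib[0].apply(n - 2) : n);
        System.out.println(fib[0].apply(30)); // 832040
    }
}
```

Note that the eviction policy only affects how much gets recomputed, never the correctness of the result - a cache miss simply falls back to the original function.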


Although I might be a bit of an exception, the functional aspect of Groovy appeals to me quite a lot. Closures, partially applied functions, immutable data structures, and memoized functions all belong to the family of exciting concepts in the language.

The good practical news: if you want to experiment with memoize in Groovy yourself, check out the GPars memoize samples, grab a recent GPars 0.11 snapshot, and don't forget to send us your feedback. I'm especially curious to hear ideas about domains or algorithms where memoize could be helpful.


From http://www.jroller.com/vaclav/entry/memoize_groovy_functions_with_gpars

