John Wilson: Groovy and XML

Andres Almiray

Apr. 01, 08 · News

Likes (1)

Comment

Save

17.3K Views

John Wilson is mainly known to the Groovy community because of his work on XmlSlurper, one of the easiest ways to work with XML in the JVM. Continue reading to learn what inspired John to get into Groovy. Enjoy!

[img_assist|nid=2119|title=|desc=|link=none|align=right|width=220|height=188]Q. John, you are the creator of XmlSlurper, what motivated you to make it ?

A. I had a problem with processing very large XML documents. XmlParser uses a simple and robust way of implementing GPath expressions which involves building an array to hold the result of each term on the expression. Unfortunately this means that you can consume large amounts of memory if the original document is large. I was getting out of memory errors quite a bit so I wrote the original version of XmlSlurper to use Iterarors rather than arrays which cut down the memory footprint quite a bit. It had the happy side effect of being faster too. I have rewritten XmlSlurper a couple of times since then and it now plays very well with StreamingMarkupBuilder, handles namespaces nicely and has an interesting way of doing edits on the fly as the slurped document is written out.

Q. XmlSluper and XmlParser are so similar, is there a reason to have both ?

A. On the one hand it's a disadvantage because users are unsure which one to use (The answer is - for most things it doesn't matter). However they have differences which are significant and, in my view, valuable. The most significant difference is their approach to editing the document. XmlParser is very straightforward, you just change the in memory tree structure which represents the document. XmlSlurper does not let you have direct access to the in memory data structure. It forces you to put you editing code in closures and specify where in the document the closures should be applied. It then does the editing on the fly as the document is written out. The first mechanism is simple but limited the second is more complicated but very powerful. Most of us only want to do relatively simple operations on small XML documents and XmlParser is excellent for that. For those people who want to do rather complex operations on XMl documents which can be quite large then XmlSlurper is a better choice. Fortunately they support an almost identical GPath syntax so switching from one to the other is no big deal. This was not always so - the community owes a big debt of gratitude to Paul King for doing a large amount of work on documenting these implementations and aligning them. My long term aim is to be able to do away altogether with the need to hold the whole document in memory but to stream the document through memory whist executing the GPath expressions.

Q. You are also the creator of the xmlrpc module, what can you tell us about it ?

A. It's a module I'm very fond of. It's an excellent example of how Groovy can make something that is quite complex in Java completely trivial. I can build an XML-RPC server in 4 lines of Groovy and a client in 1 line. Performance is excellent and it interoperates well. It is based on code I wrote a few years ago to implement XML-RPC on the Dallas Semiconductor TINI. The TINI is an amazing device the size of a memory SIMM which runs Java on a 8051 (the processor which controlled the keyboard in the original IBM PC) with 1Mb of battery backed up RAM and an Ethernet port. One of my favourite TINI apps was a weather forecasting toaster! [it's true! check it out for yourselves here]

Q. Have you participated in other Open Source projects ?

A. Yes, mostly in the embedded Java arena. I wrote tiny XML parsers (MinML and MinML2) and a tiny XML-RPC server (MinMl-RPC) i have also contributed to VNC and the Snort intrusion detections system. In the background I'm working on Ng which is an attempt to build a runtime system for dynamic languages on the JVM which is both simple and fast.

Q. Ng, can you share more about it?

A. Ng is a solo (at the moment) project which tries to answer the question: How can we implement a fully dynamic language on the current JVM which runs no more than ten times slower than Java? This is looking for an improvement of one to two orders of magnitude over current implementations (Groovy, JRuby, Jython, etc.). The idea is to design a programming language "backwards". I start with a highly optimised runtime system and then derive a language which can be optimally compiled for that runtime system. I'm hoping that some of the insights I get whist doing this can be fed back into the Groovy 2.0 MOP redesign. I'm making good but slow progress. I have arithmetic operations executing at less than twice as slow as Java in some benchmarks and method calls are coming below ten times as slow. I have started to document some of the techniques I have developed http://docs.google.com/View?docid=ah76zbd6xsx2_9ck33c8dp

Q. How did you get involved with Groovy ?

A. I was looking for an Open Source project to get involved in. I have a long term interest in programming languages (my first paid job was as a compiler writer in 1971). I looked at Ruby and JRuby but it was too Perlish for my tastes and the JRuby project looked moribund. Google found me Groovy and I liked the feel of the language and the community was very lively so I stuck around.

Q. Do you use Groovy at work ?

A. Yes. If I have to mung XML I will always do it in Groovy. I also spend quite a bit of time building DSLs in Groovy. I think the return on investment in DSLs is huge if they are done properly.

Q. Do you have a preferred technique for building DSLs (builders, metaprogramming, ... ) ?

A. I like builders a lot. I think that the Builder concept is one of James Strachan's best ideas. I built a little DSL to allow people to specify arbitrary graphs - it took about an hour to develop and it's saved days in allowing us to specify complex graphs simply, clearly and reliably. I tend to override invokeMethod, etc. or use Categories rather than ExpandoMetaClass to do MetaPrograming magic. That's probably because to got into the habit before Graeme wrote ExpandoMetaClass. However I do like the fact that Categories allow me to limit the extent of the change to a single thread - they need to have less impact on performance, though.

Q. Is there a specific feature you would like to see in a future version of Groovy ?

A. I think Inner Classes need to be added. Other than that I don't see much urgent need for language extensions. Quite a lot of work has been done on making the run time system cleaner and that work needs to continue. The speed of the implementation has been improved in the last few months but there is more work needed there (especially with Categories). the big thing I'd like to see is the ability to not compile to class files but to execute the AST (Abstract Syntax Tree) directly. JRuby does this and it can be very useful in cases where you are generating code dynamically and executing it once or twice before discarding it (which is quite a common use). It would also help with the Groovy console.

Thanks John!

John's bio

John Wilson has been a programmer, project manager, teacher, CTO and CEO. He's now CTO of an English engineering company and is enjoying working with a great crowd in the Groovy/Grails community.

XML Groovy (programming language) Document Open source XML-RPC Java (programming language) Memory (storage engine) Runtime system

Opinions expressed by DZone contributors are their own.

Related

Trending

John Wilson: Groovy and XML

Related

Partner Resources