Docx Templating With docx4j: Tips and Tricks
Looking to make fancy templates for docx Word documents? See how you can make it happen with docx4j and some nasty pitfalls to avoid during your work.
The Problem
Say we need to create a Word document (based on a template) filled with data from a system with 'unlimited' rows and a possibly 'unlimited' number of columns.
Just like in the awesome, pro, fancy picture below.
We have a few requirements:
We have to use our super/hyper docx template from our legal/report/whatever department.
We have many rows, so they do not all fit on one page.
We need to repeat the header on every page.
We have many columns, so they do not all fit on one page.
We need to generate another table starting from a new page if all the columns do not fit on one page.
Solution
My solution to this problem was to:
Read the docx file with docx4j.
Find the template table.
Clear template content (or remove this one table).
Split the data into parts that fit on one page.
Fill each table with as much data as fits (e.g. 6 drivers on one page).
Start a new table on a new page if there is more data to write.
At the end, clear possibly blank columns.
I've shared my complete solution on GitHub.
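The "split the data into parts that fit on one page" step is plain list chunking. Here is a minimal sketch of just that part, in plain Java with no docx4j calls. The class name TablePager, the method partition, and the limit of 6 items per table are illustrative assumptions, not code taken from the GitHub repo.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits the data (e.g. columns) into per-table chunks.
final class TablePager {
    private TablePager() {}

    /** Splits items into consecutive chunks of at most maxPerTable elements. */
    static <T> List<List<T>> partition(List<T> items, int maxPerTable) {
        if (maxPerTable <= 0) {
            throw new IllegalArgumentException("maxPerTable must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxPerTable) {
            int end = Math.min(start + maxPerTable, items.size());
            // Copy the sub-list view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }
}
```

Each chunk then becomes one table: the first one replaces the template table, and every following one starts on a new page.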
Tips and Tricks
Set the repeat header row in the template docx: When you prepare the template document, take care to set the "repeat header row" in the table settings. This way, you will avoid writing your own pagination, which was a nightmare in my case. I facepalmed after realizing this is only one setting in Word/LibreOffice.
Do NOT use Word Online to create the template document: I'm a Linux user, so preparing a nice-looking Word document is not an easy task. So, I decided to use the online version of Word, which led me into a lot of trouble. The biggest one was that there is no option for repeating the header row for tables (Why? ¯\_(ツ)_/¯ ). You have to use full Word or try to handle this with LibreOffice.
Do not use VariablePrepare.prepare() when generating multiple copies from one template: This guy is suggested on a few sites, including GitHub and SO. But if you try to invoke it multiple times, it is not going to work nicely. In my repo, I've created a version of prepare that works on individual objects, so you do not have to run prepare on the whole document.
Avoid using documentPart.variableReplace(Map<String, String>) when creating multiple copies: This guy got me in trouble as well. I was trying to create a copy of my table, then run variableReplace multiple times. The result was so strange that even now I'm not sure what was happening there.
Use plain Strings to handle templates: When you look inside docx4j, there is a lot of marshalling and unmarshalling of data and operations on plain Strings. You should do the same.
Doing just...
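// Assumption: StrSubstitutor comes from Apache Commons Lang 3 (Commons Text has an equivalent).
import org.apache.commons.lang3.text.StrSubstitutor;
import org.docx4j.XmlUtils;

// Work on a deep copy so the original template object can be reused.
Object template = XmlUtils.deepCopy(object);
String templateAsAString = XmlUtils.marshaltoString(template);
// mappings is a Map<String, String> of variable names to values.
// In the docx you write ${var}; in mappings the key is just var.
StrSubstitutor strSubstitutor = new StrSubstitutor(mappings);
Object result = XmlUtils.unmarshalString(strSubstitutor.replace(templateAsAString));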
...is more than enough to replace variables with actual data.
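One thing to watch with the String approach: the values land inside XML, so a value containing & or < will make the unmarshalling step fail. A minimal sketch of escaping the values first, assuming the mappings are a Map<String, String>; the class SafeMappings and its methods are hypothetical names, not part of docx4j or my repo.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: makes mapping values safe to paste into XML text.
final class SafeMappings {
    private SafeMappings() {}

    /** Escapes the characters that are special in XML text content. */
    static String escapeXml(String value) {
        return value.replace("&", "&amp;")   // must run first, or it re-escapes the others
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
    }

    /** Returns a copy of mappings with every value XML-escaped. */
    static Map<String, String> escapeValues(Map<String, String> mappings) {
        Map<String, String> escaped = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : mappings.entrySet()) {
            escaped.put(entry.getKey(), escapeXml(entry.getValue()));
        }
        return escaped;
    }
}
```

You would then pass SafeMappings.escapeValues(mappings) to the substitutor instead of the raw map.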
Good Template
The template was quite a challenge. First of all, docx has an annoying tendency to split your text into XML parts. So instead of ${var}, you end up with something like ${var</tag><tag>}, which will display OK, but during processing, you will find that the variables are not filled with the proper data, because the placeholder no longer appears as one contiguous string.
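If you work on the marshalled String anyway, one way to cope is to strip the tags that ended up inside a placeholder before substituting. This is a rough sketch, not what docx4j does internally: PlaceholderRepair is a hypothetical name, and it assumes every ${ in the XML really opens a placeholder, that the split happens between the braces (not between $ and {), and that the stray tags inside are balanced run boundaries.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: glues placeholders that Word split across XML tags.
final class PlaceholderRepair {
    private PlaceholderRepair() {}

    // "${", then any mix of plain characters and whole XML tags, then "}".
    private static final Pattern SPLIT_PLACEHOLDER =
            Pattern.compile("\\$\\{(?:[^<}]|<[^>]*>)*\\}");
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Removes XML tags found between "${" and "}" so the name is contiguous again. */
    static String repair(String xml) {
        Matcher matcher = SPLIT_PLACEHOLDER.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String cleaned = TAG.matcher(matcher.group()).replaceAll("");
            // quoteReplacement keeps the "$" in "${...}" from being read as a group reference.
            matcher.appendReplacement(out, Matcher.quoteReplacement(cleaned));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

The formatting of the removed runs is lost, so the whole placeholder takes the style of the run it starts in. Typing the placeholder in one go (or pasting it as plain text) in the template is still the cleaner fix.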
Summary
After all that, I think docx4j is quite a handy tool. Still, I found myself a little disappointed that doing something so common, like generating a document from a template, can be such a hard thing to do.
@EDIT
Thanks to Tom Hombergs for the link to his project, which wraps docx4j.
So if you have a Spring-based project, try it: https://github.com/thombergs/docx-stamper