iText, the free and open source Java library for creating and manipulating PDF documents, was used to create the program guide at Devoxx this year. But there's more interesting news about iText, because the 2nd Edition of the very popular "iText in Action" (Manning) has just been released. Time to talk to its author, who is also the author of iText itself (an interview relating to the 1st edition can be found here on DZone).
Hi Bruno, the 2nd edition of "iText in Action" has just come out. Congratulations!
Thanks. It's always a magical moment when you receive the first boxes with the first copies of a book you've worked on. Manning had been asking for a revision for a long time because the 1st edition was over 3 years old and had sold almost 12,000 copies. In the summer of 2009 I agreed to write a 2nd edition on condition that it could be a rewrite and not just a revision.
So, it's a rewrite. What's new about it?
It's about iText, obviously, but the content is organized in a completely different way. Most of the examples are new. In the first book the examples were very short and none of them used a database; for the 2nd book I wanted more real world examples. For example, I created a database with 120 movies, 80 directors, and 32 countries. I used this database to create documents for an imaginary movie festival, such as an overview of the movies and a schedule.
Reusing those examples, it isn't hard to create a program guide like the one used at Devoxx, which is exactly what I did.
The examples are more attractive because you can use movie posters, while in the first edition I only had "quick brown fox jumps over the lazy dog" examples.
But iText is simply a library for converting documents to PDF, right, so what does it have to do with schedules and movie programs and things like that?
iText isn't really a document converter; it's not meant for converting Word documents to PDF. iText is used for creating documents from scratch using data coming directly from a database or, in the case of Devoxx, from the REST interface that the conference makes available.
[Listing omitted: the source code that was used to create the conference guide.]
[Figure omitted: the resulting conference guide.]
Not only did you create the conference guide, but you also presented iText during a lab session yesterday. How did that go, what did you highlight, and how were the responses from your audience?
In the first part I showed how I made a proof of concept for the conference guide. That was interesting for new iText users, but experienced iText users said that one could do all that with the 1st edition of the book. After the break, I highlighted what's new in the 2nd edition, and that part was the most useful for the experienced users.
One question was about the new text extraction functionality: "How can you check whether iText will be able to extract the text or not?" PDFs come in different flavors and text extraction is a very complex matter. I also invited a company that was using iText to give a demo and we ended up brainstorming new features to add to their product.
You're also doing a book signing while you're at Devoxx.
Right. On the first conference day, i.e., tomorrow, Wednesday, at 13.00 at the Pearson booth in the exhibition room. The book will be available throughout the conference at the bookstore.
iText has become really popular over the years. It's used all over the place. Had you foreseen that, or are you surprised?
Had I foreseen that? No. But, it was a childhood dream to write a book... I never thought it would be possible but thanks to the success of iText I had the opportunity to write my first book. And the first book gave iText the credibility it needed to break through.
Now it is used by Adobe itself, IBM, JBoss Seam, lots of airlines to print vouchers, and lots of banks to print bank statements. In fact, whenever I receive an invoice in PDF, I look for the name of the producer...
With all those large organizations using your library, you must be a millionaire by now, right?
If only... Things have changed in the sense that I didn't foresee the success and, well, at a certain point, I didn't feel I owned iText, but that iText owned me. On top of that, there was the dark year 2008, when my son was diagnosed with cancer. I was more or less rescued by an American, Andrew Binstock, an author and developer, whom I met in Gent, in Belgium. He proposed creating a company in the States, which started to provide a commercial license for iText. This created revenue, and with this revenue we are now able to pay developers.
In short, we have a dual license. iText Software Corporation sells licenses, while 1T3XT owns and protects the intellectual property, ensuring further development. The cost of a license depends on your usage, for example in a desktop application, as an OEM, or in a SaaS environment. Banks and insurance companies were the first customers.
Great. So there's a team of developers working on iText?
Well, last year I started writing a business plan, so it's still all a work in progress. But the main goal is to professionalize iText. Basically, what you could get away with a few years ago in open source, such as having a library that depends on one person, you can't get away with anymore.
After being a developer and an author, I now have to "think business", and it's hard to adapt, but I'm learning...
What are some aspects of the professionalization that iText will be going through in the coming period?
I think there are three things. First, there are the legal aspects. Over the past few years there has been a lot of change in this area. Every open source project has a lawyer now, which is new for me. Secondly, we need to stay ahead, so we need to follow the PDF specification very closely. We need to add new functionality... I have a long to-do list of things I've always wanted to do. And we need to train the developers. There are 5 core developers, and now I'm cooperating with an existing team of developers close by. The 5 core developers are spread all over the world; in fact, I have met only one of them in person.
Thirdly, what I've always refused to do is consultancy, e.g., creating a conference guide. That distracts from the core of iText, and the easy money wasn't interesting enough. But now we can start thinking of consultancy, which the existing team in Belgium will do. I've been working with Gent University for 10 years, which inspired me, because I had real world requirements. I was never able to combine these different commitments, so now I am taking a sabbatical to really focus on iText.
Can you reveal some of the features you'd really like to have in iText?
Well, I discuss them in the book:
- In chapter 8, I talk about the different technologies to create interactive forms: the old AcroForm technology and the newer XFA (XML Forms Architecture). The former is fully supported in iText, but we've only started to support XFA. You can fill out a dynamic XFA form with iText but you can't flatten it. An XFA to PDF converter is high on the priority list.
- In chapter 12, I discuss digital signatures. What's new in the book is OCSP (Online Certificate Status Protocol) and timestamping. There is a new specification for long term validation of signatures, by ETSI (the European Telecommunications Standards Institute). The specification will be added to ISO 32000-2, which is expected next year. It's not yet in the PDF specification, but we've already started implementing it.
- In the first book I talked about the fact that iText doesn't do text extraction. But we received a code contribution that we couldn't refuse. And now, in chapter 15, there are some examples that show how to extract text from a PDF and there's even an example where we go from XML to PDF and back. This functionality was originally written as an example for the book, but it's now in the main iText release. It comes with a warning: it doesn't work for all structured PDFs yet, but that's also one of the things on my to-do list, to make this work for all tagged PDFs.
Something else is a tool that the core iText developers use to debug PDFs, called RUPS (Reading and Updating PDF Syntax). For the moment we've made the source code available, but we haven't provided it as a separate application. It would be nice to integrate it into IDEs, for example, so that you can open a PDF file and see what's inside, maybe with syntax highlighting.
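To give a rough idea of the "PDF syntax" that a tool such as RUPS lets you inspect, here is a minimal sketch that writes a one-page PDF by hand using only the JDK. It is not iText code and not how iText or RUPS is implemented; the class name `MinimalPdfSketch` and the method `build` are hypothetical names chosen for illustration, and the sketch assumes the text contains only Latin-1 characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration: hand-written PDF syntax, no iText involved.
class MinimalPdfSketch {

    /** Builds a one-page PDF that shows a single line of text (assumes Latin-1 input). */
    static byte[] build(String text) {
        // Escape the characters that are special inside a PDF literal string.
        String escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
        // Content stream: select font F1 at 18pt, move to (72, 720), show the text.
        String content = "BT /F1 18 Tf 72 720 Td (" + escaped + ") Tj ET";

        // The five indirect objects of the document, numbered 1 to 5 in this order.
        String[] objects = {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "
                + "/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
            "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
            "<< /Length " + content.length() + " >>\nstream\n" + content + "\nendstream"
        };

        StringBuilder pdf = new StringBuilder("%PDF-1.4\n");
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < objects.length; i++) {
            // Record the byte offset of each object for the cross-reference table.
            // One char equals one byte here because the output is Latin-1.
            offsets.add(pdf.length());
            pdf.append(i + 1).append(" 0 obj\n").append(objects[i]).append("\nendobj\n");
        }

        // Cross-reference table: one 20-byte entry per object, plus the free entry 0.
        int xrefStart = pdf.length();
        pdf.append("xref\n0 ").append(objects.length + 1).append('\n');
        pdf.append("0000000000 65535 f \n");
        for (int offset : offsets) {
            pdf.append(String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        }

        // Trailer: points at the catalog (object 1) and at the start of the xref table.
        pdf.append("trailer\n<< /Size ").append(objects.length + 1).append(" /Root 1 0 R >>\n");
        pdf.append("startxref\n").append(xrefStart).append("\n%%EOF\n");
        return pdf.toString().getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

Writing the returned bytes to a file, for instance with `Files.write(Paths.get("hello.pdf"), MinimalPdfSketch.build("Hello from Devoxx"))`, should give a file that a PDF viewer can open. The objects, cross-reference table, and trailer visible in the code are the kind of structure a syntax inspector would display; a library like iText generates and manages them for you.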
Anything else you want to share?
Well, with the 1st edition I gave an overview of the history of PDF. As a first time author, I was too shy to contact one of my heroes, Jim King, the creator of PDF, i.e., the James Gosling of PDF.
For the 2nd edition, I was pleased that he was happy to review that chapter. So, it was great to get the real story first hand from Jim King himself.
Congrats again and all the best with iText!