Apache PDFBox. Don’t Leave Home Without It.

Got PDF problems? Check out how you can use Java and a handy tool to design and create PDFs for just about anything you might need.

Eugen Hoble

Oct. 07, 16 · Tutorial


Scenario: You have to generate a multi-page PDF report from production data. Each page must be a PDF page that shows your production data and lets users change it through text fields or text areas. A report can have from 1 to 50 pages, each page is divided into sections, and each section can have 20 UI components (text fields and text areas). A page contains between two and eight sections, depending on which template is used for that page.

The image below shows how the template for such a page might look:

Explanation: In Section #1 of the image above, the text of a label (Label 1, for instance) is displayed inside a text field, while the text for Label 3 is shown inside a text area. Since every value sits in its own field, each page ends up with a large number of text fields.

Solution: You will need a library to generate PDFs. At least two libraries let developers create fillable PDF forms: Apache PDFBox and iText. This article covers only how to use Apache PDFBox to generate a PDF report.

At this point, it helps to explain the main classes PDFBox uses to generate a PDF document:

PDDocument: This class represents a PDF document in memory (the document can contain one or multiple pages).

PDPage: This class is used to model a PDF page that is part of a document.

PDAcroForm: This class models the AcroForm attached to a PDF document. An AcroForm allows the use of UI components and JavaScript in a PDF document.

A function that generates a one-page or multi-page report could look like this:

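import java.io.File;
import java.io.IOException;
import java.util.Stack;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;

private PdfPageBuilder pdfPageBuilder;
private ReportFileProcessingImpl reportFileProcessing;

private File createPdf(ProductionExportData exportData) throws IOException {
    File file = null;
    if (exportData != null && exportData.getTotalNrPages() > 0) {
        // assumption: a report with more than one page is delivered as a ZIP file
        boolean isMultiPageReport = exportData.getTotalNrPages() > 1;
        Stack<PDDocument> pdfStack = new Stack<>();
        PDDocument pdfPageDoc = null;

        while (exportData.hasPage()) {
            Page page = exportData.getNextPage();

            // every page gets its own PDDocument and therefore its own, small AcroForm
            pdfPageDoc = new PDDocument();
            PDAcroForm acroForm = initializeNewPDDocument(pdfPageDoc, page);

            PDPage pdPage = pdfPageBuilder.createPdfPage(pdfPageDoc, acroForm, page);
            if (pdPage != null) {
                pdfPageDoc.addPage(pdPage);
            }

            // some more code here...
            pdfStack.push(pdfPageDoc);
        }

        // write the output once, after all pages have been built
        if (isMultiPageReport) {
            file = reportFileProcessing.generateZip(pdfStack);
        } else {
            file = reportFileProcessing.generatePagePdf(pdfPageDoc);
        }
    }
    return file;
}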

Details about the code above:

ProductionExportData contains your production data, ready to be placed in a PDF page.
Page is your custom blueprint for a PDF page. This class holds all the data you need to show on that page.
initializeNewPDDocument is a function that initializes a PDDocument by creating a new AcroForm for it, so each PDDocument has its own AcroForm.
PdfPageBuilder is responsible for creating a PDPage.
ReportFileProcessing is an interface that exposes two functions: one to create a ZIP file and the other to create a PDF file.

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;

private PDAcroForm initializeNewPDDocument(PDDocument pdfPageDoc, Page page) {

    PDAcroForm acroForm = null;
    if (pdfPageDoc != null && page != null) {

        PDDocumentInformation info = new PDDocumentInformation();
        info.setTitle("some title of the document here");
        pdfPageDoc.setDocumentInformation(info);

        // Create a new AcroForm for this document
        acroForm = new PDAcroForm(pdfPageDoc);

        // Register the fonts used by the fields under short resource names
        // (the PDType1Font.HELVETICA* constants are available in PDFBox 2.x)
        PDFont fontNormal = PDType1Font.HELVETICA;
        PDResources resources = new PDResources();
        resources.put(COSName.getPDFName("Helv"), fontNormal);

        PDFont fontBold = PDType1Font.HELVETICA_BOLD;
        resources.put(COSName.getPDFName("HelvBold"), fontBold);

        // Set the resources and default appearance at the form level
        acroForm.setDefaultResources(resources);

        // Acrobat auto-sizes text at the form level by default;
        // this is done by setting the font size to 0
        String defaultAppearanceString = "/Helv 0 Tf 0 g";
        acroForm.setDefaultAppearance(defaultAppearanceString);
        // ask the PDF viewer to (re)generate the field appearances
        acroForm.setNeedAppearances(true);

        pdfPageDoc.getDocumentCatalog().setAcroForm(acroForm);
    }
    return acroForm;
}
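The default appearance (DA) strings used in this article, such as "/Helv 0 Tf 0 g", are short snippets of PDF content-stream operators: the font resource name (which has to match a key registered in the form's default resources, here Helv or HelvBold), the font size followed by the Tf operator, and a gray level between 0 (black) and 1 (white) followed by the g operator. The helper below is hypothetical (it is not a PDFBox API); it only builds such strings so the format is explicit. A font size of 0 asks the viewer to auto-size the text.

```java
/**
 * Hypothetical helper (not part of PDFBox) that builds an AcroForm
 * default appearance string of the form "/FONT SIZE Tf GRAY g".
 */
final class DefaultAppearance {

    private DefaultAppearance() {
    }

    /**
     * @param fontResourceName key of the font in the form's default resources, e.g. "Helv"
     * @param fontSize         font size in points; 0 means "auto-size"
     * @param gray             gray level of the text: 0 = black, 1 = white
     */
    static String of(String fontResourceName, int fontSize, float gray) {
        if (fontSize < 0) {
            throw new IllegalArgumentException("font size must be >= 0");
        }
        if (gray < 0f || gray > 1f) {
            throw new IllegalArgumentException("gray level must be between 0 and 1");
        }
        // print whole numbers without a decimal point ("0" rather than "0.0")
        String grayText = gray == Math.rint(gray)
                ? Integer.toString((int) gray)
                : Float.toString(gray);
        return "/" + fontResourceName + " " + fontSize + " Tf " + grayText + " g";
    }

    public static void main(String[] args) {
        System.out.println(of("Helv", 0, 0f));     // /Helv 0 Tf 0 g
        System.out.println(of("HelvBold", 7, 0f)); // /HelvBold 7 Tf 0 g
    }
}
```

The two calls in main reproduce the form-level default used above and the per-field appearance used for the text fields later in the article.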

Question: Why do I create a new PDDocument for each Page instead of creating only one PDDocument for all pages?

Performance Consideration #1

An AcroForm contains all the UI components in a PDDocument. If your document has hundreds or thousands of UI components, generating the PDF becomes extremely slow because the AcroForm grows too big.

As described in the scenario at the beginning of this article, a single PDF page can hold anywhere from 40 to 160 UI components (two to eight sections of 20 components each).

With 50 pages, that adds up to between 2,000 and 8,000 UI components in the final PDF document. This is far too many for one AcroForm, and generating your PDF becomes extremely slow.

The solution in this case is to use a separate PDDocument for each Page, or for each group of up to 10 pages.

Note: The UI component counts given in this article come from a real report.
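The arithmetic above, and the idea of splitting a long report into several smaller documents, can be sketched in plain Java. This is an illustration only, assuming 20 components per section (as in the scenario) and a cap of 10 pages per PDDocument; the class and method names are hypothetical. Each resulting group would then get its own PDDocument and AcroForm.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: estimates form size and splits report pages into groups. */
final class ReportSplitter {

    static final int COMPONENTS_PER_SECTION = 20;

    private ReportSplitter() {
    }

    /** Number of UI components on a page with the given number of sections. */
    static int componentsOnPage(int sections) {
        return sections * COMPONENTS_PER_SECTION;
    }

    /**
     * Splits pages 1..totalPages into consecutive groups of at most maxPagesPerDocument
     * pages; each group is meant to become its own PDDocument (and AcroForm).
     */
    static List<List<Integer>> groupPages(int totalPages, int maxPagesPerDocument) {
        if (maxPagesPerDocument <= 0) {
            throw new IllegalArgumentException("maxPagesPerDocument must be positive");
        }
        List<List<Integer>> groups = new ArrayList<>();
        for (int start = 1; start <= totalPages; start += maxPagesPerDocument) {
            int end = Math.min(start + maxPagesPerDocument - 1, totalPages);
            List<Integer> group = new ArrayList<>();
            for (int page = start; page <= end; page++) {
                group.add(page);
            }
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        int pages = 50;
        System.out.println(componentsOnPage(2) * pages);  // 2000
        System.out.println(componentsOnPage(8) * pages);  // 8000
        System.out.println(groupPages(pages, 10).size()); // 5 documents
        System.out.println(componentsOnPage(8) * 10);     // 1600, the most one 10-page document can hold
    }
}
```

With a 10-page cap, no single AcroForm holds more than 1,600 components, instead of up to 8,000 for a single 50-page document.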

Drawing Margins

The margins can be drawn line by line:

import java.awt.Color;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;

// this constructor replaces any existing page content; use the AppendMode constructor to add to it
PDPageContentStream contentStream = new PDPageContentStream(pdfDocument, pdPage);
contentStream.setStrokingColor(Color.BLACK);
contentStream.setLineWidth(1);

// the page size comes from the PDPage media box; (0, 0) is the bottom-left corner
PDRectangle mediaBox = pdPage.getMediaBox();
float marginStartX = 0;
float marginStartY = 0;
float marginEndX = mediaBox.getWidth();
float marginEndY = mediaBox.getHeight();

// one closed path: bottom, right and top edges; closeAndStroke() adds the left edge
contentStream.moveTo(marginStartX, marginStartY);
contentStream.lineTo(marginEndX, marginStartY);
contentStream.lineTo(marginEndX, marginEndY);
contentStream.lineTo(marginStartX, marginEndY);
contentStream.closeAndStroke();

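Alternatively, they can be drawn with the addRect function (x, y, width, height), followed by stroke():

contentStream.addRect(marginStartX, marginStartY, marginEndX - marginStartX, marginEndY - marginStartY);
contentStream.stroke();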

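In both cases, PDPageContentStream does the actual drawing on the page. The PDFBox documentation asks you to call contentStream.close() once you have finished drawing on the page.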
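On a page created with PDFBox, coordinates are measured in points (1/72 inch) from the bottom-left corner, which is why the border above starts at (0, 0). A stroke is centered on its path, so a border drawn exactly on the page edge loses half of its line width. If you prefer a border inset by a margin, the addRect arguments can be computed as in this plain-Java sketch. The helper name and the 36-point (half-inch) margin are illustrative assumptions; 612 x 792 points is the US Letter size (PDRectangle.LETTER in PDFBox).

```java
import java.util.Arrays;

/** Hypothetical helper: computes addRect(x, y, width, height) arguments for an inset border. */
final class MarginRect {

    private MarginRect() {
    }

    /**
     * Returns {x, y, width, height} of a rectangle inset by margin points on every side
     * of a page that is pageWidth x pageHeight points (origin at the bottom-left corner).
     */
    static float[] inset(float pageWidth, float pageHeight, float margin) {
        if (margin < 0 || 2 * margin >= pageWidth || 2 * margin >= pageHeight) {
            throw new IllegalArgumentException("margin does not fit on the page");
        }
        return new float[] {margin, margin, pageWidth - 2 * margin, pageHeight - 2 * margin};
    }

    public static void main(String[] args) {
        float[] r = inset(612f, 792f, 36f);      // US Letter, half-inch margin
        System.out.println(Arrays.toString(r)); // [36.0, 36.0, 540.0, 720.0]
    }
}
```

With PDFBox you would then call contentStream.addRect(r[0], r[1], r[2], r[3]) followed by contentStream.stroke().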

To write text at specific coordinates on a PDF page, use this:

contentStream.beginText();
// text color
contentStream.setNonStrokingColor(Color.BLACK);
// font and font size
contentStream.setFont(fontSettings.getFontNormal(), fontSettings.getFontNormalSize());
// position of the text baseline, measured from the bottom-left corner of the page
contentStream.newLineAtOffset(footerStartX + 10, footerStartY + footerHeight / 2);
contentStream.showText(pdfpageNumber); // showText expects a String
contentStream.endText();

How to Add a Text Field

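import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDTextField;

PDTextField textBox = new PDTextField(acroForm);
textBox.setMultiline(false);
// example name: the partial name must be UNIQUE among all fields in the AcroForm
textBox.setPartialName("page1_section1_field1");
// font resource registered in the form's default resources, size 7, black text
textBox.setDefaultAppearance("/HelvBold 7 Tf 0 g");
acroForm.getFields().add(textBox);

// the widget is the visible box of the field on the page
PDAnnotationWidget widget = textBox.getWidgets().get(0);
widget.setPrinted(true);
widget.setAnnotationName("UNIQUE name");

// the position/size of the text field: x, y (bottom-left corner), width, height
PDRectangle rect = new PDRectangle(startX, y, 100, 20);
widget.setRectangle(rect);
widget.setPage(pdPage);
pdPage.getAnnotations().add(widget);

// set the value last, once the widget has a position and size
textBox.setValue("some value here that will be shown in this field");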
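Because every field name has to be unique, it helps to derive the name from the field's position in the report. The sketch below shows one hypothetical naming scheme (page, section and component indexes); it uses underscores because the PDF format treats a period in a field name as the separator between parent and child fields. With the maximum numbers from the scenario (50 pages, 8 sections, 20 components), it produces 8,000 distinct names.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical naming scheme for AcroForm fields: one name per page/section/component. */
final class FieldNames {

    private FieldNames() {
    }

    /** Builds a name such as "p3_s2_c15"; no periods, since "." separates field hierarchy levels. */
    static String fieldName(int page, int section, int component) {
        return "p" + page + "_s" + section + "_c" + component;
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>();
        for (int page = 1; page <= 50; page++) {
            for (int section = 1; section <= 8; section++) {
                for (int component = 1; component <= 20; component++) {
                    names.add(fieldName(page, section, component));
                }
            }
        }
        System.out.println(fieldName(3, 2, 15)); // p3_s2_c15
        System.out.println(names.size());        // 8000, i.e. no collisions
    }
}
```

The result can be passed to textBox.setPartialName(...); the "_s" and "_c" separators keep names like p1_s11 and p11_s1 from colliding.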

How to Add an Annotation to a Specific Area

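import java.io.File;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationRubberStamp;

// draw the icon next to the header
File file = loadAnnotationIcon();
PDImageXObject ximage = PDImageXObject.createFromFileByContent(file, pdfDocument);
float iconX = headerCoordinates.getTopX() - ANNOTATION_ICON_WIDTH;
float iconY = headerCoordinates.getTopY() - ANNOTATION_ICON_HEIGHT;
contentStream.drawImage(ximage, iconX, iconY, ANNOTATION_ICON_WIDTH, ANNOTATION_ICON_HEIGHT);

// the annotation itself; PDRectangle(x, y, width, height) places it over the icon
PDAnnotationRubberStamp annoPopup = new PDAnnotationRubberStamp();
annoPopup.setRectangle(new PDRectangle(iconX, iconY, ANNOTATION_POPUP_WIDTH, ANNOTATION_POPUP_HEIGHT));
annoPopup.setContents("some content here");
annoPopup.setConstantOpacity(1f);
annoPopup.setReadOnly(true);
annoPopup.setColor(annotationBackground); // a PDColor
annoPopup.setPrinted(false);

// the annotation only shows up once it is added to the page
pdPage.getAnnotations().add(annoPopup);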

The code above generates an annotation like the one in the image below.

When you click on the yellow icon, you'll see the annotation popup.

How to Generate a PDF File From a PDDocument

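public File generatePagePdf(PDDocument doc) throws IOException {
    String tempFilePath = "some temp. path";
    File pdfFile = new File(tempFilePath);
    try {
        doc.save(pdfFile);
    } finally {
        doc.close(); // release the document's resources even if saving fails
    }
    return pdfFile;
}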

Note: The generatePagePdf function writes the file to a temporary location. This also works from within a web application, as long as the server process can write to that location.
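If you have no fixed temporary directory, one option (a suggestion, not the author's code) is to let the JDK choose the location: Files.createTempFile creates a uniquely named, empty file in the directory given by the java.io.tmpdir system property, which is also available inside servlet containers. The helper and its file-name prefix are hypothetical. In generatePagePdf you would pass pdfPath.toFile() to doc.save(...), and delete the file once it has been sent to the client.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: reserves a unique temporary file for one report page. */
final class TempPdfFiles {

    private TempPdfFiles() {
    }

    static Path newPagePdf(int pageNumber) throws IOException {
        // creates an empty file named like report-page-7-<random digits>.pdf in java.io.tmpdir
        return Files.createTempFile("report-page-" + pageNumber + "-", ".pdf");
    }

    public static void main(String[] args) throws IOException {
        Path pdfPath = newPagePdf(7);
        System.out.println(pdfPath);   // a path under java.io.tmpdir
        Files.deleteIfExists(pdfPath); // clean up once the file has been sent
    }
}
```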

Performance Consideration #2

Due to the performance issues described in Performance Consideration #1, when the final report has a large number of pages (50 pages in this case), it is better to create a PDF document for each page (or for each group of up to 10 pages) and then archive them in a ZIP file. The ZIP file is what gets returned to the client.

Very important: For the case presented in this article, a ZIP file containing 50 pages can be returned to the client in about 2-5 seconds.

Generating a ZIP File

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/**
 * tempPdfPathList - the list of the paths for all the PDF pages.
 * zipPdfPath - the temporary location where the resulting ZIP file will be saved.
 */
private void createExternalZip(List<String> tempPdfPathList, String zipPdfPath) throws IOException {
    final int BUFFER = 2048;
    byte[] data = new byte[BUFFER];

    // try-with-resources closes the streams even if an exception is thrown
    try (ZipOutputStream out = new ZipOutputStream(
            new BufferedOutputStream(new FileOutputStream(zipPdfPath)))) {
        for (String pdfPath : tempPdfPathList) {
            File pdfFile = new File(pdfPath);
            try (BufferedInputStream origin =
                    new BufferedInputStream(new FileInputStream(pdfFile), BUFFER)) {
                // one ZIP entry per page PDF, named after the file
                out.putNextEntry(new ZipEntry(pdfFile.getName()));
                int count;
                while ((count = origin.read(data, 0, BUFFER)) != -1) {
                    out.write(data, 0, count);
                }
                out.closeEntry();
            }
        }
    }
}

After createExternalZip returns, the ZIP file containing the PDFs for all the pages is saved at zipPdfPath.
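To check which page PDFs ended up in the archive (for example in a unit test), you can read the entry names back with java.util.zip.ZipFile. The helper below is a hypothetical addition, not part of the article's code; for example, entryNames(zipPdfPath) called after createExternalZip should return one name per page PDF.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: lists the entry names of a ZIP file in the order they are stored. */
final class ZipContents {

    private ZipContents() {
    }

    static List<String> entryNames(String zipPath) throws IOException {
        List<String> names = new ArrayList<>();
        // ZipFile reads the archive's central directory; closing it releases the file handle
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }
}
```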

Returning the Report to the Client From a Servlet

The doInvoke method of the servlet will contain:

protected void doInvoke(HttpServletRequest request, HttpServletResponse response) throws Exception {

    // some other business code...
    // the pdfOrZipFile is created here...

    ByteArrayOutputStream pdfAsByteArrayStream = convertPdfFileToOutputStream(pdfOrZipFile);
    response.setHeader("Content-Disposition", "attachment; filename=\"" + exportFileName + "\"");
    response.setHeader("Expires", "0");
    response.setHeader("Cache-Control", "must-revalidate, post-check=0, pre-check=0");
    // a multi-page report is sent as a ZIP archive, a single page as a PDF
    response.setContentType(pdfOrZipFile.getName().endsWith(".zip") ? "application/zip" : "application/pdf");
    // the content length is needed for MSIE
    response.setContentLength(pdfAsByteArrayStream.size());

    ServletOutputStream out = response.getOutputStream();
    pdfAsByteArrayStream.writeTo(out);
    out.flush();
}

The convertPdfFileToOutputStream function is below:

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

private ByteArrayOutputStream convertPdfFileToOutputStream(File pdfFile) throws IOException {

    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    byte[] buff = new byte[8000];
    int bytesRead;

    // try-with-resources closes the input stream even if reading fails
    try (InputStream in = new FileInputStream(pdfFile)) {
        while ((bytesRead = in.read(buff)) != -1) {
            baos.write(buff, 0, bytesRead);
        }
    }

    return baos;
}
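convertPdfFileToOutputStream copies the whole report into memory before anything is sent. For large ZIP files, an alternative (a suggestion, not the article's approach) is to take the length from the file itself and stream it straight to the response with java.nio.file.Files.copy. The helper below is hypothetical; in the servlet you would call response.setContentLengthLong(Files.size(path)) (Servlet 3.1 or later) and then pass response.getOutputStream() as the target stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: streams a report file to any OutputStream without buffering it in memory. */
final class ReportStreaming {

    private ReportStreaming() {
    }

    /** Copies the file to out and returns the number of bytes written; out is left open. */
    static long streamFile(Path reportFile, OutputStream out) throws IOException {
        long bytes = Files.copy(reportFile, out);
        out.flush();
        return bytes;
    }
}
```

Leaving the output stream open is deliberate: the servlet container owns the response stream and closes it when the request completes.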

This article is only a glimpse of what Apache PDFBox can do. It's a complete solution that can help you easily generate any PDF document and have fun while using it.

To find out more and also try the examples please visit the Apache PDFBox site.


Opinions expressed by DZone contributors are their own.
