Apache PDFBox. Don’t Leave Home Without It.

Got PDF problems? Check out how you can use Java and a handy tool to design and create PDFs for just about anything you might need.

By Eugen Hoble · Oct. 07, 16 · Tutorial

Scenario: You have to generate a multi-page PDF report from production data. Each page of the report must show your production data and allow users to change it through text fields or text areas. A report can have from 1 to 50 pages, each page is divided into sections, and each section can have 20 UI components (text fields and text areas). Depending on the template used for a page, it can contain from two to eight sections.

The image below shows how the template of such a page might look:


Explanation: In Section #1 of the image above, the text (for instance, Label 1) is displayed inside a text field. The text for Label 3, on the other hand, is shown inside a text area. Hence the large number of text fields per page.

Solution: You will need a library to generate PDFs, and there are at least two libraries that allow developers to create PDF fillable forms: Apache PDFBox and iText. This article details only how to use Apache PDFBox to generate a PDF report.

At this point, some explanation is needed regarding the classes PDFBox uses to generate a PDF document:

PDDocument: This class represents a PDF document (the document can contain one or multiple pages).

PDPage: This class is used to model a PDF page that is part of a document.

PDAcroForm: This class models the AcroForm attached to a PDF document. An AcroForm allows the use of UI components and JavaScript in a PDF document.

A function that generates a one-page or multi-page report could look like this:

private PdfPageBuilder pdfPageBuilder;
private ReportFileProcessing reportFileProcessing;

private File createPdf(ProductionExportData exportData) throws IOException {
    File file = null;
    if (exportData != null && exportData.getTotalNrPages() > 0) {
        Stack<PDDocument> pdfStack = new Stack<PDDocument>();

        while (exportData.hasPage()) {
            Page page = exportData.getNextPage();

            // one PDDocument (and therefore one AcroForm) per report page
            PDDocument pdfPageDoc = new PDDocument();
            PDAcroForm acroForm = initializeNewPDDocument(pdfPageDoc, page);

            PDPage pdPage = pdfPageBuilder.createPdfPage(pdfPageDoc, acroForm, page);
            if (pdPage != null) {
                pdfPageDoc.addPage(pdPage);
            }

            // some more code here...
            pdfStack.push(pdfPageDoc);
        }

        // a multi-page report is archived as a ZIP, a single page is returned as a PDF
        boolean isMultiPageReport = exportData.getTotalNrPages() > 1;
        if (isMultiPageReport) {
            file = reportFileProcessing.generateZip(pdfStack);
        } else {
            file = reportFileProcessing.generatePagePdf(pdfStack.pop());
        }
    }
    return file;
}

Details about the code above:

  • ProductionExportData contains your production data ready to be placed in a PDF page.

  • Page is your custom blueprint for a PDF page. This class will contain all the data you need to show in a page.

  • initializeNewPDDocument is a function that initializes a PDDocument by creating a new AcroForm for each PDDocument.

  • PdfPageBuilder is responsible for creating a PDPage.

  • ReportFileProcessing is an interface that exposes two functions: one to create a ZIP file and the other to create a PDF file.

private PDAcroForm initializeNewPDDocument(PDDocument pdfPageDoc, Page page) {

    PDAcroForm acroForm = null;
    if (pdfPageDoc != null && page != null) {

        PDDocumentInformation info = new PDDocumentInformation();
        info.setTitle("some title of the document here");
        pdfPageDoc.setDocumentInformation(info);

        // Create a new AcroForm for this document
        acroForm = new PDAcroForm(pdfPageDoc);

        // Register the fonts used by the form fields under short resource names
        PDFont fontNormal = PDType1Font.HELVETICA;
        PDResources resources = new PDResources();
        resources.put(COSName.getPDFName("Helv"), fontNormal);

        PDFont fontBold = PDType1Font.HELVETICA_BOLD;
        resources.put(COSName.getPDFName("HelvBold"), fontBold);

        // Add and set the resources and default appearance at the form level
        acroForm.setDefaultResources(resources);

        // Acrobat sets the font size on the form level to be
        // auto sized as default. This is done by setting the font size to '0'
        String defaultAppearanceString = "/Helv 0 Tf 0 g";
        acroForm.setDefaultAppearance(defaultAppearanceString);
        acroForm.setNeedAppearances(true);

        pdfPageDoc.getDocumentCatalog().setAcroForm(acroForm);
    }
    return acroForm;
}
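The createPdf function above relies on three methods of ProductionExportData: getTotalNrPages, hasPage, and getNextPage. The article does not show that class, so the following is only a hypothetical, minimal sketch of the iteration contract those calls imply, with a stripped-down Page that carries nothing but a page number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Demo entry point: walks the pages the same way createPdf does.
class ExportDataDemo {
    public static void main(String[] args) {
        List<Page> pages = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            pages.add(new Page(i));
        }
        ProductionExportData exportData = new ProductionExportData(pages);
        while (exportData.hasPage()) {
            System.out.println("Rendering page " + exportData.getNextPage().getNumber());
        }
    }
}

// Hypothetical stand-in for the article's Page class: here it only carries a page number.
class Page {
    private final int number;

    Page(int number) {
        this.number = number;
    }

    int getNumber() {
        return number;
    }
}

// Hypothetical sketch of ProductionExportData exposing the methods createPdf calls.
class ProductionExportData {
    private final List<Page> pages;
    private int next = 0;

    ProductionExportData(List<Page> pages) {
        // defensive copy so later changes to the caller's list do not affect iteration
        this.pages = new ArrayList<>(pages);
    }

    int getTotalNrPages() {
        return pages.size();
    }

    boolean hasPage() {
        return next < pages.size();
    }

    Page getNextPage() {
        if (!hasPage()) {
            throw new NoSuchElementException("no more pages");
        }
        return pages.get(next++);
    }
}
```

Because getNextPage advances an internal cursor, an instance can be walked only once, which matches the single while loop in createPdf.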

Question: Why do I create a new PDDocument for each Page instead of creating only one PDDocument for all pages?

Performance Consideration #1

An AcroForm contains all the UI components in a PDDocument. If your document has hundreds or thousands of UI components, generating the PDF will be extremely slow because the AcroForm becomes too big.

For the specific case presented in this article, I mentioned at the beginning that a single PDF page can have anywhere from 40 to 160 UI components (two to eight sections of 20 components each).

If the document has 50 pages, that makes between 2000 and 8000 UI components in the final PDF document. This is way too much for an AcroForm, and the generation of your PDF will become extremely slow.
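The arithmetic behind these numbers can be written out as a small Java sketch. The 20 components per section and the two to eight sections per page come from the scenario at the beginning of the article; the class and method names are hypothetical.

```java
// Hypothetical helper reproducing the UI component counts from the scenario:
// 20 components per section, 2 to 8 sections per page.
class AcroFormSizeEstimate {

    static final int COMPONENTS_PER_SECTION = 20;

    // Number of form fields on one page with the given number of sections
    static int componentsPerPage(int sections) {
        return sections * COMPONENTS_PER_SECTION;
    }

    // Number of form fields one AcroForm must hold if all pages share one document
    static int componentsPerDocument(int pages, int sectionsPerPage) {
        return pages * componentsPerPage(sectionsPerPage);
    }

    public static void main(String[] args) {
        int pages = 50;
        System.out.println("Per page: " + componentsPerPage(2) + " to " + componentsPerPage(8));
        System.out.println("One document for " + pages + " pages: "
                + componentsPerDocument(pages, 2) + " to " + componentsPerDocument(pages, 8));
        System.out.println("One document per 10 pages: at most " + componentsPerDocument(10, 8));
    }
}
```

With at most 10 pages per document, a single AcroForm holds at most 10 * 160 = 1,600 components in the worst case, instead of 8,000.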

The solution, in this case, is to have a separate PDDocument for each Page or a group of pages (up to 10 pages).
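Splitting the report into groups of at most 10 pages is a simple partitioning step. The following Java sketch is one hypothetical way to do it (the class and method names are not from the original code); each resulting group would then be rendered into its own PDDocument with its own AcroForm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits the report pages into groups so that each group
// can be rendered into its own PDDocument (and therefore its own AcroForm).
class PageGrouping {

    static <T> List<List<T>> groupPages(List<T> pages, int maxPagesPerDocument) {
        if (maxPagesPerDocument < 1) {
            throw new IllegalArgumentException("maxPagesPerDocument must be at least 1");
        }
        List<List<T>> groups = new ArrayList<>();
        for (int start = 0; start < pages.size(); start += maxPagesPerDocument) {
            int end = Math.min(start + maxPagesPerDocument, pages.size());
            // copy the sub-list so each group is independent of the source list
            groups.add(new ArrayList<>(pages.subList(start, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Integer> pageNumbers = new ArrayList<>();
        for (int i = 1; i <= 23; i++) {
            pageNumbers.add(i);
        }
        for (List<Integer> group : groupPages(pageNumbers, 10)) {
            System.out.println("PDDocument for pages "
                    + group.get(0) + "-" + group.get(group.size() - 1));
        }
    }
}
```

Passing 1 as the group size gives the one-document-per-page layout; larger values reduce the number of files in the final ZIP at the cost of bigger AcroForms.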

Note: The numbers of UI components presented in this article come from a real-world case.

Drawing Margins

The margins can be drawn line by line:

PDPageContentStream contentStream = new PDPageContentStream(pdfDocument, pdPage);
contentStream.setStrokingColor(Color.BLACK);
contentStream.setLineWidth(1);

float marginStartX = 0;
float marginStartY = 0;
float marginEndX = page.getWidth();
float marginEndY = page.getHeight();

// one continuous path: bottom, right, and top edges; closeAndStroke draws the left edge
contentStream.moveTo(marginStartX, marginStartY);
contentStream.lineTo(marginEndX, marginStartY);
contentStream.lineTo(marginEndX, marginEndY);
contentStream.lineTo(marginStartX, marginEndY);
contentStream.closeAndStroke();

Alternatively, they can be drawn using the addRect function.

contentStream.addRect(footerStartX, footerStartY, width, height);
// addRect only adds the rectangle to the path; stroke() actually draws it
contentStream.stroke();

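PDPageContentStream is responsible for the actual drawing on the PDF page. Once all drawing operations for a page are done, the stream must be closed with contentStream.close().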

To write text at specific coordinates on a PDF page, use this:

contentStream.beginText();
// set the text color, font, and font size
contentStream.setNonStrokingColor(Color.BLACK);
contentStream.setFont(fontSettings.getFontNormal(), fontSettings.getFontNormalSize());
// position the text relative to the footer rectangle
contentStream.newLineAtOffset(footerStartX + 10, footerStartY + footerHeight/2);
// showText expects a String
contentStream.showText(pdfpageNumber);
contentStream.endText();

How to Add a Text Field

PDTextField textBox = new PDTextField(acroForm);
textBox.setMultiline(false);
textBox.setPartialName("this name must be UNIQUE between all existing UI components!");
textBox.setDefaultAppearance("/HelvBold 7 Tf 0 g");
acroForm.getFields().add(textBox);

PDAnnotationWidget widget = textBox.getWidgets().get(0);
widget.setPrinted(true);
widget.setAnnotationName("UNIQUE name");

//the position/size of the text field
PDRectangle rect = new PDRectangle(startX, y, 100,20);
widget.setRectangle(rect);
widget.setPage(pdPage);
pdPage.getAnnotations().add(widget);

textBox.setValue("some value here that will be shown in this field");

How to Add an Annotation to a Specific Area

File file = loadAnnotationIcon();
PDImageXObject ximage = PDImageXObject.createFromFileByContent(file, pdfDocument);
contentStream.drawImage(ximage,
                        headerCoordinates.getTopX() - ANNOTATION_ICON_WIDTH,
                        headerCoordinates.getTopY() - ANNOTATION_ICON_HEIGHT,
                        ANNOTATION_ICON_WIDTH, ANNOTATION_ICON_HEIGHT);

PDAnnotationRubberStamp annoPopup = new PDAnnotationRubberStamp();
annoPopup.setRectangle(new PDRectangle(ANNOTATION_POPUP_WIDTH, ANNOTATION_POPUP_HEIGHT));
annoPopup.setContents("some content here");
annoPopup.setConstantOpacity(1f);
annoPopup.setReadOnly(true);
annoPopup.setColor(annotationBackground);
annoPopup.setPrinted(false);
// the annotation only appears once it is attached to the page
pdPage.getAnnotations().add(annoPopup);


The code above generates an annotation that looks like the one in the image below.

When you click on the yellow icon, you'll see the annotation popup.

How to Generate a PDF File From a PDDocument

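public File generatePagePdf(PDDocument doc) throws IOException {
    String tempFilePath = "some temp. path";
    File pdfFile = new File(tempFilePath);
    try {
        doc.save(pdfFile);
    } finally {
        // release the document's resources even if saving fails
        doc.close();
    }
    return pdfFile;
}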


Note: The generatePagePdf function creates a file in a temporary location. This is possible even from a web application.
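One way to obtain such a temporary location from plain Java is File.createTempFile, which works the same inside a servlet container. The sketch below is an assumption rather than the author's code; the file-name prefix and the helper name are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch: obtain a temporary location for a generated page PDF.
// The "report-page-" prefix is an arbitrary, hypothetical choice.
class TempPdfLocation {

    static File newTempPdfFile(int pageNumber) {
        try {
            // creates an empty file in the directory given by the java.io.tmpdir property
            File tempFile = File.createTempFile("report-page-" + pageNumber + "-", ".pdf");
            // fallback cleanup at normal JVM shutdown if the caller never deletes it
            tempFile.deleteOnExit();
            return tempFile;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File pdfFile = newTempPdfFile(1);
        System.out.println("A page PDF could be saved to " + pdfFile.getAbsolutePath());
    }
}
```

In a long-running web application, deleteOnExit only takes effect at JVM shutdown, so deleting the file explicitly once the response has been sent is usually preferable.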

Performance Consideration #2

Because of the performance issues described in Performance Consideration #1, if the final report has a large number of pages (50 pages in this case), it is better to create a PDF document for each page (or for each group of up to 10 pages) and then archive them in a ZIP file. The ZIP file is what gets returned to the client.

Very important: For the case presented in this article, a ZIP file containing 50 pages can be returned to the client in about 2-5 seconds.

Generating a ZIP File

// imports needed at the top of the enclosing class file
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/**
 * tempPdfPathList - the list of the paths for all the pdf pages.
 * zipPdfPath - the temporary location where the resulting zip file will be saved.
 */
private void createExternalZip(List<String> tempPdfPathList, String zipPdfPath) throws IOException {
    final int BUFFER = 2048;
    byte[] data = new byte[BUFFER];

    // try-with-resources closes the streams even if an exception is thrown
    try (ZipOutputStream out = new ZipOutputStream(
            new BufferedOutputStream(new FileOutputStream(zipPdfPath)))) {
        for (String tempPdfPath : tempPdfPathList) {
            File pdfFile = new File(tempPdfPath);
            try (BufferedInputStream origin =
                    new BufferedInputStream(new FileInputStream(pdfFile), BUFFER)) {
                // each page PDF becomes one entry, named after the file
                out.putNextEntry(new ZipEntry(pdfFile.getName()));
                int count;
                while ((count = origin.read(data, 0, BUFFER)) != -1) {
                    out.write(data, 0, count);
                }
                out.closeEntry();
            }
        }
    }
}

As a result of calling createExternalZip, the resulting ZIP file that contains the PDFs for all the pages will be placed in zipPdfPath.
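If writing every page to disk first is undesirable, the same archive can also be built in memory. This is an alternative to the temporary-file approach, not the author's method: the sketch assumes each page PDF is already available as a byte array (for example, from PDDocument.save(OutputStream)), and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Sketch: zip page PDFs held as byte arrays, without temporary files.
class InMemoryZip {

    static byte[] zipPages(Map<String, byte[]> pdfPagesByName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> page : pdfPagesByName.entrySet()) {
                out.putNextEntry(new ZipEntry(page.getKey()));
                out.write(page.getValue());
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // closing the ZipOutputStream above wrote the ZIP central directory
        return buffer.toByteArray();
    }

    // Reads the entry names back, e.g. to check the archive before sending it
    static List<String> entryNames(byte[] zip) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, byte[]> pages = new LinkedHashMap<>();
        pages.put("page-1.pdf", "%PDF-placeholder-1".getBytes(StandardCharsets.US_ASCII));
        pages.put("page-2.pdf", "%PDF-placeholder-2".getBytes(StandardCharsets.US_ASCII));
        byte[] zip = zipPages(pages);
        System.out.println("Entries: " + entryNames(zip));
    }
}
```

Keeping all pages in memory trades disk I/O for heap usage; with 50 form-heavy pages, measuring memory consumption before choosing this variant would be prudent.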

Returning the Report to the Client From a Servlet

The doInvoke method of the servlet will contain:

protected void doInvoke(HttpServletRequest request, HttpServletResponse response) throws Exception {

    // some other business code.....
    // the pdfOrZipFile is created here.....

    ByteArrayOutputStream pdfAsByteArrayStream = convertPdfFileToOutputStream(pdfOrZipFile);
    response.setHeader("Content-disposition", "attachment;filename=" + exportFileName);
    response.setHeader("Expires", "0");
    response.setHeader("Cache-Control", "must-revalidate, post-check=0, pre-check=0");
    // a multi-page report is sent as a ZIP archive, a single page as a PDF
    boolean isZip = pdfOrZipFile.getName().toLowerCase().endsWith(".zip");
    response.setContentType(isZip ? "application/zip" : "application/pdf");
    // the content length is needed for MSIE
    response.setContentLength(pdfAsByteArrayStream.size());

    ServletOutputStream out = response.getOutputStream();
    pdfAsByteArrayStream.writeTo(out);
    out.flush();
}

The convertPdfFileToOutputStream function is below:

private ByteArrayOutputStream convertPdfFileToOutputStream(File pdfFile) throws IOException {

    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    byte[] buff = new byte[8000];
    int bytesRead;

    // try-with-resources closes the input stream even if reading fails
    try (InputStream in = new FileInputStream(pdfFile)) {
        while ((bytesRead = in.read(buff)) != -1) {
            baos.write(buff, 0, bytesRead);
        }
    }

    return baos;
}
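On Java 7 and later, the manual read loop can be replaced by java.nio.file.Files.copy(Path, OutputStream), which copies a file to any output stream. The sketch below is a hedged alternative, not the author's code; the helper name copyFileTo is hypothetical. Passing the servlet's ServletOutputStream as the target would stream the file without first buffering it in a ByteArrayOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: copy a generated PDF or ZIP file to any OutputStream.
class FileStreaming {

    static long copyFileTo(Path file, OutputStream target) {
        try {
            // Files.copy(Path, OutputStream) returns the number of bytes written
            return Files.copy(file, target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tempFile = Files.createTempFile("report-", ".pdf");
        Files.write(tempFile, "%PDF-placeholder".getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        long copied = copyFileTo(tempFile, buffer);
        System.out.println("Copied " + copied + " bytes");
        Files.delete(tempFile);
    }
}
```

When the file is streamed this way, the content length can be taken from Files.size(path) instead of from the buffer size.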

This article is only a glimpse of what Apache PDFBox can do. It's a complete solution that can help you easily generate any PDF document and have fun while using it.

To find out more and try the examples, visit the Apache PDFBox site.
