Related

  • How To Convert Common Documents to PNG Image Arrays in Java
  • How To Convert ODF Files to PDF in Java
  • How to Convert a PDF to Text (TXT) Using Java
  • How to Get Plain Text From Common Documents in Java

How to Convert DOCX to PDF in Java

A simple approach to performing the infamously difficult format conversion between DOCX and PDF

By Brian O'Neill · DZone Core · Aug. 13, 20 · Interview

Ever since its introduction in Microsoft Word 2007, the DOCX format has held a prominent place in offices worldwide thanks to its easy editing and deep range of design options. Its limitations start to show when it comes to compatibility, and especially to consistent viewing for the end user. Its complexity quickly becomes a liability: different versions of compatible applications can produce unintended (and often unfortunate) changes to your painstaking designs. On the other side, you have PDF, with its ubiquitous support and unshakably consistent display fidelity, no matter the device, operating system, or app. Unfortunately, PDFs are also notoriously impractical to edit.

Because of these complementary strengths and weaknesses, converting between the two formats is frequently necessary, and often crucial. While manually converting a handful of DOCX files into PDF format is a simple matter, the same is not true when the conversion has to be automated. Approached programmatically, this conversion raises several problems that must be solved.

Our primary issue is parsing the DOCX file in the first place, because DOCX is immensely complicated. The ECMA specification for the format (ECMA-376) comprises a staggering 5,000 pages, with new features being added regularly. Once again, the sheer depth of choice in DOCX proves to be a double-edged sword. Another problem is that a DOCX file is actually a zipped archive containing multiple metadata and document files. Resolving the relationships between these files, which are recorded in "rels" files, is no easy task. And we still have not addressed converting all of this parsed data into the final PDF.
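To make the packaging point concrete, here is a small sketch that uses only the JDK's java.util.zip classes to open DOCX bytes as a ZIP archive, list its parts, and pick out the relationship files. It only inspects the package; resolving what each relationship points to is exactly the work it leaves undone. The class and method names are illustrative, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Lists the parts packed inside a DOCX, which is an ordinary ZIP archive. */
final class DocxParts {

    /** Returns the name of every entry in the archive, in stored order. */
    static List<String> listParts(byte[] docxBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docxBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Keeps only the relationship parts, e.g. _rels/.rels and word/_rels/document.xml.rels. */
    static List<String> relationshipParts(List<String> parts) {
        List<String> rels = new ArrayList<>();
        for (String name : parts) {
            if (name.endsWith(".rels")) {
                rels.add(name);
            }
        }
        return rels;
    }
}
```

In a typical document you would expect entries such as word/document.xml alongside _rels/.rels and word/_rels/document.xml.rels.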

Let us assume that you do not have the development time or budget to grind through this whole process from scratch. This tutorial shows how to solve this dilemma by using a cloud-based API to perform the conversion from DOCX to PDF.

We will also cover how to use this API to perform search and replace operations on DOCX files. Performing search and replace programmatically on a DOCX file is surprisingly difficult, because it runs directly into the parsing issues mentioned above. Thankfully, the same API can perform this task for us as well. Putting this all together lets us use the editing power of DOCX to create rich text templates for reports, invoices, letters, and so on, populate them with search and replace, and then convert them to PDF. In this way, the strengths of DOCX compensate for the lack of editing options in PDF.
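One concrete reason search and replace is hard: Word may store a single visible word across several text runs, for example after edits or formatting changes, so a placeholder the reader sees may never appear as one contiguous string in word/document.xml. The fragment below is a hand-written, simplified illustration of that situation rather than real Word output, and the {{name}} placeholder syntax is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Shows why a plain string search over DOCX XML can miss a template field. */
final class SplitRunDemo {

    /**
     * Hand-written, simplified fragment of word/document.xml in which the
     * visible text "Dear {{name}}," is stored across two runs (w:r).
     */
    static final String DOCUMENT_XML =
            "<w:p><w:r><w:t>Dear {{na</w:t></w:r><w:r><w:t>me}},</w:t></w:r></w:p>";

    /** Matches a w:t element (with or without attributes) and captures its text. */
    private static final Pattern TEXT_ELEMENT = Pattern.compile("<w:t(?: [^>]*)?>([^<]*)</w:t>");

    /** Naive approach: replace directly in the raw XML string. */
    static String naiveReplace(String xml, String match, String replacement) {
        return xml.replace(match, replacement);
    }

    /** Joins the text of every w:t element, which is roughly what the reader sees. */
    static String visibleText(String xml) {
        StringBuilder text = new StringBuilder();
        Matcher m = TEXT_ELEMENT.matcher(xml);
        while (m.find()) {
            text.append(m.group(1));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // The placeholder is visible to the reader...
        System.out.println(visibleText(DOCUMENT_XML));
        // ...but the naive replacement leaves the XML unchanged.
        System.out.println(naiveReplace(DOCUMENT_XML, "{{name}}", "Ada").equals(DOCUMENT_XML));
    }
}
```

Running main prints the placeholder as the reader would see it, followed by true, because the raw-XML replacement found nothing to change. The regex-based text extraction is only good enough for this toy fragment; it ignores XML entities and is no substitute for a real parser.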

Our primary goal in today's demonstration will be to maintain the maximum level of fidelity in our conversions. Important design choices such as page layout, tables, and annotations will all remain intact. With that said, let us get started with our setup process.

Our first step is to install the API client. Let us add a repository reference to our Maven POM file, like so:

```xml
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
```


This allows JitPack to build our library on demand once we add the following dependency reference:

```xml
<dependencies>
    <dependency>
        <groupId>com.github.Cloudmersive</groupId>
        <artifactId>Cloudmersive.APIClient.Java</artifactId>
        <version>v3.62</version>
    </dependency>
</dependencies>
```


With the library compiled, we can now use it in our controller. Add these import statements to the beginning of the file:

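```java
// Import classes:
import com.cloudmersive.client.invoker.ApiClient;
import com.cloudmersive.client.invoker.ApiException;
import com.cloudmersive.client.invoker.Configuration;
import com.cloudmersive.client.invoker.auth.*;
import com.cloudmersive.client.ConvertDocumentApi;
import com.cloudmersive.client.EditDocumentApi;
import com.cloudmersive.client.model.ReplaceStringRequest;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
```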


Now it is time to call our first function, convertDocumentDocxToPdf. The example code below shows how to structure the call.

```java
ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

// DOCX-to-PDF conversion lives in ConvertDocumentApi; EditDocumentApi handles search and replace
ConvertDocumentApi apiInstance = new ConvertDocumentApi();
File inputFile = new File("/path/to/input.docx"); // the DOCX document to convert
try {
    // The converted PDF is returned as raw bytes
    byte[] result = apiInstance.convertDocumentDocxToPdf(inputFile);
    // Printing a byte[] only shows an object reference, so write the bytes to a file instead
    Files.write(Paths.get("/path/to/output.pdf"), result);
} catch (ApiException e) {
    System.err.println("Exception when calling ConvertDocumentApi#convertDocumentDocxToPdf");
    e.printStackTrace();
} catch (IOException e) {
    System.err.println("Could not write the PDF file");
    e.printStackTrace();
}
```


While not particularly complicated, the call does come with a few requirements:

  • A valid DOCX document should be used as our inputFile
  • Our function must be called from an API instance
  • Use an API key, which can be obtained from the Cloudmersive website. At the time of writing, the free key never expires, limits input files to 4MB, and allows 1,000 API calls across any Cloudmersive API.
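Since every call counts against the key's quota, it can be worth rejecting obviously unusable files locally before uploading them. The sketch below checks the size limit and the ZIP signature that DOCX files normally begin with. It assumes "4MB" means 4 * 1024 * 1024 bytes, which may not match the service's exact threshold, and passing it does not prove a file is a valid DOCX; it only catches clear mistakes such as a PDF or a text file passed in by accident. Preflight is a hypothetical helper, not part of the Cloudmersive client.

```java
import java.util.Optional;

/** Cheap local checks to run before spending an API call on a file. */
final class Preflight {

    /** Assumption: "4MB" read as 4 * 1024 * 1024 bytes; the service's exact limit may differ. */
    static final long MAX_BYTES = 4L * 1024 * 1024;

    /** A DOCX is a ZIP archive, and ZIP files normally begin with the signature "PK\3\4". */
    static boolean startsWithZipSignature(byte[] bytes) {
        return bytes.length >= 4
                && bytes[0] == 'P' && bytes[1] == 'K'
                && bytes[2] == 3 && bytes[3] == 4;
    }

    /** Returns a reason to reject the file, or Optional.empty() if it passes both checks. */
    static Optional<String> problem(byte[] bytes) {
        if (bytes.length > MAX_BYTES) {
            return Optional.of("file exceeds " + MAX_BYTES + " bytes");
        }
        if (!startsWithZipSignature(bytes)) {
            return Optional.of("missing ZIP signature, so this cannot be a DOCX");
        }
        return Optional.empty();
    }
}
```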

That completes our setup for DOCX to PDF conversion. If you give it a test run, you will see that we can already convert documents on demand.
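During that test run, the converted PDF comes back as a byte array. Before saving it, a cheap sanity check is that the bytes begin with the "%PDF-" header that PDF files start with. This is a minimal JDK-only sketch; PdfOutput and its methods are hypothetical helpers, not part of the Cloudmersive client.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sanity-checks converter output before saving it. */
final class PdfOutput {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** PDF files begin with the header "%PDF-" followed by a version such as 1.7. */
    static boolean looksLikePdf(byte[] bytes) {
        if (bytes.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (bytes[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Writes the bytes to target only if they look like a PDF; otherwise fails loudly. */
    static void save(byte[] bytes, Path target) throws IOException {
        if (!looksLikePdf(bytes)) {
            throw new IOException("converter output does not start with %PDF-");
        }
        Files.write(target, bytes);
    }
}
```

This is a strict check: some PDF readers tolerate a few stray bytes before the header, but well-formed converter output should start with it directly.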

Now let us turn to using DOCX templates to create rich text PDF documents. Search and replace is the perfect tool for dynamically replacing fields to populate these templates. For a single search and replace operation, we can use editDocumentDocxReplace, which takes a ReplaceStringRequest object. This object consists of the input file (supplied either as a byte array or as a URL), a matchString to search for, a replaceString to put in its place, and the matchCase boolean, which determines whether letter case must match. Here is some example code you can use as reference:

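```java
ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

EditDocumentApi apiInstance = new EditDocumentApi();
try {
    // Build the request: the input DOCX plus one search/replace rule.
    // Setter names follow the client's generated ReplaceStringRequest model.
    ReplaceStringRequest reqConfig = new ReplaceStringRequest();
    reqConfig.setInputFileBytes(Files.readAllBytes(Paths.get("/path/to/template.docx")));
    reqConfig.setMatchString("{{CustomerName}}"); // illustrative placeholder to search for
    reqConfig.setReplaceString("Jane Doe");       // text to put in its place
    reqConfig.setMatchCase(true);                 // only match the exact letter case

    // The edited document comes back as DOCX bytes
    byte[] result = apiInstance.editDocumentDocxReplace(reqConfig);
    Files.write(Paths.get("/path/to/filled.docx"), result);
} catch (ApiException e) {
    System.err.println("Exception when calling EditDocumentApi#editDocumentDocxReplace");
    e.printStackTrace();
} catch (IOException e) {
    System.err.println("Could not read the template or write the result");
    e.printStackTrace();
}
```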
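To pin down what matchCase changes, here is a local, plain-text model of a single replacement rule using java.util.regex. It mirrors the matchString, replaceString, and matchCase inputs described above, but it is only an illustration: the service works on the DOCX structure rather than a flat string, and treating matchString as a literal (not a pattern) is an assumption here. LocalReplace is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Local illustration of one search-and-replace rule on plain text.
 * Not the Cloudmersive model class; it only mirrors the request's three settings.
 */
final class LocalReplace {

    static String replace(String text, String matchString, String replaceString, boolean matchCase) {
        // With matchCase off, ignore letter case (including non-ASCII letters).
        int flags = matchCase ? 0 : Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        // Pattern.quote treats matchString literally, so "." or "{" need no escaping.
        Pattern pattern = Pattern.compile(Pattern.quote(matchString), flags);
        // quoteReplacement stops "$" or "\" in the replacement from being read as group references.
        return pattern.matcher(text).replaceAll(Matcher.quoteReplacement(replaceString));
    }
}
```

For example, replacing "customer" with "Ada" in "Dear CUSTOMER," changes the text only when matchCase is false.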


So what if you need to replace a large number of strings at once? Instead of calling the previously mentioned function repeatedly, we can instead make use of editDocumentDocxReplaceMulti. This function also takes in a request object, which contains an array of individual string replacement requests, each with its own matchString and replaceString. This allows rapid string replacement, making it particularly useful when combined with DOCX templates. For example, you could populate all of the various fields in a form with values such as names, addresses, and dates, all in real time with a single function call.
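A sketch of the preparation side of a multi-replace call: turning a map of template fields into match/replace pairs and previewing the result on plain text. The {{field}} placeholder convention and the TemplateFields class are assumptions for illustration, and mapping the pairs onto the request object that editDocumentDocxReplaceMulti expects is left out, since that model class is not shown in this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Builds template match/replace pairs and previews them on plain text in one pass. */
final class TemplateFields {

    /** Wraps a field name in the (assumed) {{...}} placeholder syntax. */
    static String placeholder(String field) {
        return "{{" + field + "}}";
    }

    /** Turns field names into match/replace pairs, preserving insertion order. */
    static Map<String, String> toReplacements(Map<String, String> fields) {
        Map<String, String> pairs = new LinkedHashMap<>();
        fields.forEach((name, value) -> pairs.put(placeholder(name), value));
        return pairs;
    }

    /** Replaces every placeholder in a single scan of the text. */
    static String preview(String text, Map<String, String> pairs) {
        if (pairs.isEmpty()) {
            return text;
        }
        // One alternation of all literal match strings, e.g. \Q{{name}}\E|\Q{{date}}\E
        StringBuilder alternatives = new StringBuilder();
        for (String match : pairs.keySet()) {
            if (alternatives.length() > 0) {
                alternatives.append('|');
            }
            alternatives.append(Pattern.quote(match));
        }
        Matcher m = Pattern.compile(alternatives.toString()).matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Each match is exactly one key, so look up its value and insert it literally.
            m.appendReplacement(out, Matcher.quoteReplacement(pairs.get(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The preview replaces all placeholders in one scan, so a value that itself contains a placeholder is left alone. Whether the service applies its replacements in one pass or one after another is not stated here, so check that before relying on either behavior.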

In this same library, you can also find functions for identifying and populating PDF form fields, retrieving and editing metadata, file validation, and conversions between numerous popular file formats.


Opinions expressed by DZone contributors are their own.
