Reading an HTML File, Parsing It and Converting It to a PDF File With the Pdfbox Library

In this article, we will read an HTML file from a specified folder and replace variables with their actual values.

By Erkin Karanlık · Nov. 22, 23 · Tutorial

In this article, we will read an HTML file from a folder we specify, replace the variables in its content with their real values, and convert the result to a PDF file using the "openhtmltopdf-pdfbox" library (version 1.0.10 in the example project).

I hope it will be a reference for those who need it. You can apply the same conversion in your own Java projects; an example project is shown below.

First, we create an input folder, from which the HTML file is read, and an output folder, to which the PDF file is written.

We put the HTML file under the input folder. In the HTML file, we define a key to be replaced; in this example, the key is #NAME#. In Java, this key is replaced with a value sent from outside.

Plain Text
 
input folder :  \ConvertHtmlToPDF\input

output folder:  \ConvertHtmlToPDF\output
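
The two folders can also be created from code instead of by hand. The following is a minimal sketch using only the JDK; the class name FolderSetup and the method prepare are hypothetical and not part of the example project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the example project.
final class FolderSetup {

    // Creates <base>/input and <base>/output if they do not exist yet
    // and returns them in that order. Calling it again is harmless.
    static Path[] prepare(Path base) throws IOException {
        Path input = Files.createDirectories(base.resolve("input"));
        Path output = Files.createDirectories(base.resolve("output"));
        return new Path[] {input, output};
    }
}
```

Calling FolderSetup.prepare(Path.of("/ConvertHtmlToPDF")) once at startup would be enough, assuming the process is allowed to write to that location.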




HTML
 
<?xml version='1.0' encoding='UTF-8' ?>
<!DOCTYPE html>
<html lang="tr">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    </meta>
    <title>Convert to Html to Pdf</title>
    <style type="text/css">
      body {
        font-family: "Times New Roman", Times, serif;
        font-size: 40px;
      }
    </style>
  </head>
  <body topmargin="0" leftmargin="0" rightmargin="0" bottommargin="0">
    <table width="700" border="0" cellspacing="0" cellpadding="0" style="background-color: #1859AB;
                                 
                                 color: white;
                                 font-size: 14px;
                                 border-radius: 1px;
                                 line-height: 1em; height: 30px;">
      <tbody>
        <tr>
          <td>
            <strong style="color:#F8CD00;">   Hello </strong>#NAME#
          </td>
        </tr>
      </tbody>
    </table>
  </body>
</html>


Creating a New Project

We create a new Spring Boot project. I am using IntelliJ IDEA.

Controller 

To replace a key with a value in the HTML, we send the value from outside. We will write a REST service for this.

We create the "ConvertHtmlToPdfController.java" class under the controller folder. Within the controller class, we create a GET method called "convertHtmlToPdf". The value is passed to this method dynamically as a path variable, as follows.

Java
 
package com.works.controller;

import com.works.service.ConvertHtmlToPdfService;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("")
public class ConvertHtmlToPdfController {

    private final ConvertHtmlToPdfService convertHtmlToPdfService;

    public ConvertHtmlToPdfController(ConvertHtmlToPdfService convertHtmlToPdfService) {
        this.convertHtmlToPdfService = convertHtmlToPdfService;
    }

    // The value is taken from the URL path, so @PathVariable alone is enough;
    // this GET request has no body to bind with @RequestBody.
    @GetMapping("/convertHtmlToPdf/{variableValue}")
    public ResponseEntity<String> convertHtmlToPdf(@PathVariable String variableValue) {
        try {
            return ResponseEntity.ok(convertHtmlToPdfService.convertHtmlToPdf(variableValue));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

}
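
Because the value travels in the URL path, a caller has to percent-encode it. The sketch below builds the request URI with the JDK only. The class name ConvertRequestUri is hypothetical, and the host and port are assumptions (a local run on Spring Boot's default port 8080).

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper, not part of the example project.
final class ConvertRequestUri {

    // Assumption: the application runs locally on port 8080.
    private static final String BASE = "http://localhost:8080/convertHtmlToPdf/";

    static URI forName(String name) {
        // URLEncoder targets form data and encodes a space as "+";
        // inside a path segment a space has to be "%20" instead.
        // A literal "+" in the input is already encoded as "%2B" at this point.
        String segment = URLEncoder.encode(name, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(BASE + segment);
    }
}
```

The resulting URI can be opened in a browser or passed to any HTTP client.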


Service 

Java
 
package com.works.service.impl;

import com.works.service.ConvertHtmlToPdfService;
import com.works.util.ConvertHtmlToPdfUtil;
import io.micrometer.common.util.StringUtils;
import org.springframework.stereotype.Service;

@Service("convertHtmlToPdfService")
public class ConvertHtmlToPdfServiceImpl implements ConvertHtmlToPdfService {

    private String setVariableValue(String htmlContent, String key, String value) {
        // String.replace works on literal text; replaceAll would treat the key as a
        // regex and give "$" and "\" in the value a special meaning
        String replacement = StringUtils.isNotEmpty(value) ? value : "";
        return htmlContent.replace("#" + key + "#", replacement);
    }

    @Override
    public String convertHtmlToPdf(String variableValue) throws Exception {
        String inputFile = "/convertHtmlToPDF/input/input.html";
        String outputFile = "/convertHtmlToPDF/output/output.pdf";
        // Folder that contains times.ttf; the util class appends the file name itself
        String fontFile = "/convertHtmlToPDF/input";

        try {
            String htmlContent = ConvertHtmlToPdfUtil.readFileAsString(inputFile);
            htmlContent = setVariableValue(htmlContent, "NAME", variableValue);
            ConvertHtmlToPdfUtil.htmlConvertToPdf(htmlContent, outputFile, fontFile);

        } catch (Exception e) {
            throw new Exception("convertHtmlToPdf - An error was received in the service : ", e);
        }
        return "success";
    }
}


The ConvertHtmlToPdfService interface declares a single method called convertHtmlToPdf, which takes a String variableValue as input. ConvertHtmlToPdfServiceImpl implements it.
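
The interface itself is not listed in the article. Judging from the import and the @Override in the implementation, it presumably looks like the sketch below. This is an assumption; in the project it would be declared public in its own file under the com.works.service package.

```java
// Assumed shape of com.works.service.ConvertHtmlToPdfService,
// reconstructed from the implementation class shown above.
interface ConvertHtmlToPdfService {

    // Returns a status text such as "success" once the PDF has been written.
    String convertHtmlToPdf(String variableValue) throws Exception;
}
```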

In the convertHtmlToPdf service method:

The "inputFile" variable holds the path of the HTML file to read from the input folder.

The "outputFile" variable holds the path of the PDF file to write to the output folder.

The font is also read from an external file, here from the input folder. The "fontFile" variable holds the location of the font.

In the line below, the path of the input file is passed to the "ConvertHtmlToPdfUtil.readFileAsString" method to read the HTML file from the input folder.

             String htmlContent = ConvertHtmlToPdfUtil.readFileAsString(inputFile);


Java
 
package com.works.util;

import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;

public class ConvertHtmlToPdfUtil {

    public static void safeCloseBufferedReader(BufferedReader bufferedReader) throws Exception {
        try {
            if (bufferedReader != null) {
                bufferedReader.close();
            }
        } catch (IOException e) {
            throw new Exception("safeCloseBufferedReader  - the method got an error. " + e.getMessage());
        }
    }

    public static String readFileAsString(String filePath) throws Exception {
        BufferedReader br = null;
        String encoding = "UTF-8";

        try {

            br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), encoding));

            StringBuilder fileContentBuilder = new StringBuilder();
            String line;

            while ((line = br.readLine()) != null) {
                if (fileContentBuilder.length() > 0) {
                    fileContentBuilder.append(System.getProperty("line.separator"));
                }
                fileContentBuilder.append(line);
            }

            return fileContentBuilder.toString();

        } catch (Exception e) {
            // Rethrow instead of returning null so the caller sees the real cause
            throw new Exception("readFileAsString - the method got an error. " + e.getMessage(), e);
        } finally {
            safeCloseBufferedReader(br);
        }
    }

    public static void htmlConvertToPdf(String html, String filePath, String fonts) throws Exception {
        // try-with-resources closes the stream exactly once, even if rendering fails
        try (OutputStream os = new FileOutputStream(filePath)) {
            final PdfRendererBuilder pdfBuilder = new PdfRendererBuilder();
            pdfBuilder.useFastMode();
            pdfBuilder.withHtmlContent(html, null);
            // "fonts" is the folder that contains times.ttf
            String fontPath = fonts;
            pdfBuilder.useFont(new File(concatPath(fontPath, "times.ttf")), "Times", null, null, false);
            pdfBuilder.toStream(os);
            pdfBuilder.run();
        } catch (Exception e) {
            throw new Exception(e.getMessage(), e);
        }
    }

    public static String concatPath(String path, String... subPathArr) {
        for (String subPath : subPathArr) {
            if (!path.endsWith(File.separator)) {
                path += File.separator;
            }
            path += subPath;
        }

        return path;
    }
}


In the ConvertHtmlToPdfUtil.readFileAsString method, the HTML file is opened with FileInputStream. InputStreamReader decodes its bytes into characters using UTF-8, and BufferedReader buffers them for efficient reading.

The BufferedReader is read line by line, as seen in the code block below, and the whole HTML content is collected into a single string. When we are done, we close the reader with the safeCloseBufferedReader method.

Java
 
            br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), encoding));
            StringBuilder fileContentBuilder = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                if (fileContentBuilder.length() > 0) {
                    fileContentBuilder.append(System.getProperty("line.separator"));
                }
                fileContentBuilder.append(line);
            }
            return fileContentBuilder.toString();
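
Since the project targets Java 17, the same read could also be written with java.nio. The sketch below is an alternative, not the author's method; the class name HtmlFileReader is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical alternative to readFileAsString, using only the JDK (Java 11+).
final class HtmlFileReader {

    // Reads the whole file as UTF-8 text and closes it automatically.
    static String readUtf8(String filePath) throws IOException {
        return Files.readString(Path.of(filePath), StandardCharsets.UTF_8);
    }
}
```

Two differences from the loop above are worth knowing: Files.readString keeps the original line endings, including a trailing one, and it throws an IOException on bytes that are not valid UTF-8 instead of silently substituting them.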


We then pass the HTML content to the setVariableValue method, which replaces the key with the value sent to the service from outside. The key marked as #key# in the HTML is replaced with that value, or with an empty string if no value is given.

Java
 
    private String setVariableValue(String htmlContent, String key, String value) {
        // String.replace works on literal text; replaceAll would treat the key as a
        // regex and give "$" and "\" in the value a special meaning
        String replacement = StringUtils.isNotEmpty(value) ? value : "";
        return htmlContent.replace("#" + key + "#", replacement);
    }
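
If the template has several keys, or if the value may contain characters such as & or <, a slightly more general helper can be useful. The sample file starts with an XML declaration, so an unescaped & or < in the value would likely make the document malformed. The sketch below is one way to handle both points; the class name PlaceholderFiller and its methods are hypothetical.

```java
import java.util.Map;

// Hypothetical helper, not part of the example project.
final class PlaceholderFiller {

    // Escapes the five characters that are special in XML/XHTML text and attributes.
    static String escapeXml(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Replaces every #KEY# marker with its escaped value;
    // a null value becomes the empty string.
    static String fill(String html, Map<String, String> values) {
        String result = html;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            String value = entry.getValue() == null ? "" : escapeXml(entry.getValue());
            // Literal replacement, so "$" and "\" in the value are safe
            result = result.replace("#" + entry.getKey() + "#", value);
        }
        return result;
    }
}
```

For example, PlaceholderFiller.fill(htmlContent, Map.of("NAME", variableValue)) would take the place of the single setVariableValue call.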


After the replacement, we call the ConvertHtmlToPdfUtil.htmlConvertToPdf method to produce the PDF output from the HTML content. As can be seen below, the method takes the HTML content, the output file path, and the font location as inputs.

Java
 
            ConvertHtmlToPdfUtil.htmlConvertToPdf(htmlContent, outputFile, fontFile);


Inside the ConvertHtmlToPdfUtil.htmlConvertToPdf method:

We create a new FileOutputStream, which creates the output.pdf file we specified.

The PdfRendererBuilder class is in the com.openhtmltopdf.pdfboxout package, which ships in the openhtmltopdf-pdfbox library. Therefore, we must add this library to the pom.xml file as follows.

Java
 
            final PdfRendererBuilder pdfBuilder = new PdfRendererBuilder();


pom.xml

XML
 
        <dependency>
            <groupId>com.openhtmltopdf</groupId>
            <artifactId>openhtmltopdf-pdfbox</artifactId>
            <version>1.0.10</version>
        </dependency>
XML
 
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>3.1.1</version>
		<relativePath/> <!-- lookup parent from repository -->
	</parent>
	<groupId>com.works</groupId>
	<artifactId>convertHtmlToPDF</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<name>convertHtmlToPDF</name>
	<description>convertHtmlToPDF</description>
	<properties>
		<java.version>17</java.version>
		<spring-cloud.version>2022.0.3</spring-cloud.version>
	</properties>
	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter</artifactId>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>
		<dependency>
			<groupId>com.openhtmltopdf</groupId>
			<artifactId>openhtmltopdf-pdfbox</artifactId>
			<version>1.0.10</version>
		</dependency>
		<dependency>
			<groupId>org.springframework.cloud</groupId>
			<artifactId>spring-cloud-starter-zookeeper-discovery</artifactId>
			<version>4.0.1</version>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
				<configuration>
					<image>
						<builder>paketobuildpacks/builder-jammy-base:latest</builder>
					</image>
				</configuration>
			</plugin>
		</plugins>
	</build>

</project>


After the PdfRendererBuilder object is created, we pass the HTML content to 'withHtmlContent' and the font file to 'useFont'. We set the output stream with 'toStream'. Finally, we start the conversion with the 'run' method.

Java
 
            pdfBuilder.useFastMode();
            pdfBuilder.withHtmlContent(html, null);
            String fontPath = fonts;
            pdfBuilder.useFont(new File(concatPath(fontPath, "times.ttf")), "Times", null, null, false);
            pdfBuilder.toStream(os);
            pdfBuilder.run();
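
Note that concatPath appends "times.ttf" to whatever is passed in as fonts, so that argument has to be the folder holding the font, not the font file itself. A wrong path is easy to miss, so it can help to resolve and check the font file before handing it to useFont. The sketch below does this with java.nio; the class name FontLocator is hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the example project.
final class FontLocator {

    // fontDir is the folder that holds the font, fileName the font file in it.
    // Fails early with a clear message if the file is not there.
    static Path resolveFont(String fontDir, String fileName) {
        Path font = Path.of(fontDir).resolve(fileName);
        if (!Files.isRegularFile(font)) {
            throw new IllegalArgumentException("Font file not found: " + font);
        }
        return font;
    }
}
```

The result could then be passed on as FontLocator.resolveFont(fonts, "times.ttf").toFile() in place of the concatPath call.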


After the pdfBuilder.run() method completes, we should see that the output.pdf file has been created under the output folder.
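
To verify the result from code rather than by opening the folder, one simple option is to check that the file exists and begins with the PDF signature "%PDF-". This is a minimal sketch with a hypothetical class name, PdfOutputCheck; it does not validate the rest of the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of the example project.
final class PdfOutputCheck {

    // Every PDF file starts with these five bytes.
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the file exists and starts with the PDF signature.
    static boolean looksLikePdf(Path file) throws IOException {
        if (!Files.isRegularFile(file)) {
            return false;
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = in.readNBytes(MAGIC.length);
            return Arrays.equals(head, MAGIC);
        }
    }
}
```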

Thus, we have seen how to convert HTML to PDF with the openhtmltopdf-pdfbox library.

