Get or Set PDF Metadata in Java

Tailoring your PDFs' metadata with relevant keywords can make your documents easier to find through online search.

Brian O'Neill

Feb. 24, 21 · Tutorial

Introduction

Because of their fixed, presentable layout, PDF files are widely used in web applications by both users and businesses. Each of these files contains "metadata", which is essentially data about data. PDF metadata holds supplementary information about the document, such as its author, subject, title, and creation date. If a PDF was converted from another source document (e.g., DOCX or PPT), additional information, such as the file size and whether the file has been optimized for the web, is added automatically as well.

So why is PDF metadata relevant to your business? If PDF documents are accessible on your website or application, search engines can use their metadata when locating and indexing them. Adding keywords that search engines are likely to pick up can therefore make your documents more searchable.

The following APIs let you extract metadata from PDF files or set it, so you can edit or use that information to meet your business's needs.

Tutorial

To begin, install the SDK with Maven by adding a reference to the JitPack repository in pom.xml:

<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

Then we can add a reference to the dependency:

<dependencies>
    <dependency>
        <groupId>com.github.Cloudmersive</groupId>
        <artifactId>Cloudmersive.APIClient.Java</artifactId>
        <version>v3.90</version>
    </dependency>
</dependencies>

Once the installation is complete, we can add the imports to the top of the controller and configure the API key. If you don’t already have an API key, you can register for a free account on the Cloudmersive website to retrieve it.

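// Import classes (place these at the top of your source file, uncommented):
//import java.io.File;
//import com.cloudmersive.client.invoker.ApiClient;
//import com.cloudmersive.client.invoker.ApiException;
//import com.cloudmersive.client.invoker.Configuration;
//import com.cloudmersive.client.invoker.auth.*;
//import com.cloudmersive.client.EditPdfApi;
//import com.cloudmersive.client.model.*; // assumed package for PdfMetadata and SetPdfMetadataRequest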
ApiClient defaultClient = Configuration.getDefaultApiClient();
// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");
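
Hardcoding the key as in the snippet above is fine for a quick test, but you may prefer to keep it out of source control. The following is a minimal sketch of one way to do that, assuming the key is stored in an environment variable. The class name ApiKeyResolver and the variable name CLOUDMERSIVE_API_KEY are hypothetical choices for this example and are not part of the Cloudmersive SDK.

```java
import java.util.Map;

// Hypothetical helper: the class name and the CLOUDMERSIVE_API_KEY variable
// name are assumptions made for this sketch, not part of the Cloudmersive SDK.
final class ApiKeyResolver {
    static final String ENV_VAR = "CLOUDMERSIVE_API_KEY";

    // Returns the trimmed key from the given environment map,
    // or throws if the variable is missing or blank.
    static String resolve(Map<String, String> env) {
        String key = env.get(ENV_VAR);
        if (key == null || key.trim().isEmpty()) {
            throw new IllegalStateException("Set the " + ENV_VAR + " environment variable");
        }
        return key.trim();
    }

    public static void main(String[] args) {
        try {
            String key = resolve(System.getenv());
            // Never print the key itself; its length is enough to confirm it loaded.
            System.out.println("Loaded an API key of length " + key.length());
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

The value returned by resolve(System.getenv()) would then be passed to setApiKey(...) in place of the "YOUR API KEY" literal.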

If you simply want to read the metadata from a PDF, the following function does so. The only input required is the target PDF file.

EditPdfApi apiInstance = new EditPdfApi();
File inputFile = new File("/path/to/inputfile"); // File | Input file to perform the operation on.
try {
    PdfMetadata result = apiInstance.editPdfGetMetadata(inputFile);
    System.out.println(result);
} catch (ApiException e) {
    System.err.println("Exception when calling EditPdfApi#editPdfGetMetadata");
    e.printStackTrace();
}
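
Since the call above uploads whatever file you point it at, it can be worth confirming locally that the input really is a PDF first. The sketch below is one possible pre-check, not something the SDK requires. It relies on the fact that PDF files normally begin with the ASCII marker "%PDF-". The class PdfInputCheck and its method names are hypothetical, and the check is deliberately strict: a file with stray bytes before the marker would be rejected even if some readers tolerate it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for this sketch; not part of the Cloudmersive SDK.
final class PdfInputCheck {
    // A PDF file normally begins with the ASCII marker "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // True when the given bytes start with the PDF marker.
    static boolean hasPdfHeader(byte[] head) {
        if (head == null || head.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (head[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads only the first few bytes of the file and checks them.
    static boolean looksLikePdf(Path path) throws IOException {
        if (!Files.isRegularFile(path)) {
            return false;
        }
        byte[] head = new byte[PDF_MAGIC.length];
        int total = 0;
        try (InputStream in = Files.newInputStream(path)) {
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) {
                    break; // file is shorter than the marker
                }
                total += n;
            }
        }
        return total == head.length && hasPdfHeader(head);
    }

    public static void main(String[] args) throws IOException {
        // The default path is the same placeholder used in the snippet above.
        Path input = Paths.get(args.length > 0 ? args[0] : "/path/to/inputfile");
        System.out.println(input + " looks like a PDF: " + looksLikePdf(input));
    }
}
```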

To edit or set metadata for a PDF document, use the following API function instead:

EditPdfApi apiInstance = new EditPdfApi();
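SetPdfMetadataRequest request = new SetPdfMetadataRequest(); // Populate with the input file bytes and the metadata to set
try {
    byte[] result = apiInstance.editPdfSetMetadata(request);
    // result holds the updated PDF; printing the array itself would only show its reference
    System.out.println("Received " + result.length + " bytes");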
} catch (ApiException e) {
    System.err.println("Exception when calling EditPdfApi#editPdfSetMetadata");
    e.printStackTrace();
}
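
The call above returns the updated PDF as a byte array, so in practice you will probably want to save it rather than print it. The following is a minimal sketch of that step using only the Java standard library. The class PdfResultWriter and the output file name are hypothetical and chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for this sketch; not part of the Cloudmersive SDK.
final class PdfResultWriter {
    // Writes the returned bytes to the target path, replacing any existing file,
    // and returns the path that was written.
    static Path save(byte[] pdfBytes, Path target) {
        if (pdfBytes == null || pdfBytes.length == 0) {
            throw new IllegalArgumentException("No PDF bytes to write");
        }
        try {
            return Files.write(target, pdfBytes);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not write " + target, e);
        }
    }

    public static void main(String[] args) {
        // Stand-in bytes; in real use this would be the array returned by the API call.
        byte[] result = {0x25, 0x50, 0x44, 0x46, 0x2D};
        Path target = Paths.get(System.getProperty("java.io.tmpdir"), "updated-metadata.pdf");
        System.out.println("Wrote " + result.length + " bytes to " + save(result, target));
    }
}
```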

For the operation to succeed, the request must include the input file and the metadata to set, structured as follows:

{
  "InputFileBytes": "string",
  "MetadataToSet": {
    "Successful": true,
    "Title": "string",
    "Keywords": "string",
    "Subject": "string",
    "Author": "string",
    "Creator": "string",
    "DateModified": "2021-02-22T17:38:53.962Z",
    "DateCreated": "2021-02-22T17:38:53.962Z",
    "PageCount": 0
  }
}
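
Two fields in the structure above are easy to get wrong if you ever build the JSON by hand: the file content and the dates. The sketch below shows how those values could be produced in Java. It assumes that InputFileBytes is carried as a Base64 string (the usual convention for binary data in JSON, but an assumption here) and that the dates use the ISO-8601 UTC form shown in the example. The class MetadataFieldEncoding is hypothetical. If you use the Java client, its request object presumably handles this serialization for you.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Base64;

// Hypothetical helper for this sketch; not part of the Cloudmersive SDK.
final class MetadataFieldEncoding {
    // Assumption: binary file content travels as a Base64 string in the JSON body.
    static String encodeFileBytes(byte[] fileBytes) {
        return Base64.getEncoder().encodeToString(fileBytes);
    }

    // Instant.toString() yields an ISO-8601 UTC timestamp,
    // the same shape as the DateModified and DateCreated examples above.
    static String formatTimestamp(Instant instant) {
        return instant.toString();
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        System.out.println("InputFileBytes: " + encodeFileBytes(sample));
        System.out.println("DateModified: " + formatTimestamp(Instant.now()));
    }
}
```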

In conclusion, we hope that the tools provided in this tutorial can assist in optimizing PDF metadata for your personal or business requirements.

Opinions expressed by DZone contributors are their own.
