How To Get the Comments From a DOCX Document in Java

In this article, learn how to extract comments from DOCX documents at scale and pick up key insights which improve team collaboration.

By Brian O'Neill, DZone Core · Aug. 01, 23 · Tutorial

Contemporary document collaboration tools make it possible to push projects from start to finish on tighter deadlines than ever before. Where pre-digital collaboration relied on manual markups and annotations to improve critical reports and memos before distribution, teams across a variety of industries can now accomplish the same essential goals, and much more, using the revision tools available to all users in DOCX files. Any team member can add suggestions, changes, and callouts to a DOCX document in a SharePoint site drive, sharply reducing the time it takes to publish and share final products with stakeholders.

Under the hood, it’s Microsoft’s Open XML document format that makes this team-oriented file manipulation possible. Because a DOCX file is structured as a zip archive composed of multiple XML-based files, comments and other revisions are physically separated from the document’s core content, and the data that defines the relationships between these separate files is stored in a folder of its own.

In other words, the comments and revisions we see when we open a collaborative DOCX document are part of an independent file communicating with the document’s body of text, which is stored in a file of its own. This file-structure compartmentalization ultimately creates the fluid, dynamic experience we’re accustomed to when we add, remove or resolve revisions, or even when we elect to turn collaboration features on and off entirely.
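To make that separation concrete, here is a minimal sketch that opens a DOCX package as a zip archive and reads the comments part directly. It assumes the comments live at the conventional Open XML location, word/comments.xml, and that each comment is a w:comment element carrying w:author and w:date attributes. The class and method names are illustrative only, and the sample package built in sampleDocx() holds nothing but a comments part, so it is not a complete DOCX file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DocxCommentReader {
    // WordprocessingML main namespace (the "w:" prefix)
    public static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns one "author | date | text" line per w:comment found in the package. */
    public static List<String> readComments(InputStream docx) throws Exception {
        List<String> out = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals("word/comments.xml")) {
                    continue; // skip the body text, styles, relationships, etc.
                }
                DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
                f.setNamespaceAware(true);
                // Refuse DOCTYPE declarations as a basic XXE precaution
                f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
                // Copy the entry first so the parser cannot close the zip stream
                Document xml = f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(zip.readAllBytes()));
                NodeList comments = xml.getElementsByTagNameNS(W_NS, "comment");
                for (int i = 0; i < comments.getLength(); i++) {
                    Element c = (Element) comments.item(i);
                    StringBuilder text = new StringBuilder();
                    NodeList runs = c.getElementsByTagNameNS(W_NS, "t");
                    for (int j = 0; j < runs.getLength(); j++) {
                        text.append(runs.item(j).getTextContent());
                    }
                    out.add(c.getAttributeNS(W_NS, "author") + " | "
                        + c.getAttributeNS(W_NS, "date") + " | " + text);
                }
            }
        }
        return out;
    }

    /** Builds a tiny in-memory package holding only word/comments.xml (sample data). */
    public static byte[] sampleDocx() throws IOException {
        String xml = "<w:comments xmlns:w=\"" + W_NS + "\">"
            + "<w:comment w:id=\"0\" w:author=\"Ana\" w:date=\"2023-07-27T15:15:44Z\">"
            + "<w:p><w:r><w:t>Tighten this intro.</w:t></w:r></w:p>"
            + "</w:comment></w:comments>";
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buf)) {
            zip.putNextEntry(new ZipEntry("word/comments.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        for (String line : readComments(new ByteArrayInputStream(sampleDocx()))) {
            System.out.println(line); // prints "Ana | 2023-07-27T15:15:44Z | Tighten this intro."
        }
    }
}
```

Reading the archive this way works for a single file, but it leaves reply threading, resolved status, and error handling to the caller.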

Since comments in a DOCX document are stored in an XML-based file, they can be manually or programmatically accessed independently of the document’s other components. Once extracted, useful comment metadata, including the comment text, author names, and dates, can be analyzed independently of the original content it’s associated with. While this data excavation isn’t necessarily useful on a one-off basis, there’s a notable benefit to accumulating comments from multiple documents of the same type (e.g., cyclical reports and memos) over time and using that information to better understand the overall content collaboration process. With volumes of comment metadata readily available, it’s possible, for example, to apply NLP analysis and better understand how a team tends to feel about specific sections of a biweekly memo. It’s also possible to get a sense of how often collaboration occurs on a particular topic and to learn who the most frequent contributors are.
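As a small illustration of the contributor analysis described above, the following sketch tallies comments per author once author names have been collected from a batch of documents. The class name, method name, and sample author list are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CommentAuthorTally {
    /** Counts how many comments each author left; authors are sorted alphabetically. */
    public static Map<String, Integer> countByAuthor(List<String> authors) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String author : authors) {
            counts.merge(author, 1, Integer::sum); // start at 1, then add 1 per repeat
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample data: one entry per extracted comment
        List<String> authors = Arrays.asList("Ana", "Ben", "Ana", "Chen", "Ana", "Ben");
        System.out.println(countByAuthor(authors)); // prints {Ana=3, Ben=2, Chen=1}
    }
}
```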

If insights like these are intriguing enough to pursue, the challenge becomes one of extracting that information in an organized and efficient manner across multiple documents in a reasonably short period of time. While Open XML files can be converted to .zip files and extracted independently (or accessed individually using documented code examples in C# or Visual Basic), these methods are too slow or too limited in scope to work well across a larger array of files. It’s far more practical to rely on fully realized programmatic solutions that extract and return the data we need in a simple, organized, and human-readable format. This is a perfect role for a specialized document conversion API.

Demonstration

In the remainder of this article, I’ll demonstrate two APIs designed to retrieve comment text and comment metadata from a DOCX file. Both can be used at no cost with a free API key by copying the ready-to-run Java code examples provided further down the page, and each performs a slightly different variation of the same basic function. I’ll briefly outline both solutions below.

1. Get Comments From a DOCX Document as a Flat List

This API returns comments and review annotations as a flat list, with no hierarchy nesting reply comments beneath the comments they answer. In the response object, replies are distinguished from original comments by an IsReply Boolean. Refer to the example JSON response body below:

JSON
 
{
  "Successful": true,
  "Comments": [
    {
      "Path": "string",
      "Author": "string",
      "AuthorInitials": "string",
      "CommentText": "string",
      "CommentDate": "2023-07-27T15:15:44.278Z",
      "IsTopLevel": true,
      "IsReply": true,
      "ParentCommentPath": "string",
      "Done": true
    }
  ],
  "CommentCount": 0
}
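If the flat list is the response you have on hand, the reply threads can still be rebuilt from the IsReply and ParentCommentPath fields. The sketch below assumes each reply's ParentCommentPath equals the Path of the comment it answers, which the sample response suggests but does not confirm. FlatComment is a hypothetical holder for a few of the response fields, not a class from the API client, and the "c1"-style paths are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlatCommentGrouping {
    /** Hypothetical holder mirroring a few fields of the flat response. */
    public static final class FlatComment {
        public final String path;
        public final String author;
        public final String commentText;
        public final boolean isReply;
        public final String parentCommentPath;

        public FlatComment(String path, String author, String commentText,
                           boolean isReply, String parentCommentPath) {
            this.path = path;
            this.author = author;
            this.commentText = commentText;
            this.isReply = isReply;
            this.parentCommentPath = parentCommentPath;
        }
    }

    /** Maps each original comment's path to the replies that name it as their parent. */
    public static Map<String, List<FlatComment>> repliesByParent(List<FlatComment> comments) {
        Map<String, List<FlatComment>> threads = new LinkedHashMap<>();
        // First pass: register every original comment so unanswered ones appear too
        for (FlatComment c : comments) {
            if (!c.isReply) {
                threads.putIfAbsent(c.path, new ArrayList<>());
            }
        }
        // Second pass: attach each reply to its parent's thread
        for (FlatComment c : comments) {
            if (c.isReply) {
                threads.computeIfAbsent(c.parentCommentPath, k -> new ArrayList<>()).add(c);
            }
        }
        return threads;
    }

    public static void main(String[] args) {
        List<FlatComment> flat = Arrays.asList(
            new FlatComment("c1", "Ana", "Tighten this intro.", false, null),
            new FlatComment("c2", "Ben", "Agreed, will do.", true, "c1"),
            new FlatComment("c3", "Chen", "Check the Q3 figure.", false, null));
        repliesByParent(flat).forEach((parent, replies) ->
            System.out.println(parent + " has " + replies.size() + " reply/replies"));
    }
}
```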


2. Get Comments From a DOCX Document Hierarchically

This API returns comments and review annotations in an object with reply-child comments nested beneath their associated comment, which makes the relationship between replies and original comments explicit in the API response body. Refer to the example JSON response body below:

JSON
 
{
  "Successful": true,
  "Comments": [
    {
      "Path": "string",
      "Author": "string",
      "AuthorInitials": "string",
      "CommentText": "string",
      "CommentDate": "2023-07-27T15:16:28.931Z",
      "ReplyChildComments": [
        {
          "Path": "string",
          "Author": "string",
          "AuthorInitials": "string",
          "CommentText": "string",
          "CommentDate": "2023-07-27T15:16:28.931Z",
          "IsTopLevel": true,
          "IsReply": true,
          "ParentCommentPath": "string",
          "Done": true
        }
      ],
      "Done": true
    }
  ],
  "TopLevelCommentCount": 0
}
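With the nested shape, walking a thread is a simple two-level loop. The sketch below prints each top-level comment followed by its indented replies and totals the comments across both levels. The Comment class is a hypothetical stand-in that mirrors only the Author, CommentText, and ReplyChildComments fields, and it assumes replies nest one level deep, as in the sample response.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CommentThreadOutline {
    /** Hypothetical stand-in for one entry of the hierarchical response. */
    public static final class Comment {
        public final String author;
        public final String commentText;
        public final List<Comment> replyChildComments;

        public Comment(String author, String commentText, List<Comment> replyChildComments) {
            this.author = author;
            this.commentText = commentText;
            this.replyChildComments = replyChildComments;
        }
    }

    /** Renders each top-level comment followed by its replies, indented four spaces. */
    public static List<String> outline(List<Comment> topLevel) {
        List<String> lines = new ArrayList<>();
        for (Comment c : topLevel) {
            lines.add(c.author + ": " + c.commentText);
            for (Comment reply : c.replyChildComments) {
                lines.add("    " + reply.author + ": " + reply.commentText);
            }
        }
        return lines;
    }

    /** Total comments = top-level comments plus all of their replies. */
    public static int totalComments(List<Comment> topLevel) {
        int total = topLevel.size();
        for (Comment c : topLevel) {
            total += c.replyChildComments.size();
        }
        return total;
    }

    public static void main(String[] args) {
        Comment reply = new Comment("Ben", "Agreed, will do.", Collections.emptyList());
        List<Comment> topLevel = Arrays.asList(
            new Comment("Ana", "Tighten this intro.", Arrays.asList(reply)),
            new Comment("Chen", "Check the Q3 figure.", Collections.emptyList()));
        outline(topLevel).forEach(System.out::println);
        System.out.println("Total comments: " + totalComments(topLevel)); // prints "Total comments: 3"
    }
}
```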


To structure either API call in Java, first install the client library with Maven. Add the following repository reference to pom.xml:

XML
 
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>


Then, add a reference to the dependency in pom.xml:

XML
 
<dependencies>
    <dependency>
        <groupId>com.github.Cloudmersive</groupId>
        <artifactId>Cloudmersive.APIClient.Java</artifactId>
        <version>v4.25</version>
    </dependency>
</dependencies>


With installation complete, you can copy the example below (including the commented import statements) to retrieve DOCX comments as a flat list:

Java
 
// Import classes:
//import com.cloudmersive.client.invoker.ApiClient;
//import com.cloudmersive.client.invoker.ApiException;
//import com.cloudmersive.client.invoker.Configuration;
//import com.cloudmersive.client.invoker.auth.*;
//import com.cloudmersive.client.EditDocumentApi;
//import com.cloudmersive.client.model.*; // request/response models (package name assumed)

ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

EditDocumentApi apiInstance = new EditDocumentApi();
// Document input request. Populate it with the DOCX to read before making the call;
// as written, the request is empty.
GetDocxGetCommentsRequest reqConfig = new GetDocxGetCommentsRequest();
try {
    // Returns every comment and reply in a single flat list
    GetDocxCommentsResponse result = apiInstance.editDocumentDocxGetComments(reqConfig);
    System.out.println(result);
} catch (ApiException e) {
    System.err.println("Exception when calling EditDocumentApi#editDocumentDocxGetComments");
    e.printStackTrace();
}


You can copy the example below (including the commented import statements) to retrieve DOCX comments hierarchically:

Java
 
// Import classes:
//import com.cloudmersive.client.invoker.ApiClient;
//import com.cloudmersive.client.invoker.ApiException;
//import com.cloudmersive.client.invoker.Configuration;
//import com.cloudmersive.client.invoker.auth.*;
//import com.cloudmersive.client.EditDocumentApi;
//import com.cloudmersive.client.model.*; // request/response models (package name assumed)

ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

EditDocumentApi apiInstance = new EditDocumentApi();
// Document input request. Populate it with the DOCX to read before making the call;
// as written, the request is empty.
GetDocxGetCommentsHierarchicalRequest reqConfig = new GetDocxGetCommentsHierarchicalRequest();
try {
    // Returns top-level comments with their replies nested beneath them
    GetDocxCommentsHierarchicalResponse result = apiInstance.editDocumentDocxGetCommentsHierarchical(reqConfig);
    System.out.println(result);
} catch (ApiException e) {
    System.err.println("Exception when calling EditDocumentApi#editDocumentDocxGetCommentsHierarchical");
    e.printStackTrace();
}


Now you can easily automate the retrieval of DOCX comment/annotation metadata and parse that information seamlessly into other applications and workflows.
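To run either call across many documents, one option is to collect the DOCX files in a folder first and loop over them. The sketch below uses only the Java standard library. The class and method names are hypothetical, and handing each file's bytes to the API request is left as a comment because the request's input fields are not shown in this article. Files whose names start with ~$ are skipped, since Word creates those as temporary owner files while a document is open.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DocxBatchFinder {
    /** Recursively lists .docx files under root, skipping Word's "~$" owner files. */
    public static List<Path> findDocx(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> {
                    String name = p.getFileName().toString();
                    return name.toLowerCase(Locale.ROOT).endsWith(".docx")
                        && !name.startsWith("~$");
                })
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Folder to scan: first argument, or the current directory by default
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path docx : findDocx(root)) {
            byte[] bytes = Files.readAllBytes(docx);
            // Here the bytes would be supplied to the API request for this document
            System.out.println(docx + " (" + bytes.length + " bytes)");
        }
    }
}
```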
