How to Merge Excel XLSX Files in Java
This article discusses programmatically merging XLSX files into a single document as well as open-source and web API solutions for automating this workflow.
Join the DZone community and get the full member experience.
Join For FreeIn this article, we’re going to learn how to increase the efficiency of a common file merging workflow with the help of a web API solution. Specifically, we are going to learn how to merge Excel XLSX files – one of the most common document types we can expect to work within a file-processing automation environment.
Context for Programmatic XLSX File Merging
If we’re writing code to solve problems related to file processing efficiency and automation, there’s a good chance we’re creating and/or expanding applications that deal with large volumes of Excel files. We may, for example, find that there’s no automated workflow currently in place (or, perhaps, an inefficient one) to combine the various unique Excel reports created by each individual department in our organization.
Monthly, quarterly, and annual reports tend to become disorganized and unmanageable when they are shared with relevant stakeholders as a series of independent files. Resolving this issue by assembling larger multi-worksheet reports often becomes the responsibility of overburdened non-technical employees (i.e., accountants and office administrators) – or it gets handled by too many lines of messy, poorly documented code, which requires constant maintenance and updates to function correctly.
For a common workflow like Excel file merging, the benefits of replacing manual and/or inefficient code processes with a web API boil down to simplicity and efficiency. Calling an external cloud service tends to reduce the amount of code maintenance we need to do, and it reduces the load we place on internal systems by abstracting most of the processing – including local memory allocation, which can be problematic with larger Excel files – to an external server. On top of that, it helps establish a uniform formatting standard for all our XLSX merging operations, ensuring we can build downstream file processing workflows around consistent, predictable results.
We’ll look at calling a web API solution with Java code examples towards the end of this article, but first, we’ll talk about XLSX file structure in the context of file merging, and we’ll briefly review a common open-source Java API that can be used locally for the same purpose.
How XLSX File Merging Works
Combining two Excel files programmatically isn’t as complicated as interacting directly with data sets and other multimedia objects stored within the zipped Open Office XML file structure (something I’ve gone over in previous articles). Still, it’s worth understanding the process at a high level before selecting from APIs and open-source libraries that streamline the operation for us.
Each individual worksheet in an XLSX document (or “workbook”) is stored as its own XML file within the xl/worksheets/
folder. The relationship between these files – which, very importantly in the context of file merging operations, includes the exact order worksheets should appear in when we open the XLSX document – is described in a high-level metadata file, typically xl/workbook.xml
. This stores the custom worksheet names users view when they open an XLSX document, and it maps those worksheet names to the worksheet contents themselves. This metadata file also contains crucial information about the individual styling and formulas associated with each worksheet, which we’ll need for file merges if we don’t want carefully tailored spreadsheets to reset to plain, unformatted text with missing formulas.
Programmatically merging XLSX files means accessing worksheet information through the zipped XLSX folder path, retrieving all worksheet XML files in the order specified by the high-level metadata file, retrieving styling and formula metadata for each file, and copying those XML files into a new XLSX file. The new XLSX file should carry the worksheet XML references in a specific order that mirrors the order the files were originally loaded; otherwise, we’ll compromise the structure of the original documents when we merge multiple.
Open Source XLSX File Merging in Java
For Java developers interested in doing a little more hands-on coding work, most of the heavy lifting for this operation can be accomplished via Apache POI, a popular Java API for Microsoft Documents. As with most Apache products, the benefits of using Apache POI stem from leveraging a well-documented, open-source solution with simple classes (like XSSFWorkbook
in this case). At a high level, you can effectively load Excel files via FileInputStream
, represent those files as XSSFWorkbook
objects, and create a separate XSSFWorkbook
object to store the merged worksheets. Downstream, the copySheet
method can read through all source sheets and replicate data contained within them in the new object, and the output can be written to a new workbook via FileOutputStream
.
Beyond the significant hands-on coding this process requires, however, memory allocation may be a concern for large-scale production environments. All data is generally loaded into and managed within JVM heap memory, which can create some problems when larger XLSX files are involved. As discussed earlier on, one of the primary benefits of leveraging a web API is the ability to abstract memory to an external server, which offloads a considerable portion of the file processing requirements. Conversely, web APIs aren't always a viable option for every environment. We’ll look at a web API solution for handling file merging now.
Demonstration
We’ll look at two different iterations of a web API solution that are optimized to merge XLSX files, and we’ll walk through code examples to call each method correctly. One is designed to handle no more than two files at a time, while the other can handle 10+ files at once. Both are free to use with a free API key.
To structure our API calls, we’ll start by adding the client library to our Maven project. Let’s first add the JitPack repository reference to our pom.xml file:
<repositories>
<repository>
<id>jitpack.io</id>
<url>https://jitpack.io</url>
</repository>
</repositories>
After that, let’s add the API client dependency reference to pom.xml:
<dependencies>
<dependency>
<groupId>com.github.Cloudmersive</groupId>
<artifactId>Cloudmersive.APIClient.Java</artifactId>
<version>v4.25</version>
</dependency>
</dependencies>
Next, let’s add import classes to the top of our file:
// Import classes:
//import com.cloudmersive.client.invoker.ApiClient;
//import com.cloudmersive.client.invoker.ApiException;
//import com.cloudmersive.client.invoker.Configuration;
//import com.cloudmersive.client.invoker.auth.*;
//import com.cloudmersive.client.MergeDocumentApi;
And after that, we’ll initialize the default API client:
ApiClient defaultClient = Configuration.getDefaultApiClient();
In our next step, we’ll configure API key authorization. After we’ve retrieved the authorization instance, we’ll use the setApiKey
method to capture our actual key:
// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");
At this point, we can choose whether to merge 2 Excel documents or 10+ documents in our request. In either case, we’ll replace the example file paths shown in each snippet below with our actual XSLX file paths.
To merge two documents, we can use the below code:
MergeDocumentApi apiInstance = new MergeDocumentApi();
File inputFile1 = new File("/path/to/inputfile"); // File | First input file to perform the operation on.
File inputFile2 = new File("/path/to/inputfile"); // File | Second input file to perform the operation on (more than 2 can be supplied).
try {
byte[] result = apiInstance.mergeDocumentXlsx(inputFile1, inputFile2);
System.out.println(result);
} catch (ApiException e) {
System.err.println("Exception when calling MergeDocumentApi#mergeDocumentXlsx");
e.printStackTrace();
}
And to merge 10+, we can use the following code instead:
MergeDocumentApi apiInstance = new MergeDocumentApi();
File inputFile1 = new File("/path/to/inputfile"); // File | First input file to perform the operation on.
File inputFile2 = new File("/path/to/inputfile"); // File | Second input file to perform the operation on.
File inputFile3 = new File("/path/to/inputfile"); // File | Third input file to perform the operation on.
File inputFile4 = new File("/path/to/inputfile"); // File | Fourth input file to perform the operation on.
File inputFile5 = new File("/path/to/inputfile"); // File | Fifth input file to perform the operation on.
File inputFile6 = new File("/path/to/inputfile"); // File | Sixth input file to perform the operation on.
File inputFile7 = new File("/path/to/inputfile"); // File | Seventh input file to perform the operation on.
File inputFile8 = new File("/path/to/inputfile"); // File | Eighth input file to perform the operation on.
File inputFile9 = new File("/path/to/inputfile"); // File | Ninth input file to perform the operation on.
File inputFile10 = new File("/path/to/inputfile"); // File | Tenth input file to perform the operation on.
try {
byte[] result = apiInstance.mergeDocumentXlsxMulti(inputFile1, inputFile2, inputFile3, inputFile4, inputFile5, inputFile6, inputFile7, inputFile8, inputFile9, inputFile10);
System.out.println(result);
} catch (ApiException e) {
System.err.println("Exception when calling MergeDocumentApi#mergeDocumentXlsxMulti");
e.printStackTrace();
}
We’ve now invoked the merge method and handled exceptions (we’ll catch ApiException
for API-related issues like invalid inputs, network failures, or server errors). We’ll receive the resulting byte[]
arrays from mergeDocumentXlsx
and/or mergeDcoumentXlsxMulti
respectively, and we can wrap up our workflow by creating a new XLSX file or passing those file bytes downstream in our workflow for various other processes.
Conclusion
In this article, we provided some context for Excel XLSX file merging operations, and we briefly reviewed how XLSX file merging occurs according to XLSX file structure. Further, we discussed the pros and cons of using open-source Java APIs for file merging, and we learned how to call a web API using Java code examples that streamline file merging even more efficiently.
Opinions expressed by DZone contributors are their own.
Comments