Reproducible Builds in Java

What stands in the way of reproducible builds in Java? What are their benefits? And how can you create them? Time to turn to Gradle.

By Maria Camenzuli · Feb. 23, 18 · Tutorial

When it comes to software, we look at source code to learn what an application will do when executed. However, the source code we read is not what will actually be executed by our computers. In the Java world, it is the Java bytecode that is produced by our compilers that eventually gets executed, normally after having been packaged into an archive. So when we are given code that has already been compiled and packaged, how can we verify that it is indeed the product of the source code we think produced it?

The answer to this is reproducible builds! A reproducible build is a deterministic function that, given our source code as input, will produce an executable artifact that is the same, byte for byte, every time it is run on the same input.
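To make "the same, byte for byte" concrete, here is a minimal sketch of how two build outputs could be compared by digest. The class and method names (`ArtifactDigest`, `sha256Hex`, `isReproduced`) are made up for illustration, and SHA-256 is my choice here; any strong digest would serve the same purpose.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of any build tool.
class ArtifactDigest {

    // Returns the SHA-256 digest of the artifact bytes as lowercase hex.
    static String sha256Hex(byte[] artifact) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(artifact);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Two builds count as reproduced only if their digests match.
    static boolean isReproduced(byte[] firstBuild, byte[] secondBuild) {
        return sha256Hex(firstBuild).equals(sha256Hex(secondBuild));
    }

    public static void main(String[] args) {
        // Stand-ins for artifact bytes; a real check would read the files,
        // for example with java.nio.file.Files.readAllBytes.
        byte[] firstBuild = "artifact bytes".getBytes(StandardCharsets.UTF_8);
        byte[] secondBuild = "artifact bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println(sha256Hex(firstBuild));
        System.out.println("reproduced: " + isReproduced(firstBuild, secondBuild));
    }
}
```

A single differing byte anywhere in the artifact changes the digest, which is why the comparison is so strict.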

What Is the Value of a Reproducible Build?

I personally discovered the need for a reproducible build when I was tasked with implementing an automated change management system at work. Our production system, which is mostly Java-based, is frequently audited. Every time the auditors come in, we need to provide them with a checksum of each software artifact running in production at the time. They then compare these checksums with those from the previous audit to determine which artifacts were modified, and follow up with us on the changes detected. We therefore want to make sure that the checksum of a software artifact changes only when its source code has changed. If someone rebuilds and redeploys an artifact, that should not be flagged as a change in the live system, so as not to waste anybody's time.

Apart from this, a reproducible build enables you to do more with your build tools. For example, if your build produces multiple artifacts and you cannot detect which of them changed after a build, you must either automatically deploy all of them or none at all. A reproducible build allows your tools to recognize what has changed and deploy accordingly.
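As a rough illustration of that idea, the sketch below compares the checksums of freshly built artifacts against the checksums of what is already deployed and reports only the ones that differ. The class name `ChangedArtifacts`, the artifact names, and the checksum strings are all invented for the example; a real deployment tool would obtain them from its own records.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not taken from any deployment tool.
class ChangedArtifacts {

    // Returns the names of artifacts that are new or whose checksum
    // differs from the one recorded for the deployed version.
    static Set<String> changedSince(Map<String, String> deployed, Map<String, String> built) {
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, String> artifact : built.entrySet()) {
            String previousChecksum = deployed.get(artifact.getKey());
            if (!artifact.getValue().equals(previousChecksum)) {
                changed.add(artifact.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Made-up names and checksums, for illustration only.
        Map<String, String> deployed = Map.of("api.jar", "aaa111", "worker.jar", "bbb222");
        Map<String, String> built = Map.of("api.jar", "aaa111", "worker.jar", "ccc333", "cli.jar", "ddd444");
        System.out.println(changedSince(deployed, built)); // prints [cli.jar, worker.jar]
    }
}
```

This only works if the build is reproducible. Otherwise every rebuilt artifact gets a new checksum and the set always contains everything.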

Is the Standard Java Build Reproducible?

We can get the answer to that question by running a simple test with Gradle, one of the two most commonly used Java build tools.

Let's open up a terminal and create a simple Java project with Gradle.

 > mkdir reproducible-build-test 

 > cd reproducible-build-test 

 > gradle init --type java-application 

This will generate a simple Java command line application that prints out 'Hello world'. Next, we will build this project and take a checksum of the resulting JAR file.

 > gradle build 

 > md5sum build/libs/reproducible-build-test.jar 

Finally, we clean our project and rebuild it, then take a checksum of the rebuilt JAR.

 > gradle clean 

 > gradle build  

 > md5sum build/libs/reproducible-build-test.jar 

If you have followed these steps, you will notice that the two checksums differ, even though we built the exact same source code twice. You can try the same experiment with a simple Maven project, and the result will be the same. We can therefore conclude that no, standard Java builds are not reproducible.

The Makings of a JAR File

To build our JAR file, Gradle began by compiling our .java files into .class files containing Java bytecode. Then, these files were packaged together with some metadata to form a JAR archive. This is quite a simple two-step process, so with one more test, we can find out which part of the build is non-deterministic.

Let's put aside Gradle and use the Java compiler javac directly to compile a class, which we will checksum, recompile, and checksum again.

 > javac src/main/java/App.java  

 > md5sum src/main/java/App.class  

 > rm src/main/java/App.class  

 > javac src/main/java/App.java  

 > md5sum src/main/java/App.class 

This time the checksums are the same. javac produced identical output for identical input, so the non-determinism in our Java build must be introduced while the JAR file is being packaged.

To quote the official Java documentation, a JAR file is "essentially a ZIP file that contains an optional META-INF directory," and this is the root of our problem. It turns out that the specification of ZIP files requires every entry in the ZIP file to include a local file modification timestamp. This means that every time we recompile and repackage our Java project, we will get a different JAR file because there is a difference in file modified timestamps.
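The effect of the timestamp can be reproduced in isolation with the JDK's own `java.util.zip` classes. The sketch below is only an illustration, not what Gradle does internally: it builds a one-entry archive in memory several times, with identical content but explicitly chosen modification times. The class name `ZipTimestampDemo`, the entry name, and the dates are arbitrary, and `ZipEntry.setTimeLocal` requires Java 9 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; the entry name and content are placeholders.
class ZipTimestampDemo {

    // Builds a one-entry archive in memory, stamping the entry with the given time.
    static byte[] archiveWithTimestamp(LocalDateTime modified) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            ZipEntry entry = new ZipEntry("App.class");
            entry.setTimeLocal(modified);
            zip.putNextEntry(entry);
            zip.write("same content every time".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] firstBuild = archiveWithTimestamp(LocalDateTime.of(2018, 2, 23, 10, 0, 0));
        byte[] rebuild = archiveWithTimestamp(LocalDateTime.of(2018, 2, 23, 10, 5, 0));
        byte[] sameStamp = archiveWithTimestamp(LocalDateTime.of(2018, 2, 23, 10, 0, 0));

        // Only the timestamp differs, yet the archives are not byte-identical.
        System.out.println(Arrays.equals(firstBuild, rebuild));   // prints false
        // With the same timestamp, the archives match.
        System.out.println(Arrays.equals(firstBuild, sameStamp)); // prints true
    }
}
```

When no time is set on an entry, `ZipOutputStream` falls back to the current time, which is how an ordinary rebuild ends up with a different archive.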

Apart from this, additional non-determinism can be introduced if files are put into the archive in different orders. This can happen in the case of parallel builds, or possibly even if you run the same build on different operating systems. 

Are Reproducible Builds in Java Possible?

Yes. To make a Java build reproducible, we need to tweak it so that the files that make up the JAR archive are always packaged in the same order, and that all timestamps are set to a constant date and time.
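Both requirements can be sketched in a few lines of plain Java. The example below is an assumption-laden illustration rather than the way any particular build tool implements it: `DeterministicArchive`, `write`, and `FIXED_TIME` are hypothetical names, the constant date is an arbitrary value within the range ZIP timestamps can represent, and `setTimeLocal` requires Java 9 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of a deterministic archive writer.
class DeterministicArchive {

    // Arbitrary constant timestamp applied to every entry.
    static final LocalDateTime FIXED_TIME = LocalDateTime.of(1980, 2, 1, 0, 0, 0);

    // Writes the given files (entry name -> content) in sorted name order.
    static byte[] write(Map<String, byte[]> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            // TreeMap iterates by key order, whatever order the caller used.
            for (Map.Entry<String, byte[]> file : new TreeMap<>(files).entrySet()) {
                ZipEntry entry = new ZipEntry(file.getKey());
                entry.setTimeLocal(FIXED_TIME);
                zip.putNextEntry(entry);
                zip.write(file.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] app = "app bytecode".getBytes(StandardCharsets.UTF_8);
        byte[] util = "util bytecode".getBytes(StandardCharsets.UTF_8);

        // Same files, added in a different order each time.
        Map<String, byte[]> firstRun = new LinkedHashMap<>();
        firstRun.put("App.class", app);
        firstRun.put("Util.class", util);
        Map<String, byte[]> secondRun = new LinkedHashMap<>();
        secondRun.put("Util.class", util);
        secondRun.put("App.class", app);

        System.out.println(Arrays.equals(write(firstRun), write(secondRun))); // prints true
    }
}
```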

Luckily for us, Gradle has supported reproducible archives since version 3.4. If you build your Java project using Gradle, you can specify that you want your archive-generating tasks to use a reproducible file order and to discard timestamps by adding the following to your build script.

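tasks.withType(AbstractArchiveTask) {
    // Discard file timestamps instead of copying them into the archive
    preserveFileTimestamps = false
    // Add files to the archive in a consistent order
    reproducibleFileOrder = true
}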


You can read more about this in the Gradle user guide.

For Maven users, I am not aware of any way to get the archiver to generate reproducible JARs. However, there is a reproducible build plugin available that will uncompress JARs and repackage them for you, sorting the contents and replacing varying timestamps with a constant.
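The repackaging approach can be sketched in plain Java as follows. This is not the plugin's actual code, only an illustration of the idea under some assumptions: `JarNormalizer`, `normalize`, `sampleArchive`, and `FIXED_TIME` are hypothetical names, the archive is handled in memory, and details a real tool must deal with, such as signed JARs or keeping the manifest as the first entry, are ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch; not the implementation of any Maven plugin.
class JarNormalizer {

    // Arbitrary constant timestamp applied to every rewritten entry.
    static final LocalDateTime FIXED_TIME = LocalDateTime.of(1980, 2, 1, 0, 0, 0);

    // Reads every entry, then rewrites them sorted by name with a constant timestamp.
    static byte[] normalize(byte[] archive) {
        Map<String, byte[]> contents = new TreeMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry entry = in.getNextEntry(); entry != null; entry = in.getNextEntry()) {
                contents.put(entry.getName(), in.readAllBytes());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pack(contents, FIXED_TIME);
    }

    // Writes the entries in the map's iteration order, all stamped with the given time.
    static byte[] pack(Map<String, byte[]> contents, LocalDateTime stamp) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> file : contents.entrySet()) {
                ZipEntry entry = new ZipEntry(file.getKey());
                entry.setTimeLocal(stamp);
                out.putNextEntry(entry);
                out.write(file.getValue());
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Fakes a build output: entries in the given order, content derived from the name.
    static byte[] sampleArchive(LocalDateTime stamp, String... names) {
        Map<String, byte[]> contents = new java.util.LinkedHashMap<>();
        for (String name : names) {
            contents.put(name, name.getBytes(StandardCharsets.UTF_8));
        }
        return pack(contents, stamp);
    }

    public static void main(String[] args) {
        byte[] monday = sampleArchive(LocalDateTime.of(2018, 2, 19, 9, 0, 0), "App.class", "Util.class");
        byte[] friday = sampleArchive(LocalDateTime.of(2018, 2, 23, 17, 30, 0), "Util.class", "App.class");

        System.out.println(Arrays.equals(monday, friday));                       // prints false
        System.out.println(Arrays.equals(normalize(monday), normalize(friday))); // prints true
    }
}
```

Normalizing after the fact has the advantage that it works regardless of how the archive was produced, at the cost of an extra step in the build.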

Note on using Gradle reproducible builds with Spring Boot versions before 2: The Spring Boot Gradle plugin repackages your JAR file after Gradle packages it, to add the things it needs to make your project executable. This repackaging step does not support a reproducible file order or constant timestamps. This issue has been resolved in Spring Boot 2.

Conclusion

We have discussed what a reproducible build is, and seen that it provides value by enabling verification of software artifacts and supporting automated build tools. Although standard Java builds are non-deterministic because of the specification of ZIP files, there are workarounds that enable us to achieve reproducible builds in Java. If you want to read more about reproducible builds, I suggest you take a look at reproducible-builds.org.

Published at DZone with permission of Maria Camenzuli. See the original article here.
