This is Part 2, a continuation of Javac and Java Katas, Part 1: Class Path, where we will run through the same exercises (katas) but this time the main focus will be the usage of the Java Platform Module System. Getting Started As in Part 1, all commands in this article are executed inside a Docker container to make sure that they work and to mitigate any environment-specific setup. So, let's clone the GitHub repository and run the command below from its java-javac-kata folder: Shell docker run --rm -it --name java_kata -v .:/java-javac-kata --entrypoint /bin/bash maven:3.9.6-amazoncorretto-17-debian Kata 1: "Hello, World!" Warm Up We will start with a primitive Java application, /module-path-part/kata-one-hello-world-warm-up, which does not have any third-party dependencies. The directory structure is as follows: In the picture above, we can see the Java project package hierarchy with two classes in the com.example.kata.one package and the module-info.java file which is a module declaration. Compilation To compile our code, we are going to use javac in the single-module mode, which implies that the module-source-path option is not used: Shell javac -d ./target/classes $(find -name '*.java') As a result, the compiled Java classes should appear in the target/classes folder. The verbose option can provide more details on the compilation process: Shell javac -verbose -d ./target/classes $(find -name '*.java') We can also obtain the compiled module description as follows: Shell java --describe-module com.example.kata.one --module-path target/classes Execution Shell java --module-path target/classes --module com.example.kata.one/com.example.kata.one.Main What should result in Hello World! in your console. Various verbose:[class|module|gc|jni] options can provide more details on the execution process: Shell java -verbose:module --module-path target/classes --module com.example.kata.one/com.example.kata.one.Main Also, experimenting a bit during both the compilation and execution stages, by removing or changing classes and packages, should give you a good understanding of which issues lead to particular errors. Packaging Building Modular JAR According to JEP 261: Module System, "A modular JAR file is like an ordinary JAR file in all possible ways, except that it also includes a module-info.class file in its root directory. " With that in mind, let's build one: Shell jar --create --file ./target/hello-world-warm-up.jar -C target/classes/ . The jar file is placed in the target folder. Also, using the verbose option can give us more details: Shell jar --verbose --create --file ./target/hello-world-warm-up.jar -C target/classes/ . You can view the structure of the built jar by using the following command: Shell jar -tf ./target/hello-world-warm-up.jar And get a module description of the modular jar: Shell jar --describe-module --file ./target/hello-world-warm-up.jar Additionally, we can launch the Java class dependency analyzer, jdeps, to gain even more insight: Shell jdeps ./target/hello-world-warm-up.jar As usual, there is the verbose option, too: Shell jdeps -verbose ./target/hello-world-warm-up.jar With that, let's proceed to run our modular jar: Shell java --module-path target/hello-world-warm-up.jar --module com.example.kata.one/com.example.kata.one.Main Building Modular Jar With the Main Class Shell jar --create --file ./target/hello-world-warm-up.jar --main-class=com.example.kata.one.Main -C target/classes/ . 
Having specified the main-class, we can run our app by omitting the <main-class> part in the module option: Shell java --module-path target/hello-world-warm-up.jar --module com.example.kata.one Kata 2: Third-Party Dependency Let's navigate to the /module-path-part/kata-two-third-party-dependency project and examine its structure. This kata is also a Hello World! application, but with a third-party dependency, guava-30.1-jre.jar, which has an automatic module name, com.google.common. You can check its name by using the describe-module option: Shell jar --describe-module --file lib/guava-30.1-jre.jar Compilation Shell javac --module-path lib -d ./target/classes $(find -name '*.java') The module-path option points to the lib folder that contains our dependency. Execution Shell java --module-path "target/classes:lib" --module com.example.kata.two/com.example.kata.two.Main Building Modular Jar Shell jar --create --file ./target/third-party-dependency.jar --main-class=com.example.kata.two.Main -C target/classes/ . Now, we can run our application as follows: Shell java --module-path "lib:target/third-party-dependency.jar" --module com.example.kata.two Kata 3: Spring Boot Application Conquest In the /module-path-part/kata-three-spring-boot-app-conquest folder, you will find a Maven project for a primitive Spring Boot application. To get started with this exercise, we need to execute the script below. Shell ./kata-three-set-up.sh The main purpose of this script is to download all necessary dependencies into the ./target/lib folder and remove all other files in the ./target directory. As seen in the picture above, the ./target/lib folder has three subdirectories. The test directory contains all test dependencies. The automatic-module directory stores dependencies used by the module declaration. The remaining dependencies used by the application are put into the unnamed-module directory. The intention of this separation will become clearer as we proceed. Compilation Shell javac --module-path target/lib/automatic-module -d ./target/classes/ $(find -P ./src/main/ -name '*.java') Note that for compilation, we only need the modules specified in the module-info.java, which are stored in the automatic-module directory. Execution Shell java --module-path "target/classes:target/lib/automatic-module" \ --class-path "target/lib/unnamed-module/*" \ --add-modules java.instrument \ --module com.example.kata.three/com.example.kata.three.Main As a result, you should see the application running. For a better understanding of how the class-path option works here together with the module-path, I recommend reading the 3.1: The unnamed module part of "The State of the Module System." Building Modular Jar Let's package our compiled code as a modular jar, with the main class specified: Shell jar --create --file ./target/spring-boot-app-conquest.jar --main-class=com.example.kata.three.Main -C target/classes/ . Now, we can run it: Shell java --module-path "target/spring-boot-app-conquest.jar:target/lib/automatic-module" \ --class-path "target/lib/unnamed-module/*" \ --add-modules java.instrument \ --module com.example.kata.three Test Compilation For simplicity's sake, we will use the class path approach to run tests here. There's little benefit in struggling with tweaks to the module system and adding additional options to make the tests work.
With that, let's compile our test code: Shell javac --class-path "./target/classes:./target/lib/automatic-module/*:./target/lib/test/*" -d ./target/test-classes/ $(find -P ./src/test/ -name '*.java') Test Execution Shell java --class-path "./target/classes:./target/test-classes:./target/lib/automatic-module/*:./target/lib/unnamed-module/*:./target/lib/test/*" \ org.junit.platform.console.ConsoleLauncher execute --scan-classpath --disable-ansi-colors For more details, you can have a look at Part 1 of this series (linked in the introduction), which elaborates on the theoretical aspect of this command. Wrapping Up That's it. I hope you found this useful, and that these exercises have provided you with some practical experience regarding the nuances of the Java Platform Module System.
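As a final reference, here is a sketch of what the two module declarations driving Katas 1 and 2 could look like. This is an assumption about the repository's layout (the actual module-info.java files may differ slightly), but it captures the essential point: Kata 1 needs nothing beyond java.base, while Kata 2 must require Guava's automatic module.

Java
// Kata 1: module-info.java sits at the source root next to the com/ package tree.
// No requires directives are needed for a plain "Hello, World!" application.
module com.example.kata.one {
}

// Kata 2 (separate source tree): the declaration requires Guava's automatic
// module, whose name com.google.common comes from the Automatic-Module-Name
// manifest entry, as reported by the jar --describe-module command above.
module com.example.kata.two {
    requires com.google.common;
}

Deleting the requires directive in Kata 2 and recompiling is a quick way to provoke exactly the kind of "package is not visible" errors the katas encourage you to experiment with.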
Displaying images on your website makes for an interesting problem: on one side, you want to make them publicly available; on the other, you want to protect them against undue use. The age-old method to achieve this is watermarking: A digital watermark is a kind of marker covertly embedded in a noise-tolerant signal such as audio, video or image data. It is typically used to identify ownership of the copyright of such signal. "Watermarking" is the process of hiding digital information in a carrier signal; the hidden information should, but does not need to, contain a relation to the carrier signal. Digital watermarks may be used to verify the authenticity or integrity of the carrier signal or to show the identity of its owners. It is prominently used for tracing copyright infringements and for banknote authentication. — Digital watermarking The watermark can be visible to act as a deterrent to people stealing the image; alternatively, you can use it to prove its origin after it has been stolen. However, if there are too many images on a site, it can be a burden to watermark them beforehand. It can be much simpler to watermark them dynamically. I searched for an existing JVM library dedicated to watermarking but surprisingly found nothing. We can achieve that in a Jakarta EE-based web app with the Java 2D API and a simple Filter. The Java 2D API has been part of the JDK since 1.0, and it shows. It translates into the following code: Kotlin private fun watermark(imageFilename: String): BufferedImage? { val watermark = ImageIO.read(ClassPathResource("/static/$imageFilename").inputStream) ?: return null //1 val watermarker = ImageIO.read(ClassPathResource("/static/apache-apisix.png").inputStream) //2 watermark.createGraphics().apply { //3 drawImage(watermarker, 20, 20, 300, 300, null) //4 dispose() //5 } return watermark } Get the original image Get the watermarking image Get the canvas of the original image Draw the watermark. I was too lazy to make it partially transparent Release system resources associated with this object Other stacks may have dedicated libraries, such as photon-rs for Rust and WebAssembly. With this in place, we can move to the web part. As mentioned above, we need a Filter. Kotlin class WatermarkFilter : Filter { override fun doFilter(request: ServletRequest, response: ServletResponse, chain: FilterChain) { val req = request as HttpServletRequest val imageFilename = req.servletPath.split("/").last() //1 val watermarked = watermark(imageFilename) //2 response.outputStream.use { ImageIO.write(watermarked, "jpeg", it) //3 } } } Get the image filename Watermark the image Write the image in the response output stream I explained how to watermark images on a Java stack in this post. I did the watermark manually because I didn't find any existing library. Next week, I'll show a no-code approach based on infrastructure components. To Go Further Digital watermarking Java 2D API Image Processing in WebAssembly
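A small follow-up on the transparency remark above: the Java 2D API can blend the watermark with an AlphaComposite so it does not fully cover the underlying pixels. The sketch below is plain Java rather than Kotlin, uses placeholder file names, and is not part of the original filter; it only illustrates the idea.

Java
import java.awt.AlphaComposite;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class Watermarker {

    // Draws the watermark at 50% opacity in the top-left corner of the original image.
    static BufferedImage watermark(BufferedImage original, BufferedImage stamp) {
        Graphics2D g = original.createGraphics();
        try {
            g.setComposite(AlphaComposite.getInstance(AlphaComposite.SRC_OVER, 0.5f));
            g.drawImage(stamp, 20, 20, 300, 300, null);
        } finally {
            g.dispose(); // release system resources associated with this graphics context
        }
        return original;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder file names, for illustration only
        BufferedImage original = ImageIO.read(new File("photo.jpg"));
        BufferedImage stamp = ImageIO.read(new File("apache-apisix.png"));
        ImageIO.write(watermark(original, stamp), "jpeg", new File("photo-watermarked.jpg"));
    }
}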
With Spring Boot 3.2 and Spring Framework 6.1, we get support for Coordinated Restore at Checkpoint (CRaC), a mechanism that enables Java applications to start up faster. With Spring Boot, we can use CRaC in a simplified way, known as Automatic Checkpoint/Restore at startup. Even though not as powerful as the standard way of using CRaC, this blog post will show an example where the Spring Boot applications startup time is decreased by 90%. The sample applications are from chapter 6 in my book on building microservices with Spring Boot. Overview The blog post is divided into the following sections: Introducing CRaC, benefits, and challenges Creating CRaC-based Docker images with a Dockerfile Trying out CRaC with automatic checkpoint/restore Summary Next blog post Let’s start learning about CRaC and its benefits and challenges. 1. Introducing CRaC, Benefits, and Challenges Coordinated Restore at Checkpoint (CRaC) is a feature in OpenJDK, initially developed by Azul, to enhance the startup performance of Java applications by allowing them to restore to a previously saved state quickly. CRaC enables Java applications to save their state at a specific point in time (checkpoint) and then restore from that state at a later time. This is particularly useful for scenarios where fast startup times are crucial, such as serverless environments, microservices, and, in general, applications that must be able to scale up their instances quickly and also support scale-to-zero when not being used. This introduction will first explain a bit about how CRaC works, then discuss some of the challenges and considerations associated with it, and finally, describe how Spring Boot 3.2 integrates with it. The introduction is divided into the following subsections: 1.1. How CRaC Works 1.2. Challenges and Considerations 1.3. Spring Boot 3.2 integration with CRaC 1.1. How CRaC Works Checkpoint Creation At a chosen point during the application’s execution, a checkpoint is created. This involves capturing the entire state of the Java application, including the heap, stack, and all active threads. The state is then serialized and saved to the file system. During the checkpoint process, the application is typically paused to ensure a consistent state is captured. This pause is coordinated to minimize disruption and ensure the application can resume correctly. Before taking the checkpoint, some requests are usually sent to the application to ensure that it is warmed up, i.e., all relevant classes are loaded, and the JVM HotSpot engine has had a chance to optimize the bytecode according to how it is being used in runtime. Commands to perform a checkpoint: Shell java -XX:CRaCCheckpointTo=<some-folder> -jar my_app.jar # Make calls to the app to warm up the JVM... jcmd my_app.jar JDK.checkpoint State Restoration When the application is started from the checkpoint, the previously saved state is deserialized from the file system and loaded back into memory. The application then continues execution from the exact point where the checkpoint was taken, bypassing the usual startup sequence. Command to restore from a checkpoint: Shell java -XX:CRaCRestoreFrom=<some-folder> Restoring from a checkpoint allows applications to skip the initial startup process, including class loading, warmup initialization, and other startup routines, significantly reducing startup times. For more information, see Azul’s documentation: What is CRaC? 1.2. 
Challenges and Considerations As with any new technology, CRaC comes with a new set of challenges and considerations: State Management Open files and connections to external resources, such as databases, must be closed before the checkpoint is taken. After the restore, they must be reopened. CRaC exposes a Java lifecycle interface that applications can use to handle this, org.crac.Resource, with the callback methods beforeCheckpoint and afterRestore. Sensitive Information Credentials and secrets stored in the JVM’s memory will be serialized into the files created by the checkpoint. Therefore, these files need to be protected. An alternative is to run the checkpoint command against a temporary environment that uses other credentials and replace the credentials on restore. Linux Dependency The checkpoint technique is based on a Linux feature called CRIU, “Checkpoint/Restore In Userspace”. This feature only works on Linux, so the easiest way to test CRaC on a Mac or a Windows PC is to package the application into a Linux Docker image. Linux Privileges Required CRIU requires special Linux privileges, which means that the Docker commands used to build Docker images and create Docker containers also require Linux privileges to run. Storage Overhead Storing and managing checkpoint data requires additional storage resources, and the checkpoint size can impact the restoration time. The original jar file is also required to be able to restart a Java application from a checkpoint. I will describe how to handle these challenges in the section on creating Docker images. 1.3. Spring Boot 3.2 Integration With CRaC Spring Boot 3.2 (and the underlying Spring Framework) helps with the process of closing and reopening connections to external resources. Before the creation of the checkpoint, Spring stops all running beans, giving them a chance to close resources if needed. After a restore, the same beans are restarted, allowing beans to reopen connections to the resources. The only thing that needs to be added to a Spring Boot 3.2-based application is a dependency on the crac library. Using Gradle, it looks like the following in the build.gradle file: Groovy dependencies { implementation 'org.crac:crac' } Note: The normal Spring Boot BOM mechanism takes care of versioning the crac dependency. The automatic closing and reopening of connections handled by Spring Boot usually works. Unfortunately, when this blog post was written, some Spring modules lacked this support. To track the state of CRaC support in the Spring ecosystem, a dedicated test project, Spring Lifecycle Smoke Tests, has been created. The current state can be found on the project’s status page. If required, an application can register callback methods to be called before a checkpoint and after a restore by implementing the above-mentioned Resource interface. The microservices used in this blog post have been extended to register callback methods to demonstrate how they can be used. The code looks like this: Java import org.crac.*; public class MyApplication implements Resource { public MyApplication() { Core.getGlobalContext().register(this); } @Override public void beforeCheckpoint(Context<? extends Resource> context) { LOG.info("CRaC's beforeCheckpoint callback method called..."); } @Override public void afterRestore(Context<?
extends Resource> context) { LOG.info("CRaC's afterRestore callback method called..."); } } Spring Boot 3.2 provides a simplified alternative to take a checkpoint compared to the default on-demand alternative described above. It is called automatic checkpoint/restore at startup. It is triggered by adding the JVM system property -Dspring.context.checkpoint=onRefresh to the java -jar command. When set, a checkpoint is created automatically when the application is started. The checkpoint is created after Spring beans have been created but not started, i.e., after most of the initialization work but before that application starts. For details, see Spring Boot docs and Spring Framework docs. With an automatic checkpoint, we don’t get a fully warmed-up application, and the runtime configuration must be specified at build time. This means that the resulting Docker images will be runtime-specific and contain sensitive information from the configuration, like credentials and secrets. Therefore, the Docker images must be stored in a private and protected container registry. Note: If this doesn’t meet your requirements, you can opt for the on-demand checkpoint, which I will describe in the next blog post. With CRaC and Spring Boot 3.2’s support for CRaC covered, let’s see how we can create Docker images for Spring Boot applications that use CRaC. 2. Creating CRaC-Based Docker Images With a Dockerfile While learning how to use CRaC, I studied several blog posts on using CRaC with Spring Boot 3.2 applications. They all use rather complex bash scripts (depending on your bash experience) using Docker commands like docker run, docker exec, and docker commit. Even though they work, it seems like an unnecessarily complex solution compared to producing a Docker image using a Dockerfile. So, I decided to develop a Dockerfile that runs the checkpoint command as a RUN command in the Dockerfile. It turned out to have its own challenges, as described below. I will begin by describing my initial attempt and then explain the problems I stumbled into and how I solved them, one by one until I reach a fully working solution. The walkthrough is divided into the following subsections: 2.1. First attempt 2.2. Problem #1, privileged builds with docker build 2.3. Problem #2, CRaC returns exit status 137, instead of 0 2.4. Problem #3, Runtime configuration 2.5. Problem #4, Spring Data JPA 2.6. The resulting Dockerfile Let’s start with a first attempt and see where it leads us. 2.1. First Attempt My initial assumption was to create a Dockerfile based on a multi-stage build, where the first stage creates the checkpoint using a JDK-based base image, and the second step uses a JRE-based base image for runtime. However, while writing this blog post, I failed to find a base image for a Java 21 JRE supporting CRaC. So I changed my mind to use a regular Dockerfile instead, using a base image from Azul: azul/zulu-openjdk:21.0.3-21.34-jdk-crac Note: BellSoft also provides base images for CraC; see Liberica JDK with CRaC Support as an alternative to Azul. The first version of the Dockerfile looks like this: Dockerfile FROM azul/zulu-openjdk:21.0.3-21.34-jdk-crac ADD build/libs/*.jar app.jar RUN java -Dspring.context.checkpoint=onRefresh -XX:CRaCCheckpointTo=checkpoint -jar app.jar EXPOSE 8080 ENTRYPOINT ["java", "-XX:CRaCRestoreFrom=checkpoint"] This Dockerfile is unfortunately not possible to use since CRaC requires a build to run privileged commands. 2.2. Problem #1, Privileged Builds With Docker Build As mentioned in section 1.2. 
Challenges and Considerations, CRIU, which CRaC is based on, requires special Linux privileges to perform a checkpoint. The standard docker build command doesn’t allow privileged builds, so it can’t be used to build Docker images using the above Dockerfile. Note: The --privileged flag that can be used in docker run commands is not supported by docker build. Fortunately, Docker provides an improved builder backend called BuildKit. Using BuildKit, we can create a custom builder that is insecure, meaning it allows a Dockerfile to run privileged commands. To communicate with BuildKit, we can use Docker’s CLI tool buildx. The following command can be used to create an insecure builder named insecure-builder: Shell docker buildx create --name insecure-builder --buildkitd-flags '--allow-insecure-entitlement security.insecure' Note: The builder runs in isolation within a Docker container created by the docker buildx create command. You can run a docker ps command to reveal the container. When the builder is no longer required, it can be removed with the command: docker buildx rm insecure-builder. The insecure builder can be used to build a Docker image with a command like: Shell docker buildx --builder insecure-builder build --allow security.insecure --load . Note: The --load flag loads the built image into the regular local Docker image cache. Since the builder runs in an isolated container, its result will not end up in the regular local Docker image cache by default. RUN commands in a Dockerfile that require privileges must be suffixed with --security=insecure. The --security flag is only in preview and must therefore be enabled in the Dockerfile by adding the following line as the first line in the Dockerfile: Dockerfile # syntax=docker/dockerfile:1.3-labs For more details on BuildKit and docker buildx, see Docker Build architecture. We can now perform the build; however, the way CRaC is implemented stops the build, as we will learn in the next section. 2.3. Problem #2, CRaC Returns Exit Status 137 Instead of 0 On a successful checkpoint, the java -Dspring.context.checkpoint=onRefresh -XX:CRaCCheckpointTo... command is terminated forcefully (like using kill -9) and returns the exit status 137 instead of 0, causing the Docker build command to fail. To prevent the build from stopping, the java command is extended with a test that verifies that 137 is returned and, if so, returns 0 instead. The following is added to the java command: || if [ $? -eq 137 ]; then return 0; else return 1; fi. Note: || means that the command following will be executed if the first command fails. With CRaC working in a Dockerfile, let’s move on and learn about the challenges with runtime configuration and how to handle them. 2.4. Problem #3, Runtime Configuration Using Spring Boot’s automatic checkpoint/restore at startup, there is no way to specify runtime configuration on restore; at least, I haven’t found a way to do it. This means that the runtime configuration has to be specified at build time. Sensitive information from the runtime configuration, such as credentials used for connecting to a database, will be written to the checkpoint files. Since the Docker images will contain these checkpoint files, they also need to be handled in a secure way.
The Spring Framework documentation contains a warning about this, copied from the section Automatic checkpoint/restore at startup: As mentioned above, and especially in use cases where the CRaC files are shipped as part of a deployable artifact (a container image, for example), operate with the assumption that any sensitive data “seen” by the JVM ends up in the CRaC files, and assess carefully the related security implications. So, let’s assume that we can protect the Docker images, for example, in a private registry with proper authorization in place and that we can specify the runtime configuration at build time. In Chapter 6 of the book, the source code specifies the runtime configuration in the configuration files, application.yml, in a Spring profile named docker. The RUN command, which performs the checkpoint, has been extended to include an environment variable that declares what Spring profile to use: SPRING_PROFILES_ACTIVE=docker. Note: If you have the runtime configuration in a separate file, you can add the file to the Docker image and point it out using an environment variable like SPRING_CONFIG_LOCATION=file:runtime-configuration.yml. With the challenges of proper runtime configuration covered, we have only one problem left to handle: Spring Data JPA’s lack of support for CRaC without some extra work. 2.5. Problem #4, Spring Data JPA Spring Data JPA does not work out-of-the-box with CRaC, as documented in the Smoke Tests project; see the section about Prevent early database interaction. This means that auto-creation of database tables when starting up the application, is not possible when using CRaC. Instead, the creation has to be performed outside of the application startup process. Note: This restriction does not apply to embedded SQL databases. For example, the Spring PetClinic application works with CRaC without any modifications since it uses an embedded SQL database by default. To address these deficiencies, the following changes have been made in the source code of Chapter 6: Manual creation of a SQL DDL script, create-tables.sql Since we can no longer rely on the application to create the required database tables, a SQL DDL script has been created. To enable the application to create the script file, a Spring profile create-ddl-script has been added in the review microservice’s configuration file, microservices/review-service/src/main/resources/application.yml. It looks like: YAML spring.config.activate.on-profile: create-ddl-script spring.jpa.properties.jakarta.persistence.schema-generation: create-source: metadata scripts: action: create create-target: crac/sql-scripts/create-tables.sql The SQL DDL file has been created by starting the MySQL database and, next, the application with the new Spring profile. Once connected to the database, the application and database are shut down. Sample commands: Shell docker compose up -d mysql SPRING_PROFILES_ACTIVE=create-ddl-script java -jar microservices/review-service/build/libs/review-service-1.0.0-SNAPSHOT.jar # CTRL/C once "Connected to MySQL: jdbc:mysql://localhost/review-db" is written to the log output docker compose down The resulting SQL DDL script, crac/sql-scripts/create-tables.sql, has been added to Chapter 6’s source code. The Docker Compose file configures MySQL to execute the SQL DDL script at startup. A CraC-specific version of the Docker Compose file has been created, crac/docker-compose-crac.yml. To create the tables when the database is starting up, the SQL DDL script is used as an init script. 
The SQL DDL script is mapped into the init-folder /docker-entrypoint-initdb.d with the following volume-mapping in the Docker Compose file: Dockerfile volumes: - "./sql-scripts/create-tables.sql:/docker-entrypoint-initdb.d/create-tables.sql" Added a runtime-specific Spring profile in the review microservice’s configuration file. The guidelines in the Smoke Tests project’s JPA section have been followed by adding an extra Spring profile named crac. It looks like the following in the review microservice’s configuration file: YAML spring.config.activate.on-profile: crac spring.jpa.database-platform: org.hibernate.dialect.MySQLDialect spring.jpa.properties.hibernate.temp.use_jdbc_metadata_defaults: false spring.jpa.hibernate.ddl-auto: none spring.sql.init.mode: never spring.datasource.hikari.allow-pool-suspension: true Finally, the Spring profile crac is added to the RUN command in the Dockerfile to activate the configuration when the checkpoint is performed. 2.6. The Resulting Dockerfile Finally, we are done with handling the problems resulting from using a Dockerfile to build a Spring Boot application that can restore quickly using CRaC in a Docker image. The resulting Dockerfile, crac/Dockerfile-crac-automatic, looks like: Dockerfile # syntax=docker/dockerfile:1.3-labs FROM azul/zulu-openjdk:21.0.3-21.34-jdk-crac ADD build/libs/*.jar app.jar RUN --security=insecure \ SPRING_PROFILES_ACTIVE=docker,crac \ java -Dspring.context.checkpoint=onRefresh \ -XX:CRaCCheckpointTo=checkpoint -jar app.jar \ || if [ $? -eq 137 ]; then return 0; else return 1; fi EXPOSE 8080 ENTRYPOINT ["java", "-XX:CRaCRestoreFrom=checkpoint"] Note: One and the same Dockerfile is used by all microservices to create CRaC versions of their Docker images. We are now ready to try it out! 3. Trying Out CRaC With Automatic Checkpoint/Restore To try out CRaC, we will use the microservice system landscape used in Chapter 6 of my book. If you are not familiar with the system landscape, it looks like the following: Chapter 6 uses Docker Compose to manage (build, start, and stop) the system landscape. Note: If you don’t have all the tools used in this blog post installed in your environment, you can look into Chapters 21 and 22 for installation instructions. To try out CRaC, we need to get the source code from GitHub, compile it, and create the Docker images for each microservice using a custom insecure Docker builder. Next, we can use Docker Compose to start up the system landscape and run the end-to-end validation script that comes with the book to ensure that everything works as expected. We will wrap up the try-out section by comparing the startup times of the microservices when they start with and without using CRaC. We will go through each step in the following subsections: 3.1. Getting the source code 3.2. Building the CRaC-based Docker images 3.3. Running end-to-end tests 3.4. Comparing startup times without CRaC 3.1. Getting the Source Code Run the following commands to get the source code from GitHub, jump into the Chapter06 folder, check out the branch SB3.2-crac-automatic, and ensure that a Java 21 JDK is used (Eclipse Temurin is used here): Shell git clone https://github.com/PacktPublishing/Microservices-with-Spring-Boot-and-Spring-Cloud-Third-Edition.git cd Microservices-with-Spring-Boot-and-Spring-Cloud-Third-Edition/Chapter06 git checkout SB3.2-crac-automatic sdk use java 21.0.3-tem 3.2. 
Building the CRaC-Based Docker Images Start with compiling the microservices source code: Shell ./gradlew build If not already created, create the insecure builder with the command: Shell docker buildx create --name insecure-builder --buildkitd-flags '--allow-insecure-entitlement security.insecure' Now we can build a Docker image, where the build performs a CRaC checkpoint for each of the microservices with the commands: Shell docker buildx --builder insecure-builder build --allow security.insecure -f crac/Dockerfile-crac-automatic -t product-composite-crac --load microservices/product-composite-service docker buildx --builder insecure-builder build --allow security.insecure -f crac/Dockerfile-crac-automatic -t product-crac --load microservices/product-service docker buildx --builder insecure-builder build --allow security.insecure -f crac/Dockerfile-crac-automatic -t recommendation-crac --load microservices/recommendation-service docker buildx --builder insecure-builder build --allow security.insecure -f crac/Dockerfile-crac-automatic -t review-crac --load microservices/review-service 3.3. Running End-To-End Tests To start up the system landscape, we will use Docker Compose. Since CRaC requires special Linux privileges, a CRaC-specific docker-compose file comes with the source code, crac/docker-compose-crac.yml. Each microservice is given the required privilege, CHECKPOINT_RESTORE, by specifying: YAML cap_add: - CHECKPOINT_RESTORE Note: Several blog posts on CRaC suggest using privileged containers, i.e., starting them with run --privleged or adding privileged: true in the Docker Compose file. This is a really bad idea since an attacker who gets control over such a container can easily take control of the host that runs Docker. For more information, see Docker’s documentation on Runtime privilege and Linux capabilities. The final addition to the CRaC-specific Docker Compose file is the volume mapping for MySQL to add the init file described above in section 2.5. Problem #4, Spring Data JPA: Dockerfile volumes: - "./sql-scripts/create-tables.sql:/docker-entrypoint-initdb.d/create-tables.sql" Using this Docker Compose file, we can start up the system landscape and run the end-to-end verification script with the following commands: Shell export COMPOSE_FILE=crac/docker-compose-crac.yml docker compose up -d Let’s start with verifying that the CRaC afterRestore callback methods were called: Shell docker compose logs | grep "CRaC's afterRestore callback method called..." Expect something like: Shell ...ReviewServiceApplication : CRaC's afterRestore callback method called... ...RecommendationServiceApplication : CRaC's afterRestore callback method called... ...ProductServiceApplication : CRaC's afterRestore callback method called... ...ProductCompositeServiceApplication : CRaC's afterRestore callback method called... Now, run the end-to-end verification script: Shell ./test-em-all.bash If the script ends with a log output similar to: Shell End, all tests OK: Fri Jun 28 17:40:43 CEST 2024 …it means all tests run ok, and the microservices behave as expected. Bring the system landscape down with the commands: Shell docker compose down unset COMPOSE_FILE After verifying that the microservices behave correctly when started from a CRaC checkpoint, we can compare their startup times with microservices started without using CRaC. 3.4. 
Comparing Startup Times Without CRaC Now over to the most interesting part: How much faster do the microservices start up when restored from a checkpoint compared to a regular cold start? The tests have been run on a MacBook Pro M1 with 64 GB memory. Let’s start with measuring startup times without using CRaC. 3.4.1. Startup Times Without CRaC To start the microservices without CRaC, we will use the default Docker Compose file. So, we must ensure that the COMPOSE_FILE environment variable is unset before we build the Docker images for the microservices. After that, we can start the database services, MongoDB and MySQL: Shell unset COMPOSE_FILE docker compose build docker compose up -d mongodb mysql Verify that the databases are reporting healthy with the command: docker compose ps. Repeat the command until both report they are healthy. Expect a response like this: Shell NAME ... STATUS ... chapter06-mongodb-1 ... Up 13 seconds (healthy) ... chapter06-mysql-1 ... Up 13 seconds (healthy) ... Next, start the microservices and look in the logs for the startup time (searching for the word Started). Repeat the logs command until logs are shown for all four microservices: Shell docker compose up -d docker compose logs | grep Started Look for a response like: Shell ...Started ProductCompositeServiceApplication in 1.659 seconds ...Started ProductServiceApplication in 2.219 seconds ...Started RecommendationServiceApplication in 2.203 seconds ...Started ReviewServiceApplication in 3.476 seconds Finally, bring down the system landscape: Shell docker compose down 3.4.2. Startup Times With CRaC First, declare that we will use the CRaC-specific Docker Compose file and start the database services, MongoDB and MySQL: Shell export COMPOSE_FILE=crac/docker-compose-crac.yml docker compose up -d mongodb mysql Verify that the databases are reporting healthy with the command: docker compose ps. Repeat the command until both report they are healthy. Expect a response like this: Shell NAME ... STATUS ... crac-mongodb-1 ... Up 10 seconds (healthy) ... crac-mysql-1 ... Up 10 seconds (healthy) ... Next, start the microservices and look in the logs for the startup time (this time searching for the word Restored). Repeat the logs command until logs are shown for all four microservices: Shell docker compose up -d docker compose logs | grep Restored Look for a response like: Shell ...Restored ProductCompositeServiceApplication in 0.131 seconds ...Restored ProductServiceApplication in 0.225 seconds ...Restored RecommendationServiceApplication in 0.236 seconds ...Restored ReviewServiceApplication in 0.154 seconds Finally, bring down the system landscape: Shell docker compose down unset COMPOSE_FILE Now, we can compare the startup times! 3.4.3. Comparing Startup Times Between JVM and CRaC Here is a summary of the startup times, along with calculations of how many times faster the CRaC-enabled microservices start and the reduction in startup time as a percentage:

MICROSERVICE        WITHOUT CRAC (s)   WITH CRAC (s)   TIMES FASTER   STARTUP TIME REDUCED
product-composite   1.659              0.131           12.7           92%
product             2.219              0.225           9.9            90%
recommendation      2.203              0.236           9.3            89%
review              3.476              0.154           22.6           96%

Generally, we can see a 10-fold performance improvement in startup times, i.e., 90% shorter startup time; that’s a lot! Note: The improvement in the Review microservice is even better since it no longer handles the creation of database tables.
However, this improvement is irrelevant when comparing improvements using CRaC, so let’s discard the figures for the Review microservice. 4. Summary Coordinated Restore at Checkpoint (CRaC) is a powerful feature in OpenJDK that improves the startup performance of Java applications by allowing them to resume from a previously saved state, a.k.a., a checkpoint. With Spring Boot 3.2, we also get a simplified way of creating a checkpoint using CRaC, known as automatic checkpoint/restore at startup. The tests in this blog post indicate a 10-fold improvement in startup performance, i.e., a 90% reduction in startup time when using automatic checkpoint/restore at startup. The blog post also explained how Docker images using CRaC can be built using a Dockerfile instead of the complex bash scripts suggested by most blog posts on the subject. This, however, comes with some challenges of its own, like using custom Docker builders for privileged builds, as explained in the blog post. Using Docker images created using automatic checkpoint/restore at startup comes with a price. The Docker images will contain runtime-specific and sensitive information, such as credentials to connect to a database at runtime. Therefore, they must be protected from unauthorized use. The Spring Boot support for CRaC does not fully cover all modules in Spring’s eco-system, forcing some workaround to be applied, e.g., when using Spring Data JPA. Also, when using automatic checkpoint/Restore at startup, the JVM HotSpot engine cannot be warmed up before the checkpoint. If optimal execution time for the first requests being processed is important, automatic checkpoint/restore at startup is probably not the way to go. 5. Next Blog Post In the next blog post, I will show you how to use regular on-demand checkpoints to solve some of the considerations with automatic checkpoint/restore at startup. Specifically, the problems with specifying the runtime configuration at build time, storing sensitive runtime configuration in the Docker images, and how the Java VM can be warmed up before performing the checkpoint.
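As an appendix to this post: the logging-only callbacks shown in section 1.3 can also be used to manage a resource directly. The sketch below shows how an application might close and reopen a JDBC connection around the checkpoint using the org.crac API; the class name and JDBC URL are illustrative placeholders (not taken from the book's source code), and in a Spring Boot application this is usually unnecessary since Spring stops and restarts the beans for you.

Java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

// Sketch only: a component that releases its JDBC connection before the
// checkpoint and re-establishes it after restore. The URL is a placeholder.
public class DatabaseClient implements Resource {

    private static final String JDBC_URL = "jdbc:mysql://localhost/review-db"; // placeholder
    private volatile Connection connection;

    public DatabaseClient() throws SQLException {
        connection = DriverManager.getConnection(JDBC_URL);
        // Register this instance so the CRaC runtime invokes the callbacks below
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        connection.close(); // no open connections may survive into the checkpoint
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        connection = DriverManager.getConnection(JDBC_URL); // reconnect on restore
    }
}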
Despite being nearly 30 years old, the Java platform remains consistently among the top three most popular programming languages. This enduring popularity can be attributed to the Java Virtual Machine (JVM), which abstracts away complexities such as memory management and compiles code during execution, allowing Java applications to scale to internet-level workloads. Java's sustained relevance is also due to the rapid evolution of the language, its libraries, and the JVM. Java virtual threads, introduced by Project Loom (an initiative of the OpenJDK community), represent a groundbreaking change in how Java handles concurrency. Exploring the Fabric: Unveiling Threads A thread is the smallest schedulable unit of processing, running concurrently and largely independently of other units. It's an instance of java.lang.Thread. There are two types of threads: platform threads and virtual threads. A platform thread is a thin wrapper around an operating system (OS) thread, running Java code on its underlying OS thread for its entire lifetime. Consequently, the number of platform threads is limited by the number of OS threads. These threads have large stacks and other OS-managed resources, making them suitable for all task types but potentially limited in number. Virtual threads in Java, unlike platform threads, aren't tied to specific OS threads but still execute on them. When a virtual thread encounters a blocking I/O operation, it pauses, allowing the OS thread to handle other tasks. Similar to virtual memory, where a large virtual address space maps to limited RAM, Java's virtual threads map many virtual threads to fewer OS threads. They're ideal for tasks with frequent I/O waits but not for sustained CPU-intensive operations. Hence, virtual threads are lightweight threads that simplify the development, maintenance, and debugging of high-throughput concurrent applications. Comparing the Threads of Fabric: Virtual vs. Platform Let’s compare platform threads with virtual threads to understand their differences better. Crafting Virtual Threads Creating Virtual Threads Using Thread Class and Thread.Builder Interface The example below creates and starts a virtual thread that prints a message. It uses the join method to ensure the virtual thread completes before the main thread terminates, allowing you to see the printed message. Java Thread thread = Thread.ofVirtual().start(() -> System.out.println("Hello World!! I am Virtual Thread")); thread.join(); The Thread.Builder interface allows you to create threads with common properties like thread names. The Thread.Builder.OfPlatform subinterface creates platform threads, while Thread.Builder.OfVirtual creates virtual threads. Here’s an example of creating a virtual thread named "MyVirtualThread" using the Thread.Builder interface: Java Thread.Builder builder = Thread.ofVirtual().name("MyVirtualThread"); Runnable task = () -> { System.out.println("Thread running"); }; Thread t = builder.start(task); System.out.println("Thread name is: " + t.getName()); t.join(); Creating and Running a Virtual Thread Using Executors.newVirtualThreadPerTaskExecutor() Method Executors allow you to decouple thread management and creation from the rest of your application. In the example below, an ExecutorService is created using the Executors.newVirtualThreadPerTaskExecutor() method. Each time ExecutorService.submit(Runnable) is called, a new virtual thread is created and started to execute the task.
This method returns a Future instance. It's important to note that the Future.get() method waits for the task in the thread to finish. As a result, this example prints a message once the virtual thread's task is completed. Java try (ExecutorService myExecutor = Executors.newVirtualThreadPerTaskExecutor()) { Future<?> future = myExecutor.submit(() -> System.out.println("Running thread")); future.get(); System.out.println("Task completed"); // ... Is Your Fabric Lightweight With Virtual Threads? Memory Program 1: Create 10,000 Platform Threads Java public class PlatformThreadMemoryAnalyzer { private static class MyTask implements Runnable { @Override public void run() { try { // Sleep for 10 minutes Thread.sleep(600000); } catch (InterruptedException e) { System.err.println("Interrupted Exception!!"); } } } public static void main(String args[]) throws Exception { // Create 10000 platform threads int i = 0; while (i < 10000) { Thread myThread = new Thread(new MyTask()); myThread.start(); i++; } Thread.sleep(600000); } } Program 2: Create 10,000 Virtual Threads Java public class VirtualThreadMemoryAnalyzer { private static class MyTask implements Runnable { @Override public void run() { try { // Sleep for 10 minutes Thread.sleep(600000); } catch (InterruptedException e) { System.err.println("Interrupted Exception!!"); } } } public static void main(String args[]) throws Exception { // Create 10000 virtual threads int i = 0; while (i < 10000) { Thread.ofVirtual().start(new MyTask()); i++; } Thread.sleep(600000); } } Both programs were executed simultaneously in a Red Hat VM, with the thread stack size configured to be 1 MB (by passing the JVM argument -Xss1m). This argument indicates that every thread in this application should be allocated 1 MB of stack space. Below is the top command output of the threads running. Notice that the virtual threads program occupies only 7.8 MB (i.e., 7,842,364 bytes), whereas the platform threads program occupies 19.2 GB. This clearly indicates that virtual threads consume comparatively much less memory. Thread Creation Time Program 1: Launches 10,000 platform threads Java public class PlatformThreadCreationTimeAnalyzer { private static class Task implements Runnable { @Override public void run() { System.out.println("Hello! I am a Platform Thread"); } } public static void main(String[] args) throws Exception { long startTime = System.currentTimeMillis(); for (int counter = 0; counter < 10_000; ++counter) { new Thread(new Task()).start(); } System.out.print("Platform Thread Creation Time: " + (System.currentTimeMillis() - startTime)); } } Program 2: Launches 10,000 virtual threads Java public class VirtualThreadCreationTimeAnalyzer { private static class Task implements Runnable { @Override public void run() { System.out.println("Hello! I am a Virtual Thread"); } } public static void main(String[] args) throws Exception { long startTime = System.currentTimeMillis(); for (int counter = 0; counter < 10_000; ++counter) { Thread.startVirtualThread(new Task()); } System.out.print("Virtual Thread Creation Time: " + (System.currentTimeMillis() - startTime)); } } Below is the table that summarizes the execution time of these two programs:

                 VIRTUAL THREADS   PLATFORM THREADS
Execution Time   84 ms             346 ms

You can see that the virtual thread program took only 84 ms to complete, whereas the platform thread program took almost 346 ms. This is because platform threads are more expensive to create: whenever a platform thread is created, an operating system thread needs to be allocated to it.
Creating and allocating an operating system thread is not a cheap operation. Reweaving the Fabric: Applications of Virtual Threads Virtual threads can significantly benefit various types of applications, especially those requiring high concurrency and efficient resource management. Here are a few examples: Web servers: Handling a large number of simultaneous HTTP requests can be efficiently managed with virtual threads, reducing the overhead and complexity of traditional thread pools. Microservices: Microservices often involve a lot of I/O operations, such as database queries and network calls. Virtual threads can handle these operations more efficiently. Data processing: Applications that process large amounts of data concurrently can benefit from the scalability of virtual threads, improving throughput and performance. Weaving Success: Avoiding Pitfalls To make the most out of virtual threads, consider the following best practices: Avoid synchronized blocks/methods: When using virtual threads with synchronized blocks, they may not relinquish control of the underlying OS thread when blocked, limiting their benefits. Avoid thread pools for virtual threads: Virtual threads are meant to be used without traditional thread pools. The JVM manages them efficiently, and thread pools can introduce unnecessary complexity. Reduce ThreadLocal usage: Millions of virtual threads with individual ThreadLocal variables can rapidly consume Java heap memory. Wrapping It Up Virtual threads in Java are threads implemented by the Java runtime, not the operating system. Unlike traditional platform threads, virtual threads can scale to a high number — potentially millions — within the same Java process. This scalability allows them to efficiently handle server applications designed in a thread-per-request style, improving concurrency, throughput, and hardware utilization. Developers familiar with java.lang.Thread since Java SE 1.0 can easily use virtual threads, as they follow the same programming model. However, practices developed to manage the high cost of platform threads are often counterproductive with virtual threads, requiring developers to adjust their approach. This shift in thread management encourages a new perspective on concurrency. "Hello, world? Hold on, I’ll put you on hold, spawn a few more threads, and get back to you" Happy coding. :)
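As a closing illustration of the first best practice above (avoiding synchronized blocks that can pin a carrier thread), here is a small sketch that guards a critical section with a ReentrantLock instead, so a virtual thread that blocks inside it can unmount from its carrier. The class name and the simulated blocking call are made up for this example.

Java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

public class PinningFreeCounter {

    // A ReentrantLock lets a blocked virtual thread unmount from its carrier;
    // a synchronized block in the same place could pin the carrier thread.
    private final ReentrantLock lock = new ReentrantLock();
    private long updates;

    void update() throws InterruptedException {
        lock.lock();
        try {
            Thread.sleep(10); // stand-in for a blocking I/O call
            updates++;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        PinningFreeCounter counter = new PinningFreeCounter();
        // One virtual thread per task, no thread pool sizing required
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    counter.update();
                    return null;
                });
            }
        } // close() waits for all submitted tasks to finish
        System.out.println("Updates: " + counter.updates);
    }
}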
This guide is a valuable resource for Java developers seeking to create robust and efficient GraphQL API servers. This detailed guide will take you through all the steps for implementing GraphQL in Java for real-world applications. It covers the fundamental concepts of GraphQL, including its query language and data model, and highlights its similarities to programming languages and relational databases. It also offers a practical step-by-step process for building a GraphQL API server in Java utilizing Spring Boot, Spring for GraphQL, and a relational database. The design emphasizes persistence, flexibility, efficiency, and modernity. Additionally, the blog discusses the trade-offs and challenges involved in the process. Finally, it presents an alternative path beyond the conventional approach, suggesting the potential benefits of a "GraphQL to SQL compiler" and exploring the option of acquiring a GraphQL API instead of building one. What Is GraphQL and Why Do People Want It? GraphQL is a significant evolution in the design of Application Programming Interfaces (APIs). Still, even today, it can be challenging to know how to get started with GraphQL, how to progress, and how to move beyond the conventional wisdom of GraphQL. This is especially true for Java. This guide attempts to cover all these bases in three steps. First, I'll tell you what GraphQL is, and as a bonus, I'll let you know what GraphQL really is. Second, I'll show you how to implement state-of-the-art GraphQL in Java for an actual application. Third, I'll offer you an alternative path beyond the state-of-the-art that may suit your needs better in every dimension. So, what is GraphQL? Well, GraphQL.org says: "GraphQL is a query language for your API and a server-side runtime for executing queries using a type system you define for your data. GraphQL isn’t tied to any specific database or storage engine and is instead backed by your existing code and data." That's correct, but let's look at it from different directions. Sure, GraphQL is "a query language for your API," but you might as well just say that it is an API or a way of building an API. That contrasts it with REST, which GraphQL is an evolution from and an alternative to. GraphQL offers several improvements over REST: Expressivity: A client can say what data they need from a server, no more and no less. Efficiency: Expressivity leads to efficiency gains, reducing network chatter and wasted bandwidth. Discoverability: To know what to say to a server, a client needs to know what can be said to a server. Discoverability allows data consumers to know exactly what's available from data producers. Simplicity: GraphQL puts clients in the driver's seat, so good ergonomics for driving should exist. GraphQL's highly regular machine-readable syntax, simple execution model, and simple specifications lend themselves to interoperable and composable tools: Query tools Schema registries Gateways Code generators Client libraries But GraphQL is also a data model for its query language, and despite the name, neither the query language nor the data model is very "graphy." The data model is essentially just JSON. The query language looks like JSON and can be boiled down to a few simple features: Types: A type is a simple value (a scalar) or a set of fields (an object). While you naturally introduce new types for your own problem domain, there are a few special types (called Operations).
One of these is Query, which is the root of requests for data (setting aside Subscription for now, for the sake of simplicity). A type essentially is a set of rules for determining if a piece of data–or a request for that piece of data–validly conforms to the given type. A GraphQL type is very much like a user-defined type in programming languages like C++, Java, and Typescript, and is very much like a table in a relational database. Field: A field within one type contains one or more pieces of data that validly conform to another type, thus establishing relationships among types. A GraphQL field is very much like a property of a user-defined type in a programming language and is very much like a column in a relational database. Relationships between GraphQL types are very much like pointers or references in programming languages and are very much like foreign key constraints in relational databases. There's more to GraphQL, but that's pretty much the essence. Note the similarities between concepts in GraphQL and programming languages, and especially between concepts in GraphQL and relational databases. OK, we’ve covered what GraphQL is, but what is GraphQL for? Why should we consider it as an alternative to REST? I listed above some of GraphQL's improvements over typical REST–expressivity, efficiency, discoverability, simplicity–but another perhaps more concise way to put it is this: GraphQL's expressivity, efficiency, discoverability, and simplicity make life easier for data consumers. However, there's a corollary: GraphQL's expressivity, efficiency, discoverability, and simplicity make life harder for data producers. That's you! If you're a Java programmer working with GraphQL, your job is probably to produce GraphQL API servers for clients to consume (there are relatively few Java settings on the client). Offering all that expressivity, discoverability, etc. is not easy, so how do you do it? How Do I Provide the GraphQL That People Want, Especially as a Java Developer? On the journey to providing a GraphQL API, we confront a series of interdependent choices that can make life easier (or harder) for data producers. One choice concerns just how "expressive, efficient, discoverable, and simple" our API is, but let's set that aside for a moment and treat it as an emergent property of the other choices we make. Life is about trade-offs, after all. Another choice is over build-versus-buy [PDF], but let's also set that aside for a moment, accept that we're building a GraphQL API server (in Java), explore how that is done, and evaluate the consequences. If you’re building a GraphQL API server in Java, another choice is whether to build it completely from scratch or to use libraries and frameworks and if the latter, then which libraries and frameworks to use. Let's set aside a complete DIY solution as pointless masochism, and survey the landscape of Java libraries and frameworks for GraphQL. As of writing (May 2024) there are three important interdependent players in this space: Graphql-java:graphql-java is a lower-level foundational library for working with GraphQL in Java, which began in 2015. Since the other players depend on and use graphql-java, consider graphql-java to be non-optional. Another crucial choice is whether you are or are not using the Spring Boot framework. If you're not using Spring Boot then stop here! Since this is a prerequisite, in the parlance of the ThoughtWorks Radar this is unavoidably adopt. 
Netflix DGS: DGS is a higher-level library for working with GraphQL in Java with Spring Boot, which began in 2021. If you're using DGS then you will also be using graphql-java under the hood, but typically you won't come into contact with graphql-java. Instead, you will be sprinkling annotations throughout the Java code to identify the code segments called "resolvers" or "data fetchers” that execute GraphQL requests. ThoughtWorks said Trial as of 2023 for DGS but this is a dynamic space and their opinion may have changed. I say Hold for the reasons given below. Spring for GraphQL: Spring for GraphQL is another higher-level library for working with GraphQL in Java with Spring Boot, which began around 2023, and is also based on annotations. It may be too new for ThoughtWorks, but it's not too new for me. I say Adopt and read on for why. The makers of Spring for GraphQL say: "It is a joint collaboration between the GraphQL Java team and Spring engineering…It aims to be the foundation for all Spring, GraphQL applications." Translation: The Spring team has a privileged collaboration with the makers of the foundational library for GraphQL in Java, and intends to "win" in this space. Moreover, the makers of Netflix DGS have much to say about that library's relationship to Spring for GraphQL. "Soon after we open-sourced the DGS framework, we learned about parallel efforts by the Spring team to develop a GraphQL framework for Spring Boot. The Spring GraphQL project was in the early stages at the time and provided a low level of integration with graphql-java. Over the past year, however, Spring GraphQL has matured and is mostly at feature parity with the DGS Framework. We now have 2 competing frameworks that solve the same problems for our users. Today, new users must choose between the DGS Framework or Spring GraphQL, thus missing out on features available in one framework but not the other. This is not an ideal situation for the GraphQL Java community. For the maintainers of DGS and Spring GraphQL, it would be far more effective to collaborate on features and improvements instead of having to solve the same problem independently. Finally, a unified community would provide us with better channels for feedback. The DGS framework is widely used and plays a vital role in the architecture of many companies, including Netflix. Moving away from the framework in favor of Spring-GraphQL would be a costly migration without any real benefits. From a Spring Framework perspective, it makes sense to have an out-of-the-box GraphQL offering, just like Spring supports REST." Translation: If you're a Spring Boot shop already using DGS, go ahead and keep using it for now. If you're a Spring Boot shop starting afresh, you should probably just use Spring for GraphQL. In this guide, I've explained GraphQL in detail, setting the stage by providing some background on the relevant libraries and frameworks in Java. Now, let me show you how to implement state-of-the-art GraphQL in Java for a real application. Since we're starting afresh, we'll take the advice from DGS and just use Spring for GraphQL. How Exactly Do I Build a GraphQL API Server in Java for a Real Application? Opinions are free to differ on what it even means to be a "real application." 
For the purpose of this guide, what I mean by "real application" in this setting is an application that has at least these features: Persistence: Many tutorials, getting-started guides, and overviews only address in-memory data models, stopping well short of interacting with a database. This guide shows you some ways to cross this crucial chasm and discusses some of the consequences, challenges, and trade-offs involved. This is a vast topic so I barely scratch the surface, but it's a start. The primary goal is to support Query operations. A stretch goal is to support Mutation operations. Subscription operations are thoroughly off-the-table for now. Flexibility: I wrote above that just how expressive, efficient, discoverable, and simple we make our GraphQL API is technically a choice we make, but is practically a property that emerges from other choices we make. I also wrote that building GraphQL API servers is difficult for data producers. Consequently, many data producers cope with that difficulty by dialing way back on those other properties of the API. Many GraphQL API servers in the real world are inflexible, superficial, shallow, and are, in many ways, "GraphQL-in-name-only." This guide shows some of what's involved in going beyond the status quo and how that comes into tension with other properties, like efficiency. Spoiler Alert: It isn't pretty. Efficiency: In fairness, many GraphQL API servers in the real world achieve decent efficiency, albeit at the expense of flexibility, by essentially encoding REST API endpoints into a shallow GraphQL schema. The standard approach in GraphQL is the data-loader pattern, but few tutorials really show how this is used even with an in-memory data model let alone with a database. This guide offers one implementation of the data loader pattern to combat the N+1 problem. Again, we see how that comes into tension with flexibility and simplicity. Modernity: Anyone writing a Java application that accesses a database will have to make choices about how to access a database. That could involve just JDBC and raw SQL (for a relational database) but arguably the current industry standard is still to use an Object-Relational Mapping (ORM) layer like Hibernate, jooq, or the standard JPA. Getting an ORM to play nice with GraphQL is a tall order, may not be prudent, and may not even be possible. Few if any other guides touch this with a ten-foot pole. This guide at least will make an attempt with an ORM in the future! The recipe I follow in this guide for building a GraphQL API server in Java for a relational database is the following: Choose Spring Boot for the overall server framework. Choose Spring for GraphQL for the GraphQL-specific parts. Choose Spring Data for JDBC for data access in lieu of an ORM for now. Choose Maven over Gradle because I prefer the former. If you choose the latter, you're on your own. Choose PostgreSQL for the database. Most of the principles should apply to pretty much any relational database, but you've got to start somewhere. Choose Docker Compose for orchestrating a development database server. There are other ways of bringing in a database, but again, you've got to start somewhere. Choose the Chinook data model. Naturally, you will have your own data model, but Chinook is a good choice for illustration purposes because it's fairly rich, has quite a few tables and relationships, goes well beyond the ubiquitous but trivial To-Do apps, is available for a wide variety of databases, and is generally well-understood. 
Choose the Spring Initializr for bootstrapping the application. There's so much ceremony in Java, any way to race through some of it is welcome. Create a GraphQL schema file. This is a necessary step for graphql-java, for DGS, and for Spring for GraphQL. Weirdly, the Spring for GraphQL overview seems to overlook this step, but the DGS "Getting Started" guide is there to remind us. Many "thought leaders" will exhort you to isolate your underlying data model from your API. Theoretically, you could do this by having GraphQL types that differ from your database tables. Practically, this is a source of busy work. Write Java model classes, one for every GraphQL type in the schema file and every table in the database. You're free to make other choices for this data model or for any other data model, and you can even write code or SQL views to isolate your underlying data model from your API, but do ask how important this really is when the number of tables/classes/types grows to the hundreds or thousands. Write Java controller classes, with at least one method for every root field. In practice, this is the bare minimum; there probably will be many more. By the way, these methods are your "resolvers". Annotate every controller class with @Controller to tell Spring to inject it as a Java Bean that can serve network traffic. Annotate every resolver/data-fetcher method with @SchemaMapping or @QueryMapping to tell Spring for GraphQL how to execute the parts of a GraphQL operation. Implement those resolver/data-fetcher methods by whatever means necessary to mediate interactions with the database. In version 0, this will be just simple raw SQL statements. Upgrade some of those resolver/data-fetcher methods by replacing @SchemaMapping or @QueryMapping with @BatchMapping. This latter annotation signals to Spring for GraphQL that we want to make the execution more efficient by combating the N+1 problem, and we're prepared to pay the price in more code in order to do it. Refactor those @BatchMapping-annotated methods to support the data loader pattern by accepting (and processing) a list of identifiers for related entities rather than a single identifier for a single related entity. Write copious test cases for every possible interaction. Just use a fuzz-tester on the API and call it a day. But Really, How Exactly Do I Build a GraphQL API Server in Java for a Real Application? That is a long recipe above! Instead of going into chapter and verse for every single step, in this guide, I do two things. First, I provide a public repository (Steps 1-5) with working code that is easy to use, easy to run, easy to read, and easy to understand. Second, I highlight some of the important steps, put them in context, discuss the choices involved, and offer some alternatives. Step 6: Choose Docker Compose for Orchestrating a Development Database Server Again, there are other ways to pull this off, but this is one good way. YAML
version: "3.6"
services:
  postgres:
    image: postgres:16
    ports:
      - ${PGPORT:-5432}:5432
    restart: always
    environment:
      POSTGRES_PASSWORD: postgres
      PGDATA: /var/lib/pgdata
    volumes:
      - ./initdb.d-postgres:/docker-entrypoint-initdb.d:ro
      - type: tmpfs
        target: /var/lib/pg/data
Set an environment variable for PGPORT to expose PostgreSQL on a host port, or hard-code it to whatever value you like. Step 7: Choose the Chinook Data Model The Chinook files from YugaByte work out-of-the-box for PostgreSQL and are a good choice.
Just make sure that there is a sub-directory initdb.d-postgres and download the Chinook DDL and DML files into that directory, taking care to give them numeric prefixes so that they're run by the PostgreSQL initialization script in the proper order. Shell
mkdir -p ./initdb.d-postgres
wget -O ./initdb.d-postgres/04_chinook_ddl.sql https://raw.githubusercontent.com/YugaByte/yugabyte-db/master/sample/chinook_ddl.sql
wget -O ./initdb.d-postgres/05_chinook_genres_artists_albums.sql https://raw.githubusercontent.com/YugaByte/yugabyte-db/master/sample/chinook_genres_artists_albums.sql
wget -O ./initdb.d-postgres/06_chinook_songs.sql https://raw.githubusercontent.com/YugaByte/yugabyte-db/master/sample/chinook_songs.sql
Now, you can start the database service using Docker Compose: docker compose up -d (or docker-compose up -d). There are many ways to spot-check the database's validity. If the Docker Compose service seems to have started correctly, here's one way using psql: psql "postgresql://postgres:postgres@localhost:5432/postgres" -c '\d' SQL
                List of relations
 Schema |      Name       | Type  |  Owner
--------+-----------------+-------+----------
 public | Album           | table | postgres
 public | Artist          | table | postgres
 public | Customer        | table | postgres
 public | Employee        | table | postgres
 public | Genre           | table | postgres
 public | Invoice         | table | postgres
 public | InvoiceLine     | table | postgres
 public | MediaType       | table | postgres
 public | Playlist        | table | postgres
 public | PlaylistTrack   | table | postgres
 public | Track           | table | postgres
 public | account         | table | postgres
 public | account_summary | view  | postgres
 public | order           | table | postgres
 public | order_detail    | table | postgres
 public | product         | table | postgres
 public | region          | table | postgres
(17 rows)
You should at least see Chinook-specific tables like Album, Artist, and Track. Step 8: Choose the Spring Initializr for Bootstrapping the Application The important thing with this form is to make these choices: Project: Maven; Language: Java; Spring Boot: 3.2.5; Packaging: Jar; Java: 21; Dependencies: Spring for GraphQL, PostgreSQL Driver. You can make other choices (e.g., Gradle, Java 22, MySQL, etc.), but bear in mind that this guide has only been tested with the choices above. Step 9: Create a GraphQL Schema File Maven projects have a standard directory layout, and the standard place within that layout for resource files that get packaged into the build artifact (a JAR file) is ./src/main/resources. Within that directory, create a sub-directory graphql and deposit a schema.graphqls file. There are other ways to organize the GraphQL schema files needed by graphql-java, DGS, and Spring for GraphQL, but they are all rooted in ./src/main/resources (for a Maven project). Within the schema.graphqls file (or its equivalent), first there will be a definition for the root Query object, with root-level fields for every GraphQL type that we want in our API. As a starting point, there will be a root-level field under Query for every table, and a corresponding type for every table.
For example, for Query: GraphQL
type Query {
  Artist(limit: Int): [Artist]
  ArtistById(id: Int): Artist
  Album(limit: Int): [Album]
  AlbumById(id: Int): Album
  Track(limit: Int): [Track]
  TrackById(id: Int): Track
  Playlist(limit: Int): [Playlist]
  PlaylistById(id: Int): Playlist
  PlaylistTrack(limit: Int): [PlaylistTrack]
  PlaylistTrackById(id: Int): PlaylistTrack
  Genre(limit: Int): [Genre]
  GenreById(id: Int): Genre
  MediaType(limit: Int): [MediaType]
  MediaTypeById(id: Int): MediaType
  Customer(limit: Int): [Customer]
  CustomerById(id: Int): Customer
  Employee(limit: Int): [Employee]
  EmployeeById(id: Int): Employee
  Invoice(limit: Int): [Invoice]
  InvoiceById(id: Int): Invoice
  InvoiceLine(limit: Int): [InvoiceLine]
  InvoiceLineById(id: Int): InvoiceLine
}
Note the parameters on these fields. I have written it so that every root-level field with a List return type accepts one optional limit parameter, which accepts an Int. The intention is to support limiting the number of entries that should be returned from a root-level field. Note also that every root-level field that returns a single object accepts one optional id parameter, which also accepts an Int. The intention is to support fetching a single entry by its identifier (these happen to all be integer primary keys in the Chinook data model). Next, here is an illustration of some of the corresponding GraphQL types: GraphQL
type Album {
  AlbumId : Int
  Title : String
  ArtistId : Int
  Artist : Artist
  Tracks : [Track]
}
type Artist {
  ArtistId: Int
  Name: String
  Albums: [Album]
}
type Customer {
  CustomerId : Int
  FirstName : String
  LastName : String
  Company : String
  Address : String
  City : String
  State : String
  Country : String
  PostalCode : String
  Phone : String
  Fax : String
  Email : String
  SupportRepId : Int
  SupportRep : Employee
  Invoices : [Invoice]
}
Fill out the rest of the schema.graphqls file as you see fit, exposing whatever tables (and possibly views, if you create them) you like. Or, just use the complete version from the shared repository. Step 10: Write Java Model Classes Within the standard Maven directory layout, Java source code goes into ./src/main/java and its sub-directories. Within an appropriate sub-directory for whatever Java package you use, create Java model classes. These can be Plain Old Java Objects (POJOs). They can be Java Record classes. They can be whatever you like, so long as they expose "getter"-style accessors for the corresponding fields in the GraphQL schema. In this guide's repository, I choose Java Record classes just for the minimal amount of boilerplate. Java
package com.graphqljava.tutorial.retail.models;

public class ChinookModels {
    public static record Album (
        Integer AlbumId, String Title, Integer ArtistId
    ) {}

    public static record Artist (
        Integer ArtistId, String Name
    ) {}

    public static record Customer (
        Integer CustomerId, String FirstName, String LastName, String Company,
        String Address, String City, String State, String Country,
        String PostalCode, String Phone, String Fax, String Email,
        Integer SupportRepId
    ) {}
    ...
}
Steps 11-14: Write Java Controller Classes, Annotate Every Controller, Annotate Every Resolver/Data-Fetcher, and Implement Those Resolver/Data-Fetcher Methods These are the Spring @Controller classes, and within them are the Spring for GraphQL @QueryMapping and @SchemaMapping resolver/data-fetcher methods.
These are the real workhorses of the application, accepting input parameters, mediating interaction with the database, validating data, implementing (or delegating) to business logic code segments, arranging for SQL and DML statements to be sent to the database, returning the data, processing the data, and sending it along to the GraphQL libraries (graphql-java, DGS, Spring for GraphQL) to package up and send off to the client. There are so many choices one can make in implementing these and I can't go into every detail. Let me just illustrate how I have done it, highlight some things to look out for, and discuss some of the options that are available. For reference, we will look at a section of the ChinookControllers file from the example repository. Java package com.graphqljava.tutorial.retail.controllers; // It's got to go into a package somewhere. import java.sql.ResultSet; // There's loads of symbols to import. import java.sql.SQLException; // This is Java and there's no getting around that. import java.util.List; import java.util.Map; import java.util.stream.Collectors; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.graphql.data.ArgumentValue; import org.springframework.graphql.data.method.annotation.BatchMapping; import org.springframework.graphql.data.method.annotation.QueryMapping; import org.springframework.graphql.data.method.annotation.SchemaMapping; import org.springframework.jdbc.core.RowMapper; import org.springframework.jdbc.core.simple.JdbcClient; import org.springframework.jdbc.core.simple.JdbcClient.StatementSpec; import org.springframework.stereotype.Controller; import com.graphqljava.tutorial.retail.models.ChinookModels.Album; import com.graphqljava.tutorial.retail.models.ChinookModels.Artist; import com.graphqljava.tutorial.retail.models.ChinookModels.Customer; import com.graphqljava.tutorial.retail.models.ChinookModels.Employee; import com.graphqljava.tutorial.retail.models.ChinookModels.Genre; import com.graphqljava.tutorial.retail.models.ChinookModels.Invoice; import com.graphqljava.tutorial.retail.models.ChinookModels.InvoiceLine; import com.graphqljava.tutorial.retail.models.ChinookModels.MediaType; import com.graphqljava.tutorial.retail.models.ChinookModels.Playlist; import com.graphqljava.tutorial.retail.models.ChinookModels.PlaylistTrack; import com.graphqljava.tutorial.retail.models.ChinookModels.Track; public class ChinookControllers { // You don't have to nest all your controllers in one file. It's just what I do. @Controller public static class ArtistController { // Tell Spring about this controller class. @Autowired JdbcClient jdbcClient; // Lots of ways to get DB access from the container. This is one way in Spring Data. RowMapper<Artist> // I'm not using an ORM, and only a tiny bit of help from Spring Data. mapper = new RowMapper<>() { // Consequently, there are these RowMapper utility classes involved. public Artist mapRow (ResultSet rs, int rowNum) throws SQLException { return new Artist(rs.getInt("ArtistId"), rs.getString("Name"));}; @SchemaMapping Artist Artist (Album album) { // @QueryMapping when we can, @SchemaMapping when we have to return // Here, we're getting an Artist for a given Album. jdbcClient .sql("select * from \"Artist\" where \"ArtistId\" = ? limit 1") // Simple PreparedStatement wrapper .param(album.ArtistId()) // Fish out the relating field ArtistId and pass it into the PreparedStatement .query(mapper) // Use our RowMapper to turn the JDBC Row into the desired model class object. 
.optional() // Use optional to guard against null returns! .orElse(null);} @QueryMapping(name = "ArtistById") Artist // Another resolver, this time to get an Artist by its primary key identifier artistById (ArgumentValue<Integer> id) { // Note the annotation "name" parameter, when the GraphQL field name doesn't match exactly the method name for (Artist a : jdbcClient.sql("select * from \"Artist\" where \"ArtistId\" = ?").param(id.value()).query(mapper).list()) return a; return null;} @QueryMapping(name = "Artist") List<Artist> // Yet another resolver, this time to get a List of Artists. artist (ArgumentValue<Integer> limit) { // Note the one "limit" parameter. ArgumentValue<T> is the way you do this with GraphQL for Java. StatementSpec spec = limit.isOmitted() ? // Switch SQL on whether we did or did not get the limit parameter. jdbcClient.sql("select * from \"Artist\"") : jdbcClient.sql("select * from \"Artist\" limit ?").param(limit.value()); return // Run the SQL, map the results, return the List. spec .query(mapper) .list();} ... There's a lot to unpack here, so let's go through it step by step. First, I included the package and import statements in the example because all too often, tutorials and guides that you find online elide these details for brevity. The problem with that, however, is that it's not compilable or runnable code. You don't know where these symbols are coming from, what packages they're in, and what libraries they're coming from. Any decent editor like IntelliJ, VSCode, or even Emacs will help sort this out for you when you're writing code, but you don't have that when reading a blog article. Moreover, there can be name conflicts and ambiguities among symbols across libraries, so even with a smart editor it can leave the reader scratching their head. Next, please forgive the nested inner classes. Feel free to explode your classes into their own individual files as you see fit. This is just how I do it, largely for pedagogical purposes like this one, to promote Locality of Behavior, which is just a fancy way of saying, "Let's not make the reader jump through a lot of hoops to understand the code." Now for the meat of the code. Aside from niggling details like "How do I get a database connection?", "How do I map data?", etc., the patterns I want you to see through the forest of code are these: Every field in our schema file (schema.graphqls) which isn't a simple scalar field (e.g., Int, String, Boolean) will probably need a resolver/data-fetcher. Every resolver is implemented with a Java method. Every resolver method gets annotated with @SchemaMapping, @QueryMapping, or @BatchMapping. Use @QueryMapping when you can because it's simpler. Use @SchemaMapping when you have to (your IDE should nag you). If you keep the Java method names in sync with the GraphQL field names, it's a little less code, but don't make a federal case out of it. You can fix it with a name parameter in the annotations. Unless you do something different (such as adding filtering, sorting, and pagination), you probably will be fetching either a single entry by its primary key or a list of entries. You won't be fetching "child" entries; that's handled by the GraphQL libraries and the recursive divide-and-conquer way they process GraphQL operations. Note: This has implications for performance, efficiency, and code complexity. The "something different" in the above item refers to the richness that you want to add to your GraphQL API. Want limit operations? Filter predicates? Aggregations? 
Supporting those cases will involve more ArgumentValue<> parameters, more @SchemaMapping resolver methods, and more combinations thereof. Deal with it. You will experience the urge to be clever, to create abstractions that dynamically respond to more and more complex combinations of parameters, filters, and other conditions. Step 15: Upgrade Some of Those Resolver/Data-Fetcher Methods With the Data Loader Pattern You will quickly realize that this can lead to overly chatty interaction with the database, sending too many small SQL statements and impacting performance and availability. This is the proverbial "N+1" problem. In a nutshell, the N+1 problem can be illustrated by our Chinook data model. Suppose we have this GraphQL query: query { Artist(limit: 10) { ArtistId Albums { AlbumId Tracks { TrackId } } } } Get up to 10 Artist entries. For each Artist, get all of the related Album entries. For each Album, get all of the related Track entries. For each entry, just get its identifier field: ArtistId, AlbumId, TrackId. This query is nested 2 levels below Artist, so let n=2. Albums is a List-wrapping field on Artist, just as Tracks is a List-wrapping field on Album. Suppose the typical cardinality is m. How many SQL statements will typically be involved? One to fetch the 10 Artist entries, roughly 10 to fetch the Album entries (one query per Artist), and roughly 10*m to fetch the Track entries (one query per Album). In general, we can see that the number of queries scales as m^n, which is exponential in n. Of course, observe that the amount of data retrieved also scales as m^n. In any case, on its face, this seems like an alarmingly inefficient way to go about fetching this data. There is another way, and it is the standard answer within the GraphQL community for combating this N+1 problem: the data loader pattern (aka "batching"). This encompasses three ideas: Rather than fetch the related child entities (e.g., Album) for a single parent entity (e.g., Artist) using one identifier, fetch the related entities for all of the parent entities in one go, using a list of identifiers. Group the resulting child entities according to their respective parent entities (in code). While we're at it, we might as well cache the entities for the lifetime of executing the one GraphQL operation, in case a given entity appears in more than one place in the graph. Now, for some code. Here's how this looks in our example. Java
@BatchMapping(field = "Albums") public Map<Artist, List<Album>>       // Switch to @BatchMapping
albumsForArtist (List<Artist> artists) {                               // Take in a List of parents rather than a single parent
    return jdbcClient
        .sql("select * from \"Album\" where \"ArtistId\" in (:ids)")   // Use a SQL "in" predicate taking a list of identifiers
        .param("ids", artists.stream().map(x -> x.ArtistId()).toList()) // Fish the list of identifiers out of the list of parent objects
        .query(mapper)                                                  // Can re-use our usual mapper
        .list()
        .stream().collect(Collectors.groupingBy(x -> artists.stream().collect(Collectors.groupingBy(Artist::ArtistId)).get(x.ArtistId()).getFirst()));
        // ^ Java idiom for grouping child Albums according to their parent Artists
}
Like before, let's unpack this. First, we switch from either the @QueryMapping or @SchemaMapping annotation to @BatchMapping to signal to Spring for GraphQL that we want to use the data loader pattern. Second, we switch from a single Artist parameter to a List<Artist> parameter. Third, we somehow have to arrange the necessary SQL (with an in predicate in this case) and the corresponding parameter (a List<Integer> extracted from the List<Artist> parameter).
Fourth, we somehow have to arrange for the child entries (Album in this case) to get sorted to the right parent entries (Artist in this case). There are many ways to do it, and this is just one way. The important point is that however it's done, it has to be done in Java. One last thing: note the absence of the limit parameter. Where did that go? It turns out that ArgumentValue<T> is not supported by Spring for GraphQL for @BatchMapping. Oh well! In this case, it's no great loss because arguably these limit parameters make little sense. How often does one really need a random subset of an artist's albums? It would be a more serious issue if we had filtering and sorting, however. Filtering and sorting parameters are more justified, and if we had them we would somehow have to find a way to sneak them into the data loader pattern. Presumably, it can be done, but it will not be as easy as just slapping a @BatchMapping annotation onto the method and tinkering with Java streams. This raises an important point about the "N+1 problem" that is never addressed, and that neglect just serves to exaggerate the scale of the problem in a real-world setting. If we have limits and/or filtering, then we have a way of reducing the cardinality of related child entities below m (recall that we took m to be the typical cardinality of a child entity). In the real world, setting limits (or, more precisely, filtering) is necessary for usability. GraphQL APIs are meant for humans, in that at the end of the day, the data are being painted onto a screen or in some other way presented to a human user who then has to absorb and process those data. Humans have severe limits in perception, cognition, and memory for the quantity of data we can process. Only another machine (i.e., a computer) could possibly process a large volume of data, but if you're extracting large volumes of data from one machine to another, then you are building an ETL pipeline. If you are using GraphQL for ETL, then you are doing it wrong and should stop immediately! In any event, in a real-world setting, with human users, both m and n will be very small. The number of SQL queries will not scale as m^n to very large numbers. Effectively, the N+1 problem will inflate the number of SQL queries not by an arbitrarily large factor, but by approximately a constant factor. In a well-designed application, it probably will be a constant factor well below 100. Consider this when balancing the trade-offs in developer time, complexity, and hardware scaling when confronting the N+1 problem. Is This the Only Way To Build a GraphQL API Server? We saw that the "easy way" of building GraphQL servers is the one typically offered in tutorials and "Getting Started" guides, and it works over tiny, unrealistic in-memory data models, without a database.
We saw that the "real way" of building GraphQL servers (in Java) described in some detail above, regardless of library or framework, involves: Writing schema file entries, possibly for every table Writing Java model classes, possibly for every table Writing Java resolver methods, possibly for every field in every table Eventually writing code to solve arbitrarily complex compositions of input parameters Writing code to budget SQL operations efficiently We also observe that GraphQL lends itself to a "recursive divide-and-conquer with an accumulator approach": a GraphQL query is recursively divided and sub-divided along type and field boundaries into a "graph," internal nodes in the graph are processed individually by resolvers, but the data are passed up the graph dataflow style, accumulating into a JSON envelope that is returned to the user. The GraphQL libraries decompose the incoming queries into something like an Abstract Syntax Tree (AST), firing SQL statements for all the internal nodes (ignoring the data loader pattern for a moment), and then re-composing the data. And, we are its willing accomplices! We also observe that building GraphQL servers according to the above recipes leads to other outcomes: Lots of repetition Lots of boilerplate code Bespoke servers Tied to a particular data model Build a GraphQL server more than once according to the above recipes and you will make these observations and will naturally feel a powerful urge to build more sophisticated abstractions that reduce the repetition, reduce the boilerplate, generalize the servers, and decouple them from any particular data model. This is what I call the "natural way" of building a GraphQL API, as it's a natural evolution from the trivial "easy way" of tutorials and "Getting Started" guides, and from the cumbersome "real way" of resolvers and even data loaders. Building a GraphQL server with a network of nested resolvers offers some flexibility and dynamism, and requires a lot of code. Adding in more flexibility and dynamism with limits, pagination, filtering, and sorting, requires more code still. And while it may be dynamic, it will also be very chatty with the database, as we saw. Reducing the chattiness necessitates composing the many fragmentary SQL statements into fewer SQL statements which individually do more work. That's what the data loader pattern does: it reduces the number of SQL statements from "a few tens" to "less than 10 but more than 1." In practice, that may not be a huge win and it comes at the cost of developer time and lost dynamism, but it is a step down the path of generating fewer, more sophisticated queries. The terminus of that path is "1": the optimal number of SQL statements (ignoring caching) is 1. Generate one giant SQL statement that does all the work of fetching the data, teach it to generate JSON while you're at it, and this is the best you will ever do with a GraphQL server (for a relational database). It will be hard work, but you can take solace in having done it once, it need not ever be done again if you do it right, by introspecting the database to generate the schema. Do that, and what you will build won't be so much a "GraphQL API server" as a "GraphQL to SQL compiler." Acknowledge that building a GraphQL to SQL compiler is what you have been doing all along, embrace that fact, and lean into it. You may never need to build another GraphQL server again. What could be better than that? 
One thing that could be better than building your last GraphQL server, or your only GraphQL server, is never building a GraphQL server in the first place. After all, your goal wasn't to build a GraphQL API but rather to have a GraphQL API. The easiest way to have a GraphQL API is just to go get one. Get one for free if you can. Buy one if the needs justify it. This is the final boss on the journey of GraphQL maturity. How To Choose "Build" Over "Buy" Of course, "buy" in this case is really just a stand-in for the general concept, which is to "acquire" an existing solution rather than building one. That doesn't necessarily require purchasing software, since it could be free and open-source. The distinction that I want to draw here is over whether or not to build a custom solution. When it's possible to acquire an existing solution (whether commercial or open-source), there are several options: Apollo Hasura PostGraphile Prisma If you do choose to build GraphQL servers with Java, I hope you will find this article helpful in breaking out of the relentless tutorials, "Getting Started" guides, and "To-Do" apps. These are vast topics in a shifting landscape that require an iterative approach and a modest amount of repetition.
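Whichever way you end up with a GraphQL API, built with Spring for GraphQL or acquired from one of the options above, consumers talk to it the same way: an HTTP POST of a JSON document containing the query. As a rough, hedged illustration (not taken from this guide's repository), here is a minimal Java 11+ HttpClient sketch that sends a Chinook-style query to a hypothetical endpoint; the endpoint URL is an assumption (Spring for GraphQL typically serves /graphql by default), the field names assume the schema shown earlier, and a real client would build the JSON body with a library rather than by hand. Java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GraphQLClientSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; adjust to wherever your (built or acquired) GraphQL API lives.
        URI endpoint = URI.create("http://localhost:8080/graphql");

        // A Chinook-style query; field names follow the schema shown earlier in this guide.
        String query = "{ Artist(limit: 3) { ArtistId Name Albums { Title } } }";
        String body = "{\"query\": \"" + query.replace("\"", "\\\"") + "\"}";

        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode());
        System.out.println(response.body()); // GraphQL responses arrive as a JSON envelope: { "data": ... }
    }
}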
Java records fit perfectly in Spring Boot applications. Let’s have several scenarios where Java records can help us increase readability and expressiveness by squeezing the homologous code. Using Records in Controllers Typically, a Spring Boot controller operates with simple POJO classes that carry our data back over the wire to the client. For instance, check out this simple controller endpoint returning a list of authors, including their books: Java @GetMapping("/authors") public List<Author> fetchAuthors() { return bookstoreService.fetchAuthors(); } Here, the Author (and Book) can be simple carriers of data written as POJOs. But, they can be replaced by records as well. Here it is: Java public record Book(String title, String isbn) {} public record Author(String name, String genre, List<Book> books) {} That’s all! The Jackson library (which is the default JSON library in Spring Boot) will automatically marshal instances of type Author/Book into JSON. In the bundled code, you can practice the complete example via the localhost:8080/authors endpoint address. Using Records With Templates Thymeleaf is probably the most used templating engine in Spring Boot applications. Thymeleaf pages (HTML pages) are typically populated with data carried by POJO classes, which means that Java records should work as well. Let’s consider the previous Author and Book records, and the following controller endpoint: Java @GetMapping("/bookstore") public String bookstorePage(Model model) { model.addAttribute("authors", bookstoreService.fetchAuthors()); return "bookstore"; } The List<Author> returned via fetchAuthors() is stored in the model under a variable named authors. This variable is used to populate bookstore.html as follows: HTML ... <ul th:each="author : ${authors}"> <li th:text="${author.name} + ' (' + ${author.genre} + ')'" /> <ul th:each="book : ${author.books}"> <li th:text="${book.title}" /> </ul> </ul> ... Done! You can check out the application Java Coding Problems SE. Using Records for Configuration Let’s assume that in application.properties we have the following two properties (they could be expressed in YAML as well): Properties files bookstore.bestseller.author=Joana Nimar bookstore.bestseller.book=Prague history Spring Boot maps such properties to POJO via @ConfigurationProperties. But, a record can be used as well. For instance, these properties can be mapped to the BestSellerConfig record as follows: Java @ConfigurationProperties(prefix = "bookstore.bestseller") public record BestSellerConfig(String author, String book) {} Next, in BookstoreService (a typical Spring Boot service), we can inject BestSellerConfig and call its accessors: Java @Service public class BookstoreService { private final BestSellerConfig bestSeller; public BookstoreService(BestSellerConfig bestSeller) { this.bestSeller = bestSeller; } public String fetchBestSeller() { return bestSeller.author() + " | " + bestSeller.book(); } } In the bundled code, we have added a controller that uses this service as well. 
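One wiring detail worth spelling out (the bundled project presumably already takes care of it): a @ConfigurationProperties record is not registered automatically; it is typically enabled with @ConfigurationPropertiesScan on the application class, or with @EnableConfigurationProperties(BestSellerConfig.class) on a configuration class. A minimal sketch, assuming a hypothetical application class name: Java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.context.properties.ConfigurationPropertiesScan;

// Hypothetical application class; the name and package are placeholders.
@SpringBootApplication
@ConfigurationPropertiesScan // picks up BestSellerConfig and binds bookstore.bestseller.* to the record
public class BookstoreApplication {
    public static void main(String[] args) {
        SpringApplication.run(BookstoreApplication.class, args);
    }
}
On current Spring Boot versions, the record's single canonical constructor is used for binding automatically, so no extra annotation is needed on BestSellerConfig itself.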
Record and Dependency Injection In the previous examples, we have injected the BookstoreService service into BookstoreController using the typical mechanism provided by SpringBoot – dependency injection via constructor (it can be done via @Autowired as well): Java @RestController public class BookstoreController { private final BookstoreService bookstoreService; public BookstoreController(BookstoreService bookstoreService) { this.bookstoreService = bookstoreService; } @GetMapping("/authors") public List<Author> fetchAuthors() { return bookstoreService.fetchAuthors(); } } But, we can compact this class by re-writing it as a record as follows: Java @RestController public record BookstoreController(BookstoreService bookstoreService) { @GetMapping("/authors") public List<Author> fetchAuthors() { return bookstoreService.fetchAuthors(); } } The canonical constructor of this record will be the same as our explicit constructor. The application is available on GitHub. Feel free to challenge yourself to find more use cases of Java records in Spring Boot applications.
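To close with one more hypothetical example in the same spirit (it is not part of the application linked above): records also make tidy request and response bodies, and a compact constructor gives a natural place for simple argument checks. The endpoint path, field names, and classes below are invented purely for illustration. Java
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ReviewController {

    // Jackson deserializes the JSON body through the record's canonical constructor.
    public record ReviewRequest(String isbn, int rating, String comment) {
        public ReviewRequest { // compact constructor: runs before the record components are assigned
            if (rating < 1 || rating > 5) {
                throw new IllegalArgumentException("rating must be between 1 and 5");
            }
        }
    }

    public record ReviewResponse(String isbn, int rating, boolean accepted) {}

    @PostMapping("/reviews")
    public ReviewResponse addReview(@RequestBody ReviewRequest request) {
        // A real implementation would persist the review; here we just echo it back.
        return new ReviewResponse(request.isbn(), request.rating(), true);
    }
}
Because Jackson builds the record through its canonical constructor, the check in the compact constructor runs on every deserialized request.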
In my previous article, I took a closer look at the Java ExecutorService interface and its implementations, with some focus on the Fork/Join framework and ThreadPerTaskExecutor. Today, I would like to take a step forward and check how well they behave when put under pressure. In short, I am going to make benchmarks, a lot of benchmarks. All the code from below, and more, will be available in a dedicated GitHub repository. Logic Under Benchmark I would like to start this text with a walk through the logic that will be the base for benchmarks as it is split into two basic categories: Based on the classic stream Based on the Fork/Join approach Classic Stream Logic Java public static Map<Ip, Integer> groupByIncomingIp(Stream<String> requests, LocalDateTime upperTimeBound, LocalDateTime lowerTimeBound) { return requests .map(line -> line.split(",")) .filter(words -> words.length == 3) .map(words -> new Request(words[1], LocalDateTime.parse(words[2]))) .filter(request -> request.timestamp().isBefore(upperTimeBound) && request.timestamp().isAfter(lowerTimeBound)) .map(i -> new Ip(i.ip())) .collect(groupingBy(i -> i, summingInt(i -> 1))); } In theory, the purpose of this piece of code is to transform a list of strings, then do some filtering and grouping around and return the map. Supplied strings are in the following format: 1,192.168.1.1,2023-10-29T17:33:33.647641574 It represents the event of reading an IP address trying to access a particular server. The output maps an IP address to the number of access attempts in a particular period, expressed by lower and upper time boundaries. Fork/Join Logic Java @Override public Map<Ip, Integer> compute() { if (data.size() >= THRESHOLD) { Map<Ip, Integer> output = new HashMap<>(); ForkJoinTask .invokeAll(createSubTasks()) .forEach(task -> task .join() .forEach((k, v) -> updateOutput(k, v, output)) ); return output; } return process(); } private void updateOutput(Ip k, Integer v, Map<Ip, Integer> output) { Integer currentValue = output.get(k); if (currentValue == null) { output.put(k, v); } else { output.replace(k, currentValue + v); } } private List<ForkJoinDefinition> createSubTasks() { int size = data.size(); int middle = size / 2; return List.of( new ForkJoinDefinition(new ArrayList<>(data.subList(0, middle)), now), new ForkJoinDefinition(new ArrayList<>(data.subList(middle, size)), now) ); } private Map<Ip, Integer> process() { return groupByIncomingIp(data.stream(), upperTimeBound, lowerTimeBound); } The only impactful difference here is that I split the dataset into smaller batches until a certain threshold is reached. By default, the threshold is set to 20. After this operation, I start to perform the computations. Computations are the same as in the classic stream approach logic described above - I am using the groupByIncomingIp method. JMH Setup All the benchmarks are written using Java Microbenchmark Harness (or JMH for short). I have used JMH in version 1.37 to run benchmarks. Benchmarks share the same setup: five warm-up iterations and twenty measurement iterations. There are two different modes here: average time and throughput. In the case of average time, the JMH measures the average execution time of code under benchmark, and output time is expressed in milliseconds. For throughput, JMH measures the number of operations - full execution of code - in a particular unit of time, milliseconds in this case. The result is expressed in ops per millisecond. 
In more JMH syntax: Java @Warmup(iterations = 5, time = 10, timeUnit = SECONDS) @Measurement(iterations = 20, time = 10, timeUnit = SECONDS) @BenchmarkMode({Mode.AverageTime, Mode.Throughput}) @OutputTimeUnit(MILLISECONDS) @Fork(1) @Threads(1) Furthermore, each benchmark has its unique State with a Benchmark scope containing all the data and variables needed by a particular benchmark. Benchmark State Classic Stream The base benchmark state for Classic Stream can be viewed below. Java @State(Scope.Benchmark) public class BenchmarkState { @Param({"0"}) public int size; public List<String> input; public ClassicDefinition definitions; public ForkJoinPool forkJoinPool_4; public ForkJoinPool forkJoinPool_8; public ForkJoinPool forkJoinPool_16; public ForkJoinPool forkJoinPool_32; private final LocalDateTime now = LocalDateTime.now(); @Setup(Level.Trial) public void trialUp() { input = new TestDataGen(now).generate(size); definitions = new ClassicDefinition(now); System.out.println(input.size()); } @Setup(Level.Iteration) public void up() { forkJoinPool_4 = new ForkJoinPool(4); forkJoinPool_8 = new ForkJoinPool(8); forkJoinPool_16 = new ForkJoinPool(16); forkJoinPool_32 = new ForkJoinPool(32); } @TearDown(Level.Iteration) public void down() { forkJoinPool_4.shutdown(); forkJoinPool_8.shutdown(); forkJoinPool_16.shutdown(); forkJoinPool_32.shutdown(); } } First, I set up all the variables needed to perform benchmarks. Apart from the size parameter, which is particularly special in this part, thread pools will be used only in the benchmark. The size parameter, on the other hand, is quite an interesting mechanism of JMH. It allows the parametrization of a certain variable used during the benchmark. You will see how I took advantage of this later when we move to running benchmarks. As for now, I am using this parameter to generate the input dataset that will remain unchanged throughout the whole benchmark - to achieve better repeatability of results. The second part is an up method that works similarly to @BeforeEach from the JUnit library. It will be triggered before each of the 20 iterations of my benchmark and reset all the variables used in the benchmark. Thanks to such a setting, I start with a clear state for every iteration. The last part is the down method that works similarly to @AfterEach from the JUnit library. It will be triggered after each of the 20 iterations of my benchmark and shut down all the thread pools used in the iteration - mostly to handle possible memory leaks. Fork/Join The state for the Fork/Join version looks as below. Java @State(Scope.Benchmark) public class ForkJoinState { @Param({"0"}) public int size; public List<String> input; public ForkJoinPool forkJoinPool_4; public ForkJoinPool forkJoinPool_8; public ForkJoinPool forkJoinPool_16; public ForkJoinPool forkJoinPool_32; public final LocalDateTime now = LocalDateTime.now(); @Setup(Level.Trial) public void trialUp() { input = new TestDataGen(now).generate(size); System.out.println(input.size()); } @Setup(Level.Iteration) public void up() { forkJoinPool_4 = new ForkJoinPool(4); forkJoinPool_8 = new ForkJoinPool(8); forkJoinPool_16 = new ForkJoinPool(16); forkJoinPool_32 = new ForkJoinPool(32); } @TearDown(Level.Iteration) public void down() { forkJoinPool_4.shutdown(); forkJoinPool_8.shutdown(); forkJoinPool_16.shutdown(); forkJoinPool_32.shutdown(); } } There is no big difference between the setup for classic stream and Fork/Join. 
The only difference comes from placing the definitions inside benchmarks themselves, not in state as in the case of the Classic approach. Such change comes from how RecursiveTask works - task executions are memoized and stored - thus, it can impact overall benchmark results. Benchmark Input The basic input for benchmarks is a list of strings in the following format: 1,192.168.1.1,2023-10-29T17:33:33.647641574 Or in a more generalized description: {ordering-number},{ip-like-string},{timestamp} There are five different input sizes: 100 1000 10000 100000 1000000 There is some deeper meaning behind the sizes, as I believe that such a size range can illustrate how well the solution will scale and potentially show some performance bottleneck. Additionally, the overall setup of the benchmark is very flexible, so adding a new size should not be difficult if someone is interested in doing so. Benchmark Setup Classic Stream There is only a single class related to the classic stream benchmark. Different sizes are handled on a State level. Java public class ClassicStreamBenchmark extends BaseBenchmarkConfig { @Benchmark public void bench_sequential(SingleStreamState state, Blackhole bh) { Map<String, Integer> map = state.definitions.sequentialStream(state.input); bh.consume(map); } @Benchmark public void bench_defaultParallelStream(SingleStreamState state, Blackhole bh) { Map<String, Integer> map = state.definitions.defaultParallelStream(state.input); bh.consume(map); } @Benchmark public void bench_parallelStreamWithCustomForkJoinPool_4(SingleStreamState state, Blackhole bh) { Map<String, Integer> map = state.definitions.parallelStreamWithCustomForkJoinPool(state.forkJoinPool_4, state.input); bh.consume(map); } @Benchmark public void bench_parallelStreamWithCustomForkJoinPool_8(SingleStreamState state, Blackhole bh) { Map<String, Integer> map = state.definitions.parallelStreamWithCustomForkJoinPool(state.forkJoinPool_8, state.input); bh.consume(map); } @Benchmark public void bench_parallelStreamWithCustomForkJoinPool_16(SingleStreamState state, Blackhole bh) { Map<String, Integer> map = state.definitions.parallelStreamWithCustomForkJoinPool(state.forkJoinPool_16, state.input); bh.consume(map); } @Benchmark public void bench_parallelStreamWithCustomForkJoinPool_32(SingleStreamState state, Blackhole bh) { Map<String, Integer> map = state.definitions.parallelStreamWithCustomForkJoinPool(state.forkJoinPool_32, state.input); bh.consume(map); } } There are six different benchmark setups of the same logic: bench_sequential: Simple benchmark with just a singular sequential stream bench_defaultParallelStream: Benchmark with default Java parallel stream via .parallelStream() method of Stream class in practice a commonPool from ForkJoinPool and parallelism of 19 (at least on my machine) bench_parallelStreamWithCustomForkJoinPool_4: Custom ForkJoinPool with parallelism level equal to 4 bench_parallelStreamWithCustomForkJoinPool_8: Custom ForkJoinPool with parallelism level equal to 8 bench_parallelStreamWithCustomForkJoinPool_16: Custom ForkJoinPool with parallelism level equal to 16 bench_parallelStreamWithCustomForkJoinPool_32 : Custom ForkJoinPool with parallelism level equal to 32 For classic stream logic, I have 6 different setups and 5 different input sizes resulting in a total of 30 different unique combinations of benchmarks. 
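Before moving on to the Fork/Join setups, here is one way the size parameter can be swept when launching these benchmarks. This is a sketch rather than a description of how the linked repository runs them: the included class name and the size values come from this article, while the runner class itself is an assumption. Java
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class BenchmarkRunner {
    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include("ClassicStreamBenchmark")      // regex matching the benchmark class from the article
                .param("size", "100", "1000", "10000",
                        "100000", "1000000")            // overrides the @Param("size") placeholder value
                .build();
        new Runner(options).run();
    }
}
The same sweep works from the command line of a JMH uber-jar, for example: java -jar benchmarks.jar ClassicStreamBenchmark -p size=100,1000,10000,100000,1000000.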
Fork/Join Java
public class ForkJoinBenchmark extends BaseBenchmarkConfig {

    @Benchmark
    public void bench(ForkJoinState state, Blackhole bh) {
        Map<Ip, Integer> map = new ForkJoinDefinition(state.input, state.now).compute();
        bh.consume(map);
    }

    @Benchmark
    public void bench_customForkJoinPool_4(ForkJoinState state, Blackhole bh) {
        ForkJoinDefinition forkJoinDefinition = new ForkJoinDefinition(state.input, state.now);
        Map<Ip, Integer> map = state.forkJoinPool_4.invoke(forkJoinDefinition);
        bh.consume(map);
    }

    @Benchmark
    public void bench_customForkJoinPool_8(ForkJoinState state, Blackhole bh) {
        ForkJoinDefinition forkJoinDefinition = new ForkJoinDefinition(state.input, state.now);
        Map<Ip, Integer> map = state.forkJoinPool_8.invoke(forkJoinDefinition);
        bh.consume(map);
    }

    @Benchmark
    public void bench_customForkJoinPool_16(ForkJoinState state, Blackhole bh) {
        ForkJoinDefinition forkJoinDefinition = new ForkJoinDefinition(state.input, state.now);
        Map<Ip, Integer> map = state.forkJoinPool_16.invoke(forkJoinDefinition);
        bh.consume(map);
    }

    @Benchmark
    public void bench_customForkJoinPool_32(ForkJoinState state, Blackhole bh) {
        ForkJoinDefinition forkJoinDefinition = new ForkJoinDefinition(state.input, state.now);
        Map<Ip, Integer> map = state.forkJoinPool_32.invoke(forkJoinDefinition);
        bh.consume(map);
    }
}
There are five different benchmark setups of the same logic: bench: Simple benchmark that invokes the ForkJoinDefinition task directly via compute(), without an explicitly configured ForkJoinPool bench_customForkJoinPool_4: Custom ForkJoinPool with parallelism level equal to 4 bench_customForkJoinPool_8: Custom ForkJoinPool with parallelism level equal to 8 bench_customForkJoinPool_16: Custom ForkJoinPool with parallelism level equal to 16 bench_customForkJoinPool_32: Custom ForkJoinPool with parallelism level equal to 32 For the Fork/Join logic, I have 5 different setups and 5 different input sizes, resulting in a total of 25 different unique combinations of benchmarks. What is more, in both cases I am also using the Blackhole concept from JMH to “cheat” the compiler optimization of dead code. There’s more about Blackholes and their use case here. Benchmark Environment Machine 1 The tests were conducted on my Dell XPS with the following parameters: OS: Ubuntu 20.04.6 LTS CPU: i9-12900HK × 20 Memory: 64 GB JVM: OpenJDK version "21" 2023-09-19, OpenJDK Runtime Environment (build 21+35-2513), OpenJDK 64-Bit Server VM (build 21+35-2513, mixed mode, sharing) Machine 2 The tests were conducted on my Lenovo Y700 with the following parameters: OS: Ubuntu 20.04.6 LTS CPU: i7-6700HQ × 8 Memory: 32 GB JVM: OpenJDK version "21" 2023-09-19, OpenJDK Runtime Environment (build 21+35-2513), OpenJDK 64-Bit Server VM (build 21+35-2513, mixed mode, sharing) For both machines, all side/insignificant applications were closed. I tried to make the runtime system as pure as possible so as not to generate any unwanted performance overhead. However, on a pure Ubuntu server or when run inside a container, the overall performance may differ. Benchmark Report The results of running the benchmarks are stored as .csv files in the GitHub repository, in the reports directory. Furthermore, to ease the download of reports, there is a separate .zip file named reports.zip that contains all the .csv files with data.
Reports directories are structured on a per-size basis, with three special reports spanning all input sizes: report_classic: All input sizes for the classic stream report_forkjoin: All input sizes for the fork/join approach report_whole: All input sizes for both classic and fork/join Each report directory from the above contains 3 separate files: averagetime.csv: Results for average time mode benchmarks throughput.csv: Results for throughput mode benchmarks total.csv: Combined results for both modes For the particular reports, I have two formats: averagetime.csv and throughput.csv share one format, and total.csv has a separate one. Let’s call them the modes and total formats. The modes report contains eight columns: Label: Name of the benchmark Input Size: Benchmark input size Threads: Number of threads used in the benchmark, from the set 1, 4, 7, 8, 16, 19, 32 Mode: Benchmark mode, either average time or throughput Cnt: The number of benchmark iterations; should always be equal to 20 Score: Actual result of the benchmark Score Mean Error: Benchmark measurement error Units: Units of the benchmark, either ms/op (for average time) or ops/ms (for throughput) The total report contains 10 columns: Label: Name of the benchmark Input Size: Benchmark input size Threads: Number of threads used in the benchmark, from the set 1, 4, 7, 8, 16, 19, 32 Cnt: The number of benchmark iterations; should always be equal to 20 AvgTimeScore: Actual result of the benchmark for average time mode AvgTimeMeanError: Benchmark measurement error for average time mode AvgUnits: Units of the benchmark for average time mode, in ms/op ThroughputScore: Actual result of the benchmark for throughput mode ThroughputMeanError: Benchmark measurement error for throughput mode ThroughputUnits: Units of the benchmark for throughput mode, in ops/ms Results Analysis Assumptions Baseline I will present general results and insights based on the size of 10000 – so I will be using the .csv files from the report_10000 directory. There are two main reasons behind choosing this particular data size: the execution time is high enough to show any differences between the different setups, and data sizes 100 and 1000 are, in my opinion, too small to reveal performance bottlenecks. Thus, I think that an in-depth analysis of this particular data size would be the most impactful. Of course, other sizes will also get a results overview, but it will not be as thorough as this one unless I encounter some anomalies in comparison to the behavior for size 10000. A Word On the Fork/Join Native Approach With the current code under benchmark, there will be a performance overhead associated with the Fork/Join benchmarks. As the fork/join benchmark logic heavily relies on splitting the input dataset, there must be a moment when all of the results are combined into a single cohesive output. This is a fragment that is not present in the other benchmarks, so correctly understanding its impact on overall performance is crucial. Please keep this in mind. Analysis Machine 1 (20 cores) As you can see above, the best overall result for an input volume of 10 thousand on Machine 1 belongs to the versions with defaultParallelStream. For ClassicStream-based benchmarks, bench_defaultParallelStream returns by far the best result. Even when we take into consideration a possible error in measurements, it still comes out on top. Setups for ForkJoinPool with parallelism 32 and 16 return worse results. On one hand, it is surprising - for parallelism 32, I would expect a better score than for the default pool (parallelism 19).
However, parallelism 16 has worse results than both parallelism 19 and 32. With 20 CPU threads on Machine 1, parallelism 32 is not enough to picture the performance degradation caused by an overabundance of threads. However, you would be able to notice such behavior for Machine 2. I would assume that to show such behavior on Machine 1, the parallelism should be set to 64 or more. What is curious here is that the pattern of bench_defaultParallelStream coming out on top seems not to hold for the higher input sizes of 100k and one million. There, the best performance belongs to bench_parallelStreamWithCustomForkJoinPool_16, which may indicate that in the end, reasonably smaller parallelism may be a good idea. The Fork/Join-based implementation is noticeably slower than the default parallel stream implementation, with around 10% worse performance. This pattern also occurs for other sizes. It confirms my assumption from above that joining the different smaller parts of a split data set has a noticeable impact. Of course, the worst score belongs to the single-threaded approach and is around 5 times slower than the best result. Such a situation is expected, as the single-threaded benchmark is a kind of baseline for me: I want to check how far we can move its execution time, and a 5-times-better average execution time in the best-case scenario seems like a good score. As for the value of the score mean error, it is very, very small. In the worst case (the highest error), it is within 1.5% of its respective score (the result for ClassicStreamBenchmark.bench_parallelStreamWithCustomForkJoinPool_4). In other cases, it varies from 0.1% to 0.7% of the overall score. There seems to be no difference in result positions for sizes above 10 thousand. Machine 2 (8 cores) As in the case of Machine 1, the first score also belongs to bench_defaultParallelStream. Again, even when we consider a possible measurement error, it still comes out on top: nothing especially interesting. What is interesting, however, is that the pattern of the first 3 positions for Machine 2 changes quite a lot for higher input sizes. For inputs 100 to 10000, we have somewhat similar behavior, with bench_defaultParallelStream occupying the first position and bench_parallelStreamWithCustomForkJoinPool_8 following shortly after. On the other hand, for inputs 100000 and 1000000, the first position belongs to bench_parallelStreamWithCustomForkJoinPool_8, followed by bench_parallelStreamWithCustomForkJoinPool_32, while bench_defaultParallelStream is moved to the 4th and 3rd positions. Another curious thing about Machine 2 may be that for smaller input sizes, parallelism 32 is quite far away from the top. Such performance degradation may be caused by the overabundance of threads compared to the 8 CPU threads total available on the machine. Nevertheless, on inputs 100000 and 1000000, ForkJoinPool with parallelism 32 is in the second position, which may indicate that over longer time spans, such an overabundance of threads is not a problem. Some other aspects that are very similar to the behavior of Machine 1 are skipped here and are mentioned below. Common Points There are a few observations valid for both machines: My ForkJoinNative ("naive")-based benchmarks yield results that are noticeably worse, around 10% on both machines, than those delivered by the default version of a parallel stream or even the ones with a custom ForkJoinPool. Of course, one of the reasons is that they are not optimized in any way. There are probably some low-hanging performance fruits here.
Thus, I strongly recommend getting familiar with the Fork/Join framework before moving its implementations to production. The time difference between positions one to three is very small – less than a millisecond. Thus, it may be hard to achieve any kind of repeatability for these benchmarks; with such a small difference, it is easy for the results distribution to differ between benchmark runs. The mean error of the scores is also very small – up to 2% of the overall score in the worst cases, and mostly less than 1%. Such a low error may indicate two things. First, the benchmarks are reliable, because the results are concentrated around a single point; if there had been anomalies along the way, the error would be higher. Second, JMH is good at taking measurements. There is no significant difference in results between the throughput and average time modes: if a benchmark performed well in average time mode, it also performed well in throughput mode. Above you can see all the differences and similarities I found in the report files. If you find anything else that seems interesting, do not hesitate to mention it in the comment section below. Summary Before we finally part ways for today, I would like to mention one more very important thing: JAVA IS NOT SLOW. Processing a list of one million elements with a single thread, including all potential JMH overhead, takes 560 milliseconds (Machine 1) and 1142 milliseconds (Machine 2). There are no special optimizations or magic included – just the plain default JVM. The overall best time for processing one million elements on Machine 1 was 88 milliseconds, for ClassicStreamBenchmark.bench_parallelStreamWithCustomForkJoinPool_16. In the case of Machine 2, it was 321 milliseconds, for ClassicStreamBenchmark.bench_parallelStreamWithCustomForkJoinPool_8. Although both results may not be as good as C/C++-based solutions, the relative simplicity and expressiveness of the approach make it very interesting, in my opinion. Overall, it is quite a nice addition to Java's one billion rows challenge. I would just like to mention that all the reports and the benchmark code are in the GitHub repository (linked in the introduction of this article), so you can easily verify my results and compare them to the benchmark behavior on your machine. Furthermore, to ease the download of reports, there is a separate .zip file named reports.zip that contains all the .csv files with the data. And remember: Java is not slow. Thank you for your time. Review by: Krzysztof Ciesielski, Łukasz Rola
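As a footnote to the analysis above: the bench_parallelStreamWithCustomForkJoinPool_* variants discussed throughout rely on the common trick of submitting a parallel stream to an explicitly sized ForkJoinPool. Here is a rough, hypothetical sketch of that setup – the pool size and the workload are placeholders, not the actual benchmark code from the repository:
Java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class CustomPoolSketch {
    public static void main(String[] args) throws Exception {
        List<Integer> input = IntStream.rangeClosed(1, 10_000).boxed().toList();

        // Explicit parallelism instead of the default common pool (parallelism 19 on Machine 1).
        ForkJoinPool pool = new ForkJoinPool(16);
        try {
            // A parallel stream started from inside a ForkJoinPool task is executed
            // by that pool - a widely used trick, though not guaranteed by the spec.
            Callable<Long> work = () -> input.parallelStream()
                                             .mapToLong(i -> i * 2L)
                                             .sum();
            long sum = pool.submit(work).get();
            System.out.println(sum);
        } finally {
            pool.shutdown();
        }
    }
}
Because this routing of parallel-stream work through a custom pool is conventional rather than specified behavior, measuring it with JMH, as the article does, is the right way to decide whether a given parallelism actually pays off.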
Uniform Resource Locators (URLs) function as the addresses of unique resources on the internet. Entering a website URL into our browser retrieves the HTML/CSS files required to construct the page we're visiting, and making API calls against an endpoint URL allows us to remotely access and/or modify important data – the list goes on. URLs effectively facilitate the interconnectivity we take for granted on the internet today. When we capture URL string inputs in our web applications, it's critical that we validate those inputs to ensure the URLs are useful. Retrieving and storing any form of address data (whether that's a URL, an IP address, or even a physical street address) without immediately validating its utility is a waste of time; it'll leave us empty-handed when we attempt to access important resources in the future. Automating URL validation isn't exactly as straightforward as it sounds, however. Any given URL can present multiple issues at once, and some of those issues are harder and more resource-intensive to uncover than others. We can look at URL validity from a syntax perspective (i.e., ensuring the URL is well-formed), and we can also look at it from a domain and endpoint validity perspective (i.e., ensuring the domain exists and the unique resources are actually accessible). In this article, we'll discuss what constitutes a valid URL from a syntax, domain, and endpoint validity perspective, and we'll learn how to call an API (using ready-to-run Java code examples) that validates all three of these factors simultaneously. Understanding URL Validity Validating a URL string starts with checking the URL syntax. Each component of the URL structure must be incorporated correctly to access any given URL's resources. Let's quickly break down the basic components of a valid URL. We'll use https://example.com as a simple example. A valid URL begins with a correctly typed scheme that identifies the internet protocol used for communication. In the case of https://example.com, that protocol is https. The scheme must be followed by the scheme delimiter :// to separate it from the rest of the URL. Errors in scheme syntax are common, but they're relatively easy to identify with lightweight programmatic methods. A valid URL next presents a top-level domain (e.g., .com) and a second-level domain (e.g., example). A subdomain (e.g., api.example) can sometimes precede the second-level domain. A domain syntax error might involve a simple misspelling at this stage, such as https://examplecom. The missing period between example and com means the top-level domain is missing, and the URL cannot be accessed. Syntax is crucially important, but validating syntax alone won't entirely ensure a URL is functional. A misspelled domain can appear syntactically correct, but we won't know it's a real domain unless we check the DNS (Domain Name System) to see if it's registered there. If we misspell our example URL as https://exmpl.com, for instance, we won't be able to access https://example.com resources (unless the owner of example.com also owned the exmpl.com domain), but we will technically have a syntactically valid URL string. Furthermore, validating a domain name with a DNS lookup doesn't necessarily mean we can access resources from that URL, either. Well-formed URLs with registered domain names can still point to resources that are inaccessible for one reason or another.
For example, if we're planning to make API calls against https://api.example.com, we'll need to make a request to the URL endpoint directly to determine whether it's listening and prepared to modify or return resources as expected. Validating URLs in Java There are a few standard ways we can validate URLs in Java. Here, we'll briefly discuss two common classes that can be used for this purpose: java.net.URL and HttpURLConnection. Both classes are part of the java.net package, which is provided by the Java Development Kit (JDK). Using the java.net.URL class, we can perform limited validation checks during URL parsing. We can check for syntax errors in a URL string, and we can ensure URLs follow a standard format. However, this class isn't primarily designed for validation; rather, it's designed for working with URLs in other important ways, such as parsing or composing them. We won't be able to validate domain names and endpoints with this class. Using the HttpURLConnection class, we can open a connection to a URL and check the response code from the underlying server. This technically works as a method for validating URL endpoints, but it's a bit resource-intensive (and, much like the java.net.URL class, it's not explicitly designed with validation in mind). When we use the HttpURLConnection class, our application has to handle the connection setup, send requests, read responses, and manage errors – all of which puts a significant burden on our server. (A rough sketch of both standard-library approaches appears just before the SDK setup below.) Fully Validating URLs With a Free API Rather than build a URL validation workflow around a Java class, we can instead take advantage of a free URL validation API that performs an exhaustive URL validation check on our behalf. This way, we can very easily validate URL syntax, domain existence, and endpoint availability in one step. Perhaps most importantly, we can offload the heavy lifting involved in domain and endpoint validation to another server. Our application won't need to handle HTTP connection management or error handling by itself, and – as an added benefit – it won't need to deal directly with potentially threatening URLs either. If we use this API to validate our earlier example, https://example.com, we'll get the following response:
JSON
{
  "ValidURL": true,
  "Valid_Syntax": true,
  "Valid_Domain": true,
  "Valid_Endpoint": true,
  "WellFormedURL": "https://example.com/"
}
With a simple response object like this, we can quickly determine if URLs are usable based on several important URL validation categories. Demonstration To take advantage of this multi-step URL validation API, we can use the ready-to-run Java code examples provided below to structure our API call, and we can use a free API key to authorize our API calls. With a free API key, we can make up to 800 API calls per month without any additional commitments.
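For comparison with the API-based approach, here is a minimal sketch – assuming nothing beyond the JDK – of the two standard-library checks described above: a syntax check via java.net.URI/URL and a best-effort endpoint check via HttpURLConnection. The timeouts, method names, and example URL are arbitrary choices for illustration.
Java
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;

public class BasicUrlChecks {

    // Syntax-level check: does the string parse as an absolute, well-formed URL?
    static boolean hasValidSyntax(String candidate) {
        try {
            new URI(candidate).toURL(); // throws if the syntax is malformed or the URI is not absolute
            return true;
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            return false;
        }
    }

    // Endpoint-level check: does the server answer with a non-error status code?
    static boolean endpointResponds(String candidate) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URI(candidate).toURL().openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(3000);
            conn.setReadTimeout(3000);
            return conn.getResponseCode() < 400;
        } catch (Exception e) {
            return false; // unreachable host, DNS failure, timeout, unsupported scheme, etc.
        }
    }

    public static void main(String[] args) {
        System.out.println(hasValidSyntax("https://example.com"));
        System.out.println(endpointResponds("https://example.com"));
    }
}
Note that the HEAD-request check ties up our own threads and network stack, which is exactly the overhead the hosted validation API below is meant to take off our hands.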
To install the client SDK, let's add the following repository reference to our Maven POM file (JitPack is used to dynamically compile the library):
XML
<repositories>
  <repository>
    <id>jitpack.io</id>
    <url>https://jitpack.io</url>
  </repository>
</repositories>
And then let's add the dependency itself:
XML
<dependencies>
  <dependency>
    <groupId>com.github.Cloudmersive</groupId>
    <artifactId>Cloudmersive.APIClient.Java</artifactId>
    <version>v4.25</version>
  </dependency>
</dependencies>
Next, let's add the imports to our file:
Java
// Import classes:
import com.cloudmersive.client.invoker.ApiClient;
import com.cloudmersive.client.invoker.ApiException;
import com.cloudmersive.client.invoker.Configuration;
import com.cloudmersive.client.invoker.auth.*;
import com.cloudmersive.client.DomainApi;
And after that, let's use the example below to call the URL validation function, replacing the "YOUR API KEY" placeholder text with our own API key:
Java
ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

DomainApi apiInstance = new DomainApi();
ValidateUrlRequestFull request = new ValidateUrlRequestFull(); // ValidateUrlRequestFull | Input URL request (populate it with the URL to validate)
try {
    ValidateUrlResponseFull result = apiInstance.domainUrlFull(request);
    System.out.println(result);
} catch (ApiException e) {
    System.err.println("Exception when calling DomainApi#domainUrlFull");
    e.printStackTrace();
}
That's all the code we'll need. We can now easily use this API to capture URL input strings in any of our Java web applications and carry out a useful multi-step validation check. Conclusion In this article, we discussed the importance of validating URLs, the various components of a valid URL, and two Java classes we can use to handle URL validation. In the end, we learned how to call a free URL validation API that performs a multi-step URL validation check on our behalf.
In the realm of software development, particularly in Java programming, testing frameworks are essential tools that help ensure the reliability, efficiency, and quality of code. Two of the most prominent testing frameworks for Java are TestNG and JUnit. Both frameworks have their strengths, weaknesses, and unique features, making them suitable for different testing needs. This article aims to provide a comprehensive comparison between TestNG and JUnit, exploring their features, advantages, limitations, and use cases. Overview of TestNG TestNG, inspired by JUnit and NUnit, is a testing framework designed to simplify a broad range of testing needs, from unit testing to integration testing. TestNG stands for "Test Next Generation," reflecting its intention to cover a wide spectrum of testing capabilities. Key Features of TestNG Annotations: TestNG offers a rich set of annotations that provide greater flexibility and control over test execution. Examples include @BeforeSuite, @AfterSuite, @BeforeTest, @AfterTest, and more. Parallel execution: TestNG supports running tests in parallel, which can significantly reduce test execution time, especially for large test suites. Data-driven testing: With the @DataProvider annotation, TestNG facilitates data-driven testing, allowing tests to run multiple times with different sets of data. Flexible test configuration: TestNG's XML-based configuration files offer extensive customization for test execution, grouping, and prioritization. Dependency testing: TestNG allows specifying dependencies between test methods using the dependsOnMethods and dependsOnGroups attributes, ensuring that tests are executed in a specific order. Built-in reporting: TestNG generates detailed HTML and XML reports, providing insights into test execution and results. Overview of JUnit JUnit is one of the most widely used testing frameworks for Java. Its simplicity, robustness, and widespread adoption have made it a standard tool for unit testing in Java development. Key Features of JUnit Annotations: JUnit 5, the latest version, introduced a modular architecture and a rich set of annotations, including @Test, @BeforeEach, @AfterEach, @BeforeAll, and @AfterAll. Parameterized tests: JUnit supports parameterized tests, allowing a test method to run multiple times with different parameters using the @ParameterizedTest annotation. Assertions: JUnit provides a comprehensive set of assertion methods to validate test outcomes, such as assertEquals, assertTrue, assertFalse, and assertThrows. Extension model: JUnit 5 introduced an extension model that enables developers to add custom behavior to tests, such as custom annotations and listeners. Test suites: JUnit supports grouping multiple test classes into a test suite, facilitating organized and structured testing. Integration with build tools: JUnit integrates seamlessly with build tools like Maven and Gradle, making it an integral part of the continuous integration and continuous deployment (CI/CD) pipeline. Comparative Analysis To better understand the differences and similarities between TestNG and JUnit, let's delve into various aspects of these frameworks. Annotations and Test Configuration TestNG: Offers a more extensive set of annotations, providing finer control over test setup, execution, and teardown. The XML-based configuration allows for complex test configurations and suite definitions. 
JUnit: While JUnit 5 has significantly improved its annotation set and modularity, it is still generally considered simpler compared to TestNG. The use of annotations like @BeforeEach and @AfterEach provides a straightforward approach to test configuration. Parallel Execution TestNG: Native support for parallel test execution is one of TestNG's strong points. It allows tests to run concurrently, which is beneficial for large test suites. JUnit: Parallel execution is possible in JUnit 5 but requires additional setup and configuration, making it slightly less straightforward than TestNG's approach. Data-Driven Testing TestNG: The @DataProvider annotation in TestNG makes data-driven testing easy and intuitive. It allows passing multiple sets of data to a test method, which is particularly useful for testing with different input values. JUnit: JUnit 5's @ParameterizedTest provides similar functionality, but the setup is more verbose and might require more boilerplate code compared to TestNG. Dependency Testing TestNG: The ability to define dependencies between test methods and groups is a unique feature of TestNG, enabling complex test scenarios where the execution order is crucial. JUnit: JUnit does not natively support method dependencies, which can be a limitation for tests that require a specific order of execution. Reporting TestNG: Generates detailed HTML and XML reports out of the box, which includes information on test execution time, passed and failed tests, and skipped tests. JUnit: JUnit's reporting capabilities are often supplemented by external tools and plugins, such as Surefire for Maven or the JUnit plugin for Gradle, to generate comprehensive test reports. Community and Ecosystem TestNG: While TestNG has a strong community and ecosystem, it is not as widely adopted as JUnit. However, it remains popular for its advanced features and flexibility. JUnit: JUnit enjoys a larger user base and broader support from the Java development community. Its integration with various tools, libraries, and frameworks is more extensive. Use Cases When To Use TestNG If you require advanced features such as parallel test execution, complex test configurations, and dependency management. For projects where test flexibility and customization are paramount. In scenarios where data-driven testing is a common requirement, leveraging the @DataProvider annotation. When To Use JUnit For straightforward unit testing needs with a focus on simplicity and ease of use. In projects where integration with CI/CD pipelines and build tools like Maven and Gradle is essential. If you prefer a testing framework with extensive community support and resources. Conclusion Both TestNG and JUnit are powerful testing frameworks that cater to different needs in Java development. TestNG excels in scenarios requiring advanced features, flexibility, and detailed reporting, making it suitable for complex test environments. On the other hand, JUnit's simplicity, robustness, and integration capabilities make it an excellent choice for standard unit testing and integration into CI/CD workflows. Choosing between TestNG and JUnit depends on the specific requirements of your project, the complexity of your test scenarios, and your preference for certain features and configurations. By understanding the strengths and limitations of each framework, developers can make an informed decision that best aligns with their testing needs and project goals.
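To ground the data-driven testing comparison above, here is a minimal, hypothetical sketch of the same check written once with TestNG's @DataProvider and once with JUnit 5's @ParameterizedTest. The class names and test data are invented, and the JUnit variant assumes the junit-jupiter-params artifact is on the classpath.
Java
// File: EmailValidationTestNGTest.java (TestNG)
import org.testng.Assert;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

public class EmailValidationTestNGTest {

    @DataProvider(name = "emails")
    public Object[][] emails() {
        // Each inner array is one invocation of the test method.
        return new Object[][] { { "a@b.com" }, { "user@example.org" } };
    }

    @Test(dataProvider = "emails")
    public void acceptsValidEmails(String email) {
        Assert.assertTrue(email.contains("@"));
    }
}

// File: EmailValidationJUnitTest.java (JUnit 5)
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

class EmailValidationJUnitTest {

    @ParameterizedTest
    @ValueSource(strings = { "a@b.com", "user@example.org" })
    void acceptsValidEmails(String email) {
        Assertions.assertTrue(email.contains("@"));
    }
}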
Dependency Injection is one of the foundational techniques in Java backend development, helping build resilient and scalable applications tailored to modern software demands. DI is used to simplify dependency management by externalizing dependencies from the class itself, streamlining code maintenance, fostering modularity, and enhancing testability. Why is this technique crucial for Java developers? How does it effectively address common pain points? In this article, I present to you the practical benefits, essential practices, and real-world applications of Dependency Injection. Let's explore the practical strategies that underlie Dependency Injection in Java backend applications. What Do We Need Dependency Injection For? Testability Testability – the extent to which you can test a system – is a critical aspect of Java backend development, and Dependency Injection is indispensable here. Say, you have a Java class fetching data from an external database. If you don’t use DI, the class will likely tightly couple itself to the database connection, which will complicate unit testing. By employing DI, you can inject database dependencies, simplifying mocking during unit tests. For instance, Mockito, a popular Java mocking framework, will let you inject mock DataSource objects into classes, facilitating comprehensive testing without actual database connections. Another illustrative example is the testing of classes that interact with external web services. Suppose a Java service class makes HTTP requests to a third-party API. By injecting a mock HTTP client dependency with DI, you can simulate various responses from the API during unit tests, achieving comprehensive test coverage. Static calls within a codebase can also be mocked, although it’s both trickier to implement and less efficient, performance-wise. You will also have to use specialized libraries like PowerMock. Additionally, static methods and classes marked as final are much more challenging to mock. Compared to the streamlined approach facilitated by DI, this complexity undermines the agility and effectiveness of unit testing. Abstraction of Implementation Achieving abstraction of implementation is a crucial technique for building flexible and maintainable codebases. DI can help you achieve this goal by decoupling classes from concrete implementations and promoting programming to interfaces. In practical terms, imagine you have a Java service class responsible for processing user data. You can use DI to inject the validation utility dependency instead of directly instantiating a validation utility class. For example, you can define a common interface for validation and inject different validation implementations at runtime. With this, you’ll be able to switch between different validation strategies without modifying the service class. 
Let me illustrate this idea with a simple example:
Java
public interface Validator {
    boolean isValid(String data);
}

public class RegexValidator implements Validator {
    @Override
    public boolean isValid(String data) {
        // Regular expression-based logic
        return true;
    }
}

public class CustomValidator implements Validator {
    @Override
    public boolean isValid(String data) {
        // Custom logic
        return true;
    }
}

public class DataService {
    private final Validator validator;

    public DataService(Validator validator) {
        this.validator = validator;
    }

    public void processData(String data) {
        if (validator.isValid(data)) {
            // Processing valid data
        } else {
            // Handling invalid data
        }
    }
}
Here, the DataService class depends on a Validator interface, allowing different validation implementations to be injected. This approach makes your code more flexible and maintainable, as different validation strategies can easily be swapped without modifying the DataService class. Readability and Understanding of Code The third area where DI shines is ensuring the readability of code. Let's say that, during a Java codebase review, you encounter a class with external dependencies. Without DI, these dependencies might be tightly coupled within the class, making it challenging to decipher the code's logic. Using DI and constructor injection, for example, you make the dependencies explicit in the class's constructor signature, enhancing code readability and simplifying understanding of its functionality. Moreover, DI promotes modularization and encapsulation by decoupling classes from their dependencies. With this approach, each class has a clearly defined responsibility and can be easily understood in isolation. Additionally, DI encourages the use of interfaces, further enhancing code readability by abstracting implementation details and promoting a contract-based approach to software design. And this was the second time I mentioned interfaces. An interface is an ordinary Java construct, but in conjunction with DI, it serves as a powerful tool for decoupling dependencies and promoting flexibility in codebases. Below, I will talk about how this combo can be implemented in code – among other practical insights that will help you make the most of DI. Best Practices for Dependency Injection Use Interfaces Interfaces serve as contracts defining the behavior expected from implementing classes, allowing for interchangeable implementations without modifying client code. As I mentioned above, if a change is required later for some dependency (e.g., to change the implementation from v1 to v2), then, if you are lucky, it may require zero changes on the caller's side. You'll just have to change the configuration to provide one implementation instead of another; and since the classes depend on an interface and not on an implementation, they won't require any changes. For instance, let's say you have a Java service class requiring database access. By defining a DataAccess interface representing the database access operations and injecting it into the service class, you decouple the class from specific database implementations.
With this approach, you simplify swapping database providers (e.g., from MySQL to PostgreSQL) without impacting the service class's functionality:
Java
public interface DataAccess {
    void saveData(String data);
}

public class MySQLDataAccess implements DataAccess {
    @Override
    public void saveData(String data) {
        // Saving data to MySQL
    }
}

public class PostgreSQLDataAccess implements DataAccess {
    @Override
    public void saveData(String data) {
        // Saving data to PostgreSQL
    }
}

public class DataService {
    private final DataAccess dataAccess;

    public DataService(DataAccess dataAccess) {
        this.dataAccess = dataAccess;
    }

    public void processData(String data) {
        dataAccess.saveData(data);
    }
}
Here, the DataService class depends on the DataAccess interface, allowing different database access implementations to be injected as needed. Use DI to Wrap External Libraries Incorporating external libraries into your Java backend can make maintaining testability a challenge due to tight coupling. DI enables you to encapsulate these dependencies within your own abstractions. Imagine that your Java class requires the functionality of an external library, like cryptographic operations. Without DI, your class becomes closely tied to this library, making testing and adaptability difficult. Through DI, you can wrap the external library in an interface or abstraction layer. This artificial dependency can subsequently be injected into your class, enabling easy substitution during testing:
Java
public interface CryptoService {
    String encrypt(String data);
}

public class ExternalCryptoLibrary implements CryptoService {
    @Override
    public String encrypt(String data) {
        // Encryption logic using the external library (placeholder)
        String encryptedData = data;
        return encryptedData;
    }
}

public class DataProcessor {
    private final CryptoService cryptoService;

    public DataProcessor(CryptoService cryptoService) {
        this.cryptoService = cryptoService;
    }

    public String processData(String data) {
        String encryptedData = cryptoService.encrypt(data);
        // Additional data processing logic (placeholder)
        String processedData = encryptedData;
        return processedData;
    }
}
In this example, the DataProcessor class depends on the CryptoService interface. In production, you can use the ExternalCryptoLibrary implementation, which relies on the external library for encryption. During testing, however, you can provide a mock implementation of the CryptoService interface, simulating encryption without invoking the actual external library. Use Dependency Injection Judiciously However powerful a technique DI is, you don't want to overuse it; excessive use may overcomplicate your code in places where it doesn't add much value. Let's say you need to extract some functionality to a utility class (e.g., comparing two dates). If the logic is straightforward and unlikely to change, a static method will be a sufficient solution. In such cases, static utility methods are simple and efficient, eliminating the overhead of DI where it is unnecessary. On the other hand, if you deal with business logic that can evolve over your app's lifetime, or with something domain-related, that is a great candidate for dependency injection. So, ultimately, you should base your decision to use or not use DI on the nature of the functionality in question and its expected evolution. Yes, DI shines when we speak about flexibility and adaptability, but traditional static methods offer simplicity for static and unchanging logic (see the short sketch below).
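As a rough illustration of that trade-off, here is a minimal sketch (all class and method names are hypothetical) that keeps a small, stable date helper as a static utility while modeling evolving business logic as an injected dependency:
Java
import java.time.LocalDate;

public class JudiciousDiSketch {

    // Stable, unlikely to change: a plain static utility is enough here.
    static final class DateUtils {
        private DateUtils() {}
        static boolean isBefore(LocalDate a, LocalDate b) {
            return a.isBefore(b);
        }
    }

    // Domain logic that may evolve: worth expressing as an injectable dependency.
    interface DiscountPolicy {
        double discountFor(double orderTotal);
    }

    static final class SeasonalDiscountPolicy implements DiscountPolicy {
        @Override
        public double discountFor(double orderTotal) {
            return orderTotal > 100 ? 0.1 : 0.0; // placeholder business rule
        }
    }

    static final class CheckoutService {
        private final DiscountPolicy discountPolicy;

        CheckoutService(DiscountPolicy discountPolicy) { // constructor injection
            this.discountPolicy = discountPolicy;
        }

        double totalToPay(double orderTotal) {
            return orderTotal * (1 - discountPolicy.discountFor(orderTotal));
        }
    }

    public static void main(String[] args) {
        System.out.println(DateUtils.isBefore(LocalDate.of(2024, 1, 1), LocalDate.now()));
        System.out.println(new CheckoutService(new SeasonalDiscountPolicy()).totalToPay(120));
    }
}
The static helper costs nothing to call and needs no wiring, while the injected DiscountPolicy can be swapped or mocked as the business rules change.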
Leverage Existing DI Frameworks Try to use existing DI frameworks rather than building your own, even though creating one might be tempting – I should know, I've made one myself! ;) The advantages of existing frameworks usually outweigh the allure of crafting a solution from scratch. Established frameworks offer reliability, predictability, and extensive documentation, and they have been refined through real-world use, ensuring stability in your projects. Leveraging them also grants you access to a trove of community knowledge and support, so opting for an existing framework tends to save time and effort. In short, while it might be tempting to reinvent the wheel, resisting that temptation will streamline your development process and set you up for success. * * * Although this article only touches on a few of Dependency Injection's many benefits, I hope it served as a helpful and engaging exploration of this splendid technique. If you haven't already embraced DI in your Java development practices, I hope this piece has piqued your interest and inspired you to give it a try. And so – here's to smooth, maintainable code and a brighter future in your coding endeavors. Happy coding!