
Cross Language Benchmarking Part 3 – Git Submodules and the Single-Command Cross Language Benchmark


In my recent blog posts (part 1, part 2) I described in detail how to do micro benchmarking for Java and C/C++ with JMH and Hayai, and presented a common execution approach based on Gradle.

Today I want to improve the overall project structure. Last time I already mentioned that the project structure of the Gradle projects is not optimal. First, I will briefly recap the main goal and the steps from the previous articles. Second, I will introduce some new requirements. Finally, I will present a more flexible module structure that splits production code from benchmarks and is then embedded in a cross-language super project.

1.) What we covered so far

To keep track of our to-do list, here is an excerpt from part 1:

Let’s assume we have a Java and C++ project which implement the same algorithms. We want to benchmark and compare the performance of hot code parts. We also want to track the changes over time when the project source code grows and changes. We will:

  1. Benchmark Java Code with JMH as part of a Gradle build
  2. Benchmark C++ code with Hayai
  3. Integrate C++ binary compilation with Gradle
  4. Integrate Hayai benchmark execution with Gradle
  5. Bring Java and C++ projects together in one cross-language Gradle build chain
  6. Aggregate JMH and Hayai results in a third composite result
  7. Split benchmarking code out of the projects into a dedicated project and dedicated SCM.
  8. Automatically push aggregated benchmarking results to a cleverly structured git repository to keep track between current source code versions and benchmarking results.

Items 1 to 4 are already done. Today, we will focus on items 5 and 7.


2.) The messy things

For the sake of completeness, here is the Gradle build script where we want to identify new requirements today:

buildscript {
  repositories {
    jcenter()
  }
  dependencies {
    classpath 'me.champeau.gradle:jmh-gradle-plugin:0.2.0'
  }
}

repositories {
  mavenCentral()
  jcenter()
}

apply plugin: 'me.champeau.gradle.jmh'
apply plugin: 'java'

group = 'net.chroma'
version = '0.0.1-SNAPSHOT'

jmh {
  jmhVersion = '1.6.1'
}

sourceCompatibility = 1.8

dependencies {
  testCompile 'junit:junit:4.11'
}

Baseline:

  • This build script is responsible for building the artifact under test AND for executing the micro benchmarks
    • The main project depends on plugins and libraries that the actual output artifact does not need.
    • Benchmark definitions have to reside in the same project source root as the main program code.
  • The build script contains many setup details that you normally do not want to see

To have a second look at the project setup here is the link to the branch where we start off today: Chroma@github, Branch: crolabefra_starting_point.

Requirements:

  • Benchmarking code should be in a separate project, which depends on the artifact to benchmark, but the artifact itself should be independent of any benchmarking libraries or setup
  • The automation tool should be called once to execute all benchmarks over all attached projects
  • As a cross-language micro benchmark developer, I want to apply a framework plugin that handles all the setup needed to obtain result data (in the right format).

These new requirements lead to a complete restructuring of the current setup. In the following sections, I will show you

  • how to use git and submodules to split up product code and benchmarks
  • how the whole picture looks when tied together with the Hayai integration in a cross-language benchmarking project for Java and C/C++ code
  • how to tie everything together in a neat Gradle plugin (future post)

3.) Separate out benchmarks project

This is the status quo from part 1:

chromarenderer-java
├── src
    ├── main
      ├── java
    ├── jmh
      ├── java
    ├── test
      ├── java
├── build.gradle

We aim for the following structure:

├── chromarenderer-java-benchmarks
    ├── src
      ├── jmh
        ├── java
    ├── build.gradle
    ├── settings.gradle [new]
    ├── chromarenderer-java
        ├── src
            ├── main
                ├── java
            ├── test
                ├── java
        ├── build.gradle

Advantages are obvious:

  • We have two different Gradle modules that decouple build and library dependencies: separation of dependencies, build steps and custom Gradle code.
  • JMH code is not part of the product source: separation of responsibility and visibility, and a cleaner project setup. The code to benchmark can also be a dependency fetched from a repository, a library, or whatever comes to your mind!
  • The two directories can be managed in separate git repositories, which means that benchmarking code and production code versions are no longer tightly coupled! A set of benchmarks can easily be executed on different SCM versions of the production code just by checking out another commit!

The only disadvantage, which comes with the first bullet, is that we end up with a multi-module project. But the setup is very easy in Gradle. Basically, only a settings.gradle file is needed in the ‘chromarenderer-java-benchmarks’ directory:

include 'chromarenderer-java'
include 'chromarenderer-java-benchmarks'

After changing the directory structure, we can go on and remove the parts we no longer need from both build.gradle files. For the benchmarking project first:

buildscript {
  repositories {
  jcenter()
  }
  dependencies {
  classpath 'me.champeau.gradle:jmh-gradle-plugin:0.2.0'
  }
}

repositories {
  mavenCentral()
}

apply plugin: 'me.champeau.gradle.jmh'
apply plugin: 'java'

group = 'net.chroma'
version = '0.0.1-SNAPSHOT'

jmh {
  jmhVersion = '1.6.1'
}

sourceCompatibility = 1.8

dependencies {
-  testCompile 'junit:junit:4.11'
+  compile project(':chromarenderer-java')
}

OK, we no longer need the JUnit dependency, but we now need the dependency on the project to benchmark. Well, we did not gain that much. But wait, what happens to the production code project? Look:

- buildscript {
-  repositories {
-  jcenter()
-  }
-  dependencies {
-  classpath 'me.champeau.gradle:jmh-gradle-plugin:0.2.0'
-  }
-}

repositories {
  mavenCentral()
-  jcenter()
}

-apply plugin: 'me.champeau.gradle.jmh'
apply plugin: 'java'

group = 'net.chroma'
version = '0.0.1-SNAPSHOT'

-jmh {
-  jmhVersion = '1.6.1'
-}

sourceCompatibility = 1.8

dependencies {
  testCompile 'junit:junit:4.11'
}

That is what we wanted to see! Not a single line of evidence that JMH is running benchmarks on the project! It looks like a vanilla, hassle-free Gradle project. For working examples, take a look at the example project repositories I prepared for this article.


4.) Git submodules

For all of you who have never used Git submodules before, here is a short summary. Roughly speaking, Git submodules come in handy when you want to have

  • repositories inside repositories
  • one repository that takes part in several other repositories
  • third-party code imported from other repositories, which you obviously do not want to manage in your own SCM.

4.1) Concept of submodules

Use case example: Assume you have two projects that both use a common set of files:

PROJECT_A (own git repository)
├── src
  ├── main
    ├── java
├── common_assets
    ├── ...

PROJECT_B (own git repository)
├── src
  ├── main
    ├── groovy
├── common_assets
    ├── ...

Now, instead of adding the same set of files to both git repositories (which will obviously cause a version mess as soon as both projects want to change the common files, which then need to be synced manually), it is smarter to move all files in ‘common_assets’ to their own git repository:

PROJECT_A (git repository A)
├── src
  ├── main
    ├── java
├── common_assets (git repository C)
    ├── ...

PROJECT_B (git repository B)
├── src
  ├── main
    ├── groovy
├── common_assets (git repository C)
    ├── ...

Now you can tell your project repository to clone another repository into the directory and mark that directory as a submodule.

$> git submodule add git@github.com:bensteinert/repository-C.git common_assets

Git will then clone that submodule repository which will stay completely independent from the surrounding repository.

What can now cause headaches is the fact that the parent repository only keeps track of the submodule’s current HEAD revision. Staged and unstaged changes to files inside the submodule are not visible to the parent; only changes of the local HEAD revision are recognized. Initially, after the command above, the change set should look like this:

$> git status
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

   new file:  .gitmodules
   new file:  common_assets

.gitmodules contains the details about the newly created submodule. The new directory common_assets is shown as if it were a simple file. The background is that git created a link to a directory inside the .git folder (.git/modules/common_assets, to be precise). Once you create a commit from the change set, git stores the revision currently checked out in the submodule. To make this clearer, step into the submodule and change the HEAD by checking out another commit. This operation will produce a change set that looks like this:

$> git status
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

   modified:  common_assets (new commits)

If you return to the previous commit, the change is gone. Those submodule HEAD changes can be added to a commit like any other file change. The only thing stored internally in the parent repository is the commit id of the currently checked out HEAD. What happens behind the scenes in the parent repository can easily be observed with the ‘diff’ command after changing the submodule HEAD revision again:

$> git diff
diff --git a/common_assets b/common_assets
index 244411c..d416be2 160000
--- a/common_assets
+++ b/common_assets
@@ -1 +1 @@
-Subproject commit 244411c0aad0b2278eb05622966ba59e1f48ab4b
+Subproject commit d416be2c15250a85bf7b993c0c4a41ff99162b56

That again looks like a normal file change, but in fact you changed the tracked HEAD revision of a submodule :).
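
The whole sequence can be replayed without touching a real project. The following is a minimal sketch, assuming git is installed; the repository names (repo-c, project-a), the file asset.txt and the helper function g are made up for illustration, and a local path stands in for the GitHub URL used above.

```shell
#!/bin/sh
# Sketch only: replays the submodule workflow in a throwaway directory.
# repo-c, project-a, asset.txt and the helper g are hypothetical names.
set -e

work=$(mktemp -d)
cd "$work"

# Pass identity and protocol settings per call so no global config is changed.
# protocol.file.allow is only needed because the "remote" is a local path here.
g() {
  git -c user.name=demo -c user.email=demo@example.com \
      -c protocol.file.allow=always "$@"
}

# Repository C: the shared assets, with two commits.
g init -q repo-c
echo "v1" > repo-c/asset.txt
g -C repo-c add asset.txt
g -C repo-c commit -q -m "assets v1"
echo "v2" > repo-c/asset.txt
g -C repo-c commit -q -a -m "assets v2"

# Repository A: the parent project that embeds C as a submodule.
g init -q project-a
cd project-a
g submodule add "$work/repo-c" common_assets
g commit -q -m "add common_assets submodule"

# The parent tracks the submodule as a single entry with mode 160000
# plus the commit id that is checked out inside it.
g ls-files --stage common_assets

# Move the submodule HEAD one commit back; the parent now reports a change.
g -C common_assets checkout -q HEAD~1
g status --short
g diff

# Record the new submodule commit in the parent like any other change.
g add common_assets
g commit -q -m "pin common_assets to the first commit"
```

After the last commit, the parent repository points at the first commit of repo-c, although repo-c itself has moved on. This is exactly the "you decide when to pull" behaviour described in this section.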

I hope I brought some clarity to the concept. In any case, I recommend the git manual for further reading.

General benefits

  • A submodule allows you to combine different repositories in one parent without actually adding the source files to a second SCM.
  • Changes to submodule content can be made at any time because it is a regular git repository of its own.
  • You can decide when to pull new content into your submodules.
  • A project that consists of different submodules can keep track of the different combinations that were committed in history.
  • Submodule HEAD revisions can be easily changed at any time to an arbitrary point in the history without causing major file-set changes.

4.2) Git submodules and benchmarks?

Let’s get back to our new project structure. Having the benchmarks ‘surround’ the production code allows us to put both in independent repositories. The project to benchmark is then checked out as a submodule inside the benchmark project:

├── chromarenderer-java-benchmarks (git repository chromarenderer-java-benchmarks)
    ├── src
      ├── jmh
        ├── java
    ├── build.gradle
    ├── settings.gradle
    ├── chromarenderer-java (git submodule chromarenderer-java)
        ├── src
            ├── main
                ├── java
            ├── test
                ├── java
        ├── build.gradle
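
With this layout, running one set of benchmarks against several versions of the production code comes down to moving the submodule HEAD between runs. The following is a rough sketch of that idea, not part of the example repositories: the function name bench_at_commits and the BENCH_CMD variable are assumptions for illustration, and the default command assumes the jmh task of the JMH Gradle plugin used above.

```shell
#!/bin/sh
# Sketch only: run the same benchmarks against several commits of the
# code under test. bench_at_commits and BENCH_CMD are hypothetical names.
BENCH_CMD=${BENCH_CMD:-gradle jmh}

# Usage: bench_at_commits SUBMODULE_DIR COMMIT...
# Call it from the root of the benchmark project.
bench_at_commits() {
  submodule_dir=$1
  shift
  # Remember the current submodule commit so it can be restored afterwards.
  original=$(git -C "$submodule_dir" rev-parse HEAD) || return 1
  rc=0
  for commit in "$@"; do
    echo "== benchmarking $submodule_dir at $commit =="
    git -C "$submodule_dir" checkout -q "$commit" || { rc=1; break; }
    # The benchmark command runs in the current directory, not in the submodule.
    sh -c "$BENCH_CMD" || { rc=1; break; }
  done
  # Restore the starting commit (this leaves the submodule on a detached HEAD).
  git -C "$submodule_dir" checkout -q "$original"
  return $rc
}
```

A call could look like `bench_at_commits chromarenderer-java <commit-1> <commit-2>`. Commit ids or tags are the safer choice here; relative names such as HEAD~1 change their meaning as soon as the first checkout has moved HEAD.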

Important note regarding git repository cloning which contain submodules:

After cloning a git repository, all contained submodules are NOT initialized and cloned automatically! You have two options. Either clone the repository with the recursive parameter:

$> git clone --recursive git@github.com:bensteinert/chromarenderer-java-benchmarks.git

or, after a normal clone, execute:

$> git submodule update --init

All submodules in the repository will be fetched, and for each one the commit recorded in the parent repository will be checked out.
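
To see the difference between the two options, here is a small self-contained sketch that works on local throwaway repositories. The names core, core-benchmarks and Renderer.java as well as the helper g are made up for illustration; local paths stand in for the GitHub URLs.

```shell
#!/bin/sh
# Sketch only: compares a plain clone with a recursive clone.
# core, core-benchmarks, Renderer.java and the helper g are hypothetical names.
set -e

work=$(mktemp -d)
cd "$work"

# protocol.file.allow is only needed because the "remotes" are local paths here.
g() {
  git -c user.name=demo -c user.email=demo@example.com \
      -c protocol.file.allow=always "$@"
}

# The production code repository.
g init -q core
echo "class Renderer {}" > core/Renderer.java
g -C core add Renderer.java
g -C core commit -q -m "production code"

# The benchmark repository that embeds the production code as a submodule.
g init -q core-benchmarks
g -C core-benchmarks submodule add "$work/core" core
g -C core-benchmarks commit -q -m "add production code as submodule"

# Option 2, step 1: a normal clone leaves the submodule directory empty.
g clone -q "$work/core-benchmarks" plain-clone
files_before=$(ls -A plain-clone/core)
echo "files in submodule after plain clone: '$files_before'"

# Option 2, step 2: initialize and fetch afterwards.
# --recursive additionally covers submodules inside submodules.
g -C plain-clone submodule update --init --recursive
ls -A plain-clone/core

# Option 1: clone and initialize all submodules in one step.
g clone -q --recursive "$work/core-benchmarks" recursive-clone
ls -A recursive-clone/core
```

The extra --recursive on the update command makes no difference in this sketch, but it matters as soon as a submodule contains submodules of its own, as in the super project of section 5.1.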


5.) The missing ‘cross’ in cross language benchmarking

We have spoken only about the Java part so far. Of course, the concepts can be applied directly to the C++ project we started in Part 2 as well. To avoid bothering you with the same things twice, I will just refer to the git repository I prepared for the Hayai benchmarks project: chromarenderer-cpp-benchmarks@github. You will recognize exactly the same changes:

  • All benchmark-related things moved one directory up
  • The core project ‘chromarenderer-cpp’ got rid of all dependencies on benchmarking infrastructure
  • Surrounding benchmarking module defines dependency on core module

Cool, looks like that can be generalized :).

This means we now have two different benchmarking projects that can be run in isolation.

5.1) Yet another super Project

We could now check out both repositories and execute the benchmarks with a single command each. But no, we are even lazier. We want one cross-language repository with one cross-language build. So let’s add another level of super project:

CroLaBeFra (git repository crolabefra) [new]
├── settings.gradle [new]
├── chromarenderer-java-benchmarks (git submodule chromarenderer-java-benchmarks)
    ├── [src ...]
    ├── build.gradle
    ├── settings.gradle
    ├── chromarenderer-java (git submodule chromarenderer-java)
        ├── [src ...]
        ├── build.gradle 
├── chromarenderer-cpp-benchmarks (git submodule chromarenderer-cpp-benchmarks)
    ├── [src ...]
    ├── build.gradle
    ├── settings.gradle
    ├── chromarenderer-cpp (git submodule chromarenderer-cpp)
        ├── [src ...]
        ├── build.gradle

5.2) Small Gradle pitfall

In theory, the proposed directory model looks promising. But as soon as you have moved everything and give it a try, you will be surprised that this setup does not work. In the settings.gradle of the CroLaBeFra super project, you would probably try something like:

rootProject.name = 'CroLaBeFra'
include 'chromarenderer-cpp-benchmarks'
include 'chromarenderer-java-benchmarks'

What strikes you now is that Gradle does not pick up multiple ‘settings.gradle’ files in one build. Consequently, the includes in the sub-projects are ignored. But it wouldn’t be Gradle if there wasn’t a workaround ;). Because the super project should know that its includes bring more sub-projects into the build, you can apply the ‘settings.gradle’ files in your super project as well:

rootProject.name = 'CroLaBeFra'
include 'chromarenderer-cpp-benchmarks'
include 'chromarenderer-java-benchmarks'
apply from: 'chromarenderer-cpp-benchmarks/settings.gradle'
apply from: 'chromarenderer-java-benchmarks/settings.gradle'

Now Gradle reads the sub-project ‘settings.gradle’ files as if they were part of the super project ‘settings.gradle’ file. The downside is that the assumed directory structure becomes inconsistent. Background:

apply from: 'chromarenderer-java-benchmarks/settings.gradle'

basically means the same as

include 'chromarenderer-java'

because the content will simply be evaluated in the super project context. But that directory and sub-project ‘chromarenderer-java’ do not exist on the super project directory level! But again, it wouldn’t be Gradle if there wasn’t a solution.

Fortunately, Gradle first collects all includes from the projects before it actually accesses the directories. This means we can correct the project directories afterwards by adding the following lines:

project(':chromarenderer').projectDir = new File(rootDir, 'chromarenderer-cpp-benchmarks/chromarenderer')
project(':chromarenderer-java').projectDir = new File(rootDir, 'chromarenderer-java-benchmarks/chromarenderer-java')

It feels a little bit dirty, but in the end, as long as Gradle does not support multiple settings.gradle files natively, there is no other way. As we do all the dirty things in the super project, we do not have to touch our sub-projects, and they stay free of such workarounds.

Ultimate result of the day with all changes and proposals applied: CroLaBeFra-POC@github

Clone (recursively) and simply run

$> gradle runBenchmarks jmh

Mission accomplished :)

6.) Conclusion and ongoing work

Today we solved items 5 and 7 from the list. Our core product project is independent of any benchmarking infrastructure and/or source code. A straightforward directory structure combined with the power of Git submodules allows for easy management of the Gradle multi-module setup. With some Gradle magic, we added another project level on top, which allows us to access and execute all benchmarking tasks with a single command. Looking at the list, we are almost done; only items 6 and 8 remain open.

  1. Benchmark Java Code with JMH as part of a Gradle build
  2. Benchmark C++ code with Hayai
  3. Integrate C++ binary compilation with Gradle
  4. Integrate Hayai benchmark execution with Gradle
  5. Bring Java and C++ projects together in one cross-language Gradle build chain
  6. Aggregate JMH and Hayai results in a third composite result
  7. Split benchmarking code out of the projects into a dedicated project and dedicated SCM.
  8. Automatically push aggregated benchmarking results to a cleverly structured git repository to keep track between current source code versions and benchmarking results.

But I have one more topic I would like to add to the list:

  • Extract all benchmarking config from the ‘build.gradle’ files into easy-to-use Gradle plugins, in order to offer them to all of you :). The overall goal is to have a set of plugins for different languages that can be applied to a benchmarking project. Ideally, the only thing you then need to do is
apply plugin: 'com.comsysto.gradle.crolabefra.cpp'

instead of having thirty lines of Gradle script code to copy. Sounds good? Stay tuned for my next article on that!

So long!




Topics:
java, high-perf, tips and tricks, tools & methods

Published at DZone with permission of Comsysto Gmbh, DZone MVB. See the original article here.
