In Plumbr we have spent the last month by building the foundation for future major improvements. One of such building blocks was addition of the unique identifier for JVM in order to link all sessions from the same JVM together.
While it seems a trivial task at the beginning, the complexities surrounding the issue start raising their ugly heads when looking at the output of the JVM-bundled jps command listing all currently running Java processes in my machine:
My Precious:tmp my$ jps 1277 start.jar 1318 Jps 1166
Above is listed the output of the jps command listing all currently running Java processes in my machine. If you are unfamiliar with the tool – it lists all processes process ID in the left and process name in the right column. Apparently the only one bothering to list itself under a meaningful name is the jps itself. Other two are not so polite. The one hiding behind the start.jar acronym is a Jetty instance and the completely anonymous one is actually Eclipse. I mean, really – the biggest IDE in the Java world cannot even bother to list itself under a name in the standard java tools?
So, with a glimpse to the state of the art in built-in tooling, lets go back to our requirements at hand. Our current solution is identifying a JVM by process ID + machine name combination. This has one obvious disadvantage – whenever the process dies, its reincarnation not going to get the same ID from the kernel. So whenever the JVM Plumbr was monitoring was restarted or killed, we lost track and were not able to bind the subsequent invocations together. Apparently this is not a reasonable behaviour for a monitoring tool, so we went ahead to look for a better solution.
Next obvious step was taken three months ago when we allowed our users to specify the name for the machine via -Dplumbr.application.name=my-precious-jvm startup parameter. Wise and obvious as it might seem, during those three months just 2% of our users have actually bothered to specify this parameter. So, it was time to go back to the drawing board and see what options we have when trying to automatically bind unique and human-readable identifier to a JVM instance.
Our first approach was to acquire the name of the class with the main() method and use this as an identifier. Immediate drawbacks were quickly visible when we launched the build in the development box containing four different Jetty instances – immediately you had four different JVMs all binding themselves under the same not-so-unique identifier.
Next attempt was to parse the content of the application and identify the application from the deployment descriptors – after all, most of the applications monitored by Plumbr are packaged as WAR/EAR bundles, thus it would make sense and use the information present within the bundle. And indeed, vast majority of the engineers have indeed given meaningful names in the <display-name> parameter inside web.xml or application.xml.
This solved part of the problem – when all those four Jetty instances are running apps with different<display-name>’s, they would appear as unique. And indeed they did, until our staging environment revealed that this might not always be the case. We had several different Plumbr Server instances on the same machine, using different application servers but deploying the same WAR file with the same <display-name> parameter. As you might guess, this is again killing the uniqueness of such ID.
Another issue raised was the fact that there are application servers running several webapps – what will happen when you have deployed several WAR files to your container?
So we had to dig further. To distinguish between several JVMs running the same application in the same machine, we added the launch folder to warrant the uniqueness of the identifier. But the problem of multiple WAR’s still persisted. For this we fell back to our original hypothesis where we used the main class name as identifier.
Some more technical nuances, such as distinguishing between the actual hash used for ID and the user-friendly version of the same hash, asides – we now the solution which will display you something similar in the list of your monitored JVMs:
|artemis.staging||Self Service (WAR)||07.07.2014 11:45|
|artemis.staging||E-Shop (WAR)||08.07.2014 18:30|
So, we were actually able to come up with a decent solution and fallback to manual naming with -Dplumbr.application.name parameter if everything else fails. One question still remains – why is something so commonly required by system administrators completely missing from the JVM tooling and APIs?