Design and Architecture of the Chronon Recorder
The Chronon recorder had two directly opposing goals: to collect as much data about your program as possible while having the least possible impact on it. In this post I will describe some of the design and architectural decisions I made to achieve that.
The prime design goals of the Chronon recorder were:
- Minimum impact on application responsiveness
Scalability was higher on the list than raw performance. The reason is that with a scalable implementation, even if you hit a performance wall with the recorder, you can always upgrade your machine or reconfigure the recorder to continue recording.
To achieve this scalability, we made the following assumptions about how hardware is progressing:
1. CPU cores will keep increasing in number and go down in cost.
2. Memory is cheap.
3. Because memory is cheap (point 2 above), 64-bit computing is becoming the standard.
4. CPU cores aren't getting any faster.
The recorder works by running as a Java agent and instrumenting the bytecode of your Java program in memory, so you do not need to make any changes to your code.
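To make the Java agent idea concrete, here is a minimal sketch of the hook the JVM offers for this kind of in-memory instrumentation. It uses only the standard java.lang.instrument API. The class names SketchAgent and CountingTransformer are made up for illustration, and the transformer only counts classes instead of rewriting them; it is not Chronon's actual code.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical agent class, for illustration only.
public class SketchAgent {

    // Counts the application classes offered to the transformer.
    static final AtomicInteger seen = new AtomicInteger();

    // The JVM calls this before main() when started with -javaagent:<agent jar>.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new CountingTransformer());
    }

    static class CountingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            // Leave JDK classes alone (an assumption of this sketch).
            if (className == null || className.startsWith("java/")) {
                return null; // null tells the JVM to keep the class unchanged
            }
            seen.incrementAndGet();
            // A real recorder would return rewritten bytecode here.
            // This sketch changes nothing.
            return null;
        }
    }
}
```

To actually load an agent like this, it has to be packaged in a jar whose manifest names the class in a Premain-Class attribute, and the JVM has to be started with the -javaagent option pointing at that jar. The application's source and class files on disk stay untouched.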
By universal, I mean that the recorder should be able to record any Java app, whether it's a J2EE app, a Swing/SWT app or any other kind of application.
It should also be platform independent, in line with the Java philosophy of 'Write Once, Run Anywhere'. Thus you can record on any platform, say a Mac, and play back on any other platform, say Windows.
How it all works
The recorder works as follows to achieve its design goals:
- The work done in the instrumented threads of your application is kept to a minimum. This is done to ensure minimum impact on the responsiveness of your application.
- The recording data that is generated by the application threads is stored in a buffer in memory.
- 'Flusher Threads' keep reading chunks of data from this buffer, do some processing on it and save it to disk in a highly packed format, thereby essentially 'flushing' the data generated by the recorder.
So how do our assumptions about hardware help with this? Let's take a look:
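The three steps above can be sketched with a bounded in-memory queue and a pool of flusher threads. This is only an illustration of the general producer/consumer shape, not Chronon's implementation: the class FlusherSketch, its methods and the long[] chunk format are all invented here, and "saving to disk" is replaced by a simple counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the buffer-plus-flusher layout.
public class FlusherSketch {
    private final BlockingQueue<long[]> buffer;               // in-memory buffer
    private final List<Thread> flushers = new ArrayList<>();
    private final AtomicLong flushedEvents = new AtomicLong(); // stand-in for disk
    private volatile boolean running = true;

    public FlusherSketch(int capacity, int flusherCount) {
        buffer = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < flusherCount; i++) {
            Thread t = new Thread(this::flushLoop, "flusher-" + i);
            t.start();
            flushers.add(t);
        }
    }

    // Called on application threads: hand the chunk over and return.
    // Blocks only when the buffer is full.
    public void record(long[] chunk) {
        try {
            buffer.put(chunk);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // chunk is dropped in this sketch
        }
    }

    // Runs on each flusher thread: drain chunks until shut down and empty.
    private void flushLoop() {
        try {
            while (running || !buffer.isEmpty()) {
                long[] chunk = buffer.poll(10, TimeUnit.MILLISECONDS);
                if (chunk != null) {
                    // A real recorder would process, pack and write to disk here.
                    flushedEvents.addAndGet(chunk.length);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Stops the flushers once the buffer is drained and reports the total.
    public long shutdownAndCount() {
        running = false;
        for (Thread t : flushers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return flushedEvents.get();
    }
}
```

The point of the layout is that record() does almost nothing on the application thread, while the expensive part runs on the flusher threads.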
- If you have more cores than your application has threads, the recorder will do most of its work on the spare cores, inside the flusher threads, and have minimum impact on the performance of your program. This also gives you a hint about how many flusher threads to use. If you have a single-threaded application and a quad-core processor, you can tell Chronon to use 3 flusher threads; similarly, if the application has 2 threads and the CPU has 4 cores, use 2 flusher threads to make use of the 2 extra cores.
- Now there are always going to be applications which use more threads than the number of cores, or which generate data far faster than it can be flushed out. This is where assumption 2 (memory is cheap) comes in. If you have enough memory, the generated data has a place to sit while it waits to get flushed out. It is for these cases that we recommend using a 64-bit machine.
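The rule of thumb for the flusher thread count in the first point is simple arithmetic: spare cores equals total cores minus application threads. A small sketch follows. The helper name suggestFlushers is hypothetical, and the floor of one flusher thread when there are no spare cores is my assumption for the sketch, not a documented Chronon default.

```java
// Hypothetical helper, for illustration only.
public class FlusherCount {

    // Suggests one flusher thread per spare core.
    // Assumption: always keep at least one flusher so data can still drain.
    static int suggestFlushers(int cpuCores, int appThreads) {
        return Math.max(1, cpuCores - appThreads);
    }
}
```

With the numbers from the text, suggestFlushers(4, 1) gives 3 and suggestFlushers(4, 2) gives 2. On a real machine the core count could come from Runtime.getRuntime().availableProcessors().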
What about all the Garbage Collector (GC) issues with using all that extra memory?
It is well known that current JVMs don't handle heap sizes above 2 GB very well. It is possible that, if you have an extremely computationally intensive program, Chronon will use that much memory, or that your application is already reaching the 2 GB limit and Chronon makes it go over that.
To solve this issue we use custom memory management. Thus even if the data generated by Chronon grows a little large, it won't have a heavy impact on the GC. It is common to see a 2-3 GB Chronon heap shrink to a few hundred megabytes in the blink of an eye, which would ordinarily take many seconds or minutes without our custom memory management. That said, in most development scenarios, heap sizes won't get anywhere near that high.
But even that ain't enough...
Even after all this, you may run into hardware assumption 4 above (CPU cores aren't getting any faster). This happens when, even though you have enough cores and memory, the application threads of your program are doing so much work that even the small overhead of the recorder directly on those threads affects the responsiveness of your application.
For these situations, Chronon allows you to specify any part of your program to be excluded from recording. So if you have a portion of your program which is doing some heavy computation and which you know doesn't have any errors, or you just don't care about examining it for now, you can exclude it from recording.
We won't go into the details of how to configure this right now, but suffice it to say that any part that is excluded runs with absolutely zero overhead, just like it would run without the recorder. For calls to these 'unrecorded' methods, we record just the input arguments and the return value at the call site, which is usually enough information for debugging purposes.
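To illustrate the idea of exclusions and call-site recording, here is a rough sketch. It is not Chronon's configuration format or API: the class ExclusionSketch, the package-prefix matching and the log line format are all assumptions made for this example. The excluded code runs untouched, and only the caller notes what went in and what came out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of exclusion plus call-site recording.
public class ExclusionSketch {
    private final List<String> excludedPrefixes;
    private final List<String> callLog = new ArrayList<>();

    // Assumption: exclusions are given as class-name prefixes.
    public ExclusionSketch(List<String> excludedPrefixes) {
        this.excludedPrefixes = new ArrayList<>(excludedPrefixes);
    }

    // True if the class should be instrumented, false if it is excluded.
    public boolean isRecorded(String className) {
        for (String prefix : excludedPrefixes) {
            if (className.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    // Models a call from recorded code into an excluded method:
    // the body runs without any recording inside it, and the call site
    // logs only the arguments and the return value.
    public <T> T callUnrecorded(String method, Supplier<T> body, Object... args) {
        T result = body.get();
        callLog.add(method + Arrays.toString(args) + " -> " + result);
        return result;
    }

    public List<String> callLog() {
        return callLog;
    }
}
```

For example, with the prefix "com.example.math." excluded, isRecorded("com.example.math.Fft") is false while isRecorded("com.example.ui.MainWindow") is true, and a call into the excluded code leaves a single log entry with its arguments and result.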