How to Intercept and Debug All Java HTTP
Building debugging tools for the JVM, with Java agents and Byte Buddy.
Java, and the JVM more generally, is widely used for services everywhere, but it's often challenging to debug and manually test, particularly in complicated microservice architectures.
HTTP requests and responses are the core of interactions between these services, and with their external APIs, but they're also often invisible and inaccessible. It's hard to examine all outgoing requests, simulate unusual responses & errors in a running system, or mock dependencies during manual testing & prototyping.
Over the last couple of weeks, I've built a Java agent which can do this, completely automatically. It can seize control of all HTTP & HTTPS requests in any JVM, either at startup or attaching later, to redirect them to a proxy and trust that proxy to decrypt all HTTPS, allowing MitM of all JVM traffic. Zero code changes or manual configuration required.
This means you can pick any JVM process - your own locally running service, Gradle, IntelliJ, anything you like - and inspect, breakpoint, and mock all of its HTTP(S) requests in 2 seconds flat.
In this article, I want to walk you through the details of how this is possible, so you can understand some of the secret powers of the JVM, learn how to transform raw bytecode for yourself, and build on the examples and source code behind this to build your own debugging & instrumentation tools.
If you just want to try this out right now, go download HTTP Toolkit.
If you want to know how on earth this is possible, and how you can write code that does the same, read on:
What's Going on Here?
In some ways, intercepting all HTTP(S) should be easy: the JVM has standard HTTP proxy and SSL context configuration settings (e.g. `-Dhttp.proxyHost` and `-Djavax.net.ssl.trustStore`) so you could try to configure this externally by setting those options at startup.
Unfortunately for you, that doesn't work. Most modern libraries ignore these settings by default, opting to provide their own defaults and configuration interfaces. Even when the library doesn't, many applications define their own connection & TLS configuration explicitly. This is often convenient and sensible in general, but very inconvenient later when you want to start debugging and manually testing your HTTP interactions.
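For reference, the conventional approach amounts to something like the following sketch, which sets the standard properties in code rather than through `-D` flags. The class name, proxy address, and trust store path are placeholders chosen for illustration. Clients that build their own configuration never read these values, which is exactly the problem.

```java
// Sketch: the standard JVM-wide proxy and trust store settings, set in code.
final class StandardProxySettings {

    /** Points the JVM's built-in proxy and TLS defaults at the given proxy and trust store. */
    static void apply(String proxyHost, int proxyPort, String trustStorePath) {
        // Consulted by the JVM's default ProxySelector
        System.setProperty("http.proxyHost", proxyHost);
        System.setProperty("http.proxyPort", Integer.toString(proxyPort));
        System.setProperty("https.proxyHost", proxyHost);
        System.setProperty("https.proxyPort", Integer.toString(proxyPort));
        // Read when the default SSLContext is first initialized
        System.setProperty("javax.net.ssl.trustStore", trustStorePath);
    }
}
```

Calling `StandardProxySettings.apply("127.0.0.1", 8000, "/path/to/truststore")` early in `main` has the same effect as passing the equivalent `-D` options on the command line.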
Instead of setting config values at startup that nobody uses, we can capture HTTP by force, using a Java agent. Java agents allow us to hook into a JVM process from the outside, to run our own code, and rewrite existing bytecode.
When our agent is attached to the JVM (either at startup before everything loads, or later on) we match against specific classes used within built-in packages and a long list of popular external libraries, looking for everything from TLS configuration state to connection pool logic, and we inject a few small changes throughout. This lets us change defaults, ignore custom settings, recreate existing connections, and reconfigure all HTTP(S) to be intercepted by our HTTPS-intercepting proxy.
This is really cool! From outside a JVM process, we can use this to reliably rewrite arbitrary bytecode to change how all HTTP in a codebase works, and take control of the entire thing ourselves. It's aspect-orientated programming on steroids, and it's surprisingly easy to do.
Let's talk about the details.
What's a Java Agent?
A Java agent is a special type of JAR file, which can attach to other JVM processes, and is given extra powers by the JVM to transform and instrument bytecode.
Despite the name, they're not Java-only; they work for anything that runs on the JVM.
There are two ways to use a Java agent. You can either attach it at startup, by passing `-javaagent:<path to agent JAR>` on the `java` command line, or you can attach it later dynamically, through the JDK's Attach API.
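Both routes can be sketched in Java using only the JDK. This is an illustrative sketch rather than HTTP Toolkit's own code: the class and method names are made up, and the dynamic route assumes a full JDK, since the Attach API lives in the `jdk.attach` module.

```java
import com.sun.tools.attach.VirtualMachine;

// Sketch: the two ways to load an agent JAR into a JVM.
final class AgentLoading {

    /** Startup: builds the flag to add to the target's `java` command line. */
    static String startupFlag(String agentJarPath, String options) {
        String flag = "-javaagent:" + agentJarPath;
        return options == null ? flag : flag + "=" + options;
    }

    /** Later: attaches to a running JVM by process id and loads the agent into it. */
    static void attachTo(String pid, String agentJarPath, String options) {
        try {
            VirtualMachine vm = VirtualMachine.attach(pid);
            try {
                vm.loadAgent(agentJarPath, options); // invokes the agent's agentmain method
            } finally {
                vm.detach();
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not attach agent to JVM " + pid, e);
        }
    }
}
```

At startup this corresponds to a command such as `java -javaagent:./agent.jar -jar app.jar`, while `AgentLoading.attachTo(pid, "./agent.jar", null)` would be run from a separate helper process, because the JVM does not allow a process to attach to itself by default.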
The agent can have two separate entry points in its JAR manifest to manage this: one for attachment at startup (`Premain-Class`), and one for attachment later (`Agent-Class`). There are also JAR manifest attributes that opt into transformation of bytecode (`Can-Redefine-Classes` and `Can-Retransform-Classes`). For a JAR built by Gradle, all of these are set in the `jar` task's manifest attributes.
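As a sketch of what ends up in the manifest, here is the same set of attributes built with `java.util.jar.Manifest` rather than a Gradle script. The attribute names are the standard ones defined by `java.lang.instrument`; the class name `AgentManifest` and the agent class name passed in are illustrative.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Sketch: the manifest entries that turn a plain JAR into a Java agent.
final class AgentManifest {

    static Manifest forAgent(String agentClassName) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version must be set, or the main attributes are not written out
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.putValue("Premain-Class", agentClassName);   // entry point for -javaagent at startup
        main.putValue("Agent-Class", agentClassName);     // entry point for attaching later
        main.putValue("Can-Redefine-Classes", "true");    // opt in to redefineClasses
        main.putValue("Can-Retransform-Classes", "true"); // opt in to retransformClasses
        return manifest;
    }
}
```

Passing the result to a `java.util.jar.JarOutputStream` constructor would write it as the JAR's `META-INF/MANIFEST.MF`.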
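Lastly, you have an agent class that implements these entry points as static methods: `premain(String, Instrumentation)` for attachment at startup, and `agentmain(String, Instrumentation)` for attachment later.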
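A minimal sketch of such a class, using only the JDK. The class and field names are illustrative, and the class is left package-private to keep the example compact; in a real agent JAR it would normally be a public class in its own file, named by the manifest attributes.

```java
import java.lang.instrument.Instrumentation;

// Sketch of an agent class with both entry points.
final class ExampleAgent {
    static volatile String entryPoint = "none";
    static volatile Instrumentation instrumentation;

    /** Called by the JVM before the application's main method when started with -javaagent. */
    public static void premain(String agentArgs, Instrumentation inst) {
        start("premain", inst);
    }

    /** Called by the JVM when the agent is loaded into an already running process. */
    public static void agentmain(String agentArgs, Instrumentation inst) {
        start("agentmain", inst);
    }

    private static void start(String via, Instrumentation inst) {
        entryPoint = via;
        instrumentation = inst; // keep the handle: it is what lets us transform bytecode later
        System.out.println("Agent started via " + via);
    }
}
```

Routing both entry points through one shared method keeps the setup logic identical whether the agent was present from startup or attached afterwards.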
That `Instrumentation` instance we're given here provides us with methods like `addTransformer` and `redefineClasses`, which we can use to read and overwrite the raw bytecode of any class in the VM.
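To make that concrete, here is a sketch of a JDK-only transformer that just records which matching classes pass through it and leaves their bytecode alone. The class name `LoggingTransformer` and the prefix-matching rule are assumptions for illustration; a real agent would return modified bytecode instead of `null`.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch: observes classes whose internal name starts with a prefix, e.g. "okhttp3/".
final class LoggingTransformer implements ClassFileTransformer {
    private final String internalNamePrefix; // JVM internal names use '/' not '.'
    final List<String> seen = new CopyOnWriteArrayList<>();

    LoggingTransformer(String internalNamePrefix) {
        this.internalNamePrefix = internalNamePrefix;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        if (className != null && className.startsWith(internalNamePrefix)) {
            seen.add(className); // a real agent would rewrite classfileBuffer here
        }
        return null; // null tells the JVM to keep the class bytes unchanged
    }

    /** Registers the transformer, then replays matching already-loaded classes through it. */
    static void install(Instrumentation inst, LoggingTransformer transformer) {
        inst.addTransformer(transformer, true); // true = eligible for retransformation
        if (!inst.isRetransformClassesSupported()) return;
        for (Class<?> loaded : inst.getAllLoadedClasses()) {
            String internalName = loaded.getName().replace('.', '/');
            if (internalName.startsWith(transformer.internalNamePrefix) && inst.isModifiableClass(loaded)) {
                try {
                    inst.retransformClasses(loaded);
                } catch (UnmodifiableClassException e) {
                    // skip classes the JVM refuses to retransform
                }
            }
        }
    }
}
```

From an agent entry point this would be wired up with something like `LoggingTransformer.install(inst, new LoggingTransformer("okhttp3/"))`.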
HTTP Toolkit includes an agent JAR built from all the above, which allows it to attach to any JVM application, run code within that application (to set defaults and configuration values using normal APIs, where possible) and to transform and hook internals of all HTTP-related classes we care about.
The agent setup is just the first step though: this gives us almost complete power to change what the target application is doing, but working out how to transform classes is complicated, there are some limitations to our transformations, and handling raw bytecode isn't easy...
How Do You Transform Raw Bytecode?
In short: using Byte Buddy.
This is a complex library, which can do a lot of powerful things with bytecode including generating subclasses and interface implementations dynamically at runtime (e.g. for mocking frameworks), manually mutating classes and methods, and transforming bytecode automatically through templates.
In agent cases like HTTP Toolkit's, we're interested in the template approach, because there is a Java agent limitation: when reloading already loaded classes, the new definition must match the same class schema. That means we can add new logic into existing method bodies, but we can't create new methods or fields on existing classes, or make changes to existing method signatures.
To handle this, Byte Buddy's built-in 'advice' system defines method transformation templates, which it can apply for us whilst guaranteeing that the schema is never changed in any other way.
First, we need to set up Byte Buddy. This configuration seems to work nicely:
Then, we define an Advice class which will transform our target. Advice classes look something like this:
This says "at the end of the targeted method body, insert extra logic which replaces the return value with [our proxy value]".
The code here is effectively injected into the end of the method body (because of `@Advice.OnMethodExit`), and annotations can be used on method parameters (like `@Advice.Return`) to link variables in this template method to method arguments, field values, `this`, or return values in the existing method body.
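Byte Buddy isn't needed to see what this does to a method. As a plain-Java illustration (not Byte Buddy code, using a hypothetical `ConfiguredClient` class and a placeholder proxy address), here is a method as written, alongside the behaviour it effectively has once exit advice replaces its return value:

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Illustration only: a stand-in for a library client that holds proxy configuration.
final class ConfiguredClient {
    private final Proxy configured;

    ConfiguredClient(Proxy configured) {
        this.configured = configured;
    }

    /** The method as the library author wrote it. */
    Proxy getProxy() {
        return configured;
    }

    /** What getProxy() effectively does after exit advice has been woven in. */
    Proxy getProxyAfterAdvice() {
        Proxy returnValue = configured; // original method body runs first
        // Injected at method exit: overwrite the return value with our proxy
        returnValue = new Proxy(Proxy.Type.HTTP, new InetSocketAddress("127.0.0.1", 8000));
        return returnValue;
    }
}
```

The method's signature and the class's fields are untouched; only the body gains extra statements, which is what keeps this within the class schema restriction described earlier.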
To tie this all together, we have to tell Byte Buddy when to apply this advice, like so:
Byte Buddy uses this fluent API to build maps from type matchers (like `named` here) to type transformers, and then build transformations that apply specific advice templates to methods matching certain patterns (e.g. `hasMethodName("getProxy")`).
The above code is effectively the real implementation logic we use to intercept OkHttp: for all OkHttpClient instances, even ones that are already instantiated when we attach, we override `getProxy()` so it always returns our proxy configuration, regardless of its previous configuration. This ensures that all new connections from all OkHttp clients go to our proxy.
This is just part of one simple case though (the full OkHttp logic is here) and doing this for all HTTP is significantly more involved...
What Transformations Allow You to Capture All HTTPS?
With the above, we can build a Java agent that can attach to a JVM target and transform arbitrary method bodies with ease.
Usefully intercepting HTTP(S) still requires us to find the method bodies we care about though, and work out how to transform them.
In practice, there are three steps to transforming any target library to intercept HTTPS:
- Redirect new connections to go via the HTTP Toolkit proxy server
- Trust the HTTP Toolkit certificate during HTTPS connection setup
- Reset/stop using any open non-proxied connections when attaching to already running applications
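To picture the second step at the JDK level, here is a sketch of building an `SSLContext` that trusts a given set of CA certificates, using standard classes only. This is an assumption about what "trusting the certificate" boils down to, not the agent's actual code; the class name, method names, and key store aliases are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Sketch: a TLS context whose trust anchors are exactly the certificates passed in.
final class ProxyTrust {

    /** Parses one X.509 certificate (PEM or DER) from a stream. */
    static Certificate readCertificate(InputStream pemOrDer) {
        try {
            return CertificateFactory.getInstance("X.509").generateCertificate(pemOrDer);
        } catch (GeneralSecurityException e) {
            throw new IllegalArgumentException("Not a valid X.509 certificate", e);
        }
    }

    /** Builds an SSLContext that trusts only the given CA certificates. */
    static SSLContext contextTrusting(Certificate... cas) {
        try {
            KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
            store.load(null, null); // start from an empty in-memory store
            for (int i = 0; i < cas.length; i++) {
                store.setCertificateEntry("proxy-ca-" + i, cas[i]);
            }
            TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init(store);
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, tmf.getTrustManagers(), null); // no client keys, default randomness
            return context;
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException("Could not build TLS context", e);
        }
    }
}
```

Where code relies on the JVM default, `SSLContext.setDefault(ProxyTrust.contextTrusting(ca))` would apply it globally; libraries that hold their own TLS configuration are the ones that need transforming.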
I'm not going to walk through the detailed implementation of that for every version of every supported library (if you're interested, feel free to explore the full source) but let's look at a couple of illustrative examples.
Some of this logic is written in Kotlin, and it uses a few helpers on top of the above, but if you've read the above and you understand Java you'll get the gist:
Intercepting Apache HttpClient:
Apache HttpClient is part of their HttpComponents project, a successor to the venerable Commons HttpClient library.
It's been around for a long time in various forms, it's very widely used, and fortunately it's very easy to intercept.
For v5, for example, all outgoing traffic runs through an implementation of the `HttpRoutePlanner` interface, which decides where requests should be sent.
We just need to change the return value for all implementations of that interface, so that every route goes via our proxy.
With that alone, we've redirected all traffic elsewhere.
Meanwhile, trusting our certificate requires prepending logic to SSL socket creation to change the SSL configuration.
As a nice bonus, the above `HttpRoutePlanner` approach means we don't even need to reset connections: request routes no longer match existing open connections, so requests immediately stop using those connections, start using our proxy instead, and the existing connections harmlessly time out.
Intercepting Java's built-in ProxySelector:
Let's try something more difficult: can we rewrite a built-in Java class? Yes, we can.
When our agent first attaches, it changes the default ProxySelector using the normal public APIs, so that any code using Java's default proxy selector automatically uses our proxy with no transformation required.
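That public-API step can be sketched with JDK classes alone (Java 9 or later, for `ProxySelector.of`). The class name and the proxy address are placeholders for illustration.

```java
import java.net.InetSocketAddress;
import java.net.ProxySelector;

// Sketch: replace the JVM-wide default proxy selector with one fixed proxy.
final class DefaultProxySetup {

    static void routeEverythingVia(String host, int port) {
        // ProxySelector.of returns a selector that uses this proxy for all http and https URIs
        ProxySelector.setDefault(ProxySelector.of(new InetSocketAddress(host, port)));
    }
}
```

After `DefaultProxySetup.routeEverythingVia("127.0.0.1", 8000)`, anything that asks `ProxySelector.getDefault()` where to send an HTTP or HTTPS request is pointed at that address.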
Unfortunately though, some applications manually manage proxy selectors, and this could result in HTTP not being intercepted.
To fix this, we set the proxy selector using the normal `ProxySelector.setDefault()` API during agent setup, and then later we transform the built-in class to disable that setter completely, so nobody else can change it.
That looks like this:
Transforming built-in classes does come with some caveats, e.g. you need to set `.ignore(none())` during Byte Buddy setup (see the example above) and you can't reference any non-built-in types within your advice class. For simple changes like this though, that's no big problem.
Intercepting Spring WebClient HTTP:
Ok, last example, let's see a more complicated case. How does Spring's WebClient work?
Spring WebClient is a relatively new client on the block - it's a reactive client released as part of Spring 5, offering a Spring-integrated API built over the top of Reactor-Netty by default (but configurable to use other engines too).
I suspect the vast majority of users use the default Reactor-Netty engine, and if they don't then they use an engine that's already intercepted by another one of our configurations. That means we just need to intercept Reactor-Netty, and we'll capture all Spring WebClient traffic ready for debugging.
Extremely helpfully, Reactor Netty stores all the state we care about (both proxy & SSL context) in one place: the HttpClientConfig class. We need to reset that internal state somehow for all instances, but it's not conveniently exposed in the public APIs...
Even more helpfully though, their HttpClient class is cloned during each request, passing the config to the request's client, giving us the perfect hook to grab the config and modify it before every request.
That looks like this:
Isn't this fun?
Ok, I'm fully expecting that while half the people who've read this far may be fascinated, the other half will be horrified.
We are elbow-deep in library internals here, and unrepentantly so.
This does have some caveats: it's quite possible that library changes could break this, or that some transformations could cause side effects. I wouldn't recommend doing this in production without significantly more careful transformation & testing, but for local development and testing the risk is low, and this works like a charm.
In practice, I suspect the fragility issues will be small. The code we're transforming is the low-level internals of connection setup, which changes relatively infrequently. Some git-blaming of the repos of various targets here suggests that in most cases this logic has barely changed since v1, or changes only marginally every 5 years or so, and updating this logic when there are changes is not a huge task. In addition, while new libraries will come out too, most of them build on top of these existing engines, so we can support them for free!
This kind of power is little-known and underused in much of the JVM community, and I'm really excited to see how you use it! Test this out now in HTTP Toolkit, try building your own Java agents, and get in touch on Twitter if you have any thoughts or questions.
Published at DZone with permission of Tim Perry. See the original article here.