Query Substitution

What to do when an application uses a bad query, and you can't change the application.

Max Tardiveau

Jan. 20, 22 · Tutorial

Like (6)

Save

3.5K Views

The Problem

Imagine a business application that uses a database. While using this application, you discover that performing a specific task takes an unusually long time.

You set out to investigate, and find out that this long response time is caused by a specific query to the database. For some reason, this query, which was running fine until today, now takes a long time.

No problem, you figure. Maybe you can optimize the database -- create a new index, maybe, or update the database statistics.

After trying various things, you realize that the problem is not that easy. This query cannot be fixed just by tweaking the database. Or maybe changing the database was never an option.

Regardless, the situation is now more serious. This query needs to be changed. Thankfully it can be rephrased in a more efficient way that returns the same data, so it should be easy. But maybe you don't own this application, and therefore you cannot change it. Or maybe this application is no longer maintained. Or maybe it would just take too much work to update it.

Whatever the reason, you are now between a rock and a hard place. When you cannot change the server, and you cannot change the client, what's left?

Enter the Database Proxy

In situations like this, a database proxy can be a lifesaver. A database proxy sits on the network, between your database client (the business application) and your database server, and can dynamically change requests and responses as needed, without modifying the database client or the database server.

Zooming In

Let's assume, as a slightly contrived example, that our application is suddenly having performance problems when it executes the query:

select * from demo.customers where balance > ?

This is not a very good query, but we're stuck with it.

It turns out that someone has changed the customers table and added a new, very large column named photos, of type blob. That's another questionable decision, but it's out of our control.

Our application has no interest in these photos, but it did not specify which columns to retrieve, so now our query returns these photos, needlessly consuming bandwidth and memory.

We've already determined that updating the application is not an option, and neither is changing the database. So our only viable option is to have a proxy that will substitute our improved query for the old one.

Let's see how this works with Gallium Data, a free database proxy:

Step 1: Install the Database Proxy

Installation is trivial -- it's a simple Docker image, so you can start it up in a minute.

Step 2: Configure the Proxy

Next, we need to set up the proxy so that it connects to the database server. All we need is the host name and port number of the database server. Database credentials are not required because authentication is pass-through. If we need to support encryption, we can either use a new certificate, or the same certificate as the database server (if available).

Step 3: Set up the Query Substitution

Next, we need to tell the proxy which query to catch, and what to substitute it with. That's done with a request filter, which will look at all the queries coming in over the wire:

      
     Replace:
select * from demo.customers where balance > ?

with:
select name, balance, address from demo.customers where balance > ?

Step 4: Redirect the Application

Finally, we need to tell the application to talk to the proxy, rather than directly to the database. This is usually done by changing a configuration file, an ODBC/JDBC data source, or a command-line parameter.

That's it -- the whole thing can usually be done in a few minutes. Neither the application nor the database server will notice any difference. From now on, the "bad" query will be replaced with the "good" query every time the application issues it.

Once this is in place, doing the same thing for other queries is of course trivial -- more request filters can be created and deployed in moments.

How Does it Work?

You are probably familiar with proxies for things like HTTP and SSH. The principle here is the same, except that the proxy has a deep understanding of the database protocol, and can parse the packets, modify them, and reassemble them as needed.

In the middle of the proxy is a JavaScript engine that runs your logic, which can manipulate any packet as needed. For simple query substitutions, it's usually a trivial one-liner, but it can get as sophisticated as required.

Everything going over the wire is available to your logic: prepared statements with their parameters, calls to stored procedures, result sets and their metadata, etc... You can shape the dialog between applications and their database as needed.

Cost/Benefit

The main disadvantage of this approach is the added complexity. Instead of two systems (the application and the database), you now have three systems to worry about. Fixing issues with the application will now require you to also look at the proxy.

On the other hand, putting a proxy in front of a database gives you enormous flexibility, and allows you to modify any request or response as needed. It's a lot of power -- use it wisely.

In some situations, there are simply no alternatives. A proxy can extend the lifespan of applications that would otherwise have to be discarded because they cannot be updated.

Conclusion

A database proxy opens up whole new possibilities when dealing with database applications. Query substitution is only one of many use cases: you can also augment security, do data masking, fine-grained access control, etc. Think of the possibilities.

Database application

Opinions expressed by DZone contributors are their own.

Related

Trending