There are three ways to interact with the database in client/server computing:
- Play twenty questions with SQL statements.
- Move all the data to the client and then back again.
- Use a stored procedure to move the problem to the server.
Of these, using stored procedures is the most efficient and logical choice for high-volume, low-latency, online transaction processing (OLTP). Unconvinced? I’m not surprised. Stored procedures have a (mostly unearned) poor reputation, so let’s go through each choice, taking a closer look at the benefits of stored procedures.
JDBC/ODBC (AKA the Twenty Questions Approach)
This is a 1980s paradigm. The client issues numerous SQL statements, which whizz back and forth across the network. The "users" are typically humans looking at screens, browsing data, locking records, updating records, and eventually finishing a transaction. The approach was revolutionary at the time and works fine at the speed of human interaction, which in practice is between two and seven seconds. Problems begin, however, as transactions scale to 5K TPS and over.
The 1980s model places a huge load on the database server. Implementing read consistency inside the database is like having twenty online conversations at once; significant resources are required to converse consistently with the 19 other people involved in the same discussion. Why is it difficult? Latency is the issue. And latency can be caused by many different activities within the database or the network.
For example, research by Michael Stonebraker showed that 28% of database CPU time in an OLTP database was spent on latching and locking, introducing unacceptable latencies for many operations. In addition, legacy RDBMS databases struggle to scale beyond a single node due to a lack of built-in partitioning. Another problem is that the many network trips required for each transaction create unavoidable latency. Even small per-trip latencies add up and create a minimum possible time, which is driven by the laws of physics and the vagaries of TCP/IP.
Because of the protracted nature of transactions — and that we’re trying to do twenty at once around the same shared resource, such as credit — locking is a major issue with this approach. Solving locking issues is generally left up to developers. Issues rarely show up in testing and instead cause havoc once enough of the right kind of load happens in your environment. Even when all goes well, locking can consume 18% of CPU.
A further problem is that locking evolved in the 1980s and is designed with human speeds and behavior in mind. As an example of the penalties of locking, consider an M2M/IoT/Telco scenario in which the default behavior of many components is to re-start the same transaction again on a new thread if they don’t get a timely response. This leads to the possibility of locking yourself as your session waits for its previous incarnation to time out.
The conventional RDBMS approach to locking works reasonably well for updates but not for inserts. While the semantics of a low-level lock are fairly simple, an uncommitted insert is problematic. A legacy RDBMS server will know that someone else has inserted a row with the same Primary Key, but if the insert hasn’t been committed, it can’t tell you anything about it, as doing so would allow you to ‘see’ someone else’s uncommitted work. Your session will hang until the other session commits or rolls back, at which point your insert will either fail or work. This is not good in any scenario with a hard SLA for latency.
An overlooked aspect of the traditional “twenty questions” approach is its implications for high availability. While people spend lots of time working to create highly available database servers, almost no energy is devoted to looking at what happens when a client or application server with uncommitted changes either forgets to commit or crashes. If this happens while the client is working with a shared resource such as credit or a hotel room booking, you must wait until the server figures out that the connection is dead and cleans up, which can take several minutes. Consequently, the failure of a single busy application server can rapidly escalate to a system-wide outage, because zombie data sessions will hold uncommitted locks on key resources.
Move All the Data to the Client — and Back Again
This is a common approach used in NoSQL databases and object stores. It demos well but gets ugly as life gets more complicated, as the granular nature of access makes working with shared finite resources challenging.
Latency Issues When Working With Shared Resources
Most data shipping approaches also involve some form of timestamping and hashing to determine which is the correct data when multiple different replacement copies of the data show up, all of which share an ancestor. One of the sessions will lose and must re-read and then retry the transaction (with all its overhead) again, but there is absolutely no guarantee that when working with a finite shared resource the second, third, or subsequent attempts will actually work. Therefore, long latency tails can be expected to appear as volumes increase.
A Failure to Handle Emergent Complexity
An issue we see in a lot of NoSQL deployments is that while the first application works just fine, problems begin as the level of complexity increases. Once the underlying schema is too complicated for your data to be represented in a single chunk of data, you will find yourself shuttling several chunks back and forth, any or all of which could be changed by someone else’s code while you are working with them. Consequently, you are vulnerable to a much more serious version of the latency issues described above.
Not all use cases involve create, read, update, and delete (CRUD) operations – in many cases, we’re trying to count, aggregate, summarize, or otherwise create some form of aggregated information out of raw data. While these operations can be expensive in a conventional RDBMS, in the data shipping scenario their costs can be exorbitant, unless some additional indexing strategy is in play. One can happily build a document database to keep track of cars for the Department of Motor Vehicles and use "license plate" as a key, but a simple request for a list of all the purple cars could plausibly involve moving all the data across the wire to a client who must re-instantiate it.
CPU, Network, and Storage Overhead
Data must be parsed into individual elements to be usable by a machine. In a SQL database, this is done once, at insert time. In a data shipping environment, every use involves parsing the object in its entirety every time you want to do anything more sophisticated than counting it. While parsers such as GSON look fast in a development environment, especially with trivial examples, their overhead becomes significant as an object becomes more complex. At VoltDB we’ve seen multiple examples where the cost of using a utility to parse an object was far more expensive than the database-related CPU costs. Shipping blobs, another approach, also introduces network and storage costs.
Use a Stored Procedure to Move the Problem to the Server
For many of the use cases that we see at VoltDB — especially IoT/M2M/Telco — the approach that the database takes of moving the logic to where the data is pays dividends. Now, I’m well aware that a lot of readers probably groaned audibly when they read the phrase "stored procedures," but we shouldn’t let the weaknesses of prior implementations hide the reality that they make sense in OLTP. Anyone who has worked with stored procedures will have encountered a series of problems; here are a couple of them that will be familiar.
Vendor-Specific SQL Manipulation Language the Dev Doesn’t Know
Oracle, TimesTen and DB2 use variants of PL/SQL, which in turn sought inspiration from Ada. It has almost nothing in common with Java or SQL, which are the two languages we’d plausibly expect developers to know. Other databases have variations on the PL/SQL theme, but with less functionality. PL/SQL is one of the very few languages where a "Hello World" program is non-trivial.
JDBC/ODBC Impedance Mismatch
JDBC thinks in terms of individual SQL statements, cursors, and scalar variables. PL/SQL is full of concepts for records and arrays of records that have no clear, easy-to-use representation within the JDBC standard. As a result, developers traditionally can’t easily work with procedures that do anything other than work with scalars and return result sets, despite richer functionality being implemented by the server-side language. This often means multiple trips are needed to execute a single logical transaction, defeating the whole purpose of a stored procedure.
Stored Procedures for the 21st Century
VoltDB was written with an awareness of these issues and was architected to avoid them. The API is simple and allows the user to make arbitrary collections to and from the server without requiring extensive additional programming. VoltDB thus avoids the impedance mismatch issue. VoltDB also uses Java inside the server, and while there are certain rules you must follow to make High Availability work, it’s a lot simpler to learn those rules than to master a whole new language.
Therefore, we can support the assertion that the historical prejudices against stored procedures is misplaced in the case of VoltDB. By using stored procedures, we provide the following advantages:
Reduced Latency Because We Have Fewer Network Trips
By packaging all the data required to make a decision into a single network trip, we eliminate the “twenty questions” approach and all the latency it implies.
As we discussed above, maintaining a consistent SLA can be challenging if the read consistency takes place at the client level. Valuable shared data (such as credit) becomes unusable as clients clutter the database with unfinished work which affects it. As the workload increases, the number of collisions and retries will increase. By refusing to allow this, we avoid the repeated retry scenarios that appear when working with shared, finite resources.
Efficient Use of CPU and Network
The fact that using stored procedures reduces network trips is obvious, but there’s also the overhead created by moving large quantities of data around the network to do a much smaller amount of work. Stored procedures reduce this traffic to a logical minimum.
By preventing transactions from outliving calls to the database, VoltDB avoids making HA dependent on the survival of the application server.
In the context of OLTP, how applications interact with the database is just as critical as how the database works internally. Traditional approaches assume that the actual ‘thinking’ takes place on the client side, and the job of the database is simply to store data. But the laws of physics prevent clients from having instant access to the data – it must travel across a network – and as a consequence the number of network trips, and the time they take up, account for the minimum possible time to do a transaction if you are working with data shipping or individual SQL statements.
This minimum unavoidable network time period is at least two orders of magnitude more than the CPU time needed to process the data. Attempting to work faster than this speed will inevitably result in contention between sessions when accessing the remote data. This means latency will become geometrically worse as workloads increase. No amount of hardware can reliably eliminate the long latency tail that we encounter in OLTP if all clients insist on taking the data home to process it.
When you consider all this, the idea of moving the work to where the data is by using a stored procedure no longer looks as ugly and unpleasant as it did initially. Bear in mind that nobody questions the Hadoop concept that moving computation is cheaper than moving data, so why can’t we do it with a database?