Since we released
dbbench 7 months ago, it has been widely adopted across our engineering and sales teams as the definitive tool for testing database workloads. Today, we are announcing the availability of a new version of
dbbench, as well as a package of high-level tools to enhance it.
In this latest release, we enhanced both the flexibility and ease of use of the tool. We augmented capabilities of
dbbench and added a tutorial to help new sales engineers get started. In addition to these enhancements, we released a package of internal tools specifically designed to support high level workflows that use
dbbench. These changes improve our technical proof-of-concept (POC) process for our customers and open up
dbbench for new workloads and use cases. This version of
dbbench not only increases the performance and power of this benchmark testing tool, but also makes it easily accessible to anyone interested in using it.
dbbench in Action
One of the new features of
dbbench is the ability to display latency histograms during the final analysis to easily detect and understand outliers. In the example below, we can see that most of the queries executed between 0.2 and 0.5 ms, but there were outliers as high as 30ms:
To support more complicated and varying data,
dbbench can read arguments from a
.csv file and parameterize queries via the `query-args-file` parameter. The snippet below shows this parameter specified in a configuration.ini file for
[insert] query=insert into table colors (?) query-args-file=/tmp/colors.csv
In addition, the
query-results-file parameter allows
dbbench to chain multiple jobs together, and makes it possible to have external tools in between each of these jobs.
dbbench-tools is a package of sales engineering tools, used by the MemSQL team to easily describe and efficiently run a customer workload during a technical POC. The most important question to answer during a POC is, "How will MemSQL scale to more concurrent users?"
dbbench-scaler is one of several tools provided by the
dbbench-tools package to help answer this question.
dbbench-scaler runs a
dbbench-defined workload at different levels of concurrency to find the limits of the database installation.
For example, I was doing a POC where I wanted to determine how many concurrent inserts a small MemSQL cluster could handle. I set up a simple
[setup] query=create table test(a varchar(20), b float, c float, d float, \ shard key a (a), key b (b), key c (c), key d (d)) [teardown] query=drop table test [batch sized 128 multi-inserts] query=insert into test values (?, ?, ?, ?), (?, ?, ?, ?), …, (?, ?, ?, ?) query-args-file=/tmp/data.csv concurrency=1
dbbench-scaler with this configuration to determine that this small test cluster could sustain 100,000 inserts per second in batches of 128, as depicted in the top half of the chart below. In the bottom half of the chart, we see that using more connections allows us to insert more records per second (RPS) until we reach 32 connections, after which there is no benefit to using more connections and we see a hugely adverse effect on query latency:
$ dbbench-scaler --database=db --concurrency=1,2,4,8,16,32,64,128,256,512 --output out.png /tmp/test.ini
In this example, not only did
dbbench-scaler answer the question, "Will MemSQL handle the scale of many users," it also found the ideal number of users for this application.