Ignorance can be bliss. Well, it can be blissful until your application encounters production loads and begins to wail. Fortunately, you load test your application, so the temper tantrum is contained to your QA environment - for now. Nevertheless, you have a problem. What could be wrong? You have built a very simple microservice with rock-solid technologies. You even have second-level caching to minimize the DB bottleneck. But for some unknown reason your application is being very unDude-like.
The logs report the dreaded MySQL deadlock exception (com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction). What's more, it reports that the wayward transaction involves an INSERT. How could two unrelated threads contend for rows that do not even exist yet? Welcome to gap locks.
Gap locks ensure that the same SELECT returns the same rows within the same transaction by locking the gaps on the index employed by the query, so no other transaction can slip a phantom row into them. Gap locks taken by different transactions are compatible with one another, but an INSERT into a locked gap must first acquire an insert intention lock, which conflicts with any other transaction's gap lock on that gap. That is all a fatal deadlock needs: transaction #1 gap-locks a range and attempts an INSERT into it, while transaction #2, holding a lock on the same gap, concurrently attempts its own INSERT; each ends up waiting on the other's lock.
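As a concrete sketch of how this plays out under REPEATABLE READ (the `child` table, its rows, and the session labels are hypothetical; each session is a separate MySQL connection):

```sql
-- Assumed schema: CREATE TABLE child (id INT PRIMARY KEY, parent_id INT, KEY (parent_id));
-- with existing rows at parent_id = 10 and parent_id = 20, leaving the gap (10, 20).

-- Session 1:
START TRANSACTION;
DELETE FROM child WHERE parent_id = 15;   -- matches no row; takes a gap lock on (10, 20)

-- Session 2:
START TRANSACTION;
DELETE FROM child WHERE parent_id = 16;   -- also matches nothing; gap locks are
                                          -- compatible, so this succeeds immediately

-- Session 1:
INSERT INTO child VALUES (101, 15);       -- insert intention lock waits on session 2's gap lock

-- Session 2:
INSERT INTO child VALUES (102, 16);       -- waits on session 1's gap lock; InnoDB detects the cycle:
-- ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
```

Neither session ever touched an existing row, yet one of them is rolled back - the same signature our load test produced.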
This all sounds very deep in the woods for operations that should work out of the box. Why is this happening? Our first clue is the assurance "that the same SELECT returns the same data within the same transaction." It appears that our transaction isolation level is at least REPEATABLE READ or SERIALIZABLE. These isolation levels have always seemed too restrictive for the typical application use case. Aside from edge cases, any application that requests the same data from its database multiple times in the same transaction is probably doing the same work twice. 99.9% of the time the data should be maintained in memory after the initial retrieval for at least the duration of the transaction.
Now we have a solution at hand: just change the isolation level to READ COMMITTED. As a good Hibernate citizen, we set "hibernate.connection.isolation" to 2 (java.sql.Connection.TRANSACTION_READ_COMMITTED) and fire our load tests back up. Our smiles turn to frowns again when the same old errors flash across our console. Sadly, Dropwizard does not honor the ORM's configuration.
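For reference, the Hibernate side of that change is just a property. A minimal sketch (the surrounding SessionFactory bootstrap is omitted, and the class name is ours):

```java
import java.sql.Connection;
import java.util.Properties;

public class IsolationConfig {

    // Build the Hibernate properties that request READ COMMITTED connections.
    public static Properties hibernateProps() {
        Properties props = new Properties();
        // java.sql.Connection.TRANSACTION_READ_COMMITTED is the constant 2.
        props.setProperty("hibernate.connection.isolation",
                String.valueOf(Connection.TRANSACTION_READ_COMMITTED));
        return props;
    }

    public static void main(String[] args) {
        // Prints "2", the value we expected Hibernate to pass to the driver.
        System.out.println(hibernateProps().getProperty("hibernate.connection.isolation"));
    }
}
```

This is the configuration Dropwizard ends up ignoring, which is why the errors persist.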
Ugh! We're stuck again. This is starting to become a problem. What is going on? Well, it appears that Dropwizard has its own way of overriding the default isolation level. The switch we want is "defaultTransactionIsolation" on io.dropwizard.db.DataSourceFactory. This property accepts the values NONE, DEFAULT, READ_UNCOMMITTED, READ_COMMITTED, REPEATABLE_READ, and SERIALIZABLE. Once we add the new data source property to our Dropwizard configuration file, we are ready to pummel our application again.
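In the Dropwizard configuration file, the change is one line in the data source block (driver, URL, and credentials below are placeholders for your own values):

```yaml
database:
  driverClass: com.mysql.jdbc.Driver
  url: jdbc:mysql://localhost:3306/exampledb
  user: exampleuser
  password: examplepass
  defaultTransactionIsolation: READ_COMMITTED
```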
Assuming that you have no other problems, it should be all green for your tests. Of course, we could have brute forced it and changed the isolation globally in the database, but that seems a little heavy-handed. Plus, it is better for each application to state its desired level of isolation explicitly.
Please be warned that there is one more gotcha. If any of your databases use statement-based replication, you need to change the BINLOG_FORMAT to either ROW or MIXED. The default of STATEMENT only works with the REPEATABLE READ and SERIALIZABLE isolation levels; any transaction running at a lower isolation level will receive BINLOG_FORMAT errors.
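If you administer the server, the switch can be flipped at runtime (it requires the SUPER privilege, and only affects new sessions) or set permanently in my.cnf:

```sql
-- Check the current setting.
SHOW VARIABLES LIKE 'binlog_format';

-- Switch at runtime; put binlog_format = MIXED under [mysqld] in my.cnf to make it stick.
SET GLOBAL binlog_format = 'MIXED';
```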
To read more about gap locks, see the MySQL documentation - http://dev.mysql.com/doc/refman/5.6/en/innodb-record-level-locks.html.
To browse all the possible data source properties for a Dropwizard database configuration, see the io.dropwizard.db.DataSourceFactory reference documentation.
David Small is a Senior Software Engineer at Jibe.