Database synchronisation techniques - The Good, the Bad, and the Ugly
How do you synchronise update invocations to your database? With the right technique, the question becomes irrelevant, and makes no sense anymore
Join the DZone community and get the full member experience.
Join For FreeThere exists two types of synchronisation techniques to make sure you avoid concurrency issues and race conditions when updating database records. These are as follows.
- Optimistic locking
- Pessimistic locking
Optimistic locking is usually performed by using native database mechanism such as RowVersion in SQL Server, TimeStamp in MySQL, or ETag in Cosmos DB. These are special field types that changes automatically as the record is updated, allowing you to inject the values of these fields as a criteria to your update invocations. For an example of how this works, imagine the following schema.
xxxxxxxxxx
create table languages (
locale varchar(5) not null,
language varchar(2048) null,
description varchar(2048) null,
version RowVersion not null,
primary key (locale)
)
In the above table there are two data columns, one primary key, in addition to the 'version' column. If I update both of my data fields simultaneously, using something such as the following ...
xxxxxxxxxx
update languages set language = 'foo', description = 'bar' where locale = 'xyz'
... I have effectively implemented a piece of code that is destined to experience race conditions at some point in the future. This is due to that multiple users might fetch the record at the same time, and both might edit one of the fields and save their edits. This is because my update invocation above is updating both data fields at the same time. Imagine a record such as the following.
- locale = 'en-US'
- language = 'English'
- description = 'British English'
The above record is obviously not correct, and needs editing, so user A reads the record and starts editing it. Before he saves his changes though, user B also sees the bogus data, reads the record and starts editing it too. When user A clicks save, all fields are updated. When user B clicks save, all fields are updated too, overwriting whatever changes user A applied. Maybe user B only edited the 'language', while user A edited the 'description'. However, once user B clicks the save button, user A's changes are gone.
The ugly
To prevent the above problem, we could add the version column as a part of our update SQL statement, resulting in something such as the following.
xxxxxxxxxx
update languages set language = 'foo', description = 'bar' where locale = 'xyz' and version = 'whatever'
This will result in that the record is not updated if somebody else updated the record after we fetched its existing values. Normally this is implemented adding a trigger on the 'version' column, that changes every time the record is updated. This is called optimistic locking, or "the ugly", because it gets nothing once the movie is over.
The bad
Pessimistic locking is performed by literally locking the record on the server as the record is fetched for being changed. Visual Source Safe used this technique to prevent multiple users from modifying the same code file at the same time some 20 years ago. By default, all files in your code base was read only, and if you wanted to modify a file, you had to choose "Check out" to be allowed to edit it. Only one person could check out a specific file at the same time. Yes I know. What can I say, this was a decade before Linus created GIT, when computers could still measure RAM in kilobytes, and yes I know I am old ... :/
To illustrate the problem, realise a colleague of mine went on vacation 20 years ago, and had checked out some 3-4 code files to work on these before he left. He forgot to "check in" these files before he left though, so nobody else could work on these files before my colleague came back from vacation 3 weeks later. We of course had a bug in one of those files, and I had to edit one of these files, but we couldn't, because the file was already checked out by my colleague. This is pessimistic locking, or "the bad".
The good
The good in this equation is "partial record updates", and implies only updating parts of your record. For the record, before you start going through your CosmosDB Data Access Layer to implement this, please realise that CosmosDB doe not have this feature. Microsoft, are you listening?
If you want to do this using document based database systems, you'll have to choose MongoDB or CouchBase for instance. To understand how to implement this, imagine our above A and B users again, and remember that user A only changed the 'description' column, which would result in an update SQL statement such as the following.
x
update languages set description = 'bar' where locale = 'xyz'
User B edited the 'language' field, which would result in something such as the following.
x
update languages set language = 'foo' where locale = 'xyz'
Hence, regardless of which update statement is executing first, no race conditions occurs, and both users are happy toddlers, getting their changes persisted into their database. With partial record updates, we've effectively eliminated an entire axiom of database synchronisation issues, and there cannot exist race conditions anymore - At least not race conditions that are something to actually worry about. Hence, I present to you; The Good!
To implement this in your own solutions, you'll have to track changes applied as you apply changes to your records. For a GUI solution in Angular for instance, you can probably dissect my code in Magic, which implements this idea. If you look through the frontend code of Babelfish for instance, you will see that I subscribe to the (change) event on my input fields, and as changes are applied, I add the field to a list of "to be updated fields". Then as I transmit my payload to my HTTP backend, I only pass in the fields that were changed by the user, ignoring everything that wasn't changed.
Don't be bad, don't be ugly, use partial record updates instead :)
Opinions expressed by DZone contributors are their own.
Comments