Concurrency Handling Techniques in ADO.NET
Concurrency Handling Techniques in ADO.NET
This article looks at what concurrency violations are; the types of concurrency violations, the strategies to mitigate such issues, and the performance issues involved.
Join the DZone community and get the full member experience.Join For Free
Concurrency handling is a technique that allows you to detect and resolve conflicts that arise out of two concurrent requests to the same resource. It is a technique that allows multiple users to access the same data at the same point in time and yet ensures the data remains consistent across all subsequent requests. This article would discuss how we can what concurrency is, the types of concurrency, the techniques involved in implementing each of them, the pros and cons of each of the concurrency handling techniques, the performance issues involved, and when we should use one over the other in our applications.
Interdependent Transactions – A Real-Life Example
Imagine a situation in which you were to transfer funds from your bank account to your friend’s account. Now at the time when the transaction is in execution, imagine what would happen if you were to check your account balance. Or, imagine what should be displayed as an account balance while your friend is checking his account balance while the funds transfer is in progress. Also, what would happen if your friend is trying to withdraw funds from his account while the funds transfer is in progress?
These were examples of interdependent transactions i.e., transactions which are dependent on one another. Other typical examples can be that a particular record has been deleted by a user while the same record is being updated by another user. To avoid such conflicts, database and record level locks are implemented. Note that the amount of time it takes for release of a database lock after it was set previously to mitigate such conflicting issue primarily depends upon the transaction time. As such, it is recommended to have your transactions as short as possible (fewer statements) in order to keep the lock time minimal. Also, if transaction takes extensive amount of time then there would be serious locking issues as other users might want to access the same data. Working on the data without transactions may help but it does not guarantee that the updates made by you are the latest.
Basically, there are three approaches to handling concurrency conflicts - pessimistic concurrency control, optimistic concurrency control and "last saved wins". In the first case, a particular record is made unavailable to the users from the time it was last fetched - until the time it is updated in the database. In optimistic concurrency, the last updated value is saved. In other words, the last saved record, "wins". In this mode, it is assumed that resource conflicts among concurrent users are unlikely, but, not impossible. In essence, it is assumed that when you are updating a particular record, no other user is updating the record at the same point in time.
If a conflict occurs while a particular record is being updated, the latest data is re-read from the database and the change is re-attempted. The update checks for any concurrency violation by determining the changes to the record from the time it was last read for the update operation to be performed. In the "last saved wins" situation, no checks are made for concurrent updates for the same record. The record is overwritten - any changes made to the record by other concurrent users are simply ignored as they are overwritten. ADO.NET uses optimistic concurrency mode as its architecture is based on the disconnected model of data through the usage of DataSets and Data Adapters.
In pessimistic concurrency, a check is made to see if any changes to a particular record have been made while it is being updated. In essence, pessimistic concurrency uses ROWVERSION or TIMESTAMP to check for updates to a particular record by other concurrent users. In the pessimistic concurrency mode it is assumed that a conflict will arise while concurrent data updates are taking place – so, locks are imposed on the requested data to ensure that the access to the data is blocked to other concurrent users. On the contrary, optimistic concurrency doesn't check for concurrency violations or concurrent updates, the last updated record is the one that is saved last to the database - and hence, it is the recent one too!
Handling Concurrency Conflicts in the Connected Mode
Concurrency conflicts can be resolved in ADO.NET while it is working in the connected and in the disconnected modes. In the connected mode, you can resolve concurrency conflicts and ensure data security and integrity by using the transactions efficiently. So, what is a transaction, anyway? A transaction is actually a group of operations/statements combined into a logical unit of work that is either guaranteed to be executed as a whole or rolled back. Transactions help ensure data integrity and security.
Transaction Isolation Levels
SQL Server follows the following isolation levels:
- Read Committed -this is the default isolation level and it specifies that transactions attempting to update the data would be blocked until the lock acquired on the data by other concurrent threads are released
- Read Uncommitted - in this mode, the transactions are not blocked and there are no exclusive locks on the data
- Repeatable Read – this states that all transactions that are currently being executed in this isolation level cannot read data that has been modified but not yet committed by other concurrent threads. Also, other concurrent executing threads cannot modify the data that a thread has thread but not yet modified
- Serializable – this is similar to the repeatable read isolation level with the addition that any new rows that tend to violate data consistency for the currently executing threads are aborted
- Snapshot – this offers a perfect balance between data consistency and performance. You can get a snapshot of a previous copy of the data that was last modified whilst in the middle of a transaction
You can start a transaction in ADO.NET using the BeginTransaction method on the connection instance. This method returns an object of type SqlTransaction. You should then commit or rollback a transaction a transaction depending on whether or not the transaction was a success or a failure. It should be noted that you have to have an open connection to work with transactions in ADO.NET. The following piece of code illustrates how you can implement transaction management in the connected mode to enforce data consistency.
A better approach would be to use the TransactionScope class of the System.Transactions namespace for flexible transaction management. Here is an example:
Though transactions are a good choice for preserving database integrity and data consistency, they should be used with utmost care. It should be noted that transactions hold locks and may cause contention issues; they should be as short as it is possible. Also, transactions require an open connection and hence they consume resources for a larger amount of time.
Handling Concurrency Conflicts in the Disconnected Mode
In the optimistic concurrency model, a check is made to see if the record being updated is the most recent one, or, if it has been modified by any other concurrent user. In order to do this, the data provider in use maintains two sets of data – one is the original, i.e., it contains the data that was last read from the database, and the other is the most recent or the changed data.
A check is then made using the WHERE clause in the update or delete statement to see if the data in the original set matches with the one in the database. If it does, the update or delete is performed – else, a concurrency violation is reported and the update or delete statement is aborted. Here is a typical example of an update statement with optimistic concurrency turned on:
Note that the while the most recent data is used in the SET statement, the WHERE clause of the UPDATE statement in the example shown above would contain the original data that was read from the database prior to performing the update operation. With optimistic concurrency turned off, the same update statement can be re-written as shown below:
In essence, when using optimistic concurrency, a check is made to see if the data being updated of deleted from the database is the most recent one. This works fine as long as your queries contain fewer fields for an update. However, if you have long queries with a large number of fields, this approach would work, but at the cost of performance.
A Better Approach
A better approach in such cases would be to have a TimeStamp column in each of your database tables. Note that the TimeStamp column contains binary data that is unique within the database. You could re-create your employee table as shown below:
As and when a record becomes dirty, i.e., is modified from the time it was last read, the value in the TimeStamp column would change. You can then just check for whether the value of the TimeStamp column for a particular record has changed from the time you last read the record. Here is how you can check for concurrency violations now:
Note that the value for the TStamp in the WHERE clause would be checked with the one in the original version of data you have with you. If no change is made to the record being updated from the time it was last read from the database, the value of the TStamp column for that record would remain the same. You can use this approach irrespective of the number of fields you have in your update statement. And, you are done!
So, What Then Is the Right Choice?
It is advisable not to have transactions that run for a long time. If you need to access large data in a transaction that needs to be sent to the client, then you can have that operation at the end of the transaction. Transactions that require user input to commit are also a degrade factor, ensure that explicit transactions are wither committed or rollback at some point of time. Also, you could find a boost in the performance if the resources are accessed in the same order. Proper utilization of isolation levels helps minimize locking.
Concurrency conflicts can also be handled by locking mechanisms – the database table or the record to be updated can be locked from other users until the transaction is complete in its entirety. The major pitfall to this approach however is that a continuous connection needs to be maintained with the underlying database and that the wait times can be significantly large when the data to be handled is more with more concurrent users connected. Hence, this is not a recommended approach in high data driven applications with large number of concurrent users connected.
Choosing the right type of concurrency is a tough proposition – you need to balance perfectly between performance and data availability. You need to select a strategy that would ensure that you require fewer locks on your resources and at the same time you should be able to get the right data with minimal locking time involved. There isn't a common rule which insists one over the other, as each has its pros and cons, instead you need to balance between your normal and critical scenarios in your application and then chose a one over the other.
Note that SQL Server follows a pessimistic concurrency model by default - it assumes that a conflict can arise when a read request comes for a piece of data and doesn't allow other transactions to read the data unless the current session commits - this is what we say a writer block. In the optimistic concurrency model an assumption is made that parallel updates to the same piece of data may or may not occur.
The ability of ADO.NET to work with data in a disconnected mode is great - you can have an in memory representation of your data that even comprises the data relationships. The disconnected model of ADO.NET has brought up revolutionary changes in the way applications interact with the database. You can use ADO.NET to store disconnected in memory collection of data locally. However, it brings in major concerns too - concurrency violations.
This article has had a look at what concurrency violations are; the types of concurrency violations, the strategies to mitigate such issues, and the performance issues involved. Happy reading!
This article was originally published here.
Opinions expressed by DZone contributors are their own.