Synchronization in PHP
What does synchronization mean
We are talking about process-related synchronization here. Synchronization is the application of particular mechanisms to ensure that two concurrently executing processes do not execute specific portions of a program at the same time [Wikipedia]. More generally, those portions are the ones that access data structures shared between different processes, which should not be available to one process while another is halfway through modifying them; this keeps the invariants of such data (which processes rely on in their logic) satisfied.
For example, different processes can read the value of a common variable, increment it, and save the new value in the original location. If a moderately high number of processes run in time-sharing, two or more of them can read the same value before either writes it back, resulting in only one increment for two executions. Operations of this kind on data should be atomic: specific blocks of code should be allowed to run in their entirety before any other portion that touches the same data structure (essentially, that modifies the same state) is executed.
A PHP script is inherently immune from internal synchronization issues, unlike programs in other languages, because of its simple architecture. There is no native support for creating multiple threads that share the same PHP variables and run concurrently in the same PHP script. You can use exec() to run an external program or another PHP script in the background, but it does not share the scope of the original script (as an include()d file would), nor is this generally a good idea except for specific use cases that involve complex integration of external systems.
PHP scripts have a short life, and there is usually no need for multiple processes inside of them. Concurrent processes are commonly employed to perform tasks that wait for external resources or computations, so that if one process blocks while waiting for a function's return value, other processes can use the CPU time for their own work.
But beware that in PHP, the client waits for the generated page until the PHP script ends and control is returned to the HTTP server; if you have business logic that is waiting for external events or resources and you want to return control to the user in the meantime, you don't need a new thread (which, in the current architecture, you would have to wait for too): you need to outsource this work. Zend Server Job Queue and, more simply, cron let you set up tasks for asynchronous execution. For example, you can schedule a PHP script that performs heavy computation and has to be executed once an hour. In my opinion, this is multithreading as intended in PHP.
Even if PHP is shared-nothing by nature and, because of its limitations, does not present classical race conditions on variables, every kind of shared resource is a candidate for synchronization issues. Since PHP scripts are short-lived and stateless, the resources that maintain state on behalf of PHP scripts are the target of this analysis: different scripts, or even the very same script, can be executed more than once, concurrently. Thus, PHP code can interfere with itself as much as Java code does.
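To make the read-increment-write example concrete, here is a minimal sketch of a counter kept in a file that several PHP processes may update at once. It assumes that every script touching the file cooperates by taking the same advisory lock with flock(); the helper name incrementCounter() and the file layout are hypothetical, not something this article prescribes.

```php
<?php
// Hypothetical helper: increments the integer stored in $path and returns the new value.
function incrementCounter(string $path): int
{
    // 'c+' opens for reading and writing, creates the file if it is missing,
    // and does not truncate existing content.
    $handle = fopen($path, 'c+');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    try {
        // Block until this process holds the exclusive advisory lock.
        if (!flock($handle, LOCK_EX)) {
            throw new RuntimeException("Cannot lock $path");
        }

        // Read, increment, write: no other lock holder can interleave here.
        $current = (int) stream_get_contents($handle);
        $next = $current + 1;

        ftruncate($handle, 0);
        rewind($handle);
        fwrite($handle, (string) $next);
        fflush($handle);

        flock($handle, LOCK_UN);
        return $next;
    } finally {
        // Closing the handle also releases the lock if an exception was thrown.
        fclose($handle);
    }
}
```

The lock is advisory: a script that writes to the file without calling flock() is not stopped, so the protection holds only if all writers go through the same helper.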
Sessions are not usually a source of race conditions. Every session is confined to a single client via a cookie used as a key, so to analyze what can go wrong you only have to consider the pages and actions undertaken by a single user at the same time. The biggest issue, though, is stale data shown in different views while it is updated in the model; because of the nature of the HTTP protocol, you have to look to the Ajax world for a workaround.
Databases, at least the relational ones, guarantee atomicity of operations to a certain extent through transactions. Databases are the quintessence of application state, and you can take advantage of their ACID properties in a transaction. For example, if you really have to calculate a new key for a table by looking at the existing ones:
SELECT MAX(id) + 1 FROM mytable;
INSERT INTO mytable VALUES (4242, 'The name', ...); -- 4242 is the new id
UPDATE mytable SET field = ... WHERE id = 4242;
at least wrap this changeset in a transaction, so that you do not end up updating some other row 4242 that was inserted by another instance of the PHP script after your SELECT. PDO throws exceptions when its error mode is set to PDO::ERRMODE_EXCEPTION (unlike the mysql_*() functions, which often fail silently), so this example should behave correctly even when the INSERT fails. But why fail when nothing prevents the row from being inserted with a valid id? This is not the user's fault. You can see the potential for race conditions when doing data-intensive queries in PHP scripts.
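One way to act on that question is to let the database assign the key and keep the INSERT and UPDATE in a single transaction. The sketch below is only an illustration under stated assumptions: an in-memory SQLite database stands in for the real one, the columns name and field are made up around the mytable example, and insertAndUpdate() is a hypothetical helper. In MySQL the equivalent would be an AUTO_INCREMENT column.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real one;
// the columns "name" and "field" are illustrative.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, field TEXT)');

// Hypothetical helper: inserts a row and updates it inside one transaction.
function insertAndUpdate(PDO $pdo, string $name, string $field): int
{
    $pdo->beginTransaction();
    try {
        // No id is supplied: the database assigns one, so there is no
        // SELECT MAX(id) + 1 step for another script to race against.
        $insert = $pdo->prepare('INSERT INTO mytable (name) VALUES (:name)');
        $insert->execute([':name' => $name]);
        $id = (int) $pdo->lastInsertId();

        $update = $pdo->prepare('UPDATE mytable SET field = :field WHERE id = :id');
        $update->execute([':field' => $field, ':id' => $id]);

        $pdo->commit();
        return $id;
    } catch (Throwable $e) {
        // Undo the partial changeset before reporting the failure.
        $pdo->rollBack();
        throw $e;
    }
}

$first = insertAndUpdate($pdo, 'The name', 'some value');
$second = insertAndUpdate($pdo, 'Another name', 'other value');
```

Each call gets its own id from the database, so two concurrent instances of the script cannot end up updating each other's row.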
The file system is not immune from concurrency problems either; in this case, rows are replaced by files and primary keys by file names. The best solutions involve using a natural key for the filename or generating a unique identifier:
$filename = uniqid('prefix', true); // second parameter is $more_entropy
Unless two filenames for uploaded files are generated in the same microsecond, the resulting ids should be different. Or, if you're paranoid:
$filename = uniqid(mt_rand(), true);
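A unique-looking name still leaves a gap between choosing the name and creating the file. As a sketch of one way to close it, the file can be created with fopen() in 'x' mode, which fails if the file already exists, retrying with a fresh name on collision. The helper createUniqueFile(), its 'upload_' prefix, and the retry limit are assumptions made for illustration.

```php
<?php
// Hypothetical helper: creates a new, uniquely named file in $dir and returns its path.
function createUniqueFile(string $dir, string $prefix = 'upload_', int $maxAttempts = 5): string
{
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $path = $dir . DIRECTORY_SEPARATOR . uniqid($prefix, true);

        // Mode 'x' creates the file and fails if it already exists,
        // so the existence check and the creation are a single step.
        $handle = @fopen($path, 'x');
        if ($handle !== false) {
            fclose($handle);
            return $path;
        }
    }
    throw new RuntimeException("Could not create a unique file in $dir");
}
```

A caller would then write the uploaded content to the returned path, knowing that no other instance of the script was handed the same file.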
In sum, when you're expecting your applications to run many scripts every second, take your time to check the possibility that race conditions arise.