PHP has a particular approach to the generation of web pages (and, by consequence, of web applications). Its shared-nothing nature binds PHP scripts to the incoming HTTP request they respond to: not only can they not keep in-memory data structures between different requests without resorting to sessions or caches, but they also cannot run routines at a predefined date or time, simply because there is no guarantee that an HTTP request will arrive at the right hour to trigger them.
The shared-nothing architecture is also one of the factors that make PHP very scalable, since PHP scripts can be replicated on different servers without running into stale data kept in the memory of the different machines. When we have to move processing outside of HTTP requests, or schedule actions for execution at a particular date or time, we must resort to solutions that run outside of the PHP environment (but which may simply wrap the PHP scripts that execute the real action.) While there is usually a pool of processes maintained by the web server to quickly respond to requests, these processes only react to external events and have no autonomy.
At a school where we managed the electronic platform for students' grades and attendance, we had set up an automatic PHP script to import the students' data from the local MySQL database into the web site's one. This script had to be run daily, and a secretary was in charge of accessing the local server at midday to start it, then checking that it had finished its work correctly.
Although having a human manually run a script is usually a reliable solution, there are many issues with this approach. First, the person in charge of the task may forget to accomplish it, or do it at the wrong date or time. Furthermore, human time is much more costly than a machine's, and a person can usually spend it in a much more productive way than pushing a button.
cron is the classical job scheduler for Unix platforms. Although modifying its configuration usually requires root permissions, many shared hosting services provide cron support via setup from a control panel.
The cron configuration allows you to run a command at a specified time, provided that the cron daemon is running. For example:
0 12 * * * php /var/www/command.php
will run command.php every day at 12:00. This is the standard automated solution for scheduled, periodic script execution.
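For reference, the five fields before the command are minute, hour, day of month, month, and day of week. A few illustrative crontab entries (the script paths besides command.php are hypothetical):

```shell
# min   hour  dom  month dow   command
0       12    *    *     *     php /var/www/command.php        # every day at 12:00
*/15    *     *    *     *     php /var/www/poll.php           # every 15 minutes
30      2     *    *     1     php /var/www/weekly_report.php  # Mondays at 02:30
```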
The need for cron-like features in shared hosting services which do not provide access to cron has paved the way for hacked-up solutions, like the Poormanscron module for Drupal, which has subsequently been integrated into the core of Drupal 7. Basically, this module checks the time at which HTTP requests are received and triggers cron actions if it matches the definitions (but it requires that at least one HTTP request is received in the given time interval.)
Gearman is a solution which is gaining momentum in the PHP world, and I have just attended a talk about it, given by Felix de Vliegher of iBuildings, at the phpDay conference. In the words of its official website, Gearman
provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.
The name Gearman is an anagram of manager, because it tells others what to do without actually doing anything by itself. When a Gearman job server is running (again, this is a solution that shared hosting environments may find difficult to accommodate), it can be accessed on the client or worker side via APIs in nearly any language, and of course in PHP.
Is Gearman a substitute for cron? Not yet. The Gearman infrastructure is designed for outsourcing tasks to a set of workers, so that PHP scripts can immediately return a response to the user without waiting for a heavy operation, like processing an uploaded file, to finish. However, the work is still triggered by incoming requests: it is asynchronous in the sense that the operations do not have to complete while the HTTP response is being sent to the web client, not in the sense of running at a scheduled time.
Surely you could accomplish some of the same functionality by forking the PHP process, but Gearman is a standardized framework which decouples the workers (which perform a task) from the clients, resulting in a much more versatile and scalable solution. And the code to use Gearman from the client's point of view is as simple as:
$client = new GearmanClient();
$client->addServer(); // connect to the default gearmand at 127.0.0.1:4730
echo $client->do("operationName", "Argument");
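For comparison, a minimal sketch of the do-it-yourself forking approach mentioned above. This works only in CLI scripts, since pcntl_fork() is not available in web server environments, and the sleep() call is just a stand-in for real work:

```php
<?php
// Rough fork-based alternative to Gearman: delegate heavy work to a
// child process so the parent can continue (or respond) immediately.
$pid = pcntl_fork();
if ($pid === -1) {
    die("Could not fork\n");
} elseif ($pid === 0) {
    // Child process: perform the heavy operation, then exit.
    sleep(5); // stand-in for the real task
    exit(0);
}
// Parent process: continue without waiting for the child.
echo "Work delegated to process $pid\n";
```

The drawback is that you must then manage process lifecycles, error handling, and distribution across machines yourself, which is exactly what Gearman standardizes.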
The use cases for Gearman are many: Zend Framework is now using it to execute post-commit hooks without having the committer experience delays (they're actually post-receive hooks, since the source control system used is Git.) Gearman supports both synchronous calls, like the one shown in the code sample, and background calls, which return immediately while the work is executed in the background.
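On the worker side, the counterpart of the client sample is just as compact. A sketch assuming the pecl/gearman extension and a gearmand server on the default port, where the function name "operationName" matches the client call shown earlier:

```php
<?php
// Worker: register a callback for "operationName" and process jobs forever.
$worker = new GearmanWorker();
$worker->addServer(); // defaults to 127.0.0.1:4730
$worker->addFunction("operationName", function (GearmanJob $job) {
    // The argument passed by the client arrives as the job workload.
    return strtoupper($job->workload());
});
while ($worker->work()) {
    // One job is processed per iteration.
}
```

A background call from the client is then a one-liner: $client->doBackground("operationName", "Argument") returns a job handle immediately, while a worker picks up the task.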
PHP was born in the HTTP request environment, but with the aid of external processes it is not limited to this scope. If you know your toolbox, you can enjoy the scalability of PHP without sacrificing the ease of asynchronous processing.