People who are part of a development team are not cogs in a machine: every methodology that treats them as such is too simple a model to work correctly for long.
I think we can agree on this. Sometimes, however, it's interesting to draw analogies with the inner workings of a computer (or of the Internet) and see if we can learn something to apply to our everyday job.
The first reason is that hardware and software systems are wildly successful, and have not crumbled (so far) under the weight of their own complexity. The topic of today, scheduling in operating systems, is one example of that success.
The second reason is familiarity: when explaining a concept to a programmer or a geek, an analogy to technology is picked up faster than a comparison to a philosophical current, a football team, or even ordinary life. That's why Joel Spolsky chose to explain the perils of multitasking by comparing people switching between tasks to CPUs juggling many processes via time-sharing; he could have chosen instead the workflow of a restaurant kitchen.
The final reason is that even as the context changes, we can analyze why an approach used in software does not work for people, and vice versa. We expose the hidden assumptions of both contexts, machines and people, when we try to transport a practice from one to the other. So Joel can tell us that programmers pay a very costly context switch when moving from one problem to another, while computers can mask that cost with sheer computational power.
The basic model of scheduling is that of N processes that must be run by a single station (for simplicity; there may also be M stations, as in multi-core CPUs or multiprocessor machines).
The job of scheduling is then to decide an order for running the processes, and the points in time when the running process should be suspended to make room for the others. Since processes can take a long time to run, operating systems use this time-sharing technique instead of executing each process until it is completed.
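To make the model concrete, here is a minimal sketch (not taken from any real kernel) of round-robin time-sharing: each process runs for at most one quantum before being suspended and sent to the back of the queue.

```python
from collections import deque

def round_robin(burst_times, quantum):
    """Simulate round-robin time-sharing for processes all created at t=0.

    burst_times[i] is the total execution time process i needs;
    returns a dict mapping process id to its completion time."""
    remaining = dict(enumerate(burst_times))
    queue = deque(remaining)          # processes in creation order
    clock = 0
    completion = {}
    while queue:
        pid = queue.popleft()
        run = min(quantum, remaining[pid])
        clock += run                  # the station works on pid for one quantum
        remaining[pid] -= run
        if remaining[pid] == 0:
            completion[pid] = clock
        else:
            queue.append(pid)         # preempt: suspend and requeue
    return completion

print(round_robin([3, 5, 2], quantum=2))  # → {2: 6, 0: 7, 1: 10}
```

Notice how the short process (id 2) finishes first even though it was created last: time-sharing keeps long processes from monopolizing the station.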
Scheduling algorithms can be evaluated under several metrics, which may be in tension with each other.
- Throughput: the number of processes that can be executed (and completed) per unit of time.
- Latency: the time between the creation of a process and its actual execution (measured either when the process is started or when it is completed).
- Fairness: whether processes get an equal amount of execution time (or an amount proportional to their priority).
Some minor properties:
- Overhead: how much time is spent scheduling instead of working.
- Jitter: variations in the rate at which processes are completed.
- Deterministic execution: whether the scheduler can guarantee that a process completes before a given instant in time.
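The first three metrics can be computed from a schedule's arrival and completion times; here is a small illustrative helper (the "fairness_spread" proxy is my own simplification, not a standard measure).

```python
def metrics(arrival, completion):
    """Compute throughput, average latency, and a rough fairness proxy
    from per-process arrival and completion times."""
    latencies = [c - a for a, c in zip(arrival, completion)]
    makespan = max(completion) - min(arrival)
    return {
        "throughput": len(completion) / makespan,            # completed per unit time
        "avg_latency": sum(latencies) / len(latencies),
        "fairness_spread": max(latencies) - min(latencies),  # 0 = perfectly even
    }

print(metrics(arrival=[0, 0, 0], completion=[7, 10, 6]))
```

Feeding in the completion times produced by the round-robin model above shows the trade-off directly: a smaller quantum improves fairness and latency for short processes at the price of more scheduling overhead.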
Mainstream operating system kernels such as Windows, macOS and Linux employ schedulers that implement a high degree of fairness and rotate processes after execution quanta on the order of milliseconds. If processes are not rotated fast enough, some of them will give a slow response time to the user.
The problem with mainstream operating systems is that they support no soft or hard deadlines: time constraints by which a process must complete. In the real world, however, we have many deadlines! Your user story has to be completed before the end of the iteration, or before the release day. Even when the deadline is soft (the client will wait some days more, with a minor annoyance) it still exists.
Real time OSes
If you have to manage things with real deadlines, such as mechanical hardware, you have to resort to Real Time Operating Systems. The scheduling algorithms of RTOSes sacrifice other properties (such as low overhead) to guarantee that tasks complete in the allotted time whenever a schedule that satisfies them exists (which means, in our analogy, that you're not asking to deliver 10 features for tomorrow).
Algorithms from this field replace the simple round robin (process rotation with no priorities) with more complex schedulers, usually for periodic tasks that repeat with a fixed frequency.
For example, Rate-Monotonic scheduling assigns static priorities to tasks (the shorter a task's period, the higher its priority) and guarantees that deadlines can be met as long as total CPU utilization stays under a known bound.
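The classic sufficient test for Rate-Monotonic schedulability (due to Liu and Layland) checks total utilization against the bound n(2^(1/n) − 1); a sketch:

```python
def rm_schedulable(tasks):
    """Liu & Layland sufficient test for Rate-Monotonic scheduling.

    tasks is a list of (computation_time, period) pairs; a task set is
    guaranteed schedulable if total utilization <= n * (2^(1/n) - 1)."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1 / n) - 1)
    return utilization <= bound, utilization, bound

ok, u, bound = rm_schedulable([(1, 4), (2, 8)])
print(ok, round(u, 3), round(bound, 3))  # → True 0.5 0.828
```

The test is only sufficient, not necessary: a task set above the bound may still be schedulable, but below it you get a hard guarantee, which is exactly the kind of promise mainstream schedulers never make.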
Earliest Deadline First always executes the process with the nearest deadline; it can incur extra overhead, since the model doesn't account for context switches, but under some conditions it provides the highest throughput. Of course, when the system is overloaded everything may break.
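A toy EDF model (simplified: all jobs released at t=0, no preemption needed) makes the failure mode visible; the function reports which deadlines get missed.

```python
import heapq

def edf(jobs):
    """jobs is a list of (deadline, duration) pairs, all released at t=0.

    Always run the job with the nearest deadline; return the deadlines
    that were missed. With all jobs released together, EDF reduces to
    sorting by deadline, so a min-heap on deadlines suffices."""
    heap = list(jobs)
    heapq.heapify(heap)               # ordered by earliest deadline
    clock = 0
    missed = []
    while heap:
        deadline, duration = heapq.heappop(heap)
        clock += duration
        if clock > deadline:
            missed.append(deadline)   # the job completed too late
    return missed

print(edf([(10, 4), (5, 3), (12, 4)]))  # → [] : everything fits
print(edf([(10, 4), (5, 3), (12, 6)]))  # → [12]: overload, last deadline missed
```

The second call shows the overload behavior described above: once total work exceeds what the deadlines allow, no ordering can save the latest jobs.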
Keeping up our analogy, EDF is the equivalent of firefighting: everyone on the team continuously changes priorities as new tasks come up that have to be completed as fast as possible. Nothing can be guaranteed if there are too many tasks to juggle and utilization is at 100%; and there are always too many tasks. Context switches reign, because they aren't assigned a cost in the model. Most of all, completing urgent tasks always takes priority over everything else: there's no room left for Important, Not Urgent tasks.
Let's take a look at how scheduling works for people, with respect to the algorithms we have seen.
First of all, there is (almost) no preemption of tasks: a team member executes one until it ends. There is still a limited number of CPUs, since in the ideal case every person or pair takes up a single task at a time; and there are in any case WIP limits.
So how do you deal with hard deadlines when they come up? You add to the board a fast lane reserved for urgent items, or create emergency subteams that deal with emergencies while leaving the other people free to work on Important, Not Urgent tasks.
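The fast-lane policy is just a two-queue scheduler; here is a hypothetical sketch (the class and task names are invented for illustration).

```python
from collections import deque

class FastLaneBoard:
    """Toy model of a board with a fast lane: urgent items are always
    pulled first, but the normal backlog drains whenever the fast
    lane is empty, so Important, Not Urgent work still happens."""

    def __init__(self):
        self.fast = deque()
        self.normal = deque()

    def add(self, task, urgent=False):
        (self.fast if urgent else self.normal).append(task)

    def next_task(self):
        if self.fast:
            return self.fast.popleft()
        if self.normal:
            return self.normal.popleft()
        return None

board = FastLaneBoard()
board.add("refactor billing")
board.add("production outage", urgent=True)
board.add("write docs")
print(board.next_task())  # → production outage
print(board.next_task())  # → refactor billing
```

Unlike EDF, this policy bounds the disruption: only the fast lane preempts, so a flood of "urgent" items is visibly a flood on the board instead of silently reshuffling everyone's priorities.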
And with soft deadlines instead? You work to keep the cycle time (read: average process completion time) low, so that new processes have low latency in any case. Their priority may then change from 0 (because they weren't there before) or from low to high, and they can enter execution quickly. You don't have to shelve running tasks if they will finish soon anyway.
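The link between WIP, throughput and latency is captured by Little's law from queueing theory (the numbers below are invented for illustration):

```python
def average_latency(wip, throughput):
    """Little's law: average latency (cycle time) equals the average
    number of items in progress divided by the completion rate."""
    return wip / throughput

# With 6 tasks in progress and 2 completed per day, a new item waits
# about 3 days on average; halving the WIP halves the latency.
print(average_latency(6, 2))  # → 3.0
print(average_latency(3, 2))  # → 1.5
```

This is why WIP limits are not just tidiness: with the same team throughput, lower WIP directly means new soft-deadline items get served sooner.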
It's interesting to see how operating systems schedule tasks, because they share our limitation of executing only a limited number of them concurrently. However, it's also fun to expose their scheduling assumptions and see why the desired properties and model parameters of knowledge work lead us to choose substantially different algorithms.