Contributing to Paratest
The Web Dev Zone is brought to you in partnership with Mendix. Discover how IT departments looking for ways to keep up with demand for business apps has caused a new breed of developers to surface - the Rapid Application Developer.
I've already written about my experiments with Paratest. Paratest is a PHPUnit wrapper that allows you to run tests written for PHPUnit in parallel, making us of multiple processes running on the same machine. In a world where cycle time is an important metric, trading resources to get the test suite to finish earlier is a net gain; especially when you're stepping on unstable stones and run the suite very often.
Our clock speeds aren't going to increase anymore; you should already know that the direction in which the CPU architectures are heading is to provide multiple cores instead. A PHPUnit process executed serially can get to 100% CPU, while executing two processes can theoretically double your performance, in essence halving the time taken for running your test suite.
Moreover, while unit tests do not touch external resources, integration and end-to-end ones are relying on the response time of a database, network calls (even towards the same machine), files to be written, and so on. This means the CPU is not being fully utilized, because a percentage of the time is spent waiting for these resources to become available or to respond. The presence of an high iowait makes room for a number of processes greater than the number of cores available.
In some cases, the resources you are using do not depend on the local machine, but are provided remotely in unlimited number. Facebook runs tests in parallel on multiple machines, but for an example available to any wallet check out SauceLabs: providing to you many different Selenium Servers from their cloud, so that you can run tests on N different browser instances at the same time.
The thing you have to keep in mind is that tests must be totally independent from each other to be executed. Clearing the whole database before a test or counting the rows of an entire table aren't a suitable option when parallelizing, as tests would interfere with each other.
Paratest delegates to a Runner object the execution of a test suite, and collects log files to aggregate the results into a single count of passed and failed tests (among with other statistics such as assertions, errors and time).
The default runner creates a new php process for each test class (or even test method with the --functional configuration). This is fine for large scale tests that take in the order of tens of seconds to be run, but there is indeed a slow down with unit tests. This decrease in performance is noticeable whenever there is a sizeable PHPUnit bootstrap file, which is executed again for each of the tests.
For this reason, I contributed a new Runner, named WrapperRunner. This Runner executes only a fixed number of processes (which I called Workers, with common terminology) and that take PHPUnit commands on their standard input:
$ bin/phpunit-wrapper phpunit tests/MyTest.php ... phpunit tests/OtherTest.php ...
Since the bootstrap file is require_once()d into the process, it is only executed once for each Worker, so if you have 1000 tests executed in 5 processes it will be executed 5 times instead of 1000. The tests are capable of reusing the same bootstrap because they previously ran serially after a single execution of it: if they are capable of running in parallel at all, they should share the bootstrap flawlessly.
I will probably use Gearman in the future to manage workers as Behat does, but for now this is a very easy-to-install solution: only PHP code.
With this contribution, Paratest gets back to a comparable time to PHPUnit for unit tests, while before it was doubling the (short) time required. The more cores we get, the more this time could go down. The WrapperRunner is currently an open PR.
Another contribution from Dimitris Baltas and that I ported also to the new WrapperRunner regards providing an environment variable named TEST_TOKEN. If tests are using a shared resource such as a database table, they cannot easily run in parallel if they are working on the same data or reading globally, for example counting the documents in a Mongo collection
The TEST_TOKEN shuld be used to map each process to a different instance of the shared resource: for example, a set of schemas database1, database2, ..., database5. The token is guaranteed to be given to a single test at any time in both Runners. This pull request is open too and will be merged into Paratest soon.