Refactoring away from spaghetti PHP
This article implements the Beginner pattern.
Sometimes you have to take a step back from discussions on coupling, cohesion, patterns and katas to give some training to those of us who have a procedural mindset. With this article I hope to provide some initial tips for the members of the PHP community who are ready to abandon the concept of the OneSingleProcedure(TM) and embrace the object world. In particular, I want to list some specific concerns that standard PHP applications have, which should be kept in mind while reengineering an existing system or creating new objects, and to keep an eye on (unit) testability, which is greatly improved by introducing objects in place of procedures or plain spaghetti code.
The OneSingleProcedure(TM)
The old-style PHP programmer, when faced with a new feature, writes this kind of script:
```php
<?php
$connection = mysql_connect(...); // not really different with PDO
$parameter = isset($_GET['name']) ? $_GET['name'] : ''; // HTTP
if (!validate($parameter)) {                              // domain logic
    echo "<p>Error!</p>";                                 // HTML
    exit();
}
mysql_query("INSERT INTO ... ");                          // SQL
echo "<p>Success!</p>";                                   // HTML
```
The point of this critique is not that mysql_*() is unsafe, but that this script mixes together several different concerns that PHP programmers deal with repeatedly and must learn to keep separate:
- the HTTP domain: GET and POST parameters, headers, and by extension printing a response in UTF-8
- database (and data source in general) access: connections, queries, or grabbing data from web services
- domain logic like validation and application-specific code
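As a first step on the HTTP side, the script can stop reading `$_GET` directly and receive its parameters as a plain array. The sketch below assumes PHP 7.4 or later; the class name `HttpParameters` and its `get()` method are hypothetical, not part of any framework.

```php
<?php

/**
 * Hypothetical wrapper around request parameters (e.g. $_GET or $_POST).
 * It deals only with the HTTP concern: no validation, no SQL, no HTML.
 */
final class HttpParameters
{
    private array $params;

    public function __construct(array $params)
    {
        $this->params = $params;
    }

    /** Returns the parameter as a string, or $default when it is missing or not scalar. */
    public function get(string $name, string $default = ''): string
    {
        if (!isset($this->params[$name]) || !is_scalar($this->params[$name])) {
            return $default;
        }
        return (string) $this->params[$name];
    }
}

// In production the superglobal is passed in once, at the edge of the application:
//   $input = new HttpParameters($_GET);
// In a test, any array will do.
$input = new HttpParameters(['name' => 'hello']);
echo $input->get('name'), "\n";            // hello
echo $input->get('missing', 'none'), "\n"; // none
```

The `is_scalar()` check matters because a query string such as `?name[]=x` turns `$_GET['name']` into an array.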
I'm not suggesting that you build a Hexagonal Architecture in a day, but that you apply basic object-oriented decomposition to transform this typical script into a different kind of script, one in which objects are put on the stage to talk to each other.
Decomposition
Decomposition is the knife we use to cut the object under analysis into many pieces, and to put it back together once we have understood its parts. This is Motorcycle Maintenance 101: when applied to code, it can lead us to model a problem with several types of basic entities:
- E/R modeling leads us to divide a problem into database tables and their rows. This is something PHP developers are usually expert in.
- Procedural (top-down) decomposition breaks the OneSingleProcedure(TM) into smaller ones, which in turn are composed of multiple function calls. Usually, this decomposition is performed to extract code for reuse, like the validate() function we saw earlier, and it is strictly time-based: each subroutine is a step to perform.
- Parnas decomposition, which is the kind we should try to apply with objects and classes, is based on information hiding: each part hides a design decision or a technical concern; these parts can change individually in the future without affecting the whole, or be reengineered by themselves when a change request is aligned with the existing decomposition.
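To make the Parnas criterion concrete: the decision "posts are stored in a MySQL table" can be hidden behind an interface, so that the rest of the code depends only on the operations it needs. This is a sketch under assumptions (PHP 7.4+): `PostRepository`, `InMemoryPostRepository`, `PdoPostRepository`, `savePost()` and the `posts(text)` table are hypothetical names; only the `PDO` calls are real PHP API.

```php
<?php

/** The design decision "how and where posts are stored" is hidden behind this interface. */
interface PostRepository
{
    public function add(string $text): void;

    /** @return string[] texts of all stored posts */
    public function all(): array;
}

/** Keeps posts in memory: handy in unit tests, where no database is available. */
final class InMemoryPostRepository implements PostRepository
{
    private array $posts = [];

    public function add(string $text): void
    {
        $this->posts[] = $text;
    }

    public function all(): array
    {
        return $this->posts;
    }
}

/** Stores posts through PDO; SQL appears only inside this class. */
final class PdoPostRepository implements PostRepository
{
    private PDO $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function add(string $text): void
    {
        // Prepared statement: the value is bound, never concatenated into the SQL.
        $statement = $this->pdo->prepare('INSERT INTO posts (text) VALUES (?)');
        $statement->execute([$text]);
    }

    public function all(): array
    {
        return $this->pdo->query('SELECT text FROM posts')->fetchAll(PDO::FETCH_COLUMN);
    }
}

// Client code sees only the interface, so either implementation can be passed in.
function savePost(PostRepository $posts, string $text): void
{
    $posts->add($text);
}

$repository = new InMemoryPostRepository();
savePost($repository, 'First post');
print_r($repository->all());
```

Swapping the in-memory implementation for the PDO one (or for a later one backed by a web service) does not touch `savePost()`: that is the information hiding at work.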
Thus, in the case of our script, there are multiple concerns that we could separate into objects (and, in bigger applications, into layers). The problem with database decomposition is that the schema assumes the *utmost* importance, and the logic effectively disappears. Procedural decomposition instead has the defect of providing only one axis along which to align your subroutines: time.
But I just wanted to insert a row in the posts table!
No, you didn't; you wanted to:
- Map some HTTP inputs, like text and binary files, into a set of in-memory data structures such as arrays and other variables.
- Validate this input to check its conformance to domain rules (such as 'The text of a post must not be empty').
- Change the persistent state of the application by inserting a new row.
- Build an HTML response for the user's browser.
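Two of these steps, the domain rule and the HTML response, can each live in a small unit of their own. A hedged sketch, assuming PHP 7.4+: `PostValidator`, `PostView` and the error message wording are hypothetical, and the only rule implemented is the example rule from the list above.

```php
<?php

/** Domain rules only: no $_GET, no SQL, no echo. */
final class PostValidator
{
    /** @return string[] violated rules; empty when the text is valid */
    public function validate(string $text): array
    {
        $errors = [];
        if (trim($text) === '') {
            $errors[] = 'The text of a post must not be empty';
        }
        return $errors;
    }
}

/** HTML only: turns a list of errors into markup, escaped for UTF-8 output. */
final class PostView
{
    public function render(array $errors): string
    {
        if ($errors === []) {
            return '<p>Success!</p>';
        }
        $items = array_map(
            fn (string $error): string => '<li>' . htmlspecialchars($error, ENT_QUOTES, 'UTF-8') . '</li>',
            $errors
        );
        return '<p>Error!</p><ul>' . implode('', $items) . '</ul>';
    }
}

$validator = new PostValidator();
$view = new PostView();
echo $view->render($validator->validate('   ')), "\n"; // <p>Error!</p><ul><li>The text of a post must not be empty</li></ul>
echo $view->render($validator->validate('Hello')), "\n"; // <p>Success!</p>
```

Neither class knows where the text came from or whether it will be stored, so each can be tested with a single method call.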
These four concerns (concretely speaking: HTTP, domain rules, SQL and HTML) should never be mixed together in the same file; a strict separation is a facilitating (but not sufficient) condition for dealing with maintenance easily and changing existing code without headaches. Failing to separate these concerns makes it impossible to unit test the code in isolation, as the following test case illustrates:
```php
<?php
class ForumPostsTest extends PHPUnit_Framework_TestCase
{
    public function testAPostIsSavedAfterValidationAndAParagraphResponseIsShown()
    {
        // apart from the overly long test name, what code could I write in order to test newpost.php?
    }
}
```
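Once the concerns are separated, the test above stops being a dead end: the controller-like code receives its collaborators instead of creating them, and the test passes in fakes. A minimal sketch, assuming PHP 7.4+ and hypothetical names (`NewPostHandler`, its three callable dependencies, the `text` parameter); it uses plain `assert()` instead of PHPUnit so that it runs on its own, and in a PHPUnit test case the same checks would become `assertSame()` calls.

```php
<?php

/**
 * Coordinates the four concerns without implementing any of them:
 * each collaborator is injected, so a test can replace it with a fake.
 */
final class NewPostHandler
{
    /** @var callable(string): string[] returns a list of validation errors */
    private $validate;
    /** @var callable(string): void persists a valid post */
    private $save;
    /** @var callable(string[]): string renders the HTML response */
    private $render;

    public function __construct(callable $validate, callable $save, callable $render)
    {
        $this->validate = $validate;
        $this->save = $save;
        $this->render = $render;
    }

    /** @param array<string, mixed> $params request parameters, e.g. $_POST */
    public function handle(array $params): string
    {
        $text = isset($params['text']) && is_string($params['text']) ? $params['text'] : '';
        $errors = ($this->validate)($text);
        if ($errors === []) {
            ($this->save)($text);
        }
        return ($this->render)($errors);
    }
}

// Fakes: in production a real validator, repository and view would be passed in.
$saved = [];
$handler = new NewPostHandler(
    fn (string $text): array => $text === '' ? ['empty'] : [],
    function (string $text) use (&$saved): void { $saved[] = $text; },
    fn (array $errors): string => $errors === [] ? '<p>Success!</p>' : '<p>Error!</p>'
);

// "A post is saved after validation and a paragraph response is shown":
assert($handler->handle(['text' => 'Hello']) === '<p>Success!</p>');
assert($saved === ['Hello']);

// An invalid post is not saved.
assert($handler->handle([]) === '<p>Error!</p>');
assert($saved === ['Hello']);
```

The design choice here is that the handler owns only the order of the steps; every concern it coordinates can be replaced without editing it.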
Conclusion
As a rule of thumb, keep in mind that if you're writing any two items from the list HTTP, validate(), SQL, HTML in the same source file, something has gone wrong. It takes time to forget the procedural mindset and start decomposing responsibilities instead of data; emulating existing architectures and their layering is not a point of arrival, but it is a good start for writing something more understandable than a 4000-line script.
Opinions expressed by DZone contributors are their own.