Practical PHP Patterns: Metadata Mapping
Join the DZone community and get the full member experience.
Join For FreeThe intent of the Metadata Mapping pattern is to express implementation details, related a particular domain and Domain Model, as metadata of a general purpose library. In the sense intended here, metadata is related to the persistence operations (transferring objects back and forth from a database). These metadata is usually fed to a general purpose object-relational mapper.
Technically the term metadata is plural (of metadatum, data about data), but it is commonly used as an uncountable noun.
Why expressing metadata
Object-relational mapping is a difficult task to automate, prone to lots of potential bugs and undefined behaviors; expressing the domain-related peculiarities as metadata means that you are able to code only one ORM, and not have to repeat the same work in many custom Data Mappers, which are very boring to write and can't be transported out a specific application.
Custom Data Mappers were a cleaner solution for Domain Models with regard to employing Active Records, and they are advocated for example in Zend Framework books like Keith's Pope one. They are finally becoming obsolete thanks to the power of a declarative approach like this pattern, which tools like Doctrine 2 are based on.
Historicacally, Hibernate from JBoss was the first Data Mapper implemented as a generic ORM (it is a Java product). Doctrine 2 is the most famous PHP implementation, and it is in beta at the time of this writing.
The metadata we'd like to tell to an ORM are for example:
- which classes should be persisted at all.
- Optional names for the tables (it can use the class names.)
- Which fields form the primary key.
- The types of the different columns, particularly important in a loosely typed language like PHP.
- Which collaborators have to be persisted and via what means: foreign keys and additional association tables.
The metadata should usually not consist of code: non-standard behavior shouldn't be contained in them, as in general all the behavior like ineritance strategies and conversion of relationships is extracted in the generic ORM. Thus there are different formats we can use in place of PHP code: XML, annotations, YAML, INI...
Different approaches
There are two approaches to Metadata Mapping pattern, described by Fowler in his original book.
The first one is code generation: the metadata is processed to generate the source code of the mapping classes, for example a Data Mapper for every entity or aggregate root of your model (one for User, one for BlogPost, and so on). The ORM would theoretically not be necessary in production if the generation is complete enough.
Doctrine 1 used this approach in part, but it generated also the PHP code of the domain model itself from the Yaml mapping, as subclasses of Doctrine_Record. Still, Doctrine 1 was necessary to instantiate those classes and the solution wasn't so clean. Doctrine 2 is very different in architecture and goals.
The second approach is called reflective program, and consists in interpreting the mapping at runtime in the ORM's code, to open up correctly the objects via reflection (or a standard interface) and putting them in the database. The converse can happen: objects can be recreated from the union of metadata and database tables.
How it is used
The reflective solution is the common one nowadays, and Doctrine 2 borrows it from Hibernate in its own design.
Reflection is used to access the private fields to persist. Some critics point out speed problems of this technique, but keep in mind that your ORM is communicating with an external process or database machine at the same time of using reflection: it probably won't count much in the benchmark. Doctrine 2 however takes optimization seriously to the point that metadata internal classes (accessed very often) present an Api with public properties instead of methods to avoid every overhead in a crucial part (hydration of objects with data retrieved from the database).
An advantage of generated code is that it would be easier to debug, but it is usually a pain to maintain: every time you evolve or refactor a domain class you have to regenerate the Mapper classes. You can't customize this code either, because you would lost your changes at the regeneration time.
Advantages and (few) disadvantages
Of course we lose some expressiveness by specifying metadata instead of a programmatical behavior like the source code of a custom Data Mapper. But we gain very much: a fully tested ORM, like Doctrine 2 in the PHP case, with only some lines of added metadata to keep in sync with the rest of the code base.
Declarative approaches trading off completeness of functionalities (the absent ones are not used very often anyway) for developers time. But there are other advantages, such as the generation (and migration) of the database schema based on the metadata, and also of the proxy classes.
Ideally, the metadata mapping is the only point of strong coupling of your Domain Model with an external adapter, the ORM. It is of course part of the infrastructure, so keep it under version control along with the code!
Adding and removing fields or relationships, changing keys or refactoring is much easier because you do it declaratively instead of refactoring a specific mapper class. Note that automated refactoring tools are not to be trusted here: for example they usually ignore the mapping when you change a field name. So grep is your best ally.
Examples
The sample code of this article will present the different ways of specifying metadata for Doctrine 2, the most high-tech PHP ORM. The performance of the different methods are equivalent, since the metadata are read only one time into native PHP objects and then cached.
Metadata is a vast subject since all the different persistence implementations have to be driven by it, but we will look more at the types of metadata specification we can use instead of all the different metadata instances, which are best described in conjunction with the single features (for example, the inheritance patterns articles contain the description of the metadata related to subclassing.)
The simplest way to express metadata mapping in Doctrine 2 is via annotations, embedded in the docblocks and ignored from anything but the ORM:
<?php
/** @Entity */
class MyPersistentClass
{
/** @Column(type="integer") */
private $id;
/** @Column(length=50) */
private $name; // type defaults to string
//...
}
If you want to separate metadata from the class itself, you can use an XML configuration file instead:
<?xml version="1.0" encoding="UTF-8"?>
<doctrine-mapping xmlns="http://doctrine-project.org/schemas/orm/doctrine-mapping"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://doctrine-project.org/schemas/orm/doctrine-mapping
/Users/robo/dev/php/Doctrine/doctrine-mapping.xsd">
<entity name="Doctrine\Tests\ORM\Mapping\User" table="cms_users">
<lifecycle-callbacks>
<lifecycle-callback type="prePersist" method="onPrePersist" />
</lifecycle-callbacks>
<id name="id" type="integer" column="id">
<generator strategy="AUTO"/>
</id>
<field name="name" column="name" type="string" length="50"/>
<one-to-one field="address" target-entity="Address">
<join-column name="address_id" referenced-column-name="id"/>
</one-to-one>
<one-to-many field="phonenumbers" target-entity="Phonenumber" mapped-by="user">
<cascade>
<cascade-persist/>
</cascade>
</one-to-many>
<many-to-many field="groups" target-entity="Group">
<join-table name="cms_users_groups">
<join-columns>
<join-column name="user_id" referenced-column-name="id"/>
</join-columns>
<inverse-join-columns>
<join-column name="group_id" referenced-column-name="id"/>
</inverse-join-columns>
</join-table>
</many-to-many>
</entity>
</doctrine-mapping>
Don't be alarmed by the size: this mapping does much more than the annotations example's one.
A third way to specify metadata is via YAML, a format widely used in symfony-related software:
---
# Doctrine.Tests.ORM.Mapping.User.dcm.yml
Doctrine\Tests\ORM\Mapping\User:
type: entity
table: cms_users
id:
id:
type: integer
generator:
strategy: AUTO
fields:
name:
type: string
length: 50
oneToOne:
address:
targetEntity: Address
joinColumn:
name: address_id
referencedColumnName: id
oneToMany:
phonenumbers:
targetEntity: Phonenumber
mappedBy: user
cascade: cascadePersist
manyToMany:
groups:
targetEntity: Group
joinTable:
name: cms_users_groups
joinColumns:
user_id:
referencedColumnName: id
inverseJoinColumns:
group_id:
referencedColumnName: id
lifecycleCallbacks:
prePersist: [ doStuffOnPrePersist, doOtherStuffOnPrePersistToo ]
postPersist: [ doStuffOnPostPersist ]
Opinions expressed by DZone contributors are their own.
Comments