DZone
The Latest Databases Topics

Practical PHP Patterns: Unit of Work
The Unit of Work pattern is one of the most complex moving parts of Object-Relational Mappers, and usually of Data Mappers in general. A Unit of Work is a component (for us, an object with collaborators) which keeps track of the new, modified and deleted domain objects whose changes have to be reflected in the data store. At the end of a transaction the Unit of Work, if used correctly, is capable of producing a list of changes to perform on the data store, solving concurrency or consistency problems, and avoiding too many redundant queries in the relational case or chatty communication in the schemaless one.

As I've already said, the Unit of Work pattern is usually not employed alone but as part of a Data Mapper, which provides a different interface to the internal client code and combines this pattern with several other ones.

The minimum transaction that a PHP Unit of Work performs is usually an HTTP request, or a session composed of more than one request in case the domain objects can be saved in an intermediate store (like $_SESSION or a cache of any kind). Being able to serialize objects in a store and reattach them to the Unit of Work during subsequent requests is not a trivial problem.

Advantages

The power of a Unit of Work resides in the fact that the actual database transaction is only performed (and kept open) when the commit() method of the Unit of Work is called, while until that moment there is ideally no use of the database connection. This paradigm is called batch update.

Objects stored in a Unit of Work usually have an associated state, like:
  • new (which corresponds to INSERT queries during the batch update)
  • clean (no SQL queries have to be issued since the object has been retrieved and not modified)
  • dirty (UPDATE queries)
  • removed (DELETE queries)

There are different strategies for detecting changes to the object graph.
The simplest strategy is comparing objects with a clean copy kept in memory (comparing them with the database is usually too expensive). A more complex solution is having a specific interface which is implemented by the objects, so that they can manage their state and declare they are dirty or have to be removed. This implementation choice introduces a dependency from the domain layer to the infrastructure one, thus I prefer heavier approaches like the former, which is equivalent to generating a diff with your source control system of choice, but on the object graph instead of a codebase: the source files are not responsible for diffing themselves.

Furthermore, decoupling the Unit of Work from the database state introduces a higher level of management, which enables us to roll back changes if some constraints are not satisfied, or if the computation has produced an error. In PHP, the client code can simply throw the object graph away, and the partial Unit of Work changeset is forgotten in the next requests.

Issues

While decoupling the object graph from the data store to perform custom computations is a comfortable possibility for the client code, at the same time it can be an issue that introduces stale data. The longer the objects are kept in the Unit of Work, the more the data store is prone to external concurrent modifications inconsistent with the in-memory graph (for example updating fields with different values than the ones modified in this very session). Either an optimistic or a pessimistic locking mechanism has to be introduced when the lifetime of the object graph is longer than the few seconds necessary to produce an HTTP response, or even less than that when the traffic is higher.

Injecting the Unit of Work into the domain objects so that they can track their state can be problematic and too much of an invasion of the domain layer.
Usually the problem is solved the other way around: when the objects are passed to the Object-Relational Mapper (almost always implemented as a Data Mapper and not as an Active Record), it delegates part of the logic to the Unit of Work, which is a first-class citizen and can be tested independently from the other components of the library.

The alternative to the inherent complexity of the Unit of Work pattern is saving an object at the moment it is updated. This solution is problematic because either the client code has to explicitly call save() methods, or queries (or, in the case of a non-relational model, modifications to the data store) have to be performed at the very time of an atomic change, for instance issuing multiple UPDATE statements, one for every time a field is modified.

Example

The sample code of this article is the internal API of Doctrine 2. The actual Unit of Work code is dependent on the strategy adopted to detect changes to domain objects, but the interface exposed to the Entity Manager is always the same and should provide an overview of a Unit of Work's responsibilities and features. In this implementation, the methods persist() and remove() are used to introduce new objects to the Unit of Work or to schedule something for deletion from the database, while commit() executes a batch update on demand.

/**
 * @author Guilherme Blanco
 * @author Jonathan Wage
 * @author Roman Borschel
 * @internal This class contains highly performance-sensitive code.
 */
class UnitOfWork implements PropertyChangedListener
{
    /**
     * An entity is in MANAGED state when its persistence is managed by an EntityManager.
     */
    const STATE_MANAGED = 1;

    /**
     * An entity is new if it has just been instantiated (i.e. using the "new" operator)
     * and is not (yet) managed by an EntityManager.
     */
    const STATE_NEW = 2;

    /**
     * A detached entity is an instance with a persistent identity that is not
     * (or no longer) associated with an EntityManager (and a UnitOfWork).
     */
    const STATE_DETACHED = 3;

    /**
     * A removed entity instance is an instance with a persistent identity,
     * associated with an EntityManager, whose persistent state has been
     * deleted (or is scheduled for deletion).
     */
    const STATE_REMOVED = 4;

    /**
     * Commits the UnitOfWork, executing all operations that have been postponed
     * up to this point. The state of all managed entities will be synchronized with
     * the database.
     *
     * The operations are executed in the following order:
     *
     * 1) All entity insertions
     * 2) All entity updates
     * 3) All collection deletions
     * 4) All collection updates
     * 5) All entity deletions
     *
     */
    public function commit()
    {
        // Compute changes done since last commit.
        $this->computeChangeSets();

        if ( ! ($this->_entityInsertions ||
                $this->_entityDeletions ||
                $this->_entityUpdates ||
                $this->_collectionUpdates ||
                $this->_collectionDeletions ||
                $this->_orphanRemovals)) {
            return; // Nothing to do.
        }

        if ($this->_orphanRemovals) {
            foreach ($this->_orphanRemovals as $orphan) {
                $this->remove($orphan);
            }
        }

        // Raise onFlush
        if ($this->_evm->hasListeners(Events::onFlush)) {
            $this->_evm->dispatchEvent(Events::onFlush, new Event\OnFlushEventArgs($this->_em));
        }

        // Now we need a commit order to maintain referential integrity
        $commitOrder = $this->_getCommitOrder();

        $conn = $this->_em->getConnection();
        $conn->beginTransaction();
        try {
            if ($this->_entityInsertions) {
                foreach ($commitOrder as $class) {
                    $this->_executeInserts($class);
                }
            }

            if ($this->_entityUpdates) {
                foreach ($commitOrder as $class) {
                    $this->_executeUpdates($class);
                }
            }

            // Extra updates that were requested by persisters.
            if ($this->_extraUpdates) {
                $this->_executeExtraUpdates();
            }

            // Collection deletions (deletions of complete collections)
            foreach ($this->_collectionDeletions as $collectionToDelete) {
                $this->getCollectionPersister($collectionToDelete->getMapping())
                     ->delete($collectionToDelete);
            }

            // Collection updates (deleteRows, updateRows, insertRows)
            foreach ($this->_collectionUpdates as $collectionToUpdate) {
                $this->getCollectionPersister($collectionToUpdate->getMapping())
                     ->update($collectionToUpdate);
            }

            // Entity deletions come last and need to be in reverse commit order
            if ($this->_entityDeletions) {
                for ($count = count($commitOrder), $i = $count - 1; $i >= 0; --$i) {
                    $this->_executeDeletions($commitOrder[$i]);
                }
            }

            $conn->commit();
        } catch (Exception $e) {
            $this->_em->close();
            $conn->rollback();
            throw $e;
        }

        // Take new snapshots from visited collections
        foreach ($this->_visitedCollections as $coll) {
            $coll->takeSnapshot();
        }

        // Clear up
        $this->_entityInsertions =
        $this->_entityUpdates =
        $this->_entityDeletions =
        $this->_extraUpdates =
        $this->_entityChangeSets =
        $this->_collectionUpdates =
        $this->_collectionDeletions =
        $this->_visitedCollections =
        $this->_scheduledForDirtyCheck =
        $this->_orphanRemovals = array();
    }

    /**
     * Computes the changes that happened to a single entity.
     *
     * Modifies/populates the following properties:
     *
     * {@link _originalEntityData}
     * If the entity is NEW or MANAGED but not yet fully persisted (only has an id)
     * then it was not fetched from the database and therefore we have no original
     * entity data yet. All of the current entity data is stored as the original entity data.
     *
     * {@link _entityChangeSets}
     * The changes detected on all properties of the entity are stored there.
     * A change is a tuple array where the first entry is the old value and the second
     * entry is the new value of the property. Changesets are used by persisters
     * to INSERT/UPDATE the persistent entity state.
     *
     * {@link _entityUpdates}
     * If the entity is already fully MANAGED (has been fetched from the database before)
     * and any changes to its properties are detected, then a reference to the entity is stored
     * there to mark it for an update.
     *
     * {@link _collectionDeletions}
     * If a PersistentCollection has been de-referenced in a fully MANAGED entity,
     * then this collection is marked for deletion.
     *
     * @param ClassMetadata $class The class descriptor of the entity.
     * @param object $entity The entity for which to compute the changes.
     */
    public function computeChangeSet(Mapping\ClassMetadata $class, $entity)
    {
        // ...
    }

    /**
     * Computes all the changes that have been done to entities and collections
     * since the last commit and stores these changes in the _entityChangeSet map
     * temporarily for access by the persisters, until the UoW commit is finished.
     */
    public function computeChangeSets()
    {
        // ...
    }

    /**
     * Schedules an entity for insertion into the database.
     * If the entity already has an identifier, it will be added to the identity map.
     *
     * @param object $entity The entity to schedule for insertion.
     */
    public function scheduleForInsert($entity)
    {
        $oid = spl_object_hash($entity);

        if (isset($this->_entityUpdates[$oid])) {
            throw new \InvalidArgumentException("Dirty entity can not be scheduled for insertion.");
        }
        if (isset($this->_entityDeletions[$oid])) {
            throw new \InvalidArgumentException("Removed entity can not be scheduled for insertion.");
        }
        if (isset($this->_entityInsertions[$oid])) {
            throw new \InvalidArgumentException("Entity can not be scheduled for insertion twice.");
        }

        $this->_entityInsertions[$oid] = $entity;

        if (isset($this->_entityIdentifiers[$oid])) {
            $this->addToIdentityMap($entity);
        }
    }

    /**
     * Schedules an entity for being updated.
     *
     * @param object $entity The entity to schedule for being updated.
     */
    public function scheduleForUpdate($entity)
    {
        $oid = spl_object_hash($entity);

        if ( ! isset($this->_entityIdentifiers[$oid])) {
            throw new \InvalidArgumentException("Entity has no identity.");
        }
        if (isset($this->_entityDeletions[$oid])) {
            throw new \InvalidArgumentException("Entity is removed.");
        }

        if ( ! isset($this->_entityUpdates[$oid]) && ! isset($this->_entityInsertions[$oid])) {
            $this->_entityUpdates[$oid] = $entity;
        }
    }

    /**
     * INTERNAL:
     * Schedules an entity for deletion.
     *
     * @param object $entity
     */
    public function scheduleForDelete($entity)
    {
        $oid = spl_object_hash($entity);

        if (isset($this->_entityInsertions[$oid])) {
            if ($this->isInIdentityMap($entity)) {
                $this->removeFromIdentityMap($entity);
            }
            unset($this->_entityInsertions[$oid]);
            return; // entity has not been persisted yet, so nothing more to do.
        }

        if ( ! $this->isInIdentityMap($entity)) {
            return; // ignore
        }

        $this->removeFromIdentityMap($entity);

        if (isset($this->_entityUpdates[$oid])) {
            unset($this->_entityUpdates[$oid]);
        }
        if ( ! isset($this->_entityDeletions[$oid])) {
            $this->_entityDeletions[$oid] = $entity;
        }
    }

    /**
     * Checks whether an entity is scheduled for insertion, update or deletion.
     *
     * @param $entity
     * @return boolean
     */
    public function isEntityScheduled($entity)
    {
        $oid = spl_object_hash($entity);
        return isset($this->_entityInsertions[$oid]) ||
               isset($this->_entityUpdates[$oid]) ||
               isset($this->_entityDeletions[$oid]);
    }

    public function persist($entity)
    {
        $visited = array();
        $this->_doPersist($entity, $visited);
    }

    /**
     * Saves an entity as part of the current unit of work.
     * This method is internally called during save() cascades as it tracks
     * the already visited entities to prevent infinite recursions.
     *
     * NOTE: This method always considers entities that are not yet known to
     * this UnitOfWork as NEW.
     *
     * @param object $entity The entity to persist.
     * @param array $visited The already visited entities.
     */
    private function _doPersist($entity, array &$visited)
    {
        $oid = spl_object_hash($entity);
        if (isset($visited[$oid])) {
            return; // Prevent infinite recursion
        }

        $visited[$oid] = $entity; // Mark visited

        $class = $this->_em->getClassMetadata(get_class($entity));
        $entityState = $this->getEntityState($entity, self::STATE_NEW);

        switch ($entityState) {
            case self::STATE_MANAGED:
                // Nothing to do, except if policy is "deferred explicit"
                if ($class->isChangeTrackingDeferredExplicit()) {
                    $this->scheduleForDirtyCheck($entity);
                }
                break;
            case self::STATE_NEW:
                if (isset($class->lifecycleCallbacks[Events::prePersist])) {
                    $class->invokeLifecycleCallbacks(Events::prePersist, $entity);
                }
                if ($this->_evm->hasListeners(Events::prePersist)) {
                    $this->_evm->dispatchEvent(Events::prePersist, new LifecycleEventArgs($entity, $this->_em));
                }

                $idGen = $class->idGenerator;
                if ( ! $idGen->isPostInsertGenerator()) {
                    $idValue = $idGen->generate($this->_em, $entity);
                    if ( ! $idGen instanceof \Doctrine\ORM\Id\AssignedGenerator) {
                        $this->_entityIdentifiers[$oid] = array($class->identifier[0] => $idValue);
                        $class->setIdentifierValues($entity, $idValue);
                    } else {
                        $this->_entityIdentifiers[$oid] = $idValue;
                    }
                }
                $this->_entityStates[$oid] = self::STATE_MANAGED;

                $this->scheduleForInsert($entity);
                break;
            case self::STATE_DETACHED:
                throw new \InvalidArgumentException(
                        "Behavior of persist() for a detached entity is not yet defined.");
            case self::STATE_REMOVED:
                // Entity becomes managed again
                if ($this->isScheduledForDelete($entity)) {
                    unset($this->_entityDeletions[$oid]);
                } else {
                    //FIXME: There's more to think of here...
                    $this->scheduleForInsert($entity);
                }
                break;
            default:
                throw ORMException::invalidEntityState($entityState);
        }

        $this->_cascadePersist($entity, $visited);
    }

    /**
     * Deletes an entity as part of the current unit of work.
     *
     * @param object $entity The entity to remove.
     */
    public function remove($entity)
    {
        $visited = array();
        $this->_doRemove($entity, $visited);
    }

    /**
     * Deletes an entity as part of the current unit of work.
     *
     * This method is internally called during delete() cascades as it tracks
     * the already visited entities to prevent infinite recursions.
     *
     * @param object $entity The entity to delete.
     * @param array $visited The map of the already visited entities.
     * @throws InvalidArgumentException If the instance is a detached entity.
     */
    private function _doRemove($entity, array &$visited)
    {
        // ...
    }
}
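
The snapshot-comparison strategy described earlier in this article (diffing each object against a clean copy kept in memory) can be illustrated with a much smaller sketch than the Doctrine 2 listing. The class SnapshotUnitOfWork, its methods and the Article entity below are hypothetical names invented for this illustration and are not part of Doctrine's API. Instead of issuing SQL, commit() only returns the list of planned operations, and the sketch assumes PHP 5.4 or later for closure binding.

```php
<?php
// Hypothetical, simplified Unit of Work: illustrative names, not Doctrine's API.
class SnapshotUnitOfWork
{
    private $new = array();       // oid => entity scheduled for INSERT
    private $managed = array();   // oid => entity known to be in the data store
    private $snapshots = array(); // oid => property values of the clean copy
    private $removed = array();   // oid => entity scheduled for DELETE

    // Reads all properties (private ones included) by binding a closure to the entity.
    private function extractState($entity)
    {
        $reader = Closure::bind(function () {
            return get_object_vars($this);
        }, $entity, get_class($entity));
        return $reader();
    }

    // Registers an entity as clean, as if it had just been loaded from the store.
    public function registerClean($entity)
    {
        $oid = spl_object_hash($entity);
        $this->managed[$oid] = $entity;
        $this->snapshots[$oid] = $this->extractState($entity);
    }

    public function persist($entity)
    {
        $this->new[spl_object_hash($entity)] = $entity;
    }

    public function remove($entity)
    {
        $oid = spl_object_hash($entity);
        if (isset($this->new[$oid])) {
            unset($this->new[$oid]); // never stored, nothing to delete
            return;
        }
        unset($this->managed[$oid], $this->snapshots[$oid]);
        $this->removed[$oid] = $entity;
    }

    // Builds the batch of operations; a real implementation would execute them in a transaction.
    public function commit()
    {
        $plan = array();
        foreach ($this->new as $entity) {
            $plan[] = array('INSERT', $entity, $this->extractState($entity));
        }
        foreach ($this->managed as $oid => $entity) {
            $changes = array();
            $snapshot = $this->snapshots[$oid];
            foreach ($this->extractState($entity) as $field => $value) {
                $old = array_key_exists($field, $snapshot) ? $snapshot[$field] : null;
                if ($value !== $old) {
                    $changes[$field] = array($old, $value); // (old value, new value)
                }
            }
            if ($changes) {
                $plan[] = array('UPDATE', $entity, $changes);
            }
        }
        foreach ($this->removed as $entity) {
            $plan[] = array('DELETE', $entity, array());
        }
        // Everything that survived the commit is now clean: take fresh snapshots.
        foreach ($this->new + $this->managed as $entity) {
            $this->registerClean($entity);
        }
        $this->new = array();
        $this->removed = array();
        return $plan;
    }
}

// Plain domain object with no knowledge of the Unit of Work.
class Article
{
    private $title;
    public function __construct($title) { $this->title = $title; }
    public function rename($title) { $this->title = $title; }
}

$uow = new SnapshotUnitOfWork();
$loaded = new Article('Old title');
$uow->registerClean($loaded);        // pretend it was fetched from the database
$loaded->rename('New title');        // becomes dirty without telling anyone
$uow->persist(new Article('Draft')); // new object
$plan = $uow->commit();              // one INSERT and one UPDATE are planned
```

Note that the domain class does not implement any tracking interface: the diff is computed entirely from the outside, which is the property the article values in this strategy, at the cost of keeping a copy of every managed object's state in memory.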
May 19, 2010
by Giorgio Sironi
· 13,687 Views
Why Raven DB?
One question that I got a few times regarding Raven is: why? Richard Lopes puts the question nicely:

"However as a pragmatic developer, I am wondering what new this project is offering in a saturated market where you have quite mature alternatives like CouchDB, MongoDB, Tokyo, Redis, and many more ? Many of these products are also cross platform and run at C speed with a proven record, being used in very big web sites where their sharding capabilities and fault tolerance have been pushed far."

The answer is composed of several parts, and covers quite a bit of history.

Why Raven DB from Ayende’s point of view?

Almost two years ago, I decided that it was time to give my Erlang reading abilities a big push, and I sat down and read the Couch DB source code. That was quite interesting, and was one of the reasons that I got interested in that NoSQL Thing. Unfortunately, I am one of those people who have a really hard time learning by osmosis; I have to do something to truly understand it. I have used (and built) a distributed key value store in several projects, but I felt that I didn’t really have a good understanding of what it means to use a document database.

I really hate having ideas stuck in my head. They tend to ping, and then someone tells me that I have been staring at a blank wall for two hours, and I realize that I just finished designing a document database. About a year ago, it finally got bad enough that I sat down and wrote an implementation, just to get it off my chest. That was Rhino DivanDB. In most ways, it was a proof of concept more than anything else. Just enough so I could tell myself: yes, I can do it.

Then I ran into situations where a document database would be an ideal fit, except… that the available choices wouldn’t quite do what I wanted. They are all open source, however, so no problem there, right? Except that none of them are really approachable from the .NET ecosystem. Yes, I can do both C++ and Erlang, but I don’t really like it. Moreover, it seems like .NET support is almost an afterthought (if it is there at all) for those projects.

There are some people who call me arrogant, but I really do think that I can do better. And I think that we did. Raven is a project where I tried a lot of new things, not from a coding perspective, but from community & launch perspectives. It will be out soon, and I think you’ll be able to appreciate the level of focus on the non-coding aspects of the project.

Why Raven DB from your point of view?

Raven is an OSS (with a commercial option) document database for the .NET/Windows platform. While there are other document databases around, such as CouchDB or MongoDB, there really isn’t anything that a .NET developer can pick up and use without a significant amount of friction. Those projects are excellent in what they do, but they aren’t targeting the .NET ecosystem. Raven does, and in so doing, it brings a lot of benefits to the table.

When building Raven and the supporting infrastructure, the focus was always on making sure that it did the Right Thing from the .NET developer's point of view. Below you can see a more detailed analysis of Raven’s benefits, but it comes down to that. Raven is built by .NET developers for .NET developers. Corny, isn’t it? But true nonetheless.

What does Raven DB have to offer?

Raven…
  • builds on existing infrastructure that is known to scale to amazing sizes (Raven’s storage can handle up to 16 terabytes on a single machine).
  • runs, natively and with no effort, on Windows. In comparison, to get CouchDB to run on Windows you start by compiling Erlang from source.
  • is not just a server. You can easily (trivially) embed Raven inside your application.
  • is transactional. That means ACID: if you put data in it, that data is going to stay there.
  • supports System.Transactions and can take part in distributed transactions.
  • allows you to define indexes using Linq queries.
  • supports map/reduce operations on top of your documents using Linq.
  • comes with a fully functional .NET client API, which implements Unit of Work, change tracking, read and write optimizations, and a bunch more.
  • has an amazing web interface allowing you to see, manipulate and query your documents.
  • is REST based, so you can access it via the JavaScript API directly.
  • can be extended by writing MEF plugins.
  • has trigger support that allows you to do some really nifty things, like document merges, auditing, versioning and authorization.
  • supports partial document updates, so you don’t have to send full documents over the wire.
  • supports sharding out of the box.
  • is available in both OSS and commercial modes.

There are probably other things, but I need to head out to a client now, so I’ll stop. I would love to hear your opinions about it, both positive and negative.
May 18, 2010
by Oren Eini
· 11,349 Views · 1 Like
Practical PHP Patterns: Data Mapper
Data Mapper is one of the most advanced persistence-related patterns: an implementation of a Data Mapper stores objects (in general a whole object graph) in a database, and decouples the object model from the backend data representation, moving objects back and forth from the data store without introducing hard-coded dependencies on it. The database back end used by most implementations is relational: object-relational mappers are among the most widely used tools today (although they are fading in some areas where different types of database are preferred).

Dependencies

The interfaces for a Data Mapper can be put in the domain layer, but actual implementations are in the category of infrastructure adapters and should be kept out of it, to promote the reuse and testing of domain layer classes without the need for a database back end or driver to be present. When this pattern is employed in an application, there are no more dependencies from the domain layer to external components, and no subclassing like in the Active Record case. Domain entities and value objects become Plain Old PHP Objects which do not extend anything (extends keyword) and do not need to reflect any database schema (if they are saved in a relational db), ensuring the maximum freedom of modelling to the developers.

Different kinds of implementations

Early implementations of Data Mapper did not store an inner reference to a database connection or to an object that represents the link with the data store; in this case, result sets or some kind of raw data are passed to the Data Mapper, which reconstitutes the objects and encapsulates the process. Currently it is preferred to put all the references to the database as internals of the Data Mapper implementation (or in an abstraction layer under it). In any case the Data Mapper hides as much as possible, like the type of the database and related knowledge, from the client code (domain layer or an upper one).

The interface of modern Data Mappers has grown from store() (insert and update) and remove() to one that also includes find() methods or a more complex system of querying; the implementation of querying is out of the scope of this pattern, but can be combined with it easily.

A distinction in the implementations of Data Mapper is in their scope. A Data Mapper can be specific to a particular Entity/Aggregate Root (single class or class with composed objects), or a generic implementation can be customized with metadata (annotations, XML configuration) to work with different classes. Generic implementations are usually very complex, and specific ones may become much easier to code due to simplifications. However, generic Data Mappers lend themselves to reuse and present fewer bugs than project-specific ones, which were the only alternative until recent years.

Issues

The difficulties in implementing such a pattern are clear. Given a transaction, like an HTTP request, the mapper has to keep track of the changed objects, and automatically generate the right DML queries to issue (INSERT, UPDATE, DELETE), in the right order and without leaving out any part of the modified data, without duplicating rows or updating ones that do not exist anymore. This is a case of simple interface and complex implementation.

To avoid breaking encapsulation, implementations usually employ reflection to access private fields of the object to store or that the mapper is reconstituting. Other possible choices for the data access are specific constructors for reconstitution or specific interfaces for domain mapping, but this solution still breaks encapsulation by providing to the client code methods that are not meant to be called, or fields that should not even be seen out of the objects but are actually accessed. This results in an unclear API which may promote dependencies on persistence-related items.

Providing metadata breaks encapsulation too, of course, but at least it is kept close to the domain classes so that it can change with them. Annotations are the preferred means to specify metadata such as column names or relationships, and in PHP they are hidden in the docblock comments so that when the Data Mapper is not used they are just ignored.

Data Mapper does not provide a total illusion (abstraction) of an in-memory collection of objects: the knowledge that there is some kind of external data store leaks into the upper layers of the application. Moreover, eventually some particular issue of the storage leaks into the object part of the application. As an example, consider the performance of queries, which is often the object of discussion when using object-relational mappers. Usually not all the object graph is instantiated as it may be very large; tuning how large the instantiated part will be is a trade-off which depends on the underlying database. Furthermore, generated queries may turn out to be so inefficient that much of the client code must hint the joins to perform via the API.

Examples

The generic Data Mapper Doctrine 2 (now in beta) is one of the few implementations of this pattern in PHP. As we've seen before in this article, specific implementations are dependent on the domain layer, so they are usually not reusable.

A working copy of Doctrine 2 would be too large for inclusion in this post, so we are only analyzing the interface that most of the client code would see: the Entity Manager (a name borrowed from Hibernate and JPA, since Java applications used Data Mappers for years before this pattern was adopted by PHP ones). The Entity Manager is not a domain-specific interface, but other patterns like the Repository one can then compose the mapper to provide segregated interfaces for a particular class (aggregate root).

As always, I have removed the less interesting methods or code to show the API, and expanded the comments.
/**
 * @author Guilherme Blanco
 * @author Jonathan Wage
 * @author Roman Borschel
 */
class EntityManager
{
    /**
     * Flushes all changes to objects that have been queued up to now to the database.
     * This effectively synchronizes the in-memory state of managed objects with the
     * database.
     * No query is executed before this method is called from client code.
     *
     * @throws Doctrine\ORM\OptimisticLockException If a version check on an entity that
     *         makes use of optimistic locking fails.
     */
    public function flush()
    {
        $this->_errorIfClosed();
        $this->_unitOfWork->commit();
    }

    /**
     * Finds an Entity by its identifier.
     * This method is often combined with query-oriented ones.
     *
     * @param string $entityName the class name
     * @param mixed $identifier usually primary key
     * @param int $lockMode
     * @param int $lockVersion
     * @return object
     */
    public function find($entityName, $identifier, $lockMode = LockMode::NONE, $lockVersion = null)
    {
        return $this->getRepository($entityName)->find($identifier, $lockMode, $lockVersion);
    }

    /**
     * Tells the EntityManager to make an instance managed and persistent.
     *
     * The entity will be entered into the database at or before transaction
     * commit or as a result of the flush operation.
     *
     * NOTE: The persist operation always considers entities that are not yet known to
     * this EntityManager as NEW. Do not pass detached entities to the persist operation.
     *
     * @param object $object The instance to make managed and persistent.
     */
    public function persist($entity)
    {
        if ( ! is_object($entity)) {
            throw new \InvalidArgumentException(gettype($entity));
        }
        $this->_errorIfClosed();
        $this->_unitOfWork->persist($entity);
    }

    /**
     * Removes an entity instance.
     *
     * A removed entity will be removed from the database at or before transaction commit
     * or as a result of the flush operation.
     *
     * @param object $entity The entity instance to remove.
     */
    public function remove($entity)
    {
        if ( ! is_object($entity)) {
            throw new \InvalidArgumentException(gettype($entity));
        }
        $this->_errorIfClosed();
        $this->_unitOfWork->remove($entity);
    }

    /**
     * Refreshes the persistent state of an entity from the database,
     * overriding any local changes that have not yet been persisted.
     *
     * @param object $entity The entity to refresh.
     */
    public function refresh($entity)
    {
        if ( ! is_object($entity)) {
            throw new \InvalidArgumentException(gettype($entity));
        }
        $this->_errorIfClosed();
        $this->_unitOfWork->refresh($entity);
    }

    /**
     * Determines whether an entity instance is managed in this EntityManager.
     *
     * @param object $entity
     * @return boolean TRUE if this EntityManager currently manages the given entity, FALSE otherwise.
     */
    public function contains($entity)
    {
        return $this->_unitOfWork->isScheduledForInsert($entity) ||
               $this->_unitOfWork->isInIdentityMap($entity) &&
               ! $this->_unitOfWork->isScheduledForDelete($entity);
    }

    /**
     * Factory method to create EntityManager instances.
     *
     * @param mixed $conn An array with the connection parameters or an existing
     *        Connection instance.
     * @param Configuration $config The Configuration instance to use.
     * @param EventManager $eventManager The EventManager instance to use.
     * @return EntityManager The created EntityManager.
     */
    public static function create($conn, Configuration $config, EventManager $eventManager = null);
}
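
For contrast with the generic Entity Manager, the entity-specific style mentioned earlier (raw rows handed to a mapper that reconstitutes the object, using reflection to reach private fields) can be sketched in a few lines. Customer, CustomerMapper and the column names are hypothetical and chosen only for this illustration. The version check is there because ReflectionProperty::setAccessible() is only needed before PHP 8.1.

```php
<?php
// Plain domain object: no base class and no knowledge of the database.
class Customer
{
    private $id;
    private $name;

    public function __construct($name) { $this->name = $name; }
    public function getId() { return $this->id; }
    public function getName() { return $this->name; }
}

// Hypothetical entity-specific Data Mapper; it works on raw rows, not on a connection.
class CustomerMapper
{
    // field name => column name (assumed schema, for illustration only)
    private $columns = array('id' => 'id', 'name' => 'customer_name');

    private function property(ReflectionClass $class, $field)
    {
        $property = $class->getProperty($field);
        if (PHP_VERSION_ID < 80100) {
            $property->setAccessible(true); // required before PHP 8.1 to reach private fields
        }
        return $property;
    }

    // Reconstitutes a Customer from a row without calling its constructor.
    public function toObject(array $row)
    {
        $class = new ReflectionClass('Customer');
        $customer = $class->newInstanceWithoutConstructor();
        foreach ($this->columns as $field => $column) {
            $this->property($class, $field)->setValue($customer, $row[$column]);
        }
        return $customer;
    }

    // Extracts the row to be written by an INSERT or UPDATE statement.
    public function toRow(Customer $customer)
    {
        $class = new ReflectionClass('Customer');
        $row = array();
        foreach ($this->columns as $field => $column) {
            $row[$column] = $this->property($class, $field)->getValue($customer);
        }
        return $row;
    }
}

$mapper = new CustomerMapper();
$customer = $mapper->toObject(array('id' => 7, 'customer_name' => 'Ann'));
$row = $mapper->toRow($customer);
```

The domain class exposes no setter for its identifier and no persistence-related constructor, so its public API stays clean; the price is that the mapper has to know the private field names, which is the kind of metadata a generic implementation reads from annotations or configuration.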
May 17, 2010
by Giorgio Sironi
· 9,902 Views
Practical PHP Patterns: Active Record
The Active Record pattern prescribes wrapping a row of a database table in a domain object with a 1:1 relationship, managing its state and adding business logic in the wrapping class code. An Active Record implementation is in fact a classical C structure, aka Record, aka associative array of data, with the addition of utility methods that encapsulate behavior that acts on these data. The most useful method is usually save(), which updates the database so that the row reflects the current state of the record. Thus, the Active Record transparently works with SQL queries and provides a higher-level API.

Although Active Record is similar in implementation to the Row Data Gateway pattern, it is distinguished from it by the fact that it defines methods with domain-specific logic. The consequence of the presence of domain-specific logic is that generic implementations of Active Record provided by libraries must be customized to meet the needs of the object model. Typically this customization is done with thin subclassing, which at least renames the library class with a domain name (like User or Post) and may specify metadata on the database table where the Active Record's state is kept, if they are not inferred.

The issue of subclassing

Subclassing allows the developer to create new methods and properties to represent business logic, and to build a richer and more specific interface than the one constituted by simple Row Data Gateway objects. Despite this advantage, the interface is not well segregated, as subclassing exposes all the public methods of the base library class, over which the developers of the domain model have no control. Subclassing also ties the domain layer to the infrastructure one, whether it is a library, a framework or any other kind of data persistence layer (examples of PHP ORMs that use Active Record are the Zend_Db component, Doctrine 1 and Propel).
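
The thin subclassing described above can be sketched as follows. The ActiveRecord base class is a hypothetical stand-in for a library class (it is not the API of Doctrine 1, Propel or Zend_Db), and a static in-memory array takes the place of the database table so that the example stays self-contained.

```php
<?php
// Hypothetical library base class; a static array stands in for the database tables.
abstract class ActiveRecord
{
    private static $tables = array(); // table name => (id => row)

    protected $id = null;
    protected $data = array();        // column name => value

    // Metadata supplied by the thin subclass.
    abstract protected function tableName();

    public function __get($name)
    {
        return isset($this->data[$name]) ? $this->data[$name] : null;
    }

    public function __set($name, $value)
    {
        $this->data[$name] = $value;
    }

    // Writes the current state of the record into its row (insert or update).
    public function save()
    {
        $table = $this->tableName();
        if ($this->id === null) {
            $this->id = isset(self::$tables[$table]) ? count(self::$tables[$table]) + 1 : 1;
        }
        self::$tables[$table][$this->id] = $this->data;
        return $this->id;
    }

    // Loads the row with the given id, or returns null if it does not exist.
    public static function find($id)
    {
        $record = new static();
        $table = $record->tableName();
        if ( ! isset(self::$tables[$table][$id])) {
            return null;
        }
        $record->id = $id;
        $record->data = self::$tables[$table][$id];
        return $record;
    }
}

// Thin subclass: a domain name, table metadata and domain-specific logic.
class User extends ActiveRecord
{
    protected function tableName()
    {
        return 'users';
    }

    public function isAdult()
    {
        return $this->age >= 18;
    }
}

$user = new User();
$user->name = 'Ann';
$user->age = 30;
$id = $user->save();
$found = User::find($id);
```

Even in this toy version the User class cannot be loaded without ActiveRecord being available, and it publicly exposes save() and find() whether the domain model wants them or not.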
Domain objects cannot be created, nor can their class source code even be loaded, without having the library code available. This is an issue when reusing the model in a different environment, and even in test suites. If the library is powerful enough, it may provide adapters for different databases so that a lightweight database instance can be created in the testing environment.

Another caveat of Active Record is the fundamental assumption that a domain entity is always a row of a table of a relational database; this constraint is forced even when it is not appropriate, and the database and object model must match. In fact, parts of the database schema (like foreign keys) often leak into the domain model, as an Active Record with an external one-to-one relationship will usually store not only the related object but also its foreign key. Another example of the relational model being mirrored in the object graph is the management of M-to-N association tables, which are often forced to become real entities even when they do not make sense (the famous UserGroup classes that tie together User and Group rows).

Diffusion

Thus, the Active Record pattern puts at risk the freedom of implementing a powerful Domain Model, where the object graph is a mix of state-carrying and behavior-carrying objects, like Strategies and Specifications. It is, however, a radical simplification in the implementation of domain models where CRUD functions are all the rage, and there is no gain in implementing objects that do not simply map to a relational database.

Note that in the case of PHP, most of the custom web applications developed in this language are deeply influenced by the back end, assumed to be a relational database or even MySQL specifically.
But as the technologies for user-to-application and application-to-application interaction on the web continue to grow, the situation will continue to evolve, and if PHP wants to keep up with the pace of other dynamic languages, Java and .NET, it needs to finally decouple from the relational database as the only model of data. By the way, current implementations of persistence frameworks are transitioning towards a Data Mapper approach (not only in PHP but also in Ruby, while Java did so years ago with Hibernate and JPA), which is less invasive on the Domain Model source code, and does not introduce a hard dependency from the domain layer to infrastructure components.

Examples

This sample implementation of the Active Record pattern is taken from the Doctrine 1.2 ORM. The base class in this framework is Doctrine_Record (together with Doctrine_Record_Abstract), while a base class with the schema metadata is regenerated from a model, and it is subclassed by the developer so that customization and synchronization remain orthogonal. The base Active Record class looks like this:

/**
 * Implements also __get() and __set(), not shown along with dozens of other methods.
 */
abstract class Doctrine_Record extends Doctrine_Record_Abstract implements Countable, IteratorAggregate, Serializable
{
    /**
     * Empty template method to provide concrete Record classes with the possibility
     * to hook into the saving procedure.
     */
    public function preSave($event)
    {
    }

    /**
     * Empty template method to provide concrete Record classes with the possibility
     * to hook into the saving procedure.
     */
    public function postSave($event)
    {
    }

    /**
     * applies the changes made to this object into database
     * this method is smart enough to know if any changes are made
     * and whether to use INSERT or UPDATE statement
     *
     * this method also saves the related components
     *
     * @param Doctrine_Connection $conn optional connection parameter
     * @return void
     */
    public function save(Doctrine_Connection $conn = null)
    {
        if ($conn === null) {
            $conn = $this->_table->getConnection();
        }
        $conn->unitOfWork->saveGraph($this);
    }

    /**
     * returns a string representation of this object
     */
    public function __toString()
    {
        return (string) $this->_oid;
    }
}

If we have an Article entity, it will be represented by subclassing the generic Active Record. In Doctrine 1, a subclass can be generated by writing a compact YAML model, or even reverse engineered from an existing database:

/**
 * BaseOtk_Content_Article
 *
 * This class has been auto-generated by the Doctrine ORM Framework
 *
 * @property integer $id
 * @property integer $section_id
 * @property integer $author_id
 * @property integer $image_id
 * @property string $title
 * @property string $description
 * @property string $text
 * @property integer $visits
 * @property boolean $draft
 * @property boolean $closed
 * @property Otk_Content_Section $section
 * @property Otk_User $author
 * @property Otk_File $image
 * @property Doctrine_Collection $sections
 * @property Doctrine_Collection $Otk_Content_Tag
 * @property Doctrine_Collection $comments
 *
 */
abstract class BaseOtk_Content_Article extends Otk_Model_Record
{
    public function setTableDefinition()
    {
        $this->setTableName('oss_content_articles');
        $this->hasColumn('id', 'integer', 3, array('type' => 'integer', 'primary' => true, 'autoincrement' => true, 'length' => '3'));
        $this->hasColumn('section_id', 'integer', 2, array('type' => 'integer', 'notnull' => true, 'length' => '2'));
        $this->hasColumn('author_id', 'integer', 3, array('type' => 'integer', 'length' => '3'));
        $this->hasColumn('image_id', 'integer', 3, array('type' => 'integer', 'length' => '3'));
        $this->hasColumn('title', 'string', 255, array('type' => 'string', 'length' => '255'));
        $this->hasColumn('description', 'string', 1000, array('type' => 'string', 'length' => '1000'));
        $this->hasColumn('text', 'string', null, array('type' => 'string'));
        $this->hasColumn('visits', 'integer', 3, array('type' => 'integer', 'length' => '3'));
        $this->hasColumn('draft', 'boolean', null, array('type' => 'boolean', 'notnull' => true, 'default' => false));
        $this->hasColumn('closed', 'boolean', null, array('type' => 'boolean'));
    }

    public function setUp()
    {
        $this->hasOne('Otk_Content_Section as section', array('local' => 'section_id', 'foreign' => 'id'));
        $this->hasOne('Otk_User as author', array('local' => 'author_id', 'foreign' => 'id'));
        $this->hasOne('Otk_File as image', array('local' => 'image_id', 'foreign' => 'id'));
        $this->hasMany('Otk_Content_Section as sections', array('refClass' => 'Otk_Content_Tag', 'local' => 'article_id', 'foreign' => 'section_id'));
        $this->hasMany('Otk_Content_Tag', array('local' => 'id', 'foreign' => 'article_id'));
        $this->hasMany('Otk_Content_Comment as comments', array('local' => 'id', 'foreign' => 'article_id'));
        $timestampable0 = new Doctrine_Template_Timestampable();
        $sluggable0 = new Doctrine_Template_Sluggable(array('fields' => array(0 => 'title'), 'canUpdate' => true, 'unique' => true));
        $this->actAs($timestampable0);
        $this->actAs($sluggable0);
    }
}

To support further regenerations of the subclass as the schema evolves, another subclassing step is necessary. This class will never be touched by the regeneration process, and it is the one referred to in client code.

/**
 * This class defines the domain logic via addition of methods.
 */
class Otk_Content_Article extends BaseOtk_Content_Article
{
    public function getTags()
    {
        $tags = array();
        foreach ($this->sections as $section) {
            $tags[$section->slug] = $section->name;
        }
        return $tags;
    }
}
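For readers without Doctrine 1 at hand, the mechanics described above (a record wrapping one row, a save() that picks INSERT or UPDATE, and a thin subclass that adds domain logic) can be sketched in a few lines. This is only an illustration under stated assumptions, not Doctrine's implementation: SketchActiveRecord, SketchArticle and the articles table are hypothetical names, the pdo_sqlite extension is assumed to be available, and Doctrine itself delegates the work to a Unit of Work instead of issuing SQL directly as done here.

```php
<?php
// Hypothetical, minimal Active Record base class (NOT Doctrine's code).
abstract class SketchActiveRecord
{
    /** @var PDO connection shared by all records (an assumption of this sketch) */
    protected static $connection;

    /** @var array column => value pairs mirroring a single table row */
    protected $data = array();

    public static function setConnection(PDO $connection)
    {
        self::$connection = $connection;
    }

    /** Each subclass names the table it wraps. */
    abstract protected function tableName();

    public function __get($column)
    {
        return isset($this->data[$column]) ? $this->data[$column] : null;
    }

    public function __set($column, $value)
    {
        $this->data[$column] = $value;
    }

    /**
     * Issues an UPDATE when the record already has an id, an INSERT otherwise.
     * Column names are trusted here; only values are bound as parameters.
     */
    public function save()
    {
        if (isset($this->data['id'])) {
            $assignments = array();
            foreach (array_keys($this->data) as $column) {
                if ($column !== 'id') {
                    $assignments[] = "$column = :$column";
                }
            }
            $sql = 'UPDATE ' . $this->tableName()
                 . ' SET ' . implode(', ', $assignments) . ' WHERE id = :id';
            self::$connection->prepare($sql)->execute($this->data);
        } else {
            $columns = array_keys($this->data);
            $sql = 'INSERT INTO ' . $this->tableName()
                 . ' (' . implode(', ', $columns) . ')'
                 . ' VALUES (:' . implode(', :', $columns) . ')';
            self::$connection->prepare($sql)->execute($this->data);
            $this->data['id'] = (int) self::$connection->lastInsertId();
        }
    }
}

// Thin subclass: gives the record a domain name and adds business logic.
class SketchArticle extends SketchActiveRecord
{
    protected function tableName()
    {
        return 'articles';
    }

    public function publish()
    {
        $this->data['draft'] = 0;
    }
}

// Usage with a throwaway in-memory SQLite database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE articles (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT, draft INTEGER)');
SketchActiveRecord::setConnection($pdo);

$article = new SketchArticle();
$article->title = 'Active Record in a nutshell';
$article->draft = 1;
$article->save();     // no id yet: INSERT
$article->publish();
$article->save();     // id present: UPDATE
```

Note how the sketch already shows the coupling discussed in this article: SketchArticle cannot even be instantiated meaningfully without the base class and a live connection.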
May 12, 2010
by Giorgio Sironi
· 8,825 Views
Practical PHP Patterns: Table Data Gateway
The Table Data Gateway pattern is the object-oriented equivalent of a relational table. In fact, this pattern's intent is to encapsulate the full interaction with a database table, holding all the logic specific to this particular implementation of the back end. In the majority of cases, a Table Data Gateway deals with a relational model, having a 1:1 relationship with the main tables of the database. Minor tables may not need a specific class, or can be managed via the Table Data Gateways of the tables linked to them by foreign keys (for example, entities introduced to store M:N relationships are usually not first-class citizens).

In the relational implementation, the Table Data Gateway handles all SQL queries, presenting a domain-specific interface when a class is coded for a specific table, or a generic one when a generic implementation is reused across different applications. The difference between the two APIs may be something like findBy($field, $value) (generic) versus findByPrice($price) (domain-specific). Note that in PHP magic methods are often used to implement domain-specific interfaces without code generation: a __call() implementation can catch the various findBy*() methods and throw exceptions if the method is not applicable.

Related patterns

Although the concept of a table is already tied to the relational model (and it does not hold when the back end is an object-oriented database or one of the key-value stores so trendy today), this pattern is named Gateway because it is a specialization of the Gateway category of patterns, which decouple an object graph (or any in-memory structure) from external infrastructure like databases, web services, filesystems and so on. In fact, there is an alternate name for this pattern: Data Access Object (or DAO for friends).
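Going back to the __call() technique mentioned above, a minimal sketch of it could look like the following. The names SketchTableGateway and products are hypothetical, the pdo_sqlite extension is assumed to be available, and the mapping from method name to column only handles single-word columns; real libraries are considerably more thorough.

```php
<?php
// Hypothetical generic Table Data Gateway; names are made up for this sketch.
class SketchTableGateway
{
    private $connection;
    private $table;
    private $columns;

    public function __construct(PDO $connection, $table, array $columns)
    {
        $this->connection = $connection;
        $this->table = $table;
        $this->columns = $columns;
    }

    /** Generic finder, e.g. findBy('price', 2). */
    public function findBy($column, $value)
    {
        // The column name is checked against a whitelist before reaching the SQL.
        if (!in_array($column, $this->columns, true)) {
            throw new InvalidArgumentException("Unknown column: $column");
        }
        $statement = $this->connection->prepare(
            "SELECT * FROM {$this->table} WHERE $column = ?"
        );
        $statement->execute(array($value));
        return $statement->fetchAll(PDO::FETCH_ASSOC);
    }

    /**
     * Catches calls such as findByPrice(2) and forwards them to
     * findBy('price', 2); anything else is rejected with an exception.
     */
    public function __call($method, array $arguments)
    {
        if (strpos($method, 'findBy') !== 0 || count($arguments) !== 1) {
            throw new BadMethodCallException("Method $method() is not supported");
        }
        $column = strtolower(substr($method, strlen('findBy')));
        return $this->findBy($column, $arguments[0]);
    }
}

// Usage with a throwaway in-memory SQLite database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price INTEGER)');
$pdo->exec("INSERT INTO products (name, price) VALUES ('pen', 2)");
$pdo->exec("INSERT INTO products (name, price) VALUES ('book', 10)");

$products = new SketchTableGateway($pdo, 'products', array('id', 'name', 'price'));
$cheap = $products->findByPrice(2);   // domain-flavoured call, no code generation
```

The trade-off of this design is that the domain-specific methods exist only at runtime: they are invisible to IDEs and static analysis unless documented separately.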
Although if I were pedantic I would highlight the differences between the implementations of DAOs and Table Data Gateways, their intent is really the same, and the differences among individual implementations of one pattern are greater than the ones between the two patterns. There's no clear demarcation line between the two.

Another related pattern is Table Module. Table Data Gateway does not work against a Table Module but with it, providing a separation of concerns: the first object takes the rows out of the database, while the second performs in-memory operations on them (generally by composing the Table Data Gateway or its results). The in-memory operations of a Table Module are easier to test, while the SQL-based operations of the Table Data Gateway are pushed to the database side: there is a trade-off over how much logic should be kept in each class.

When used in isolation, the Table Data Gateway is also a Factory for Row Data Gateways or Active Records, both again implemented with generic or domain-specific interfaces. Many frameworks and first-generation PHP ORMs based on Active Record also rely on Table Data Gateway to provide collection-level access to the objects stored as rows. In the context of Active Record, the only alternative to a Table Data Gateway for handling operations like find() is to place static methods on the Active Record class, with all the testability and dishonest API issues that ensue. Both Zend Framework and Doctrine 1.x represent tables as first-class objects.

Examples

Zend Framework's Zend_Db component, which is explored in the sample code, always provides a generic implementation in Zend_Db_Table, with the possibility of optional subclassing (to add domain-specific methods). It is not recommended to expose the API of a Table Data Gateway in front-end code, but it's a simple solution when the business logic does not warrant a full-featured Domain Model.
Even when working with a Domain Model, and before the introduction of generic Data Mappers for PHP, the Table Data Gateway can be used by composition (wrapped) to craft a simple API for a domain-specific Data Mapper, resulting in decoupling from the database.

As I've written earlier, the sample code is taken from the Zend_Db_Table class of Zend Framework (actually from its parent abstract class, Zend_Db_Table_Abstract). I've enriched the docblock comments and left out all the methods that are not part of the main API (most of the getters and setters for configuration, and the protected/private members).

abstract class Zend_Db_Table_Abstract
{
    public function setOptions(Array $options)
    {
        foreach ($options as $key => $value) {
            switch ($key) {
                case self::ADAPTER:
                    $this->_setAdapter($value);
                    break;
                case self::DEFINITION:
                    $this->setDefinition($value);
                    break;
                case self::DEFINITION_CONFIG_NAME:
                    $this->setDefinitionConfigName($value);
                    break;
                case self::SCHEMA:
                    $this->_schema = (string) $value;
                    break;
                case self::NAME:
                    $this->_name = (string) $value;
                    break;
                case self::PRIMARY:
                    $this->_primary = (array) $value;
                    break;
                case self::ROW_CLASS:
                    $this->setRowClass($value);
                    break;
                case self::ROWSET_CLASS:
                    $this->setRowsetClass($value);
                    break;
                case self::REFERENCE_MAP:
                    $this->setReferences($value);
                    break;
                case self::DEPENDENT_TABLES:
                    $this->setDependentTables($value);
                    break;
                case self::METADATA_CACHE:
                    $this->_setMetadataCache($value);
                    break;
                case self::METADATA_CACHE_IN_CLASS:
                    $this->setMetadataCacheInClass($value);
                    break;
                case self::SEQUENCE:
                    $this->_setSequence($value);
                    break;
                default:
                    // ignore unrecognized configuration directive
                    break;
            }
        }
        return $this;
    }

    /**
     * Inserts a new row.
     * The data structure is as generic as possible. The list of columns is
     * known by configuration.
     * $this->_db is a light abstraction over PDO, which already encapsulates
     * most of the SQL. Database abstraction is not a trivial task and segregating
     * the functionalities in different classes is very helpful.
     *
     * @param array $data Column-value pairs.
     * @return mixed The primary key of the row inserted.
     */
    public function insert(array $data)
    {
        $this->_setupPrimaryKey();

        /**
         * Zend_Db_Table assumes that if you have a compound primary key
         * and one of the columns in the key uses a sequence,
         * it's the _first_ column in the compound key.
         */
        $primary = (array) $this->_primary;
        $pkIdentity = $primary[(int)$this->_identity];

        /**
         * If this table uses a database sequence object and the data does not
         * specify a value, then get the next ID from the sequence and add it
         * to the row. We assume that only the first column in a compound
         * primary key takes a value from a sequence.
         */
        if (is_string($this->_sequence) && !isset($data[$pkIdentity])) {
            $data[$pkIdentity] = $this->_db->nextSequenceId($this->_sequence);
        }

        /**
         * If the primary key can be generated automatically, and no value was
         * specified in the user-supplied data, then omit it from the tuple.
         */
        if (array_key_exists($pkIdentity, $data) && $data[$pkIdentity] === null) {
            unset($data[$pkIdentity]);
        }

        /**
         * INSERT the new row.
         */
        $tableSpec = ($this->_schema ? $this->_schema . '.' : '') . $this->_name;
        $this->_db->insert($tableSpec, $data);

        /**
         * Fetch the most recent ID generated by an auto-increment
         * or IDENTITY column, unless the user has specified a value,
         * overriding the auto-increment mechanism.
         */
        if ($this->_sequence === true && !isset($data[$pkIdentity])) {
            $data[$pkIdentity] = $this->_db->lastInsertId();
        }

        /**
         * Return the primary key value if the PK is a single column,
         * else return an associative array of the PK column/value pairs.
         */
        $pkData = array_intersect_key($data, array_flip($primary));
        if (count($primary) == 1) {
            reset($pkData);
            return current($pkData);
        }

        return $pkData;
    }

    /**
     * Updates existing rows.
     * Again we see generic data structures, not tied to PDO
     * or to particular adapters.
     *
     * @param array $data Column-value pairs.
     * @param array|string $where An SQL WHERE clause, or an array of SQL WHERE clauses.
     * @return int The number of rows updated.
     */
    public function update(array $data, $where)
    {
        $tableSpec = ($this->_schema ? $this->_schema . '.' : '') . $this->_name;
        return $this->_db->update($tableSpec, $data, $where);
    }

    /**
     * Deletes existing rows.
     *
     * @param array|string $where SQL WHERE clause(s).
     * @return int The number of rows deleted.
     */
    public function delete($where)
    {
        $tableSpec = ($this->_schema ? $this->_schema . '.' : '') . $this->_name;
        return $this->_db->delete($tableSpec, $where);
    }

    /**
     * Fetches rows by primary key. The argument specifies one or more primary
     * key value(s). To find multiple rows by primary key, the argument must
     * be an array.
     *
     * This method accepts a variable number of arguments. If the table has a
     * multi-column primary key, the number of arguments must be the same as
     * the number of columns in the primary key. To find multiple rows in a
     * table with a multi-column primary key, each argument must be an array
     * with the same number of elements.
     *
     * The find() method always returns a Rowset object, even if only one row
     * was found.
     *
     * @param mixed $key The value(s) of the primary keys.
     * @return Zend_Db_Table_Rowset_Abstract Row(s) matching the criteria.
     * @throws Zend_Db_Table_Exception
     */
    public function find()
    {
        $this->_setupPrimaryKey();
        $args = func_get_args();
        $keyNames = array_values((array) $this->_primary);

        if (count($args) < count($keyNames)) {
            require_once 'Zend/Db/Table/Exception.php';
            throw new Zend_Db_Table_Exception("Too few columns for the primary key");
        }

        if (count($args) > count($keyNames)) {
            require_once 'Zend/Db/Table/Exception.php';
            throw new Zend_Db_Table_Exception("Too many columns for the primary key");
        }

        $whereList = array();
        $numberTerms = 0;
        foreach ($args as $keyPosition => $keyValues) {
            $keyValuesCount = count($keyValues);
            // Coerce the values to an array.
            // Don't simply typecast to array, because the values
            // might be Zend_Db_Expr objects.
            if (!is_array($keyValues)) {
                $keyValues = array($keyValues);
            }
            if ($numberTerms == 0) {
                $numberTerms = $keyValuesCount;
            } else if ($keyValuesCount != $numberTerms) {
                require_once 'Zend/Db/Table/Exception.php';
                throw new Zend_Db_Table_Exception("Missing value(s) for the primary key");
            }
            $keyValues = array_values($keyValues);
            for ($i = 0; $i < $keyValuesCount; ++$i) {
                if (!isset($whereList[$i])) {
                    $whereList[$i] = array();
                }
                $whereList[$i][$keyPosition] = $keyValues[$i];
            }
        }

        $whereClause = null;
        if (count($whereList)) {
            $whereOrTerms = array();
            $tableName = $this->_db->quoteTableAs($this->_name, null, true);
            foreach ($whereList as $keyValueSets) {
                $whereAndTerms = array();
                foreach ($keyValueSets as $keyPosition => $keyValue) {
                    $type = $this->_metadata[$keyNames[$keyPosition]]['DATA_TYPE'];
                    $columnName = $this->_db->quoteIdentifier($keyNames[$keyPosition], true);
                    $whereAndTerms[] = $this->_db->quoteInto(
                        $tableName . '.' . $columnName . ' = ?',
                        $keyValue, $type);
                }
                $whereOrTerms[] = '(' . implode(' AND ', $whereAndTerms) . ')';
            }
            $whereClause = '(' . implode(' OR ', $whereOrTerms) . ')';
        }

        // issue ZF-5775 (empty where clause should return empty rowset)
        if ($whereClause == null) {
            $rowsetClass = $this->getRowsetClass();
            if (!class_exists($rowsetClass)) {
                require_once 'Zend/Loader.php';
                Zend_Loader::loadClass($rowsetClass);
            }
            return new $rowsetClass(array('table' => $this, 'rowClass' => $this->getRowClass(), 'stored' => true));
        }

        return $this->fetchAll($whereClause);
    }

    /**
     * Fetches a new blank row (not from the database).
     * Thanks to the metadata, a new Row Data Gateway can be created. This
     * is a Factory Method. The dynamic nature of PHP makes configuring the
     * subclass for the Row Data Gateway as simple as defining a string.
     *
     * @param array $data OPTIONAL data to populate in the new row.
     * @param string $defaultSource OPTIONAL flag to force default values into new row
     * @return Zend_Db_Table_Row_Abstract
     */
    public function createRow(array $data = array(), $defaultSource = null)
    {
        $cols = $this->_getCols();
        $defaults = array_combine($cols, array_fill(0, count($cols), null));

        // nothing provided at call-time, take the class value
        if ($defaultSource == null) {
            $defaultSource = $this->_defaultSource;
        }

        if (!in_array($defaultSource, array(self::DEFAULT_CLASS, self::DEFAULT_DB, self::DEFAULT_NONE))) {
            $defaultSource = self::DEFAULT_NONE;
        }

        if ($defaultSource == self::DEFAULT_DB) {
            foreach ($this->_metadata as $metadataName => $metadata) {
                if (($metadata['DEFAULT'] != null) &&
                    ($metadata['NULLABLE'] !== true || ($metadata['NULLABLE'] === true && isset($this->_defaultValues[$metadataName]) && $this->_defaultValues[$metadataName] === true)) &&
                    (!(isset($this->_defaultValues[$metadataName]) && $this->_defaultValues[$metadataName] === false))) {
                    $defaults[$metadataName] = $metadata['DEFAULT'];
                }
            }
        } elseif ($defaultSource == self::DEFAULT_CLASS && $this->_defaultValues) {
            foreach ($this->_defaultValues as $defaultName => $defaultValue) {
                if (array_key_exists($defaultName, $defaults)) {
                    $defaults[$defaultName] = $defaultValue;
                }
            }
        }

        $config = array(
            'table' => $this,
            'data' => $defaults,
            'readOnly' => false,
            'stored' => false
        );

        $rowClass = $this->getRowClass();
        if (!class_exists($rowClass)) {
            require_once 'Zend/Loader.php';
            Zend_Loader::loadClass($rowClass);
        }
        $row = new $rowClass($config);
        $row->setFromArray($data);
        return $row;
    }
}
May 5, 2010
by Giorgio Sironi
· 7,936 Views
Practical PHP Patterns: Domain Model
The architectural pattern I'd like to talk about in this article is the very famous Domain Model. An application's Domain Model is simply defined as an object graph created from domain-specific classes; when present, a Domain Model is the core of the application, where all the business logic resides. This object graph is employed by the upper layers of an application, which present it to the user.

The metaphor for this methodology

In software development, the term domain (or business domain) is an umbrella for the area the application is built in and that it will serve. The new domains we encounter as we move to new projects are one of the most interesting aspects of software development, where we are constantly embracing new fields and gaining knowledge. Given a domain such as a particular industry (chemical, electronics) or business (air travel, e-commerce), the point of connection of an application with these activities is its model. A model is an abstract representation of the reality of the domain, which captures its interesting and relevant aspects.

The practice of modelling is not a specific trait of software development (in particular model-driven development); it is a more general scientific process. For example, everyone who works in the field of information technology knows the voltage/current relationships for simple components such as resistors and capacitors (Ohm's law, and the current being proportional to the derivative of the voltage). The specific domain here is electronics, and this model is named the lumped component model, essentially because it lets designers connect isolated one-port (two-terminal) components to build their desired circuit.
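For readers who want them spelled out, the two component laws referred to above can be written as follows. The symbols are the usual textbook ones and are not taken from the original article: v is the voltage across the component, i the current through it, R the resistance and C the capacitance.

```latex
% Resistor (Ohm's law): voltage is proportional to current
v_R(t) = R \, i_R(t)

% Capacitor: current is proportional to the time derivative of the voltage
i_C(t) = C \, \frac{\mathrm{d} v_C(t)}{\mathrm{d} t}
```

Each relation describes a single two-terminal component in isolation, which is exactly what makes the model "lumped".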
This model is a simplification of much more complex models of reality: the Maxwell equations and the propagation of electromagnetic fields. The lumped component model is valid whenever the frequency of the voltage/current signals in the circuit is low, so that the wavelengths of these signals are far greater than the dimensions of the circuit (if that goes over your head, don't worry, it's the field of electrical engineers). When designers consider larger circuits, such as a transmission line, this model ceases to give correct results and more general ones must be employed. The domain is almost the same, but the model serves a different purpose and necessarily has to be different from the one used in small-scale circuits.

This complex example is here only to show that, given a domain, there is no single model for it: there are many possible ones, which may adapt more or less reliably to the goals of an application. Starting from a modelling phase and reaching a deep understanding of the domain are key points of Domain-Driven Design, one of the rising methodologies for developing complex enterprise software.

Software models

While there are standard mathematical models for many domains in the scientific world, software developers usually build a tailored one in every different application, performing an analysis of the domain (or at least they should). The result of the modelling can comprise documents or diagrams, but the most powerful artifact is an executable model. Object-oriented programming is an almost perfect paradigm when it comes to modelling the real world, and lets developers construct a Domain Model in the form of a set of classes. In a correct implementation of a Domain Model, these classes should be behaviorally complete: they must encapsulate their data as much as possible and expose a set of methods, instead of being used as dumb data containers.
The bread and butter of a Domain Model are the classical examples of User, Post, Forum, Group and PrivateMessage classes, which are usually in a one-to-one relationship with database tables. But the Domain Model is not limited to these Entity classes: it also comprises Value Objects (models of domain-specific data types) and various kinds of Services. Every class that encapsulates business logic is welcome, so that this logic is not duplicated in the upper layers, which are the primary clients of the Domain Model.

Dependencies and purity

Another key trait of the classes included in the Domain Model is the absence of external dependencies, like a library that stores the data contained in the objects in a database. The code artifacts in a Domain Model are either interfaces or Plain Old PHP Objects (classes which do not extend any external abstract superclass). Active Record approaches should be avoided, not only because a relational database is an infrastructure detail that does not belong in the Domain Model itself, but also because the very concept of persistence is abstracted away. As far as the clients of the Domain Model are concerned, the state and behavior of the application are represented by an in-memory object graph, whose methods expose functionality and which client code can play with.

There are no dependencies from a Domain Model towards infrastructure classes, because these dependencies must be inverted. The resulting system is an instance of the hexagonal architecture, where the Domain Model defines ports (interfaces) and infrastructure can be chosen to provide adapters for these ports (implementations in the form of classes extraneous to the model). The implementation of non-invasive persistence is the subject of the Data Mapper pattern, which will be treated later in this series, but every kind of service implementation that communicates with the outside of the core object graph (databases, network, filesystem) is only defined as a contract in the Domain Model.
Persistence is almost always handled by a library in other object-oriented languages, and now also in PHP with a non-invasive ORM such as Doctrine 2. Nothing prevents developers from implementing a specific Data Mapper by hand, but it's a very repetitive and error-prone task. While simpler, invasive patterns such as Active Record could originally be used in a Domain Model, nowadays, with Data Mappers available, doing so is considered a hack.

Sample

Returning to the subject of the Domain Model as the core of an application, the widespread opinion is that the more complex the business logic and the data involved, the more the application benefits from a rich Domain Model. Thus, this pattern should not be used in small applications where there is not much more logic than CRUD screens for data containers, which unfortunately have been a target for PHP in the last ten years. I hope PHP keeps evolving to finally break into the enterprise segment, where this pattern is most valuable.

Due to the size and scope of this article, I am forced to keep the sample code short. Forgive me if you think that you can achieve the same functionality with fewer lines of code, but this pattern is about architecture, and the sample should highlight the separation of concerns between classes more than the KISS principle. Another problem with code samples in modelling is that you have to actually know the domain well to follow the discussion. For this reason I chose a webmail system for this example.

class Email
{
    private $_sender;
    private $_recipient;
    private $_subject;
    private $_text;

    /**
     * @return string
     */
    public function getSender()
    {
        return $this->_sender;
    }

    /**
     * Do we need setters and getters? Every field should be
     * analyzed. If we can keep it private and inaccessible,
     * it's usually better.
     */
    public function setSender($sender)
    {
        $this->_sender = $sender;
    }

    /**
     * @return string
     */
    public function getRecipient()
    {
        return $this->_recipient;
    }

    public function setRecipient($recipient)
    {
        $this->_recipient = $recipient;
    }

    /**
     * @return string
     */
    public function getSubject()
    {
        return $this->_subject;
    }

    public function setSubject($subject)
    {
        $this->_subject = $subject;
    }

    /**
     * @return string
     */
    public function getText()
    {
        return $this->_text;
    }

    public function setText($text)
    {
        $this->_text = $text;
    }

    public function __toString()
    {
        return $this->_subject . ' > ' . substr($this->_text, 0, 20) . '...';
    }

    public function reply()
    {
        $reply = new Email();
        $reply->setRecipient($this->_sender);
        $reply->setSender($this->_recipient);
        $reply->setSubject('Re: ' . $this->_subject);
        $reply->setText($this->_sender . " wrote:\n" . $this->_text);
        return $reply;
    }
}

/**
 * Interface for a service. This is part of the Domain Model,
 * implementations will be plugged in depending on the environment.
 */
interface EmailRepository
{
    /**
     * @return array
     * @TypeOf(Email)
     */
    public function getEmailsFor($recipient);
}

// client code
$mail = new Email();
$mail->setSender("[email protected]");
$mail->setRecipient("[email protected]");
$mail->setSubject('Hello');
$mail->setText('This is a test of an Email object, which is part of our Domain Model.');
echo $mail, "\n";
$reply = $mail->reply();
echo $reply, "\n";
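The EmailRepository interface above is a port in the hexagonal sense, and the article leaves its adapters to the environment. As an illustration only, one possible adapter is an in-memory implementation such as might be used in a test suite. InMemoryEmailRepository, its add() method and the subjectsFor() function are hypothetical names introduced here; the Email class and the interface are repeated in trimmed form only so that the sketch runs on its own, and would clash with the full versions if pasted into the same file.

```php
<?php
// Trimmed stand-ins for the article's Email class and EmailRepository port.
class Email
{
    private $_recipient;
    private $_subject;

    public function getRecipient()
    {
        return $this->_recipient;
    }

    public function setRecipient($recipient)
    {
        $this->_recipient = $recipient;
    }

    public function getSubject()
    {
        return $this->_subject;
    }

    public function setSubject($subject)
    {
        $this->_subject = $subject;
    }
}

interface EmailRepository
{
    public function getEmailsFor($recipient);
}

/**
 * Hypothetical adapter living outside the Domain Model: it keeps the
 * emails in a plain array instead of a database or a mail server.
 */
class InMemoryEmailRepository implements EmailRepository
{
    private $_emails = array();

    public function add(Email $email)
    {
        $this->_emails[] = $email;
    }

    public function getEmailsFor($recipient)
    {
        $matching = array();
        foreach ($this->_emails as $email) {
            if ($email->getRecipient() === $recipient) {
                $matching[] = $email;
            }
        }
        return $matching;
    }
}

/** Client code depends only on the port, never on a concrete adapter. */
function subjectsFor(EmailRepository $repository, $recipient)
{
    $subjects = array();
    foreach ($repository->getEmailsFor($recipient) as $email) {
        $subjects[] = $email->getSubject();
    }
    return $subjects;
}

// Usage: wire the adapter in from the outside.
$repository = new InMemoryEmailRepository();
foreach (array(array('alice', 'Hello'), array('bob', 'Invoice'), array('alice', 'Lunch?')) as $pair) {
    $email = new Email();
    $email->setRecipient($pair[0]);
    $email->setSubject($pair[1]);
    $repository->add($email);
}

$subjects = subjectsFor($repository, 'alice');
```

Swapping this adapter for one backed by a database or an IMAP server would not require touching subjectsFor() or any other code inside the Domain Model, which is the point of the inverted dependency.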
April 25, 2010
by Giorgio Sironi
· 7,880 Views
Running Hazelcast on a 100 Node Amazon EC2 Cluster
The purpose of this article is to give you the details of our 100 node cluster demo. The demo was recorded and you can watch the 5 minute screencast.

Hazelcast is an open source clustering and highly scalable data distribution platform for Java. JVMs that are running Hazelcast will dynamically cluster and allow you to easily share and partition your application data across the cluster. Hazelcast is a peer-to-peer solution (there is no master node, every node is a peer), so there is no single point of failure. Communication among cluster members is always TCP/IP with Java NIO beauty. The default configuration comes with 1 backup, so if a node fails, no data will be lost (you can specify the backup count). It is as simple as using java.util.{Map, Queue, Set, List}. Just add hazelcast.jar to your classpath and start coding.

When you download Hazelcast, you will find a test.sh under the bin directory. The test.sh runs an application which randomly performs 40% get, 40% put and 20% remove operations on a distributed map. In this demo the same test application will be used to see how it performs on a 100 node cluster.

Amazon EC2 and S3

An easy-to-use and scalable cloud environment was needed for the demo, so we decided to use Amazon EC2 for server instances (nodes) and the S3 service to store the demo application zip and configuration files. With its newly announced Java SDK, it is very simple to start/stop server instances and upload files to S3 programmatically.

Hazelcast AMI & Launcher

The challenge here is that we are running an application on 100 nodes, and dealing with each and every server in the cluster is a huge task. We don't want to ssh into every server and manually start the application. This part is automated by creating a special server image (AMI). The AMI contains a Java Runtime and a launcher application we developed, which will download the demo application from Amazon S3, unzip it, and run the hazelcast/bin/test.sh in it.
The Launcher is actually so generic that it can run any application; it doesn't care/know what test.sh contains.

Deployer

Deployment of the demo application is also automated, so that we don't need to log in to the AWS Management Console and manually start instances. The Deployer instantiates any number of Amazon EC2 servers with any AMI and also uploads the demo application zip file to S3. So the idea here is that the Deployer will store the application in S3 and launch 100 EC2 instances with our image. The Launcher on each instance will download the application from S3 and run it.

Demo Details

The smallest EC2 instances (m1.small) are used to run the demo. These are virtual instances with a CPU of about 1.0 GHz. Also keep in mind that the EC2 platform suffers from a considerable amount of network latency. That's why we increased the thread count to 250 in our application.

The following steps were performed during the demo:

1. Download hazelcast-1.8.3.zip from www.hazelcast.com.
2. Unzip the file and move the monitoring war file into the tomcat6/webapps directory.
3. Edit the test.sh under the bin directory:
   - Add -Xmx1G -Xms1G
   - Add -Dhazelcast.initial.wait.seconds=100 to make the cluster partition evenly on start, so that migration can be avoided for better performance.
   - Add t250 as an argument to the application to set the thread count to 250. Remember the latency issue.
4. Run the Deployer from the IDE.
5. Check from the EC2 Management Console that 100 servers have started.
6. Start Tomcat.
7. Copy the public DNS name of one of the servers to connect to from the monitoring tool.
8. Go to http://localhost:8080/hazelcast-monitor-1.8.3/ (Hazelcast Monitoring Tool).
9. Paste the address and connect to the cluster.
10. Enjoy!

Results

You should always look for programmatic ways of launching applications on the cloud. With these tools we were able to deploy and run the demo application on 100 servers in minutes. The entire Hazelcast cluster was performing over 400,000 operations per second on the smallest EC2 instances.
In our next demo we will experiment with Hazelcast on a large data set and an even bigger cluster. Watch the screencast.
April 16, 2010
by Fuad Malikov
· 62,656 Views · 1 Like
Debugging Hibernate Generated SQL
In this article, I will explain how to debug Hibernate’s generated SQL so that unexpected query results can be traced faster either to a faulty dataset or to a bug in the query.

There’s no need to present Hibernate anymore. Yet, for those who have lived in a cave for the past years, let’s say that Hibernate is one of the two main ORM frameworks (the second one being TopLink) that dramatically ease database access in Java.

One of Hibernate’s main goals is to lessen the amount of SQL you write, to the point that in many cases you won’t even write one line. However, chances are that one day Hibernate’s fetching mechanism won’t get you the result you expected, and the problems will begin in earnest. From that point and before further investigation, you should determine which is true:

- either the initial dataset is wrong
- or the generated query is
- or both, if you’re really unlucky

Being able to quickly diagnose the real cause will save you much time. In order to do this, the greatest step will be viewing the generated SQL: if you can execute it in the right query tool, you can then compare pure SQL results to Hibernate’s results and establish the true cause. There are two solutions for viewing the SQL.

Show SQL

The first solution is the simplest one. It is part of Hibernate’s configuration and is heavily documented. Just add the following line to your hibernate.cfg.xml file:

...
<property name="show_sql">true</property>

The previous snippet will likely show something like this in the log:

select this_.PER_N_ID as PER1_0_0_, this_.PER_D_BIRTH_DATE as PER2_0_0_, this_.PER_T_FIRST_NAME as PER3_0_0_, this_.PER_T_LAST_NAME as PER4_0_0_ from T_PERSON this_

Not very readable, but enough to copy/paste into your favourite query tool. The main drawback of this is that if the query has parameters, they will display as ?
and won’t show their values, like in the following output: select this_.PER_N_ID as PER1_0_0_, this_.PER_D_BIRTH_DATE as PER2_0_0_, this_.PER_T_FIRST_NAME as PER3_0_0_, this_.PER_T_LAST_NAME as PER4_0_0_ from T_PERSON this_ where (this_.PER_D_BIRTH_DATE=? and this_.PER_T_FIRST_NAME=? and this_.PER_T_LAST_NAME=?) If they’re are too many parameters, you’re in for a world of pain and replacing each parameter with its value will take too much time. Yet, IMHO, this simple configuration should be enabled in all environments (save production), since it can easily be turned off. Proxy driver The second solution is more intrusive and involves a third party product but is way more powerful. It consists of putting a proxy driver between JDBC and the real driver so that all generated SQL will be logged. It is compatible with all ORM solutions that rely on the JDBC/driver architecture. P6Spy is a driver that does just that. Despite its age (the last release dates from 2003), it is not obsolete and server our purpose just fine. It consists of the proxy driver itself and a properties configuration file (spy.properties), that both should be present on the classpath. In order to leverage P6Spy feature, the only thing you have to do is to tell Hibernate to use a specific driver: com.p6spy.engine.spy.P6SpyDriver ... This is a minimal spy.properties: module.log=com.p6spy.engine.logging.P6LogFactory realdriver=org.hsqldb.jdbcDriver autoflush=true excludecategories=debug,info,batch,result appender=com.p6spy.engine.logging.appender.StdoutLogger Notice the realdriver parameter so that P6Spy knows where to redirect the calls. With just these, the above output becomes: 1270906515233|3|0|statement|select this_.PER_N_ID as PER1_0_0_, this_.PER_D_BIRTH_DATE as PER2_0_0_, this_.PER_T_FIRST_NAME as PER3_0_0_, this_.PER_T_LAST_NAME as PER4_0_0_ from T_PERSON this_ where (this_.PER_D_BIRTH_DATE=? and this_.PER_T_FIRST_NAME=? 
and this_.PER_T_LAST_NAME=?)|select this_.PER_N_ID as PER1_0_0_, this_.PER_D_BIRTH_DATE as PER2_0_0_, this_.PER_T_FIRST_NAME as PER3_0_0_, this_.PER_T_LAST_NAME as PER4_0_0_ from T_PERSON this_ where (this_.PER_D_BIRTH_DATE=’2010-04-10′ and this_.PER_T_FIRST_NAME=’Johnny’ and this_.PER_T_LAST_NAME=’Be Good’) Of course, the configuration can go further. For example, P6Spy knows how to redirect the logs to a file, or to Log4J (it currently misses a SLF4J adapter but anyone could code one easily). If you need to use P6Spy in an application server, the configuration should be done on the application server itself, at the datasource level. In that case, every single use of this datasource will be traced, be it from Hibernate, TopLink, iBatis or plain old JDBC. In Tomcat, for example, put spy.properties in common/classes and update the datasource configuration to use P6Spy driver. The source code for this article can be found here. To go further: P6Spy official site Log4jdbc, a Google Code contender that aims to offer the same features From http://blog.frankel.ch/debugging-hibernate-generated-sql
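The tedious part described under Show SQL, replacing every ? by hand before pasting the query into a tool, can be sketched in a few lines of plain Java. This is only an illustration of the idea, not how P6Spy works internally: the class and method names are hypothetical, every non-numeric value is rendered as a quoted string, and quoted identifiers and comments in the SQL are not handled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns "col=?" plus a parameter list into copy-pasteable SQL.
class SqlParameterInliner {

    /** Replaces each '?' found outside a single-quoted literal with the next parameter. */
    static String inline(String sql, List<?> params) {
        StringBuilder out = new StringBuilder();
        int next = 0;
        boolean inLiteral = false;
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (c == '\'') {
                inLiteral = !inLiteral; // a doubled quote ('') toggles twice, so state stays correct
            }
            if (c == '?' && !inLiteral) {
                if (next >= params.size()) {
                    throw new IllegalArgumentException("more placeholders than parameters");
                }
                out.append(render(params.get(next++)));
            } else {
                out.append(c);
            }
        }
        if (next != params.size()) {
            throw new IllegalArgumentException("more parameters than placeholders");
        }
        return out.toString();
    }

    /** Renders numbers as-is, null as NULL, and everything else as a quoted SQL string. */
    static String render(Object value) {
        if (value == null) {
            return "NULL";
        }
        if (value instanceof Number) {
            return value.toString();
        }
        return "'" + value.toString().replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        String sql = "select * from T_PERSON where PER_T_FIRST_NAME=? and PER_T_LAST_NAME=?";
        System.out.println(inline(sql, Arrays.asList("Johnny", "Be Good")));
    }
}
```

A helper like this is only for diagnosis in a query tool; it should never be used to build statements that are actually executed, since bound parameters are what protect against SQL injection.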
April 13, 2010
by Nicolas Fränkel
· 30,733 Views
How to use WMI from a .NET Application
First of all, let's see what WMI is and what it offers. WMI is an acronym for Windows Management Instrumentation, which is basically an interface to the Windows OS system settings, drivers and parameters. It also allows managing Windows personal computers and servers. A .NET developer can use WMI to obtain information about drivers installed on the client machine, verify whether the system is licensed, check the hardware configuration and a lot more.

Quoting Linus Torvalds, "Talk is cheap. Show me the code", let's get to the basics of WMI usage.

To get data through WMI, a SQL-like query is used. The specific query type is called WQL (WMI Query Language). Don't let the name confuse you: it is still very similar to SQL.

Before diving into code, you should know that Windows comes with a tool called WMI Test Tool, which lets you test WQL queries to check their correctness and the results they return. It is a bit harder to track down wrong query results in code, so this tool can save the developer some time. To run it, open the Run dialog (or the Command Prompt) and type wbemtest.

Once it has started, click Connect in its main window to open the connection dialog. It lets you connect to a namespace on your local Windows computer. You can supply your credentials (although for most queries this is not a requirement) and select the impersonation and authentication levels (once again, for most queries the default settings are acceptable).

Once you click Connect, you will be able to execute WMI queries, as well as perform other tasks (for example, enumerate the classes in a superclass to review its possibilities).

Before creating a query, you need to understand what information you want to obtain. The query is executed against a WMI class; you can read the complete list here. Let's take the Win32_Processor class as an example. Querying this class gives us information about the CPU installed on a machine. If the machine runs with multiple CPUs, a query result is returned for each one of them.

The Win32_Processor class exposes the following properties:

AddressWidth, Architecture, Availability, Caption, ConfigManagerErrorCode, ConfigManagerUserConfig, CpuStatus, CreationClassName, CurrentClockSpeed, CurrentVoltage, DataWidth, Description, DeviceID, ErrorCleared, ErrorDescription, ExtClock, Family, InstallDate, L2CacheSize, L2CacheSpeed, L3CacheSize, L3CacheSpeed, LastErrorCode, Level, LoadPercentage, Manufacturer, MaxClockSpeed, Name, NumberOfCores, NumberOfLogicalProcessors, OtherFamilyDescription, PNPDeviceID, PowerManagementCapabilities[], PowerManagementSupported, ProcessorId, ProcessorType, Revision, Role, SocketDesignation, Status, StatusInfo, Stepping, SystemCreationClassName, SystemName, UniqueId, UpgradeMethod, Version, VoltageCaps

Most of these have self-descriptive names, but if you are ever confused about one of them, you can always refer to the MSDN documentation for the class, which explains each one.

Now, let's try to get the values of the above-mentioned properties in your .NET application. In my examples I am using C#, but if you are using another .NET language, you shouldn't have a problem adapting the code.

First of all, you need to add a reference to the System.Management and System.Management.Instrumentation libraries. This is done by right-clicking on References in the Solution Explorer and selecting Add Reference. Then, you can select the above-mentioned libraries from the .NET list.

Once they are selected, you need to reference the proper namespaces in your code:

    using System.Diagnostics;   // for Debug.Print
    using System.Management;

Now, to the actual code. I am going to create a function that can be called from anywhere in the code to simplify this task.

    void GetCPUInfo()
    {
        // The constructor argument is the WQL query to run.
        ManagementObjectSearcher searcher =
            new ManagementObjectSearcher("SELECT * FROM Win32_Processor");

        // One ManagementObject is returned per CPU.
        foreach (ManagementObject obj in searcher.Get())
        {
            // Skip results, or property values, that are not available.
            if (obj != null && obj.Properties["CpuStatus"].Value != null)
                Debug.Print(obj.Properties["CpuStatus"].Value.ToString());
        }
    }

The ManagementObjectSearcher is the key element here: it gets the returned properties based on the query. The parameter I pass when instantiating it is the actual query. As you can see, it is very similar to SQL. My current query retrieves all properties available in Win32_Processor. I iterate through the results (note that each result is a ManagementObject, the property holder, and there is a separate instance for each CPU that is found) and print the value of the CpuStatus property in the Output window. On my machine the output is 1.

The 1 here is exactly what is returned. It is good practice to consult the documentation before reading specific properties, to understand the possible returned values. 1 for CpuStatus means that the CPU is installed and is active.

Important note: Some readers might be curious why there is a null check. Some of the classes require user authentication to return the correct data and some properties are simply not available, which can cause a variety of exceptions depending on the authentication methods and property types. This defensive check is there to avoid those exceptions.

If only one property needs to be retrieved, the query can be written like this:

    SELECT CpuStatus FROM Win32_Processor

The important thing to remember here is that when you retrieve only one property, the rest of them are unavailable for that specific query result. Trying to read their values will therefore cause an exception.
April 12, 2010
by Denzel D.
· 18,382 Views
Unrolling Spock: Advanced @Unroll Usages in 0.4
Some of the Spock Framework 0.4 features are starting to see the light of day, with Data Tables being explained last week in a nice blog post from Peter Niederwieser. One of the new features that I had not seen before is the new advanced @Unroll usage. Mixed with Data Tables, it produces some very cool results, and it can still be used with 0.3 style specs as well. Here's the juice:

JUnit Integration and @Unroll

Spock is built on JUnit, and has always had good IDE support without any effort from you as a user. For the most part, the IDEs just think Spock is another unit test. Here's a Spock spec for the new Data Tables feature:

    import spock.lang.*

    class TableTest extends Specification {
        def "maximum of two numbers"() {
            expect:
            Math.max(a, b) == c

            where:
            a | b | c
            3 | 7 | 7
            5 | 4 | 5
            9 | 9 | 9
        }
    }

The assertion will be run 3 times: once for each row in the data table. And JUnit faithfully reports the method name correctly in the IDE, even when the method name has a space in it.

The problem with data driven tests and xUnit is poor error location. When a test fails you will receive an error stating which method is the culprit... but what if the method runs an assertion across 50 or 60 pieces of data? The cause of a failure is almost never clear with data driven tests. At its worst you have to step through several iterations of code waiting for an exception. Good tests have a clear point of failure, but good tests also do not repeat themselves with boilerplate.

This is exactly why Spock has the @Unroll annotation. As a test author you get to write one concise unit test, and JUnit does the work of reporting results that help you isolate failures. Consider the same test method with the @Unroll annotation:

    @Unroll
    def "maximum of two numbers"() {
        expect:
        Math.max(a, b) == c

        where:
        a | b | c
        3 | 7 | 7
        5 | 4 | 5
        9 | 9 | 9
    }

When executed, JUnit sees three test methods instead of one: one for each row in the data table. The end result for you as a test writer is accurate failure resolution. You can pinpoint exactly which row failed. This feature is available in Spock 0.3 and you can use it today.

What is new in 0.4 is the ability to change the test name dynamically. Here is a full @Unroll annotation that changes the method name:

    @Unroll("maximum of #a and #b is #c")
    def "maximum of two numbers"() {
        expect:
        Math.max(a, b) == c

        where:
        a | b | c
        3 | 7 | 7
        5 | 4 | 5
        9 | 9 | 9
    }

Notice the #variable syntax in the annotation parameter. The # produces a sort of GString-like variable substitution that lets you bind columns from your data table into your test name. The annotation parameter references #a, #b, and #c, which align with the data table definition of a | b | c. In the IDE output, each iteration now appears under its own substituted name.

Previously, the test name was just the iteration number within the test. The new @Unroll parameter allows you to make the test name much more meaningful. Your tests will improve because failures become more descriptive. Unrolled failure messages previously had only the iteration number embedded in them, while now they can carry meaningful data that you prescribe.

My favorite part of playing with the new @Unroll was to see the default value of the parameter within the Spock source code:

    java.lang.String value() default "#featureName[#iterationCount]";

Talk about eating your own dog food... the default value is a test name template, just like you could have written in your own test. Makes you wonder what other variables are in scope, huh?

Spock snapshot builds for 0.4 are available at: http://m2repo.spockframework.org. Get it before the link breaks.

From http://hamletdarcy.blogspot.com
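To make the #variable substitution described above concrete, here is a small plain-Java sketch of how a name template could be expanded against one row of a data table. It is not Spock's implementation; the class name, the method name and the decision to leave unknown variables untouched are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of "#name" substitution in a test-name template.
class NameTemplate {

    // A variable is '#' followed by an identifier, e.g. #a or #iterationCount.
    private static final Pattern VARIABLE = Pattern.compile("#([A-Za-z_][A-Za-z0-9_]*)");

    /** Replaces every #name that has a value in the row; unknown names are left as-is. */
    static String expand(String template, Map<String, ?> row) {
        Matcher m = VARIABLE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = row.containsKey(name) ? String.valueOf(row.get(name)) : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("a", 3);
        row.put("b", 7);
        row.put("c", 7);
        System.out.println(expand("maximum of #a and #b is #c", row));
    }
}
```

Seen this way, the default template "#featureName[#iterationCount]" is just another template whose two variables happen to be supplied by the framework rather than by the data table.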
March 24, 2010
by Hamlet D'Arcy
· 36,193 Views · 1 Like
Play! Framework Usability
Perhaps the most striking thing about the Play! framework is that its biggest advantage over other Java web application development frameworks does not fit into a neat feature list, and is only apparent after you have used it to build something. That advantage is usability.

Note that usability is separate from functionality. In what follows, I am not suggesting that you cannot do this in some other framework: I merely claim that it is easier and more pleasant in Play! I need to emphasise this because geeks often have a total blind spot for usability: they enjoy figuring out difficult things, and under-appreciate the value of things that Just Work.

Written by web developers for web developers

The first hint that something different is going on here is when you first hear that the Play! framework is 'written by web developers for web developers', an unconventional positioning that puts the web's principles and conventions first and Java's second. Specifically, this means that the Play! framework is more in line with the W3C's Architecture of the World Wide Web than it is with Java Enterprise Edition (Java EE) conventions.

URLs for perfectionists

For example, the Play! framework, like other modern web frameworks, provides first-class support for arbitrary 'clean' URLs, which has always been lacking from the Servlet API. It is no coincidence that at the time of writing, Struts URLs for perfectionists, a set of work-arounds for the Servlet API-based Struts 1.x web framework, remains the third-most popular out of 160 articles on www.lunatech-research.com despite being a 2005 article about a previous-generation Java web technology.

The Servlet API does not provide useful URL-routing support; Servlet-based frameworks configure web.xml to forward all requests to a single controller Servlet, and then implement URL routing in the framework, with additional configuration. At this point, it does not matter whether the Servlet API was ever intended to solve the URL-routing problem and failed by not being powerful enough, or whether it was intended to be a lower-level API that you do not build web applications in directly. Either way, the result is the same: web frameworks add an additional layer on top of the Servlet API, itself a layer on top of HTTP.

Play! combines the web framework, HTTP API and the HTTP server, which allows it to implement the same thing more directly with fewer layers and a single URL routing configuration. This configuration, like Groovy's and CakePHP's, reflects the structure of an HTTP request: HTTP method, URL path, and then the mapping:

    # Play! 'routes' configuration file…

    # Method   URL path          Controller
    GET        /                 Application.index
    GET        /about            Application.about
    POST       /item             Item.addItem
    GET        /item/{id}        Item.getItem
    GET        /item/{id}.pdf    Item.getItemPdf

In this example, there is more than one controller. We also see the use of an id URL parameter in the last two URLs.

HttpServletRequest

Another example is Play!'s Http.Request class, which is far simpler than the Servlet API's HttpServletRequest interface. In addition, Play! uses a class where Java EE 6 follows the Java EE convention of using an interface. This interface is also split between HttpServletRequest and the more generic ServletRequest interface. This separation may be useful if you want to use Servlets for things other than web applications, or if you want to allow for the unlikely possibility of the web changing protocol, but for most of us it is merely irrelevant complexity.

In other words, the Servlet API is always used with a framework on top these days because it is poorly optimised for building web applications, which is what all of us actually use it for. Play! fixes that.

Better usability is not just for normal people

Another way of looking at the idea that Play! is by and for web developers is to consider how a web developer might approach software design differently to a Java EE developer. When you write software, what is the primary interface? If you are a web developer, the primary interface is a web-based user interface constructed with HTML, CSS and (increasingly) JavaScript. A Java EE developer, on the other hand, may consider their primary interface to be a Java API, or perhaps a web services API, for use by other layers in the system.

This difference is a big deal, because a Java interface is intended for use by other programmers, while a web user interface is intended for use by non-programmers. In both cases, good design includes usability, but usability for normal people is not the same as usability for programmers. In a way, usability for everyone is a higher standard than usability for programmers, when it comes to software, because programmers can cope better with poor usability. This is a bit like the Good Grips kitchen utensils: although they were originally designed to have better usability for elderly people with arthritis, it turns out that making tools easier to hold is better for all users.

The Play! framework is different because the usability that you want to achieve in your web application is present in the framework itself. For example, the web interface to things like the framework documentation and error messages shown in the browser is just more usable. Along similar lines, the server's console output avoids the pages full of irrelevant logging and pages of stack traces when there is an error, leaving more focused and more usable information for the web developer.

    $ play run phase
    ~        _            _
    ~  _ __ | | __ _ _  _| |
    ~ | '_ \| |/ _' | || |_|
    ~ |  __/|_|\____|\__ (_)
    ~ |_|            |__/
    ~
    ~ play! 1.0, http://www.playframework.org
    ~
    ~ Ctrl+C to stop
    ~
    Listening for transport dt_socket at address: 8000
    10:15:58,629 INFO ~ Starting /Users/peter/Documents/work/workspace/phase
    10:16:00,007 WARN ~ You're running Play! in DEV mode
    10:16:00,424 INFO ~ Listening for HTTP on port 9000 (Waiting a first request to start) ...
    10:16:11,847 INFO ~ Connected to jdbc:hsqldb:mem:playembed
    10:16:13,448 INFO ~ Application 'phase' is now started !
    10:16:14,825 INFO ~ starting DispatcherThread
    10:16:48,168 ERROR ~

    @61lagcl6i
    Internal Server Error (500) for request GET /application/startprocess?account=x

    Java exception (In /app/controllers/Application.java around line 41)
    IllegalArgumentException occured : Person not found for account x

    play.exceptions.JavaExecutionException: Person not found for account x
        at play.mvc.ActionInvoker.invoke(ActionInvoker.java:200)
        at Invocation.HTTP Request(Play!)
    Caused by: java.lang.IllegalArgumentException: Person not found for account x
        at controllers.Application.startProcess(Application.java:41)
        at play.utils.Java.invokeStatic(Java.java:129)
        at play.mvc.ActionInvoker.invoke(ActionInvoker.java:127)
        ... 1 more

Try to imagine a JSF web application producing a stack trace this short. In fact, Play! goes further: instead of showing the stack trace, the web application shows the last line of code within the application that appears in the stack trace. After all, what you really want to know is where things first went wrong in your own code.

This kind of usability does not happen by itself; the Play! framework goes to considerable effort to filter out duplicate and irrelevant information, and focus on what is essential.

Quality is in the details

In the Play! framework, much of the quality turns out to be in the details: they may be small things individually, rather than big important features, but they add up to a more comfortable and more productive development experience. The warm feeling you get when building something with Play! is the absence of the frustration that usually results from fighting the framework.

We recommend that you go to http://www.playframework.org/, download the latest binary release, and spend half an hour on the tutorial.

Peter Hilton is a senior software developer at Lunatech Research.
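The routes file shown earlier can be read as an ordered table of (HTTP method, URL path pattern, action) entries. The following plain-Java sketch shows one way such a table could be matched against a request. It is not Play!'s router: the class is hypothetical, the first matching entry wins, and a {name} placeholder is assumed to match any run of characters other than '/' and '.', a choice made here only so that /item/{id} and /item/{id}.pdf stay distinct.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, minimal route table in the spirit of the 'routes' file above.
class RouteTable {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([A-Za-z][A-Za-z0-9]*)\\}");

    private static final class Route {
        final String method;
        final Pattern path;
        final List<String> names;
        final String action;

        Route(String method, Pattern path, List<String> names, String action) {
            this.method = method;
            this.path = path;
            this.names = names;
            this.action = action;
        }
    }

    private final List<Route> routes = new ArrayList<>();

    /** Registers one line of the table, e.g. add("GET", "/item/{id}", "Item.getItem"). */
    RouteTable add(String method, String path, String action) {
        List<String> names = new ArrayList<>();
        StringBuilder regex = new StringBuilder();
        Matcher m = PLACEHOLDER.matcher(path);
        int last = 0;
        while (m.find()) {
            regex.append(Pattern.quote(path.substring(last, m.start()))); // literal text
            regex.append("([^/.]+)");                                     // one placeholder
            names.add(m.group(1));
            last = m.end();
        }
        regex.append(Pattern.quote(path.substring(last)));
        routes.add(new Route(method, Pattern.compile(regex.toString()), names, action));
        return this;
    }

    /** Returns the action plus any captured parameters, or null when nothing matches. */
    String resolve(String method, String path) {
        for (Route route : routes) {
            if (!route.method.equals(method)) {
                continue;
            }
            Matcher m = route.path.matcher(path);
            if (m.matches()) {
                StringBuilder result = new StringBuilder(route.action);
                for (int i = 0; i < route.names.size(); i++) {
                    result.append(' ').append(route.names.get(i)).append('=').append(m.group(i + 1));
                }
                return result.toString();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        RouteTable table = new RouteTable()
                .add("GET", "/", "Application.index")
                .add("GET", "/about", "Application.about")
                .add("POST", "/item", "Item.addItem")
                .add("GET", "/item/{id}", "Item.getItem")
                .add("GET", "/item/{id}.pdf", "Item.getItemPdf");
        System.out.println(table.resolve("GET", "/item/42"));
        System.out.println(table.resolve("GET", "/item/42.pdf"));
    }
}
```

The point of the sketch is the shape of the data: because method and path sit side by side in one table, there is no separate web.xml mapping plus framework mapping to keep in sync.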
March 16, 2010
by Peter Hilton
· 24,633 Views
Cache Java Webapps with Squid Reverse Proxy
This article shows you, step by step, how to cache your entire Tomcat web application with a Squid reverse proxy without writing any Java code.

What is Squid

Squid is a free proxy server for HTTP, HTTPS and FTP which saves bandwidth and improves response time by caching frequently requested web pages. While Squid can be used as a proxy server when users download pages from the internet, it can also be used as a reverse proxy by putting Squid between the user and your webapp. All user requests first hit Squid. If the requested page already exists in Squid's cache, it is served directly from the cache without hitting your webapp. If the page does not exist in Squid's cache, it is fetched from your web application and stored in the cache for future requests. Squid reduces hits to your server by caching response pages. You don't have to worry about building page-level caching into every application that you write; Squid takes care of that part.

When should I use Squid

Ideally you should use Squid for pages which have a high ratio of reads to writes. In other words, pages that change infrequently but are accessed very often. Here are some scenarios:

  • A dynamic web page which displays news, is updated once an hour, and receives hundreds of hits during that hour
  • A static web page that is accessed frequently. Squid can give a performance boost by caching frequently accessed static web pages in memory

When should I not use Squid

In most cases, if the request URL is the only factor which determines the response, then you can safely use Squid. Here are more specific cases where it is not a good fit:

  • Apps that are very dynamic in nature, where pages become invalid immediately.
  • Apps which require login. This unfortunately covers a large number of applications. Such applications need to resort to back-end caching, for example using other caching frameworks like Ehcache to cache re-usable page fragments, database queries and/or other performance bottlenecks.
  • Apps which heavily use browser cookies. Squid relies on URLs to cache pages. If the page served is computed from URL + cookies, then you should not cache those pages in Squid.

How does the overall setup work

1) Apache receives requests on port 80.
2) Apache calls Squid with the request.
3) Squid checks its cache to see if it has the response cached from before. If it does, and the response has not expired, it returns the cached response. In this case Squid writes the following header to the response:

    X-Cache: HIT from www.vineetmanohar.com

4) If the response is not found in Squid's cache, Squid makes a call to Tomcat on port 8082. Tomcat's proxy connector is listening on this port. It processes the request and sends the response back to Squid.
5) Squid saves the response in its cache, unless caching is disabled for that URL.
6) Squid returns the final response to Apache, which sends the response back to the user.

What if I don't want to use Apache

Apache is not required in order to use Squid. You can run Squid on port 80 and point your users directly to Squid. If that is the case, skip step 1 and jump directly to step 2 below.

Step 1/3: Apache Httpd Config

If you are using Apache as a front end, you need to instruct Apache to forward requests to Squid on port 3128. See the following snippet. Change the server name and paths to reflect your real values.

Apache config file: /etc/httpd/conf/httpd.conf

    ServerName www.vineetmanohar.com
    DocumentRoot /home/webadmin/www.vineetmanohar.com/html

    # forward requests to squid running on port 3128
    ProxyPass / http://localhost:3128/
    ProxyPassReverse / http://localhost:3128/

In addition to the above, you also need mod_proxy installed. If you see the following in your httpd.conf, you probably already have mod_proxy installed. If not, you first need to install mod_proxy.

    LoadModule proxy_module modules/mod_proxy.so
    LoadModule proxy_http_module modules/mod_proxy_http.so

Step 2/3: Squid Config

First make sure that Squid is installed on your server. You can download Squid from here. The Squid config file on Linux/Unix is located here:

    /etc/squid/squid.conf

The config file is pretty long. Follow these instructions and set the values appropriately.

    # leave the port at 3128
    http_port 3128

    # how much memory cache do you want? depends on how much memory you have on the machine
    cache_mem 200 MB

    # what's the biggest page that you want stored in memory. If your home page is 100 KB and
    # you want it stored in memory, you may set it to a number bigger than that.
    maximum_object_size_in_memory 100 KB

    # how much disk cache do you want. It is 6400 MB in the following example, change it as per
    # your needs. Make sure you have that much disk space free.
    cache_dir ufs /var/spool/squid 6400 16 256

    # this is probably the most important config section. Here you can configure the cache life for
    # each URL pattern.

    # Time is in minutes
    # 1 day = 1440, 2 days = 2880, 7 days = 10080, 28 days = 40320

    # do not cache url1
    refresh_pattern ^http://127.0.0.1:8082/url1/ 0 20% 0

    # cache url2 for 1 day
    refresh_pattern ^http://127.0.0.1:8082/url2/ 1440 20% 1440 override-expire override-lastmod reload-into-ims ignore-reload

    # cache css for 7 days
    refresh_pattern ^http://127.0.0.1:8082/css 10080 20% 10080 override-expire override-lastmod reload-into-ims ignore-reload

    # by default cache the whole website for 1 minute
    refresh_pattern ^http://127.0.0.1:8082/ 1 20% 1 override-expire override-lastmod reload-into-ims ignore-reload

    # how long errors (for example 404s and HTTP 500 errors) should be cached for
    negative_ttl 0 seconds

    # On which host does tomcat run. Set 127.0.0.1 for localhost
    httpd_accel_host 127.0.0.1

    # this is the proxy port as defined in Tomcat server.xml. By default it is "8082"
    httpd_accel_port 8082

    # set this to "on". Read more documentation if you want to change this.
    httpd_accel_single_host on

    # To access Squid stats via the manager interface, you need to enter a password here
    cachemgr_passwd your_clear_text_password all

    # Say "off" if you want the query string to appear in the squid logs.
    strip_query_terms off

Step 3/3: Tomcat Config

Make sure that the HTTP Proxy Connector is defined in TOMCAT_HOME/conf/server.xml. If needed, see the additional documentation on the Tomcat proxy connector.

Squid Manager Interface

You can access the Squid config and stats via the Squid Manager HTTP interface. Make sure that the "cachemgr.cgi" file which ships with the Squid installation is in your cgi-bin directory. More documentation on setting that up here. Once you've set it up, you can access the cache manager via this URL:

    http:///cgi-bin/cachemgr.cgi

To continue, enter the following values:

    Cache host: localhost
    Cache port: 3128
    Manager name: manager
    Password:

Store Directory Stats shows you how much disk space is used by the disk cache.

Cache Client List shows you the cache HIT/MISS ratio as a percentage. You should monitor this frequently and tune your cache to get a higher hit percentage.

Reload Squid Config without restarting

Edit the Squid config using "vi" or your favorite editor:

    vi /etc/squid/squid.conf

Once you are done editing, reload the new config without restarting Squid:

    /usr/sbin/squid -k reconfigure

Clearing Squid Cache

To clear the Squid cache:

1) Set the memory cache to 8 MB (or a lower number)

    cache_mem 8 MB

2) Set the disk cache to 20 MB (or a lower number). The disk cache must be larger than the memory cache.

    cache_dir ufs /var/spool/squid 20 16 256

3) Reload the Squid config without a restart, as described in the previous section

4) You may need to wait a few hours for the cache to be cleared. Once the cache is clear, you may restore the previous cache sizes and reload the config again. You can monitor the cache size through the Squid Manager HTTP interface.

Bypassing Squid

If for some reason you need to bypass Squid, reconfigure Apache to send requests directly to Tomcat. Edit the Apache config file /etc/httpd/conf/httpd.conf:

    # forward requests directly to Tomcat's proxy connector running on port 8082
    ProxyPass / http://localhost:8082/
    ProxyPassReverse / http://localhost:8082/

You will need to restart Apache after making this change.

    /etc/init.d/httpd restart

Conclusion

Squid is a very powerful tool for caching, but it is not for all applications. Please examine the needs of your application and use Squid appropriately. I've used Squid for several years for caching the output from a Java data mashup application and am very satisfied with its ease of use and benefits. I hope you found this tutorial useful. Feel free to post a comment or share your experience with Squid.

References

Squid official website

From http://www.vineetmanohar.com
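One detail of the refresh_pattern section is worth spelling out: Squid checks the patterns in the order they are listed and uses the first one that matches, which is why the catch-all rule for the whole site comes last. The plain-Java sketch below models just that lookup. It is an illustration rather than Squid's code: the class is hypothetical, only the minimum-age column is modelled, and the catch-all is given the 1-minute value that its config comment describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of ordered refresh_pattern rules: the first matching rule wins.
class RefreshRules {

    private static final class Rule {
        final Pattern pattern;
        final int minMinutes;

        Rule(Pattern pattern, int minMinutes) {
            this.pattern = pattern;
            this.minMinutes = minMinutes;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Appends a rule; rules are consulted in the order they were added. */
    RefreshRules add(String regex, int minMinutes) {
        rules.add(new Rule(Pattern.compile(regex), minMinutes));
        return this;
    }

    /** Returns the minimum cache age in minutes of the first matching rule, or -1 if none match. */
    int minMinutesFor(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.minMinutes;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        RefreshRules rules = new RefreshRules()
                .add("^http://127.0.0.1:8082/url1/", 0)     // do not cache
                .add("^http://127.0.0.1:8082/url2/", 1440)  // 1 day
                .add("^http://127.0.0.1:8082/css", 10080)   // 7 days
                .add("^http://127.0.0.1:8082/", 1);         // catch-all, must stay last
        System.out.println(rules.minMinutesFor("http://127.0.0.1:8082/css/site.css"));
        System.out.println(rules.minMinutesFor("http://127.0.0.1:8082/index.html"));
    }
}
```

If the catch-all were moved to the top, every URL would match it first and the longer lifetimes for url2 and css would never apply.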
March 10, 2010
by Vineet Manohar
· 109,019 Views · 1 Like
Open Source NoSQL Databases
For almost a year now, the idea of "NoSQL" has been spreading due to the demand for relational database alternatives. Maybe the biggest motivation behind NoSQL is scalability. Relational databases don't lend themselves well to the kind of horizontal scalability that's required for large-scale social networking or cloud applications, and ORMs can abstract away impedance mismatch only so much. In other cases, companies just don't need as many of the complex features and rigid schemas provided by relational databases.

Most people are not suggesting that we all ditch the RDBMS; in fact, many companies don't really need to switch. Relational databases will probably be necessary for many applications years and years from now. In essence, NoSQL is a movement that aims to reexamine the way we structure data and draw attention to innovation in hopes of finding the solution to the next generation's data persistence problems.

Here are some of the better known open source data stores/models labeled as "NoSQL":

CouchDB - Document Store
  • Maps keys to data
  • Provides a RESTful JSON API and is written in Erlang
  • You can upload functions to index data and then you can call those functions
  • Has a very simple REST interface
  • Provides an innovative replication strategy - nodes can reconnect, sync, and reconcile differences after being disconnected for long periods of time
  • Enables new distributed types of applications and data

MongoDB - Document Store
  • Free-form key-value-like data store with good performance
  • Powerful, expansive query model
  • Usability rivals that of Redis
  • Good for complex data storage needs
  • Production-quality sharding capabilities

Neo4j - GraphDB
  • Disk-based
  • Has a restricted, single-threaded model for graph traversal
  • Has optional layers to expose Neo4j as an RDF store
  • Can handle graphs of several billion nodes, relationships, or properties on a single machine
  • Released under a dual license - free for non-commercial use

Apache HBase - Wide Column Store/Column Families
  • Built on top of Hadoop, which has functionality similar to Google's GFS and MapReduce systems
  • Hadoop's HDFS provides a mechanism that reliably stores and organizes large amounts of data
  • Random access performance is on par with MySQL
  • Has a high performance Thrift gateway
  • Cascading source and sink modules

Redis - Key Value/Tuple Store
  • Provides a rich API and does more operations in memory, using disk only periodically
  • It's extremely fast
  • Lets you append a value to the end of a list of items that's already been stored on a key
  • Has atomic operations, making it a best-of-breed tally server

Memcached - Key Value/Tuple Store
  • High-performance, distributed memory object caching
  • Free and open source
  • Generic and agnostic to the objects/strings it caches
  • It's all in-memory data
  • Simple yet elegant design enables easy development and deployment
  • Language neutral caching scheme
  • Most of the large properties on the web are using it now, except for Microsoft

Project Voldemort - Eventually Consistent Key Value Store
  • Used by LinkedIn
  • Handles server failure transparently
  • Pluggable serialization supports rich keys and values including lists and tuples with named fields
  • Supports common serialization frameworks including Protocol Buffers, Thrift, and Java Serialization
  • Data items are versioned
  • Supports pluggable data placement strategies
  • Memory caching and the storage system are combined

Tokyo Cabinet and Tokyo Tyrant - Key Value/Tuple Store
  • Supports hashtable mode, b-tree mode, and table mode
  • It's fast and straightforward
  • Good for small to medium-sized amounts of data that require rapid updating and can be easily modeled in terms of keys and values

Cassandra - Wide Column Store/Column Families
  • First developed by Facebook
  • SuperColumns can turn a simple key-value architecture into an architecture that handles sorted lists, based on an index specified by the user
  • Can scale from one node to several thousand nodes clustered in different data centers
  • Can be tuned for more consistency or availability
  • Smooth node replacement if one goes down

Some other well known NoSQL-style data stores that are closed source include Google BigTable and Amazon SimpleDB. GigaSpaces is a popular space-based Grid solution that has NoSQL qualities. Check out this informative post on NoSQL patterns.
February 23, 2010
by Mitch Pronschinske
· 45,916 Views
Free Online SVN Repositories
This week, I searched for free online SVN repositories for closed-source projects.
February 23, 2010
by Nicolas Fränkel
· 52,786 Views
Electric Cloud's New Tools Avoid Unnecessary Builds
Electric Cloud has recently developed several unique capabilities for its software production suite, and now the company has built these technologies into the newest versions of its ElectricAccelerator and ElectricCommander products, which were released this week. ElectricAccelerator 5.0 has added two major features: the "Electrify" feature can now parallel process virtually any software production task, and the new subbuild feature avoids unnecessary builds. ElectricCommander 3.5 features a new, extensible interface for managing and automating a shop's existing tool infrastructure.

ElectricAccelerator 5.0

ElectricAccelerator speeds up Make, NMake, Microsoft Visual Studio, and Apache Ant based builds (by 10-20x, the company says) by parallelizing them and running them on a computer cluster. Accelerator 5.0 is the full debut of Electric Cloud's patented technology to safely speed up development tasks through parallel processing on public or private compute clouds. Originally, Accelerator's parallel processing applied only to software builds, but now it applies to other tools and development tasks in the build-test-deploy cycle, including parallel testing and data modeling. Electrify creates an all-purpose private compute cloud for parallel processing, but parallel processing can also be done on desktops or a dedicated server.

Another innovative addition to Accelerator is the subbuilds feature. First previewed in Electric Cloud's free SparkBuild tool, subbuilds avoid unnecessary builds. They are able to skip large swaths of the build tree by building only the pieces relevant to the current work. The result is fewer broken builds and the ability to compile and test quickly and frequently without affecting the rest of the team. The dependency graph below shows the agent component (util, xml, http libraries, and the agent application code) as solid. SparkBuild can recognize that only this component needs to be rebuilt.

ElectricAccelerator 5.0 now supports build tools such as MSBuild and SCons along with homegrown systems. Teams that standardize on SCons, for example, can use less hardware and provide faster builds than individual multi-core servers by applying the benefits of centralization. The virtualization capabilities of Accelerator also allow easier support for multiple configurations.

ElectricCommander 3.5

ElectricCommander is a web-based application for defining and executing distributed processes in the build-test-deploy cycle. In a development environment using many disparate tools, Commander 3.5 can remove the need to learn multiple interfaces, and it manages those tools from a central, custom UI. ElectricCommander 3.5 can be configured to extract and display data from the defect tracker, relevant build results, and test results. This lets build managers track the status of fixes and be notified when QA resolves the issue. The Commander UI's custom, dynamic screens can help developers create and execute a build or test request using the right parameters. Commander 3.5 can give developers a custom interface based on their role in the production cycle. Version 3.5 also provides tools to create custom plug-ins for third-party integrations.

Figure: ElectricCommander job plotter

To try out some of ElectricAccelerator's capabilities, download Electric Cloud's free SparkBuild tool.
February 17, 2010
by Mitch Pronschinske
· 11,482 Views
Rules of Thumb: Don't Use the Session
A while ago I wrote about some rules of thumb that I'd been taught by my colleagues with respect to software development and I was reminded of one of them – don't put anything in the session – during a presentation my colleague Luca Grulla gave at our client on scaling applications by making use of the infrastructure of the web.

The problem with putting state in the session is that it means that requests from a specific user have to be tied to a specific server, i.e. we have to use a sticky session/session affinity. This reduces our ability to scale our system horizontally (scale out), i.e. by adding more servers to handle requests. If, for example, we have a small number of users (whose first request went to the same server) making a lot of requests (perhaps through AJAX calls) then we may quickly put one of our servers under load while the others are sitting there idle.

In addition we have increased complexity around our deployment process. If we want to do an incremental deployment of a new version of our website across some of our servers then we need to ensure that we create a copy of any sessions on those servers and copy them to the ones we're not updating so that any users still on the system don't experience loss of data.

There are no doubt products which can allow us to do this more easily but it seems to me to be an unnecessary product in the first place since we can just design our application to not rely on the session. As I understand it the web was designed to be stateless, i.e. each request is independent and all the information is contained within that request, and the idea of the session was only something which was added in later on.

How does the way we code change if we don't use the session?

One thing we've often used the session for on projects that I've worked on is to store the current state of a form that the user is filling in. When they've completed the form then we would probably store some representation of what they've entered in a database.

If we don't use the session then we need to store this intermediate data somewhere and include a key to load it in the request. On the project I'm working on at the moment we're storing that data in a database but then clearing out that data every other day since it's not needed once the user has completed the form. An alternative perhaps could be to store it in a cache since in reality all we have is a key/value pair which we need to keep for a relatively short amount of time.

Advantages/disadvantages of this approach

The disadvantage of this approach is that we have to make more reads and writes to the database to deal with this temporary data.

Apart from the advantages I outlined initially, we are also more protected if a server handling a user's request goes down. If we were using the session to store intermediate state then that information would be lost and they would have to start over. In the approach we're using this isn't a problem and when the request is sent to another server we can still query the database and get whatever data the user had already saved.

As with most things there's a trade off to be made but in this case it seems a fair one to me.

Alternative approaches

I've come across some alternative approaches where we avoid using the session but don't store intermediate state in a database. One way is to store that state in hidden fields on the form and another is to send it in the request parameters.

Neither of these approaches seems particularly clean to me and they give the user an easier way to change the intermediate data in ways that the form might not allow them to do. From my experience our server side code becomes more complicated since we're always writing all of the data entered so far back into the page. In addition the url becomes a complete mess with the second approach.

From http://www.markhneedham.com
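The approach described in the article above (keep the half-finished form outside the session, pass a key in each request, and clear out stale rows periodically) might be sketched like this. The class name FormStateStore, its methods, and the two-day expiry are illustrative assumptions, and a plain dict stands in for the shared database or cache that every server would talk to.

```python
import time
import uuid


class FormStateStore:
    """Keeps partially completed form data outside the web session.

    Hypothetical sketch: the dict below stands in for a shared database or
    cache, so any server can load the data given the key from the request.
    """

    def __init__(self, ttl_seconds=2 * 24 * 3600, clock=time.time):
        self._rows = {}           # key -> (expiry timestamp, saved fields)
        self._ttl = ttl_seconds
        self._clock = clock       # injectable so expiry can be tested

    def save(self, fields, key=None):
        """Store the fields and return the key to embed in the next request."""
        key = key or uuid.uuid4().hex
        self._rows[key] = (self._clock() + self._ttl, dict(fields))
        return key

    def load(self, key):
        """Return the saved fields, or None if the key is unknown or expired."""
        row = self._rows.get(key)
        if row is None:
            return None
        expires_at, fields = row
        if self._clock() >= expires_at:
            del self._rows[key]
            return None
        return dict(fields)

    def purge_expired(self):
        """Delete stale rows and return how many were removed."""
        now = self._clock()
        stale = [k for k, (expires_at, _) in self._rows.items() if now >= expires_at]
        for k in stale:
            del self._rows[k]
        return len(stale)
```

Because the only thing the client carries is an opaque key, the user cannot edit the intermediate data directly, which is the drawback the article raises for hidden fields and request parameters.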
February 17, 2010
by Mark Needham
· 23,736 Views · 1 Like
Interview: Intelligence Gathering Software on the NetBeans Platform
Chris Bohme is the chief software architect at Pinkmatter Solutions – a small, specialized software development company in South Africa. Pinkmatter has been working with a company called Paterva for the past few years to build Maltego - a tool for data visualization, reconnaissance and intelligence gathering. Maltego is used by law enforcement and intelligence agencies, network security professionals and large corporates to discover and analyze information.

In a nutshell, how does Maltego work?

Maltego models information as entities (e.g., persons, e-mail addresses) and relationships between them. Relationships are discovered by running pluggable functions (called transforms) on the entities. For example, when running a social network transform on my e-mail address, one would discover my Facebook and LinkedIn profiles.

Out of the box, Maltego ships with over 150 transforms that mainly relate to open source intelligence. However, an organization using Maltego can easily create its own transforms that run on its internal data. The concept of transforms makes data gathering very quick and easy, which is one of the aspects that sets it apart from some of its competitors like Analyst Notebook, which has been the de-facto tool for investigation and intelligence analysis.

Why and how did you choose to use the NetBeans Platform as the basis of this application?

We have actually been using the NetBeans Platform at Pinkmatter since 2002, back in the days of NetBeans 3.2, when the NetBeans Platform was not really separate from the IDE and the only real documentation for NetBeans Platform users was the source code. Back then Pinkmatter was building a network security management tool we called “Palantir”, which was never released but which would later form the base framework for Maltego. (Ironically one of Maltego’s competitors is now made by a company called Palantir Tech.)

I was using Forte (Sun’s customized version of NetBeans) as my IDE for Java development and realized that I would need very similar features in Palantir – global selection management, runtime composition (i.e., modules), copy/paste/undo/redo, auto-update, property grid, window manager, system palette etc. So I began reading through the sources and building Palantir as a NetBeans module while trying to remove as much of the IDE parts as possible.

I immediately fell in love with its design and complexity (yes, complexity – no matter how long you have been using the NetBeans Platform, there is something new you can learn every day) – but there was a definite beauty to it and I knew that following its architecture guidelines would save me from the certain “spaghetti-death” to which all large UI applications I had seen thus far were doomed from the start.

What are the main advantages of the NetBeans Platform to you?

On a personal level, working with the NetBeans Platform early on in my developer career has shaped my mindset around application design. As such, the NetBeans Platform source code was one of my most influential teachers when it comes to API design and architecture of large complex applications. I started looking for similar patterns in the frameworks I was building using other programming languages and it has helped me identify designs that are “right” and those that are “wrong”. (When it comes to API design I believe that “truth, like beauty, is not a matter of opinion” :-) )

On the level of Maltego, I think the benefits are fairly obvious – there is a platform that comes with lots of “free stuff” right out of the box. And hey, the best thing is, someone else improves, fixes and supports all this free stuff while you can focus on your specific problem domain.

If I were to rephrase the question to read “what in the NetBeans Platform couldn’t I live without?” – well, it would be the features related to runtime composition. The fact that components can be registered declaratively (for example in layer files) and are added as modules that get loaded at runtime shapes the overall design and maintainability and is something a modern application cannot do without.

As Maltego matures, instead of removing the dependency on some NetBeans APIs and replacing them with our own, we tend to use more and more of what the NetBeans Platform (and even the IDE) has to offer. This is a very good indication to me that a) the NetBeans Platform was the right choice to build Maltego on and b) the evolution of the NetBeans Platform is in line with the needs of its users (well, at least for us).

Were there things that pleasantly surprised you while working with the NetBeans Platform?

There were many.... but let’s start with backward compatibility. A lot of the Palantir code from 2002 can still run in NetBeans 6 – that is 3 major versions and 8 years later! – not a small feat to achieve for an API designer.

As another example, for the upcoming Maltego 3.0 we redesigned our underlying information model to allow a user to model entities with a multitude of properties. We needed to allow the user to configure these using many kinds of weird and wonderful type editors... and actually the good old PropertySheet works well for that, can be highly customized and takes up very little screen real estate.

In general I am amazed every time at how efficiently NetBeans can handle so many modules (and merged layer files)!

What could be improved?

Well, I have this gripe with the wizard framework. Although sufficient for the IDE, there is a lot to be desired from wizards when used in other applications. How about re-using wizard panels for editing something in a dialog (panels as tabbed panes for example)? Or quick and dirty mechanisms to disable the Cancel button or intercept it to cancel a background thread? (I know, I know, stop complaining, Chris, and contribute something of that sort – yes... one day when Maltego has grown up and I am no longer working nights.)

But in the end I think that in spite of all the great efforts that have been made, documentation is still a limiting factor when it comes to the adoption and effective use of the NetBeans Platform. There are a number of really good books, blogs and tutorials; however, I feel there is a need for something like “An Architect’s Guide for Designing Applications for the NetBeans Platform” – something that focuses more on core design decisions that have to be made before getting started. For example, “how is your global selection management to work?” and “what mechanisms does the NetBeans Platform provide for that?”

Any tips or tricks for other NetBeans Platform developers?

  • Read every book that has ever been published about the NetBeans Platform.
  • Read and take note of tips published on blogs – you might not need them today but in 6 months' time you will remember that there is a smart way to do something. I check planetnetbeans.org every day for interesting articles.
  • Keep a copy of the NetBeans Platform sources around (you can download them in a handy ZIP file and don’t even have to do a checkout). Whenever there is something that you don’t understand or that seemingly does not work, grep the sources for the relevant classes.
  • Don’t feel you have to make use of NetBeans APIs all the time. Sometimes it makes sense to just use a JTable instead of creating a Node implementation with OutlineView. As that component gets more full featured, you can always refactor it and replace it with a suitable View.
  • The default lookup is your friend!

Finish this sentence: "If I had known..."

Actually, if I had known that it is possible (and easy) to replace the default implementation of ContextGlobalProvider I would have more hair left on my head! (Before I read Tim’s blog entry, activating a TopComponent would amount to changing the global selection – something that is not valid for all applications – and boy did I struggle...)

What's the future of the application?

We are close to releasing Maltego 3.0 – the next big milestone in the life of our beloved baby. This release brings many new features with it, not least of all a slick new look (thanks to some of the beautiful work done by the likes of Gunnar Reinseth, Mikael Tollefsen and Kirill Grouchnikov).

Our ultimate vision is to evolve Maltego into an autonomous information monitoring system – something like an IDS (intrusion detection system), but for information. The threats to organizations (or governments) on the internet are no longer constrained to attacks on their network infrastructure (the origin of the term IDS); information about them, their competitors or employees floating around on the internet can also seriously harm them. Think of it as a highly customizable, intelligent Google Alert, which is fed from the internet as well as private, internal databases. Subsequent releases will bring us closer to that vision with geo-spatial data, time-based analyses and live, real-time data feeds.
February 15, 2010
by Geertjan Wielenga
· 38,797 Views
Checkout Multiple Projects Automatically Into Your Eclipse Workspace With Team Project Sets
When working in Eclipse, you’ll often end up with a number of projects in your workspace that constitute an application. You could have a multi-tiered system with a web, server and database project and other miscellaneous ones. Or if you’re an Eclipse RCP developer, you could end up with dozens of plugins, each represented by a project.

Although multiple projects give you modularity (which is good), they can make it difficult to manage the workspace (which is bad). Developers have to check out each project individually from different locations in the repository. Sometimes they even have to get projects from multiple repositories. This is a painstakingly long and error-prone task.

But an easier way to manage multiple projects is with Eclipse’s Team Project Sets (TPS). Creating a workspace becomes as easy as importing an XML file and waiting for Eclipse to do its job. Yes, there are other more sophisticated tools out there that do this and more (e.g. Maven and Buckminster) but team project sets are a good enough start if you haven’t got anything set up and may be good enough for the longer term as well, depending on how your team works.

Create a Team Project Set to share with other developers

It’s easy to create a team project set (TPS). The first thing is to start with a workspace that already has all the projects checked out. Then it’s as easy as choosing File > Export > Team > Team Project Set, selecting the projects you want to export and then entering a file name. Done.

But it’s always better to see it in action. In the video, I export 3 projects that I’ve already checked out from Subversion into a TPS file.

Notes:
  • You can select which projects should go into the TPS. This way you can exclude irrelevant or personal projects you’ve got in your workspace.
  • Eclipse adds the extension .psf if you don’t provide one.
  • The exported file is an XML file, with the default extension of psf, so in the video the file would be music.psf. There is a project entry for each project you exported that includes the project’s name and its repository location, separated by commas. Once created, the file is easy to edit so go ahead and make your own changes if you want to. Here is an example of what it looks like:

    svn/repo/music-application/trunk,music-application"/>
    svn/repo/music-db/trunk,music-db"/>
    svn/repo/music-web/trunk,music-web"/>

Import the Team Project Set to checkout multiple projects into your workspace

Now for the fun part. To import a team project set (TPS), start with any workspace (normally an empty one) and choose File > Import > Team > Team Project Set. Choose the TPS file that someone else kindly exported for you and then wait for Eclipse to do its magic.

Notes:
  • If you have an existing project in your workspace whose name matches a project in the TPS, Eclipse will ask whether you want to overwrite the project. I always choose No To All, since overwriting the project will mean you lose any changes you made to it. But if you have the urge to start from scratch then you can choose Yes.
  • The import also creates a link to the repository in SVN Repositories, so you don’t have to do that. If one already exists, it will not duplicate it but reuse the existing connection.
  • The process may take a while depending on the number of projects in the TPS and the speed of your repo checkouts. You can choose to run the import in the background (as I did in the video), giving you the opportunity to use Eclipse while the import happens. Otherwise, grab some coffee and wait for it to finish the checkouts.
  • Gotcha: You may find that Eclipse 3.4 and lower may actually create a repository connection per project if the repository didn’t exist beforehand, which is not ideal. To solve this, create an initial repository root that’s shared by the projects and then do the import of the TPS. This problem has been fixed in 3.5.

Managing the team project set and working with branches

I’d recommend checking the team project set into your repository and versioning/tagging it along with the rest of your code base. With each release you may be adding/removing projects and consequently updating the TPS, so it’s important that the TPS matches what the repo looks like at that point.

As projects are added/removed with each release, you have 3 possibilities:
  • Recreate the TPS from an existing workspace: Same as the steps above, but it means that whoever does the export needs to maintain an up to date workspace to reflect the current project structure.
  • Modify an existing TPS with the new/deleted project: This entails adding/removing an entry from the PSF file. Not a lot of maintenance, but someone needs to remember to do this.
  • Automatically create/update the TPS: You could write a script that somehow updates the TPS to reflect the new repo structure. For example, if you’re developing an Eclipse RCP application, the PDE Build provides a map file that could be used as input to create the PSF file.

If you want to checkout a branch other than trunk, just open the PSF file and do a Find/Replace of trunk with your branch name. You could also introduce an automated process as part of your build/release scripts to update the TPS with the correct branch and check it back in automatically, but that’s really optional.

From http://eclipseone.wordpress.com
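The Find/Replace of trunk described in the article above could be automated along these lines. This is a sketch under assumptions: the sample PSF text, the provider id, the 0.9.3 prefix and the example.invalid host are all made up for illustration, and the layout of the reference attribute (comma-separated, with the repository URL before the project name) is assumed from the article's description rather than taken from Eclipse documentation.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical PSF content; real files are produced by
# File > Export > Team > Team Project Set.
SAMPLE_PSF = (
    '<psf version="2.0"><provider id="example.provider">'
    '<project reference="0.9.3,http://example.invalid/svn/repo/music-db/trunk,music-db"/>'
    '<project reference="0.9.3,http://example.invalid/svn/repo/music-web/trunk,music-web"/>'
    '</provider></psf>'
)


def switch_branch(psf_xml, branch_path):
    """Return PSF text whose project references point at branch_path, not trunk.

    Only a whole "/trunk" path segment is replaced, so a project that merely
    has "trunk" inside its name is left alone.
    """
    root = ET.fromstring(psf_xml)
    trunk_segment = re.compile(r"/trunk(?=/|$)")
    for project in root.iter("project"):
        parts = project.get("reference", "").split(",")
        parts = [trunk_segment.sub(lambda m: "/" + branch_path, p) for p in parts]
        project.set("reference", ",".join(parts))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(switch_branch(SAMPLE_PSF, "branches/release-1.0"))
```

Compared with a blind text replacement, parsing the file first keeps the change limited to the project entries, at the cost of reformatting the XML slightly on output.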
February 13, 2010
by Byron M
· 22,830 Views
Java Content Repository: The Best Of Both Worlds
Learn the basics of Java Content Repositories, including how they work, and how they're used.
January 4, 2010
by Bertrand Delacretaz
· 144,355 Views · 5 Likes
How to Create a Java EE 6 Application with JSF 2, EJB 3.1, JPA, and NetBeans IDE 6.8
Develop a web-based app based on technologies in the JEE6 specs such as Enterprise Java Beans 3.1 and JPA with the help of NetBeans IDE 6.8.
December 29, 2009
by Christopher Lam
· 723,076 Views · 3 Likes