The Latest Data Engineering Topics

Getting started with CQEngine: LINQ for Java, Only Faster
CQEngine or collection query engine is a library that allows you to build indices over java collections and query them for objects using exposed properties. It offers similar capability to LINQ in .net but is thought to be faster because it builds indices over collections before querying them and uses set theory instead of iterations. In this post we will see how to query a simple collection of objects, in our example, a collection of users of a hypothetical system, using CQEngine. In a subsequent post, we will also see how iteratively searching a collection compares to querying via CQEngine. The first step is to get the CQEengine jar file. Download the jar from the CQEngine website or if you are using Maven, add the following dependency. com.googlecode.cqengine cqengine 1.0.3 Next, lets create the Class whose object we will be searching for: package co.syntx.examples.cqengine; import com.googlecode.cqengine.attribute.Attribute; import com.googlecode.cqengine.attribute.SimpleAttribute; public class User { private String username; private String password; private String fullname; private Role role; public User(String username, String password, String fullname, Role role) { super(); this.username = username; this.password = password; this.fullname = fullname; this.role = role; } public static final Attribute FULL_NAME = new SimpleAttribute("fullname") { public String getValue(User user) { return user.fullname; } }; public static final Attribute USERNAME = new SimpleAttribute("username") { public String getValue(User user) { return user.username; } }; public String getUsername() { return username; } public void setUsername(String username) { this.username = username; } public String getPassword() { return password; } public void setPassword(String password) { this.password = password; } public String getFullname() { return fullname; } public void setFullname(String fullname) { this.fullname = fullname; } public Role getRole() { return role; } public void setRole(Role role) { this.role = role; } } Next, we write a class, to perform our searches. I will go function by function. 1. Function to Build a Test Indexed Collection: In the following function, we build an indexed collection, define indices on attributes, and populate this collection with a certain number of objects. In actual usage, your collection will probably be filled with objects being returned from the DB, read from a file or other similar scenarios. public void buildIndexedCollection(int size) throws Exception { indexedUsers = CQEngine.newInstance(); indexedUsers.addIndex(HashIndex.onAttribute(User.FULL_NAME)); indexedUsers.addIndex(SuffixTreeIndex.onAttribute(User.FULL_NAME)); for (int i = 0; i < size; i++) { String username = RandomStringGenerator.generateRandomString(8,RandomStringGenerator.Mode.ALPHANUMERIC); String password = RandomStringGenerator.generateRandomString(8,RandomStringGenerator.Mode.ALPHANUMERIC); String fullname = RandomStringGenerator.generateRandomString(5,RandomStringGenerator.Mode.ALPHA) + " " + RandomStringGenerator.generateRandomString(5,RandomStringGenerator.Mode.ALPHA); Role role = new Role(); role.setName("admin"); indexedUsers.add(new User(username, password, fullname, role)); } } In line 3 we are initializing a new Indexed Collection, a reference of which is stored in the class variable indexedUsers. The reference is of type IndexedCollection In lines 4 and 5, we define two indices i) a Hash Index suitable for equal style queries. ii) a Suffix Index suitable for ends with style queried. 
For the purpose of this example, we are building indices only on the full name field. In line 8, we use a random string generator to populate dummy objects. In line 14 we add our object to our indexed collection. 2. Function to Perform Indexed Search for Exact Matches: In this function, we query for names that exactly match a given name. The equal function takes the attribute upon which to perform the query and the value to search for. The method equal is statically imported via import static com.googlecode.cqengine.query.QueryFactory.*; In the example below, we loop over the results returned by the retrieve method without doing anything with them. In your case, you may choose to return the Iterator returned by retrieve. public void indexedSearchForEquals(String fullname) throws Exception { Query query = equal(User.FULL_NAME, fullname); for (User user : indexedUsers.retrieve(query)) { // System.out.println(user.getFullname()); } } 3. Function to Perform Indexed Search for Ends With Matches: In this function, we query for names that end with a certain suffix. public void indexedSearchForEndsWith(String endswith) throws Exception { Query query1 = endsWith(User.FULL_NAME, endswith); for (User user : indexedUsers.retrieve(query1)) { // System.out.println(user.getFullname()); } } 4. Function to Perform Indexed Search for Equals or Ends With Matches: This function combines both queries described above with an or relation between them. public void indexedSearchForEqualOrEndsWith(String equals, String ends) throws Exception { Query query = or(equal(User.FULL_NAME, equals), endsWith(User.FULL_NAME, ends)); for (User user : indexedUsers.retrieve(query)) { // System.out.println(user.getFullname()); } } 5. Putting It Together: All the functions above belong to a class called CQEngineTest. We create a new object, build a test collection, then search for exact matches, strings that end with a certain suffix, or either. CQEngineTest test = new CQEngineTest(); test.buildIndexedCollection(size); test.indexedSearchForEqualOrEndsWith("test", "test"); In this example, we have used a Hash Index and a Suffix Tree Index. There are many other types of indices that you can choose depending on the type of query that you want to perform. A list of these indices and when to use them can be found on the CQEngine project page. Also, apart from equal or endsWith operations, there are other operations that you would typically expect to find. In a subsequent post we will also see how an indexed search compares with a typical iterative search in terms of timing.
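For comparison, here is a minimal sketch (not from the original article) of the kind of un-indexed, iterative search the follow-up post alludes to. It assumes nothing beyond a plain java.util.Collection<User> and the User class defined above, and it is the baseline an indexed CQEngine query is meant to beat:

// Baseline: walk the whole collection and filter by hand.
public List<User> iterativeSearchForEqualOrEndsWith(Collection<User> users, String equals, String ends) {
    List<User> matches = new ArrayList<User>();
    for (User user : users) {
        String fullname = user.getFullname();
        // Same predicate as the indexed "equal or endsWith" query above.
        if (fullname.equals(equals) || fullname.endsWith(ends)) {
            matches.add(user);
        }
    }
    return matches;
}

Every call scans the entire collection, so the cost grows linearly with its size, whereas the HashIndex and SuffixTreeIndex defined earlier can answer the same queries from pre-built index structures.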
August 5, 2013
by Faheem Sohail
· 27,304 Views · 1 Like
JPA Searching Using Lucene - A Working Example with Spring and DBUnit
Working Example on Github There's a small, self contained mavenised example project over on Github to accompany this post - check it out here:https://github.com/adrianmilne/jpa-lucene-spring-demo Running the Demo See the README file over on GitHub for details of running the demo. Essentially - it's just running the Unit Tests, with the usual maven build and test results output to the console - example below. This is the result of running the DBUnit test, which inserts Book data into the HSQL database using JPA, and then uses Lucene to query the data, testing that the expected Books are returned (i.e. only those int he SCI-FI category, containing the word 'Space', and ensuring that any with 'Space' in the title appear before those with 'Space' only in the description. The Book Entity Our simple example stores Books. The Book entity class below is a standard JPA Entity with a few additional annotations to identify it to Lucene: @Indexed - this identifies that the class will be added to the Lucene index. You can define a specific index by adding the 'index' attribute to the annotation. We're just choosing the simplest, minimal configuration for this example. In addition to this - you also need to specify which properties on the entity are to be indexed, and how they are to be indexed. For our example we are again going for the default option by just adding an @Field annotation with no extra parameters. We are adding one other annotation to the 'title' field - @Boost - this is just telling Lucene to give more weight to search term matches that appear in this field (than the same term appearing in the description field). This example is purposefully kept minimal in terms of the ins-and-outs of Lucene (I may cover that in a later post) - we're really just concentrating on the integration with JPA and Spring for now. package com.cor.demo.jpa.entity; import javax.persistence.Entity; import javax.persistence.EnumType; import javax.persistence.Enumerated; import javax.persistence.GeneratedValue; import javax.persistence.Id; import javax.persistence.Lob; import org.hibernate.search.annotations.Boost; import org.hibernate.search.annotations.Field; import org.hibernate.search.annotations.Indexed; /** * Book JPA Entity. */ @Entity @Indexed public class Book { @Id @GeneratedValue private Long id; @Field @Boost(value = 1.5f) private String title; @Field @Lob private String description; @Field @Enumerated(EnumType.STRING) private BookCategory category; public Book(){ } public Book(String title, BookCategory category, String description){ this.title = title; this.category = category; this.description = description; } public Long getId() { return id; } public void setId(Long id) { this.id = id; } public String getTitle() { return title; } public void setTitle(String title) { this.title = title; } public BookCategory getCategory() { return category; } public void setCategory(BookCategory category) { this.category = category; } public String getDescription() { return description; } public void setDescription(String description) { this.description = description; } @Override public String toString() { return "Book [id=" + id + ", title=" + title + ", description=" + description + ", category=" + category + "]"; } } The Book Manager The BookManager class acts as a simple service layer for the Book operations - used for adding books and searching books. As you can see, the JPA database resources are autowired in by Spring from the application-context.xml. 
We are just using an in-memory hsql database in this example. package com.cor.demo.jpa.manager; import java.util.List; import javax.persistence.EntityManager; import javax.persistence.PersistenceContext; import javax.persistence.PersistenceContextType; import javax.persistence.Query; import org.hibernate.search.jpa.FullTextEntityManager; import org.hibernate.search.jpa.Search; import org.hibernate.search.query.dsl.QueryBuilder; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.context.annotation.Scope; import org.springframework.stereotype.Component; import org.springframework.transaction.annotation.Transactional; import com.cor.demo.jpa.entity.Book; import com.cor.demo.jpa.entity.BookCategory; /** * Manager for persisting and searching on Books. Uses JPA and Lucene. */ @Component @Scope(value = "singleton") public class BookManager { /** Logger. */ private static Logger LOG = LoggerFactory.getLogger(BookManager.class); /** JPA Persistence Unit. */ @PersistenceContext(type = PersistenceContextType.EXTENDED, name = "booksPU") private EntityManager em; /** Hibernate Full Text Entity Manager. */ private FullTextEntityManager ftem; /** * Method to manually update the Full Text Index. This is not required if inserting entities * using this Manager as they will automatically be indexed. Useful though if you need to index * data inserted using a different method (e.g. pre-existing data, or test data inserted via * scripts or DbUnit). */ public void updateFullTextIndex() throws Exception { LOG.info("Updating Index"); getFullTextEntityManager().createIndexer().startAndWait(); } /** * Add a Book to the Database. */ @Transactional public Book addBook(Book book) { LOG.info("Adding Book : " + book); em.persist(book); return book; } /** * Delete All Books. */ @SuppressWarnings("unchecked") @Transactional public void deleteAllBooks() { LOG.info("Delete All Books"); Query allBooks = em.createQuery("select b from Book b"); List books = allBooks.getResultList(); // We need to delete individually (rather than a bulk delete) to ensure they are removed // from the Lucene index correctly for (Book b : books) { em.remove(b); } } @SuppressWarnings("unchecked") @Transactional public void listAllBooks() { LOG.info("List All Books"); LOG.info("------------------------------------------"); Query allBooks = em.createQuery("select b from Book b"); List books = allBooks.getResultList(); for (Book b : books) { LOG.info(b.toString()); getFullTextEntityManager().index(b); } } /** * Search for a Book. 
*/ @SuppressWarnings("unchecked") @Transactional public List search(BookCategory category, String searchString) { LOG.info("------------------------------------------"); LOG.info("Searching Books in category '" + category + "' for phrase '" + searchString + "'"); // Create a Query Builder QueryBuilder qb = getFullTextEntityManager().getSearchFactory().buildQueryBuilder().forEntity(Book.class).get(); // Create a Lucene Full Text Query org.apache.lucene.search.Query luceneQuery = qb.bool() .must(qb.keyword().onFields("title", "description").matching(searchString).createQuery()) .must(qb.keyword().onField("category").matching(category).createQuery()).createQuery(); Query fullTextQuery = getFullTextEntityManager().createFullTextQuery(luceneQuery, Book.class); // Run Query and print out results to console List result = (List) fullTextQuery.getResultList(); // Log the Results LOG.info("Found Matching Books :" + result.size()); for (Book b : result) { LOG.info(" - " + b); } return result; } /** * Convenience method to get Full Test Entity Manager. Protected scope to assist mocking in Unit * Tests. * @return Full Text Entity Manager. */ protected FullTextEntityManager getFullTextEntityManager() { if (ftem == null) { ftem = Search.getFullTextEntityManager(em); } return ftem; } /** * Get the JPA Entity Manager (required for the DBUnit Tests). * @return Entity manager */ protected EntityManager getEntityManager() { return em; } /** * Sets the JPA Entity Manager (required to assist with mocking in Unit Test) * @param em EntityManager */ protected void setEntityManager(EntityManager em) { this.em = em; } } application-context.xml This is the Spring configuration file. You can see in the JPA Entity Manager configuration the key for 'hibernate.search.default.indexBase' is added to the jpaPropertyMap to tell Lucene where to create the index. We have also externalised the database login credentials to a properties file (as you may wish to change these for different environments), for example by updating the propertyConfigurer to look for and use a different external properties if it finds one on the file system). classpath:/system.properties Testing Using DBUnit In the project is an example of using DBUnit with Spring to test adding and searching against the database using DBUnit to populate the database with test data, exercise the Book Manager search operations and then clean the database down. This is a great way to test database functionality and can be easily integrated into maven and continuous build environments. Because DBUnit bypasses the standard JPA insertion calls - the data does not get automatically added to the Lucene index. We have a method exposed on the service interface to update the Full Text index 'updateFullTextIndex()' - calling this causes Lucene to update the index with the current data in the database. This can be useful when you are adding search to pre-populated databases to index the existing content. 
package com.cor.demo.jpa.manager; import java.io.InputStream; import java.util.List; import org.dbunit.DBTestCase; import org.dbunit.database.DatabaseConnection; import org.dbunit.database.IDatabaseConnection; import org.dbunit.dataset.IDataSet; import org.dbunit.dataset.xml.FlatXmlDataSetBuilder; import org.dbunit.operation.DatabaseOperation; import org.hibernate.impl.SessionImpl; import org.junit.After; import org.junit.Before; import org.junit.Test; import org.junit.runner.RunWith; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.test.context.ContextConfiguration; import org.springframework.test.context.junit4.SpringJUnit4ClassRunner; import com.cor.demo.jpa.entity.Book; import com.cor.demo.jpa.entity.BookCategory; /** * DBUnit Test - loads data defined in 'test-data-set.xml' into the database to run tests against the * BookManager. More thorough (and ultimately easier in this context) than using mocks. */ @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration(locations = { "classpath:/application-context.xml" }) public class BookManagerDBUnitTest extends DBTestCase { /** Logger. */ private static Logger LOG = LoggerFactory.getLogger(BookManagerDBUnitTest.class); /** Book Manager Under Test. */ @Autowired private BookManager bookManager; @Before public void setup() throws Exception { DatabaseOperation.CLEAN_INSERT.execute(getDatabaseConnection(), getDataSet()); } @After public void tearDown() { deleteBooks(); } @Override protected IDataSet getDataSet() throws Exception { InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream("test-data-set.xml"); FlatXmlDataSetBuilder builder = new FlatXmlDataSetBuilder(); return builder.build(inputStream); } /** * Get the underlying database connection from the JPA Entity Manager (DBUnit needs this connection). * @return Database Connection * @throws Exception */ private IDatabaseConnection getDatabaseConnection() throws Exception { return new DatabaseConnection(((SessionImpl) (bookManager.getEntityManager().getDelegate())).connection()); } /** * Tests the expected results for searching for 'Space' in SCF-FI books. */ @Test public void testSciFiBookSearch() throws Exception { bookManager.listAllBooks(); bookManager.updateFullTextIndex(); List results = bookManager.search(BookCategory.SCIFI, "Space"); assertEquals("Expected 2 results for SCI FI search for 'Space'", 2, results.size()); assertEquals("Expected 1st result to be '2001: A Space Oddysey'", "2001: A Space Oddysey", results.get(0).getTitle()); assertEquals("Expected 2nd result to be 'Apollo 13'", "Apollo 13", results.get(1).getTitle()); } private void deleteBooks() { LOG.info("Deleting Books...-"); bookManager.deleteAllBooks(); } } The source data for the test is defined in an xml file.
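As a rough usage sketch (not part of the original project — the Spring lookup and the book titles here are illustrative assumptions), the pieces above fit together like this outside the DBUnit test:

// Hypothetical usage of the BookManager defined above, assuming it is obtained
// from the Spring context configured in application-context.xml.
BookManager bookManager = applicationContext.getBean(BookManager.class);

// Books persisted through the manager are indexed by Hibernate Search automatically.
bookManager.addBook(new Book("2001: A Space Odyssey", BookCategory.SCIFI, "A monolith and a long voyage through space"));
bookManager.addBook(new Book("Solaris", BookCategory.SCIFI, "A crew studies a strange ocean planet"));

// Only needed when data was inserted behind JPA's back (e.g. via DBUnit or SQL scripts).
bookManager.updateFullTextIndex();

List<Book> results = bookManager.search(BookCategory.SCIFI, "Space");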
August 5, 2013
by Adrian Milne
· 31,021 Views
What Is NoSQL?
Dan McCreary and Ann Kelly, authors of 'Making Sense of NoSQL,' discuss the business drivers and motivations that make NoSQL so popular with organizations today.
August 1, 2013
by Eric Gregory
· 20,876 Views · 4 Likes
Jersey Client: Testing External Calls
Jim and I have been doing a bit of work over the last week which involved calling neo4j’s HA status URI to check whether or not an instance was a master/slave and we’ve been using jersey-client. The code looked roughly like this: class Neo4jInstance { private Client httpClient; private URI hostname; public Neo4jInstance(Client httpClient, URI hostname) { this.httpClient = httpClient; this.hostname = hostname; } public Boolean isSlave() { String slaveURI = hostname.toString() + ":7474/db/manage/server/ha/slave"; ClientResponse response = httpClient.resource(slaveURI).accept(TEXT_PLAIN).get(ClientResponse.class); return Boolean.parseBoolean(response.getEntity(String.class)); } } While writing some tests against this code we wanted to stub out the actual calls to the HA slave URI so we could simulate both conditions and a brief search suggested that mockito was the way to go. We ended up with a test that looked like this: @Test public void shouldIndicateInstanceIsSlave() { Client client = mock( Client.class ); WebResource webResource = mock( WebResource.class ); WebResource.Builder builder = mock( WebResource.Builder.class ); ClientResponse clientResponse = mock( ClientResponse.class ); when( builder.get( ClientResponse.class ) ).thenReturn( clientResponse ); when( clientResponse.getEntity( String.class ) ).thenReturn( "true" ); when( webResource.accept( anyString() ) ).thenReturn( builder ); when( client.resource( anyString() ) ).thenReturn( webResource ); Boolean isSlave = new Neo4jInstance(client, URI.create("http://localhost")).isSlave(); assertTrue(isSlave); } which is pretty gnarly but does the job. I thought there must be a better way so I continued searching and eventually came across this post on the mailing list which suggested creating a custom ClientHandler and stubbing out requests/responses there. 
I had a go at doing that and wrapped it with a little DSL that only covers our very specific use case: private static ClientBuilder client() { return new ClientBuilder(); } static class ClientBuilder { private String uri; private int statusCode; private String content; public ClientBuilder requestFor(String uri) { this.uri = uri; return this; } public ClientBuilder returns(int statusCode) { this.statusCode = statusCode; return this; } public Client create() { return new Client() { public ClientResponse handle(ClientRequest request) throws ClientHandlerException { if (request.getURI().toString().equals(uri)) { InBoundHeaders headers = new InBoundHeaders(); headers.put("Content-Type", asList("text/plain")); return createDummyResponse(headers); } throw new RuntimeException("No stub defined for " + request.getURI()); } }; } private ClientResponse createDummyResponse(InBoundHeaders headers) { return new ClientResponse(statusCode, headers, new ByteArrayInputStream(content.getBytes()), messageBodyWorkers()); } private MessageBodyWorkers messageBodyWorkers() { return new MessageBodyWorkers() { public Map> getReaders(MediaType mediaType) { return null; } public Map> getWriters(MediaType mediaType) { return null; } public String readersToString(Map> mediaTypeListMap) { return null; } public String writersToString(Map> mediaTypeListMap) { return null; } public MessageBodyReader getMessageBodyReader(Class tClass, Type type, Annotation[] annotations, MediaType mediaType) { return (MessageBodyReader) new StringProvider(); } public MessageBodyWriter getMessageBodyWriter(Class tClass, Type type, Annotation[] annotations, MediaType mediaType) { return null; } public List getMessageBodyWriterMediaTypes(Class tClass, Type type, Annotation[] annotations) { return null; } public MediaType getMessageBodyWriterMediaType(Class tClass, Type type, Annotation[] annotations, List mediaTypes) { return null; } }; } public ClientBuilder content(String content) { this.content = content; return this; } } If we change our test to use this code it now looks like this: @Test public void shouldIndicateInstanceIsSlave() { Client client = client().requestFor("http://localhost:7474/db/manage/server/ha/slave"). returns(200). content("true"). create(); Boolean isSlave = new Neo4jInstance(client, URI.create("http://localhost")).isSlave(); assertTrue(isSlave); } Is there a better way? In Ruby I’ve used WebMock to achieve this and Ashok pointed me towards WebStub which looks nice except I’d need to pass in the hostname + port rather than constructing that in the code.
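The same DSL also covers the opposite case. Here is a sketch (not in the original post) that stubs the HA endpoint to return "false" and asserts the instance is not a slave:

@Test
public void shouldIndicateInstanceIsNotSlave() {
    // Reuse the stubbing DSL above to simulate a master responding "false".
    Client client = client().requestFor("http://localhost:7474/db/manage/server/ha/slave").
            returns(200).
            content("false").
            create();

    Boolean isSlave = new Neo4jInstance(client, URI.create("http://localhost")).isSlave();

    assertFalse(isSlave);
}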
August 1, 2013
by Mark Needham
· 10,577 Views
AWS: Attaching an EBS volume to an EC2 instance and making it available for use
I recently wanted to attach an EBS volume to an existing EC2 instance that I had running and since it was for a one off tasks (famous last words) I decided to configure it manually. I created the EBS volume through the AWS console and one thing that initially caught me out is that the EC2 instance and EBS volume need to be in the same region and zone. Therefore if I create my EC2 instance in ‘eu-west-1b’ then I need to create my EBS volume in ‘eu-west-1b’ as well otherwise I won’t be able to attach it to that instance. I attached the device as /dev/sdf although the UI gives the following warning: Linux Devices: /dev/sdf through /dev/sdp Note: Newer linux kernels may rename your devices to /dev/xvdf through /dev/xvdp internally, even when the device name entered here (and shown in the details) is /dev/sdf through /dev/sdp. After attaching the EBS volume to the EC2 instance my next step was to SSH onto my EC2 instance and make the EBS volume available. The first step is to create a file system on the volume: $ sudo mkfs -t ext3 /dev/sdf mke2fs 1.42 (29-Nov-2011) Could not stat /dev/sdf --- No such file or directory The device apparently does not exist; did you specify it correctly? It turns out that warning was handy and the device has in fact been renamed. We can confirm this by callingfdisk: $ sudo fdisk -l Disk /dev/xvda1: 8589 MB, 8589934592 bytes 255 heads, 63 sectors/track, 1044 cylinders, total 16777216 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00000000 Disk /dev/xvda1 doesn't contain a valid partition table Disk /dev/xvdf: 53.7 GB, 53687091200 bytes 255 heads, 63 sectors/track, 6527 cylinders, total 104857600 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00000000 Disk /dev/xvdf doesn't contain a valid partition table /dev/xvdf is the one we’re interested in so I re-ran the previous command: $ sudo mkfs -t ext3 /dev/xvdf mke2fs 1.42 (29-Nov-2011) Filesystem label= OS type: Linux Block size=4096 (log=2) Fragment size=4096 (log=2) Stride=0 blocks, Stripe width=0 blocks 3276800 inodes, 13107200 blocks 655360 blocks (5.00%) reserved for the super user First data block=0 Maximum filesystem blocks=4294967296 400 block groups 32768 blocks per group, 32768 fragments per group 8192 inodes per group Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624, 11239424 Allocating group tables: done Writing inode tables: done Creating journal (32768 blocks): done Writing superblocks and filesystem accounting information: done Once I’d done that I needed to create a mount point for the volume and I thought the best place was probably a directory under /mnt: $ sudo mkdir /mnt/ebs The final step is to mount the volume: $ sudo mount /dev/xvdf /mnt/ebs And if we run df we can see that it’s ready to go: $ df -h Filesystem Size Used Avail Use% Mounted on /dev/xvda1 7.9G 883M 6.7G 12% / udev 288M 8.0K 288M 1% /dev tmpfs 119M 164K 118M 1% /run none 5.0M 0 5.0M 0% /run/lock none 296M 0 296M 0% /run/shm /dev/xvdf 50G 180M 47G 1% /mnt/ebs
July 31, 2013
by Mark Needham
· 11,586 Views
OLAP Operation in R
OLAP (Online Analytical Processing) is a very common way to analyze raw transaction data by aggregating along different combinations of dimensions. This is a well-established field in business intelligence and reporting. In this post, I will highlight the key ideas in OLAP operations and illustrate how to perform them in R. Facts and Dimensions The core of OLAP is the so-called "multi-dimensional data model", which contains two types of tables: "fact" tables and "dimension" tables. A fact table contains records that each describe an instance of a transaction. Each transaction record contains categorical attributes (which describe contextual aspects of the transaction, such as space, time, and user) as well as numeric attributes (called "measures", which describe quantitative aspects of the transaction, such as number of items sold or dollar amount). A dimension table contains records that further elaborate the contextual attributes, such as user profile data, location details, etc. In a typical multi-dimensional model, each fact table contains foreign keys that reference the primary keys of multiple dimension tables. In its simplest form, this is called a STAR schema. Dimension tables can contain foreign keys that reference other dimension tables, which provides a more detailed breakdown of the contextual aspects; this is called a SNOWFLAKE schema. As a general (though not hard) rule, fact tables are independent of one another and usually don't contain reference pointers among each other, but different fact tables usually share the same set of dimension tables; this is called a GALAXY schema. It is a hard rule, however, that a dimension table never references a fact table. A simple STAR schema is shown in the following diagram. Each dimension can also be hierarchical, so that the analysis can be done at different degrees of granularity. For example, the time dimension can be broken down into days, weeks, months, quarters and years; similarly, the location dimension can be broken down into countries, states, cities, etc. Here we first create a sales fact table that records each sales transaction.
# Setup the dimension tables state_table <- data.frame(key=c("CA", "NY", "WA", "ON", "QU"), name=c("California", "new York", "Washington", "Ontario", "Quebec"), country=c("USA", "USA", "USA", "Canada", "Canada")) month_table <- data.frame(key=1:12, desc=c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), quarter=c("Q1","Q1","Q1","Q2","Q2","Q2","Q3","Q3","Q3","Q4","Q4","Q4")) prod_table <- data.frame(key=c("Printer", "Tablet", "Laptop"), price=c(225, 570, 1120)) # Function to generate the Sales table gen_sales <- function(no_of_recs) { # Generate transaction data randomly loc <- sample(state_table$key, no_of_recs, replace=T, prob=c(2,2,1,1,1)) time_month <- sample(month_table$key, no_of_recs, replace=T) time_year <- sample(c(2012, 2013), no_of_recs, replace=T) prod <- sample(prod_table$key, no_of_recs, replace=T, prob=c(1, 3, 2)) unit <- sample(c(1,2), no_of_recs, replace=T, prob=c(10, 3)) amount <- unit*prod_table[prod,]$price sales <- data.frame(month=time_month, year=time_year, loc=loc, prod=prod, unit=unit, amount=amount) # Sort the records by time order sales <- sales[order(sales$year, sales$month),] row.names(sales) <- NULL return(sales) } # Now create the sales fact table sales_fact <- gen_sales(500) # Look at a few records head(sales_fact) month year loc prod unit amount 1 1 2012 NY Laptop 1 225 2 1 2012 CA Laptop 2 450 3 1 2012 ON Tablet 2 2240 4 1 2012 NY Tablet 1 1120 5 1 2012 NY Tablet 2 2240 6 1 2012 CA Laptop 1 225 Multi-dimensional Cube Now, we turn this fact table into a hypercube with multiple dimensions. Each cell in the cube represents an aggregate value for a unique combination of each dimension. # Build up a cube revenue_cube <- tapply(sales_fact$amount, sales_fact[,c("prod", "month", "year", "loc")], FUN=function(x){return(sum(x))}) # Showing the cells of the cude revenue_cube , , year = 2012, loc = CA month prod 1 2 3 4 5 6 7 8 9 10 11 12 Laptop 1350 225 900 675 675 NA 675 1350 NA 1575 900 1350 Printer NA 2280 NA NA 1140 570 570 570 NA 570 1710 NA Tablet 2240 4480 12320 3360 2240 4480 3360 3360 5600 2240 2240 3360 , , year = 2013, loc = CA month prod 1 2 3 4 5 6 7 8 9 10 11 12 Laptop 225 225 450 675 225 900 900 450 675 225 675 1125 Printer NA 1140 NA 1140 570 NA NA 570 NA 1140 1710 1710 Tablet 3360 3360 1120 4480 2240 1120 7840 3360 3360 1120 5600 4480 , , year = 2012, loc = NY month prod 1 2 3 4 5 6 7 8 9 10 11 12 Laptop 450 450 NA NA 675 450 675 NA 225 225 NA 450 Printer NA 2280 NA 2850 570 NA NA 1710 1140 NA 570 NA Tablet 3360 13440 2240 2240 2240 5600 5600 3360 4480 3360 4480 3360 , , year = 2013, loc = NY ..... dimnames(revenue_cube) $prod [1] "Laptop" "Printer" "Tablet" $month [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" $year [1] "2012" "2013" $loc [1] "CA" "NY" "ON" "QU" "WA" OLAP Operations Here are some common operations of OLAP Slice Dice Rollup Drilldown Pivot "Slice" is about fixing certain dimensions to analyze the remaining dimensions. For example, we can focus in the sales happening in "2012", "Jan", or we can focus in the sales happening in "2012", "Jan", "Tablet". # Slice # cube data in Jan, 2012 revenue_cube[, "1", "2012",] loc prod CA NY ON QU WA Laptop 1350 450 NA 225 225 Printer NA NA NA 1140 NA Tablet 2240 3360 5600 1120 2240 # cube data in Jan, 2012 revenue_cube["Tablet", "1", "2012",] CA NY ON QU WA 2240 3360 5600 1120 2240 "Dice" is about limited each dimension to a certain range of values, while keeping the number of dimensions the same in the resulting cube. 
For example, we can focus in sales happening in [Jan/ Feb/Mar, Laptop/Tablet, CA/NY]. revenue_cube[c("Tablet","Laptop"), c("1","2","3"), , c("CA","NY")] , , year = 2012, loc = CA month prod 1 2 3 Tablet 2240 4480 12320 Laptop 1350 225 900 , , year = 2013, loc = CA month prod 1 2 3 Tablet 3360 3360 1120 Laptop 225 225 450 , , year = 2012, loc = NY month prod 1 2 3 Tablet 3360 13440 2240 Laptop 450 450 NA , , year = 2013, loc = NY month prod 1 2 3 Tablet 3360 4480 6720 Laptop 450 NA 225 "Rollup" is about applying an aggregation function to collapse a number of dimensions. For example, we want to focus in the annual revenue for each product and collapse the location dimension (ie: we don't care where we sold our product). apply(revenue_cube, c("year", "prod"), FUN=function(x) {return(sum(x, na.rm=TRUE))}) prod year Laptop Printer Tablet 2012 22275 31350 179200 2013 25200 33060 166880 "Drilldown" is the reverse of "rollup" and applying an aggregation function to a finer level of granularity. For example, we want to focus in the annual and monthly revenue for each product and collapse the location dimension (ie: we don't care where we sold our product). apply(revenue_cube, c("year", "month", "prod"), FUN=function(x) {return(sum(x, na.rm=TRUE))}) , , prod = Laptop month year 1 2 3 4 5 6 7 8 9 10 11 12 2012 2250 2475 1575 1575 2250 1800 1575 1800 900 2250 1350 2475 2013 2250 900 1575 1575 2250 2475 2025 1800 2025 2250 3825 2250 , , prod = Printer month year 1 2 3 4 5 6 7 8 9 10 11 12 2012 1140 5700 570 3990 4560 2850 1140 2850 2850 1710 3420 570 2013 1140 4560 3420 4560 2850 1140 570 3420 1140 3420 3990 2850 , , prod = Tablet month year 1 2 3 4 5 6 7 8 9 10 11 12 2012 14560 23520 17920 12320 10080 14560 13440 15680 25760 12320 11200 7840 2013 8960 11200 10080 7840 14560 10080 29120 15680 15680 8960 12320 22400 "Pivot" is about analyzing the combination of a pair of selected dimensions. For example, we want to analyze the revenue by year and month. Or we want to analyze the revenue by product and location. apply(revenue_cube, c("year", "month"), FUN=function(x) {return(sum(x, na.rm=TRUE))}) month year 1 2 3 4 5 6 7 8 9 10 11 12 2012 17950 31695 20065 17885 16890 19210 16155 20330 29510 16280 15970 10885 2013 12350 16660 15075 13975 19660 13695 31715 20900 18845 14630 20135 27500 apply(revenue_cube, c("prod", "loc"), FUN=function(x) {return(sum(x, na.rm=TRUE))}) loc prod CA NY ON QU WA Laptop 16425 9450 7650 7425 6525 Printer 15390 19950 7980 10830 10260 Tablet 90720 117600 45920 34720 57120 I hope you can get a taste of the richness of data processing model in R. However, since R is doing all the processing in RAM. This requires your data to be small enough so it can fit into the local memory in a single machine.
July 30, 2013
by Ricky Ho
· 17,551 Views · 3 Likes
JMS vs RabbitMQ
Definition: JMS (Java Message Service) is an API, part of Java EE, for sending messages between two or more clients. There are many JMS providers, such as OpenMQ (GlassFish's default), HornetQ (JBoss), and ActiveMQ. RabbitMQ is an open source message broker that implements the AMQP standard and is written in Erlang. Messaging model: JMS supports two models: point-to-point and publish/subscribe. RabbitMQ follows the AMQP model, which defines four exchange types: direct, fanout, topic, and headers. Data types: JMS supports five different message types, but RabbitMQ carries only binary payloads. Workflow strategy: In AMQP, producers send to an exchange, which routes messages to queues; in JMS, producers send to the queue or topic directly. Technology compatibility: JMS is specific to Java clients, while RabbitMQ has client libraries for many technologies. Performance: If you would like to know more about their performance, this benchmark is a good place to start, but look for others as well.
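To make the JMS side of the comparison concrete, here is a minimal point-to-point send sketch using the standard javax.jms API (the ConnectionFactory and Queue are assumed to be supplied by whichever provider — OpenMQ, HornetQ, ActiveMQ — you are using):

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

public class JmsSendExample {

    // Sends a single text message to a queue (the JMS point-to-point model).
    public static void send(ConnectionFactory factory, Queue queue, String text) throws Exception {
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            TextMessage message = session.createTextMessage(text);
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}

In RabbitMQ the equivalent publish would go through an exchange rather than directly to a queue, which is the workflow difference noted above.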
July 30, 2013
by Saeid Siavashi
· 51,346 Views · 16 Likes
Why String is Immutable in Java
This is an old yet still popular question, and there are multiple reasons that String is designed to be immutable in Java. A good answer depends on a good understanding of memory, synchronization, data structures, etc. In the following, I will summarize some of them. 1. Requirement of the string pool. The string pool (string intern pool) is a special storage area in the Java heap. When a string is created and an identical string already exists in the pool, the reference to the existing string is returned instead of creating a new object and returning its reference. The following code creates only one string object in the heap. String string1 = "abcd"; String string2 = "abcd"; Here is how it looks: if String were not immutable, changing the string through one reference would lead to the wrong value for the other references. 2. Allowing String to cache its hashcode. The hashcode of a String is used frequently in Java, for example in a HashMap. Being immutable guarantees that the hashcode will always be the same, so it can be cached without worrying about changes. That means there is no need to calculate the hashcode every time it is used, which is more efficient. 3. Security. String is widely used as a parameter for many Java classes, e.g. network connections, opening files, etc. Were String not immutable, a connection or file could be changed, leading to a serious security threat: the method thought it was connecting to one machine, but it was not. Mutable strings could cause security problems in reflection too, as the parameters are strings. Here is a code example: boolean connect(String s){ if (!isSecure(s)) { throw new SecurityException(); } // this would cause a problem if s were changed before this point via another reference. causeProblem(s); } In summary, the reasons include design, efficiency, and security. Actually, this is also true for many other "why" questions in a Java interview.
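Here is a small sketch of the string pool and hashcode-caching behavior described above (standard Java, nothing beyond what the JDK guarantees):

public class StringPoolDemo {
    public static void main(String[] args) {
        String string1 = "abcd";
        String string2 = "abcd";

        // Both literals resolve to the same pooled instance, so reference equality holds.
        System.out.println(string1 == string2);                        // true

        // Explicit construction bypasses the pool and creates a distinct object.
        System.out.println(new String("abcd") == string1);             // false

        // hashCode() is derived from the immutable contents, so caching it is safe.
        System.out.println(string1.hashCode() == string2.hashCode());  // true
    }
}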
July 29, 2013
by Ryan Wang
· 216,907 Views · 9 Likes
Stepping through LMDB: Making Everything Easier
okay, i know that i have been critical about the lmdb codebase so far. but one thing that i really want to point out for it is that it was pretty easy to actually get things working on windows. it wasn’t smooth, in the sense that i had to muck around with the source a bit (hack endianess, remove a bunch of unix specific header files, etc). but that took less than an hour, and it was pretty much it. since i am by no means an experienced c developer, i consider this a major win. compare that to leveldb, which flat out won’t run on windows no matter how much time i spent trying, and it is a pleasure . also, stepping through the code i am starting to get a sense of how it works that is much different than the one i had when i just read the code. it is like one of those 3d images, you suddenly see something. the first thing that became obvious is that i totally missed the significance of the lock file. lmdb actually create two files: lock.mdb data.mdb lock.mdb is used to synchronized data between different readers. it seems to mostly be there if you want to have multiple writers using different processes. that is a very interesting model for an embedded database, i’ve to admit. not something that i think other embedded databases are offering. in order to do that, it create two named mutexes (one for read and one for write). a side note on windows support: lmdb supports windows, but it is very much a 2nd class citizen. you can see it in things like path not found error turning into a no such process error (because it try to use getlasterror() codes as c codes), or when it doesn’t create a directory even though not creating it would fail. i am currently debugging through the code and fixing such issues as i go along (but no, i am doing heavy handed magic fixes, just to get past this stage to the next one, otherwise i would have sent a pull request). here is one such example. here is the original code: but readfile in win32 will return false if the file is empty, so you actually need to write something like this to make the code work: past that hurdle, i think that i get a lot more about what is going on with the way lmdb works than before. let us start with the way data.mdb works. it is important to note that for pretty much everything in lmdb we use the system page size. by default, that is 4kb. the data file starts with 2 pages allocated. those page contain the following information: looking back at how couchdb did things, i am pretty sure that those two pages are going to be pretty important . i am guess that they would always contain the root of the data in the file. there is also the last transaction on them, which is what i imagine determine how something gets committed. i don’t know yet, as i said, guessing based on how couchdb works. i’ll continue this review in another time. next time, transactions…
July 27, 2013
by Oren Eini
· 8,944 Views
Displaying and Searching std::map Contents in WinDbg
This time we’re up for a bigger challenge. We want to automatically display and possibly search and filter std::map objects in WinDbg. The script for std::vectors was relatively easy because of the flat structure of the data in a vector; maps are more complex beasts. Specifically, an map in the Visual C++ STL is implemented as a red-black tree. Each tree node has three important pointers: _Left, _Right, and _Parent. Additionally, each node has a _Myval field that contains the std::pair with the key and value represented by the node. Iterating a tree structure requires recursion, and WinDbg scripts don’t have any syntax to define functions. However, we can invoke a script recursively – a script is allowed to contain the $$>a< command that invokes it again with a different set of arguments. The path to the script is also readily available in ${$arg0}. Before I show you the script, there’s just one little challenge I had to deal with. When you call a script recursively, the values of the pseudo-registers (like $t0) will be clobbered by the recursive invocation. I was on the verge of allocating memory dynamically or calling into a shell process to store and load variables, when I stumbled upon the .push and .pop commands, which store the register context and load it, respectively. These are a must for recursive WinDbg scripts. OK, so suppose you want to display values from an std::map where the key is less than or equal to 2. Here we go: 0:000> $$>a< traverse_map.script my_map -c ".block { .if (@@(@$t9.first) <= 2) { .echo ----; ?? @$t9.second } }" size = 10 ---- struct point +0x000 x : 0n1 +0x004 y : 0n2 +0x008 data : extra_data ---- struct point +0x000 x : 0n0 +0x004 y : 0n1 +0x008 data : extra_data ---- struct point +0x000 x : 0n2 +0x004 y : 0n3 +0x008 data : extra_data For each pair (stored in the $t9 pseudo-register), the block checks if the first component is less than or equal to 2, and if it is, outputs the second component. Next, here’s the script. Note it’s considerably more complex that what we had to with vectors, because it essentially invokes itself with a different set of parameters and then repeats recursively. .if ($sicmp("${$arg1}", "-n") == 0) { .if (@@(@$t0->_Isnil) == 0) { .if (@$t2 == 1) { .printf /D "%p\n", @$t0, @$t0 .printf "key = " ?? @$t0->_Myval.first .printf "value = " ?? @$t0->_Myval.second } .else { r? $t9 = @$t0->_Myval command } } $$ Recurse into _Left, _Right unless they point to the root of the tree .if (@@(@$t0->_Left) != @@(@$t1)) { .push /r /q r? $t0 = @$t0->_Left $$>a< ${$arg0} -n .pop /r /q } .if (@@(@$t0->_Right) != @@(@$t1)) { .push /r /q r? $t0 = @$t0->_Right $$>a< ${$arg0} -n .pop /r /q } } .else { r? $t0 = ${$arg1} .if (${/d:$arg2}) { .if ($sicmp("${$arg2}", "-c") == 0) { r $t2 = 0 aS ${/v:command} "${$arg3}" } } .else { r $t2 = 1 aS ${/v:command} " " } .printf "size = %d\n", @@(@$t0._Mysize) r? $t0 = @$t0._Myhead->_Parent r? $t1 = @$t0->_Parent $$>a< ${$arg0} -n ad command } Of particular note are the aS command which configures an alias that is then used by the recursive invocation to invoke a command block for each of the map’s elements; the $sicmp function which compares strings; and the .printf /D function, which outputs a chunk of DML. Finally, the recursion terminates when _Left or _Right are equal to the root of the tree (that’s just how the tree is implemented in this case).
July 26, 2013
by Sasha Goldshtein
· 4,630 Views
Using Morphia to Map Java Objects in MongoDB
MongoDB is an open source document-oriented NoSQL database system which stores data as JSON-like documents with dynamic schemas. As it doesn't store data in tables as is done in the usual relational database setup, it doesn't map well to the JPA way of storing data. Morphia is an open source lightweight type-safe library designed to bridge the gap between the MongoDB Java driver and domain objects. It can be an alternative to SpringData if you're not using the Spring Framework to interact with MongoDB. This post will cover the basics of persisting and querying entities along the lines of JPA by using Morphia and a MongoDB database instance. There are four POJOs this example will be using. First we have BaseEntity which is an abstract class containing the Id and Version fields: package com.city81.mongodb.morphia.entity; import org.bson.types.ObjectId; import com.google.code.morphia.annotations.Id; import com.google.code.morphia.annotations.Property; import com.google.code.morphia.annotations.Version; public abstract class BaseEntity { @Id @Property("id") protected ObjectId id; @Version @Property("version") private Long version; public BaseEntity() { super(); } public ObjectId getId() { return id; } public void setId(ObjectId id) { this.id = id; } public Long getVersion() { return version; } public void setVersion(Long version) { this.version = version; } } Whereas JPA would use @Column to rename the attribute, Morphia uses @Property. Another difference is that @Property needs to be on the variable whereas @Column can be on the variable or the get method. The main entity we want to persist is the Customer class: package com.city81.mongodb.morphia.entity; import java.util.List; import com.google.code.morphia.annotations.Embedded; import com.google.code.morphia.annotations.Entity; @Entity public class Customer extends BaseEntity { private String name; private List accounts; @Embedded private Address address; public String getName() { return name; } public void setName(String name) { this.name = name; } public List getAccounts() { return accounts; } public void setAccounts(List accounts) { this.accounts = accounts; } public Address getAddress() { return address; } public void setAddress(Address address) { this.address = address; } } As with JPA, the POJO is annotated with @Entity. The class also shows an example of @Embedded: The Address class is also annotated with @Embedded as shown below: package com.city81.mongodb.morphia.entity; import com.google.code.morphia.annotations.Embedded; @Embedded public class Address { private String number; private String street; private String town; private String postcode; public String getNumber() { return number; } public void setNumber(String number) { this.number = number; } public String getStreet() { return street; } public void setStreet(String street) { this.street = street; } public String getTown() { return town; } public void setTown(String town) { this.town = town; } public String getPostcode() { return postcode; } public void setPostcode(String postcode) { this.postcode = postcode; } } Finally, we have the Account class of which the customer class has a collection of: package com.city81.mongodb.morphia.entity; import com.google.code.morphia.annotations.Entity; @Entity public class Account extends BaseEntity { private String name; public String getName() { return name; } public void setName(String name) { this.name = name; } } The above show only a small subset of what annotations can be applied to domain classes. 
More can be found at http://code.google.com/p/morphia/wiki/AllAnnotations The Example class shown below goes through the steps involved in connecting to the MongoDB instance, populating the entities, persisting them and then retrieving them: package com.city81.mongodb.morphia; import java.net.UnknownHostException; import java.util.ArrayList; import java.util.List; import com.city81.mongodb.morphia.entity.Account; import com.city81.mongodb.morphia.entity.Address; import com.city81.mongodb.morphia.entity.Customer; import com.google.code.morphia.Datastore; import com.google.code.morphia.Key; import com.google.code.morphia.Morphia; import com.mongodb.Mongo; import com.mongodb.MongoException; /** * A MongoDB and Morphia Example * */ public class Example { public static void main( String[] args ) throws UnknownHostException, MongoException { String dbName = new String("bank"); Mongo mongo = new Mongo(); Morphia morphia = new Morphia(); Datastore datastore = morphia.createDatastore(mongo, dbName); morphia.mapPackage("com.city81.mongodb.morphia.entity"); Address address = new Address(); address.setNumber("81"); address.setStreet("Mongo Street"); address.setTown("City"); address.setPostcode("CT81 1DB"); Account account = new Account(); account.setName("Personal Account"); List accounts = new ArrayList(); accounts.add(account); Customer customer = new Customer(); customer.setAddress(address); customer.setName("Mr Bank Customer"); customer.setAccounts(accounts); Key savedCustomer = datastore.save(customer); System.out.println(savedCustomer.getId()); } Executing the first few lines will result in the creation of a Datastore. This interface will provide the ability to get, delete and save objects in the 'bank' MongoDB instance. The mapPackage method call on the morphia object determines what objects are mapped by that instance of Morphia. In this case all those in the package supplied. Other alternatives exist to map classes, including the method map which takes a single class (this method can be chained as the returning object is the morphia object), or passing a Set of classes to the Morphia constructor. After creating instances of the entities, they can be saved by calling save on the datastore instance and can be found using the primary key via the get method. The output from the Example class would look something like the below: 11-Jul-2012 13:20:06 com.google.code.morphia.logging.MorphiaLoggerFactory chooseLoggerFactory INFO: LoggerImplFactory set to com.google.code.morphia.logging.jdk.JDKLoggerFactory 4ffd6f7662109325c6eea24f Mr Bank Customer There are many other methods on the Datastore interface and they can be found along with the other Javadocs at http://morphia.googlecode.com/svn/site/morphia/apidocs/index.html An alternative to using the Datastore directly is to use the built in DAO support. This can be done by extending the BasicDAO class as shown below for the Customer entity: package com.city81.mongodb.morphia.dao; import com.city81.mongodb.morphia.entity.Customer; import com.google.code.morphia.Morphia; import com.google.code.morphia.dao.BasicDAO; import com.mongodb.Mongo; public class CustomerDAO extends BasicDAO { public CustomerDAO(Morphia morphia, Mongo mongo, String dbName) { super(mongo, morphia, dbName); } } To then make use of this, the Example class can be changed (and enhanced to show a query and a delete): ... 
CustomerDAO customerDAO = new CustomerDAO(morphia, mongo, dbName); customerDAO.save(customer); Query query = datastore.createQuery(Customer.class); query.and( query.criteria("accounts.name").equal("Personal Account"), query.criteria("address.number").equal("81"), query.criteria("name").contains("Bank") ); QueryResults retrievedCustomers = customerDAO.find(query); for (Customer retrievedCustomer : retrievedCustomers) { System.out.println(retrievedCustomer.getName()); System.out.println(retrievedCustomer.getAddress().getPostcode()); System.out.println(retrievedCustomer.getAccounts().get(0).getName()); customerDAO.delete(retrievedCustomer); } ... With the output from running the above shown below: 11-Jul-2012 13:30:46 com.google.code.morphia.logging.MorphiaLoggerFactory chooseLoggerFactory INFO: LoggerImplFactory set to com.google.code.morphia.logging.jdk.JDKLoggerFactory Mr Bank Customer CT81 1DB Personal Account This post only covers a few brief basics of Morphia but shows how it can help bridge the gap between JPA and NoSQL.
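As a final sketch (not from the original post), an entity saved through the Datastore can also be read back by its primary key using the get method mentioned earlier, assuming the datastore and savedCustomer variables from the Example class above:

// Load the customer back by its ObjectId primary key.
Customer loadedCustomer = datastore.get(Customer.class, savedCustomer.getId());
System.out.println(loadedCustomer.getName());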
July 25, 2013
by Geraint Jones
· 75,739 Views
Jersey: Listing all Resources, Paths, Verbs to Build an Entry Point/Index for an API
I’ve been playing around with Jersey over the past couple of days and one thing I wanted to do was create an entry point or index which listed all my resources, the available paths and the verbs they accepted. Guido Simone explained a neat way of finding the paths and verbs for a specific resource using Jersey’s IntrospectionModeller: AbstractResource resource = IntrospectionModeller.createResource(JacksonResource.class); System.out.println("Path is " + resource.getPath().getValue()); String uriPrefix = resource.getPath().getValue(); for (AbstractSubResourceMethod srm : resource.getSubResourceMethods()) { String uri = uriPrefix + "/" + srm.getPath().getValue(); System.out.println(srm.getHttpMethod() + " at the path " + uri + " return " + srm.getReturnType().getName()); } If we run that against j4-minimal‘s JacksonResource class we get the following output: Path is /jackson GET at the path /jackson/{who} return com.g414.j4.minimal.JacksonResource$Greeting GET at the path /jackson/awesome/{who} return javax.ws.rs.core.Response That’s pretty neat, but I didn’t want to have to manually list all my resources since I’ve already done that using Guice. I needed a way to programmatically get hold of them, and I partially found the way to do this from this post, which suggests using Application.getSingletons(). I actually ended up using Application.getClasses() and I ended up with ResourceListingResource: @Path("/") public class ResourceListingResource { @GET @Produces(MediaType.APPLICATION_JSON) public Response showAll( @Context Application application, @Context HttpServletRequest request) { String basePath = request.getRequestURL().toString(); ObjectNode root = JsonNodeFactory.instance.objectNode(); ArrayNode resources = JsonNodeFactory.instance.arrayNode(); root.put( "resources", resources ); for ( Class aClass : application.getClasses() ) { if ( isAnnotatedResourceClass( aClass ) ) { AbstractResource resource = IntrospectionModeller.createResource( aClass ); ObjectNode resourceNode = JsonNodeFactory.instance.objectNode(); String uriPrefix = resource.getPath().getValue(); for ( AbstractSubResourceMethod srm : resource.getSubResourceMethods() ) { String uri = uriPrefix + "/" + srm.getPath().getValue(); addTo( resourceNode, uri, srm, joinUri(basePath, uri) ); } for ( AbstractResourceMethod srm : resource.getResourceMethods() ) { addTo( resourceNode, uriPrefix, srm, joinUri( basePath, uriPrefix ) ); } resources.add( resourceNode ); } } return Response.ok().entity( root ).build(); } private void addTo( ObjectNode resourceNode, String uriPrefix, AbstractResourceMethod srm, String path ) { if ( resourceNode.get( uriPrefix ) == null ) { ObjectNode inner = JsonNodeFactory.instance.objectNode(); inner.put("path", path); inner.put("verbs", JsonNodeFactory.instance.arrayNode()); resourceNode.put( uriPrefix, inner ); } ((ArrayNode) resourceNode.get( uriPrefix ).get("verbs")).add( srm.getHttpMethod() ); } private boolean isAnnotatedResourceClass( Class rc ) { if ( rc.isAnnotationPresent( Path.class ) ) { return true; } for ( Class i : rc.getInterfaces() ) { if ( i.isAnnotationPresent( Path.class ) ) { return true; } } return false; } } The only change I’ve made from Guido Simone’s solution is that I also call resource.getResourceMethods() because resource.getSubResourceMethods() only returns methods which have a @Path annotation. Since we’ll sometimes define our path at the class level and then define different verbs that operate on that resource, it misses some methods out.
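One detail the class above glosses over is the joinUri helper, which isn't shown in the post; a plausible sketch (my assumption of its behaviour — it simply joins the base URL and resource path without doubling slashes) might be:

private String joinUri(String basePath, String uri) {
    // Trim a trailing slash from the base and ensure the path starts with one.
    String left = basePath.endsWith("/") ? basePath.substring(0, basePath.length() - 1) : basePath;
    String right = uri.startsWith("/") ? uri : "/" + uri;
    return left + right;
}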
If we run a cURL command (piped through python to make it look nice) against the root we get the following output: $ curl http://localhost:8080/ -w "\n" 2>/dev/null | python -mjson.tool { "resources": [ { "/bench": { "path": "http://localhost:8080/bench", "verbs": [ "GET", "POST", "PUT", "DELETE" ] } }, { "/sample/{who}": { "path": "http://localhost:8080/sample/{who}", "verbs": [ "GET" ] } }, { "/jackson/awesome/{who}": { "path": "http://localhost:8080/jackson/awesome/{who}", "verbs": [ "GET" ] }, "/jackson/{who}": { "path": "http://localhost:8080/jackson/{who}", "verbs": [ "GET" ] } }, { "/": { "path": "http://localhost:8080/", "verbs": [ "GET" ] } } ] }
July 24, 2013
by Mark Needham
· 20,607 Views · 2 Likes
Getting started with JPA and Mule
Working with JPA managed entities in Mule applications can be difficult. Since the JPA session is not propagated between message processors, transformers are typically needed to produce an entity from a message’s payload, pass it to a component for processing, then serialize it back to an un-proxied representation for further processing. Transactions have been complicated too: it’s difficult to coordinate a transaction between multiple components that are operating with JPA entity payloads. Finally, the lack of support for JPA queries makes it difficult to load objects without working with raw SQL and the JDBC transport.

Mule Support for JPA Entities

The JPA module aims to simplify working with JPA managed entities with Mule. It provides message processors that map to an EntityManager’s methods. The message processors participate in Mule transactions, making it easy to structure JPA transactions within Mule flows. The JPA module also provides a @PersistenceContext implementation. This allows Mule components to participate in JPA transactions.

Installing the JPA Module

To install the JPA Module you need to click on “Help” followed by “Install New Software…” from Mule Studio. Select the “MuleStudio Cloud Connectors Update Site” from the “Work With” drop-down list, then find the “Mule Java Persistence API Module Mule Extension.” This is illustrated below:

Fetching JPA Entities

JPA query language or criteria queries can be executed using the “query” message processor (MP). Supplying a statement to the query will execute the given query and return the results to the next message processor, as illustrated in the following Gist:

The queryParameters-ref defines the parameters; in this case, the message’s payload supplies the parameters to the query. The following query illustrates how a Map payload could be used to populate query parameters:

The query processor also supports criteria queries by setting the queryParameters-ref to an instance of a CriteriaQuery, as illustrated in the functional test snippet below.

CriteriaBuilder criteriaBuilder = entityManager.getCriteriaBuilder();
CriteriaQuery criteriaQuery = criteriaBuilder.createQuery(Dog.class);
Root from = criteriaQuery.from(Dog.class);
Predicate condition = criteriaBuilder.equal(from.get("name"), "Cujo");
criteriaQuery.where(condition);
runFlowWithPayloadAndExpect("testQuery", expectedResults, criteriaQuery);

You can use the “find” MP to load a single object if you know its ID:

Transactions and Entity Operations

The default behavior of most JPA providers, like Hibernate, is to provide proxies on entity relationships to avoid loading full object graphs into memory. When these objects are detached from the JPA session, however, attempts to access relations in the object will often fail because the proxied session is no longer available. This complicates using JPA in Mule applications as JPA objects pass between message processors and between flows, and the session subsequently becomes unavailable. The JPA module allows you to avoid this by wrapping your operations in a transactional block. Let’s first look at how to persist an object then query it within a transaction. The below assumes the message’s payload is an instance of the Dog domain class.

Now let’s see how we can use the merge processor to attach a JPA object to a new session. This can be useful when passing a JPA entity from one flow to another.

....other processing here....
Detaching an entity is just as simple: Component Operations with JPA The real power of using JPA with Mule is allowing your business services to participate in Mule managed JPA transactions. A @PersistenceContext EntityManager reference in your component class will cause Mule to inject a reference to a transactional flow’s current EntityManager for that method, as illustrated in the following class: public class DogServiceImpl { @PersistenceContext EntityManager entityManager; public Dog groom(Dog dog) { return entityManager.merge(dog); } } We can now wire the component up in a flow: Conclusion JPA is an important part of the JEE ecosystem and hopefully this module will simplify your use of JPA managed entities in Mule applications.
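For reference, the Dog domain class used in the examples above isn’t shown in the article; any ordinary JPA entity will do. A minimal sketch, assuming Dog only needs an id and a name, might look like this:

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class Dog {

    @Id
    @GeneratedValue
    private Long id;

    private String name;

    public Long getId() { return id; }

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }
}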
July 24, 2013
by John D'Emic
· 7,934 Views
Algorithm of the Week: Spatial Indexing with Quadtrees and Hilbert Curves
some time ago at oredev, after the sessions, there was "birds of a feather" - a sort of mini-unconference. anyone could write up a topic on the whiteboard; interested individuals added their names, and each group got allocated a room to chat about the topic. i joined the "spatial indexing" group, and we spent a fascinating hour and a half talking about spatial indexing methods, reminding me of several interesting algorithms and techniques. spatial indexing is increasingly important as more and more data and applications are geospatially-enabled. efficiently querying geospatial data, however, is a considerable challenge: because the data is two-dimensional (or sometimes, more), you can't use standard indexing techniques to query on position. spatial indexes solve this through a variety of techniques. in this post, we'll cover several - quadtrees , geohashes (not to be confused with geohashing ), and space-filling curves - and reveal how they're all interrelated. quadtrees quadtrees are a very straightforward spatial indexing technique. in a quadtree, each node represents a bounding box covering some part of the space being indexed, with the root node covering the entire area. each node is either a leaf node - in which case it contains one or more indexed points, and no children, or it is an internal node, in which case it has exactly four children, one for each quadrant obtained by dividing the area covered in half along both axes - hence the name. a representation of how a quadtree divides an indexed area. source: wikipedia inserting data into a quadtree is simple: starting at the root, determine which quadrant your point occupies. recurse to that node and repeat, until you find a leaf node. then, add your point to that node's list of points. if the list exceeds some pre-determined maximum number of elements, split the node, and move the points into the correct subnodes. a representation of how a quadtree is structured internally. to query a quadtree, starting at the root, examine each child node, and check if it intersects the area being queried for. if it does, recurse into that child node. whenever you encounter a leaf node, examine each entry to see if it intersects with the query area, and return it if it does. note that a quadtree is very regular - it is, in fact, a trie , since the values of the tree nodes do not depend on the data being inserted. a consequence of this is that we can uniquely number our nodes in a straightforward manner: simply number each quadrant in binary (00 for the top left, 10 for the top right, and so forth), and the number for a node is the concatenation of the quadrant numbers for each of its ancestors, starting at the root. using this system, the bottom right node in the sample image would be numbered 11 01. if we define a maximum depth for our tree, then, we can calculate a point's node number without reference to the tree - simply normalize the node's coordinates to an appropriate integer range (for example, 32 bits each), and then interleave the bits from the x and y coordinates -each pair of bits specifies a quadrant in the hypothetical quadtree. geohashes this system might seem familiar: it's a geohash ! at this point, you can actually throw out the quadtree itself - the node number, or geohash, contains all the information we need about its location in the tree. each leaf node in a full-height tree is a complete geohash, and each internal node is represented by the range from its smallest leaf node to its largest one. 
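to make the bit-interleaving concrete, here is a small sketch (in java rather than the python used later, and not from the original post) that computes the node number for a point at a fixed tree depth:

// x and y are assumed to be normalized to integers in [0, 2^depth).
// the quadrant numbering convention doesn't matter as long as it's consistent.
static long nodeNumber(int x, int y, int depth) {
    long number = 0;
    for (int i = depth - 1; i >= 0; i--) {
        int yBit = (y >> i) & 1;
        int xBit = (x >> i) & 1;
        number = (number << 2) | (yBit << 1) | xBit;  // append one quadrant (two bits)
    }
    return number;
}

a prefix of that number identifies an internal node, and every point below it falls into the contiguous numeric range you get by padding the prefix with all zeroes (low end) and all ones (high end).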
thus, you can efficiently locate all the points under any internal node by indexing on the geohash by performing a query for everything within the numeric range covered by the desired node. querying once we've thrown away the tree itself becomes a little more complex. instead of refining our search set recursively, we need to construct a search set ahead of time. first, find the smallest prefix (or quadtree node) that completely covers the query area. in the worst case, this may be substantially larger than the actual query area - for example, a small shape in the center of the indexed area that intersects all four quadrants would require selecting the root node for this step. the aim, now, is to construct a set of prefixes that completely covers the query region, while including as little area outside the region as possible. if we had no other constraints, we could simply select the set of leaf nodes that intersect the query area - but that would result in a lot of queries. another constraint, then, is that we want to minimise the number of distinct ranges we have to query for. one approach to doing this is to start by setting a maximum number of ranges we're willing to have. construct a set of ranges, initially populated with the prefix we identified earlier. pick the node in the set that can be subdivided without exceeding the maximum range count and will remove the most unwanted area from the query region. repeat this until there are no ranges in the set that can be further subdivided. finally, examine the resulting set, and join any adjacent ranges, if possible. the diagram below demonstrates how this works for a query on a circular area with a limit of 5 query ranges. how a query for a region is broken into a series of geohash prefixes/ranges. this approach works well, and it allows us to avoid the need to do recursive lookups - the set of range lookups we do execute can all be done in parallel. since each lookup can be expected to require a disk seek, parallelizing our queries allows us to substantially cut down the time required to return the results. still, we can do better. you may notice that all the areas we need to query in the above diagram are adjacent, yet we can only merge two of them (the two in the bottom right of the selected area) into a single range query, requiring us to do 4 separate queries. this is due in part to the order that our geohashing approach 'visits' subregions, working left to right, then top to bottom in each quad. the discontinuity as we go from top right to bottom left quad results in us having to split up some ranges that we could otherwise make contiguous. if we were to visit regions in a different order, perhaps we could minimise or eliminate these discontinuities, resulting in more areas that can be treated as adjacent and fetched with a single query. with an improvement in efficiency like that, we could do fewer queries for the same area covered, or conversely, the same number of queries, but including less extraneous area. illustrates the order in which the geohashing approach 'visits' each quad. hilbert curves suppose instead, we visit regions in a 'u' shape. within each quad, of course, we also visit subquads in the same 'u' shape, but aligned so as to match up with neighbouring quads. if we organise the orientation of these 'u's correctly, we can completely eliminate any discontinuities, and visit the entire area at whatever resolution we choose continuously, fully exploring each region before moving on to the next. 
not only does this eliminate discontinuities, but it also improves the overall locality. the pattern we get if we do this may look familiar - it's a hilbert curve.

hilbert curves are part of a class of one-dimensional fractals known as space-filling curves, so named because they are one dimensional lines that nevertheless fill all available space in a fixed area. they're fairly well known, in part thanks to xkcd's use of them for a map of the internet. as you can see, they're also of use for spatial indexing, since they exhibit exactly the locality and continuity required. for example, if we take another look at the example we used for finding the set of queries required to encompass a circle above, we find that we can reduce the number of queries by one: the small region in the lower left is now contiguous with the region to its right, and whilst the two regions at the bottom are no longer contiguous with each other, the rightmost one is now contiguous with the large area in the upper right.

illustrates the order in which a hilbert curve 'visits' each quad.

one thing that our elegant new system is lacking, so far, is a way of converting between a pair of (x,y) coordinates and the corresponding position in the hilbert curve. with geohashing it was easy and obvious - just interleave the x and y coordinates - but there's no obvious way to modify that for a hilbert curve. searching the internet, you're likely to come across many descriptions of how hilbert curves are drawn, but few if any descriptions of how to find the position of an arbitrary point. to figure this out, we need to take a closer look at how the hilbert curve can be recursively constructed.

the first thing to observe is that although most references to hilbert curves focus on how to draw the curve, this is a distraction from the essential property of the curve, and its importance to us: it's an ordering for points on a plane. if we express a hilbert curve in terms of this ordering, drawing the curve itself becomes trivial - simply a matter of connecting the dots. forget about how to connect adjacent sub-curves, and instead focus on how we can recursively enumerate the points.

hilbert curves are all about ordering a set of points on a 2d plane.

at the root level, enumerating the points is simple: pick a direction and a start point, and proceed around the four quadrants, numbering them 0 to 3. the difficulty is introduced when we want to determine the order we visit the sub-quadrants in while maintaining the overall adjacency property. examination reveals that each of the sub-quadrants' curves is a simple transformation of the original curve: there are only four possible transformations. naturally, this applies recursively to sub-sub quadrants, and so forth. the curve we use for a given quadrant is determined by the curve we used for the square it's in, and the quadrant's position. with a little work, we can construct a table that encapsulates this:

suppose we want to use this table to determine the position of a point on a third-level hilbert curve. for the sake of this example, assume our point has coordinates (5,2). starting with the first square on the diagram, find the quadrant your point is in - in this case, it's the upper right quadrant. the first part of our hilbert curve position, then, is 3 (11 in binary). next, we consult the square shown in the inset of square 3 - in this case, it's the second square. repeat the process: which sub-quadrant does our point fall into?
here, it's the lower left one, meaning the next part of our position is 1, and the square we should consult next is the second one again. repeating the process one final time, we find our point falls in the upper right sub-sub-quadrant, our final coordinate is 3 (11 in binary). stringing them together, we now know the position of our point on the curve is 110111 binary, or 55.

let's be a little more methodical, and write methods to convert between x,y coordinates and hilbert curve positions. first, we need to express our diagram above in terms a computer can understand:

hilbert_map = {
    'a': {(0, 0): (0, 'd'), (0, 1): (1, 'a'), (1, 0): (3, 'b'), (1, 1): (2, 'a')},
    'b': {(0, 0): (2, 'b'), (0, 1): (1, 'b'), (1, 0): (3, 'a'), (1, 1): (0, 'c')},
    'c': {(0, 0): (2, 'c'), (0, 1): (3, 'd'), (1, 0): (1, 'c'), (1, 1): (0, 'b')},
    'd': {(0, 0): (0, 'a'), (0, 1): (3, 'c'), (1, 0): (1, 'd'), (1, 1): (2, 'd')},
}

in the snippet above, each element of 'hilbert_map' corresponds to one of the four squares in the diagram above. to make things easier to follow, i've identified each one with a letter - 'a' is the first square, 'b' the second, and so forth. the value for each square is a dict, mapping x and y coordinates for the (sub-)quadrant to the position along the line (the first part of the value tuple) and the square to use next (the second part of the value tuple). here's how we can use this to translate x and y coordinates into a hilbert curve position:

def point_to_hilbert(x, y, order=16):
    current_square = 'a'
    position = 0
    for i in range(order - 1, -1, -1):
        position <<= 2
        quad_x = 1 if x & (1 << i) else 0
        quad_y = 1 if y & (1 << i) else 0
        quad_position, current_square = hilbert_map[current_square][(quad_x, quad_y)]
        position |= quad_position
    return position

the input to this function is the integer x and y coordinates, and the order of the curve. an order 1 curve fills a 2x2 grid, an order 2 curve fills a 4x4 grid, and so forth. our x and y coordinates, then, should be normalized to a range of 0 to 2^order - 1. the function works by stepping over each bit of the x and y coordinates, starting with the most significant. for each, it determines which (sub-)quadrant the coordinate lies in, by testing the corresponding bit, then fetches the position along the line and the next square to use from the table we defined earlier. the curve position is set as the least significant 2 bits on the position variable, and at the beginning of the next loop, it's left-shifted to make room for the next set of coordinates.

let's check that we've written the function correctly by running our example from above through it:

>>> point_to_hilbert(5, 2, 3)
55

presto! for a further test, we can use the function to generate a complete list of ordered points for a hilbert curve, then use a spreadsheet to graph them and see if we get a hilbert curve. enter the following expression into an interactive python interpreter:

>>> points = [(x, y) for x in range(8) for y in range(8)]
>>> sorted_points = sorted(points, key=lambda k: point_to_hilbert(k[0], k[1], 3))
>>> print '\n'.join('%s,%s' % x for x in sorted_points)

take the resulting text, paste it into a file called 'hilbert.csv', open it in your favorite spreadsheet, and instruct it to generate a scatter plot. the result is, of course, a nicely plotted hilbert curve! the inverse of point_to_hilbert is a straightforward reversal of the hilbert_map; implementing it is left as an exercise for the reader.

conclusion

there you have it - spatial indexing, from quadtrees to geohashes to hilbert curves.
one final observation: if you express the ordered sequence of x,y coordinates required to draw a hilbert curve in binary, do you notice anything interesting about the ordering? does it remind you of anything? just to wrap up, a caveat: all of the indexing methods i've described today are only well-suited to indexing points. if you want to index lines, polylines, or polygons, you're probably out of luck with these methods - and so far as i'm aware, the only known algorithm for effectively indexing shapes is the r-tree , an entirely different and more complex beast.
July 23, 2013
by Nick Johnson
· 42,948 Views
Converting Java Objects to Byte Array, JSON and XML
Quick reference for converting Java objects to various formats (byte array, JSON, XML) and back, using different libraries for serialization and deserialization.
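As a taste of what the article covers, round-tripping an object through a byte array needs nothing beyond the JDK, provided the class implements Serializable. A rough sketch:

import java.io.*;

public final class ByteArrayRoundTrip {

    public static byte[] toBytes(Serializable object) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);   // standard Java serialization
        }
        return bytes.toByteArray();
    }

    public static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();    // caller casts back to the expected type
        }
    }
}

JSON and XML conversions follow the same pattern, with a library such as Jackson or JAXB doing the reading and writing.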
July 22, 2013
by Faheem Sohail
· 106,356 Views
Log4j 2: Performance close to insane
Recently a respected member of the Apache community tried Log4j 2 and wrote on Twitter: (Quote from Mark Struberg: @TheASF #log4j2 rocks big times! Performance is close to insane ^^ http://logging.apache.org/log4j/2.x/ )

It happened shortly after Remko Popma contributed something which is now called the “AsyncLoggers”. Some of you might know Log4j 2 has AsyncAppenders already. They are similar to the ones you can find in Log4j 1 and other logging frameworks. To be honest, I wasn’t so excited about the new feature until I read the tweet on its performance and became curious.

Clearly Java logging has many goals. Among them: logging must be as fast as hell. Nobody wants his logging framework to become a bottleneck. Of course you’ll always have a cost when logging. There is some operation the CPU must perform. Something is happening, even when you decide NOT to write a log statement. Logging is expected to be invisible.

Until now, the well-known logging frameworks were similar in speed. Benchmarks are unreliable after all. We have made some benchmarks over at Apache Logging. Sometimes one logging framework wins, sometimes the other. But at the end of the day you can say they are all very good and you can choose whichever you like. Until we got Remko’s contribution and Log4j 2 became “insanely fast”.

Small software projects running one thread might not care about performance so much. When running a SaaS you simply don’t know when your app gets so much traction that you need to scale. Then you suddenly need some extra power. With Log4j 2, running 64 threads might bring you twelve times more logging throughput than with comparable frameworks. We speak of more than 18,000,000 messages per second, while others do around 1,500,000 or less in the same environment.

I saw the chart, but simply couldn’t believe it. There must be something wrong. I rechecked. I ran the tests myself. It’s true: Log4j 2 is insanely fast. (Chart: Async Performance, last read on July 19, 2013)

As of now, we have a logging framework which performs a lot better than every other logging framework out there. As of now, we need to justify our decision when we do not want to use Log4j 2, if speed matters. Anything other than Log4j 2 can become a bottleneck and a risk. With such a fast logging framework you might even consider logging a bit more in production than you did before.

Eventually I wrote Remko an e-mail and asked him what exactly the difference between the old AsyncAppenders and the new Asynchronous Loggers is.

The difference between old AsyncAppenders and new AsyncLoggers

“The Asynchronous Loggers do two things differently than the AsyncAppender”, he told me, “they try to do the minimum amount of work before handing off the log message to another thread, and they use a different mechanism to pass information between the producer and consumer threads. AsyncAppender uses an ArrayBlockingQueue to hand off the messages to the thread that writes to disk, and Asynchronous Loggers use the LMAX Disruptor library. Especially the Disruptor has made a large performance difference.”

In other terms, the AsyncAppender uses a first-in-first-out queue to work through messages, but the Async Logger uses something new – the Disruptor. To be honest, I had never heard of it. And furthermore, I never thought much about scaling my logging framework. When somebody said “scale the system”, I thought about the database, the app server and much more, but usually not logging. In production, logging was off. End of story.
But Remko thinks about scaling when it comes to logging. “Looking at the performance test results for the Asynchronous Loggers, the first thing you notice is that some ways of logging scale much better than others. By scaling better I mean that you get more throughput when you add more threads. If your throughput increases a constant amount with every thread you add, you have linear scalability. This is very desirable but can be difficult to achieve.”, he wrote me. “Comparing synchronous to asynchronous, you would expect any asynchronous mechanism to scale much better than synchronous logging because you don’t do the I/O in the producing thread any more, and we all know that ‘I/O is slow’ (and I’ll get back to this in a bit)”.

Yes, exactly my understanding. I thought it would be enough to send something to a queue, and something else would pick it up and write the message. The app would go on. This is exactly what the old AsyncAppender does, wrote Remko: “With AsyncAppender, all your application thread needs to do is create a LogEvent object and put it on the ArrayBlockingQueue; the consuming thread will then take these events off the queue and do all the time-consuming work. That is, the work of turning the event into bytes and writing these bytes to the I/O device. Since the application threads do not need to do the I/O, you would expect this to scale better, meaning adding threads will allow you to log more events.”

If you believed that like me, take a seat and a deep breath. We were wrong. “What may surprise you is that this is not the case.”, he wrote. “If you look at the performance numbers for the AsyncAppenders of all logging frameworks, you’ll see that every time you double the number of threads, your throughput per thread roughly halves.” “So your total throughput remains more or less flat! AsyncAppenders are faster than synchronous logging, but they are similar in the sense that neither of them gives you more total throughput when you add more threads.”, he told me.

It hit me like a hammer. Instead of making your logging faster by adding more threads, you basically gained nothing. After all, Appenders didn’t scale until now. I asked Remko why this was the case. “It turns out that queues are not the most optimal data structure to pass information between threads. The concurrent queues that are part of the standard Java libraries use locks to make sure that values don’t get corrupted and to ensure data visibility between threads.”

LMAX Disruptor?

“The LMAX team did a lot of research on this and found that these queues have a lot of lock contention. An interesting thing they found is that queues are always either full or empty: If your producer is faster, your queue will be full most of the time (and that may be a problem in itself). If your consumer is fast enough, your queue will be empty most of the time. Either way, you will have contention on the head or on the tail of the queue, where both the producer and the consumer thread want to update the same field. To resolve this, the LMAX team came up with the Disruptor library, which is a lock-free data structure for passing messages between threads. Here is a performance comparison between the Disruptor and ArrayBlockingQueue: Performance Comparison.”

Wow. After all these years of Java programming I actually felt a bit like a junior programmer again. I had missed the LMAX Disruptor and never even considered it a performance problem to use a queue. I wonder what other performance problems I have not discovered so far.
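To make the queue-based hand-off Remko described concrete, the AsyncAppender idea boils down to roughly the following (a deliberately simplified sketch, not Log4j’s actual code):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Application threads only enqueue; one background thread does the slow I/O.
// The queue is exactly where the producer/consumer contention described above happens.
public class QueueBasedLogger {

    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);

    public QueueBasedLogger() {
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    String event = queue.take();  // blocks until an event is available
                    System.out.println(event);    // stand-in for formatting and disk I/O
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.setDaemon(true);
        writer.start();
    }

    public void log(String message) throws InterruptedException {
        queue.put(message);  // cheap for the caller, but contended under many threads
    }
}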
I realized, I had to re-learn Java. I asked Remko how he could find a library like the LMAX Disruptor. I mean, nobody writes software, creates an instance of a Queue class, doubts its performance and finally searches the internet for “something better”. Or are there really people of that kind?

“How I found out about the Disruptor? The short answer is, it was all a mistake.”, he started. “Okay, perhaps that was a bit too short, so here is the longer answer: a colleague of mine wrote a small logger, essentially adding a time-stamped log message to a queue, with a background thread that took these strings off the queue and wrote them to disk. He did this because he needed better performance than what he could get with log4j-1.x. I did some testing and found it was faster, I don’t remember exactly by how much. I was quite surprised because I had been using log4j for years and had never thought it would be easily outperformed. Until then I had assumed that the well-known libraries would be fast, because, well… To be honest, I had just assumed. So this was a bit of an eye-opener for me. However, the custom logger was a bit bare-bones in terms of functionality so I started to look around for alternatives.”

“Before I start talking about the Disruptor, I have to confess something. I recently went back to see how much faster the custom logger was than log4j-1.x, but when I measured it it was actually slower! It turned out that I had been comparing the custom logger to an old beta of log4j-2.0, I think beta3 or beta4. AsyncAppender in those betas still had a performance issue (LOG4J2-153 if you’re curious). If I had compared the custom logger to the AsyncAppender in log4j-1.x, I would have found that log4j-1.x was faster and I would not have thought about it further. But because I made this mistake I started to look for other high-performance logging libraries that were richer in functionality. I did not find such a logging library, but I ran into a whole bunch of other interesting stuff, including the Disruptor. Eventually I decided to try to combine Log4j-2, which has a very nice code base, with the Disruptor. The result of this was eventually accepted into Log4j-2 itself, and the rest, as they say, was history.”

“One thing I came across that I should mention here is Peter Lawrey’s Chronicle library. Chronicle uses memory-mapped files to write tens of millions of messages per second to disk with very low latency. Remember that above I said that “we all know that I/O is slow”? Chronicle shows that synchronous I/O can be very, very fast.”

“It was via Peter’s work that I came across the Disruptor. There is a lot of good material out there about the Disruptor. Just to give you a few pointers:

  • Martin Fowler: LMAX
  • Trisha Lee on LMAX under the hood (slightly outdated now but the most detailed material I know of)
  • …and video presentations like this

The Disruptor google group is also highly recommended. Recommended readings on Java performance in general are:

  • Martin Thompson’s “Mechanical Sympathy”
  • Martin Thompson Presentations. Martin Thompson has done a number of articles and presentations on various aspects of high performance computing in Java. He does a great job of making the complex stuff that is going on under the hood accessible.”

My bookmarks folder filled up after reading this e-mail, and I appreciate all the starting points for improving my knowledge on Java performance.

Should I use AsyncLoggers by default?

I was sure I wanted to use the new Async Loggers. This all sounds just fantastic.
But on the other hand, I am a bit scared and even a little paranoid about including new dependencies or new technologies like the new Log4j 2 Async Loggers. I asked Remko if he would use the new feature by default or if he would enable them just for a few, limited use cases.

“I use Async Loggers by default, yes.”, he wrote me. “One use case when you would _not_ want to use asynchronous logging is when you use logging for audit purposes. In that case a logging error is a problem that your application needs to know about and deal with. I believe that most applications are different, in that they don’t care too much about logging errors. Most applications don’t want to stop if a logging exception occurs, in fact, they don’t even want to know about it. By default, appenders in Log4j-2.0 will suppress exceptions so the application doesn’t need to try/catch every log statement. If that is your usage, then you will not lose anything by using asynchronous loggers, so you get only the benefits, which is improved performance.”

“One nice little detail I should mention is that both Async Loggers and Async Appenders fix something that has always bothered me in Log4j-1.x, which is that they will flush the buffer after logging the last event in the queue. With Log4j-1.x, if you used buffered I/O, you often could not see the last few log events, as they were still stuck in the memory buffer. Your only option was setting immediateFlush to true, which forces disk I/O on every single log event and has a performance impact. With Async Loggers and Appenders in Log4j-2.0 your log statements are all flushed to disk, so they are always visible, but this happens in a very efficient manner.”

Isn’t it risky to use Log4j’s AsyncLoggers?

But considering that Log4j-1 had serious threading issues and the modern world uses cloud computing and clustering all the time to scale their apps, isn’t asynchronous logging some kind of additional risk? Or is it safe? I knew my questions would sound like the questions of a decision maker, not of a developer. But the whole LMAX thing was so new to me, and since I maintain the old and really ugly Log4j 1 code, I simply had to ask.

Remko: “There are a number of questions in there. First, is Log4j-2 safer from a concurrency perspective than Log4j-1.x? I believe so. The Log4j-2 team has put in considerable effort to support multi-threaded applications, and the asynchronous loggers are just a very recent and relatively small addition to the project. Log4j-2 uses more granular locking than log4j-1.x, and is architecturally simpler, which should result in fewer issues, and any issues that do come up will be easier to fix.”

“On the other hand, Log4j-2 is still in beta and is under active development, although recently I think most effort is being spent on fixing things and tying up loose ends rather than adding new features. I believe it is stable enough for production use. If you are considering using Log4j-2, for performance or other reasons, I’d suggest you do your due diligence and test, just like you would before adopting any other 3rd party library in your project.” (Sidenote: A stable version of Log4j 2 can be expected soon, most likely autumn 2013.)

Sounded good to me. And yes, I can perfectly agree with that from my own observations on the project, though I personally did not write code in the Log4j 2 repository.

“The other question I see is: Is asynchronous logging riskier than synchronous logging?
I don’t think so, in fact, if your application is multi threaded the opposite may be the case: once the log event has been handed off to the consumer thread that does the I/O, there is only that one thread dealing with the layouts, appenders and all the other logging-related components. So after the hand-off you’re single-threaded and you don’t need to worry about any threading issues like deadlock and liveliness etc any more.” “You can take this one step further and make your business logic completely single-threaded, using the disruptor for all I/O or communication with external systems. Single-threaded business logic without lock contention can be blazingly fast. The results at LMAX (6 million transactions/sec, with less than 10 ms latency) speak for themselves.” Reading Remko’s message I learned three things. First, I had to learn more about Java performance. Second, I definitely want to make my applications use Log4j 2. As first step, I will enable it in my Struts 2 apps, which I use often. Third, a web application framework using the LMAX Disruptor might blow us all away. I would like to give a big thank you and a hug to Remko Popma for answering my questions and working on this blog post with me. All errors are my own.
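If you want to try the Asynchronous Loggers yourself, switching them on is a small change: put the LMAX Disruptor jar on the classpath and select the async context selector before Log4j 2 initializes. A minimal sketch (check the Log4j 2 documentation for the authoritative configuration options):

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class AsyncLoggingDemo {

    public static void main(String[] args) {
        // Route all loggers through the asynchronous implementation.
        // This can also be passed on the command line as -DLog4jContextSelector=...
        System.setProperty("Log4jContextSelector",
                "org.apache.logging.log4j.core.async.AsyncLoggerContextSelector");

        Logger logger = LogManager.getLogger(AsyncLoggingDemo.class);
        logger.info("Hello from an asynchronous logger");
    }
}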
July 20, 2013
by Christian Grobmeier
· 7,067 Views · 1 Like
Java: Testing a Socket is Listening on All Network Interfaces/Wildcard Interface
I previously wrote a blog post describing how I’ve been trying to learn more about network sockets in which I created some server sockets and connected to them using netcat. The next step was to do the same thing in Java and I started out by writing a server socket which echoed any messages sent by the client: public class EchoServer { public static void main(String[] args) throws IOException { int port = 4444; ServerSocket serverSocket = new ServerSocket(port, 50, InetAddress.getByAddress(new byte[] {0x7f,0x00,0x00,0x01})); System.err.println("Started server on port " + port); while (true) { Socket clientSocket = serverSocket.accept(); System.err.println("Accepted connection from client: " + clientSocket.getRemoteSocketAddress() ); In in = new In (clientSocket); Out out = new Out(clientSocket); String s; while ((s = in.readLine()) != null) { out.println(s); } System.err.println("Closing connection with client: " + clientSocket.getInetAddress()); out.close(); in.close(); clientSocket.close(); } } } public final class In { private Scanner scanner; public In(java.net.Socket socket) { try { InputStream is = socket.getInputStream(); scanner = new Scanner(new BufferedInputStream(is), "UTF-8"); } catch (IOException ioe) { System.err.println("Could not open " + socket); } } public String readLine() { String line; try { line = scanner.nextLine(); } catch (Exception e) { line = null; } return line; } public void close() { scanner.close(); } } public class Out { private PrintWriter out; public Out(Socket socket) { try { out = new PrintWriter(socket.getOutputStream(), true); } catch (IOException ioe) { ioe.printStackTrace(); } } public void close() { out.close(); } public void println(Object x) { out.println(x); out.flush(); } } I ran the main method of the class and this creates a server socket on port 4444 listening on the 127.0.0.1 interface and we can connect to it using netcat like so: $ nc -v 127.0.0.1 4444 Connection to 127.0.0.1 4444 port [tcp/krb524] succeeded! hello hello The output in my IntelliJ console looked like this: Started server on port 4444 Accepted connection from client: /127.0.0.1:63222 Closing connection with client: /127.0.0.1 Using netcat is fine but what I actually wanted to do was write some test code which would check that I’d made sure the server socket on port 4444 was accessible via all interfaces i.e. bound to 0.0.0.0. 
There are actually some quite nice classes in Java which make this very easy to do, and wiring those together I ended up with the following client code:

public static void main(String[] args) throws IOException {
    Enumeration<NetworkInterface> nets = NetworkInterface.getNetworkInterfaces();

    for (NetworkInterface networkInterface : Collections.list(nets)) {
        for (InetAddress inetAddress : Collections.list(networkInterface.getInetAddresses())) {
            Socket socket = null;
            try {
                socket = new Socket(inetAddress, 4444);
                System.out.println(String.format("Connected using %s [%s]", networkInterface.getDisplayName(), inetAddress));
            } catch (ConnectException ex) {
                System.out.println(String.format("Failed to connect using %s [%s]", networkInterface.getDisplayName(), inetAddress));
            } finally {
                if (socket != null) {
                    socket.close();
                }
            }
        }
    }
}

If we run the main method of that class we’ll see the following output (on my machine at least!):

Failed to connect using en0 [/fe80:0:0:0:9afe:94ff:fe4f:ee50%4]
Failed to connect using en0 [/192.168.1.89]
Failed to connect using lo0 [/0:0:0:0:0:0:0:1]
Failed to connect using lo0 [/fe80:0:0:0:0:0:0:1%1]
Connected using lo0 [/127.0.0.1]

Interestingly, we can’t even connect via the loopback interface using IPv6, which is perhaps not that surprising in retrospect given we bound using an IPv4 address. If we tweak the second line of EchoServer from:

ServerSocket serverSocket = new ServerSocket(port, 50, InetAddress.getByAddress(new byte[] {0x7f,0x00,0x00,0x01}));

to

ServerSocket serverSocket = new ServerSocket(port, 50, InetAddress.getByAddress(new byte[] {0x00,0x00,0x00,0x00}));

and restart the server before re-running the client, we can now connect through all interfaces:

Connected using en0 [/fe80:0:0:0:9afe:94ff:fe4f:ee50%4]
Connected using en0 [/192.168.1.89]
Connected using lo0 [/0:0:0:0:0:0:0:1]
Connected using lo0 [/fe80:0:0:0:0:0:0:1%1]
Connected using lo0 [/127.0.0.1]

We can then wrap the EchoClient code into our testing framework to assert that we can connect via all the interfaces.
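Wrapping that check into a test is mostly a matter of turning the printlns into assertions. A JUnit-flavoured sketch (not from the original post, and assuming the server is already running on port 4444) could look like this:

import static org.junit.Assert.assertTrue;

import java.io.IOException;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.Socket;
import java.util.Collections;
import java.util.Enumeration;

import org.junit.Test;

public class EchoServerBindingTest {

    @Test
    public void shouldAcceptConnectionsOnEveryInterface() throws Exception {
        Enumeration<NetworkInterface> nets = NetworkInterface.getNetworkInterfaces();
        for (NetworkInterface networkInterface : Collections.list(nets)) {
            for (InetAddress inetAddress : Collections.list(networkInterface.getInetAddresses())) {
                assertTrue("Could not connect via " + networkInterface.getDisplayName()
                                + " [" + inetAddress + "]",
                        canConnect(inetAddress, 4444));
            }
        }
    }

    private boolean canConnect(InetAddress address, int port) {
        try (Socket socket = new Socket(address, port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}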
July 17, 2013
by Mark Needham
· 12,533 Views
Playing with NHibernate - Inverse and Cascade Mapping Attributes
I have to admit that NHibernate provides a really flexible way of handling class inheritance and parent-child relationship.
July 12, 2013
by Mariano Vazquez
· 35,848 Views
JAX RS: Streaming a Response using StreamingOutput
A couple of weeks ago Jim and I were building out a neo4j unmanaged extension from which we wanted to return the results of a traversal which had a lot of paths. Our code initially looked a bit like this:

package com.markandjim;

@Path("/subgraph")
public class ExtractSubGraphResource {
    private final GraphDatabaseService database;

    public ExtractSubGraphResource(@Context GraphDatabaseService database) {
        this.database = database;
    }

    @GET
    @Produces(MediaType.TEXT_PLAIN)
    @Path("/{nodeId}/{depth}")
    public Response hello(@PathParam("nodeId") long nodeId, @PathParam("depth") int depth) {
        Node node = database.getNodeById(nodeId);

        final Traverser paths = Traversal.description()
                .depthFirst()
                .relationships(DynamicRelationshipType.withName("whatever"))
                .evaluator(Evaluators.toDepth(depth))
                .traverse(node);

        StringBuilder allThePaths = new StringBuilder();

        for (org.neo4j.graphdb.Path path : paths) {
            allThePaths.append(path.toString() + "\n");
        }

        return Response.ok(allThePaths.toString()).build();
    }
}

We then compiled that into a JAR, placed it in ‘plugins’ and added the following line to ‘conf/neo4j-server.properties’:

org.neo4j.server.thirdparty_jaxrs_classes=com.markandjim=/unmanaged

After we’d restarted the neo4j server we were able to call this end point using cURL like so:

$ curl -v http://localhost:7474/unmanaged/subgraph/1000/10

This approach works quite well, but Jim pointed out that it was quite inefficient to load all those paths up into memory, so we thought it would be quite cool if we could stream it as we got to each path. Traverser wraps an iterator so we are lazily evaluating the result set in any case. After a bit of searching we came across StreamingOutput, which is exactly what we need. We adapted our code to use that instead:

package com.markandjim;

@Path("/subgraph")
public class ExtractSubGraphResource {
    private final GraphDatabaseService database;

    public ExtractSubGraphResource(@Context GraphDatabaseService database) {
        this.database = database;
    }

    @GET
    @Produces(MediaType.TEXT_PLAIN)
    @Path("/{nodeId}/{depth}")
    public Response hello(@PathParam("nodeId") long nodeId, @PathParam("depth") int depth) {
        Node node = database.getNodeById(nodeId);

        final Traverser paths = Traversal.description()
                .depthFirst()
                .relationships(DynamicRelationshipType.withName("whatever"))
                .evaluator(Evaluators.toDepth(depth))
                .traverse(node);

        StreamingOutput stream = new StreamingOutput() {
            @Override
            public void write(OutputStream os) throws IOException, WebApplicationException {
                Writer writer = new BufferedWriter(new OutputStreamWriter(os));

                for (org.neo4j.graphdb.Path path : paths) {
                    writer.write(path.toString() + "\n");
                }

                writer.flush();
            }
        };

        return Response.ok(stream).build();
    }
}

As far as I can tell, the only discernible difference between the two approaches is that you get an almost immediate response from the streamed approach, whereas the first approach has to put everything in the StringBuilder first. Both approaches make use of chunked transfer encoding, which according to tcpdump seems to have a maximum packet size of 16332 bytes:

00:10:27.361521 IP localhost.7474 > localhost.55473: Flags [.], seq 6098196:6114528, ack 179, win 9175, options [nop,nop,TS val 784819663 ecr 784819662], length 16332
00:10:27.362278 IP localhost.7474 > localhost.55473: Flags [.], seq 6147374:6163706, ack 179, win 9175, options [nop,nop,TS val 784819663 ecr 784819663], length 16332
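On the consuming side you can see the difference without any special tooling: reading the response line by line with plain java.net shows paths arriving as soon as their chunks do. A quick sketch (an added illustration, assuming the extension is mounted at the URI used above):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class SubGraphStreamClient {

    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:7474/unmanaged/subgraph/1000/10");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);  // each path prints as its chunk arrives
            }
        } finally {
            connection.disconnect();
        }
    }
}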
July 10, 2013
by Mark Needham
· 113,003 Views · 4 Likes
Algorithm of the Week: Homomorphic Hashing
In a previous Damn Cool Algorithms post, we learned about Fountain Codes, a clever probabilistic algorithm that allows you to break a large file up into a virtually infinite number of small chunks, such that you can collect any subset of those chunks - as long as you collect a few more than the volume of the original file - and be able to reconstruct the original file. This is a very cool construction, but as we observed last time, it has one major flaw when it comes to use in situations with untrusted users, such as peer to peer networks: there doesn't seem to be a practical way to verify if a peer is sending you valid blocks until you decode the file, which happens very near the end - far too late to detect and punish abuse.

It's here that Homomorphic Hashes come to our rescue. A homomorphic hash is a construction that's simple in principle: a hash function such that you can compute the hash of a composite block from the hashes of the individual blocks. With a construction like this, we could distribute a list of individual hashes to users, and they could use those to verify incoming blocks as they arrive, solving our problem.

Homomorphic Hashing is described in the paper On-the-fly verification of rateless erasure codes for efficient content distribution by Krohn et al. It's a clever construction, but rather difficult to understand at first, so in this article, we'll start with a strawman construction of a possible homomorphic hash, then improve upon it until it resembles the one in the paper - at which point you will hopefully have a better idea as to how it works. We'll also discuss the shortcomings and issues of the final hash, as well as how the authors propose to resolve them.

Before we continue, a small disclaimer is needed: I'm a computer scientist, not a mathematician, and my discrete math knowledge is far rustier than I'd like. This paper stretches the boundaries of my understanding, and describing the full theoretical underpinnings of it is something I'm likely to make a hash of. So my goal here is to provide a basic explanation of the principles, sufficient for an intuition of how the construction works, and leave the rest for further exploration by the interested reader.

A homomorphic hash that isn't

We can construct a very simple candidate for a homomorphic hash by using one very simple mathematical identity: the observation that g^x0 * g^x1 = g^(x0 + x1). So, for instance, 2^3 * 2^2 = 2^5. We can make use of this by the following procedure:

  • Pick a random number g.
  • For each element x in our message, take g^x. This is the hash of the given element.

Using the identity above, we can see that if we sum several message blocks together, we can compute their hash by multiplying the hashes of the individual blocks, and get the same result as if we 'hash' the sum. Unfortunately, this construction has a couple of obvious issues:

  • Our 'hash' really isn't - the hashes are way longer than the message elements themselves!
  • Any attacker can compute the original message block by taking the logarithm of the hash for that block. If we had a real hash with collisions, a similar procedure would let them generate a collision easily.

A better hash with modular arithmetic

Fortunately, there's a way we can fix both problems in one shot: by using modular arithmetic.
Modular arithmetic keeps our numbers bounded, which solves our first problem, while also making our attacker's life more difficult: finding a preimage for one of our hashes now requires solving the discrete log problem, a major unsolved problem in mathematics, and the foundation for several cryptosystems. Here, unfortunately, is where the theory starts to get a little more complicated - and I start to get a little more vague. Bear with me.

First, we need to pick a modulus for adding blocks together - we'll call it q. For the purposes of this example, let's say we want to add numbers between 0 and 255, so let's pick the smallest prime greater than 255 - which is 257. We'll also need another modulus under which to perform exponentiation and multiplication. We'll call this p. For reasons relating to Fermat's Little Theorem, this also needs to be a prime, and further, needs to be chosen such that p - 1 is a multiple of q (written q | (p - 1), or equivalently, p % q == 1). For the purposes of this example, we'll choose 1543, which is 257 * 6 + 1.

Using a finite field also puts some constraints on the number, g, that we use for the base of the exponent. Briefly, it has to be 'of order q', meaning that g^q mod p must equal 1. For our example, we'll use 47, since 47^257 % 1543 == 1.

So now we can reformulate our hash to work like this:

  • To hash a message block, we compute g^b mod p - in our example, 47^b mod 1543 - where b is the message block.
  • To combine hashes, we simply multiply them mod p, and to combine message blocks, we add them mod q.

Let's try it out. Suppose our message is the sequence [72, 101, 108, 108, 111] - that's "Hello" in ASCII. We can compute the hash of the first number as 47^72 mod 1543, which is 883. Following the same procedure for the other elements gives us our list of hashes: [883, 958, 81, 81, 313].

We can now see how the properties of the hash play out. The sum of all the elements of the message is 500, which is 243 mod 257. The hash of 243 is 47^243 mod 1543, or 376. And the product of our hashes is 883 * 958 * 81 * 81 * 313 mod 1543 - also 376! Feel free to try this for yourself with other messages and other subsets - they'll always match, as you would expect.

A practical hash

Of course, our improved hash still has a couple of issues:

  • The domain of our input values is small enough that an attacker could simply try them all out to find collisions. And the domain of our output values is small enough the attacker could attempt to find discrete logarithms by brute force, too.
  • Although our hashes are shorter than they were without modular arithmetic, they're still longer than the input.

The first of these is fairly straightforward to resolve: we can simply pick larger primes for p and q. If we choose ones that are sufficiently large, both enumerating all inputs and brute force logarithm finding will become impractical.

The second problem is a little trickier, but not hugely so; we just have to reorganize our message a bit. Instead of breaking the message down into elements between 0 and q, and treating each of those as a block, we can break the message into arrays of elements between 0 and q. For instance, suppose we have a message that is 1024 bytes long. Instead of breaking it down into 1024 blocks of 1 byte each, let's break it down into, say, 64 blocks of 16 bytes. We then modify our hashing scheme a little bit to accommodate this:

  • Instead of picking a single random number as the base of our exponent, g, we pick 16 of them, g0 through g15.
  • To hash a block, we take each number gi and raise it to the power of the corresponding sub-block. The resulting output is the same length as when we were hashing only a single block per hash, but we're taking 16 elements as input instead of a single one.
  • When adding blocks together, we add all the corresponding sub-blocks individually.

All the properties we had earlier still hold. Better, we've given ourselves another tuneable parameter: the number of sub-blocks per block. This will be invaluable in getting the right tradeoff between security, granularity of blocks, and protocol overhead.

Practical applications

What we've arrived at now is pretty much the construction described in the paper, and hopefully you can see how it would be applied to a system utilizing fountain codes. Simply pick two primes of about the right size - the paper recommends 257 bits for q and 1024 bits for p - figure out how big you want each block to be - and hence how many sub-blocks per block - and figure out a way for everyone to agree on the random numbers for g - such as by using a random number generator with a well defined seed value.

The construction we have now, although useful, is still not perfect, and has a couple more issues we should address. First of these is one you may have noticed yourself already: our input values pack neatly into bytes - integers between 0 and 255 in our example - but after summing them in a finite field, the domain has grown, and we can no longer pack them back into the same number of bits. There are two solutions to this: the tidy one and the ugly one.

The tidy one is what you'd expect: Since each value has grown by one bit, chop off the leading bit and transmit it along with the rest of the block. This allows you to transmit your block reasonably sanely and with minimal expansion in size, but is a bit messy to implement and seems - at least to me - inelegant.

The ugly solution is this: Pick the smallest prime number larger than your chosen power of 2 for q, and simply ignore or discard overflows. At first glance this seems like a terrible solution, but consider: the smallest prime larger than 2^256 is 2^256 + 297. The chance that a random number in that range is larger than 2^256 is approximately 1 in 3.9 * 10^74, or approximately one in 2^247. This is way smaller than the probability of, say, two randomly generated texts having the same SHA-1 hash. Thus, I think there's a reasonable argument for picking a prime using that method, then simply ignoring the possibility of overflows. Or, if you want to be paranoid, you can check for them, and throw out any encoded blocks that cause overflows - there won't be many of them, to say the least.

Performance and how to improve it

Another thing you may be wondering about this scheme is just how well it performs. Unfortunately, the short answer is "not well". Using the example parameters in the paper, for each sub-block we're raising a 1024 bit number to the power of a 257 bit number; even on modern hardware this is not fast. We're doing this for every 256 bits of the file, so to hash an entire 1 gigabyte file, for instance, we have to compute over 33 million exponentiations. This is an algorithm that promises to really put the assumption that it's always worth spending CPU to save bandwidth to the test. The paper offers two solutions to this problem; one for the content creator and one for the distributors.
For the content creator, the authors demonstrate that there is a way to generate the random constants g, used as the bases of the exponents using a secret value. With this secret value, the content creator can generate the hashes for their files much more quickly than without it. However, anyone with the secret value can also trivially generate hash collisions, so in such a scheme, the publisher must be careful not to disclose the value to anyone, and only distribute the computed constants gi. Further, the set of constants themselves aren't small - with the example parameters, a full set of constants weighs in at about the size of 4 data blocks. Thus, you need a good way to distribute the per-publisher constants in addition to the data itself. Anyone interested in this scheme should consult section C of the paper, titled "Per-Publisher Homomorphic Hashing". For distributors, the authors offer a probabilistic check that works on batches of blocks, described in section D, "Computational Efficiency Improvements". Another easier to understand variant is this: Instead of verifying blocks individually as they arrive, accumulate blocks in a batch. When you have enough blocks, sum them all together, and calculate an expected hash by taking the product of the expected hashes of the individual blocks. Compute the composite block's hash. If it verifies, all the individual blocks are valid! If it doesn't, divide and conquer: split your batch in half and check each, winnowing out valid blocks until you're left with any invalid ones. The nice thing about either of these procedures is that they allow you to trade off verification work with your vulnerability window. You can even dedicate a certain amount of CPU time to verification, and simply batch up incoming blocks until the current computation finishes, ensuring you're always verifying the last batch as you receive the next. Conclusion Homomorphic Hashing provides a neat solution to the problem of verifying data from untrusted peers when using a fountain coding system, but it's not without its own drawbacks. It's complicated to implement and computationally expensive to compute, and requires careful tuning of the parameters to minimise the volume of the hash data without compromising security. Used correctly in conjunction with fountain codes, however, Homomorphic Hashing could be used to create an impressively fast and efficient content distribution network. As a side-note, I'm intending to resume more regular blogging with more Damn Cool Algorithms posts. Have an algorithm you think is Damn Cool and would like to hear more about? Post it in the comments!
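As a quick way to check the arithmetic in the worked example above (p = 1543, q = 257, g = 47, message "Hello"), the homomorphic property is easy to reproduce with BigInteger. This is an added illustration, not code from the post or the paper:

import java.math.BigInteger;

public class HomomorphicHashDemo {

    public static void main(String[] args) {
        BigInteger p = BigInteger.valueOf(1543);  // modulus for hashes
        BigInteger q = BigInteger.valueOf(257);   // modulus for message blocks
        BigInteger g = BigInteger.valueOf(47);    // base of order q mod p

        int[] message = {72, 101, 108, 108, 111}; // "Hello" in ASCII

        BigInteger productOfHashes = BigInteger.ONE;
        BigInteger sumOfBlocks = BigInteger.ZERO;
        for (int block : message) {
            BigInteger hash = g.modPow(BigInteger.valueOf(block), p);  // hash of one block
            productOfHashes = productOfHashes.multiply(hash).mod(p);
            sumOfBlocks = sumOfBlocks.add(BigInteger.valueOf(block)).mod(q);
        }

        // Both values are the same (376 with these parameters, matching the text above).
        System.out.println("product of block hashes: " + productOfHashes);
        System.out.println("hash of summed blocks:   " + g.modPow(sumOfBlocks, p));
    }
}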
July 9, 2013
by Nick Johnson
· 14,837 Views