DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Databases Topics

article thumbnail
ExtJS, Spring MVC 3 and Hibernate 3.5: CRUD DataGrid Example
this tutorial will walk through how to implement a crud (create, read, update, delete) datagrid using extjs, spring mvc 3 and hibernate 3.5. what do we usually want to do with data? create (insert) read / retrieve (select) update (update) delete / destroy (delete) until extjs 3.0 we only could read data using a datagrid. if you wanted to update, insert or delete, you had to do some code to make these actions work. now extjs 3.0 (and newest versions) introduces the ext.data.writer, and you do not need all that work to have a crud grid. so… what do i need to add in my code to make all these things working together? in this example, i’m going to use json as data format exchange between the browser and the server. extjs code first, you need an ext.data.jsonwriter: // the new datawriter component. var writer = new ext.data.jsonwriter({ encode: true, writeallfields: true }); where writeallfields identifies that we want to write all the fields from the record to the database. if you have a fancy orm then maybe you can set this to false. in this example, i’m using hibernate, and we have saveorupate method – in this case, we need all fields to updated the object in database, so we have to ser writeallfields to true. this is my record type declaration: var contact = ext.data.record.create([ {name: 'id'}, { name: 'name', type: 'string' }, { name: 'phone', type: 'string' }, { name: 'email', type: 'string' }]); now you need to setup a proxy like this one: var proxy = new ext.data.httpproxy({ api: { read : 'contact/view.action', create : 'contact/create.action', update: 'contact/update.action', destroy: 'contact/delete.action' } }); fyi, this is how my reader looks like: var reader = new ext.data.jsonreader({ totalproperty: 'total', successproperty: 'success', idproperty: 'id', root: 'data', messageproperty: 'message' // <-- new "messageproperty" meta-data }, contact); the writer and the proxy (and the reader) can be hooked to the store like this: // typical store collecting the proxy, reader and writer together. var store = new ext.data.store({ id: 'user', proxy: proxy, reader: reader, writer: writer, // <-- plug a datawriter into the store just as you would a reader autosave: false // <-- false would delay executing create, update, //destroy requests until specifically told to do so with some [save] buton. }); where autosave identifies if you want the data in automatically saving mode (you do not need a save button, the app will send the actions automatically to the server). in this case, i implemented a save button, so every record with new or updated value will have a red mark on the cell left up corner). when the user alters a value in the grid, then a “save” event occurs (if autosave is true). upon the “save” event the grid determines which cells has been altered. when we have an altered cell, then the corresponding record is sent to the server with the ‘root’ from the reader around it. e.g if we read with root “data”, then we send back with root “data”. we can have several records being sent at once. when updating to the server (e.g multiple edits). and to make you life even easier, let’s use the roweditor plugin, so you can easily edit or add new records. all you have to do is to add the css and js files in your page: add the plugin on you grid declaration: var editor = new ext.ux.grid.roweditor({ savetext: 'update' }); // create grid var grid = new ext.grid.gridpanel({ store: store, columns: [ {header: "name", width: 170, sortable: true, dataindex: 'name', editor: { xtype: 'textfield', allowblank: false }, {header: "phone #", width: 150, sortable: true, dataindex: 'phone', editor: { xtype: 'textfield', allowblank: false }, {header: "email", width: 150, sortable: true, dataindex: 'email', editor: { xtype: 'textfield', allowblank: false })} ], plugins: [editor], title: 'my contacts', height: 300, width:610, frame:true, tbar: [{ iconcls: 'icon-user-add', text: 'add contact', handler: function(){ var e = new contact({ name: 'new guy', phone: '(000) 000-0000', email: '[email protected]' }); editor.stopediting(); store.insert(0, e); grid.getview().refresh(); grid.getselectionmodel().selectrow(0); editor.startediting(0); } },{ iconcls: 'icon-user-delete', text: 'remove contact', handler: function(){ editor.stopediting(); var s = grid.getselectionmodel().getselections(); for(var i = 0, r; r = s[i]; i++){ store.remove(r); } } },{ iconcls: 'icon-user-save', text: 'save all modifications', handler: function(){ store.save(); } }] }); java code finally, you need some server side code. controller: package com.loiane.web; @controller public class contactcontroller { private contactservice contactservice; @requestmapping(value="/contact/view.action") public @responsebody map view() throws exception { try{ list contacts = contactservice.getcontactlist(); return getmap(contacts); } catch (exception e) { return getmodelmaperror("error retrieving contacts from database."); } } @requestmapping(value="/contact/create.action") public @responsebody map create(@requestparam object data) throws exception { try{ list contacts = contactservice.create(data); return getmap(contacts); } catch (exception e) { return getmodelmaperror("error trying to create contact."); } } @requestmapping(value="/contact/update.action") public @responsebody map update(@requestparam object data) throws exception { try{ list contacts = contactservice.update(data); return getmap(contacts); } catch (exception e) { return getmodelmaperror("error trying to update contact."); } } @requestmapping(value="/contact/delete.action") public @responsebody map delete(@requestparam object data) throws exception { try{ contactservice.delete(data); map modelmap = new hashmap(3); modelmap.put("success", true); return modelmap; } catch (exception e) { return getmodelmaperror("error trying to delete contact."); } } private map getmap(list contacts){ map modelmap = new hashmap(3); modelmap.put("total", contacts.size()); modelmap.put("data", contacts); modelmap.put("success", true); return modelmap; } private map getmodelmaperror(string msg){ map modelmap = new hashmap(2); modelmap.put("message", msg); modelmap.put("success", false); return modelmap; } @autowired public void setcontactservice(contactservice contactservice) { this.contactservice = contactservice; } } some observations: in spring 3, we can get the objects from requests directly in the method parameters using @requestparam. i don’t know why, but it did not work with extjs. i had to leave as an object and to the json-object parser myself. that is why i’m using a util class – to parser the object from request into my pojo class. if you know how i can replace object parameter from controller methods, please, leave a comment, because i’d really like to know that! service class: package com.loiane.service; @service public class contactservice { private contactdao contactdao; private util util; @transactional(readonly=true) public list getcontactlist(){ return contactdao.getcontacts(); } @transactional public list create(object data){ list newcontacts = new arraylist(); list list = util.getcontactsfromrequest(data); for (contact contact : list){ newcontacts.add(contactdao.savecontact(contact)); } return newcontacts; } @transactional public list update(object data){ list returncontacts = new arraylist(); list updatedcontacts = util.getcontactsfromrequest(data); for (contact contact : updatedcontacts){ returncontacts.add(contactdao.savecontact(contact)); } return returncontacts; } @transactional public void delete(object data){ //it is an array - have to cast to array object if (data.tostring().indexof('[') > -1){ list deletecontacts = util.getlistidfromjson(data); for (integer id : deletecontacts){ contactdao.deletecontact(id); } } else { //it is only one object - cast to object/bean integer id = integer.parseint(data.tostring()); contactdao.deletecontact(id); } } @autowired public void setcontactdao(contactdao contactdao) { this.contactdao = contactdao; } @autowired public void setutil(util util) { this.util = util; } } contact class – pojo: package com.loiane.model; @jsonautodetect @entity @table(name="contact") public class contact { private int id; private string name; private string phone; private string email; @id @generatedvalue @column(name="contact_id") public int getid() { return id; } public void setid(int id) { this.id = id; } @column(name="contact_name", nullable=false) public string getname() { return name; } public void setname(string name) { this.name = name; } @column(name="contact_phone", nullable=false) public string getphone() { return phone; } public void setphone(string phone) { this.phone = phone; } @column(name="contact_email", nullable=false) public string getemail() { return email; } public void setemail(string email) { this.email = email; } } dao class: package com.loiane.dao; @repository public class contactdao implements icontactdao{ private hibernatetemplate hibernatetemplate; @autowired public void setsessionfactory(sessionfactory sessionfactory) { hibernatetemplate = new hibernatetemplate(sessionfactory); } @suppresswarnings("unchecked") @override public list getcontacts() { return hibernatetemplate.find("from contact"); } @override public void deletecontact(int id){ object record = hibernatetemplate.load(contact.class, id); hibernatetemplate.delete(record); } @override public contact savecontact(contact contact){ hibernatetemplate.saveorupdate(contact); return contact; } } util class: package com.loiane.util; @component public class util { public list getcontactsfromrequest(object data){ list list; //it is an array - have to cast to array object if (data.tostring().indexof('[') > -1){ list = getlistcontactsfromjson(data); } else { //it is only one object - cast to object/bean contact contact = getcontactfromjson(data); list = new arraylist(); list.add(contact); } return list; } private contact getcontactfromjson(object data){ jsonobject jsonobject = jsonobject.fromobject(data); contact newcontact = (contact) jsonobject.tobean(jsonobject, contact.class); return newcontact; } ) private list getlistcontactsfromjson(object data){ jsonarray jsonarray = jsonarray.fromobject(data); list newcontacts = (list) jsonarray.tocollection(jsonarray,contact.class); return newcontacts; } public list getlistidfromjson(object data){ jsonarray jsonarray = jsonarray.fromobject(data); list idcontacts = (list) jsonarray.tocollection(jsonarray,integer.class); return idcontacts; } } if you want to see all the code (complete project will all the necessary files to run this app), download it from my github repository: http://github.com/loiane/extjs-crud-grid-spring-hibernate this was a requested post. i’ve got a lot of comments from my previous crud grid example and some emails. i made some adjustments to current code, but the idea is still the same. i hope i was able answer all the questions. happy coding! from http://loianegroner.com/2010/09/extjs-spring-mvc-3-and-hibernate-3-5-crud-datagrid-example/
September 3, 2010
by Loiane Groner
· 100,958 Views · 1 Like
article thumbnail
The different kinds of testing
Automated testing supports your constant effort in design and refactoring, and besides that ensures that your application actually works in a reliable and repeatable way. Tests at every level of detail are a form of executable specification and documentation. They give you immediate feedback and confidence that your code works, plus a satisfying green bar many times a day. I've been consulting on a Zend Framework application, with the goal of repairing the test suite and expanding it. In this article I'll describe the different categories of testing, as applied to a Zend Framework 1 application, but this classification pertains to every web application based on object-oriented programming. Since this kind of applications is obviously PHP-based, PHPUnit will be the tool of choice along with some of its standard extensions. For a panoramic of PHPUnit and its features, feel free to download my free ebook on the subject, which condenses much of the technical informations about it to a mere 50 pages. Let's start with the most debated and simple kind of testing - the one at the unit level. Unit testing Each unit tests target a unit of code in isolation - usually a class, and thus one or more objects instantiated from this class. The isolation property is what defines a unit test: its code must not have dependencies on other classes than the one under test, since they should be tested independently, by their own test classes. Since PHPUnit models a test case for a production code class as another class extending PHPUnit_Framework_TestCase, implementing unit testing leads very often to a parallel hierarchy of classes, where every Foo_Bar class has a corresponding Foo_BarTest test case. Given these premises, a unit test that fails tells you immediately where the error is: in the class it exercises. Moreover, it will be very fast to execute, since it works on only a single object at the time. Unit test should target mostly your models, and any code written by you that is not framework-specified: these would also be the classes that contain the majority of the business logic, and the most interesting to test. This code is usually composed of Plain Old PHP Objects and of subclasses of framework or library base classes when when they leave no other choice for integration. For writing unit tests, usually no external library other than PHPUnit is necessary. In a Zend Framework application you can usually reuse the bootstrap files, which set up things like autoloading, in the phpunit --bootstrap option or by defining it in the phpunit.xml configuration file. This way it will be executed only once for each test suite run. I prefer to leave initialization of the single components to test in the test cases itself, to ensure maximum isolation. However, a simpler and standard solution is to just run the whole Bootstrap class, with a custom configuration (application/config/application.ini), which 'testing' environment section is created by default by Zend_Tool. Pragmatic unit testing That's not a standard name. In some cases, you should also be pragmatic: you cannot usually mock all the external resources, nor you should since mocking a contract which you can't change can lead you to madness. You should configure a lightweight version of your dependencies and test with them. For example, if you're using the Doctrine Object-Relational Mapper, you must test the interaction with the database somewhere, and mocking the whole Doctrine infrastructure will be prohibitive and unuseful. The standard practice here is to use the real Doctrine infrastructure to test database-coupled classes, like Repositories and Data Access Objects, but to instantiate a lightweight database like an sqlite in-memory one which is much faster in its operations than a production one. This database can then be discarded or truncated at the end of each test to ensure no global state is shared between test cases. The downside in this approach is that sqlite is not the real database; one time I was testing with it and due to a bug (feature?) in Doctrine 1 the code failed in MySQL while passing with Sqlite. The reason was sqlite does not support foreign key constraints and was simply ignoring them, while MySQL correctly throwed exceptions when they were violated. Moreover, these tests are never fast as the ones totally isolated from external libraries. The upside is that the tests for classes interacting with the database via Doctrine or another ORM still have the benefit of the unit level: when the test fail, it is clear that the related production code class has encountered a regression, because the ORM code is only imported in discrete, distant points of time, when the test suite is green, and so could never change while you're expanding your code. Nevertheless this kind of testing should be applied only to the adapters of your application, which constitute the boundary of the object graph towards external components like databases, web services or the filesystem. Functional testing Functional testing's goal is to exercise a medium-sized object graph, without instantiating the whole application, to a cover a full functionality and make sure the classes adhere to the same contract. For example, these tests can target a service layer built upon your Domain Model, if you want to enhance to cover your factories or DI mechanisms. In other cases, they can target the controllers: this happens when you have supplemental logic on the client side. In case of functional testing on plain old classes, PHPUnit suffices again. In case you target controllers instead, the Zend_Test component gives you a Zend_Test_PHPUnit_ControllerTestCase class which you can extend to gain helper functionalities. Basically, every test method of a Zend_Test test case makes at least a HTTP request. The helper test case sets up a fake HTTP request and response objects in every setUp(), and lets you check the result, being it written HTML (via querying and asserting), XML or JSON. Integration testing Integration tests target an external component such as a library to ensure the expectations of the developers on it are met. Integration tests are usually started as exploratory tests, which are used to learn about the library and to encapsulate this knowledge into a repeatable, executable form. With time, they become regression tests, which allow you to upgrade the library to a new release or version by catching the changes in behavior. Some of these tests target the PHP runtime itself, to check for example that an extension assumed as present is really available. For example, this week we were surprised when a === check inside a Domain Model class was failing. We started writing integration tests for Doctrine_Query, and it turned out that PDO and Doctrine returned strings for numeric fields on their Active Record. By having a specific test to cover our expectations, we understood where our assumption was wrong, and cease to suspect a bug in our own code where the === resided. For this kind of test, again only PHPUnit is necessary; moreover, you'll have to bootstrap the involved library, but it can be simply a matter of adding it to the include_path. Acceptance testing Acceptance tests are end-to-end tests, which see the application as a black box. They exercise the behavior of the whole application, from the user inserting data to the reports created and the actions performed as a consequence. These tests are much slower, but they work on the end result of your work, and define what the user will see and interact with. For old-style applications, which do not involve rich clients, Zend_Test is usually enough for these kinds of tests. A thin layer of CSS expression built over it in order to check the pages without duplicating the same selectors all over the suite may help. However, for Javascript-rich apps, a tool like Selenium is necessary. Selenium drives a real web browser to a fresh instance of your application, and execute your tests, which can be defined manually or via a record-and-replay browser extension. Many PHPUnit extensions offer the means for connecting to a Selenium server, which manages the browsers, and navigate the web application. As a result of its focus on real web browsers such as Firefox and Chrome, Selenium tests are much slower than Zend_Test ones. However, they are the only tool available to execute acceptance tests which involve JavaScript. Conclusion Note that everyone of these kinds of tests (except the integration ones) can be written before the production code it exercises. Unit tests ahead of their referred class; functional tests ahead of the Facade they target; acceptance tests before a whole vertical slice of functionality is implemented. Moreover, if you're doing Test-Driven Development you should in general start at the higher level of abstraction (acceptance) and descending into the lower levels as needed. These different types of testing are always present, maybe as a small part of the suite, in every web application of moderate size. Learning to recognize them when they emerge will help you organizing the test suite better and maintaining it productive and responsive to change.
August 29, 2010
by Giorgio Sironi
· 31,229 Views
article thumbnail
Practical PHP Patterns: Optimistic Offline Lock
In this series we are now entering the realm of concurrency, an option which adds complexity to an application as many different threads of execution are accessing the state storage at the same time. There is no native multithreading support in PHP (every script gets its own isolated process), but still concurrency can easily become an issue: multiple clients from all over the world continuosly make requests to PHP applications, and they can easily mutually overwrite their changesets. A classic example of race condition in the PHP world is two different clients filling an editing form referred to the same entity. They both will submit the form once it is complete, and the first one will get his changes overwritten by the second request. There are many other common situations were concurrent user can provoke errors in the system. Just think of different people choosing the same username and being told via Ajax that it is available; the slower one of them will be surprised when he submits his registration form. Background Before entering the explanation of patterns emerged to solve the concurrency problems, we need some definitions. First of all the notion of transaction is necessary: we define a transaction as a change of state of the application. Editing an entity is a transaction; adding or deleting another is still a transaction. The first kind of transaction we are interested in is the database transactions, which is totally accomplished in one PHP script. This is usually automatically enforced by mechanisms supported at the database level. The other kind of transaction is the business transaction, which spans over multiple HTTP requests and makes uses of from one to N database transactions. It comprehends checking out data, populating a form or other kind of rich user interface, modifying or adding data (a human-based action), and sending it back to the server. There is no automatic enforcement for business transaction, since they are defined by the business rules of the domain. This is not a problem that originates because of PHP nature, but because of the separation between client and server which the web is based on. Optimistic lock The Optimistic Offline Lock pattern is a way of ensuring integrity of data, avoiding the option that different clients submit conflicting changes. As the name suggests, it assumes that the chances of conflict are low. Indeed, when this is the case the optimistic lock does not slow down the user interaction a bit. The goal of this pattern is detecting a conflicting change and instead of applying it, rollback the business transaction and present an error to the user. It accomplishes this goal by validating that no one else has tampered with a record in the data source prior to allowing the modification to be committed. All open source version control systems such as Subversion and Git implement optimistic lock: anyone may check out a source file and work on it, to end his little fork later with a commit or push. The pain comes while merging, so you are supposed to integrate often. We also borrowed terminology from the source control systems, so in this article you'll encounter terms like commit, checkout, and merge. Implementation The most common implementation of Optimistic Offline Lock is a numerical version field on the record to protect from condurrency issues; to aid rollback notifications, other additional fields are useful for signaling the conflict, like the id of the last user that modified the record. The pattern inner working is not complex: when the data is submitted along with the version field value kept on the client, the version field in it must be the same currently present in the database. Only then it is incremented and the changeset committed. Encountering a different version field value in the database record means someone else has modified the data in between our checkout and commit, and so it must be preserved. For example we can show a diff to the user, like VCS does; in any case, we should interrupt the transaction. RDBMS and ORMs can simply use an additional column on the table where the root object is stored to support this pattern. Alternatives An alternative implementation consists in using all fields in the WHERE clause of the UPDATE (or only the sensible ones, or only the modified ones to let transactions that affect different field succeed when the business logic allows it. See below). This solution is handy when you can't add a version field, but it may have performance impact. Another alternative is to check conditions instead of version fields, which is practical in different use cases. For example, we can check the existence of a record before deleting it. This is already indirectly done to a cartain extent by ORMs and other abstraction layers when they provide you with an object abstraction you can calla delete() method on. An extension to the functionality of this pattern is checking that a current editing will (probably) commit, as a feature available at any time during editing of the data. This feature should check that the checkout data is still current. The domain An important is that it is part of the job of business domain logic to decide when a conflict occurs: some concurent changesets may be acceptable, while others may not be allowed even if they modify different fields. In his book, Fowler makes the example of adding elements to a collection concurrently. We can't know if this is right by seeing that the object is a collection, because it is the abstraction that it represent that must be maintained valid: sometimes it is right to add elements, sometimes the transaction should be stopped. Also merging strategies, which solve the conflicts, are subsceptible to domain considerations. Some are valuable and should be pursued, while some are costly and a rollback with manual user editing of the data is fine. Advantages The greatest advantage of this pattern is that it can support real time concurrency, like the check out of multiple items by multiple users simultaneously, as long as there is a merging strategy in place. It can also easily prevent race conditions. This pattern is also easy to implement, and thus it is the default choice to solve concurrency issues. Disadvantages When the conflict probability is high, since there are many concurrent transactions, this pattern produces too many rollbacks. It is not adequate for use cases where a pessimistic pattern should be adopted. Examples Doctrine 2 uses natively database transactions: it only commits the changes made in a PHP script when the EntityManager::flush() method is called. It automatically rolls back if an error is detected. Besides that, Doctrine 2 has also automatic Optimistic Offline Locking support, via the addition of a version field to the entity to lock.
August 17, 2010
by Giorgio Sironi
· 4,474 Views · 1 Like
article thumbnail
Migrating from Cassandra to MongoDB
I'll start off by saying this article is not intended to be a Cassandra-bashing session, instead it provides an interesting look at one development company's case study to show that Cassandra (although it's fantastic for some) is not for everyone. The company is Nodeta and the application is Flowdock, a free tool (currently in beta) that functions as a web-based team messenger in place of Campfires, Skype Chats, IRCs, etc. Otto Hilska, who talked about the migration, says that "All software developers should be using it… because it better supports their actual workflow.." About a week ago the team finished their transition from the Apache Cassandra NoSQL database to another NoSQL, MongoDB. The switch was made due to stability issues that the developers were having with Cassandra. Hilska explained the details of his company's experience with Cassandra: " All nodes would go into an infinite loop, running GC and trying to compact the data files – occasionally falling off the cluster. We were unable to solve the problem, except that restarting and then compacting a node usually settled it down for a while. Other people had reported similar problems. Last couple of weeks our Cassandra nodes always ate all the resources they were given, slowing down Flowdock. This was not the first time we had run into problems because of our bleeding edge database choice. When upgrading from 0.4 to 0.5, we had to shut down the cluster, only to find out that it hadn’t flushed everything to the disk (even though we explicitly flushed it, as instructed). Thus we ended up having a couple of minutes of discussions lost, and our custom-built indices were miserably out of date and needed to be rebuilt. I think it was 4 AM when we finally got to leave the office."--Otto Hilska Flowdock developers became attracted to a new NoSQL store, MongoDB, because of its recent addition of auto-sharding and replica sets. Hilska wrote the conversion script in a day and it took a week to get Flowdock running purely on MongoDB. Then Nodeta tested it internally for a few weeks before they deployed it to production. However, MongoDB is not without flaws as well. Dots are not allowed in BSON document keys and the document size is limited to 4MB. It's also not as easy to add new nodes as it is with Cassandra. On the other hand, Hilska says that the smart (multikey) indices, complex queries directly from the console, MapReduce, GridFS, and lack of issues make up for these minor flaws. I recently posted a guide for determining the right database solution (Relational or NoSQL) for various use cases. The article has some very good resources and includes situations where MongoDB or Cassandra might be the best choice. Some criticisms of Cassandra emerged a few weeks ago when Twitter announced that it would over to the NoSQL store. By no means has Twitter stopped using Cassandra. They have stated that it's currently being used to store geolocation data and data mining results that feed into things like local trends and @toptweets. The NoSQL's creators at Facebook are also still using Cassandra. Production deployments of MongoDB exist at Foursquare, SourceForge, The New York Times, BoxedIce, GitHub, and SugarCRM.
July 26, 2010
by Mitch Pronschinske
· 19,502 Views
article thumbnail
Optimizing JPA Performance: An EclipseLink, Hibernate, and OpenJPA Comparison
'Impedance mismatch'. No two words encompass the troubles, headaches and quirks most developers face when attempting to link applications to relational databases (RDBMS). But lets face it, object orientated designs aren't going away anytime soon from mainstream languages and neither are the relational storage systems used in most applications. One side works with objects, while the other with tables. Resolving these differences -- or as its technically referred to 'object/relational impedance mismatch' -- can result in substantial overhead, which in turn can materialize into poor application performance. In Java, the Java Persistence API (JPA) is one of the most popular mechanisms used to bridge the gap between objects (i.e. the Java language) and tables (i.e. relational databases). Though there are other mechanisms that allow Java applications to interact with relational databases -- such as JDBC and JDO -- JPA has gained wider adoption due to its underpinnings: Object Relational Mapping (ORM). ORM's gain in popularity is due precisely to it being specifically designed to address the interaction between object and tables. In the case of JPA, there is a standard body charged with setting its course, a process which has given way to several JPA implementations, among the three most popular you will find: EclipseLink (evolved from TopLink), Hibernate and OpenJPA. But even though all three are based on the same standard, ORM being such a deep and complex topic, beyond core functionality each implementation has differences ranging from configuration to optimization techniques. What I will do next is explain a series of topics related to optimizing an application's use of the JPA, using and comparing each of the previous JPA implementations. While JPA is capable of automatically creating relational tables and can work with a series of relational database vendors, I will part from having pre-existing data deployed on a MySQL relational database, in addition to relying on the Spring framework to facilitate the use of the JPA. This will not only make it a fairer comparison, but also make the described techniques appealing to a wider audience, since performance issues become a serious concern once you have a large volume of data, in addition to MySQL and Spring being a common choice due to their community driven (i.e. open-source) roots. See the source code/application section at the end for instructions on setting up the application code discussed in the remainder of the sections. Download the Source Code associated with this article (~45 MB) The basics: Metrics In order to establish JPA performance levels in an application, it's vital to first obtain a series of metrics related to a JPA implementation's inner workings. These include things like: What are the actual queries being performed against a RDBMS? How long does each query take? Are queries being performed constantly against the RDBMS or is a cache being used? These metrics will be critical to our performance analysis, since they will shed light on the underlying operations performed by a JPA implementation and in the process show the effectiveness or ineffectiveness of certain techniques. In this area you will find the first differences among implementations, and I'm not talking about metric results, but actually how to obtain these metrics. To kick things off, I will first address the topic of logging. By default, all three JPA implementations discussed here -- EclipseLink, Hibernate and OpenJPA -- log the query performed against a RDBMS, which will be an advantage in determining if the queries performed by an ORM are optimal for a particular relational data model. Nevertheless, tweaking the logging level of a JPA implementation further can be helpful for one of two things: Getting even more details from the underlying operations made by a JPA -- which can be turned off by default (e.g. database connection details) -- or getting no logging information at all -- which can benefit a production system's performance. Logging in JPA implementations is managed through one of several logging frameworks, such as Apache Commons Logging or Log4J. This requires the presence of such libraries in an application. Logging configuration of a JPA implementation is mostly done through a value in an application's persistence.xml file or in some cases, directly in a logging framework's configuration files. The following table describes JPA logging configuration parameters: Large table, so here's an external link In addition to the information obtained through logging, there is another set of JPA performance metrics which require different steps to be obtained. One of these metrics is the time it takes to perform a query. Even though some JPA implementations provide this information using certain configurations, some do not. Even so, I opted to use a separate approach and apply it to all three JPA implementations in question. After all, time metrics measured in milliseconds can be skewed in certain ways depending on start and end time criteria. So to measure query times, I will use Aspects with the aid of the Spring framework. Aspects will allow us to measure the time it takes a method containing a query to be executed, without mixing the timing logic with the actual query logic -- the last feature of which is the whole purpose of using Aspects. Further discussing Aspects would go beyond the scope of performance, so next I will concentrate on the Aspect itself. I advise you to look over the accompanying source code, Aspects and Spring Aspects for more details on these topics and their configuration. The following Aspect is used for measuring execution times in query methods. package com.webforefront.aop;import org.apache.commons.lang.time.StopWatch;import org.apache.commons.logging.Log;import org.apache.commons.logging.LogFactory;import org.aspectj.lang.ProceedingJoinPoint;import org.aspectj.lang.annotation.Around;import org.aspectj.lang.annotation.Pointcut;import org.aspectj.lang.annotation.Aspect;@Aspectpublic class DAOInterceptor { private Log log = LogFactory.getLog(DAOInterceptor.class); @Around("execution(* com.webforefront.jpa.service..*.*(..))") public Object logQueryTimes(ProceedingJoinPoint pjp) throws Throwable { StopWatch stopWatch = new StopWatch(); stopWatch.start(); Object retVal = pjp.proceed(); stopWatch.stop(); String str = pjp.getTarget().toString(); log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": " + stopWatch.getTime() + "ms"); return retVal; } The main part of the Aspect is the @Around annotation. The value assigned to this last annotation indicates to execute the aspect method -- logQueryTimes -- each time a method belonging to a class in the com.webforefront.jpa.service package is executed -- this last package is where all our application's JPA query methods will reside. The logic performed by the logQueryTimes aspect method is tasked with calculating the execution time and outputting it as logging information using Apache Commons Logging. Another set of important JPA metrics is related to statistics beyond those provided by standard logging. The statistics I'm referring to are things related to caches, sessions and transactions. Since the JPA standard doesn't dictate any particular approach to statistics, each JPA implementation also varies in the type and way it collects statistics. Both Hibernate and OpenJPA have their own statistics class, where as EclipseLink relies on a Profiler to gather similar metrics. Since I'm already relying on Aspects, I will also use an Aspect to obtain statistics both prior and after the execution of a JPA query method. The following Aspect obtains statistics for an application relying on Hibernate. package com.webforefront.aop;import org.hibernate.stat.Statistics;import org.hibernate.SessionFactory;import org.aspectj.lang.ProceedingJoinPoint;import org.aspectj.lang.annotation.Around;import org.aspectj.lang.annotation.Aspect;import org.springframework.beans.factory.annotation.Autowired;import javax.persistence.EntityManagerFactory;import org.hibernate.ejb.HibernateEntityManagerFactory;import org.apache.commons.logging.Log;import org.apache.commons.logging.LogFactory;@Aspectpublic class CacheHibernateInterceptor { private Log log = LogFactory.getLog(DAOInterceptor.class); @Autowired private EntityManagerFactory entityManagerFactory; @Around("execution(* com.webforefront.jpa.service..*.*(..))") public Object log(ProceedingJoinPoint pjp) throws Throwable { HibernateEntityManagerFactory hbmanagerfactory = (HibernateEntityManagerFactory) entityManagerFactory; SessionFactory sessionFactory = hbmanagerfactory.getSessionFactory(); Statistics statistics = sessionFactory.getStatistics(); String str = pjp.getTarget().toString(); statistics.setStatisticsEnabled(true); log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": (Before call) " + statistics); Object result = pjp.proceed(); log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": (After call) " + statistics); return result; } } Notice the similar structure to the prior timing Aspect, except in this case the logging output contains values that belong to the Statistics Hibernate class obtained via the application's EntityManagerFactory. The next Aspect is used to obtain statistics for an application relying on OpenJPA. package com.webforefront.aop;import org.apache.openjpa.datacache.CacheStatistics;import org.apache.openjpa.persistence.OpenJPAEntityManagerFactory;import org.apache.openjpa.persistence.OpenJPAPersistence;import org.aspectj.lang.ProceedingJoinPoint;import org.aspectj.lang.annotation.Around;import org.aspectj.lang.annotation.Aspect;import org.springframework.beans.factory.annotation.Autowired;import javax.persistence.EntityManagerFactory;import org.apache.commons.logging.Log;import org.apache.commons.logging.LogFactory;@Aspectpublic class CacheOpenJPAInterceptor { private Log log = LogFactory.getLog(DAOInterceptor.class); @Autowired private EntityManagerFactory entityManagerFactory; @Around("execution(* com.webforefront.jpa.service..*.*(..))") public Object log(ProceedingJoinPoint pjp) throws Throwable { OpenJPAEntityManagerFactory ojpamanagerfactory = OpenJPAPersistence.cast(entityManagerFactory); CacheStatistics statistics = ojpamanagerfactory.getStoreCache().getStatistics(); String str = pjp.getTarget().toString(); log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": (Before call) Statistics [start time=" + statistics.start() + ",read count=" + statistics.getReadCount() + ",hit count=" + statistics.getHitCount() +",write count=" + statistics.getWriteCount() + ",total read count=" + statistics.getTotalReadCount() + ",total hit count=" + statistics.getTotalHitCount() +",total write count=" + statistics.getTotalWriteCount()); Object result = pjp.proceed(); log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": (After call) Statistics [start time=" + statistics.start() + ",read count=" + statistics.getReadCount() + ",hit count=" + statistics.getHitCount() +",write count=" + statistics.getWriteCount() + ",total read count=" + statistics.getTotalReadCount() + ",total hit count=" + statistics.getTotalHitCount() +",total write count=" + statistics.getTotalWriteCount()); return result; } } Once again, notice the similar Aspect structure to the previous Aspect which relies on an application's EntityManagerFactory. In this case, the logging output contains values that belong to the CacheStatistics OpenJPA class. Since OpenJPA does not enable statistics by default, you will need to add the following two properties to an application's persistence.xml file: The first property ensures statistics are gathered, while the second property is used to indicate the gathering of statistics take place on a single JVM. NOTE: The value "true(EnableStatistics=true)" also enables caching in addition to statistics. Since EclipseLink doesn't have any particular statistics class and relies on a Profiler to determine advanced metrics, the simplest way to obtain similar statistics to those of Hibernate and OpenJPA is through the Profiler itself. To active EclipseLink's Profiler you just need to add the following property to an application's persistence.xml file: . By doing so, the EclipseLink Profiler output's several metrics on each JPA query method execution as logging information. Now that you know how to obtain several metrics from all three JPA implementations and understand they will be obtained as fairly as possible for all three providers, it's time to put each JPA implementation to the test along with several performance techniques. JPQL queries, weaving and class transformations Lets start by making a query that retrieves data belonging to a pre-existing RDBMS table named "Master". The "Master" table contains over 17,000 records belonging to baseball players. To simplify matters, I will create a Java class named "Player" and map it to the "Master" table in order to retrieve the records as objects. Next, relying on the Spring framework's JpaTemplate functionality, I will setup a query to retrieve all "Player" objects, with the query taking the following form: getJpaTemplate().find("select e from Player e"); See the accompanying source code for more details on this last process. Next, I deploy the application using each of the three JPA implementations on Apache Tomcat, doing so separately, as well as starting and stopping the server on each deployment to ensure fair results. These are the results of doing so on a 64-bit Ubuntu-4GB RAM box, using Java 1.6: All player objects - 17,468 records Time Query Hibernate 3558 ms select player0_.lahmanID as lahmanID0_, player0_.nameFirst as nameFirst0_, player0_.nameLast as nameLast0_ from Master player0_ EclipseLink (Run-time weaver - Spring ReflectiveLoadTimeWeaver weaver ) 3215 ms SELECT lahmanID, nameLast, nameFirst FROM Master EclipseLink (Build-time weaving) 3571 ms SELECT lahmanID, nameLast, nameFirst FROM Master EclipseLink (No weaving) 3996 ms SELECT lahmanID, nameLast, nameFirst FROM Master OpenJPA (Build-time enhanced classes) 5998 ms SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 OpenJPA (Run-time enhanced classes- OpenJPA enhancer) 6136 ms SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 OpenJPA (Non enhanced classes) 7677 ms SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 As you can observe, the queries performed by each JPA implementation are fairly similar, with two of them using a shortcut notation (e.g. t0 and player0 for the table named 'Master'). This syntax variation though has minimal impact on performance, since directly querying an RDBMS using any of these notation variations shows identical results. However, the query times made through several JPA implementations using distinct parameters vary considerably. One important factor leading to this time difference is due to how each implementation handles JPA entities. Lets start with the OpenJPA implementation which had the poorest times. OpenJPA can execute an enhancement process on Java entities (e.g. in this case the 'Player' class). This enhancement process can be performed when the entities are built, at run-time or foregone altogether. As you can observe, foregoing entity enhancement altogether in OpenJPA produced the longest query times. Where as enhancing entities at either build-time or run-time produced relatively better results, with the former beating out the latter. By default, OpenJPA expects entities to be enhanced. This means you will either need to explicitly configure an application to support unenhanced classes by adding the following: ...property to an application's persistence.xml file or enhance classes at build-time or at run-time relying on the OpenJPA enhancer, otherwise an application relying on OpenJPA will throw an error. Given these OpenJPA results, the remaining OpenJPA tests will be based on build-time enhanced entity classes. For more on the topic of OpenJPA enhancement, refer to the OpenJPA documentation in addition to consulting the accompanying source code for this article. You may be wondering what exactly constitutes OpenJPA enhancement ? OpenJPA entity enhancement is a processing step applied to the bytecode generated by the Java compiler which adds JPA specific instructions to provide optimal runtime performance, these instructions can include things like flexible lazy loading and dirty read tracking. So why doesn't Hibernate or EclipseLink enhance entities ? In short, Hibernate and EclipseLink also enhance JPA entites, they just don't outright call it 'enhancement'. EclipseLink calls this 'enhancement' process by the more technical term: weaving. Similar to OpenJPA's enhancement process, weaving in EclipseLink can take place at either build-time (a.k.a. static weaving), run-time or forgone altogether. As you can observe in the results, all of EclipseLink's tests present smaller variations compared to OpenJPA. The longest EclipseLink variation involved not using weaving. If you think about it, this is rather logical given that the purpose of weaving consists of altering Java byte code for the purpose of adding optimized JPA instructions that include lazy loading, change tracking, fetch groups and internal optimizations. For the EclipseLink tests using weaving, both build-time and run-time weaving present better results. For build-time weaving, I used EclipseLink's library along with an Apache Ant task, where as for run-time weaving, I used the Spring framework's ReflectiveLoadTimeWeaver. I can only assume the slightly better performance of using run-time weaving over build-time weaving in EclipseLink was due to the fact of using a weaver integrated with the Spring framework, which in turn could result in better JPA optimizations designed for Spring applications. Nevertheless, considering the test result of forgoing weaving altogether, weaving does not appear to be a major performance impact when using EclipseLink, ceteris paribus. By default, EclipseLink expects run-time weaving to be enabled, otherwise you will receive an error in the form 'Cannot apply class transformer without LoadTimeWeaver specified'. This means that for cases using build-time weaving or no weaving at all, you will need to explicitly indicate this behavior. In order to disable EclipseLink weaving you will need to either configure an application's EntityManagerFactory Spring bean with: ... or add the .... ...property to an application's persistence.xml file. To indicate an application's entities are built using build-time weaving, substitute the previous property's "false" value with "static". To configure the default run-time weaver expected by EclipseLink, add the following: ...property to an application's EntityManagerFactory Spring bean. Given these EclipseLink results, the remaining EclipseLink tests will be based on run-time weaving provided by the Spring framework. For more on the topic of EclipseLink weaving, refer to the EclipseLink documentation at http://wiki.eclipse.org/Introduction_to_EclipseLink_Application_Development_(ELUG)#Using_Weaving, in addition to consulting the accompanying source code for this article. Hibernate doesn't require neither enhancing JPA entities or weaving. For this reason, there is only one test result. This not only makes Hibernate simpler to setup, but judging by its only test result -- which clock's in at second place with respect to all other tests -- Hibernate's performance ranks high compared to its counterparts. However, in what I would consider Hibernate's equivalent to OpenJPA's enhancement process or EclipseLink's weaving, you will find a series of Hibernate properties. For example, Hibernate has properties such as hibernate.default_batch_fetch_size designed to optimize lazy loading. As you might recall, among the purposes of both OpenJPA's enhancement process and EclipseLink's weaving are the optimization of lazy loading. So where as OpenJPA and EclipseLink require a separate and monolithic step -- at build-time or run-time -- to achieve JPA optimization techniques, Hibernate falls back to the use of granular properties specified in an application's persistence.xml file. Nevertheless, given that Hibernate's default behavior proved to be on par with the best query times, I didn't feel a need to further explore with these Hibernate properties. To get another sense of the times and mapping procedures of each JPA implementation, I will make more selective queries based on a Player object's first name and last name. These are the results of performing a query for all Player objects whose first name is John and a query for all Player objects whose last name in Smith. All player objects whose first name is John - 472 records Time Query EclipseLink 1265 ms SELECT lahmanID, nameLast, nameFirst FROM Master WHERE (nameFirst = ?) Hibernate 613 ms select player0_.lahmanID as lahmanID0_, player0_.nameFirst as nameFirst0_, player0_.nameLast as nameLast0_ from Master player0_ where player0_.nameFirst=? OpenJPA 1643 ms SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 WHERE (t0.nameFirst = ?) [params=?] All player objects whose last name is Smith - 146 records Time Query EclipseLink 986 ms SELECT lahmanID, nameLast, nameFirst FROM Master WHERE (nameLastt = ?) Hibernate 537 ms select player0_.lahmanID as lahmanID0_, player0_.nameFirst as nameFirst0_, player0_.nameLast as nameLast0_ from Master player0_ where player0_.nameLast=? OpenJPA 1452 ms SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 WHERE (t0.nameLast = ?) [params=?] These test results tell a slightly different story,with all three JPA implementations presenting substantial time differences amongst one another. At a lower record count, Hibernate's out-of-the-box configuration resulted in almost twice as fast queries as its closest competitor and almost three times faster queries than its other competitor. To get an even broader sense of the times and mapping procedures of each JPA implementation, I will make a query on a single Player object based on its id. These are the results of performing such a query. Single player object whose ID is 777- 1 record Time Query EclipseLink 521 ms SELECT lahmanID, nameLast, nameFirst FROM Master WHERE (lahmanID = ?) Hibernate 157 ms select player0_.lahmanID as lahmanID0_0_, player0_.nameFirst as nameFirst0_0_, player0_.nameLast as nameLast0_0_ from Master player0_ where player0_.lahmanID=? OpenJPA 1052 ms SELECT t0.nameFirst, t0.nameLast FROM Master t0 WHERE t0.lahmanID = ? [params=?] With the exception of the faster query times -- due to it being a query for a single Player object -- the times between JPA implementations are practically in proportion to the queries used for extracting multiple Player objects by first and last name. This will do it as far as test queries are concerned. However, a word of caution is in order when discussing these topics on optimization/enhancement/weaving. Even though the previous tests consisted of querying over 17,000 records and confirm clear advantages of using one provider and technique over another, they are still one dimensional, since they're based on read operations performed on a single object type and a single RDBMS table. JPA can perform a large array of operations that also include updating, writing and deleting RDBMS records, not to mention the execution of more elaborate queries that can span multiple objects and tables. In addition, RDBMS themselves can have influencing factors (e.g. indexes) over JPA query times. So all this said, it's not too far fetched to think the use of OpenJPA entity enhancement, EclipseLink weaving or Hibernate properties, could have varying degrees -- either beneficial or detrimental -- depending on the queries (i.e. multi-table, multi-object) and type of JPA operation (i.e. read, write, update, delete) involved. Next, I will describe one of the most popular techniques used to boost performance in JPA applications. Caches A cache allows data to remain closer to an application's tier without constantly polling an RDBMS for the same data. I entitled the section in plural -- caches -- because there can be several caches involved in an application using JPA. This of course doesn't mean you have to configure or use all the caches provided by an application relying on JPA, but properly configuring caches can go a long way toward enhancing an application's JPA performance. So lets start by analyzing what it's each JPA implementation offers in its out-of-the-box state in terms of caching. The following table illustrates tests done by simply invoking the previous JPA queries for a second and third consecutive time, without stopping the server. Note that the same process of deploying a single application at once was used, in addition to the server being re-started on each set of tests. Query / Implementation EclipseLink Hibernate OpenJPA All records (1st time) 3215 ms 3558 ms 5998 ms All records (2nd time) 507 ms 272 ms 521 ms All records (3rd time) 439 ms 218 ms 263 ms First name (1st time) 1265 ms 613 ms 1643 ms First name (2nd time) 151 ms 115 ms 239 ms First name (3rd time) 154 ms 101 ms 227 ms Last name (1st time) 986 ms 537 ms 1452 ms Last name (2nd time) 41 ms 41 ms 112 ms Last name (3rd time) 65 ms 38 ms 117 ms By ID (1st time) 521 ms 157 ms 1052 ms By ID (2nd time) 1 ms 6 ms 3 ms By ID (3rd time) 1 ms 3 ms 3 ms As you can observe, on both the second and third invocation all the queries show substantial improvements with respect to the first invocation. The primary cause for these improvements is unequivocally due to the use of a cache. But what type of cache exactly ? Could it be an RDBMS's own caching engine ? JPA ? Spring ? Or some other variation ?. In order to shed some light on cache usage, the following table illustrates the cache statistics generated on each of the previous JPA queries. Query / Impleme)ntation EclipseLink Hibernate OpenJPA All records (2nd time) number of objects=17468, total time=506, local time=506, row fetch=65, object building=328, cache=112, sql execute=47, objects/second=34521, sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=34936, queries executed to database=2, query cache puts=0, query cache hits=0, query cache misses=0 N/A All records (3rd time) number of objects=17468, total time=435, local time=435, profiling time=1, row fetch=28, object building=323, cache=106, logging=1, sql execute=27, objects/second=40156, sessions opened=3, sessions closed=3, connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=52404, queries executed to database=3, query cache puts=0, query cache hits=0, query cache misses=0 N/A First name (2nd time) number of objects=472, total time=148, local time=148, row fetch=27, object building=106, cache=7, logging=1, sql execute=3, objects/second=3189, sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=944, queries executed to database=2, query cache puts=0, query cache hits=0, query cache misses=0 N/A First name (3rd time) number of objects=472, total time=152, local time=152, row fetch=20, object building=121, cache=7, sql execute=3, objects/second=3105, sessions opened=3, sessions closed=3 connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=1416, queries executed to database=3, query cache puts=0, query cache hits=0, query cache misses=0 N/A Last name (2nd time) number of objects=146, total time=40, local time=40, row fetch=7, object building=27, cache=2, logging=1, sql execute=3, objects/second=3650, sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=292, queries executed to database=2, query cache puts=0, query cache hits=0, query cache misses=0 N/A Last name (3rd time) number of objects=146, total time=63, local time=63, profiling time=1, row fetch=6, object building=19, cache=5, sql prepare=1, sql execute=23, objects/second=2317, sessions opened=3, sessions closed=3, connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=438 queries executed to database=3, query cache puts=0, query cache hits=0, query cache misses=0 N/A By ID (2nd time) number of objects=1, total time=1, local time=1, time/object=1, objects/second=1000, sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=2, queries executed to database=0, query cache puts=0, query cache hits=0, query cache misses=0 N/A By ID (3rd time) number of objects=1, total time=1, local time=1, time/object=1, objects/second=1000, sessions opened=3, sessions closed=3, connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=3, queries executed to database=0, query cache puts=0, query cache hits=0, query cache misses=0 N/A Notice the statistics generated by each JPA implementation are different. EclipseLink reports a single cache statistic, OpenJPA doesn't even report statistics unless a cache is enabled -- see previous section on metrics for details on this behavior -- and Hibernate reports two cache related statistics: second level cache and query cache. At this juncture, if you look at the test results and statistics for the second and third invocation, something won't add up. How is it that OpenJPA's test results came out faster when caching is disabled by default ? An how about Hibernate returning 0's on its cache related statistics, even when its test results came out faster ? The reason for this performance increase is due to RDBMS caching. On the first query, the RDBMS needs to read data from its own file system (i.e. perform an I/O operation), on subsequent requests the data is present in RDBMS memory (i.e. its cache) making the entire JPA query much faster. A closer look at the Hibernate statistics field 'queries executed to the database' can confirm this. Notice that on every second query it shows 2 and on every third query it shows 3, meaning the data was read directly from the database. NOTE: The only exception to this occurs when a query is made on a single entity (i.e. by id), I will address this shortly. Next, lets start breaking down the caches you will encounter when using JPA applications. The JPA 2.0 standard defines two types of caches: A first level cache and a second level cache. The first level cache or EntityManager cache is used to properly handle JPA transactions. A first level cache only exist for the duration of the EntityManager. With the exception of long lived operations performed against a RDBMS, JPA EntityManager's are short lived and are created & destroyed per request or per transaction. In this case, given the nature of the queries, first level caches are cleared on every query. A second level cache on the other hand is a broader cache that can be used across transactions and users. This makes a JPA second level cache more powerful, since it can avoid constantly polling an RDBMS for the same data. But even though the JPA 2.0 standard now addresses second level cache features, this was not the case in JPA 1.0. In the 1.0 version of the JPA standard only a first level cache was addressed, leaving the door completely open on the topic of a second level cache. This created a fragmented approach to caching in JPA implementations, which even now as JPA 2.0 compliant implementations emerge, some non-standard features continue to be part of certain implementations given the value they provide to JPA caching in general. So as I move forward, bear in mind that just like previous JPA topics, each JPA implementation can have its own particular way of dealing with second level caching. I will start with OpenJPA, which has the least amount of proprietary caching options. To enable OpenJPA caching (i.e. second level caching) you need to declare the following two properties in an application's persistence.xml file: The first property ensures caching and statistics are activated, while the second property is used to indicate caching take place on a single JVM. The following results and statistics were obtained with OpenJPA's second level cache enabled. Query with OpenJPA caching Time Statistics Time without statistics All records (2nd time) 420 ms read count=34936, hit count=17468, write count=17468, total read count=34936, total hit count=17468, total write count=17468 347 ms All records (3rd time) 254 ms read count=52404, hit count=34936, write count=17468, total read count=52404, total hit count=34936, total write count=17468 230 ms First name (2nd time) 125 ms read count=944, hit count=472, write count=472, total read count=944, total hit count=472, total write count=472 127 ms First name (3rd time) 114 ms read count=1416, hit count=944, write count=472, total read count=1416, total hit count=944, total write count=472 132 ms Last name (2nd time) 63 ms read count=292, hit count=146, write count=146, total read count=292, total hit count=146, total write count=146 53 ms Last name (3rd time) 49 ms read count=438, hit count=292, write count=146, total read count=438, total hit count=292, total write count=146 50 ms By ID (2nd time) 5 ms read count=2, hit count=1, write count=1, total read count=2, total hit count=1, total write count=1 1 ms By ID (3rd time) 4 ms read count=3, hit count=2, write count=1, total read count=3, total hit count=2, total write count=1 1 ms As these test results illustrate, executing subsequent JPA queries with OpenJPA's second level cache produce superior results. Another important behavior illustrated in some of these test cases is that by simply disabling statistics -- and still using the second level cache -- query times improve even more. The OpenJPA statistics also demonstrate how the cache is being used. Notice that on each subsequent query the statistics field 'hit count' is duplicated, which means data is being read from the cache (i.e. a hit). Also notice the statistics field 'write count' remains static, which means data is only written once from the RDBMS to the cache. This is pretty basic functionality for a second level cache. On certain occasions a need may arise to interact directly with a cache. These interactions can range from prohibiting an entity from being cached, assigning a particular amount of memory to a cache, forcing an entity to always be cached, flushing all the data contained in a cache, or even plugging-in a third party caching solution to provide a more robust strategy, among other things. The JPA 2.0 standard provides a very basic feature set in terms of second level caching through javax.persistence.Cache. Upon consulting this interface, you'll realize it only provides four methods charged with verifying the presence of entities and evicting them. This feature set not only proves to be limited, but also cumbersome since it can only be leveraged programmatically (i.e. through an API). In this sense, and as I've already mentioned, JPA implementations have provided a series of features ranging from persistence.xml properties to Java annotations related to second level caching. OpenJPA offers several of these second level caching features, including a separate and supplemental cache called a 'query cache' which can further improve JPA performance. For such cases, I will point you directly to OpenJPA's cache documentation available at http://openjpa.apache.org/builds/apache-openjpa-2.1.0-SNAPSHOT/docs/manual/ref_guide_caching.html#ref_guide_cache_query so you can try these parameters for yourself on the accompanying application source code. Hibernate just like OpenJPA has its second level cache disabled. To enable Hibernate's second level cache you need to add the following properties to an application's persistence.xml file: Its worth mentioning that Hibernate has integral support for other second level caches. The previous properties displayed how to enable the HashtableCacheProvider cache -- the simplest of the integral second level caches -- but Hibernate also provides support for five additional caches, which include: EHCache, OSCache, SwarmCache, JBoss cache 1 and JBoss cache 2, all of which provide distinct features, albeit require additional configuration. Besides these properties, Hibernate also requires that each JPA entity be declared with a caching strategy. In this case, since the Person entity is read only, a caching strategy like the following would be used: Similar to OpenJPA, Hibernate also offers several second level caching features through proprietary annotations and configurations, as well as support for the separate and supplemental cache called a 'query cache' which can further improve JPA performance. For such cases, I will also point you directly to Hibernate's cache documentation available at http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html#performance-cache so you can try these parameters for yourself on the accompanying application source code. Unlike OpenJPA and Hibernate, EclipseLink's second level cache is enabled by default, therefore there is no need to provide any additional configuration. However, similar to its counterparts, EclipseLink also has a series of proprietary second level cache features which can enhance JPA performance. You can find more information on these features by consulting EclipseLink's cache documentation available at: http://wiki.eclipse.org/Introduction_to_Cache_(ELUG) With this we bring our discussion on object relational mapping performance with JPA to a close. I hope you found the various tests and metrics presented here a helpful aid in making decisions about your own JPA applications. In addition, don't forget you can rely on the accompanying source code to try out several JPA variations more ad-hoc to your circumstances. About the author Daniel Rubio is an independent technology consultant specializing in enterprise and web-based software. He blogs regularly on these and other software areas at http://www.webforefront.com. He's also authored and co-authored three books on Java technology. Source code/Application installation * Install MySQL on your workstation (Tested on MySQL 5.1.37-64 bits) - http://dev.mysql.com/downloads/ * Install data set on MySQL - Go to http://www.baseball-databank.org/ and click on the link titled 'Database in MySQL form'. This will download a zipped file with a series of MySQL data structures containing baseball statistics. First create a MySQL database to host the data using the command: 'mysqladmin -p create jpaperformance'. This will create a database named 'jpaperformance'. Next, load the baseball statistics using the following command: 'mysql -p -D jpaperformance < BDB-sql-2009-11-25.sql' where 'BDB-sql-2009-11.25.sql' represents the unzipped SQL script obtained by extracting the zip file you dowloaded. * Create JPA application WARs - The download includes source code, library dependencies and an Ant build file. This includes all three JPA implementations Hibernate 3.5.3, EclipseLink 2.1 and OpenJPA 2.1. To build the JPA Hibernate WAR - ant hibernate To build the JPA EclipseLink WAR - ant eclipselink To build the JPA OpenJPA WAR - ant openjpa All builds are placed under the dist/ directories. * Deploy to Tomcat 6.0.26 - Copy the MySQL Java driver and Spring Tomcat Weaver -- included in the download directory 'tomcat_jar_deps' -- to Apache Tomcat's /lib directory. - Copy each JPA application WAR to Apache Tomcat's /webapps directory, as needed. * Deployment URL's http://localhost:8080/hibernate/hibernate/home ( Query all Player objects ) http://localhost:8080/eclipselink/eclipselink/home ( Query all Player objects ) http://localhost:8080/openjpa/openjpa/home ( Query all Player objects ) http://localhost:8080/hibernate/hibernate/firstname/ ( Query Player objects by first name) http://localhost:8080/eclipselink/eclipselink/firstname/ ( Query Player objects by first name) http://localhost:8080/openjpa/openjpa/firstname/ ( Query Player objects by first name) http://localhost:8080/hibernate/hibernate/lastname/ ( Query Player objects by last name) http://localhost:8080/eclipselink/eclipselink/lastname/ ( Query Player objects by last name) http://localhost:8080/openjpa/openjpa/lastname/ ( Query Player objects by last name) http://localhost:8080/hibernate/hibernate/playerid/ (Query Player by id) http://localhost:8080/eclipselink/eclipselink/playerid/ ( Query Player by id) http://localhost:8080/openjpa/openjpa/playerid/ ( Query Player by id)
July 20, 2010
by Daniel Rubio
· 153,723 Views · 2 Likes
article thumbnail
Practical PHP Patterns: Query Object
An ORM provides an abstraction of storage as an in-memory object graph, but it is difficult to navigate that graph via object pointers without loading a large part of it. Typical problems of this approach are the performance issues related to loading of the various objects, and the transfer of business logic execution from the database side to the client code side, with the resulting duplication. Anyway, when we start navigating an object graph we have to obtain a reference to an entity somehow (an Aggregate Root), from which we can navigate to the other ones. ORMs and, in general, Data Mappers provide different ways to select a subset of objects (or a single one) and reconstitute only that subset from the data storage. Custom mapper classes with domain-specific methods are the the simplest solution, which is often recommended when not using a generic Data Mapper. Custom mapper classes with finder methods are an half-baked solution, which mixes up domain-specific mappers with general purpose methods, sometimes needed to allow flexibility on the user side. Generic mapper classes with finder methods can be provided as a way to parametrize fields, resulting in methods like findBy($entityName, $field, $value). Generic mapper classes with query objects are employed when there is the necessity of composing queries and pass them around for further elaboration or refining. Promoting the query as an object helps this use case. Note that once a mapper implements query objects, they can be effectively used in finder methods, which are a subset of the functionality provided by query objects. In fact, query objects are the most versatile way to ask for the objects that satisfy certain conditions, and they are an Interpreter implementation over a query language adapt for an object model. All of us already know a query language: SQL. But SQL is pertinent to relational databases, while an ORM strives for keeping the illusion of an object-only model into existence. As a result, it must adopt a different language which describes object features, like HQL (Hibernate) or DQL (the Doctrine equivalent). Object query languages There are several differences between an object query language and SQL in the entities you can refer to within queries: SQL refers to tables; object query languages refer to classes and some tables like the association tables for many-to-many relationships simply vanish. SQL refers to rows; object query languages to objects. SQL refers to other tables for making JOINs; object query languages to object collaborators. SQL refers to columns, which also include foreign keys; object query languages only to fields of the objects. When a full-featured language is involved, there must be a component of the ORM that parses the strings containing language statements into a Query Object. Another way to define such an object (Interpreter) is constructing it by hand, by calling a series of setter methods or by implementing a Builder pattern. Advantages A Query Object hides the relational model (the schema) from the user, as it can be inferred by the union of the queries and the Metadata Mapping anyway. The information contained in the metadata, like foreign keys and additional tables, do not have to be repeated in the various components of client code. It hides also the peculiarities of the particular database vendor, since the generation of SQL can be addressed by a driver. It promotes queries as first-class citizens, making them objects that can be passed around, cloned or modified. The database abstraction layers like PDO make of statement objects (PDOStatement) one of their first modelling points. Disadvantages The implementation of the parser for a query language is a task of great complexity, which makes this pattern only feasible in generic Data Mappers. Even when using only Query Objects made by hand, it is advisable to employ an external Data Mapper to take advantage of the translation of object-based queries to SQL. Examples Doctrine 2 contains a parser for its Doctrine Query Language, which lets you define queries like you would do with PDO, but still referring to an object model. The documentation of the query language itself is pretty complete, so I won't go into details but I'll give you a feel of how using DQL is like. The language itself is compatible with the Doctrine 1 version, if you happen to have used it. createQuery('SELECT u FROM MyProject\Model\User u WHERE u.age > 20'); $users = $query->getResult(); $query = $em->createQuery("SELECT u, a FROM User u JOIN u.address a WHERE a.city = 'Berlin'"); $users = $query->getResult(); uery = $em->createQuery('SELECT u, p FROM CmsUser u JOIN u.phonenumbers p'); $users = $query->getResult(); // array of CmsUser objects with the phonenumbers association loaded $phonenumbers = $users[0]->getPhonenumbers(); $query = $em->createQuery('SELECT u, a, p, c FROM CmsUser u JOIN u.articles a JOIN u.phonenumbers p JOIN a.comments c'); $users = $query->getResult(); Sometimes there are no fixed queries, but a dynamic query has to be constructed from its various parts, as a union of conditions, joins and sorting parameters; not all the parameters may be available at a certain time and concatenating strings to compose a DQL statement is prone to error. Doctrine 2 includes a Query Builder which has methods you can call orthogonally, in any order and combination. add('select', 'u') ->add('from', 'User u') ->add('where', 'u.id = :identifier') ->add('orderBy', 'u.name ASC'); ->setParameter('identifier', 100); // Sets :identifier to 100, and thus we will fetch a user with u.id = 100
July 7, 2010
by Giorgio Sironi
· 6,467 Views
article thumbnail
Practical PHP Patterns: Metadata Mapping
The intent of the Metadata Mapping pattern is to express implementation details, related a particular domain and Domain Model, as metadata of a general purpose library. In the sense intended here, metadata is related to the persistence operations (transferring objects back and forth from a database). These metadata is usually fed to a general purpose object-relational mapper. Technically the term metadata is plural (of metadatum, data about data), but it is commonly used as an uncountable noun. Why expressing metadata Object-relational mapping is a difficult task to automate, prone to lots of potential bugs and undefined behaviors; expressing the domain-related peculiarities as metadata means that you are able to code only one ORM, and not have to repeat the same work in many custom Data Mappers, which are very boring to write and can't be transported out a specific application. Custom Data Mappers were a cleaner solution for Domain Models with regard to employing Active Records, and they are advocated for example in Zend Framework books like Keith's Pope one. They are finally becoming obsolete thanks to the power of a declarative approach like this pattern, which tools like Doctrine 2 are based on. Historicacally, Hibernate from JBoss was the first Data Mapper implemented as a generic ORM (it is a Java product). Doctrine 2 is the most famous PHP implementation, and it is in beta at the time of this writing. The metadata we'd like to tell to an ORM are for example: which classes should be persisted at all. Optional names for the tables (it can use the class names.) Which fields form the primary key. The types of the different columns, particularly important in a loosely typed language like PHP. Which collaborators have to be persisted and via what means: foreign keys and additional association tables. The metadata should usually not consist of code: non-standard behavior shouldn't be contained in them, as in general all the behavior like ineritance strategies and conversion of relationships is extracted in the generic ORM. Thus there are different formats we can use in place of PHP code: XML, annotations, YAML, INI... Different approaches There are two approaches to Metadata Mapping pattern, described by Fowler in his original book. The first one is code generation: the metadata is processed to generate the source code of the mapping classes, for example a Data Mapper for every entity or aggregate root of your model (one for User, one for BlogPost, and so on). The ORM would theoretically not be necessary in production if the generation is complete enough. Doctrine 1 used this approach in part, but it generated also the PHP code of the domain model itself from the Yaml mapping, as subclasses of Doctrine_Record. Still, Doctrine 1 was necessary to instantiate those classes and the solution wasn't so clean. Doctrine 2 is very different in architecture and goals. The second approach is called reflective program, and consists in interpreting the mapping at runtime in the ORM's code, to open up correctly the objects via reflection (or a standard interface) and putting them in the database. The converse can happen: objects can be recreated from the union of metadata and database tables. How it is used The reflective solution is the common one nowadays, and Doctrine 2 borrows it from Hibernate in its own design. Reflection is used to access the private fields to persist. Some critics point out speed problems of this technique, but keep in mind that your ORM is communicating with an external process or database machine at the same time of using reflection: it probably won't count much in the benchmark. Doctrine 2 however takes optimization seriously to the point that metadata internal classes (accessed very often) present an Api with public properties instead of methods to avoid every overhead in a crucial part (hydration of objects with data retrieved from the database). An advantage of generated code is that it would be easier to debug, but it is usually a pain to maintain: every time you evolve or refactor a domain class you have to regenerate the Mapper classes. You can't customize this code either, because you would lost your changes at the regeneration time. Advantages and (few) disadvantages Of course we lose some expressiveness by specifying metadata instead of a programmatical behavior like the source code of a custom Data Mapper. But we gain very much: a fully tested ORM, like Doctrine 2 in the PHP case, with only some lines of added metadata to keep in sync with the rest of the code base. Declarative approaches trading off completeness of functionalities (the absent ones are not used very often anyway) for developers time. But there are other advantages, such as the generation (and migration) of the database schema based on the metadata, and also of the proxy classes. Ideally, the metadata mapping is the only point of strong coupling of your Domain Model with an external adapter, the ORM. It is of course part of the infrastructure, so keep it under version control along with the code! Adding and removing fields or relationships, changing keys or refactoring is much easier because you do it declaratively instead of refactoring a specific mapper class. Note that automated refactoring tools are not to be trusted here: for example they usually ignore the mapping when you change a field name. So grep is your best ally. Examples The sample code of this article will present the different ways of specifying metadata for Doctrine 2, the most high-tech PHP ORM. The performance of the different methods are equivalent, since the metadata are read only one time into native PHP objects and then cached. Metadata is a vast subject since all the different persistence implementations have to be driven by it, but we will look more at the types of metadata specification we can use instead of all the different metadata instances, which are best described in conjunction with the single features (for example, the inheritance patterns articles contain the description of the metadata related to subclassing.) The simplest way to express metadata mapping in Doctrine 2 is via annotations, embedded in the docblocks and ignored from anything but the ORM: Don't be alarmed by the size: this mapping does much more than the annotations example's one. A third way to specify metadata is via YAML, a format widely used in symfony-related software: --- # Doctrine.Tests.ORM.Mapping.User.dcm.yml Doctrine\Tests\ORM\Mapping\User: type: entity table: cms_users id: id: type: integer generator: strategy: AUTO fields: name: type: string length: 50 oneToOne: address: targetEntity: Address joinColumn: name: address_id referencedColumnName: id oneToMany: phonenumbers: targetEntity: Phonenumber mappedBy: user cascade: cascadePersist manyToMany: groups: targetEntity: Group joinTable: name: cms_users_groups joinColumns: user_id: referencedColumnName: id inverseJoinColumns: group_id: referencedColumnName: id lifecycleCallbacks: prePersist: [ doStuffOnPrePersist, doOtherStuffOnPrePersistToo ] postPersist: [ doStuffOnPostPersist ]
July 5, 2010
by Giorgio Sironi
· 3,932 Views
article thumbnail
Working with the bit.ly API to shorten URLs
Bit.ly is a quite popular URL shortening service. On Tiwtter, almost all of the links I see provided by the people I follow are posted as bit.ly shortcuts. If you’ve used a Twitter client, you probably already know that some of them (if not almost every single of them) offers URL shortening as a built-in capability. Now, you can implement the same functionality, thanks to the fact that bit.ly offers a public API to do this. But let’s start with coding. First of all, all data that is passed to the service is transferred via HTTP requests. The response generated by the service is by default formatted as a JSON document, however the developer can explicitly specify that XML data should be returned. A request to the bit.ly shortening service requires authentication, and the username and API key are required to be passed as parameters. This means that in order to use the service, a bit.ly account is needed (it is free). The API key can be found here (http://bit.ly/account/your_api_key) once the user registered. Shortening The first (and probably the most important method) is the one that actually shortens the URL and it is called /v3/shorten. V3 at the beginning stands for the API version (that is 3.0 at the moment, so don’t worry about that). This method accepts 5 parameters: • format – determines the output format for the request • longUrl –determines the long URL that needs to be shortened • domain – [optional] the domain used for shortening – either bit.ly or j.mp (default: bit.ly) • x_login – the user ID (although in the documentation it is indicated as optional, it is not) • x_apiKey – the user API key (although in the documentation it is indicated as optional, it is not) Let’s look at the method I’ve written to shorten the URL: enum Format { XML, JSON, TXT } enum Domain { BITLY, JMP } string ShortenUrl(string longURL, string username, string apiKey, Format format = Format.XML, Domain domain = Domain.BITLY) { string _domain; string output; // Build the domain string depending on the selected domain type if (domain == Domain.BITLY) _domain = "bit.ly"; else _domain = "j.mp"; HttpWebRequest request = (HttpWebRequest)WebRequest.Create( string.Format(@"http://api.bit.ly/v3/shorten?login={0}&apiKey={1}&longUrl={2}&format={3}&domain={4}", username, apiKey, HttpUtility.UrlEncode(longURL), format.ToString().ToLower(), _domain)); using (WebResponse response = request.GetResponse()) { using (StreamReader reader = new StreamReader (response.GetResponseStream())) { output = reader.ReadToEnd(); } } return output; } There are two enums that hold the possible data formats as well as the domain names. This is made for safety reasons – if I would pass these as string parameters, there is a higher chance the end-user will pass the wrong string, and then the function will fail. The code is based on a single HttpWebRequest that creates a HTTP request to the URL that is built according to the data passed to it. Then, I am getting the response stream and passing the string representation to the returned string variable. Notice the fact that I am explicitly returning a string value for this method. In fact, I could either return a JsonDocument instance (requires a third-party library to use this class) or XmlDocument. But since there are two possible formats for the request to handle, it is better to return this as a simple string and then let the developer decide what he wants to do next. Once called, the function will return data similar to this: 200 OK http://j.mp/crnexS crnexS msft http://www.microsoft.com 0 Or this (for JSON): { "status_code": 200, "status_txt": "OK", "data": { "long_url": "http:\/\/www.microsoft.com", "url": "http:\/\/j.mp\/crnexS", "hash": "crnexS", "global_hash": "msft", "new_hash": 0 } } Or this (for TXT): http://j.mp/crnexS I am using a custom domain here, but as you see – the format and domain are optional parameters. I can leave them with default values without actually passing them to the function, and then the only values that need to be indicated are the user ID, API key and the long URL. Decoding There is also a way to decode the URL to its initial state from what was the shortened one. The method is called /v3/expand and is used in a similar manner as the shortening one. In this method, I am also using the Format enum to specify the output format: string DecodeUrl(string[] urlSet, string[] hashSet, string username, string apiKey, Format format = Format.XML) { string output; string URL = string.Format(@"http://api.bit.ly/v3/expand?login={0}&apiKey={1}&format={2}", username, apiKey, format.ToString().ToLower()); if (urlSet != null) { foreach (string url in urlSet) URL += "&shortUrl=" + HttpUtility.UrlEncode(url); } if (hashSet != null) { foreach (string hash in hashSet) URL += "&hash=" + hash; } HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL); using (WebResponse response = request.GetResponse()) { using (StreamReader reader = new StreamReader(response.GetResponseStream())) { output = reader.ReadToEnd(); } } return output; } It works a bit different though. As you can see, I am requesting the user to pass two arrays – one with URLs and one with hashes. One of them can be null, therefore the URL can be decoded either by the hash or by the shortened URL. The user can pass both arrays, and get a result similar to this: 200 OK http://j.mp/crnexS http://www.microsoft.com crnexS msft crnexS http://www.microsoft.com crnexS msft The URLs are sanitized inside the function – I am not assuming that the user will pass the encoded URL. In fact, the developer should never assume that the user will pass the correct value – the code should be as foolproof as possible. User validation If you work on an application that depends on the URL shortening service, it would be a good idea to validate the user before making the API calls. Bit.ly provides a method for this as well and it is called /v3/validate. It only requires three parameters – the username, the API key and the output format (that is in fact optional). The C# implementation for this method looks like this: string ValidateUser(string username, string apiKey, string userToCheck, string keyToCheck, Format format = Format.XML) { string output; string URL = string.Format(@"http://api.bit.ly/v3/validate?x_login={0}&x_apiKey={1}&login={2}&apiKey={3}&format={4}", userToCheck, keyToCheck, username,apiKey, format.ToString().ToLower()); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL); using (WebResponse response = request.GetResponse()) { using (StreamReader reader = new StreamReader(response.GetResponseStream())) { output = reader.ReadToEnd(); } } return output; } A bit of confusion can be caused by the fact that there are x_ -prefixed copies of login and API key. You need to pass your ID and API key to verify someone else’s account validity. X_ -prefixed parameters represent the end user. The output should look similar to this: 200 OK 1 Count clicks Bit.ly provides click statistics, so once you shorten a URL, you can track its basic usage. Statistics are available through the /v3/clicks method. It doesn’t have a TXT output format, so you will have to avoid using that (or create a separate enum, that is the best choice). The implementation for it looks like this: string GetClicks(string[] urlSet, string[] hashSet, string username, string apiKey, Format format = Format.XML) { string output; string URL = string.Format(@"http://api.bit.ly/v3/clicks?login={0}&apiKey={1}&format={2}", username, apiKey, format.ToString().ToLower()); if (urlSet != null) { foreach (string url in urlSet) URL += "&shortUrl=" + HttpUtility.UrlEncode(url); } if (hashSet != null) { foreach (string hash in hashSet) URL += "&hash=" + hash; } HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL); using (WebResponse response = request.GetResponse()) { using (StreamReader reader = new StreamReader(response.GetResponseStream())) { output = reader.ReadToEnd(); } } return output; } Same as for the Expand method, an array of URLs and hash codes can be passed and statistics will be generated for multiple entries at once. The response looks like this (in XML format): 200 http://j.mp/crnexS msft 0 crnexS 2230 0 msft crnexS crnexS 2230 OK Note that the XML won’t be indented by default. Check for PRO domain Bit.ly offers pro, customizable domains. That means, that not only bit.ly and j.mp can be used for shortening, but user-defined domains as well. The /v3/bitly_pro_domain method allows to check whether a domain is bit.ly PRO-powered or not. It is very similar to the user validation method, but it accepts the domain name instead of the user credentials. The C# implementation looks like this: string CheckPro(string username, string apiKey, string domain, Format format = Format.XML) { string output; string URL = string.Format(@"http://api.bit.ly/v3/bitly_pro_domain?login={0}&apiKey={1}&domain={2}&format={3}", username, apiKey, domain, format.ToString().ToLower()); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL); using (WebResponse response = request.GetResponse()) { using (StreamReader reader = new StreamReader(response.GetResponseStream())) { output = reader.ReadToEnd(); } } return output; } Once called, the output looks like this: 200 nyti.ms 1 OK URL lookup Bit.ly also allows the lookup of long URLs. For example, you might want to find if there is an existing short URL for the existing long URL. To do this, there is the /v3/lookup method. The implementation is quite simple and as other methods, it has the same base structure: string Lookup(string username, string apiKey, string[] url, Format format = Format.XML) { string output; string URL = string.Format(@"http://api.bit.ly/v3/lookup?login={0}&apiKey={1}&format={2}", username, apiKey, format.ToString().ToLower()); foreach (string _url in url) URL += "&url=" + HttpUtility.UrlEncode(_url); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL); using (WebResponse response = request.GetResponse()) { using (StreamReader reader = new StreamReader(response.GetResponseStream())) { output = reader.ReadToEnd(); } } return output; } The XML response looks similar to this for a positive result (there is a URL found): 200 http://www.dreamincode.net http://bit.ly/mviGY mviGY OK Notice that I can pass an array of URLs to be checked. Mind, though, that the maximum number of URLs that can be passed to the method is 15. With the methods described above, you can harness the power of bit.ly and bring it to your .NET application (the code can easily be ported to any .NET-compatible programming language). For official documentation, you can take a look here.
June 18, 2010
by Denzel D.
· 33,495 Views
article thumbnail
Creating Master-Detail Forms with Vaadin and Grails
In the article, Groovy, Grails and Vaadin, Petter Holmström outlined an example of using Vaadin with Grails, to build a simple data bound UI for a single database table. This was a great article that showed how one can quickly build a web-based solution that behaves very much like a typical client-server application. However, there are many instances where we need to be able to build an entry screen which allows the user to update both the master record and its detail records at the same time, such as in an Invoice entry screen. In this article I looked at extending the example in the above article, leveraging the simplicity of the Vaadin framework and the GORM capabilities, to build a master-detail example. After years of doing development using Oracle Forms, I liked the way it provides generic functions on each form (or data block) to allow users to add new records, delete them, query them and navigate from one record to the next. Furthermore, in Oracle Forms, you could provide a data grid to allow the user to edit the related detail table (such as the invoice lines in an Invoice document) and keeps these records in sync with the Invoice header (master) record. This is a first attempt at reproducing some of these capabilities using Vaadin and Grails. The master-detail data model The example here is an extension of the Petter Holmström model of the Trip Planner. In the original example, there is a table holding information on trips taken by the user. We will extend that example by having a one-to-many relation with a table of the places that are visited in each trip. This is the code for the Trip domain class. Notice that we are using Grails 1.3.1 which, by default, puts the domain class into the tripplanner package. package tripplanner class Trip { static constraints = { name(nullable: true) city(nullable: true) startDate(nullable: true) endDate(nullable: true) purpose(nullable: true) notes(nullable: true) } String name String city Date startDate Date endDate String purpose String notes static hasMany = [ places : Place ] static mapping = { places lazy:false, cascade:"all,delete-orphan" } } We have made the fields nullable for ease of inserting the record into the table. We also created the one-to-many relation with the Place domain class, and put in the mapping to cascade the deletion and updates to the sub-table class. We also need to change the relation from lazy to eager to ensure that the all detail records are loaded at the same time as the master (header) record. As for the Place domain class, we have 2 fields; name of the place we visited and a description field. In addition, we also put in a transient field, "_deleted", hich we will use to mark a detail record for deletion. package tripplanner class Place { static constraints = { name(nullable: true) description(nullable: true) } String name String description boolean _deleted static transients = [ '_deleted' ] static belongsTo = [trip:Trip] } The master-detail form Now, we need to change the VaadinApp class. Instead of listing the master record in a table to be selected for editing, I have opted to follow the Oracle Forms style for letting the user query for master records using Query By Example (QBE). The standard GORM and Hibernate infrastructure allows the use of a "example" object as a query pattern to extract a subset of records. This unfortunately is not exactly how Oracle Forms QBE works as Oracle Forms allows the user to put in "A%" patterns to query for values that starts with "A" and ">999" to query for values that is bigger than 999. The QBE in GORM only allows exact matches. We will accept this limitation for this example. So now we need a set of buttons to allow the user to: New - Add a new record data entry Save - Saves the current record to the database, including the changes done to the detail table Del - Deletes the current master record and the associated detail records EnterQ - Enters into Query mode which allows the user to provide a pattern for a QBE search ExecQ - Executes the Query and shows the first matching record from the query ExitQ - Exits out of Query mode and goes back to New record mode Prev - Go to previous record if there any queried records Next - Go to next record if there are any queried records These buttons are located at the top of the master record form. Error messages which arise from the save / update button are listed at the top of the master record form fields (in the "description field" of the Vaadin Form). There is also a status indicator label located at the footer of the master record form ("footer section" of the Vaadin Form). The status label will show the current mode of the master record entry screen and also the current record number if this is part of a query set. Below the master record form is the editable, selectable detail records table which is located in its own panel. Users are allowed to edit the detail record table only if the master record is in Edit mode. This means that users need to save the newly created master (header) record before that they can add any detail records. The detail record table has a column for all the editable, visible records excluding the "_deleted" field which is for internal use only. To manipulate the detail records, we provide 2 buttons to add a new detail record (the "+" button) or delete the selected detail record (the "-" button). All additions, deletions or even updates are only saved to the database after the user clicks on the "Save" button in the master (header) record form. As the user moves from one master record to the next using the "Prev" and "Next" buttons in the master table form, the detail table will be synchronized accordingly. package tripplanner import com.vaadin.ui.* import com.vaadin.data.* import com.vaadin.data.util.* class VaadinApp extends com.vaadin.Application { def query def rec_pos def container = new BeanItemContainer(Place.class) def master_fields = ["name", "purpose", "startDate", "endDate", "city", "notes"] def saveButton def newButton def delButton def entQButton def exeQButton def extQButton def prevButton def nextButton def statusLabel def pnlDetail void new_mode(editor) { def tripInstance = new Trip() editor.itemDataSource = new BeanItem(tripInstance) editor.visibleItemProperties = master_fields editor.description = "" container.removeAllItems() saveButton.setEnabled(true) delButton.setEnabled(false) entQButton.setEnabled(true) extQButton.setEnabled(false) statusLabel.setValue "New Mode" pnlDetail.setEnabled(false) } void init() { //def window = new Window("Trip Maintenance", new SplitPanel(SplitPanel.ORIENTATION_HORIZONTAL)) def window = new Window("Trip Maintenance") setMainWindow window // Form Panel def panel = new Panel("Trip Maintenance") panel.setSizeFull() panel.setLayout(new VerticalLayout()); // Trip editor def tripEditor = new Form() tripEditor.setSizeFull() tripEditor.layout.setMargin true tripEditor.immediate = true tripEditor.visible = true // status bar - shows the mode of the form statusLabel = new Label("New Mode") // define master form buttons saveButton = new Button("Save") newButton = new Button("New") delButton = new Button("Del") entQButton = new Button("EnterQ") exeQButton = new Button("ExecQ") extQButton = new Button("ExitQ") prevButton = new Button("Prev") nextButton = new Button("Next") // set initial state of buttons delButton.setEnabled(false) extQButton.setEnabled(false) // default is New Mode def v_tripInstance = new Trip() tripEditor.itemDataSource = new BeanItem(v_tripInstance) tripEditor.visibleItemProperties = master_fields //new_mode(tripEditor) // panel for detail form pnlDetail = new Panel() pnlDetail.setEnabled(false) // table to hold the detail form rows def table = new Table() table.containerDataSource = container table.selectable = true table.editable = true table.setSizeFull() table.visibleColumns = ["name", "description"] table.immediate = true // toolbar for the detail form manipulation def toolbar = new HorizontalLayout() // buttons for detail form // button to add new row to detail form def addRowButton = new Button("+", new Button.ClickListener() { void buttonClick(Button.ClickEvent event) { def placeInstance = new Place() def tripInstance = tripEditor.itemDataSource.bean tripInstance.addToPlaces(placeInstance) container.addBean placeInstance } }) toolbar.addComponent addRowButton // button to delete current row from detail form def delRowButton = new Button("-", new Button.ClickListener() { void buttonClick(Button.ClickEvent event) { if (table.value) { def placeInstance = table.value placeInstance._deleted = true container.removeItem(placeInstance) table.value = null } } }) toolbar.addComponent delRowButton // detail panel has the table rows and the toolbar pnlDetail.addComponent toolbar pnlDetail.addComponent table // master form buttons // save record button saveButton.addListener(new Button.ClickListener() { void buttonClick(Button.ClickEvent event) { Trip.withTransaction { status -> table.commit() def tripInstance = tripEditor.itemDataSource.bean def _toBeDeleted = tripInstance.places.findAll {it._deleted} if (_toBeDeleted) { tripInstance.places.removeAll(_toBeDeleted) } if (!tripInstance.save(flush:true)) { tripEditor.description = "Error:" tripInstance.errors.allErrors.each { error -> tripEditor.description = tripEditor.description + "" + "Field [${error.getField()}] with value [${error.getRejectedValue()}] is invalid" } tripEditor.description = tripEditor.description + "" window.showNotification "Could not save changes" } else { tripEditor.description = "" window.showNotification "Changes saved" delButton.setEnabled(true) statusLabel.setValue("Edit Mode"); pnlDetail.setEnabled(true) populate_container(container,tripInstance.places) } } } }) // new record button newButton.addListener(new Button.ClickListener() { void buttonClick(Button.ClickEvent event) { new_mode(tripEditor) } }) // delete record button delButton.addListener(new Button.ClickListener() { void buttonClick(Button.ClickEvent event) { if (tripEditor.itemDataSource.bean) { Trip.withTransaction { status -> def tripInstance = tripEditor.itemDataSource.bean tripInstance.delete(flush:true) } } new_mode(tripEditor) } }) // enter query mode entQButton.addListener(new Button.ClickListener() { void buttonClick(Button.ClickEvent event) { def tripInstance = new Trip() def newBean = new BeanItem(tripInstance) tripEditor.itemDataSource = newBean tripEditor.visibleItemProperties = master_fields tripEditor.description = "" saveButton.setEnabled(false) delButton.setEnabled(false) entQButton.setEnabled(false) extQButton.setEnabled(true) statusLabel.setValue("Query Mode"); pnlDetail.setEnabled(false) } }) // execute query exeQButton.addListener(new Button.ClickListener() { void buttonClick(Button.ClickEvent event) { tripEditor.commit() def example = (Trip) tripEditor.itemDataSource.bean query = Trip.findAll(example) rec_pos = 0 tripEditor.description = "" entQButton.setEnabled(true) extQButton.setEnabled(false) if (query.size > 0) { def next_rec = query[rec_pos] tripEditor.itemDataSource = new BeanItem(next_rec) tripEditor.visibleItemProperties = master_fields saveButton.setEnabled(true) delButton.setEnabled(true) statusLabel.setValue("Edit Mode. Record " + (rec_pos+1) + "/" + query.size); pnlDetail.setEnabled(true) populate_container(container,next_rec.places) } else { window.showNotification "No record found" new_mode(tripEditor) } } }) // exit query mode. Back to New mode extQButton.addListener(new Button.ClickListener() { void buttonClick(Button.ClickEvent event) { new_mode(tripEditor) } }) // edit mode, previous record prevButton.addListener(new Button.ClickListener() { void buttonClick(Button.ClickEvent event) { if (!query) { window.showNotification "No query result" return } if (rec_pos > 0) { rec_pos-- def next_rec = query[rec_pos] tripEditor.itemDataSource = new BeanItem(next_rec) tripEditor.visibleItemProperties = master_fields saveButton.setEnabled(true) statusLabel.setValue("Edit Mode. Record " + (rec_pos+1) + "/" + query.size) pnlDetail.setEnabled(true) populate_container(container,next_rec.places) } } }) // edit mode, next record nextButton.addListener(new Button.ClickListener() { void buttonClick(Button.ClickEvent event) { if (!query) { window.showNotification "No query result" return } if (rec_pos < query.size - 1) { rec_pos++ def next_rec = query[rec_pos] tripEditor.itemDataSource = new BeanItem(next_rec) tripEditor.visibleItemProperties = master_fields saveButton.setEnabled(true) statusLabel.setValue("Edit Mode. Record " + (rec_pos+1) + "/" + query.size) pnlDetail.setEnabled(true) populate_container(container,next_rec.places) } } }) // put the status bar on the footer of the master form tripEditor.footer = new HorizontalLayout() tripEditor.footer.addComponent statusLabel // put all master form buttons into a horizontal panel def btnPanel = new HorizontalLayout() btnPanel.addComponent saveButton btnPanel.addComponent newButton btnPanel.addComponent delButton btnPanel.addComponent entQButton btnPanel.addComponent exeQButton btnPanel.addComponent extQButton btnPanel.addComponent prevButton btnPanel.addComponent nextButton // construct master-detail form panel panel.addComponent btnPanel panel.addComponent tripEditor panel.addComponent pnlDetail // set panel to main window mainWindow.addComponent panel } // routine to populate the container with the detail records void populate_container(container, details) { container.removeAllItems() details.each { if (!it._deleted) container.addBean it } } } A walkthrough of the above VaadinApp code is as follows. The query object stores the list of "Trip" instances that is returned from a QBE query. The rec_pos is an index into the above list for the currently displayed record. The master_fields stores the list of master record fields that is visible on the form We put some of the buttons, status label and detail panel object as the object variables so that they can be access by helper routines The new_mode routine is to put the form into the New mode, which occurs multiple times in the whole code The init routine is where the whole window and form is defined The form is laid out with the master record buttons at the top, followed by the master record form, and then the detail table sub-panel below that In the detail sub-panel, there are 2 buttons, "+" and "-" which adds a detail to the Table and the master record instance and removes a detail record, respectively. These are not put into a GORM Transaction as we only want to save (flush) the record when the "Save" button is saved. The really tricky part of the whole form is in the master record buttons. For the "Save" button, we first force the detail table to update the source records in the master collection. Then we delete all the detail records which are marked for deletion, where the "_deleted" field is set to true. Finally the master record is saved by calling the "save" method in the tripInstance bean object. Any error messages are put into the "description field" of the master record form. If everything is fine, then the form is put into "Edit" mode for further editing including adding detail records into the Table below. The new button just puts a new Trip instance into the master record form (and disables the detail table panel) The delete button just deletes the master record. There is no need to "commit" the transaction as it is done automatically, and this differs from Oracle Forms The enter-query button just clears the fields to collect the parameters to search. If all the fields are null, then the QBE will match all records in the database Executing the query will run the "findAll(object)" method of the GORM domain instance. The return result is a List of matching instances. This is not quite scalable as the List needs to be stored in memory for the duration of the session or until the next query is run. This will need to be improved further. If there are any returned records, then the master form will display the first record and put it in "Edit" mode Exiting the Query mode will return the form to the New mode The previous and next buttons will update the master record form and the container of the detail table panel, thus keeping both screens in synchrony. The picture on the left shows the data entry screen with the detail records for the current master record. Future Direction To improve on this, we will need to componentize the whole form so that developers do not need to copy and paste the above template code and modify it for each form. Furthermore the "container" object above should be changeable to allow for other types of containers to be used but they should not be data-bound and leave the data-binding to the GORM objects and layer. Another suggestion is to create a Container component which is manipulated by the master table form buttons (add, save, delete, enter query, execute query, previous record, next record) and it then controls the master table entry Form. Finally, as mentioned in the Petter Holmström article, we are not really using any feature of Grails other than the GORM, and so, this entire example can be build using just Groovy, GORM and Vaadin.
May 31, 2010
by Chee-keong Lee
· 39,591 Views · 1 Like
article thumbnail
Practical PHP Patterns: Lazy Loading
Lazy initialization is a common procedure in programming, especially in an environment like PHP applications where during a specific HTTP request it's practically sure that there are resources which won't be used at all, and that, if eager loaded, would be simply discarded without being referenced again. The Lazy Loading pattern for stateful objects is a transparent solution for loading entity objects on demand, when they are referenced troughout client code. Its typical usage is in the internals of an ORM or in another kind of Data Mapper. Lazy Loading is not a mapper-specific pattern, but in the environment of enterprise patterns this is the best example of its application. Intent In general, the object graph managed by a Data Mapper can be large at will, and loading the entire graph to satisfy a request which usually involves only a small subset of it is a waste. This is the case when using many-to-many bidirectional relationships, as it is the case with the Unix users and groups model. For example, loading an User will bring back also his referenced groups, which in turn will reference their contained users and so on... In general, it is not possible to always break the relationships in one direction to simplify the model. In this case, full loading is not a waste: it's not feasible unless you have gigabytes of ram to allocate per every request. Another use case for lazy loading of domain model entities is in arbitrary computation to perform when there is no knowledge a priori of the extent of the object graph that will be reached by the method calls (or field access). For example, we may want to do a statistical analysis of the userw reachable from a particular starting point through groups relationships, and stop the research when we have reached a predetermined number of users. In this kind of computation, we cannot know for sure how much JOINs we will have to perform as this is an example of a query which falls out of the scope of SQL. Implementation Lazy Loading is generally implemented with a Proxy pattern, where a fake entity is substituted to the real objects at the boundary of the object graph. This fake entities do not load relationships and fields when they are first inserted in the graph. The Proxy pattern implementation used here is the ghost variant, where the object is essentially a proxy for itself, since the field are loaded and stored in it on the first access. The ghost has only its identifier fields loaded by default, and it is inserted in the Identity Map anyway. The PHP implementation of Lazy Loading mechanism in ORMs has to respect some caveats to work. For starters, the Proxy object substituted for the real one has to conform to the same interface of the original class. This is accomplished via subclassing since there is usually no explicit interface (defined with the interface keyword) for domain objects. This subclass may be generated previous to its usage or at runtime, when it comes the time to insert a proxy in the graph. The methods of the entity, such as getters, setters, or logic-related ones are overridden, and their new implementations call a load() method before delegating to the parent method. The load() private method uses a reference to the Entity Manager or to some internal component of the Orm to load the fields of the object itself, and its relationships. Note that this referenced objects can be proxies in turn. Of course, the load() method has a boolean guard that makes it perform real execution of queries only one time, since if the object has already been loaded its behavior returns to be the original class's one. Lazy Loading can be used also for public properties via __get() and __set() overriding, but it is maybe too much transparent as accessing a public property may cause a query to the database. PHP magic methods are often abused. Performance Lazy Loading may cause performance problems when used extensively. The main issues is that it promotes many small queries to load objects one at the time, instead of a big JOIN which can load the whole object graph subset needed, avoiding the overhead of all the unnecessary queries and a chatty communication. Even if the database is high-performance or not relational or whatever, if it resides on another machine communication over the network will experience longer latency and it is best performed in batches. Parts of the graph to eager load via joins (instead of lazy load) can be specified while querying via the language of choice (HQL, JQL, Linq, DQL) or via programmatic interface (Criteria object). This part of the relational information crosses the boundary of the ORM to reach its client code, but it is mandatory to optimize the generated queries to an acceptable level. One solution is to encapsulate it in Repository implementations, that provide a domain-specific interface and hide all the queries in the object-related language of the ORM. Lazy Loading is complex, and it is perfect for outsourcing to a library like an Orm. Be aware of the issues it causes on performance and you will be able to take advantage of it without causing strain on your database. Example The code sample for this ORM pattern is taken from Doctrine 2, where I contributed part of the code related to lazy loading one-to-one and many-to-one relationships. The class presented here is the ProxyFactory, which generates a proxy object for a particular entity and, if needed, the source code for the proxy class. I have inserted some extension to the docblock comments. * @author Giorgio Sironi * @since 2.0 */ class ProxyFactory { /** The EntityManager this factory is bound to. */ private $_em; /** Whether to automatically (re)generate proxy classes. */ private $_autoGenerate; /** The namespace that contains all proxy classes. */ private $_proxyNamespace; /** The directory that contains all proxy classes. */ private $_proxyDir; /** * Initializes a new instance of the ProxyFactory class that is * connected to the given EntityManager. * * @param EntityManager $em The EntityManager the new factory works for. * @param string $proxyDir The directory to use for the proxy classes. It must exist. * @param string $proxyNs The namespace to use for the proxy classes. * @param boolean $autoGenerate Whether to automatically generate proxy classes. */ public function __construct(EntityManager $em, $proxyDir, $proxyNs, $autoGenerate = false) { if ( ! $proxyDir) { throw ProxyException::proxyDirectoryRequired(); } if ( ! $proxyNs) { throw ProxyException::proxyNamespaceRequired(); } $this->_em = $em; $this->_proxyDir = $proxyDir; $this->_autoGenerate = $autoGenerate; $this->_proxyNamespace = $proxyNs; } /** * Gets a reference proxy instance for the entity of the given type and identified by * the given identifier. * Generalle this method will reuse the source code it has already generated, even in * other HTTP requests since the source file are saved in a configurable folder. * This is the only public method the other parts of the ORM will generally use. * * @param string $className * @param mixed $identifier * @return object */ public function getProxy($className, $identifier) { $proxyClassName = str_replace('\\', '', $className) . 'Proxy'; $fqn = $this->_proxyNamespace . '\\' . $proxyClassName; if ($this->_autoGenerate && ! class_exists($fqn, false)) { $fileName = $this->_proxyDir . DIRECTORY_SEPARATOR . $proxyClassName . '.php'; $this->_generateProxyClass($this->_em->getClassMetadata($className), $proxyClassName, $fileName, self::$_proxyClassTemplate); require $fileName; } if ( ! $this->_em->getMetadataFactory()->hasMetadataFor($fqn)) { $this->_em->getMetadataFactory()->setMetadataFor($fqn, $this->_em->getClassMetadata($className)); } $entityPersister = $this->_em->getUnitOfWork()->getEntityPersister($className); return new $fqn($entityPersister, $identifier); } /** * Generates proxy classes for all given classes. * Used for pre-generation from command line, in case PHP on the hosting service * has not the rights to create new files in the proxies folder. * * @param array $classes The classes (ClassMetadata instances) for which to generate proxies. * @param string $toDir The target directory of the proxy classes. If not specified, the * directory configured on the Configuration of the EntityManager used * by this factory is used. */ public function generateProxyClasses(array $classes, $toDir = null) { $proxyDir = $toDir ?: $this->_proxyDir; $proxyDir = rtrim($proxyDir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR; foreach ($classes as $class) { $proxyClassName = str_replace('\\', '', $class->name) . 'Proxy'; $proxyFileName = $proxyDir . $proxyClassName . '.php'; $this->_generateProxyClass($class, $proxyClassName, $proxyFileName, self::$_proxyClassTemplate); } } /** * Generates a proxy class file. * Substitutes certain parameters like class name and methods in a template * kept at the end of this file. The class source code is saved in a file in the directory specified. * Testing this method is usually not a problem since the directory is easily configurable. * * @param $class * @param $originalClassName * @param $proxyClassName * @param $file The path of the file to write to. */ private function _generateProxyClass($class, $proxyClassName, $fileName, $file) { $methods = $this->_generateMethods($class); $sleepImpl = $this->_generateSleep($class); $placeholders = array( '', '', '', '', '' ); if(substr($class->name, 0, 1) == "\\") { $className = substr($class->name, 1); } else { $className = $class->name; } $replacements = array( $this->_proxyNamespace, $proxyClassName, $className, $methods, $sleepImpl ); $file = str_replace($placeholders, $replacements, $file); file_put_contents($fileName, $file); } /** * Generates the methods of a proxy class. * All methods are overridden to call _load() before execution. * * @param ClassMetadata $class * @return string The code of the generated methods. */ private function _generateMethods(ClassMetadata $class) { $methods = ''; foreach ($class->reflClass->getMethods() as $method) { /* @var $method ReflectionMethod */ if ($method->isConstructor() || strtolower($method->getName()) == "__sleep") { continue; } if ($method->isPublic() && ! $method->isFinal() && ! $method->isStatic()) { $methods .= PHP_EOL . ' public function '; if ($method->returnsReference()) { $methods .= '&'; } $methods .= $method->getName() . '('; $firstParam = true; $parameterString = $argumentString = ''; foreach ($method->getParameters() as $param) { if ($firstParam) { $firstParam = false; } else { $parameterString .= ', '; $argumentString .= ', '; } // We need to pick the type hint class too if (($paramClass = $param->getClass()) !== null) { $parameterString .= '\\' . $paramClass->getName() . ' '; } else if ($param->isArray()) { $parameterString .= 'array '; } if ($param->isPassedByReference()) { $parameterString .= '&'; } $parameterString .= '$' . $param->getName(); $argumentString .= '$' . $param->getName(); if ($param->isDefaultValueAvailable()) { $parameterString .= ' = ' . var_export($param->getDefaultValue(), true); } } $methods .= $parameterString . ')'; $methods .= PHP_EOL . ' {' . PHP_EOL; $methods .= ' $this->_load();' . PHP_EOL; $methods .= ' return parent::' . $method->getName() . '(' . $argumentString . ');'; $methods .= PHP_EOL . ' }' . PHP_EOL; } } return $methods; } /** * Generates the code for the __sleep method for a proxy class. * The __sleep() method is used in case of serialization, which should * not include service objects referenced by proxies like the Entity Manager. * * @param $class * @return string */ private function _generateSleep(ClassMetadata $class) { $sleepImpl = ''; if ($class->reflClass->hasMethod('__sleep')) { $sleepImpl .= 'return parent::__sleep();'; } else { $sleepImpl .= 'return array('; $first = true; foreach ($class->getReflectionProperties() as $name => $prop) { if ($first) { $first = false; } else { $sleepImpl .= ', '; } $sleepImpl .= "'" . $name . "'"; } $sleepImpl .= ');'; } return $sleepImpl; } /** Proxy class code template */ private static $_proxyClassTemplate = '; /** * THIS CLASS WAS GENERATED BY THE DOCTRINE ORM. DO NOT EDIT THIS FILE. */ class extends \ implements \Doctrine\ORM\Proxy\Proxy { private $_entityPersister; private $_identifier; public $__isInitialized__ = false; public function __construct($entityPersister, $identifier) { $this->_entityPersister = $entityPersister; $this->_identifier = $identifier; } private function _load() { if (!$this->__isInitialized__ && $this->_entityPersister) { $this->__isInitialized__ = true; if ($this->_entityPersister->load($this->_identifier, $this) === null) { throw new \Doctrine\ORM\EntityNotFoundException(); } unset($this->_entityPersister); unset($this->_identifier); } } public function __sleep() { if (!$this->__isInitialized__) { throw new \RuntimeException("Not fully loaded proxy can not be serialized."); } } }'; }
May 26, 2010
by Giorgio Sironi
· 27,577 Views
article thumbnail
Writing user stories for web applications
User stories are the substitute of formal requirements documents in an agile environment: they are short summaries of a functionality that leave space to expansion and refinement when it comes the time to implement it. Writing them it's not rocket science and it is definitely something a web developer should master. Stories are not requirements, in the sense they are not required at all: the prioritization process will choose the most important stories to implement at a given time, basing on their cost and on their value. Instead of giving a list of requirements where 90% of the features are only nice to have, the customer gets to make an informed decision over which stories should be implemented first, and can handle new requirements by adding them to the global list of stories (backlog). The typical agile estimation process is not the subject of this article, but it looks like this: asking questions to the customer generates a bunch of user stories, which go into the backlog where all the ideas about functionalities are kept. Stories are estimated in relative points or ideal time, to give an idea of their size. With the customer, the developers can choose which stories to pull from the backlog into a smaller plan (iteration or release based). if new requirements come up or a user story changes too much to be considered the same, it is put in the backlog so that the next planning process can deal with it. If it's still a priority, it will be surely included in the next iteration. How to write them A major point of user stories is their focus on the value provided to the end user and not on the technical topics related to its implementation. Technical options will be chosen to satisfy the story and to estimate its cost during the subsequent planning. User stories have usually the following overly famous form: As a [role], I [feature] so that [reason] For example, As a user, I can login into the application so that the it presents me my preferences. What is a role while writing user stories? It is the analogue of the classic Uml use case diagram role: for example it can be the customer, a user of the application, an admin, developer which uses the library you're writing. The so that part is often optional, but it should described the value provided by the feature the user story describes. The feature itself can change in development but it should conserve its original value. In our example, if we make the user login with OpenID the value is conserved even if we have thrown away our own authentication mechanism. In this sense, stories do not describe describe the how, only the what, and this particular what can change is this helps to achieve the same why (a little metaphysical definition). Keep in mind that user stories have to be testable, because they are the definition of done b: you can drink champagne only when the acceptance tests for a story are passing: consider modifying your stories if it is difficult to write automated tests for them. For instance in an application which indexed files asynchronously (and it may take a lot after the user has been returned an Indexing started message) I actually addedd a dynamic page with the last additions to the index, that is updated as the last step of the pipeline of operations, to make the story more testable. Complex stories which are unclear to test are a symptom that a refinement is needed. Where to write them The general suggestion is to write every user story on a 3x5 card because this size choice keeps the stories short and to the point. Moreover, if you write on paper you can shuffle single cards around for planning pourposes. I hate to write on paper something I may edit, so when working solo, I use txt files where a story is represented by a row. I'm currently looking for a low-footprint project management tool which does not complicate my process, but I can easily move around stories from the backlog to the release or iteration plan with vim by using dd to cut a story and P or p to paste it. This is nearly as low-tech as sheets of paper. Why web applications? Web applications adapt particularly well to iterative development and to a story-based approach. Usually a web application starts with a beta that implements the most important stories to provide basic functionalities. If a story does not gather enough success online (poor response from the users), linked stories that expand it may be delayed (left in the backlog) or cancelled, to make room for other stories that have come up. There are no problems in the client side while updating the application, as all issues with upgrading are moved to the server side (such as changes to the database schema). If the developers have access to the hosting service for the application, rolling out the result of an iteration is very easy and the users may not notice it until they start to use the new features. Compare this agility with the old MS Access applications used in the offices of half of the planet. While web applications may take over the enterprise world, legacy managerial applications are widely spread and it is difficult to substitute them in one single shot. I tried with a formal waterfall approach, and failed as the requirements were too fine-grained, and impossible to prioritize: imagine a list of an hundred of queries that can be done over the database, and no idea of which are the most used. Imagine fifty different entities which represent an outdated domain model you have to replace. How do you know where to start? Even if you can implement all these requirements, how much will it cost? An agile process is our only hope for replacing this kind of applications, and if you will someday see a PHP application generating invoices in your office, it will be in part thanks to user stories.
May 25, 2010
by Giorgio Sironi
· 44,116 Views
article thumbnail
Interview: Music Composer on the NetBeans Platform
Steven Yi (pictured right) is a programmer and composer living in Rochester, NY. He studied music composition in college and became a programmer afterwards. He started off as a Flash and server side developer (which he did for about 7 years), and has spent the past few years at his current company doing mobile development with J2ME, Android, and iPhone, as well as server-side development with Spring, Hibernate, etc. He started learning and using Java and Swing for personal work in 2000 and has been using it since then for the development of blue, the focus of the interview that follows below. In the interview, Steven talks about the "blue" music composer, how it works, and how the NetBeans Platform and Python form the basis of this cool open-sourced Java music composer. What's "blue"? blue is a music composition environment I started in the fall of 2000. It was actually my very first Java program! At the time, I had started using the music software Csound (http://www.csounds.com) to compose, but found it slow to work with when it came to accomplishing what I was interested in musically. I had the idea to create a simple program that would have a timeline and the ability to scale musical score material in time. Fast forward many years later: in trying to solve other musical problems, and responding to feedback from the community of users, I've expanded blue's features a great deal. It now includes things like a mixer and effects system, a GUI builder tool for creating synthesizer interfaces, embedded Jython processing of musical scripts, and more. It's been quite satisfying to create a tool that can express my musical interests and to find a community of users who have found value in this program for their own musical work. Some screenshots: The Orchestra manager shows a BlueSynthBuilder instrument being edited. The "Reson 6" instrument is shown in edit mode. The BSB Object Properties panel shows the properties for the selected knob: The Score timeline shows a project using multiple parameter automations. The values automated include things like volume, panning, and a time pointer for a phase-vocoder instrument. All BlueSynthBuilder instruments, Effects, and the Mixer volume sliders can be automated: The Score timeline showing the author's composition "Reminiscences". The timeline shows multiple Python SoundObjects used. The SoundObject Editor shows the editor for the selected SoundObject in the timeline. The SoundObject Properties panel shows different properties for the selected SoundObject: The Score timeline showing a Tracker SoundObject being used. The timeline is configured to snap at every 4 beats and the time bar has been configured to show in numbers rather than time: The Score timeline showing a PianoRoll SoundObject being used. The PianoRoll is unique in that it is microtonal, meaning it can adapt the number of steps per "octave", depending on the values configured from a tuning file. In this screenshot, the scale loaded was a Bohlen-Pierce scale, which has 13-tones per tritave (octave and a half): The blue Mixer is shown docked into the bottom bar and in an open state. The interfaces for user-created Chorus and Reverb effects are shown. The interfaces were created using the same GUI builder tool that is found in the BlueSynthBuilder instrument: It's got a very special appearance. How did that come about? blue's custom look and feel started off one day when I was using my Palm PDA. I remember thinking that I enjoyed the look of the device with the backlight on, and so I wanted to recreate that kind of look for my program. Later, I modified the color scheme to tone it down in some ways, but I also introduced more colors than white and cyan to highlight secondary and tertiary features. Maybe now it is now more like Tron than it is like Palm. :) Overall, I enjoy the darker look of the application when I'm working on music. I tend to work on music when I have free time, and that is usually only late at night—I've found having a darker screen has been easier on my eyes. Also, if anyone was wondering, yes, blue is my favorite color. The blue look and feel is encapsulated in a module named "blue-plaf" and is available in the "blue" Mercurial repository (http://bluemusic.hg.sourceforge.net/hgweb/bluemusic/blue). The look and feel is quite hacked up (redoing it properly has been another item on my todo list), but it can be dropped into another application and it should work, as shown below with the CRUD Sample (which can be created from a tutorial found here): Can you explain how blue's timeline works? blue has a concept of SoundLayers and SoundObjects. SoundObjects are objects that primarily produce notes and have a start and duration. There are many different types of SoundObjects in blue and each has an editor (viewed in the SoundObject Editor TopComponent when a SoundObject is selected), and a BarRenderer, which is used to draw the content area of the bar on the timeline. A PolyObject is a special SoundObject. It consists of SoundLayers, which contain SoundObjects. The root timeline is itself just a PolyObject that you can add as many layers to as you like. You can also group individual SoundObjects into their own PolyObjects, and then use the resulting PolyObjects just like any other SoundObject on the timeline. If you double-click a PolyObject, the timeline is then reset with the timeline of the PolyObject you selected. As a result, PolyObjects allow timelines to be embedded within other timelines. If you think about how music is grouped into motives, phrases, sections, and even larger groupings, you can see how PolyObjects might represent these kinds of musical abstractions. For the component design, the ScoreTopComponent starts off with a JSplitPane to split between a SoundLayerListPanel on the left and a JScrollPane on the right. The JScrollPane has a ScoreTimeCanvas (the main timeline) in the main viewPort's view, a panel with the the time bar and tempo editor in the column header, and the corner is used to open up an extra panel to modify properties for the timeline. The JScrollPane has customized JScrollBars used to add the ± buttons that perform zooming on the timeline. There are a number of other features involved that are implemented amongst a number of classes, but the details of how viewPorts are synchronized (among other things) may be a bit too technical to discuss here. For those who are interested, the code can be viewed within the blue.ui.core.score package within the blue-ui-core module. How did blue come to find itself atop the NetBeans Platform? I first started to be interested in NetBeans IDE around the time 4.1 came out, but didn't really get into using it until the release of 5.0. At that time, I had hand-written Swing components for about 4-5 years (I don't really remember when 5.0 was released), and I found Matisse to be quite nice and began using it here and there. I had looked at the NetBeans Platform as an RCP at that time, but found it to be quite a bit to understand. However, I still kept it on my radar. Around the time 6.0 or 6.5 came out, I started to reconsider migrating to the NetBeans Platform once again. By this time, I had moved over to using NetBeans IDE full time for blue development and had been using NetBeans IDE more in general—particularly Java Web development and Ruby on Rails. One of the biggest things I found attractive about NetBeans IDE is its windowing system... and the things I read about in the platform development articles I'd seen online made me curious once again to see what the NetBeans Platform offered. I still felt that there was going to be a big learning curve to learn the NetBeans Platform, but the NetBeans Platform tutorials online were really quite helpful, as were the members of the NetBeans Platform mailing list, and there were also many more books available to help me get started. I think I ultimately spent about 6-8 months migrating blue to using the NetBeans Platform. Granted, it was a busy time in my life and I was working on this only in my spare time, so I think in the end it was a reasonable amount of time. Users have been very positive about the new blue interface and application as a whole, and I think it has been worth spending the time to use the NetBeans Platform. blue's window layout is quite unexpected for a NetBeans Platform application. By the time I had started migrating blue to the NetBeans Platform, the application was already some 7-8 years old. The interface I designed for blue in pure Swing was influenced by my experiences in using Flash, looking at other music composition environments (Digital Audio Workstations and Sequencing Programs), and evaluating the different aspects of working with Csound. Mapping the components from the Swing-based application to the NetBeans Platform was a little tricky in that I couldn't quite get the exact same design of panels as I had in pure Swing. In the end, I tried to think about where most of the components resided physically, and created TopComponents and placed them in the center, left, right, or bottom parts of the main window. I kept some of the dialogs from the old codebase as-is, but I migrated others to be TopComponents so that they could be docked, opened, or dragged out into a dialog as the user wished. In the end, the GUI is different and took a little getting used to after years of using and building the old interface, but I quickly adjusted to the changes and I think there is much greater consistency and usability now. The users have responded very positively to the general polish of the application and to being able to customize their environment. I myself have very much enjoyed being able to dock all of the windows as well as using full-screen mode, especially when I am on my netbook and composing. Excellent! What features of the NetBeans Platform are you using and what do you find to be most useful? Currently, I am using only a very small part of the NetBeans Platform. By the time I started to move my code to the NetBeans Platform, the codebase was already some 7 or 8 years old. I took the approach recommended to me on the mailing list and started off small, focusing primarily on migrating my project to using the Windows System API, the Options API, and a few other utility API's like IO and Dialogs. Having an old codebase, I found that I spent most of my time during migration just reorganizing my UI into TopComponents and working out communications between the components. I also spent time looking at API's that I had developed myself and seeing which ones could be replaced with API's provided by the NetBeans Platform. At this time, the application is still using a number of API's I wrote from the old codebase, but over time I would like to migrate more of the appplication to use the Nodes and Visual Library API's. I think migrating a codebase of this size in phases really worked out well. In the first phase, I was able to take advantage of the Window System API and have a very visible result on the application and gained a lot for usability. Also, a big part of the migration involved moving the codebase from a monolithic source tree and partitioning it into logical modules. I think there really is a great deal of benefit to working with a codebase with modular design, and that too is a very positive result of working with the NetBeans Platform. Please say something about how Jython relates to this application, how you are using it (what the benefits are), and your general opinions on Jython. I have had a Python SoundObject in blue for quite some time—I think since 2002. For me, it is one of the most important tools in blue when it comes to accomplishing what I want musically. With computer music, we have a lot of tools for what I call Common Practice computer music: PianoRolls, Pattern Editors, and Notation Objects. For computer musicians who are interested in Uncommon Practice music, the ability to use a scripting language opens up a number of ways to express musical ideas that cannot be easily conveyed using those other tools. In blue, Jython is primarily used to allow users to write scripts that will generate notes. For myself, I use Python scripts to model orchestral composition, creating Performer and PerformerGroup objects that I write in Python. I also write performance functions, usually per-project, to perform different musical material in different ways. Other users have used Python scripts in exploring things like algorithmic composition and genetic algorithms in their work. A blue project can contain any number of Python objects. The score generated by each Python object is translated and scaled in time by moving and resizing the SoundObject in the timeline. This allows a user who may want to use scripting to create musical material to also take advantage of blue's timeline to organize how the different musical objects will work together. One of the things I most appreciate about Jython (and scripting languages on the JVM in general) is that it is embeddable within a Java application. By packaging and embedding the Jython interpreter within the blue application, users can rest assured that the Python scripts they write can be interpreted anywhere that blue is installed. It's an extra assurance that their musical projects will be long-lasting, but they can still take advantage of a full programming language like Python in their work. Overall, I think that Jython is a fine piece of software and I hope that it will continue to grow and develop for years to come. Is the application open source and are you looking for code contributions and, if so, in which areas? Yes, the application is available under the GPL v2 license, and the source code can be viewed from the Mercurial repository on SourceForge at http://bluemusic.hg.sourceforge.net/hgweb/bluemusic/blue/. I am a strong proponent of open source, especially for creative work. In the same way that we can today look at and study musical works by composers of the past (like Josquin and Bach), I would like to imagine that the work composers and other artists are now creating with computers will also be open and available for study in the future. I believe that using open source software for creative work greatly helps in making musical projects available for the years ahead. I have done most of the development of blue myself, and over the years I've certainly built up a long list of things that I would like to implement. Users have also made wonderful feature requests that I would love to see in the program—but unfortunately, there are only so many hours in the day. It would certainly be nice to have others contributing code! Beyond new features, there are a number of infrastructural things that would be nice to address. The codebase is many years old, and while the application has been refactored multiple times over its lifetime, there are still some areas of the application that could be much more cleanly implemented. Also, in moving over to the NetBeans Platforms, I only really took the first steps. There are a number of components within the application that could probably be better served by migrating to using more of the NetBeans API's. For internal work, things like modifying the timeline to implement zooming to use Graphics2d and transforms, implementing a better waveform renderer for audio files, and further enhancing the instrument GUI builder are all things I'd like to see. I'd also love to get help in migrating all of the tables and trees to using the Nodes API, something that I have not yet had the time to do. It would also be nice to get the manual (currently in HTML and PDF, generated from DocBook) integrated into the application as JavaHelp, but this is another thing that I have had to postpone due to lack of time. For features, some interesting things I'd love to see are a Notation SoundObject, a separate graphical instrument builder using the Visual Library API, and a Sampler instrument. There's also a sound drawing SoundObject, enhancements to existing SoundObjects, and more I'd love to see moving forward. Maybe someone will find these kinds of things interesting and will take a look at blue's code sometime! Thanks Steven and happy music making with blue!
May 25, 2010
by Geertjan Wielenga
· 17,888 Views
article thumbnail
C# and SQLDependency: Monitoring Your Database for Data Changes
Using the SqlDependency Class is a good way to make your data driven application (whether it be Web or Windows Forms) more efficient by removing the need to constantly re-query your database checking for data changes. For the purposes of this article we'll be discussing using SqlDependencywith SQL 2005, but I'm sure it works the same, or very similar with SQL 2008. Using SqlDependency to monitor for data changed relies on two things from SQL Server, Service Broker. A Service Broker allows the sending of asynchronous messages to and from your database. The second item it relies on Queues, your Service Broker will use your Queue to create loose coupling between the sender and receiver. The sender can send its message to the Queue and move on, and relies on the Service Broker to make sure whatever message was put into the Queue is processed and a response is sent back to the sender (your application). For this article I’m using a sample database I created named SqlDependencyExample and a table named Employee whick has this structure: CREATE TABLE [dbo].[EmployeeList]( [EmployeeID] [int] IDENTITY(1,1) NOT NULL, [FirstName] [varchar](25) NOT NULL, [LastName] [varchar](25) NOT NULL, [PhoneNumber] [varchar](13) NOT NULL) ON [PRIMARY] We will be using a stored procedure for querying, but this can also be done with a regular text query in your code. Here’s the stored procedure CREATE PROCEDURE uspGetEmployeeInformationASBEGIN -- Insert statements for procedure here SELECT EmployeeID, FirstName, LastName, PhoneNumber FROM dbo.EmployeeListEND NOTE: Notice in the stored procedure we have dbo in the table name, when using dependency to monitor data changes you have to have the table in the format [owner].[Tablename] of it can cause unwanted results, so to avoid that just use the said format. First thing we need to do is create your Service Broker and Queue so we can send messages back & forth between our database. In my example I have a simple table that holds employee names & phone numbers. This is how we create the Service Broker & Queue (My database name is SqlDependencyExample but change that with your database name (We also need to give our SQL user permission to access it) USING [SqlDependencyExample]CREATE QUEUE NewEmployeeAddedQueue;CREATE SERVICE NewEmployeeAddedService ON QUEUE NewEmployeeAddedQueue([http://schemas.microsoft.com/sql/notifications/postquerynotification]); GRANT SUBSCRIBE QUERY NOTIFICATIONS TO SqlDependencyExampleUser; Now we move on to the fun part, the code (I know that's what you're waiting for). Before this can work we need to check and make sure the connected user has the proper permissions for the notifications. We can do this with creating a simple method CheckUserPermissions which uses the SqlClientPermissions Class to check the permissions of the currently connected user. So here’s the simple method for accomplishing this: private bool CheckUserPermissions(){ try { SqlClientPermission permissions = new SqlClientPermission(PermissionState.Unrestricted); //if we cann Demand() it will throw an exception if the current user //doesnt have the proper permissions permissions.Demand(); return true; } catch { return false; } One thing to know, SqlDependency relies on the OnChangeEventHandler Delegate which handles the SqlDependency.OnChange Event, which fires when any notification is received by any of the commands for the SqlDependency object. Now for getting the employee list, in this method we will query our EmployeeList table to get the employees information. We will also set the OnChange event of our SqlDependency object so that it can let us know when data has changed in our table and re-populate it with the latest employee list. Before trying to access our database we will call the method CheckUserPermissions method we created earlier to make sure the current user has the proper permissions, if not we display a message otherwise we move on to getting the employee list and populating a ListView with the ID, first & last name of the employee and their phone number. Here’s the GetEmployeeList method, which expects a parameter of type ListView (which is what will be displaying our employee list) /// /// method for querying our database to get an employee list/// /// the ListView we want to display the employee list inprivate void GetEmployeeList(ListView lview){ //the connection string to your database string connString = "YourConnectionString"; //the name of our stored procedure string proc = "uspGetEmployeeInformation"; //first we need to check that the current user has the proper permissions, //otherwise display the error if (!CheckUserPermissions()) MessageBox.Show("An error has occurred when checking permissions"); //clear our ListView so the data isnt doubled up lview.Items.Clear(); //in case we have dependency running we need to go a head and stop it, then //restart it SqlDependency.Stop(connString); SqlDependency.Start(connString); using (SqlConnection sqlConn = new SqlConnection(connString)) { using (SqlCommand sqlCmd = new SqlCommand()) { sqlCmd.Connection = sqlConn; sqlCmd.Connection.Open(); //tell our command object what to execute sqlCmd.CommandType = CommandType.StoredProcedure; sqlCmd.CommandText = proc; sqlCmd.Notification = null; SqlDependency dependency = new SqlDependency(sqlCmd); dependency.OnChange += new OnChangeEventHandler(dependency_OnDataChangedDelegate); sqlConn.Open(); using (SqlDataReader reader = sqlCmd.ExecuteReader()) { while (reader.Read()) { ListViewItem lv = new ListViewItem(); lv.Text = reader.GetInt32(0).ToString(); lv.SubItems.Add(reader.GetString(1)); lv.SubItems.Add(reader.GetString(2)); lv.SubItems.Add(reader.GetString(3)); lview.Items.Add(lv); } } } } Notice we set the OnChangeEventHandler of our SqlDependency object to dependency_OnDataChangedDelegate, this is the method that will re-query our table and send notifications when the data has changed in our EmployeeList table. In this method we invoke the work onto the main UI thread, this will help us avoid the dreaded cross—thread exception when we go to re-populate the ListView control when any notifications are sent to our application. Since our method (GetEmployeeList) required a parameter we cannot use the standard MethodInvoker delegate (as this cannot accept parameters). So what we will do is create our own Delegate which can accept our parameter. Here’s the delegate (very simple): private delegate void getEmployeeListDelegate(ListView lv); Our dependency_OnDataChangedDelegate requires a SqlNotificationEventArgs in the signature. Here we check the control being used to make sure InvokeRequired is true, if no then we use Invoke to invoke the work onto the main UI thread for re-populating our ListView, otherwise we just call our method to re-query: private void dependency_OnDataChangedDelegate(object sender, SqlNotificationEventArgs e){ //avoid a cross-thread exception (since this will run asynchronously) //we will invoke it onto the main UI thread if (listView1.InvokeRequired) listView1.Invoke(new getEmployeeListDelegate(GetEmployeeList), listView1); else GetEmployeeList(listView1); //this example is for only a single notification so now we remove the //event handler from the dependency object SqlDependency dependency = sender as SqlDependency; dependency.OnChange -= new OnChangeEventHandler(dependency_OnDataChangedDelegate);} NOTE: One thing to remember is you have to stop the dependency from querying your database when your form is closed, to do this use the forms FormClosing event to stop the work in the form you are using the dependency work in. That’s how you use the SqlDependency Class for monitoring data changes in your database without having to use something like a timer control to re-query at certain intervals. Thanks for reading and happy coding :)
May 21, 2010
by Richard Mccutchen
· 105,805 Views · 1 Like
article thumbnail
Practical PHP Patterns: Unit of Work
The Unit of Work pattern is one of the most complex moving parts of Object-Relational Mappers, and usually of Data Mappers in general. A Unit of Work is a component (for us, an object with collaborators) which keeps track of the new, modified and deleted domain objects whose changes have to be reflected in the data store. At at the end of a transaction the Unit of Work, if used correctly, is capable of producing a list of changes to perform on the data store, solving concurrency or consistency problems, and avoiding too many redundant queries in the relational case or a chatty communication in the schemaless one. As I've already said, the Unit of Work pattern is usually not employed alone but as part of a Data Mapper, which provides a different interface to the internal client code and mixes up this pattern with several other ones. The minimum transaction that a PHP Unit of Work performs is usually an HTTP request, or a session composed by more than one request in case the domain objects can be saved in an intermediate store (like $_SESSION or a cache of any kind). Being able to serialize objects in a store and reattaching them to the Unit of Work during subsequent requests is not a trivial problem. Advantages The power of a Unit of Work resides in the fact that the actual database transaction is only performed (and kept opened) when the commit() method of the Unit of Work is called, while until that moment there is ideally no use of the database connection. This paradigm is called batch update. Objects stored in a Unit of Work have usually an associated state, like: new (which correspondes to INSERT queries during the batch update) clean (no SQL queries have to be issued since the object has been retrieved and not modified) dirty (UPDATE queries) removed (DELETE queries) There are different strategies for detecting changes to the object graph. The simplest strategy is comparing objects with a clean copy kept in memory (while it is usually not performance-wise to compare them with the database.) A more complex solution is having a specific interface which is implemented by the objects, so that they can manage their state and declare they are dirty or have to be removed. This implementation choice introduces a dependency from the domain layer to the infrastructure one, thus I prefer heavier approaches like the former, which is equivalent to generate a diff with your source control system of choice, but on the object graph instead of a codebase: the source files are not responsible for diffing themselves. Furthermore, the Unit of Work decoupling from the database state introduces an upper level of management, that makes us able to rollback changes if some constraint are not satisfied, or the computation has produced an error. In PHP, the client code can simply throw the object graph away, and the partial Unit of Work changeset is forgotten in the next requests. Issues While decoupling the object graph from the data store to perform custom computations is a comfortable possibility for the client code, at the same time it can be an issue that introduces stale data. The more the objects are kept in the Unit of Work, the more the data store is prone to external concurrent modifications inconsistent with the in-memory graph (for example updating fields with different values than the ones modified in this very session.) Either a optimistic or pessimistic locking mechanism has to be introduced when the scope of the object graph is longer than the few seconds necessary of producing an HTTP response, or even less than that when the traffic is higher. Injecting the Unit of Work in the domain objects so that they can track their state can be problematic and too much an invasion of the domain layer. Usually the problem is solved the other way around: when the objects are passed to the Object-Relational Mapper (almost always implemented as a Data Mapper and not as an Active Record), it delegates part of the logic to the Unit of Work, which is a first-class citizen and can be tested independently from the other components of the library. The alternative to the inherent complexity of the Unit of Work pattern is saving an object at the moment it is updated. This solution is problematic because either the client code has to explicitly call save() methods, or queries (read modification to the data store in case of non-relational model) have to be performed at the very time of an atomic change, for instance issuing multiple UPDATE statements, one for every time a field is modified. Example The sample code of this article is the internal API of Doctrine 2. The actual Unit of Work code is dependent on the strategy adopted to detect changes to domain objects, but the interface exposed to the Entity Manager is always the same and should provide a panoramic of an Unit of Work's responsibilities and features. In this implementation, the methods persist() and remove() are used to introduce new objects to the Unit of Work or to schedule something for deletion from the database, while commit() executes a batch update on demand. * @author Guilherme Blanco * @author Jonathan Wage * @author Roman Borschel * @internal This class contains highly performance-sensitive code. */ class UnitOfWork implements PropertyChangedListener { /** * An entity is in MANAGED state when its persistence is managed by an EntityManager. */ const STATE_MANAGED = 1; /** * An entity is new if it has just been instantiated (i.e. using the "new" operator) * and is not (yet) managed by an EntityManager. */ const STATE_NEW = 2; /** * A detached entity is an instance with a persistent identity that is not * (or no longer) associated with an EntityManager (and a UnitOfWork). */ const STATE_DETACHED = 3; /** * A removed entity instance is an instance with a persistent identity, * associated with an EntityManager, whose persistent state has been * deleted (or is scheduled for deletion). */ const STATE_REMOVED = 4; /** * Commits the UnitOfWork, executing all operations that have been postponed * up to this point. The state of all managed entities will be synchronized with * the database. * * The operations are executed in the following order: * * 1) All entity insertions * 2) All entity updates * 3) All collection deletions * 4) All collection updates * 5) All entity deletions * */ public function commit() { // Compute changes done since last commit. $this->computeChangeSets(); if ( ! ($this->_entityInsertions || $this->_entityDeletions || $this->_entityUpdates || $this->_collectionUpdates || $this->_collectionDeletions || $this->_orphanRemovals)) { return; // Nothing to do. } if ($this->_orphanRemovals) { foreach ($this->_orphanRemovals as $orphan) { $this->remove($orphan); } } // Raise onFlush if ($this->_evm->hasListeners(Events::onFlush)) { $this->_evm->dispatchEvent(Events::onFlush, new Event\OnFlushEventArgs($this->_em)); } // Now we need a commit order to maintain referential integrity $commitOrder = $this->_getCommitOrder(); $conn = $this->_em->getConnection(); $conn->beginTransaction(); try { if ($this->_entityInsertions) { foreach ($commitOrder as $class) { $this->_executeInserts($class); } } if ($this->_entityUpdates) { foreach ($commitOrder as $class) { $this->_executeUpdates($class); } } // Extra updates that were requested by persisters. if ($this->_extraUpdates) { $this->_executeExtraUpdates(); } // Collection deletions (deletions of complete collections) foreach ($this->_collectionDeletions as $collectionToDelete) { $this->getCollectionPersister($collectionToDelete->getMapping()) ->delete($collectionToDelete); } // Collection updates (deleteRows, updateRows, insertRows) foreach ($this->_collectionUpdates as $collectionToUpdate) { $this->getCollectionPersister($collectionToUpdate->getMapping()) ->update($collectionToUpdate); } // Entity deletions come last and need to be in reverse commit order if ($this->_entityDeletions) { for ($count = count($commitOrder), $i = $count - 1; $i >= 0; --$i) { $this->_executeDeletions($commitOrder[$i]); } } $conn->commit(); } catch (Exception $e) { $this->_em->close(); $conn->rollback(); throw $e; } // Take new snapshots from visited collections foreach ($this->_visitedCollections as $coll) { $coll->takeSnapshot(); } // Clear up $this->_entityInsertions = $this->_entityUpdates = $this->_entityDeletions = $this->_extraUpdates = $this->_entityChangeSets = $this->_collectionUpdates = $this->_collectionDeletions = $this->_visitedCollections = $this->_scheduledForDirtyCheck = $this->_orphanRemovals = array(); } /** * Computes the changes that happened to a single entity. * * Modifies/populates the following properties: * * {@link _originalEntityData} * If the entity is NEW or MANAGED but not yet fully persisted (only has an id) * then it was not fetched from the database and therefore we have no original * entity data yet. All of the current entity data is stored as the original entity data. * * {@link _entityChangeSets} * The changes detected on all properties of the entity are stored there. * A change is a tuple array where the first entry is the old value and the second * entry is the new value of the property. Changesets are used by persisters * to INSERT/UPDATE the persistent entity state. * * {@link _entityUpdates} * If the entity is already fully MANAGED (has been fetched from the database before) * and any changes to its properties are detected, then a reference to the entity is stored * there to mark it for an update. * * {@link _collectionDeletions} * If a PersistentCollection has been de-referenced in a fully MANAGED entity, * then this collection is marked for deletion. * * @param ClassMetadata $class The class descriptor of the entity. * @param object $entity The entity for which to compute the changes. */ public function computeChangeSet(Mapping\ClassMetadata $class, $entity) { // ... } /** * Computes all the changes that have been done to entities and collections * since the last commit and stores these changes in the _entityChangeSet map * temporarily for access by the persisters, until the UoW commit is finished. */ public function computeChangeSets() { // ... } /** * Schedules an entity for insertion into the database. * If the entity already has an identifier, it will be added to the identity map. * * @param object $entity The entity to schedule for insertion. */ public function scheduleForInsert($entity) { $oid = spl_object_hash($entity); if (isset($this->_entityUpdates[$oid])) { throw new \InvalidArgumentException("Dirty entity can not be scheduled for insertion."); } if (isset($this->_entityDeletions[$oid])) { throw new \InvalidArgumentException("Removed entity can not be scheduled for insertion."); } if (isset($this->_entityInsertions[$oid])) { throw new \InvalidArgumentException("Entity can not be scheduled for insertion twice."); } $this->_entityInsertions[$oid] = $entity; if (isset($this->_entityIdentifiers[$oid])) { $this->addToIdentityMap($entity); } } /** * Schedules an entity for being updated. * * @param object $entity The entity to schedule for being updated. */ public function scheduleForUpdate($entity) { $oid = spl_object_hash($entity); if ( ! isset($this->_entityIdentifiers[$oid])) { throw new \InvalidArgumentException("Entity has no identity."); } if (isset($this->_entityDeletions[$oid])) { throw new \InvalidArgumentException("Entity is removed."); } if ( ! isset($this->_entityUpdates[$oid]) && ! isset($this->_entityInsertions[$oid])) { $this->_entityUpdates[$oid] = $entity; } } /** * INTERNAL: * Schedules an entity for deletion. * * @param object $entity */ public function scheduleForDelete($entity) { $oid = spl_object_hash($entity); if (isset($this->_entityInsertions[$oid])) { if ($this->isInIdentityMap($entity)) { $this->removeFromIdentityMap($entity); } unset($this->_entityInsertions[$oid]); return; // entity has not been persisted yet, so nothing more to do. } if ( ! $this->isInIdentityMap($entity)) { return; // ignore } $this->removeFromIdentityMap($entity); if (isset($this->_entityUpdates[$oid])) { unset($this->_entityUpdates[$oid]); } if ( ! isset($this->_entityDeletions[$oid])) { $this->_entityDeletions[$oid] = $entity; } } /** * Checks whether an entity is scheduled for insertion, update or deletion. * * @param $entity * @return boolean */ public function isEntityScheduled($entity) { $oid = spl_object_hash($entity); return isset($this->_entityInsertions[$oid]) || isset($this->_entityUpdates[$oid]) || isset($this->_entityDeletions[$oid]); } public function persist($entity) { $visited = array(); $this->_doPersist($entity, $visited); } /** * Saves an entity as part of the current unit of work. * This method is internally called during save() cascades as it tracks * the already visited entities to prevent infinite recursions. * * NOTE: This method always considers entities that are not yet known to * this UnitOfWork as NEW. * * @param object $entity The entity to persist. * @param array $visited The already visited entities. */ private function _doPersist($entity, array &$visited) { $oid = spl_object_hash($entity); if (isset($visited[$oid])) { return; // Prevent infinite recursion } $visited[$oid] = $entity; // Mark visited $class = $this->_em->getClassMetadata(get_class($entity)); $entityState = $this->getEntityState($entity, self::STATE_NEW); switch ($entityState) { case self::STATE_MANAGED: // Nothing to do, except if policy is "deferred explicit" if ($class->isChangeTrackingDeferredExplicit()) { $this->scheduleForDirtyCheck($entity); } break; case self::STATE_NEW: if (isset($class->lifecycleCallbacks[Events::prePersist])) { $class->invokeLifecycleCallbacks(Events::prePersist, $entity); } if ($this->_evm->hasListeners(Events::prePersist)) { $this->_evm->dispatchEvent(Events::prePersist, new LifecycleEventArgs($entity, $this->_em)); } $idGen = $class->idGenerator; if ( ! $idGen->isPostInsertGenerator()) { $idValue = $idGen->generate($this->_em, $entity); if ( ! $idGen instanceof \Doctrine\ORM\Id\AssignedGenerator) { $this->_entityIdentifiers[$oid] = array($class->identifier[0] => $idValue); $class->setIdentifierValues($entity, $idValue); } else { $this->_entityIdentifiers[$oid] = $idValue; } } $this->_entityStates[$oid] = self::STATE_MANAGED; $this->scheduleForInsert($entity); break; case self::STATE_DETACHED: throw new \InvalidArgumentException( "Behavior of persist() for a detached entity is not yet defined."); case self::STATE_REMOVED: // Entity becomes managed again if ($this->isScheduledForDelete($entity)) { unset($this->_entityDeletions[$oid]); } else { //FIXME: There's more to think of here... $this->scheduleForInsert($entity); } break; default: throw ORMException::invalidEntityState($entityState); } $this->_cascadePersist($entity, $visited); } /** * Deletes an entity as part of the current unit of work. * * @param object $entity The entity to remove. */ public function remove($entity) { $visited = array(); $this->_doRemove($entity, $visited); } /** * Deletes an entity as part of the current unit of work. * * This method is internally called during delete() cascades as it tracks * the already visited entities to prevent infinite recursions. * * @param object $entity The entity to delete. * @param array $visited The map of the already visited entities. * @throws InvalidArgumentException If the instance is a detached entity. */ private function _doRemove($entity, array &$visited) { // ... } }
May 19, 2010
by Giorgio Sironi
· 13,762 Views
article thumbnail
Why Raven DB?
One question that I got a few times regarding Raven is why? Richard Lopes puts the question nicely: However as a pragmatic developer, I am wondering what new this project is offering in a saturated market where you have quite mature alternatives like CouchDB, MongoDB, Tokyo, Redis, and many more ? Many of these products are also cross platform and run at C speed with a proven record, being used in very big web sites where their sharding capabilities and fault tolerance have been pushed far. The answer is composed of several parts, and cover quite a bit of history. Why Raven DB from Ayende’s point of view? Almost two years ago, I decided that it is time that I give my Erlang reading abilities a big push and sat down and read Couch DB source code. That was quite interesting, and was one of the reasons that I got interested in that NoSQL Thing. Unfortunately, I am one of those people who have a really hard time learning by osmosis, I have to do something to truly understand it. I have used (and built) a distributed key value store in several projects, but I felt that I didn’t really have a good understanding on what it means to use a document database. I really hate having ideas stuck in my head, they tend to ping and then someone tell me that I have been staring at a blank wall for two hours, and I realize that I just finished designing a document database. And about a year ago, it finally got bad enough that I sat down and wrote an implementation, just to get it off my chest. That was Rhino DivanDB. In most ways, it was a proof of concept, more than anything else. Just enough so I could tell myself, yes, I can do it. Then I run into situations where a document database would be an ideal fit, except… that the available choices wouldn’t quite do what I wanted. They are all open source, however, so no problem there, right? Except that none of them are really approachable to the .NET eco system. Yes, I can do both C++ and Erlang, but I don’t really like it. Moreover, it seems like .NET support is almost an afterthought (if at all) for those projects. There are some people who call me arrogant, but I really do think that I can do better. And I think that we did. Raven is a project where I tried a lot of new things, not from coding perspective, but from community & launch perspectives. It will be out soon, and I think you’ll be able to appreciate the level of focus on the non coding aspects of the project. Why Raven DB from your point of view? Raven is an OSS (with a commercial option) document database for the .NET/Windows platform. While there are other document databases around, such as CouchDB or MongoDB, there really isn’t anything that a .NET developer can pick up and use without a significant amount of friction. Those projects are excellent in what they do, but they aren’t targeting the .NET ecosystem. Raven does, and in so doing, it brings a lot of benefits to the table. When building Raven and the supporting infrastructure, the focus was always on making sure that it did the Right Thing from the .NET developer point of view. Below you can see a more detailed analysis on Raven’s benefits, but it comes down to that. Raven is build by .NET developers for .NET developers. Corny, isn’t it? But true nonetheless. What does Raven DB has to offer? Raven… builds on existing infrastructure that is known to scale to amazing sizes (Raven’s storage can handle up to 16 terrabytes on a single machine). runs, natively and with no effort, on Windows. In comparison, to get CouchDB to run on Windows you start by compiling Erlang from source. is not just a server. You can easily (trivially) embed Raven inside your application. is transactional. That means ACID, if you put data in it, that data is going to stay there. supports System.Transactions and can take part in distributed transactions. allows you to define indexes using Linq queries. supports map/reduce operations on top of your documents using Linq. comes with a fully functional .NET client API, which implements Unit of Work, change tracking, read and write optimizations, and a bunch more. has an amazing web interface allowing you to see, manipulate and query your documents. is REST based, so you can access it via the java script API directly. can be extended by writing MEF plugins. has trigger support that allow you to do some really nifty things, like document merges, auditing, versioning and authorization. supports partial document updates, so you don’t have to send full documents over the wire. supports sharding out of the box. is available in both OSS and commercial modes. There are probably other things, but I need to head out for a client now, so I’ll stop. I would love to hear your opinions about it, both positive and negative.
May 18, 2010
by Oren Eini
· 11,386 Views · 1 Like
article thumbnail
Practical PHP Patterns: Data Mapper
Data Mapper is one of the most advanced persistence-related patterns: an implementation of a Data Mapper stores objects (in general a whole object graph) in a database, and decouples the object model from the backend data representation, moving objects back and forth from the data store without introducing hardcode dependencies towards it. The database back end used by most of the implementations are usually relational: Object-relational mappers are some of the widely used tools today (and are even fading in some areas where different types of database are preferred.) Dependencies The interfaces for a Data Mapper can be put in the domain layer, but actual implementatons are in the category of infrastructure adapters and should be kept out of it, to promote the reuse and testing of domain layer classes without the need for a database back end or driver to be present. When this pattern is employed in an application, there are no more dependencies from the domain layer to external components, and no subclassing like in the Active Record case. Domain entities and value objects become Plain Old PHP Objects which do not extend anything (extends keyword) and do not need to reflect any database schema (if they are saved in a relational db), ensuring the maximum freedom of modelling to the developers. Different kind of implementations Early implementations of Data Mapper did not store an inner reference to a database connection or object that represent the link with the data store; in this case, result sets or some kind of raw data are passed to the Data Mapper, which reconstitutes the objects and encapsulates the process. Currently it is preferred to put all the references to the database as internals of the Data Mapper implementation (or in an abstraction layer under it). Anyway the Data Mapper hides as much as possible, like the type of the database and related knowledge, from the client code (domain layer or an upper one). The interface of the modern Data Mappers become from store() (insert and update) and remove() to one that comprehends also find() methods or a more complex system of querying; the implementation of querying is out of the scope of this pattern, but can be mixed up with it easily. A distinction in the implementations of Data Mapper is in their scope. A Data Mapper can be specific to a particular Entity/Aggregate Root (single class or class with composed objects), or a generic implementation can be customized with metadata (annotations, XML configuration) to work with different classes. Generic implementations are usually very complex, and specific ones may become much more easy to code due to simplifications. However, generic Data Mappers are prone to reuse and present less bugs than the project-specific ones, which were the only alternative in the last years. Issues The difficulties in implementing such a pattern are clear. Given a transaction, like an http request, the mapper has to keep track of the changed objects, and generate automatically the right DML queries to issue (SELECT, UPDATE, DELETE), in the right order and without leaving out any part of the modified data, avoid duplicating rows or update ones that do not exist anymore. This is a case of simple interface and complex implementation. To avoid breaking encapsulation, implementations usually employ reflection to access private fields of the object to store or that the mapper is reconstituting. Other possible choices for the data access are specific constructors for reconstitution or specific interfaces for domain mapping, but this solution still breaks encapsulation by providing to the client code methods that are not meant to be called, or fields that should not even be seen out of the objects but are actually accessed. This results in a unclear Api which may promote dependencies on persistence-related items. Providing metadata breaks encapsulation too, of course, but at least it is kept in the immediate so that it can change with the domain classes. Annotations are the preferred mean to specify metadata such as column names or relationships, and in PHP they are hidden in the docblock comments so that when the Data Mapper is not used they are just ignored. Data Mapper does not provide a total illusion (abstraction) of an in-memory collection of objects: the knowledge that there is some kind of external data store scatters into the application upper layers. Moreover, eventually some particular issue of the storage leaks into the object part of the application. As an example, consider the performance of queries, which is often the object of discussion when using object-relational mappers. Usually not all the object graph is instantiated as it may be very large; tuning how large the instantiated part will be is a trade-off which depends on the underlying database. Furthermore, generated queries may result very inefficient to the point that much of the client code must hint the joins to perform via the Api. Examples The generic Data Mapper Doctrine 2 (now in beta) is one of the few implementations in PHP of this pattern. As we've seen before in this article, specific implementations are dependent on the domain layer, so they are usually not reusable. A working copy of Doctrine 2 would be too large in size for inclusion in this post, so we are only analyzing the interface that most of the client code would see: the Entity Manager (name borrowed from Hibernate and JPA, since Java application used Data Mappers for years before this pattern has seen adoption from PHP ones.) The Entity Manager is not a domain specific interface, but other patterns like the Repository one can then compose the mapper to provid segregated interfaces for a particular class (aggregate root). As always, I have removed the less interesting methods or code to show the Api, and expanded the comments. * @author Guilherme Blanco * @author Jonathan Wage * @author Roman Borschel */ class EntityManager { /** * Flushes all changes to objects that have been queued up to now to the database. * This effectively synchronizes the in-memory state of managed objects with the * database. * No query is executed before this method is called from client code. * * @throws Doctrine\ORM\OptimisticLockException If a version check on an entity that * makes use of optimistic locking fails. */ public function flush() { $this->_errorIfClosed(); $this->_unitOfWork->commit(); } /** * Finds an Entity by its identifier. * This method is often combined with query-oriented ones. * * @param string $entityName the class name * @param mixed $identifier usually primary key * @param int $lockMode * @param int $lockVersion * @return object */ public function find($entityName, $identifier, $lockMode = LockMode::NONE, $lockVersion = null) { return $this->getRepository($entityName)->find($identifier, $lockMode, $lockVersion); } /** * Tells the EntityManager to make an instance managed and persistent. * * The entity will be entered into the database at or before transaction * commit or as a result of the flush operation. * * NOTE: The persist operation always considers entities that are not yet known to * this EntityManager as NEW. Do not pass detached entities to the persist operation. * * @param object $object The instance to make managed and persistent. */ public function persist($entity) { if ( ! is_object($entity)) { throw new \InvalidArgumentException(gettype($entity)); } $this->_errorIfClosed(); $this->_unitOfWork->persist($entity); } /** * Removes an entity instance. * * A removed entity will be removed from the database at or before transaction commit * or as a result of the flush operation. * * @param object $entity The entity instance to remove. */ public function remove($entity) { if ( ! is_object($entity)) { throw new \InvalidArgumentException(gettype($entity)); } $this->_errorIfClosed(); $this->_unitOfWork->remove($entity); } /** * Refreshes the persistent state of an entity from the database, * overriding any local changes that have not yet been persisted. * * @param object $entity The entity to refresh. */ public function refresh($entity) { if ( ! is_object($entity)) { throw new \InvalidArgumentException(gettype($entity)); } $this->_errorIfClosed(); $this->_unitOfWork->refresh($entity); } /** * Determines whether an entity instance is managed in this EntityManager. * * @param object $entity * @return boolean TRUE if this EntityManager currently manages the given entity, FALSE otherwise. */ public function contains($entity) { return $this->_unitOfWork->isScheduledForInsert($entity) || $this->_unitOfWork->isInIdentityMap($entity) && ! $this->_unitOfWork->isScheduledForDelete($entity); } /** * Factory method to create EntityManager instances. * * @param mixed $conn An array with the connection parameters or an existing * Connection instance. * @param Configuration $config The Configuration instance to use. * @param EventManager $eventManager The EventManager instance to use. * @return EntityManager The created EntityManager. */ public static function create($conn, Configuration $config, EventManager $eventManager = null); }
May 17, 2010
by Giorgio Sironi
· 9,948 Views
article thumbnail
Practical PHP Patterns: Active Record
The Active Record pattern effectively prescribes to wrap a row of a database table in a domain object with a 1:1 relationship, managing its state and adding business logic in the wrapping class code. An Active Record implementation is in fact a classical C structure aka Record aka associative array of data, with the addition of utility methods that encapsulate behavior that acts on these data. The most useful method is usually the save() one, which updates the database reflecting in the row the current state of the record. Thus, the Active Record transparently works with SQL queries and provides an higher-level Api. Although Active Record is similar in implementation to the Row Data Gateway pattern, it is distinguished from it in the fact that it defines methods with domain-specific logic. The consequence of the presence of domain-specific logic is that generic implementations of Active Record provided by libraries must be customized to met the need of the object model. Typically this customization is done with a thin subclassing, which at least renames the library class with a domain name (like User or Post) and may specify metadata on the database table where the Active Records state is kept, if they are not inferred. The issue of subclassing Subclassing allows the developer to create new methods and properties to represent business logic, and to build a richer and more specific interface than the one constituted by simple Row Data Gateway objects. Despite these advantage, this interface is not much segregated, as subclassing exposes all the public methods of the base library class, on which the developers of the domain model have no control. Subclassing also ties the domain layer to the infrastructure one, being it a library or a framework or every kind of data persistence layer (examples of PHP ORMs that use Active Record are the Zend_Db component, Doctrine 1 and Propel). Domain objects cannot be created or even their class source code loaded without having the library code available. This is an issue when reusing the model in a different environment, and even in test suites. If the library is powerful enough, it may provide adapters for different databases so that a lightweight database instance can be created in the testing environment. Another caveat of Active Record is the fundamental assumption that a domain entity is always a row of a table of a relational database; this constraint is forced even when it is not appropriate, and the database and object model must match. In fact, part of the database (like foreign keys) often scatter into the domain model, as an Active Record with an external one-to-one relationship will usually store not only the related object but also its foreign key. Another example of mirroring of the relational model into the object graph is for the management of M-to-N associating entities, often forced to become real entities even when they do not make sense (the famous UserGroup classes that tie together User and Group rows). Diffusion Thus, the Active Record pattern puts at risk the freedom of implementing a powerful Domain Model, where the object graph is a mix of state-carrying and behavior-carrying objects, like Strategies and Specifications. It is however, a radical simplification in implementation of domain models where CRUD functions are all the rage, and there is no gain in implementing objects that do not simply map to a relational database. Note that in the case of PHP, most of the custom web applications developed in this language are deeply influenced by the back end, assumed as a relational database or even as MySQL. But while the technologies for user-to-application and application-to.application interaction on the web continue to grow, the situation will continue to evolve and if PHP wants to keep up with the pace of other dynamic languages, Java and .NET, it needs to finally decouple from the relational database as the unique model of data. By the way, current implementation of persistence frameworks are transitioning towards a Data Mapper approach (not only in PHP but also in Ruby, while Java has done that years ago with Hibernate and JPA), which is less invasive on the Domain Model source code, and does not introduce an hard dependency from the domain layer to infrastructure components. Examples This sample implementation of the Active Record pattern is taken from the Doctrine 1.2 ORM. The base class in this framework id Doctrine_Record (together with Doctrine_Record_Abstract), while a base class with the schema metadata is regenerated from a model, and it is subclassed for orthogonality of customization and synchronization by the developer. The base Active Record class looks like this: /** * Implements also __get() and __set(), not shown along with many other dozen methods. */ abstract class Doctrine_Record extends Doctrine_Record_Abstract implements Countable, IteratorAggregate, Serializable { /** * Empty template method to provide concrete Record classes with the possibility * to hook into the saving procedure. */ public function preSave($event) { } /** * Empty template method to provide concrete Record classes with the possibility * to hook into the saving procedure. */ public function postSave($event) { } /** * applies the changes made to this object into database * this method is smart enough to know if any changes are made * and whether to use INSERT or UPDATE statement * * this method also saves the related components * * @param Doctrine_Connection $conn optional connection parameter * @return void */ public function save(Doctrine_Connection $conn = null) { if ($conn === null) { $conn = $this->_table->getConnection(); } $conn->unitOfWork->saveGraph($this); } /** * returns a string representation of this object */ public function __toString() { return (string) $this->_oid; } } If we have an Article entity, it will be represented via subclassing of the generic Active Record. In Doctrine 1, a subclass can be generated by writing a compact Yaml model, or even reverse engineered from an existing database: /** * BaseOtk_Content_Article * * This class has been auto-generated by the Doctrine ORM Framework * * @property integer $id * @property integer $section_id * @property integer $author_id * @property integer $image_id * @property string $title * @property string $description * @property string $text * @property integer $visits * @property boolean $draft * @property boolean $closed * @property Otk_Content_Section $section * @property Otk_User $author * @property Otk_File $image * @property Doctrine_Collection $sections * @property Doctrine_Collection $Otk_Content_Tag * @property Doctrine_Collection $comments * */ abstract class BaseOtk_Content_Article extends Otk_Model_Record { public function setTableDefinition() { $this->setTableName('oss_content_articles'); $this->hasColumn('id', 'integer', 3, array('type' => 'integer', 'primary' => true, 'autoincrement' => true, 'length' => '3')); $this->hasColumn('section_id', 'integer', 2, array('type' => 'integer', 'notnull' => true, 'length' => '2')); $this->hasColumn('author_id', 'integer', 3, array('type' => 'integer', 'length' => '3')); $this->hasColumn('image_id', 'integer', 3, array('type' => 'integer', 'length' => '3')); $this->hasColumn('title', 'string', 255, array('type' => 'string', 'length' => '255')); $this->hasColumn('description', 'string', 1000, array('type' => 'string', 'length' => '1000')); $this->hasColumn('text', 'string', null, array('type' => 'string')); $this->hasColumn('visits', 'integer', 3, array('type' => 'integer', 'length' => '3')); $this->hasColumn('draft', 'boolean', null, array('type' => 'boolean', 'notnull' => true, 'default' => false)); $this->hasColumn('closed', 'boolean', null, array('type' => 'boolean')); } public function setUp() { $this->hasOne('Otk_Content_Section as section', array('local' => 'section_id', 'foreign' => 'id')); $this->hasOne('Otk_User as author', array('local' => 'author_id', 'foreign' => 'id')); $this->hasOne('Otk_File as image', array('local' => 'image_id', 'foreign' => 'id')); $this->hasMany('Otk_Content_Section as sections', array('refClass' => 'Otk_Content_Tag', 'local' => 'article_id', 'foreign' => 'section_id')); $this->hasMany('Otk_Content_Tag', array('local' => 'id', 'foreign' => 'article_id')); $this->hasMany('Otk_Content_Comment as comments', array('local' => 'id', 'foreign' => 'article_id')); $timestampable0 = new Doctrine_Template_Timestampable(); $sluggable0 = new Doctrine_Template_Sluggable(array('fields' => array(0 => 'title'), 'canUpdate' => true, 'unique' => true)); $this->actAs($timestampable0); $this->actAs($sluggable0); } } To support further regenerations of the subclass as the schema evolves, another subclassing step is necessary. This class will never be touched by the regeneration process and it is the one referred in client code. /** * This class defines the domain logic via addition of methods. */ class Otk_Content_Article extends BaseOtk_Content_Article { public function getTags() { $tags = array(); foreach ($this->sections as $section) { $tags[$section->slug] = $section->name; } return $tags; } }
May 12, 2010
by Giorgio Sironi
· 8,887 Views
article thumbnail
Practical PHP Patterns: Table Data Gateway
The Table Data Gateway pattern is the object-oriented equivalent of a relational table. In fact, this pattern's intent is to encapsulate the full interaction with a database table, holding all the logic specific to this particular implementation of the back end. In the majority of cases, a Table Data Gateway deals with a relational model, having a 1:1 relationship with the main tables of the database. Minor tables may not need a specific class, or can be managed via Table Data Gateways of tables that link them with foreign keys (for example entities introduced to store M:N relationships are usually not first-class citizens.) In the relational implementation, the Table Data Gateway handles all SQL queries, presenting a domain-specific interface when a class is coded for a specific table, or a generic one when a generic implementation is reused throughout different applications. The difference between the two APIs may be something like findBy($field, $value) (generic) versus findByPrice($price) (domain-specific). Note that in PHP magic methods are often used to implement domain-specific interfaces without code generation: a __call() implementation can catch the various findBy*() method and throw exceptions if the methods is not applicable. Related patterns Although, the concept of table is already correlated with a relational model (and it does not hold when the back end is an object-oriented database or one of the key-value stores so trendy today), this pattern is named Gateway because it is a specialization of the Gateway category of pattern, which decouple an object graph (or any in-memory structures) from external infrastructure like databases, web services, filesystems and so on. In fact, there is an alternate name for this pattern: Data Access Object (or DAO for friends). Although if I was pedantic I would highlight the differences between the implementations of DAOs and Table Data Gateway, their intent is really the same and there are differences in an individual pattern implementations that are greater than the ones between the different patterns. There's no clear demarcation line between the two. Another related pattern is the Table Module one. Table Data Gateway does not work against, but with a Table Module, providing a separation of concerns: the first object takes the rows out of the database, while the second performs in-memory operation on them (generally by composing the Table Data Gateway or its results). The in-memory operations of a Table Module are easier to test, but the SQL-based operations of the Table Data Gateway are pushed on the database side: there is a trade-off between the logic should be kept in each class. When used in isolation, the Table Data Gateway is also a Factory for also for Row Data Gateways or Active Records, both again implemented with generic or domain-specific interfaces. Many frameworks and first-generation PHP ORMs based on Active Record are also based on Table Data Gateway to provide a collection-level access to the objects stored as rows. In the context of Active Record, the only alternative to a Table Data Gateway to handle operations like find() is to place static methods on the Active Record class, with all the testability and dishonest API issues that ensue. Both Zend Framework and Doctrine 1.x represent tables as first-class objects. Examples Zend Framework's component Zend_Db, which is explored in the sample code, provides always generic implementations of Zend_Db_Table, and the possibility of optional subclassing (to add domain-specific methods). It is not recommend to expose the API of Table Data Gateway in front-end code, but it's a simple solution when the business logic does not warrant a full-featured Domain Model. Even when working with a Domain Model, and before the introduction of generic Data Mappers for PHP, the Table Data Gateway can be used in a composition solution (wrapped) to craft a simple API for a domain-specific Data Mapper, resulting in decoupling from the database. As I've written earlier, the sample code is taken from the Zend_Db_Table class of Zend Framework (actually from its parent abstract class, Zend_Db_Table_Abstract). I've enriched the docblock comments and left out all the methods not part of the main API (most of getters and setters for configuration and protecte|private members). $value) { switch ($key) { case self::ADAPTER: $this->_setAdapter($value); break; case self::DEFINITION: $this->setDefinition($value); break; case self::DEFINITION_CONFIG_NAME: $this->setDefinitionConfigName($value); break; case self::SCHEMA: $this->_schema = (string) $value; break; case self::NAME: $this->_name = (string) $value; break; case self::PRIMARY: $this->_primary = (array) $value; break; case self::ROW_CLASS: $this->setRowClass($value); break; case self::ROWSET_CLASS: $this->setRowsetClass($value); break; case self::REFERENCE_MAP: $this->setReferences($value); break; case self::DEPENDENT_TABLES: $this->setDependentTables($value); break; case self::METADATA_CACHE: $this->_setMetadataCache($value); break; case self::METADATA_CACHE_IN_CLASS: $this->setMetadataCacheInClass($value); break; case self::SEQUENCE: $this->_setSequence($value); break; default: // ignore unrecognized configuration directive break; } } return $this; } /** * Inserts a new row. * The data structure is as generic as possible. The list of columns is * known by configuration. * $this->_db is a light abstraction over PDO, which already encapsulates * most of the SQL. Database abstraction is not a banal task and segregating * the functionalities in different classes is very helpful. * * @param array $data Column-value pairs. * @return mixed The primary key of the row inserted. */ public function insert(array $data) { $this->_setupPrimaryKey(); /** * Zend_Db_Table assumes that if you have a compound primary key * and one of the columns in the key uses a sequence, * it's the _first_ column in the compound key. */ $primary = (array) $this->_primary; $pkIdentity = $primary[(int)$this->_identity]; /** * If this table uses a database sequence object and the data does not * specify a value, then get the next ID from the sequence and add it * to the row. We assume that only the first column in a compound * primary key takes a value from a sequence. */ if (is_string($this->_sequence) && !isset($data[$pkIdentity])) { $data[$pkIdentity] = $this->_db->nextSequenceId($this->_sequence); } /** * If the primary key can be generated automatically, and no value was * specified in the user-supplied data, then omit it from the tuple. */ if (array_key_exists($pkIdentity, $data) && $data[$pkIdentity] === null) { unset($data[$pkIdentity]); } /** * INSERT the new row. */ $tableSpec = ($this->_schema ? $this->_schema . '.' : '') . $this->_name; $this->_db->insert($tableSpec, $data); /** * Fetch the most recent ID generated by an auto-increment * or IDENTITY column, unless the user has specified a value, * overriding the auto-increment mechanism. */ if ($this->_sequence === true && !isset($data[$pkIdentity])) { $data[$pkIdentity] = $this->_db->lastInsertId(); } /** * Return the primary key value if the PK is a single column, * else return an associative array of the PK column/value pairs. */ $pkData = array_intersect_key($data, array_flip($primary)); if (count($primary) == 1) { reset($pkData); return current($pkData); } return $pkData; } /** * Updates existing rows. * Again we see generic data structures, not tied to PDO * or to particular adapters. * * @param array $data Column-value pairs. * @param array|string $where An SQL WHERE clause, or an array of SQL WHERE clauses. * @return int The number of rows updated. */ public function update(array $data, $where) { $tableSpec = ($this->_schema ? $this->_schema . '.' : '') . $this->_name; return $this->_db->update($tableSpec, $data, $where); } /** * Deletes existing rows. * * @param array|string $where SQL WHERE clause(s). * @return int The number of rows deleted. */ public function delete($where) { $tableSpec = ($this->_schema ? $this->_schema . '.' : '') . $this->_name; return $this->_db->delete($tableSpec, $where); } /** * Fetches rows by primary key. The argument specifies one or more primary * key value(s). To find multiple rows by primary key, the argument must * be an array. * * This method accepts a variable number of arguments. If the table has a * multi-column primary key, the number of arguments must be the same as * the number of columns in the primary key. To find multiple rows in a * table with a multi-column primary key, each argument must be an array * with the same number of elements. * * The find() method always returns a Rowset object, even if only one row * was found. * * @param mixed $key The value(s) of the primary keys. * @return Zend_Db_Table_Rowset_Abstract Row(s) matching the criteria. * @throws Zend_Db_Table_Exception */ public function find() { $this->_setupPrimaryKey(); $args = func_get_args(); $keyNames = array_values((array) $this->_primary); if (count($args) < count($keyNames)) { require_once 'Zend/Db/Table/Exception.php'; throw new Zend_Db_Table_Exception("Too few columns for the primary key"); } if (count($args) > count($keyNames)) { require_once 'Zend/Db/Table/Exception.php'; throw new Zend_Db_Table_Exception("Too many columns for the primary key"); } $whereList = array(); $numberTerms = 0; foreach ($args as $keyPosition => $keyValues) { $keyValuesCount = count($keyValues); // Coerce the values to an array. // Don't simply typecast to array, because the values // might be Zend_Db_Expr objects. if (!is_array($keyValues)) { $keyValues = array($keyValues); } if ($numberTerms == 0) { $numberTerms = $keyValuesCount; } else if ($keyValuesCount != $numberTerms) { require_once 'Zend/Db/Table/Exception.php'; throw new Zend_Db_Table_Exception("Missing value(s) for the primary key"); } $keyValues = array_values($keyValues); for ($i = 0; $i < $keyValuesCount; ++$i) { if (!isset($whereList[$i])) { $whereList[$i] = array(); } $whereList[$i][$keyPosition] = $keyValues[$i]; } } $whereClause = null; if (count($whereList)) { $whereOrTerms = array(); $tableName = $this->_db->quoteTableAs($this->_name, null, true); foreach ($whereList as $keyValueSets) { $whereAndTerms = array(); foreach ($keyValueSets as $keyPosition => $keyValue) { $type = $this->_metadata[$keyNames[$keyPosition]]['DATA_TYPE']; $columnName = $this->_db->quoteIdentifier($keyNames[$keyPosition], true); $whereAndTerms[] = $this->_db->quoteInto( $tableName . '.' . $columnName . ' = ?', $keyValue, $type); } $whereOrTerms[] = '(' . implode(' AND ', $whereAndTerms) . ')'; } $whereClause = '(' . implode(' OR ', $whereOrTerms) . ')'; } // issue ZF-5775 (empty where clause should return empty rowset) if ($whereClause == null) { $rowsetClass = $this->getRowsetClass(); if (!class_exists($rowsetClass)) { require_once 'Zend/Loader.php'; Zend_Loader::loadClass($rowsetClass); } return new $rowsetClass(array('table' => $this, 'rowClass' => $this->getRowClass(), 'stored' => true)); } return $this->fetchAll($whereClause); } /** * Fetches a new blank row (not from the database). * Thanks to the metadata, a new Row Data Gateway can be created. This * if a Factory Method. The dynamic nature of PHP makes configuring the * subclass for the Row Data Gateway as simple as defining a string. * * @param array $data OPTIONAL data to populate in the new row. * @param string $defaultSource OPTIONAL flag to force default values into new row * @return Zend_Db_Table_Row_Abstract */ public function createRow(array $data = array(), $defaultSource = null) { $cols = $this->_getCols(); $defaults = array_combine($cols, array_fill(0, count($cols), null)); // nothing provided at call-time, take the class value if ($defaultSource == null) { $defaultSource = $this->_defaultSource; } if (!in_array($defaultSource, array(self::DEFAULT_CLASS, self::DEFAULT_DB, self::DEFAULT_NONE))) { $defaultSource = self::DEFAULT_NONE; } if ($defaultSource == self::DEFAULT_DB) { foreach ($this->_metadata as $metadataName => $metadata) { if (($metadata['DEFAULT'] != null) && ($metadata['NULLABLE'] !== true || ($metadata['NULLABLE'] === true && isset($this->_defaultValues[$metadataName]) && $this->_defaultValues[$metadataName] === true)) && (!(isset($this->_defaultValues[$metadataName]) && $this->_defaultValues[$metadataName] === false))) { $defaults[$metadataName] = $metadata['DEFAULT']; } } } elseif ($defaultSource == self::DEFAULT_CLASS && $this->_defaultValues) { foreach ($this->_defaultValues as $defaultName => $defaultValue) { if (array_key_exists($defaultName, $defaults)) { $defaults[$defaultName] = $defaultValue; } } } $config = array( 'table' => $this, 'data' => $defaults, 'readOnly' => false, 'stored' => false ); $rowClass = $this->getRowClass(); if (!class_exists($rowClass)) { require_once 'Zend/Loader.php'; Zend_Loader::loadClass($rowClass); } $row = new $rowClass($config); $row->setFromArray($data); return $row; } }
May 5, 2010
by Giorgio Sironi
· 7,970 Views
article thumbnail
Practical PHP Patterns: Domain Model
The architectural pattern I'd like to talk about in this article is the overly famous Domain Model. An application's Domain Model is simply defined as an object graph created from domain-specific classes; when present, a Domain Model is the core of the application, where all the business logic resides. This object graph is employed by upper layers of an application which present it to the user. The metaphor for this methodology In software development, the term domain (or business domain) is an umbrella for the area the application is built in, and that it will serve. The new domains we encounter as we move to new projects are one of the most interesting points of software development, where we are constantly embracing new fields and gaining knowledge. Given a domain such as a particular industry (chemical, electronics) or business (air travelling, e-commerce), the point of connection of an application with these activities is its model. A model is an abstract representation of the reality of the domain, which captures its interesting and relevant aspects. The practice of modelling is not a specific trait of software development (in particular model-driven development), but it is a more general scientifical process. For example, everyone who works in the field of information technology knows the voltage/current relationships for simple components such as resistors and capacitors (Ohm's law and current derivative of the voltage). The specific domain here is electronics, and this model is named lumped component model, essentially because it lets a designer connect isolated one-port (two terminals) components to build his desired circuit. This model is a simplification of much more complex models of reality: the Maxwell equations and the propagation of electromagnetic fields; the lumped component model is valid whenever the frequency of the voltage/current signals in the circuit is low, so that the wavelengths of these signals are far greater than the dimensions of the circuit (if that goes over your head, don't worry, it's the field of electrical engineers.) When designers consider larger circuits, such as a transmission line, this model ceases to give correct results and more general ones must be employed. The domain is almost the same, but the model serves a different purpose and has to be necessarily different from the one used in small scale circuits. This complex example is here only to show that given a domain, there is no single model for it, but there are many possible ones which may adapt more or less reliably to the goals of an application. Starting from a modelling phase and deep understanding of the domain are key points of Domain-Driven Design, one of the ascending methodologies for developing complex enterprise software. Software models While there are standard mathematical models for many domains in the scientific world, software developers usually build a tailored one in every different application, performing an analysis of the domain (or at least they should.) The result of the modelling can comprehend document or diagrams, but the most powerful artifact is an executable model. Object-oriented programming is a almost perfect paradigm when it comes to modelling the real world, and lets the developers construct a Domain Model in the form of a set of classes. In a correct implementation of a Domain Model, these classes should be behaviorally complete: they must encapsulate their data as much as possible and expose a set of methods, while avoiding their usage as dumb data containers. The bread and butter of a Domain Model are the classical example of User, Post, Forum, Group, PrivateMessage classes, which are usually in a one to one relationship with database tables. But the Domain Model is not limited to these Entity classes: it also "comprehends" ValueObjects (modelization of domain-specific data types) and various kinds of Services. Every class that encapsulates business logic is welcome, so that this logic is not duplicated in upper layers, which are the primary clients of the Domain Model. Dependencies and purity Another key trait of the classes included in the Domain Model is the absence of external dependencies, like a library to store in the data contained in the objects in a database. The code artifact in a Domain Model are either interfaces, or Plain Old Php Objects (classes which do not extend any external abstract superclass.) Active Record approaches should be avoided because not only a relational database is an infrastructure detail not included in the Domain Model itself, but the very concept of persistence is abstracted away. As far as the clients of the Domain Model are concerned, the state and behavior of the application are represented by an in-memory object graph, whose methods expose functionalities and which client code can play with. There are no dependencies from a Domain Model towards infrastructure classes, because these dependencies must be inverted. The resulting system is an instance of the hexagonal architecture, where the Domain Model defines ports (interfaces) and infrastructure can be chosen to provide adapters for these ports (implementations in the form of classes extraneous to the model). The implementaton of non-invasive persistence is the subject of the Data Mapper pattern, which will be treated later in this series, but every kind of service implementation which communicate with the outside of the core object graph (databases, network, filesystem) is only defined as a contract in the Domain Model. Persistence is almost always dealt with a library in other object-oriented languages, now also in PHP with a non-invasive ORM such as Doctrine 2. Nothing obstructs the developers from implementing a specific Data Mapper by hand, but it's a very repetitive and prone to errors task. While in origin simpler, invasive patterns such as Active Record could be used in a Domain Model, nowadays with Data Mapper availables it is considered an hack. Sample Returning to the subject of the Domain Model as the core of an application, the diffused opinion is that the more complex the business logic and the data involved, the more the application benefits from a rich Domain Model. Thus, this pattern should not be used in small-sized applications where there is no much more logic than CRUD screens for data containers, which unfortunately were a target for PHP in the last ten years. I hope PHP keeps evolving to finally break in the enterprise segment, where this pattern is most valuable. Due to the size and scope of this article, I am forced to keep the sample code short. Forgive me if you think that you can achieve the same functionality with fewer lines of code, but this pattern is about architecture and should highlight the separation of concerns between classes more than the KISS principle. Another problem with code samples in modelling is that you have to actually know the domain well to follow the discussion. For this reason I chose a webmail system for this example. _sender; } /** * Do we need setters and getters? Every field should be * analyzed. If we can keep it private and inaccessible, * it's usually better. */ public function setSender($sender) { $this->_sender = $sender; } /** * @return string */ public function getRecipient() { return $this->_recipient; } public function setRecipient($recipient) { $this->_recipient = $recipient; } /** * @return string */ public function getSubject() { return $this->_subject; } public function setSubject($subject) { $this->_subject = $subject; } /** * @return string */ public function getText() { return $this->_text; } public function setText($text) { $this->_text = $text; } public function __toString() { return $this->_subject . ' > ' . substr($this->_text, 0, 20) . '...'; } public function reply() { $reply = new Email(); $reply->setRecipient($this->_sender); $reply->setSender($this->_recipient); $reply->setSubject('Re: ' . $this->_subject); $reply->setText($this->_sender . " wrote:\n" . $this->_text); return $reply; } } /** * Interface for a service. This is part of the Domain Model, * implementations will be plugged in depending on the environment. */ interface EmailRepository { /** * @return array * @TypeOf(Email) */ public function getEmailsFor($recipient); } // client code $mail = new Email(); $mail->setSender("[email protected]"); $mail->setRecipient("[email protected]"); $mail->setSubject('Hello'); $mail->setText('This is a test of an Email object, which is part of our Domain Model.'); echo $mail, "\n"; $reply = $mail->reply(); echo $reply, "\n";
April 25, 2010
by Giorgio Sironi
· 7,953 Views
article thumbnail
Running Hazelcast on a 100 Node Amazon EC2 Cluster
The purpose of this article is to give you the details of our 100 node cluster demo. This demo is recorded and you can watch the 5 minute screencast Hazelcast is an open source clustering and highly scalable data distribution platform for Java. JVMs that are running Hazelcast will dynamically cluster and allow you to easily share and partition your application data across the cluster. Hazelcast is a peer-to-peer solution (there is no master node, every node is a peer) so there is no single point of failure. Communication among cluster members is always TCP/IP with Java NIO beauty. The default configuration comes with 1 backup so if a node fails, no data will be lost (you can specify the backup count). It is as simple as using java.util.{Map, Queue, Set, List}. Just add the hazelcast.jar into your classpath and start coding. When you download the Hazelcast, you will find a test.sh under bin directory. The test.sh runs an application which randomly makes 40% get, 40% put and 20% remove on a distributed map. In this demo the same test application will be used to see how it performs on 100 node cluster. Amazon EC2 and S3 An easy to use and scalable cloud environment was needed for demo so we decided to use Amazon EC2 for server instances (nodes) and S3 service to store demo application zip and configuration files. With its newly announced Java SDK, it is very simple to start/stop server instances and upload files to S3 programatically. Hazelcast AMI & Launcher The challenge here is that we are running an application on 100 nodes and dealing with each and every server in the cluster is a huge task. We don't want to ssh into every server and manually start the application. This part is automated by creating a special server image (AMI). The AMI contains Java Runtime and a launcher application we developed, which will download the demo application from Amazon S3, unzip it, and run the hazelcast/bin/test.sh in it. The Launcher is actually so generic that it can run any application; it doesn't care/know what test.sh contains. Deployer Deployment of the demo application is also automated so that we don't need to login into AWS Management Console and manually start instances. Deployer instantiates any number of Amazon EC2 servers with any AMI and also uploads the demo application zip file to S3. So the idea here is that, the Deployer will store the application into S3 and launch 100 EC2 instances with our image. The Launcher on each instance will download the application from S3 and run it. Demo Details. The smallest EC2 instances (m1.small) are used to run the demo. These are the virtual instances with CPU about 1.0 GHz. Also keep in mind that EC2 platform suffers from considerable amount of network latency. That's why we increased the thread count to 250 in our application. The following steps performed during the demo Download hazelcast-1.8.3.zip from www.hazelcast.com. Unzip the file and move the monitoring war file into tomcat6/webapps directory. Edit the test.sh under the bin directory: Add -Xmx1G -Xms1G Add -Dhazelcast.initial.wait.seconds=100 to make the cluster evenly partition on start so that migration can be avoided for better performance. Add t250 as an argument to the application to set thread count to 250. Remember the latency issue. Run the Deployer from IDE. Check from EC2 Management Console if 100 servers started. Start tomcat. Copy the public DNS name of one of the servers to connect to from monitoring tool. Go to http://localhost:8080/hazelcast-monitor-1.8.3/ (Hazelcast Monitoring Tool). Paste the address and connect to the cluster. Enjoy! Results You should always look for programatic ways of launching applications on the cloud. With these tools we were able to deploy and run the demo application on 100 servers in minutes. The entire Hazelcast cluster was making over 400,000 operations per second on the smallest EC2 instances. In our next demo we will experiment Hazelcast on large data set and even bigger cluster. Watch the screencast
April 16, 2010
by Fuad Malikov
· 62,695 Views · 1 Like
  • Previous
  • ...
  • 520
  • 521
  • 522
  • 523
  • 524
  • 525
  • 526
  • 527
  • 528
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×