Data Engineering Resources

The Latest Data Engineering Topics

Naming Conventions for Parameterized Types

Parameterized types - the <> expressions that can be used in Java as of JDK 5 are not just for collections. I find myself frequently using them in APIs I design. They really do let you write things which are more generic in the non-Java sense of the word - and the result is more reusable code, which means less code overall, which means fewer bugs and things to test. The verbosity, and some of the weirdness of type-erasure are less than ideal, but used right, the benefits are worth the complexity. The standard (and somewhere recommended) naming convention for parameterized types is to use a single-letter name. That works fine in signatures that have only one such type. But in practice, single-letter names make code less self-describing, and if you're defining a class with more than one parameterized type, it can be confusing and hard to read. People other than me will have to call, understand and maintain my code - the more self-describing I can make it, the better. So I am looking for a naming convention that makes it obvious that something is a parameterized type, but allows for descriptive names. I am wondering if anybody else has run into this problem, and if there is any emerging consensus on naming generics. Do you work on a project that uses generics a lot? If so, what do you do? Here's an example. At the moment, I'm writing a generic (in both senses) class which simply limits the number of threads which can access some resource. It's basically a wrapper around a Semaphore which uses a Runnable-like object to ensure that the Semaphore is accessed correctly, and does some non-blocking statistic gathering about thread contention. 
So to access the scarce resource, you pass in a ResourceAccessor: public interface ResourceAccessor <ProtectedResource, Argument, Result> { public Result run (ProtectedResource resource, Argument argument); } The problem is that, when somebody looks at this interface, they will instantly get the idea that there are real classes they need to go find, called ProtectedResource, Argument and Result - and of course, no such classes exist - these are just names for generic types. The standard naming convention is worse: public interface ResourceAccessor <T, R, S> { public S run (T resource, R argument); } Here, nobody could possibly figure out what on earth this interface is for without extensive documentation - this is a really horrible idea. So I've concluded that the standard recommendations for generic type names are simply wrong for any non-trivial usage (i.e. Collection<E> is fine, since there is one type and collections are well understood). You simply can't do this on a non-collection code structure you have invented, or people will just be confused and not use it. The best suggestion I've heard thus far is using $ as a prefix: public interface ResourceAccessor <$ProtectedResource, $Argument, $Result> { public $Result run ($ProtectedResource resource, $Argument argument); } I don't find this pretty, but I don't have any better ideas, and at least it makes it crystal-clear that there is something different about these names. Any thoughts? What do you do in this situation?
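To make the discussion concrete, here is a minimal sketch of how the $-prefixed interface might be used by a Semaphore-backed wrapper. Only the ResourceAccessor shape comes from the text above; the LimitedResource class, its access and contendedCount methods, and the way contention is counted are assumptions made for illustration, not the author's actual implementation.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

// The interface from the article, using the $-prefix convention for type parameters.
interface ResourceAccessor<$ProtectedResource, $Argument, $Result> {
    $Result run($ProtectedResource resource, $Argument argument);
}

// Hypothetical wrapper: limits how many threads may touch the resource at once.
final class LimitedResource<$ProtectedResource> {
    private final $ProtectedResource resource;
    private final Semaphore permits;
    // Counts calls that could not get a permit immediately (assumed statistic).
    private final AtomicLong contended = new AtomicLong();

    LimitedResource($ProtectedResource resource, int maxThreads) {
        this.resource = resource;
        this.permits = new Semaphore(maxThreads);
    }

    <$Argument, $Result> $Result access(
            ResourceAccessor<$ProtectedResource, $Argument, $Result> accessor,
            $Argument argument) {
        // Non-blocking attempt first, so contention can be recorded without waiting.
        if (!permits.tryAcquire()) {
            contended.incrementAndGet();
            permits.acquireUninterruptibly();
        }
        try {
            return accessor.run(resource, argument);
        } finally {
            // Always release, even if the accessor throws.
            permits.release();
        }
    }

    long contendedCount() {
        return contended.get();
    }
}
```

A caller could then write something like limited.access((sb, s) -> sb.append(s).length(), "abc") against a LimitedResource<StringBuilder>; the $ names only ever appear in the declarations, never at the call site.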

September 20, 2010

by Tim Boudreau

· 17,840 Views

Commons Lang 3 -- Improved and Powerful StringEscapeUtils

In the first and second parts of this series I talked about some of the new features, like enum and concurrency support, that have been added in commons-lang 3. In this article, I am going to talk about a new package, 'org.apache.commons.lang3.text.translate', which has been added in commons-lang 3. This package was added to fix problems in the design and implementation of the StringEscapeUtils class as it existed in versions prior to 3.0. To make this clearer, let's first talk about the purpose of the StringEscapeUtils class and the problems it had prior to version 3. Purpose of StringEscapeUtils StringEscapeUtils is a utility class which escapes and unescapes Strings for Java, JavaScript, HTML, XML, and SQL. For example, @Test public void test_StringEscapeUtils() { assertEquals("\\\\\\n\\t\\r", StringEscapeUtils.escapeJava("\\\n\t\r")); // escapes the Java String assertEquals("\\\n\t\r",StringEscapeUtils.unescapeJava("\\\\\\n\\t\\r")); //unescapes the Java String assertEquals("I didn\\'t say \\\"you to run!\\\"",StringEscapeUtils.escapeJavaScript("I didn't say \"you to run!\""));//escapes the Javascript assertEquals("&lt;xml&gt;", StringEscapeUtils.escapeXml("<xml>"));//escapes the xml } Problems with StringEscapeUtils There were a lot of problems in the StringEscapeUtils implementation prior to version 3. Some of these were: The implementation was not extensible. Take escapeJava as an example: suppose we want the escapeJava method to start escaping single quotes. To add such support we would have to change the existing class code and add another if condition which, when satisfied, escapes single quotes. So the API was breaking the open-closed principle, i.e. a class should be open for extension and closed for modification. It was not symmetric, i.e. original should be equal to unescape(escape(original)), but this was not the case. StringEscapeUtils.escapeHtml() escapes multibyte characters like Chinese, Japanese etc. 
Issue 339 @Test public void testEscapeHiragana() { // Some random Japanese unicode characters String original = "\u304B\u304C\u3068"; String escaped = StringEscapeUtils.escapeHtml(original); assertEquals(original, escaped); } StringEscapeUtils.escapeHtml incorrectly converts Unicode characters above U+FFFF into 2 characters. Issue 480 @Test public void testEscapeHtmlHighUnicode() throws java.io.UnsupportedEncodingException { byte[] data = new byte[] { (byte) 0xF0, (byte) 0x9D, (byte) 0x8D,(byte) 0xA2 }; String original = new String(data, "UTF8"); String escaped = StringEscapeUtils.escapeHtml(original); assertEquals(original, escaped); } StringEscapeUtils.escapeXml() escapes characters > 0x7f. Issue 66 @Test public void shouldNotEscapeValuesGreaterThan0x7f() { assertEquals("XML should not escape >0x7f values", "\u00A1",StringEscapeUtils.escapeXml("\u00A1")); } Solution -- Rewritten StringEscapeUtils In version 3.0, StringEscapeUtils has been completely rewritten to fix the bugs associated with this class and to provide a way for the user to customize the behavior of its methods. All the logic previously present in StringEscapeUtils has been moved to the classes in the package 'org.apache.commons.lang3.text.translate'. Take the escapeJava function in StringEscapeUtils as an example: it no longer contains any escaping logic itself, it just calls the translate method on a CharSequenceTranslator reference. What they did can be best understood by looking at the code below public static final CharSequenceTranslator ESCAPE_JAVA = new AggregateTranslator(new LookupTranslator( new String[][] { {"\"", "\\\""}, {"\\", "\\\\"}, }),new LookupTranslator(EntityArrays.JAVA_CTRL_CHARS_ESCAPE()),UnicodeEscaper.outsideOf(32, 0x7f)); and in the escapeJava method public static final String escapeJava(String input) { return ESCAPE_JAVA.translate(input); } A constant of type CharSequenceTranslator was assigned an AggregateTranslator object. 
AggregateTranslator takes an array of translators and tries each of them in turn. The LookupTranslator takes an array of string pairs and replaces each occurrence of the string at index 0 of a pair with the string at index 1. UnicodeEscaper translates characters outside the given range to Unicode escapes. As you can see, you can very easily write your own escape methods. For example, if you want to add support for escaping & and <, you can do it like this public static final CharSequenceTranslator ESCAPE_JAVA = new LookupTranslator( new String[][] { {"\"", "\\\""}, {"\\", "\\\\"}, }).with(new LookupTranslator( new String[][]{ {"&", "&amp;"}, {"<", "&lt;"} })).with( new LookupTranslator(EntityArrays.JAVA_CTRL_CHARS_ESCAPE()) ).with( UnicodeEscaper.outsideOf(32, 0x7f) ); StringEscapeUtils.escapeSql has been removed from the API because it misled developers into not using PreparedStatement. The method was not of much use anyway, as it only escaped single quotes.
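The composition idea described above can be illustrated without the library. The following is a deliberately simplified, self-contained sketch; SketchTranslator, SketchLookup and SketchAggregate are hypothetical stand-ins invented for this illustration and are not the commons-lang 3 classes, whose real implementations write to a Writer and handle more cases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for a character-sequence translator.
abstract class SketchTranslator {
    // Appends a replacement for the text starting at 'index' and returns the
    // number of chars consumed, or 0 if this translator does not apply there.
    abstract int translate(CharSequence input, int index, StringBuilder out);

    // Walks the input, copying any character that no translator claims.
    final String translate(CharSequence input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            int consumed = translate(input, i, out);
            if (consumed == 0) {
                out.append(input.charAt(i));
                i++;
            } else {
                i += consumed;
            }
        }
        return out.toString();
    }

    // Chains this translator with others; earlier translators win.
    final SketchTranslator with(SketchTranslator... others) {
        List<SketchTranslator> all = new ArrayList<>();
        all.add(this);
        all.addAll(Arrays.asList(others));
        return new SketchAggregate(all);
    }
}

// Replaces pair[0] with pair[1] for each configured pair.
final class SketchLookup extends SketchTranslator {
    private final String[][] pairs;

    SketchLookup(String[][] pairs) {
        this.pairs = pairs;
    }

    @Override
    int translate(CharSequence input, int index, StringBuilder out) {
        for (String[] pair : pairs) {
            String key = pair[0];
            int end = index + key.length();
            if (end <= input.length()
                    && input.subSequence(index, end).toString().equals(key)) {
                out.append(pair[1]);
                return key.length();
            }
        }
        return 0;
    }
}

// Tries each delegate in order and stops at the first one that consumes input.
final class SketchAggregate extends SketchTranslator {
    private final List<SketchTranslator> translators;

    SketchAggregate(List<SketchTranslator> translators) {
        this.translators = translators;
    }

    @Override
    int translate(CharSequence input, int index, StringBuilder out) {
        for (SketchTranslator t : translators) {
            int consumed = t.translate(input, index, out);
            if (consumed != 0) {
                return consumed;
            }
        }
        return 0;
    }
}
```

The point of the design is that adding a new escaping rule means composing one more translator with `with(...)` rather than editing an existing method, which is the open-closed property the old class lacked.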

September 17, 2010

by Shekhar Gulati

· 71,173 Views · 2 Likes

Jetty-maven-plugin: Running a Webapp with a DataSource and Security

This post describes how to configure the jetty-maven-plugin and the Jetty servlet container to run a web application that uses a data source and requires users to log in, which are the basic requirements of most web applications. I use Jetty in development because it’s fast and easy to work with. Why Jetty? Well, because it’s much faster than the WebSphere AS I normally use and it supports fast (or shall I say agile?) development really well thanks to its fast turnaround. And because it’s simply cool to type bash$ svn checkout http://example.com/repo/trunk/mywebapp bash$ cd mywebapp bash$ mvn jetty:run bash$ firefox http://localhost:8080/mywebapp and to be able to immediately log into and interact with the application. However, it should be noted that Jetty isn’t a full-featured Java EE server and thus may not always be usable. Project setup General configuration You need to add the Jetty plugin to your pom.xml : 4.0.0 com.example mywebapp war ... ... org.mortbay.jetty maven-jetty-plugin 6.1.0 3 ... ... ... As you can see, I’m using Jetty 6.1.0. Defining a DataSource Let’s assume that the application uses a DataSource configured at the server and accesses it normally via JNDI. Then we must define a reference to the data source in src/main/webapp/WEB-INF/web.xml : ... ... ... jdbc/lmsdb javax.sql.DataSource Container Shareable Next we need to describe the DataSource to Jetty. There are multiple ways to do that; I’ve chosen to do so in src/main/webapp/WEB-INF/jetty-env.xml : jdbc/lmsdb lmsdb myuser secret db.toronto.ca.ibm.com 3711 Notice that the class used is DB2SimpleDataSource and not a JDBC driver. That is, of course, because we need a DataSource, not a driver. The Jetty wiki pages also contain examples of DataSource configuration for other DBs. Finally we must make the corresponding JDBC implementation available to Jetty by adding it to the plugin’s dependencies in the pom.xml : org.mortbay.jetty maven-jetty-plugin 6.1.0 <... 
com.ibm.db2 db2jcc 9.7 jar system ${basedir}/../lms.sharedlibraries/db2/db2jcc.jar com.ibm.db2 db2jcc_license_cisuz 9.7 jar system ${basedir}/../lms.sharedlibraries/db2/db2jcc_license_cisuz.jar Please do not scorn me for using system-scoped dependencies; sometimes that is unfortunately the most feasible way. Enabling security and configuring an authentication mechanism We would like to limit access to the application to authenticated users in the admin role only, with the exception of pages under public/. Therefore we declare the appropriate security constraints in web.xml: ... authorizedusers all urls /* admin publicaccess public pages /public/* BASIC learning@ibm mini person feed management administrator access admin ... Beware that Jetty doesn’t support HTTPS out of the box and thus if you add the data constraint CONFIDENTIAL to any resource, you will automatically get HTTP 403 Forbidden no matter what you do. That’s why I’ve commented it out above. It is possible to enable SSL in Jetty but I didn’t want to bother with certificate generation etc. Next we need to tell Jetty how to authenticate users. This is done via realms and we will use the simplest, file-based one. Again there are multiple ways to configure it, for example in the pom.xml : org.mortbay.jetty maven-jetty-plugin 6.1.0 3 learning@ibm mini person feed management src/test/resources/jetty-users.properties ... The name must match exactly the realm-name in web.xml. You then define the users and their passwords and roles in the declared file, in this case in src/test/resources/jetty-users.properties : user=psw,admin The format of the file is username=password[,role1,role2,...]. When you download Jetty, you will find a fine example of using JAAS with a file-based back-end for authentication and authorization under examples/test-jaas-webapp (invoke mvn jetty:run from the folder and go to http://localhost:8080/jetty-test-jaas/). 
However, it seems that JAAS causes additional overhead, visible as a delay of a few seconds when starting the server, so it might be preferable not to use it. Conclusion With Jetty it’s easy to enable security and create a data source, which are the basic requirements of most web applications. Anybody can then very easily run the application to test and develop it. Development is where Jetty really shines, provided that you don’t need any feature it doesn’t have. When troubleshooting, you may want to tell Jetty to log at the debug level with mvn -DDEBUG .. or to log requests, which can also be configured in the jetty-env.xml. Beware that this post describes configuration for Jetty 6.1.0. It can be different in other versions and it certainly is different in Jetty 7. From http://theholyjava.wordpress.com/2010/09/10/jetty-maven-plugin-running-a-webapp-with-a-datasource-and-security/

September 13, 2010

by Jakub Holý

· 23,016 Views

Java: Overriding Getters and Setters Design

Why do we keep instance variables private? We don’t want other classes to depend on them. Moreover it gives the flexibility to change a variable’s type or implementation on a whim or an impulse. Why, then programmers automatically add or override getters and setters to their objects, exposing their private variables as if they were public? Accessor methods Accessors (also known as getters and setters) are methods that let you read and write the value of an instance variable of an object. public class AccessorExample { private String attribute; public String getAttribute() { return attribute; } public void setAttribute(String attribute) { this.attribute = attribute; } } Why Accessors? There are actually many good reasons to consider using accessors rather than directly exposing fields of a class Getter and Setter make API more stable. For instance, consider a field public in a class which is accessed by other classes. Now, later on, you want to add any extra logic while getting and setting the variable. This will impact the existing client that uses the API. So any changes to this public field will require a change to each class that refers it. On the contrary, with accessor methods, one can easily add some logic like cache some data, lazily initialize it later. Moreover, one can fire a property changed event if the new value is different from the previous value. All this will be seamless to the class that gets the value using accessor method. Should I have Accessor Methods for all my fields? Fields can be declared public for package-private or private nested class. Exposing fields in these classes produces less visual clutter compare to accessor-method approach, both in the class definition and in the client code that uses it. If a class is package-private or is a private nested class, there is nothing inherently wrong with exposing its data fields—assuming they do an adequate job of describing the abstraction provided by the class. 
Such code is restricted to the package where the class is declared, while the client code is tied to class internal representation. We can change it without modifying any code outside that package. Moreover, in the case of a private nested class, the scope of the change is further restricted to the enclosing class. Another example of a design that uses public fields is JavaSpace entry objects. Ken Arnold described the process they went through to decide to make those fields public instead of private with get and set methods here Now this sometimes makes people uncomfortable because they've been told not to have public fields; that public fields are bad. And often, people interpret those things religiously. But we're not a very religious bunch. Rules have reasons. And the reason for the private data rule doesn't apply in this particular case. It is a rare exception to the rule. I also tell people not to put public fields in their objects, but exceptions exist. This is an exception to the rule, because it is simpler and safer to just say it is a field. We sat back and asked: Why is the rule thus? Does it apply? In this case it doesn't. Private fields + Public accessors == encapsulation Consider the example below public class A { public int a; } Usually, this is considered bad coding practice as it violates encapsulation. The alternate approach is public class A { private int a; public void setA(int a) { this.a =a; } public int getA() { return this.a; } } It is argued that this encapsulates the attribute. Now is this really encapsulation? The fact is, Getters/setters have nothing to do with encapsulation. Here the data isn't more hidden or encapsulated than it was in a public field. Other objects still have intimate knowledge of the internals of the class. Changes made to the class might ripple out and enforce changes in dependent classes. Getter and setter in this way are generally breaking encapsulation. 
A truly well-encapsulated class has no setters and preferably no getters either. Rather than asking a class for some data and then compute something with it, the class should be responsible for computing something with its data and then return the result. Consider an example below, public class Screens { private Map screens = new HashMap(); public Map getScreens() { return screens; } public void setScreens(Map screens) { this.screens = screens; } // remaining code here } If we need to get a particular screen, we do code like below, Screen s = (Screen)screens.get(screenId); There are things worth noticing here.... The client needs to get an Object from the Map and casting it to the right type. Moreover, the worst is that any client of the Map has the power to clear it which may not be the case we usually want. An alternative implementation of the same logic is: public class Screens { private Map screens = new HashMap(); public Screen getById(String id) { return (Screen) screens.get(id); } // remaining code here } Here the Map instance and the interface at the boundary (Map) are hidden. Getters and Setters are highly Overused Creating private fields and then using the IDE to automatically generate getters and setters for all these fields is almost as bad as using public fields. One reason for the overuse is that in an IDE it’s just now a matter of few clicks to create these accessors. The completely meaningless getter/setter code is at times longer than the real logic in a class and you will read these functions many times even if you don't want to. All fields should be kept private, but with setters only when they make sense which makes object Immutable. Adding an unnecessary getter reveals internal structure, which is an opportunity for increased coupling. To avoid this, every time before adding the accessor, we should analyse if we can encapsulate the behaviour instead. 
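As a hedged illustration of the Screens example above, here is a small self-contained sketch using generics so that neither the Map nor a cast leaks to the client. The Screen class with its title field, the register method, and the use of Optional are assumptions added for this sketch; the article's own version returns a Screen directly via a cast.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical value class standing in for whatever a screen really holds.
final class Screen {
    private final String title;

    Screen(String title) {
        this.title = title;
    }

    String title() {
        return title;
    }
}

// The Map is an implementation detail: clients cannot clear or replace it.
final class Screens {
    private final Map<String, Screen> screens = new HashMap<>();

    // Assumed way of populating the collection, in place of setScreens(Map).
    void register(String id, Screen screen) {
        screens.put(id, screen);
    }

    // No cast needed by the caller, and a missing id is explicit.
    Optional<Screen> getById(String id) {
        return Optional.ofNullable(screens.get(id));
    }
}
```

With this shape, swapping the HashMap for another structure later changes nothing for callers, since the only things they can see are register and getById.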
Let’s take another example, public class Money { private double amount; public double getAmount() { return amount; } public void setAmount(double amount) { this.amount = amount; } } // client Money pocketMoney = new Money(); pocketMoney.setAmount(15d); double amount = pocketMoney.getAmount(); // we know it's a double pocketMoney.setAmount(amount + 10d); With the above logic, if we later decide that double is not the right type to use and that we should use BigDecimal instead, then the existing client that uses this class also breaks. Let’s restructure the above example, public class Money { private BigDecimal amount; public Money(String amount) { this.amount = new BigDecimal(amount); } public void add(Money toAdd) { amount = amount.add(toAdd.amount); } } // client Money balance1 = new Money("10.0"); Money balance2 = new Money("6.0"); balance1.add(balance2); Now instead of asking for a value, the class has a responsibility to increase its own value. With this approach, a future change to any other datatype requires no change in the client code. Here, not only is the data encapsulated, but so is the type of the data that is stored, or even the fact that it exists at all. Conclusions Using accessors to restrict direct access to a field is preferred over the use of public fields; however, making getters and setters for each and every field is overkill. It also depends on the situation, though: sometimes you just want a dumb data object. Accessors should be added to a field only where they're really required. A class should expose larger behaviour which happens to use its state, rather than a repository of state to be manipulated by other classes. More Reading http://c2.com/cgi/wiki?TellDontAsk http://c2.com/cgi/wiki?AccessorsAreEvil Effective Java - See more at http://muhammadkhojaye.blogspot.co.uk/2010/10/getter-setter-to-use-or-not-to-use.html

September 10, 2010

by Muhammad Ali Khojaye

· 98,219 Views · 1 Like

Practical PHP Patterns: Gateway

A fundamental trait of modern software is that it does not live in isolation, especially in the realm of web applications, which can easily interact with external resources like web services and databases. The majority of PHP applications must access external resources, that by architecture do not run in the same memory segment or programming language of their core Domain Model. There are many examples of these situations: web services like Google's or Yahoo! ones. Relational and NoSQL databases. The filesystem of the server. Other web and non-web applications for data interoperability. I'll call any instance of this external dependency a resource, which is an umbrella term for each item of this list. Motivation When you have to access an external resource, you get an API which you code may call. However accessing an API directly, like a PDO object or a HTTP request stream, presents many issues. First of all, your application ends up becoming very coupled to the particular product or application instance you're using. There is no room for change, since every resource has its specific API, unless it is a commodity like a relational database. More subtly, general purpose APIs are designed as catch-all interfaces for providing any functionality, and capturing any use case from every possible client. The entire set of methods becomes a possible requirement of your application, since you cannot instantly easily distinguish the primitives really called by your application from the one ignored. Moreover, the external resource may use data formats and models different from the ones used by your application. This is the case with relational database used as a storage for object models. Implementation There is an easy solution to these interaction problems, which I feel is never pushed enough. 
The Gateway pattern is this solution: wrap into a single object all the interaction specific to the integrated resource, so that your object provides a specialized API of exactly what you want, as you want it. This pattern is similar to the classic Facade, but it is applied to other people's code instead of our own. You can also compare it to an Adapter, where the Adaptee is not even object-oriented or in the same process as your application's code. By the way, this pattern is specialized by many other ones, and it can be thought of as their superclass. Wrapping Wrapping is the mechanism used for this pattern's implementation. Only the functionality needed is really exposed from the Gateway. This minimalism helps the Gateway become the target of integration tests or pragmatic unit tests that exercise only the functionalities actually exposed and that may cause a regression. This pattern insulates the application layer or the Domain Model from external changes. The Hexagonal Architecture is really an evolution of this pattern applied systematically to every external resource, until only an in-memory object structure stands as the core domain, and every dependency is injected as an adapter for an application's port. A Gateway can also be implemented with more than one object (back end and front end) when the work to do is both on the protocol side (procedural vs. OO, XML vs. variables) and on the workflow side (different slicing of functionalities, APIs at the wrong level of abstraction for your use case). Advantages I could go on forever about the advantages of introducing a Gateway over an external dependency. You achieve greater insulation from the dependency: changes do not spread into your system and you can test them separately and efficiently. The system is also easier to read and understand as it does not pull in the whole complexity of the resource, but only the abstraction needed by client code. 
Disadvantages There's hardly any downside in coding up a Gateway class, unless you introduce a leaky abstraction. Peculiarity According to Fowler, this pattern is somewhat different from the other integration-related ones, and due to these differences it has earned a name and an article here. A Facade simplifies a complex API, and it is written by the developers of the resource used. A Gateway is written by the client code developers to simplify their own job. The Facade also implies a different interface, while Gateway can simply wrap it and transform it or hiding part of it. An Adapter alters an implementation to provide a new API. With a Gateway there may not be an existing interface, or if there is, the Adapter is part of the Gateway implementation, which comprehends a back end side. A Mediator separates different objects, but Gateway is much more specialized in separating two objects and keeping the dependency side (the external resource) not aware of being used. Example Today's example is a Gateway to a web service, in the form of the classic Twitter client. For simplicity and readability we'll deal only with a single operations that does not require authentication, badly implemented with OAuth by Twitter at the time of this writing. status->text; } } // having an object to represent Twitter means we can mock it, // pass it around, injecting it, composing it... $gateway = new TwitterGateway(); // client code echo $gateway->getLastTweet('giorgiosironi'), "\n";
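The same idea can be sketched outside PHP as well. The following minimal Java sketch assumes a hypothetical TweetGateway whose transport is injected as a plain function; the constructor parameter, the trimming, and the empty-string fallback are illustrative assumptions, and no real Twitter API call is shown. Only the getLastTweet operation name is taken from the example above.

```java
import java.util.function.Function;

// Hypothetical gateway: exposes only the single operation the application needs.
final class TweetGateway {
    // The transport (HTTP client, parser, ...) is injected, so tests can stub it.
    private final Function<String, String> fetchLatestStatusText;

    TweetGateway(Function<String, String> fetchLatestStatusText) {
        this.fetchLatestStatusText = fetchLatestStatusText;
    }

    // Client code sees a plain String, never the resource's own data format.
    String getLastTweet(String username) {
        String text = fetchLatestStatusText.apply(username);
        return text == null ? "" : text.trim();
    }
}
```

Because the external call hides behind one small object, a test can pass in a lambda returning canned text, while production code passes a function that performs the real request.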

September 9, 2010

by Giorgio Sironi

· 11,273 Views

Server Centric Java Frameworks: Performance Comparison

These days we are used to AJAX-intensive, sophisticated web frameworks. These frameworks provide us desktop style development into the Single Page Interface (SPI) paradigm. As you know there are two main types of frameworks, client-centric and server-centric. Each approach has pros and cons. Testing the performance of Java server-centric frameworks In the server-centric view, state is managed in server. In some way the client is a sophisticated terminal of the server because most of visual decisions are taken on the server and some kind of visual rendering is done on the server (HTML generation as markup or embedded in JavaScript or more higher level code sent to the client). The main advantage is that data and visual rendering are together in the same memory space, avoiding custom client-server bridges for data communication and synchronization, typical of the client-centric approach. This article only reviews Java server-centric frameworks. In SPI, the web page is partially changed; that is, some HTML parts can be removed and some new HTML markup can be inserted. This approach obviously saves tons of bandwidth and computer power because the complete page is not rebuilt and not fully sent to the client when some page change happens. A server-centric framework to be effective must send to the client ONLY the markup going to be changed or equivalent instructions in some form, when some AJAX event hits the server. This article reviews how much effective most of the SPI Java web frameworks are on partial changes provided by the server. We are not interested in events with no server communication, that is, events with no (possible) server control. How they are going to be measured We are going to measure the amount of code that is sent to client regarding to the visual change performed in client. 
For instance for a minor visual change (some new data) in a component we expect not much code from server, that is, the new markup needed as plain HTML, or embedded in JavaScript, or some high level instructions containing the new data to be visualized. Otherwise something seems wrong for instance the complete component or page zone is rebuilt, wasting bandwidth and client power (and maybe server power). Because we will use public demos, we are not going to get a definitive and fine grain benchmark. But you will see very strong differences between frameworks. The testing technique is very easy and everybody can do it with no special infrastructure, we just need FireFox and FireBug. In this test FireFox 3.6.8 and FireBug 1.5.4 are used. The FireBug Console when "Show XMLHttpRequests" is enabled logs any AJAX request showing the server response. The process is simple: The Console will be enabled before loading the page with the demo. Some clicks will drive some concrete component to the desired state. A final click will perform a small change in the component being analyzed. Then we will copy the output code of the AJAX request (HTML, XML, JavaScript ...) sent from server. The more code the less effective, more bandwidth waste and client processing is needed. We cannot measure the server power used because we need a deep knowledge of how the framework works in server, said this we can easily "suspect" the more code generated in server the more server power is wasted. Frameworks tested RichFaces, IceFaces, MyFaces/Trinidad, OpenFaces, PrimeFaces, Vaadin, ZK, ItsNat ADF Faces is not tested because there is no longer a public live demo. Because ADF Faces is based on Trinidad, Trinidad analysis could be extrapolated to ADF Faces (?). Update: NO, ADF Faces are very different to Trinidad. 
Note before starting Some frameworks seem to perform very well (regarding to this kind of test), that is, the ratio between visual change and amount of code is acceptable, but in some concrete cases (components) they "miserably" fail. This article tries to measure bad performant components. RichFaces Console must be enabled, configured and open as seen before. Open this tree demo (Ajax switch type) Expand "Baccara" node Expand "Grand Collection" Collapse "Grand Collection" As you can see the child nodes below "Grand Collection" has been removed or hidden (FireBug's DOM inspector says they were removed). Grand Collection As you can see too much HTML code has been sent for not much of a visual change. A more severe performance penalty: Open the Extended Data Table Demo On "State Name" paste "Alaska" (paste the name from clipboard), one row is shown Paste "Alabama" replacing "Alaska" (again paste from clipboard selecting Alaska first), again one different row is shown. The answer (HTML code) is too big to put here, 3.474 bytes, if you inspect the result you will see a complete rewrite of the table including header. IceFaces Open the Calendar demo Click on any different day Something like this is the last AJAX response: The answer (XML with metadata) is really big, 6.452 bytes, for a simple day change according to visual changes. MyFaces/Trinidad Open this Tree Table Demo Expand node_0_0 Expand node_0_0_0 (node node_0_0_0_ is shown) Collapse node_0_0_0 (hides/removes node_0_0_0_0) The last AJAX response is too big to put here, 18.765 bytes, because is a complete rewrite of the tree component. Update: a live demo of ADF Faces components is here and they seem to work fine as expected, that is, the ratio between code sent to the client and visual change is "correct" (in spite of HTML layout is very verbose the code sent to the client is almost the same to be displayed). 
OpenFaces Open the Tree Table demo Expand "Re: Scalling an image" Expand the new child "Re: Scalling an image" The last AJAX response is Re: Scaling an imageChristian SmileAug 3, 2007" data="{"structureMap":{"0":"1"}" scripts="" /> This code is very reasonable given the change (a new child node/table row). Nevertheless, some components fail miserably: Open the Data Table demo Select "AK" as "State", resulting in one row. Replace it with "AR", resulting again in one new row The last AJAX result is too big, 38,209 bytes, because it is a complete rewrite of the table including headers. PrimeFaces The AJAX answers of all tested examples were very reasonable. That said, PrimeFaces lacks a "filtered table component" or similar, the Achilles' heel of other JSF implementations. Update: As Cagatay Civici (one of the fathers of PrimeFaces) points out, PrimeFaces has a filtered table, and this component works fine regarding the ratio of visual change to code sent to the client (try the same tests as for the previous frameworks). Vaadin This is the first non-JSF framework. 
1. Open the Tree single selection demo
2. Select "Dell OptiPlex GX240"
3. Click the "Apply" button (no change is needed)

This is the last AJAX response:

for(;;);[{"changes":[["change",{"format": "uidl","pid": "PID190"},["12",{"id": "PID190","immediate":true,"caption": "Hardware Inventory","selectmode": "single","nullselect":true,"v":{"action":"","selected":["2"],"expand":[],"collapse":[],"newitem":[]},["node",{"caption": "Desktops","key": "1","expanded":true,"al":["1","2"]},["leaf",{"caption": "Dell OptiPlex GX240","key": "2","selected":true,"al":["1","2"]}],["leaf",{"caption": "Dell OptiPlex GX260","key": "3","al":["1","2"]}],["leaf",{"caption": "Dell OptiPlex GX280","key": "4","al":["1","2"]}]],["node",{"caption": "Monitors","key": "5","expanded":true,"al":["1","2"]},["leaf",{"caption": "Benq T190HD","key": "6","al":["1","2"]}],["leaf",{"caption": "Benq T220HD","key": "7","al":["1","2"]}],["leaf",{"caption": "Benq T240HD","key": "8","al":["1","2"]}]],["node",{"caption": "Laptops","key": "9","expanded":true,"al":["1","2"]},["leaf",{"caption": "IBM ThinkPad T40","key": "10","al":["1","2"]}],["leaf",{"caption": "IBM ThinkPad T43","key": "11","al":["1","2"]}],["leaf",{"caption": "IBM ThinkPad T60","key": "12","al":["1","2"]}]],["actions",{},["action",{"caption": "Add child item","key": "1"}],["action",{"caption": "Delete","key": "2"}]]]]], "meta" : {}, "resources" : {}, "locales":[]}]

It does not seem like much, but if you review the code you will see that the entire tree is being rebuilt.

ZK

Another non-JSF framework. In its latest versions ZK embraces a hybrid approach: most of the visual logic lives in the client as JavaScript components, and the server sends high-level commands to the high-level client library (Vaadin is not different). I have not found a component in ZK's demo that sends too much code to the client relative to the visual change.

ItsNat

The last framework studied, again a non-JSF framework.
In ItsNat the server keeps the same DOM state as the client and, through DOM mutation events, any change to the DOM in the server automatically generates the JavaScript necessary to update the client accordingly.

1. Open the demo
2. Click on the handle of the "Core" folder; the child nodes (11) are hidden

Result code of the AJAX event:

    itsNatDoc.addNodeCache(["cn_10","cn_14","0,1,1,0,0",["cn_15","cn_16"]]);
    itsNatDoc.setAttribute2("cn_14","src","img/tree/tree_node_collapse.gif");
    itsNatDoc.setAttribute2(["cn_15","cn_17","1"],"src","img/tree/tree_folder_close.gif");
    itsNatDoc.setAttribute2(["cn_16","cn_18","1,0",["cn_19"]],"style","display:none");
    itsNatDoc.setAttribute2(["cn_19","cn_20","1"],"style","display:none");
    itsNatDoc.setAttribute2(["cn_19","cn_21","2"],"style","display:none");
    itsNatDoc.setAttribute2(["cn_19","cn_22","3"],"style","display:none");
    itsNatDoc.setAttribute2(["cn_19","cn_23","4"],"style","display:none");
    itsNatDoc.setAttribute2(["cn_19","cn_24","5"],"style","display:none");
    itsNatDoc.setAttribute2(["cn_19","cn_25","6"],"style","display:none");
    itsNatDoc.setAttribute2(["cn_19","cn_26","7"],"style","display:none");
    itsNatDoc.setAttribute2(["cn_19","cn_27","8"],"style","display:none");
    itsNatDoc.setAttribute2(["cn_19","cn_28","9"],"style","display:none");
    itsNatDoc.setAttribute2(["cn_19","cn_29","10"],"style","display:none");

No surprises.

Another test:

1. Open this Tree demo
2. Click on "Insert Child". A new child node ("Actors") is inserted and a new log message is added.
AJAX result code:

    itsNatDoc.removeAttribute2(["cn_15","cn_39","13"],"style");
    itsNatDoc.setInnerHTML2("cn_39"," clickjavax.swing.event.TreeModelEvent 13802934 path [Grey's Anatomy, Actors] indices [ 4 ] children [ Actors ]");
    var child = itsNatDoc.doc.createElement("li");
    itsNatDoc.setAttribute(child,"style","padding:1px;");
    itsNatDoc.appendChild2(["cn_17","cn_40","0,1,1,1",["cn_41","cn_42"]],child);
    itsNatDoc.setInnerHTML(child,"Label\n \n ");
    itsNatDoc.setTextData2(["cn_40","cn_43","4,0,2,0",["cn_44","cn_45"]],null,"Actors");
    itsNatDoc.setAttribute2(["cn_45","cn_46","0"],"style","display:none");
    itsNatDoc.setAttribute2(["cn_45","cn_47","1"],"src","img/tree/gear.gif");

Again, no surprises.

And the Winner is...

There is no winner, because only some components have been tested. Having said this, the only JSF implementation that appears to be free of serious performance penalties is PrimeFaces. In non-JSF frameworks, using a very high-level JS library, as Vaadin and ZK do (PrimeFaces?), helps a great deal to reduce network bandwidth (even though some components in Vaadin have serious performance problems). The same cannot be said for client performance, because in ItsNat the exact JS DOM code is sent to the client. On the other hand, a high-level JS library complicates custom component development (beyond composition) because the server does not help very much, but this is another story, and another article.

September 7, 2010

by Jose Maria Arranz

· 23,053 Views

Let's Create... Our Own SQL Editor

Isn't it time you gained full control of your SQL work environment? Stop being limited by the tools foisted upon you and start creating your own. Not hard at all, either. Here's a complete tutorial for creating your very own SQL editor, which will look like this:

OK. Now, let's create it from scratch. Start up NetBeans IDE and use this template to create a basis for your application. Just click through it and you'll have new folders and files on disk that represent your project:

When you've clicked Next above, you'll be able to provide the name of your project:

And when you click Finish, the Projects window will show you your application structure:

You've now got a basic application that includes all the infrastructure you need (a module system, window system, file system, actions system, and more), without any content. Let's now add the content.

Right-click the "SQLEditor" node above (i.e., the orange icon) and choose Properties. In the Project Properties dialog, expand the "java" node and then include the SQL Editor:

Click "Resolve" above and the IDE will include all the related modules. That is, the SQL Editor module depends on other modules; via the "Resolve" button, those dependencies will be identified and registered in your project.

Next, let's include support for Java DB:

Click "Resolve" again. Hurray, we're done. All the functionality for our own SQL editor is now available in our application.

Now we'll add a new module, just so that we can perform a few tweaks to our application. In other words, this will be a branding module. Right-click the "Modules" node and choose "Add New":

Name it something, such as "SQLBranding":

Provide a unique identifier for your new module and make sure to include a layer.xml file, which you'll use to mask out the default menus and toolbars you don't need in your application:

Click Finish above. Then right-click on the main package that is created in the module and choose New | Other.
There you'll be able to create a new Module Install class, which will initialize the module when the application starts up:

What we want is to force the Services window (i.e., the window in NetBeans IDE for working with databases) to open when the application starts. So, we will provide code in the Module Install class (which you created above) for finding that window and opening it.

The code we will need comes from the Window System API. In the Projects window, right-click the module's "Libraries" node, as shown below, and choose "Add Module Dependency":

Then browse to Window System API and click OK:

That dependency is what we need to use the window system code in the snippet below. Now, in the Module Install class, provide the following code:

    import org.openide.modules.ModuleInstall;
    import org.openide.windows.TopComponent;
    import org.openide.windows.WindowManager;

    public class Installer extends ModuleInstall {

        @Override
        public void restored() {
            // Defer the lookup until the window system has finished loading
            WindowManager.getDefault().invokeWhenUIReady(new Runnable() {
                @Override
                public void run() {
                    TopComponent svcWindow =
                            WindowManager.getDefault().findTopComponent("services");
                    // findTopComponent returns null if no window has this ID
                    if (svcWindow != null) {
                        svcWindow.open();
                        svcWindow.requestActive();
                    }
                }
            });
        }
    }

Now the window we need will be forced to open when the application starts.

Let's turn to some other ancillary matters now. We can change the default splash screen via "Branding", which is a menu item that you see when you right-click on the application's node in the Projects window, producing the Branding Editor below:

And we can search all the strings in the modules that come from the NetBeans Platform, so that we can change the string "Services" to "Databases", for example, or to some other custom string. You can also hide the menu items and toolbar buttons that you don't need and perform similar wrap-up tasks to really customize the application to your specific business needs.

Let's now, just for fun, also include a file browser in our application.
So, back in the Project Properties dialog of your application, choose Favorites under the "platform" node. While you're there, also enable the two AutoUpdate modules, so that the end user will be able to install plugins (i.e., new features and patches) that you or the community of your SQL editor will provide:

The application is now complete. Let's create a ZIP distribution for our end users, while noticing we can also create a Mac distribution or one for web starting the application:

After doing the above, the Files window shows your new ZIP distribution:

If you prefer, you can also create an installer for your application:

Once the application is unzipped or installed, click the launcher in the bin folder. Then you'll have the application with which this article started. Look in the Tools menu and, guess what? You'll find that you have a "Plugins" menu item, enabling extensions (i.e., features and patches) to be installed into the application.

Many thanks to Tim Sparg from CoreFreight in Johannesburg for inspiring this article.

September 4, 2010

by Geertjan Wielenga

· 24,436 Views · 1 Like

ExtJS, Spring MVC 3 and Hibernate 3.5: CRUD DataGrid Example

This tutorial will walk through how to implement a CRUD (Create, Read, Update, Delete) DataGrid using ExtJS, Spring MVC 3 and Hibernate 3.5.

What do we usually want to do with data?

Create (insert)
Read / retrieve (select)
Update (update)
Delete / destroy (delete)

Until ExtJS 3.0 we could only read data using a DataGrid. If you wanted to update, insert or delete, you had to write extra code to make these actions work. ExtJS 3.0 (and newer versions) introduces Ext.data.Writer, and you no longer need all that work to have a CRUD grid.

So… what do I need to add to my code to make all these things work together?

In this example, I'm going to use JSON as the data exchange format between the browser and the server.

ExtJS code

First, you need an Ext.data.JsonWriter:

    // The new DataWriter component.
    var writer = new Ext.data.JsonWriter({
        encode: true,
        writeAllFields: true
    });

where writeAllFields specifies that we want to write all the fields of the record to the database. If you have a fancy ORM then maybe you can set this to false. In this example, I'm using Hibernate and its saveOrUpdate method; in this case we need all fields to update the object in the database, so we have to set writeAllFields to true.
This is my record type declaration:

    var Contact = Ext.data.Record.create([
        {name: 'id'},
        {name: 'name', type: 'string'},
        {name: 'phone', type: 'string'},
        {name: 'email', type: 'string'}
    ]);

Now you need to set up a proxy like this one:

    var proxy = new Ext.data.HttpProxy({
        api: {
            read: 'contact/view.action',
            create: 'contact/create.action',
            update: 'contact/update.action',
            destroy: 'contact/delete.action'
        }
    });

FYI, this is what my reader looks like:

    var reader = new Ext.data.JsonReader({
        totalProperty: 'total',
        successProperty: 'success',
        idProperty: 'id',
        root: 'data',
        messageProperty: 'message' // <-- new "messageProperty" meta-data
    }, Contact);

The writer and the proxy (and the reader) can be hooked to the store like this:

    // Typical Store collecting the Proxy, Reader and Writer together.
    var store = new Ext.data.Store({
        id: 'user',
        proxy: proxy,
        reader: reader,
        writer: writer,  // <-- plug a DataWriter into the store just as you would a Reader
        autoSave: false  // <-- false delays executing create, update and destroy requests
                         //     until specifically told to do so with some [save] button.
    });

where autoSave specifies whether you want the data saved automatically (you do not need a save button; the app sends the actions to the server automatically). In this case, I implemented a save button, so every record with a new or updated value will have a red mark in the top left corner of the cell.

When the user alters a value in the grid, a "save" event occurs (if autoSave is true). Upon the "save" event the grid determines which cells have been altered. When we have an altered cell, the corresponding record is sent to the server with the 'root' from the reader around it. For example, if we read with root "data", then we send back with root "data". Several records can be sent at once when updating the server (e.g. after multiple edits).

And to make your life even easier, let's use the RowEditor plugin, so you can easily edit or add new records. All you have to do is add the RowEditor CSS and JS files to your page and add the plugin to your grid declaration:

    var editor = new Ext.ux.grid.RowEditor({
        saveText: 'update'
    });

    // create grid
    var grid = new Ext.grid.GridPanel({
        store: store,
        columns: [
            {header: "name", width: 170, sortable: true, dataIndex: 'name',
             editor: {xtype: 'textfield', allowBlank: false}},
            {header: "phone #", width: 150, sortable: true, dataIndex: 'phone',
             editor: {xtype: 'textfield', allowBlank: false}},
            {header: "email", width: 150, sortable: true, dataIndex: 'email',
             editor: {xtype: 'textfield', allowBlank: false}}
        ],
        plugins: [editor],
        title: 'my contacts',
        height: 300,
        width: 610,
        frame: true,
        tbar: [{
            iconCls: 'icon-user-add',
            text: 'add contact',
            handler: function(){
                // insert a placeholder record at the top and start editing it
                var e = new Contact({
                    name: 'new guy',
                    phone: '(000) 000-0000',
                    email: '[email protected]'
                });
                editor.stopEditing();
                store.insert(0, e);
                grid.getView().refresh();
                grid.getSelectionModel().selectRow(0);
                editor.startEditing(0);
            }
        },{
            iconCls: 'icon-user-delete',
            text: 'remove contact',
            handler: function(){
                // remove every selected record from the store
                editor.stopEditing();
                var s = grid.getSelectionModel().getSelections();
                for(var i = 0, r; r = s[i]; i++){
                    store.remove(r);
                }
            }
        },{
            iconCls: 'icon-user-save',
            text: 'save all modifications',
            handler: function(){
                // send all pending create/update/destroy requests
                store.save();
            }
        }]
    });

Java code

Finally, you need some server-side code.
Controller:

    package com.loiane.web;

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Controller;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestParam;
    import org.springframework.web.bind.annotation.ResponseBody;

    import com.loiane.model.Contact;
    import com.loiane.service.ContactService;

    @Controller
    public class ContactController {

        private ContactService contactService;

        @RequestMapping(value="/contact/view.action")
        public @ResponseBody Map<String, Object> view() throws Exception {
            try {
                List<Contact> contacts = contactService.getContactList();
                return getMap(contacts);
            } catch (Exception e) {
                return getModelMapError("Error retrieving Contacts from database.");
            }
        }

        @RequestMapping(value="/contact/create.action")
        public @ResponseBody Map<String, Object> create(@RequestParam Object data) throws Exception {
            try {
                List<Contact> contacts = contactService.create(data);
                return getMap(contacts);
            } catch (Exception e) {
                return getModelMapError("Error trying to create contact.");
            }
        }

        @RequestMapping(value="/contact/update.action")
        public @ResponseBody Map<String, Object> update(@RequestParam Object data) throws Exception {
            try {
                List<Contact> contacts = contactService.update(data);
                return getMap(contacts);
            } catch (Exception e) {
                return getModelMapError("Error trying to update contact.");
            }
        }

        @RequestMapping(value="/contact/delete.action")
        public @ResponseBody Map<String, Object> delete(@RequestParam Object data) throws Exception {
            try {
                contactService.delete(data);
                Map<String, Object> modelMap = new HashMap<String, Object>(3);
                modelMap.put("success", true);
                return modelMap;
            } catch (Exception e) {
                return getModelMapError("Error trying to delete contact.");
            }
        }

        // Builds the success envelope that the ExtJS JsonReader expects
        private Map<String, Object> getMap(List<Contact> contacts) {
            Map<String, Object> modelMap = new HashMap<String, Object>(3);
            modelMap.put("total", contacts.size());
            modelMap.put("data", contacts);
            modelMap.put("success", true);
            return modelMap;
        }

        // Builds the error envelope (success = false plus a message)
        private Map<String, Object> getModelMapError(String msg) {
            Map<String, Object> modelMap = new HashMap<String, Object>(2);
            modelMap.put("message", msg);
            modelMap.put("success", false);
            return modelMap;
        }

        @Autowired
        public void setContactService(ContactService contactService) {
            this.contactService = contactService;
        }
    }

Some observations:

In Spring 3, we can get the objects from requests directly in the method parameters using @RequestParam. I don't know why, but it did not work with ExtJS. I had to leave the parameter as an Object and do the JSON-to-object parsing myself.
That is why I'm using a Util class: to parse the object from the request into my POJO class. If you know how I can replace the Object parameter in the controller methods, please leave a comment, because I'd really like to know that!

Service class:

    package com.loiane.service;

    import java.util.ArrayList;
    import java.util.List;

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Service;
    import org.springframework.transaction.annotation.Transactional;

    import com.loiane.dao.ContactDAO;
    import com.loiane.model.Contact;
    import com.loiane.util.Util;

    @Service
    public class ContactService {

        private ContactDAO contactDao;
        private Util util;

        @Transactional(readOnly=true)
        public List<Contact> getContactList() {
            return contactDao.getContacts();
        }

        @Transactional
        public List<Contact> create(Object data) {
            List<Contact> newContacts = new ArrayList<Contact>();
            List<Contact> list = util.getContactsFromRequest(data);
            for (Contact contact : list) {
                newContacts.add(contactDao.saveContact(contact));
            }
            return newContacts;
        }

        @Transactional
        public List<Contact> update(Object data) {
            List<Contact> returnContacts = new ArrayList<Contact>();
            List<Contact> updatedContacts = util.getContactsFromRequest(data);
            for (Contact contact : updatedContacts) {
                returnContacts.add(contactDao.saveContact(contact));
            }
            return returnContacts;
        }

        @Transactional
        public void delete(Object data) {
            // it is an array - have to cast to array object
            if (data.toString().indexOf('[') > -1) {
                List<Integer> deleteContacts = util.getListIdFromJSON(data);
                for (Integer id : deleteContacts) {
                    contactDao.deleteContact(id);
                }
            } else {
                // it is only one object - cast to object/bean
                Integer id = Integer.parseInt(data.toString());
                contactDao.deleteContact(id);
            }
        }

        @Autowired
        public void setContactDao(ContactDAO contactDao) {
            this.contactDao = contactDao;
        }

        @Autowired
        public void setUtil(Util util) {
            this.util = util;
        }
    }

Contact class (POJO):

    package com.loiane.model;

    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.Table;

    import org.codehaus.jackson.annotate.JsonAutoDetect;

    @JsonAutoDetect
    @Entity
    @Table(name="contact")
    public class Contact {

        private int id;
        private String name;
        private String phone;
        private String email;

        @Id
        @GeneratedValue
        @Column(name="contact_id")
        public int getId() { return id; }
        public void setId(int id) { this.id = id; }

        @Column(name="contact_name", nullable=false)
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }

        @Column(name="contact_phone", nullable=false)
        public String getPhone() { return phone; }
        public void setPhone(String phone) { this.phone = phone; }

        @Column(name="contact_email", nullable=false)
        public String getEmail() { return email; }
        public void setEmail(String email) { this.email = email; }
    }

DAO class:

    package com.loiane.dao;

    import java.util.List;

    import org.hibernate.SessionFactory;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.orm.hibernate3.HibernateTemplate;
    import org.springframework.stereotype.Repository;

    import com.loiane.model.Contact;

    @Repository
    public class ContactDAO implements IContactDAO {

        private HibernateTemplate hibernateTemplate;

        @Autowired
        public void setSessionFactory(SessionFactory sessionFactory) {
            hibernateTemplate = new HibernateTemplate(sessionFactory);
        }

        @SuppressWarnings("unchecked")
        @Override
        public List<Contact> getContacts() {
            return hibernateTemplate.find("from Contact");
        }

        @Override
        public void deleteContact(int id) {
            Object record = hibernateTemplate.load(Contact.class, id);
            hibernateTemplate.delete(record);
        }

        @Override
        public Contact saveContact(Contact contact) {
            hibernateTemplate.saveOrUpdate(contact);
            return contact;
        }
    }

Util class:

    package com.loiane.util;

    import java.util.ArrayList;
    import java.util.List;

    import net.sf.json.JSONArray;
    import net.sf.json.JSONObject;

    import org.springframework.stereotype.Component;

    import com.loiane.model.Contact;

    @Component
    public class Util {

        public List<Contact> getContactsFromRequest(Object data) {
            List<Contact> list;
            // it is an array - have to cast to array object
            if (data.toString().indexOf('[') > -1) {
                list = getListContactsFromJSON(data);
            } else {
                // it is only one object - cast to object/bean
                Contact contact = getContactFromJSON(data);
                list = new ArrayList<Contact>();
                list.add(contact);
            }
            return list;
        }

        private Contact getContactFromJSON(Object data) {
            JSONObject jsonObject = JSONObject.fromObject(data);
            Contact newContact = (Contact) JSONObject.toBean(jsonObject, Contact.class);
            return newContact;
        }

        @SuppressWarnings("unchecked")
        private List<Contact> getListContactsFromJSON(Object data) {
            JSONArray jsonArray = JSONArray.fromObject(data);
            List<Contact> newContacts = (List<Contact>) JSONArray.toCollection(jsonArray, Contact.class);
            return newContacts;
        }

        @SuppressWarnings("unchecked")
        public List<Integer> getListIdFromJSON(Object data) {
            JSONArray jsonArray = JSONArray.fromObject(data);
            List<Integer> idContacts = (List<Integer>) JSONArray.toCollection(jsonArray, Integer.class);
            return idContacts;
        }
    }

If you want to see all the code (the complete project with all the files necessary to run this app), download it from my GitHub repository: http://github.com/loiane/extjs-crud-grid-spring-hibernate

This was a requested post. I got a lot of comments on my previous CRUD grid example, and some emails. I made some adjustments to the current code, but the idea is still the same. I hope I was able to answer all the questions.

Happy coding!

From http://loianegroner.com/2010/09/extjs-spring-mvc-3-and-hibernate-3-5-crud-datagrid-example/
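A side note on the Util class shown above: its array check, data.toString().indexOf('[') > -1, would also match a single object whose field values happen to contain a '[' character. A minimal sketch of a stricter check, assuming the payload arrives as JSON text, is to look only at the first non-whitespace character. The class and method names below are hypothetical and are not part of the original project:

```java
class JsonShapeSketch {

    /**
     * Returns true when the payload, ignoring leading whitespace,
     * starts with '[' and is therefore a JSON array rather than
     * a single JSON object or a bare value.
     */
    static boolean looksLikeJsonArray(Object data) {
        if (data == null) {
            return false;
        }
        return data.toString().trim().startsWith("[");
    }

    public static void main(String[] args) {
        // several records sent at once
        System.out.println(looksLikeJsonArray("[{\"id\":1},{\"id\":2}]"));
        // one record whose name contains a bracket: not an array
        System.out.println(looksLikeJsonArray("{\"id\":1,\"name\":\"a[b]\"}"));
        // a single id, as sent for a one-row delete
        System.out.println(looksLikeJsonArray("7"));
    }
}
```

This only inspects the shape of the text; it does not validate the JSON, which is still left to the parser.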

September 3, 2010

by Loiane Groner

· 100,953 Views · 1 Like

The different kinds of testing

Automated testing supports your constant effort in design and refactoring, and besides that ensures that your application actually works in a reliable and repeatable way. Tests at every level of detail are a form of executable specification and documentation. They give you immediate feedback and confidence that your code works, plus a satisfying green bar many times a day.

I've been consulting on a Zend Framework application, with the goal of repairing the test suite and expanding it. In this article I'll describe the different categories of testing, as applied to a Zend Framework 1 application, but this classification pertains to every web application based on object-oriented programming. Since this kind of application is obviously PHP-based, PHPUnit will be the tool of choice, along with some of its standard extensions. For an overview of PHPUnit and its features, feel free to download my free ebook on the subject, which condenses much of the technical information about it into a mere 50 pages.

Let's start with the most debated and simple kind of testing: the one at the unit level.

Unit testing

Each unit test targets a unit of code in isolation, usually a class, and thus one or more objects instantiated from this class. The isolation property is what defines a unit test: its code must not depend on classes other than the one under test, since they should be tested independently, by their own test classes.

Since PHPUnit models a test case for a production code class as another class extending PHPUnit_Framework_TestCase, implementing unit testing very often leads to a parallel hierarchy of classes, where every Foo_Bar class has a corresponding Foo_BarTest test case.

Given these premises, a unit test that fails tells you immediately where the error is: in the class it exercises. Moreover, it will be very fast to execute, since it works on only a single object at a time.
Unit tests should mostly target your models, and any code written by you that is not framework-specific: these are also the classes that contain the majority of the business logic, and the most interesting ones to test. This code is usually composed of Plain Old PHP Objects, and of subclasses of framework or library base classes when they leave no other choice for integration.

For writing unit tests, usually no external library other than PHPUnit is necessary. In a Zend Framework application you can usually reuse the bootstrap files, which set up things like autoloading, through the phpunit --bootstrap option or by defining the bootstrap in the phpunit.xml configuration file. This way it will be executed only once for each test suite run.

I prefer to leave the initialization of the single components under test to the test cases themselves, to ensure maximum isolation. However, a simpler and standard solution is to just run the whole Bootstrap class with a custom configuration (application/config/application.ini), whose 'testing' environment section is created by default by Zend_Tool.

Pragmatic unit testing

That's not a standard name. In some cases, you should also be pragmatic: you usually cannot mock all the external resources, nor should you, since mocking a contract which you can't change can lead you to madness. You should configure a lightweight version of your dependencies and test with them.

For example, if you're using the Doctrine Object-Relational Mapper, you must test the interaction with the database somewhere, and mocking the whole Doctrine infrastructure would be prohibitive and useless. The standard practice here is to use the real Doctrine infrastructure to test database-coupled classes, like Repositories and Data Access Objects, but to instantiate a lightweight database, like an in-memory sqlite one, which is much faster in its operations than a production one.
This database can then be discarded or truncated at the end of each test to ensure no global state is shared between test cases.

The downside of this approach is that sqlite is not the real database. Once I was testing with it and, due to a bug (feature?) in Doctrine 1, the code failed in MySQL while passing with sqlite. The reason was that sqlite does not support foreign key constraints and was simply ignoring them, while MySQL correctly threw exceptions when they were violated. Moreover, these tests are never as fast as the ones totally isolated from external libraries.

The upside is that the tests for classes interacting with the database via Doctrine or another ORM still have the benefit of the unit level: when a test fails, it is clear that the related production code class has suffered a regression, because the ORM code is upgraded only at discrete, distant points in time, when the test suite is green, and so never changes while you're expanding your own code.

Nevertheless, this kind of testing should be applied only to the adapters of your application, which constitute the boundary of the object graph towards external components like databases, web services or the filesystem.

Functional testing

The goal of functional testing is to exercise a medium-sized object graph, without instantiating the whole application, to cover a full piece of functionality and make sure the classes adhere to the same contract. For example, these tests can target a service layer built upon your Domain Model, if you want to extend coverage to your factories or DI mechanisms. In other cases, they can target the controllers: this happens when you have supplemental logic on the client side.

For functional testing on plain old classes, PHPUnit suffices again. If you target controllers instead, the Zend_Test component gives you a Zend_Test_PHPUnit_ControllerTestCase class which you can extend to gain helper functionality.
Basically, every test method of a Zend_Test test case makes at least one HTTP request. The helper test case sets up fake HTTP request and response objects in every setUp(), and lets you check the result, be it HTML (via querying and asserting), XML or JSON.

Integration testing

Integration tests target an external component, such as a library, to ensure the developers' expectations of it are met. Integration tests usually start out as exploratory tests, which are used to learn about the library and to encapsulate this knowledge in a repeatable, executable form. With time, they become regression tests, which allow you to upgrade the library to a new release or version by catching the changes in behavior. Some of these tests target the PHP runtime itself, to check for example that an extension assumed to be present is really available.

For example, this week we were surprised when a === check inside a Domain Model class was failing. We started writing integration tests for Doctrine_Query, and it turned out that PDO and Doctrine returned strings for numeric fields on their Active Record. By having a specific test to cover our expectations, we understood where our assumption was wrong, and ceased to suspect a bug in our own code where the === resided.

For this kind of test, again only PHPUnit is necessary; you'll also have to bootstrap the library involved, but that can be simply a matter of adding it to the include_path.

Acceptance testing

Acceptance tests are end-to-end tests, which see the application as a black box. They exercise the behavior of the whole application, from the user inserting data to the reports created and the actions performed as a consequence. These tests are much slower, but they work on the end result of your work, and define what the user will see and interact with.

For old-style applications, which do not involve rich clients, Zend_Test is usually enough for these kinds of tests.
A thin layer of CSS expressions built on top of it, used to check the pages without duplicating the same selectors all over the suite, may help.

However, for JavaScript-rich apps, a tool like Selenium is necessary. Selenium drives a real web browser against a fresh instance of your application and executes your tests, which can be defined manually or via a record-and-replay browser extension. Many PHPUnit extensions offer the means to connect to a Selenium server, which manages the browsers, and to navigate the web application.

Because they run in real web browsers such as Firefox and Chrome, Selenium tests are much slower than Zend_Test ones. However, Selenium is the only tool mentioned here that can execute acceptance tests which involve JavaScript.

Conclusion

Note that every one of these kinds of tests (except the integration ones) can be written before the production code it exercises: unit tests ahead of the class they refer to; functional tests ahead of the Facade they target; acceptance tests before a whole vertical slice of functionality is implemented. Moreover, if you're doing Test-Driven Development you should in general start at the higher level of abstraction (acceptance) and descend into the lower levels as needed.

These different types of testing are always present, maybe as a small part of the suite, in every web application of moderate size. Learning to recognize them when they emerge will help you organize the test suite better and keep it productive and responsive to change.

August 29, 2010

by Giorgio Sironi

· 31,225 Views

Practical PHP Patterns: Optimistic Offline Lock

In this series we are now entering the realm of concurrency, which adds complexity to an application as many different threads of execution access the state storage at the same time. There is no native multithreading support in PHP (every script gets its own isolated process), but concurrency can still easily become an issue: multiple clients from all over the world continuously make requests to PHP applications, and they can easily overwrite each other's changesets.

A classic example of a race condition in the PHP world is two different clients filling in an editing form that refers to the same entity. They will both submit the form once it is complete, and the first one will get his changes overwritten by the second request.

There are many other common situations where concurrent users can provoke errors in the system. Just think of different people choosing the same username and being told via Ajax that it is available; the slower one of them will be surprised when he submits his registration form.

Background

Before explaining the patterns that emerged to solve concurrency problems, we need some definitions. First of all, the notion of transaction is necessary: we define a transaction as a change of state of the application. Editing an entity is a transaction; adding or deleting another is still a transaction.

The first kind of transaction we are interested in is the database transaction, which is carried out entirely within one PHP script. This is usually enforced automatically by mechanisms supported at the database level.

The other kind of transaction is the business transaction, which spans multiple HTTP requests and makes use of one to N database transactions. It comprises checking out data, populating a form or another kind of rich user interface, modifying or adding data (a human-based action), and sending it back to the server.
There is no automatic enforcement for business transactions, since they are defined by the business rules of the domain. This problem does not originate from the nature of PHP, but from the separation between client and server on which the web is based.

Optimistic lock

The Optimistic Offline Lock pattern is a way of ensuring the integrity of data by preventing different clients from committing conflicting changes. As the name suggests, it assumes that the chances of conflict are low. Indeed, when this is the case the optimistic lock does not slow down the user interaction at all. The goal of this pattern is to detect a conflicting change and, instead of applying it, roll back the business transaction and present an error to the user. It accomplishes this goal by validating that no one else has modified a record in the data source before allowing the modification to be committed.

Version control systems such as Subversion and Git take the optimistic approach: anyone may check out a source file and work on it, ending his little fork later with a commit or push. The pain comes while merging, so you are supposed to integrate often. We also borrow terminology from source control systems, so in this article you'll encounter terms like commit, checkout, and merge.

Implementation

The most common implementation of Optimistic Offline Lock is a numerical version field on the record to protect it from concurrency issues; to aid rollback notifications, additional fields are useful for describing the conflict, like the id of the last user that modified the record. The inner workings of the pattern are not complex: when the data is submitted along with the version field value kept on the client, that value must be the same as the one currently present in the database. Only then is the version incremented and the changeset committed.
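The version check described above can be sketched in a few lines. The article's subject is PHP, but the mechanism is language-independent, so this is a minimal Java sketch under stated assumptions: RecordStore is a hypothetical in-memory stand-in for the database table, and VersionedRecord and StaleVersionException are illustrative names that do not come from any library.

```java
import java.util.HashMap;
import java.util.Map;

/** A stored record: a payload plus the version number that guards it. */
final class VersionedRecord {
    final String content;
    final int version;

    VersionedRecord(String content, int version) {
        this.content = content;
        this.version = version;
    }
}

/** Thrown when the submitted version no longer matches the stored one. */
class StaleVersionException extends RuntimeException {
    StaleVersionException(String message) {
        super(message);
    }
}

/** Hypothetical in-memory stand-in for the table that holds the records. */
class RecordStore {
    private final Map<Integer, VersionedRecord> rows = new HashMap<>();

    synchronized void insert(int id, String content) {
        rows.put(id, new VersionedRecord(content, 1));
    }

    /** Checkout: the client keeps the returned version along with the data. */
    synchronized VersionedRecord checkout(int id) {
        return rows.get(id);
    }

    /**
     * Commit: compare and increment in one atomic step. In a relational
     * database the same effect is usually obtained with an UPDATE whose
     * WHERE clause includes the expected version.
     */
    synchronized VersionedRecord commit(int id, int expectedVersion, String newContent) {
        VersionedRecord current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            throw new StaleVersionException("Record " + id + " was modified by someone else");
        }
        VersionedRecord updated = new VersionedRecord(newContent, expectedVersion + 1);
        rows.put(id, updated);
        return updated;
    }
}

class OptimisticLockDemo {
    public static void main(String[] args) {
        RecordStore store = new RecordStore();
        store.insert(42, "original text");

        // Two clients check out the same record and both see version 1.
        VersionedRecord alice = store.checkout(42);
        VersionedRecord bob = store.checkout(42);

        store.commit(42, alice.version, "Alice's edit"); // succeeds, version becomes 2
        try {
            store.commit(42, bob.version, "Bob's edit"); // stale version, rejected
        } catch (StaleVersionException e) {
            System.out.println("Conflict detected: " + e.getMessage());
        }
    }
}
```

The important design point is that the comparison and the increment happen atomically; checking the version in one step and writing in another would reintroduce the race the pattern is meant to catch.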
Encountering a different version field value in the database record means someone else has modified the data between our checkout and commit, and so their change must be preserved. For example, we can show a diff to the user, like a VCS does; in any case, we should interrupt the transaction. RDBMSs and ORMs can simply use an additional column on the table where the root object is stored to support this pattern.

Alternatives

An alternative implementation consists of using all fields in the WHERE clause of the UPDATE (or only the sensitive ones, or only the modified ones, to let transactions that affect different fields succeed when the business logic allows it; see below). This solution is handy when you can't add a version field, but it may have a performance impact.

Another alternative is to check conditions instead of version fields, which is practical in different use cases. For example, we can check the existence of a record before deleting it. This is already done indirectly, to a certain extent, by ORMs and other abstraction layers when they provide you with an object you can call a delete() method on.

An extension of this pattern is checking whether the current editing session will (probably) commit, as a feature available at any time while the data is being edited. This feature should check that the checked-out data is still current.

The domain

An important point is that it is the job of the business domain logic to decide when a conflict occurs: some concurrent changesets may be acceptable, while others may not be allowed even if they modify different fields. In his book, Fowler gives the example of adding elements to a collection concurrently. We can't know whether this is right just by seeing that the object is a collection, because it is the abstraction it represents that must be kept valid: sometimes it is right to add elements, sometimes the transaction should be stopped.
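The "only the modified fields" alternative can be illustrated with a small sketch. It is written in Java and assumes that a record is represented as a map from field name to value; the class and method names are hypothetical. Whether concurrent edits to different fields may really be merged is, as noted above, a decision for the domain logic, so this shows only one possible policy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class FieldLevelConflictCheck {

    /**
     * Returns the merged row if every field the client changed still holds,
     * in the stored row, the value the client saw at checkout. Returns null
     * when at least one changed field was also changed by someone else.
     */
    static Map<String, String> tryMerge(Map<String, String> checkedOut,
                                        Map<String, String> submitted,
                                        Map<String, String> stored) {
        Map<String, String> merged = new HashMap<>(stored);
        for (Map.Entry<String, String> field : submitted.entrySet()) {
            String name = field.getKey();
            String before = checkedOut.get(name);
            boolean modifiedByClient = !Objects.equals(before, field.getValue());
            if (!modifiedByClient) {
                continue; // untouched field: keep whatever is stored now
            }
            if (!Objects.equals(before, stored.get(name))) {
                return null; // the same field was changed concurrently: conflict
            }
            merged.put(name, field.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> checkedOut = new HashMap<>();
        checkedOut.put("title", "Draft");
        checkedOut.put("body", "Hello");

        // Meanwhile another client has changed the title.
        Map<String, String> stored = new HashMap<>(checkedOut);
        stored.put("title", "Final");

        // Our client changed only the body, so both changes can coexist.
        Map<String, String> submitted = new HashMap<>(checkedOut);
        submitted.put("body", "Hello world");

        System.out.println(tryMerge(checkedOut, submitted, stored));
    }
}
```

A domain that considers title and body interdependent would simply not use this policy and fall back to rejecting any commit against changed data.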
Merging strategies, which resolve the conflicts, are also susceptible to domain considerations. Some are valuable and should be pursued, while others are costly, and a rollback followed by manual editing of the data by the user is fine.

Advantages

The greatest advantage of this pattern is that it can support real-time concurrency, like the checkout of multiple items by multiple users simultaneously, as long as there is a merging strategy in place. It can also easily prevent race conditions. This pattern is also easy to implement, and thus it is the default choice to solve concurrency issues.

Disadvantages

When the conflict probability is high because there are many concurrent transactions, this pattern produces too many rollbacks. It is not adequate for such use cases, where a pessimistic pattern should be adopted instead.

Examples

Doctrine 2 natively uses database transactions: it only commits the changes made in a PHP script when the EntityManager::flush() method is called, and it automatically rolls back if an error is detected. Besides that, Doctrine 2 also has automatic Optimistic Offline Locking support, via the addition of a version field to the entity to lock.

August 17, 2010

by Giorgio Sironi

· 4,470 Views · 1 Like

Practical PHP Patterns: Data Transfer Object

Information can travel in very different ways between the parts of an application, which can be tricky when these parts are distributed over different tiers, or times. The Data Transfer Object pattern prescribes using a first-class citizen like a class to model the data used in the communication. This pattern was originally meant to aid the distribution of logic: coarse-grained interfaces, which are ideal for remote interaction, must return more data in each call to reduce the number of messages sent over the network and their overhead. This kind of object is built with the goal of being easy to serialize and send over the wire.

In the PHP world, the communication is usually either from server to server, or even targets the same server: PHP is very different from other languages that are used for application distribution. Commonly the serialization and unserialization of objects are accomplished by the same codebase, which has a short life (the time of an HTTP request) and stores data with the serialization mechanism to maintain state between different requests.

As a result, the usual discourse on dependencies, which prescribes using a Data Mapper between the Domain Model and the DTOs, does not apply; to the point that sometimes even domain model objects are serialized directly (an ORM like Doctrine 2 allows you to do so). There are more differences from classic Java DTOs, which we'll see in this article.

Implementation

The definition of Data Transfer Object talks about communication between different processes: subsequent executions of the same PHP script are indeed different processes (or, when the same process is reused, it does not share any variable space with the previous executions). PHP is also peculiar from the implementation point of view: a DTO may not even be an object, since an ordinary or multidimensional array will do the same job in many cases.
However, an object implementation is necessary to take advantage of object handlers, where the structure of data is not hierarchical but involves a connected graph of objects or recursive data structures. This implementation choice is much clearer than using arrays and variable references, which can be tricky in PHP.

Basically, a DTO is the object we have been told never to write: it has getters and setters to expose all of its properties, while providing nearly no encapsulation or business logic. It's a data structure, like a Value Object (which was its name in certain Java literature). But it is not immutable, nor does it carry the semantic meaning of Value Objects.

Use cases

Data Transfer Objects come in handy in many use cases. In their simplest implementation, they are used for returning multiple values from a method. Their further evolution then involves serialization mechanisms; of course both the origin and the destination of a Data Transfer Object need to be a PHP environment, a fact which opens up different scenarios from the classical Java distribution ones, which involve clients as well. For example, an easily serializable Data Transfer Object can be stored in caches (memcache) or sessions ($_SESSION).

Other use cases regard databases: according to Fowler, Record Sets like PDOStatement can be defined as the Data Transfer Objects of relational databases. There are even more scenarios for Data Transfer Objects, like clients which return a serialized data structure, or their usage for breaking dependencies between different layers: a Controller may populate a DTO and pass it to the View.

Serialization

As you may have experienced while using var_dump() in PHP, the objects of a Domain Model are usually interconnected in a complex graph; DTOs isolate a subset of the graph and make it easy to serialize, even by omitting part of the information.
Thus, serialization of domain objects needs some complex infrastructure (lazy loading proxies which are discarded on serialization and must be re-initialized during the merge with a current object graph); sometimes it is far easier to extract data into a DTO, especially when you do not have particular libraries at your disposal. Once you have an isolated object graph, PHP will handle serialization by itself, even without marker interfaces; it will simply include all the public, protected and private properties (this behavior can be modified by specifying a __sleep() method).

Example

This running code shows a small domain model composed of the User and Group classes, and how a DTO for the User class allows us to serialize a User without pulling in its Groups, for example for quickly storing it in a cache.

<?php
/**
 * A Domain Model class.
 */
class User
{
    private $_name;
    private $_role;
    private $_groups = array();

    /**
     * @return string
     */
    public function getName()
    {
        return $this->_name;
    }

    public function setName($name)
    {
        $this->_name = $name;
    }

    /**
     * @return string
     */
    public function getRole()
    {
        return $this->_role;
    }

    public function setRole($role)
    {
        $this->_role = $role;
    }

    public function addGroup(Group $group)
    {
        // append to the collection instead of overwriting a single property
        $this->_groups[] = $group;
    }

    public function getGroups()
    {
        return $this->_groups;
    }
}

/**
 * Another Domain Model class.
 */
class Group
{
    private $_name;

    // the client code below passes the name to the constructor
    public function __construct($name = null)
    {
        $this->_name = $name;
    }

    /**
     * @return string
     */
    public function getName()
    {
        return $this->_name;
    }

    public function setName($name)
    {
        $this->_name = $name;
    }
}

/**
 * The Data Transfer Object for User.
 * It stores the mandatory data for a particular use case - for example ignoring the groups,
 * and ensuring easy serialization.
 */
class UserDTO
{
    private $_name;
    private $_role;

    /**
     * In more complex implementations, the population of the DTO can be the responsibility
     * of an Assembler object, which would also break any dependency between User and UserDTO.
     */
    public function __construct(User $user)
    {
        $this->_name = $user->getName();
        $this->_role = $user->getRole();
    }

    public function getName()
    {
        return $this->_name;
    }

    public function getRole()
    {
        return $this->_role;
    }

    // there are no setters because this use case does not require modification of data
    // however, in general DTOs do not need to be immutable.
}

// client code
$user = new User();
$user->setName('Giorgio');
$user->setRole('Author');
$user->addGroup(new Group('Authors'));
$user->addGroup(new Group('Editors'));
// many more groups
$dto = new UserDTO($user);
// this value is what will be stored in the session, or in a cache...
var_dump(serialize($dto));
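Since the article contrasts PHP with classic Java DTOs, here is a minimal Java sketch of the same idea for comparison. It assumes a hypothetical UserDtoAssembler playing the role of the Assembler mentioned in the code comment above, and it uses standard Java object serialization. Unlike PHP, Java does require a marker interface (Serializable) on the DTO, while the domain classes can stay free of it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Domain class; deliberately not Serializable. */
class Group {
    final String name;

    Group(String name) {
        this.name = name;
    }
}

/** Domain class holding a reference to a larger object graph. */
class User {
    private String name;
    private String role;
    private final List<Group> groups = new ArrayList<>();

    String getName() { return name; }
    void setName(String name) { this.name = name; }
    String getRole() { return role; }
    void setRole(String role) { this.role = role; }
    void addGroup(Group group) { groups.add(group); }
    List<Group> getGroups() { return groups; }
}

/** The DTO carries only the data needed for this use case. */
class UserDto implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;
    private final String role;

    UserDto(String name, String role) {
        this.name = name;
        this.role = role;
    }

    String getName() { return name; }
    String getRole() { return role; }
}

/** Hypothetical Assembler: the DTO itself no longer depends on User. */
class UserDtoAssembler {
    static UserDto toDto(User user) {
        return new UserDto(user.getName(), user.getRole());
    }
}

class DtoSerializationDemo {
    static byte[] serialize(UserDto dto) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(dto);
        }
        return buffer.toByteArray();
    }

    static UserDto deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (UserDto) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        User user = new User();
        user.setName("Giorgio");
        user.setRole("Author");
        user.addGroup(new Group("Authors"));

        // Only the DTO is serialized; the groups never enter the byte stream.
        byte[] stored = serialize(UserDtoAssembler.toDto(user));
        UserDto restored = deserialize(stored);
        System.out.println(restored.getName() + " / " + restored.getRole());
    }
}
```

Moving the copying logic into the assembler is the design choice hinted at in the PHP comment: neither User nor UserDto needs to know about the other.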

August 12, 2010

by Giorgio Sironi

· 95,357 Views · 5 Likes

Code Generation With Xtext

Recently I attended a local rheinJUG meeting in Düsseldorf. While the topic of the session was Eclipse e4, the night’s sponsor itemis provided some handouts on Xtext which got me very interested. The reason is that at work we are currently developing a mobile Java application (J9, CDC/Foundation 1.1 on Windows CE6) for which we needed an easy-to-use and reliable way of configuring navigation through the application. In a previous iteration we had, mostly because of time constraints, hard-coded most of the navigational paths, but this time the app is more complex and doing that again was not really an option. First we thought about an XML-based configuration, but this seemed to be a hassle to write (and read) and would also mean paying the price of parsing it on every application startup.

Enter Xtext: an Eclipse-based framework/library for building text-based DSLs. In short, you just provide a grammar description of a new DSL to suit your needs and with, literally, just a few mouse clicks you are provided with an Eclipse editor offering content assist, syntax highlighting and an outline view, and optionally a code generator based on that language.

Getting started: Sample Grammar

There is a nice tutorial provided as part of the Xtext documentation, but I believe it might be beneficial to provide another example of how to put a DSL to good use. I will not go into every step in great detail, because setting up Xtext in Eclipse 3.6 Helios is just a matter of putting in an Update Site URL, and the New Project wizard makes the initial setup a snap.

I assume you have already set up Eclipse and Xtext and created a new Xtext project including a generator project (activate the corresponding checkbox when going through the wizard). In this post I am assuming a project name of com.danielschneller.navi.dsl and a file extension of .navi.
When finished we will have the infrastructure ready for editing, parsing and generating code based on files like these:

navigation rules for MyApplication

mappings {
    map permission AdminPermission to "privAdmin"
    map permission DataAccessPermission to "privData"

    map coordinate Login to "com.danielschneller.myapp.gui.login.LoginController" in "com.danielschneller.myapp.login"
    map coordinate LoginFailed to "com.danielschneller.myapp.gui.login.LoginFailedController" in "com.danielschneller.myapp.login"
    map coordinate MainMenu to "com.danielschneller.myapp.gui.menu.MainMenuController" in "com.danielschneller.myapp.menu"
    map coordinate UserAdministration to "com.danielschneller.myapp.gui.admin.UserAdminController" in "com.danielschneller.myapp.admin"
    map coordinate DataLookup to "com.danielschneller.myapp.gui.lookup.LookupController" in "com.danielschneller.myapp.lookup"
}

navigations {
    define navigation USER_LOGON_FAILED
    define navigation USER_LOGON_SUCCESS
    define navigation OK
    define navigation BACK
    define navigation ADMIN
    define navigation DATA_LOOKUP
}

navrules {
    from Login
        on navigation USER_LOGON_FAILED go to LoginFailed
        on navigation USER_LOGON_SUCCESS go to MainMenu
    from LoginFailed
        on navigation OK go to Login
    from MainMenu
        on navigation ADMIN go to UserAdministration with AdminPermission
        on navigation DATA_LOOKUP go to DataLookup with DataAccessPermission
    from UserAdministration
        on navigation BACK go to MainMenu
    from DataLookup
        on navigation BACK go to MainMenu
}

As you can see it is a nice little language for defining coordinates in an application (each meaning a specific GUI for a certain task) and the possible navigation paths between them. Optionally a navigation path can be tagged to require one or more permissions to work.
So for example one possible navigation path shown in the above sample leads from the application's main menu, identified by MainMenu and represented in code by the com.danielschneller.myapp.gui.menu.MainMenuController class in the com.danielschneller.myapp.menu OSGi bundle, to a GUI identified as DataLookup, implemented by com.danielschneller.myapp.gui.lookup.LookupController in the com.danielschneller.myapp.lookup bundle. For this path to be taken, the application must request the DATA_LOOKUP navigation and the currently logged-in user must be assigned the DataAccessPermission.

What exactly that means is not the focus of this tutorial; suffice it to say that we somehow need to get the information contained in this specialized language into our Java application in some shape or form that can be evaluated at runtime. In the following example all information will be transformed into a HashMap-based data structure. For our little mobile application this has several advantages over the XML option mentioned earlier:

- No XML parsing necessary on application startup, saving some performance
- Validation of the navigation rules ahead of time, preventing parse errors at runtime
- No libraries needed to access the information: by putting everything in a simple HashMap we do not have to rely on any non-standard classes whatsoever

The first thing I did when I started with Xtext was to define a sample input file such as the one above. Then, following its general structure, I began to extract a formal grammar for it. Of course, the first draft of the sample data was not perfect; over the course of a few iterations I refined some of the syntax, but in the end this is the grammar definition I came up with.
It is heavily commented to allow you to copy it out and still leave the documentation intact:

grammar com.danielschneller.navi.NavigationRules with org.eclipse.xtext.common.Terminals

generate navigationRules "http://com.danielschneller/fw/funkmde/navi/NavigationRules"

/*
 * The top level entry point for the file.
 * "Root" is just a name as good as any, but
 * makes the meaning quite clear.
 */
Root:
    // first thing in the file is a "keyword",
    // followed by an attribute that will be
    // accessible as "name" later and allow
    // definition of an ID type of thing.
    'navigation rules for' name=ID
    // after the keyword and "name" attribute
    // three sections follow, each assigned
    // to an attribute for later reference
    // (called "mappingsdefs", "transitiondefs"
    // and "ruledefs").
    // Their types are defined later in the file.
    mappingsdefs=Mappings
    transitiondefs=TransitionDefinitions
    ruledefs=NavigationRules
    // semicolon ends the definition of "Root"
;

// mappings section >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
/*
 * Definition of the "Mappings" type used in
 * the "Root" type.
 */
Mappings:
    // first the keyword "mappings" is expected,
    // then an open curly
    'mappings' '{'
    // after that a collection of "Mapping"s is
    // expected. The "+=" means that they will
    // all be collected in a collection type element
    // called "mappings" for future reference.
    // The "+" at the end means "at least one, but
    // more is just fine".
    (mappings+=Mapping)+
    // finally the "Mappings" type requires a closing
    // curly brace.
    '}'
    // semicolon ends the definition of "Mappings"
;

/*
 * Definition of a single "Mapping", those we are
 * collecting in the "mappings" attribute of the
 * "Mappings" type.
 */
Mapping:
    // each mapping starts with the keyword "map"
    // and is followed by an element of type "MappingSpec"
    'map' MappingSpec
;

/*
 * Definition of a "MappingSpec" element. This is
 * actually just a "parent type" for two more specific
 * kinds of "MappingSpec":
 */
MappingSpec:
    // no keywords are defined here, a "MappingSpec"
    // can be either a "PermissionMappingSpec" or a
    // "CoordinateMappingSpec". Any of these will be
    // fine where a "MappingSpec" is asked for.
    PermissionMappingSpec | CoordinateMappingSpec
;

/*
 * Definition of a "PermissionMappingSpec" element.
 */
PermissionMappingSpec:
    // first the keyword "permission" is required.
    // then a "name" attribute is expected of type ID.
    // Following the name the "to" keyword is expected,
    // followed by a string that is stored in the "value"
    // attribute
    'permission' name=ID 'to' value=STRING
;

/*
 * Definition of a "CoordinateMappingSpec" element.
 * The definition is very similar to the "PermissionMappingSpec"
 * but has more attributes.
 */
CoordinateMappingSpec:
    // first the keyword "coordinate", then an ID stored as "name",
    // the keyword "to", followed by a string stored as "controllername",
    // next the keyword "in" and finally another string, memorized as
    // "bundleid"
    'coordinate' name=ID 'to' controllername=STRING 'in' bundleid=STRING
;
// <<<<<<<<<<<<<<<<<<<<<<<<<<<<< mappings section

// >>>>>>>>>>>>>>>>>>>>>>>>>>>>> navigations section
/*
 * Definition of the "TransitionDefinitions" type used in
 * the "Root" type.
 */
TransitionDefinitions:
    // first, this element is introduced with the "navigations"
    // keyword, followed by an open curly brace.
    'navigations' '{'
    // after that a collection of "TransitionDefinition"s is
    // expected. The "+=" means that they will
    // all be collected in a collection type element
    // called "transitions" for future reference.
    // The "+" at the end means "at least one, but
    // more is just fine".
    (transitions+=TransitionDefinition)+
    // the element ends with a closing curly brace
    '}'
;

/*
 * Definition of a "TransitionDefinition" element. This
 * one is very simple.
 */
TransitionDefinition:
    // the keyword "define navigation" is required first,
    // then a "name" attribute of type ID is expected.
    'define navigation' name=ID
;
// <<<<<<<<<<<<<<<<<<<<<<<<<<<<< navigations section

// >>>>>>>>>>>>>>>>>>>>>>>>>>>>> navrules section
/*
 * Definition of the "NavigationRules" element.
 */
NavigationRules:
    // Element starts with the keywords "navrules" and
    // open curly.
    'navrules' '{'
    // collection attribute called "rules", consisting
    // of one or more occurrences of a "Rule" element.
    (rules+=Rule)+
    // element finishes with a closing curly keyword
    '}'
;

/*
 * Definition of a "Rule" element as used in the "NavigationRules"
 * element.
 */
Rule:
    // first the "from" keyword, then a reference to one of the
    // coordinate mappings defined earlier. This time no new
    // definition of a coordinate is required, but one of those
    // that have been listed before. So the type here is put in
    // square brackets
    'from' source=[CoordinateMappingSpec]
    // following the source specification, one or more "Destination"
    // type elements are expected, collected in a collection attribute
    // named "destinations"
    (destinations+=Destination)+
;

/*
 * Definition of a "Destination" type. These are collected
 * in a "Rule".
 */
Destination:
    // first comes an "on navigation" keyword. After that a
    // reference to one of the Transition elements defined
    // in the "navigations" section is required and stored
    // in the "transition" attribute.
    // after that follows a "go to" keyword and a reference
    // to a coordinate mapping, stored in the "target" attribute.
    // finally - as with the "destinations" collection attribute
    // in the "Rule" element - a "permissions" collection is
    // defined to store none or more (*) "PermissionReference"
    // elements.
    'on navigation' transition=[TransitionDefinition]
    'go to' target=[CoordinateMappingSpec]
    (permissions+=PermissionReference)*
;

/*
 * Definition of a "PermissionReference" type. This is used
 * in the "permissions" collection of a "Destination".
 */
PermissionReference:
    // first, a "with" keyword is expected. After that a
    // "permission" attribute stores a reference to one of
    // the previously defined permission mappings from the
    // "mappings" section.
    'with' permission=[PermissionMappingSpec]
;
// <<<<<<<<<<<<<<<<<<<<<<<<<<<<< navrules section

This is what Xtext can digest to create an editor plugin and an outline view. Just save this as navigationRules.xtext; when you created the Xtext project in Eclipse using the wizard, this file should have been prepared for you. Copying and pasting the grammar into a .xtext file in Eclipse will provide you with syntax highlighting, code completion and syntax checking, making it easy to play around with grammar files.

Once done, right click the .mwe2 file lying next to the grammar file in the Package Explorer view and select Run As MWE2 Workflow from the context menu. This will take a moment and generate several classes, both in the current (Xtext) project and the accompanying ...ui project.

Next, right click the Xtext project and select Run As Eclipse Application from the context menu. This will bring up another Eclipse instance with the newly created support for navigation rules files (with a .navi suffix) installed. To try it out, just create a new project and in it a new file. Make sure its name ends in .navi. When asked, make sure to accept adding the Xtext nature to the project.

You will be presented with a new, empty editor that already has an error marker in it. This is because, according to our grammar definition, an empty file does not comply with all the rules we specified. Try hitting the code-completion shortcut (Ctrl-Space) twice and see what happens: The first code completion fills in the navigation rules for part. According to the grammar this is the only valid text at the beginning of a file, so it is automatically inserted. Hitting Ctrl-Space again will tell you that now you need a Name of type ID. Just go ahead and try out the completion.
It will help you create a syntactically sound navigation rules file. Notice that the Problems View tells you what is currently wrong. Also notice that once you reach a part where references are expected by the grammar (e.g. when defining source and destination coordinates in a navigation rule) you will get suggestions based on what you entered earlier. This is what the whole sample from above looks like in the editor:

While you are still fleshing out and fine-tuning your grammar definitions, you will probably close this Eclipse instance and reopen it once you have repeated the Run As MWE2 Workflow steps in the main instance. In the long run I suggest you create a Feature and an Update Site project to allow easier distribution and updates of the intermediate iterations.

Generating Code

Now that we have a complete Xtext DSL defined and in place, let’s have a look at the code generation side of things. This part is completely optional: you are free to include the necessary Xtext libraries in your application's runtime (although they seem to be numerous) and just use them to dynamically load and parse .navi files on the fly. This would probably be a good idea if you were writing an Eclipse-based application anyway. However, when targeting a very limited platform like JavaME this option is not viable.

Instead we will now create a code generator that provides a transformation from the DSL syntax into more classic Java terms. Specifically, we will create a HashMap-based data structure that carries all the same information. This is a sample of what the generated output is going to look like:

public class NaviRules {

    private Map navigationRules = new Hashtable();

    // ...

    public NaviRules() {
        NaviDestination naviDest;

        // ========== From Login (com.danielschneller.myapp.gui.login.LoginController)
        // ========== On USER_LOGON_FAILED
        // ========== To LoginFailed (com.danielschneller.myapp.gui.login.LoginFailedController in com.danielschneller.myapp.login)
        naviDest = new NaviDestination();
        naviDest.action = "USER_LOGON_FAILED";
        naviDest.targetClassname = "com.danielschneller.myapp.gui.login.LoginFailedController";
        naviDest.targetBundleId = "com.danielschneller.myapp.login";
        store("com.danielschneller.myapp.gui.login.LoginController", naviDest);

        // ========== On USER_LOGON_SUCCESS
        // ========== To MainMenu (com.danielschneller.myapp.gui.menu.MainMenuController in com.danielschneller.myapp.menu)
        naviDest = new NaviDestination();
        naviDest.action = "USER_LOGON_SUCCESS";
        naviDest.targetClassname = "com.danielschneller.myapp.gui.menu.MainMenuController";
        naviDest.targetBundleId = "com.danielschneller.myapp.menu";
        store("com.danielschneller.myapp.gui.login.LoginController", naviDest);
        // =============================================================================

        // ========== From LoginFailed (com.danielschneller.myapp.gui.login.LoginFailedController)
        // ========== On OK
        // ========== To Login (com.danielschneller.myapp.gui.login.LoginController in com.danielschneller.myapp.login)
        naviDest = new NaviDestination();
        naviDest.action = "OK";
        naviDest.targetClassname = "com.danielschneller.myapp.gui.login.LoginController";
        naviDest.targetBundleId = "com.danielschneller.myapp.login";
        store("com.danielschneller.myapp.gui.login.LoginFailedController", naviDest);

        // .... and so on ...
    }
}

The support class NaviDestination is omitted, but it is generally just a struct-like value holder class.

When creating the Xtext project using the wizard earlier we created a third Eclipse project, ending in ...generator. Its src folder contains three subdirectories called model, templates and workflow. Put the sample .navi file into the model directory.
It will serve as the input for the generator.

Create the first template

Code generation is based on templates. Xtext leverages the Xpand template engine. In the templates directory create a new Xpand template using the context menu. Call it NaviRules.xpt, open it and insert the following:

«REM»
    import the namespace defined in our DSL model
«ENDREM»
«IMPORT navigationRules»

«REM»
    Define a template called "main" for elements of type "Root".
    The minus sign at the end takes care of not adding a newline
    at the end of it.
«ENDREM»
«DEFINE main FOR Root-»
«ENDDEFINE»

As there is only one instance of a Root element in a navigation rules file, this will be the main entry point, hence the name. There is no need to call it main, but it seems fitting. Now, between the DEFINE and ENDDEFINE, insert what is to be generated. As shown above, we need a new Java source file called NaviRules.java:

...
«DEFINE main FOR Root-»
«FILE "NaviRules.java"-»
«ENDFILE-»
«ENDDEFINE»
...

Again, the content to be generated is put between the FILE and ENDFILE brackets. Anything not enclosed in «» will be used verbatim in the output file. So first of all, put in the static parts of the Java file. What I did was first write the source for a single navigation rule by hand, make sure it compiled, and then copy the relevant parts into the template piece by piece:

...
«FILE "NaviRules.java"-»
import java.util.*;

public class NaviRules {

    public static class NaviDestination {
        String action;
        List requiredPermissions = new ArrayList();
        String targetClassname;
        String targetBundleId;

        NaviDestination() {}

        public final List getRequiredPermissions() {
            return new ArrayList(requiredPermissions);
        }

        // let Eclipse generate getters, setters,
        // equals and hashCode methods for this
    }

    private Map navigationRules = new Hashtable();
«ENDFILE-»
...

Now, this is nothing special so far. To fill in the elements from the navigation rules DSL file put in the following:

...
    private Map navigationRules = new Hashtable();

    public NaviRules() {
        NaviDestination naviDest;
        «REM»
            Iterate all elements in the "rules" collection attribute
            of the "ruledefs" attribute of the "Root" element. Call each
            iterated element (which is of type "Rule") "r" and expand
            the "ruletmpl" template for it here.
        «ENDREM»
        «FOREACH ruledefs.rules AS r»«EXPAND ruletmpl FOR r»«ENDFOREACH»
    }
...

In the class constructor we first define a local variable naviDest of the previously declared type. Then, as the comment states, the FOREACH instruction iterates over all Rule type elements. This might not be completely obvious at first. Remember that at this point in the template the current scope is the "Root" element from the navigation rules file. It has an attribute called ruledefs as per the grammar definition. This attribute is of type NavigationRules, which in turn has a collection attribute called rules, consisting of Rule type objects. Inside the loop the current element can then be addressed by the template variable name r. The loop body (between FOREACH and ENDFOREACH) contains another Xpand instruction to expand a template called ruletmpl, which will be declared next.

Don't worry, even though this is a little difficult at first: switching contexts between the Java and the template scopes is made significantly easier in Eclipse, because the Xpand template editor applies syntax coloring (static parts are blue) and also assists you with code completion inside the Xpand template parts. Ctrl-Spacing your way through it will make things more obvious than they are when reading an example.

Now for the ruletmpl template. Place it below the ENDDEFINE statement belonging to the main template:

...
«ENDFILE-»
«ENDDEFINE»

«DEFINE ruletmpl FOR Rule-»
    // ========== From «source.name» («source.controllername»)
    «FOREACH destinations AS d»«EXPAND destTmpl(source) FOR d»«ENDFOREACH»
    // =============================================================================
«ENDDEFINE»

You see the same idea used again: static parts that get transferred into the output file 1:1, and Xpand statements that fill in data from the navigation rules definition file. In this case you see references to the attributes of the Rule element. As per the FOREACH instruction in the previous template, the one at hand will be repeated for every instance of Rule in our source file. Inside this definition the current scope is that of Rule, so with «source.name» the name attribute of the CoordinateMappingSpec object referenced as source in a Rule is taken first, then the controllername attribute likewise.

Next up, another FOREACH loop iterates over the one or more Destinations of each Rule. Instead of just applying a template (destTmpl) for every Destination, we also pass in the corresponding CoordinateMappingSpec stored in the source attribute of the Rule. This is then used in the following template:

...
«DEFINE destTmpl(CoordinateMappingSpec source) FOR Destination-»
    // ========== On «transition.name»
    // ========== To «target.name» («target.controllername» in «target.bundleid»)
    naviDest = new NaviDestination();
    naviDest.action = "«transition.name»";
    naviDest.targetClassname = "«target.controllername»";
    naviDest.targetBundleId = "«target.bundleid»";
    «FOREACH permissions AS p»«EXPAND permTmpl FOR p»«ENDFOREACH»
    store("«source.controllername»", naviDest);
«ENDDEFINE»

«DEFINE permTmpl FOR PermissionReference-»
    naviDest.requiredPermissions.add("«permission.value»");
«ENDDEFINE»

In these innermost templates the attributes of the CoordinateMappingSpec objects source and target are accessed and assigned to the members of a NaviDestination Java object, one instance per Destination.
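The generated constructor calls a store(...) helper that the article elides. Purely as an illustration, the following Java sketch shows one way such a helper and a matching runtime lookup could work. Everything in it is an assumption: the class names NaviRulesSketch and NaviDestinationSketch, the lookup method and its signature are hypothetical, and it uses generics and Java 8 collection methods for brevity, which the article's Java 1.4 class target would not allow.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical counterpart of the value holder filled by the generated code. */
class NaviDestinationSketch {
    String action;
    String targetClassname;
    String targetBundleId;
    final List<String> requiredPermissions = new ArrayList<>();
}

/** Assumed shape of the map-based structure; not the article's actual class. */
class NaviRulesSketch {
    // source controller class name -> all destinations reachable from it
    private final Map<String, List<NaviDestinationSketch>> navigationRules = new HashMap<>();

    /** Appends a destination to the list kept for the given source controller. */
    void store(String sourceClassname, NaviDestinationSketch destination) {
        navigationRules.computeIfAbsent(sourceClassname, key -> new ArrayList<>()).add(destination);
    }

    /**
     * Finds the destination for a source and action, provided the user holds
     * every permission the rule requires. Returns null if nothing matches.
     */
    NaviDestinationSketch lookup(String sourceClassname, String action, Set<String> grantedPermissions) {
        List<NaviDestinationSketch> candidates =
                navigationRules.getOrDefault(sourceClassname, Collections.emptyList());
        for (NaviDestinationSketch candidate : candidates) {
            if (candidate.action.equals(action)
                    && grantedPermissions.containsAll(candidate.requiredPermissions)) {
                return candidate;
            }
        }
        return null;
    }
}

class NaviRulesSketchDemo {
    public static void main(String[] args) {
        NaviRulesSketch rules = new NaviRulesSketch();
        NaviDestinationSketch admin = new NaviDestinationSketch();
        admin.action = "ADMIN";
        admin.targetClassname = "com.danielschneller.myapp.gui.admin.UserAdminController";
        admin.targetBundleId = "com.danielschneller.myapp.admin";
        admin.requiredPermissions.add("privAdmin");
        rules.store("com.danielschneller.myapp.gui.menu.MainMenuController", admin);

        Set<String> granted = new HashSet<>();
        granted.add("privAdmin");
        NaviDestinationSketch found =
                rules.lookup("com.danielschneller.myapp.gui.menu.MainMenuController", "ADMIN", granted);
        System.out.println(found == null ? "navigation denied" : "go to " + found.targetClassname);
    }
}
```

Keeping a list per source controller matches the generated code, which calls store several times with the same key when a Rule has more than one Destination.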
There is only one more (very simple) template for the PermissionReference elements. With this, the Xpand file is complete.

Set Up The Generator Workflow

The wizard initially created a NavigationRulesGenerator.mwe2 file in the workflow folder. Open it and replace its contents with the following:

    module workflow.NavigationRulesGenerator

    import org.eclipse.emf.mwe.utils.*

    var targetDir = "src-gen"
    var fileEncoding = "Cp1252"
    var modelPath = "src/model"

    Workflow {
        component = org.eclipse.xtext.mwe.Reader {
            path = modelPath
            // this class has been generated by the xtext generator
            register = com.danielschneller.navi.NavigationRulesStandaloneSetup {}
            load = {
                slot = "root"
                type = "Root"
            }
        }
        component = org.eclipse.xpand2.Generator {
            metaModel = org.eclipse.xtend.typesystem.emf.EmfRegistryMetaModel {}
            expand = "templates::NaviRules::main FOREACH root"
            outlet = {
                path = targetDir
            }
            fileEncoding = fileEncoding
        }
    }

The most interesting parts of this workflow file are the load section in the Reader component and the expand and outlet sections in the Generator component. The first one will connect a so-called slot with the Root element from our navigation rules. The second one will trigger the evaluation of the main template in the NaviRules.xpt file in the templates folder and feed any Root instances it finds in the *.navi files from the src/model folder (modelPath) into it. Now it is time for some actual generation.

Run the generator workflow

Right click the MWE2 file you just edited and select the Run As MWE2 Workflow command from the context menu. The Eclipse console will show this output:

    0    [main] DEBUG org.eclipse.xtext.mwe.Reader - Resource Pathes : [src/model]
    431  [main] DEBUG xt.validation.ResourceValidatorImpl - Syntax check OK! Resource: file:/Users/ds/ws/ws36_xtext/com.danielschneller.navi.dsl.generator/src/model/MyApp.navi
    1013 [main] INFO org.eclipse.xpand2.Generator - Written 1 files to outlet [default](src-gen)
    1014 [main] INFO .emf.mwe2.runtime.workflow.Workflow - Done.

Then have a look at the newly generated contents of the src-gen source folder. If everything went all right, you should find a fresh NaviRules.java file placed there, based on the contents of your navigation rules file and the Xpand templates. Try making some changes to the template, then re-run the workflow. You will see the changes reflected in the generated source file.

Generate a second source File

In the templates directory add another Xpand template file Navigation.xpt with the following content:

    «IMPORT navigationRules»;

    «DEFINE main FOR Root-»
    «FILE "Navigation.java"-»
    public final class Navigation {
        «FOREACH ruledefs.rules.destinations.transition.collect(e|e.name).toSet().sortBy(e|e) AS t»«EXPAND actionTmpl FOR t»«ENDFOREACH»
        private final String name;

        private Navigation(String aName) {
            name = aName;
        }

        public String getName() {
            return name;
        }
    }
    «ENDFILE-»
    «ENDDEFINE»

    «DEFINE actionTmpl FOR String-»
        /** Constant for Navigation «this» */
        public static final Navigation «this» = new Navigation("«this»");
    «ENDDEFINE»

This is a template for a type-safe enumeration that can be used in Java 1.4 - remember I had to do this for JavaME. Notice the FOREACH loop in this case. It demonstrates that not only simple iterations are possible, but that Xpand allows more complex operations as well. In this case it will collect the names of all the navigation transitions from all the Destinations in the navigation rules. These are of type String. They are made unique by converting them to a Set data structure and then finally sorted in their natural order. The resulting list of sorted strings is then iterated, and each one - called t - is passed to the actionTmpl template.
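For readers more at home in Java than in Xpand expressions, the collect / toSet / sortBy chain can be mirrored in a few lines of plain Java. This is only a sketch of the idea, not what Xpand does internally: the class name TransitionNames, its two helper methods and the sample transition names are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper mirroring collect(e|e.name).toSet().sortBy(e|e) in plain Java.
final class TransitionNames {

    /** Returns the distinct names in their natural (alphabetical) order. */
    static List<String> uniqueSorted(List<String> names) {
        // A TreeSet drops duplicates and keeps its elements sorted.
        return new ArrayList<String>(new TreeSet<String>(names));
    }

    /** Renders one constant declaration, similar to what actionTmpl emits. */
    static String constantFor(String name) {
        return "public static final Navigation " + name
                + " = new Navigation(\"" + name + "\");";
    }

    public static void main(String[] args) {
        // Assumed sample data: transition names as they might occur across several rules.
        List<String> raw = Arrays.asList("OK", "BACK", "ADMIN", "OK", "DATA_LOOKUP", "BACK");
        for (String name : uniqueSorted(raw)) {
            System.out.println(constantFor(name));
        }
    }
}
```

Duplicates such as the second OK and BACK collapse into one entry, so each transition name yields exactly one constant.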
The actionTmpl template is very simple, just placing the string itself («this») into a single line of Java source code. Of course, strictly speaking this is a rather complicated procedure to get the same information we could also have taken from the TransitionDefinitions element in the rules definition. However, I think it serves as a nice example of additional Xpand capabilities. For a full description of its possibilities, have a look at the Xpand Reference in the Eclipse documentation.

To use the new template, add another section to the MWE2 workflow definition:

    component = org.eclipse.xpand2.Generator {
        metaModel = org.eclipse.xtend.typesystem.emf.EmfRegistryMetaModel {}
        expand = "templates::Navigation::main FOREACH root"
        outlet = {
            path = targetDir
        }
        fileEncoding = fileEncoding
    }

Running it again will produce a slightly different output, making clear that two files have been generated. This is what comes out in the src-gen folder as Navigation.java:

    public final class Navigation {
        /** Constant for Navigation ADMIN */
        public static final Navigation ADMIN = new Navigation("ADMIN");
        /** Constant for Navigation BACK */
        public static final Navigation BACK = new Navigation("BACK");
        /** Constant for Navigation DATA_LOOKUP */
        public static final Navigation DATA_LOOKUP = new Navigation("DATA_LOOKUP");
        /** Constant for Navigation OK */
        public static final Navigation OK = new Navigation("OK");
        ...

More...

These were just my first experiments with Xtext. I am sure there is plenty more to be done with it. For more reading, please have a look at this very nice Getting started with Xtext tutorial by Peter Friese of Itemis.

From http://www.danielschneller.com/2010/08/code-generation-with-xtext.html

August 7, 2010

by Daniel Schneller

· 27,012 Views

Migrating from Cassandra to MongoDB

I'll start off by saying this article is not intended to be a Cassandra-bashing session; instead, it looks at one development company's case study to show that Cassandra (although it's fantastic for some) is not for everyone. The company is Nodeta and the application is Flowdock, a free tool (currently in beta) that functions as a web-based team messenger in place of Campfires, Skype Chats, IRCs, etc. Otto Hilska, who talked about the migration, says that "All software developers should be using it… because it better supports their actual workflow."

About a week ago the team finished their transition from the Apache Cassandra NoSQL database to another NoSQL database, MongoDB. The switch was made due to stability issues that the developers were having with Cassandra. Hilska explained the details of his company's experience with Cassandra:

"All nodes would go into an infinite loop, running GC and trying to compact the data files – occasionally falling off the cluster. We were unable to solve the problem, except that restarting and then compacting a node usually settled it down for a while. Other people had reported similar problems. Last couple of weeks our Cassandra nodes always ate all the resources they were given, slowing down Flowdock. This was not the first time we had run into problems because of our bleeding edge database choice. When upgrading from 0.4 to 0.5, we had to shut down the cluster, only to find out that it hadn’t flushed everything to the disk (even though we explicitly flushed it, as instructed). Thus we ended up having a couple of minutes of discussions lost, and our custom-built indices were miserably out of date and needed to be rebuilt. I think it was 4 AM when we finally got to leave the office." -- Otto Hilska

Flowdock developers became attracted to a new NoSQL store, MongoDB, because of its recent addition of auto-sharding and replica sets.
Hilska wrote the conversion script in a day, and it took a week to get Flowdock running purely on MongoDB. Then Nodeta tested it internally for a few weeks before they deployed it to production. However, MongoDB is not without flaws either. Dots are not allowed in BSON document keys, and the document size is limited to 4MB. It's also not as easy to add new nodes as it is with Cassandra. On the other hand, Hilska says that the smart (multikey) indices, complex queries directly from the console, MapReduce, GridFS, and lack of issues make up for these minor flaws.

I recently posted a guide for determining the right database solution (Relational or NoSQL) for various use cases. The article has some very good resources and includes situations where MongoDB or Cassandra might be the best choice. Some criticisms of Cassandra emerged a few weeks ago when Twitter announced that it would not move its tweet storage over to the NoSQL store. By no means has Twitter stopped using Cassandra. They have stated that it's currently being used to store geolocation data and data mining results that feed into things like local trends and @toptweets. Cassandra's creators at Facebook are also still using it. Production deployments of MongoDB exist at Foursquare, SourceForge, The New York Times, BoxedIce, GitHub, and SugarCRM.

July 26, 2010

by Mitch Pronschinske

· 19,498 Views

Optimizing JPA Performance: An EclipseLink, Hibernate, and OpenJPA Comparison

'Impedance mismatch'. No two words better capture the troubles, headaches and quirks most developers face when attempting to link applications to relational databases (RDBMS). But let's face it, object-oriented designs aren't going away anytime soon from mainstream languages, and neither are the relational storage systems used in most applications. One side works with objects, while the other works with tables. Resolving these differences -- or as it's technically referred to, 'object/relational impedance mismatch' -- can result in substantial overhead, which in turn can materialize into poor application performance.

In Java, the Java Persistence API (JPA) is one of the most popular mechanisms used to bridge the gap between objects (i.e. the Java language) and tables (i.e. relational databases). Though there are other mechanisms that allow Java applications to interact with relational databases -- such as JDBC and JDO -- JPA has gained wider adoption due to its underpinnings: Object Relational Mapping (ORM). ORM's gain in popularity is due precisely to its being specifically designed to address the interaction between objects and tables. In the case of JPA, there is a standards body charged with setting its course, a process which has given way to several JPA implementations. The three most popular are EclipseLink (evolved from TopLink), Hibernate and OpenJPA. But even though all three are based on the same standard, ORM is such a deep and complex topic that, beyond core functionality, each implementation has differences ranging from configuration to optimization techniques. What I will do next is explain a series of topics related to optimizing an application's use of JPA, using and comparing each of these JPA implementations.
While JPA is capable of automatically creating relational tables and can work with a series of relational database vendors, I will start from pre-existing data deployed on a MySQL relational database, and rely on the Spring framework to facilitate the use of JPA. This will not only make it a fairer comparison, but also make the described techniques appealing to a wider audience, since performance issues become a serious concern once you have a large volume of data, and since MySQL and Spring are a common choice due to their community-driven (i.e. open-source) roots. See the source code/application section at the end for instructions on setting up the application code discussed in the remainder of the sections.

Download the Source Code associated with this article (~45 MB)

The basics: Metrics

In order to establish JPA performance levels in an application, it's vital to first obtain a series of metrics related to a JPA implementation's inner workings. These include things like: What are the actual queries being performed against an RDBMS? How long does each query take? Are queries being performed constantly against the RDBMS, or is a cache being used? These metrics will be critical to our performance analysis, since they will shed light on the underlying operations performed by a JPA implementation and in the process show the effectiveness or ineffectiveness of certain techniques. In this area you will find the first differences among implementations, and I'm not talking about metric results, but about how to obtain these metrics in the first place.

To kick things off, I will first address the topic of logging. By default, all three JPA implementations discussed here -- EclipseLink, Hibernate and OpenJPA -- log the query performed against an RDBMS, which will be an advantage in determining if the queries performed by an ORM are optimal for a particular relational data model.
Nevertheless, tweaking the logging level of a JPA implementation further can be helpful for one of two things: getting even more details about the underlying operations made by a JPA implementation, some of which are turned off by default (e.g. database connection details), or getting no logging information at all, which can benefit a production system's performance. Logging in JPA implementations is managed through one of several logging frameworks, such as Apache Commons Logging or Log4J. This requires the presence of such libraries in an application. Logging configuration of a JPA implementation is mostly done through a value in an application's persistence.xml file or, in some cases, directly in a logging framework's configuration files. The following table describes JPA logging configuration parameters: Large table, so here's an external link

In addition to the information obtained through logging, there is another set of JPA performance metrics which require different steps to be obtained. One of these metrics is the time it takes to perform a query. Some JPA implementations provide this information using certain configurations, and some do not. Either way, I opted to use a separate approach and apply it to all three JPA implementations in question. After all, time metrics measured in milliseconds can be skewed in certain ways depending on start and end time criteria. So to measure query times, I will use Aspects with the aid of the Spring framework. Aspects will allow us to measure the time it takes a method containing a query to be executed, without mixing the timing logic with the actual query logic -- this separation is the whole purpose of using Aspects. Discussing Aspects further would go beyond the scope of performance, so next I will concentrate on the Aspect itself. I advise you to look over the accompanying source code, Aspects and Spring Aspects for more details on these topics and their configuration.
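Before the Aspect itself, the idea of keeping timing logic outside the query logic can be sketched in plain Java. The sketch swaps in a JDK dynamic proxy in place of AspectJ and Spring, so it only illustrates the principle and is not the setup used in this article. PlayerService, InMemoryPlayerService, TimingHandler and TimingDemo are hypothetical names, and the in-memory service merely stands in for a real JPA query class.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TimingDemo {
    public static void main(String[] args) {
        TimingHandler handler = new TimingHandler(new InMemoryPlayerService());
        PlayerService timed = TimingHandler.wrap(PlayerService.class, handler);
        System.out.println(timed.findAll());   // the call itself contains no timing code
        System.out.println(handler.timings);   // one entry per intercepted call
    }
}

// Hypothetical service interface standing in for a class with JPA query methods.
interface PlayerService {
    List<String> findAll();
}

// Assumed stand-in implementation; a real one would run a JPA query.
final class InMemoryPlayerService implements PlayerService {
    public List<String> findAll() {
        return Arrays.asList("Hank Aaron", "Babe Ruth");
    }
}

// Intercepts every interface call, runs it, and records how long it took.
final class TimingHandler implements InvocationHandler {
    private final Object target;
    final List<String> timings = new ArrayList<String>();

    TimingHandler(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        long start = System.nanoTime();
        try {
            // Plays the role of pjp.proceed(): run the real method.
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            // Rethrow the original exception instead of the reflection wrapper.
            throw e.getCause();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1000000L;
            timings.add(target.getClass().getSimpleName() + " - "
                    + method.getName() + ": " + elapsedMs + "ms");
        }
    }

    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> type, TimingHandler handler) {
        return (T) Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler);
    }
}
```

The service implementation never mentions timing, and the handler never mentions queries. An Aspect achieves the same separation declaratively, without the caller having to wrap anything by hand.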
The following Aspect is used for measuring execution times in query methods.

    package com.webforefront.aop;

    import org.apache.commons.lang.time.StopWatch;
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.aspectj.lang.ProceedingJoinPoint;
    import org.aspectj.lang.annotation.Around;
    import org.aspectj.lang.annotation.Pointcut;
    import org.aspectj.lang.annotation.Aspect;

    @Aspect
    public class DAOInterceptor {
        private Log log = LogFactory.getLog(DAOInterceptor.class);

        // Applies to every method of every class in com.webforefront.jpa.service and its sub-packages
        @Around("execution(* com.webforefront.jpa.service..*.*(..))")
        public Object logQueryTimes(ProceedingJoinPoint pjp) throws Throwable {
            StopWatch stopWatch = new StopWatch();
            stopWatch.start();
            // Run the intercepted query method
            Object retVal = pjp.proceed();
            stopWatch.stop();
            // Cut the simple class name out of the target's "package.Class@hash" string form
            String str = pjp.getTarget().toString();
            log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": " + stopWatch.getTime() + "ms");
            return retVal;
        }
    }

The main part of the Aspect is the @Around annotation. The value assigned to this annotation tells the framework to execute the aspect method -- logQueryTimes -- each time a method belonging to a class in the com.webforefront.jpa.service package is executed. This package is where all our application's JPA query methods will reside. The logQueryTimes aspect method calculates the execution time and outputs it as logging information using Apache Commons Logging.

Another set of important JPA metrics is related to statistics beyond those provided by standard logging. The statistics I'm referring to are things related to caches, sessions and transactions. Since the JPA standard doesn't dictate any particular approach to statistics, each JPA implementation also varies in the type of statistics it collects and the way it collects them. Both Hibernate and OpenJPA have their own statistics class, whereas EclipseLink relies on a Profiler to gather similar metrics.
Since I'm already relying on Aspects, I will also use an Aspect to obtain statistics both before and after the execution of a JPA query method. The following Aspect obtains statistics for an application relying on Hibernate.

    package com.webforefront.aop;

    import org.hibernate.stat.Statistics;
    import org.hibernate.SessionFactory;
    import org.aspectj.lang.ProceedingJoinPoint;
    import org.aspectj.lang.annotation.Around;
    import org.aspectj.lang.annotation.Aspect;
    import org.springframework.beans.factory.annotation.Autowired;
    import javax.persistence.EntityManagerFactory;
    import org.hibernate.ejb.HibernateEntityManagerFactory;
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    @Aspect
    public class CacheHibernateInterceptor {
        private Log log = LogFactory.getLog(DAOInterceptor.class);

        @Autowired
        private EntityManagerFactory entityManagerFactory;

        @Around("execution(* com.webforefront.jpa.service..*.*(..))")
        public Object log(ProceedingJoinPoint pjp) throws Throwable {
            // Reach the Hibernate Statistics object through the JPA EntityManagerFactory
            HibernateEntityManagerFactory hbmanagerfactory = (HibernateEntityManagerFactory) entityManagerFactory;
            SessionFactory sessionFactory = hbmanagerfactory.getSessionFactory();
            Statistics statistics = sessionFactory.getStatistics();
            String str = pjp.getTarget().toString();
            statistics.setStatisticsEnabled(true);
            log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": (Before call) " + statistics);
            Object result = pjp.proceed();
            log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": (After call) " + statistics);
            return result;
        }
    }

Notice the similar structure to the prior timing Aspect, except in this case the logging output contains values that belong to the Statistics Hibernate class obtained via the application's EntityManagerFactory. The next Aspect is used to obtain statistics for an application relying on OpenJPA.
    package com.webforefront.aop;

    import org.apache.openjpa.datacache.CacheStatistics;
    import org.apache.openjpa.persistence.OpenJPAEntityManagerFactory;
    import org.apache.openjpa.persistence.OpenJPAPersistence;
    import org.aspectj.lang.ProceedingJoinPoint;
    import org.aspectj.lang.annotation.Around;
    import org.aspectj.lang.annotation.Aspect;
    import org.springframework.beans.factory.annotation.Autowired;
    import javax.persistence.EntityManagerFactory;
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    @Aspect
    public class CacheOpenJPAInterceptor {
        private Log log = LogFactory.getLog(DAOInterceptor.class);

        @Autowired
        private EntityManagerFactory entityManagerFactory;

        @Around("execution(* com.webforefront.jpa.service..*.*(..))")
        public Object log(ProceedingJoinPoint pjp) throws Throwable {
            // Reach the OpenJPA cache statistics through the JPA EntityManagerFactory
            OpenJPAEntityManagerFactory ojpamanagerfactory = OpenJPAPersistence.cast(entityManagerFactory);
            CacheStatistics statistics = ojpamanagerfactory.getStoreCache().getStatistics();
            String str = pjp.getTarget().toString();
            log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName()
                + ": (Before call) Statistics [start time=" + statistics.start()
                + ",read count=" + statistics.getReadCount()
                + ",hit count=" + statistics.getHitCount()
                + ",write count=" + statistics.getWriteCount()
                + ",total read count=" + statistics.getTotalReadCount()
                + ",total hit count=" + statistics.getTotalHitCount()
                + ",total write count=" + statistics.getTotalWriteCount() + "]");
            Object result = pjp.proceed();
            log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName()
                + ": (After call) Statistics [start time=" + statistics.start()
                + ",read count=" + statistics.getReadCount()
                + ",hit count=" + statistics.getHitCount()
                + ",write count=" + statistics.getWriteCount()
                + ",total read count=" + statistics.getTotalReadCount()
                + ",total hit count=" + statistics.getTotalHitCount()
                + ",total write count=" + statistics.getTotalWriteCount() + "]");
            return result;
        }
    }

Once again, notice the similar structure to the previous Aspect, which also relies on an application's EntityManagerFactory. In this case, the logging output contains values that belong to the CacheStatistics OpenJPA class. Since OpenJPA does not enable statistics by default, you will need to add the following two properties to an application's persistence.xml file: The first property ensures statistics are gathered, while the second property is used to indicate that the gathering of statistics takes place on a single JVM. NOTE: The value "true(EnableStatistics=true)" also enables caching in addition to statistics.

Since EclipseLink doesn't have any particular statistics class and relies on a Profiler to determine advanced metrics, the simplest way to obtain similar statistics to those of Hibernate and OpenJPA is through the Profiler itself. To activate EclipseLink's Profiler you just need to add the following property to an application's persistence.xml file: . By doing so, the EclipseLink Profiler outputs several metrics on each JPA query method execution as logging information.

Now that you know how to obtain several metrics from all three JPA implementations and understand they will be obtained as fairly as possible for all three providers, it's time to put each JPA implementation to the test along with several performance techniques.

JPQL queries, weaving and class transformations

Let's start by making a query that retrieves data belonging to a pre-existing RDBMS table named "Master". The "Master" table contains over 17,000 records belonging to baseball players. To simplify matters, I will create a Java class named "Player" and map it to the "Master" table in order to retrieve the records as objects.
Next, relying on the Spring framework's JpaTemplate functionality, I will set up a query to retrieve all "Player" objects, with the query taking the following form:

    getJpaTemplate().find("select e from Player e");

See the accompanying source code for more details on this last process. Next, I deploy the application using each of the three JPA implementations on Apache Tomcat, doing so separately, as well as starting and stopping the server on each deployment to ensure fair results. These are the results of doing so on a 64-bit Ubuntu box with 4 GB of RAM, using Java 1.6:

All player objects - 17,468 records

Implementation | Time | Query
Hibernate | 3558 ms | select player0_.lahmanID as lahmanID0_, player0_.nameFirst as nameFirst0_, player0_.nameLast as nameLast0_ from Master player0_
EclipseLink (Run-time weaver - Spring ReflectiveLoadTimeWeaver weaver) | 3215 ms | SELECT lahmanID, nameLast, nameFirst FROM Master
EclipseLink (Build-time weaving) | 3571 ms | SELECT lahmanID, nameLast, nameFirst FROM Master
EclipseLink (No weaving) | 3996 ms | SELECT lahmanID, nameLast, nameFirst FROM Master
OpenJPA (Build-time enhanced classes) | 5998 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0
OpenJPA (Run-time enhanced classes - OpenJPA enhancer) | 6136 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0
OpenJPA (Non enhanced classes) | 7677 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0

As you can observe, the queries performed by each JPA implementation are fairly similar, with two of them using a shortcut notation (e.g. t0 and player0_ for the table named 'Master'). This syntax variation has minimal impact on performance, though, since directly querying an RDBMS using any of these notation variations shows identical results. However, the query times made through several JPA implementations using distinct parameters vary considerably. One important factor leading to this time difference is how each implementation handles JPA entities.
Let's start with the OpenJPA implementation, which had the poorest times. OpenJPA can execute an enhancement process on Java entities (e.g. in this case the 'Player' class). This enhancement process can be performed when the entities are built, at run-time, or forgone altogether. As you can observe, forgoing entity enhancement altogether in OpenJPA produced the longest query times, whereas enhancing entities at either build-time or run-time produced relatively better results, with the former beating out the latter.

By default, OpenJPA expects entities to be enhanced. This means you will either need to explicitly configure an application to support unenhanced classes by adding the following: ...property to an application's persistence.xml file, or enhance classes at build-time or at run-time relying on the OpenJPA enhancer. Otherwise an application relying on OpenJPA will throw an error. Given these OpenJPA results, the remaining OpenJPA tests will be based on build-time enhanced entity classes. For more on the topic of OpenJPA enhancement, refer to the OpenJPA documentation in addition to consulting the accompanying source code for this article.

You may be wondering what exactly constitutes OpenJPA enhancement. OpenJPA entity enhancement is a processing step applied to the bytecode generated by the Java compiler which adds JPA-specific instructions to provide optimal runtime performance. These instructions can include things like flexible lazy loading and dirty read tracking.

So why don't Hibernate or EclipseLink enhance entities? In short, Hibernate and EclipseLink also enhance JPA entities, they just don't outright call it 'enhancement'. EclipseLink calls this 'enhancement' process by the more technical term: weaving. Similar to OpenJPA's enhancement process, weaving in EclipseLink can take place at build-time (a.k.a. static weaving), at run-time, or be forgone altogether.
As you can observe in the results, all of EclipseLink's tests present smaller variations compared to OpenJPA. The longest EclipseLink variation involved not using weaving. If you think about it, this is rather logical given that the purpose of weaving consists of altering Java byte code in order to add optimized JPA instructions that include lazy loading, change tracking, fetch groups and internal optimizations. For the EclipseLink tests using weaving, both build-time and run-time weaving present better results. For build-time weaving, I used EclipseLink's library along with an Apache Ant task, whereas for run-time weaving, I used the Spring framework's ReflectiveLoadTimeWeaver. I can only assume the slightly better performance of run-time weaving over build-time weaving in EclipseLink was due to using a weaver integrated with the Spring framework, which in turn could result in better JPA optimizations designed for Spring applications. Nevertheless, considering the test result of forgoing weaving altogether, weaving does not appear to have a major performance impact when using EclipseLink, ceteris paribus.

By default, EclipseLink expects run-time weaving to be enabled, otherwise you will receive an error in the form 'Cannot apply class transformer without LoadTimeWeaver specified'. This means that for cases using build-time weaving or no weaving at all, you will need to explicitly indicate this behavior. In order to disable EclipseLink weaving you will need to either configure an application's EntityManagerFactory Spring bean with: ... or add the .... ...property to an application's persistence.xml file. To indicate an application's entities are built using build-time weaving, substitute the previous property's "false" value with "static". To configure the default run-time weaver expected by EclipseLink, add the following: ...property to an application's EntityManagerFactory Spring bean.
Given these EclipseLink results, the remaining EclipseLink tests will be based on run-time weaving provided by the Spring framework. For more on the topic of EclipseLink weaving, refer to the EclipseLink documentation at http://wiki.eclipse.org/Introduction_to_EclipseLink_Application_Development_(ELUG)#Using_Weaving, in addition to consulting the accompanying source code for this article.

Hibernate requires neither enhancing JPA entities nor weaving. For this reason, there is only one test result. This not only makes Hibernate simpler to set up, but judging by its only test result -- which clocks in at second place with respect to all other tests -- Hibernate's performance ranks high compared to its counterparts. However, in what I would consider Hibernate's equivalent to OpenJPA's enhancement process or EclipseLink's weaving, you will find a series of Hibernate properties. For example, Hibernate has properties such as hibernate.default_batch_fetch_size designed to optimize lazy loading. As you might recall, the optimization of lazy loading is among the purposes of both OpenJPA's enhancement process and EclipseLink's weaving. So whereas OpenJPA and EclipseLink require a separate and monolithic step -- at build-time or run-time -- to achieve JPA optimization techniques, Hibernate falls back to the use of granular properties specified in an application's persistence.xml file. Nevertheless, given that Hibernate's default behavior proved to be on par with the best query times, I didn't feel a need to explore these Hibernate properties further.

To get another sense of the times and mapping procedures of each JPA implementation, I will make more selective queries based on a Player object's first name and last name. These are the results of performing a query for all Player objects whose first name is John and a query for all Player objects whose last name is Smith.
All player objects whose first name is John - 472 records

Implementation | Time | Query
EclipseLink | 1265 ms | SELECT lahmanID, nameLast, nameFirst FROM Master WHERE (nameFirst = ?)
Hibernate | 613 ms | select player0_.lahmanID as lahmanID0_, player0_.nameFirst as nameFirst0_, player0_.nameLast as nameLast0_ from Master player0_ where player0_.nameFirst=?
OpenJPA | 1643 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 WHERE (t0.nameFirst = ?) [params=?]

All player objects whose last name is Smith - 146 records

Implementation | Time | Query
EclipseLink | 986 ms | SELECT lahmanID, nameLast, nameFirst FROM Master WHERE (nameLast = ?)
Hibernate | 537 ms | select player0_.lahmanID as lahmanID0_, player0_.nameFirst as nameFirst0_, player0_.nameLast as nameLast0_ from Master player0_ where player0_.nameLast=?
OpenJPA | 1452 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 WHERE (t0.nameLast = ?) [params=?]

These test results tell a slightly different story, with all three JPA implementations presenting substantial time differences amongst one another. At a lower record count, Hibernate's out-of-the-box configuration resulted in queries almost twice as fast as its closest competitor and almost three times as fast as its other competitor.

To get an even broader sense of the times and mapping procedures of each JPA implementation, I will make a query on a single Player object based on its id. These are the results of performing such a query.

Single player object whose ID is 777 - 1 record

Implementation | Time | Query
EclipseLink | 521 ms | SELECT lahmanID, nameLast, nameFirst FROM Master WHERE (lahmanID = ?)
Hibernate | 157 ms | select player0_.lahmanID as lahmanID0_0_, player0_.nameFirst as nameFirst0_0_, player0_.nameLast as nameLast0_0_ from Master player0_ where player0_.lahmanID=?
OpenJPA | 1052 ms | SELECT t0.nameFirst, t0.nameLast FROM Master t0 WHERE t0.lahmanID = ? [params=?]
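The ratios behind remarks such as "almost twice as fast" can be checked with a few lines of arithmetic. The following sketch divides each competitor's first-run time by Hibernate's for the three selective queries. The millisecond values are copied from the tables above, while the class name QueryTimeRatios and its ratio helper are made up for this illustration.

```java
import java.util.Locale;

// Hypothetical helper that compares first-run query times against Hibernate's.
final class QueryTimeRatios {

    /** How many times longer otherMs is than baselineMs. */
    static double ratio(int otherMs, int baselineMs) {
        return (double) otherMs / baselineMs;
    }

    public static void main(String[] args) {
        String[] labels = {"First name = John", "Last name = Smith", "ID = 777"};
        // Each row holds {EclipseLink, Hibernate, OpenJPA} times in milliseconds.
        int[][] times = {{1265, 613, 1643}, {986, 537, 1452}, {521, 157, 1052}};
        for (int i = 0; i < times.length; i++) {
            int hibernate = times[i][1];
            System.out.println(String.format(Locale.ROOT,
                    "%s: EclipseLink/Hibernate = %.2f, OpenJPA/Hibernate = %.2f",
                    labels[i],
                    ratio(times[i][0], hibernate),
                    ratio(times[i][2], hibernate)));
        }
    }
}
```

For the two name queries the ratios come out at roughly 2 for EclipseLink and 2.7 for OpenJPA. For the single-record query the gap is wider, at roughly 3.3 and 6.7.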
With the exception of the faster query times -- due to it being a query for a single Player object -- the JPA implementations keep the same ranking as in the queries used for extracting multiple Player objects by first and last name. This will do as far as test queries are concerned.

However, a word of caution is in order when discussing these topics of optimization/enhancement/weaving. Even though the previous tests consisted of querying over 17,000 records and confirm clear advantages of using one provider and technique over another, they are still one-dimensional, since they're based on read operations performed on a single object type and a single RDBMS table. JPA can perform a large array of operations that also include updating, writing and deleting RDBMS records, not to mention the execution of more elaborate queries that can span multiple objects and tables. In addition, RDBMS themselves can have influencing factors (e.g. indexes) over JPA query times. So all this said, it's not too far-fetched to think the use of OpenJPA entity enhancement, EclipseLink weaving or Hibernate properties could have varying effects -- either beneficial or detrimental -- depending on the queries (i.e. multi-table, multi-object) and type of JPA operation (i.e. read, write, update, delete) involved. Next, I will describe one of the most popular techniques used to boost performance in JPA applications.

Caches

A cache allows data to remain closer to an application's tier without constantly polling an RDBMS for the same data. I titled the section in plural -- caches -- because there can be several caches involved in an application using JPA. This of course doesn't mean you have to configure or use all the caches provided by an application relying on JPA, but properly configuring caches can go a long way toward enhancing an application's JPA performance.
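To make the idea concrete, here is a minimal read-through cache in plain Java (Java 8 or later is assumed for the lambda syntax). It is only a sketch of the general principle and does not reflect how any of the three JPA providers implement their caches. ReadThroughCache, loadCount and the "player-" values are hypothetical, and the loader function stands in for a query against the RDBMS.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical read-through cache: the loader is only called on a cache miss.
final class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<K, V>();
    private final Function<K, V> loader;
    private final AtomicInteger loads = new AtomicInteger();

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, loading and storing it first if it is absent. */
    V get(K key) {
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet();   // counts trips to the backing store
            return loader.apply(k);
        });
    }

    /** Number of times the backing store had to be consulted. */
    int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        // The lambda stands in for an expensive database lookup by id.
        ReadThroughCache<Integer, String> cache =
                new ReadThroughCache<Integer, String>(id -> "player-" + id);
        cache.get(777);
        cache.get(777);
        cache.get(777);
        System.out.println("loads: " + cache.loadCount());
    }
}
```

Three lookups of the same key reach the backing store only once. Repeated lookups are what a cache is meant to speed up.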
So let's start by analyzing what each JPA implementation offers in its out-of-the-box state in terms of caching. The following table illustrates tests done by simply invoking the previous JPA queries for a second and third consecutive time, without stopping the server. Note that, as before, a single application was deployed at a time and the server was restarted for each set of tests.

Query | EclipseLink | Hibernate | OpenJPA
All records (1st time) | 3215 ms | 3558 ms | 5998 ms
All records (2nd time) | 507 ms | 272 ms | 521 ms
All records (3rd time) | 439 ms | 218 ms | 263 ms
First name (1st time) | 1265 ms | 613 ms | 1643 ms
First name (2nd time) | 151 ms | 115 ms | 239 ms
First name (3rd time) | 154 ms | 101 ms | 227 ms
Last name (1st time) | 986 ms | 537 ms | 1452 ms
Last name (2nd time) | 41 ms | 41 ms | 112 ms
Last name (3rd time) | 65 ms | 38 ms | 117 ms
By ID (1st time) | 521 ms | 157 ms | 1052 ms
By ID (2nd time) | 1 ms | 6 ms | 3 ms
By ID (3rd time) | 1 ms | 3 ms | 3 ms

As you can observe, on both the second and third invocation all the queries show substantial improvements with respect to the first invocation. These improvements are unequivocally due to the use of a cache. But what type of cache exactly? Could it be an RDBMS's own caching engine? JPA? Spring? Or some other variation? In order to shed some light on cache usage, the following table illustrates the cache statistics generated on each of the previous JPA queries.
Query / Implementation: statistics reported by EclipseLink, Hibernate and OpenJPA

All records (2nd time)
EclipseLink: number of objects=17468, total time=506, local time=506, row fetch=65, object building=328, cache=112, sql execute=47, objects/second=34521
Hibernate: sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=34936, queries executed to database=2, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

All records (3rd time)
EclipseLink: number of objects=17468, total time=435, local time=435, profiling time=1, row fetch=28, object building=323, cache=106, logging=1, sql execute=27, objects/second=40156
Hibernate: sessions opened=3, sessions closed=3, connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=52404, queries executed to database=3, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

First name (2nd time)
EclipseLink: number of objects=472, total time=148, local time=148, row fetch=27, object building=106, cache=7, logging=1, sql execute=3, objects/second=3189
Hibernate: sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=944, queries executed to database=2, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

First name (3rd time)
EclipseLink: number of objects=472, total time=152, local time=152, row fetch=20, object building=121, cache=7, sql execute=3, objects/second=3105
Hibernate: sessions opened=3, sessions closed=3, connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=1416, queries executed to database=3, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

Last name (2nd time)
EclipseLink: number of objects=146, total time=40, local time=40, row fetch=7, object building=27, cache=2, logging=1, sql execute=3, objects/second=3650
Hibernate: sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=292, queries executed to database=2, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

Last name (3rd time)
EclipseLink: number of objects=146, total time=63, local time=63, profiling time=1, row fetch=6, object building=19, cache=5, sql prepare=1, sql execute=23, objects/second=2317
Hibernate: sessions opened=3, sessions closed=3, connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=438, queries executed to database=3, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

By ID (2nd time)
EclipseLink: number of objects=1, total time=1, local time=1, time/object=1, objects/second=1000
Hibernate: sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=2, queries executed to database=0, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

By ID (3rd time)
EclipseLink: number of objects=1, total time=1, local time=1, time/object=1, objects/second=1000
Hibernate: sessions opened=3, sessions closed=3, connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=3, queries executed to database=0, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

Notice the statistics generated by each JPA implementation are different.
EclipseLink reports a single cache statistic, OpenJPA doesn't even report statistics unless a cache is enabled (see the previous section on metrics for details on this behavior), and Hibernate reports two cache-related statistics: second level cache and query cache.

At this juncture, if you look at the test results and statistics for the second and third invocation, something won't add up. How is it that OpenJPA's test results came out faster when caching is disabled by default? And how about Hibernate returning 0's on its cache-related statistics, even when its test results came out faster?

The reason for this performance increase is RDBMS caching. On the first query, the RDBMS needs to read data from its own file system (i.e. perform an I/O operation); on subsequent requests the data is present in RDBMS memory (i.e. its cache), making the entire JPA query much faster. A closer look at the Hibernate statistics field 'queries executed to database' can confirm this. Notice that on every second query it shows 2 and on every third query it shows 3, meaning the data was read directly from the database.

NOTE: The only exception to this occurs when a query is made on a single entity (i.e. by id). I will address this shortly.

Next, let's start breaking down the caches you will encounter when using JPA applications. The JPA 2.0 standard defines two types of caches: a first level cache and a second level cache. The first level cache, or EntityManager cache, is used to properly handle JPA transactions. A first level cache only exists for the duration of the EntityManager. With the exception of long-lived operations performed against an RDBMS, JPA EntityManagers are short-lived and are created and destroyed per request or per transaction. In this case, given the nature of the queries, first level caches are cleared on every query. A second level cache, on the other hand, is a broader cache that can be used across transactions and users.
This makes a JPA second level cache more powerful, since it can avoid constantly polling an RDBMS for the same data. But even though the JPA 2.0 standard now addresses second level cache features, this was not the case in JPA 1.0. The 1.0 version of the JPA standard only addressed a first level cache, leaving the door completely open on the topic of a second level cache. This created a fragmented approach to caching in JPA implementations. Even now, as JPA 2.0 compliant implementations emerge, some non-standard features continue to be part of certain implementations, given the value they provide to JPA caching in general. So as I move forward, bear in mind that just like previous JPA topics, each JPA implementation can have its own particular way of dealing with second level caching.

I will start with OpenJPA, which has the fewest proprietary caching options. To enable OpenJPA caching (i.e. second level caching) you need to declare the following two properties in an application's persistence.xml file:

The first property ensures caching and statistics are activated, while the second property indicates that caching takes place on a single JVM. The following results and statistics were obtained with OpenJPA's second level cache enabled.
Query with OpenJPA caching | Time | Statistics | Time without statistics
All records (2nd time) | 420 ms | read count=34936, hit count=17468, write count=17468, total read count=34936, total hit count=17468, total write count=17468 | 347 ms
All records (3rd time) | 254 ms | read count=52404, hit count=34936, write count=17468, total read count=52404, total hit count=34936, total write count=17468 | 230 ms
First name (2nd time) | 125 ms | read count=944, hit count=472, write count=472, total read count=944, total hit count=472, total write count=472 | 127 ms
First name (3rd time) | 114 ms | read count=1416, hit count=944, write count=472, total read count=1416, total hit count=944, total write count=472 | 132 ms
Last name (2nd time) | 63 ms | read count=292, hit count=146, write count=146, total read count=292, total hit count=146, total write count=146 | 53 ms
Last name (3rd time) | 49 ms | read count=438, hit count=292, write count=146, total read count=438, total hit count=292, total write count=146 | 50 ms
By ID (2nd time) | 5 ms | read count=2, hit count=1, write count=1, total read count=2, total hit count=1, total write count=1 | 1 ms
By ID (3rd time) | 4 ms | read count=3, hit count=2, write count=1, total read count=3, total hit count=2, total write count=1 | 1 ms

As these test results illustrate, executing subsequent JPA queries with OpenJPA's second level cache enabled produces superior results. Another important behavior illustrated in some of these test cases is that by simply disabling statistics, while still using the second level cache, query times improve even more.

The OpenJPA statistics also demonstrate how the cache is being used. Notice that on each subsequent query the statistics field 'hit count' grows by the number of objects returned (it doubles between the second and third invocation), which means data is being read from the cache (i.e. a hit). Also notice the statistics field 'write count' remains static, which means data is only written once from the RDBMS to the cache. This is pretty basic functionality for a second level cache.
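The counter pattern in the table above (reads grow on every invocation, writes stay fixed, hits grow from the second invocation on) is what any read-through cache would produce. The following is a toy sketch of that behavior, assuming a plain in-memory map as the cache and a stand-in method for the database round trip. It is not OpenJPA's implementation, and every name in it is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy read-through cache that counts reads, hits and writes (not OpenJPA's code). */
final class CountingCache {
    private final Map<Integer, String> store = new HashMap<>();
    long readCount;
    long hitCount;
    long writeCount;

    /** Returns the cached value for id, loading and storing it on a miss. */
    String get(int id) {
        readCount++;
        String cached = store.get(id);
        if (cached != null) {
            hitCount++;
            return cached;
        }
        String loaded = loadFromDatabase(id);
        store.put(id, loaded);
        writeCount++;
        return loaded;
    }

    /** Stand-in for the RDBMS round trip; a real provider would run SQL here. */
    private static String loadFromDatabase(int id) {
        return "player-" + id;
    }

    /** Runs the same "query" over the given number of ids, several times in a row. */
    static CountingCache simulate(int records, int invocations) {
        CountingCache cache = new CountingCache();
        for (int run = 0; run < invocations; run++) {
            for (int id = 0; id < records; id++) {
                cache.get(id);
            }
        }
        return cache;
    }
}
```

Calling CountingCache.simulate(472, 2) leaves the counters at 944 reads, 472 hits and 472 writes, and simulate(472, 3) at 1416, 944 and 472, which are the same figures reported for the first-name query on its second and third invocation.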
On certain occasions a need may arise to interact directly with a cache. These interactions can range from prohibiting an entity from being cached, assigning a particular amount of memory to a cache, forcing an entity to always be cached, flushing all the data contained in a cache, or even plugging in a third-party caching solution to provide a more robust strategy, among other things.

The JPA 2.0 standard provides a very basic feature set in terms of second level caching through javax.persistence.Cache. Upon consulting this interface, you'll realize it only provides four methods, charged with verifying the presence of entities and evicting them. This feature set is not only limited but also cumbersome, since it can only be leveraged programmatically (i.e. through an API). In this sense, and as I've already mentioned, JPA implementations have provided a series of features ranging from persistence.xml properties to Java annotations related to second level caching.

OpenJPA offers several of these second level caching features, including a separate and supplemental cache called a 'query cache' which can further improve JPA performance. For such cases, I will point you directly to OpenJPA's cache documentation available at http://openjpa.apache.org/builds/apache-openjpa-2.1.0-SNAPSHOT/docs/manual/ref_guide_caching.html#ref_guide_cache_query so you can try these parameters for yourself on the accompanying application source code.

Hibernate, just like OpenJPA, has its second level cache disabled by default. To enable Hibernate's second level cache you need to add the following properties to an application's persistence.xml file:

It's worth mentioning that Hibernate has built-in support for other second level caches.
The previous properties showed how to enable the HashtableCacheProvider cache, the simplest of the built-in second level caches, but Hibernate also provides support for five additional caches: EHCache, OSCache, SwarmCache, JBoss cache 1 and JBoss cache 2. All of them provide distinct features, although they require additional configuration. Besides these properties, Hibernate also requires that each JPA entity be declared with a caching strategy. In this case, since the Player entity is read-only, a caching strategy like the following would be used:

Similar to OpenJPA, Hibernate also offers several second level caching features through proprietary annotations and configurations, as well as support for the separate and supplemental cache called a 'query cache' which can further improve JPA performance. For such cases, I will also point you directly to Hibernate's cache documentation available at http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html#performance-cache so you can try these parameters for yourself on the accompanying application source code.

Unlike OpenJPA and Hibernate, EclipseLink's second level cache is enabled by default, so there is no need to provide any additional configuration. However, similar to its counterparts, EclipseLink also has a series of proprietary second level cache features which can enhance JPA performance. You can find more information on these features by consulting EclipseLink's cache documentation available at: http://wiki.eclipse.org/Introduction_to_Cache_(ELUG)

With this we bring our discussion on object relational mapping performance with JPA to a close. I hope you found the various tests and metrics presented here a helpful aid in making decisions about your own JPA applications. In addition, don't forget you can rely on the accompanying source code to try out several JPA variations better suited to your circumstances.
About the author

Daniel Rubio is an independent technology consultant specializing in enterprise and web-based software. He blogs regularly on these and other software areas at http://www.webforefront.com. He has also authored and co-authored three books on Java technology.

Source code/Application installation

* Install MySQL on your workstation (tested on MySQL 5.1.37, 64-bit) - http://dev.mysql.com/downloads/
* Install data set on MySQL
- Go to http://www.baseball-databank.org/ and click on the link titled 'Database in MySQL form'. This will download a zipped file with a series of MySQL data structures containing baseball statistics.
- First create a MySQL database to host the data using the command: 'mysqladmin -p create jpaperformance'. This will create a database named 'jpaperformance'.
- Next, load the baseball statistics using the following command: 'mysql -p -D jpaperformance < BDB-sql-2009-11-25.sql' where 'BDB-sql-2009-11-25.sql' represents the unzipped SQL script obtained by extracting the zip file you downloaded.
* Create JPA application WARs
- The download includes source code, library dependencies and an Ant build file. This includes all three JPA implementations: Hibernate 3.5.3, EclipseLink 2.1 and OpenJPA 2.1.
- To build the JPA Hibernate WAR: ant hibernate
- To build the JPA EclipseLink WAR: ant eclipselink
- To build the JPA OpenJPA WAR: ant openjpa
- All builds are placed under the dist/ directory.
* Deploy to Tomcat 6.0.26
- Copy the MySQL Java driver and Spring Tomcat Weaver, both included in the download directory 'tomcat_jar_deps', to Apache Tomcat's /lib directory.
- Copy each JPA application WAR to Apache Tomcat's /webapps directory, as needed.
* Deployment URLs
- http://localhost:8080/hibernate/hibernate/home (Query all Player objects)
- http://localhost:8080/eclipselink/eclipselink/home (Query all Player objects)
- http://localhost:8080/openjpa/openjpa/home (Query all Player objects)
- http://localhost:8080/hibernate/hibernate/firstname/ (Query Player objects by first name)
- http://localhost:8080/eclipselink/eclipselink/firstname/ (Query Player objects by first name)
- http://localhost:8080/openjpa/openjpa/firstname/ (Query Player objects by first name)
- http://localhost:8080/hibernate/hibernate/lastname/ (Query Player objects by last name)
- http://localhost:8080/eclipselink/eclipselink/lastname/ (Query Player objects by last name)
- http://localhost:8080/openjpa/openjpa/lastname/ (Query Player objects by last name)
- http://localhost:8080/hibernate/hibernate/playerid/ (Query Player by id)
- http://localhost:8080/eclipselink/eclipselink/playerid/ (Query Player by id)
- http://localhost:8080/openjpa/openjpa/playerid/ (Query Player by id)

July 20, 2010

by Daniel Rubio


Practical PHP Patterns: Query Object

An ORM provides an abstraction of storage as an in-memory object graph, but it is difficult to navigate that graph via object pointers without loading a large part of it. Typical problems of this approach are the performance cost of loading the various objects, and the shift of business logic execution from the database side to the client code, with the resulting duplication.

In any case, when we start navigating an object graph we have to obtain a reference to an entity somehow (an Aggregate Root), from which we can navigate to the others. ORMs and, in general, Data Mappers provide different ways to select a subset of objects (or a single one) and reconstitute only that subset from the data storage.

* Custom mapper classes with domain-specific methods are the simplest solution, which is often recommended when not using a generic Data Mapper.
* Custom mapper classes with finder methods are a half-baked solution, which mixes domain-specific mappers with general-purpose methods, sometimes needed to allow flexibility on the user side.
* Generic mapper classes with finder methods can be provided as a way to parametrize fields, resulting in methods like findBy($entityName, $field, $value).
* Generic mapper classes with query objects are employed when queries need to be composed and passed around for further elaboration or refinement. Promoting the query to an object helps this use case.

Note that once a mapper implements query objects, they can be used to implement finder methods, whose functionality is a subset of what query objects provide. In fact, query objects are the most versatile way to ask for the objects that satisfy certain conditions, and they are an Interpreter implementation over a query language adapted to an object model.

All of us already know a query language: SQL. But SQL pertains to relational databases, while an ORM strives to keep the illusion of an object-only model alive.
As a result, it must adopt a different language which describes object features, like HQL (Hibernate) or DQL (the Doctrine equivalent).

Object query languages

There are several differences between an object query language and SQL in the entities you can refer to within queries:

* SQL refers to tables; object query languages refer to classes, and some tables, like the association tables for many-to-many relationships, simply vanish.
* SQL refers to rows; object query languages to objects.
* SQL refers to other tables for making JOINs; object query languages to object collaborators.
* SQL refers to columns, which also include foreign keys; object query languages only to fields of the objects.

When a full-featured language is involved, there must be a component of the ORM that parses the strings containing language statements into a Query Object. Another way to define such an object (an Interpreter) is constructing it by hand, by calling a series of setter methods or by implementing a Builder pattern.

Advantages

A Query Object hides the relational model (the schema) from the user, as it can be inferred from the union of the queries and the Metadata Mapping anyway. The information contained in the metadata, like foreign keys and additional tables, does not have to be repeated in the various components of client code.

It also hides the peculiarities of the particular database vendor, since the generation of SQL can be delegated to a driver.

It promotes queries to first-class citizens, making them objects that can be passed around, cloned or modified. Database abstraction layers like PDO make statement objects (PDOStatement) one of their central modelling concepts.

Disadvantages

The implementation of the parser for a query language is a task of great complexity, which makes this pattern only feasible in generic Data Mappers.
Even when using only Query Objects made by hand, it is advisable to employ an external Data Mapper to take advantage of the translation of object-based queries to SQL.

Examples

Doctrine 2 contains a parser for its Doctrine Query Language, which lets you define queries like you would do with PDO, but still referring to an object model. The documentation of the query language itself is pretty complete, so I won't go into details, but I'll give you a feel for what using DQL is like. The language itself is compatible with the Doctrine 1 version, if you happen to have used it.

<?php
// $em is the Doctrine 2 EntityManager
$query = $em->createQuery('SELECT u FROM MyProject\Model\User u WHERE u.age > 20');
$users = $query->getResult();

$query = $em->createQuery("SELECT u, a FROM User u JOIN u.address a WHERE a.city = 'Berlin'");
$users = $query->getResult();

$query = $em->createQuery('SELECT u, p FROM CmsUser u JOIN u.phonenumbers p');
$users = $query->getResult(); // array of CmsUser objects with the phonenumbers association loaded
$phonenumbers = $users[0]->getPhonenumbers();

$query = $em->createQuery('SELECT u, a, p, c FROM CmsUser u JOIN u.articles a JOIN u.phonenumbers p JOIN a.comments c');
$users = $query->getResult();

Sometimes there are no fixed queries, and a dynamic query has to be constructed from its various parts, as a union of conditions, joins and sorting parameters. Not all the parameters may be available at a given time, and concatenating strings to compose a DQL statement is error-prone. Doctrine 2 includes a Query Builder which has methods you can call orthogonally, in any order and combination.

<?php
// $qb instanceof QueryBuilder
$qb->add('select', 'u')
   ->add('from', 'User u')
   ->add('where', 'u.id = :identifier')
   ->add('orderBy', 'u.name ASC')
   ->setParameter('identifier', 100); // Sets :identifier to 100, and thus we will fetch a user with u.id = 100
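The idea of a builder whose methods can be called in any order can be sketched outside of Doctrine as well. The following is a minimal Java illustration of the principle, not a port of Doctrine's Query Builder: the class QuerySketch and all of its methods are hypothetical. Each call only records a part of the query, and the query string is assembled in a fixed clause order at the end.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical builder showing order-independent composition; not Doctrine's API. */
final class QuerySketch {
    private String select;
    private String from;
    private final List<String> conditions = new ArrayList<>();
    private String orderBy;
    private final Map<String, Object> parameters = new LinkedHashMap<>();

    QuerySketch select(String alias) { this.select = alias; return this; }
    QuerySketch from(String entityAndAlias) { this.from = entityAndAlias; return this; }
    QuerySketch where(String condition) { conditions.add(condition); return this; }
    QuerySketch orderBy(String ordering) { this.orderBy = ordering; return this; }
    QuerySketch setParameter(String name, Object value) { parameters.put(name, value); return this; }

    /** Named parameters recorded so far, to be bound when the query is executed. */
    Map<String, Object> parameters() { return parameters; }

    /** Assembles the recorded parts in a fixed clause order, regardless of call order. */
    String toQueryString() {
        if (select == null || from == null) {
            throw new IllegalStateException("select and from are required");
        }
        StringBuilder sb = new StringBuilder("SELECT ").append(select)
                .append(" FROM ").append(from);
        if (!conditions.isEmpty()) {
            sb.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        if (orderBy != null) {
            sb.append(" ORDER BY ").append(orderBy);
        }
        return sb.toString();
    }
}
```

Because the object only stores parts until toQueryString() is called, new QuerySketch().orderBy("u.name ASC").where("u.id = :identifier").from("User u").select("u") yields the same string as the same calls made in the usual SELECT, FROM, WHERE order. This is what makes such an object convenient to pass around and refine when not all conditions are known up front.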

July 7, 2010

by Giorgio Sironi


Practical PHP Patterns: Metadata Mapping

The intent of the Metadata Mapping pattern is to express implementation details, related to a particular domain and Domain Model, as metadata for a general-purpose library. In the sense intended here, metadata is related to the persistence operations (transferring objects back and forth from a database). This metadata is usually fed to a general-purpose object-relational mapper. Technically the term metadata is plural (of metadatum, data about data), but it is commonly used as an uncountable noun.

Why express metadata

Object-relational mapping is a difficult task to automate, prone to lots of potential bugs and undefined behaviors. Expressing the domain-related peculiarities as metadata means that you only need to code one ORM, instead of repeating the same work in many custom Data Mappers, which are very boring to write and can't be moved out of a specific application.

Custom Data Mappers were a cleaner solution for Domain Models compared with employing Active Records, and they are advocated, for example, in Zend Framework books like Keith Pope's. They are finally becoming obsolete thanks to the power of a declarative approach like this pattern, which tools like Doctrine 2 are based on.

Historically, Hibernate from JBoss was the first Data Mapper implemented as a generic ORM (it is a Java product). Doctrine 2 is the most famous PHP implementation, and it is in beta at the time of this writing.

The metadata we'd like to give to an ORM includes, for example:

* Which classes should be persisted at all.
* Optional names for the tables (it can use the class names).
* Which fields form the primary key.
* The types of the different columns, particularly important in a loosely typed language like PHP.
* Which collaborators have to be persisted and by what means: foreign keys and additional association tables.
The metadata should usually not consist of code: non-standard behavior shouldn't be contained in it, since in general all the behavior, like inheritance strategies and conversion of relationships, is extracted into the generic ORM. Thus there are different formats we can use in place of PHP code: XML, annotations, YAML, INI...

Different approaches

There are two approaches to the Metadata Mapping pattern, described by Fowler in his original book.

The first one is code generation: the metadata is processed to generate the source code of the mapping classes, for example a Data Mapper for every entity or aggregate root of your model (one for User, one for BlogPost, and so on). The ORM would theoretically not be necessary in production if the generation is complete enough. Doctrine 1 used this approach in part, but it also generated the PHP code of the domain model itself from the YAML mapping, as subclasses of Doctrine_Record. Still, Doctrine 1 was necessary to instantiate those classes, and the solution wasn't so clean. Doctrine 2 is very different in architecture and goals.

The second approach is called reflective program, and consists of interpreting the mapping at runtime in the ORM's code, opening up the objects via reflection (or a standard interface) and putting them in the database. The converse can also happen: objects can be recreated from the union of metadata and database tables.

How it is used

The reflective solution is the common one nowadays, and Doctrine 2 borrows it from Hibernate in its own design. Reflection is used to access the private fields to persist. Some critics point out speed problems with this technique, but keep in mind that your ORM is communicating with an external process or database machine at the same time as it uses reflection: it probably won't count much in the benchmark.
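To make the reflective approach concrete, here is a minimal sketch in Java, the language of Hibernate, from which the technique is borrowed. It assumes the simplest possible metadata, a map from field names to column names, and uses it to read the private fields of an object into a row. The class names, the metadata format and the column name user_name are invented for illustration; a real ORM's metadata is far richer.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal reflective mapper sketch; the metadata format here is invented. */
final class ReflectiveMapperSketch {

    /** Example entity with private state and no persistence code of its own. */
    static final class User {
        private final int id;
        private final String name;

        User(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Reads the mapped private fields of an entity into a column-to-value row. */
    static Map<String, Object> toRow(Object entity, Map<String, String> fieldToColumn) {
        Map<String, Object> row = new LinkedHashMap<>();
        try {
            for (Map.Entry<String, String> mapping : fieldToColumn.entrySet()) {
                Field field = entity.getClass().getDeclaredField(mapping.getKey());
                field.setAccessible(true); // bypass 'private', as a reflective ORM would
                row.put(mapping.getValue(), field.get(entity));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Metadata does not match the entity class", e);
        }
        return row;
    }

    /** Metadata for User: field name to column name (hypothetical column names). */
    static Map<String, String> userMetadata() {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("id", "id");
        metadata.put("name", "user_name");
        return metadata;
    }
}
```

The entity class contains no persistence code at all: renaming a column only means editing the metadata. The flip side is visible too, since renaming the field name in User without updating the metadata fails only at runtime.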
Doctrine 2, however, takes optimization seriously, to the point that its internal metadata classes (accessed very often) present an API with public properties instead of methods, to avoid any overhead in a crucial part (hydration of objects with data retrieved from the database).

An advantage of generated code is that it would be easier to debug, but it is usually a pain to maintain: every time you evolve or refactor a domain class you have to regenerate the Mapper classes. You can't customize this code either, because you would lose your changes at regeneration time.

Advantages and (few) disadvantages

Of course we lose some expressiveness by specifying metadata instead of programmatic behavior like the source code of a custom Data Mapper. But we gain very much: a fully tested ORM, like Doctrine 2 in the PHP case, with only some lines of added metadata to keep in sync with the rest of the code base. Declarative approaches trade off completeness of functionality (the absent features are not used very often anyway) for developer time.

But there are other advantages, such as the generation (and migration) of the database schema based on the metadata, and also of the proxy classes.

Ideally, the metadata mapping is the only point of strong coupling of your Domain Model with an external adapter, the ORM. It is of course part of the infrastructure, so keep it under version control along with the code!

Adding and removing fields or relationships, changing keys or refactoring is much easier because you do it declaratively instead of refactoring a specific mapper class. Note that automated refactoring tools are not to be trusted here: for example, they usually ignore the mapping when you change a field name. So grep is your best ally.

Examples

The sample code of this article will present the different ways of specifying metadata for Doctrine 2, the most high-tech PHP ORM.
The performance of the different methods is equivalent, since the metadata is read only once into native PHP objects and then cached. Metadata is a vast subject, since all the different persistence implementations have to be driven by it, but we will look at the types of metadata specification we can use rather than at all the different metadata instances, which are best described in conjunction with the individual features (for example, the articles on inheritance patterns contain the description of the metadata related to subclassing).

The simplest way to express metadata mapping in Doctrine 2 is via annotations, embedded in the docblocks and ignored by everything but the ORM:

Don't be alarmed by the size: this mapping does much more than the one in the annotations example.

A third way to specify metadata is via YAML, a format widely used in symfony-related software:

---
# Doctrine.Tests.ORM.Mapping.User.dcm.yml
Doctrine\Tests\ORM\Mapping\User:
  type: entity
  table: cms_users
  id:
    id:
      type: integer
      generator:
        strategy: AUTO
  fields:
    name:
      type: string
      length: 50
  oneToOne:
    address:
      targetEntity: Address
      joinColumn:
        name: address_id
        referencedColumnName: id
  oneToMany:
    phonenumbers:
      targetEntity: Phonenumber
      mappedBy: user
      cascade: cascadePersist
  manyToMany:
    groups:
      targetEntity: Group
      joinTable:
        name: cms_users_groups
        joinColumns:
          user_id:
            referencedColumnName: id
        inverseJoinColumns:
          group_id:
            referencedColumnName: id
  lifecycleCallbacks:
    prePersist: [ doStuffOnPrePersist, doOtherStuffOnPrePersistToo ]
    postPersist: [ doStuffOnPostPersist ]

July 5, 2010

by Giorgio Sironi


Working with the bit.ly API to shorten URLs

Bit.ly is a quite popular URL shortening service. On Twitter, almost all of the links I see from the people I follow are posted as bit.ly shortcuts. If you've used a Twitter client, you probably already know that some of them (if not almost every single one) offer URL shortening as a built-in capability. You can implement the same functionality yourself, thanks to the fact that bit.ly offers a public API for this.

First of all, all data that is passed to the service is transferred via HTTP requests. The response generated by the service is formatted as a JSON document by default; however, the developer can explicitly specify that XML data should be returned.

A request to the bit.ly shortening service requires authentication, and the username and API key are required to be passed as parameters. This means that in order to use the service, a bit.ly account is needed (it is free). The API key can be found here (http://bit.ly/account/your_api_key) once the user has registered.

Shortening

The first (and probably the most important) method is the one that actually shortens the URL, and it is called /v3/shorten. The v3 at the beginning stands for the API version (3.0 at the moment, so don't worry about that).
This method accepts 5 parameters:

• format – determines the output format for the request
• longUrl – determines the long URL that needs to be shortened
• domain – [optional] the domain used for shortening – either bit.ly or j.mp (default: bit.ly)
• x_login – the user ID (although in the documentation it is indicated as optional, it is not)
• x_apiKey – the user API key (although in the documentation it is indicated as optional, it is not)

Let's look at the method I've written to shorten the URL:

// Requires: using System.IO; using System.Net; using System.Web;

enum Format { XML, JSON, TXT }

enum Domain { BITLY, JMP }

string ShortenUrl(string longURL, string username, string apiKey, Format format = Format.XML, Domain domain = Domain.BITLY)
{
    string _domain;
    string output;

    // Build the domain string depending on the selected domain type
    if (domain == Domain.BITLY)
        _domain = "bit.ly";
    else
        _domain = "j.mp";

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(
        string.Format(@"http://api.bit.ly/v3/shorten?login={0}&apiKey={1}&longUrl={2}&format={3}&domain={4}",
            username, apiKey, HttpUtility.UrlEncode(longURL), format.ToString().ToLower(), _domain));

    // Read the whole response body into a string
    using (WebResponse response = request.GetResponse())
    {
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            output = reader.ReadToEnd();
        }
    }

    return output;
}

There are two enums that hold the possible data formats as well as the domain names. This is done for safety: if these were passed as string parameters, there would be a higher chance of the caller passing an invalid string, and the function would then fail.

The code is based on a single HttpWebRequest that creates an HTTP request to the URL that is built according to the data passed to it. Then, I get the response stream and read its string representation into the returned string variable.

Notice that I am explicitly returning a string value from this method. In fact, I could return either a JsonDocument instance (which requires a third-party library) or an XmlDocument.
But since there is more than one possible response format to handle, it is better to return a simple string and let the developer decide what to do with it next. Once called, the function will return data similar to this (for XML):

    <response>
      <status_code>200</status_code>
      <status_txt>OK</status_txt>
      <data>
        <url>http://j.mp/crnexS</url>
        <hash>crnexS</hash>
        <global_hash>msft</global_hash>
        <long_url>http://www.microsoft.com</long_url>
        <new_hash>0</new_hash>
      </data>
    </response>

Or this (for JSON):

    {
      "status_code": 200,
      "status_txt": "OK",
      "data": {
        "long_url": "http:\/\/www.microsoft.com",
        "url": "http:\/\/j.mp\/crnexS",
        "hash": "crnexS",
        "global_hash": "msft",
        "new_hash": 0
      }
    }

Or this (for TXT):

    http://j.mp/crnexS

I am using a custom domain here, but as you can see, the format and domain are optional parameters. I can leave them at their default values without passing them to the function, and then the only values that need to be supplied are the user ID, the API key and the long URL.

Decoding

There is also a way to decode a shortened URL back to its original form. The method is called /v3/expand and is used in a similar manner to the shortening one. In this method, I am also using the Format enum to specify the output format:

    string DecodeUrl(string[] urlSet, string[] hashSet, string username, string apiKey,
        Format format = Format.XML)
    {
        string output;

        string URL = string.Format(@"http://api.bit.ly/v3/expand?login={0}&apiKey={1}&format={2}",
            username, apiKey, format.ToString().ToLower());

        // Append one shortUrl parameter per shortened URL (encoded here, not by the caller)
        if (urlSet != null)
        {
            foreach (string url in urlSet)
                URL += "&shortUrl=" + HttpUtility.UrlEncode(url);
        }

        // Append one hash parameter per hash
        if (hashSet != null)
        {
            foreach (string hash in hashSet)
                URL += "&hash=" + hash;
        }

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL);

        using (WebResponse response = request.GetResponse())
        {
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                output = reader.ReadToEnd();
            }
        }

        return output;
    }

It works a bit differently, though. As you can see, I am asking the caller to pass two arrays: one with URLs and one with hashes.
One of them can be null, so the URL can be decoded either by the hash or by the shortened URL. The user can also pass both arrays, and get a result similar to this:

    200 OK http://j.mp/crnexS http://www.microsoft.com crnexS msft crnexS http://www.microsoft.com crnexS msft

The URLs are sanitized inside the function, because I am not assuming that the user will pass an already encoded URL. In fact, the developer should never assume that the user will pass the correct value: the code should be as foolproof as possible.

User validation

If you work on an application that depends on the URL shortening service, it is a good idea to validate the user before making the API calls. Bit.ly provides a method for this as well, called /v3/validate. Besides your own credentials, it only requires three parameters: the username and the API key to check, and the output format (which is in fact optional). The C# implementation for this method looks like this:

    string ValidateUser(string username, string apiKey, string userToCheck, string keyToCheck,
        Format format = Format.XML)
    {
        string output;

        // x_login / x_apiKey identify the end user being checked;
        // login / apiKey are the caller's own credentials
        string URL = string.Format(@"http://api.bit.ly/v3/validate?x_login={0}&x_apiKey={1}&login={2}&apiKey={3}&format={4}",
            userToCheck, keyToCheck, username, apiKey, format.ToString().ToLower());

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL);

        using (WebResponse response = request.GetResponse())
        {
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                output = reader.ReadToEnd();
            }
        }

        return output;
    }

The x_-prefixed copies of the login and API key can cause a bit of confusion. You need to pass your own ID and API key in order to verify the validity of someone else's account. The x_-prefixed parameters represent the end user. The output should look similar to this:

    200 OK 1

Count clicks

Bit.ly provides click statistics, so once you shorten a URL, you can track its basic usage. Statistics are available through the /v3/clicks method.
It doesn't have a TXT output format, so you will have to avoid using that value (or create a separate enum, which is the best choice). The implementation for it looks like this:

    string GetClicks(string[] urlSet, string[] hashSet, string username, string apiKey,
        Format format = Format.XML)
    {
        string output;

        string URL = string.Format(@"http://api.bit.ly/v3/clicks?login={0}&apiKey={1}&format={2}",
            username, apiKey, format.ToString().ToLower());

        // Append one shortUrl parameter per shortened URL
        if (urlSet != null)
        {
            foreach (string url in urlSet)
                URL += "&shortUrl=" + HttpUtility.UrlEncode(url);
        }

        // Append one hash parameter per hash
        if (hashSet != null)
        {
            foreach (string hash in hashSet)
                URL += "&hash=" + hash;
        }

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL);

        using (WebResponse response = request.GetResponse())
        {
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                output = reader.ReadToEnd();
            }
        }

        return output;
    }

As with the expand method, an array of URLs and an array of hash codes can be passed, and statistics will be generated for multiple entries at once. The response looks like this (in XML format):

    200 http://j.mp/crnexS msft 0 crnexS 2230 0 msft crnexS crnexS 2230 OK

Note that the XML won't be indented by default.

Check for PRO domain

Bit.ly offers pro, customizable domains. That means that not only bit.ly and j.mp can be used for shortening, but user-defined domains as well. The /v3/bitly_pro_domain method lets you check whether a domain is bit.ly PRO-powered or not. It is very similar to the user validation method, but it accepts the domain name instead of the user credentials.
The C# implementation looks like this:

    string CheckPro(string username, string apiKey, string domain, Format format = Format.XML)
    {
        string output;

        string URL = string.Format(@"http://api.bit.ly/v3/bitly_pro_domain?login={0}&apiKey={1}&domain={2}&format={3}",
            username, apiKey, domain, format.ToString().ToLower());

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL);

        using (WebResponse response = request.GetResponse())
        {
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                output = reader.ReadToEnd();
            }
        }

        return output;
    }

Once called, the output looks like this:

    200 nyti.ms 1 OK

URL lookup

Bit.ly also allows the lookup of long URLs. For example, you might want to find out whether a short URL already exists for a given long URL. To do this, there is the /v3/lookup method. The implementation is quite simple and, like the other methods, it has the same base structure:

    string Lookup(string username, string apiKey, string[] url, Format format = Format.XML)
    {
        string output;

        string URL = string.Format(@"http://api.bit.ly/v3/lookup?login={0}&apiKey={1}&format={2}",
            username, apiKey, format.ToString().ToLower());

        // Append one url parameter per long URL to look up
        foreach (string _url in url)
            URL += "&url=" + HttpUtility.UrlEncode(_url);

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL);

        using (WebResponse response = request.GetResponse())
        {
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                output = reader.ReadToEnd();
            }
        }

        return output;
    }

The XML response looks similar to this for a positive result (a URL was found):

    200 http://www.dreamincode.net http://bit.ly/mviGY mviGY OK

Notice that I can pass an array of URLs to be checked. Mind, though, that the maximum number of URLs that can be passed to the method is 15.

With the methods described above, you can harness the power of bit.ly and bring it to your .NET application (the code can easily be ported to any .NET-compatible programming language). For official documentation, you can take a look here.
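All of the methods above follow the same pattern: build a query string from the credentials, the format and the method-specific parameters, URL-encoding anything the caller supplies. As a rough sketch of just that URL-building step in Java (the class and method names here are hypothetical, and no HTTP request is made), it might look like this:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: only builds request URLs, it does not call the service.
final class BitlyRequestUrls {
    private static final String BASE = "http://api.bit.ly/v3/";

    // URL-encode a single query parameter value.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Mirrors the parameters used by ShortenUrl in the article.
    static String shorten(String login, String apiKey, String longUrl, String format, String domain) {
        return BASE + "shorten?login=" + enc(login)
                + "&apiKey=" + enc(apiKey)
                + "&longUrl=" + enc(longUrl)
                + "&format=" + enc(format)
                + "&domain=" + enc(domain);
    }

    // Mirrors DecodeUrl: either list may be null, both may be given.
    static String expand(String login, String apiKey, String format,
                         List<String> shortUrls, List<String> hashes) {
        StringBuilder url = new StringBuilder(BASE)
                .append("expand?login=").append(enc(login))
                .append("&apiKey=").append(enc(apiKey))
                .append("&format=").append(enc(format));
        if (shortUrls != null) {
            for (String shortUrl : shortUrls) {
                url.append("&shortUrl=").append(enc(shortUrl));
            }
        }
        if (hashes != null) {
            for (String hash : hashes) {
                url.append("&hash=").append(enc(hash));
            }
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(shorten("demo", "KEY", "http://www.microsoft.com", "json", "j.mp"));
        System.out.println(expand("demo", "KEY", "xml",
                Arrays.asList("http://j.mp/crnexS"), Arrays.asList("crnexS")));
    }
}
```

Unlike the C# methods, this sketch also encodes the login, API key and hashes, which costs nothing and protects against unexpected characters in those values.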

June 18, 2010

by Denzel D.

· 33,478 Views

Versioning Static Assets with UrlRewriteFilter

A few weeks ago, a co-worker sent me an interesting email after talking with the Zoompf CEO at JSConf. One interesting tip mentioned was how we put the version in the querystring of our scripts and CSS. Apparently this doesn't always cache the way we expected it would (some proxies will never cache an asset if it has a querystring). The recommendation is to rev the filename itself. This article explains how we implemented a "cache busting" system in our application with Maven and the UrlRewriteFilter. We originally used a querystring in our implementation, but switched to filenames after reading Souders' recommendation. That part was figured out by my esteemed colleague Noah Paci.

Our Requirements

• Make the URL include a version number for each static asset URL (JS, CSS and SWF) that serves to expire a client's cache of the asset.
• Insert the version number into the application so the version number can be included in the URL.
• Use a random version number when in development mode (based on running without a packaged war) so that developers will not need to clear their browser cache when making changes to static resources.
• The random version number should match the production version number format, which is currently: x.y-SNAPSHOT-revisionNumber
• When running in production, the version number/cachebust is computed once (when a Filter is initialized). In development, a new cachebust is computed on each request.

In our app, we're using Maven, Spring and JSP, but the latter two don't really matter for the purposes of this discussion.

Implementation Steps

1. First we added the buildnumber-maven-plugin to our project's pom.xml so the build number is calculated from SVN.

    <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>buildnumber-maven-plugin</artifactId>
        <version>1.0-beta-4</version>
        <executions>
            <execution>
                <phase>validate</phase>
                <goals>
                    <goal>create</goal>
                </goals>
            </execution>
        </executions>
        <configuration>
            <doCheck>false</doCheck>
            <doUpdate>false</doUpdate>
            <providerImplementations>
                <svn>javasvn</svn>
            </providerImplementations>
        </configuration>
    </plugin>

2. Next we used the maven-war-plugin to add these values to our WAR's MANIFEST.MF file.

    <plugin>
        <artifactId>maven-war-plugin</artifactId>
        <version>2.0.2</version>
        <configuration>
            <archive>
                <manifest>
                    <addDefaultImplementationEntries>true</addDefaultImplementationEntries>
                </manifest>
                <manifestEntries>
                    <Implementation-Version>${project.version}</Implementation-Version>
                    <Implementation-Build>${buildNumber}</Implementation-Build>
                    <Implementation-Timestamp>${timestamp}</Implementation-Timestamp>
                </manifestEntries>
            </archive>
        </configuration>
    </plugin>

3. 
Then we configured a Filter to read the values from this file on startup. If this file doesn't exist, a default version number of "1.0-SNAPSHOT-{random}" is used. Otherwise, the version is calculated as ${project.version}-${buildNumber}.

    private String buildNumber = null;

    ...

    @Override
    public void initFilterBean() throws ServletException {
        try {
            InputStream is = servletContext.getResourceAsStream("/META-INF/MANIFEST.MF");
            if (is == null) {
                log.warn("META-INF/MANIFEST.MF not found.");
            } else {
                Manifest mf = new Manifest();
                mf.read(is);
                Attributes atts = mf.getMainAttributes();
                buildNumber = atts.getValue("Implementation-Version") + "-"
                        + atts.getValue("Implementation-Build");
                log.info("Application version set to: " + buildNumber);
            }
        } catch (IOException e) {
            log.error("I/O Exception reading manifest: " + e.getMessage());
        }
    }

    ...

    // If there was a build number defined in the war, then use it for
    // the cache buster. Otherwise, assume we are in development mode
    // and use a random cache buster so developers don't have to clear
    // their browser cache.
    requestVars.put("cachebust", buildNumber != null
            ? buildNumber
            : "1.0-SNAPSHOT-" + new Random().nextInt(100000));

4. We then used the "cachebust" variable and appended it to static asset URLs as indicated below. The injection of /v/[CACHEBUSTINGSTRING]/(assets|compressed) eventually has to map back to the actual asset (which does not include the first two elements of the URI). The application must remove these two elements to map back to the actual asset. To do this, we use the UrlRewriteFilter. The UrlRewriteFilter is used (instead of Apache's mod_rewrite) so that developers running locally (using mvn jetty:run) don't have to configure Apache.

5. In our application, "/compressed/" is mapped to wro4j's WroFilter. In order to get UrlRewriteFilter and WroFilter to work with this setup, the WroFilter has to accept FORWARD and REQUEST dispatchers.
    <filter-mapping>
        <filter-name>rewriteFilter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>
    <filter-mapping>
        <filter-name>WebResourceOptimizer</filter-name>
        <url-pattern>/compressed/*</url-pattern>
        <dispatcher>FORWARD</dispatcher>
        <dispatcher>REQUEST</dispatcher>
    </filter-mapping>

Once this was configured, we added the following rules to our urlrewrite.xml to allow rewriting of any assets or compressed resource request back to its "correct" URL.

    <rule match-type="regex">
        <from>^/v/[0-9A-Za-z_.\-]+/assets/(.*)$</from>
        <to>/assets/$1</to>
    </rule>
    <rule match-type="regex">
        <from>^/v/[0-9A-Za-z_.\-]+/compressed/(.*)$</from>
        <to>/compressed/$1</to>
    </rule>
    <rule>
        <from>/compressed/**</from>
        <to>/compressed/$1</to>
    </rule>

Of course, you can also do this in Apache. This is what it might look like in your vhost.d file:

    RewriteEngine on
    RewriteLogLevel 0
    RewriteLog /srv/log/apache22/app_rewrite_log
    RewriteRule ^/v/[.A-Za-z0-9_-]+/assets/(.*) /assets/$1 [PT]
    RewriteRule ^/v/[.A-Za-z0-9_-]+/compressed/(.*) /compressed/$1 [PT]

Whether it's a good idea to implement this in Apache or with the UrlRewriteFilter is up for debate. If we're able to do this with the UrlRewriteFilter, the benefit of doing it in Apache at all is questionable, especially since it duplicates the rules.

From http://raibledesigns.com/rd/entry/versioning_static_assets_with_urlrewritefilter
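To make the round trip concrete, here is a small, self-contained Java sketch of the two halves of the scheme: prefixing an asset path with /v/[CACHEBUSTINGSTRING] and stripping that prefix again with the same regular expression the rewrite rules use. The class and method names are made up for illustration; in the real application the stripping is done by the UrlRewriteFilter, not by application code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the /v/<cachebust>/ URL scheme.
final class CacheBustPaths {
    // Same character class as the rewrite rules, covering both mapped prefixes.
    private static final Pattern VERSIONED =
            Pattern.compile("^/v/[0-9A-Za-z_.\\-]+/(assets|compressed)/(.*)$");

    // Build the URL that is written into the page, e.g. for a script tag.
    static String versioned(String cachebust, String assetPath) {
        return "/v/" + cachebust + assetPath;
    }

    // Map a versioned request path back to the real asset path.
    // Paths that do not match are returned unchanged.
    static String strip(String requestPath) {
        Matcher m = VERSIONED.matcher(requestPath);
        return m.matches() ? "/" + m.group(1) + "/" + m.group(2) : requestPath;
    }

    public static void main(String[] args) {
        String url = versioned("1.0-SNAPSHOT-1234", "/assets/js/app.js");
        System.out.println(url);        // the URL the browser requests and caches
        System.out.println(strip(url)); // the path the container actually serves
    }
}
```

Because the version is part of the path rather than the querystring, a new build produces a new URL for every asset, and the old cached copies are simply never requested again.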

June 5, 2010

by Matt Raible

· 11,770 Views

Creating Master-Detail Forms with Vaadin and Grails

In the article Groovy, Grails and Vaadin, Petter Holmström outlined an example of using Vaadin with Grails to build a simple data-bound UI for a single database table. This was a great article that showed how one can quickly build a web-based solution that behaves very much like a typical client-server application. However, there are many instances where we need to build an entry screen which allows the user to update both the master record and its detail records at the same time, such as an Invoice entry screen. In this article I look at extending the example in the above article, leveraging the simplicity of the Vaadin framework and the capabilities of GORM, to build a master-detail example.

After years of doing development with Oracle Forms, I like the way it provides generic functions on each form (or data block) that allow users to add new records, delete them, query them and navigate from one record to the next. Furthermore, in Oracle Forms you can provide a data grid that allows the user to edit the related detail table (such as the invoice lines in an Invoice document) and keeps these records in sync with the Invoice header (master) record. This is a first attempt at reproducing some of these capabilities using Vaadin and Grails.

The master-detail data model

The example here is an extension of Petter Holmström's Trip Planner model. In the original example, there is a table holding information on trips taken by the user. We will extend that example with a one-to-many relation to a table of the places that are visited on each trip. This is the code for the Trip domain class. Notice that we are using Grails 1.3.1 which, by default, puts the domain class into the tripplanner package.
    package tripplanner

    class Trip {
        static constraints = {
            name(nullable: true)
            city(nullable: true)
            startDate(nullable: true)
            endDate(nullable: true)
            purpose(nullable: true)
            notes(nullable: true)
        }

        String name
        String city
        Date startDate
        Date endDate
        String purpose
        String notes

        static hasMany = [ places : Place ]

        static mapping = {
            places lazy:false, cascade:"all,delete-orphan"
        }
    }

We have made the fields nullable for ease of inserting the record into the table. We also created the one-to-many relation with the Place domain class, and put in the mapping to cascade deletions and updates to the detail class. We also need to change the relation from lazy to eager to ensure that all detail records are loaded at the same time as the master (header) record.

As for the Place domain class, we have 2 fields: the name of the place we visited and a description. In addition, we put in a transient field, "_deleted", which we will use to mark a detail record for deletion.

    package tripplanner

    class Place {
        static constraints = {
            name(nullable: true)
            description(nullable: true)
        }

        String name
        String description
        boolean _deleted

        static transients = [ '_deleted' ]
        static belongsTo = [trip:Trip]
    }

The master-detail form

Now we need to change the VaadinApp class. Instead of listing the master records in a table to be selected for editing, I have opted to follow the Oracle Forms style of letting the user query for master records using Query By Example (QBE). The standard GORM and Hibernate infrastructure allows the use of an "example" object as a query pattern to extract a subset of records. This unfortunately is not exactly how Oracle Forms QBE works: Oracle Forms allows the user to enter patterns such as "A%" to query for values that start with "A", and ">999" to query for values that are bigger than 999. The QBE in GORM only allows exact matches. We will accept this limitation for this example.
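To illustrate what "exact matches only" means in practice, here is a small in-memory Java sketch of the matching rule. It is not how GORM or Hibernate implement query by example (they generate SQL); the class name and the use of plain maps as records are assumptions made to keep the example self-contained. Fields left null in the example place no constraint, and every other field must be equal, so a pattern such as "P%" is treated as a literal value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of exact-match query by example.
final class ExactMatchQbe {

    // A record matches when every non-null field of the example is equal
    // to the same field in the record. Null fields are ignored.
    static boolean matches(Map<String, Object> example, Map<String, Object> record) {
        for (Map.Entry<String, Object> field : example.entrySet()) {
            if (field.getValue() == null) {
                continue; // unset field: no constraint
            }
            if (!field.getValue().equals(record.get(field.getKey()))) {
                return false;
            }
        }
        return true;
    }

    // Return all records that match the example, in their original order.
    static List<Map<String, Object>> findAll(Map<String, Object> example,
                                             List<Map<String, Object>> records) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> record : records) {
            if (matches(example, record)) {
                result.add(record);
            }
        }
        return result;
    }
}
```

With trips in "Paris" and "Prague", an example with city "Paris" finds one record, an example with city "P%" finds none, and an example with all fields unset finds both.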
So now we need a set of buttons to allow the user to:

• New - Add a new record data entry
• Save - Save the current record to the database, including the changes made to the detail table
• Del - Delete the current master record and the associated detail records
• EnterQ - Enter Query mode, which allows the user to provide a pattern for a QBE search
• ExecQ - Execute the query and show the first matching record
• ExitQ - Exit Query mode and go back to New record mode
• Prev - Go to the previous record if there are any queried records
• Next - Go to the next record if there are any queried records

These buttons are located at the top of the master record form. Error messages which arise from the Save button are listed at the top of the master record form fields (in the "description field" of the Vaadin Form). There is also a status indicator label located at the footer of the master record form (the "footer section" of the Vaadin Form). The status label shows the current mode of the master record entry screen, and also the current record number if the record is part of a query set.

Below the master record form is the editable, selectable detail records table, which is located in its own panel. Users are allowed to edit the detail record table only if the master record is in Edit mode. This means that users need to save a newly created master (header) record before they can add any detail records. The detail record table has a column for each editable, visible field, excluding the "_deleted" field, which is for internal use only.

To manipulate the detail records, we provide 2 buttons: one to add a new detail record (the "+" button) and one to delete the selected detail record (the "-" button). All additions, deletions and updates are only saved to the database after the user clicks the "Save" button in the master (header) record form.
As the user moves from one master record to the next using the "Prev" and "Next" buttons in the master table form, the detail table will be synchronized accordingly.

    package tripplanner

    import com.vaadin.ui.*
    import com.vaadin.data.*
    import com.vaadin.data.util.*

    class VaadinApp extends com.vaadin.Application {

        def query
        def rec_pos
        def container = new BeanItemContainer(Place.class)
        def master_fields = ["name", "purpose", "startDate", "endDate", "city", "notes"]
        def saveButton
        def newButton
        def delButton
        def entQButton
        def exeQButton
        def extQButton
        def prevButton
        def nextButton
        def statusLabel
        def pnlDetail

        void new_mode(editor) {
            def tripInstance = new Trip()
            editor.itemDataSource = new BeanItem(tripInstance)
            editor.visibleItemProperties = master_fields
            editor.description = ""
            container.removeAllItems()
            saveButton.setEnabled(true)
            delButton.setEnabled(false)
            entQButton.setEnabled(true)
            extQButton.setEnabled(false)
            statusLabel.setValue "New Mode"
            pnlDetail.setEnabled(false)
        }

        void init() {
            //def window = new Window("Trip Maintenance", new SplitPanel(SplitPanel.ORIENTATION_HORIZONTAL))
            def window = new Window("Trip Maintenance")
            setMainWindow window

            // Form Panel
            def panel = new Panel("Trip Maintenance")
            panel.setSizeFull()
            panel.setLayout(new VerticalLayout());

            // Trip editor
            def tripEditor = new Form()
            tripEditor.setSizeFull()
            tripEditor.layout.setMargin true
            tripEditor.immediate = true
            tripEditor.visible = true

            // status bar - shows the mode of the form
            statusLabel = new Label("New Mode")

            // define master form buttons
            saveButton = new Button("Save")
            newButton = new Button("New")
            delButton = new Button("Del")
            entQButton = new Button("EnterQ")
            exeQButton = new Button("ExecQ")
            extQButton = new Button("ExitQ")
            prevButton = new Button("Prev")
            nextButton = new Button("Next")

            // set initial state of buttons
            delButton.setEnabled(false)
            extQButton.setEnabled(false)

            // default is New Mode
            def v_tripInstance = new Trip()
            tripEditor.itemDataSource = new BeanItem(v_tripInstance)
            tripEditor.visibleItemProperties = master_fields
            //new_mode(tripEditor)

            // panel for detail form
            pnlDetail = new Panel()
            pnlDetail.setEnabled(false)

            // table to hold the detail form rows
            def table = new Table()
            table.containerDataSource = container
            table.selectable = true
            table.editable = true
            table.setSizeFull()
            table.visibleColumns = ["name", "description"]
            table.immediate = true

            // toolbar for the detail form manipulation
            def toolbar = new HorizontalLayout()

            // buttons for detail form
            // button to add new row to detail form
            def addRowButton = new Button("+", new Button.ClickListener() {
                void buttonClick(Button.ClickEvent event) {
                    def placeInstance = new Place()
                    def tripInstance = tripEditor.itemDataSource.bean
                    tripInstance.addToPlaces(placeInstance)
                    container.addBean placeInstance
                }
            })
            toolbar.addComponent addRowButton

            // button to delete current row from detail form
            def delRowButton = new Button("-", new Button.ClickListener() {
                void buttonClick(Button.ClickEvent event) {
                    if (table.value) {
                        def placeInstance = table.value
                        placeInstance._deleted = true
                        container.removeItem(placeInstance)
                        table.value = null
                    }
                }
            })
            toolbar.addComponent delRowButton

            // detail panel has the table rows and the toolbar
            pnlDetail.addComponent toolbar
            pnlDetail.addComponent table

            // master form buttons
            // save record button
            saveButton.addListener(new Button.ClickListener() {
                void buttonClick(Button.ClickEvent event) {
                    Trip.withTransaction { status ->
                        table.commit()
                        def tripInstance = tripEditor.itemDataSource.bean
                        def _toBeDeleted = tripInstance.places.findAll {it._deleted}
                        if (_toBeDeleted) {
                            tripInstance.places.removeAll(_toBeDeleted)
                        }
                        if (!tripInstance.save(flush:true)) {
                            tripEditor.description = "Error:"
                            tripInstance.errors.allErrors.each { error ->
                                tripEditor.description = tripEditor.description + "" + "Field [${error.getField()}] with value [${error.getRejectedValue()}] is invalid"
                            }
                            tripEditor.description = tripEditor.description + ""
                            window.showNotification "Could not save changes"
                        } else {
                            tripEditor.description = ""
                            window.showNotification "Changes saved"
                            delButton.setEnabled(true)
                            statusLabel.setValue("Edit Mode");
                            pnlDetail.setEnabled(true)
                            populate_container(container,tripInstance.places)
                        }
                    }
                }
            })

            // new record button
            newButton.addListener(new Button.ClickListener() {
                void buttonClick(Button.ClickEvent event) {
                    new_mode(tripEditor)
                }
            })

            // delete record button
            delButton.addListener(new Button.ClickListener() {
                void buttonClick(Button.ClickEvent event) {
                    if (tripEditor.itemDataSource.bean) {
                        Trip.withTransaction { status ->
                            def tripInstance = tripEditor.itemDataSource.bean
                            tripInstance.delete(flush:true)
                        }
                    }
                    new_mode(tripEditor)
                }
            })

            // enter query mode
            entQButton.addListener(new Button.ClickListener() {
                void buttonClick(Button.ClickEvent event) {
                    def tripInstance = new Trip()
                    def newBean = new BeanItem(tripInstance)
                    tripEditor.itemDataSource = newBean
                    tripEditor.visibleItemProperties = master_fields
                    tripEditor.description = ""
                    saveButton.setEnabled(false)
                    delButton.setEnabled(false)
                    entQButton.setEnabled(false)
                    extQButton.setEnabled(true)
                    statusLabel.setValue("Query Mode");
                    pnlDetail.setEnabled(false)
                }
            })

            // execute query
            exeQButton.addListener(new Button.ClickListener() {
                void buttonClick(Button.ClickEvent event) {
                    tripEditor.commit()
                    def example = (Trip) tripEditor.itemDataSource.bean
                    query = Trip.findAll(example)
                    rec_pos = 0
                    tripEditor.description = ""
                    entQButton.setEnabled(true)
                    extQButton.setEnabled(false)
                    if (query.size > 0) {
                        def next_rec = query[rec_pos]
                        tripEditor.itemDataSource = new BeanItem(next_rec)
                        tripEditor.visibleItemProperties = master_fields
                        saveButton.setEnabled(true)
                        delButton.setEnabled(true)
                        statusLabel.setValue("Edit Mode. Record " + (rec_pos+1) + "/" + query.size);
                        pnlDetail.setEnabled(true)
                        populate_container(container,next_rec.places)
                    } else {
                        window.showNotification "No record found"
                        new_mode(tripEditor)
                    }
                }
            })

            // exit query mode. Back to New mode
            extQButton.addListener(new Button.ClickListener() {
                void buttonClick(Button.ClickEvent event) {
                    new_mode(tripEditor)
                }
            })

            // edit mode, previous record
            prevButton.addListener(new Button.ClickListener() {
                void buttonClick(Button.ClickEvent event) {
                    if (!query) {
                        window.showNotification "No query result"
                        return
                    }
                    if (rec_pos > 0) {
                        rec_pos--
                        def next_rec = query[rec_pos]
                        tripEditor.itemDataSource = new BeanItem(next_rec)
                        tripEditor.visibleItemProperties = master_fields
                        saveButton.setEnabled(true)
                        statusLabel.setValue("Edit Mode. Record " + (rec_pos+1) + "/" + query.size)
                        pnlDetail.setEnabled(true)
                        populate_container(container,next_rec.places)
                    }
                }
            })

            // edit mode, next record
            nextButton.addListener(new Button.ClickListener() {
                void buttonClick(Button.ClickEvent event) {
                    if (!query) {
                        window.showNotification "No query result"
                        return
                    }
                    if (rec_pos < query.size - 1) {
                        rec_pos++
                        def next_rec = query[rec_pos]
                        tripEditor.itemDataSource = new BeanItem(next_rec)
                        tripEditor.visibleItemProperties = master_fields
                        saveButton.setEnabled(true)
                        statusLabel.setValue("Edit Mode. Record " + (rec_pos+1) + "/" + query.size)
                        pnlDetail.setEnabled(true)
                        populate_container(container,next_rec.places)
                    }
                }
            })

            // put the status bar on the footer of the master form
            tripEditor.footer = new HorizontalLayout()
            tripEditor.footer.addComponent statusLabel

            // put all master form buttons into a horizontal panel
            def btnPanel = new HorizontalLayout()
            btnPanel.addComponent saveButton
            btnPanel.addComponent newButton
            btnPanel.addComponent delButton
            btnPanel.addComponent entQButton
            btnPanel.addComponent exeQButton
            btnPanel.addComponent extQButton
            btnPanel.addComponent prevButton
            btnPanel.addComponent nextButton

            // construct master-detail form panel
            panel.addComponent btnPanel
            panel.addComponent tripEditor
            panel.addComponent pnlDetail

            // set panel to main window
            mainWindow.addComponent panel
        }

        // routine to populate the container with the detail records
        void populate_container(container, details) {
            container.removeAllItems()
            details.each {
                if (!it._deleted) container.addBean it
            }
        }
    }

A walkthrough of the above VaadinApp code is as follows.

• The query object stores the list of "Trip" instances that is returned from a QBE query.
• The rec_pos is an index into the above list for the currently displayed record.
• The master_fields stores the list of master record fields that are visible on the form.
• We keep some of the buttons, the status label and the detail panel as object variables so that they can be accessed by helper routines.
• The new_mode routine puts the form into New mode, which happens in several places in the code.
• The init routine is where the whole window and form are defined.
• The form is laid out with the master record buttons at the top, followed by the master record form, and then the detail table sub-panel below that.
• In the detail sub-panel, there are 2 buttons, "+" and "-", which add a detail record to the Table and the master record instance, and remove a detail record, respectively.
These are not put into a GORM transaction, as we only want to save (flush) the record when the "Save" button is clicked.

The really tricky part of the whole form is in the master record buttons.

• For the "Save" button, we first force the detail table to update the source records in the master collection. Then we delete all the detail records which are marked for deletion, that is, those whose "_deleted" field is set to true. Finally the master record is saved by calling the "save" method on the tripInstance bean object. Any error messages are put into the "description field" of the master record form. If everything is fine, the form is put into "Edit" mode for further editing, including adding detail records to the Table below.
• The new button just puts a new Trip instance into the master record form (and disables the detail table panel).
• The delete button just deletes the master record. There is no need to "commit" the transaction as it is done automatically, which differs from Oracle Forms.
• The enter-query button just clears the fields to collect the search parameters. If all the fields are null, the QBE will match all records in the database.
• Executing the query runs the "findAll(object)" method of the GORM domain class. The result is a List of matching instances. This is not very scalable, as the List needs to be kept in memory for the duration of the session or until the next query is run, and will need to be improved further. If any records are returned, the master form displays the first record and is put into "Edit" mode.
• Exiting Query mode returns the form to New mode.
• The previous and next buttons update the master record form and the container of the detail table panel, thus keeping both screens in sync.

The picture on the left shows the data entry screen with the detail records for the current master record.
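The previous/next handling boils down to a bounded cursor over the query result: an index that never moves past either end of the list, plus the status text derived from it. Here is a minimal Java sketch of that idea on its own, separated from the Vaadin widgets. The RecordCursor class is hypothetical and not part of the application above.

```java
import java.util.List;

// Hypothetical stand-alone version of the query / rec_pos bookkeeping.
final class RecordCursor<T> {
    private final List<T> results;
    private int pos = 0;

    RecordCursor(List<T> results) {
        this.results = results;
    }

    boolean isEmpty() {
        return results.isEmpty();
    }

    // The record currently shown in the master form.
    T current() {
        return results.get(pos);
    }

    // Move forward if possible; returns false when already on the last record.
    boolean next() {
        if (pos < results.size() - 1) {
            pos++;
            return true;
        }
        return false;
    }

    // Move backward if possible; returns false when already on the first record.
    boolean previous() {
        if (pos > 0) {
            pos--;
            return true;
        }
        return false;
    }

    // Same text as the status label in the form footer.
    String status() {
        return "Edit Mode. Record " + (pos + 1) + "/" + results.size();
    }
}
```

A button handler would then only need to call next() or previous() and, when it returns true, rebind the form to current() and repopulate the detail container.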
Future Direction

To improve on this, we will need to componentize the whole form so that developers do not need to copy and paste the above template code and modify it for each form. Furthermore, the "container" object above should be replaceable so that other types of containers can be used, but they should not be data-bound themselves; data binding should be left to the GORM objects and layer. Another suggestion is to create a Container component which is manipulated by the master table form buttons (add, save, delete, enter query, execute query, previous record, next record) and which in turn controls the master table entry Form. Finally, as mentioned in the Petter Holmström article, we are not really using any feature of Grails other than GORM, so this entire example can be built using just Groovy, GORM and Vaadin.

May 31, 2010

by Chee-keong Lee

· 39,586 Views · 1 Like

JMS Clustering by Example

It's amazing how the JBoss team put together an easy way to do JMS clustering, out of the box! I'll start with an easy example, creating a Queue named "MyClusteredQueue". In this example I'm using JBoss AS 5.1 and two computers connected to the same network, with these IPs:

- Computer A: 192.168.0.143
- Computer B: 192.168.0.210

So, here are the steps:

1) Install JBoss on both computers. We are going to use the "all" configuration on both.

2) Create the Queue on both servers. Go to $JBOSS_HOME/server/all/deploy/messaging/ and edit the destinations-service.xml file. Add the MyClusteredQueue before the last server tag. It looks like this:

    jboss.messaging:service=ServerPeer
    jboss.messaging:service=PostOffice
    true

This is how you add a Queue to JBoss. For people who are familiar with this, the only new thing is the "Clustered" attribute. This step must be done on both computers. At the end of the article you can find the files.

3) Write the MDB to consume the messages, and deploy it on the two computers. (I'm using an EJB 3 - MDB style).
    import java.net.InetAddress;

    import javax.ejb.ActivationConfigProperty;
    import javax.ejb.MessageDriven;
    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.ObjectMessage;

    import org.apache.log4j.Logger;

    /**
     * @author felipeg
     *
     */
    @MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName="destinationType", propertyValue="javax.jms.Queue"),
        @ActivationConfigProperty(propertyName="destination", propertyValue="queue/MyClusteredQueue")
    })
    public class JMSClusterClientHandler implements MessageListener {

        Logger log = Logger.getLogger(JMSClusterClientHandler.class);

        @Override
        public void onMessage(Message message) {
            try {
                if (message instanceof ObjectMessage) {
                    // Log the host name so we can see which node consumed the message
                    InetAddress addr = InetAddress.getLocalHost();
                    log.info("########## Processing Host: " + addr.getHostName() + " ##########" );
                    ObjectMessage objMessage = (ObjectMessage) message;
                    Object obj = objMessage.getObject();
                    log.info("Object received:" + obj.toString());
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

4) Start JBoss with the following options:

Computer A:

    $ cd $JBOSS_HOME/bin
    $ ./run.sh -c all -b 192.168.0.143 -Djboss.messaging.ServerPeerID=1

Computer B:

    $ cd $JBOSS_HOME/bin
    $ ./run.sh -c all -b 192.168.0.210 -Djboss.messaging.ServerPeerID=2

It is necessary to give an ID to each server, and this is accomplished with this directive: -Djboss.messaging.ServerPeerID

When you start JBoss on computer A, you should see the logs (server.log) telling you that there is one node ready and listening. Once you start JBoss on computer B, the log will show the two nodes, with both IPs ready to consume messages.

5) Now it's time to send a message to the Queue. To accomplish this it's necessary to change the connection factory to "ClusteredConnectionFactory" (JMSDispatcher.java - See the code below).
Also, in the jndi.properties file (if you are using the default InitialContext), it's necessary to add the IPs of the two computers, separated by a comma, to the java.naming.provider.url property. (In my case I create a Properties variable and set all the necessary properties on it; see JMSDispatcher.java below.)

    java.naming.provider.url=192.168.0.143:1099,192.168.0.210:1099

The client that I wrote is a web application. It consists of one index.jsp page with a form that prompts you for the name of the queue, the type of messaging (Queue or Topic), the server IP and port, how many times to send the message, and the actual message to be sent. The web application also has a servlet (JMSClusteredClient.java, see code below) that receives the postback, and a helper class (JMSDispatcher.java, see code below) that sends the message to the JBoss servers.

You can deploy it on any computer. In my case I deployed it on Computer A, and you can access it through this URL: http://192.168.0.143:8080/JMSWeb/ (just change the IP to the machine where the client war was deployed). As you can see in index.jsp, I've already put in some default values that reflect the name of the queue and the IPs of my two computers.

Now, if you increase the number of times the message will be sent (maybe to 10), fill out the message box, and click "Send", you should see on both servers that some of the messages are being consumed by the MDB.
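As a small illustration of the provider URL described above, here is a minimal sketch that builds the JNDI properties for any number of cluster nodes. The class name ClusterJndiProperties and the method forHosts are hypothetical names made up for this example; the two factory settings are the same ones JMSDispatcher.java uses, and port 1099 is the one from the jndi.properties line above.

```java
import java.util.Properties;

/** Hypothetical helper, not part of the original client; shown only as a sketch. */
final class ClusterJndiProperties {

    private ClusterJndiProperties() { }

    /**
     * Builds JNDI properties whose provider URL lists every node as host:port,
     * separated by commas, e.g. "192.168.0.143:1099,192.168.0.210:1099".
     */
    static Properties forHosts(int port, String... hosts) {
        if (hosts == null || hosts.length == 0) {
            throw new IllegalArgumentException("at least one host is required");
        }
        StringBuilder url = new StringBuilder();
        for (String host : hosts) {
            if (url.length() > 0) {
                url.append(',');
            }
            url.append(host.trim()).append(':').append(port);
        }
        Properties props = new Properties();
        // Same naming factory settings that JMSDispatcher.java sets by hand.
        props.put("java.naming.factory.initial", "org.jnp.interfaces.NamingContextFactory");
        props.put("java.naming.factory.url.pkgs", "org.jboss.naming:org.jnp.interfaces");
        props.put("java.naming.provider.url", url.toString());
        return props;
    }

    public static void main(String[] args) {
        Properties props = forHosts(1099, "192.168.0.143", "192.168.0.210");
        // prints 192.168.0.143:1099,192.168.0.210:1099
        System.out.println(props.getProperty("java.naming.provider.url"));
    }
}
```

The resulting Properties object could be passed to new InitialContext(props) on a client that has the JBoss client jars on its classpath; that part is left out here so the sketch runs with the JDK alone.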
Here are the files to create the client:

index.jsp: a page titled "JMS Clustered - Test Client" with a form that has a Server field, a Queue/Topic selector, a Times field and a Message box. The servlet reads the form values from the request parameters topicqueue, messageType, server, times and message.

Servlet: JMSClusteredClient.java

    package com.blogspot.felipeg48.jms.web;

    import java.io.IOException;
    import java.io.PrintWriter;

    import javax.jms.JMSException;
    import javax.naming.NamingException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class JMSClusteredClient extends HttpServlet {

        private static final long serialVersionUID = 1L;

        /**
         * @see HttpServlet#service(HttpServletRequest request, HttpServletResponse response)
         */
        protected void service(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            PrintWriter out = response.getWriter();

            // Values posted by the form in index.jsp.
            String topicqueue = request.getParameter("topicqueue");
            String message = request.getParameter("message");
            String server = request.getParameter("server");
            String messageType = request.getParameter("messageType");
            String times = request.getParameter("times");
            int intTimes = Integer.parseInt(times);

            JMSDispatcher dispatcher = new JMSDispatcher();
            dispatcher.setTopicQueueName(topicqueue);
            dispatcher.setServer(server);
            dispatcher.setMessageType(messageType);

            try {
                // Send the same message several times so both nodes get some work.
                for (int count = 1; count <= intTimes; count++) {
                    dispatcher.sendMessage(count + " of " + times + " " + message);
                }
                out.println("Message [" + message + "] sent successfully to [" + topicqueue
                        + "] to the [" + server + "] server " + times + " times.");
            } catch (JMSException e) {
                e.printStackTrace();
                out.println("Error:" + e.getMessage());
            } catch (NamingException e) {
                e.printStackTrace();
                out.println("Error:" + e.getMessage());
            } finally {
                out.close();
            }
        }
    }

A utility to send the messages: JMSDispatcher.java

    // Assumed to live in the same package as the servlet.
    package com.blogspot.felipeg48.jms.web;

    import java.io.Serializable;
    import java.util.Properties;

    import javax.jms.JMSException;
    import javax.jms.MessageFormatException;
    import javax.jms.MessageNotWriteableException;
    import javax.jms.ObjectMessage;
    import javax.jms.Queue;
    import javax.jms.QueueConnection;
    import javax.jms.QueueConnectionFactory;
    import javax.jms.QueueSender;
    import javax.jms.QueueSession;
    import javax.jms.Session;
    import javax.jms.Topic;
    import javax.jms.TopicConnection;
    import javax.jms.TopicConnectionFactory;
    import javax.jms.TopicPublisher;
    import javax.jms.TopicSession;
    import javax.naming.InitialContext;
    import javax.naming.NamingException;

    import org.apache.log4j.Logger;

    public class JMSDispatcher {

        private static final Logger log = Logger.getLogger(JMSDispatcher.class);

        private static final String CONNECTION_FACTORY_CLUSTERED = "ClusteredConnectionFactory";
        private static final String CONNECTION_FACTORY = "ConnectionFactory";
        private static final String TOPIC = "TOPIC";
        private static final String QUEUE = "QUEUE";

        private String topicQueueName;
        private String server;
        private String messageType;

        public void setTopicQueueName(String value) {
            this.topicQueueName = value;
        }

        public void setServer(String value) {
            this.server = value;
        }

        public void setMessageType(String value) {
            this.messageType = value;
        }

        public void sendMessage(Object objectMessage) throws JMSException, NamingException {
            log.debug("##### Setting up a Queue/Topic Message: #####");
            if (TOPIC.equals(messageType)) {
                sendTopicMessage(objectMessage);
            } else if (QUEUE.equals(messageType)) {
                sendQueueMessage(objectMessage);
            }
            log.debug("##### Publishing Message: Done #####");
        }

        private void sendQueueMessage(Object objectMessage) throws JMSException, NamingException {
            QueueConnection queueConn = null;
            try {
                InitialContext initialContext = getInitialContext();
                // The clustered factory is what lets the messages be spread over the nodes.
                QueueConnectionFactory qcf =
                        (QueueConnectionFactory) initialContext.lookup(CONNECTION_FACTORY_CLUSTERED);
                queueConn = qcf.createQueueConnection();
                Queue queue = (Queue) initialContext.lookup(topicQueueName);
                QueueSession queueSession = queueConn.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
                queueConn.start();

                QueueSender send = queueSession.createSender(queue);
                ObjectMessage om = queueSession.createObjectMessage((Serializable) objectMessage);
                setMessageProperties(om);
                log.debug("##### Publishing Message to a Queue: " + topicQueueName + " #####");
                send.send(om);

                send.close();
                queueConn.stop();
                queueSession.close();
            } catch (MessageFormatException ex) {
                log.error("##### The MESSAGE is not Serializable ####");
                throw ex;
            } catch (MessageNotWriteableException ex) {
                log.error("##### The MESSAGE is not Writeable ####");
                throw ex;
            } catch (JMSException ex) {
                log.error("##### JMS provider fails to set the object due to some internal error. ####");
                throw ex;
            } finally {
                // Always release the connection, even when sending fails.
                if (queueConn != null) {
                    queueConn.close();
                }
            }
        }

        private void sendTopicMessage(Object objectMessage) throws JMSException, NamingException {
            TopicConnection topicConn = null;
            try {
                InitialContext initialContext = getInitialContext();
                TopicConnectionFactory tcf =
                        (TopicConnectionFactory) initialContext.lookup(CONNECTION_FACTORY_CLUSTERED);
                topicConn = tcf.createTopicConnection();
                Topic topic = (Topic) initialContext.lookup(topicQueueName);
                TopicSession topicSession = topicConn.createTopicSession(false, Session.AUTO_ACKNOWLEDGE);
                topicConn.start();

                TopicPublisher send = topicSession.createPublisher(topic);
                ObjectMessage om = topicSession.createObjectMessage();
                om.setObject((Serializable) objectMessage);
                setMessageProperties(om);
                log.debug("##### Publishing Message to a Topic: " + topicQueueName + " #####");
                send.publish(om);

                send.close();
                topicConn.stop();
                topicSession.close();
            } catch (MessageFormatException ex) {
                log.error("##### The MESSAGE is not Serializable ####");
                throw ex;
            } catch (MessageNotWriteableException ex) {
                log.error("##### The MESSAGE is not Writeable ####");
                throw ex;
            } catch (JMSException ex) {
                log.error("##### JMS provider fails to set the object due to some internal error. ####");
                throw ex;
            } finally {
                // Always release the connection, even when publishing fails.
                if (topicConn != null) {
                    topicConn.close();
                }
            }
        }

        // Placeholder: the body of this method is not part of the listing.
        // Set any custom JMS properties on the message here.
        private void setMessageProperties(ObjectMessage om) throws JMSException {
        }

        private InitialContext getInitialContext() throws NamingException {
            Properties jboss = new Properties();
            jboss.put("java.naming.factory.initial", "org.jnp.interfaces.NamingContextFactory");
            jboss.put("java.naming.factory.url.pkgs", "org.jboss.naming:org.jnp.interfaces");
            // "server" holds the comma-separated host:port list of the cluster nodes.
            jboss.put("java.naming.provider.url", server);
            return new InitialContext(jboss);
        }
    }

And the web.xml:

    <web-app>
        <display-name>JMSWeb</display-name>
        <welcome-file-list>
            <welcome-file>index.jsp</welcome-file>
        </welcome-file-list>
        <servlet>
            <display-name>JMSClusteredClient</display-name>
            <servlet-name>JMSClusteredClient</servlet-name>
            <servlet-class>com.blogspot.felipeg48.jms.web.JMSClusteredClient</servlet-class>
        </servlet>
        <servlet-mapping>
            <servlet-name>JMSClusteredClient</servlet-name>
            <url-pattern>/JMSClusteredClient</url-pattern>
        </servlet-mapping>
    </web-app>

Happy Clustering!!

May 26, 2010

by Felipe Gutierrez
