Data Resources

The Latest Data Topics

Deterring “ToMany” Relationships in JPA models

This article considers the issues of one to many relationships from the JPA model, and looks at an alternative strategy to provide more efficient and fine grained data access, to build more robust and lightweight applications and web services. A fairly typical use is to have one entity ‘owned’ by the other in such a way that one entity is said to ‘have’ many instances of the other one. A typical example would be customer and orders : class Customer { @OneToMany(mappedBy="customer") private Set orders; } class Order { @ManyToOne private Customer customer; } In this trivial example, the order belongs to a customer, and the customer has a set of orders. We don’t have a problem with the ManyToOne relationship, especially as it is required in order to map the order back to the customer. When we load an order we will at most get a single reference to a customer. No, our problem is with the value we get from customer.getOrders() as this set of order entities doesn’t really serve any useful purpose and can cause more problems than it solves for the following reasons : Dumb Relationship – It will contain every order for this particular customer when you usually only want a subset of the orders that match a set of criteria. You either have to read them all and filter the ones you don’t want manually (which is what SQL is for) or you end up having to make a call to a method to get the specific entities you are interested in. Unbounded dataset – How many orders a customer has could vary and you could end up with a customer with thousands of orders. Combined with accidental eager fetching and loading a simple list of 10 people could mean loading thousands of entities. Unsecured Access – Sometimes we may want to restrict the items visible to the user based on their security rights. By making it available as a property controlled by JPA we lose that ability or have to implement it further down in the application stack. No Pagination – Similar to the unbounded dataset, you end up throwing the whole list into the pagination components and letting them sort out what to display. In most cases, you need to treat each dataset like it will eventually contain more than 30 records so you really need to consider pagination early. Overgrown object graph – When you request an entity, how much of the object graph do you need? How do you know which pieces to initialize so you can avoid LIEs? This is often the case with JPA, but is also more relevant when you take account of the needs to serialize object graphs to XML or JSON. Sometimes you might need the relationships and sometimes you do not depending on the context you will be using the data in. Rife with pitfalls – Who saves and cascades what and how do you bind one to the other? You create an order, and assign the customer, do you need to then add it to the customers list of orders or not. What happens if you forget to add it to the customer and you save the customer? Whatever strategy you pick for dealing with this will no doubt end up being implemented inconsistently. (Ok, the first four are really different facets of the same problem, that you can’t control the data you are getting back.) So what use are they? Well, they make it really tempting just to use customer.orders in the view which is suitable for some sets of data. They also allow the relationship to be used in ejbql statements, although the inverse of the relationship can also be used in most cases. Specifying this relationship can also allow you to cascade updates/deletes from the customer to the order, but then so can your database. Going Granular The best alternative I’ve found is to provide additional methods to obtain the relational information separate from the model. This more granular approach gives you plenty of ways of obtaining data from the database without the dangers and temptations of bad practices. For example, the Order object still has the Customer reference on it, which we use to obtain lists of orders from the data access layer which can be constrained by customer, time frame, or other criteria depending on where it is being used. Also, it allows data to be fetched when needed without having to define a single initialization strategy using annotations or mapping files. The code that knows what pieces of data it needs will have access to facilities to fetch the specific data it needs. Alternatively, the methods to fetch the data can either be exposed as web services directly or DTO objects can be used to build a data payload to be returned from a single web service that consolidates the calls. Regardless, you don’t need to worry about setting the JPA fetch or XML/JSON serialization policy permanently in the model. Some examples might be to fetch orders for a customer in different ways. public List getOrders(Long customerId) {...} public List getOrders(Long customerId,Date startDate,Date endDate) {...} public List getOrders(SearchCriteria searchCriteria,int firstResult,int pageSize) {...} What about @ManyToMany Good question. In most cases I find that what starts as a many to many relationship can usually be modeled as a separate entity because when you create a many to many relationship, there is usually additional information stored with that relationship. For example, a Users and Groups ManyToMany relationship has many users belonging to many groups and vice versa. The membership however also probably has start and end dates and also maybe a role within that group. This also exhibits one of the earlier problems in that user.getGroupMemberships() would return all group memberships past and present whereas you probably only want the active ones. Modeling it as a separate entity means it becomes an entity with two OneToMany relationships. While there are cases where the many to many relationship is literally just a pair of ids (think blog post tags, many tags to many posts), you could benefit at a later date by using an entity if you decide to add additional information into the relationship. In summary, moving relational fetches out of the data model and into the data layer means you remove some of the temptations of bad practices and create a library of reusable functions for fetching the data that can be used from different code points. From http://www.andygibson.net/blog/article/deterring-tomany-relationships-in-jpa-models/

January 18, 2011

by Andy Gibson

· 23,333 Views · 1 Like

EMF Tips: Accessing Model Meta Data, Serializing into Element/Attribute

Two tips for the Eclipse Modeling Framework (EMF) 2.2.1: Accessing model’s meta model – accessing EClass/attribute by name – so that you can set an attribute when you only know its name and haven’t its EAttribute How to force EMF to serialize an object as an XML element instead of an XML attribute Excuse my rather liberal and slightly confusing use of the terms model, model instance and meta model. Tip 1: Accessing model’s meta model – accessing EClass/attribute by name Normally you cannot do any operation on a dynamic EMF model instance – such as instantiating an EObject or settings its property via eSet – without the corresponding meta model objects such as EClass and EAttribute. But there is a solution – you can build an ExtendedMetaData instance and use its methods to find the meta model objects based on search criteria such as element name and namespace. Examples Create ExtendedMetaData from a model One way to build a meta data instance is to instantiate BasicExtendedMetaData based on a Registry, containing all registered packages. This is usually available either via ResourceSet.getPackageRegistry() or globally, via EPackage.Registry.INSTANCE. import org.eclipse.emf.ecore.util.*; ... ExtendedMetaData modelMetaData = new BasicExtendedMetaData(myResourceSet.getPackageRegistry()); Get EClass for a namespace and name Provided that your model contains an element with the name Book and namespace http://example.com/book: EClass bookEClass = (EClass) modelMetaData.getType("http://example.com/book", "Book"); Get EClass’ attribute by name Beware: Some properties (such as those described by named complex types) are not represented by an EAttribute but an EReference (both extend EStructuralFeature) and are accessible as the EClass’ elements, not attributes, even though from developer’s points of view they’re attributes of the owning class. Let’s suppose that the book has the attribute name: EStructuralFeature nameAttr = modelMetaData.getElement(bookEClass, null, "name"); The namespace is null because normally attributes/nested elements are not classified with a schema. Here is how you would print the name and namespace URI of an attribute/element: System.out.println("attr: " + modelMetaData.getNamespace(nameAttr) + ":" + nameAttr.getName()); //prints "null:name" Tip 2: How to force EMF to serialize an object as an XML element instead of an XML attribute Normally EMF stores simple Java properties as attributes of the XML element representing the owning class: but you might prefer to have it rather as a nested element: The Book of Songs To achieve that: Enable the save option OPTION_EXTENDED_META_DATA (so that extended meta data such as annotations and an XML map aren’t ignored) Tell EMF that you want this property to be stored as an element By attaching an eAnnotation to it (not shown) By supplying a XML Map with this information upon save To enable the extended metadata: Map saveOptions = new HashMap(); saveOptions.put(XMLResource.OPTION_EXTENDED_META_DATA, Boolean.TRUE); According to some documentation the value should be an implementation of ExtendedMetaData, according to others Boolean.TRUE is the correct choice – I use the latter for it’s easier and works for me. To tell EMF to write a property as an element when serailizing to XML: import org.eclipse.emf.ecore.xmi.impl.*; ... EAttribute bookNameEAttribute = ...; // retrieved e.g. from meta data, see Tip 1 XMLMapImpl map = new XMLMapImpl(); XMLInfoImpl x = new XMLInfoImpl(); x.setXMLRepresentation(XMLInfoImpl.ELEMENT); map.add(bookNameEAttribute, x); saveOptions.put(XMLResource.OPTION_XML_MAP, map); The XMLInfoImpl enables you to customize the namespace, name and representation of the element. When saving you then just supply the save options: EObject target = ...; org.eclipse.emf.ecore.resource.Resource outResource = ...; outResource.getContents().add(target); outResource.save(saveOptions); Reference: the Redbook Eclipse Development using the Graphical Editing Framework and the Eclipse Modeling Framework, page 74, section 2.3.4 From http://theholyjava.wordpress.com/2011/01/11/emf-tips-accessing-model-meta-data-serializing-into-elementattribute/

January 12, 2011

by Jakub Holý

· 14,566 Views

Getting MySQL work with Entity Framework 4.0

Does MySQL work with Entity Framework 4.0? The answer is: yes, it works! I just put up one experimental project to play with MySQL and Entity Framework 4.0 and in this posting I will show you how to get MySQL data to EF. Also I will give some suggestions how to deploy your applications to hosting and cloud environments. MySQL stuff As you may guess you need MySQL running somewhere. I have MySQL installed to my development machine so I can also develop stuff when I’m offline. The other thing you need is MySQL Connector for .NET Framework. Currently there is available development version of MySQL Connector/NET 6.3.5 that supports Visual Studio 2010. Before you start download MySQL and Connector/NET: MySQL Community Server Connector/NET 6.3.5 If you are not big fan of phpMyAdmin then you can try out free desktop client for MySQL – HeidiSQL. I am using it and I am really happy with this program. NB! If you just put up MySQL then create also database with couple of table there. To use all features of Entity Framework 4.0 I suggest you to use InnoDB or other engine that has support for foreign keys. Connecting MySQL to Entity Framework 4.0 Now create simple console project using Visual Studio 2010 and go through the following steps. 1. Add new ADO.NET Entity Data Model to your project. For model insert the name that is informative and that you are able later recognize. Now you can choose how you want to create your model. Select “Generate from database” and click OK. 2. Set up database connection Change data connection and select MySQL Database as data source. You may also need to set provider – there is only one choice. Select it if data provider combo shows empty value. Click OK and insert connection information you are asked about. Don’t forget to click test connection button to see if your connection data is okay. If everything works then click OK. 3. Insert context name Now you should see the following dialog. Insert your data model name for application configuration file and click OK. Click next button. 4. Select tables for model Now you can select tables and views your classes are based on. I have small database with events data. Uncheck the checkbox “Include foreign key columns in the model” – it is damn annoying to get them away from model later. Also insert informative and easy to remember name for your model. Click finish button. 5. Define your classes Now it’s time to define your classes. Here you can see what Entity Framework generated for you. Relations were detected automatically – that’s why we needed foreign keys. The names of classes and their members are not nice yet. After some modifications my class model looks like on the following diagram. Note that I removed attendees navigation property from person class. Now my classes look nice and they follow conventions I am using when naming classes and their members. NB! Don’t forget to see properties of classes (properties windows) and modify their set names if set names contain numbers (I changed set name for Entity from Entity1 to Entities). 6. Let’s test! Now let’s write simple testing program to see if MySQL data runs through Entity Framework 4.0 as expected. My program looks for events where I attended. using(var context = new MySqlEntities()) { var myEvents = from e in context.Events from a in e.Attendees where a.Person.FirstName == "Gunnar" && a.Person.LastName == "Peipman" select e; Console.WriteLine("My events: "); foreach(var e in myEvents) { Console.WriteLine(e.Title); } } Console.ReadKey(); And when I run it I get the result shown on screenshot on right. I checked out from database and these results are correct. At first run connector seems to work slow but this is only the effect of first run. As connector is loaded to memory by Entity Framework it works fast from this point on. Now let’s see what we have to do to get our program work in hosting and cloud environments where MySQL connector is not installed. Deploying application to hosting and cloud environments If your hosting or cloud environment has no MySQL connector installed you have to provide MySQL connector assemblies with your project. Add the following assemblies to your project’s bin folder and include them to your project (otherwise they are not packaged by WebDeploy and Azure tools): MySQL.Data MySQL.Data.Entity MySQL.Web You can also add references to these assemblies and mark references as local so these assemblies are copied to binary folder of your application. If you have references to these assemblies then you don’t have to include them to your project from bin folder. Also add the following block to your application configuration file. ... ... Conclusion It was not hard to get MySQL connector installed and MySQL connected to Entity Framework 4.0. To use full power of Entity Framework we used InnoDB engine because it supports foreign keys. It was also easy to query our model. To get our project online we needed some easy modifications to our project and configuration files.

December 10, 2010

by Gunnar Peipman

· 24,565 Views

Using Sphinx and Java to Implement Free Text Search

As promised I am going to provide an article on how we can use Sphinx with Java to perform a full text search. I will begin the article with an introduction to Sphinx. Introduction to Sphinx Databases are continually growing and sometimes tend to hold about 100M records and need an external solution for full text search to be performed. I have picked Sphinx, an open source full-text search engine, distributed under GPL version 2 to perform a full text search on such a huge amount of data. Generally, it's a standalone search engine meant to provide fast, size-efficient and relevant full-text search functions to other applications very much compatible with an SQL Database. So my example will be based on the MySQL database, as we cannot produce millions of data to evaluate the real power of Sphinx, we will have a small amount of data and I think that should not be a problem. Here are few Sphinx Unique Features: high indexing speed (up to 10 MB/sec on modern CPUs) high search speed (avg query is under 0.1 sec on 2-4 GB text collections) high scalability (up to 100 GB of text, upto 100 M documents on a single CPU) provides distributed searching capabilities provides searching from within MySQL through pluggable storage engine supports boolean, phrase, and word proximity queries supports multiple full-text fields per document (upto 32 by default) supports multiple additional attributes per document (ie. groups, timestamps, etc) supports MySQL natively (MyISAM and InnoDB tables are both supported) The important features which have been adopted to perform a full text search are the provision of the Java API to integrate easily with the web application and considerably high indexing and searching speed with an average of 4-10 MB/sec & 20-30 ms/q @5GB,3.5M docs(wikipedia) Sphinx Terms & How It Works The fist principle part of sphinx is indexer. It is solely responsible for gathering the data that will be searchable. From the Sphinx point of view, the data it indexes is a set of structured documents, each of which has the same set of fields. This is biased towards SQL, where each row corresponds to a document, and each column to a field. Sphinx builds a special data structure optimized for our queries from the data provided. This structure is called index; and the process of building index from data is called indexing and the element of sphinx which carries out these tasks is called indexer. Indexer can be executed either from a regular script or command-line interface. Sphinx documents are equal to records in DB. Document is set of text fields and number attributes + unique ID – similar to row in DB Set of fields and attributes is constant for index – similar to table in DB Fields are searchable for FullText queries Attributes may be used for filtering, sorting, grouping searchd is the second principle tools as part of Sphinx. It is the part of the system which actually handles searches; it functions as a server and is responsible for receiving queries, processing them and returning a dataset back to the different APIs for client applications. Unlike indexer, searchd is not designed to be run either from a regular script or command-line calling, but instead either as a daemon to be called from init.d (on Unix/Linux type systems) or to be called as a service (on Windows-type systems). I am going to focus on Windows environment so later I will show you how we can install sphinx on windows as a service. Finally search is one of the helper tools within the Sphinx package. Whereas searchd is responsible for searches in a server-type environment, search is aimed at testing the index quickly without building a framework to make the connection to the server and process its response. This will only be used for testing sphinx from command – line and with respect to application’s requirement; searchd service will be used to query the MySql Server with a pre created index. Installation on Windows So now we come to the part of installing Sphinx on Windows: Download Sphinx from the official Sphinx download site i.e http://sphinxsearch.com (I downloaded Win32 release binaries with MySQL support: sphinx-0.9.9-win32.zip) Unzip the file to some folder, I unzipped to C:\devel\sphinx-0.9.9-win32 and added the bin directory to the windows path variable Well Sphinx is installed. Nice, simple, easy. Later I will tell how to set up indexes and search. Sample Application Till now I guess the whole motto of this article is clear to you, let's move ahead to define our sample application. We all use the Address Book to search for people by using their name or e-mail address when we want to immediately address an e-mail message to a specific person, people, or distribution list. We also search for people by using other basic information, such as e-mail alias, office location, and telephone number etc. I think most of the people on this planet are quire familiar with this kind of search, so let's make outlook address book as our sample database schema. Most of the fields are mapped from microsoft outlook, the only additional column is date of joining so that we can filter our queries based on joining dates of the employees. The example that I am going to put forth will use Sphinx to search for a particluar address entry using free text search, meaning the user is free to type in anything, here is our search screen, the DOJ (date of joining) search parameter is optional. The screen is self explanatory, let's move ahead and define our database. As Sphinx works well with MySQL and MySQL being free also, lets create our db scripts around mysql database (Those who wish to install MySQL can dowload it from http://www.mysql.com) Let's create our sample database 'addressbook' mysql> create database addressbook; Query OK, 1 row affected (0.03 sec) mysql> use addressbook; Database changed Note: The fields defined in the following tables are for the purpose of learning only and may not contain a complete set of fields that microsoft address book or any similar software may provide. mysql> CREATE TABLE addressbook ( Id int(11) NOT NULL, FirstName varchar(30) NOT NULL, LastName varchar(30) NOT NULL, OfficeId int(11) DEFAULT NULL, Title varchar(20) DEFAULT NULL, Alias varchar(20) NOT NULL, Email varchar(50) NOT NULL, DOJ date NOT NULL, PhoneNo varchar(20) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8; mysql> CREATE TABLE CompanyLocations ( Id int(11) NOT NULL, Location varchar(60) NOT NULL, Country varchar(20) NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8; It's time to put some dummy data into the table, so let's fill our tables. Our virtual company 'gogs.it' has six offices across India and Singapore as defined in the following insert script. mysql> insert into CompanyLocations (Id, Location, Country) VALUES (1, 'Tower One, Harbour Front, Singapore', 'SG'); insert into CompanyLocations (Id, Location, Country) VALUES (2, 'DLF Phase 3, Gurgaon, India', 'IN'); insert into CompanyLocations (Id, Location, Country) VALUES (3, 'Hiranandani Gardens, Powai, Mumbai, India', 'IN'); insert into CompanyLocations (Id, Location, Country) VALUES (4, 'Hinjwadi, Pune, India', 'IN'); insert into CompanyLocations (Id, Location, Country) VALUES (5, 'Toll Post, Nagrota, Jammu, India', 'IN'); insert into CompanyLocations (Id, Location, Country) VALUES (6, 'Bani (Kathua), India', 'IN'); Now comes the real stuff... The data sphinx is going to index, let's populate that as well...wooooo mysql> INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (1,'Aabheer','Kumar',1,'Mr','u534','[email protected]','2008-9-3', '+911234599990'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (2,'Aadarsh','Gupta',6,'Mr','u668','[email protected]','2007-2-23','+911234599991'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (3,'Aachman','Singh',5,'Mr','u2766','[email protected]','2006-12-18','+911234599992'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (4,'Aadesh','Shrivastav',5,'Mr','u3198','[email protected]','2007-11-23','+911234599993'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (5,'Aadi','manav',1,'Mr','u2686','[email protected]','2010-7-20','+911234599994'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (6,'Aadidev','singh',4,'Mr','u572','[email protected]','2010-8-18','+911234599995'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (7,'Aafreen','sheikh',4,'Smt','u1092','[email protected]','2007-7-11','+911234599996'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (8,'Aakar','Sherpa',5,'Mr','u1420','[email protected]','2009-10-3','+911234599997'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (9,'Aakash','Singh',4,'Mrs','u2884','[email protected]','2008-6-11','+911234599998'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (10,'Aalap','Singhania',4,'Mrs','u609','[email protected]','2010-10-8','+911234599999'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (11,'Aandaleeb','mahajan',1,'Smt','u131','[email protected]','2010-10-21','+911234580001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (12,'Mamata','kumari',5,'Sh','u2519','[email protected]','2009-6-12','+911234580002'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (13,'Mamta','sharma',6,'Smt','u4123','[email protected]','2009-2-8','+911234580003'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (14,'Manali','singh',6,'Mr','u1078','[email protected]','2008-6-14','+911234580004'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (15,'Manda','saxena',1,'Mrs','u196','[email protected]','2010-9-4','+911234580005'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (16,'Salila','shetty',3,'Miss','u157','[email protected]','2009-11-15','+911234580006'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (17,'Salima','happy',3,'Mrs','u3445','[email protected]','2006-7-14','+911234580007'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (18,'Salma','haik',5,'Sh','u4621','[email protected]','2008-6-23','+911234580008'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (19,'Samita','patil',3,'Smt','u3156','[email protected]','2006-6-7','+911234580009'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (20,'Sameena','sheikh',5,'Mrs','u952','[email protected]','2008-8-13','+911234580010'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (21,'Ranita','gupta',5,'Mrs','u2664','[email protected]','2008-10-20','+911234580011'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (22,'Ranjana','sharma',1,'Sh','u3085','[email protected]','2010-6-21','+911234580012'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (23,'Ranjini','singh',6,'Mrs','u4200','[email protected]','2007-4-13','+911234580013'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (24,'Ranjita','vyapari',2,'Smt','u1109','[email protected]','2008-1-22','+911234580014'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (25,'Rashi','gupta',6,'Mrs','u3492','[email protected]','2006-2-2','+911234580015'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (26,'Rashmi','sehgal',3,'Mr','u3248','[email protected]','2008-9-9','+911234580016'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (27,'Rashmika','sexy',1,'Mrs','u4599','[email protected]','2009-3-12','+911234580017'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (28,'Rasika','dulari',3,'Smt','u2089','[email protected]','2009-1-24','+911234580018'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (29,'Dilber','lover',6,'Mr','u4241','[email protected]','2007-10-11','+911234580019'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (30,'Dilshad','happy',1,'Mr','u1564','[email protected]','2007-4-8','+911234580020'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (31,'Dipali','lights',5,'Sh','u1127','[email protected]','2006-11-1','+911234580021'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (32,'Dipika','lamp',1,'Sh','u2271','[email protected]','2010-12-17','+911234580022'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (33,'Dipti','brightness',5,'Smt','u422','[email protected]','2010-9-25','+911234580023'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (34,'Disha','singh',3,'Sh','u4604','[email protected]','2006-5-2','+911234580024'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (35,'Maadhav','Krishna',1,'Miss','u2561','[email protected]','2007-11-6','+911234580025'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (36,'Maagh','month',5,'Miss','u874','[email protected]','2008-5-8','+911234580026'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (37,'Maahir','Skilled',4,'Mr','u3372','[email protected]','2007-8-4','+911234580027'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (38,'Maalolan','Ahobilam',5,'Mrs','u3498','[email protected]','2007-7-9','+911234580028'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (39,'Maandhata','King',1,'Smt','u2089','[email protected]','2009-9-3','+911234580029'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (40,'Maaran','Brave',2,'Miss','u4020','[email protected]','2008-4-5','+9112345606001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (41,'Maari','Rain',2,'Sh','u3593','[email protected]','2007-12-5','+9112345606002'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (42,'Madan','Cupid',4,'Mrs','u795','[email protected]','2007-11-11','+9112345606003'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (43,'Madangopal','Krishna',3,'Sh','u438','[email protected]','2007-2-19','+9112345606004'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (44,'sahil','gogna',1,'Sh','u2273','[email protected]','2007-10-7','+9112345606005'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (45,'nikhil','gogna',2,'Mr','u1240','[email protected]','2009-9-14','+9112345606006'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (46,'amit','gogna',5,'Sh','u3879','[email protected]','2006-2-8','+9112345606007'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (47,'krishan','gogna',4,'Miss','u3632','[email protected]','2010-9-20','+9112345606008'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (48,'anil','kashyap',4,'Smt','u3939','[email protected]','2010-3-15','+9112345606009'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (49,'sunil','kashyap',5,'Mrs','u3493','[email protected]','2008-3-16','+9112345606010'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (50,'sandy','singh',6,'Mrs','u4691','[email protected]','2009-6-2','+9112345606011'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (51,'vishal','kapoor',3,'Mr','u1087','[email protected]','2010-5-13','+9112345606012'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (52,'bala','ji',5,'Mrs','u4762','[email protected]','2007-8-9','+9112345606013'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (53,'karan','sarin',4,'Miss','u3030','[email protected]','2008-4-8','+9112345606014'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (54,'abhishek','kumar',4,'Miss','u1093','[email protected]','2008-12-21','+9112345605001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (55,'babu','the',1,'Miss','u1055','[email protected]','2008-7-2','+9112345506001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (56,'sandeep','gainda',3,'Miss','u1320','[email protected]','2010-5-14','+9112345606301'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (57,'dheeraj','kumar',3,'Miss','u3685','[email protected]','2007-10-14','+9112345606091'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (58,'dharmendra','chauhan',1,'Smt','u3235','[email protected]','2008-8-1','+9112345806001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (59,'max','alan',3,'Smt','u3465','[email protected]','2009-5-5','+9112345608011'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (60,'hidayat','khan',3,'Smt','u958','[email protected]','2007-11-18','+911234599101'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (61,'himnashu','singh',4,'Miss','u2027','[email protected]','2008-3-2','+911234599102'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (62,'dinesh','kumar',6,'Sh','u3233','[email protected]','2008-5-9','+911234599103'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (63,'toshi','prakash',1,'Mr','u3766','[email protected]','2010-9-17','+911234599104'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (64,'niti','puri',3,'Mr','u3575','[email protected]','2009-11-15','+911234599105'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (65,'pawan','tikki',3,'Sh','u3919','[email protected]','2006-3-19','+911234599106'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (66,'gaurav','sharma',2,'Sh','u413','[email protected]','2010-4-2','+911234599107'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (67,'himanshu','verma',2,'Mrs','u4732','[email protected]','2009-3-20','+911234599108'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (68,'priyanshu','verma',3,'Sh','u183','[email protected]','2010-8-12','+911234599109'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (69,'nitika','luthra',2,'Mrs','u4259','[email protected]','2010-7-12','+911234599110'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (70,'neeru','gogna',2,'Sh','u1633','[email protected]','2010-6-23','+91532110000'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (71,'bindu','gupta',1,'Sh','u1859','[email protected]','2006-11-10','+91532110001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (72,'gurleen','bakshi',5,'Miss','u1423','[email protected]','2007-7-1','+91532110003'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (73,'rahul','gupta',3,'Sh','u1223','[email protected]','2009-8-11','+91532110004'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (74,'jagdish','salgotra',3,'Mr','u12','[email protected]','2008-5-19','+91532110005'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (75,'vikas','sharma',3,'Smt','u465','[email protected]','2006-6-2','+91532110006'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (76,'poonam','mahendra',2,'Sh','u1744','[email protected]','2009-12-2','+91532110007'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (77,'pooja','kulkarni',3,'Mrs','u1903','[email protected]','2008-10-6','+91532110008'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (78,'priya','mahajan',6,'Sh','u4205','[email protected]','2010-8-5','+91532110009'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (79,'manoj','zerger',1,'Mrs','u3369','[email protected]','2009-12-4','+91532110010'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (80,'mohan','master',5,'Mr','u2841','[email protected]','2010-10-7','+91532110011'); Please note that above employee data is just a data *only data* I created using a small java programme using random number generators and reading some names file, so you may find titles getting messed up :( We next create a procedure that we will use from java to fetch records that we just inserted. DROP PROCEDURE IF EXISTS search_address_book; CREATE PROCEDURE search_address_book(IN address_ids VARCHAR(1000) ) BEGIN DECLARE search_address_query VARCHAR(2000) DEFAULT ''; SET address_ids = CONCAT('\'', REPLACE(address_ids, ',', '\',\''), '\''); SET search_address_query = CONCAT(search_address_query, ' select ab.Id as Id , ab.FirstName as FName, ab.LastName as LName, cl.Location as Location, ab.Title as Title, ab.Alias as Alias, ab.Email as Email, ab.DOJ as DOJ, ab.PhoneNo as PhoneNo ' ); SET search_address_query = CONCAT(search_address_query, ' from AddressBook ab left join CompanyLocations cl on ab.OfficeId=cl.Id '); SET search_address_query = CONCAT(search_address_query, ' where ab.id IN (', address_ids ,') '); SET @statement = search_address_query; PREPARE dynquery FROM @statement; EXECUTE dynquery; DEALLOCATE PREPARE dynquery; END; # To get records for ids 1, 6 and 7, we run following commands: call search_address_book('1,6,7'); Configuring Sphinx It turns out that it was not terribly difficult to setup sphinx, but I had a hard time finding instructions on the web, so I'll post my steps here. By default Sphinx looks for 'sphinx.co.in' configuration file to come with indexes and other stuff, lets create and define source and index for our sample application addressbook.conf (read between the lines) ############################################################################# ## data source definition ############################################################################# source addressBookSource { ## SQL settings for 'mysql' ## type = mysql # some straightforward parameters for SQL source types sql_host = localhost sql_user = root sql_pass = root sql_db = addressbook sql_port = 3306 # optional, default is 3306 # pre-query, executed before the main fetch query sql_query_pre = SET NAMES utf8 # main document fetch query, integer document ID field MUST be the first selected column sql_query = \ select ab.Id as Id , ab.FirstName as FName, ab.LastName as LName, cl.Location as Location, \ ab.Title as Title, ab.Alias as Alias, ab.Email as Email, UNIX_TIMESTAMP(ab.DOJ) as DOJ, ab.PhoneNo as PhoneNo \ from AddressBook ab left join CompanyLocations cl on ab.OfficeId=cl.Id sql_attr_timestamp = DOJ # document info query, ONLY for CLI search (ie. testing and debugging) , optional, default is empty must contain $id macro and must fetch the document by that id sql_query_info = SELECT * FROM AddressBook WHERE id=$id } ############################################################################# ## index definition ############################################################################# # local index example, this is an index which is stored locally in the filesystem index addressBookIndex { # document source(s) to index source = addressBookSource # index files path and file name, without extension, make sure you have this folder path = C:\devel\sphinx-0.9.9-win32\data\addressBookIndex # document attribute values (docinfo) storage mode docinfo = extern # memory locking for cached data (.spa and .spi), to prevent swapping mlock = 0 morphology = none # make sure this file exists exceptions =C:\devel\sphinx-0.9.9-win32\data\exceptions.txt enable_star = 1 } ############################################################################# ## indexer settings ############################################################################# indexer { # memory limit, in bytes, kiloytes (16384K) or megabytes (256M) # optional, default is 32M, max is 2047M, recommended is 256M to 1024M mem_limit = 32M # maximum IO calls per second (for I/O throttling) # optional, default is 0 (unlimited) # # max_iops = 40 # maximum IO call size, bytes (for I/O throttling) # optional, default is 0 (unlimited) # # max_iosize = 1048576 # maximum xmlpipe2 field length, bytes # optional, default is 2M # # max_xmlpipe2_field = 4M # write buffer size, bytes # several (currently up to 4) buffers will be allocated # write buffers are allocated in addition to mem_limit # optional, default is 1M # # write_buffer = 1M } ############################################################################# ## searchd settings ############################################################################# searchd { # hostname, port, or hostname:port, or /unix/socket/path to listen on listen = 9312 # log file, searchd run info is logged here # optional, default is 'searchd.log' log = C:\devel\sphinx-0.9.9-win32\data\log\searchd.log # query log file, all search queries are logged here # optional, default is empty (do not log queries) query_log = C:\devel\sphinx-0.9.9-win32\data\log\query.log # client read timeout, seconds # optional, default is 5 read_timeout = 5 # request timeout, seconds # optional, default is 5 minutes client_timeout = 300 # maximum amount of children to fork (concurrent searches to run) # optional, default is 0 (unlimited) max_children = 30 # PID file, searchd process ID file name # mandatory pid_file = C:\devel\sphinx-0.9.9-win32\data\log\searchd.pid # max amount of matches the daemon ever keeps in RAM, per-index # WARNING, THERE'S ALSO PER-QUERY LIMIT, SEE SetLimits() API CALL # default is 1000 (just like Google) max_matches = 1000 # seamless rotate, prevents rotate stalls if precaching huge datasets # optional, default is 1 seamless_rotate = 1 # whether to forcibly preopen all indexes on startup # optional, default is 0 (do not preopen) preopen_indexes = 0 } # --eof-- Once the configuration is done, its time to index our sql data, the command to use is 'indexer' as shown below. C:\devel\sphinx-0.9.9-win32\bin>indexer.exe --all --config C:\devel\sphinx-0.9.9-win32\addressbook.conf CONSOLE: Sphinx 0.9.9-release (r2117) Copyright (c) 2001-2009, Andrew Aksyonoff using config file 'C:\devel\sphinx-0.9.9-win32\addressbook.conf'... indexing index 'addressBookIndex'... collected 80 docs, 0.0 MB sorted 0.0 Mhits, 100.0% done total 80 docs, 5514 bytes total 0.057 sec, 96386 bytes/sec, 1398.43 docs/sec total 2 reads, 0.000 sec, 3.5 kb/call avg, 0.0 msec/call avg total 7 writes, 0.000 sec, 2.5 kb/call avg, 0.0 msec/call avg Note: As I told earlier that Sphinx creates 1 document for each row, as we had 80 rows in the database so a total of 80 docs are created. Time taken is also very very small, believe me I tried with half million rows and it took around 3-4 seconds :) cool isn't it? Once the index is up let's try to search few records, the utility command to perform search is 'search'. Ok Sphinx maharaj* please search for employee whose alias is u4732 C:\devel\sphinx-0.9.9-win32\bin>search.exe --config C:\devel\sphinx-0.9.9-win32\addressbook.conf u4732 CONSOLE: Sphinx 0.9.9-release (r2117) Copyright (c) 2001-2009, Andrew Aksyonoff using config file 'C:\devel\sphinx-0.9.9-win32\addressbook.conf'... index 'addressBookIndex': query 'u4732 ': returned 1 matches of 1 total in 0.001 sec displaying matches: 1. document=67, weight=1, doj=Fri Mar 20 00:00:00 2009 Id=67 FirstName=himanshu LastName=verma OfficeId=2 Title=Mrs Alias=u4732 [email protected] DOJ=2009-03-20 PhoneNo=+911234599108 words: 1. 'u4732': 1 documents, 1 hits words: 1. 'u4732': 1 documents, 1 hits As you can see above this is a unique record for Himanshu. Note: You see a lot of information for the result, this is because of following line in our configuration file sql_query_info = SELECT * FROM AddressBook WHERE id=$id If you want to see less columns you need to change the sql_query_info in configuration file. Let's try another search, sphinx maharaj* please tell me which all rows have gurleen or toshi in them. C:\devel\sphinx-0.9.9-win32\bin>search.exe --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --any toshi gurleen CONSOLE: displaying matches: 1. document=63, weight=2, doj=Fri Sep 17 00:00:00 2010 Id=63 FirstName=toshi LastName=prakash OfficeId=1 Title=Mr Alias=u3766 [email protected] DOJ=2010-09-17 PhoneNo=+911234599104 2. document=72, weight=2, doj=Sun Jul 01 00:00:00 2007 Id=72 FirstName=gurleen LastName=bakshi OfficeId=5 Title=Miss Alias=u1423 [email protected] DOJ=2007-07-01 PhoneNo=+91532110003 Exactly two records were returned and this is what we were expecting. The following special operators and modifiers can be used when using the extended matching mode: operator OR: nikhil | sahil operator NOT: hello -sandy hello !sandy field search operator: @Email [email protected] For a complete set of search features , I advise you to go through http://sphinxsearch.com/docs/manual-0.9.9.html#searching link. Sphinx as Windows Service Now our main aim is to use sphinx with JAVA API, so let's move towards that now, before java can utilize the true power of Sphinx, we need to start 'searchd' as a windows service so that our java programme can connect to sphinx search engine. Let's install Sphinx as a windows service so that our java program can use this daemon service to query the index that we just created, the command is : C:\devel\sphinx-0.9.9-win32\bin>searchd.exe --install --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --servicename --port 9312 SphinxSearch CONSOLE: Sphinx 0.9.9-release (r2117) Copyright (c) 2001-2009, Andrew Aksyonoff Installing service... Service 'SphinxSearch' installed succesfully. Well now the sphinx is ready to serve us on port 9312 Note: If you try to install Sphinx without admin rights, you may get following error messages. C:\devel\sphinx-0.9.9-win32\bin>searchd.exe --install --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --servicename --port 9312 SphinxSearch CONSOLE: Installing service... FATAL: OpenSCManager() failed: code=5, error=Access is denied. Once done you can start the service as: c:\>sc start SphinxSearch (or alternatively from the services screen, start 'services.msc' in windows Run) If some how you want to delete the service , use c:\>sc delete SphinxSearch Let's create an adapter to fetch data from the database. package it.gogs.sphinx.util; import it.gogs.sphinx.AddressBoook; import it.gogs.sphinx.exception.AddressBookBizException; import it.gogs.sphinx.exception.AddressBookTechnicalException; import java.sql.CallableStatement; import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.SQLException; import java.util.ArrayList; import java.util.List; import org.apache.log4j.Logger; /** * Adapter to fetch data from the database. * * @author Munish Gogna * */ public class AddressBookAdapter { private static Logger logger = Logger.getLogger(AddressBookAdapter.class); private AddressBookAdapter() { // use in static way.. } private static Connection getConnection() throws AddressBookTechnicalException { String userName = "root"; String password = "root"; String url = "jdbc:mysql://localhost/addressbook"; try { Class.forName("com.mysql.jdbc.Driver").newInstance(); return DriverManager.getConnection(url, userName, password); } catch (Exception e) { throw new AddressBookTechnicalException("could not get connection"); } } public static List getAddressBookList(List addressIds) throws AddressBookTechnicalException, AddressBookBizException { List addressBoookList = new ArrayList(); if (addressIds == null || addressIds.size() == 0){ logger.error("AddressIds was null or empty, returning empty list"); return addressBoookList; } Connection connection = null; CallableStatement callableStatement = null; try { connection = getConnection(); callableStatement = connection.prepareCall("{ call search_address_book(?)}"); callableStatement.setString(1, Utils.toCommaString(addressIds)); callableStatement.execute(); ResultSet resultSet = callableStatement.getResultSet(); prepareResults(resultSet, addressBoookList); connection.close(); } catch (SQLException e) { logger.error("Problem connecting MYSQL - " + e.getMessage()); throw new AddressBookTechnicalException(e.getMessage()); } catch (AddressBookTechnicalException e) { logger.error("Problem connecting MYSQL - " + e.getMessage()); throw e; } finally{ if(connection != null){ try { connection.close(); } catch (SQLException e) { logger.error("Problem closing conection - " + e.getMessage()); e.printStackTrace(); } } } return addressBoookList; } private static void prepareResults(ResultSet resultSet, List addressBoookList) throws SQLException { AddressBoook addressBoook; while (resultSet.next()) { addressBoook = new AddressBoook(); addressBoook.setAlias(resultSet.getString("Alias")); addressBoook.setEmail(resultSet.getString("Email")); addressBoook.setfName(resultSet.getString("FName")); addressBoook.setlName(resultSet.getString("LName")); addressBoook.setOfficeLocation(resultSet.getString("Location")); addressBoook.setPhoneNo(resultSet.getString("PhoneNo")); addressBoook.setTitle(resultSet.getString("Title")); addressBoook.setDateOfJoining(resultSet.getDate("DOJ")); addressBoook.setId(resultSet.getLong("Id")); addressBoookList.add(addressBoook); } } } Next we create the SphinxInstance that will parse the keywords and date range and provide us a list of Ids that matches the search. package it.gogs.sphinx.util; import it.gogs.sphinx.DateRange; import it.gogs.sphinx.SearchCriteria; import it.gogs.sphinx.api.SphinxClient; import it.gogs.sphinx.api.SphinxException; import it.gogs.sphinx.api.SphinxMatch; import it.gogs.sphinx.api.SphinxResult; import it.gogs.sphinx.exception.AddressBookBizException; import java.util.ArrayList; import java.util.Date; import java.util.List; import org.apache.log4j.Logger; /** * Instance that will parse our free text and provide the results. * * Note: Make sure that 'searchd' is up and running before you use this class * @author Munish Gogna * */ public class SphinxInstance { private static String SPHINX_HOST = "localhost"; private static String SPHINX_INDEX = "addressBookIndex"; private static int SPHINX_PORT = 9312; private static SphinxClient sphinxClient; private static Logger logger = Logger.getLogger(SphinxInstance.class); static { sphinxClient = new SphinxClient(SPHINX_HOST, SPHINX_PORT); } public static List getAddressBookIds(SearchCriteria criteria) throws AddressBookBizException, SphinxException { List addressIdsList = new ArrayList(); try { if (Utils.isNull(criteria)) { logger.error("criteria is null"); throw new AddressBookBizException("criteria is null"); } if (Utils.isNull(criteria.getKeywords())) { logger.error("keyword is a required field"); throw new AddressBookBizException("keyword is a required field"); } DateRange dateRange = criteria.getDateRage(); if (!Utils.isNull(dateRange)) { if (Utils.isDateRangeValid(dateRange)) { // this is to filter results based on joining dates if they are provided sphinxClient.SetFilterRange("DOJ", getTimeInSeconds(dateRange.getFromDate()), getTimeInSeconds(dateRange.getToDate()), false); } else { logger.error(" fromDate/toDate should not be empty and 'fromDate' should be less than equal to 'toDate'"); throw new AddressBookBizException("fromDate/toDate should not be empty and 'fromDate' should be less than equal to 'toDate'"); } } sphinxClient.SetMatchMode(SphinxClient.SPH_MATCH_EXTENDED2); sphinxClient.SetSortMode(SphinxClient.SPH_SORT_RELEVANCE, ""); SphinxResult result = sphinxClient.Query(buildSearchQuery(criteria), SPHINX_INDEX, "buidling query for address book search"); SphinxMatch[] matches = result.matches; for (SphinxMatch match : matches) { addressIdsList.add(String.valueOf(match.docId)); } } catch (SphinxException e) { throw e; } catch (AddressBookBizException e) { throw e; } logger.info("Total record(s):" + addressIdsList.size()); return addressIdsList; } private static long getTimeInSeconds(Date time) { return time.getTime()/1000; } private static String buildSearchQuery(SearchCriteria criteria) throws AddressBookBizException { String keywords[] = criteria.getKeywords().split(" "); StringBuilder searchFor = new StringBuilder(); for (String key : keywords) { if (!Utils.isEmpty(key)) { searchFor.append(key); if (searchFor.length() > 1) { searchFor.append("*|*"); } } } searchFor.delete(searchFor.lastIndexOf("|*"), searchFor.length()); StringBuilder queryBuilder = new StringBuilder(); String query = searchFor.toString(); queryBuilder.append("@FName *" + query + " | "); queryBuilder.append("@LName *" + query + " | "); queryBuilder.append("@Title *" + query + " | "); queryBuilder.append("@Location *"+ query + " | "); queryBuilder.append("@Alias *" + query + " | "); queryBuilder.append("@Email *" + query + " | "); queryBuilder.append("@PhoneNo *" + query); logger.info("Sphinx Query: " + queryBuilder.toString()); return queryBuilder.toString(); } } Here is the interface that I will expose to the outside world (in my future article I will expose this interface as Web Service) import it.gogs.sphinx.AddressBoook; import it.gogs.sphinx.SearchCriteria; import it.gogs.sphinx.api.SphinxException; import it.gogs.sphinx.exception.AddressBookBizException; import it.gogs.sphinx.exception.AddressBookTechnicalException; import java.util.List; /** * * @author Munish Gogna * */ public interface AddressBook { /** * Returns the list of AddressBook objects based on search criteria. * * @param criteria * @throws AddressBookTechnicalException * @throws AddressBookBizException * @throws SphinxException */ public List getAddressBookList(SearchCriteria criteria) throws AddressBookTechnicalException, AddressBookBizException, SphinxException; } and here is the implementation class for the same. package it.gogs.sphinx.addressbook.impl; import java.util.List; import it.gogs.sphinx.AddressBoook; import it.gogs.sphinx.SearchCriteria; import it.gogs.sphinx.addressbook.AddressBook; import it.gogs.sphinx.api.SphinxException; import it.gogs.sphinx.exception.AddressBookBizException; import it.gogs.sphinx.exception.AddressBookTechnicalException; import it.gogs.sphinx.util.AddressBookAdapter; import it.gogs.sphinx.util.SphinxInstance; /** * Implementation for our Address Book example * * @author Munish Gogna * */ public class AddressBookImpl implements AddressBook{ public List getAddressBookList(SearchCriteria criteria) throws AddressBookTechnicalException, AddressBookBizException, SphinxException { List addressIds= SphinxInstance.getAddressBookIds(criteria); return AddressBookAdapter.getAddressBookList(addressIds); } } ok so far so good, let's run some tests now ............ package it.gogs.sphinx.test; import java.util.Calendar; import java.util.GregorianCalendar; import java.util.List; import it.gogs.sphinx.AddressBoook; import it.gogs.sphinx.DateRange; import it.gogs.sphinx.SearchCriteria; import it.gogs.sphinx.addressbook.AddressBook; import it.gogs.sphinx.addressbook.impl.AddressBookImpl; import it.gogs.sphinx.api.SphinxException; import it.gogs.sphinx.exception.AddressBookBizException; import it.gogs.sphinx.exception.AddressBookTechnicalException; import junit.framework.TestCase; /** * * @author Munish Gogna * */ public class AddressBookTest extends TestCase { private AddressBook addressBook; @Override protected void setUp() throws Exception { super.setUp(); addressBook = new AddressBookImpl(); } @Override protected void tearDown() throws Exception { super.tearDown(); } /** this should be a unique record for Himanshu */ public void test_search_for_himanshu() throws Exception { SearchCriteria criteria = new SearchCriteria(); // remember the first 'search' example?? criteria.setKeywords("u4732"); List addressList = addressBook.getAddressBookList(criteria); assertTrue(addressList.size() == 1); assertTrue("expecting himanshu here", "himanshu".equals(addressList.get(0).getfName())); } /** only two employees have name gurleen or toshi */ public void test_search_for_gurleen_or_toshi() throws Exception { SearchCriteria criteria = new SearchCriteria(); // remember the second 'search' example?? criteria.setKeywords("gurleen toshi"); List addressList = addressBook.getAddressBookList(criteria); assertTrue(addressList.size() == 2); assertTrue("expecting toshi here", "toshi".equals(addressList.get(0).getfName())); assertTrue("expecting gurleen here", "gurleen".equals(addressList.get(1).getfName())); } /** there are 16 people from jammu location */ public void test_search_for_people_from_jammu_location() throws Exception { SearchCriteria criteria = new SearchCriteria(); criteria.setKeywords("jammu"); List addressList = addressBook.getAddressBookList(criteria); assertTrue(addressList.size() == 16); } /** only Aalap, Manda and nitika are having title as Mrs and joined in 2010 */ public void test_joined_in_2010_with_title_Mrs() throws Exception { DateRange dateRange = new DateRange(); GregorianCalendar calendar1 = new GregorianCalendar(); calendar1.set(Calendar.YEAR, 2010); calendar1.set(Calendar.MONTH, Calendar.JANUARY); calendar1.set(Calendar.DAY_OF_MONTH, 1); dateRange.setFromDate(calendar1.getTime()); GregorianCalendar calendar2 = new GregorianCalendar(); calendar2.set(Calendar.YEAR, 2010); calendar2.set(Calendar.MONTH, Calendar.DECEMBER); calendar2.set(Calendar.DAY_OF_MONTH, 31); dateRange.setToDate(calendar2.getTime()); SearchCriteria criteria = new SearchCriteria(); criteria.setKeywords("Mrs"); criteria.setDateRage(dateRange); List addressList = addressBook.getAddressBookList(criteria); assertTrue("expecting 3 records here", addressList.size() == 3); } /** should get a business exception here */ public void test_without_specifying_keywords(){ SearchCriteria criteria = new SearchCriteria(); //criteria.setKeywords("Mrs"); try { addressBook.getAddressBookList(criteria); } catch (Exception e) { assertTrue(e instanceof AddressBookBizException); assertTrue(e.getMessage().indexOf("keyword is a required field") >-1); } } } How we update the Index once database changes? For these kinds of requirements, we can set up two sources and two indexes, with one "main" index for the data which only changes rarely (if ever), and one "delta" for the new documents. First Time data will go in the "main" index and the newly inserted address book entries will go into "delta". Delta index could then be reindexed very frequently, and the documents can be made available to search in a matter of minutes. Also one thing to take from this article is once 'searchd' daemon is running we can't index the data in normal way,we have to use --rotate option in such cases. For some applications where there is a timely batch update for the data, we can configure some cron job to reindex our documents in Sphinx as shown below. C:\devel\sphinx-0.9.9-win32\bin>indexer.exe --all --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --rotate Capsule We asked Sphinx to provide us the Document Ids corresponding to our search parameters and then we used those Ids to fire database query. In case the data we want to return is included in Index (DOJ attribute for example in our case) we can skip the database portion, so choose wisely how much information (attributes) you want to include while you index your sql data. Well that's all ... it's time to say good bye. Take good care of your health and don't forget to vote, its a must :) - Munish Gogna

December 7, 2010

by Munish Gogna

· 38,149 Views · 2 Likes

Setting mouse cursor position with WinAPI

Setting the mouse cursor position on a Windows machine with the help of .NET Framework shouldn't be that big of a problem. After all, there is the built-in Cursor class that lets you do that by executing a simple line of code: Cursor.Position = new System.Drawing.Point(0, 0); Of course, here 0 and 0 are the absolute coordinates for the mouse cursor on the screen. One thing to mention about this type of position setting is that Cursor requires a reference to System.Windows.Forms. And in some cases you don't want this extra reference. If that's the case, WinAPI is your solution. It requires some more work compared to the regular .NET way (class instance -> method call) but at the end you get more control than you would expect. When using WinAPI to set the cursor position, there are two ways you can go: mouse_event SendInput mouse_event is the very basic function that is only able to set the mouse coordinates. It was superseded and Microsoft recomends using SendInput instead. Nonetheless, it still works (although I cannot say for sure whether it will be working in future releases of Windows). So to start, I have a very basic class: class WINAPI_SUPERSEDED { [DllImport("user32.dll",SetLastError=true)] public static extern void mouse_event(uint dwFlags, uint dx, uint dy, uint dwData, int dwExtraInfo); public enum MouseFlags { MOUSEEVENTF_ABSOLUTE = 0x8000, MOUSEEVENTF_LEFTDOWN = 0x0002, MOUSEEVENTF_LEFTUP = 0x0004, MOUSEEVENTF_MIDDLEDOWN = 0x0020, MOUSEEVENTF_MIDDLEUP = 0x0040, MOUSEEVENTF_MOVE = 0x0001, MOUSEEVENTF_RIGHTDOWN = 0x0008, MOUSEEVENTF_RIGHTUP = 0x0010, MOUSEEVENTF_WHEEL = 0x0800, MOUSEEVENTF_XDOWN = 0x0080, MOUSEEVENTF_XUP = 0x0100 } public enum DataFlags { XBUTTON1 = 0x0001, XBUTTON2 = 0x0002 } } So to set the cursor position to 0,0 I would use this: WINAPI_SUPERSEDED.mouse_event((int)WINAPI_SUPERSEDED.MouseFlags.MOUSEEVENTF_MOVE | (int)WINAPI_SUPERSEDED.MouseFlags.MOUSEEVENTF_ABSOLUTE, 0, 0, 0, 0); If you go through the documentation, you will notice that in fact, dx and dy are in no way direct coordinates but rather normalized values ranged between 0 and 65,535 - that is only when the MOUSEEVENTF_ABSOLUTE flag is present. Otherwise, the position will be adjusted according to the current mouse cursor position. This method doesn't return any value so I cannot be informed whether it was successful or not. GetLastError won't give much information either. In this case, SendInput comes to the rescue. Here is what I have defined as the main class: class WINAPI { [DllImport("kernel32.dll")] public static extern uint GetLastError(); public enum MouseData { XBUTTON1 = 0x0001, XBUTTON2 = 0x0002 } public enum MouseFlags { MOUSEEVENTF_ABSOLUTE = 0x8000, MOUSEEVENTF_HWHEEL = 0x01000, MOUSEEVENTF_MOVE = 0x0001, MOUSEEVENTF_MOVE_NOCOALESCE = 0x2000, MOUSEEVENTF_LEFTDOWN = 0x0002, MOUSEEVENTF_LEFTUP = 0x0004, MOUSEEVENTF_RIGHTDOWN = 0x0008, MOUSEEVENTF_RIGHTUP = 0x0010, MOUSEEVENTF_MIDDLEDOWN = 0x0020, MOUSEEVENTF_MIDDLEUP = 0x0040, MOUSEEVENTF_VIRTUALDESK = 0x4000, MOUSEEVENTF_WHEEL = 0x0800, MOUSEEVENTF_XDOWN = 0x0080, MOUSEEVENTF_XUP = 0x0100 } [DllImport("user32.dll", SetLastError=true)] public static extern uint SendInput(uint nInputs, ref INPUT pInputs, int cbSize); [StructLayout (LayoutKind.Explicit)] public struct INPUT { [FieldOffset(0)] public int type; [FieldOffset(4)] public MOUSEINPUT mi; } public struct MOUSEINPUT { public int dx; public int dy; public int mouseData; public int dwFlags; public int time; public int extraInfo; } } This class is a bit more complicated, but at the same time you have to understand that in some cases, SendInput is used for hardware and keyboard input as well. Of course, for experimentation purposes I removed those parts in the sample class. Here you have the same MouseFlags enum that will let you pass custom flags defining the mouse behavior. Notice, that SendInput has SetLastError set to true, therefore if something wrong happens via this method, the error will be easily obtained via GetLastError, that is implemented as a helper method in the same class. VERY IMPORTANT: When you define the INPUT struct, make sure you use LayoutKind.Explicit since when passed to an unamanaged call, a specific field layout is required - as you can see, every field is decorated with a FieldOffset attribute. Also taking about StructLayout, you don't have to set StructLayout.Sequential to MOUSEINPUT since it is setautomatically by CLR. When I want to call the method above, I can simply use this snippet: WINAPI.MOUSEINPUT mouseInput = new WINAPI.MOUSEINPUT(); mouseInput.dx = 100; mouseInput.dy = 10; mouseInput.dwFlags = (int)WINAPI.MouseFlags.MOUSEEVENTF_ABSOLUTE | (int)WINAPI.MouseFlags.MOUSEEVENTF_MOVE; WINAPI.INPUT input = new WINAPI.INPUT(); input.type = 0; input.mi = mouseInput; uint x = WINAPI.SendInput(1, ref input, Marshal.SizeOf(input)); Console.WriteLine(x); Console.WriteLine(WINAPI.GetLastError()); Console.ReadLine(); Notice that I have to call the unmanaged version of sizeof and pass the INPUT struct to it in order for the method to correctly execute. The regular C# sizeof won't cut it here. When I am defining the type of input, 0 is repesenting the INPUT_MOUSE flag, since I am only handling the mouse here. Of course, I can re-organize my method to accept a set of INPUT instances - the native call itself allows this by requesting an array of INPUT and the correct indication of the number of INPUT instances passed, but that is not required for testing purposes.

November 28, 2010

by Denzel D.

· 17,387 Views

Map Reduce and Stream Processing

Hadoop Map/Reduce model is very good in processing large amount of data in parallel. It provides a general partitioning mechanism (based on the key of the data) to distribute aggregation workload across different machines. Basically, map/reduce algorithm design is all about how to select the right key for the record at different stage of processing. However, "time dimension" has a very different characteristic compared to other dimensional attributes of data, especially when real-time data processing is concerned. It presents a different set of challenges to the batch oriented, Map/Reduce model. Real-time processing demands a very low latency of response, which means there isn't too much data accumulated at the "time" dimension for processing. Data collected from multiple sources may not have all arrived at the point of aggregation. In the standard model of Map/Reduce, the reduce phase cannot start until the map phase is completed. And all the intermediate data is persisted in the disk before download to the reducer. All these added to significant latency of the processing. Here is a more detail description of this high latency characteristic of Hadoop. Although Hadoop Map/Reduce is designed for batch-oriented work load, certain application, such as fraud detection, ad display, network monitoring requires real-time response for processing large amount of data, have started to looked at various way of tweaking Hadoop to fit in the more real-time processing environment. Here I try to look at some technique to perform low-latency parallel processing based on the Map/Reduce model. General stream processing model In this model, data are produced at various OLTP system, which update the transaction data store and also asynchronously send additional data for analytic processing. The analytic processing will write the output to a decision model, which will feed back information to the OLTP system for real-time decision making. Notice the "asynchronous nature" of the analytic processing which is decoupled from the OLTP system, this way the OLTP system won't be slow down waiting for the completion of the analytic processing. Nevetheless, we still need to perform the analytic processing ASAP, otherwise the decision model will not be very useful if it doesn't reflect the current picture of the world. What latency is tolerable is application specific. Micro-batch in Map/Reduce One approach is to cut the data into small batches based on time window (e.g. every hour) and submit the data collected in each batch to the Map Reduce job. Staging mechanism is needed such that the OLTP application can continue independent of the analytic processing. A job scheduler is used to regulate the producer and consumer so each of them can proceed independently. Continuous Map/Reduce Here lets imagine some possible modification of the Map/Reduce execution model to cater for real-time stream processing. I am not trying to worry about the backward compatibility of Hadoop which is the approach that Hadoop online prototype (HOP) is taking. Long running The first modification is to make the mapper and reducer long-running. Therefore, we cannot wait for the end of the map phase before starting the reduce phase as the map phase never ends. This implies the mapper push the data to the reducer once it complete its processing and let the reducer to sort the data. A downside of this approach is that it offers no opportunity to run the combine() function on the map side to reduce the bandwidth utilization. It also shift more workload to the reducer which now needs to do the sorting. Notice there is a tradeoff between latency and optimization. Optimization requires more data to be accumulated at the source (ie: the Mapper) so local consolidation (ie: combine) can be performed. Unfortunately, low latency requires the data to be sent ASAP so not much accumulation can be done. HOP suggest an adaptive flow control mechanism such that data is pushed out to reducer ASAP until the reducer is overloaded and push back (using some sort of flow control protocol). Then the mapper will buffer the processed message and perform combine() before it send to the reducer. This approach automatically shift back and forth the aggregation workload between the reducer and the mapper. Time Window: Slice and Range This is a "time slice" concept and a "time range" concept. "Slice" defines a time window where result is accumulated before the reduce processing is executed. This is also the minimum amount of data that the mapper should accumulate before sending to the reducer. "Range" defines the time window where results are aggregated. It can be a landmark window where it has a well-defined starting point, or a jumping window (consider a moving landmark scenario). It can also be a sliding window where is a fixed size window from the current time is aggregated. After receiving a specific time slice from every mapper, the reducer can start the aggregation processing and combine the result with the previous aggregation result. Slice can be dynamically adjusted based on the amount of data sent from the mapper. Incremental processing Notice that the reducer need to compute the aggregated slice value after receive all records of the same slice from all mappers. After that it calls the user-defined merge() function to merge the slice value with the range value. In case the range need to be refreshed (e.g. reaching a jumping window boundary), the init() functin will be called to get a refreshed range value. If the range value need to be updated (when certain slice value falls outside a sliding range), the unmerge() function will be invoked. Here is an example of how we keep tracked of the average hit rate (ie: total hits per hour) within a 24 hour sliding window with update happens per hour (ie: an one-hour slice). # Call at each hit record map(k1, hitRecord) { site = hitRecord.site # lookup the slice of the particular key slice = lookupSlice(site) if (slice.time - now > 60.minutes) { # Notify reducer whole slice of site is sent advance(site, slice) slice = lookupSlice(site) } emitIntermediate(site, slice, 1) } combine(site, slice, countList) { hitCount = 0 for count in countList { hitCount += count } # Send the message to the downstream node emitIntermediate(site, slice, hitCount) } # Called when reducer receive full slice from all mappers reduce(site, slice, countList) { hitCount = 0 for count in countList { hitCount += count } sv = SliceValue.new sv.hitCount = hitCount return sv } # Called at each jumping window boundary init(slice) { rangeValue = RangeValue.new rangeValue.hitCount = 0 return rangeValue } # Called after each reduce() merge(rangeValue, slice, sliceValue) { rangeValue.hitCount += sliceValue.hitCount } # Called when a slice fall out the sliding window unmerge(rangeValue, slice, sliceValue) { rangeValue.hitCount -= sliceValue.hitCount }

November 23, 2010

by Ricky Ho

· 17,279 Views · 1 Like

Implementing Retries with a MDB or an MQ Batch Job? (WAS 7, MQ 6)

Both approaches have some advantages and disadvantages and so it’s a question of the likelihood of particular problems and business requirements and priorities.

November 10, 2010

by Jakub Holý

· 27,306 Views

Struts 2 : Creating and Accessing Maps

Today, in this post, I am going to discuss how to create and access HashMaps in Struts 2. My environment has the following jar files. struts2-core-2.1.8.1.jar ognl-2.7.3.jar Struts 2 makes extensive use of OGNL in order to retrieve the values of elements. OGNL stands for Object Graph Navigation Language. As the name suggests, OGNL is used to navigate an object graph. In this post, i am going to use the OGNL syntax to create Map on a jsp page, and show you how to iterate over it to fetch the keys and values from the map. In the following example, i will create a map, on they fly in the iterator tag, and will use it in the body of the iterator tag. Note the syntax that has been used to create a Map on a jsp page in struts 2 using OGNL. Once the map is created, the iterator tag can be used to iterate over each element of the map. Now suppose the map that we want to access is in a map inside the HttpRequest. Assume that some action somewhere in the chain has kept a map in the request using the key "myMap". In order to iterate over the elements of the map, we can do the following Okay now, enough with iterating over maps. I don't think there are any more permutations for accessing maps. But if i do find more, ill document them down here. Consider a case where you dont want to iterate over the entire map. Instead, all that you want to do is to extract a value form the map based upon a key that is already known to you in your jsp page. Assume that you have a variable in page scope called "runtimeKey" that you set using the s:set tag. The value of this variable is a string key that can be used to get a value from a map. Here is how you can fetch the value from the map without iterating over it. As you see, since the variable "runtimeKey" is an OGNL variable and is available on the value stack, it can be referenced using the # notations. Also not that instead of using the dot notation, I have used square brackets to fetch the key. This is because the value of my expression "#runtimeKey will only be evaluated when its inside the brackets. Also note that the value of the key runtimeKey is contained within single quotes to direct OGNL to evaluate it as a string when setting as the value for my key. Consider another situation where your keys follow a pattern. For example key_1, key_2. And you have the values 1 and 2 as page scoped variables. So, now instead of having the string value of the key directly, you may have to construct the key using concatenation. Your key pattern was set above. And you Map has a key called key_1. So, here is how you would have to concatenate your strings in order to construct the key and fetch the value. Huh. So easy! Thats all for now folks.. Stay tuned for more! Happy Programming :) From http://mycodefixes.blogspot.com/2010/11/struts-2-creating-and-accessing-maps.html

November 8, 2010

by Ryan Sukale

· 32,555 Views

MyBatis (formerly iBatis) – Examples and Hints using SELECT, INSERT and UPDATE Annotations

MyBatis is a lightweight persistence framework for Java and .NET. This blog entry addresses the Java side. MyBatis is an alternative positioned somewhere between plain JDBC and ORM frameworks (e.g. EclipseLink or Hibernate). MyBatis usually uses XML, but it also supports annotations since version 3. The documentation is very detailed for XML, but lacks annotation examples. Just the Annotations itself are described, but no examples how to use them. I could not find any good and easy examples anywhere, so I will describe some very basic examples for SELECT, INSERT and UPDATE statements by implementing a Data Access Object (DAO) using MyBatis. These examples are a good starting point to create more complex MyBatis queries using a DAO. You can find the full source code at the end of this blog. A simple SQL-Table I use a very simple table with two attributes. The name of the table is “simple_information”. The primary key is a Integer and may not be null (info_id). The only real data is a character and may also not be null (info_content). That is enough “complexity” to learn the usage of MyBatis annotations. The Java class “SimpleInformationEntity” is a POJO which contains these two attributes. @SELECT-Statement The @Select annotation is very easy to use, if you want to use exactly one paramter. If you need more than one paramter, use the @Param annotation (which is described below at the update example). You do not have to map the found information to a SimpleInformationEntity object, as you would have to do with a JDBC ResultSet. The magic of the framework does this for you. final String GET_INFO = “SELECT * FROM simple_information WHERE info_id = #{info_id}”; @Select(GET_INFO) public SimpleInformationEntity getSimpleInformationById(int info_id) throws Exception; @INSERT-Statement You can use the object (which you want to be persist) as parameter. You do not have to use several parameters for each attribute of the object. The magic of the framework does this for you. final String PERSIST_INFO = “INSERT INTO simple_information(info_id, info_content) VALUES (#{infoId}, #{infoContent})”; @Insert(PERSIST_INFO) public int persistInformation(SimpleInformationEntity simpleInfo) throws Exception; @UPDATE-Statement You cannot use more than one parameter within a method. If you want to understand why, look at some MyBatis XML examples in the documenation: There you use the attribute “parameterType”, which must be exactly one “parameter”! So you will get a (strange) exception, if you use two or more parameters. Instead you have to use the @Param annotation, if you need more than one parameter. final String UPDATE_INFO = “UPDATE simple_information SET info_content = #{newInfo} WHERE info_id = #{infoId}”; @Update(UPDATE_INFO) public int updateInformation(@Param(“infoId”) int info_id, @Param(“newInfo”) String new_content) throws Exception; Configuration You have to add your MyBatis-Interface for the Mapper to the SQLSessionFactory: sqlSessionFactory.getConfiguration().addMapper(InfoMapper.class); Some Hints for developing with MyBatis The following means helped me a lot to use MyBatis annotations despite the lack of documentation about using annotations: - MyBatis is open source! So add the sources to your build path and use the debugging function of your IDE to enter the MyBatis source code while executing some queries. You will see what MyBatis expects as input and how it is processed. - Read the documentation about using MyBatis XML. This does not really make any sense you think? It does! The processing deep inside MyBatis does not change if you use annotations instead of XML. It is just another way to develop persistence queries. E.g. if you know that a XML select query may just use exactly one parameterType-attribute, then you know that you may just use one parameter in an annotation-based method too! If you need more parameters, you have to use the @Param annotation. - If you get any strange exception that does not make any sense, then clean and re-compile your project. This often helps, because as with other persistence frameworks such as Hibernate, the bytecode enhancement sometimes confuses your IDE. Conclusion: @MyBatis-Team: Improve and extend the documentation instead of improving the framework itself! MyBatis is a nice lightweigt persistence framework. But the documentation is not enough detailed. Some important information is completely missing. Especially, if you are a newbie to MyBatis / iBatis, it is very tough to develop with MyBatis using annotations instead of XML. Besides the usage of annotations, another good example for missing documentation is how to configure transactions in MyBatis by using JNDI and a J2EE / JEE Application server. You have to use google to find out, and if you are lucky you will find a mailing list or blog entry describing your problem. If not, you have to try it out. The missing documentation makes MyBatis much more tough than it actually is. So in the next months, the MyBatis team should improve and extend the documentation instead of improving the framework itself… Best regards, Kai Wähner (Twitter: @Kai Waehner) [Content from my Blog: MyBatis (formerly iBatis) - Examples and Hints using Select, Insert and Update Annotations - Kai Wähner's IT-Blog] Appendix: Source Code Here you see all the necessary source code, also including a MyBatis Connection Factory, which reads the configuration data from a XML file. ############################################################################ Connection Factory (using a static initializer) ############################################################################ import java.io.FileNotFoundException; import java.io.IOException; import java.io.InputStream; import java.io.InputStreamReader; import java.io.Reader; import org.apache.ibatis.session.SqlSessionFactory; import org.apache.ibatis.session.SqlSessionFactoryBuilder; import de.waehner.kai.persistence.InfoDAO.InfoMapper; public class MyBatisConnectionFactory { private static SqlSessionFactory sqlSessionFactory; static { Reader reader = null; try { InputStream in = MyBatisConnectionFactory.class.getResourceAsStream(“myBatisConfiguration.xml”); reader = new InputStreamReader(in); if (sqlSessionFactory == null) { sqlSessionFactory = new SqlSessionFactoryBuilder().build(reader); sqlSessionFactory.getConfiguration().addMapper(InfoMapper.class); } in.close(); } catch (FileNotFoundException fileNotFoundException) { fileNotFoundException.printStackTrace(); } catch (IOException iOException) { iOException.printStackTrace(); } } public static SqlSessionFactory getSqlSessionFactory() { return sqlSessionFactory; } } ############################################################################ Data Acces Object (including the MyBatis-Mapper as inner Class) ############################################################################ import org.apache.ibatis.annotations.Insert; import org.apache.ibatis.annotations.Param; import org.apache.ibatis.annotations.Select; import org.apache.ibatis.annotations.Update; import org.apache.ibatis.session.SqlSession; import org.apache.ibatis.session.SqlSessionFactory; public class InfoDAO { public interface InfoMapper { final String GET_INFO = “SELECT * FROM simple_information WHERE info_id = #{info_id}”; @Select(GET_INFO) public SimpleInformationEntity getSimpleInformationById(int info_id) throws Exception; final String PERSIST_INFO = “INSERT INTO simple_information(info_id, info_content) VALUES (#{infoId}, #{infoContent})”; @Insert(PERSIST_INFO) public int persistInformation(SimpleInformationEntity simpleInfo) throws Exception; final String UPDATE_INFO = “UPDATE simple_information SET info_content = #{newInfo} WHERE info_id = #{infoId}”; @Update(UPDATE_INFO) public int updateInformation(@Param(“infoId”) int info_id, @Param(“newInfo”) String new_content) throws Exception; } public SimpleInformationEntity getSingleAlarm(int info_id) throws Exception { SqlSessionFactory sqlSessionFactory = MyBatisConnectionFactory.getSqlSessionFactory(); SqlSession session = sqlSessionFactory.openSession(); try { InfoMapper mapper = session.getMapper(InfoMapper.class); SimpleInformationEntity simpleInfo = mapper.getSimpleInformationById(info_id); return simpleInfo; } finally { session.close(); } } public int persistInformation(SimpleInformationEntity simpleInfo) throws Exception { SqlSessionFactory sqlSessionFactory = MyBatisConnectionFactory.getSqlSessionFactory(); SqlSession session = sqlSessionFactory.openSession(); try { InfoMapper mapper = session.getMapper(InfoMapper.class); int answer = mapper.persistInformation(simpleInfo); return answer; } finally { session.close(); } } public int updateInformation(int info_id, String new_content) throws Exception { SqlSessionFactory sqlSessionFactory = MyBatisConnectionFactory.getSqlSessionFactory(); SqlSession session = sqlSessionFactory.openSession(); try { InfoMapper mapper = session.getMapper(InfoMapper.class); int answer = mapper.updateInformation(info_id, new_content); return answer; } finally { session.close(); } } } ############################################################################ SimpleInformation Entity (a simple POJO) ############################################################################ import java.io.Serializable; public class SimpleInformationEntity implements Serializable{ private static final long serialVersionUID = -821826330941829539L; private int infoId; private String infoContent; public int getInfoId() { return infoId; } public void setInfoId(int infoId) { this.infoId = infoId; } public String getInfoContent() { return infoContent; } public void setInfoContent(String infoContent) { this.infoContent = infoContent; } }

November 2, 2010

by Kai Wähner

CORE

· 79,433 Views · 2 Likes

Manage Hierarchical Data using Spring, JPA and Aspects

Managing hierarchical data using two dimentional tables is a pain. There are some patterns to reduce this pain. One such solution is described here. This article is about implementing the same using Spring, JPA, Annotations and Aspects. Please go through follow the link to better understand this solution described. The purpose is to come up with a component that will remove the boiler-plate code in the business layer to handle hierarchical data. Summary Create a base class for Entities used to represent Hierarchical data Create annotation classes Code the Aspect that will execute addional steps for managing Hierarchical data. (Heart of the solution) Now the Aspect can be used everywhere Hierarchical data is used. Detail Create base class for Entities used to represent Hierarchical data. The purpose of the super class is to encapsulate all the common attrubutes and operations required for managing hierarchical data in a table. Please note that the class is annotated as @MappedSuperclass. The methods are meant to generate queries required to perform CRUD operations on the Table. Their use will be more clear later in the article when we will revisit HierarchicalEntity. Now any Entity that extends this class will have all the attributes required to manage hierarchical data. import com.es.clms.aspect.HierarchicalEntity; import javax.persistence.EntityListeners; import javax.persistence.MappedSuperclass; @MappedSuperclass @EntityListeners({HierarchicalEntity.class}) public abstract class AbstractHierarchyEntity implements Serializable { protected Long parentId; protected Long lft; protected Long rgt; public String getMaxRightQuery() { return "Select max(e.rgt) from " + this.getClass().getName() + " e"; } public String getQueryForParentRight() { return "Select e.rgt from " + this.getClass().getName() + " e where e.id = ?1"; } public String getDeleteStmt() { return "Delete from " + this.getClass().getName() + " e Where e.lft between ?1 and ?2"; } public String getUpdateStmtForFirst() { return "Update " + this.getClass().getName() + " e set e.lft = e.lft + ?2 Where e.lft >= ?1"; } public String getUpdateStmtForRight() { return "Update " + this.getClass().getName() + " e set e.rgt = e.rgt + ?2 Where e.rgt >= ?1"; } . . .//Getter and setters for all the attributes. } Create annotation classes The following is an annotation class that will be used to annotate the methods that perform CRUD operations on hierarchical data. It is followed by an enum that will decide the type of CRUD operation to be performed. These classes will make more sense after the next section. import java.lang.annotation.ElementType; import java.lang.annotation.Retention; import java.lang.annotation.RetentionPolicy; import java.lang.annotation.Target; @Target(ElementType.METHOD) @Retention(RetentionPolicy.RUNTIME) public @interface HierarchicalOperation { HierarchicalOperationType operationType(); } /** * Enum - Type of CRUD operation. */ public enum HierarchicalOperationType { SAVE, DELETE; } Code the Aspect that will execute addional steps for managing Hierarchical data. HierarchicalEntity is an aspect that performs the additional logic required to manage the hierarchical data as descriped in the article here. This is the first time I have used an Aspect, therefore I am sure that there are better ways to do this. Those of you, who are good at it, please improve this part of code. This class is annotated as @Aspect. The pointcut will intercept any method anotated with HierarchicalOperation and has a input of type AbstractHierarchyEntity. A sample its usage is in next section. operation method is annotated to be executed before the pointcut. Based on the HierarchicalOperationType passed, this method will either execute the additional tasks required to save or delete the hierarchical record. This is where the methods defined in AbstractHierarchyEntity for generating JPA Queries are used. GenericDAOHelper is a utility class for using JPA. import com.es.clms.annotation.HierarchicalOperation; import com.es.clms.annotation.HierarchicalOperationType; import com.es.clms.common.GenericDAOHelper; import com.es.clms.model.AbstractHierarchyEntity; import org.aspectj.lang.JoinPoint; import org.aspectj.lang.annotation.Aspect; import org.aspectj.lang.annotation.Before; import org.aspectj.lang.annotation.Pointcut; import org.springframework.stereotype.Service; import org.springframework.beans.factory.annotation.Autowired; @Aspect @Service("hierarchicalEntity") public class HierarchicalEntity { @Autowired private GenericDAOHelper genericDAOHelper; @Pointcut(value = "execution(@com.es.clms.annotation.HierarchicalOperation * *(..)) " + "&& args(AbstractHierarchyEntity)") private void hierarchicalOps() { } /** * * @param jp * @param hierarchicalOperation */ @Before("hierarchicalOps() && @annotation(hierarchicalOperation) ") public void operation(final JoinPoint jp, final HierarchicalOperation hierarchicalOperation) { if (jp.getArgs().length != 1) { throw new IllegalArgumentException( "Expecting only one parameter of type AbstractHierarchyEntity in " + jp.getSignature()); } if (HierarchicalOperationType.SAVE.equals( hierarchicalOperation.operationType())) { save(jp); } else if (HierarchicalOperationType.DELETE.equals( hierarchicalOperation.operationType())) { delete(jp); } } /** * * @param jp */ private void save(JoinPoint jp) { AbstractHierarchyEntity entity = (AbstractHierarchyEntity) jp.getArgs()[0]; if (entity == null) return; if (entity.getParentId() == null) { Long maxRight = (Long) genericDAOHelper.executeSingleResultQuery( entity.getMaxRightQuery()); if (maxRight == null) { maxRight = 0L; } entity.setLft(maxRight + 1); entity.setRgt(maxRight + 2); } else { Long parentRight = (Long) genericDAOHelper.executeSingleResultQuery( entity.getQueryForParentRight(), entity.getParentId()); entity.setLft(parentRight); entity.setRgt(parentRight + 1); genericDAOHelper.executeUpdate( entity.getUpdateStmtForFirst(), parentRight, 2L); genericDAOHelper.executeUpdate( entity.getUpdateStmtForRight(), parentRight, 2L); } } /** * * @param jp */ private void delete(JoinPoint jp) { AbstractHierarchyEntity entity = (AbstractHierarchyEntity) jp.getArgs()[0]; genericDAOHelper.executeUpdate( entity.getDeleteStmt(), entity.getLft(), entity.getRgt()); Long width = (entity.getRgt() - entity.getLft()) + 1; genericDAOHelper.executeUpdate( entity.getUpdateStmtForFirst(), entity.getRgt(), width * (-1)); genericDAOHelper.executeUpdate( entity.getUpdateStmtForRight(), entity.getRgt(), width * (-1)); } } Sample Usage From this point on you don't have to worry about the additional tasks required for managing the data. Just use the HierarchicalOperation anotation with appropriate HierarchicalOperationType. Below is a sample use of the code developed so far. @HierarchicalOperation(operationType = HierarchicalOperationType.SAVE) public long save(VariableGroup group) { entityManager.persist(group); return group.getId(); } @HierarchicalOperation(operationType = HierarchicalOperationType.DELETE) public void delete(VariableGroup group) { entityManager.remove(entityManager.merge(group)); } http://rajeshkilango.blogspot.com/2010/10/manage-hierarchical-data-using-spring.html

October 21, 2010

by Rajesh Ilango

· 17,041 Views

Practical PHP Patterns: Record Set

This is the last article from the Practical PHP Patterns series. Stay tuned on css.dzone.com for the new series, Practical PHP Testing Patterns. The RecordSet pattern's goal is to represent a set of relational database rows, with the main purpose of giving access to their values (a data structure), sometimes with the possibility of modification (via single Row Data Gateway instances). Despite the term set, the rows have usually a defined order. The intent is ultimately representing a result from an SQL query with an object, to gain the usual advantages of objects over scalars and functions: it can be passed around but maintain its encapsulated behavior, injected, mocked, wrapped and so on. A Record Set is usually not mocked if it is provided by an external extension or library, because instancing a real one working on a lightweight database such as Sqlite is used in substitution for the real one. Especially in the PHP world, this solution is fast enough to become a standard. Today, Record Set is less used on the client side to favor Object-Relational Mapping approaches, which make some kind of translation over the raw rows (what an horrible pun). Record Set instead maintains by definition a one-to-one relationship with the table rows, and it is still diffused in ORM internals (such as Doctrine's own code) or in applications that call PDO directly. It is a fairly basic pattern, but given that a vast part of legacy PHP applications still uses mysql_query()... Interesting things happen when... Interesting leverages of this pattern happen when someone stays in the middle between the Record Set creation and the user interface, with the goal of modifying or decorating it. The UI can then explore the RecordSet and automatically generate itself, in a form of scaffolding. Continuing on this line of thought, UI components can edit the RecordSet without knowing the model which it refers to, via building forms driven by the Record Set metadata. This solution is diffused, but it does not scale to Domain Models with a level of complexity greater than plain arrays. Of course, the Record Set may also encapsulate business logic as a low-cost form of Domain Model. In this case, the ability to unlink it from the database connection is important to its serializability and ease of testing. Examples PDOStatement represents both a SQL query and a RecordSet implementation, after it has been executed. When fetching all the results, it returns an array. In other languages Record Sets are more evolved and can for example be used to navigate a table and modify only certain records (via their annexed Row Data Gateway). PDOStatement is used only for reading. If you want further functionalities (which obviously depends on your domain), you should create your own Record Set accordingly. It is probably best to wrap the PDOStatement because extending it is out of question due to the instantiation not being under our control. connection = $connection; } public function getTweetsRecordSet($username) { /** * @var PDOStatement this is a Record Set */ $stmt = $this->connection->prepare('SELECT * FROM tweets WHERE username = :username'); $stmt->bindValue(':username', $username, PDO::PARAM_STR); $stmt->execute(); return $stmt; } } $pdo = new PDO('sqlite::memory:'); $pdo->exec('CREATE TABLE tweets (id INT NOT NULL, username VARCHAR(255) NOT NULL, text VARCHAR(255) NOT NULL, PRIMARY KEY(id))'); $pdo->exec('INSERT INTO tweets (id, username, text) VALUES (42, "giorgiosironi", "Practical PHP Patterns has come to an end")'); $pdo->exec('INSERT INTO tweets (id, username, text) VALUES (43, "giorgiosironi", "Cool series: will continue as Practical PHP Testing Patterns")'); // client code $table = new TweetsTable($pdo); $recordSet = $table->getTweetsRecordSet('giorgiosironi'); while ($row = $recordSet->fetch()) { var_dump($row['text']); }

October 18, 2010

by Giorgio Sironi

· 3,359 Views

Enum tricks: hierarchical data structure

Java enums are typically used to hold array like data. This tip shows how to use enum for hierarchical structures. Motivation Once upon a time I wanted to create enum that contains various operating system, i.e. public enum OsType { WindowsNTWorkstation, WindowsNTServer, Windows2000Server, Windows2000Workstation, WindowsXp, WindowsVista, Windows7, Windows95, Windows98, Fedora, Ubuntu, Knopix, SunOs, HpUx, I did not like this structure because I'd like to see a group of WindowsNT that contains WinNTWorkstation and WindNT server. All windows versions should be in super group of "windows". Fedora, Knopix and Ubuntu are distributions of Linux. All Linux distributions together with SunOs and HpUx are Unix systems. All Windows systems have common properties. The same is about Unix systems. And I hate copy/paste programming. Solutions As always there are several solutions. Class per OS Solution The obvious solution here is to create separate classes per operating system and abstract classes that represent OS groups. For example class Fedora extends class Linux that extends class Unix that extends class OperatingSystem. We can enjoy all advantages of inheritance, so all common properties of Windows OS are stored in class Windows and can be overridden by its subclasses. But now we cannot see all operating systems together, iterate over them etc., i.e. very useful features of Java enum are missing. No problem! Now we can create enum like previous that holds custom field of type Class: public enum OsType { WindowsNTWorkstation(WindowsNTWorkstation.class), WindowsNTServer(WindowsNTServer.class), Windows2000Server(Windows2000Server.class), Windows2000Workstation(Windows2000Workstation.class), WindowsXp(WindowsXp.class), WindowsVista(WindowsVista.class), Windows7(Windows7.class), Windows95(Windows7.class), Windows98(Windows98.class), Fedora(Fedora.class), Ubuntu(Ubuntu.class), Knopix(Knopix.class), SunOs(SunOs.class), HpUx(HpUx.class), ; private Class clazz; OsType(Class clazz) { this.clazz = clazz; } } This solution is better but it still has disadvantages: Implementation of method that retrieves all "children" of specific OS (for example all Linux distributions) is hard and ineffective. Grouping is separate from enum. The solution is very verbose: each OS is represented by its own class even if the class has nothing to override. Hierarchical Enum To create hierarchy using enum we need custom field "parent" that is initialized by constructor: public enum OsType { OS(null), Windows(OS), WindowsNT(Windows), WindowsNTWorkstation(WindowsNT), WindowsNTServer(WindowsNT), Windows2000(Windows), Windows2000Server(Windows2000), Windows2000Workstation(Windows2000), WindowsXp(Windows), WindowsVista(Windows), Windows7(Windows), Windows95(Windows), Windows98(Windows), Unix(OS) { @Override public boolean supportsXWindows() { return true; } }, Linux(Unix), AIX(Unix), HpUx(Unix), SunOs(Unix), ; private OsType parent = null; private OsType(OsType parent) { this.parent = parent; } } This structure allows implementation of method "is" that works like operator instanceof for classes and interfaces. For example Windows2000 is Windows, Fedora is Linux, Windows is not Unix etc. public boolean is(OsType other) { if (other == null) { return false; } for (OsType t = this; t != null; t = t.parent) { if (other == t) { return true; } } return false; } Sometimes we need a method that returns all "children" of current nodes, e.g. all Linux systems or all variants of Windows2000. The easiest way to implement this is to hold collection of children per element and fill it from constructor: private List children = new ArrayList(); private OsType(OsType parent) { this.parent = parent; if (this.parent != null) { this.parent.addChild(this); } } Now method "children()" that returns direct node's children is trivial: public OsType[] children() { return children.toArray(new OsType[children.size()]); } It is not hard to implement recursive method "allChildren()" that returns all children of current node (see full source code). But hierarchy term is always accompanied by inheritance that allows overriding methods of parent. This is the basic feature of classes in all object oriented languages. Is it possible to implement a kind of inheritance relationship for elements of one enum? Overriding parent's method Unix systems support X Window graphical environment. MS Windows does not. We would like to be able to ask OS whether it supports X Window. We can define boolean flag "supportsX" and boolean method public boolean supportsX() {return suppotsX;} Now we have to add yet another argument to OsType constructor and pass true/false for each element of the enum. But it is too verbose. Is it possible to say that Unix supports X, Windows does not support X and be sure that Fedora's supportX() returns true while Winddows95's supportX() returns false? The implementation is pretty simple. First for simplicity let's say that X Window is supported by all Unix systems and is not supported by others. So, we can implement method supportsXWindowSystem() at enum level as following: public boolean supportsXWindowSystem() { return false; } Now we have to override it for all Unix systems. To implement this we change the default implementation to following: public boolean supportsXWindowSystem() { return parent == null ? false : parent.supportsXWindowSystem(); } The method of first parent in hierarchy that implements the method will be used. If no one of parents and parents of parents does not implement this method itself we call method of root element. Now we can say the following: ... Unix(OS) { @Override public boolean supportsXWindowSystem() { return true; } }, Linux(Unix), AIX(Unix), HpUx(Unix), SunOs(Unix), ... The method is overridden for Unix element only and all its children will use this method. The problem is solved. We got enum based hierarchical polymorphic structure! We can implement method in base element (using it like a super class) and then override it in any element we want. The only disadvantage of this solution is that now we have to create similar implementation for each method we add to this enum and for all other enums that hold hierarchical structure. Conclusions Although we are regular to use enums as some kind of static arrays they also can be used to present hierarchical tree-like data structures where each node can find its parent, its children and even inherit and override parent's method almost exactly as we do with class inheritance. Acknowledgments First version of this article has been written in my blog. Method supportsXWindowSystem() there was implemented using reflection. 4 guys discussed this solution and suggested me to simplify it. I would like to thank Todo, Anton Dumler, Nick and Johannes Schneider that helped me to improve the article.

October 18, 2010

by Alexander Radzin

· 80,946 Views · 2 Likes

Enum Tricks: Customized valueOf

When I am writing enumerations I very often found myself implementing a static method similar to the standard enum’s valueOf() but based on field rather than name: public static TestOne valueOfDescription(String description) { for (TestOne v : values()) { if (v.description.equals(description)) { return v; } } throw new IllegalArgumentException( "No enum const " + TestOne.class + "@description." + description); } Where “description” is yet another String field in my enum. And I am not alone. See this article for example. Obviously this method is very ineffective. Every time it is invoked it iterates over all members of the enum. Here is the improved version that uses a cache: private static Map map = null; public static TestTwo valueOfDescription(String description) { synchronized(TestTwo.class) { if (map == null) { map = new HashMap(); for (TestTwo v : values()) { map.put(v.description, v); } } } TestTwo result = map.get(description); if (result == null) { throw new IllegalArgumentException( "No enum const " + TestTwo.class + "@description." + description); } return result; } It is fine if we have only one enum and only one custom field that we use to find the enum value. But if we have 20 enums, and each has 3 such fields, then the code will be very verbose. As I dislike copy/paste programming I have implemented a utility that helps to create such methods. I called this utility class ValueOf. It has 2 public methods: public static , V> T valueOf(Class enumType, String fieldName, V value); which finds the required field in specified enum. It is implemented utilizing reflection and uses a hash table initialized during the first call for better performance. The other overridden valueOf() looks like: public static > T valueOf(Class enumType, Comparable comparable); This method does not cache results, so it iterates over enum members on each invocation. But it is more universal: you can implement comparable as you want, so this method may find enum members using more complicated criteria. Full code with examples and JUnit test case are available here. Conclusions Java Enums provide the ability to locate enum members by name. This article describes a utility that makes it easy to locate enum members by any other field.

October 16, 2010

by Alexander Radzin

· 80,199 Views

Naming Conventions for Parameterized Types

Parameterized types - the <> expressions that can be used in Java as of JDK 5 are not just for collections. I find myself frequently using them in APIs I design. They really do let you write things which are more generic in the non-Java sense of the word - and the result is more reusable code, which means less code overall, which means fewer bugs and things to test. The verbosity, and some of the weirdness of type-erasure are less than ideal, but used right, the benefits are worth the complexity. The standard (and somewhere recommended) naming convention for parameterized types is to use a single-letter name. That works fine in signatures that have only one such type. But in practice, single-letter names make code less self-describing, and if you're defining a class with more than one parameterized type, it can be confusing and hard to read. People other than me will have to call, understand and maintain my code - the more self-describing I can make it, the better. So I am looking for a naming convention that makes it obvious that something is a parameterized type, but allows for descriptive names. I am wondering if anybody else has run into this problem, and if there is any emerging consensus on naming generics. Do you work on a project that uses generics a lot? If so, what do you do? Here's an example. At the moment, I'm writing a generic (in both senses) class which simply limits the number of threads which can access some resource. It's basically a wrapper around a Semaphore which uses a Runnable-like object to ensure that the Semaphore is accessed correctly, and does some non-blocking statistic gathering about thread contention. So to access the scarce resource, you pass in a ResourceAccessor: public interface ResourceAccessor { public Result run (ProtectedResource resource, Argument argument); } The problem is that, when somebody looks at this interface, they will instantly get the idea that there are really classes they need to go find, which are called ProtectedResource, Argument and Result - and of course, no such classes exist - these are just names for generic types. The standard-naming-convention is worse: public interface ResourceAccessor { public S run (T resource, R argument); } Here, nobody could possibly figure out what on earth this class is for without extensive documentation - this is a really horrible idea. So I've concluded that the standard recommendations for generic type names are simply wrong for any non-trivial usage (I.e. Collection is fine, since there is one type and Collections are well-understood). You simply can't do this on a non-collection code structure you have invented, or people will just be confused and not use it. The best suggestion I've heard thus far is using $ as a prefix: public interface ResourceAccessor <$ProtectedResource, $Argument, $Result> { public $Result run ($ProtectedResource resource, $Argument argument); } I don't find this pretty, but I don't have any better ideas, and at least it makes it crystal-clear that there is something different about these names. Any thoughts? What do you do in this situation?

September 20, 2010

by Tim Boudreau

· 17,896 Views

Commons Lang 3 -- Improved and Powerful StringEscapeUtils

In the first and second parts of this series I talked about some of the new features like enum and concurrency support that have been added in commons-lang 3. In this article, I am going to talk about a new package 'org.apache.commons.lang3.text.translate' which has been added in commons-lang 3. This package is added to fix problems in the design and implementation of the StringEscapeUtils class which exists in versions prior to 3.0. To make it clearer, let's first talk about the purpose of StringEscapeUtils class and the problems it had prior to version 3. Purpose of StringEscapeUtils StringEscapeUtils is a utility class which escapes and unescapes String for Java, JavaScript, HTML, XML, and SQL. For example, @Test public void test_StringEscapeUtils() { assertEquals("\\\\\\n\\t\\r", StringEscapeUtils.escapeJava("\\\n\t\r")); // escapes the Java String assertEquals("\\\n\t\r",StringEscapeUtils.unescapeJava("\\\\\\n\\t\\r")); //unescapes the Java String assertEquals("I didn\\'t say \\\"you to run!\\\"",StringEscapeUtils.escapeJavaScript("I didn't say \"you to run!\""));//escapes the Javascript assertEquals("<xml>", StringEscapeUtils.escapeXml(""));//escapes the xml } Problems with StringEscapeUtils There were a lot of problems in the StringEscapeUtils implementation prior to version3. Some of these were: The implementation was not extensible. Let's take an example of escapeJava, suppose we want to add support in the escapeJava method that it should start escaping single quotes. To add such support we would have to change the existing class code and another if condition which if satisfied will escape single quotes. So, the API was breaking the open-closed principle i.e. a class should be open for extension and closed for modification. It was not symmetric i.e. original should be equal to unescape(escape(original)) but it was not the case. StringEscapeUtils.escapeHtml() escapes multibyte characters like Chinese, Japanese etc. Issue 339 @Test public void testEscapeHiragana() { // Some random Japanese unicode characters String original = "\u304B\u304C\u3068"; String escaped = StringEscapeUtils.escapeHtml(original); assertEquals(original, escaped); } StringEscapeUtils.escapeHtml incorrectly converts unicode characters above U+00FFFF into 2 characters. Issue 480 @Test public void testEscapeHtmlHighUnicode() throws java.io.UnsupportedEncodingException { byte[] data = new byte[] { (byte) 0xF0, (byte) 0x9D, (byte) 0x8D,(byte) 0xA2 }; String original = new String(data, "UTF8"); String escaped = StringEscapeUtils.escapeHtml(original); assertEquals(original, escaped); } StringEscaper.escapeXml() escapes characters > 0x7f . Issue 66 @Test public void shouldNotEscapeValuesGreaterThan0x7f() { assertEquals("XML should not escape >0x7f values", "\u00A1",StringEscapeUtils.escapeXml("\u00A1")); } Solution -- Rewritten StringEscapeUtils In version 3.0, StringEscapeUtils is completely rewritten to fix all the bugs associated with this class and to provide a way for the user to customize the behavior of its methods. They have moved all the logic present in the StringEscapeUtils to the classes in the package 'org.apache.commons.lang3.text.translate'. Let's take an example of escapeJava function in StringEscapeUtils, escapeJava function does not contain any business logic, it just calls the translate method on CharSequenceTranslator reference. What they did can be best understood by looking at the code below public static final CharSequenceTranslator ESCAPE_JAVA = new AggregateTranslator(new LookupTranslator( new String[][] { {"\"", "\\\""}, {"\\", "\\\\"}, }),new LookupTranslator(EntityArrays.JAVA_CTRL_CHARS_ESCAPE()),UnicodeEscaper.outsideOf(32, 0x7f)); and in the escapeJava method public static final String escapeJava(String input) { return ESCAPE_JAVA.translate(input); } A constant of type CharSequenceTranslator was assigned an AggregateTranslator object. AggregateTranslator can take an array of translators, and it iterates over each of them. The LookupTranslator replaces the string at zeroth index with the string at the first index. UnicodeEscaper translates values outside the given range to unicode values. As you can see, you can very easily write your own escape methods. For example, if you want to add the support of escaping &, you can do it like this public static final CharSequenceTranslator ESCAPE_JAVA = new LookupTranslator( new String[][] { {"\"", "\\\""}, {"\\", "\\\\"}, }).with(new LookupTranslator( new String[][]{ {"&", "&"}, {"<", "<"} )).with( new LookupTranslator(EntityArrays.JAVA_CTRL_CHARS_ESCAPE()) ).with( UnicodeEscaper.outsideOf(32, 0x7f) ); StringEscapeUtils.escapeSql has been removed from the API as it was misleading developers to not use PreparedStatement.This method was not of much use as it was only escaping single quotes.

September 17, 2010

by Shekhar Gulati

· 71,199 Views · 2 Likes

Jetty-maven-plugin: Running a Webapp with a DataSource and Security

this post describes how to configure the jetty-maven-plugin and the jetty servlet container to run a web application that uses a data source and requires users to log in, which are the basic requirements of most web applications. i use jetty in development because it’s fast and easy to work with. why jetty? well, because it’s much faster then the websphere as i normally use and it really well supports fast (or shall i say agile? ) development thanks to its fast turnaround. and because it’s simply cool to type bash$ svn checkout http://example.com/repo/trunk/mywebapp bash$ cd mywebapp bash$ mvn jetty:run bash$ firefox http://localhost:8080/mywebapp and to be able to immediatelly log into and interact with the application. however it should be noted that jetty isn’t a full-featured javaee server and thus may not be always usable. project setup general configuration you need to add the jetty plugin to your pom.xml : 4.0.0 com.example mywebapp war ... ... org.mortbay.jetty maven-jetty-plugin 6.1.0 3 ... ... ... as you can see, i’m using jetty 6.1.0. defining a datasource let’s assume that the application uses a datasource configured at the server and accesses it normally via jndi. then we must define a reference to the data source in src/main/webapp/ web-inf/web.xml : ... ... ... jdbc/lmsdb javax.sql.datasource container shareable next we need to describe the datasource to jetty. there are multiple ways to do that, i’ve chosen to do so in src/main/webapp/ web-inf/jetty-env.xml : jdbc/lmsdb lmsdb myuser secret db.toronto.ca.ibm.com 3711 notice that the class used is db2simpledatasource and not a jdbc driver. that is, of course, because we need a datasource, not a driver. the jetty wiki pages also contain examples of datasource configuration for other dbs . finally we must make the corresponding jdbc implementation available to jetty by adding it to the plugin’s dependencies in the pom.xml : org.mortbay.jetty maven-jetty-plugin 6.1.0 <... com.ibm.db2 db2jcc 9.7 jar system ${basedir}/../lms.sharedlibraries/db2/db2jcc.jar com.ibm.db2 db2jcc_license_cisuz 9.7 jar system ${basedir}/../lms.sharedlibraries/db2/db2jcc_license_cisuz.jar please do not scorn me for using system-scoped dependencies , sometimes that is unfortunatelly the most feasible way. enabling security and configuring an authentication mechanism we would like to limit access to the application only to the authenticated users in the admin role with the exception of pages under public/. therefore we declare the appropriate security constraints in web.xml: ... authorizedusers all urls /* admin publicaccess public pages /public/* basic learning@ibm mini person feed management administrator access admin ... beware that jetty doesn’t support https out of the box and thus if you will add the data constraint confidential to any resource, you will automatically get http 403 forbidden no matter what you do. that’s why i’ve commented it out above. it is possible to enable ssl in jetty but i didn’t want to bother with certificate generation etc. next we need to tell jetty how to authenticate users. this is done via realms and we will use the simplest, file-based one. again there are multiple ways to configure it, for example in the pom.xml : org.mortbay.jetty maven-jetty-plugin 6.1.0 3 learning@ibm mini person feed management src/test/resources/jetty-users.properties ... the name must match exactly the realm-name in web.xml. you then define the users and their passwords and roles in the declared file, in this case in src/test/resources/ jetty-users.properties : user=psw,admin the format of the file is username=password[,role1,role2,...]. when you download jetty, you will find a fine example of using jaas with a file-based back-end for authentication and authorization under examples/test-jaas-webapp (invoke mvn jetty:run from the folder and go to http://localhost:8080/jetty-test-jaas/). however it seems that jaas causes an additional overhead visible as a few-seconds delay when starting the server so it might be preferrable not to use it. conclusion with jetty it’s easy to enable security and create a data source, which are the basic requirements of most web applications. anybody can then very easily run the application to test and develop it. development is where jetty really shines provided that you don’t need any feature it doesn’t have. when troubleshooting, you may want to tell jetty to log at the debug level with mvn -ddebug .. or to log requests , which can be also configured in the jetty-env.xml. beware that this post describes configuration for jetty 6.1.0. it can be different in other versions and it certainly is different in jetty 7. from http://theholyjava.wordpress.com/2010/09/10/jetty-maven-plugin-running-a-webapp-with-a-datasource-and-security/

September 13, 2010

by Jakub Holý

· 23,055 Views

Java: Overriding Getters and Setters Design

Why do we keep instance variables private? We don’t want other classes to depend on them. Moreover it gives the flexibility to change a variable’s type or implementation on a whim or an impulse. Why, then programmers automatically add or override getters and setters to their objects, exposing their private variables as if they were public? Accessor methods Accessors (also known as getters and setters) are methods that let you read and write the value of an instance variable of an object. public class AccessorExample { private String attribute; public String getAttribute() { return attribute; } public void setAttribute(String attribute) { this.attribute = attribute; } } Why Accessors? There are actually many good reasons to consider using accessors rather than directly exposing fields of a class Getter and Setter make API more stable. For instance, consider a field public in a class which is accessed by other classes. Now, later on, you want to add any extra logic while getting and setting the variable. This will impact the existing client that uses the API. So any changes to this public field will require a change to each class that refers it. On the contrary, with accessor methods, one can easily add some logic like cache some data, lazily initialize it later. Moreover, one can fire a property changed event if the new value is different from the previous value. All this will be seamless to the class that gets the value using accessor method. Should I have Accessor Methods for all my fields? Fields can be declared public for package-private or private nested class. Exposing fields in these classes produces less visual clutter compare to accessor-method approach, both in the class definition and in the client code that uses it. If a class is package-private or is a private nested class, there is nothing inherently wrong with exposing its data fields—assuming they do an adequate job of describing the abstraction provided by the class. Such code is restricted to the package where the class is declared, while the client code is tied to class internal representation. We can change it without modifying any code outside that package. Moreover, in the case of a private nested class, the scope of the change is further restricted to the enclosing class. Another example of a design that uses public fields is JavaSpace entry objects. Ken Arnold described the process they went through to decide to make those fields public instead of private with get and set methods here Now this sometimes makes people uncomfortable because they've been told not to have public fields; that public fields are bad. And often, people interpret those things religiously. But we're not a very religious bunch. Rules have reasons. And the reason for the private data rule doesn't apply in this particular case. It is a rare exception to the rule. I also tell people not to put public fields in their objects, but exceptions exist. This is an exception to the rule, because it is simpler and safer to just say it is a field. We sat back and asked: Why is the rule thus? Does it apply? In this case it doesn't. Private fields + Public accessors == encapsulation Consider the example below public class A { public int a; } Usually, this is considered bad coding practice as it violates encapsulation. The alternate approach is public class A { private int a; public void setA(int a) { this.a =a; } public int getA() { return this.a; } } It is argued that this encapsulates the attribute. Now is this really encapsulation? The fact is, Getters/setters have nothing to do with encapsulation. Here the data isn't more hidden or encapsulated than it was in a public field. Other objects still have intimate knowledge of the internals of the class. Changes made to the class might ripple out and enforce changes in dependent classes. Getter and setter in this way are generally breaking encapsulation. A truly well-encapsulated class has no setters and preferably no getters either. Rather than asking a class for some data and then compute something with it, the class should be responsible for computing something with its data and then return the result. Consider an example below, public class Screens { private Map screens = new HashMap(); public Map getScreens() { return screens; } public void setScreens(Map screens) { this.screens = screens; } // remaining code here } If we need to get a particular screen, we do code like below, Screen s = (Screen)screens.get(screenId); There are things worth noticing here.... The client needs to get an Object from the Map and casting it to the right type. Moreover, the worst is that any client of the Map has the power to clear it which may not be the case we usually want. An alternative implementation of the same logic is: public class Screens { private Map screens = new HashMap(); public Screen getById(String id) { return (Screen) screens.get(id); } // remaining code here } Here the Map instance and the interface at the boundary (Map) are hidden. Getters and Setters are highly Overused Creating private fields and then using the IDE to automatically generate getters and setters for all these fields is almost as bad as using public fields. One reason for the overuse is that in an IDE it’s just now a matter of few clicks to create these accessors. The completely meaningless getter/setter code is at times longer than the real logic in a class and you will read these functions many times even if you don't want to. All fields should be kept private, but with setters only when they make sense which makes object Immutable. Adding an unnecessary getter reveals internal structure, which is an opportunity for increased coupling. To avoid this, every time before adding the accessor, we should analyse if we can encapsulate the behaviour instead. Let’s take another example, public class Money { private double amount; public double getAmount() { return amount; } public void setAmount(double amount) { this.amount = amount; } //client Money pocketMoney = new Money(); pocketMoney.setAmount(15d); double amount = pocketMoney.getAmount(); // we know its double pocketMoney.setAmount(amount + 10d); } With the above logic, later on, if we assume that double is not a right type to use and should use BigDecimal instead, then the existing client that uses this class also breaks. Let’s restructure the above example, public class Money { private BigDecimal amount; public Money(String amount) { this.amount = new BigDecimal(amount); } public void add(Money toAdd) { amount = amount.add(toAdd.amount); } // client Money balance1 = new Money("10.0"); Money balance2 = new Money("6.0"); balance1.add(balance2); } Now instead of asking for a value, the class has a responsibility to increase its own value. With this approach, the change request for any other datatype in future requires no change in the client code. Here not only the data is encapsulated but also the data which is stored, or even the fact that it exists at all. Conclusions Use of accessors to restrict direct access to field variable is preferred over the use of public fields, however, making getters and setter for each and every field is overkill. It also depends on the situation though, sometimes you just want a dumb data object. Accessors should be added to a field where they're really required. A class should expose larger behaviour which happens to use its state, rather than a repository of state to be manipulated by other classes. More Reading http://c2.com/cgi/wiki?TellDontAsk http://c2.com/cgi/wiki?AccessorsAreEvil Effective Java - See more at http://muhammadkhojaye.blogspot.co.uk/2010/10/getter-setter-to-use-or-not-to-use.html

September 10, 2010

by Muhammad Ali Khojaye

· 98,261 Views · 1 Like

Practical PHP Patterns: Data Transfer Object

Information can travel in very different ways between the parts of an applications, which could be tricky when these parts are distributed on different tiers, or times. The Data Transfer Object pattern prescribes to use a first-class citizen like a class to model the data used in the communication. This pattern was originally meant to aid logic distribution: coarse grained interfaces, which are ideal for remote interaction, must return more data in each call to reduce the number of messages sent over the network and their overhead. This kind of objects is built with the goal of be easy to serialize and send over the wire. In the PHP world, the communication is usually either from server to server, or even targets the same server: PHP is a lot different from other languages which are used for application distribution. Commonly the serialization and unserialization of objects are accomplished by the same codebase, which has a short life (the time of an HTTP request) and stores data with the serialization mechanism to maintain state between different requests. As a result, the usual dependencies discourse, which prescribes to use a Data Mapper between the Domain Model and the DTOs, do not apply; to the point that sometimes even domain model objects are serialized directly (an ORM like Doctrine 2 allows you to do so). There are more differences with classic Java DTOs which we'll see in this article. Implementation The definition of Data Transfer Object talks about communication between different processes: subsequent executions of the same PHP script are indeed different processes (or, when the same process is reused, it does not share any variable space with the previous executions). PHP is peculiar also from the implementation point of view: a DTO may not even be an object, since an ordinary or multidimensional array will do the same job in many cases. However, an object implementation is necessary to take advantage of object handlers, where the structure of data is not hierarchical but involves a connected graph of objects or recursive data structures. This implementation choice is much more clear than using arrays and variable references, which can be tricky in PHP. Basically, a DTO is the object which we have been told to never write: it has getters and setters to expose allof its properties, while providing nearly no encapsulation nor business logic. It's a data structure, like a Value Object (which was its name in certain Java literature). But it is not immutable nor it carries the semantic meaning of Value Objects. Use cases Data Transfer Objects come handy in many use cases. In their simplest implementation, they are used for returning multiple values from a method. Their further evolution involves then serialization mechanisms; of course both the provenience and the destination of a Data Transfer Object need to be a PHP environment, a fact which opens different scenarios from classical Java distribution ones, that involve clients well. For example a Data Transfer object, easily serializable, can be stored in caches (memcache) or sessions ($_SESSION). Other use cases regard databases: according to Fowler, Record Sets like PDOStatement can be defined as the Data Transfer Objects of relational databases. There are even more scenarios for Data Transfer Object, like clients which return a serialized data structure, or their usage for breaking dependencies between different layers: a Controller may populate a DTO and pass it to the View. Serialization As you may have experience while using var_dump() in PHP, the objects of a Domain Model are usually interconnected in a complex graph; DTOs isolate a subset of the graph and make it easy to serialize, even by omitting part of the information. Thus, serialization of domain object needs some complex infrastructure (lazy loading proxies which are discarded on serialization and must be re-initialized during the merge with a current object graph); sometimes is by far easier to extract data in a DTO, especially when you do not have particular libraries at your disposal. Once you have an isolated object graph, PHP will handle serialization by itself, even without marker interfaces; it will simply include all the public, protected and private properties (this behavior can be modified by specifying a __sleep() method). Example This running code shows a small domain model composed of the User and Group classes, and how a DTO for the User class allows to serialize a User without pulling in its Groups, for example for quickly storing it in a cache. _name; } public function setName($name) { $this->_name = $name; } /** * @return string */ public function getRole() { return $this->_role; } public function setRole($role) { $this->_role = $role; } public function addGroup(Group $group) { $this->_group = $group; } public function getGroups() { return $this->_groups; } } /** * Another Domain Model class. */ class Group { private $_name; /** * @return string */ public function getName() { return $this->_name; } public function setName($name) { $this->_name = $name; } } /** * The Data Transfer Object for User. * It stores the mandatory data for a particular use case - for example ignoring the groups, * and ensuring easy serialization. */ class UserDTO { /** * In more complex implementations, the population of the DTO can be the responsibilty * of an Assembler object, which would also break any dependency between User and UserDTO. */ public function __construct(User $user) { $this->_name = $user->getName(); $this->_role = $user->getRole(); } public function getName() { return $this->_name; } public function getRole() { return $this->_role; } // there are no setters because this use cases does not require modification of data // however, in general DTOs do not need to be immutable. } // client code $user = new User(); $user->setName('Giorgio'); $user->setRole('Author'); $user->addGroup(new Group('Authors')); $user->addGroup(new Group('Editors')); // many more groups $dto = new UserDTO($user); // this value is what will be stored in the session, or in a cache... var_dump(serialize($dto));

August 12, 2010

by Giorgio Sironi

· 95,408 Views · 5 Likes

Code Generation With Xtext

Recently I attended a local rheinJUG meeting in Düsseldorf. While the topic of the session was Eclipse e4, the night’s sponsor itemis provided some handouts on Xtext which got me very interested. The reason is that currently at work we are developing a mobile Java application (J9, CDC/Foundation 1.1 on Windows CE6) for which we needed an easy to use and reliable way for configuring navigation through the application. In a previous iteration we had – mostly because of time constraints – hard coded most of the navigational paths, but this time the app is more complex and doing that again was not really an option. First we thought about an XML based configuration, but this seemed to be a hassle to write (and read) and also would mean we would have to pay the price of parsing it on every application startup. Enter Xtext: An Eclipse based framework/library for building text based DSLs. In short, you just provide a grammar description of a new DSL to suit your needs and with – literally – just a few mouse clicks you are provided with a content-assist, syntax-highlight, outline-view-enabled Eclipse editor and optionally a code generator based on that language. Getting started: Sample Grammar There is a nice tutorial provided as part of the Xtext documentation, but I believe it might be beneficial to provide another example of how to put a DSL to good use. I will not go into every step in great detail, because setting up Xtext is Eclipse 3.6 Helios is just a matter of putting an Update Site URL in, and the New Project wizard provided makes the initial setup a snap. I assume, you have already set up Eclipse and Xtext and created a new Xtext project including a generator project (activate the corresponding checkbox when going through the wizard). In this post I am assuming a project name of com.danielschneller.navi.dsl and a file extension of .navi. When finished we will have the infrastructure ready for editing, parsing and generating code based on files like these: navigation rules for MyApplication mappings { map permission AdminPermission to "privAdmin" map permission DataAccessPermission to "privData" map coordinate Login to "com.danielschneller.myapp.gui.login.LoginController" in "com.danielschneller.myapp.login" map coordinate LoginFailed to "com.danielschneller.myapp.gui.login.LoginFailedController" in "com.danielschneller.myapp.login" map coordinate MainMenu to "com.danielschneller.myapp.gui.menu.MainMenuController" in "com.danielschneller.myapp.menu" map coordinate UserAdministration to "com.danielschneller.myapp.gui.admin.UserAdminController" in "com.danielschneller.myapp.admin" map coordinate DataLookup to "com.danielschneller.myapp.gui.lookup.LookupController" in "com.danielschneller.myapp.lookup" } navigations { define navigation USER_LOGON_FAILED define navigation USER_LOGON_SUCCESS define navigation OK define navigation BACK define navigation ADMIN define navigation DATA_LOOKUP } navrules { from Login on navigation USER_LOGON_FAILED go to LoginFailed on navigation USER_LOGON_SUCCESS go to MainMenu from LoginFailed on navigation OK go to Login from MainMenu on navigation ADMIN go to UserAdministration with AdminPermission on navigation DATA_LOOKUP go to DataLookup with DataAccessPermission from UserAdministration on navigation BACK go to MainMenu from DataLookup on navigation BACK go to MainMenu } As you can see it is a nice little language for defining coordinates in an application, meaning a specific GUI for a certain task and the possible navigation paths between them. Optionally a navigation path can be tagged to require one or more permissions to work. So for example one possible navigation path shown in the above sample is from the applications main menu, identified by the identifier MainMenu and represented in code by the com.danielschneller.myapp.gui.menu.MainMenuController class in the com.danielschneller.myapp.menu OSGi bundle to a GUI identified as DataLookup, implemented by com.danielschneller.myapp.gui.lookup.LookupController in the com.danielschneller.myapp.lookup bundle. For this path to be taken, the application must request the DataLookup navigation path and the currently logged in user be assigned the DataAccessPermission. What exactly that means is not the focus of this tutorial, suffice it to say that we somehow need to get the information contained in this specialized language into our Java application in some shape or form that can be evaluated at runtime. In the following example all information will be transformed into a HashMap based data structure. For our little mobile application this has several advantages over the XML option mentioned earlier: No XML parsing necessary on application startup, saving some performance Validation of the navigation rules ahead of time, preventing parse errors at runtime No libraries needed to access the information – by putting everything in a simple HashMap we do not have to rely on any non-standard classes whatsoever First thing I did when I started with Xtext was define a sample input file such as the one above. Then – following its general structure – I began to extract a formal grammar for it. Of course, the first draft of the sample data was not perfect, over the course of a few iterations I refined some of the syntax, but in the end this is the grammar definition I came up with. It is heavily commented to allow you to copy it out and still leave the documentation intact: grammar com.danielschneller.navi.NavigationRules with org.eclipse.xtext.common.Terminals generate navigationRules "http://com.danielschneller/fw/funkmde/navi/NavigationRules" /* * The top level entry point for the file. * "Root" is just a name as good as any, but * makes the meaning quite clear. */ Root: // first thing in the file is a "keyword", // followed by an attribute that will be // accessible as "name" later and allow // definition of an ID type of thing. 'navigation rules for' name=ID // after the keyword and "name" attribute // three sections follow, each assigned // to an attribute for later reference // (called "mappingdefs", "transitiondefs" // and "ruledefs"). // Their types are defined later in the file. mappingsdefs=Mappings transitiondefs=TransitionDefinitions ruledefs=NavigationRules // semicolon ends the definition of "Root" ; // mappings section >>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* * Definition of the "Mappings" type used in * the "Root" type. */ Mappings: // first the keyword "mappings" is expected, // then an open curly 'mappings' '{' // after that a collection of "Mapping"s is // expected. The "+=" means that they will // all be collected in a collection type element // called "mappings" for future reference. // The "+" at the end means "at least one, but // more is just fine". (mappings+=Mapping)+ // finally the "Mappings" type requires a closing // curly brace. '}' // semicolon ends the definition of "Mappings" ; /* * Definition of a single "Mapping", those we are * collecting in the "mappings" attribute of the * "Mappings" type. */ Mapping: // each mapping starts with the keyword "map" // and is followed by an element of type "MappingSpec" 'map' MappingSpec ; /* * Definition of a "MappingSpec" element. This is * actually just a "parent type" for two more specific * kinds of "MappingSpec": */ MappingSpec: // no keywords are defined here, a "MappingSpec" // can be either a "PermissionMappingSpec" or a // "CoordinateMappingSpec". Any of these will be // fine where a "MappingSpec" is asked for. PermissionMappingSpec | CoordinateMappingSpec ; /* * Definition of a "PermissionMappingSpec" element. */ PermissionMappingSpec: // first the keyword "permission" is required. // then a "name" attribute is expected of type ID. // Following the name the "to" keyword is expected, // followed by a string that is stored in the "value" // attribute 'permission' name=ID 'to' value=STRING ; /* * Definition of a "CoordinateMappingSpec" element. * The definition is very similar to the "PermissionMappingSpec" * but has more attributes. */ CoordinateMappingSpec: // first the keyword "coordinate", then an ID stored as "name", // the keyword "to", followed by a string stored as "controllername", // next the keyword "in" and finally another string, memorized as // "bundleid" 'coordinate' name=ID 'to' controllername=STRING 'in' bundleid=STRING ; // <<<<<<<<<<<<<<<<<<<<<<<<<<<<< mappings section // >>>>>>>>>>>>>>>>>>>>>>>>>>>>> navigations section /* * Definition of the "TransitionDefinitions" type used in * the "Root" type. */ TransitionDefinitions: // first, this element is introduced with the "navigations" // keyword, followed by an open curly brace. 'navigations' '{' // after that a collection of "TransitionDefinition"s is // expected. The "+=" means that they will // all be collected in a collection type element // called "transitions" for future reference. // The "+" at the end means "at least one, but // more is just fine". (transitions+=TransitionDefinition)+ // the element ends with a closing curly brace '}' ; /* * Definition of a "TransitionDefinition" element. This * one is very simple. */ TransitionDefinition: // the keyword "define navigation" is required first, // then a "name" attribute of type ID is expected. 'define navigation' name=ID ; // <<<<<<<<<<<<<<<<<<<<<<<<<<<<< navigations section // >>>>>>>>>>>>>>>>>>>>>>>>>>>>> navrules section /* * Definition of the "NavigationRules" element. */ NavigationRules: // Element starts with the keywords "navrules" and // open curly. 'navrules' '{' // collection attribute called "rules", consisting // of one or more occurrences of a "Rule" element. (rules+=Rule)+ // element finishes with a closing curly keyword '}' ; /* * Definition of a "Rule" element as used in the "NavigationRules" * element. */ Rule: // first the "from" keyword, then a reference to one of the // coordinate mappings defined earlier. This time no new // definition of a coordinate is required, but one of those // that have been listed before. So the type here is put in // square brackets 'from' source=[CoordinateMappingSpec] // following the source specification, one or more "Destination" // type elements are expected, collected in a collection attribute // named "destinations" (destinations+=Destination)+ ; /* * Definition of a "Destination" type. These are collected * in a "Rule". */ Destination: // first comes an "on navigation" keyword. After that a // reference to one of the Transition elements defined // in the "navigations" section is required and stored // in the "transition" attribute. // after that follows a "go to" keyword and a reference // to a coordinate mapping, stored in the "target" attribute. // finally - as with the "destinations" collection attribute // in the "Rule" element - a "permissions" collection is // defined to store none or more (*) "PermissionReference" // elements. 'on navigation' transition=[TransitionDefinition] 'go to' target=[CoordinateMappingSpec] (permissions+=PermissionReference)* ; /* * Definition of a "PermissionReference" type. This is used * in the "permissions" collection of a "Destination". */ PermissionReference: // first, a "with" keyword is expected. After that a // "permission" attribute stores a reference to one of // the previously defined permission mappings from the // "mappings" section. 'with' permission=[PermissionMappingSpec] ; // <<<<<<<<<<<<<<<<<<<<<<<<<<<<< navrules section This is what XText can digest and create an editor plugin and outline view for. Just save this as navigationRules.xtext – when you created the XText project in Eclipse using the wizard it should have been prepared for you. Copying and pasting this into a .xtext file in Eclipse will provide you with syntax highlighting, code completion and syntax checking, making it easy to play around with grammar files. Once done, right click the .mwe2 file lying next to the grammar file in the Package Explorer view and select Run As MWE2 Workflow from the context menu. This will take a moment and generate several classes, both in the current (XText) project and the accompanying ...ui project. Next, right click the Xtext project and select Run As Eclipse Application from the context menu. This will bring up another Eclipse instance with the newly created support for navigation rules files (with a .navi suffix) installed. To try it out, just create a new project and in that a new file. Make sure its name ends in .navi. When asked, make sure to accept adding the Xtext nature to the project. You will be presented with a new, empty editor that already has an error marker in it. This is because according to our grammar definition, an empty file does not comply to all the rules we specified. Try hitting the code-completion shortcut (Ctrl-Space) twice and see what happens: The first code-completion fills in the navigation rules for part. According to the grammar this is the only valid text at the beginning of a file, so it is automatically inserted. Hitting Ctrl-Space again will tell you that now you need a Name of type ID. Just go ahead and try out the completion. It will help you create a syntactically sound navigation rules file. Notice that the Problems View tells you what is currently wrong. Also notice, that one you reach a part where references are expected by the grammar (e. g. when defining source and destination coordinates in a navigation rule) you will get suggestions based on what you entered earlier. This is what the whole sample from above looks like in the editor: While you are still fleshing out and fine tuning your grammar definitions, you will probably close this Eclipse instance and reopen it, once you repeated the Run As MWE2 Workflow steps in the main instance. In the long run I suggest you create a Feature and an Update Site project to allow easier distribution and updates of the intermediate iterations. Generating Code Now, as we have a complete Xtext DSL defined and in place let’s have a look at the Code Generation side of things. This part is completely optional: You are free to include the necessary Xtext libraries into your applications runtime (although they seem to be numerous) and just use them to dynamically load and parse .navi files on-the-fly. This would probably be a good idea if you were writing an Eclipse based application anyway. However, when targeting a very limited platform like JavaME this option is not viable. Instead we will now create a code generator that provides a transformation from the DSL syntax into more classic Java terms – specifically we will create a HashMap based data structure that carries all the same information, but in Java terms. This is a sample of what the generated output is going to look like: public class NaviRules { private Map navigationRules = new Hashtable(); // ... public NaviRules() { NaviDestination naviDest; // ========== From Login (com.danielschneller.myapp.gui.login.LoginController) // ========== On USER_LOGON_FAILED // ========== To LoginFailed (com.danielschneller.myapp.gui.login.LoginFailedController in com.danielschneller.myapp.login) naviDest = new NaviDestination(); naviDest.action = "USER_LOGON_FAILED"; naviDest.targetClassname = "com.danielschneller.myapp.gui.login.LoginFailedController"; naviDest.targetBundleId = "com.danielschneller.myapp.login"; store("com.danielschneller.myapp.gui.login.LoginController", naviDest); // ========== On USER_LOGON_SUCCESS // ========== To MainMenu (com.danielschneller.myapp.gui.menu.MainMenuController in com.danielschneller.myapp.menu) naviDest = new NaviDestination(); naviDest.action = "USER_LOGON_SUCCESS"; naviDest.targetClassname = "com.danielschneller.myapp.gui.menu.MainMenuController"; naviDest.targetBundleId = "com.danielschneller.myapp.menu"; store("com.danielschneller.myapp.gui.login.LoginController", naviDest); // ============================================================================= // ========== From LoginFailed (com.danielschneller.myapp.gui.login.LoginFailedController) // ========== On OK // ========== To Login (com.danielschneller.myapp.gui.login.LoginController in com.danielschneller.myapp.login) naviDest = new NaviDestination(); naviDest.action = "OK"; naviDest.targetClassname = "com.danielschneller.myapp.gui.login.LoginController"; naviDest.targetBundleId = "com.danielschneller.myapp.login"; store("com.danielschneller.myapp.gui.login.LoginFailedController", naviDest); // .... and so on ... } } The support class NaviDestination is omitted but is generally just a value holder struct type class. When creating the Xtext project using the wizard earlier we created a third Eclipse project, ending in ...generator. Its src folder contains three subdirectories called model, templates and workflow. Put the sample .navi file into the model directory. It will serve as the input for the generator. Create the first template Code generation is based on templates. Xtext leverages the Xpand template engine. In the templates directory create a new Xpand template using the context menu. Call it NaviRules.xpt, open it and insert the following: «REM» import the namespace defined in our DSL model «ENDREM» «IMPORT navigationRules» «REM» Define a template called "main" for elements of type "Root". The minus sign at the end takes care of not adding a newline at the end of it. «ENDREM» «DEFINE main FOR Root-» «ENDDEFINE» As there is only one instance of a Root element in a navigation rules file, this will be the main entry point - hence the name. There is no need to call it main, but it seems fitting. Now between the DEFINE and ENDDEFINE insert what is to be generated: As shown above, we need a new Java source file called NaviRules.java: ... «DEFINE main FOR Root-» «FILE "NaviRules.java"-» «ENDFILE-» «ENDDEFINE» ... Again, the contents to be generated is put in between the FILE and ENDFILE brackets. Anything not enclosed in «» will be used verbatim in the output file. So first of all, put in the static parts of the Java file. What I did was first write the source for a single navigation rule by hand, made sure it compiled and then copied over the relevant parts into the template piece by piece: ... «FILE "NaviRules.java"-» import java.util.*; public class NaviRules { public static class NaviDestination { String action; List requiredPermissions = new ArrayList(); String targetClassname; String targetBundleId; NaviDestination() {}; public final List getRequiredPermissions() { return new ArrayList(requiredPermissions); } // let Eclipse generate getters, setters, // equals and hashCode methods for this } private Map navigationRules = new Hashtable(); «ENDFILE-» ... Now, this is nothing special so far. To fill in the elements from the navigation rules DSL file put in the following: ... private Map navigationRules = new Hashtable(); public NaviRules() { NaviDestination naviDest; «REM» Iterate all elements in the "rules" collection attribute of the "ruledefs" attribute of the "Root" element. Call each iterated element (which is of type "Rule") "rule" and expand the "ruletmpl" template for it here. «ENDREM» «FOREACH ruledefs.rules AS r»«EXPAND ruletmpl FOR r»«ENDFOREACH» } ... In the class constructor we first define a local variable naviDest of the previously declared type. Then - as the comment states - the FOREACH instruction will iterate over all Rule type elements. This might not seem to be completely obvious at first. Remember at this point in the template the current scope is the "Root" element from the navigation rules file. It has an attribute called ruledefs as per the grammer definition. This attribute is of type NavigationRules which in turn has a collection attribute called rules, containing of Rule type objects. Inside the loop the current element can then be adressed by the template variable name r. The loop body (between FOREACH and ENDFOREACH) contains another Xpand instruction to expand a template called ruletmpl which will be declared next. Don't worry, even though this is a little difficult at first - switching contexts between the Java and the template scopes is made significantly easier in Eclipse, because the Xpand template editor will syntax color (static parts are blue) and also assist you with code completion inside the Xpand template parts. Ctrl-Spacing your way through it will make things more obvious than they are when reading an example. Now for the ruletmpl template. Place it below the ENDDEFINE statement belonging to the main template: ... «ENDFILE-» «ENDDEFINE» «DEFINE ruletmpl FOR Rule-» // ========== From «source.name» («source.controllername») «FOREACH destinations AS d»«EXPAND destTmpl(source) FOR d»«ENDFOREACH» // ============================================================================= «ENDDEFINE» You see the same idea used again: Static parts that get transferred into the output file 1:1 and Xpand statements that fill in data from the navigation rules definition file. In this case you see references to the attributes of the Rule element. As per the FOREACH instruction in the previous template, the one at hand will be repeated for every instance of Rule in our source file. Inside this definition the current scope is that of Rule, so with «source.name» the name attribute of the CoordinateMappingSpec object referenced as source in a Rule is taken first, then the controllername attribute likewise. Next up another FOREACH loop iterates the one or more possible Destinations of each Rule. Instead of just applying a template (destTmpl) for every Destination we also pass in the corresponding CoordinateMappingSpec stored in the source attribute of the Rule. This is then used in the following template: ... «DEFINE destTmpl(CoordinateMappingSpec source) FOR Destination-» // ========== On «transition.name» // ========== To «target.name» («target.controllername» in «target.bundleid») naviDest = new NaviDestination(); naviDest.action = "«transition.name»"; naviDest.targetClassname = "«target.controllername»"; naviDest.targetBundleId = "«target.bundleid»"; «FOREACH permissions AS p»«EXPAND permTmpl FOR p»«ENDFOREACH» store("«source.controllername»", naviDest); «ENDDEFINE» «DEFINE permTmpl FOR PermissionReference-» naviDest.requiredPermissions.add("«permission.value»"); «ENDDEFINE» In this innermost templates the attributes of the CoordinateMappingSpec objects source and target are accessed and put into place to be assigned to the members a NaviDestination Java object instance per Destination. There is only one more (very simple) template for the PermissionReference elements. With this, the Xpand file is complete. Set Up The Generator Workflow The wizard initially created a NavigationRulesGenerator.mwe2 file in the workflow folder. Open it and replace its contents with the following: module workflow.NavigationRulesGenerator import org.eclipse.emf.mwe.utils.* var targetDir = "src-gen" var fileEncoding = "Cp1252" var modelPath = "src/model" Workflow { component = org.eclipse.xtext.mwe.Reader { path = modelPath // this class has been generated by the xtext generator register = com.danielschneller.navi.NavigationRulesStandaloneSetup {} load = { slot = "root" type = "Root" } } component = org.eclipse.xpand2.Generator { metaModel = org.eclipse.xtend.typesystem.emf.EmfRegistryMetaModel {} expand = "templates::NaviRules::main FOREACH root" outlet = { path = targetDir } fileEncoding = fileEncoding } } The most interesting parts of this workflow file are the load section in the Reader component and the expand and outlet sections in the Generator component: The first one will connect a so-called slot with the Root element from our navigation rules. The second one will trigger the evaluation of the main template in the NaviRules.xpt file in the templates folder and feed any Root instances it finds in the *.navi files from the src/model (modelPath) into it. Now it is time for some actual generation. Run the generator workflow Right click the MWE2 file you just edited and select the Run As MWE2 Workflow command from the context menu. The Eclipse console will show this output: 0 [main] DEBUG org.eclipse.xtext.mwe.Reader - Resource Pathes : [src/model] 431 [main] DEBUG xt.validation.ResourceValidatorImpl - Syntax check OK! Resource: file:/Users/ds/ws/ws36_xtext/com.danielschneller.navi.dsl.generator/src/model/MyApp.navi 1013 [main] INFO org.eclipse.xpand2.Generator - Written 1 files to outlet [default](src-gen) 1014 [main] INFO .emf.mwe2.runtime.workflow.Workflow - Done. Then have a look at the newly generated contents of the src-gen source folder. If everything went alright, you should find a fresh NaviRules.java file placed there, based on the contents of your navigation rules file and the Xpand templates. Try and make some changes to the template, then re-run the workflow. You will see the changes reflected in the generated source file. Generate a second source File In the templates directory add another Xpand template file Navigation.xpt with the following content: «IMPORT navigationRules»; «DEFINE main FOR Root-» «FILE "Navigation.java"-» public final class Navigation { «FOREACH ruledefs.rules.destinations.transition.collect(e|e.name).toSet().sortBy(e|e) AS t»«EXPAND actionTmpl FOR t»«ENDFOREACH» private final String name; private Navigation(String aName) { name = aName; } public String getName() { return name; } } «ENDFILE-» «ENDDEFINE» «DEFINE actionTmpl FOR String-» /** Constant for Navigation «this» */ public static final Navigation «this» = new Navigation("«this»"); «ENDDEFINE» This is a template for a type-safe enumeration that can be used in Java 1.4 - remember I had to do this for JavaME. Notice the FOREACH loop in this case. It demonstrates that not only simple iterations are possible, but that Xpand allows more complex operations as well. In this case it will collect the names of all the navigation transitions from all the Destinations in the navigation rules. These are of type String. They are made unique by converting them to a Set datastructure and then finally sorted in their natural order. The resulting list of sorted strings is then iterated, each one - called t - is passed to the actionTmpl template. It is very simple, just placing the string itself («this») into a single line of Java source code. Of course, strictly speaking this is a rather complicated procedure to get the same information we could also have taken from the TransitionDefinitions element in the rules definition. However I think it serves as a nice example for additional Xpand capabilities. For a full description of its possibilities, have a look at the Xpand Reference in the Eclipse documentation. To use the new template, add another section to the MWE2 workflow definition: component = org.eclipse.xpand2.Generator { metaModel = org.eclipse.xtend.typesystem.emf.EmfRegistryMetaModel {} expand = "templates::Navigation::main FOREACH root" outlet = { path = targetDir } fileEncoding = fileEncoding } Running it again will produce a slightly different output, making clear that two files have been generated. This is what comes out in the src-gen folder as Navigation.java: public final class Navigation { /** Constant for Navigation ADMIN */ public static final Navigation ADMIN = new Navigation("ADMIN"); /** Constant for Navigation BACK */ public static final Navigation BACK = new Navigation("BACK"); /** Constant for Navigation DATA_LOOKUP */ public static final Navigation DATA_LOOKUP = new Navigation("DATA_LOOKUP"); /** Constant for Navigation OK */ public static final Navigation OK = new Navigation("OK"); ... More... This was just about my first experiments with Xtext. I am sure there is plenty more to be done with it. For more reading, please have a look at this very nice Getting started with Xtext tutorial by Peter Friese of Itemis. From http://www.danielschneller.com/2010/08/code-generation-with-xtext.html

August 7, 2010

by Daniel Schneller

· 27,140 Views

Optimizing JPA Performance: An EclipseLink, Hibernate, and OpenJPA Comparison

'Impedance mismatch'. No two words encompass the troubles, headaches and quirks most developers face when attempting to link applications to relational databases (RDBMS). But lets face it, object orientated designs aren't going away anytime soon from mainstream languages and neither are the relational storage systems used in most applications. One side works with objects, while the other with tables. Resolving these differences -- or as its technically referred to 'object/relational impedance mismatch' -- can result in substantial overhead, which in turn can materialize into poor application performance. In Java, the Java Persistence API (JPA) is one of the most popular mechanisms used to bridge the gap between objects (i.e. the Java language) and tables (i.e. relational databases). Though there are other mechanisms that allow Java applications to interact with relational databases -- such as JDBC and JDO -- JPA has gained wider adoption due to its underpinnings: Object Relational Mapping (ORM). ORM's gain in popularity is due precisely to it being specifically designed to address the interaction between object and tables. In the case of JPA, there is a standard body charged with setting its course, a process which has given way to several JPA implementations, among the three most popular you will find: EclipseLink (evolved from TopLink), Hibernate and OpenJPA. But even though all three are based on the same standard, ORM being such a deep and complex topic, beyond core functionality each implementation has differences ranging from configuration to optimization techniques. What I will do next is explain a series of topics related to optimizing an application's use of the JPA, using and comparing each of the previous JPA implementations. While JPA is capable of automatically creating relational tables and can work with a series of relational database vendors, I will part from having pre-existing data deployed on a MySQL relational database, in addition to relying on the Spring framework to facilitate the use of the JPA. This will not only make it a fairer comparison, but also make the described techniques appealing to a wider audience, since performance issues become a serious concern once you have a large volume of data, in addition to MySQL and Spring being a common choice due to their community driven (i.e. open-source) roots. See the source code/application section at the end for instructions on setting up the application code discussed in the remainder of the sections. Download the Source Code associated with this article (~45 MB) The basics: Metrics In order to establish JPA performance levels in an application, it's vital to first obtain a series of metrics related to a JPA implementation's inner workings. These include things like: What are the actual queries being performed against a RDBMS? How long does each query take? Are queries being performed constantly against the RDBMS or is a cache being used? These metrics will be critical to our performance analysis, since they will shed light on the underlying operations performed by a JPA implementation and in the process show the effectiveness or ineffectiveness of certain techniques. In this area you will find the first differences among implementations, and I'm not talking about metric results, but actually how to obtain these metrics. To kick things off, I will first address the topic of logging. By default, all three JPA implementations discussed here -- EclipseLink, Hibernate and OpenJPA -- log the query performed against a RDBMS, which will be an advantage in determining if the queries performed by an ORM are optimal for a particular relational data model. Nevertheless, tweaking the logging level of a JPA implementation further can be helpful for one of two things: Getting even more details from the underlying operations made by a JPA -- which can be turned off by default (e.g. database connection details) -- or getting no logging information at all -- which can benefit a production system's performance. Logging in JPA implementations is managed through one of several logging frameworks, such as Apache Commons Logging or Log4J. This requires the presence of such libraries in an application. Logging configuration of a JPA implementation is mostly done through a value in an application's persistence.xml file or in some cases, directly in a logging framework's configuration files. The following table describes JPA logging configuration parameters: Large table, so here's an external link In addition to the information obtained through logging, there is another set of JPA performance metrics which require different steps to be obtained. One of these metrics is the time it takes to perform a query. Even though some JPA implementations provide this information using certain configurations, some do not. Even so, I opted to use a separate approach and apply it to all three JPA implementations in question. After all, time metrics measured in milliseconds can be skewed in certain ways depending on start and end time criteria. So to measure query times, I will use Aspects with the aid of the Spring framework. Aspects will allow us to measure the time it takes a method containing a query to be executed, without mixing the timing logic with the actual query logic -- the last feature of which is the whole purpose of using Aspects. Further discussing Aspects would go beyond the scope of performance, so next I will concentrate on the Aspect itself. I advise you to look over the accompanying source code, Aspects and Spring Aspects for more details on these topics and their configuration. The following Aspect is used for measuring execution times in query methods. package com.webforefront.aop;import org.apache.commons.lang.time.StopWatch;import org.apache.commons.logging.Log;import org.apache.commons.logging.LogFactory;import org.aspectj.lang.ProceedingJoinPoint;import org.aspectj.lang.annotation.Around;import org.aspectj.lang.annotation.Pointcut;import org.aspectj.lang.annotation.Aspect;@Aspectpublic class DAOInterceptor { private Log log = LogFactory.getLog(DAOInterceptor.class); @Around("execution(* com.webforefront.jpa.service..*.*(..))") public Object logQueryTimes(ProceedingJoinPoint pjp) throws Throwable { StopWatch stopWatch = new StopWatch(); stopWatch.start(); Object retVal = pjp.proceed(); stopWatch.stop(); String str = pjp.getTarget().toString(); log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": " + stopWatch.getTime() + "ms"); return retVal; } The main part of the Aspect is the @Around annotation. The value assigned to this last annotation indicates to execute the aspect method -- logQueryTimes -- each time a method belonging to a class in the com.webforefront.jpa.service package is executed -- this last package is where all our application's JPA query methods will reside. The logic performed by the logQueryTimes aspect method is tasked with calculating the execution time and outputting it as logging information using Apache Commons Logging. Another set of important JPA metrics is related to statistics beyond those provided by standard logging. The statistics I'm referring to are things related to caches, sessions and transactions. Since the JPA standard doesn't dictate any particular approach to statistics, each JPA implementation also varies in the type and way it collects statistics. Both Hibernate and OpenJPA have their own statistics class, where as EclipseLink relies on a Profiler to gather similar metrics. Since I'm already relying on Aspects, I will also use an Aspect to obtain statistics both prior and after the execution of a JPA query method. The following Aspect obtains statistics for an application relying on Hibernate. package com.webforefront.aop;import org.hibernate.stat.Statistics;import org.hibernate.SessionFactory;import org.aspectj.lang.ProceedingJoinPoint;import org.aspectj.lang.annotation.Around;import org.aspectj.lang.annotation.Aspect;import org.springframework.beans.factory.annotation.Autowired;import javax.persistence.EntityManagerFactory;import org.hibernate.ejb.HibernateEntityManagerFactory;import org.apache.commons.logging.Log;import org.apache.commons.logging.LogFactory;@Aspectpublic class CacheHibernateInterceptor { private Log log = LogFactory.getLog(DAOInterceptor.class); @Autowired private EntityManagerFactory entityManagerFactory; @Around("execution(* com.webforefront.jpa.service..*.*(..))") public Object log(ProceedingJoinPoint pjp) throws Throwable { HibernateEntityManagerFactory hbmanagerfactory = (HibernateEntityManagerFactory) entityManagerFactory; SessionFactory sessionFactory = hbmanagerfactory.getSessionFactory(); Statistics statistics = sessionFactory.getStatistics(); String str = pjp.getTarget().toString(); statistics.setStatisticsEnabled(true); log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": (Before call) " + statistics); Object result = pjp.proceed(); log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": (After call) " + statistics); return result; } } Notice the similar structure to the prior timing Aspect, except in this case the logging output contains values that belong to the Statistics Hibernate class obtained via the application's EntityManagerFactory. The next Aspect is used to obtain statistics for an application relying on OpenJPA. package com.webforefront.aop;import org.apache.openjpa.datacache.CacheStatistics;import org.apache.openjpa.persistence.OpenJPAEntityManagerFactory;import org.apache.openjpa.persistence.OpenJPAPersistence;import org.aspectj.lang.ProceedingJoinPoint;import org.aspectj.lang.annotation.Around;import org.aspectj.lang.annotation.Aspect;import org.springframework.beans.factory.annotation.Autowired;import javax.persistence.EntityManagerFactory;import org.apache.commons.logging.Log;import org.apache.commons.logging.LogFactory;@Aspectpublic class CacheOpenJPAInterceptor { private Log log = LogFactory.getLog(DAOInterceptor.class); @Autowired private EntityManagerFactory entityManagerFactory; @Around("execution(* com.webforefront.jpa.service..*.*(..))") public Object log(ProceedingJoinPoint pjp) throws Throwable { OpenJPAEntityManagerFactory ojpamanagerfactory = OpenJPAPersistence.cast(entityManagerFactory); CacheStatistics statistics = ojpamanagerfactory.getStoreCache().getStatistics(); String str = pjp.getTarget().toString(); log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": (Before call) Statistics [start time=" + statistics.start() + ",read count=" + statistics.getReadCount() + ",hit count=" + statistics.getHitCount() +",write count=" + statistics.getWriteCount() + ",total read count=" + statistics.getTotalReadCount() + ",total hit count=" + statistics.getTotalHitCount() +",total write count=" + statistics.getTotalWriteCount()); Object result = pjp.proceed(); log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": (After call) Statistics [start time=" + statistics.start() + ",read count=" + statistics.getReadCount() + ",hit count=" + statistics.getHitCount() +",write count=" + statistics.getWriteCount() + ",total read count=" + statistics.getTotalReadCount() + ",total hit count=" + statistics.getTotalHitCount() +",total write count=" + statistics.getTotalWriteCount()); return result; } } Once again, notice the similar Aspect structure to the previous Aspect which relies on an application's EntityManagerFactory. In this case, the logging output contains values that belong to the CacheStatistics OpenJPA class. Since OpenJPA does not enable statistics by default, you will need to add the following two properties to an application's persistence.xml file: The first property ensures statistics are gathered, while the second property is used to indicate the gathering of statistics take place on a single JVM. NOTE: The value "true(EnableStatistics=true)" also enables caching in addition to statistics. Since EclipseLink doesn't have any particular statistics class and relies on a Profiler to determine advanced metrics, the simplest way to obtain similar statistics to those of Hibernate and OpenJPA is through the Profiler itself. To active EclipseLink's Profiler you just need to add the following property to an application's persistence.xml file: . By doing so, the EclipseLink Profiler output's several metrics on each JPA query method execution as logging information. Now that you know how to obtain several metrics from all three JPA implementations and understand they will be obtained as fairly as possible for all three providers, it's time to put each JPA implementation to the test along with several performance techniques. JPQL queries, weaving and class transformations Lets start by making a query that retrieves data belonging to a pre-existing RDBMS table named "Master". The "Master" table contains over 17,000 records belonging to baseball players. To simplify matters, I will create a Java class named "Player" and map it to the "Master" table in order to retrieve the records as objects. Next, relying on the Spring framework's JpaTemplate functionality, I will setup a query to retrieve all "Player" objects, with the query taking the following form: getJpaTemplate().find("select e from Player e"); See the accompanying source code for more details on this last process. Next, I deploy the application using each of the three JPA implementations on Apache Tomcat, doing so separately, as well as starting and stopping the server on each deployment to ensure fair results. These are the results of doing so on a 64-bit Ubuntu-4GB RAM box, using Java 1.6: All player objects - 17,468 records Time Query Hibernate 3558 ms select player0_.lahmanID as lahmanID0_, player0_.nameFirst as nameFirst0_, player0_.nameLast as nameLast0_ from Master player0_ EclipseLink (Run-time weaver - Spring ReflectiveLoadTimeWeaver weaver ) 3215 ms SELECT lahmanID, nameLast, nameFirst FROM Master EclipseLink (Build-time weaving) 3571 ms SELECT lahmanID, nameLast, nameFirst FROM Master EclipseLink (No weaving) 3996 ms SELECT lahmanID, nameLast, nameFirst FROM Master OpenJPA (Build-time enhanced classes) 5998 ms SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 OpenJPA (Run-time enhanced classes- OpenJPA enhancer) 6136 ms SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 OpenJPA (Non enhanced classes) 7677 ms SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 As you can observe, the queries performed by each JPA implementation are fairly similar, with two of them using a shortcut notation (e.g. t0 and player0 for the table named 'Master'). This syntax variation though has minimal impact on performance, since directly querying an RDBMS using any of these notation variations shows identical results. However, the query times made through several JPA implementations using distinct parameters vary considerably. One important factor leading to this time difference is due to how each implementation handles JPA entities. Lets start with the OpenJPA implementation which had the poorest times. OpenJPA can execute an enhancement process on Java entities (e.g. in this case the 'Player' class). This enhancement process can be performed when the entities are built, at run-time or foregone altogether. As you can observe, foregoing entity enhancement altogether in OpenJPA produced the longest query times. Where as enhancing entities at either build-time or run-time produced relatively better results, with the former beating out the latter. By default, OpenJPA expects entities to be enhanced. This means you will either need to explicitly configure an application to support unenhanced classes by adding the following: ...property to an application's persistence.xml file or enhance classes at build-time or at run-time relying on the OpenJPA enhancer, otherwise an application relying on OpenJPA will throw an error. Given these OpenJPA results, the remaining OpenJPA tests will be based on build-time enhanced entity classes. For more on the topic of OpenJPA enhancement, refer to the OpenJPA documentation in addition to consulting the accompanying source code for this article. You may be wondering what exactly constitutes OpenJPA enhancement ? OpenJPA entity enhancement is a processing step applied to the bytecode generated by the Java compiler which adds JPA specific instructions to provide optimal runtime performance, these instructions can include things like flexible lazy loading and dirty read tracking. So why doesn't Hibernate or EclipseLink enhance entities ? In short, Hibernate and EclipseLink also enhance JPA entites, they just don't outright call it 'enhancement'. EclipseLink calls this 'enhancement' process by the more technical term: weaving. Similar to OpenJPA's enhancement process, weaving in EclipseLink can take place at either build-time (a.k.a. static weaving), run-time or forgone altogether. As you can observe in the results, all of EclipseLink's tests present smaller variations compared to OpenJPA. The longest EclipseLink variation involved not using weaving. If you think about it, this is rather logical given that the purpose of weaving consists of altering Java byte code for the purpose of adding optimized JPA instructions that include lazy loading, change tracking, fetch groups and internal optimizations. For the EclipseLink tests using weaving, both build-time and run-time weaving present better results. For build-time weaving, I used EclipseLink's library along with an Apache Ant task, where as for run-time weaving, I used the Spring framework's ReflectiveLoadTimeWeaver. I can only assume the slightly better performance of using run-time weaving over build-time weaving in EclipseLink was due to the fact of using a weaver integrated with the Spring framework, which in turn could result in better JPA optimizations designed for Spring applications. Nevertheless, considering the test result of forgoing weaving altogether, weaving does not appear to be a major performance impact when using EclipseLink, ceteris paribus. By default, EclipseLink expects run-time weaving to be enabled, otherwise you will receive an error in the form 'Cannot apply class transformer without LoadTimeWeaver specified'. This means that for cases using build-time weaving or no weaving at all, you will need to explicitly indicate this behavior. In order to disable EclipseLink weaving you will need to either configure an application's EntityManagerFactory Spring bean with: ... or add the .... ...property to an application's persistence.xml file. To indicate an application's entities are built using build-time weaving, substitute the previous property's "false" value with "static". To configure the default run-time weaver expected by EclipseLink, add the following: ...property to an application's EntityManagerFactory Spring bean. Given these EclipseLink results, the remaining EclipseLink tests will be based on run-time weaving provided by the Spring framework. For more on the topic of EclipseLink weaving, refer to the EclipseLink documentation at http://wiki.eclipse.org/Introduction_to_EclipseLink_Application_Development_(ELUG)#Using_Weaving, in addition to consulting the accompanying source code for this article. Hibernate doesn't require neither enhancing JPA entities or weaving. For this reason, there is only one test result. This not only makes Hibernate simpler to setup, but judging by its only test result -- which clock's in at second place with respect to all other tests -- Hibernate's performance ranks high compared to its counterparts. However, in what I would consider Hibernate's equivalent to OpenJPA's enhancement process or EclipseLink's weaving, you will find a series of Hibernate properties. For example, Hibernate has properties such as hibernate.default_batch_fetch_size designed to optimize lazy loading. As you might recall, among the purposes of both OpenJPA's enhancement process and EclipseLink's weaving are the optimization of lazy loading. So where as OpenJPA and EclipseLink require a separate and monolithic step -- at build-time or run-time -- to achieve JPA optimization techniques, Hibernate falls back to the use of granular properties specified in an application's persistence.xml file. Nevertheless, given that Hibernate's default behavior proved to be on par with the best query times, I didn't feel a need to further explore with these Hibernate properties. To get another sense of the times and mapping procedures of each JPA implementation, I will make more selective queries based on a Player object's first name and last name. These are the results of performing a query for all Player objects whose first name is John and a query for all Player objects whose last name in Smith. All player objects whose first name is John - 472 records Time Query EclipseLink 1265 ms SELECT lahmanID, nameLast, nameFirst FROM Master WHERE (nameFirst = ?) Hibernate 613 ms select player0_.lahmanID as lahmanID0_, player0_.nameFirst as nameFirst0_, player0_.nameLast as nameLast0_ from Master player0_ where player0_.nameFirst=? OpenJPA 1643 ms SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 WHERE (t0.nameFirst = ?) [params=?] All player objects whose last name is Smith - 146 records Time Query EclipseLink 986 ms SELECT lahmanID, nameLast, nameFirst FROM Master WHERE (nameLastt = ?) Hibernate 537 ms select player0_.lahmanID as lahmanID0_, player0_.nameFirst as nameFirst0_, player0_.nameLast as nameLast0_ from Master player0_ where player0_.nameLast=? OpenJPA 1452 ms SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 WHERE (t0.nameLast = ?) [params=?] These test results tell a slightly different story,with all three JPA implementations presenting substantial time differences amongst one another. At a lower record count, Hibernate's out-of-the-box configuration resulted in almost twice as fast queries as its closest competitor and almost three times faster queries than its other competitor. To get an even broader sense of the times and mapping procedures of each JPA implementation, I will make a query on a single Player object based on its id. These are the results of performing such a query. Single player object whose ID is 777- 1 record Time Query EclipseLink 521 ms SELECT lahmanID, nameLast, nameFirst FROM Master WHERE (lahmanID = ?) Hibernate 157 ms select player0_.lahmanID as lahmanID0_0_, player0_.nameFirst as nameFirst0_0_, player0_.nameLast as nameLast0_0_ from Master player0_ where player0_.lahmanID=? OpenJPA 1052 ms SELECT t0.nameFirst, t0.nameLast FROM Master t0 WHERE t0.lahmanID = ? [params=?] With the exception of the faster query times -- due to it being a query for a single Player object -- the times between JPA implementations are practically in proportion to the queries used for extracting multiple Player objects by first and last name. This will do it as far as test queries are concerned. However, a word of caution is in order when discussing these topics on optimization/enhancement/weaving. Even though the previous tests consisted of querying over 17,000 records and confirm clear advantages of using one provider and technique over another, they are still one dimensional, since they're based on read operations performed on a single object type and a single RDBMS table. JPA can perform a large array of operations that also include updating, writing and deleting RDBMS records, not to mention the execution of more elaborate queries that can span multiple objects and tables. In addition, RDBMS themselves can have influencing factors (e.g. indexes) over JPA query times. So all this said, it's not too far fetched to think the use of OpenJPA entity enhancement, EclipseLink weaving or Hibernate properties, could have varying degrees -- either beneficial or detrimental -- depending on the queries (i.e. multi-table, multi-object) and type of JPA operation (i.e. read, write, update, delete) involved. Next, I will describe one of the most popular techniques used to boost performance in JPA applications. Caches A cache allows data to remain closer to an application's tier without constantly polling an RDBMS for the same data. I entitled the section in plural -- caches -- because there can be several caches involved in an application using JPA. This of course doesn't mean you have to configure or use all the caches provided by an application relying on JPA, but properly configuring caches can go a long way toward enhancing an application's JPA performance. So lets start by analyzing what it's each JPA implementation offers in its out-of-the-box state in terms of caching. The following table illustrates tests done by simply invoking the previous JPA queries for a second and third consecutive time, without stopping the server. Note that the same process of deploying a single application at once was used, in addition to the server being re-started on each set of tests. Query / Implementation EclipseLink Hibernate OpenJPA All records (1st time) 3215 ms 3558 ms 5998 ms All records (2nd time) 507 ms 272 ms 521 ms All records (3rd time) 439 ms 218 ms 263 ms First name (1st time) 1265 ms 613 ms 1643 ms First name (2nd time) 151 ms 115 ms 239 ms First name (3rd time) 154 ms 101 ms 227 ms Last name (1st time) 986 ms 537 ms 1452 ms Last name (2nd time) 41 ms 41 ms 112 ms Last name (3rd time) 65 ms 38 ms 117 ms By ID (1st time) 521 ms 157 ms 1052 ms By ID (2nd time) 1 ms 6 ms 3 ms By ID (3rd time) 1 ms 3 ms 3 ms As you can observe, on both the second and third invocation all the queries show substantial improvements with respect to the first invocation. The primary cause for these improvements is unequivocally due to the use of a cache. But what type of cache exactly ? Could it be an RDBMS's own caching engine ? JPA ? Spring ? Or some other variation ?. In order to shed some light on cache usage, the following table illustrates the cache statistics generated on each of the previous JPA queries. Query / Impleme)ntation EclipseLink Hibernate OpenJPA All records (2nd time) number of objects=17468, total time=506, local time=506, row fetch=65, object building=328, cache=112, sql execute=47, objects/second=34521, sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=34936, queries executed to database=2, query cache puts=0, query cache hits=0, query cache misses=0 N/A All records (3rd time) number of objects=17468, total time=435, local time=435, profiling time=1, row fetch=28, object building=323, cache=106, logging=1, sql execute=27, objects/second=40156, sessions opened=3, sessions closed=3, connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=52404, queries executed to database=3, query cache puts=0, query cache hits=0, query cache misses=0 N/A First name (2nd time) number of objects=472, total time=148, local time=148, row fetch=27, object building=106, cache=7, logging=1, sql execute=3, objects/second=3189, sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=944, queries executed to database=2, query cache puts=0, query cache hits=0, query cache misses=0 N/A First name (3rd time) number of objects=472, total time=152, local time=152, row fetch=20, object building=121, cache=7, sql execute=3, objects/second=3105, sessions opened=3, sessions closed=3 connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=1416, queries executed to database=3, query cache puts=0, query cache hits=0, query cache misses=0 N/A Last name (2nd time) number of objects=146, total time=40, local time=40, row fetch=7, object building=27, cache=2, logging=1, sql execute=3, objects/second=3650, sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=292, queries executed to database=2, query cache puts=0, query cache hits=0, query cache misses=0 N/A Last name (3rd time) number of objects=146, total time=63, local time=63, profiling time=1, row fetch=6, object building=19, cache=5, sql prepare=1, sql execute=23, objects/second=2317, sessions opened=3, sessions closed=3, connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=438 queries executed to database=3, query cache puts=0, query cache hits=0, query cache misses=0 N/A By ID (2nd time) number of objects=1, total time=1, local time=1, time/object=1, objects/second=1000, sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=2, queries executed to database=0, query cache puts=0, query cache hits=0, query cache misses=0 N/A By ID (3rd time) number of objects=1, total time=1, local time=1, time/object=1, objects/second=1000, sessions opened=3, sessions closed=3, connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=3, queries executed to database=0, query cache puts=0, query cache hits=0, query cache misses=0 N/A Notice the statistics generated by each JPA implementation are different. EclipseLink reports a single cache statistic, OpenJPA doesn't even report statistics unless a cache is enabled -- see previous section on metrics for details on this behavior -- and Hibernate reports two cache related statistics: second level cache and query cache. At this juncture, if you look at the test results and statistics for the second and third invocation, something won't add up. How is it that OpenJPA's test results came out faster when caching is disabled by default ? An how about Hibernate returning 0's on its cache related statistics, even when its test results came out faster ? The reason for this performance increase is due to RDBMS caching. On the first query, the RDBMS needs to read data from its own file system (i.e. perform an I/O operation), on subsequent requests the data is present in RDBMS memory (i.e. its cache) making the entire JPA query much faster. A closer look at the Hibernate statistics field 'queries executed to the database' can confirm this. Notice that on every second query it shows 2 and on every third query it shows 3, meaning the data was read directly from the database. NOTE: The only exception to this occurs when a query is made on a single entity (i.e. by id), I will address this shortly. Next, lets start breaking down the caches you will encounter when using JPA applications. The JPA 2.0 standard defines two types of caches: A first level cache and a second level cache. The first level cache or EntityManager cache is used to properly handle JPA transactions. A first level cache only exist for the duration of the EntityManager. With the exception of long lived operations performed against a RDBMS, JPA EntityManager's are short lived and are created & destroyed per request or per transaction. In this case, given the nature of the queries, first level caches are cleared on every query. A second level cache on the other hand is a broader cache that can be used across transactions and users. This makes a JPA second level cache more powerful, since it can avoid constantly polling an RDBMS for the same data. But even though the JPA 2.0 standard now addresses second level cache features, this was not the case in JPA 1.0. In the 1.0 version of the JPA standard only a first level cache was addressed, leaving the door completely open on the topic of a second level cache. This created a fragmented approach to caching in JPA implementations, which even now as JPA 2.0 compliant implementations emerge, some non-standard features continue to be part of certain implementations given the value they provide to JPA caching in general. So as I move forward, bear in mind that just like previous JPA topics, each JPA implementation can have its own particular way of dealing with second level caching. I will start with OpenJPA, which has the least amount of proprietary caching options. To enable OpenJPA caching (i.e. second level caching) you need to declare the following two properties in an application's persistence.xml file: The first property ensures caching and statistics are activated, while the second property is used to indicate caching take place on a single JVM. The following results and statistics were obtained with OpenJPA's second level cache enabled. Query with OpenJPA caching Time Statistics Time without statistics All records (2nd time) 420 ms read count=34936, hit count=17468, write count=17468, total read count=34936, total hit count=17468, total write count=17468 347 ms All records (3rd time) 254 ms read count=52404, hit count=34936, write count=17468, total read count=52404, total hit count=34936, total write count=17468 230 ms First name (2nd time) 125 ms read count=944, hit count=472, write count=472, total read count=944, total hit count=472, total write count=472 127 ms First name (3rd time) 114 ms read count=1416, hit count=944, write count=472, total read count=1416, total hit count=944, total write count=472 132 ms Last name (2nd time) 63 ms read count=292, hit count=146, write count=146, total read count=292, total hit count=146, total write count=146 53 ms Last name (3rd time) 49 ms read count=438, hit count=292, write count=146, total read count=438, total hit count=292, total write count=146 50 ms By ID (2nd time) 5 ms read count=2, hit count=1, write count=1, total read count=2, total hit count=1, total write count=1 1 ms By ID (3rd time) 4 ms read count=3, hit count=2, write count=1, total read count=3, total hit count=2, total write count=1 1 ms As these test results illustrate, executing subsequent JPA queries with OpenJPA's second level cache produce superior results. Another important behavior illustrated in some of these test cases is that by simply disabling statistics -- and still using the second level cache -- query times improve even more. The OpenJPA statistics also demonstrate how the cache is being used. Notice that on each subsequent query the statistics field 'hit count' is duplicated, which means data is being read from the cache (i.e. a hit). Also notice the statistics field 'write count' remains static, which means data is only written once from the RDBMS to the cache. This is pretty basic functionality for a second level cache. On certain occasions a need may arise to interact directly with a cache. These interactions can range from prohibiting an entity from being cached, assigning a particular amount of memory to a cache, forcing an entity to always be cached, flushing all the data contained in a cache, or even plugging-in a third party caching solution to provide a more robust strategy, among other things. The JPA 2.0 standard provides a very basic feature set in terms of second level caching through javax.persistence.Cache. Upon consulting this interface, you'll realize it only provides four methods charged with verifying the presence of entities and evicting them. This feature set not only proves to be limited, but also cumbersome since it can only be leveraged programmatically (i.e. through an API). In this sense, and as I've already mentioned, JPA implementations have provided a series of features ranging from persistence.xml properties to Java annotations related to second level caching. OpenJPA offers several of these second level caching features, including a separate and supplemental cache called a 'query cache' which can further improve JPA performance. For such cases, I will point you directly to OpenJPA's cache documentation available at http://openjpa.apache.org/builds/apache-openjpa-2.1.0-SNAPSHOT/docs/manual/ref_guide_caching.html#ref_guide_cache_query so you can try these parameters for yourself on the accompanying application source code. Hibernate just like OpenJPA has its second level cache disabled. To enable Hibernate's second level cache you need to add the following properties to an application's persistence.xml file: Its worth mentioning that Hibernate has integral support for other second level caches. The previous properties displayed how to enable the HashtableCacheProvider cache -- the simplest of the integral second level caches -- but Hibernate also provides support for five additional caches, which include: EHCache, OSCache, SwarmCache, JBoss cache 1 and JBoss cache 2, all of which provide distinct features, albeit require additional configuration. Besides these properties, Hibernate also requires that each JPA entity be declared with a caching strategy. In this case, since the Person entity is read only, a caching strategy like the following would be used: Similar to OpenJPA, Hibernate also offers several second level caching features through proprietary annotations and configurations, as well as support for the separate and supplemental cache called a 'query cache' which can further improve JPA performance. For such cases, I will also point you directly to Hibernate's cache documentation available at http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html#performance-cache so you can try these parameters for yourself on the accompanying application source code. Unlike OpenJPA and Hibernate, EclipseLink's second level cache is enabled by default, therefore there is no need to provide any additional configuration. However, similar to its counterparts, EclipseLink also has a series of proprietary second level cache features which can enhance JPA performance. You can find more information on these features by consulting EclipseLink's cache documentation available at: http://wiki.eclipse.org/Introduction_to_Cache_(ELUG) With this we bring our discussion on object relational mapping performance with JPA to a close. I hope you found the various tests and metrics presented here a helpful aid in making decisions about your own JPA applications. In addition, don't forget you can rely on the accompanying source code to try out several JPA variations more ad-hoc to your circumstances. About the author Daniel Rubio is an independent technology consultant specializing in enterprise and web-based software. He blogs regularly on these and other software areas at http://www.webforefront.com. He's also authored and co-authored three books on Java technology. Source code/Application installation * Install MySQL on your workstation (Tested on MySQL 5.1.37-64 bits) - http://dev.mysql.com/downloads/ * Install data set on MySQL - Go to http://www.baseball-databank.org/ and click on the link titled 'Database in MySQL form'. This will download a zipped file with a series of MySQL data structures containing baseball statistics. First create a MySQL database to host the data using the command: 'mysqladmin -p create jpaperformance'. This will create a database named 'jpaperformance'. Next, load the baseball statistics using the following command: 'mysql -p -D jpaperformance < BDB-sql-2009-11-25.sql' where 'BDB-sql-2009-11.25.sql' represents the unzipped SQL script obtained by extracting the zip file you dowloaded. * Create JPA application WARs - The download includes source code, library dependencies and an Ant build file. This includes all three JPA implementations Hibernate 3.5.3, EclipseLink 2.1 and OpenJPA 2.1. To build the JPA Hibernate WAR - ant hibernate To build the JPA EclipseLink WAR - ant eclipselink To build the JPA OpenJPA WAR - ant openjpa All builds are placed under the dist/ directories. * Deploy to Tomcat 6.0.26 - Copy the MySQL Java driver and Spring Tomcat Weaver -- included in the download directory 'tomcat_jar_deps' -- to Apache Tomcat's /lib directory. - Copy each JPA application WAR to Apache Tomcat's /webapps directory, as needed. * Deployment URL's http://localhost:8080/hibernate/hibernate/home ( Query all Player objects ) http://localhost:8080/eclipselink/eclipselink/home ( Query all Player objects ) http://localhost:8080/openjpa/openjpa/home ( Query all Player objects ) http://localhost:8080/hibernate/hibernate/firstname/ ( Query Player objects by first name) http://localhost:8080/eclipselink/eclipselink/firstname/ ( Query Player objects by first name) http://localhost:8080/openjpa/openjpa/firstname/ ( Query Player objects by first name) http://localhost:8080/hibernate/hibernate/lastname/ ( Query Player objects by last name) http://localhost:8080/eclipselink/eclipselink/lastname/ ( Query Player objects by last name) http://localhost:8080/openjpa/openjpa/lastname/ ( Query Player objects by last name) http://localhost:8080/hibernate/hibernate/playerid/ (Query Player by id) http://localhost:8080/eclipselink/eclipselink/playerid/ ( Query Player by id) http://localhost:8080/openjpa/openjpa/playerid/ ( Query Player by id)

July 20, 2010

by Daniel Rubio

· 153,792 Views · 2 Likes