Using Sphinx and Java to Implement Free Text Search
As promised, this article shows how to use Sphinx with Java to perform a full-text search. I will begin with an introduction to Sphinx.

Introduction to Sphinx

Databases keep growing, and some hold around 100 million records; at that size an external solution is often needed for full-text search. I have picked Sphinx, an open source full-text search engine distributed under GPL version 2, to perform full-text search on such large amounts of data. It is a standalone search engine meant to provide fast, size-efficient and relevant full-text search functions to other applications, and it integrates well with SQL databases. My example is based on MySQL. Since we cannot easily produce millions of rows to evaluate the real power of Sphinx, we will work with a small amount of data, which should not be a problem for learning purposes.

Here are a few notable Sphinx features:

  • high indexing speed (up to 10 MB/sec on modern CPUs)
  • high search speed (average query is under 0.1 sec on 2-4 GB text collections)
  • high scalability (up to 100 GB of text, up to 100 M documents on a single CPU)
  • provides distributed searching capabilities
  • provides searching from within MySQL through a pluggable storage engine
  • supports boolean, phrase, and word proximity queries
  • supports multiple full-text fields per document (up to 32 by default)
  • supports multiple additional attributes per document (i.e. groups, timestamps, etc.)
  • supports MySQL natively (MyISAM and InnoDB tables are both supported)

The features that matter most for this article are the Java API, which makes it easy to integrate Sphinx with a web application, and the high indexing and searching speed: on average 4-10 MB/sec for indexing and 20-30 ms per query at 5 GB, 3.5M docs (Wikipedia).

Sphinx Terms & How It Works

The first principal part of Sphinx is indexer. It is solely responsible for gathering the data that will be searchable.
From the Sphinx point of view, the data it indexes is a set of structured documents, each of which has the same set of fields. This is biased towards SQL, where each row corresponds to a document and each column to a field. Sphinx builds a special data structure, optimized for queries, from the data provided. This structure is called an index; the process of building it is called indexing, and the Sphinx component that carries it out is called indexer. indexer can be run either from a regular script or from the command line.

Sphinx documents correspond to records in a database:

  • A document is a set of text fields and numeric attributes plus a unique ID, similar to a row in a table.
  • The set of fields and attributes is constant for an index, similar to a table definition.
  • Fields are searchable with full-text queries.
  • Attributes may be used for filtering, sorting and grouping.

searchd is the second principal tool in Sphinx. It is the part of the system that actually handles searches; it functions as a server and is responsible for receiving queries, processing them and returning a result set to the different APIs for client applications. Unlike indexer, searchd is not designed to be run from a regular script or from the command line, but as a daemon started from init.d (on Unix/Linux systems) or as a service (on Windows systems). I am going to focus on the Windows environment, so later I will show how to install Sphinx on Windows as a service.

Finally, search is one of the helper tools within the Sphinx package. Whereas searchd is responsible for searches in a server environment, search is aimed at testing the index quickly without building a framework to connect to the server and process its response. We will use it only for testing Sphinx from the command line; the application itself will query the pre-built index through the searchd service.
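To make the document model above more concrete, here is a minimal Java sketch. It is only an illustration of the terms (unique ID, text fields, numeric attributes) and says nothing about how Sphinx stores or matches data internally. The class name SphinxDocumentSketch, its methods, and the sample values are all hypothetical; the keyword check is a naive stand-in for a real full-text match, and the timestamp assumes the date was converted in UTC.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of one Sphinx document: unique ID + text fields + numeric attributes. */
public class SphinxDocumentSketch {
    private final long id;                                              // unique document ID
    private final Map<String, String> fields = new LinkedHashMap<>();   // full-text searchable
    private final Map<String, Long> attributes = new LinkedHashMap<>(); // filter/sort/group only

    public SphinxDocumentSketch(long id) {
        this.id = id;
    }

    public SphinxDocumentSketch field(String name, String text) {
        fields.put(name, text);
        return this;
    }

    public SphinxDocumentSketch attribute(String name, long value) {
        attributes.put(name, value);
        return this;
    }

    public long getId() {
        return id;
    }

    /** Naive stand-in for full-text matching: case-insensitive whole-word match over fields only. */
    public boolean matchesKeyword(String keyword) {
        String needle = keyword.toLowerCase();
        for (String text : fields.values()) {
            for (String word : text.toLowerCase().split("\\W+")) {
                if (word.equals(needle)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Attribute filter: true if the named attribute exists and lies in [min, max]. */
    public boolean attributeInRange(String name, long min, long max) {
        Long value = attributes.get(name);
        return value != null && value >= min && value <= max;
    }

    public static void main(String[] args) {
        // Sample values only; the DOJ timestamp assumes 2009-03-20 00:00:00 UTC.
        SphinxDocumentSketch doc = new SphinxDocumentSketch(67)
                .field("FName", "himanshu")
                .field("LName", "verma")
                .field("Alias", "u4732")
                .attribute("DOJ", 1237507200L);

        System.out.println(doc.matchesKeyword("U4732"));       // keyword found in a field
        System.out.println(doc.matchesKeyword("1237507200"));  // attributes are not full-text searched
        System.out.println(doc.attributeInRange("DOJ", 1230768000L, 1262303999L)); // joined in 2009?
    }
}
```

The point of the split is that keywords are matched against fields, while attributes such as a joining date are only used to narrow, sort or group the matches.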
Installation on Windows

Now we come to installing Sphinx on Windows:

  • Download Sphinx from the official Sphinx site, http://sphinxsearch.com (I downloaded the Win32 release binaries with MySQL support: sphinx-0.9.9-win32.zip).
  • Unzip the file to some folder. I unzipped it to C:\devel\sphinx-0.9.9-win32 and added the bin directory to the Windows path variable.

That's it, Sphinx is installed. Nice, simple, easy. Later I will explain how to set up indexes and search.

Sample Application

By now the purpose of this article should be clear, so let's define our sample application. We all use an address book to search for people by name or e-mail address when we want to address an e-mail message to a specific person, group of people, or distribution list. We also search for people by other basic information, such as e-mail alias, office location and telephone number. Most people are quite familiar with this kind of search, so let's use the Outlook address book as the model for our sample database schema. Most of the fields are mapped from Microsoft Outlook; the only additional column is the date of joining, so that we can filter our queries by the joining dates of employees.

The example uses Sphinx to search for a particular address entry using free-text search, meaning the user is free to type in anything. The search screen takes the free text plus an optional DOJ (date of joining) parameter. Let's move ahead and define our database.
Since Sphinx works well with MySQL, and MySQL is also free, let's build our database scripts around MySQL (those who wish to install MySQL can download it from http://www.mysql.com). Let's create our sample database 'addressbook':

mysql> create database addressbook;
Query OK, 1 row affected (0.03 sec)

mysql> use addressbook;
Database changed

Note: The fields defined in the following tables are for learning purposes only and may not contain the complete set of fields that the Microsoft address book or any similar software provides.

mysql> CREATE TABLE AddressBook (
    Id int(11) NOT NULL,
    FirstName varchar(30) NOT NULL,
    LastName varchar(30) NOT NULL,
    OfficeId int(11) DEFAULT NULL,
    Title varchar(20) DEFAULT NULL,
    Alias varchar(20) NOT NULL,
    Email varchar(50) NOT NULL,
    DOJ date NOT NULL,
    PhoneNo varchar(20) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

mysql> CREATE TABLE CompanyLocations (
    Id int(11) NOT NULL,
    Location varchar(60) NOT NULL,
    Country varchar(20) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

It's time to put some dummy data into the tables. Our virtual company 'gogs.it' has six offices across India and Singapore, as defined in the following insert script.

mysql> insert into CompanyLocations (Id, Location, Country) VALUES (1, 'Tower One, Harbour Front, Singapore', 'SG');
insert into CompanyLocations (Id, Location, Country) VALUES (2, 'DLF Phase 3, Gurgaon, India', 'IN');
insert into CompanyLocations (Id, Location, Country) VALUES (3, 'Hiranandani Gardens, Powai, Mumbai, India', 'IN');
insert into CompanyLocations (Id, Location, Country) VALUES (4, 'Hinjwadi, Pune, India', 'IN');
insert into CompanyLocations (Id, Location, Country) VALUES (5, 'Toll Post, Nagrota, Jammu, India', 'IN');
insert into CompanyLocations (Id, Location, Country) VALUES (6, 'Bani (Kathua), India', 'IN');

Now comes the real stuff...
The data sphinx is going to index, let's populate that as well...wooooo mysql> INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (1,'Aabheer','Kumar',1,'Mr','u534','[email protected]','2008-9-3', '+911234599990'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (2,'Aadarsh','Gupta',6,'Mr','u668','[email protected]','2007-2-23','+911234599991'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (3,'Aachman','Singh',5,'Mr','u2766','[email protected]','2006-12-18','+911234599992'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (4,'Aadesh','Shrivastav',5,'Mr','u3198','[email protected]','2007-11-23','+911234599993'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (5,'Aadi','manav',1,'Mr','u2686','[email protected]','2010-7-20','+911234599994'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (6,'Aadidev','singh',4,'Mr','u572','[email protected]','2010-8-18','+911234599995'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (7,'Aafreen','sheikh',4,'Smt','u1092','[email protected]','2007-7-11','+911234599996'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (8,'Aakar','Sherpa',5,'Mr','u1420','[email protected]','2009-10-3','+911234599997'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (9,'Aakash','Singh',4,'Mrs','u2884','[email protected]','2008-6-11','+911234599998'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (10,'Aalap','Singhania',4,'Mrs','u609','[email protected]','2010-10-8','+911234599999'); INSERT INTO AddressBook(Id, FirstName, 
LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (11,'Aandaleeb','mahajan',1,'Smt','u131','[email protected]','2010-10-21','+911234580001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (12,'Mamata','kumari',5,'Sh','u2519','[email protected]','2009-6-12','+911234580002'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (13,'Mamta','sharma',6,'Smt','u4123','[email protected]','2009-2-8','+911234580003'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (14,'Manali','singh',6,'Mr','u1078','[email protected]','2008-6-14','+911234580004'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (15,'Manda','saxena',1,'Mrs','u196','[email protected]','2010-9-4','+911234580005'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (16,'Salila','shetty',3,'Miss','u157','[email protected]','2009-11-15','+911234580006'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (17,'Salima','happy',3,'Mrs','u3445','[email protected]','2006-7-14','+911234580007'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (18,'Salma','haik',5,'Sh','u4621','[email protected]','2008-6-23','+911234580008'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (19,'Samita','patil',3,'Smt','u3156','[email protected]','2006-6-7','+911234580009'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (20,'Sameena','sheikh',5,'Mrs','u952','[email protected]','2008-8-13','+911234580010'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (21,'Ranita','gupta',5,'Mrs','u2664','[email 
protected]','2008-10-20','+911234580011'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (22,'Ranjana','sharma',1,'Sh','u3085','[email protected]','2010-6-21','+911234580012'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (23,'Ranjini','singh',6,'Mrs','u4200','[email protected]','2007-4-13','+911234580013'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (24,'Ranjita','vyapari',2,'Smt','u1109','[email protected]','2008-1-22','+911234580014'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (25,'Rashi','gupta',6,'Mrs','u3492','[email protected]','2006-2-2','+911234580015'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (26,'Rashmi','sehgal',3,'Mr','u3248','[email protected]','2008-9-9','+911234580016'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (27,'Rashmika','sexy',1,'Mrs','u4599','[email protected]','2009-3-12','+911234580017'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (28,'Rasika','dulari',3,'Smt','u2089','[email protected]','2009-1-24','+911234580018'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (29,'Dilber','lover',6,'Mr','u4241','[email protected]','2007-10-11','+911234580019'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (30,'Dilshad','happy',1,'Mr','u1564','[email protected]','2007-4-8','+911234580020'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (31,'Dipali','lights',5,'Sh','u1127','[email protected]','2006-11-1','+911234580021'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, 
Email, DOJ, PhoneNo) VALUES (32,'Dipika','lamp',1,'Sh','u2271','[email protected]','2010-12-17','+911234580022'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (33,'Dipti','brightness',5,'Smt','u422','[email protected]','2010-9-25','+911234580023'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (34,'Disha','singh',3,'Sh','u4604','[email protected]','2006-5-2','+911234580024'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (35,'Maadhav','Krishna',1,'Miss','u2561','[email protected]','2007-11-6','+911234580025'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (36,'Maagh','month',5,'Miss','u874','[email protected]','2008-5-8','+911234580026'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (37,'Maahir','Skilled',4,'Mr','u3372','[email protected]','2007-8-4','+911234580027'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (38,'Maalolan','Ahobilam',5,'Mrs','u3498','[email protected]','2007-7-9','+911234580028'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (39,'Maandhata','King',1,'Smt','u2089','[email protected]','2009-9-3','+911234580029'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (40,'Maaran','Brave',2,'Miss','u4020','[email protected]','2008-4-5','+9112345606001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (41,'Maari','Rain',2,'Sh','u3593','[email protected]','2007-12-5','+9112345606002'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (42,'Madan','Cupid',4,'Mrs','u795','[email protected]','2007-11-11','+9112345606003'); 
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (43,'Madangopal','Krishna',3,'Sh','u438','[email protected]','2007-2-19','+9112345606004'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (44,'sahil','gogna',1,'Sh','u2273','[email protected]','2007-10-7','+9112345606005'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (45,'nikhil','gogna',2,'Mr','u1240','[email protected]','2009-9-14','+9112345606006'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (46,'amit','gogna',5,'Sh','u3879','[email protected]','2006-2-8','+9112345606007'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (47,'krishan','gogna',4,'Miss','u3632','[email protected]','2010-9-20','+9112345606008'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (48,'anil','kashyap',4,'Smt','u3939','[email protected]','2010-3-15','+9112345606009'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (49,'sunil','kashyap',5,'Mrs','u3493','[email protected]','2008-3-16','+9112345606010'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (50,'sandy','singh',6,'Mrs','u4691','[email protected]','2009-6-2','+9112345606011'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (51,'vishal','kapoor',3,'Mr','u1087','[email protected]','2010-5-13','+9112345606012'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (52,'bala','ji',5,'Mrs','u4762','[email protected]','2007-8-9','+9112345606013'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES 
(53,'karan','sarin',4,'Miss','u3030','[email protected]','2008-4-8','+9112345606014'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (54,'abhishek','kumar',4,'Miss','u1093','[email protected]','2008-12-21','+9112345605001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (55,'babu','the',1,'Miss','u1055','[email protected]','2008-7-2','+9112345506001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (56,'sandeep','gainda',3,'Miss','u1320','[email protected]','2010-5-14','+9112345606301'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (57,'dheeraj','kumar',3,'Miss','u3685','[email protected]','2007-10-14','+9112345606091'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (58,'dharmendra','chauhan',1,'Smt','u3235','[email protected]','2008-8-1','+9112345806001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (59,'max','alan',3,'Smt','u3465','[email protected]','2009-5-5','+9112345608011'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (60,'hidayat','khan',3,'Smt','u958','[email protected]','2007-11-18','+911234599101'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (61,'himnashu','singh',4,'Miss','u2027','[email protected]','2008-3-2','+911234599102'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (62,'dinesh','kumar',6,'Sh','u3233','[email protected]','2008-5-9','+911234599103'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (63,'toshi','prakash',1,'Mr','u3766','[email protected]','2010-9-17','+911234599104'); INSERT INTO 
AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (64,'niti','puri',3,'Mr','u3575','[email protected]','2009-11-15','+911234599105'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (65,'pawan','tikki',3,'Sh','u3919','[email protected]','2006-3-19','+911234599106'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (66,'gaurav','sharma',2,'Sh','u413','[email protected]','2010-4-2','+911234599107'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (67,'himanshu','verma',2,'Mrs','u4732','[email protected]','2009-3-20','+911234599108'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (68,'priyanshu','verma',3,'Sh','u183','[email protected]','2010-8-12','+911234599109'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (69,'nitika','luthra',2,'Mrs','u4259','[email protected]','2010-7-12','+911234599110'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (70,'neeru','gogna',2,'Sh','u1633','[email protected]','2010-6-23','+91532110000'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (71,'bindu','gupta',1,'Sh','u1859','[email protected]','2006-11-10','+91532110001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (72,'gurleen','bakshi',5,'Miss','u1423','[email protected]','2007-7-1','+91532110003'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (73,'rahul','gupta',3,'Sh','u1223','[email protected]','2009-8-11','+91532110004'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES 
(74,'jagdish','salgotra',3,'Mr','u12','[email protected]','2008-5-19','+91532110005'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (75,'vikas','sharma',3,'Smt','u465','[email protected]','2006-6-2','+91532110006'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (76,'poonam','mahendra',2,'Sh','u1744','[email protected]','2009-12-2','+91532110007'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (77,'pooja','kulkarni',3,'Mrs','u1903','[email protected]','2008-10-6','+91532110008'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (78,'priya','mahajan',6,'Sh','u4205','[email protected]','2010-8-5','+91532110009'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (79,'manoj','zerger',1,'Mrs','u3369','[email protected]','2009-12-4','+91532110010'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (80,'mohan','master',5,'Mr','u2841','[email protected]','2010-10-7','+91532110011');

Please note that the employee data above is dummy data only. I generated it with a small Java program that used random number generators and a file of names, so you may find titles that do not match the names :(

Next we create a procedure that we will call from Java to fetch the records we just inserted.
# When running this from the mysql command-line client, switch the statement
# delimiter first so the semicolons inside the procedure body are not executed early.
DELIMITER //

DROP PROCEDURE IF EXISTS search_address_book //

CREATE PROCEDURE search_address_book(IN address_ids VARCHAR(1000))
BEGIN
    DECLARE search_address_query VARCHAR(2000) DEFAULT '';

    # turn 1,6,7 into '1','6','7' for the IN clause
    SET address_ids = CONCAT('\'', REPLACE(address_ids, ',', '\',\''), '\'');

    SET search_address_query = CONCAT(search_address_query,
        ' select ab.Id as Id , ab.FirstName as FName, ab.LastName as LName, cl.Location as Location, ab.Title as Title, ab.Alias as Alias, ab.Email as Email, ab.DOJ as DOJ, ab.PhoneNo as PhoneNo ');
    SET search_address_query = CONCAT(search_address_query,
        ' from AddressBook ab left join CompanyLocations cl on ab.OfficeId=cl.Id ');
    SET search_address_query = CONCAT(search_address_query,
        ' where ab.id IN (', address_ids ,') ');

    SET @statement = search_address_query;
    PREPARE dynquery FROM @statement;
    EXECUTE dynquery;
    DEALLOCATE PREPARE dynquery;
END //

DELIMITER ;

# To get records for ids 1, 6 and 7, we run the following command:
call search_address_book('1,6,7');

Configuring Sphinx

Setting up Sphinx turned out not to be terribly difficult, but I had a hard time finding instructions on the web, so I'll post my steps here.
By default Sphinx looks for a 'sphinx.conf' configuration file that defines the indexes and other settings. Let's create our own file, addressbook.conf, and define the source and index for our sample application (read the comments between the lines).

#############################################################################
## data source definition
#############################################################################

source addressBookSource
{
    ## SQL settings for 'mysql' ##
    type = mysql

    # some straightforward parameters for SQL source types
    sql_host = localhost
    sql_user = root
    sql_pass = root
    sql_db = addressbook
    sql_port = 3306 # optional, default is 3306

    # pre-query, executed before the main fetch query
    sql_query_pre = SET NAMES utf8

    # main document fetch query, integer document ID field MUST be the first selected column
    sql_query = \
        select ab.Id as Id , ab.FirstName as FName, ab.LastName as LName, cl.Location as Location, \
        ab.Title as Title, ab.Alias as Alias, ab.Email as Email, UNIX_TIMESTAMP(ab.DOJ) as DOJ, ab.PhoneNo as PhoneNo \
        from AddressBook ab left join CompanyLocations cl on ab.OfficeId=cl.Id

    sql_attr_timestamp = DOJ

    # document info query, ONLY for CLI search (i.e. testing and debugging)
    # optional, default is empty
    # must contain $id macro and must fetch the document by that id
    sql_query_info = SELECT * FROM AddressBook WHERE id=$id
}

#############################################################################
## index definition
#############################################################################

# local index example, this is an index which is stored locally in the filesystem
index addressBookIndex
{
    # document source(s) to index
    source = addressBookSource

    # index files path and file name, without extension, make sure you have this folder
    path = C:\devel\sphinx-0.9.9-win32\data\addressBookIndex

    # document attribute values (docinfo) storage mode
    docinfo = extern

    # memory locking for cached data (.spa and .spi), to prevent swapping
    mlock = 0

    morphology = none

    # make sure this file exists
    exceptions = C:\devel\sphinx-0.9.9-win32\data\exceptions.txt

    enable_star = 1
}

#############################################################################
## indexer settings
#############################################################################

indexer
{
    # memory limit, in bytes, kilobytes (16384K) or megabytes (256M)
    # optional, default is 32M, max is 2047M, recommended is 256M to 1024M
    mem_limit = 32M

    # maximum IO calls per second (for I/O throttling)
    # optional, default is 0 (unlimited)
    #
    # max_iops = 40

    # maximum IO call size, bytes (for I/O throttling)
    # optional, default is 0 (unlimited)
    #
    # max_iosize = 1048576

    # maximum xmlpipe2 field length, bytes
    # optional, default is 2M
    #
    # max_xmlpipe2_field = 4M

    # write buffer size, bytes
    # several (currently up to 4) buffers will be allocated
    # write buffers are allocated in addition to mem_limit
    # optional, default is 1M
    #
    # write_buffer = 1M
}

#############################################################################
## searchd settings
#############################################################################

searchd
{
    # hostname, port, or hostname:port, or /unix/socket/path to listen on
    listen = 9312

    # log file, searchd run info is logged here
    # optional, default is 'searchd.log'
    log = C:\devel\sphinx-0.9.9-win32\data\log\searchd.log

    # query log file, all search queries are logged here
    # optional, default is empty (do not log queries)
    query_log = C:\devel\sphinx-0.9.9-win32\data\log\query.log

    # client read timeout, seconds
    # optional, default is 5
    read_timeout = 5

    # request timeout, seconds
    # optional, default is 5 minutes
    client_timeout = 300

    # maximum amount of children to fork (concurrent searches to run)
    # optional, default is 0 (unlimited)
    max_children = 30

    # PID file, searchd process ID file name
    # mandatory
    pid_file = C:\devel\sphinx-0.9.9-win32\data\log\searchd.pid

    # max amount of matches the daemon ever keeps in RAM, per-index
    # WARNING, THERE'S ALSO PER-QUERY LIMIT, SEE SetLimits() API CALL
    # default is 1000 (just like Google)
    max_matches = 1000

    # seamless rotate, prevents rotate stalls if precaching huge datasets
    # optional, default is 1
    seamless_rotate = 1

    # whether to forcibly preopen all indexes on startup
    # optional, default is 0 (do not preopen)
    preopen_indexes = 0
}

# --eof--

Once the configuration is done, it's time to index our SQL data. The command to use is 'indexer', as shown below.

C:\devel\sphinx-0.9.9-win32\bin>indexer.exe --all --config C:\devel\sphinx-0.9.9-win32\addressbook.conf

CONSOLE:
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file 'C:\devel\sphinx-0.9.9-win32\addressbook.conf'...
indexing index 'addressBookIndex'...
collected 80 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 80 docs, 5514 bytes
total 0.057 sec, 96386 bytes/sec, 1398.43 docs/sec
total 2 reads, 0.000 sec, 3.5 kb/call avg, 0.0 msec/call avg
total 7 writes, 0.000 sec, 2.5 kb/call avg, 0.0 msec/call avg

Note: As mentioned earlier, Sphinx creates one document for each row; since we had 80 rows in the database, a total of 80 docs are created.
The time taken is also very small; believe me, I tried with half a million rows and it took around 3-4 seconds :) Cool, isn't it?

Once the index is built, let's try to search for a few records. The utility command to perform a search is 'search'.

OK Sphinx maharaj*, please search for the employee whose alias is u4732.

C:\devel\sphinx-0.9.9-win32\bin>search.exe --config C:\devel\sphinx-0.9.9-win32\addressbook.conf u4732

CONSOLE:
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file 'C:\devel\sphinx-0.9.9-win32\addressbook.conf'...
index 'addressBookIndex': query 'u4732 ': returned 1 matches of 1 total in 0.001 sec
displaying matches:
1. document=67, weight=1, doj=Fri Mar 20 00:00:00 2009
    Id=67
    FirstName=himanshu
    LastName=verma
    OfficeId=2
    Title=Mrs
    Alias=u4732
    [email protected]
    DOJ=2009-03-20
    PhoneNo=+911234599108
words:
1. 'u4732': 1 documents, 1 hits

As you can see above, this is a unique record for Himanshu.

Note: You see a lot of information for the result because of the following line in our configuration file:

sql_query_info = SELECT * FROM AddressBook WHERE id=$id

If you want to see fewer columns, change sql_query_info in the configuration file.

Let's try another search. Sphinx maharaj*, please tell me which rows have gurleen or toshi in them.

C:\devel\sphinx-0.9.9-win32\bin>search.exe --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --any toshi gurleen

CONSOLE:
displaying matches:
1. document=63, weight=2, doj=Fri Sep 17 00:00:00 2010
    Id=63
    FirstName=toshi
    LastName=prakash
    OfficeId=1
    Title=Mr
    Alias=u3766
    [email protected]
    DOJ=2010-09-17
    PhoneNo=+911234599104
2. document=72, weight=2, doj=Sun Jul 01 00:00:00 2007
    Id=72
    FirstName=gurleen
    LastName=bakshi
    OfficeId=5
    Title=Miss
    Alias=u1423
    [email protected]
    DOJ=2007-07-01
    PhoneNo=+91532110003

Exactly two records were returned, which is what we were expecting.
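The doj value in the results above is a timestamp attribute: the source query wraps DOJ in UNIX_TIMESTAMP() and the configuration declares it with sql_attr_timestamp, so any date-of-joining filter has to be expressed in epoch seconds. The following is a minimal sketch of that conversion in Java. The class and method names (DojRangeSketch, toEpochRange) are hypothetical, and the time zone is an assumption: MySQL computes UNIX_TIMESTAMP() in the session time zone, so the zone passed here would need to match the one used when indexing.

```java
import java.time.LocalDate;
import java.time.ZoneId;

/** Hypothetical helper that turns a date-of-joining range into epoch seconds. */
public class DojRangeSketch {

    /** Epoch seconds at 00:00:00 of the given day in the given zone. */
    public static long startOfDayEpochSeconds(LocalDate day, ZoneId zone) {
        return day.atStartOfDay(zone).toEpochSecond();
    }

    /**
     * Inclusive [min, max] range in epoch seconds covering every second
     * from the start of 'from' to the end of 'to'.
     */
    public static long[] toEpochRange(LocalDate from, LocalDate to, ZoneId zone) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("'to' must not be before 'from'");
        }
        long min = startOfDayEpochSeconds(from, zone);
        long max = startOfDayEpochSeconds(to.plusDays(1), zone) - 1;
        return new long[] { min, max };
    }

    public static void main(String[] args) {
        // Assumption: the index was built with MySQL running in UTC.
        ZoneId zone = ZoneId.of("UTC");
        long[] range = toEpochRange(LocalDate.of(2009, 1, 1), LocalDate.of(2009, 12, 31), zone);
        System.out.println(range[0] + " .. " + range[1]);
    }
}
```

The resulting pair of numbers is what a range filter on the DOJ attribute would take as its lower and upper bounds.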
The following special operators and modifiers can be used in the extended matching mode:

  • operator OR: nikhil | sahil
  • operator NOT: hello -sandy, hello !sandy
  • field search operator: @Email [email protected]

For the complete set of search features, I advise you to go through http://sphinxsearch.com/docs/manual-0.9.9.html#searching.

Sphinx as Windows Service

Our main aim is to use Sphinx through the Java API. Before Java can use the power of Sphinx, we need to run 'searchd' as a Windows service so that our Java program can connect to the search engine and query the index we just created. The command to install the service is:

C:\devel\sphinx-0.9.9-win32\bin>searchd.exe --install --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --port 9312 --servicename SphinxSearch

CONSOLE:
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
Installing service...
Service 'SphinxSearch' installed succesfully.

Sphinx is now ready to serve us on port 9312.

Note: If you try to install the service without admin rights, you may get the following error message.

C:\devel\sphinx-0.9.9-win32\bin>searchd.exe --install --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --port 9312 --servicename SphinxSearch

CONSOLE:
Installing service...
FATAL: OpenSCManager() failed: code=5, error=Access is denied.

Once installed, you can start the service with:

c:\>sc start SphinxSearch

(or alternatively from the services screen: run 'services.msc' from the Windows Run dialog)

If you ever want to delete the service, use:

c:\>sc delete SphinxSearch

Let's create an adapter to fetch data from the database.
package it.gogs.sphinx.util; import it.gogs.sphinx.AddressBoook; import it.gogs.sphinx.exception.AddressBookBizException; import it.gogs.sphinx.exception.AddressBookTechnicalException; import java.sql.CallableStatement; import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.SQLException; import java.util.ArrayList; import java.util.List; import org.apache.log4j.Logger; /** * Adapter to fetch data from the database. * * @author Munish Gogna * */ public class AddressBookAdapter { private static Logger logger = Logger.getLogger(AddressBookAdapter.class); private AddressBookAdapter() { // use in static way.. } private static Connection getConnection() throws AddressBookTechnicalException { String userName = "root"; String password = "root"; String url = "jdbc:mysql://localhost/addressbook"; try { Class.forName("com.mysql.jdbc.Driver").newInstance(); return DriverManager.getConnection(url, userName, password); } catch (Exception e) { throw new AddressBookTechnicalException("could not get connection"); } } public static List getAddressBookList(List addressIds) throws AddressBookTechnicalException, AddressBookBizException { List addressBoookList = new ArrayList(); if (addressIds == null || addressIds.size() == 0){ logger.error("AddressIds was null or empty, returning empty list"); return addressBoookList; } Connection connection = null; CallableStatement callableStatement = null; try { connection = getConnection(); callableStatement = connection.prepareCall("{ call search_address_book(?)}"); callableStatement.setString(1, Utils.toCommaString(addressIds)); callableStatement.execute(); ResultSet resultSet = callableStatement.getResultSet(); prepareResults(resultSet, addressBoookList); connection.close(); } catch (SQLException e) { logger.error("Problem connecting MYSQL - " + e.getMessage()); throw new AddressBookTechnicalException(e.getMessage()); } catch (AddressBookTechnicalException e) { logger.error("Problem connecting 
MYSQL - " + e.getMessage()); throw e; } finally{ if(connection != null){ try { connection.close(); } catch (SQLException e) { logger.error("Problem closing conection - " + e.getMessage()); e.printStackTrace(); } } } return addressBoookList; } private static void prepareResults(ResultSet resultSet, List addressBoookList) throws SQLException { AddressBoook addressBoook; while (resultSet.next()) { addressBoook = new AddressBoook(); addressBoook.setAlias(resultSet.getString("Alias")); addressBoook.setEmail(resultSet.getString("Email")); addressBoook.setfName(resultSet.getString("FName")); addressBoook.setlName(resultSet.getString("LName")); addressBoook.setOfficeLocation(resultSet.getString("Location")); addressBoook.setPhoneNo(resultSet.getString("PhoneNo")); addressBoook.setTitle(resultSet.getString("Title")); addressBoook.setDateOfJoining(resultSet.getDate("DOJ")); addressBoook.setId(resultSet.getLong("Id")); addressBoookList.add(addressBoook); } } } Next we create the SphinxInstance that will parse the keywords and date range and provide us a list of Ids that matches the search. package it.gogs.sphinx.util; import it.gogs.sphinx.DateRange; import it.gogs.sphinx.SearchCriteria; import it.gogs.sphinx.api.SphinxClient; import it.gogs.sphinx.api.SphinxException; import it.gogs.sphinx.api.SphinxMatch; import it.gogs.sphinx.api.SphinxResult; import it.gogs.sphinx.exception.AddressBookBizException; import java.util.ArrayList; import java.util.Date; import java.util.List; import org.apache.log4j.Logger; /** * Instance that will parse our free text and provide the results. 
* * Note: Make sure that 'searchd' is up and running before you use this class * @author Munish Gogna * */ public class SphinxInstance { private static String SPHINX_HOST = "localhost"; private static String SPHINX_INDEX = "addressBookIndex"; private static int SPHINX_PORT = 9312; private static SphinxClient sphinxClient; private static Logger logger = Logger.getLogger(SphinxInstance.class); static { sphinxClient = new SphinxClient(SPHINX_HOST, SPHINX_PORT); } public static List getAddressBookIds(SearchCriteria criteria) throws AddressBookBizException, SphinxException { List addressIdsList = new ArrayList(); try { if (Utils.isNull(criteria)) { logger.error("criteria is null"); throw new AddressBookBizException("criteria is null"); } if (Utils.isNull(criteria.getKeywords())) { logger.error("keyword is a required field"); throw new AddressBookBizException("keyword is a required field"); } DateRange dateRange = criteria.getDateRage(); if (!Utils.isNull(dateRange)) { if (Utils.isDateRangeValid(dateRange)) { // this is to filter results based on joining dates if they are provided sphinxClient.SetFilterRange("DOJ", getTimeInSeconds(dateRange.getFromDate()), getTimeInSeconds(dateRange.getToDate()), false); } else { logger.error(" fromDate/toDate should not be empty and 'fromDate' should be less than equal to 'toDate'"); throw new AddressBookBizException("fromDate/toDate should not be empty and 'fromDate' should be less than equal to 'toDate'"); } } sphinxClient.SetMatchMode(SphinxClient.SPH_MATCH_EXTENDED2); sphinxClient.SetSortMode(SphinxClient.SPH_SORT_RELEVANCE, ""); SphinxResult result = sphinxClient.Query(buildSearchQuery(criteria), SPHINX_INDEX, "buidling query for address book search"); SphinxMatch[] matches = result.matches; for (SphinxMatch match : matches) { addressIdsList.add(String.valueOf(match.docId)); } } catch (SphinxException e) { throw e; } catch (AddressBookBizException e) { throw e; } logger.info("Total record(s):" + addressIdsList.size()); return 
addressIdsList; } private static long getTimeInSeconds(Date time) { return time.getTime()/1000; } private static String buildSearchQuery(SearchCriteria criteria) throws AddressBookBizException { String keywords[] = criteria.getKeywords().split(" "); StringBuilder searchFor = new StringBuilder(); for (String key : keywords) { if (!Utils.isEmpty(key)) { searchFor.append(key); if (searchFor.length() > 1) { searchFor.append("*|*"); } } } searchFor.delete(searchFor.lastIndexOf("|*"), searchFor.length()); StringBuilder queryBuilder = new StringBuilder(); String query = searchFor.toString(); queryBuilder.append("@FName *" + query + " | "); queryBuilder.append("@LName *" + query + " | "); queryBuilder.append("@Title *" + query + " | "); queryBuilder.append("@Location *"+ query + " | "); queryBuilder.append("@Alias *" + query + " | "); queryBuilder.append("@Email *" + query + " | "); queryBuilder.append("@PhoneNo *" + query); logger.info("Sphinx Query: " + queryBuilder.toString()); return queryBuilder.toString(); } } Here is the interface that I will expose to the outside world (in my future article I will expose this interface as Web Service) import it.gogs.sphinx.AddressBoook; import it.gogs.sphinx.SearchCriteria; import it.gogs.sphinx.api.SphinxException; import it.gogs.sphinx.exception.AddressBookBizException; import it.gogs.sphinx.exception.AddressBookTechnicalException; import java.util.List; /** * * @author Munish Gogna * */ public interface AddressBook { /** * Returns the list of AddressBook objects based on search criteria. * * @param criteria * @throws AddressBookTechnicalException * @throws AddressBookBizException * @throws SphinxException */ public List getAddressBookList(SearchCriteria criteria) throws AddressBookTechnicalException, AddressBookBizException, SphinxException; } and here is the implementation class for the same. 
package it.gogs.sphinx.addressbook.impl; import java.util.List; import it.gogs.sphinx.AddressBoook; import it.gogs.sphinx.SearchCriteria; import it.gogs.sphinx.addressbook.AddressBook; import it.gogs.sphinx.api.SphinxException; import it.gogs.sphinx.exception.AddressBookBizException; import it.gogs.sphinx.exception.AddressBookTechnicalException; import it.gogs.sphinx.util.AddressBookAdapter; import it.gogs.sphinx.util.SphinxInstance; /** * Implementation for our Address Book example * * @author Munish Gogna * */ public class AddressBookImpl implements AddressBook{ public List getAddressBookList(SearchCriteria criteria) throws AddressBookTechnicalException, AddressBookBizException, SphinxException { List addressIds= SphinxInstance.getAddressBookIds(criteria); return AddressBookAdapter.getAddressBookList(addressIds); } } ok so far so good, let's run some tests now ............ package it.gogs.sphinx.test; import java.util.Calendar; import java.util.GregorianCalendar; import java.util.List; import it.gogs.sphinx.AddressBoook; import it.gogs.sphinx.DateRange; import it.gogs.sphinx.SearchCriteria; import it.gogs.sphinx.addressbook.AddressBook; import it.gogs.sphinx.addressbook.impl.AddressBookImpl; import it.gogs.sphinx.api.SphinxException; import it.gogs.sphinx.exception.AddressBookBizException; import it.gogs.sphinx.exception.AddressBookTechnicalException; import junit.framework.TestCase; /** * * @author Munish Gogna * */ public class AddressBookTest extends TestCase { private AddressBook addressBook; @Override protected void setUp() throws Exception { super.setUp(); addressBook = new AddressBookImpl(); } @Override protected void tearDown() throws Exception { super.tearDown(); } /** this should be a unique record for Himanshu */ public void test_search_for_himanshu() throws Exception { SearchCriteria criteria = new SearchCriteria(); // remember the first 'search' example?? 
criteria.setKeywords("u4732"); List addressList = addressBook.getAddressBookList(criteria); assertTrue(addressList.size() == 1); assertTrue("expecting himanshu here", "himanshu".equals(addressList.get(0).getfName())); } /** only two employees have name gurleen or toshi */ public void test_search_for_gurleen_or_toshi() throws Exception { SearchCriteria criteria = new SearchCriteria(); // remember the second 'search' example?? criteria.setKeywords("gurleen toshi"); List addressList = addressBook.getAddressBookList(criteria); assertTrue(addressList.size() == 2); assertTrue("expecting toshi here", "toshi".equals(addressList.get(0).getfName())); assertTrue("expecting gurleen here", "gurleen".equals(addressList.get(1).getfName())); } /** there are 16 people from jammu location */ public void test_search_for_people_from_jammu_location() throws Exception { SearchCriteria criteria = new SearchCriteria(); criteria.setKeywords("jammu"); List addressList = addressBook.getAddressBookList(criteria); assertTrue(addressList.size() == 16); } /** only Aalap, Manda and nitika are having title as Mrs and joined in 2010 */ public void test_joined_in_2010_with_title_Mrs() throws Exception { DateRange dateRange = new DateRange(); GregorianCalendar calendar1 = new GregorianCalendar(); calendar1.set(Calendar.YEAR, 2010); calendar1.set(Calendar.MONTH, Calendar.JANUARY); calendar1.set(Calendar.DAY_OF_MONTH, 1); dateRange.setFromDate(calendar1.getTime()); GregorianCalendar calendar2 = new GregorianCalendar(); calendar2.set(Calendar.YEAR, 2010); calendar2.set(Calendar.MONTH, Calendar.DECEMBER); calendar2.set(Calendar.DAY_OF_MONTH, 31); dateRange.setToDate(calendar2.getTime()); SearchCriteria criteria = new SearchCriteria(); criteria.setKeywords("Mrs"); criteria.setDateRage(dateRange); List addressList = addressBook.getAddressBookList(criteria); assertTrue("expecting 3 records here", addressList.size() == 3); } /** should get a business exception here */ public void 
test_without_specifying_keywords(){ SearchCriteria criteria = new SearchCriteria(); //criteria.setKeywords("Mrs"); try { addressBook.getAddressBookList(criteria); fail("expecting a business exception when no keywords are set"); } catch (Exception e) { assertTrue(e instanceof AddressBookBizException); assertTrue(e.getMessage().indexOf("keyword is a required field") >-1); } } }
How do we update the index once the database changes?
For these kinds of requirements, we can set up two sources and two indexes: one "main" index for the data that changes only rarely (if ever), and one "delta" index for new documents. The initial data goes into the "main" index and newly inserted address book entries go into "delta". The delta index can then be reindexed very frequently, so new documents become searchable within a matter of minutes. One more thing to take from this article: once the 'searchd' daemon is running, we can't reindex the data in the normal way; we have to use the --rotate option. For applications where the data is updated in periodic batches, we can configure a cron job to reindex our documents in Sphinx as shown below.
C:\devel\sphinx-0.9.9-win32\bin>indexer.exe --all --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --rotate
Capsule
We asked Sphinx to provide the document Ids corresponding to our search parameters and then used those Ids to fire a database query. If the data we want to return is already stored in the index (the DOJ attribute, for example, in our case), we can skip the database portion, so choose wisely how much information (attributes) you include when you index your SQL data. Well, that's all ... it's time to say goodbye. Take good care of your health and don't forget to vote, it's a must :) - Munish Gogna
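The adapter shown earlier hands the Ids returned by Sphinx to the stored procedure through Utils.toCommaString, whose source is not listed in this article. A minimal sketch of such a helper, assuming it simply joins the document Ids with commas, might look like the following. The class name IdListSketch and the decision to skip blank entries are assumptions made for illustration, not the author's code.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical stand-in for the Utils.toCommaString helper that the
 * adapter calls; the real implementation is not shown in the article.
 */
public class IdListSketch {

    /** Joins document Ids into "67,63,72", skipping null or blank entries. */
    public static String toCommaString(List<String> ids) {
        StringBuilder joined = new StringBuilder();
        for (String id : ids) {
            if (id == null || id.trim().isEmpty()) {
                continue; // assumption: blank Ids are dropped rather than rejected
            }
            if (joined.length() > 0) {
                joined.append(',');
            }
            joined.append(id.trim());
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Ids taken from the console examples earlier in the article
        System.out.println(toCommaString(Arrays.asList("67", "63", "72")));
    }
}
```

Because the Ids come from Sphinx as numeric document Ids, joining them into one string argument is probably safe here, but a stored procedure that receives free-form text this way would need its own validation.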
December 7, 2010
by Munish Gogna
· 38,087 Views · 2 Likes
Setting mouse cursor position with WinAPI
Setting the mouse cursor position on a Windows machine with the help of the .NET Framework shouldn't be much of a problem. After all, there is the built-in Cursor class that lets you do that with a single line of code: Cursor.Position = new System.Drawing.Point(0, 0); Here 0 and 0 are the absolute coordinates of the mouse cursor on the screen. One thing to mention about this way of setting the position is that Cursor requires a reference to System.Windows.Forms, and in some cases you don't want this extra reference. If that's the case, WinAPI is your solution. It requires some more work compared to the regular .NET way (class instance -> method call), but in the end you get more control than you might expect. When using WinAPI to set the cursor position, there are two functions you can use: mouse_event and SendInput. mouse_event is the older, more basic function for synthesizing mouse motion and button clicks. It has been superseded and Microsoft recommends using SendInput instead. Nonetheless, it still works (although I cannot say for sure whether it will keep working in future releases of Windows).
So to start, I have a very basic class: class WINAPI_SUPERSEDED { [DllImport("user32.dll",SetLastError=true)] public static extern void mouse_event(uint dwFlags, uint dx, uint dy, uint dwData, int dwExtraInfo); public enum MouseFlags { MOUSEEVENTF_ABSOLUTE = 0x8000, MOUSEEVENTF_LEFTDOWN = 0x0002, MOUSEEVENTF_LEFTUP = 0x0004, MOUSEEVENTF_MIDDLEDOWN = 0x0020, MOUSEEVENTF_MIDDLEUP = 0x0040, MOUSEEVENTF_MOVE = 0x0001, MOUSEEVENTF_RIGHTDOWN = 0x0008, MOUSEEVENTF_RIGHTUP = 0x0010, MOUSEEVENTF_WHEEL = 0x0800, MOUSEEVENTF_XDOWN = 0x0080, MOUSEEVENTF_XUP = 0x0100 } public enum DataFlags { XBUTTON1 = 0x0001, XBUTTON2 = 0x0002 } } So to set the cursor position to 0,0 I would use this: WINAPI_SUPERSEDED.mouse_event((int)WINAPI_SUPERSEDED.MouseFlags.MOUSEEVENTF_MOVE | (int)WINAPI_SUPERSEDED.MouseFlags.MOUSEEVENTF_ABSOLUTE, 0, 0, 0, 0); If you go through the documentation, you will notice that in fact, dx and dy are in no way direct coordinates but rather normalized values ranged between 0 and 65,535 - that is only when the MOUSEEVENTF_ABSOLUTE flag is present. Otherwise, the position will be adjusted according to the current mouse cursor position. This method doesn't return any value so I cannot be informed whether it was successful or not. GetLastError won't give much information either. In this case, SendInput comes to the rescue. 
Here is what I have defined as the main class:
using System;
using System.Runtime.InteropServices;

class WINAPI
{
    [DllImport("kernel32.dll")]
    public static extern uint GetLastError();

    public enum MouseData { XBUTTON1 = 0x0001, XBUTTON2 = 0x0002 }

    public enum MouseFlags
    {
        MOUSEEVENTF_ABSOLUTE = 0x8000,
        MOUSEEVENTF_HWHEEL = 0x01000,
        MOUSEEVENTF_MOVE = 0x0001,
        MOUSEEVENTF_MOVE_NOCOALESCE = 0x2000,
        MOUSEEVENTF_LEFTDOWN = 0x0002,
        MOUSEEVENTF_LEFTUP = 0x0004,
        MOUSEEVENTF_RIGHTDOWN = 0x0008,
        MOUSEEVENTF_RIGHTUP = 0x0010,
        MOUSEEVENTF_MIDDLEDOWN = 0x0020,
        MOUSEEVENTF_MIDDLEUP = 0x0040,
        MOUSEEVENTF_VIRTUALDESK = 0x4000,
        MOUSEEVENTF_WHEEL = 0x0800,
        MOUSEEVENTF_XDOWN = 0x0080,
        MOUSEEVENTF_XUP = 0x0100
    }

    [DllImport("user32.dll", SetLastError=true)]
    public static extern uint SendInput(uint nInputs, ref INPUT pInputs, int cbSize);

    [StructLayout(LayoutKind.Explicit)]
    public struct INPUT
    {
        [FieldOffset(0)] public int type;
        // Offset 4 matches a 32-bit process; see the note below for 64-bit.
        [FieldOffset(4)] public MOUSEINPUT mi;
    }

    public struct MOUSEINPUT
    {
        public int dx;
        public int dy;
        public int mouseData;
        public int dwFlags;
        public int time;
        public IntPtr extraInfo; // dwExtraInfo is pointer-sized (ULONG_PTR)
    }
}
This class is a bit more complicated, but keep in mind that SendInput is used for hardware and keyboard input as well. For experimentation purposes I removed those parts from the sample class. Here you have the same MouseFlags enum that lets you pass custom flags defining the mouse behavior. Notice that SendInput has SetLastError set to true, so if something goes wrong in this method, the error can be obtained via GetLastError, which is declared as a helper method in the same class. VERY IMPORTANT: When you define the INPUT struct, make sure you use LayoutKind.Explicit, since a specific field layout is required when the struct is passed to an unmanaged call; as you can see, every field is decorated with a FieldOffset attribute. The offset of 4 matches a 32-bit process; in a 64-bit process the union inside INPUT starts at offset 8, because dwExtraInfo is pointer-sized (which is why extraInfo is declared as IntPtr here). Also, talking about StructLayout, you don't have to apply LayoutKind.Sequential to MOUSEINPUT, since it is the default layout for C# structs.
When I want to call the method above, I can simply use this snippet:
WINAPI.MOUSEINPUT mouseInput = new WINAPI.MOUSEINPUT();
mouseInput.dx = 100;
mouseInput.dy = 10;
mouseInput.dwFlags = (int)WINAPI.MouseFlags.MOUSEEVENTF_ABSOLUTE | (int)WINAPI.MouseFlags.MOUSEEVENTF_MOVE;
WINAPI.INPUT input = new WINAPI.INPUT();
input.type = 0;
input.mi = mouseInput;
uint x = WINAPI.SendInput(1, ref input, Marshal.SizeOf(input));
Console.WriteLine(x);
Console.WriteLine(WINAPI.GetLastError());
Console.ReadLine();
Notice that I have to call the unmanaged version of sizeof (Marshal.SizeOf) and pass the INPUT struct to it in order for the method to execute correctly. The regular C# sizeof won't cut it here. When I define the type of input, 0 represents the INPUT_MOUSE flag, since I am only handling the mouse here. Of course, I could reorganize my method to accept a set of INPUT instances; the native call itself allows this by taking an array of INPUT together with the number of INPUT instances passed, but that is not required for testing purposes.
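Since dx and dy are normalized values between 0 and 65,535 whenever MOUSEEVENTF_ABSOLUTE is set, the 100 and 10 used above are not pixel positions. A minimal sketch of the pixel-to-normalized conversion follows, written in Java only to keep the arithmetic self-contained; the same expression carries over to C# unchanged. The class name AbsoluteCoords, the method toNormalized and the 1920x1080 resolution are assumptions for illustration, and the rounding convention (mapping the last pixel exactly onto 65,535) is one common choice rather than the only one in use.

```java
/**
 * Sketch: convert a pixel coordinate into the 0..65535 range expected
 * by mouse_event / SendInput when MOUSEEVENTF_ABSOLUTE is set.
 * All names here are hypothetical.
 */
public class AbsoluteCoords {

    /** Largest normalized coordinate (maps to the far edge of the screen). */
    public static final int NORMALIZED_MAX = 65535;

    /**
     * @param pixel      zero-based pixel position along one axis
     * @param screenSize screen width or height in pixels along that axis
     */
    public static int toNormalized(int pixel, int screenSize) {
        if (screenSize < 2) {
            throw new IllegalArgumentException("screenSize must be at least 2");
        }
        if (pixel < 0 || pixel >= screenSize) {
            throw new IllegalArgumentException("pixel is outside the screen");
        }
        // Pixel 0 maps to 0 and the last pixel maps to 65535.
        return (int) Math.round((double) pixel * NORMALIZED_MAX / (screenSize - 1));
    }

    public static void main(String[] args) {
        int width = 1920;  // assumed screen width
        int height = 1080; // assumed screen height
        System.out.println(toNormalized(100, width) + ", " + toNormalized(10, height));
    }
}
```

The two results would then be assigned to dx and dy before calling SendInput.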
November 28, 2010
by Denzel D.
· 17,325 Views
Map Reduce and Stream Processing
The Hadoop Map/Reduce model is very good at processing large amounts of data in parallel. It provides a general partitioning mechanism (based on the key of the data) to distribute the aggregation workload across different machines. Basically, map/reduce algorithm design is all about how to select the right key for the record at each stage of processing. However, the "time dimension" has very different characteristics compared to other dimensional attributes of data, especially when real-time data processing is concerned. It presents a different set of challenges to the batch-oriented Map/Reduce model. Real-time processing demands very low response latency, which means not much data can be accumulated along the "time" dimension before processing. Data collected from multiple sources may not all have arrived at the point of aggregation. In the standard Map/Reduce model, the reduce phase cannot start until the map phase is completed, and all the intermediate data is persisted to disk before being downloaded by the reducer. All of this adds significant latency to the processing. Here is a more detailed description of this high latency characteristic of Hadoop. Although Hadoop Map/Reduce is designed for batch-oriented workloads, certain applications, such as fraud detection, ad display and network monitoring, require real-time responses while processing large amounts of data, and have started to look at various ways of tweaking Hadoop to fit a more real-time processing environment. Here I try to look at some techniques for performing low-latency parallel processing based on the Map/Reduce model.
General stream processing model
In this model, data is produced at various OLTP systems, which update the transaction data store and also asynchronously send additional data for analytic processing. The analytic processing writes its output to a decision model, which feeds information back to the OLTP system for real-time decision making.
Notice the "asynchronous nature" of the analytic processing, which is decoupled from the OLTP system; this way the OLTP system won't be slowed down waiting for the analytic processing to complete. Nevertheless, we still need to perform the analytic processing ASAP, otherwise the decision model will not be very useful because it won't reflect the current picture of the world. What latency is tolerable is application specific.
Micro-batch in Map/Reduce
One approach is to cut the data into small batches based on a time window (e.g. every hour) and submit the data collected in each batch to a Map/Reduce job. A staging mechanism is needed so that the OLTP application can continue independently of the analytic processing. A job scheduler is used to regulate the producer and consumer so each of them can proceed independently.
Continuous Map/Reduce
Here let's imagine some possible modifications of the Map/Reduce execution model to cater for real-time stream processing. I am not trying to worry about backward compatibility with Hadoop, which is the approach that the Hadoop Online Prototype (HOP) is taking.
Long running
The first modification is to make the mapper and reducer long-running. We therefore cannot wait for the end of the map phase before starting the reduce phase, as the map phase never ends. This implies that the mapper pushes the data to the reducer as soon as it completes its processing and lets the reducer sort the data. A downside of this approach is that it offers no opportunity to run the combine() function on the map side to reduce bandwidth utilization. It also shifts more workload to the reducer, which now needs to do the sorting. Notice there is a tradeoff between latency and optimization. Optimization requires more data to be accumulated at the source (i.e. the mapper) so that local consolidation (i.e. combine) can be performed. Unfortunately, low latency requires the data to be sent ASAP, so not much accumulation can be done.
HOP suggests an adaptive flow control mechanism in which data is pushed out to the reducer ASAP until the reducer is overloaded and pushes back (using some sort of flow control protocol). The mapper then buffers the processed messages and performs combine() before sending them to the reducer. This approach automatically shifts the aggregation workload back and forth between the reducer and the mapper.
Time Window: Slice and Range
There is a "time slice" concept and a "time range" concept. "Slice" defines a time window over which results are accumulated before the reduce processing is executed. This is also the minimum amount of data that the mapper should accumulate before sending to the reducer. "Range" defines the time window over which results are aggregated. It can be a landmark window, which has a well-defined starting point, or a jumping window (consider a moving landmark scenario). It can also be a sliding window, where a fixed-size window ending at the current time is aggregated. After receiving a specific time slice from every mapper, the reducer can start the aggregation processing and combine the result with the previous aggregation result. The slice can be dynamically adjusted based on the amount of data sent from the mapper.
Incremental processing
Notice that the reducer needs to compute the aggregated slice value after receiving all records of the same slice from all mappers. After that it calls the user-defined merge() function to merge the slice value into the range value. In case the range needs to be refreshed (e.g. on reaching a jumping window boundary), the init() function is called to get a fresh range value. If the range value needs to be updated (when a certain slice value falls outside a sliding range), the unmerge() function is invoked. Here is an example of how we keep track of the average hit rate (i.e. total hits per hour) within a 24-hour sliding window, with an update every hour (i.e. a one-hour slice).
# Called at each hit record
map(k1, hitRecord) {
  site = hitRecord.site
  # look up the current slice of this particular key
  slice = lookupSlice(site)
  if (now - slice.time > 60.minutes) {
    # Notify reducer that the whole slice of this site has been sent
    advance(site, slice)
    slice = lookupSlice(site)
  }
  emitIntermediate(site, slice, 1)
}

combine(site, slice, countList) {
  hitCount = 0
  for count in countList {
    hitCount += count
  }
  # Send the message to the downstream node
  emitIntermediate(site, slice, hitCount)
}

# Called when the reducer has received the full slice from all mappers
reduce(site, slice, countList) {
  hitCount = 0
  for count in countList {
    hitCount += count
  }
  sv = SliceValue.new
  sv.hitCount = hitCount
  return sv
}

# Called at each jumping window boundary
init(slice) {
  rangeValue = RangeValue.new
  rangeValue.hitCount = 0
  return rangeValue
}

# Called after each reduce()
merge(rangeValue, slice, sliceValue) {
  rangeValue.hitCount += sliceValue.hitCount
}

# Called when a slice falls out of the sliding window
unmerge(rangeValue, slice, sliceValue) {
  rangeValue.hitCount -= sliceValue.hitCount
}
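The merge()/unmerge() bookkeeping in the pseudocode above can be sketched as a small runnable class. This is only an illustration under stated assumptions: it runs in a single process, it treats each call to addSlice as the reducer having already summed one complete slice, and the names SlidingHitWindow, addSlice, total and averagePerSlice are hypothetical rather than part of Hadoop or HOP.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Sketch of a sliding range value maintained incrementally:
 * every new slice is merged in, and the oldest slice is unmerged
 * once the window holds more than maxSlices slices.
 */
public class SlidingHitWindow {

    private final int maxSlices;                  // e.g. 24 one-hour slices
    private final Deque<Long> slices = new ArrayDeque<>();
    private long rangeHitCount = 0;               // plays the role of rangeValue.hitCount

    public SlidingHitWindow(int maxSlices) {
        if (maxSlices < 1) {
            throw new IllegalArgumentException("maxSlices must be positive");
        }
        this.maxSlices = maxSlices;
    }

    /** Called with the hit count of a completed slice (the result of reduce()). */
    public void addSlice(long sliceHitCount) {
        slices.addLast(sliceHitCount);
        rangeHitCount += sliceHitCount;             // merge()
        if (slices.size() > maxSlices) {
            rangeHitCount -= slices.removeFirst();  // unmerge() the slice that fell out
        }
    }

    /** Total hits currently inside the window. */
    public long total() {
        return rangeHitCount;
    }

    /** Average hits per slice over the slices currently in the window. */
    public double averagePerSlice() {
        return slices.isEmpty() ? 0.0 : (double) rangeHitCount / slices.size();
    }
}
```

The point of the design is that each update touches only the incoming slice and the outgoing slice, so the cost per update stays constant no matter how many slices the range covers.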
November 23, 2010
by Ricky Ho
· 17,245 Views · 1 Like
Implementing Retries with a MDB or an MQ Batch Job? (WAS 7, MQ 6)
Both approaches have some advantages and disadvantages and so it’s a question of the likelihood of particular problems and business requirements and priorities.
November 10, 2010
by Jakub Holý
· 27,265 Views
Struts 2 : Creating and Accessing Maps
Today, in this post, I am going to discuss how to create and access HashMaps in Struts 2. My environment has the following jar files: struts2-core-2.1.8.1.jar and ognl-2.7.3.jar. Struts 2 makes extensive use of OGNL in order to retrieve the values of elements. OGNL stands for Object Graph Navigation Language. As the name suggests, OGNL is used to navigate an object graph. In this post, I am going to use the OGNL syntax to create a Map on a JSP page, and show you how to iterate over it to fetch the keys and values from the map. In the following example, I will create a map on the fly in the iterator tag, and will use it in the body of the iterator tag. Note the syntax that has been used to create a Map on a JSP page in Struts 2 using OGNL. Once the map is created, the iterator tag can be used to iterate over each element of the map. Now suppose the map that we want to access is stored in the HttpRequest. Assume that some action somewhere in the chain has put a map in the request using the key "myMap". In order to iterate over the elements of the map, we can do the following. Okay, enough with iterating over maps. I don't think there are any more permutations for accessing maps, but if I do find more, I'll document them here. Consider a case where you don't want to iterate over the entire map. Instead, all you want to do is extract a value from the map based upon a key that is already known to you in your JSP page. Assume that you have a variable in page scope called "runtimeKey" that you set using the s:set tag. The value of this variable is a string key that can be used to get a value from a map. Here is how you can fetch the value from the map without iterating over it. As you see, since the variable "runtimeKey" is an OGNL variable rather than a property of an object on the value stack, it is referenced using the # notation. Also note that instead of using the dot notation, I have used square brackets to fetch the key.
This is because the value of my expression "#runtimeKey" will only be evaluated when it is inside the brackets. Also note that the value of the key runtimeKey is contained within single quotes to direct OGNL to evaluate it as a string when setting it as the value for my key. Consider another situation where your keys follow a pattern, for example key_1, key_2, and you have the values 1 and 2 as page-scoped variables. So now, instead of having the string value of the key directly, you may have to construct the key using concatenation. Your key pattern was set above, and your Map has a key called key_1. So, here is how you would have to concatenate your strings in order to construct the key and fetch the value. Huh. So easy! That's all for now, folks. Stay tuned for more! Happy Programming :) From http://mycodefixes.blogspot.com/2010/11/struts-2-creating-and-accessing-maps.html
November 8, 2010
by Ryan Sukale
· 32,516 Views
MyBatis (formerly iBatis) – Examples and Hints using SELECT, INSERT and UPDATE Annotations
MyBatis is a lightweight persistence framework for Java and .NET. This blog entry addresses the Java side. MyBatis is an alternative positioned somewhere between plain JDBC and ORM frameworks (e.g. EclipseLink or Hibernate). MyBatis usually uses XML, but it has also supported annotations since version 3. The documentation is very detailed for XML, but lacks annotation examples: the annotations themselves are described, but there are no examples of how to use them. I could not find any good and easy examples anywhere, so I will describe some very basic examples for SELECT, INSERT and UPDATE statements by implementing a Data Access Object (DAO) using MyBatis. These examples are a good starting point for creating more complex MyBatis queries using a DAO. You can find the full source code at the end of this blog.
A simple SQL-Table
I use a very simple table with two attributes. The name of the table is "simple_information". The primary key is an integer and may not be null (info_id). The only real data is a character column, which also may not be null (info_content). That is enough "complexity" to learn the usage of MyBatis annotations. The Java class "SimpleInformationEntity" is a POJO which contains these two attributes.
@SELECT-Statement
The @Select annotation is very easy to use if you need exactly one parameter. If you need more than one parameter, use the @Param annotation (which is described below in the update example). You do not have to map the result to a SimpleInformationEntity object yourself, as you would have to do with a JDBC ResultSet. The magic of the framework does this for you.
final String GET_INFO = "SELECT * FROM simple_information WHERE info_id = #{info_id}";
@Select(GET_INFO)
public SimpleInformationEntity getSimpleInformationById(int info_id) throws Exception;
@INSERT-Statement
You can use the object that you want to persist as the parameter. You do not have to use a separate parameter for each attribute of the object. The magic of the framework does this for you.
final String PERSIST_INFO = "INSERT INTO simple_information(info_id, info_content) VALUES (#{infoId}, #{infoContent})";
@Insert(PERSIST_INFO)
public int persistInformation(SimpleInformationEntity simpleInfo) throws Exception;
@UPDATE-Statement
You cannot simply declare more than one parameter on a mapper method. If you want to understand why, look at some MyBatis XML examples in the documentation: there you use the attribute "parameterType", which must be exactly one "parameter"! So you will get a (strange) exception if you use two or more plain parameters. Instead, you have to use the @Param annotation if you need more than one parameter.
final String UPDATE_INFO = "UPDATE simple_information SET info_content = #{newInfo} WHERE info_id = #{infoId}";
@Update(UPDATE_INFO)
public int updateInformation(@Param("infoId") int info_id, @Param("newInfo") String new_content) throws Exception;
Configuration
You have to add your MyBatis mapper interface to the SqlSessionFactory:
sqlSessionFactory.getConfiguration().addMapper(InfoMapper.class);
Some Hints for developing with MyBatis
The following measures helped me a lot in using MyBatis annotations despite the lack of documentation about them:
- MyBatis is open source! So add the sources to your build path and use the debugger of your IDE to step into the MyBatis source code while executing some queries. You will see what MyBatis expects as input and how it is processed.
- Read the documentation about using MyBatis XML. You think this does not make any sense? It does! The processing deep inside MyBatis does not change if you use annotations instead of XML. It is just another way to write persistence queries. E.g. if you know that an XML select query may use exactly one parameterType attribute, then you know that you may use just one parameter in an annotation-based method too! If you need more parameters, you have to use the @Param annotation.
- If you get any strange exception that does not make any sense, then clean and re-compile your project. This often helps, because as with other persistence frameworks such as Hibernate, the bytecode enhancement sometimes confuses your IDE. Conclusion: @MyBatis-Team: Improve and extend the documentation instead of improving the framework itself! MyBatis is a nice lightweight persistence framework, but the documentation is not detailed enough. Some important information is completely missing. Especially if you are new to MyBatis / iBatis, it is very tough to develop with MyBatis using annotations instead of XML. Besides the usage of annotations, another good example of missing documentation is how to configure transactions in MyBatis by using JNDI and a J2EE / JEE application server. You have to use Google to find out, and if you are lucky you will find a mailing list or blog entry describing your problem. If not, you have to try it out. The missing documentation makes MyBatis much tougher to use than it actually is. So in the next months, the MyBatis team should improve and extend the documentation instead of improving the framework itself… Best regards, Kai Wähner (Twitter: @KaiWaehner) [Content from my Blog: MyBatis (formerly iBatis) - Examples and Hints using Select, Insert and Update Annotations - Kai Wähner's IT-Blog] Appendix: Source Code Here you see all the necessary source code, also including a MyBatis Connection Factory, which reads the configuration data from an XML file. 
############################################################################ Connection Factory (using a static initializer) ############################################################################ import java.io.FileNotFoundException; import java.io.IOException; import java.io.InputStream; import java.io.InputStreamReader; import java.io.Reader; import org.apache.ibatis.session.SqlSessionFactory; import org.apache.ibatis.session.SqlSessionFactoryBuilder; import de.waehner.kai.persistence.InfoDAO.InfoMapper; public class MyBatisConnectionFactory { private static SqlSessionFactory sqlSessionFactory; static { Reader reader = null; try { InputStream in = MyBatisConnectionFactory.class.getResourceAsStream("myBatisConfiguration.xml"); reader = new InputStreamReader(in); if (sqlSessionFactory == null) { sqlSessionFactory = new SqlSessionFactoryBuilder().build(reader); sqlSessionFactory.getConfiguration().addMapper(InfoMapper.class); } in.close(); } catch (FileNotFoundException fileNotFoundException) { fileNotFoundException.printStackTrace(); } catch (IOException iOException) { iOException.printStackTrace(); } } public static SqlSessionFactory getSqlSessionFactory() { return sqlSessionFactory; } } ############################################################################ Data Access Object (including the MyBatis-Mapper as inner Class) ############################################################################ import org.apache.ibatis.annotations.Insert; import org.apache.ibatis.annotations.Param; import org.apache.ibatis.annotations.Select; import org.apache.ibatis.annotations.Update; import org.apache.ibatis.session.SqlSession; import org.apache.ibatis.session.SqlSessionFactory; public class InfoDAO { public interface InfoMapper { final String GET_INFO = "SELECT * FROM simple_information WHERE info_id = #{info_id}"; @Select(GET_INFO) public SimpleInformationEntity getSimpleInformationById(int info_id) throws Exception; final String PERSIST_INFO = "INSERT INTO simple_information(info_id, info_content) VALUES (#{infoId}, #{infoContent})"; @Insert(PERSIST_INFO) public int persistInformation(SimpleInformationEntity simpleInfo) throws Exception; final String UPDATE_INFO = "UPDATE simple_information SET info_content = #{newInfo} WHERE info_id = #{infoId}"; @Update(UPDATE_INFO) public int updateInformation(@Param("infoId") int info_id, @Param("newInfo") String new_content) throws Exception; } public SimpleInformationEntity getSingleAlarm(int info_id) throws Exception { SqlSessionFactory sqlSessionFactory = MyBatisConnectionFactory.getSqlSessionFactory(); SqlSession session = sqlSessionFactory.openSession(); try { InfoMapper mapper = session.getMapper(InfoMapper.class); SimpleInformationEntity simpleInfo = mapper.getSimpleInformationById(info_id); return simpleInfo; } finally { session.close(); } } public int persistInformation(SimpleInformationEntity simpleInfo) throws Exception { SqlSessionFactory sqlSessionFactory = MyBatisConnectionFactory.getSqlSessionFactory(); SqlSession session = sqlSessionFactory.openSession(); try { InfoMapper mapper = session.getMapper(InfoMapper.class); int answer = mapper.persistInformation(simpleInfo); return answer; } finally { session.close(); } } public int updateInformation(int info_id, String new_content) throws Exception { SqlSessionFactory sqlSessionFactory = MyBatisConnectionFactory.getSqlSessionFactory(); SqlSession session = sqlSessionFactory.openSession(); try { InfoMapper mapper = session.getMapper(InfoMapper.class); int answer = mapper.updateInformation(info_id, new_content); return answer; } finally { session.close(); } } } ############################################################################ SimpleInformation Entity (a simple POJO) ############################################################################ import java.io.Serializable; public class SimpleInformationEntity implements Serializable{ private static final long serialVersionUID = -821826330941829539L; private int infoId; private String infoContent; public int getInfoId() { return infoId; } public void setInfoId(int infoId) { this.infoId = infoId; } public String getInfoContent() { return infoContent; } public void setInfoContent(String infoContent) { this.infoContent = infoContent; } }
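The @Param rule from the @UPDATE section can be illustrated without MyBatis itself. The following is only a sketch of the general idea, not MyBatis code: NamedSql is a hypothetical helper that turns #{name} placeholders into the positional ? markers of a JDBC PreparedStatement and remembers the names in order. With one argument there is nothing to disambiguate; with two or more, each value has to be found by name, which is the role @Param plays on a mapper method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; this is not part of MyBatis.
final class NamedSql {
    private static final Pattern PLACEHOLDER = Pattern.compile("#\\{(\\w+)\\}");

    final String jdbcSql;               // SQL with positional '?' markers
    final List<String> parameterNames;  // placeholder names in order of appearance

    NamedSql(String sql) {
        List<String> names = new ArrayList<>();
        Matcher matcher = PLACEHOLDER.matcher(sql);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            names.add(matcher.group(1));
            matcher.appendReplacement(out, "?");
        }
        matcher.appendTail(out);
        this.jdbcSql = out.toString();
        this.parameterNames = names;
    }

    // Resolves every placeholder by name; fails loudly if a name is not bound.
    List<Object> bind(Map<String, Object> namedArgs) {
        List<Object> values = new ArrayList<>();
        for (String name : parameterNames) {
            if (!namedArgs.containsKey(name)) {
                throw new IllegalArgumentException("No value bound for #{" + name + "}");
            }
            values.add(namedArgs.get(name));
        }
        return values;
    }
}
```

Note that the order of the bound values follows the SQL text, not the order of the method arguments, which is why names rather than positions are needed.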
November 2, 2010
by Kai Wähner
· 79,351 Views · 2 Likes
Manage Hierarchical Data using Spring, JPA and Aspects
Managing hierarchical data using two-dimensional tables is a pain. There are some patterns to reduce this pain. One such solution is described here. This article is about implementing the same using Spring, JPA, Annotations and Aspects. Please follow the link to better understand the solution described. The purpose is to come up with a component that removes the boilerplate code for handling hierarchical data from the business layer. Summary: (1) Create a base class for Entities used to represent Hierarchical data. (2) Create annotation classes. (3) Code the Aspect that will execute additional steps for managing Hierarchical data (the heart of the solution). (4) Now the Aspect can be used everywhere Hierarchical data is used. Detail Create base class for Entities used to represent Hierarchical data. The purpose of the super class is to encapsulate all the common attributes and operations required for managing hierarchical data in a table. Please note that the class is annotated as @MappedSuperclass. The methods generate the queries required to perform CRUD operations on the table. Their use will become clearer later in the article when we revisit HierarchicalEntity. Now any Entity that extends this class will have all the attributes required to manage hierarchical data. 
import com.es.clms.aspect.HierarchicalEntity; import java.io.Serializable; import javax.persistence.EntityListeners; import javax.persistence.MappedSuperclass; @MappedSuperclass @EntityListeners({HierarchicalEntity.class}) public abstract class AbstractHierarchyEntity implements Serializable { protected Long parentId; protected Long lft; protected Long rgt; public String getMaxRightQuery() { return "Select max(e.rgt) from " + this.getClass().getName() + " e"; } public String getQueryForParentRight() { return "Select e.rgt from " + this.getClass().getName() + " e where e.id = ?1"; } public String getDeleteStmt() { return "Delete from " + this.getClass().getName() + " e Where e.lft between ?1 and ?2"; } public String getUpdateStmtForFirst() { return "Update " + this.getClass().getName() + " e set e.lft = e.lft + ?2 Where e.lft >= ?1"; } public String getUpdateStmtForRight() { return "Update " + this.getClass().getName() + " e set e.rgt = e.rgt + ?2 Where e.rgt >= ?1"; } /* ... getters and setters for all the attributes ... */ } Create annotation classes The following is an annotation class that will be used to annotate the methods that perform CRUD operations on hierarchical data. It is followed by an enum that decides the type of CRUD operation to be performed. These classes will make more sense after the next section. import java.lang.annotation.ElementType; import java.lang.annotation.Retention; import java.lang.annotation.RetentionPolicy; import java.lang.annotation.Target; @Target(ElementType.METHOD) @Retention(RetentionPolicy.RUNTIME) public @interface HierarchicalOperation { HierarchicalOperationType operationType(); } /** * Enum - Type of CRUD operation. */ public enum HierarchicalOperationType { SAVE, DELETE; } Code the Aspect that will execute additional steps for managing Hierarchical data. HierarchicalEntity is an aspect that performs the additional logic required to manage the hierarchical data as described in the article here. 
This is the first time I have used an Aspect, so I am sure that there are better ways to do this. Those of you who are good at it, please improve this part of the code. This class is annotated as @Aspect. The pointcut intercepts any method that is annotated with HierarchicalOperation and takes an argument of type AbstractHierarchyEntity. A sample of its usage is in the next section. The operation method is annotated to be executed before the pointcut. Based on the HierarchicalOperationType passed, this method executes the additional tasks required to either save or delete the hierarchical record. This is where the methods defined in AbstractHierarchyEntity for generating JPA queries are used. GenericDAOHelper is a utility class for using JPA. import com.es.clms.annotation.HierarchicalOperation; import com.es.clms.annotation.HierarchicalOperationType; import com.es.clms.common.GenericDAOHelper; import com.es.clms.model.AbstractHierarchyEntity; import org.aspectj.lang.JoinPoint; import org.aspectj.lang.annotation.Aspect; import org.aspectj.lang.annotation.Before; import org.aspectj.lang.annotation.Pointcut; import org.springframework.stereotype.Service; import org.springframework.beans.factory.annotation.Autowired; @Aspect @Service("hierarchicalEntity") public class HierarchicalEntity { @Autowired private GenericDAOHelper genericDAOHelper; @Pointcut(value = "execution(@com.es.clms.annotation.HierarchicalOperation * *(..)) " + "&& args(AbstractHierarchyEntity)") private void hierarchicalOps() { } /** * * @param jp * @param hierarchicalOperation */ @Before("hierarchicalOps() && @annotation(hierarchicalOperation) ") public void operation(final JoinPoint jp, final HierarchicalOperation hierarchicalOperation) { if (jp.getArgs().length != 1) { throw new IllegalArgumentException( "Expecting only one parameter of type AbstractHierarchyEntity in " + jp.getSignature()); } if (HierarchicalOperationType.SAVE.equals( hierarchicalOperation.operationType())) { save(jp); } else if 
(HierarchicalOperationType.DELETE.equals( hierarchicalOperation.operationType())) { delete(jp); } } /** * * @param jp */ private void save(JoinPoint jp) { AbstractHierarchyEntity entity = (AbstractHierarchyEntity) jp.getArgs()[0]; if (entity == null) return; if (entity.getParentId() == null) { Long maxRight = (Long) genericDAOHelper.executeSingleResultQuery( entity.getMaxRightQuery()); if (maxRight == null) { maxRight = 0L; } entity.setLft(maxRight + 1); entity.setRgt(maxRight + 2); } else { Long parentRight = (Long) genericDAOHelper.executeSingleResultQuery( entity.getQueryForParentRight(), entity.getParentId()); entity.setLft(parentRight); entity.setRgt(parentRight + 1); genericDAOHelper.executeUpdate( entity.getUpdateStmtForFirst(), parentRight, 2L); genericDAOHelper.executeUpdate( entity.getUpdateStmtForRight(), parentRight, 2L); } } /** * * @param jp */ private void delete(JoinPoint jp) { AbstractHierarchyEntity entity = (AbstractHierarchyEntity) jp.getArgs()[0]; genericDAOHelper.executeUpdate( entity.getDeleteStmt(), entity.getLft(), entity.getRgt()); Long width = (entity.getRgt() - entity.getLft()) + 1; genericDAOHelper.executeUpdate( entity.getUpdateStmtForFirst(), entity.getRgt(), width * (-1)); genericDAOHelper.executeUpdate( entity.getUpdateStmtForRight(), entity.getRgt(), width * (-1)); } } Sample Usage From this point on you don't have to worry about the additional tasks required for managing the data. Just use the HierarchicalOperation annotation with the appropriate HierarchicalOperationType. Below is a sample use of the code developed so far. @HierarchicalOperation(operationType = HierarchicalOperationType.SAVE) public long save(VariableGroup group) { entityManager.persist(group); return group.getId(); } @HierarchicalOperation(operationType = HierarchicalOperationType.DELETE) public void delete(VariableGroup group) { entityManager.remove(entityManager.merge(group)); } http://rajeshkilango.blogspot.com/2010/10/manage-hierarchical-data-using-spring.html
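To see what the aspect's save and delete steps do to the lft and rgt values, it may help to replay the same arithmetic on an in-memory list instead of a database table. This is a minimal sketch under that assumption: NestedSetTree and its Node class are hypothetical names, and the list stands in for the table that the JPA update statements would modify.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the table; mirrors the arithmetic of save() and delete().
final class NestedSetTree {
    static final class Node {
        final String name;
        long lft;
        long rgt;

        Node(String name) {
            this.name = name;
        }
    }

    private final List<Node> rows = new ArrayList<>();

    // No parent: append after the current maximum rgt (or start at 1 for an empty table).
    Node addRoot(String name) {
        long maxRight = 0;
        for (Node n : rows) {
            maxRight = Math.max(maxRight, n.rgt);
        }
        Node node = new Node(name);
        node.lft = maxRight + 1;
        node.rgt = maxRight + 2;
        rows.add(node);
        return node;
    }

    // With a parent: open a gap of width 2 at the parent's rgt, then place the child in it.
    Node addChild(Node parent, String name) {
        long parentRight = parent.rgt;
        shift(parentRight, 2);
        Node node = new Node(name);
        node.lft = parentRight;
        node.rgt = parentRight + 1;
        rows.add(node);
        return node;
    }

    // Remove the node and its whole subtree, then close the gap it leaves behind.
    void delete(Node node) {
        long lft = node.lft;
        long rgt = node.rgt;
        long width = rgt - lft + 1;
        rows.removeIf(n -> n.lft >= lft && n.lft <= rgt);
        shift(rgt, -width);
    }

    // Same effect as the two update statements: move every lft and rgt at or beyond 'from'.
    private void shift(long from, long delta) {
        for (Node n : rows) {
            if (n.lft >= from) {
                n.lft += delta;
            }
            if (n.rgt >= from) {
                n.rgt += delta;
            }
        }
    }
}
```

In a real table the two shifts run as bulk updates before the new row is persisted, so concurrent writers would need to be serialized; the sketch ignores that concern.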
October 21, 2010
by Rajesh Ilango
· 16,995 Views
Practical PHP Patterns: Record Set
This is the last article from the Practical PHP Patterns series. Stay tuned on css.dzone.com for the new series, Practical PHP Testing Patterns. The Record Set pattern's goal is to represent a set of relational database rows, with the main purpose of giving access to their values (a data structure), sometimes with the possibility of modification (via single Row Data Gateway instances). Despite the term set, the rows usually have a defined order. The intent is ultimately to represent the result of an SQL query with an object, to gain the usual advantages of objects over scalars and functions: it can be passed around while maintaining its encapsulated behavior, and it can be injected, mocked, wrapped and so on. A Record Set provided by an external extension or library is usually not mocked; instead, a real one is instantiated against a lightweight database such as SQLite, which stands in for the production database. Especially in the PHP world, this solution is fast enough to have become a standard. Today, Record Set is used less on the client side, in favor of Object-Relational Mapping approaches, which perform some kind of translation over the raw rows (what a horrible pun). Record Set instead maintains by definition a one-to-one relationship with the table rows, and it is still widespread in ORM internals (such as Doctrine's own code) or in applications that call PDO directly. It is a fairly basic pattern, but given that a vast part of legacy PHP applications still uses mysql_query()... Interesting things happen when... Interesting uses of this pattern arise when some component sits in the middle between the Record Set creation and the user interface, with the goal of modifying or decorating it. The UI can then explore the Record Set and automatically generate itself, in a form of scaffolding. Continuing on this line of thought, UI components can edit the Record Set without knowing the model it refers to, by building forms driven by the Record Set metadata. 
This solution is widespread, but it does not scale to Domain Models with a level of complexity greater than plain arrays. Of course, the Record Set may also encapsulate business logic as a low-cost form of Domain Model. In this case, the ability to unlink it from the database connection is important for its serializability and ease of testing. Examples PDOStatement represents both a SQL query and, after it has been executed, a Record Set implementation. When fetching all the results, it returns an array. In other languages Record Sets are more evolved and can for example be used to navigate a table and modify only certain records (via their associated Row Data Gateway). PDOStatement is used only for reading. If you want further functionality (which obviously depends on your domain), you should create your own Record Set accordingly. It is probably best to wrap the PDOStatement, because extending it is out of the question: its instantiation is not under our control. <?php class TweetsTable { private $connection; public function __construct(PDO $connection) { $this->connection = $connection; } public function getTweetsRecordSet($username) { /** * @var PDOStatement this is a Record Set */ $stmt = $this->connection->prepare('SELECT * FROM tweets WHERE username = :username'); $stmt->bindValue(':username', $username, PDO::PARAM_STR); $stmt->execute(); return $stmt; } } $pdo = new PDO('sqlite::memory:'); $pdo->exec('CREATE TABLE tweets (id INT NOT NULL, username VARCHAR(255) NOT NULL, text VARCHAR(255) NOT NULL, PRIMARY KEY(id))'); $pdo->exec('INSERT INTO tweets (id, username, text) VALUES (42, "giorgiosironi", "Practical PHP Patterns has come to an end")'); $pdo->exec('INSERT INTO tweets (id, username, text) VALUES (43, "giorgiosironi", "Cool series: will continue as Practical PHP Testing Patterns")'); /* client code */ $table = new TweetsTable($pdo); $recordSet = $table->getTweetsRecordSet('giorgiosironi'); while ($row = $recordSet->fetch()) { var_dump($row['text']); }
October 18, 2010
by Giorgio Sironi
· 3,318 Views
Enum tricks: hierarchical data structure
Java enums are typically used to hold array-like data. This tip shows how to use an enum for hierarchical structures. Motivation Once upon a time I wanted to create an enum that contains various operating systems, i.e. public enum OsType { WindowsNTWorkstation, WindowsNTServer, Windows2000Server, Windows2000Workstation, WindowsXp, WindowsVista, Windows7, Windows95, Windows98, Fedora, Ubuntu, Knopix, SunOs, HpUx, } I did not like this structure because I'd like to see a group WindowsNT that contains WindowsNTWorkstation and WindowsNTServer. All Windows versions should be in a super group "Windows". Fedora, Knopix and Ubuntu are distributions of Linux. All Linux distributions together with SunOs and HpUx are Unix systems. All Windows systems have common properties. The same is true of Unix systems. And I hate copy/paste programming. Solutions As always there are several solutions. Class per OS Solution The obvious solution here is to create separate classes per operating system and abstract classes that represent OS groups. For example class Fedora extends class Linux, which extends class Unix, which extends class OperatingSystem. We can enjoy all the advantages of inheritance, so all common properties of Windows systems are stored in class Windows and can be overridden by its subclasses. But now we cannot see all operating systems together, iterate over them etc., i.e. very useful features of Java enums are missing. No problem! 
Now we can create an enum like the previous one that holds a custom field of type Class: public enum OsType { WindowsNTWorkstation(WindowsNTWorkstation.class), WindowsNTServer(WindowsNTServer.class), Windows2000Server(Windows2000Server.class), Windows2000Workstation(Windows2000Workstation.class), WindowsXp(WindowsXp.class), WindowsVista(WindowsVista.class), Windows7(Windows7.class), Windows95(Windows95.class), Windows98(Windows98.class), Fedora(Fedora.class), Ubuntu(Ubuntu.class), Knopix(Knopix.class), SunOs(SunOs.class), HpUx(HpUx.class), ; private Class clazz; OsType(Class clazz) { this.clazz = clazz; } } This solution is better but it still has disadvantages: Implementing a method that retrieves all "children" of a specific OS (for example all Linux distributions) is hard and inefficient. Grouping is separate from the enum. The solution is very verbose: each OS is represented by its own class even if the class has nothing to override. Hierarchical Enum To create a hierarchy using an enum we need a custom field "parent" that is initialized by the constructor: public enum OsType { OS(null), Windows(OS), WindowsNT(Windows), WindowsNTWorkstation(WindowsNT), WindowsNTServer(WindowsNT), Windows2000(Windows), Windows2000Server(Windows2000), Windows2000Workstation(Windows2000), WindowsXp(Windows), WindowsVista(Windows), Windows7(Windows), Windows95(Windows), Windows98(Windows), Unix(OS) { @Override public boolean supportsXWindowSystem() { return true; } }, Linux(Unix), AIX(Unix), HpUx(Unix), SunOs(Unix), ; private OsType parent = null; private OsType(OsType parent) { this.parent = parent; } } This structure allows implementing a method "is" that works like the operator instanceof for classes and interfaces. For example Windows2000 is Windows, Fedora is Linux, Windows is not Unix etc. 
public boolean is(OsType other) { if (other == null) { return false; } for (OsType t = this; t != null; t = t.parent) { if (other == t) { return true; } } return false; } Sometimes we need a method that returns all "children" of the current node, e.g. all Linux systems or all variants of Windows2000. The easiest way to implement this is to hold a collection of children per element and fill it from the constructor: private List<OsType> children = new ArrayList<OsType>(); private OsType(OsType parent) { this.parent = parent; if (this.parent != null) { this.parent.addChild(this); } } Now the method "children()" that returns the node's direct children is trivial: public OsType[] children() { return children.toArray(new OsType[children.size()]); } It is not hard to implement a recursive method "allChildren()" that returns all descendants of the current node (see full source code). But the term hierarchy is always accompanied by inheritance, which allows overriding methods of the parent. This is a basic feature of classes in all object-oriented languages. Is it possible to implement a kind of inheritance relationship for elements of one enum? Overriding parent's method Unix systems support the X Window graphical environment. MS Windows does not. We would like to be able to ask an OS whether it supports X Window. We can define a boolean flag "supportsX" and a boolean method public boolean supportsX() {return supportsX;} Now we have to add yet another argument to the OsType constructor and pass true/false for each element of the enum. But that is too verbose. Is it possible to say that Unix supports X, Windows does not support X, and be sure that Fedora's supportsX() returns true while Windows95's supportsX() returns false? The implementation is pretty simple. First, for simplicity, let's say that X Window is supported by all Unix systems and is not supported by others. 
So, we can implement the method supportsXWindowSystem() at enum level as follows: public boolean supportsXWindowSystem() { return false; } Now we have to override it for all Unix systems. To implement this we change the default implementation to the following: public boolean supportsXWindowSystem() { return parent == null ? false : parent.supportsXWindowSystem(); } The implementation of the nearest ancestor that overrides the method will be used. If no ancestor overrides it, the call reaches the root element, which returns false. Now we can say the following: ... Unix(OS) { @Override public boolean supportsXWindowSystem() { return true; } }, Linux(Unix), AIX(Unix), HpUx(Unix), SunOs(Unix), ... The method is overridden for the Unix element only and all its children will use this method. The problem is solved. We have an enum-based hierarchical polymorphic structure! We can implement a method in the base element (using it like a super class) and then override it in any element we want. The only disadvantage of this solution is that we have to write a similar delegating implementation for each method we add to this enum and for all other enums that hold a hierarchical structure. Conclusions Although we are used to treating enums as a kind of static array, they can also be used to represent hierarchical tree-like data structures where each node can find its parent and its children, and can even inherit and override its parent's methods almost exactly as we do with class inheritance. Acknowledgments The first version of this article was written in my blog. The method supportsXWindowSystem() there was implemented using reflection. Four people discussed this solution and suggested that I simplify it. I would like to thank Todo, Anton Dumler, Nick and Johannes Schneider, who helped me to improve the article.
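Putting the pieces of this article together, a compact version of the hierarchical enum might look like the sketch below. It is not the author's full source: the set of constants is shortened, Fedora and Ubuntu are added here only so that the Linux branch has children, and children are returned as a list rather than an array.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Shortened variant of the article's OsType; Fedora and Ubuntu are added for illustration.
enum OsType {
    OS(null),
    Windows(OS),
    WindowsXp(Windows),
    Unix(OS) {
        @Override
        public boolean supportsXWindowSystem() {
            return true; // overridden once, inherited by every descendant of Unix
        }
    },
    Linux(Unix),
    Fedora(Linux),
    Ubuntu(Linux),
    SunOs(Unix);

    private final OsType parent;
    private final List<OsType> children = new ArrayList<>();

    OsType(OsType parent) {
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this); // the parent constant is already constructed
        }
    }

    // Works like instanceof: true if 'other' is this node or one of its ancestors.
    public boolean is(OsType other) {
        for (OsType t = this; t != null; t = t.parent) {
            if (t == other) {
                return true;
            }
        }
        return false;
    }

    // Direct children only, in declaration order.
    public List<OsType> children() {
        return Collections.unmodifiableList(children);
    }

    // All descendants, depth first.
    public List<OsType> allChildren() {
        List<OsType> result = new ArrayList<>();
        for (OsType child : children) {
            result.add(child);
            result.addAll(child.allChildren());
        }
        return result;
    }

    // Default implementation delegates to the parent; the root answers false.
    public boolean supportsXWindowSystem() {
        return parent != null && parent.supportsXWindowSystem();
    }
}
```

With these constants, OsType.Fedora.is(OsType.Unix) holds and OsType.Fedora.supportsXWindowSystem() picks up the override declared on Unix, while OsType.WindowsXp falls through to the root's default.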
October 18, 2010
by Alexander Radzin
· 80,832 Views · 2 Likes
Enum Tricks: Customized valueOf
When writing enumerations I very often find myself implementing a static method similar to the standard enum’s valueOf() but based on a field rather than the name: public static TestOne valueOfDescription(String description) { for (TestOne v : values()) { if (v.description.equals(description)) { return v; } } throw new IllegalArgumentException( "No enum const " + TestOne.class + "@description." + description); } Where “description” is yet another String field in my enum. And I am not alone. See this article for example. Obviously this method is very inefficient. Every time it is invoked it iterates over all members of the enum. Here is the improved version that uses a cache: private static Map<String, TestTwo> map = null; public static TestTwo valueOfDescription(String description) { synchronized(TestTwo.class) { if (map == null) { map = new HashMap<String, TestTwo>(); for (TestTwo v : values()) { map.put(v.description, v); } } } TestTwo result = map.get(description); if (result == null) { throw new IllegalArgumentException( "No enum const " + TestTwo.class + "@description." + description); } return result; } It is fine if we have only one enum and only one custom field that we use to find the enum value. But if we have 20 enums, and each has 3 such fields, then the code will be very verbose. As I dislike copy/paste programming I have implemented a utility that helps to create such methods. I called this utility class ValueOf. It has 2 public methods: public static <T extends Enum<T>, V> T valueOf(Class<T> enumType, String fieldName, V value); which finds the enum member whose given field has the given value. It is implemented using reflection and, for better performance, uses a hash table initialized during the first call. The other, overloaded valueOf() looks like: public static <T extends Enum<T>> T valueOf(Class<T> enumType, Comparable<T> comparable); This method does not cache results, so it iterates over enum members on each invocation. 
But it is more universal: you can implement comparable as you want, so this method may find enum members using more complicated criteria. Full code with examples and JUnit test case are available here. Conclusions Java Enums provide the ability to locate enum members by name. This article describes a utility that makes it easy to locate enum members by any other field.
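The cached lookup described above can also be written without reflection by passing the field accessor as a function. This is a sketch of that alternative, not the ValueOf utility from the article: FieldLookup and the Planet enum are hypothetical names, and the index is built eagerly in the constructor instead of on the first call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: indexes the constants of one enum by one of their fields.
final class FieldLookup<K, E extends Enum<E>> {
    private final Class<E> enumType;
    private final Map<K, E> index = new HashMap<>();

    FieldLookup(Class<E> enumType, Function<E, K> keyOf) {
        this.enumType = enumType;
        for (E constant : enumType.getEnumConstants()) {
            index.put(keyOf.apply(constant), constant);
        }
    }

    // Mirrors Enum.valueOf: unknown keys are rejected with IllegalArgumentException.
    E valueOf(K key) {
        E result = index.get(key);
        if (result == null) {
            throw new IllegalArgumentException(
                    "No enum const " + enumType.getName() + " for key " + key);
        }
        return result;
    }
}

// Hypothetical enum used only to show the helper in action.
enum Planet {
    MERCURY("innermost"),
    EARTH("home");

    private final String description;

    // Declared after the constants, so they all exist when the index is built.
    private static final FieldLookup<String, Planet> BY_DESCRIPTION =
            new FieldLookup<>(Planet.class, p -> p.description);

    Planet(String description) {
        this.description = description;
    }

    static Planet valueOfDescription(String description) {
        return BY_DESCRIPTION.valueOf(description);
    }
}
```

Each additional lookup field then costs one static field and one one-line method per enum, and the synchronized block of the hand-written cache is no longer needed because the map is filled during class initialization.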
October 16, 2010
by Alexander Radzin
· 80,150 Views
Naming Conventions for Parameterized Types
Parameterized types, the <> expressions that can be used in Java as of JDK 5, are not just for collections. I find myself frequently using them in APIs I design. They really do let you write things which are more generic in the non-Java sense of the word, and the result is more reusable code, which means less code overall, which means fewer bugs and fewer things to test. The verbosity and some of the weirdness of type erasure are less than ideal, but used right, the benefits are worth the complexity. The standard (and in some places recommended) naming convention for parameterized types is to use a single-letter name. That works fine in signatures that have only one such type. But in practice, single-letter names make code less self-describing, and if you're defining a class with more than one parameterized type, it can be confusing and hard to read. People other than me will have to call, understand and maintain my code, so the more self-describing I can make it, the better. So I am looking for a naming convention that makes it obvious that something is a parameterized type, but allows for descriptive names. I am wondering if anybody else has run into this problem, and if there is any emerging consensus on naming generics. Do you work on a project that uses generics a lot? If so, what do you do? Here's an example. At the moment, I'm writing a generic (in both senses) class which simply limits the number of threads which can access some resource. It's basically a wrapper around a Semaphore which uses a Runnable-like object to ensure that the Semaphore is accessed correctly, and does some non-blocking statistics gathering about thread contention. 
So to access the scarce resource, you pass in a ResourceAccessor: public interface ResourceAccessor <ProtectedResource, Argument, Result> { public Result run (ProtectedResource resource, Argument argument); } The problem is that, when somebody looks at this interface, they will instantly get the idea that there are real classes they need to go find, called ProtectedResource, Argument and Result. Of course, no such classes exist; these are just names for generic types. The standard naming convention is worse: public interface ResourceAccessor <T, R, S> { public S run (T resource, R argument); } Here, nobody could possibly figure out what on earth this class is for without extensive documentation, so this is a really horrible idea. So I've concluded that the standard recommendations for generic type names are simply wrong for any non-trivial usage (i.e. Collection<T> is fine, since there is one type and Collections are well understood). You simply can't do this on a non-collection code structure you have invented, or people will just be confused and not use it. The best suggestion I've heard thus far is using $ as a prefix: public interface ResourceAccessor <$ProtectedResource, $Argument, $Result> { public $Result run ($ProtectedResource resource, $Argument argument); } I don't find this pretty, but I don't have any better ideas, and at least it makes it crystal-clear that there is something different about these names. Any thoughts? What do you do in this situation?
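For what it is worth, the $-prefixed names do compile, since $ is a legal character in Java identifiers. The sketch below shows how the Semaphore wrapper described earlier might read with that convention. LimitedAccess and its methods are hypothetical names, the statistics gathering is left out, and acquireUninterruptibly is used only to keep the example free of checked exceptions.

```java
import java.util.concurrent.Semaphore;

// The interface from the article, using the $-prefix convention for type parameters.
interface ResourceAccessor<$ProtectedResource, $Argument, $Result> {
    $Result run($ProtectedResource resource, $Argument argument);
}

// Hypothetical wrapper: at most maxThreads callers may touch the resource at once.
final class LimitedAccess<$ProtectedResource> {
    private final Semaphore permits;
    private final $ProtectedResource resource;

    LimitedAccess($ProtectedResource resource, int maxThreads) {
        this.resource = resource;
        this.permits = new Semaphore(maxThreads);
    }

    // Acquires a permit, runs the accessor, and always releases the permit afterwards.
    <$Argument, $Result> $Result access(
            ResourceAccessor<$ProtectedResource, $Argument, $Result> accessor,
            $Argument argument) {
        permits.acquireUninterruptibly();
        try {
            return accessor.run(resource, argument);
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

A call site never has to spell the prefixed names: with a LimitedAccess<StringBuilder> named limited, the expression limited.access((sb, text) -> sb.append(text).length(), "hello") lets the compiler infer all three type arguments.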
September 20, 2010
by Tim Boudreau
· 17,828 Views
Commons Lang 3 -- Improved and Powerful StringEscapeUtils
In the first and second parts of this series I talked about some of the new features, like enum and concurrency support, that have been added in commons-lang 3. In this article, I am going to talk about a new package, 'org.apache.commons.lang3.text.translate', which has been added in commons-lang 3. This package was added to fix problems in the design and implementation of the StringEscapeUtils class as it exists in versions prior to 3.0. To make it clearer, let's first talk about the purpose of the StringEscapeUtils class and the problems it had prior to version 3. Purpose of StringEscapeUtils StringEscapeUtils is a utility class which escapes and unescapes Strings for Java, JavaScript, HTML, XML, and SQL. For example, @Test public void test_StringEscapeUtils() { assertEquals("\\\\\\n\\t\\r", StringEscapeUtils.escapeJava("\\\n\t\r")); /* escapes the Java String */ assertEquals("\\\n\t\r",StringEscapeUtils.unescapeJava("\\\\\\n\\t\\r")); /* unescapes the Java String */ assertEquals("I didn\\'t say \\\"you to run!\\\"",StringEscapeUtils.escapeJavaScript("I didn't say \"you to run!\"")); /* escapes the JavaScript */ assertEquals("&lt;xml&gt;", StringEscapeUtils.escapeXml("<xml>")); /* escapes the XML */ } Problems with StringEscapeUtils There were a lot of problems in the StringEscapeUtils implementation prior to version 3. Some of these were: The implementation was not extensible. Take escapeJava as an example: suppose we want the escapeJava method to start escaping single quotes. To add such support we would have to change the existing class code and add another if condition which, if satisfied, escapes single quotes. So the API was breaking the open-closed principle, i.e. a class should be open for extension and closed for modification. It was not symmetric, i.e. original should be equal to unescape(escape(original)), but this was not the case. StringEscapeUtils.escapeHtml() escapes multibyte characters like Chinese, Japanese etc. 
Issue 339 @Test public void testEscapeHiragana() { /* Some random Japanese unicode characters */ String original = "\u304B\u304C\u3068"; String escaped = StringEscapeUtils.escapeHtml(original); assertEquals(original, escaped); } StringEscapeUtils.escapeHtml incorrectly converts unicode characters above U+00FFFF into 2 characters. Issue 480 @Test public void testEscapeHtmlHighUnicode() throws java.io.UnsupportedEncodingException { byte[] data = new byte[] { (byte) 0xF0, (byte) 0x9D, (byte) 0x8D,(byte) 0xA2 }; String original = new String(data, "UTF8"); String escaped = StringEscapeUtils.escapeHtml(original); assertEquals(original, escaped); } StringEscapeUtils.escapeXml() escapes characters > 0x7f . Issue 66 @Test public void shouldNotEscapeValuesGreaterThan0x7f() { assertEquals("XML should not escape >0x7f values", "\u00A1",StringEscapeUtils.escapeXml("\u00A1")); } Solution -- Rewritten StringEscapeUtils In version 3.0, StringEscapeUtils has been completely rewritten to fix the bugs associated with this class and to give the user a way to customize the behavior of its methods. All the logic that was present in StringEscapeUtils has been moved to the classes in the package 'org.apache.commons.lang3.text.translate'. Take the escapeJava function in StringEscapeUtils as an example: it no longer contains any logic of its own, it just calls the translate method on a CharSequenceTranslator reference. What was done is best understood by looking at the code below: public static final CharSequenceTranslator ESCAPE_JAVA = new AggregateTranslator(new LookupTranslator( new String[][] { {"\"", "\\\""}, {"\\", "\\\\"}, }),new LookupTranslator(EntityArrays.JAVA_CTRL_CHARS_ESCAPE()),UnicodeEscaper.outsideOf(32, 0x7f)); and in the escapeJava method public static final String escapeJava(String input) { return ESCAPE_JAVA.translate(input); } A constant of type CharSequenceTranslator is assigned an AggregateTranslator object. 
AggregateTranslator takes an array of translators and tries each of them in turn. The LookupTranslator replaces the string at the zeroth index of each pair with the string at the first index. UnicodeEscaper translates values outside the given range to unicode escapes. As you can see, you can very easily write your own escape methods. For example, if you want to add support for escaping & and <, you can do it like this:

public static final CharSequenceTranslator ESCAPE_JAVA =
    new LookupTranslator(
        new String[][] {
            {"\"", "\\\""},
            {"\\", "\\\\"},
        }
    ).with(
        new LookupTranslator(
            new String[][] {
                {"&", "&amp;"},
                {"<", "&lt;"}
            })
    ).with(
        new LookupTranslator(EntityArrays.JAVA_CTRL_CHARS_ESCAPE())
    ).with(
        UnicodeEscaper.outsideOf(32, 0x7f)
    );

StringEscapeUtils.escapeSql has been removed from the API because it misled developers into not using PreparedStatement. The method was not of much use anyway, as it only escaped single quotes.
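To make the lookup-and-aggregate design described above concrete without depending on commons-lang, here is a minimal sketch in plain Java. It is not the library's code: TranslatorSketch, Translator, lookup, aggregate and apply are hypothetical names invented for this illustration, and the real classes handle more cases (for example code points outside the BMP). The sketch only shows how escaping of single quotes, the extension discussed under the problems above, can be added by wrapping an existing translator instead of editing it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for the translate package; NOT the commons-lang classes.
final class TranslatorSketch {

    /** Hypothetical counterpart of CharSequenceTranslator. */
    interface Translator {
        /** Appends a replacement to out and returns the number of input chars consumed (0 = no match). */
        int translate(String input, int index, StringBuilder out);
    }

    /** Replaces pair[0] with pair[1], in the spirit of the lookup tables shown in the article. */
    static Translator lookup(String[][] pairs) {
        Map<String, String> table = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            table.put(pair[0], pair[1]);
        }
        return (input, index, out) -> {
            for (Map.Entry<String, String> entry : table.entrySet()) {
                String key = entry.getKey();
                if (!key.isEmpty() && input.startsWith(key, index)) {
                    out.append(entry.getValue());
                    return key.length();
                }
            }
            return 0;
        };
    }

    /** Tries each translator in order; the first one that consumes input wins. */
    static Translator aggregate(Translator... translators) {
        return (input, index, out) -> {
            for (Translator translator : translators) {
                int consumed = translator.translate(input, index, out);
                if (consumed > 0) {
                    return consumed;
                }
            }
            return 0;
        };
    }

    /** Walks the input and copies every character that no translator handles. */
    static String apply(Translator translator, String input) {
        StringBuilder out = new StringBuilder();
        int index = 0;
        while (index < input.length()) {
            int consumed = translator.translate(input, index, out);
            if (consumed == 0) {
                out.append(input.charAt(index));
                consumed = 1;
            }
            index += consumed;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Translator javaLike = lookup(new String[][] {{"\"", "\\\""}, {"\\", "\\\\"}});
        // Extension without modification: the existing translator is wrapped, not edited.
        Translator withSingleQuotes = aggregate(javaLike, lookup(new String[][] {{"'", "\\'"}}));
        System.out.println(apply(withSingleQuotes, "I didn't say \"run\""));
    }
}
```

The design choice worth noticing is that the base translator is never touched: new behaviour is composed around it, which is the open-closed property the old StringEscapeUtils lacked.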
September 17, 2010
by Shekhar Gulati
· 71,149 Views · 2 Likes
Jetty-maven-plugin: Running a Webapp with a DataSource and Security
This post describes how to configure the jetty-maven-plugin and the Jetty servlet container to run a web application that uses a data source and requires users to log in, which are the basic requirements of most web applications. I use Jetty in development because it's fast and easy to work with.

Why Jetty?

Well, because it's much faster than the WebSphere server I normally use, and it supports fast (or shall I say agile?) development really well thanks to its quick turnaround. And because it's simply cool to type

bash$ svn checkout http://example.com/repo/trunk/mywebapp
bash$ cd mywebapp
bash$ mvn jetty:run
bash$ firefox http://localhost:8080/mywebapp

and to be able to immediately log into and interact with the application. However, it should be noted that Jetty isn't a full-featured Java EE server and thus may not always be usable.

Project setup

General configuration

You need to add the Jetty plugin to your pom.xml:

4.0.0 com.example mywebapp war ... ... org.mortbay.jetty maven-jetty-plugin 6.1.0 3 ... ... ...

As you can see, I'm using Jetty 6.1.0.

Defining a DataSource

Let's assume that the application uses a data source configured at the server and accesses it normally via JNDI. Then we must define a reference to the data source in src/main/webapp/WEB-INF/web.xml:

... ... ... jdbc/lmsdb javax.sql.DataSource Container Shareable

Next we need to describe the data source to Jetty. There are multiple ways to do that; I've chosen to do so in src/main/webapp/WEB-INF/jetty-env.xml:

jdbc/lmsdb lmsdb myuser secret db.toronto.ca.ibm.com 3711

Notice that the class used is DB2SimpleDataSource and not a JDBC driver. That is, of course, because we need a data source, not a driver. The Jetty wiki pages also contain examples of data source configuration for other databases. Finally we must make the corresponding JDBC implementation available to Jetty by adding it to the plugin's dependencies in the pom.xml:

org.mortbay.jetty maven-jetty-plugin 6.1.0 <...
com.ibm.db2 db2jcc 9.7 jar system ${basedir}/../lms.sharedlibraries/db2/db2jcc.jar com.ibm.db2 db2jcc_license_cisuz 9.7 jar system ${basedir}/../lms.sharedlibraries/db2/db2jcc_license_cisuz.jar

Please do not scorn me for using system-scoped dependencies; sometimes that is unfortunately the most feasible way.

Enabling security and configuring an authentication mechanism

We would like to limit access to the application to authenticated users in the admin role, with the exception of pages under public/. Therefore we declare the appropriate security constraints in web.xml:

... authorizedusers all urls /* admin publicaccess public pages /public/* BASIC learning@ibm mini person feed management administrator access admin ...

Beware that Jetty doesn't support HTTPS out of the box, and thus if you add the data constraint CONFIDENTIAL to any resource, you will automatically get HTTP 403 Forbidden no matter what you do. That's why I've commented it out above. It is possible to enable SSL in Jetty, but I didn't want to bother with certificate generation etc.

Next we need to tell Jetty how to authenticate users. This is done via realms, and we will use the simplest, file-based one. Again there are multiple ways to configure it, for example in the pom.xml:

org.mortbay.jetty maven-jetty-plugin 6.1.0 3 learning@ibm mini person feed management src/test/resources/jetty-users.properties ...

The name must match exactly the realm-name in web.xml. You then define the users and their passwords and roles in the declared file, in this case in src/test/resources/jetty-users.properties:

user=psw,admin

The format of the file is username=password[,role1,role2,...].

When you download Jetty, you will find a fine example of using JAAS with a file-based back-end for authentication and authorization under examples/test-jaas-webapp (invoke mvn jetty:run from the folder and go to http://localhost:8080/jetty-test-jaas/).
However, it seems that JAAS causes additional overhead, visible as a delay of a few seconds when starting the server, so it might be preferable not to use it.

Conclusion

With Jetty it's easy to enable security and create a data source, which are the basic requirements of most web applications. Anybody can then very easily run the application to test and develop it. Development is where Jetty really shines, provided that you don't need any feature it doesn't have. When troubleshooting, you may want to tell Jetty to log at the debug level with mvn -DDEBUG .. or to log requests, which can also be configured in jetty-env.xml. Beware that this post describes configuration for Jetty 6.1.0. It can be different in other versions and it certainly is different in Jetty 7.

From http://theholyjava.wordpress.com/2010/09/10/jetty-maven-plugin-running-a-webapp-with-a-datasource-and-security/
September 13, 2010
by Jakub Holý
· 22,986 Views
Java: Overriding Getters and Setters Design
Why do we keep instance variables private? We don't want other classes to depend on them. Moreover, it gives us the flexibility to change a variable's type or implementation on a whim or an impulse. Why, then, do programmers automatically add or override getters and setters on their objects, exposing their private variables as if they were public?

Accessor methods

Accessors (also known as getters and setters) are methods that let you read and write the value of an instance variable of an object.

public class AccessorExample {
    private String attribute;

    public String getAttribute() {
        return attribute;
    }

    public void setAttribute(String attribute) {
        this.attribute = attribute;
    }
}

Why Accessors?

There are actually many good reasons to consider using accessors rather than directly exposing the fields of a class.

Getters and setters make an API more stable. For instance, consider a public field in a class which is accessed by other classes. Later on, you want to add some extra logic when getting or setting the variable. This will impact the existing clients that use the API: any change to this public field requires a change to each class that refers to it. With accessor methods, by contrast, one can easily add logic such as caching some data or initializing it lazily. Moreover, one can fire a property-changed event if the new value is different from the previous value. All this is seamless to the class that gets the value through the accessor method.

Should I have Accessor Methods for all my fields?

Fields can be declared public in package-private or private nested classes. Exposing fields in these classes produces less visual clutter compared to the accessor-method approach, both in the class definition and in the client code that uses it. If a class is package-private or is a private nested class, there is nothing inherently wrong with exposing its data fields, assuming they do an adequate job of describing the abstraction provided by the class.
Such code is restricted to the package where the class is declared. While the client code is tied to the class's internal representation, we can change it without modifying any code outside that package. Moreover, in the case of a private nested class, the scope of the change is further restricted to the enclosing class.

Another example of a design that uses public fields is JavaSpace entry objects. Ken Arnold described the process they went through to decide to make those fields public instead of private with get and set methods:

"Now this sometimes makes people uncomfortable because they've been told not to have public fields; that public fields are bad. And often, people interpret those things religiously. But we're not a very religious bunch. Rules have reasons. And the reason for the private data rule doesn't apply in this particular case. It is a rare exception to the rule. I also tell people not to put public fields in their objects, but exceptions exist. This is an exception to the rule, because it is simpler and safer to just say it is a field. We sat back and asked: Why is the rule thus? Does it apply? In this case it doesn't."

Private fields + Public accessors == encapsulation

Consider the example below:

public class A {
    public int a;
}

Usually, this is considered bad coding practice as it violates encapsulation. The alternative approach is:

public class A {
    private int a;

    public void setA(int a) {
        this.a = a;
    }

    public int getA() {
        return this.a;
    }
}

It is argued that this encapsulates the attribute. But is this really encapsulation? The fact is that getters and setters by themselves have nothing to do with encapsulation. Here the data isn't any more hidden or encapsulated than it was in a public field. Other objects still have intimate knowledge of the internals of the class. Changes made to the class might ripple out and force changes in dependent classes. Getters and setters used in this way generally break encapsulation.
A truly well-encapsulated class has no setters and preferably no getters either. Rather than asking a class for some data and then computing something with it, the class should be responsible for computing something with its data and then returning the result. Consider the example below:

public class Screens {
    private Map screens = new HashMap();

    public Map getScreens() {
        return screens;
    }

    public void setScreens(Map screens) {
        this.screens = screens;
    }
    // remaining code here
}

If we need to get a particular screen, we write code like this:

Screen s = (Screen) screens.get(screenId);

There are things worth noticing here. The client needs to get an Object from the Map and cast it to the right type. Worse, any client of the Map has the power to clear it, which is usually not what we want. An alternative implementation of the same logic is:

public class Screens {
    private Map screens = new HashMap();

    public Screen getById(String id) {
        return (Screen) screens.get(id);
    }
    // remaining code here
}

Here the Map instance and the interface at the boundary (Map) are hidden.

Getters and Setters are highly Overused

Creating private fields and then using the IDE to automatically generate getters and setters for all of them is almost as bad as using public fields. One reason for the overuse is that in an IDE it is now just a matter of a few clicks to create these accessors. The completely meaningless getter/setter code is at times longer than the real logic in a class, and you will read these functions many times even if you don't want to. All fields should be kept private, with setters added only when they make sense; leaving setters out is what allows an object to be immutable. Adding an unnecessary getter reveals internal structure, which is an opportunity for increased coupling. To avoid this, every time before adding an accessor we should analyse whether we can encapsulate the behaviour instead.
Let's take another example:

public class Money {
    private double amount;

    public double getAmount() {
        return amount;
    }

    public void setAmount(double amount) {
        this.amount = amount;
    }
}

// client
Money pocketMoney = new Money();
pocketMoney.setAmount(15d);
double amount = pocketMoney.getAmount(); // we know it's a double
pocketMoney.setAmount(amount + 10d);

With the above logic, if we later decide that double is not the right type to use and that BigDecimal should be used instead, the existing clients of this class break. Let's restructure the above example:

import java.math.BigDecimal;

public class Money {
    private BigDecimal amount;

    public Money(String amount) {
        this.amount = new BigDecimal(amount);
    }

    public void add(Money toAdd) {
        amount = amount.add(toAdd.amount);
    }
}

// client
Money balance1 = new Money("10.0");
Money balance2 = new Money("6.0");
balance1.add(balance2);

Now, instead of asking for a value, the class has the responsibility of increasing its own value. With this approach, a future request to change the data type requires no change in the client code. Here not only is the data encapsulated, but so is the type of the data stored, or even the fact that it exists at all.

Conclusions

Using accessors to restrict direct access to fields is preferred over using public fields; however, creating getters and setters for each and every field is overkill. It also depends on the situation: sometimes you just want a dumb data object. Accessors should be added for a field only where they are really required. A class should expose larger behaviour which happens to use its state, rather than be a repository of state to be manipulated by other classes.

More Reading

http://c2.com/cgi/wiki?TellDontAsk
http://c2.com/cgi/wiki?AccessorsAreEvil
Effective Java

See more at http://muhammadkhojaye.blogspot.co.uk/2010/10/getter-setter-to-use-or-not-to-use.html
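The "tell, don't ask" idea behind the restructured Money class can be pushed one step further by leaving out mutation altogether. The following is only a sketch under my own assumptions, not code from the article: the class name MoneySketch and the methods plus and covers are hypothetical, chosen to show a class that answers questions about its state without ever handing the BigDecimal to its clients.

```java
import java.math.BigDecimal;

// Hypothetical immutable variant of the article's Money class.
final class MoneySketch {
    private final BigDecimal amount;

    MoneySketch(String amount) {
        this(new BigDecimal(amount));
    }

    private MoneySketch(BigDecimal amount) {
        this.amount = amount;
    }

    /** Returns a new instance instead of mutating this one. */
    MoneySketch plus(MoneySketch other) {
        return new MoneySketch(amount.add(other.amount));
    }

    /** Behaviour instead of a getter: the comparison happens inside the class. */
    boolean covers(MoneySketch price) {
        return amount.compareTo(price.amount) >= 0;
    }

    @Override
    public String toString() {
        return amount.toPlainString();
    }

    public static void main(String[] args) {
        MoneySketch pocketMoney = new MoneySketch("10.0").plus(new MoneySketch("6.0"));
        System.out.println(pocketMoney + " covers 15.50: " + pocketMoney.covers(new MoneySketch("15.50")));
    }
}
```

Because no client ever sees the BigDecimal, the internal type could change again later without touching any calling code, which is the point the article makes about the mutable version as well.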
September 10, 2010
by Muhammad Ali Khojaye
· 98,195 Views · 1 Like
Practical PHP Patterns: Data Transfer Object
Information can travel in very different ways between the parts of an application, which can be tricky when these parts are distributed across different tiers, or across time. The Data Transfer Object pattern prescribes using a first-class citizen such as a class to model the data used in the communication. This pattern was originally meant to aid logic distribution: coarse-grained interfaces, which are ideal for remote interaction, must return more data in each call to reduce the number of messages sent over the network and their overhead. Objects of this kind are built with the goal of being easy to serialize and send over the wire.

In the PHP world, the communication is usually either from server to server, or even targets the same server: PHP is quite different from other languages used for application distribution. Commonly the serialization and unserialization of objects are accomplished by the same codebase, which has a short life (the duration of an HTTP request) and uses the serialization mechanism to store data and maintain state between different requests. As a result, the usual argument about dependencies, which prescribes using a Data Mapper between the Domain Model and the DTOs, does not apply, to the point that sometimes even domain model objects are serialized directly (an ORM like Doctrine 2 allows you to do so). There are more differences from classic Java DTOs, which we'll see in this article.

Implementation

The definition of Data Transfer Object talks about communication between different processes: subsequent executions of the same PHP script are indeed different processes (or, when the same process is reused, it does not share any variable space with the previous executions). PHP is also peculiar from the implementation point of view: a DTO may not even be an object, since an ordinary or multidimensional array will do the same job in many cases.
However, an object implementation is necessary to take advantage of object handlers when the structure of the data is not hierarchical but involves a connected graph of objects or recursive data structures. This implementation choice is much clearer than using arrays and variable references, which can be tricky in PHP.

Basically, a DTO is the object which we have been told never to write: it has getters and setters to expose all of its properties, while providing nearly no encapsulation or business logic. It's a data structure, like a Value Object (which was its name in certain Java literature). But it is not immutable, nor does it carry the semantic meaning of Value Objects.

Use cases

Data Transfer Objects come in handy in many use cases. In their simplest implementation, they are used for returning multiple values from a method. Their further evolution then involves serialization mechanisms; of course both the origin and the destination of a Data Transfer Object need to be a PHP environment, a fact which opens up different scenarios from the classical Java distribution ones, which involve clients as well. For example, a Data Transfer Object, being easily serializable, can be stored in caches (memcache) or sessions ($_SESSION). Other use cases concern databases: according to Fowler, Record Sets like PDOStatement can be defined as the Data Transfer Objects of relational databases. There are even more scenarios for Data Transfer Objects, like clients which return a serialized data structure, or their usage for breaking dependencies between different layers: a Controller may populate a DTO and pass it to the View.

Serialization

As you may have experienced while using var_dump() in PHP, the objects of a Domain Model are usually interconnected in a complex graph; DTOs isolate a subset of the graph and make it easy to serialize, even by omitting part of the information.
Thus, serialization of domain objects needs some complex infrastructure (lazy-loading proxies which are discarded on serialization and must be re-initialized during the merge with a current object graph); sometimes it is by far easier to extract data into a DTO, especially when you do not have particular libraries at your disposal. Once you have an isolated object graph, PHP will handle serialization by itself, even without marker interfaces; it will simply include all the public, protected and private properties (this behavior can be modified by specifying a __sleep() method).

Example

This running code shows a small domain model composed of the User and Group classes, and how a DTO for the User class allows you to serialize a User without pulling in its Groups, for example for quickly storing it in a cache.

<?php
/**
 * A Domain Model class.
 */
class User
{
    private $_name;
    private $_role;
    private $_groups = array();

    /**
     * @return string
     */
    public function getName()
    {
        return $this->_name;
    }

    public function setName($name)
    {
        $this->_name = $name;
    }

    /**
     * @return string
     */
    public function getRole()
    {
        return $this->_role;
    }

    public function setRole($role)
    {
        $this->_role = $role;
    }

    public function addGroup(Group $group)
    {
        // append to the same collection that getGroups() returns
        $this->_groups[] = $group;
    }

    public function getGroups()
    {
        return $this->_groups;
    }
}

/**
 * Another Domain Model class.
 */
class Group
{
    private $_name;

    // the client code below passes the name to the constructor
    public function __construct($name = null)
    {
        $this->_name = $name;
    }

    /**
     * @return string
     */
    public function getName()
    {
        return $this->_name;
    }

    public function setName($name)
    {
        $this->_name = $name;
    }
}

/**
 * The Data Transfer Object for User.
 * It stores the mandatory data for a particular use case - for example ignoring the groups,
 * and ensuring easy serialization.
 */
class UserDTO
{
    private $_name;
    private $_role;

    /**
     * In more complex implementations, the population of the DTO can be the responsibility
     * of an Assembler object, which would also break any dependency between User and UserDTO.
     */
    public function __construct(User $user)
    {
        $this->_name = $user->getName();
        $this->_role = $user->getRole();
    }

    public function getName()
    {
        return $this->_name;
    }

    public function getRole()
    {
        return $this->_role;
    }

    // there are no setters because this use case does not require modification of data;
    // however, in general DTOs do not need to be immutable.
}

// client code
$user = new User();
$user->setName('Giorgio');
$user->setRole('Author');
$user->addGroup(new Group('Authors'));
$user->addGroup(new Group('Editors'));
// many more groups
$dto = new UserDTO($user);
// this value is what will be stored in the session, or in a cache...
var_dump(serialize($dto));
August 12, 2010
by Giorgio Sironi
· 95,325 Views · 5 Likes
Code Generation With Xtext
Recently I attended a local rheinJUG meeting in Düsseldorf. While the topic of the session was Eclipse e4, the night's sponsor itemis provided some handouts on Xtext which got me very interested. The reason is that at work we are currently developing a mobile Java application (J9, CDC/Foundation 1.1 on Windows CE6) for which we needed an easy-to-use and reliable way of configuring navigation through the application. In a previous iteration we had hard coded most of the navigational paths, mostly because of time constraints, but this time the app is more complex and doing that again was not really an option. First we thought about an XML based configuration, but this seemed to be a hassle to write (and read) and would also mean paying the price of parsing it on every application startup.

Enter Xtext: an Eclipse based framework/library for building text based DSLs. In short, you just provide a grammar description of a new DSL to suit your needs, and with literally just a few mouse clicks you are provided with a content-assist, syntax-highlight, outline-view-enabled Eclipse editor and, optionally, a code generator based on that language.

Getting started: Sample Grammar

There is a nice tutorial provided as part of the Xtext documentation, but I believe it might be beneficial to provide another example of how to put a DSL to good use. I will not go into every step in great detail, because setting up Xtext in Eclipse 3.6 Helios is just a matter of putting in an Update Site URL, and the New Project wizard provided makes the initial setup a snap. I assume you have already set up Eclipse and Xtext and created a new Xtext project including a generator project (activate the corresponding checkbox when going through the wizard). In this post I am assuming a project name of com.danielschneller.navi.dsl and a file extension of .navi.
When finished we will have the infrastructure ready for editing, parsing and generating code based on files like these:

navigation rules for MyApplication

mappings {
    map permission AdminPermission to "privAdmin"
    map permission DataAccessPermission to "privData"

    map coordinate Login to "com.danielschneller.myapp.gui.login.LoginController"
        in "com.danielschneller.myapp.login"
    map coordinate LoginFailed to "com.danielschneller.myapp.gui.login.LoginFailedController"
        in "com.danielschneller.myapp.login"
    map coordinate MainMenu to "com.danielschneller.myapp.gui.menu.MainMenuController"
        in "com.danielschneller.myapp.menu"
    map coordinate UserAdministration to "com.danielschneller.myapp.gui.admin.UserAdminController"
        in "com.danielschneller.myapp.admin"
    map coordinate DataLookup to "com.danielschneller.myapp.gui.lookup.LookupController"
        in "com.danielschneller.myapp.lookup"
}

navigations {
    define navigation USER_LOGON_FAILED
    define navigation USER_LOGON_SUCCESS
    define navigation OK
    define navigation BACK
    define navigation ADMIN
    define navigation DATA_LOOKUP
}

navrules {
    from Login
        on navigation USER_LOGON_FAILED go to LoginFailed
        on navigation USER_LOGON_SUCCESS go to MainMenu
    from LoginFailed
        on navigation OK go to Login
    from MainMenu
        on navigation ADMIN go to UserAdministration with AdminPermission
        on navigation DATA_LOOKUP go to DataLookup with DataAccessPermission
    from UserAdministration
        on navigation BACK go to MainMenu
    from DataLookup
        on navigation BACK go to MainMenu
}

As you can see, it is a nice little language for defining coordinates in an application (a coordinate being a specific GUI for a certain task) and the possible navigation paths between them. Optionally a navigation path can be tagged to require one or more permissions to work.
So, for example, one possible navigation path shown in the above sample leads from the application's main menu, identified by the identifier MainMenu and represented in code by the com.danielschneller.myapp.gui.menu.MainMenuController class in the com.danielschneller.myapp.menu OSGi bundle, to a GUI identified as DataLookup, implemented by com.danielschneller.myapp.gui.lookup.LookupController in the com.danielschneller.myapp.lookup bundle. For this path to be taken, the application must request the DATA_LOOKUP navigation and the currently logged-in user must be assigned the DataAccessPermission. What exactly that means is not the focus of this tutorial; suffice it to say that we somehow need to get the information contained in this specialized language into our Java application in some shape or form that can be evaluated at runtime. In the following example all information will be transformed into a HashMap based data structure. For our little mobile application this has several advantages over the XML option mentioned earlier:

  • No XML parsing necessary on application startup, saving some performance
  • Validation of the navigation rules ahead of time, preventing parse errors at runtime
  • No libraries needed to access the information: by putting everything in a simple HashMap we do not have to rely on any non-standard classes whatsoever

The first thing I did when I started with Xtext was define a sample input file such as the one above. Then, following its general structure, I began to extract a formal grammar for it. Of course, the first draft of the sample data was not perfect; over the course of a few iterations I refined some of the syntax, but in the end this is the grammar definition I came up with.
It is heavily commented to allow you to copy it out and still leave the documentation intact:

grammar com.danielschneller.navi.NavigationRules with org.eclipse.xtext.common.Terminals

generate navigationRules "http://com.danielschneller/fw/funkmde/navi/NavigationRules"

/*
 * The top level entry point for the file.
 * "Root" is just a name as good as any, but
 * makes the meaning quite clear.
 */
Root:
    // first thing in the file is a "keyword",
    // followed by an attribute that will be
    // accessible as "name" later and allow
    // definition of an ID type of thing.
    'navigation rules for' name=ID

    // after the keyword and "name" attribute
    // three sections follow, each assigned
    // to an attribute for later reference
    // (called "mappingsdefs", "transitiondefs"
    // and "ruledefs").
    // Their types are defined later in the file.
    mappingsdefs=Mappings
    transitiondefs=TransitionDefinitions
    ruledefs=NavigationRules

// semicolon ends the definition of "Root"
;

// mappings section >>>>>>>>>>>>>>>>>>>>>>>>>>>>>

/*
 * Definition of the "Mappings" type used in
 * the "Root" type.
 */
Mappings:
    // first the keyword "mappings" is expected,
    // then an open curly
    'mappings' '{'
    // after that a collection of "Mapping"s is
    // expected. The "+=" means that they will
    // all be collected in a collection type element
    // called "mappings" for future reference.
    // The "+" at the end means "at least one, but
    // more is just fine".
    (mappings+=Mapping)+
    // finally the "Mappings" type requires a closing
    // curly brace.
    '}'
// semicolon ends the definition of "Mappings"
;

/*
 * Definition of a single "Mapping", those we are
 * collecting in the "mappings" attribute of the
 * "Mappings" type.
 */
Mapping:
    // each mapping starts with the keyword "map"
    // and is followed by an element of type "MappingSpec"
    'map' MappingSpec
;

/*
 * Definition of a "MappingSpec" element. This is
 * actually just a "parent type" for two more specific
 * kinds of "MappingSpec":
 */
MappingSpec:
    // no keywords are defined here, a "MappingSpec"
    // can be either a "PermissionMappingSpec" or a
    // "CoordinateMappingSpec". Any of these will be
    // fine where a "MappingSpec" is asked for.
    PermissionMappingSpec | CoordinateMappingSpec
;

/*
 * Definition of a "PermissionMappingSpec" element.
 */
PermissionMappingSpec:
    // first the keyword "permission" is required.
    // then a "name" attribute is expected of type ID.
    // Following the name the "to" keyword is expected,
    // followed by a string that is stored in the "value"
    // attribute
    'permission' name=ID 'to' value=STRING
;

/*
 * Definition of a "CoordinateMappingSpec" element.
 * The definition is very similar to the "PermissionMappingSpec"
 * but has more attributes.
 */
CoordinateMappingSpec:
    // first the keyword "coordinate", then an ID stored as "name",
    // the keyword "to", followed by a string stored as "controllername",
    // next the keyword "in" and finally another string, memorized as
    // "bundleid"
    'coordinate' name=ID 'to' controllername=STRING 'in' bundleid=STRING
;

// <<<<<<<<<<<<<<<<<<<<<<<<<<<<< mappings section

// >>>>>>>>>>>>>>>>>>>>>>>>>>>>> navigations section

/*
 * Definition of the "TransitionDefinitions" type used in
 * the "Root" type.
 */
TransitionDefinitions:
    // first, this element is introduced with the "navigations"
    // keyword, followed by an open curly brace.
    'navigations' '{'
    // after that a collection of "TransitionDefinition"s is
    // expected. The "+=" means that they will
    // all be collected in a collection type element
    // called "transitions" for future reference.
    // The "+" at the end means "at least one, but
    // more is just fine".
    (transitions+=TransitionDefinition)+
    // the element ends with a closing curly brace
    '}'
;

/*
 * Definition of a "TransitionDefinition" element. This
 * one is very simple.
 */
TransitionDefinition:
    // the keyword "define navigation" is required first,
    // then a "name" attribute of type ID is expected.
    'define navigation' name=ID
;

// <<<<<<<<<<<<<<<<<<<<<<<<<<<<< navigations section

// >>>>>>>>>>>>>>>>>>>>>>>>>>>>> navrules section

/*
 * Definition of the "NavigationRules" element.
 */
NavigationRules:
    // Element starts with the keywords "navrules" and
    // open curly.
    'navrules' '{'
    // collection attribute called "rules", consisting
    // of one or more occurrences of a "Rule" element.
    (rules+=Rule)+
    // element finishes with a closing curly keyword
    '}'
;

/*
 * Definition of a "Rule" element as used in the "NavigationRules"
 * element.
 */
Rule:
    // first the "from" keyword, then a reference to one of the
    // coordinate mappings defined earlier. This time no new
    // definition of a coordinate is required, but one of those
    // that have been listed before. So the type here is put in
    // square brackets
    'from' source=[CoordinateMappingSpec]
    // following the source specification, one or more "Destination"
    // type elements are expected, collected in a collection attribute
    // named "destinations"
    (destinations+=Destination)+
;

/*
 * Definition of a "Destination" type. These are collected
 * in a "Rule".
 */
Destination:
    // first comes an "on navigation" keyword. After that a
    // reference to one of the Transition elements defined
    // in the "navigations" section is required and stored
    // in the "transition" attribute.
    // after that follows a "go to" keyword and a reference
    // to a coordinate mapping, stored in the "target" attribute.
    // finally - as with the "destinations" collection attribute
    // in the "Rule" element - a "permissions" collection is
    // defined to store none or more (*) "PermissionReference"
    // elements.
    'on navigation' transition=[TransitionDefinition]
    'go to' target=[CoordinateMappingSpec]
    (permissions+=PermissionReference)*
;

/*
 * Definition of a "PermissionReference" type. This is used
 * in the "permissions" collection of a "Destination".
 */
PermissionReference:
    // first, a "with" keyword is expected. After that a
    // "permission" attribute stores a reference to one of
    // the previously defined permission mappings from the
    // "mappings" section.
    'with' permission=[PermissionMappingSpec]
;

// <<<<<<<<<<<<<<<<<<<<<<<<<<<<< navrules section

This is what Xtext can digest to create an editor plugin and outline view. Just save this as navigationRules.xtext; when you created the Xtext project in Eclipse using the wizard, it should have been prepared for you. Copying and pasting this into a .xtext file in Eclipse will provide you with syntax highlighting, code completion and syntax checking, making it easy to play around with grammar files.

Once done, right click the .mwe2 file lying next to the grammar file in the Package Explorer view and select Run As MWE2 Workflow from the context menu. This will take a moment and generate several classes, both in the current (Xtext) project and the accompanying ...ui project. Next, right click the Xtext project and select Run As Eclipse Application from the context menu. This will bring up another Eclipse instance with the newly created support for navigation rules files (with a .navi suffix) installed.

To try it out, just create a new project and in that a new file. Make sure its name ends in .navi. When asked, make sure to accept adding the Xtext nature to the project. You will be presented with a new, empty editor that already has an error marker in it. This is because, according to our grammar definition, an empty file does not comply with all the rules we specified. Try hitting the code-completion shortcut (Ctrl-Space) twice and see what happens: the first code completion fills in the navigation rules for part. According to the grammar this is the only valid text at the beginning of a file, so it is automatically inserted. Hitting Ctrl-Space again will tell you that now you need a Name of type ID. Just go ahead and try out the completion.
It will help you create a syntactically sound navigation rules file. Notice that the Problems View tells you what is currently wrong. Also notice that once you reach a part where references are expected by the grammar (e.g. when defining source and destination coordinates in a navigation rule) you will get suggestions based on what you entered earlier. This is what the whole sample from above looks like in the editor:

While you are still fleshing out and fine tuning your grammar definitions, you will probably close this Eclipse instance and reopen it each time you have repeated the Run As MWE2 Workflow steps in the main instance. In the long run I suggest you create a Feature and an Update Site project to allow easier distribution and updates of the intermediate iterations.

Generating Code

Now that we have a complete Xtext DSL defined and in place, let's have a look at the code generation side of things. This part is completely optional: You are free to include the necessary Xtext libraries in your application's runtime (although they seem to be numerous) and just use them to dynamically load and parse .navi files on the fly. This would probably be a good idea if you were writing an Eclipse based application anyway. However, when targeting a very limited platform like JavaME this option is not viable. Instead we will now create a code generator that transforms the DSL syntax into more classic Java terms. Specifically, we will create a map-based data structure that carries all the same information, but in Java terms. This is a sample of what the generated output is going to look like:

public class NaviRules {
    private Map navigationRules = new Hashtable();
    // ...
public NaviRules() { NaviDestination naviDest; // ========== From Login (com.danielschneller.myapp.gui.login.LoginController) // ========== On USER_LOGON_FAILED // ========== To LoginFailed (com.danielschneller.myapp.gui.login.LoginFailedController in com.danielschneller.myapp.login) naviDest = new NaviDestination(); naviDest.action = "USER_LOGON_FAILED"; naviDest.targetClassname = "com.danielschneller.myapp.gui.login.LoginFailedController"; naviDest.targetBundleId = "com.danielschneller.myapp.login"; store("com.danielschneller.myapp.gui.login.LoginController", naviDest); // ========== On USER_LOGON_SUCCESS // ========== To MainMenu (com.danielschneller.myapp.gui.menu.MainMenuController in com.danielschneller.myapp.menu) naviDest = new NaviDestination(); naviDest.action = "USER_LOGON_SUCCESS"; naviDest.targetClassname = "com.danielschneller.myapp.gui.menu.MainMenuController"; naviDest.targetBundleId = "com.danielschneller.myapp.menu"; store("com.danielschneller.myapp.gui.login.LoginController", naviDest); // ============================================================================= // ========== From LoginFailed (com.danielschneller.myapp.gui.login.LoginFailedController) // ========== On OK // ========== To Login (com.danielschneller.myapp.gui.login.LoginController in com.danielschneller.myapp.login) naviDest = new NaviDestination(); naviDest.action = "OK"; naviDest.targetClassname = "com.danielschneller.myapp.gui.login.LoginController"; naviDest.targetBundleId = "com.danielschneller.myapp.login"; store("com.danielschneller.myapp.gui.login.LoginFailedController", naviDest); // .... and so on ... } } The support class NaviDestination is omitted but is generally just a value holder struct type class. When creating the Xtext project using the wizard earlier we created a third Eclipse project, ending in ...generator. Its src folder contains three subdirectories called model, templates and workflow. Put the sample .navi file into the model directory. 
It will serve as the input for the generator. Create the first template Code generation is based on templates. Xtext leverages the Xpand template engine. In the templates directory create a new Xpand template using the context menu. Call it NaviRules.xpt, open it and insert the following: «REM» import the namespace defined in our DSL model «ENDREM» «IMPORT navigationRules» «REM» Define a template called "main" for elements of type "Root". The minus sign at the end takes care of not adding a newline at the end of it. «ENDREM» «DEFINE main FOR Root-» «ENDDEFINE» As there is only one instance of a Root element in a navigation rules file, this will be the main entry point - hence the name. There is no need to call it main, but it seems fitting. Now between the DEFINE and ENDDEFINE insert what is to be generated: As shown above, we need a new Java source file called NaviRules.java: ... «DEFINE main FOR Root-» «FILE "NaviRules.java"-» «ENDFILE-» «ENDDEFINE» ... Again, the contents to be generated is put in between the FILE and ENDFILE brackets. Anything not enclosed in «» will be used verbatim in the output file. So first of all, put in the static parts of the Java file. What I did was first write the source for a single navigation rule by hand, made sure it compiled and then copied over the relevant parts into the template piece by piece: ... «FILE "NaviRules.java"-» import java.util.*; public class NaviRules { public static class NaviDestination { String action; List requiredPermissions = new ArrayList(); String targetClassname; String targetBundleId; NaviDestination() {}; public final List getRequiredPermissions() { return new ArrayList(requiredPermissions); } // let Eclipse generate getters, setters, // equals and hashCode methods for this } private Map navigationRules = new Hashtable(); «ENDFILE-» ... Now, this is nothing special so far. To fill in the elements from the navigation rules DSL file put in the following: ... 
private Map navigationRules = new Hashtable();

public NaviRules() {
    NaviDestination naviDest;
    «REM»
        Iterate all elements in the "rules" collection attribute of
        the "ruledefs" attribute of the "Root" element. Call each
        iterated element (which is of type "Rule") "r" and expand
        the "ruletmpl" template for it here.
    «ENDREM»
    «FOREACH ruledefs.rules AS r»«EXPAND ruletmpl FOR r»«ENDFOREACH»
}
...

In the class constructor we first define a local variable naviDest of the previously declared type. Then, as the comment states, the FOREACH instruction will iterate over all Rule type elements. This might not be completely obvious at first. Remember that at this point in the template the current scope is the "Root" element from the navigation rules file. It has an attribute called ruledefs as per the grammar definition. This attribute is of type NavigationRules, which in turn has a collection attribute called rules, consisting of Rule type objects. Inside the loop the current element can then be addressed by the template variable name r. The loop body (between FOREACH and ENDFOREACH) contains another Xpand instruction to expand a template called ruletmpl, which will be declared next.

Don't worry, even though this is a little difficult at first: switching contexts between the Java and the template scopes is made significantly easier in Eclipse, because the Xpand template editor will syntax color (static parts are blue) and also assist you with code completion inside the Xpand template parts. Ctrl-Spacing your way through it will make things more obvious than they are when reading an example.

Now for the ruletmpl template. Place it below the ENDDEFINE statement belonging to the main template: ...
«ENDFILE-»
«ENDDEFINE»

«DEFINE ruletmpl FOR Rule-»
// ========== From «source.name» («source.controllername»)
«FOREACH destinations AS d»«EXPAND destTmpl(source) FOR d»«ENDFOREACH»
// =============================================================================
«ENDDEFINE»

You see the same idea used again: Static parts that get transferred into the output file 1:1 and Xpand statements that fill in data from the navigation rules definition file. In this case you see references to the attributes of the Rule element. As per the FOREACH instruction in the previous template, the one at hand will be repeated for every instance of Rule in our source file. Inside this definition the current scope is that of Rule, so with «source.name» the name attribute of the CoordinateMappingSpec object referenced as source in a Rule is taken first, then the controllername attribute likewise.

Next up, another FOREACH loop iterates over the one or more Destinations of each Rule. Instead of just applying a template (destTmpl) for every Destination we also pass in the corresponding CoordinateMappingSpec stored in the source attribute of the Rule. This is then used in the following template:

...
«DEFINE destTmpl(CoordinateMappingSpec source) FOR Destination-»
// ========== On «transition.name»
// ========== To «target.name» («target.controllername» in «target.bundleid»)
naviDest = new NaviDestination();
naviDest.action = "«transition.name»";
naviDest.targetClassname = "«target.controllername»";
naviDest.targetBundleId = "«target.bundleid»";
«FOREACH permissions AS p»«EXPAND permTmpl FOR p»«ENDFOREACH»
store("«source.controllername»", naviDest);
«ENDDEFINE»

«DEFINE permTmpl FOR PermissionReference-»
naviDest.requiredPermissions.add("«permission.value»");
«ENDDEFINE»

In these innermost templates the attributes of the CoordinateMappingSpec objects source and target are accessed and put into place to be assigned to the members of a NaviDestination Java object instance, one per Destination.
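The generated constructor calls a store helper that is never shown in the article. The following is a minimal sketch of what such a helper and a matching lookup could look like, assuming each source controller maps to a list of destinations. The class name NaviRulesSketch, the find method and the list-per-source layout are hypothetical, and the code uses generics and lambdas for brevity, so it would have to be adapted for a Java 1.4 or JavaME target.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the article does not show store(), so the structure
// below is an assumption, not the author's implementation.
class NaviRulesSketch {

    // Value holder mirroring the fields assigned by the generated code.
    static class NaviDestination {
        String action;
        String targetClassname;
        String targetBundleId;
        final List<String> requiredPermissions = new ArrayList<>();
    }

    // Source controller class name -> destinations reachable from it.
    private final Map<String, List<NaviDestination>> navigationRules = new HashMap<>();

    // Assumed behaviour: append, so one source can hold several destinations.
    void store(String sourceClassname, NaviDestination dest) {
        navigationRules.computeIfAbsent(sourceClassname, k -> new ArrayList<>()).add(dest);
    }

    // Hypothetical lookup by source controller and navigation action.
    NaviDestination find(String sourceClassname, String action) {
        List<NaviDestination> candidates =
                navigationRules.getOrDefault(sourceClassname, Collections.emptyList());
        for (NaviDestination d : candidates) {
            if (d.action.equals(action)) {
                return d;
            }
        }
        return null; // no rule defined for this source/action pair
    }

    // Builds the two Login rules shown in the generated sample output.
    static NaviRulesSketch sample() {
        NaviRulesSketch rules = new NaviRulesSketch();
        String login = "com.danielschneller.myapp.gui.login.LoginController";

        NaviDestination failed = new NaviDestination();
        failed.action = "USER_LOGON_FAILED";
        failed.targetClassname = "com.danielschneller.myapp.gui.login.LoginFailedController";
        failed.targetBundleId = "com.danielschneller.myapp.login";
        rules.store(login, failed);

        NaviDestination success = new NaviDestination();
        success.action = "USER_LOGON_SUCCESS";
        success.targetClassname = "com.danielschneller.myapp.gui.menu.MainMenuController";
        success.targetBundleId = "com.danielschneller.myapp.menu";
        rules.store(login, success);
        return rules;
    }

    public static void main(String[] args) {
        NaviDestination d = sample().find(
                "com.danielschneller.myapp.gui.login.LoginController", "USER_LOGON_SUCCESS");
        System.out.println(d.targetClassname + " in " + d.targetBundleId);
    }
}
```

Keeping a list per source matters here because the generated code stores two destinations under the same LoginController key; a plain put into a map would silently overwrite the first one.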
There is only one more (very simple) template for the PermissionReference elements. With this, the Xpand file is complete. Set Up The Generator Workflow The wizard initially created a NavigationRulesGenerator.mwe2 file in the workflow folder. Open it and replace its contents with the following: module workflow.NavigationRulesGenerator import org.eclipse.emf.mwe.utils.* var targetDir = "src-gen" var fileEncoding = "Cp1252" var modelPath = "src/model" Workflow { component = org.eclipse.xtext.mwe.Reader { path = modelPath // this class has been generated by the xtext generator register = com.danielschneller.navi.NavigationRulesStandaloneSetup {} load = { slot = "root" type = "Root" } } component = org.eclipse.xpand2.Generator { metaModel = org.eclipse.xtend.typesystem.emf.EmfRegistryMetaModel {} expand = "templates::NaviRules::main FOREACH root" outlet = { path = targetDir } fileEncoding = fileEncoding } } The most interesting parts of this workflow file are the load section in the Reader component and the expand and outlet sections in the Generator component: The first one will connect a so-called slot with the Root element from our navigation rules. The second one will trigger the evaluation of the main template in the NaviRules.xpt file in the templates folder and feed any Root instances it finds in the *.navi files from the src/model (modelPath) into it. Now it is time for some actual generation. Run the generator workflow Right click the MWE2 file you just edited and select the Run As MWE2 Workflow command from the context menu. The Eclipse console will show this output: 0 [main] DEBUG org.eclipse.xtext.mwe.Reader - Resource Pathes : [src/model] 431 [main] DEBUG xt.validation.ResourceValidatorImpl - Syntax check OK! 
Resource: file:/Users/ds/ws/ws36_xtext/com.danielschneller.navi.dsl.generator/src/model/MyApp.navi 1013 [main] INFO org.eclipse.xpand2.Generator - Written 1 files to outlet [default](src-gen) 1014 [main] INFO .emf.mwe2.runtime.workflow.Workflow - Done. Then have a look at the newly generated contents of the src-gen source folder. If everything went alright, you should find a fresh NaviRules.java file placed there, based on the contents of your navigation rules file and the Xpand templates. Try and make some changes to the template, then re-run the workflow. You will see the changes reflected in the generated source file. Generate a second source File In the templates directory add another Xpand template file Navigation.xpt with the following content: «IMPORT navigationRules»; «DEFINE main FOR Root-» «FILE "Navigation.java"-» public final class Navigation { «FOREACH ruledefs.rules.destinations.transition.collect(e|e.name).toSet().sortBy(e|e) AS t»«EXPAND actionTmpl FOR t»«ENDFOREACH» private final String name; private Navigation(String aName) { name = aName; } public String getName() { return name; } } «ENDFILE-» «ENDDEFINE» «DEFINE actionTmpl FOR String-» /** Constant for Navigation «this» */ public static final Navigation «this» = new Navigation("«this»"); «ENDDEFINE» This is a template for a type-safe enumeration that can be used in Java 1.4 - remember I had to do this for JavaME. Notice the FOREACH loop in this case. It demonstrates that not only simple iterations are possible, but that Xpand allows more complex operations as well. In this case it will collect the names of all the navigation transitions from all the Destinations in the navigation rules. These are of type String. They are made unique by converting them to a Set datastructure and then finally sorted in their natural order. The resulting list of sorted strings is then iterated, each one - called t - is passed to the actionTmpl template. 
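For readers more at home in Java than in Xpand expressions, the collect / toSet / sortBy chain can be approximated as follows. This is only an illustrative equivalent: the class and method names are made up, and the input list is a hand-written stand-in for the transition names that the template gathers from the Destinations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Illustrative Java counterpart of collect(e|e.name).toSet().sortBy(e|e).
class TransitionNames {

    // A TreeSet drops duplicates and keeps Strings in their natural order.
    static List<String> uniqueSorted(List<String> names) {
        return new ArrayList<>(new TreeSet<>(names));
    }

    public static void main(String[] args) {
        // Stand-in data: the same transition can appear in several rules.
        List<String> fromDestinations =
                Arrays.asList("OK", "BACK", "ADMIN", "OK", "DATA_LOOKUP", "BACK");
        System.out.println(uniqueSorted(fromDestinations));
        // prints [ADMIN, BACK, DATA_LOOKUP, OK]
    }
}
```

In the template, each name in that sorted list is what actionTmpl receives.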
It is very simple, just placing the string itself («this») into a single line of Java source code. Of course, strictly speaking this is a rather complicated procedure to get the same information we could also have taken from the TransitionDefinitions element in the rules definition. However I think it serves as a nice example for additional Xpand capabilities. For a full description of its possibilities, have a look at the Xpand Reference in the Eclipse documentation. To use the new template, add another section to the MWE2 workflow definition: component = org.eclipse.xpand2.Generator { metaModel = org.eclipse.xtend.typesystem.emf.EmfRegistryMetaModel {} expand = "templates::Navigation::main FOREACH root" outlet = { path = targetDir } fileEncoding = fileEncoding } Running it again will produce a slightly different output, making clear that two files have been generated. This is what comes out in the src-gen folder as Navigation.java: public final class Navigation { /** Constant for Navigation ADMIN */ public static final Navigation ADMIN = new Navigation("ADMIN"); /** Constant for Navigation BACK */ public static final Navigation BACK = new Navigation("BACK"); /** Constant for Navigation DATA_LOOKUP */ public static final Navigation DATA_LOOKUP = new Navigation("DATA_LOOKUP"); /** Constant for Navigation OK */ public static final Navigation OK = new Navigation("OK"); ... More... This was just about my first experiments with Xtext. I am sure there is plenty more to be done with it. For more reading, please have a look at this very nice Getting started with Xtext tutorial by Peter Friese of Itemis. From http://www.danielschneller.com/2010/08/code-generation-with-xtext.html
August 7, 2010
by Daniel Schneller
· 26,996 Views
Optimizing JPA Performance: An EclipseLink, Hibernate, and OpenJPA Comparison
'Impedance mismatch'. No two words better encompass the troubles, headaches and quirks most developers face when attempting to link applications to relational databases (RDBMS). But let's face it, object-oriented designs aren't going away anytime soon from mainstream languages and neither are the relational storage systems used in most applications. One side works with objects, while the other works with tables. Resolving these differences -- or as it's technically called, the 'object/relational impedance mismatch' -- can result in substantial overhead, which in turn can materialize into poor application performance.

In Java, the Java Persistence API (JPA) is one of the most popular mechanisms used to bridge the gap between objects (i.e. the Java language) and tables (i.e. relational databases). Though there are other mechanisms that allow Java applications to interact with relational databases -- such as JDBC and JDO -- JPA has gained wider adoption due to its underpinnings: Object Relational Mapping (ORM). ORM's gain in popularity is due precisely to it being specifically designed to address the interaction between objects and tables.

In the case of JPA, there is a standards body charged with setting its course, a process which has given rise to several JPA implementations. The three most popular are EclipseLink (evolved from TopLink), Hibernate and OpenJPA. But even though all three are based on the same standard, ORM is such a deep and complex topic that, beyond core functionality, each implementation has differences ranging from configuration to optimization techniques. What I will do next is explain a series of topics related to optimizing an application's use of JPA, using and comparing each of these JPA implementations.
While JPA is capable of automatically creating relational tables and can work with a range of relational database vendors, I will start from pre-existing data deployed on a MySQL relational database and rely on the Spring framework to facilitate the use of JPA. This will not only make for a fairer comparison, but also make the described techniques appealing to a wider audience, since performance issues become a serious concern once you have a large volume of data, and since MySQL and Spring are a common choice due to their community-driven (i.e. open-source) roots. See the source code/application section at the end for instructions on setting up the application code discussed in the remainder of the sections.

Download the Source Code associated with this article (~45 MB)

The basics: Metrics

In order to establish JPA performance levels in an application, it's vital to first obtain a series of metrics related to a JPA implementation's inner workings. These include things like: What are the actual queries being performed against an RDBMS? How long does each query take? Are queries being performed constantly against the RDBMS or is a cache being used? These metrics will be critical to our performance analysis, since they will shed light on the underlying operations performed by a JPA implementation and in the process show the effectiveness or ineffectiveness of certain techniques. In this area you will find the first differences among implementations, and I'm not talking about metric results, but about how to obtain these metrics in the first place.

To kick things off, I will first address the topic of logging. By default, all three JPA implementations discussed here -- EclipseLink, Hibernate and OpenJPA -- log the queries performed against an RDBMS, which will be an advantage in determining whether the queries performed by an ORM are optimal for a particular relational data model.
Nevertheless, tweaking the logging level of a JPA implementation further can be helpful for one of two things: Getting even more details from the underlying operations made by a JPA -- which can be turned off by default (e.g. database connection details) -- or getting no logging information at all -- which can benefit a production system's performance. Logging in JPA implementations is managed through one of several logging frameworks, such as Apache Commons Logging or Log4J. This requires the presence of such libraries in an application. Logging configuration of a JPA implementation is mostly done through a value in an application's persistence.xml file or in some cases, directly in a logging framework's configuration files. The following table describes JPA logging configuration parameters: Large table, so here's an external link In addition to the information obtained through logging, there is another set of JPA performance metrics which require different steps to be obtained. One of these metrics is the time it takes to perform a query. Even though some JPA implementations provide this information using certain configurations, some do not. Even so, I opted to use a separate approach and apply it to all three JPA implementations in question. After all, time metrics measured in milliseconds can be skewed in certain ways depending on start and end time criteria. So to measure query times, I will use Aspects with the aid of the Spring framework. Aspects will allow us to measure the time it takes a method containing a query to be executed, without mixing the timing logic with the actual query logic -- the last feature of which is the whole purpose of using Aspects. Further discussing Aspects would go beyond the scope of performance, so next I will concentrate on the Aspect itself. I advise you to look over the accompanying source code, Aspects and Spring Aspects for more details on these topics and their configuration. 
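The core idea, timing a call from the outside so the query method itself contains no timing code, can also be sketched with nothing but the JDK. The example below uses a java.lang.reflect.Proxy instead of AspectJ and Spring, so it is a swapped-in technique for illustration and not the setup used in this article. PlayerService, InMemoryPlayerService and timed are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;

// JDK-only illustration of "around" timing; not the article's AspectJ setup.
class TimingProxyDemo {

    // Hypothetical service interface standing in for a class with JPA query methods.
    interface PlayerService {
        List<String> findAllNames();
    }

    // Stand-in implementation; a real one would run a JPA query.
    static class InMemoryPlayerService implements PlayerService {
        public List<String> findAllNames() {
            return Arrays.asList("John Smith", "Jane Doe");
        }
    }

    // Wraps target so every call through iface is timed and logged.
    static <T> T timed(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args); // the actual "query" call
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow what the target itself threw
            } finally {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.println(target.getClass().getSimpleName() + " - "
                        + method.getName() + ": " + elapsedMs + "ms");
            }
        };
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler);
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        PlayerService service = timed(new InMemoryPlayerService(), PlayerService.class);
        System.out.println(service.findAllNames().size() + " names returned");
    }
}
```

The proxy only works for calls made through an interface, which is one reason an aspect framework is the more convenient choice once many service classes are involved.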
The following Aspect is used for measuring execution times in query methods.

package com.webforefront.aop;

import org.apache.commons.lang.time.StopWatch;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Pointcut;
import org.aspectj.lang.annotation.Aspect;

@Aspect
public class DAOInterceptor {
    private Log log = LogFactory.getLog(DAOInterceptor.class);

    @Around("execution(* com.webforefront.jpa.service..*.*(..))")
    public Object logQueryTimes(ProceedingJoinPoint pjp) throws Throwable {
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        Object retVal = pjp.proceed();
        stopWatch.stop();
        // Take the simple class name out of the default "package.Class@hash" toString() form
        String str = pjp.getTarget().toString();
        log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": " + stopWatch.getTime() + "ms");
        return retVal;
    }
}

The main part of the Aspect is the @Around annotation. The value assigned to this annotation indicates that the aspect method -- logQueryTimes -- should be executed each time a method belonging to a class in the com.webforefront.jpa.service package is executed -- this last package is where all our application's JPA query methods will reside. The logQueryTimes aspect method calculates the execution time and outputs it as logging information using Apache Commons Logging.

Another set of important JPA metrics is related to statistics beyond those provided by standard logging. The statistics I'm referring to are things related to caches, sessions and transactions. Since the JPA standard doesn't dictate any particular approach to statistics, each JPA implementation also varies in the type of statistics it collects and the way it collects them. Both Hibernate and OpenJPA have their own statistics class, whereas EclipseLink relies on a Profiler to gather similar metrics.
Since I'm already relying on Aspects, I will also use an Aspect to obtain statistics both prior and after the execution of a JPA query method. The following Aspect obtains statistics for an application relying on Hibernate. package com.webforefront.aop;import org.hibernate.stat.Statistics;import org.hibernate.SessionFactory;import org.aspectj.lang.ProceedingJoinPoint;import org.aspectj.lang.annotation.Around;import org.aspectj.lang.annotation.Aspect;import org.springframework.beans.factory.annotation.Autowired;import javax.persistence.EntityManagerFactory;import org.hibernate.ejb.HibernateEntityManagerFactory;import org.apache.commons.logging.Log;import org.apache.commons.logging.LogFactory;@Aspectpublic class CacheHibernateInterceptor { private Log log = LogFactory.getLog(DAOInterceptor.class); @Autowired private EntityManagerFactory entityManagerFactory; @Around("execution(* com.webforefront.jpa.service..*.*(..))") public Object log(ProceedingJoinPoint pjp) throws Throwable { HibernateEntityManagerFactory hbmanagerfactory = (HibernateEntityManagerFactory) entityManagerFactory; SessionFactory sessionFactory = hbmanagerfactory.getSessionFactory(); Statistics statistics = sessionFactory.getStatistics(); String str = pjp.getTarget().toString(); statistics.setStatisticsEnabled(true); log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": (Before call) " + statistics); Object result = pjp.proceed(); log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": (After call) " + statistics); return result; } } Notice the similar structure to the prior timing Aspect, except in this case the logging output contains values that belong to the Statistics Hibernate class obtained via the application's EntityManagerFactory. The next Aspect is used to obtain statistics for an application relying on OpenJPA. 
package com.webforefront.aop;import org.apache.openjpa.datacache.CacheStatistics;import org.apache.openjpa.persistence.OpenJPAEntityManagerFactory;import org.apache.openjpa.persistence.OpenJPAPersistence;import org.aspectj.lang.ProceedingJoinPoint;import org.aspectj.lang.annotation.Around;import org.aspectj.lang.annotation.Aspect;import org.springframework.beans.factory.annotation.Autowired;import javax.persistence.EntityManagerFactory;import org.apache.commons.logging.Log;import org.apache.commons.logging.LogFactory;@Aspectpublic class CacheOpenJPAInterceptor { private Log log = LogFactory.getLog(DAOInterceptor.class); @Autowired private EntityManagerFactory entityManagerFactory; @Around("execution(* com.webforefront.jpa.service..*.*(..))") public Object log(ProceedingJoinPoint pjp) throws Throwable { OpenJPAEntityManagerFactory ojpamanagerfactory = OpenJPAPersistence.cast(entityManagerFactory); CacheStatistics statistics = ojpamanagerfactory.getStoreCache().getStatistics(); String str = pjp.getTarget().toString(); log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": (Before call) Statistics [start time=" + statistics.start() + ",read count=" + statistics.getReadCount() + ",hit count=" + statistics.getHitCount() +",write count=" + statistics.getWriteCount() + ",total read count=" + statistics.getTotalReadCount() + ",total hit count=" + statistics.getTotalHitCount() +",total write count=" + statistics.getTotalWriteCount()); Object result = pjp.proceed(); log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": (After call) Statistics [start time=" + statistics.start() + ",read count=" + statistics.getReadCount() + ",hit count=" + statistics.getHitCount() +",write count=" + statistics.getWriteCount() + ",total read count=" + statistics.getTotalReadCount() + ",total hit count=" + statistics.getTotalHitCount() +",total write count=" + 
statistics.getTotalWriteCount()); return result; } }

Once again, notice that this Aspect has a similar structure to the previous one and likewise relies on an application's EntityManagerFactory. In this case, the logging output contains values that belong to the CacheStatistics OpenJPA class. Since OpenJPA does not enable statistics by default, you will need to add the following two properties to an application's persistence.xml file: The first property ensures statistics are gathered, while the second property indicates that the gathering of statistics takes place on a single JVM. NOTE: The value "true(EnableStatistics=true)" also enables caching in addition to statistics.

Since EclipseLink doesn't have any particular statistics class and relies on a Profiler to determine advanced metrics, the simplest way to obtain statistics similar to those of Hibernate and OpenJPA is through the Profiler itself. To activate EclipseLink's Profiler you just need to add the following property to an application's persistence.xml file: . By doing so, the EclipseLink Profiler outputs several metrics on each JPA query method execution as logging information.

Now that you know how to obtain several metrics from all three JPA implementations and understand they will be obtained as fairly as possible for all three providers, it's time to put each JPA implementation to the test along with several performance techniques.

JPQL queries, weaving and class transformations

Let's start by making a query that retrieves data belonging to a pre-existing RDBMS table named "Master". The "Master" table contains over 17,000 records belonging to baseball players. To simplify matters, I will create a Java class named "Player" and map it to the "Master" table in order to retrieve the records as objects.
Next, relying on the Spring framework's JpaTemplate functionality, I will set up a query to retrieve all "Player" objects, with the query taking the following form: getJpaTemplate().find("select e from Player e"); See the accompanying source code for more details on this last process.

Next, I deploy the application using each of the three JPA implementations on Apache Tomcat, doing so separately, as well as starting and stopping the server on each deployment to ensure fair results. These are the results of doing so on a 64-bit Ubuntu box with 4 GB of RAM, using Java 1.6:

All player objects - 17,468 records

Implementation | Time | Query
Hibernate | 3558 ms | select player0_.lahmanID as lahmanID0_, player0_.nameFirst as nameFirst0_, player0_.nameLast as nameLast0_ from Master player0_
EclipseLink (Run-time weaver - Spring ReflectiveLoadTimeWeaver weaver) | 3215 ms | SELECT lahmanID, nameLast, nameFirst FROM Master
EclipseLink (Build-time weaving) | 3571 ms | SELECT lahmanID, nameLast, nameFirst FROM Master
EclipseLink (No weaving) | 3996 ms | SELECT lahmanID, nameLast, nameFirst FROM Master
OpenJPA (Build-time enhanced classes) | 5998 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0
OpenJPA (Run-time enhanced classes - OpenJPA enhancer) | 6136 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0
OpenJPA (Non enhanced classes) | 7677 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0

As you can observe, the queries performed by each JPA implementation are fairly similar, with two of them using a shortcut notation (e.g. t0 and player0_ for the table named 'Master'). This syntax variation has minimal impact on performance, since directly querying an RDBMS using any of these notation variations shows identical results. However, the query times obtained through the JPA implementations under their different configurations vary considerably. One important factor behind this time difference is how each implementation handles JPA entities.
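To put the reported times side by side, the short program below expresses each one as a percentage above the fastest run. It is plain arithmetic on the numbers listed above; the class name, method names and shortened labels are made up for this illustration, and single measurements like these say nothing about run-to-run variance.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Arithmetic on the timings reported above; labels are shortened for display.
class QueryTimeComparison {

    // Query times in milliseconds, copied from the results above.
    static Map<String, Integer> reportedMillis() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("Hibernate", 3558);
        m.put("EclipseLink (run-time weaving)", 3215);
        m.put("EclipseLink (build-time weaving)", 3571);
        m.put("EclipseLink (no weaving)", 3996);
        m.put("OpenJPA (build-time enhanced)", 5998);
        m.put("OpenJPA (run-time enhanced)", 6136);
        m.put("OpenJPA (not enhanced)", 7677);
        return m;
    }

    // Smallest value in the map, used as the comparison baseline.
    static int fastest(Map<String, Integer> timings) {
        int best = Integer.MAX_VALUE;
        for (int ms : timings.values()) {
            best = Math.min(best, ms);
        }
        return best;
    }

    // How much slower millis is than baseline, in percent of the baseline.
    static double percentSlower(int millis, int baseline) {
        return 100.0 * (millis - baseline) / baseline;
    }

    public static void main(String[] args) {
        Map<String, Integer> timings = reportedMillis();
        int baseline = fastest(timings);
        for (Map.Entry<String, Integer> e : timings.entrySet()) {
            System.out.printf("%-34s %5d ms  +%.1f%%%n",
                    e.getKey(), e.getValue(), percentSlower(e.getValue(), baseline));
        }
    }
}
```

Seen this way, the spread inside each implementation is small next to the gap between implementations: the slowest run (OpenJPA without enhancement) took more than twice as long as the fastest (EclipseLink with run-time weaving).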
Let's start with the OpenJPA implementation, which had the poorest times. OpenJPA can execute an enhancement process on Java entities (e.g. in this case the 'Player' class). This enhancement process can be performed when the entities are built, performed at run-time, or forgone altogether. As you can observe, forgoing entity enhancement altogether in OpenJPA produced the longest query times, whereas enhancing entities at either build-time or run-time produced relatively better results, with the former beating out the latter.

By default, OpenJPA expects entities to be enhanced. This means you will either need to explicitly configure an application to support unenhanced classes by adding the following: ...property to an application's persistence.xml file or enhance classes at build-time or at run-time relying on the OpenJPA enhancer, otherwise an application relying on OpenJPA will throw an error. Given these OpenJPA results, the remaining OpenJPA tests will be based on build-time enhanced entity classes. For more on the topic of OpenJPA enhancement, refer to the OpenJPA documentation in addition to consulting the accompanying source code for this article.

You may be wondering what exactly constitutes OpenJPA enhancement. OpenJPA entity enhancement is a processing step applied to the bytecode generated by the Java compiler, which adds JPA specific instructions to provide optimal runtime performance. These instructions can include things like flexible lazy loading and dirty tracking.

So why don't Hibernate and EclipseLink enhance entities? In short, Hibernate and EclipseLink also enhance JPA entities, they just don't outright call it 'enhancement'. EclipseLink calls this 'enhancement' process by the more technical term: weaving. Similar to OpenJPA's enhancement process, weaving in EclipseLink can take place at build-time (a.k.a. static weaving) or at run-time, or be forgone altogether.
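To make the idea of change tracking more concrete, here is a hand-written, conceptual sketch of the kind of bookkeeping that enhancement or weaving adds. It is not what OpenJPA or EclipseLink actually generate: real enhancers rewrite the compiled bytecode of the entity class itself, whereas this illustration uses an ordinary subclass. TrackedPlayer and dirtyFields are hypothetical names, and the Player fields simply follow the column names in the queries above.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Conceptual sketch only: real enhancers and weavers modify the entity's
// bytecode; the subclass below is a hand-written stand-in for that step.
class ChangeTrackingSketch {

    // The entity as a developer would write it.
    static class Player {
        private String nameFirst;
        private String nameLast;

        String getNameFirst() { return nameFirst; }
        void setNameFirst(String nameFirst) { this.nameFirst = nameFirst; }
        String getNameLast() { return nameLast; }
        void setNameLast(String nameLast) { this.nameLast = nameLast; }
    }

    // Hypothetical illustration of the bookkeeping that change tracking adds.
    static class TrackedPlayer extends Player {
        private final Set<String> dirtyFields = new LinkedHashSet<>();

        @Override
        void setNameFirst(String value) {
            if (!Objects.equals(getNameFirst(), value)) {
                dirtyFields.add("nameFirst"); // remember that this field changed
            }
            super.setNameFirst(value);
        }

        @Override
        void setNameLast(String value) {
            if (!Objects.equals(getNameLast(), value)) {
                dirtyFields.add("nameLast");
            }
            super.setNameLast(value);
        }

        // A provider could consult this to write only the changed columns.
        Set<String> dirtyFields() { return dirtyFields; }
    }

    public static void main(String[] args) {
        TrackedPlayer player = new TrackedPlayer();
        player.setNameFirst("John");
        player.setNameFirst("John"); // same value again: nothing new is recorded
        System.out.println(player.dirtyFields()); // prints [nameFirst]
    }
}
```

Without this kind of bookkeeping a provider has to find changes some other way, for example by comparing an entity against a saved copy of its state, which is the sort of extra work the unenhanced and unwoven runs may be paying for.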
As you can observe in the results, all of EclipseLink's tests present smaller variations compared to OpenJPA. The longest EclipseLink time involved not using weaving. If you think about it, this is rather logical given that the purpose of weaving is to alter Java byte code in order to add optimized JPA instructions that include lazy loading, change tracking, fetch groups and internal optimizations. For the EclipseLink tests using weaving, both build-time and run-time weaving present better results. For build-time weaving, I used EclipseLink's library along with an Apache Ant task, whereas for run-time weaving, I used the Spring framework's ReflectiveLoadTimeWeaver. I can only assume the slightly better performance of run-time weaving over build-time weaving in EclipseLink was due to using a weaver integrated with the Spring framework, which in turn could result in better JPA optimizations designed for Spring applications. Nevertheless, considering the test result of forgoing weaving altogether, weaving does not appear to have a major performance impact when using EclipseLink, ceteris paribus.

By default, EclipseLink expects run-time weaving to be enabled, otherwise you will receive an error in the form 'Cannot apply class transformer without LoadTimeWeaver specified'. This means that for cases using build-time weaving or no weaving at all, you will need to explicitly indicate this behavior. In order to disable EclipseLink weaving you will need to either configure an application's EntityManagerFactory Spring bean with: ... or add the .... ...property to an application's persistence.xml file. To indicate an application's entities are built using build-time weaving, substitute the previous property's "false" value with "static". To configure the default run-time weaver expected by EclipseLink, add the following: ...property to an application's EntityManagerFactory Spring bean.
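As a rough illustration of the switches discussed so far, the sketch below collects them as plain Java property maps of the kind that can be handed to javax.persistence.Persistence.createEntityManagerFactory(String, Map). The property names eclipselink.weaving (values "true", "false", "static") and openjpa.RuntimeUnenhancedClasses (value "supported") come from the EclipseLink and OpenJPA documentation as I understand it, not from this article's source code; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds provider-specific JPA property maps (an illustration only). */
final class EntityProcessingProperties {

    /** EclipseLink weaving modes and the string value each one maps to. */
    enum Weaving {
        RUNTIME("true"), STATIC("static"), DISABLED("false");

        final String value;

        Weaving(String value) {
            this.value = value;
        }
    }

    /** Properties selecting how EclipseLink weaves entities. */
    static Map<String, String> eclipseLink(Weaving mode) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("eclipselink.weaving", mode.value);
        return props;
    }

    /** Properties telling OpenJPA to accept classes that were never enhanced. */
    static Map<String, String> openJpaUnenhanced() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("openjpa.RuntimeUnenhancedClasses", "supported");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(eclipseLink(Weaving.STATIC));
        System.out.println(openJpaUnenhanced());
    }
}
```

The same name/value pairs could equally be declared as property entries in persistence.xml; a map simply keeps the example free of any JPA dependency.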
Given these EclipseLink results, the remaining EclipseLink tests will be based on run-time weaving provided by the Spring framework. For more on the topic of EclipseLink weaving, refer to the EclipseLink documentation at http://wiki.eclipse.org/Introduction_to_EclipseLink_Application_Development_(ELUG)#Using_Weaving, in addition to consulting the accompanying source code for this article.

Hibernate requires neither enhancing JPA entities nor weaving. For this reason, there is only one test result. This not only makes Hibernate simpler to set up, but judging by its only test result -- which clocks in at second place with respect to all other tests -- Hibernate's performance ranks high compared to its counterparts. However, in what I would consider Hibernate's equivalent to OpenJPA's enhancement process or EclipseLink's weaving, you will find a series of Hibernate properties. For example, Hibernate has properties such as hibernate.default_batch_fetch_size designed to optimize lazy loading. As you might recall, one of the purposes of both OpenJPA's enhancement process and EclipseLink's weaving is the optimization of lazy loading. So whereas OpenJPA and EclipseLink require a separate and monolithic step -- at build-time or run-time -- to achieve JPA optimization techniques, Hibernate falls back to the use of granular properties specified in an application's persistence.xml file. Nevertheless, given that Hibernate's default behavior proved to be on par with the best query times, I didn't feel a need to explore these Hibernate properties further.

To get another sense of the times and mapping procedures of each JPA implementation, I will make more selective queries based on a Player object's first name and last name. These are the results of performing a query for all Player objects whose first name is John and a query for all Player objects whose last name is Smith.
All player objects whose first name is John - 472 records

Implementation | Time | Query
EclipseLink | 1265 ms | SELECT lahmanID, nameLast, nameFirst FROM Master WHERE (nameFirst = ?)
Hibernate | 613 ms | select player0_.lahmanID as lahmanID0_, player0_.nameFirst as nameFirst0_, player0_.nameLast as nameLast0_ from Master player0_ where player0_.nameFirst=?
OpenJPA | 1643 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 WHERE (t0.nameFirst = ?) [params=?]

All player objects whose last name is Smith - 146 records

Implementation | Time | Query
EclipseLink | 986 ms | SELECT lahmanID, nameLast, nameFirst FROM Master WHERE (nameLast = ?)
Hibernate | 537 ms | select player0_.lahmanID as lahmanID0_, player0_.nameFirst as nameFirst0_, player0_.nameLast as nameLast0_ from Master player0_ where player0_.nameLast=?
OpenJPA | 1452 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 WHERE (t0.nameLast = ?) [params=?]

These test results tell a slightly different story, with all three JPA implementations presenting substantial time differences amongst one another. At a lower record count, Hibernate's out-of-the-box configuration produced queries roughly twice as fast as its closest competitor and almost three times as fast as its other competitor.

To get an even broader sense of the times and mapping procedures of each JPA implementation, I will make a query on a single Player object based on its id. These are the results of performing such a query.

Single player object whose ID is 777 - 1 record

Implementation | Time | Query
EclipseLink | 521 ms | SELECT lahmanID, nameLast, nameFirst FROM Master WHERE (lahmanID = ?)
Hibernate | 157 ms | select player0_.lahmanID as lahmanID0_0_, player0_.nameFirst as nameFirst0_0_, player0_.nameLast as nameLast0_0_ from Master player0_ where player0_.lahmanID=?
OpenJPA | 1052 ms | SELECT t0.nameFirst, t0.nameLast FROM Master t0 WHERE t0.lahmanID = ? [params=?]
With the exception of the faster query times -- due to it being a query for a single Player object -- the times between JPA implementations are practically in proportion to the queries used for extracting multiple Player objects by first and last name. This will do it as far as test queries are concerned.

However, a word of caution is in order when discussing these topics on optimization/enhancement/weaving. Even though the previous tests consisted of querying over 17,000 records and confirm clear advantages of using one provider and technique over another, they are still one dimensional, since they're based on read operations performed on a single object type and a single RDBMS table. JPA can perform a large array of operations that also include updating, writing and deleting RDBMS records, not to mention the execution of more elaborate queries that can span multiple objects and tables. In addition, RDBMS themselves can have influencing factors (e.g. indexes) over JPA query times. So all this said, it's not too far-fetched to think that the use of OpenJPA entity enhancement, EclipseLink weaving or Hibernate properties could have varying degrees of impact -- either beneficial or detrimental -- depending on the queries (i.e. multi-table, multi-object) and type of JPA operation (i.e. read, write, update, delete) involved.

Next, I will describe one of the most popular techniques used to boost performance in JPA applications.

Caches

A cache allows data to remain closer to an application's tier without constantly polling an RDBMS for the same data. I titled the section in plural -- caches -- because there can be several caches involved in an application using JPA. This of course doesn't mean you have to configure or use all the caches provided by an application relying on JPA, but properly configuring caches can go a long way toward enhancing an application's JPA performance.
So let's start by analyzing what each JPA implementation offers in its out-of-the-box state in terms of caching. The following table illustrates tests done by simply invoking the previous JPA queries for a second and third consecutive time, without stopping the server. Note that the same process of deploying a single application at once was used, in addition to the server being re-started on each set of tests.

Query / Implementation | EclipseLink | Hibernate | OpenJPA
All records (1st time) | 3215 ms | 3558 ms | 5998 ms
All records (2nd time) | 507 ms | 272 ms | 521 ms
All records (3rd time) | 439 ms | 218 ms | 263 ms
First name (1st time) | 1265 ms | 613 ms | 1643 ms
First name (2nd time) | 151 ms | 115 ms | 239 ms
First name (3rd time) | 154 ms | 101 ms | 227 ms
Last name (1st time) | 986 ms | 537 ms | 1452 ms
Last name (2nd time) | 41 ms | 41 ms | 112 ms
Last name (3rd time) | 65 ms | 38 ms | 117 ms
By ID (1st time) | 521 ms | 157 ms | 1052 ms
By ID (2nd time) | 1 ms | 6 ms | 3 ms
By ID (3rd time) | 1 ms | 3 ms | 3 ms

As you can observe, on both the second and third invocation all the queries show substantial improvements with respect to the first invocation. The primary cause for these improvements is unequivocally the use of a cache. But what type of cache exactly? Could it be an RDBMS's own caching engine? JPA? Spring? Or some other variation? In order to shed some light on cache usage, the following table illustrates the cache statistics generated on each of the previous JPA queries.
Query / Implementation: statistics reported by EclipseLink, Hibernate and OpenJPA

All records (2nd time)
EclipseLink: number of objects=17468, total time=506, local time=506, row fetch=65, object building=328, cache=112, sql execute=47, objects/second=34521
Hibernate: sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=34936, queries executed to database=2, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

All records (3rd time)
EclipseLink: number of objects=17468, total time=435, local time=435, profiling time=1, row fetch=28, object building=323, cache=106, logging=1, sql execute=27, objects/second=40156
Hibernate: sessions opened=3, sessions closed=3, connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=52404, queries executed to database=3, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

First name (2nd time)
EclipseLink: number of objects=472, total time=148, local time=148, row fetch=27, object building=106, cache=7, logging=1, sql execute=3, objects/second=3189
Hibernate: sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=944, queries executed to database=2, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

First name (3rd time)
EclipseLink: number of objects=472, total time=152, local time=152, row fetch=20, object building=121, cache=7, sql execute=3, objects/second=3105
Hibernate: sessions opened=3, sessions closed=3, connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=1416, queries executed to database=3, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

Last name (2nd time)
EclipseLink: number of objects=146, total time=40, local time=40, row fetch=7, object building=27, cache=2, logging=1, sql execute=3, objects/second=3650
Hibernate: sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=292, queries executed to database=2, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

Last name (3rd time)
EclipseLink: number of objects=146, total time=63, local time=63, profiling time=1, row fetch=6, object building=19, cache=5, sql prepare=1, sql execute=23, objects/second=2317
Hibernate: sessions opened=3, sessions closed=3, connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=438, queries executed to database=3, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

By ID (2nd time)
EclipseLink: number of objects=1, total time=1, local time=1, time/object=1, objects/second=1000
Hibernate: sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=2, queries executed to database=0, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

By ID (3rd time)
EclipseLink: number of objects=1, total time=1, local time=1, time/object=1, objects/second=1000
Hibernate: sessions opened=3, sessions closed=3, connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=3, queries executed to database=0, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

Notice the statistics generated by each JPA implementation are different.
EclipseLink reports a single cache statistic, OpenJPA doesn't even report statistics unless a cache is enabled -- see previous section on metrics for details on this behavior -- and Hibernate reports two cache related statistics: second level cache and query cache.

At this juncture, if you look at the test results and statistics for the second and third invocation, something won't add up. How is it that OpenJPA's test results came out faster when caching is disabled by default? And how about Hibernate returning 0's on its cache related statistics, even when its test results came out faster? The reason for this performance increase is RDBMS caching. On the first query, the RDBMS needs to read data from its own file system (i.e. perform an I/O operation); on subsequent requests the data is present in RDBMS memory (i.e. its cache), making the entire JPA query much faster. A closer look at the Hibernate statistics field 'queries executed to database' can confirm this. Notice that on every second query it shows 2 and on every third query it shows 3, meaning the data was read directly from the database. NOTE: The only exception to this occurs when a query is made on a single entity (i.e. by id); I will address this shortly.

Next, let's start breaking down the caches you will encounter when using JPA applications. The JPA 2.0 standard defines two types of caches: a first level cache and a second level cache. The first level cache or EntityManager cache is used to properly handle JPA transactions. A first level cache only exists for the duration of the EntityManager. With the exception of long lived operations performed against an RDBMS, JPA EntityManagers are short lived and are created and destroyed per request or per transaction. In this case, given the nature of the queries, first level caches are cleared on every query. A second level cache on the other hand is a broader cache that can be used across transactions and users.
This makes a JPA second level cache more powerful, since it can avoid constantly polling an RDBMS for the same data. But even though the JPA 2.0 standard now addresses second level cache features, this was not the case in JPA 1.0. In the 1.0 version of the JPA standard only a first level cache was addressed, leaving the door completely open on the topic of a second level cache. This created a fragmented approach to caching in JPA implementations; even now, as JPA 2.0 compliant implementations emerge, some non-standard features continue to be part of certain implementations, given the value they provide to JPA caching in general. So as I move forward, bear in mind that just like previous JPA topics, each JPA implementation can have its own particular way of dealing with second level caching.

I will start with OpenJPA, which has the fewest proprietary caching options. To enable OpenJPA caching (i.e. second level caching) you need to declare the following two properties in an application's persistence.xml file: The first property ensures caching and statistics are activated, while the second property is used to indicate that caching takes place on a single JVM. The following results and statistics were obtained with OpenJPA's second level cache enabled.
Query with OpenJPA caching | Time | Statistics | Time without statistics
All records (2nd time) | 420 ms | read count=34936, hit count=17468, write count=17468, total read count=34936, total hit count=17468, total write count=17468 | 347 ms
All records (3rd time) | 254 ms | read count=52404, hit count=34936, write count=17468, total read count=52404, total hit count=34936, total write count=17468 | 230 ms
First name (2nd time) | 125 ms | read count=944, hit count=472, write count=472, total read count=944, total hit count=472, total write count=472 | 127 ms
First name (3rd time) | 114 ms | read count=1416, hit count=944, write count=472, total read count=1416, total hit count=944, total write count=472 | 132 ms
Last name (2nd time) | 63 ms | read count=292, hit count=146, write count=146, total read count=292, total hit count=146, total write count=146 | 53 ms
Last name (3rd time) | 49 ms | read count=438, hit count=292, write count=146, total read count=438, total hit count=292, total write count=146 | 50 ms
By ID (2nd time) | 5 ms | read count=2, hit count=1, write count=1, total read count=2, total hit count=1, total write count=1 | 1 ms
By ID (3rd time) | 4 ms | read count=3, hit count=2, write count=1, total read count=3, total hit count=2, total write count=1 | 1 ms

As these test results illustrate, executing subsequent JPA queries with OpenJPA's second level cache produces superior results. Another important behavior illustrated in some of these test cases is that by simply disabling statistics -- and still using the second level cache -- query times improve even more. The OpenJPA statistics also demonstrate how the cache is being used. Notice that on each subsequent query the statistics field 'hit count' grows by the number of records returned (doubling between the second and third invocation), which means data is being read from the cache (i.e. a hit). Also notice the statistics field 'write count' remains static, which means data is only written once from the RDBMS to the cache. This is pretty basic functionality for a second level cache.
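To make the counter behavior concrete, here is a minimal sketch of a read-through cache that keeps the same three counters. It is a toy model written for this explanation, not OpenJPA's implementation; the class name, the loader function standing in for the RDBMS, and the id range are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

/** Toy read-through cache with read/hit/write counters (illustrative, not OpenJPA code). */
final class CountingSecondLevelCache<V> {
    private final Map<Integer, V> entries = new HashMap<>();
    private final IntFunction<V> loader; // stands in for reading one row from the RDBMS

    int readCount;  // every lookup against the cache
    int hitCount;   // lookups answered from the cache
    int writeCount; // entries copied from the "database" into the cache

    CountingSecondLevelCache(IntFunction<V> loader) {
        this.loader = loader;
    }

    /** Looks up one entity, loading and caching it on a miss. */
    V get(int id) {
        readCount++;
        V cached = entries.get(id);
        if (cached != null) {
            hitCount++;
            return cached;
        }
        V loaded = loader.apply(id);
        entries.put(id, loaded);
        writeCount++;
        return loaded;
    }

    /** Simulates a query returning `count` entities with consecutive ids. */
    List<V> getAll(int firstId, int count) {
        List<V> result = new ArrayList<>(count);
        for (int id = firstId; id < firstId + count; id++) {
            result.add(get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        CountingSecondLevelCache<String> cache =
                new CountingSecondLevelCache<>(id -> "player-" + id);
        for (int run = 1; run <= 3; run++) {
            cache.getAll(1, 146); // same 146-record query, repeated
            System.out.println("run " + run + ": read count=" + cache.readCount
                    + ", hit count=" + cache.hitCount
                    + ", write count=" + cache.writeCount);
        }
    }
}
```

Run against 146 records, this model ends the second invocation at read count=292, hit count=146, write count=146 and the third at 438, 292 and 146, the same progression as the 'Last name' rows in the table.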
On certain occasions a need may arise to interact directly with a cache. These interactions can range from prohibiting an entity from being cached, assigning a particular amount of memory to a cache, forcing an entity to always be cached, flushing all the data contained in a cache, or even plugging in a third party caching solution to provide a more robust strategy, among other things. The JPA 2.0 standard provides a very basic feature set in terms of second level caching through javax.persistence.Cache. Upon consulting this interface, you'll realize it only provides four methods charged with verifying the presence of entities and evicting them. This feature set not only proves to be limited, but also cumbersome since it can only be leveraged programmatically (i.e. through an API). In this sense, and as I've already mentioned, JPA implementations have provided a series of features ranging from persistence.xml properties to Java annotations related to second level caching.

OpenJPA offers several of these second level caching features, including a separate and supplemental cache called a 'query cache' which can further improve JPA performance. For such cases, I will point you directly to OpenJPA's cache documentation available at http://openjpa.apache.org/builds/apache-openjpa-2.1.0-SNAPSHOT/docs/manual/ref_guide_caching.html#ref_guide_cache_query so you can try these parameters for yourself on the accompanying application source code.

Hibernate, just like OpenJPA, has its second level cache disabled by default. To enable Hibernate's second level cache you need to add the following properties to an application's persistence.xml file: It's worth mentioning that Hibernate has integral support for other second level caches.
The previous properties showed how to enable the HashtableCacheProvider cache -- the simplest of the integral second level caches -- but Hibernate also provides support for five additional caches: EHCache, OSCache, SwarmCache, JBoss cache 1 and JBoss cache 2, all of which provide distinct features, albeit requiring additional configuration. Besides these properties, Hibernate also requires that each JPA entity be declared with a caching strategy. In this case, since the Player entity is read only, a caching strategy like the following would be used: Similar to OpenJPA, Hibernate also offers several second level caching features through proprietary annotations and configurations, as well as support for the separate and supplemental cache called a 'query cache' which can further improve JPA performance. For such cases, I will also point you directly to Hibernate's cache documentation available at http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html#performance-cache so you can try these parameters for yourself on the accompanying application source code.

Unlike OpenJPA and Hibernate, EclipseLink's second level cache is enabled by default, therefore there is no need to provide any additional configuration. However, similar to its counterparts, EclipseLink also has a series of proprietary second level cache features which can enhance JPA performance. You can find more information on these features by consulting EclipseLink's cache documentation available at: http://wiki.eclipse.org/Introduction_to_Cache_(ELUG)

With this we bring our discussion on object relational mapping performance with JPA to a close. I hope you found the various tests and metrics presented here a helpful aid in making decisions about your own JPA applications. In addition, don't forget you can rely on the accompanying source code to try out several JPA variations better suited to your circumstances.
About the author

Daniel Rubio is an independent technology consultant specializing in enterprise and web-based software. He blogs regularly on these and other software areas at http://www.webforefront.com. He's also authored and co-authored three books on Java technology.

Source code/Application installation

* Install MySQL on your workstation (Tested on MySQL 5.1.37-64 bits) - http://dev.mysql.com/downloads/
* Install data set on MySQL
- Go to http://www.baseball-databank.org/ and click on the link titled 'Database in MySQL form'. This will download a zipped file with a series of MySQL data structures containing baseball statistics.
- First create a MySQL database to host the data using the command: 'mysqladmin -p create jpaperformance'. This will create a database named 'jpaperformance'.
- Next, load the baseball statistics using the following command: 'mysql -p -D jpaperformance < BDB-sql-2009-11-25.sql' where 'BDB-sql-2009-11-25.sql' represents the unzipped SQL script obtained by extracting the zip file you downloaded.
* Create JPA application WARs
- The download includes source code, library dependencies and an Ant build file. This includes all three JPA implementations: Hibernate 3.5.3, EclipseLink 2.1 and OpenJPA 2.1.
- To build the JPA Hibernate WAR - ant hibernate
- To build the JPA EclipseLink WAR - ant eclipselink
- To build the JPA OpenJPA WAR - ant openjpa
- All builds are placed under the dist/ directory.
* Deploy to Tomcat 6.0.26
- Copy the MySQL Java driver and Spring Tomcat Weaver -- included in the download directory 'tomcat_jar_deps' -- to Apache Tomcat's /lib directory.
- Copy each JPA application WAR to Apache Tomcat's /webapps directory, as needed.
* Deployment URLs
- http://localhost:8080/hibernate/hibernate/home (Query all Player objects)
- http://localhost:8080/eclipselink/eclipselink/home (Query all Player objects)
- http://localhost:8080/openjpa/openjpa/home (Query all Player objects)
- http://localhost:8080/hibernate/hibernate/firstname/ (Query Player objects by first name)
- http://localhost:8080/eclipselink/eclipselink/firstname/ (Query Player objects by first name)
- http://localhost:8080/openjpa/openjpa/firstname/ (Query Player objects by first name)
- http://localhost:8080/hibernate/hibernate/lastname/ (Query Player objects by last name)
- http://localhost:8080/eclipselink/eclipselink/lastname/ (Query Player objects by last name)
- http://localhost:8080/openjpa/openjpa/lastname/ (Query Player objects by last name)
- http://localhost:8080/hibernate/hibernate/playerid/ (Query Player by id)
- http://localhost:8080/eclipselink/eclipselink/playerid/ (Query Player by id)
- http://localhost:8080/openjpa/openjpa/playerid/ (Query Player by id)
July 20, 2010
by Daniel Rubio
Practical PHP Patterns: Metadata Mapping
The intent of the Metadata Mapping pattern is to express implementation details, related to a particular domain and Domain Model, as metadata of a general purpose library. In the sense intended here, metadata is related to the persistence operations (transferring objects back and forth from a database). This metadata is usually fed to a general purpose object-relational mapper. Technically the term metadata is plural (of metadatum, data about data), but it is commonly used as an uncountable noun.

Why express metadata

Object-relational mapping is a difficult task to automate, prone to lots of potential bugs and undefined behaviors; expressing the domain-related peculiarities as metadata means that you are able to code only one ORM, and not have to repeat the same work in many custom Data Mappers, which are very boring to write and can't be transported out of a specific application. Custom Data Mappers were a cleaner solution for Domain Models than employing Active Records, and they are advocated for example in Zend Framework books like Keith Pope's. They are finally becoming obsolete thanks to the power of a declarative approach like this pattern, which tools like Doctrine 2 are based on. Historically, Hibernate from JBoss was the first Data Mapper implemented as a generic ORM (it is a Java product). Doctrine 2 is the most famous PHP implementation, and it is in beta at the time of this writing.

The metadata we'd like to give to an ORM includes, for example:
* which classes should be persisted at all.
* optional names for the tables (it can use the class names).
* which fields form the primary key.
* the types of the different columns, particularly important in a loosely typed language like PHP.
* which collaborators have to be persisted and via what means: foreign keys and additional association tables.
The metadata should usually not consist of code: non-standard behavior shouldn't be contained in it, as in general all the behavior, like inheritance strategies and conversion of relationships, is extracted into the generic ORM. Thus there are different formats we can use in place of PHP code: XML, annotations, YAML, INI...

Different approaches

There are two approaches to the Metadata Mapping pattern, described by Fowler in his original book. The first one is code generation: the metadata is processed to generate the source code of the mapping classes, for example a Data Mapper for every entity or aggregate root of your model (one for User, one for BlogPost, and so on). The ORM would theoretically not be necessary in production if the generation is complete enough. Doctrine 1 used this approach in part, but it also generated the PHP code of the domain model itself from the YAML mapping, as subclasses of Doctrine_Record. Still, Doctrine 1 was necessary to instantiate those classes and the solution wasn't so clean. Doctrine 2 is very different in architecture and goals.

The second approach is called reflective program, and consists of interpreting the mapping at runtime in the ORM's code, to correctly open up the objects via reflection (or a standard interface) and put them in the database. The converse can happen: objects can be recreated from the union of metadata and database tables.

How it is used

The reflective solution is the common one nowadays, and Doctrine 2 borrows it from Hibernate in its own design. Reflection is used to access the private fields to persist. Some critics point out speed problems with this technique, but keep in mind that your ORM is communicating with an external process or database machine at the same time as it uses reflection: it probably won't count much in the benchmark.
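The reflective approach can be sketched in a few lines. The example below is written in Java, the language of Hibernate, from which Doctrine 2 borrows this design; it is not Doctrine's code. The User class, its fields and the user_name column are hypothetical, and the "metadata" is just a map from private field names to column names.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal sketch of a reflective mapper driven by field-to-column metadata. */
final class ReflectiveMapperSketch {

    /** Hypothetical domain class: private state, no persistence code of its own. */
    static final class User {
        private final int id;
        private final String name;

        User(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Hypothetical metadata: private field name -> column name. */
    static Map<String, String> userMetadata() {
        Map<String, String> fieldToColumn = new LinkedHashMap<>();
        fieldToColumn.put("id", "id");
        fieldToColumn.put("name", "user_name");
        return fieldToColumn;
    }

    /** Reads the mapped private fields via reflection and returns column -> value. */
    static Map<String, Object> toRow(Object entity, Map<String, String> fieldToColumn) {
        Map<String, Object> row = new LinkedHashMap<>();
        try {
            for (Map.Entry<String, String> mapping : fieldToColumn.entrySet()) {
                Field field = entity.getClass().getDeclaredField(mapping.getKey());
                field.setAccessible(true); // open up the private field
                row.put(mapping.getValue(), field.get(entity));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Mapping does not match " + entity.getClass(), e);
        }
        return row;
    }

    public static void main(String[] args) {
        // prints {id=7, user_name=alice}
        System.out.println(toRow(new User(7, "alice"), userMetadata()));
    }
}
```

The domain class stays free of persistence concerns; changing a column name means editing the metadata only, which is the trade the pattern is after.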
Doctrine 2 however takes optimization seriously, to the point that internal metadata classes (accessed very often) present an API with public properties instead of methods to avoid any overhead in a crucial part (hydration of objects with data retrieved from the database). An advantage of generated code is that it would be easier to debug, but it is usually a pain to maintain: every time you evolve or refactor a domain class you have to regenerate the Mapper classes. You can't customize this code either, because you would lose your changes when it is regenerated.

Advantages and (few) disadvantages

Of course we lose some expressiveness by specifying metadata instead of programmatic behavior like the source code of a custom Data Mapper. But we gain very much: a fully tested ORM, like Doctrine 2 in the PHP case, with only some lines of added metadata to keep in sync with the rest of the code base. Declarative approaches trade off completeness of functionality (the absent features are not used very often anyway) for developers' time. But there are other advantages, such as the generation (and migration) of the database schema based on the metadata, and also of the proxy classes. Ideally, the metadata mapping is the only point of strong coupling of your Domain Model with an external adapter, the ORM. It is of course part of the infrastructure, so keep it under version control along with the code! Adding and removing fields or relationships, changing keys or refactoring is much easier because you do it declaratively instead of refactoring a specific mapper class. Note that automated refactoring tools are not to be trusted here: for example they usually ignore the mapping when you change a field name. So grep is your best ally.

Examples

The sample code of this article will present the different ways of specifying metadata for Doctrine 2, the most high-tech PHP ORM.
The performance of the different methods is equivalent, since the metadata is read only once into native PHP objects and then cached. Metadata is a vast subject, since all the different persistence implementations have to be driven by it, but we will look more at the types of metadata specification we can use than at all the different metadata instances, which are best described in conjunction with the individual features (for example, the inheritance patterns articles contain the description of the metadata related to subclassing).

The simplest way to express metadata mapping in Doctrine 2 is via annotations, embedded in the docblocks and ignored by anything but the ORM:

Don't be alarmed by the size: this mapping does much more than the one in the annotations example.

A third way to specify metadata is via YAML, a format widely used in symfony-related software:

---
# Doctrine.Tests.ORM.Mapping.User.dcm.yml
Doctrine\Tests\ORM\Mapping\User:
  type: entity
  table: cms_users
  id:
    id:
      type: integer
      generator:
        strategy: AUTO
  fields:
    name:
      type: string
      length: 50
  oneToOne:
    address:
      targetEntity: Address
      joinColumn:
        name: address_id
        referencedColumnName: id
  oneToMany:
    phonenumbers:
      targetEntity: Phonenumber
      mappedBy: user
      cascade: cascadePersist
  manyToMany:
    groups:
      targetEntity: Group
      joinTable:
        name: cms_users_groups
        joinColumns:
          user_id:
            referencedColumnName: id
        inverseJoinColumns:
          group_id:
            referencedColumnName: id
  lifecycleCallbacks:
    prePersist: [ doStuffOnPrePersist, doOtherStuffOnPrePersistToo ]
    postPersist: [ doStuffOnPostPersist ]
July 5, 2010
by Giorgio Sironi
Versioning Static Assets with UrlRewriteFilter
A few weeks ago, a co-worker sent me an interesting email after talking with the Zoompf CEO at JSConf. One interesting tip mentioned was how we put the version in the querystring of our scripts and CSS. Apparently this doesn't always cache the way we expected it would (some proxies will never cache an asset if it has a querystring). The recommendation is to rev the filename itself. This article explains how we implemented a "cache busting" system in our application with Maven and the UrlRewriteFilter. We originally used a querystring in our implementation, but switched to filenames after reading Souders' recommendation. That part was figured out by my esteemed colleague Noah Paci.

Our Requirements

* Make the URL include a version number for each static asset URL (JS, CSS and SWF) that serves to expire a client's cache of the asset.
* Insert the version number into the application so the version number can be included in the URL.
* Use a random version number when in development mode (based on running without a packaged war) so that developers will not need to clear their browser cache when making changes to static resources.
* The random version number should match the production version number format, which is currently: x.y-SNAPSHOT-revisionNumber
* When running in production, the version number/cachebust is computed once (when a Filter is initialized). In development, a new cachebust is computed on each request.

In our app, we're using Maven, Spring and JSP, but the latter two don't really matter for the purposes of this discussion.

Implementation Steps

1. First we added the buildnumber-maven-plugin to our project's pom.xml so the build number is calculated from SVN.

org.codehaus.mojo buildnumber-maven-plugin 1.0-beta-4 validate create false false javasvn

2. Next we used the maven-war-plugin to add these values to our WAR's MANIFEST.MF file.

maven-war-plugin 2.0.2 true ${project.version} ${buildNumber} ${timestamp}

3.
Then we configured a Filter to read the values from this file on startup. If this file doesn't exist, a default version number of "1.0-SNAPSHOT-{random}" is used. Otherwise, the version is calculated as ${project.version}-${buildNumber}. private String buildNumber = null; ... @Override public void initFilterBean() throws ServletException { try { InputStream is = servletContext.getResourceAsStream("/META-INF/MANIFEST.MF"); if (is == null) { log.warn("META-INF/MANIFEST.MF not found."); } else { Manifest mf = new Manifest(); mf.read(is); Attributes atts = mf.getMainAttributes(); buildNumber = atts.getValue("Implementation-Version") + "-" + atts.getValue("Implementation-Build"); log.info("Application version set to: " + buildNumber); } } catch (IOException e) { log.error("I/O Exception reading manifest: " + e.getMessage()); } } ... // If there was a build number defined in the war, then use it for // the cache buster. Otherwise, assume we are in development mode // and use a random cache buster so developers don't have to clear // their browswer cache. requestVars.put("cachebust", buildNumber != null ? buildNumber : "1.0-SNAPSHOT-" + new Random().nextInt(100000)); 4. We then used the "cachebust" variable and appended it to static asset URLs as indicated below. The injection of /v/[CACHEBUSTINGSTRING]/(assets|compressed) eventually has to map back to the actual asset (that does not include the two first elements of the URI). The application must remove these two elements to map back to the actual asset. To do this, we use the UrlRewriteFilter. The UrlRewriteFilter is used (instead of Apache's mod_rewrite) so when developers run locally (using mvn jetty:run) they don't have to configure Apache. 5. In our application, "/compressed/" is mapped to wro4j's WroFilter. In order to get UrlRewriteFilter and WroFilter to work with this setup, the WroFilter has to accept FORWARD and REQUEST dispatchers. 
rewriteFilter /* WebResourceOptimizer /compressed/* FORWARD REQUEST Once this was configured, we added the following rules to our urlrewrite.xml to allow rewriting of any assets or compressed resource request back to its "correct" URL. ^/v/[0-9A-Za-z_.\-]+/assets/(.*)$ /assets/$1 ^/v/[0-9A-Za-z_.\-]+/compressed/(.*)$ /compressed/$1 /compressed/** /compressed/$1 Of course, you can also do this in Apache. This is what it might look like in your vhost.d file: RewriteEngine on RewriteLogLevel 0! RewriteLog /srv/log/apache22/app_rewrite_log RewriteRule ^/v/[.A-Za-z0-9_-]+/assets/(.*) /assets/$1 [PT] RewriteRule ^/v/[.A-Za-z0-9_-]+/compressed/(.*) /compressed/$1 [PT] Whether it's a good idea to implement this in Apache or using the UrlRewriteFilter is up for debate. If we're able to do this with the UrlRewriteFilter, the benefit of doing this at all in Apache is questionable, especially since it creates a duplicate of code. From http://raibledesigns.com/rd/entry/versioning_static_assets_with_urlrewritefilter
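The URL scheme described in this article can be illustrated with a small, self-contained Java sketch. It is not the Filter or the UrlRewriteFilter configuration itself: the class CacheBustSketch and the methods cacheBust, versionedUrl and stripVersion are hypothetical names introduced only for illustration, and the regular expression simply reuses the character class from the rewrite rules.

```java
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not part of UrlRewriteFilter).
class CacheBustSketch {
    // Same character class as the rewrite rules: /v/<cachebust>/(assets|compressed)/...
    private static final Pattern VERSIONED =
            Pattern.compile("^/v/[0-9A-Za-z_.\\-]+/(assets|compressed)/(.*)$");

    /** Uses the build number when one is known, otherwise a random development value. */
    static String cacheBust(String buildNumber, Random random) {
        return buildNumber != null
                ? buildNumber
                : "1.0-SNAPSHOT-" + random.nextInt(100000);
    }

    /** Prefixes an asset path such as /assets/js/app.js with /v/<cachebust>. */
    static String versionedUrl(String cacheBust, String assetPath) {
        return "/v/" + cacheBust + assetPath;
    }

    /** Maps a versioned URL back to the real asset path; other URLs pass through unchanged. */
    static String stripVersion(String uri) {
        Matcher m = VERSIONED.matcher(uri);
        return m.matches() ? "/" + m.group(1) + "/" + m.group(2) : uri;
    }

    public static void main(String[] args) {
        String bust = cacheBust("1.0-SNAPSHOT-1234", new Random());
        String url = versionedUrl(bust, "/assets/js/app.js");
        System.out.println(url);               // versioned URL sent to the browser
        System.out.println(stripVersion(url)); // path the application actually serves
    }
}
```

Passing null as the build number to cacheBust mimics development mode, where every call yields a fresh random suffix and therefore a URL the browser has not cached yet.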
June 5, 2010
by Matt Raible
· 11,763 Views
Practical PHP Patterns: Identity Map
The Identity Map pattern is a Map implementation tied to the use of a Data Mapper. A map in the computer science sense is also called a dictionary or an associative array; although PHP's associative arrays are very powerful, this kind of Map can be implemented as an object in order to present a specific interface to client code.

The purpose of an Identity Map in the Data Mapper context is to keep a list of references to all the in-memory domain objects that have been reconstituted by the Data Mapper's internal mechanism, or that are otherwise managed by the Data Mapper itself (for example, because they have been scheduled for persistence).

The Identity Map solves the problem of loading the same object multiple times, which leads to performance issues and to inconsistencies such as two different objects with different states (but with the same identity, since they have, for example, an equal user id) that both have to be stored in the back end. Ideally the map holds a reference to every single object of the domain (that contains state, and thus is managed by the Data Mapper instead of being created by infrastructure or domain factories); in practice it is an array of references to the loaded objects.

PHP implementation

In PHP, the Identity Map is not unique throughout the whole application; it is an object whose scope is limited to a single HTTP request (so different requests have different Identity Maps, which can become inconsistent with each other). This limited scope, which is part of the nature of PHP and its scalable architecture, requires careful handling of objects that have been detached from the Data Mapper. In general, you can serialize domain objects or store them in a cache to boost performance or to simplify business logic.
You are, however, obliged to reattach a domain object to the Data Mapper with a special method (in Doctrine 2, EntityManager::merge()) before persisting it again, so that it can be reinserted in the Identity Map instead of being considered new or being duplicated. Remember that here duplication is more an issue of consistency than of performance: a Data Mapper which accepts two different objects that point to the same place in the data store is not reliable.

In fact, an Identity Map is a fundamental part of a non-naive Data Mapper: before recreating an entity, the mapper looks for it in the Identity Map to check whether it is already available. Only if the object is not there does the Data Mapper create a new one and insert it in the Map for later reuse. Thus the Identity Map bridges the gap between storage and memory, keeping track of which parts of the object graph have been brought into memory and which are still on disk or on external database machines, since the technology forces us to reconstitute only a very small part of the application state in the form of objects (to be able to work on them).

This approach is particularly suited to PHP's shared-nothing mentality. There are other solutions for languages like Java and C#, such as keeping the whole object graph in memory (some gigabytes) and dealing with persistence by taking a periodic snapshot of the graph, which is then frozen and stored on slower-but-larger media like disks or SSDs.

In Doctrine 2

From the technical point of view, the Identity Map is an object or an associative array, with a single instance that exists for the entire request. This data structure is held by the Entity Manager (the Facade of the Data Mapper), by a Unit of Work, or even by some internal class of the Data Mapper.
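As a rough illustration of the look-up-before-reconstitute behaviour described above, here is a minimal sketch. It is written in Java rather than PHP purely for illustration, and every name in it (User, UserMapper, find, the loader function, the loads counter) is hypothetical; none of it is Doctrine 2 code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: a mapper that consults its Identity Map before touching storage.
class IdentityMapSketch {
    /** Hypothetical domain object. */
    static final class User {
        final int id;
        String name;
        User(int id, String name) { this.id = id; this.name = name; }
    }

    /** Hypothetical Data Mapper for User objects. */
    static final class UserMapper {
        private final Map<Integer, User> identityMap = new HashMap<>();
        private final Function<Integer, User> loader; // stands in for a database query
        int loads = 0;                                // counts how often storage is consulted

        UserMapper(Function<Integer, User> loader) { this.loader = loader; }

        User find(int id) {
            User known = identityMap.get(id);
            if (known != null) {
                return known;             // already reconstituted: reuse the same instance
            }
            loads++;
            User loaded = loader.apply(id);
            identityMap.put(id, loaded);  // remember it for later requests for this id
            return loaded;
        }
    }

    public static void main(String[] args) {
        UserMapper mapper = new UserMapper(id -> new User(id, "user" + id));
        User first = mapper.find(42);
        User second = mapper.find(42);
        System.out.println(first == second); // prints true: one object per identity
        System.out.println(mapper.loads);    // prints 1: storage was consulted once
    }
}
```

Because both calls return the same instance, a change made through one reference is visible through the other, which is the consistency guarantee the pattern exists to provide.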
Even when an object is reconstituted as part of a query and not requested by its primary key, the loader class has to extract a unique identifier for the domain object and ask the Identity Map.

In the sample code at the end of this article, Doctrine 2's choice has been to keep the Identity Map as a private property (a multidimensional associative array) of the Unit of Work, which exposes a set of public methods for accessing the Map and so acts as the single Facade for the internal code.

The typical key used for the indexing is a combination of the class name of the domain object and its unique identifier (usually the primary key used in storage, reduced to a serialized value if it consists of multiple fields). This indexing implementation is generic enough to deal with most use cases, even with inheritance strategies.

A supplemental index is based on the result of the spl_object_hash() function, which returns a unique identifier for every in-memory object; this index is used to quickly check whether an object coming from elsewhere is in the Identity Map, without extracting its identifier and class name.

The sample code is part of the Unit of Work of Doctrine 2. I cut everything that does not involve its internal Identity Map, since the Unit of Work has already been described in its own article.

    /**
     * ...
     * @author Guilherme Blanco
     * @author Jonathan Wage
     * @author Roman Borschel
     * @internal This class contains highly performance-sensitive code.
     */
    class UnitOfWork implements PropertyChangedListener
    {
        //...

        /**
         * The identity map that holds references to all managed entities that have
         * an identity. The entities are grouped by their class name.
         * Since all classes in a hierarchy must share the same identifier set,
         * we always take the root class name of the hierarchy.
         *
         * @var array
         */
        private $_identityMap = array();

        /**
         * Map of all identifiers of managed entities.
         * Keys are object ids (spl_object_hash).
         *
         * @var array
         */
        private $_entityIdentifiers = array();

        /**
         * INTERNAL:
         * Registers an entity in the identity map.
         * Note that entities in a hierarchy are registered with the class name of
         * the root entity.
         *
         * @ignore
         * @param object $entity The entity to register.
         * @return boolean TRUE if the registration was successful, FALSE if the identity of
         *                 the entity in question is already managed.
         */
        public function addToIdentityMap($entity)
        {
            $classMetadata = $this->_em->getClassMetadata(get_class($entity));
            $idHash = implode(' ', $this->_entityIdentifiers[spl_object_hash($entity)]);
            if ($idHash === '') {
                throw new \InvalidArgumentException("The given entity has no identity.");
            }
            $className = $classMetadata->rootEntityName;
            if (isset($this->_identityMap[$className][$idHash])) {
                return false;
            }
            $this->_identityMap[$className][$idHash] = $entity;
            if ($entity instanceof NotifyPropertyChanged) {
                $entity->addPropertyChangedListener($this);
            }
            return true;
        }

        /**
         * INTERNAL:
         * Removes an entity from the identity map. This effectively detaches the
         * entity from the persistence management of Doctrine.
         *
         * @ignore
         * @param object $entity
         * @return boolean
         */
        public function removeFromIdentityMap($entity)
        {
            $oid = spl_object_hash($entity);
            $classMetadata = $this->_em->getClassMetadata(get_class($entity));
            $idHash = implode(' ', $this->_entityIdentifiers[$oid]);
            if ($idHash === '') {
                throw new \InvalidArgumentException("The given entity has no identity.");
            }
            $className = $classMetadata->rootEntityName;
            if (isset($this->_identityMap[$className][$idHash])) {
                unset($this->_identityMap[$className][$idHash]);
                $this->_entityStates[$oid] = self::STATE_DETACHED;
                return true;
            }
            return false;
        }

        /**
         * Checks whether an entity is registered in the identity map of this UnitOfWork.
         *
         * @param object $entity
         * @return boolean
         */
        public function isInIdentityMap($entity)
        {
            $oid = spl_object_hash($entity);
            if ( ! isset($this->_entityIdentifiers[$oid])) {
                return false;
            }
            $classMetadata = $this->_em->getClassMetadata(get_class($entity));
            $idHash = implode(' ', $this->_entityIdentifiers[$oid]);
            if ($idHash === '') {
                return false;
            }
            return isset($this->_identityMap[$classMetadata->rootEntityName][$idHash]);
        }
    }
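For readers who want to experiment with the two-index layout outside of Doctrine, the following is a loose sketch of the same idea. It is written in Java for illustration and is not a port of the Doctrine 2 class: UnitOfWorkSketch and setIdentifier are hypothetical names, the root class name is passed in explicitly instead of being read from class metadata, and java.util.IdentityHashMap stands in for the spl_object_hash() index.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Illustrative only: mirrors the shape of the PHP excerpt, not its full behaviour.
class UnitOfWorkSketch {
    // Root class name -> (id hash -> entity), like $_identityMap.
    private final Map<String, Map<String, Object>> identityMap = new HashMap<>();
    // Object identity -> id hash, like $_entityIdentifiers keyed by spl_object_hash().
    private final Map<Object, String> entityIdentifiers = new IdentityHashMap<>();

    /** Records an entity's identifier (assumption: a real mapper does this while loading). */
    void setIdentifier(Object entity, String idHash) {
        entityIdentifiers.put(entity, idHash);
    }

    /** Returns true if registered, false if that identity is already managed. */
    boolean addToIdentityMap(String rootClassName, Object entity) {
        String idHash = requireIdentity(entity);
        Map<String, Object> byId =
                identityMap.computeIfAbsent(rootClassName, k -> new HashMap<>());
        return byId.putIfAbsent(idHash, entity) == null;
    }

    /** Returns true if an entry for that identity existed and was removed. */
    boolean removeFromIdentityMap(String rootClassName, Object entity) {
        String idHash = requireIdentity(entity);
        Map<String, Object> byId = identityMap.get(rootClassName);
        return byId != null && byId.remove(idHash) != null;
    }

    /** Checks whether the entity's identity is registered under the given root class. */
    boolean isInIdentityMap(String rootClassName, Object entity) {
        String idHash = entityIdentifiers.get(entity);
        if (idHash == null || idHash.isEmpty()) {
            return false;
        }
        Map<String, Object> byId = identityMap.get(rootClassName);
        return byId != null && byId.containsKey(idHash);
    }

    private String requireIdentity(Object entity) {
        String idHash = entityIdentifiers.get(entity);
        if (idHash == null || idHash.isEmpty()) {
            throw new IllegalArgumentException("The given entity has no identity.");
        }
        return idHash;
    }

    public static void main(String[] args) {
        UnitOfWorkSketch uow = new UnitOfWorkSketch();
        Object first = new Object();
        Object duplicate = new Object();
        uow.setIdentifier(first, "42");
        uow.setIdentifier(duplicate, "42");
        System.out.println(uow.addToIdentityMap("User", first));     // prints true
        System.out.println(uow.addToIdentityMap("User", duplicate)); // prints false
    }
}
```

The second call returns false because a different object with the same class and identifier is already managed, which is how duplicates are detected without comparing object state.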
May 24, 2010
by Giorgio Sironi
· 8,723 Views