Data Engineering Resources

The Latest Data Engineering Topics

A simple and intuitive approach to interface your database with Java

Introduction In recent years, I have experienced the same developer's need again and again. The need for improved persistence support. After lots of years of experience with Java, I have grown tired with all the solutions that are "standard", "J2EE compliant", but in the end, just ever so complicated. I don't deny, there are many good ideas around, that have eventually brought up excellent tools, such as Hibernate, JPA/EJB3, iBatis, etc. But all of those tools seem to go to a single direction without giving up any of that thought: Object-relational Mapping. So you end up using a performant database that cost's 100k+$ of license every year just to abstract it with a "standard" persistence layer. I wanted to go a different direction. And take the best of OR-Mapping (code generation, type safety, object oriented query construction, SQL dialect abstraction, etc) without denying the fact, that beneath, I'm running an RDBMS. That's right. R like Relational. Read on about how jOOQ (Java Object Oriented Querying) succeeds in bringing the "relational to the object" Abstract Many companies and software projects seem to implement one of the following two approaches to interfacing Java with SQL The very basic approach: Using JDBC directly or adding a home-grown abstraction on top of it. There is a lot of manual work associated with the creation, maintenance, and extension of the data layer code base. Developers can easily use the full functionality of the underlying database, but will always operate on a very low level, concatenating Strings all over the place. The very sophisticated approach: There is a lot of configuration and a steep learning curve associated with the introduction of sophisticated database abstraction layers, such as the ones created by Hibernate, JPA, iBatis, or even plain old EJB entity beans. While the generated objects and API's may allow for easy manipulation of data, the setup and maintenance of the abstraction layer may become very complex. Besides, these abstraction layers provide so much abstraction on top of SQL, that SQL-experienced developers have to rethink. A different paradigm I tried to find a new solution addressing many issues that I think most developers face every day. With jOOQ - Java Object Oriented Querying, I want to embrace the following paradigm: SQL is a good thing. Many things can be expressed quite nicely in SQL. The relational data model is a good thing. It should not be abstracted by OR-Mapping SQL has a structure and syntax. It should not be expressed using "low-level" String concatenation. Variable binding tends to be very complex when dealing with major queries. POJO's (or data transfer objects) are great when writing Java code manipulating database data. POJO's are a pain to write and maintain manually. Source code generation is the way to go The database comes first. Then the code on top of it. Yes, you do have stored procedures and user defined types (UDT's) in your legacy database. Your database-tool should support that. I think that these key ideas are useful for a very specific type of developer. That specific developer interfaces Java with huge legacy databases. knows SQL well and wants to use it extensively. doesn't want to learn any new language (HQL, JPQL, etc) doesn't want to spend one minute fine-tuning some sophisticated XML-configuration. wants little abstraction over SQL, because his software is tightly coupled with his database. Something that I think the guys at Hibernate or JPA seem to have ignored. needs a strong but light-weight library for database access. For instance to develop for mobile devices. How does jOOQ fit in this paradigm? Not only does jOOQ completely address the above paradigm, it does so quite elegantly. Let's say you have this database that models your bookstore. And you need to run a query selecting all books by authors born after 1920. You know how to do this in SQL: -- Select all books by authors born after 1920, named "Paulo" from a catalogue: SELECT * FROM t_author a JOIN t_book b ON a.id = b.author_id WHERE a.year_of_birth > 1920 AND a.first_name = 'Paulo' ORDER BY b.title The same query expressed with jOOQ-Objects // Instanciate your factory using a JDBC connection // and specify the SQL dialect you're using. Of course you can // have several factories in your application. Factory create = new Factory(connection, SQLDialect.MYSQL); // Create the query using generated, type-safe objects. You could // write even less code than that with static imports! SelectQuery q = create.selectQuery(); q.addFrom(TAuthor.T_AUTHOR); q.addJoin(TBook.T_BOOK, TAuthor.ID, TBook.AUTHOR_ID); // Note how you do not need to worry about variable binding. // jOOQ does that for you, dynamically q.addCompareCondition(TAuthor.YEAR_OF_BIRTH, 1920, Comparator.GREATER); // The AND operator and EQUALS comparator are implicit here q.addCompareCondition(TAuthor.FIRST_NAME, "Paulo"); q.addOrderBy(TBook.TITLE); The jOOQ query object model uses generated classes, such as TAuthor or TBook. Like many other code generation tools do, jOOQ will generate static final objects for the fields contained in each table. In this case, TAuthor holds a member called TAuthor.T_AUTHOR to represent the table itself, and members such as TAuthor.ID, TAuthor.YEAR_OF_BIRTH, etc to hold the table's fields. But you could also use the jOOQ DSL API to stay closer to SQL // Do it all "on one line". SelectQuery q = create.select() .from(T_AUTHOR) .join(T_BOOK).on(TAuthor.ID.equal(TBook.AUTHOR_ID)) .where(TAuthor.YEAR_OF_BIRTH.greaterThan(1920) .and(TAuthor.FIRST_NAME.equal("Paulo"))) .orderBy(TBook.TITLE).getQuery(); jOOQ ships with a DSL (Domain Specific Language) somewhat similar to Linq that facilitates query creation. The strength of DSL becomes obvious when you are using jOOQ constructs such as the decode function: // Create a case statement. Unfortunately "case" is a reserved word in Java // Hence the method is called DECODE after its related Oracle function Field nationality = create.decode() .when(TAuthor.FIRST_NAME.equal("Paulo"), "brazilian") .when(TAuthor.FIRST_NAME.equal("George"), "english") .otherwise("unknown"); // "else" is also a reserved word ;-) The above will render this SQL code: CASE WHEN T_AUTHOR.FIRST_NAME = 'Paulo' THEN 'brazilian' WHEN T_AUTHOR.FIRST_NAME = 'George' THEN 'english' ELSE 'unknown' END Use the DSL API when: You want your Java code to look like SQL You want your IDE to help you with auto-completion (you will not be able to write select .. order by .. where .. join or any of that stuff) Use the regular API when: You want to create your query step-by-step, creating query parts one-by-one You need to assemble your query from various places, passing the query around, adding new conditions and joins on the way In any case, all API's will construct the same underlying implementation object, and in many cases, you can combine the two approaches Once you have established the query, execute it and fetch results // Execute the query and fetch the results q.execute(); Result result = q.getResult(); // Result is Iterable, so you can loop over the resulting records like this: for (Record record : result) { // Type safety assured with generics String firstName = record.getValue(TAuthor.FIRST_NAME); String lastName = record.getValue(TAuthor.LAST_NAME); String title = record.getValue(TBook.TITLE); Integer publishedIn = record.getValue(TBook.PUBLISHED_IN); System.out.println(title + " (published in " + publishedIn + ") by " + firstName + " " + lastName); } Or simply write for (Record record : q.fetch()) { // [...] } Fetch data from a single table and use jOOQ as a simple OR-Mapper // Similar query, but don't join books to authors. // Note the generic record type that is added to your query: SimpleSelectQuery q = create.select(T_AUTHOR) .where(TAuthor.YEAR_OF_BIRTH.greaterThan(1920) .and(TAuthor.FIRST_NAME.equal("Paulo"))) .orderBy(TAuthor.LAST_NAME).getQuery(); // When executing this query, also Result holds a generic type: q.execute(); Result result = q.getResult(); for (TAuthorRecord record : result) { // With generate record classes, you can use generated getters and setters: String firstName = record.getFirstName(); String lastName = record.getLastName(); System.out.println("Author : " + firstName + " " + lastName + " wrote : "); // Use generated foreign key navigation methods for (TBookRecord book : record.getTBooks()) { System.out.println(" Book : " + book.getTitle()); } } jOOQ not only generates code to model your schema, but it also generates domain model classes to represent tuples in your schema. In the above example, you can see how selecting from the TAuthor.T_AUTHOR table will produce results containing well-defined TAuthorRecord types. These types hold getters and setters like any POJO, but also some more advanced OR-code, such as foreign key navigator methods like // Return all books for an author that are obtained through the // T_AUTHOR.ID = T_BOOK.AUTHOR_ID foreign key relationship public List getTBooks() Now, for true OR-mapping, you would probably prefer mature and established frameworks such as Hibernate or iBATIS. Don't panic. Better integration with Hibernate and JPA is on the feature roadmap. The goals of jOOQ should not be to reimplement things that are already well-done, but to bring true SQL to Java Execute CRUD operations with jOOQ as an OR-mapper // Create a new record and insert it into the database TBookRecord book = create.newRecord(T_BOOK); book.setTitle("My first book"); book.store(); // Update it with new values book.setPublishedIn(2010); book.store(); // Delete it book.delete(); Nothing new in the OR-mapping world. These ideas have been around since EJB entity beans or even before. It's still quite useful for simple purposes. Execute CRUD operations the way you're used to You don't need to go into that OR-mapping business. You can create your own INSERT, "INSERT SELECT", UPDATE, DELETE queries. Some examples: InsertQuery i = create.insertQuery(T_AUTHOR); i.addValue(TAuthor.FIRST_NAME, "Hermann"); i.addValue(TAuthor.LAST_NAME, "Hesse"); i.execute(); UpdateQuery u = create.updateQuery(T_AUTHOR); u.addValue(TAuthor.FIRST_NAME, "Hermie"); u.addCompareCondition(TAuthor.LAST_NAME.equal("Hesse")); u.execute(); // etc... Now for the advanced stuff Many tools can do similar stuff as what we have seen before. Especially Hibernate and JPA have a feature called criteria query, that provides all of the type-safety and query object building using DSL's while being based on a solid (but blown-up) underlying architecture. An important goal for jOOQ is to provide you with all (or at least: most) SQL features that you are missing in other frameworks but that you would like to use because you think SQL is a great thing but JDBC is too primitive for the year 2010, 2011, or whatever year we're in, when you're reading this. So, jOOQ comes along with aliasing, nested selects, unions and many other SQL features. Check out the following sections: Aliasing That's a very important feature. How could you have self-joins or in/exists clauses without aliasing? Let's say we have a "T_TREE" table with fields "ID", "PARENT_ID", and "NAME". If we want to find all parent/child NAME couples, we will need to execute a self-join on T_TREE. In SQL, this reads: SELECT parent.NAME parent_name, child.NAME child_name FROM T_TREE parent JOIN T_TREE child ON (parent.ID = child.PARENT_ID) No problem for jOOQ. We'll write: // Create table aliases Table parent = TTree.T_TREE.as("parent"); Table child = TTree.T_TREE.as("child"); // Create field aliases from aliased table Field parentName = parent.getField(TTree.NAME).as("parent_name"); Field childName = child.getField(TTree.NAME).as("child_name"); // Execute the above select Record record = create.select(parentName, childName) .from(parent) .join(child).on(parent.getField(TTree.ID).equal(child.getField(TTree.PARENT_ID))) .fetchAny(); // The aliased fields can be read from the record as in the simpler examples: record.getValue(parentName); Functionally, it is easy to see how this works. Look out for future releases of jOOQ for improvements in the DSL support of field and table aliasing IN clause The org.jooq.Field class provides many methods to construct conditions. In previous examples, we have seen how to create regular compare conditions with = < <= >= > != operators. Now Field also has a couple of methods to create IN conditions: // Create IN conditions with constant values that are bound to the // query via JDBC's '?' bind variable placeholders Condition in(T... values); Condition in(Collection values); Condition notIn(T... values); Condition notIn(Collection values); // Create IN conditions with a sub-select Condition in(QueryProvider query) Condition notIn(QueryProvider query) The constant set of values for IN conditions is an obvious feature. But the sub-select is quite nice: -- Select authors with books that are sold out SELECT * FROM T_AUTHOR WHERE T_AUTHOR.ID IN (SELECT DISTINCT T_BOOK.AUTHOR_ID FROM T_BOOK WHERE T_BOOK.STATUS = 'SOLD OUT'); In jOOQ, this translates to create.select(T_AUTHOR) .where (TAuthor.ID.in(create.selectDistinct(TBook.AUTHOR_ID) .from(T_BOOK) .where(TBook.STATUS.equal(TBookStatus.SOLD_OUT)))); EXISTS clause Very similar statements can be expressed with the EXISTS clause. The above set of authors could also be obtained with this statement: -- Select authors with books that are sold out SELECT * FROM T_AUTHOR a WHERE EXISTS (SELECT 1 FROM T_BOOK WHERE T_BOOK.STATUS = 'SOLD OUT' AND T_BOOK.AUTHOR_ID = a.ID); In jOOQ (as of version 1.5.0), this translates to // Alias the author table Table a = T_AUTHOR.as("a"); // Use the aliased table in the select statement create.selectFrom(a) .where(create.exists(create.select(create.constant(1)) .from(T_BOOK) .where(TBook.STATUS.equal(TBookStatus.SOLD_OUT) .and(TBook.AUTHOR_ID.equal(a.getField(TAuthor.ID)))))); UNION clauses SQL knows of four types of "UNION operators": UNION UNION ALL EXCEPT INTERSECT All of these operators are supported by all types of select queries. So in order to write things like: SELECT TITLE FROM T_BOOK WHERE PUBLISHED_IN > 1945 UNION SELECT TITLE FROM T_BOOK WHERE AUTHOR_ID = 1 You can write the following jOOQ logic: create.select(TBook.TITLE).from(T_BOOK).where(TBook.PUBLISHED_IN.greaterThan(1945)).union( create.select(TBook.TITLE).from(T_BOOK).where(TBook.AUTHOR_ID.equal(1))); Of course, you can then again nest the union query in another one (but be careful to correctly use aliases): -- alias_38173 is an example of a generated alias, -- generated by jOOQ for union queries SELECT alias_38173.TITLE FROM ( SELECT T_BOOK.TITLE, T_BOOK.AUTHOR_ID FROM T_BOOK WHERE T_BOOK.PUBLISHED_IN > 1945 UNION SELECT T_BOOK.TITLE, T_BOOK.AUTHOR_ID FROM T_BOOK WHERE T_BOOK.AUTHOR_ID = 1 ) alias_38173 ORDER BY alias_38173.AUTHOR_ID DESC In jOOQ: Select union = create.select(TBook.TITLE, TBook.AUTHOR_ID).from(T_BOOK).where(TBook.PUBLISHED_IN.greaterThan(1945)).union( create.select(TBook.TITLE, TBook.AUTHOR_ID).from(T_BOOK).where(TBook.AUTHOR_ID.equal(1))); create.select(union.getField(TBook.TITLE)) .from(union) .orderBy(union.getField(TBook.AUTHOR_ID).descending()); Note that a UNION query will automatically generate an alias if you use it as a nested table. In order to nest this query correctly, you need to get the aliased field from the query as seen in the example abov. Other, non-standard SQL features See more examples about stored procedures, UDT's, enums, etc on https://sourceforge.net/apps/trac/jooq/wiki/Examples Summary jOOQ brings the relational world to Java without trying to cover up its origins. jOOQ is relational. And object oriented. Just in a different way. Try it for yourself and I would be very glad for any feedback you may have. Find jOOQ on http://jooq.sourceforge.net Cheers Lukas Eder

December 14, 2010

by Lukas Eder

· 3,871 Views

Getting MySQL work with Entity Framework 4.0

Does MySQL work with Entity Framework 4.0? The answer is: yes, it works! I just put up one experimental project to play with MySQL and Entity Framework 4.0 and in this posting I will show you how to get MySQL data to EF. Also I will give some suggestions how to deploy your applications to hosting and cloud environments. MySQL stuff As you may guess you need MySQL running somewhere. I have MySQL installed to my development machine so I can also develop stuff when I’m offline. The other thing you need is MySQL Connector for .NET Framework. Currently there is available development version of MySQL Connector/NET 6.3.5 that supports Visual Studio 2010. Before you start download MySQL and Connector/NET: MySQL Community Server Connector/NET 6.3.5 If you are not big fan of phpMyAdmin then you can try out free desktop client for MySQL – HeidiSQL. I am using it and I am really happy with this program. NB! If you just put up MySQL then create also database with couple of table there. To use all features of Entity Framework 4.0 I suggest you to use InnoDB or other engine that has support for foreign keys. Connecting MySQL to Entity Framework 4.0 Now create simple console project using Visual Studio 2010 and go through the following steps. 1. Add new ADO.NET Entity Data Model to your project. For model insert the name that is informative and that you are able later recognize. Now you can choose how you want to create your model. Select “Generate from database” and click OK. 2. Set up database connection Change data connection and select MySQL Database as data source. You may also need to set provider – there is only one choice. Select it if data provider combo shows empty value. Click OK and insert connection information you are asked about. Don’t forget to click test connection button to see if your connection data is okay. If everything works then click OK. 3. Insert context name Now you should see the following dialog. Insert your data model name for application configuration file and click OK. Click next button. 4. Select tables for model Now you can select tables and views your classes are based on. I have small database with events data. Uncheck the checkbox “Include foreign key columns in the model” – it is damn annoying to get them away from model later. Also insert informative and easy to remember name for your model. Click finish button. 5. Define your classes Now it’s time to define your classes. Here you can see what Entity Framework generated for you. Relations were detected automatically – that’s why we needed foreign keys. The names of classes and their members are not nice yet. After some modifications my class model looks like on the following diagram. Note that I removed attendees navigation property from person class. Now my classes look nice and they follow conventions I am using when naming classes and their members. NB! Don’t forget to see properties of classes (properties windows) and modify their set names if set names contain numbers (I changed set name for Entity from Entity1 to Entities). 6. Let’s test! Now let’s write simple testing program to see if MySQL data runs through Entity Framework 4.0 as expected. My program looks for events where I attended. using(var context = new MySqlEntities()) { var myEvents = from e in context.Events from a in e.Attendees where a.Person.FirstName == "Gunnar" && a.Person.LastName == "Peipman" select e; Console.WriteLine("My events: "); foreach(var e in myEvents) { Console.WriteLine(e.Title); } } Console.ReadKey(); And when I run it I get the result shown on screenshot on right. I checked out from database and these results are correct. At first run connector seems to work slow but this is only the effect of first run. As connector is loaded to memory by Entity Framework it works fast from this point on. Now let’s see what we have to do to get our program work in hosting and cloud environments where MySQL connector is not installed. Deploying application to hosting and cloud environments If your hosting or cloud environment has no MySQL connector installed you have to provide MySQL connector assemblies with your project. Add the following assemblies to your project’s bin folder and include them to your project (otherwise they are not packaged by WebDeploy and Azure tools): MySQL.Data MySQL.Data.Entity MySQL.Web You can also add references to these assemblies and mark references as local so these assemblies are copied to binary folder of your application. If you have references to these assemblies then you don’t have to include them to your project from bin folder. Also add the following block to your application configuration file. ... ... Conclusion It was not hard to get MySQL connector installed and MySQL connected to Entity Framework 4.0. To use full power of Entity Framework we used InnoDB engine because it supports foreign keys. It was also easy to query our model. To get our project online we needed some easy modifications to our project and configuration files.

December 10, 2010

by Gunnar Peipman

· 24,533 Views

Using Sphinx and Java to Implement Free Text Search

As promised I am going to provide an article on how we can use Sphinx with Java to perform a full text search. I will begin the article with an introduction to Sphinx. Introduction to Sphinx Databases are continually growing and sometimes tend to hold about 100M records and need an external solution for full text search to be performed. I have picked Sphinx, an open source full-text search engine, distributed under GPL version 2 to perform a full text search on such a huge amount of data. Generally, it's a standalone search engine meant to provide fast, size-efficient and relevant full-text search functions to other applications very much compatible with an SQL Database. So my example will be based on the MySQL database, as we cannot produce millions of data to evaluate the real power of Sphinx, we will have a small amount of data and I think that should not be a problem. Here are few Sphinx Unique Features: high indexing speed (up to 10 MB/sec on modern CPUs) high search speed (avg query is under 0.1 sec on 2-4 GB text collections) high scalability (up to 100 GB of text, upto 100 M documents on a single CPU) provides distributed searching capabilities provides searching from within MySQL through pluggable storage engine supports boolean, phrase, and word proximity queries supports multiple full-text fields per document (upto 32 by default) supports multiple additional attributes per document (ie. groups, timestamps, etc) supports MySQL natively (MyISAM and InnoDB tables are both supported) The important features which have been adopted to perform a full text search are the provision of the Java API to integrate easily with the web application and considerably high indexing and searching speed with an average of 4-10 MB/sec & 20-30 ms/q @5GB,3.5M docs(wikipedia) Sphinx Terms & How It Works The fist principle part of sphinx is indexer. It is solely responsible for gathering the data that will be searchable. From the Sphinx point of view, the data it indexes is a set of structured documents, each of which has the same set of fields. This is biased towards SQL, where each row corresponds to a document, and each column to a field. Sphinx builds a special data structure optimized for our queries from the data provided. This structure is called index; and the process of building index from data is called indexing and the element of sphinx which carries out these tasks is called indexer. Indexer can be executed either from a regular script or command-line interface. Sphinx documents are equal to records in DB. Document is set of text fields and number attributes + unique ID – similar to row in DB Set of fields and attributes is constant for index – similar to table in DB Fields are searchable for FullText queries Attributes may be used for filtering, sorting, grouping searchd is the second principle tools as part of Sphinx. It is the part of the system which actually handles searches; it functions as a server and is responsible for receiving queries, processing them and returning a dataset back to the different APIs for client applications. Unlike indexer, searchd is not designed to be run either from a regular script or command-line calling, but instead either as a daemon to be called from init.d (on Unix/Linux type systems) or to be called as a service (on Windows-type systems). I am going to focus on Windows environment so later I will show you how we can install sphinx on windows as a service. Finally search is one of the helper tools within the Sphinx package. Whereas searchd is responsible for searches in a server-type environment, search is aimed at testing the index quickly without building a framework to make the connection to the server and process its response. This will only be used for testing sphinx from command – line and with respect to application’s requirement; searchd service will be used to query the MySql Server with a pre created index. Installation on Windows So now we come to the part of installing Sphinx on Windows: Download Sphinx from the official Sphinx download site i.e http://sphinxsearch.com (I downloaded Win32 release binaries with MySQL support: sphinx-0.9.9-win32.zip) Unzip the file to some folder, I unzipped to C:\devel\sphinx-0.9.9-win32 and added the bin directory to the windows path variable Well Sphinx is installed. Nice, simple, easy. Later I will tell how to set up indexes and search. Sample Application Till now I guess the whole motto of this article is clear to you, let's move ahead to define our sample application. We all use the Address Book to search for people by using their name or e-mail address when we want to immediately address an e-mail message to a specific person, people, or distribution list. We also search for people by using other basic information, such as e-mail alias, office location, and telephone number etc. I think most of the people on this planet are quire familiar with this kind of search, so let's make outlook address book as our sample database schema. Most of the fields are mapped from microsoft outlook, the only additional column is date of joining so that we can filter our queries based on joining dates of the employees. The example that I am going to put forth will use Sphinx to search for a particluar address entry using free text search, meaning the user is free to type in anything, here is our search screen, the DOJ (date of joining) search parameter is optional. The screen is self explanatory, let's move ahead and define our database. As Sphinx works well with MySQL and MySQL being free also, lets create our db scripts around mysql database (Those who wish to install MySQL can dowload it from http://www.mysql.com) Let's create our sample database 'addressbook' mysql> create database addressbook; Query OK, 1 row affected (0.03 sec) mysql> use addressbook; Database changed Note: The fields defined in the following tables are for the purpose of learning only and may not contain a complete set of fields that microsoft address book or any similar software may provide. mysql> CREATE TABLE addressbook ( Id int(11) NOT NULL, FirstName varchar(30) NOT NULL, LastName varchar(30) NOT NULL, OfficeId int(11) DEFAULT NULL, Title varchar(20) DEFAULT NULL, Alias varchar(20) NOT NULL, Email varchar(50) NOT NULL, DOJ date NOT NULL, PhoneNo varchar(20) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8; mysql> CREATE TABLE CompanyLocations ( Id int(11) NOT NULL, Location varchar(60) NOT NULL, Country varchar(20) NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8; It's time to put some dummy data into the table, so let's fill our tables. Our virtual company 'gogs.it' has six offices across India and Singapore as defined in the following insert script. mysql> insert into CompanyLocations (Id, Location, Country) VALUES (1, 'Tower One, Harbour Front, Singapore', 'SG'); insert into CompanyLocations (Id, Location, Country) VALUES (2, 'DLF Phase 3, Gurgaon, India', 'IN'); insert into CompanyLocations (Id, Location, Country) VALUES (3, 'Hiranandani Gardens, Powai, Mumbai, India', 'IN'); insert into CompanyLocations (Id, Location, Country) VALUES (4, 'Hinjwadi, Pune, India', 'IN'); insert into CompanyLocations (Id, Location, Country) VALUES (5, 'Toll Post, Nagrota, Jammu, India', 'IN'); insert into CompanyLocations (Id, Location, Country) VALUES (6, 'Bani (Kathua), India', 'IN'); Now comes the real stuff... The data sphinx is going to index, let's populate that as well...wooooo mysql> INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (1,'Aabheer','Kumar',1,'Mr','u534','[email protected]','2008-9-3', '+911234599990'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (2,'Aadarsh','Gupta',6,'Mr','u668','[email protected]','2007-2-23','+911234599991'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (3,'Aachman','Singh',5,'Mr','u2766','[email protected]','2006-12-18','+911234599992'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (4,'Aadesh','Shrivastav',5,'Mr','u3198','[email protected]','2007-11-23','+911234599993'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (5,'Aadi','manav',1,'Mr','u2686','[email protected]','2010-7-20','+911234599994'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (6,'Aadidev','singh',4,'Mr','u572','[email protected]','2010-8-18','+911234599995'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (7,'Aafreen','sheikh',4,'Smt','u1092','[email protected]','2007-7-11','+911234599996'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (8,'Aakar','Sherpa',5,'Mr','u1420','[email protected]','2009-10-3','+911234599997'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (9,'Aakash','Singh',4,'Mrs','u2884','[email protected]','2008-6-11','+911234599998'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (10,'Aalap','Singhania',4,'Mrs','u609','[email protected]','2010-10-8','+911234599999'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (11,'Aandaleeb','mahajan',1,'Smt','u131','[email protected]','2010-10-21','+911234580001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (12,'Mamata','kumari',5,'Sh','u2519','[email protected]','2009-6-12','+911234580002'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (13,'Mamta','sharma',6,'Smt','u4123','[email protected]','2009-2-8','+911234580003'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (14,'Manali','singh',6,'Mr','u1078','[email protected]','2008-6-14','+911234580004'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (15,'Manda','saxena',1,'Mrs','u196','[email protected]','2010-9-4','+911234580005'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (16,'Salila','shetty',3,'Miss','u157','[email protected]','2009-11-15','+911234580006'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (17,'Salima','happy',3,'Mrs','u3445','[email protected]','2006-7-14','+911234580007'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (18,'Salma','haik',5,'Sh','u4621','[email protected]','2008-6-23','+911234580008'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (19,'Samita','patil',3,'Smt','u3156','[email protected]','2006-6-7','+911234580009'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (20,'Sameena','sheikh',5,'Mrs','u952','[email protected]','2008-8-13','+911234580010'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (21,'Ranita','gupta',5,'Mrs','u2664','[email protected]','2008-10-20','+911234580011'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (22,'Ranjana','sharma',1,'Sh','u3085','[email protected]','2010-6-21','+911234580012'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (23,'Ranjini','singh',6,'Mrs','u4200','[email protected]','2007-4-13','+911234580013'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (24,'Ranjita','vyapari',2,'Smt','u1109','[email protected]','2008-1-22','+911234580014'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (25,'Rashi','gupta',6,'Mrs','u3492','[email protected]','2006-2-2','+911234580015'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (26,'Rashmi','sehgal',3,'Mr','u3248','[email protected]','2008-9-9','+911234580016'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (27,'Rashmika','sexy',1,'Mrs','u4599','[email protected]','2009-3-12','+911234580017'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (28,'Rasika','dulari',3,'Smt','u2089','[email protected]','2009-1-24','+911234580018'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (29,'Dilber','lover',6,'Mr','u4241','[email protected]','2007-10-11','+911234580019'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (30,'Dilshad','happy',1,'Mr','u1564','[email protected]','2007-4-8','+911234580020'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (31,'Dipali','lights',5,'Sh','u1127','[email protected]','2006-11-1','+911234580021'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (32,'Dipika','lamp',1,'Sh','u2271','[email protected]','2010-12-17','+911234580022'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (33,'Dipti','brightness',5,'Smt','u422','[email protected]','2010-9-25','+911234580023'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (34,'Disha','singh',3,'Sh','u4604','[email protected]','2006-5-2','+911234580024'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (35,'Maadhav','Krishna',1,'Miss','u2561','[email protected]','2007-11-6','+911234580025'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (36,'Maagh','month',5,'Miss','u874','[email protected]','2008-5-8','+911234580026'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (37,'Maahir','Skilled',4,'Mr','u3372','[email protected]','2007-8-4','+911234580027'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (38,'Maalolan','Ahobilam',5,'Mrs','u3498','[email protected]','2007-7-9','+911234580028'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (39,'Maandhata','King',1,'Smt','u2089','[email protected]','2009-9-3','+911234580029'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (40,'Maaran','Brave',2,'Miss','u4020','[email protected]','2008-4-5','+9112345606001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (41,'Maari','Rain',2,'Sh','u3593','[email protected]','2007-12-5','+9112345606002'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (42,'Madan','Cupid',4,'Mrs','u795','[email protected]','2007-11-11','+9112345606003'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (43,'Madangopal','Krishna',3,'Sh','u438','[email protected]','2007-2-19','+9112345606004'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (44,'sahil','gogna',1,'Sh','u2273','[email protected]','2007-10-7','+9112345606005'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (45,'nikhil','gogna',2,'Mr','u1240','[email protected]','2009-9-14','+9112345606006'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (46,'amit','gogna',5,'Sh','u3879','[email protected]','2006-2-8','+9112345606007'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (47,'krishan','gogna',4,'Miss','u3632','[email protected]','2010-9-20','+9112345606008'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (48,'anil','kashyap',4,'Smt','u3939','[email protected]','2010-3-15','+9112345606009'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (49,'sunil','kashyap',5,'Mrs','u3493','[email protected]','2008-3-16','+9112345606010'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (50,'sandy','singh',6,'Mrs','u4691','[email protected]','2009-6-2','+9112345606011'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (51,'vishal','kapoor',3,'Mr','u1087','[email protected]','2010-5-13','+9112345606012'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (52,'bala','ji',5,'Mrs','u4762','[email protected]','2007-8-9','+9112345606013'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (53,'karan','sarin',4,'Miss','u3030','[email protected]','2008-4-8','+9112345606014'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (54,'abhishek','kumar',4,'Miss','u1093','[email protected]','2008-12-21','+9112345605001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (55,'babu','the',1,'Miss','u1055','[email protected]','2008-7-2','+9112345506001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (56,'sandeep','gainda',3,'Miss','u1320','[email protected]','2010-5-14','+9112345606301'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (57,'dheeraj','kumar',3,'Miss','u3685','[email protected]','2007-10-14','+9112345606091'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (58,'dharmendra','chauhan',1,'Smt','u3235','[email protected]','2008-8-1','+9112345806001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (59,'max','alan',3,'Smt','u3465','[email protected]','2009-5-5','+9112345608011'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (60,'hidayat','khan',3,'Smt','u958','[email protected]','2007-11-18','+911234599101'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (61,'himnashu','singh',4,'Miss','u2027','[email protected]','2008-3-2','+911234599102'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (62,'dinesh','kumar',6,'Sh','u3233','[email protected]','2008-5-9','+911234599103'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (63,'toshi','prakash',1,'Mr','u3766','[email protected]','2010-9-17','+911234599104'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (64,'niti','puri',3,'Mr','u3575','[email protected]','2009-11-15','+911234599105'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (65,'pawan','tikki',3,'Sh','u3919','[email protected]','2006-3-19','+911234599106'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (66,'gaurav','sharma',2,'Sh','u413','[email protected]','2010-4-2','+911234599107'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (67,'himanshu','verma',2,'Mrs','u4732','[email protected]','2009-3-20','+911234599108'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (68,'priyanshu','verma',3,'Sh','u183','[email protected]','2010-8-12','+911234599109'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (69,'nitika','luthra',2,'Mrs','u4259','[email protected]','2010-7-12','+911234599110'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (70,'neeru','gogna',2,'Sh','u1633','[email protected]','2010-6-23','+91532110000'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (71,'bindu','gupta',1,'Sh','u1859','[email protected]','2006-11-10','+91532110001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (72,'gurleen','bakshi',5,'Miss','u1423','[email protected]','2007-7-1','+91532110003'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (73,'rahul','gupta',3,'Sh','u1223','[email protected]','2009-8-11','+91532110004'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (74,'jagdish','salgotra',3,'Mr','u12','[email protected]','2008-5-19','+91532110005'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (75,'vikas','sharma',3,'Smt','u465','[email protected]','2006-6-2','+91532110006'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (76,'poonam','mahendra',2,'Sh','u1744','[email protected]','2009-12-2','+91532110007'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (77,'pooja','kulkarni',3,'Mrs','u1903','[email protected]','2008-10-6','+91532110008'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (78,'priya','mahajan',6,'Sh','u4205','[email protected]','2010-8-5','+91532110009'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (79,'manoj','zerger',1,'Mrs','u3369','[email protected]','2009-12-4','+91532110010'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (80,'mohan','master',5,'Mr','u2841','[email protected]','2010-10-7','+91532110011'); Please note that above employee data is just a data *only data* I created using a small java programme using random number generators and reading some names file, so you may find titles getting messed up :( We next create a procedure that we will use from java to fetch records that we just inserted. DROP PROCEDURE IF EXISTS search_address_book; CREATE PROCEDURE search_address_book(IN address_ids VARCHAR(1000) ) BEGIN DECLARE search_address_query VARCHAR(2000) DEFAULT ''; SET address_ids = CONCAT('\'', REPLACE(address_ids, ',', '\',\''), '\''); SET search_address_query = CONCAT(search_address_query, ' select ab.Id as Id , ab.FirstName as FName, ab.LastName as LName, cl.Location as Location, ab.Title as Title, ab.Alias as Alias, ab.Email as Email, ab.DOJ as DOJ, ab.PhoneNo as PhoneNo ' ); SET search_address_query = CONCAT(search_address_query, ' from AddressBook ab left join CompanyLocations cl on ab.OfficeId=cl.Id '); SET search_address_query = CONCAT(search_address_query, ' where ab.id IN (', address_ids ,') '); SET @statement = search_address_query; PREPARE dynquery FROM @statement; EXECUTE dynquery; DEALLOCATE PREPARE dynquery; END; # To get records for ids 1, 6 and 7, we run following commands: call search_address_book('1,6,7'); Configuring Sphinx It turns out that it was not terribly difficult to setup sphinx, but I had a hard time finding instructions on the web, so I'll post my steps here. By default Sphinx looks for 'sphinx.co.in' configuration file to come with indexes and other stuff, lets create and define source and index for our sample application addressbook.conf (read between the lines) ############################################################################# ## data source definition ############################################################################# source addressBookSource { ## SQL settings for 'mysql' ## type = mysql # some straightforward parameters for SQL source types sql_host = localhost sql_user = root sql_pass = root sql_db = addressbook sql_port = 3306 # optional, default is 3306 # pre-query, executed before the main fetch query sql_query_pre = SET NAMES utf8 # main document fetch query, integer document ID field MUST be the first selected column sql_query = \ select ab.Id as Id , ab.FirstName as FName, ab.LastName as LName, cl.Location as Location, \ ab.Title as Title, ab.Alias as Alias, ab.Email as Email, UNIX_TIMESTAMP(ab.DOJ) as DOJ, ab.PhoneNo as PhoneNo \ from AddressBook ab left join CompanyLocations cl on ab.OfficeId=cl.Id sql_attr_timestamp = DOJ # document info query, ONLY for CLI search (ie. testing and debugging) , optional, default is empty must contain $id macro and must fetch the document by that id sql_query_info = SELECT * FROM AddressBook WHERE id=$id } ############################################################################# ## index definition ############################################################################# # local index example, this is an index which is stored locally in the filesystem index addressBookIndex { # document source(s) to index source = addressBookSource # index files path and file name, without extension, make sure you have this folder path = C:\devel\sphinx-0.9.9-win32\data\addressBookIndex # document attribute values (docinfo) storage mode docinfo = extern # memory locking for cached data (.spa and .spi), to prevent swapping mlock = 0 morphology = none # make sure this file exists exceptions =C:\devel\sphinx-0.9.9-win32\data\exceptions.txt enable_star = 1 } ############################################################################# ## indexer settings ############################################################################# indexer { # memory limit, in bytes, kiloytes (16384K) or megabytes (256M) # optional, default is 32M, max is 2047M, recommended is 256M to 1024M mem_limit = 32M # maximum IO calls per second (for I/O throttling) # optional, default is 0 (unlimited) # # max_iops = 40 # maximum IO call size, bytes (for I/O throttling) # optional, default is 0 (unlimited) # # max_iosize = 1048576 # maximum xmlpipe2 field length, bytes # optional, default is 2M # # max_xmlpipe2_field = 4M # write buffer size, bytes # several (currently up to 4) buffers will be allocated # write buffers are allocated in addition to mem_limit # optional, default is 1M # # write_buffer = 1M } ############################################################################# ## searchd settings ############################################################################# searchd { # hostname, port, or hostname:port, or /unix/socket/path to listen on listen = 9312 # log file, searchd run info is logged here # optional, default is 'searchd.log' log = C:\devel\sphinx-0.9.9-win32\data\log\searchd.log # query log file, all search queries are logged here # optional, default is empty (do not log queries) query_log = C:\devel\sphinx-0.9.9-win32\data\log\query.log # client read timeout, seconds # optional, default is 5 read_timeout = 5 # request timeout, seconds # optional, default is 5 minutes client_timeout = 300 # maximum amount of children to fork (concurrent searches to run) # optional, default is 0 (unlimited) max_children = 30 # PID file, searchd process ID file name # mandatory pid_file = C:\devel\sphinx-0.9.9-win32\data\log\searchd.pid # max amount of matches the daemon ever keeps in RAM, per-index # WARNING, THERE'S ALSO PER-QUERY LIMIT, SEE SetLimits() API CALL # default is 1000 (just like Google) max_matches = 1000 # seamless rotate, prevents rotate stalls if precaching huge datasets # optional, default is 1 seamless_rotate = 1 # whether to forcibly preopen all indexes on startup # optional, default is 0 (do not preopen) preopen_indexes = 0 } # --eof-- Once the configuration is done, its time to index our sql data, the command to use is 'indexer' as shown below. C:\devel\sphinx-0.9.9-win32\bin>indexer.exe --all --config C:\devel\sphinx-0.9.9-win32\addressbook.conf CONSOLE: Sphinx 0.9.9-release (r2117) Copyright (c) 2001-2009, Andrew Aksyonoff using config file 'C:\devel\sphinx-0.9.9-win32\addressbook.conf'... indexing index 'addressBookIndex'... collected 80 docs, 0.0 MB sorted 0.0 Mhits, 100.0% done total 80 docs, 5514 bytes total 0.057 sec, 96386 bytes/sec, 1398.43 docs/sec total 2 reads, 0.000 sec, 3.5 kb/call avg, 0.0 msec/call avg total 7 writes, 0.000 sec, 2.5 kb/call avg, 0.0 msec/call avg Note: As I told earlier that Sphinx creates 1 document for each row, as we had 80 rows in the database so a total of 80 docs are created. Time taken is also very very small, believe me I tried with half million rows and it took around 3-4 seconds :) cool isn't it? Once the index is up let's try to search few records, the utility command to perform search is 'search'. Ok Sphinx maharaj* please search for employee whose alias is u4732 C:\devel\sphinx-0.9.9-win32\bin>search.exe --config C:\devel\sphinx-0.9.9-win32\addressbook.conf u4732 CONSOLE: Sphinx 0.9.9-release (r2117) Copyright (c) 2001-2009, Andrew Aksyonoff using config file 'C:\devel\sphinx-0.9.9-win32\addressbook.conf'... index 'addressBookIndex': query 'u4732 ': returned 1 matches of 1 total in 0.001 sec displaying matches: 1. document=67, weight=1, doj=Fri Mar 20 00:00:00 2009 Id=67 FirstName=himanshu LastName=verma OfficeId=2 Title=Mrs Alias=u4732 [email protected] DOJ=2009-03-20 PhoneNo=+911234599108 words: 1. 'u4732': 1 documents, 1 hits words: 1. 'u4732': 1 documents, 1 hits As you can see above this is a unique record for Himanshu. Note: You see a lot of information for the result, this is because of following line in our configuration file sql_query_info = SELECT * FROM AddressBook WHERE id=$id If you want to see less columns you need to change the sql_query_info in configuration file. Let's try another search, sphinx maharaj* please tell me which all rows have gurleen or toshi in them. C:\devel\sphinx-0.9.9-win32\bin>search.exe --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --any toshi gurleen CONSOLE: displaying matches: 1. document=63, weight=2, doj=Fri Sep 17 00:00:00 2010 Id=63 FirstName=toshi LastName=prakash OfficeId=1 Title=Mr Alias=u3766 [email protected] DOJ=2010-09-17 PhoneNo=+911234599104 2. document=72, weight=2, doj=Sun Jul 01 00:00:00 2007 Id=72 FirstName=gurleen LastName=bakshi OfficeId=5 Title=Miss Alias=u1423 [email protected] DOJ=2007-07-01 PhoneNo=+91532110003 Exactly two records were returned and this is what we were expecting. The following special operators and modifiers can be used when using the extended matching mode: operator OR: nikhil | sahil operator NOT: hello -sandy hello !sandy field search operator: @Email [email protected] For a complete set of search features , I advise you to go through http://sphinxsearch.com/docs/manual-0.9.9.html#searching link. Sphinx as Windows Service Now our main aim is to use sphinx with JAVA API, so let's move towards that now, before java can utilize the true power of Sphinx, we need to start 'searchd' as a windows service so that our java programme can connect to sphinx search engine. Let's install Sphinx as a windows service so that our java program can use this daemon service to query the index that we just created, the command is : C:\devel\sphinx-0.9.9-win32\bin>searchd.exe --install --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --servicename --port 9312 SphinxSearch CONSOLE: Sphinx 0.9.9-release (r2117) Copyright (c) 2001-2009, Andrew Aksyonoff Installing service... Service 'SphinxSearch' installed succesfully. Well now the sphinx is ready to serve us on port 9312 Note: If you try to install Sphinx without admin rights, you may get following error messages. C:\devel\sphinx-0.9.9-win32\bin>searchd.exe --install --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --servicename --port 9312 SphinxSearch CONSOLE: Installing service... FATAL: OpenSCManager() failed: code=5, error=Access is denied. Once done you can start the service as: c:\>sc start SphinxSearch (or alternatively from the services screen, start 'services.msc' in windows Run) If some how you want to delete the service , use c:\>sc delete SphinxSearch Let's create an adapter to fetch data from the database. package it.gogs.sphinx.util; import it.gogs.sphinx.AddressBoook; import it.gogs.sphinx.exception.AddressBookBizException; import it.gogs.sphinx.exception.AddressBookTechnicalException; import java.sql.CallableStatement; import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.SQLException; import java.util.ArrayList; import java.util.List; import org.apache.log4j.Logger; /** * Adapter to fetch data from the database. * * @author Munish Gogna * */ public class AddressBookAdapter { private static Logger logger = Logger.getLogger(AddressBookAdapter.class); private AddressBookAdapter() { // use in static way.. } private static Connection getConnection() throws AddressBookTechnicalException { String userName = "root"; String password = "root"; String url = "jdbc:mysql://localhost/addressbook"; try { Class.forName("com.mysql.jdbc.Driver").newInstance(); return DriverManager.getConnection(url, userName, password); } catch (Exception e) { throw new AddressBookTechnicalException("could not get connection"); } } public static List getAddressBookList(List addressIds) throws AddressBookTechnicalException, AddressBookBizException { List addressBoookList = new ArrayList(); if (addressIds == null || addressIds.size() == 0){ logger.error("AddressIds was null or empty, returning empty list"); return addressBoookList; } Connection connection = null; CallableStatement callableStatement = null; try { connection = getConnection(); callableStatement = connection.prepareCall("{ call search_address_book(?)}"); callableStatement.setString(1, Utils.toCommaString(addressIds)); callableStatement.execute(); ResultSet resultSet = callableStatement.getResultSet(); prepareResults(resultSet, addressBoookList); connection.close(); } catch (SQLException e) { logger.error("Problem connecting MYSQL - " + e.getMessage()); throw new AddressBookTechnicalException(e.getMessage()); } catch (AddressBookTechnicalException e) { logger.error("Problem connecting MYSQL - " + e.getMessage()); throw e; } finally{ if(connection != null){ try { connection.close(); } catch (SQLException e) { logger.error("Problem closing conection - " + e.getMessage()); e.printStackTrace(); } } } return addressBoookList; } private static void prepareResults(ResultSet resultSet, List addressBoookList) throws SQLException { AddressBoook addressBoook; while (resultSet.next()) { addressBoook = new AddressBoook(); addressBoook.setAlias(resultSet.getString("Alias")); addressBoook.setEmail(resultSet.getString("Email")); addressBoook.setfName(resultSet.getString("FName")); addressBoook.setlName(resultSet.getString("LName")); addressBoook.setOfficeLocation(resultSet.getString("Location")); addressBoook.setPhoneNo(resultSet.getString("PhoneNo")); addressBoook.setTitle(resultSet.getString("Title")); addressBoook.setDateOfJoining(resultSet.getDate("DOJ")); addressBoook.setId(resultSet.getLong("Id")); addressBoookList.add(addressBoook); } } } Next we create the SphinxInstance that will parse the keywords and date range and provide us a list of Ids that matches the search. package it.gogs.sphinx.util; import it.gogs.sphinx.DateRange; import it.gogs.sphinx.SearchCriteria; import it.gogs.sphinx.api.SphinxClient; import it.gogs.sphinx.api.SphinxException; import it.gogs.sphinx.api.SphinxMatch; import it.gogs.sphinx.api.SphinxResult; import it.gogs.sphinx.exception.AddressBookBizException; import java.util.ArrayList; import java.util.Date; import java.util.List; import org.apache.log4j.Logger; /** * Instance that will parse our free text and provide the results. * * Note: Make sure that 'searchd' is up and running before you use this class * @author Munish Gogna * */ public class SphinxInstance { private static String SPHINX_HOST = "localhost"; private static String SPHINX_INDEX = "addressBookIndex"; private static int SPHINX_PORT = 9312; private static SphinxClient sphinxClient; private static Logger logger = Logger.getLogger(SphinxInstance.class); static { sphinxClient = new SphinxClient(SPHINX_HOST, SPHINX_PORT); } public static List getAddressBookIds(SearchCriteria criteria) throws AddressBookBizException, SphinxException { List addressIdsList = new ArrayList(); try { if (Utils.isNull(criteria)) { logger.error("criteria is null"); throw new AddressBookBizException("criteria is null"); } if (Utils.isNull(criteria.getKeywords())) { logger.error("keyword is a required field"); throw new AddressBookBizException("keyword is a required field"); } DateRange dateRange = criteria.getDateRage(); if (!Utils.isNull(dateRange)) { if (Utils.isDateRangeValid(dateRange)) { // this is to filter results based on joining dates if they are provided sphinxClient.SetFilterRange("DOJ", getTimeInSeconds(dateRange.getFromDate()), getTimeInSeconds(dateRange.getToDate()), false); } else { logger.error(" fromDate/toDate should not be empty and 'fromDate' should be less than equal to 'toDate'"); throw new AddressBookBizException("fromDate/toDate should not be empty and 'fromDate' should be less than equal to 'toDate'"); } } sphinxClient.SetMatchMode(SphinxClient.SPH_MATCH_EXTENDED2); sphinxClient.SetSortMode(SphinxClient.SPH_SORT_RELEVANCE, ""); SphinxResult result = sphinxClient.Query(buildSearchQuery(criteria), SPHINX_INDEX, "buidling query for address book search"); SphinxMatch[] matches = result.matches; for (SphinxMatch match : matches) { addressIdsList.add(String.valueOf(match.docId)); } } catch (SphinxException e) { throw e; } catch (AddressBookBizException e) { throw e; } logger.info("Total record(s):" + addressIdsList.size()); return addressIdsList; } private static long getTimeInSeconds(Date time) { return time.getTime()/1000; } private static String buildSearchQuery(SearchCriteria criteria) throws AddressBookBizException { String keywords[] = criteria.getKeywords().split(" "); StringBuilder searchFor = new StringBuilder(); for (String key : keywords) { if (!Utils.isEmpty(key)) { searchFor.append(key); if (searchFor.length() > 1) { searchFor.append("*|*"); } } } searchFor.delete(searchFor.lastIndexOf("|*"), searchFor.length()); StringBuilder queryBuilder = new StringBuilder(); String query = searchFor.toString(); queryBuilder.append("@FName *" + query + " | "); queryBuilder.append("@LName *" + query + " | "); queryBuilder.append("@Title *" + query + " | "); queryBuilder.append("@Location *"+ query + " | "); queryBuilder.append("@Alias *" + query + " | "); queryBuilder.append("@Email *" + query + " | "); queryBuilder.append("@PhoneNo *" + query); logger.info("Sphinx Query: " + queryBuilder.toString()); return queryBuilder.toString(); } } Here is the interface that I will expose to the outside world (in my future article I will expose this interface as Web Service) import it.gogs.sphinx.AddressBoook; import it.gogs.sphinx.SearchCriteria; import it.gogs.sphinx.api.SphinxException; import it.gogs.sphinx.exception.AddressBookBizException; import it.gogs.sphinx.exception.AddressBookTechnicalException; import java.util.List; /** * * @author Munish Gogna * */ public interface AddressBook { /** * Returns the list of AddressBook objects based on search criteria. * * @param criteria * @throws AddressBookTechnicalException * @throws AddressBookBizException * @throws SphinxException */ public List getAddressBookList(SearchCriteria criteria) throws AddressBookTechnicalException, AddressBookBizException, SphinxException; } and here is the implementation class for the same. package it.gogs.sphinx.addressbook.impl; import java.util.List; import it.gogs.sphinx.AddressBoook; import it.gogs.sphinx.SearchCriteria; import it.gogs.sphinx.addressbook.AddressBook; import it.gogs.sphinx.api.SphinxException; import it.gogs.sphinx.exception.AddressBookBizException; import it.gogs.sphinx.exception.AddressBookTechnicalException; import it.gogs.sphinx.util.AddressBookAdapter; import it.gogs.sphinx.util.SphinxInstance; /** * Implementation for our Address Book example * * @author Munish Gogna * */ public class AddressBookImpl implements AddressBook{ public List getAddressBookList(SearchCriteria criteria) throws AddressBookTechnicalException, AddressBookBizException, SphinxException { List addressIds= SphinxInstance.getAddressBookIds(criteria); return AddressBookAdapter.getAddressBookList(addressIds); } } ok so far so good, let's run some tests now ............ package it.gogs.sphinx.test; import java.util.Calendar; import java.util.GregorianCalendar; import java.util.List; import it.gogs.sphinx.AddressBoook; import it.gogs.sphinx.DateRange; import it.gogs.sphinx.SearchCriteria; import it.gogs.sphinx.addressbook.AddressBook; import it.gogs.sphinx.addressbook.impl.AddressBookImpl; import it.gogs.sphinx.api.SphinxException; import it.gogs.sphinx.exception.AddressBookBizException; import it.gogs.sphinx.exception.AddressBookTechnicalException; import junit.framework.TestCase; /** * * @author Munish Gogna * */ public class AddressBookTest extends TestCase { private AddressBook addressBook; @Override protected void setUp() throws Exception { super.setUp(); addressBook = new AddressBookImpl(); } @Override protected void tearDown() throws Exception { super.tearDown(); } /** this should be a unique record for Himanshu */ public void test_search_for_himanshu() throws Exception { SearchCriteria criteria = new SearchCriteria(); // remember the first 'search' example?? criteria.setKeywords("u4732"); List addressList = addressBook.getAddressBookList(criteria); assertTrue(addressList.size() == 1); assertTrue("expecting himanshu here", "himanshu".equals(addressList.get(0).getfName())); } /** only two employees have name gurleen or toshi */ public void test_search_for_gurleen_or_toshi() throws Exception { SearchCriteria criteria = new SearchCriteria(); // remember the second 'search' example?? criteria.setKeywords("gurleen toshi"); List addressList = addressBook.getAddressBookList(criteria); assertTrue(addressList.size() == 2); assertTrue("expecting toshi here", "toshi".equals(addressList.get(0).getfName())); assertTrue("expecting gurleen here", "gurleen".equals(addressList.get(1).getfName())); } /** there are 16 people from jammu location */ public void test_search_for_people_from_jammu_location() throws Exception { SearchCriteria criteria = new SearchCriteria(); criteria.setKeywords("jammu"); List addressList = addressBook.getAddressBookList(criteria); assertTrue(addressList.size() == 16); } /** only Aalap, Manda and nitika are having title as Mrs and joined in 2010 */ public void test_joined_in_2010_with_title_Mrs() throws Exception { DateRange dateRange = new DateRange(); GregorianCalendar calendar1 = new GregorianCalendar(); calendar1.set(Calendar.YEAR, 2010); calendar1.set(Calendar.MONTH, Calendar.JANUARY); calendar1.set(Calendar.DAY_OF_MONTH, 1); dateRange.setFromDate(calendar1.getTime()); GregorianCalendar calendar2 = new GregorianCalendar(); calendar2.set(Calendar.YEAR, 2010); calendar2.set(Calendar.MONTH, Calendar.DECEMBER); calendar2.set(Calendar.DAY_OF_MONTH, 31); dateRange.setToDate(calendar2.getTime()); SearchCriteria criteria = new SearchCriteria(); criteria.setKeywords("Mrs"); criteria.setDateRage(dateRange); List addressList = addressBook.getAddressBookList(criteria); assertTrue("expecting 3 records here", addressList.size() == 3); } /** should get a business exception here */ public void test_without_specifying_keywords(){ SearchCriteria criteria = new SearchCriteria(); //criteria.setKeywords("Mrs"); try { addressBook.getAddressBookList(criteria); } catch (Exception e) { assertTrue(e instanceof AddressBookBizException); assertTrue(e.getMessage().indexOf("keyword is a required field") >-1); } } } How we update the Index once database changes? For these kinds of requirements, we can set up two sources and two indexes, with one "main" index for the data which only changes rarely (if ever), and one "delta" for the new documents. First Time data will go in the "main" index and the newly inserted address book entries will go into "delta". Delta index could then be reindexed very frequently, and the documents can be made available to search in a matter of minutes. Also one thing to take from this article is once 'searchd' daemon is running we can't index the data in normal way,we have to use --rotate option in such cases. For some applications where there is a timely batch update for the data, we can configure some cron job to reindex our documents in Sphinx as shown below. C:\devel\sphinx-0.9.9-win32\bin>indexer.exe --all --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --rotate Capsule We asked Sphinx to provide us the Document Ids corresponding to our search parameters and then we used those Ids to fire database query. In case the data we want to return is included in Index (DOJ attribute for example in our case) we can skip the database portion, so choose wisely how much information (attributes) you want to include while you index your sql data. Well that's all ... it's time to say good bye. Take good care of your health and don't forget to vote, its a must :) - Munish Gogna

December 7, 2010

by Munish Gogna

· 38,107 Views · 2 Likes

Setting mouse cursor position with WinAPI

Setting the mouse cursor position on a Windows machine with the help of .NET Framework shouldn't be that big of a problem. After all, there is the built-in Cursor class that lets you do that by executing a simple line of code: Cursor.Position = new System.Drawing.Point(0, 0); Of course, here 0 and 0 are the absolute coordinates for the mouse cursor on the screen. One thing to mention about this type of position setting is that Cursor requires a reference to System.Windows.Forms. And in some cases you don't want this extra reference. If that's the case, WinAPI is your solution. It requires some more work compared to the regular .NET way (class instance -> method call) but at the end you get more control than you would expect. When using WinAPI to set the cursor position, there are two ways you can go: mouse_event SendInput mouse_event is the very basic function that is only able to set the mouse coordinates. It was superseded and Microsoft recomends using SendInput instead. Nonetheless, it still works (although I cannot say for sure whether it will be working in future releases of Windows). So to start, I have a very basic class: class WINAPI_SUPERSEDED { [DllImport("user32.dll",SetLastError=true)] public static extern void mouse_event(uint dwFlags, uint dx, uint dy, uint dwData, int dwExtraInfo); public enum MouseFlags { MOUSEEVENTF_ABSOLUTE = 0x8000, MOUSEEVENTF_LEFTDOWN = 0x0002, MOUSEEVENTF_LEFTUP = 0x0004, MOUSEEVENTF_MIDDLEDOWN = 0x0020, MOUSEEVENTF_MIDDLEUP = 0x0040, MOUSEEVENTF_MOVE = 0x0001, MOUSEEVENTF_RIGHTDOWN = 0x0008, MOUSEEVENTF_RIGHTUP = 0x0010, MOUSEEVENTF_WHEEL = 0x0800, MOUSEEVENTF_XDOWN = 0x0080, MOUSEEVENTF_XUP = 0x0100 } public enum DataFlags { XBUTTON1 = 0x0001, XBUTTON2 = 0x0002 } } So to set the cursor position to 0,0 I would use this: WINAPI_SUPERSEDED.mouse_event((int)WINAPI_SUPERSEDED.MouseFlags.MOUSEEVENTF_MOVE | (int)WINAPI_SUPERSEDED.MouseFlags.MOUSEEVENTF_ABSOLUTE, 0, 0, 0, 0); If you go through the documentation, you will notice that in fact, dx and dy are in no way direct coordinates but rather normalized values ranged between 0 and 65,535 - that is only when the MOUSEEVENTF_ABSOLUTE flag is present. Otherwise, the position will be adjusted according to the current mouse cursor position. This method doesn't return any value so I cannot be informed whether it was successful or not. GetLastError won't give much information either. In this case, SendInput comes to the rescue. Here is what I have defined as the main class: class WINAPI { [DllImport("kernel32.dll")] public static extern uint GetLastError(); public enum MouseData { XBUTTON1 = 0x0001, XBUTTON2 = 0x0002 } public enum MouseFlags { MOUSEEVENTF_ABSOLUTE = 0x8000, MOUSEEVENTF_HWHEEL = 0x01000, MOUSEEVENTF_MOVE = 0x0001, MOUSEEVENTF_MOVE_NOCOALESCE = 0x2000, MOUSEEVENTF_LEFTDOWN = 0x0002, MOUSEEVENTF_LEFTUP = 0x0004, MOUSEEVENTF_RIGHTDOWN = 0x0008, MOUSEEVENTF_RIGHTUP = 0x0010, MOUSEEVENTF_MIDDLEDOWN = 0x0020, MOUSEEVENTF_MIDDLEUP = 0x0040, MOUSEEVENTF_VIRTUALDESK = 0x4000, MOUSEEVENTF_WHEEL = 0x0800, MOUSEEVENTF_XDOWN = 0x0080, MOUSEEVENTF_XUP = 0x0100 } [DllImport("user32.dll", SetLastError=true)] public static extern uint SendInput(uint nInputs, ref INPUT pInputs, int cbSize); [StructLayout (LayoutKind.Explicit)] public struct INPUT { [FieldOffset(0)] public int type; [FieldOffset(4)] public MOUSEINPUT mi; } public struct MOUSEINPUT { public int dx; public int dy; public int mouseData; public int dwFlags; public int time; public int extraInfo; } } This class is a bit more complicated, but at the same time you have to understand that in some cases, SendInput is used for hardware and keyboard input as well. Of course, for experimentation purposes I removed those parts in the sample class. Here you have the same MouseFlags enum that will let you pass custom flags defining the mouse behavior. Notice, that SendInput has SetLastError set to true, therefore if something wrong happens via this method, the error will be easily obtained via GetLastError, that is implemented as a helper method in the same class. VERY IMPORTANT: When you define the INPUT struct, make sure you use LayoutKind.Explicit since when passed to an unamanaged call, a specific field layout is required - as you can see, every field is decorated with a FieldOffset attribute. Also taking about StructLayout, you don't have to set StructLayout.Sequential to MOUSEINPUT since it is setautomatically by CLR. When I want to call the method above, I can simply use this snippet: WINAPI.MOUSEINPUT mouseInput = new WINAPI.MOUSEINPUT(); mouseInput.dx = 100; mouseInput.dy = 10; mouseInput.dwFlags = (int)WINAPI.MouseFlags.MOUSEEVENTF_ABSOLUTE | (int)WINAPI.MouseFlags.MOUSEEVENTF_MOVE; WINAPI.INPUT input = new WINAPI.INPUT(); input.type = 0; input.mi = mouseInput; uint x = WINAPI.SendInput(1, ref input, Marshal.SizeOf(input)); Console.WriteLine(x); Console.WriteLine(WINAPI.GetLastError()); Console.ReadLine(); Notice that I have to call the unmanaged version of sizeof and pass the INPUT struct to it in order for the method to correctly execute. The regular C# sizeof won't cut it here. When I am defining the type of input, 0 is repesenting the INPUT_MOUSE flag, since I am only handling the mouse here. Of course, I can re-organize my method to accept a set of INPUT instances - the native call itself allows this by requesting an array of INPUT and the correct indication of the number of INPUT instances passed, but that is not required for testing purposes.

November 28, 2010

by Denzel D.

· 17,344 Views

Map Reduce and Stream Processing

Hadoop Map/Reduce model is very good in processing large amount of data in parallel. It provides a general partitioning mechanism (based on the key of the data) to distribute aggregation workload across different machines. Basically, map/reduce algorithm design is all about how to select the right key for the record at different stage of processing. However, "time dimension" has a very different characteristic compared to other dimensional attributes of data, especially when real-time data processing is concerned. It presents a different set of challenges to the batch oriented, Map/Reduce model. Real-time processing demands a very low latency of response, which means there isn't too much data accumulated at the "time" dimension for processing. Data collected from multiple sources may not have all arrived at the point of aggregation. In the standard model of Map/Reduce, the reduce phase cannot start until the map phase is completed. And all the intermediate data is persisted in the disk before download to the reducer. All these added to significant latency of the processing. Here is a more detail description of this high latency characteristic of Hadoop. Although Hadoop Map/Reduce is designed for batch-oriented work load, certain application, such as fraud detection, ad display, network monitoring requires real-time response for processing large amount of data, have started to looked at various way of tweaking Hadoop to fit in the more real-time processing environment. Here I try to look at some technique to perform low-latency parallel processing based on the Map/Reduce model. General stream processing model In this model, data are produced at various OLTP system, which update the transaction data store and also asynchronously send additional data for analytic processing. The analytic processing will write the output to a decision model, which will feed back information to the OLTP system for real-time decision making. Notice the "asynchronous nature" of the analytic processing which is decoupled from the OLTP system, this way the OLTP system won't be slow down waiting for the completion of the analytic processing. Nevetheless, we still need to perform the analytic processing ASAP, otherwise the decision model will not be very useful if it doesn't reflect the current picture of the world. What latency is tolerable is application specific. Micro-batch in Map/Reduce One approach is to cut the data into small batches based on time window (e.g. every hour) and submit the data collected in each batch to the Map Reduce job. Staging mechanism is needed such that the OLTP application can continue independent of the analytic processing. A job scheduler is used to regulate the producer and consumer so each of them can proceed independently. Continuous Map/Reduce Here lets imagine some possible modification of the Map/Reduce execution model to cater for real-time stream processing. I am not trying to worry about the backward compatibility of Hadoop which is the approach that Hadoop online prototype (HOP) is taking. Long running The first modification is to make the mapper and reducer long-running. Therefore, we cannot wait for the end of the map phase before starting the reduce phase as the map phase never ends. This implies the mapper push the data to the reducer once it complete its processing and let the reducer to sort the data. A downside of this approach is that it offers no opportunity to run the combine() function on the map side to reduce the bandwidth utilization. It also shift more workload to the reducer which now needs to do the sorting. Notice there is a tradeoff between latency and optimization. Optimization requires more data to be accumulated at the source (ie: the Mapper) so local consolidation (ie: combine) can be performed. Unfortunately, low latency requires the data to be sent ASAP so not much accumulation can be done. HOP suggest an adaptive flow control mechanism such that data is pushed out to reducer ASAP until the reducer is overloaded and push back (using some sort of flow control protocol). Then the mapper will buffer the processed message and perform combine() before it send to the reducer. This approach automatically shift back and forth the aggregation workload between the reducer and the mapper. Time Window: Slice and Range This is a "time slice" concept and a "time range" concept. "Slice" defines a time window where result is accumulated before the reduce processing is executed. This is also the minimum amount of data that the mapper should accumulate before sending to the reducer. "Range" defines the time window where results are aggregated. It can be a landmark window where it has a well-defined starting point, or a jumping window (consider a moving landmark scenario). It can also be a sliding window where is a fixed size window from the current time is aggregated. After receiving a specific time slice from every mapper, the reducer can start the aggregation processing and combine the result with the previous aggregation result. Slice can be dynamically adjusted based on the amount of data sent from the mapper. Incremental processing Notice that the reducer need to compute the aggregated slice value after receive all records of the same slice from all mappers. After that it calls the user-defined merge() function to merge the slice value with the range value. In case the range need to be refreshed (e.g. reaching a jumping window boundary), the init() functin will be called to get a refreshed range value. If the range value need to be updated (when certain slice value falls outside a sliding range), the unmerge() function will be invoked. Here is an example of how we keep tracked of the average hit rate (ie: total hits per hour) within a 24 hour sliding window with update happens per hour (ie: an one-hour slice). # Call at each hit record map(k1, hitRecord) { site = hitRecord.site # lookup the slice of the particular key slice = lookupSlice(site) if (slice.time - now > 60.minutes) { # Notify reducer whole slice of site is sent advance(site, slice) slice = lookupSlice(site) } emitIntermediate(site, slice, 1) } combine(site, slice, countList) { hitCount = 0 for count in countList { hitCount += count } # Send the message to the downstream node emitIntermediate(site, slice, hitCount) } # Called when reducer receive full slice from all mappers reduce(site, slice, countList) { hitCount = 0 for count in countList { hitCount += count } sv = SliceValue.new sv.hitCount = hitCount return sv } # Called at each jumping window boundary init(slice) { rangeValue = RangeValue.new rangeValue.hitCount = 0 return rangeValue } # Called after each reduce() merge(rangeValue, slice, sliceValue) { rangeValue.hitCount += sliceValue.hitCount } # Called when a slice fall out the sliding window unmerge(rangeValue, slice, sliceValue) { rangeValue.hitCount -= sliceValue.hitCount }

November 23, 2010

by Ricky Ho

· 17,262 Views · 1 Like

Implementing Retries with a MDB or an MQ Batch Job? (WAS 7, MQ 6)

Both approaches have some advantages and disadvantages and so it’s a question of the likelihood of particular problems and business requirements and priorities.

November 10, 2010

by Jakub Holý

· 27,285 Views

Struts 2 : Creating and Accessing Maps

Today, in this post, I am going to discuss how to create and access HashMaps in Struts 2. My environment has the following jar files. struts2-core-2.1.8.1.jar ognl-2.7.3.jar Struts 2 makes extensive use of OGNL in order to retrieve the values of elements. OGNL stands for Object Graph Navigation Language. As the name suggests, OGNL is used to navigate an object graph. In this post, i am going to use the OGNL syntax to create Map on a jsp page, and show you how to iterate over it to fetch the keys and values from the map. In the following example, i will create a map, on they fly in the iterator tag, and will use it in the body of the iterator tag. Note the syntax that has been used to create a Map on a jsp page in struts 2 using OGNL. Once the map is created, the iterator tag can be used to iterate over each element of the map. Now suppose the map that we want to access is in a map inside the HttpRequest. Assume that some action somewhere in the chain has kept a map in the request using the key "myMap". In order to iterate over the elements of the map, we can do the following Okay now, enough with iterating over maps. I don't think there are any more permutations for accessing maps. But if i do find more, ill document them down here. Consider a case where you dont want to iterate over the entire map. Instead, all that you want to do is to extract a value form the map based upon a key that is already known to you in your jsp page. Assume that you have a variable in page scope called "runtimeKey" that you set using the s:set tag. The value of this variable is a string key that can be used to get a value from a map. Here is how you can fetch the value from the map without iterating over it. As you see, since the variable "runtimeKey" is an OGNL variable and is available on the value stack, it can be referenced using the # notations. Also not that instead of using the dot notation, I have used square brackets to fetch the key. This is because the value of my expression "#runtimeKey will only be evaluated when its inside the brackets. Also note that the value of the key runtimeKey is contained within single quotes to direct OGNL to evaluate it as a string when setting as the value for my key. Consider another situation where your keys follow a pattern. For example key_1, key_2. And you have the values 1 and 2 as page scoped variables. So, now instead of having the string value of the key directly, you may have to construct the key using concatenation. Your key pattern was set above. And you Map has a key called key_1. So, here is how you would have to concatenate your strings in order to construct the key and fetch the value. Huh. So easy! Thats all for now folks.. Stay tuned for more! Happy Programming :) From http://mycodefixes.blogspot.com/2010/11/struts-2-creating-and-accessing-maps.html

November 8, 2010

by Ryan Sukale

· 32,536 Views

From Doctrine 1 to Doctrine 2

What you need to know Doctrine 2 is an implementation of the Data Mapper pattern, and does not force your model classes to extend an Active Record, nor to contain details about the relational model like foreign keys. Doctrine 2 is divided into three packages, each building on the previous one: Doctrine\Common contains reusable code like the annotation parser and the event system. Doctrine\DBAL asbtracts not only the particular database driver, like PDO already does natively, but also the SQL dialects of different DBMS. Doctrine\ORM maps your object graphs into database and vice versa. It is where the fun happens. Note that you will have to run your application of PHP 5.3 for Doctrine 2 to work, mainly because of the use of namespaces in it. Porting mapping information In Doctrine 2, you'll need to maintain your PHP classes, instead of a mapping schema. These classes need to be annotated with @Entity and various other annotations to define which columns and relationships should be reflected in the database. This means private properties containing other objects or field values need to be eplicitly defined. You can take a look to the Doctrine\Tests namespace on github to see some examples of model definition along with metadata. Thus, your Domain Model classes are managed by you, and you should maintain your object graph consistent. Then mapping information can be read by Doctrine 2 to port your objects into the database tables and reconstitute them when you execute query. Doctrine 2 is much more focused into mapping, instead of on shipping validation and behaviors as Doctrine 1 did. Of course, the porting of Doctrine 1 metadata to Doctrine 2 ones can be automated with a script, contained in the Doctrine command line tools. doctrine orm:convert-d1-schema will transform your YAML schema into YAML mapping information for Doctrine 2. you'll still need to generate your PHP classes however, just for this first time. I think the best approach to store mapping information is directly into class source code, with annotations that can be easily kept synchronized with the fields they refer to. While in other languages annotations require their classes to be present in order to compile, PHP applications define annotations as comments, which are ignored by anything else than Doctrine 2. There may be issues with APC and other optimizators that strip away comments, however. You have now to configure your doctrine command line tool to load the metadata generated by the previous step, and then run: doctrine orm:convert-mapping annotations /path/to/folder/where/generate/classes From now on, you'll maintain metadata only in the classes in that folder (after reconfiguration of Doctrine to read them instead of the YAML metadata). The generated classes will be empty, apart from annotated private properties, and you'll still have to insert the original public methods if you have any. This mechanism port relationships as well as columns, but only the one explicitly defined (in the relations key). Autodiscovered relations like the ones inferred from field_id-like fields are not ported, but you'll still have the foreign key as a column. Doctrine 2 objects do not save foreign keys internally, so they will act as a placefolder that should be substituted with a relation annotation. Your application code Doctrine 2 has a cleaner architecture than its previous iteration, and it promotes Dependency Injection instead of accessing Doctrine_* classes statically or via a singleton like Doctrine_Manager. Here's what to inject in substitution of your old Doctrine_*::*() calls: if you used Doctrine_Connection or Doctrine_Manager directly, now you should inject a Doctrine\ORM\EntityManager, which is the Facade of Doctrine 2. When you have a reference to the Entity Manager, you can access any functionality of Doctrine 2. If you referred to Doctrine_Record instances, you'll have to inject or instantiate your models, which do not extend any Doctrine base class. If you referred to a Doctrine_Table object (a single one), you can instead inject a Doctrine\ORM\EntityRepository, obtained with $em->getRepository($className). If you need to access the mapping metadata, to discover for example the type of a field, you have an entry point where each class metadata are available: $em->getMetadataFactory() will give you a Doctrine\ORM\Mapping\ClassMetadataFactory, which you can query for metadata of each class you know. If you need instead metadata only for a single class, you can obtain with $metadataFactory->getMetadataFor($className) a single Doctrine\ORM\Mapping\ClassMetadata, to inject into your own object. To perform queries, you can instance a Builder for queries with $this->em->createQueryBuilder($className). Until you have a QueryBuilder, it provides you with a fluent interface to add DQL parts; when you call getQuery() on it, it will return an immutable Query object which you can execute with several utility methods, like getSingleResult(). Is it worth the hassle? Doctrine 2 is a new major version and it is indeed going to require some changes to your codebase that take more than an hour. However, its mapping power, well-though architecture and speed are worth the effort. If you want to do serious object-oriented programming in PHP, you will have to consider Doctrine 2 someday.

November 3, 2010

by Giorgio Sironi

· 23,185 Views

MyBatis (formerly iBatis) – Examples and Hints using SELECT, INSERT and UPDATE Annotations

MyBatis is a lightweight persistence framework for Java and .NET. This blog entry addresses the Java side. MyBatis is an alternative positioned somewhere between plain JDBC and ORM frameworks (e.g. EclipseLink or Hibernate). MyBatis usually uses XML, but it also supports annotations since version 3. The documentation is very detailed for XML, but lacks annotation examples. Just the Annotations itself are described, but no examples how to use them. I could not find any good and easy examples anywhere, so I will describe some very basic examples for SELECT, INSERT and UPDATE statements by implementing a Data Access Object (DAO) using MyBatis. These examples are a good starting point to create more complex MyBatis queries using a DAO. You can find the full source code at the end of this blog. A simple SQL-Table I use a very simple table with two attributes. The name of the table is “simple_information”. The primary key is a Integer and may not be null (info_id). The only real data is a character and may also not be null (info_content). That is enough “complexity” to learn the usage of MyBatis annotations. The Java class “SimpleInformationEntity” is a POJO which contains these two attributes. @SELECT-Statement The @Select annotation is very easy to use, if you want to use exactly one paramter. If you need more than one paramter, use the @Param annotation (which is described below at the update example). You do not have to map the found information to a SimpleInformationEntity object, as you would have to do with a JDBC ResultSet. The magic of the framework does this for you. final String GET_INFO = “SELECT * FROM simple_information WHERE info_id = #{info_id}”; @Select(GET_INFO) public SimpleInformationEntity getSimpleInformationById(int info_id) throws Exception; @INSERT-Statement You can use the object (which you want to be persist) as parameter. You do not have to use several parameters for each attribute of the object. The magic of the framework does this for you. final String PERSIST_INFO = “INSERT INTO simple_information(info_id, info_content) VALUES (#{infoId}, #{infoContent})”; @Insert(PERSIST_INFO) public int persistInformation(SimpleInformationEntity simpleInfo) throws Exception; @UPDATE-Statement You cannot use more than one parameter within a method. If you want to understand why, look at some MyBatis XML examples in the documenation: There you use the attribute “parameterType”, which must be exactly one “parameter”! So you will get a (strange) exception, if you use two or more parameters. Instead you have to use the @Param annotation, if you need more than one parameter. final String UPDATE_INFO = “UPDATE simple_information SET info_content = #{newInfo} WHERE info_id = #{infoId}”; @Update(UPDATE_INFO) public int updateInformation(@Param(“infoId”) int info_id, @Param(“newInfo”) String new_content) throws Exception; Configuration You have to add your MyBatis-Interface for the Mapper to the SQLSessionFactory: sqlSessionFactory.getConfiguration().addMapper(InfoMapper.class); Some Hints for developing with MyBatis The following means helped me a lot to use MyBatis annotations despite the lack of documentation about using annotations: - MyBatis is open source! So add the sources to your build path and use the debugging function of your IDE to enter the MyBatis source code while executing some queries. You will see what MyBatis expects as input and how it is processed. - Read the documentation about using MyBatis XML. This does not really make any sense you think? It does! The processing deep inside MyBatis does not change if you use annotations instead of XML. It is just another way to develop persistence queries. E.g. if you know that a XML select query may just use exactly one parameterType-attribute, then you know that you may just use one parameter in an annotation-based method too! If you need more parameters, you have to use the @Param annotation. - If you get any strange exception that does not make any sense, then clean and re-compile your project. This often helps, because as with other persistence frameworks such as Hibernate, the bytecode enhancement sometimes confuses your IDE. Conclusion: @MyBatis-Team: Improve and extend the documentation instead of improving the framework itself! MyBatis is a nice lightweigt persistence framework. But the documentation is not enough detailed. Some important information is completely missing. Especially, if you are a newbie to MyBatis / iBatis, it is very tough to develop with MyBatis using annotations instead of XML. Besides the usage of annotations, another good example for missing documentation is how to configure transactions in MyBatis by using JNDI and a J2EE / JEE Application server. You have to use google to find out, and if you are lucky you will find a mailing list or blog entry describing your problem. If not, you have to try it out. The missing documentation makes MyBatis much more tough than it actually is. So in the next months, the MyBatis team should improve and extend the documentation instead of improving the framework itself… Best regards, Kai Wähner (Twitter: @Kai Waehner) [Content from my Blog: MyBatis (formerly iBatis) - Examples and Hints using Select, Insert and Update Annotations - Kai Wähner's IT-Blog] Appendix: Source Code Here you see all the necessary source code, also including a MyBatis Connection Factory, which reads the configuration data from a XML file. ############################################################################ Connection Factory (using a static initializer) ############################################################################ import java.io.FileNotFoundException; import java.io.IOException; import java.io.InputStream; import java.io.InputStreamReader; import java.io.Reader; import org.apache.ibatis.session.SqlSessionFactory; import org.apache.ibatis.session.SqlSessionFactoryBuilder; import de.waehner.kai.persistence.InfoDAO.InfoMapper; public class MyBatisConnectionFactory { private static SqlSessionFactory sqlSessionFactory; static { Reader reader = null; try { InputStream in = MyBatisConnectionFactory.class.getResourceAsStream(“myBatisConfiguration.xml”); reader = new InputStreamReader(in); if (sqlSessionFactory == null) { sqlSessionFactory = new SqlSessionFactoryBuilder().build(reader); sqlSessionFactory.getConfiguration().addMapper(InfoMapper.class); } in.close(); } catch (FileNotFoundException fileNotFoundException) { fileNotFoundException.printStackTrace(); } catch (IOException iOException) { iOException.printStackTrace(); } } public static SqlSessionFactory getSqlSessionFactory() { return sqlSessionFactory; } } ############################################################################ Data Acces Object (including the MyBatis-Mapper as inner Class) ############################################################################ import org.apache.ibatis.annotations.Insert; import org.apache.ibatis.annotations.Param; import org.apache.ibatis.annotations.Select; import org.apache.ibatis.annotations.Update; import org.apache.ibatis.session.SqlSession; import org.apache.ibatis.session.SqlSessionFactory; public class InfoDAO { public interface InfoMapper { final String GET_INFO = “SELECT * FROM simple_information WHERE info_id = #{info_id}”; @Select(GET_INFO) public SimpleInformationEntity getSimpleInformationById(int info_id) throws Exception; final String PERSIST_INFO = “INSERT INTO simple_information(info_id, info_content) VALUES (#{infoId}, #{infoContent})”; @Insert(PERSIST_INFO) public int persistInformation(SimpleInformationEntity simpleInfo) throws Exception; final String UPDATE_INFO = “UPDATE simple_information SET info_content = #{newInfo} WHERE info_id = #{infoId}”; @Update(UPDATE_INFO) public int updateInformation(@Param(“infoId”) int info_id, @Param(“newInfo”) String new_content) throws Exception; } public SimpleInformationEntity getSingleAlarm(int info_id) throws Exception { SqlSessionFactory sqlSessionFactory = MyBatisConnectionFactory.getSqlSessionFactory(); SqlSession session = sqlSessionFactory.openSession(); try { InfoMapper mapper = session.getMapper(InfoMapper.class); SimpleInformationEntity simpleInfo = mapper.getSimpleInformationById(info_id); return simpleInfo; } finally { session.close(); } } public int persistInformation(SimpleInformationEntity simpleInfo) throws Exception { SqlSessionFactory sqlSessionFactory = MyBatisConnectionFactory.getSqlSessionFactory(); SqlSession session = sqlSessionFactory.openSession(); try { InfoMapper mapper = session.getMapper(InfoMapper.class); int answer = mapper.persistInformation(simpleInfo); return answer; } finally { session.close(); } } public int updateInformation(int info_id, String new_content) throws Exception { SqlSessionFactory sqlSessionFactory = MyBatisConnectionFactory.getSqlSessionFactory(); SqlSession session = sqlSessionFactory.openSession(); try { InfoMapper mapper = session.getMapper(InfoMapper.class); int answer = mapper.updateInformation(info_id, new_content); return answer; } finally { session.close(); } } } ############################################################################ SimpleInformation Entity (a simple POJO) ############################################################################ import java.io.Serializable; public class SimpleInformationEntity implements Serializable{ private static final long serialVersionUID = -821826330941829539L; private int infoId; private String infoContent; public int getInfoId() { return infoId; } public void setInfoId(int infoId) { this.infoId = infoId; } public String getInfoContent() { return infoContent; } public void setInfoContent(String infoContent) { this.infoContent = infoContent; } }

November 2, 2010

by Kai Wähner

CORE

· 79,379 Views · 2 Likes

SQL Server: How to insert million numbers to table fast?

Yesterday I attended at local community evening where one of the most famous Estonian MVPs – Henn Sarv – spoke about SQL Server queries and performance. During this session we saw very cool demos and in this posting I will introduce you my favorite one – how to insert million numbers to table. The problem is: how to get one million numbers to table with less time? We can solve this problem using different approaches but not all of them are quick. Let’s go now step by step and see how different approaches perform. NB! The code samples here are not original ones but written by me as I wrote this posting. Using WHILE First idea for many guys is using WHILE. It is robust and primitive approach but it works if you don’t think about better solutions. Solution with WHILE is here. declare @i as int set @i = 0 while(@i < 1000000) begin insert into numbers values(@i) set @i += 1 end When we run this code we have to wait. Well… we have to wait couple of minutes before SQL Server gets done. On my heavily loaded development machine it took 6 minutes to run. Well, maybe we can do something. Using inline table As a next thing we may think that inline table that is kept in memory will boost up performance. Okay, let’s try out the following code. declare @t TABLE (number int) declare @i as int set @i = 0 while(@i < 1000000) begin insert into @t values(@i) set @i += 1 end insert into numbers select * from @t Okay, it is better – it took “only” 01:30 to run. It is better than six minutes but it is not good yet. Maybe we can do something more? Optimizing WHILE If we investigate the code in first example we can find one hidden resource eater. All these million inserts are run in separate transaction. Let’s try to run inserts in one transaction. declare @i as int set @i = 0 begin transaction while(@i < 1000000) begin insert into numbers values(@i) set @i += 1 end commit transaction Okay, it’s a lot better – 18 seconds only! Using only set operations Now let’s write some SQL that doesn’t use any sequential constructs like WHILE or other loops. We will write SQL that uses only set operations and no long running stuff like before. declare @t table (number int) insert into @t select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9 insert into numbers select t1.number + t2.number*10 + t3.number*100 + t4.number*1000 + t5.number*10000 + t6.number*100000 from @t as t1, @t as t2, @t as t3, @t as t4, @t as t5, @t as t6 Bad side of this SQL is that it is not as intuitive for application programmers as previous examples. But when you are working with databases you have to know how some set calculus as well. The result is now seven seconds! Results As last thing, let’s see the results as bar chart to illustrate difference between approaches. I think this example shows very well how usual optimization can give you better results but when you are moving to sets – this is something that SQL Server and other databases understand better – you can get very good results in performance.

October 22, 2010

by Gunnar Peipman

· 34,836 Views

Manage Hierarchical Data using Spring, JPA and Aspects

Managing hierarchical data using two dimentional tables is a pain. There are some patterns to reduce this pain. One such solution is described here. This article is about implementing the same using Spring, JPA, Annotations and Aspects. Please go through follow the link to better understand this solution described. The purpose is to come up with a component that will remove the boiler-plate code in the business layer to handle hierarchical data. Summary Create a base class for Entities used to represent Hierarchical data Create annotation classes Code the Aspect that will execute addional steps for managing Hierarchical data. (Heart of the solution) Now the Aspect can be used everywhere Hierarchical data is used. Detail Create base class for Entities used to represent Hierarchical data. The purpose of the super class is to encapsulate all the common attrubutes and operations required for managing hierarchical data in a table. Please note that the class is annotated as @MappedSuperclass. The methods are meant to generate queries required to perform CRUD operations on the Table. Their use will be more clear later in the article when we will revisit HierarchicalEntity. Now any Entity that extends this class will have all the attributes required to manage hierarchical data. import com.es.clms.aspect.HierarchicalEntity; import javax.persistence.EntityListeners; import javax.persistence.MappedSuperclass; @MappedSuperclass @EntityListeners({HierarchicalEntity.class}) public abstract class AbstractHierarchyEntity implements Serializable { protected Long parentId; protected Long lft; protected Long rgt; public String getMaxRightQuery() { return "Select max(e.rgt) from " + this.getClass().getName() + " e"; } public String getQueryForParentRight() { return "Select e.rgt from " + this.getClass().getName() + " e where e.id = ?1"; } public String getDeleteStmt() { return "Delete from " + this.getClass().getName() + " e Where e.lft between ?1 and ?2"; } public String getUpdateStmtForFirst() { return "Update " + this.getClass().getName() + " e set e.lft = e.lft + ?2 Where e.lft >= ?1"; } public String getUpdateStmtForRight() { return "Update " + this.getClass().getName() + " e set e.rgt = e.rgt + ?2 Where e.rgt >= ?1"; } . . .//Getter and setters for all the attributes. } Create annotation classes The following is an annotation class that will be used to annotate the methods that perform CRUD operations on hierarchical data. It is followed by an enum that will decide the type of CRUD operation to be performed. These classes will make more sense after the next section. import java.lang.annotation.ElementType; import java.lang.annotation.Retention; import java.lang.annotation.RetentionPolicy; import java.lang.annotation.Target; @Target(ElementType.METHOD) @Retention(RetentionPolicy.RUNTIME) public @interface HierarchicalOperation { HierarchicalOperationType operationType(); } /** * Enum - Type of CRUD operation. */ public enum HierarchicalOperationType { SAVE, DELETE; } Code the Aspect that will execute addional steps for managing Hierarchical data. HierarchicalEntity is an aspect that performs the additional logic required to manage the hierarchical data as descriped in the article here. This is the first time I have used an Aspect, therefore I am sure that there are better ways to do this. Those of you, who are good at it, please improve this part of code. This class is annotated as @Aspect. The pointcut will intercept any method anotated with HierarchicalOperation and has a input of type AbstractHierarchyEntity. A sample its usage is in next section. operation method is annotated to be executed before the pointcut. Based on the HierarchicalOperationType passed, this method will either execute the additional tasks required to save or delete the hierarchical record. This is where the methods defined in AbstractHierarchyEntity for generating JPA Queries are used. GenericDAOHelper is a utility class for using JPA. import com.es.clms.annotation.HierarchicalOperation; import com.es.clms.annotation.HierarchicalOperationType; import com.es.clms.common.GenericDAOHelper; import com.es.clms.model.AbstractHierarchyEntity; import org.aspectj.lang.JoinPoint; import org.aspectj.lang.annotation.Aspect; import org.aspectj.lang.annotation.Before; import org.aspectj.lang.annotation.Pointcut; import org.springframework.stereotype.Service; import org.springframework.beans.factory.annotation.Autowired; @Aspect @Service("hierarchicalEntity") public class HierarchicalEntity { @Autowired private GenericDAOHelper genericDAOHelper; @Pointcut(value = "execution(@com.es.clms.annotation.HierarchicalOperation * *(..)) " + "&& args(AbstractHierarchyEntity)") private void hierarchicalOps() { } /** * * @param jp * @param hierarchicalOperation */ @Before("hierarchicalOps() && @annotation(hierarchicalOperation) ") public void operation(final JoinPoint jp, final HierarchicalOperation hierarchicalOperation) { if (jp.getArgs().length != 1) { throw new IllegalArgumentException( "Expecting only one parameter of type AbstractHierarchyEntity in " + jp.getSignature()); } if (HierarchicalOperationType.SAVE.equals( hierarchicalOperation.operationType())) { save(jp); } else if (HierarchicalOperationType.DELETE.equals( hierarchicalOperation.operationType())) { delete(jp); } } /** * * @param jp */ private void save(JoinPoint jp) { AbstractHierarchyEntity entity = (AbstractHierarchyEntity) jp.getArgs()[0]; if (entity == null) return; if (entity.getParentId() == null) { Long maxRight = (Long) genericDAOHelper.executeSingleResultQuery( entity.getMaxRightQuery()); if (maxRight == null) { maxRight = 0L; } entity.setLft(maxRight + 1); entity.setRgt(maxRight + 2); } else { Long parentRight = (Long) genericDAOHelper.executeSingleResultQuery( entity.getQueryForParentRight(), entity.getParentId()); entity.setLft(parentRight); entity.setRgt(parentRight + 1); genericDAOHelper.executeUpdate( entity.getUpdateStmtForFirst(), parentRight, 2L); genericDAOHelper.executeUpdate( entity.getUpdateStmtForRight(), parentRight, 2L); } } /** * * @param jp */ private void delete(JoinPoint jp) { AbstractHierarchyEntity entity = (AbstractHierarchyEntity) jp.getArgs()[0]; genericDAOHelper.executeUpdate( entity.getDeleteStmt(), entity.getLft(), entity.getRgt()); Long width = (entity.getRgt() - entity.getLft()) + 1; genericDAOHelper.executeUpdate( entity.getUpdateStmtForFirst(), entity.getRgt(), width * (-1)); genericDAOHelper.executeUpdate( entity.getUpdateStmtForRight(), entity.getRgt(), width * (-1)); } } Sample Usage From this point on you don't have to worry about the additional tasks required for managing the data. Just use the HierarchicalOperation anotation with appropriate HierarchicalOperationType. Below is a sample use of the code developed so far. @HierarchicalOperation(operationType = HierarchicalOperationType.SAVE) public long save(VariableGroup group) { entityManager.persist(group); return group.getId(); } @HierarchicalOperation(operationType = HierarchicalOperationType.DELETE) public void delete(VariableGroup group) { entityManager.remove(entityManager.merge(group)); } http://rajeshkilango.blogspot.com/2010/10/manage-hierarchical-data-using-spring.html

October 21, 2010

by Rajesh Ilango

· 17,007 Views

Tutorial: Linked/Cascading ExtJS Combo Boxes using Spring MVC 3 and Hibernate 3.5

this post will walk you through how to implement extjs linked/cascading/nested combo boxes using spring mvc 3 and hibernate 3.5. i am going to use the classic linked combo boxes: state and cities. in this example, i am going to use states and cities from brazil! what is our main goal? when we select a state from the first combo box, the application will load the second combo box with the cities that belong to the selected state. there are two ways to implement it. the first one is to load all the information you need for both combo boxes, and when user selects a state, the application will filter the cities combo box according to the selected state. the second one is to load information only to populate the state combo box. when user selects a state, the application will retrieve all the cities that belong to the selected state from database. which one is best? it depends on the amount of data you have to retrieve from your database. for example: you have a combo box that lists all the countries in the world. and the second combo box represents all the cities in the world. in this case, scenario number 2 is the best option, because you will have to retrieve a large amount of data from the database. ok. let’s get into the code. i’ll show how to implement both scenarios. first, let me explain a little bit of how the project is organized: let’s take a look at the java code. basedao: contains the hibernate template used for citydao and statedao. package com.loiane.dao; import org.hibernate.sessionfactory; import org.springframework.beans.factory.annotation.autowired; import org.springframework.orm.hibernate3.hibernatetemplate; import org.springframework.stereotype.repository; @repository public abstract class basedao { private hibernatetemplate hibernatetemplate; public hibernatetemplate gethibernatetemplate() { return hibernatetemplate; } @autowired public void setsessionfactory(sessionfactory sessionfactory) { hibernatetemplate = new hibernatetemplate(sessionfactory); } } citydao: contains two methods: one to retrieve all cities from database (used in scenario #1), and one method to retrieve all the cities that belong to a state (used in scenario #2). package com.loiane.dao; import java.util.list; import org.hibernate.criterion.detachedcriteria; import org.hibernate.criterion.restrictions; import org.springframework.stereotype.repository; import com.loiane.model.city; @repository public class citydao extends basedao{ public list getcitylistbystate(int stateid) { detachedcriteria criteria = detachedcriteria.forclass(city.class); criteria.add(restrictions.eq("stateid", stateid)); return this.gethibernatetemplate().findbycriteria(criteria); } public list getcitylist() { detachedcriteria criteria = detachedcriteria.forclass(city.class); return this.gethibernatetemplate().findbycriteria(criteria); } } statedao: contains only one method to retrieve all the states from database. package com.loiane.dao; import java.util.list; import org.hibernate.criterion.detachedcriteria; import org.springframework.stereotype.repository; import com.loiane.model.state; @repository public class statedao extends basedao{ public list getstatelist() { detachedcriteria criteria = detachedcriteria.forclass(state.class); return this.gethibernatetemplate().findbycriteria(criteria); } } city: represents the city pojo, represents the city table. package com.loiane.model; import javax.persistence.column; import javax.persistence.entity; import javax.persistence.generatedvalue; import javax.persistence.id; import javax.persistence.table; import org.codehaus.jackson.annotate.jsonautodetect; @jsonautodetect @entity @table(name="city") public class city { private int id; private int stateid; private string name; //getters and setters } state: represents the state pojo, represents the state table. package com.loiane.model; import javax.persistence.column; import javax.persistence.entity; import javax.persistence.generatedvalue; import javax.persistence.id; import javax.persistence.table; import org.codehaus.jackson.annotate.jsonautodetect; @jsonautodetect @entity @table(name="state") public class state { private int id; private int countryid; private string code; private string name; //getters and setters } cityservice: contains two methods: one to retrieve all cities from database (used in scenario #1), and one method to retrieve all the cities that belong to a state (used in scenario #2). only makes a call to citydao class. package com.loiane.service; import java.util.list; import org.springframework.beans.factory.annotation.autowired; import org.springframework.stereotype.service; import com.loiane.dao.citydao; import com.loiane.model.city; @service public class cityservice { private citydao citydao; public list getcitylistbystate(int stateid) { return citydao.getcitylistbystate(stateid); } public list getcitylist() { return citydao.getcitylist(); } @autowired public void setcitydao(citydao citydao) { this.citydao = citydao; } } stateservice: contains only one method to retrieve all the states from database. makes a call to statedao. package com.loiane.service; import java.util.list; import org.springframework.beans.factory.annotation.autowired; import org.springframework.stereotype.service; import com.loiane.dao.statedao; import com.loiane.model.state; @service public class stateservice { private statedao statedao; public list getstatelist() { return statedao.getstatelist(); } @autowired public void setstatedao(statedao statedao) { this.statedao = statedao; } } citycontroller: contains two methods: one to retrieve all cities from database (used in scenario #1), and one method to retrieve all the cities that belong to a state (used in scenario #2). only makes a call to cityservice class. both methods return a json object that looks like this: {"data":[ {"stateid":1,"name":"acrelândia","id":1}, {"stateid":1,"name":"assis brasil","id":2}, {"stateid":1,"name":"brasiléia","id":3}, {"stateid":1,"name":"bujari","id":4}, {"stateid":1,"name":"capixaba","id":5}, {"stateid":1,"name":"cruzeiro do sul","id":6}, {"stateid":1,"name":"epitaciolândia","id":7}, {"stateid":1,"name":"feijó","id":8}, {"stateid":1,"name":"jordão","id":9}, {"stateid":1,"name":"mâncio lima","id":10}, ]} class: package com.loiane.web; import java.util.hashmap; import java.util.map; import org.springframework.beans.factory.annotation.autowired; import org.springframework.stereotype.controller; import org.springframework.web.bind.annotation.requestmapping; import org.springframework.web.bind.annotation.requestparam; import org.springframework.web.bind.annotation.responsebody; import com.loiane.service.cityservice; @controller @requestmapping(value="/city") public class citycontroller { private cityservice cityservice; @requestmapping(value="/getcitiesbystate.action") public @responsebody map getcitiesbystate(@requestparam int stateid) throws exception { map modelmap = new hashmap(3); try{ modelmap.put("data", cityservice.getcitylistbystate(stateid)); return modelmap; } catch (exception e) { e.printstacktrace(); modelmap.put("success", false); return modelmap; } } @requestmapping(value="/getallcities.action") public @responsebody map getallcities() throws exception { map modelmap = new hashmap(3); try{ modelmap.put("data", cityservice.getcitylist()); return modelmap; } catch (exception e) { e.printstacktrace(); modelmap.put("success", false); return modelmap; } } @autowired public void setcityservice(cityservice cityservice) { this.cityservice = cityservice; } } statecontroller: contains only one method to retrieve all the states from database. makes a call to stateservice. the method returns a json object that looks like this: {"data":[ {"countryid":1,"name":"acre","id":1,"code":"ac"}, {"countryid":1,"name":"alagoas","id":2,"code":"al"}, {"countryid":1,"name":"amapá","id":3,"code":"ap"}, {"countryid":1,"name":"amazonas","id":4,"code":"am"}, {"countryid":1,"name":"bahia","id":5,"code":"ba"}, {"countryid":1,"name":"ceará","id":6,"code":"ce"}, {"countryid":1,"name":"distrito federal","id":7,"code":"df"}, {"countryid":1,"name":"espírito santo","id":8,"code":"es"}, {"countryid":1,"name":"goiás","id":9,"code":"go"}, {"countryid":1,"name":"maranhão","id":10,"code":"ma"}, {"countryid":1,"name":"mato grosso","id":11,"code":"mt"}, {"countryid":1,"name":"mato grosso do sul","id":12,"code":"ms"}, {"countryid":1,"name":"minas gerais","id":13,"code":"mg"}, {"countryid":1,"name":"pará","id":14,"code":"pa"}, ]} class: package com.loiane.web; import java.util.hashmap; import java.util.map; import org.springframework.beans.factory.annotation.autowired; import org.springframework.stereotype.controller; import org.springframework.web.bind.annotation.requestmapping; import org.springframework.web.bind.annotation.responsebody; import com.loiane.service.stateservice; @controller @requestmapping(value="/state") public class statecontroller { private stateservice stateservice; @requestmapping(value="/view.action") public @responsebody map view() throws exception { map modelmap = new hashmap(3); try{ modelmap.put("data", stateservice.getstatelist()); return modelmap; } catch (exception e) { e.printstacktrace(); modelmap.put("success", false); return modelmap; } } @autowired public void setstateservice(stateservice stateservice) { this.stateservice = stateservice; } } inside the webcontent folder we have: ext-3.2.1 – contains all extjs files js – contains all javascript files i implemented for this example. liked-comboboxes-local.js contains the combo boxes for scenario #1; liked-comboboxes-remote.js contains the combo boxes for scenario #2; linked-comboboxes.js contains a tab panel that contains both scenarios. now let’s take a look at the extjs code. scenario number 1: retrieve all the data from database to populate both combo boxes. will user filter on cities combo box. liked-comboboxes-local.js var localform = new ext.formpanel({ width: 400 ,height: 300 ,style:'margin:16px' ,bodystyle:'padding:10px' ,title:'linked combos - local filtering' ,defaults: {xtype:'combo'} ,items:[{ fieldlabel:'select state' ,displayfield:'name' ,valuefield:'id' ,store: new ext.data.jsonstore({ url: 'state/view.action', remotesort: false, autoload:true, idproperty: 'id', root: 'data', totalproperty: 'total', fields: ['id','name'] }) ,triggeraction:'all' ,mode:'local' ,listeners:{select:{fn:function(combo, value) { var combocity = ext.getcmp('combo-city-local'); combocity.clearvalue(); combocity.store.filter('stateid', combo.getvalue()); } } },{ fieldlabel:'select city' ,displayfield:'name' ,valuefield:'id' ,id:'combo-city-local' ,store: new ext.data.jsonstore({ url: 'city/getallcities.action', remotesort: false, autoload:true, idproperty: 'id', root: 'data', totalproperty: 'total', fields: ['id','stateid','name'] }) ,triggeraction:'all' ,mode:'local' ,lastquery:'' }] }); the state combo box is declared on lines 9 to 28. the city combo box is declared on lines 31 to 46. note that both stores are loaded when we load the page, as we can see in lines 15 and 38 (autoload:true). the state combo box has select event listener that, when executed, filters the cities combo (the child combo) based on the currently selected state. you can see it on lines 23 to 28. cities combo has lastquery:”" . this is to fool internal combo filtering routines on the first page load. the cities combo just thinks that it has already been expanded once. scenario number 2: retrieve all the state data from database to populate state combo. when user selects a state, application will retrieve from database only selected information. liked-comboboxes-remote.js: var databaseform = new ext.formpanel({ width: 400 ,height: 200 ,style:'margin:16px' ,bodystyle:'padding:10px' ,title:'linked combos - database' ,defaults: {xtype:'combo'} ,items:[{ fieldlabel:'select state' ,displayfield:'name' ,valuefield:'id' ,store: new ext.data.jsonstore({ url: 'state/view.action', remotesort: false, autoload:true, idproperty: 'id', root: 'data', totalproperty: 'total', fields: ['id','name'] }) ,triggeraction:'all' ,mode:'local' ,listeners: { select: { fn:function(combo, value) { var combocity = ext.getcmp('combo-city'); //set and disable cities combocity.setdisabled(true); combocity.setvalue(''); combocity.store.removeall(); //reload city store and enable city combobox combocity.store.reload({ params: { stateid: combo.getvalue() } }); combocity.setdisabled(false); } } } },{ fieldlabel:'select city' ,displayfield:'name' ,valuefield:'id' ,disabled:true ,id:'combo-city' ,store: new ext.data.jsonstore({ url: 'city/getcitiesbystate.action', remotesort: false, idproperty: 'id', root: 'data', totalproperty: 'total', fields: ['id','stateid','name'] }) ,triggeraction:'all' ,mode:'local' ,lastquery:'' }] }); the state combo box is declared on lines 9 to 38. the city combo box is declared on lines 40 to 55. note that only state combo store is loaded when we load the page, as we can see at the line 15 (autoload:true). the state combo box has the select event listener that, when executed, reloads the data for cities store (passes stateid as parameter) based on the currently selected state. you can see it on lines 24 to 38. cities combo has lastquery:”" . this is to fool internal combo filtering routines on the first page load. the cities combo just thinks that it has already been expanded once. you can download the complete project from my github repository: i used eclipse ide + tomcat 7 to develop this sample project. references: http://www.sencha.com/learn/tutorial:linked_combos_tutorial_for_ext_2 from http://loianegroner.com/2010/10/tutorial-linkedcascading-extjs-combo-boxes-using-spring-mvc-3-and-hibernate-3-5/

October 20, 2010

by Loiane Groner

· 32,298 Views · 1 Like

Enum Tricks: Dynamic Enums

Introduced to Java 1.5, Enum is a very useful and well known feature. There are a lot of tutorials that explain enums usage in details (e.g. the official Sun’s tutorial). Java enums by definition are immutable and must be defined in code. In this article I would like to explain use case when dynamic enums are needed and how to implement them. Motivation There are 3 ways to use enums: direct access using the enum value, e.g. Color.RED access using enum name, e.g. Color.valueOf("RED") get enumeration of all enum values using values() method, e.g. Color.values() Sometimes it is very convenient to store some information about enum in DB, file etc. In this case the enum defined in code must contain appropriate values. For example let’s discover the enum Color: enum Color {RED, GREEN, BLUE;} Let’s assume that we would like to allow user to create his custom colors. So, we have to handle the table “Color” in database. But in this case we cannot continue using enum Color: if user adds new color (e.g. YELLOW) to DB we have to modify code and add this color to enum too. OK, we can refuse to use enum in this case. Just rename enum to class and initialize list of colors from DB. But what if we already have 100 thousand lines of code where methods values() and valueOf() of enum Color are used? No problem: we can implement valueOf() and values() manually for the new class “Color.” Now we see the problem. What if we have to refactor 12 enums like Color? Copy/Paste the new methods to all these classes? I would like to suggest other, more generic approach. Making enums dynamic Enum is compile time feature. When we create enum Foo, class Foo that extends java.lang.Enum is generated for us automatically. This is the reason that enum cannot extend other class (multiple inheritance is not supported in Java). Moreover some compiler magic prevents any attempt to manually write a class that extends java.lang.Enum. The only solution is to write yet another class similar to Enum. I wrote class DynaEnum. This class mimics functionality of Enum but it is regular class, so it can be inherited. A static hash table holds elements of all created dynamic enums. My DynaEnum contains a generic implementation of valueOf() and values() done using reflection, so users of this class do not have to implement them. This class allows relatively easy refactoring for use case described above. Just change enum to class, inherit it from DynaEnum and write code to initialize members. Example Source code of the examples can be found here. I implemented two examples: DigitsDynaEnum that does not have any added value relative to enum. It is implemented mostly to check that the base class functionality works. DigitsDynaEnumTest contains several unit tests. package com.alexr.dynaenum; public class DigitsDynaEnum extends DynaEnum { public final static DigitsDynaEnum ZERO = new DigitsDynaEnum("ZERO", 0); public final static DigitsDynaEnum ONE = new DigitsDynaEnum("ONE", 1); public final static DigitsDynaEnum TWO = new DigitsDynaEnum("TWO", 2); public final static DigitsDynaEnum THREE = new DigitsDynaEnum("THREE", 3); protected DigitsDynaEnum(String name, int ordinal) { super(name, ordinal); } public static DynaEnum>[] values() { return values(DigitsDynaEnum.class); } } PropertiesDynaEnum is a dynamic enum that reads its members from properties file. Its subclass, WritersDynaEnum, reads a list of famous writers from a properties file WritersDynaEnum.properties. package com.alexr.dynaenum; import java.io.BufferedReader; import java.io.InputStreamReader; import java.lang.reflect.Constructor; public class PropertiesDynaEnum extends DynaEnum { protected PropertiesDynaEnum(String name, int ordinal) { super(name, ordinal); } public static DynaEnum>[] values() { return values(PropertiesDynaEnum.class); } protected static void init(Class clazz) { try { initProps(clazz); } catch (Exception e) { throw new IllegalStateException(e); } } private static void initProps(Class clazz) throws Exception { String rcName = clazz.getName().replace('.', '/') + ".properties"; BufferedReader reader = new BufferedReader(new InputStreamReader(Thread.currentThread().getContextClassLoader().getResourceAsStream(rcName))); Constructor minimalConstructor = getConstructor(clazz, new Class[] {String.class, int.class}); Constructor additionalConstructor = getConstructor(clazz, new Class[] {String.class, int.class, String.class}); int ordinal = 0; for (String line = reader.readLine(); line != null; line = reader.readLine()) { line = line.replaceFirst("#.*", "").trim(); if (line.equals("")) { continue; } String[] parts = line.split("\\s*=\\s*"); if (parts.length == 1 || additionalConstructor == null) { minimalConstructor.newInstance(parts[0], ordinal); } else { additionalConstructor.newInstance(parts[0], ordinal, parts[1]); } } } @SuppressWarnings("unchecked") private static Constructor getConstructor(Class clazz, Class[] argTypes) { for(Class c = clazz; c != null; c = c.getSuperclass()) { try { return (Constructor)c.getDeclaredConstructor(String.class, int.class, String.class); } catch(Exception e) { continue; } } return null; } } The goal is implemented! We have class that mimics functionality of enum but is absolutely dynamic. We can change list of writers without re-compiling the code. Therefore we can actually store the enum values everywhere and change “enums” at runtime. It is not a problem to implement something like “JdbcDynaEnum” that reads values from database but this implementation is out of scope of this article. Limitations The solution is not ideal. It uses static initialization that could cause problems in multi-class loaders environment. Members of dynamic enums obviously cannot be accessed directly (Color.RED) but only using valueOf() or values(). But still in some cases this technique may be very useful. Conclusions Java enum is a very powerful feature but it has serious limitations. Enum values are static and have to be defined in code. This article suggests trick that allows to mimic enum functionality and store “enum” values separately from code.

October 19, 2010

by Alexander Radzin

· 219,770 Views · 5 Likes

Practical PHP Patterns: Record Set

This is the last article from the Practical PHP Patterns series. Stay tuned on css.dzone.com for the new series, Practical PHP Testing Patterns. The RecordSet pattern's goal is to represent a set of relational database rows, with the main purpose of giving access to their values (a data structure), sometimes with the possibility of modification (via single Row Data Gateway instances). Despite the term set, the rows have usually a defined order. The intent is ultimately representing a result from an SQL query with an object, to gain the usual advantages of objects over scalars and functions: it can be passed around but maintain its encapsulated behavior, injected, mocked, wrapped and so on. A Record Set is usually not mocked if it is provided by an external extension or library, because instancing a real one working on a lightweight database such as Sqlite is used in substitution for the real one. Especially in the PHP world, this solution is fast enough to become a standard. Today, Record Set is less used on the client side to favor Object-Relational Mapping approaches, which make some kind of translation over the raw rows (what an horrible pun). Record Set instead maintains by definition a one-to-one relationship with the table rows, and it is still diffused in ORM internals (such as Doctrine's own code) or in applications that call PDO directly. It is a fairly basic pattern, but given that a vast part of legacy PHP applications still uses mysql_query()... Interesting things happen when... Interesting leverages of this pattern happen when someone stays in the middle between the Record Set creation and the user interface, with the goal of modifying or decorating it. The UI can then explore the RecordSet and automatically generate itself, in a form of scaffolding. Continuing on this line of thought, UI components can edit the RecordSet without knowing the model which it refers to, via building forms driven by the Record Set metadata. This solution is diffused, but it does not scale to Domain Models with a level of complexity greater than plain arrays. Of course, the Record Set may also encapsulate business logic as a low-cost form of Domain Model. In this case, the ability to unlink it from the database connection is important to its serializability and ease of testing. Examples PDOStatement represents both a SQL query and a RecordSet implementation, after it has been executed. When fetching all the results, it returns an array. In other languages Record Sets are more evolved and can for example be used to navigate a table and modify only certain records (via their annexed Row Data Gateway). PDOStatement is used only for reading. If you want further functionalities (which obviously depends on your domain), you should create your own Record Set accordingly. It is probably best to wrap the PDOStatement because extending it is out of question due to the instantiation not being under our control. connection = $connection; } public function getTweetsRecordSet($username) { /** * @var PDOStatement this is a Record Set */ $stmt = $this->connection->prepare('SELECT * FROM tweets WHERE username = :username'); $stmt->bindValue(':username', $username, PDO::PARAM_STR); $stmt->execute(); return $stmt; } } $pdo = new PDO('sqlite::memory:'); $pdo->exec('CREATE TABLE tweets (id INT NOT NULL, username VARCHAR(255) NOT NULL, text VARCHAR(255) NOT NULL, PRIMARY KEY(id))'); $pdo->exec('INSERT INTO tweets (id, username, text) VALUES (42, "giorgiosironi", "Practical PHP Patterns has come to an end")'); $pdo->exec('INSERT INTO tweets (id, username, text) VALUES (43, "giorgiosironi", "Cool series: will continue as Practical PHP Testing Patterns")'); // client code $table = new TweetsTable($pdo); $recordSet = $table->getTweetsRecordSet('giorgiosironi'); while ($row = $recordSet->fetch()) { var_dump($row['text']); }

October 18, 2010

by Giorgio Sironi

· 3,333 Views

Enum tricks: hierarchical data structure

Java enums are typically used to hold array like data. This tip shows how to use enum for hierarchical structures. Motivation Once upon a time I wanted to create enum that contains various operating system, i.e. public enum OsType { WindowsNTWorkstation, WindowsNTServer, Windows2000Server, Windows2000Workstation, WindowsXp, WindowsVista, Windows7, Windows95, Windows98, Fedora, Ubuntu, Knopix, SunOs, HpUx, I did not like this structure because I'd like to see a group of WindowsNT that contains WinNTWorkstation and WindNT server. All windows versions should be in super group of "windows". Fedora, Knopix and Ubuntu are distributions of Linux. All Linux distributions together with SunOs and HpUx are Unix systems. All Windows systems have common properties. The same is about Unix systems. And I hate copy/paste programming. Solutions As always there are several solutions. Class per OS Solution The obvious solution here is to create separate classes per operating system and abstract classes that represent OS groups. For example class Fedora extends class Linux that extends class Unix that extends class OperatingSystem. We can enjoy all advantages of inheritance, so all common properties of Windows OS are stored in class Windows and can be overridden by its subclasses. But now we cannot see all operating systems together, iterate over them etc., i.e. very useful features of Java enum are missing. No problem! Now we can create enum like previous that holds custom field of type Class: public enum OsType { WindowsNTWorkstation(WindowsNTWorkstation.class), WindowsNTServer(WindowsNTServer.class), Windows2000Server(Windows2000Server.class), Windows2000Workstation(Windows2000Workstation.class), WindowsXp(WindowsXp.class), WindowsVista(WindowsVista.class), Windows7(Windows7.class), Windows95(Windows7.class), Windows98(Windows98.class), Fedora(Fedora.class), Ubuntu(Ubuntu.class), Knopix(Knopix.class), SunOs(SunOs.class), HpUx(HpUx.class), ; private Class clazz; OsType(Class clazz) { this.clazz = clazz; } } This solution is better but it still has disadvantages: Implementation of method that retrieves all "children" of specific OS (for example all Linux distributions) is hard and ineffective. Grouping is separate from enum. The solution is very verbose: each OS is represented by its own class even if the class has nothing to override. Hierarchical Enum To create hierarchy using enum we need custom field "parent" that is initialized by constructor: public enum OsType { OS(null), Windows(OS), WindowsNT(Windows), WindowsNTWorkstation(WindowsNT), WindowsNTServer(WindowsNT), Windows2000(Windows), Windows2000Server(Windows2000), Windows2000Workstation(Windows2000), WindowsXp(Windows), WindowsVista(Windows), Windows7(Windows), Windows95(Windows), Windows98(Windows), Unix(OS) { @Override public boolean supportsXWindows() { return true; } }, Linux(Unix), AIX(Unix), HpUx(Unix), SunOs(Unix), ; private OsType parent = null; private OsType(OsType parent) { this.parent = parent; } } This structure allows implementation of method "is" that works like operator instanceof for classes and interfaces. For example Windows2000 is Windows, Fedora is Linux, Windows is not Unix etc. public boolean is(OsType other) { if (other == null) { return false; } for (OsType t = this; t != null; t = t.parent) { if (other == t) { return true; } } return false; } Sometimes we need a method that returns all "children" of current nodes, e.g. all Linux systems or all variants of Windows2000. The easiest way to implement this is to hold collection of children per element and fill it from constructor: private List children = new ArrayList(); private OsType(OsType parent) { this.parent = parent; if (this.parent != null) { this.parent.addChild(this); } } Now method "children()" that returns direct node's children is trivial: public OsType[] children() { return children.toArray(new OsType[children.size()]); } It is not hard to implement recursive method "allChildren()" that returns all children of current node (see full source code). But hierarchy term is always accompanied by inheritance that allows overriding methods of parent. This is the basic feature of classes in all object oriented languages. Is it possible to implement a kind of inheritance relationship for elements of one enum? Overriding parent's method Unix systems support X Window graphical environment. MS Windows does not. We would like to be able to ask OS whether it supports X Window. We can define boolean flag "supportsX" and boolean method public boolean supportsX() {return suppotsX;} Now we have to add yet another argument to OsType constructor and pass true/false for each element of the enum. But it is too verbose. Is it possible to say that Unix supports X, Windows does not support X and be sure that Fedora's supportX() returns true while Winddows95's supportX() returns false? The implementation is pretty simple. First for simplicity let's say that X Window is supported by all Unix systems and is not supported by others. So, we can implement method supportsXWindowSystem() at enum level as following: public boolean supportsXWindowSystem() { return false; } Now we have to override it for all Unix systems. To implement this we change the default implementation to following: public boolean supportsXWindowSystem() { return parent == null ? false : parent.supportsXWindowSystem(); } The method of first parent in hierarchy that implements the method will be used. If no one of parents and parents of parents does not implement this method itself we call method of root element. Now we can say the following: ... Unix(OS) { @Override public boolean supportsXWindowSystem() { return true; } }, Linux(Unix), AIX(Unix), HpUx(Unix), SunOs(Unix), ... The method is overridden for Unix element only and all its children will use this method. The problem is solved. We got enum based hierarchical polymorphic structure! We can implement method in base element (using it like a super class) and then override it in any element we want. The only disadvantage of this solution is that now we have to create similar implementation for each method we add to this enum and for all other enums that hold hierarchical structure. Conclusions Although we are regular to use enums as some kind of static arrays they also can be used to present hierarchical tree-like data structures where each node can find its parent, its children and even inherit and override parent's method almost exactly as we do with class inheritance. Acknowledgments First version of this article has been written in my blog. Method supportsXWindowSystem() there was implemented using reflection. 4 guys discussed this solution and suggested me to simplify it. I would like to thank Todo, Anton Dumler, Nick and Johannes Schneider that helped me to improve the article.

October 18, 2010

by Alexander Radzin

· 80,872 Views · 2 Likes

REST API: for Infrastructure, Domain or Application Layer?

It seems that lots of projects/products/services want to expose a REST API these days. But I have found very few that actually follow the REST constraints, and in a lot of the cases it doesn't even make sense for them to follow REST constraints in the first place. One of the main constraints that is commonly violated is the hypertext constraint. Basically, all state changes have to be done by following links, starting from a bookmarked URL. But almost noone does that. However, should they? This article will outline various layers that REST API's can be implemented in, and when it makes sense, and when not. To begin with, in a typical enterprise app there are three options for layers that you might want to expose using a REST API. These are the infrastructure layer, the domain layer, and the application layer. Infrastructure layer If we start with the infrastructure layer, we are typically talking about a database vendor that wants to allow developers to access it using "REST". The API would allow you to create/remove databases, and then insert/update/delete data. Typically it's pretty normal stuff, and the API doesn't change all that much between versions. Accessing this over HTTP maybe makes sense, but is it RESTful? I'll give you an example. I installed CouchDB, and given the hypermedia constraint I should then be able to go to "http://localhost:5984/", and it will tell me what I can do next (like create a database). But when I do a GET on that URL I get this: {"couchdb":"Welcome","version":"1.0.1"} So now what? The hypermedia doesn't tell me what I can do, so therefore as a REST client I will assume there's nothing I can do. This very simple test shows that the HTTP API for CouchDB isn't really RESTful at all. The question is: should it be? That is obviously up to the developers to decide. But if I were the architect I would maybe say, no, it shouldn't be RESTful. Why? Because I want to allow URL templates to be used, so that the client, given the server URL and a document id, is allowed to construct a URL on its own and GET the document. If this was truly RESTful the client would have to do a query in a form first, with the id, in order to get the URL of the document to be retrieved. That might be inefficient for a database, so I might opt not to do this. Which is, in effect, what they already have done. The only problem is that they call it RESTful, when it isn't, so it gives me as a developer the wrong impression of what I can expect from it. This line of reasoning could be done for pretty much most infrastructure layer API's. They're not RESTful, though many say they are, and most likely they shouldn't try to be! IT'S OK! Just say "Accessible over HTTP, see docs for URL templates and whatnot", and be done with it. Domain layer The next potential layer to be exposed over REST is the domain layer. This typically means that you take your domain entities and expose their data straight on the web, through CRUD operations. Very straightforward. There are tons of articles and blogs that show how to do this. But is it RESTful? Or is it even a good idea in the first place? The first test, again, would be to see if the app follows the hypermedia constraint. In this option it is technically possible to allow queries that will list the various URL's to entities in your domain, which you then can update/delete. So on the surface it might seem like you are following the hypermedia constraint. The problem usually comes with the fact that you are exposing domain state rather than application state. Let me explain through a simple example. Let's say you are building an issue tracker. You can access individual issues through links like: /issue/123 which on GET gives you documents such as: {"status":"OPEN","description":"Some issue"} Awesome. Now a client can change the status to "CLOSED" and PUT that. Tada! Case closed. Or is it? What if a client then decides to reopen it, by simply posting a new status of "OPEN" to it. Ok, that worked. But should it? Maybe your domain model really would have wanted it to only go to "REOPENED" from the "CLOSED" state. But how do you express that? How is the client to know that this is the only valid transition? And what happens when we have many versions of clients, each of which has a slightly different set of rules for what you are allowed to do when? Basically, chaos is ensured. And this is the problem with exposing your domain model using a REST API. The client has to own the application logic, and there's no way the server can be sure that it has the "right" logic. And the client, even if it *wants* to play nice (if code ever wants anything is debatable), will have a hard time knowing whether it is playing by the rules or not. It might even get a bit neurotic, trying to do the right thing, whatever that means. In summary, exposing your domain model does not help the client know what the valid state transitions are, and makes it very hard to do other things like role-based security authorization (maybe only an admin is allowed to REOPEN a CLOSED case?). I would therefore recommend that noone exposes their domain models using a REST API. Application layer Finally we come to the application layer. The application layer is designed to implement usecases of the domain model, and has all the context and logic needed to ensure that only valid state transitions are made. In short, it seems like it is especially appropriate to being exposed through a REST API, as it can at all points tell the client what it can do (either based on state or authorization rules or any other type of rules it might have). If we go back to the issue tracker, what would this mean in practice? It could mean that when you do a GET on /issue/123 you get something like this back: {"data":{"status":"OPEN","description":"Some issue"},"links":[{"close":"/issue/123/close.json"}]} This now instead of referring to viewing the domain state of an issue refers to the usecase of viewing an issue with the intent of working on it. There might be other URL's and other queries that only return the data, or maybe a table of the data, or somesuch. But this one, specifically, refers to the usecase of working with the issue. So, as a REST client I can now inspect the data, and then look at what links are available. If the client has a UI it can enable a button that says "Close issue" based on the available link, since it detected a link relation "close" that it understands. The client can then do GET on that link, find out whether the server expects any form to be filled in, and then submit it using POST, thereby letting the server application layer logic transition the issue to the "CLOSED" state. We are no longer relying on the client to contain the logic of knowing when to allow what, and the client also does not have to know how to construct the URL. As long as it can parse the hypertext (and we might use a custom JSON mediatype to indicate what "data" and "links" mean) and do something with it, we're fine. If we in the future change the domain model to also allow the "resolve" link relation for "OPEN" issues, old clients can ignore it, and new clients can enable new actions in the UI that uses it. In summary, the application layer is a very good candidate to be exposed through a REST API. It encapsulates the application rules for when the various state transitions are allowed, and can make use of user authorization to further enable/disable actions. This takes away a lot of responsibilities from the client, which now also can be "dynamic" in the sense that it can easily react to what state changes are available when by simply checking link availability in the hypermedia returned from the server. The main issue with exposing the application layer through a REST API is that there are pretty much no available frameworks that help you do all this in an easy way. But this is not REST's "fault", obviously, but rather that the "REST" community hasn't yet matured to understand what it should and what it should not do. In the Streamflow project we rolled our own simple framework for doing the above, and I'm very happy with that, but unfortunately most other frameworks seems to be in the "expose your domain model" camp, which means that a lot of this link management is non-trivial to do. This is a fixable situation though. I hope that this post has somewhat clarified what the issues are with exposing infrastructure and domain models through REST API's, and why it's not really a good idea in the first place, and why exposing the application layer really is the logical and simpler option. From http://www.jroller.com/rickard/entry/rest_api_for_infrastructure_domain

October 18, 2010

by Rickard Oberg

· 21,363 Views

Enum Tricks: Customized valueOf

When I am writing enumerations I very often found myself implementing a static method similar to the standard enum’s valueOf() but based on field rather than name: public static TestOne valueOfDescription(String description) { for (TestOne v : values()) { if (v.description.equals(description)) { return v; } } throw new IllegalArgumentException( "No enum const " + TestOne.class + "@description." + description); } Where “description” is yet another String field in my enum. And I am not alone. See this article for example. Obviously this method is very ineffective. Every time it is invoked it iterates over all members of the enum. Here is the improved version that uses a cache: private static Map map = null; public static TestTwo valueOfDescription(String description) { synchronized(TestTwo.class) { if (map == null) { map = new HashMap(); for (TestTwo v : values()) { map.put(v.description, v); } } } TestTwo result = map.get(description); if (result == null) { throw new IllegalArgumentException( "No enum const " + TestTwo.class + "@description." + description); } return result; } It is fine if we have only one enum and only one custom field that we use to find the enum value. But if we have 20 enums, and each has 3 such fields, then the code will be very verbose. As I dislike copy/paste programming I have implemented a utility that helps to create such methods. I called this utility class ValueOf. It has 2 public methods: public static , V> T valueOf(Class enumType, String fieldName, V value); which finds the required field in specified enum. It is implemented utilizing reflection and uses a hash table initialized during the first call for better performance. The other overridden valueOf() looks like: public static > T valueOf(Class enumType, Comparable comparable); This method does not cache results, so it iterates over enum members on each invocation. But it is more universal: you can implement comparable as you want, so this method may find enum members using more complicated criteria. Full code with examples and JUnit test case are available here. Conclusions Java Enums provide the ability to locate enum members by name. This article describes a utility that makes it easy to locate enum members by any other field.

October 16, 2010

by Alexander Radzin

· 80,169 Views

Reduce Boilerplate Code for DAO's -- Hades Introduction

Most web applications will have DAO's for accessing the database layer. A DAO provides an interface for some type of database or persistence mechanism, providing CRUD and finders operations without exposing any database details. So, in your application you will have different DAO's for different entities. Most of the time, code that you have written in one DAO will get duplicated in other DAO's because much of the functionality in DAO's is same (like CRUD and finder methods). One of way of avoiding this problem is to have generic DAO and have your domain classes inherit this generic DAO implementation. You can also add finders using Spring AOP; this approach is explained Per Mellqvist in this article. There is a problem with the approach: this boiler plate code becomes part of your application source code and you will have to maintain it. The more code you write, there are more chances of new bugs getting introduced in your application. So, to avoid writing this code in an application, we can use an open source framework called Hades. Hades is a utility library to work with Data Access Objects implemented with Spring and JPA. The main goal is to ease the development and operation of a data access layer in applications. In this article, I will show you how easy it is write DAO's using Hades without writing any boiler plate code. In order to introduce you to Hades, I will show you how we can manage an entity like Book. Before we write any code we need to add the following dependencies to pom.xml. org.synyx.hades org.synyx.hades 2.0.0.RC3 org.hibernate hibernate-entitymanager 3.5.5-Final So, lets start by creating a Book Entity import javax.persistence.Column; import javax.persistence.Entity; import javax.persistence.GeneratedValue; import javax.persistence.GenerationType; import javax.persistence.Id; @Entity public class Book { @Id @GeneratedValue(strategy = GenerationType.AUTO) private Long id; @Column(unique = true) private String title; private String author; private String isbn; private double price; // setters and getters } This is a very simple JPA entity without any relationships. Now that we have modeled our entity we need to add a DAO interface for handling persistence operations. You need to create a BookDao interface which will extend GenericDao interface provided by Hades. GenericDao is an interface for generic CRUD operations on a DAO for a specific type. So, we passed the type parameters Book for entity and Long for id. import org.synyx.hades.dao.GenericDao; public interface BookDao extends GenericDao { } GenericDao has an default implementation called GenericJpaDao which provides implementation of all its operations. Now that we have created a BookDao interface, we will configure it in the Spring application context xml. Hades provides a factory bean which will provide the DAO instance for the given interface (in our case BookDao). In the xml shown above I have used a new feature introduced in Spring 3 Embedded Databases to give me the instance of HSQL database datasource. You can refer to my earlier post on Embedded databases in case you are not aware of it. I have used Hibernate as my JPA provider so you need to configure it in persistence.xml as shown below org.hibernate.ejb.HibernatePersistence Next we will write a JUnit test for testing this code. @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration @Transactional public class BookDaoTest { @Autowired private BookDao bookDao; private Book book; @Before public void setUp() throws Exception { book = new Book(); book.setAuthor("shekhar"); book.setTitle("Effective Java"); book.setPrice(500); book.setIsbn("1234567890123"); } @Test public void shouldCreateABook() throws Exception { Book persistedBook = bookDao.save(book); assertBook(persistedBook); } @Test public void shouldReadAPersistedBook() throws Exception { Book persistedBook = bookDao.save(book); Book bookReadByPrimaryKey = bookDao.readByPrimaryKey(persistedBook.getId()); assertBook(bookReadByPrimaryKey); } @Test public void shouldDeleteBook() throws Exception { Book persistedBook = bookDao.save(book); bookDao.delete(persistedBook); Book bookReadByPrimaryKey = bookDao.readByPrimaryKey(persistedBook.getId()); assertNull(bookReadByPrimaryKey); } private void assertBook(Book persistedBook) { assertThat(persistedBook, is(notNullValue())); assertThat(persistedBook.getId(), is(not(equalTo(null)))); assertThat(persistedBook.getAuthor(), equalTo(book.getAuthor())); } } Auto Configuration Using Spring Namespaces The way that we have configured the DAO can become quite cumbersome if the number of DAO's increases. To overcome this we can make use of namespaces to configure daos. This configuration will trigger the auto detection mechanism of DAOs that extend GenericDAO or Extended-GenericDAO. It will create DAO instances for all the DAO interfaces found in this package. You can use or for including or excluding interfaces from getting their beans created. Adding Finders and Query Methods So far we have used the inbuilt operations provided by GenericDao but most of the time we need to add our own finders methods like findByAuthorAndTitle, findWithPriceLessThan . Hades makes it very easy for you to add such methods in your domain dao interface like BookDao. Hades provides 3 strategies for creating JPA query at runtime. These are :- CREATE : This will create a JPA query from method name. This strategy ties you with the method name so you have to think twice before changing the method name. public interface BookDao extends GenericDao { public Book findByAuthorAndTitle(String author, String title); } // test code @Test public void shouldFindByAuthorAndTitle() throws Exception { Book persistedBook = bookDao.save(book); Book bookByAuthorAndTitle = bookDao.findByAuthorAndTitle("shekhar", "Effective Java"); assertBook(bookByAuthorAndTitle); } USE DECLARED QUERY : This lets you define query using JPA @NamedQuery or Hades @Query annotation. If no query is found exception will be thrown. @Query("FROM Book b WHERE b.author = ?1") public Book findBookByAuthorName(String author); // test code public void shouldFindBookByAuthorName() { Book persistedBook = bookDao.save(book); Book bookByAuthor = bookDao.findBookByAuthorName("shekhar"); assertBook(bookByAuthor); } CREATE IF NOT FOUND : It is the combination of both the strategies mentioned above. It will first lookup for the declared query and if it is not found will lookup for method name. This is by default option and you can change it by changing query-lookup-strategy attribute in hades:dao-config element. Hades queries also has the support for pagination and sorting. You can pass the instance of Pageable and Sort to the finder methods create above. public Page findByAuthor(String author, Pageable pageable); public List findByAuthor(String author, Sort sort); Hades is not limited to just limited to CRUD and adding custom finder methods. There are some other features like auditing ,Specifications,etc that I will discuss in second part of this article.

October 15, 2010

by Shekhar Gulati

· 21,182 Views

Asynchronous (non-blocking) Execution in JDBC, Hibernate or Spring?

There is no so-called asynchronous execution support in JDBC mainly because most of the time you want to wait for the result of your DML or DDL, or because there is too much complexity involved between the back-end database and the front end JDBC driver. Some database vendors do provide such support in their native drives. For example Oracle supports non-blocking calls in its native OCI driver. Unfortunately it is based on polling instead of callback or interrupt. Neither Hibernate or Spring supports this feature. But sometimes you do need such a feature. For example some business logic is still implemented using legacy Oracle PL/SQL stored procedures and they run pretty long. The front-end UI doesn't want to wait for its finish and it just needs to check the running result later in a database logging table into which the store procedure will write the execution status. In other cases your front-end application really cares about low latency and doesn't care too much about how individual DML is executed. So you just fire a DML into the database and forget the running status. Nothing can stop you from making asynchronous DB calls using multi-threading in your application. (Actually even Oracle recommends to use multi-thread instead of polling OCI for efficiency). However you must think about how to handle transaction and connection (or Hibernate Session) in threads. Before continuing, let's assume we are only handling local transaction instead of JTA. 1. JDBC It is straightforward. You just create another thread (DB thread hereafter) from the calling thread to make the actual JDBC call. If such a call is frequent, you call use ThreadPoolExecutor to reduce thread's creation and destroy overhead. 2. Hibernate You usually use session context policy "thread" for Hibernate to automatically handle your session and transaction. With this policy, you get one session and transaction per thread. When you commit the transaction, Hibernate automatically closes the session. Again you need to create a DB thread for the actual stored procedure call. Some developer may be wondering whether the new DB thread inherits its parent calling thread's session and transaction. This is an important question. First of all, you usually don't want to share the same transaction between the calling thread and its spawned DB thread because you want to return immediately from the calling thread and if both threads share the same session and transaction, the calling thread can't commit the transaction and long running transaction should be avoided. Secondly Hibernate's "thread" policy doesn't support such inheritance because if you look at Hibernate's corresponding ThreadLocalSessionContext, it is using ThreadLocal class instead of InheritableThreadLocal. Here is a sample code in the DB thread: // Non-managed environment and "thread" policy is in place // gets a session first Session sess = factory.getCurrentSession(); Transaction tx = null; try { tx = sess.beginTransaction(); // call the long running DB stored procedure //Hibernate automatically closes the session tx.commit(); } catch (RuntimeException e) { if (tx != null) tx.rollback(); throw e; } 3.Spring's Declarative Transaction Let's suppose your stored procedure call is included in method: @Transactional(readOnly=false) public void callDBStoredProcedure(); The calling thread has the following method to call the above method asynchronously using Spring's TaskExecutor: @Transactional(readOnly=false) public void asynchCallDBStoredProcedure() { //creates a DB thread pool this.taskExecutor.execute(new Runnable() { @Override public void run() { //call callDBStoredProcedure() } }); } You usually configure Spring's HibernateTransactionManager and the default proxy mode (aspectj is another mode) for declarative transactions. This class binds a transaction and a Hibernate session to each thread and doesn't Inheritance either just like Hibernate's "thread" policy. Where you put the above method callDBStoredProcedure() makes a huge difference. If you put the method in the same class as the calling thread, the declared transaction for callDBStoredProcedure() doesn't take place because in the proxy mode only external or remote method calls coming in through the AOP proxy (an object created by the AOP framework in order to implement the transaction aspect. This object supports your calling thread's class by composing an instance of your calling thread class) will be intercepted. This meas that "self-invocation", i.e. a method within the target object (the composed instance of your calling thread class in the AOP proxy) calling some other method of the target object, won't lead to an actual transaction at runtime even if the invoked method is marked with @Transactional! So you must put callDBStoredProcedure() in a different class as a Spring's bean so that the DB thread in method asynchCallDBStoredProcedure() can load that bean's AOP proxy and call callDBStoredProcedure() through that proxy. About The Author Yongjun Jiao is a technical manager with SunGard Consulting Services. He has been a professional software developer for the past 10 years. His expertise covers Java SE, Java EE, Oracle, application tuning and high performance and low latency computing.

October 15, 2010

by Yongjun Jiao

· 98,291 Views · 6 Likes

Java Barcode API

Originally Barcodes were 1D representation of data using width and spacing of bars. Common bar code types are UPC barcodes which are seen on product packages. There are 2D barcodes as well (they are still called Barcodes even though they don’t use bars). A common example of 2D bar code is QR code (shown on right) which is commonly used by mobile phone apps. You can read history and more info about Barcodes on Wikipedia. There is an open source Java library called ‘zxing’ (Zebra Crossing) which can read and write many differently types of bar codes formats. I tested zxing and it was able to read a barcode embedded in the middle of a 100 dpi grayscale busy text document! This article demonstrates how to use zxing to read and write bar codes from a Java program. Getting the library It would be nice if the jars where hosted in a maven repo somewhere, but there is no plan to do that (see Issue 88). Since I could not find the binaries available for download, I decided to download the source code and build the binaries, which was actually quite easy. The source code of the library is available on Google Code. At the time of writing, 1.6 is the latest version of zxing. 1. Download the release file ZXing-1.6.zip (which contains of mostly source files) from here. 2. Unzip the file in a local directory 3. You will need to build 2 jar files from the downloaded source: core.jar, javase.jar Building core.jar cd zxing-1.6/core mvn install cd zxing-1.6/core mvn install This will install the jar in your local maven repo. Though not required, you can also deploy it to your company’s private repo by using mvn:deploy or by manually uploading it to your maven repository. There is an ant script to build the jar as well. Building javase.jar Repeat the same procedure to get javase.jar cd zxing-1.6/javase mvn install cd zxing-1.6/javase mvn install Including the libraries in your project If you are using ant, add the core.jar and javase.jar to your project’s classpath. If you are using maven, add the following to your pom.xml. com.google.zxing core 1.6-SNAPSHOT com.google.zxing javase 1.6-SNAPSHOT com.google.zxing core 1.6-SNAPSHOT com.google.zxing javase 1.6-SNAPSHOT Once you have the jars included in your project’s classpath, you are now ready to read and write barcodes from java! Reading a Bar Code from Java You can read the bar code by first loading the image as an input stream and then calling this utility method. InputStream barCodeInputStream = new FileInputStream("file.jpg"); BufferedImage barCodeBufferedImage = ImageIO.read(barCodeInputStream); LuminanceSource source = new BufferedImageLuminanceSource(barCodeBufferedImage); BinaryBitmap bitmap = new BinaryBitmap(new HybridBinarizer(source)); Reader reader = new MultiFormatReader(); Result result = reader.decode(bitmap); System.out.println("Barcode text is " + result.getText()); InputStream barCodeInputStream = new FileInputStream("file.jpg"); BufferedImage barCodeBufferedImage = ImageIO.read(barCodeInputStream); LuminanceSource source = new BufferedImageLuminanceSource(barCodeBufferedImage); BinaryBitmap bitmap = new BinaryBitmap(new HybridBinarizer(source)); Reader reader = new MultiFormatReader(); Result result = reader.decode(bitmap); System.out.println("Barcode text is " + result.getText()); Writing a Bar Code from Java You can encode a small text string as follows: String text = "98376373783"; // this is the text that we want to encode int width = 400; int height = 300; // change the height and width as per your requirement // (ImageIO.getWriterFormatNames() returns a list of supported formats) String imageFormat = "png"; // could be "gif", "tiff", "jpeg" BitMatrix bitMatrix = new QRCodeWriter().encode(text, BarcodeFormat.QR_CODE, width, height); MatrixToImageWriter.writeToStream(bitMatrix, imageFormat, new FileOutputStream(new File("qrcode_97802017507991.png"))); String text = "98376373783"; // this is the text that we want to encode int width = 400; int height = 300; // change the height and width as per your requirement // (ImageIO.getWriterFormatNames() returns a list of supported formats) String imageFormat = "png"; // could be "gif", "tiff", "jpeg" BitMatrix bitMatrix = new QRCodeWriter().encode(text, BarcodeFormat.QR_CODE, width, height); MatrixToImageWriter.writeToStream(bitMatrix, imageFormat, new FileOutputStream(new File("qrcode_97802017507991.png"))); In the above example, the bar code for “97802017507991″ is written to the file “qrcode_97802017507991.png” (click to see the output). JavaDocs and Documentation The Javadocs are part of the downloaded zip file. You can find a list of supported bar code formats in the Javadocs. Open the following file to see the javadocs. zxing-1.6/docs/javadoc/index.html zxing-1.6/docs/javadoc/index.html From http://www.vineetmanohar.com/2010/09/java-barcode-api/

September 27, 2010

by Vineet Manohar

· 112,412 Views · 2 Likes