
SQL Server Driver for PHP Connection Options: Encrypt

This short post adds to my series on connection options in the SQL Server Driver for PHP. I'll go into a bit more detail on the Encrypt and TrustServerCertificate options than the driver documentation does. I'll start with three important points related to these options, then walk through a couple of hypothetical situations that should shed more light on what these options actually do.

The first thing to note is that these two options, Encrypt and TrustServerCertificate, are often used together. The Encrypt option specifies whether the connection to the server is encrypted (the default is false). The TrustServerCertificate option indicates whether the certificate on the SQL Server instance should be trusted without third-party verification (the default is false).

Note: Setting the Encrypt option does not mean that data is stored in an encrypted form. It only means that data is encrypted while in transit over the network. Also note that this setting does not apply to authentication credentials such as a SQL Server password: authentication credentials are always encrypted. For information about storing encrypted data, see Encryption Hierarchy in the SQL Server documentation.

The second thing to note is that, by default, when SQL Server is installed it creates a self-signed certificate that it uses to encrypt connections. Self-signed certificates are not ideal for secure connections (they are vulnerable to man-in-the-middle attacks), so it is best to replace this certificate with one issued by a certificate authority (CA).

The third thing to know is how to use these options when connecting to SQL Server.
If you are using the SQLSRV API, your connection code might look something like this:

$serverName = "serverName";
$connectionInfo = array(
    "Database" => "DbName",
    "UID" => "UserName",
    "PWD" => "Password",
    "Encrypt" => true,
    "TrustServerCertificate" => false);
$conn = sqlsrv_connect($serverName, $connectionInfo);

If you are using the PDO_SQLSRV API, your connection code might look something like this:

$serverName = "serverName";
$conn = new PDO("sqlsrv:Server = $serverName; Database = DBName; Encrypt = true; TrustServerCertificate = false",
                "UserName", "Password");

Now, with those three things in mind, let's look at a couple of examples.

First, suppose that you did not replace the self-signed certificate created when SQL Server was installed, you set Encrypt = true, and you set TrustServerCertificate = false. In this scenario, your connection will fail. When you set TrustServerCertificate = false, you are asking for third-party verification of the certificate. Because this is a self-signed certificate, there is no third party to do the verification. However, if you set TrustServerCertificate = true, your connection will succeed because you are trusting the certificate. (Note that, as mentioned earlier, this connection would be vulnerable to man-in-the-middle attacks.)

Now suppose that you replaced the self-signed certificate created when SQL Server was installed with a certificate from a certificate authority (CA), and you set Encrypt = true. In this case (assuming there are no problems with your certificate), your connection will succeed regardless of your setting for TrustServerCertificate. However, for a more secure connection (one not vulnerable to man-in-the-middle attacks), you should set TrustServerCertificate = false. Doing so forces third-party verification of the certificate.

For information about installing a certificate on SQL Server, see How to: Enable Encrypted Connections to the Database Engine.
Note that that topic references setting the Force Encryption option on the database server. Setting this option to Yes on the server has the same effect as setting Encrypt = true on the client: it forces the connection to be encrypted using the server's certificate.

That's it for today. Thanks.

-Brian
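The two scenarios in this post boil down to a small decision table. The Java sketch below encodes that table so the outcomes can be checked at a glance. It is purely a hypothetical model of the behavior described above: the class, enum and method names are invented for illustration, it assumes the server does not have Force Encryption set, and it does not open a connection or call any driver API.

```java
// Hypothetical model of the Encrypt / TrustServerCertificate scenarios described
// in the post. It does not open a connection or call any driver API.
public class EncryptOptionsModel {

    /** Possible results of a connection attempt in this model. */
    enum Outcome {
        UNENCRYPTED,          // Encrypt = false (assuming Force Encryption is off on the server)
        FAILS,                // certificate cannot be verified by a third party
        ENCRYPTED_UNVERIFIED, // encrypted, but open to man-in-the-middle attacks
        ENCRYPTED_VERIFIED    // encrypted, and the CA-issued certificate was verified
    }

    static Outcome connect(boolean caIssuedCertificate,
                           boolean encrypt,
                           boolean trustServerCertificate) {
        if (!encrypt) {
            return Outcome.UNENCRYPTED;
        }
        if (trustServerCertificate) {
            // The client accepts whatever certificate the server presents.
            return Outcome.ENCRYPTED_UNVERIFIED;
        }
        // TrustServerCertificate = false asks for third-party verification,
        // which only succeeds when the certificate was issued by a CA.
        return caIssuedCertificate ? Outcome.ENCRYPTED_VERIFIED : Outcome.FAILS;
    }

    public static void main(String[] args) {
        System.out.println(connect(false, true, false)); // prints FAILS
        System.out.println(connect(false, true, true));  // prints ENCRYPTED_UNVERIFIED
        System.out.println(connect(true, true, false));  // prints ENCRYPTED_VERIFIED
    }
}
```

Note that a CA-issued certificate combined with TrustServerCertificate = true still yields ENCRYPTED_UNVERIFIED in this model, which is why the post recommends leaving TrustServerCertificate = false.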

March 14, 2011

by Brian Swan


Persisting Entity Classes using XML in JPA

Preface

This document explains how a JPA application works when it is configured purely with XML. It shows how to create entity classes without using any annotations in the Java classes; the entire configuration is done in XML files.

Introduction

Persistence is one of the major challenges for enterprise applications. JPA is the persistence standard from Sun Microsystems. JPA supports two ways of configuring persistence information: through annotations and through XML files. This article discusses the XML-based approach to specifying persistence information for entity classes.

Setting up the environment

JPA is a lightweight persistence model. It works for both Java SE and Java EE applications. All the required classes are in the javaee.jar file, which should be added to the classpath. Along with javaee.jar, we need to add the persistence provider's jar file to the classpath. The provider can be TopLink, Hibernate, or an application-server-specific persistence provider. In this application, we've used TopLink Essentials as the persistence provider, so we also need to add toplink-essentials.jar to the classpath. To summarize, the required jar files are:

1. javaee.jar
2. toplink-essentials.jar (replace this with the jar file of your persistence provider if you want to use a different one)
3. derbyclient.jar (for the JavaDB (Derby) database; replace this with the jar file of your database driver)

Instead of adding the configuration details to the entity class in the form of annotations, we'll add the configuration information to the orm.xml file. Before we look at the code, let us discuss the orm.xml file in detail.

orm.xml

The orm.xml file contains all the configuration details required for mapping a class to a relational database table. These details include the primary key of the entity class and the various constraints or rules to be applied to the primary key.
Other details about the entity class include its attributes and the columns to which those attributes should be mapped. We can also specify multiple constraints for the same attribute. Other configurable items include the relationships the entity may have with other entities (one-to-one, many-to-one, many-to-many or one-to-many), embedded attributes, and version and transient attributes.

One application can have multiple entities. The configuration details for all these entity classes and embeddable classes go inside the same orm.xml file. The name of this configuration file can be anything; it does not have to be orm.xml. It should be placed in the META-INF subdirectory along with the persistence.xml file, and persistence.xml should reference it.

Sample orm.xml and persistence.xml files follow, with the XML schema definitions highlighted in both. Also note the entry required in persistence.xml for including the orm.xml file. The following entries, made in a sample orm.xml, persist employee instances to the database:

(Sample orm.xml listing not preserved; only its description, "My First JPA XML Application", survives.)

Different properties in the orm.xml file

The various elements of orm.xml, in the order defined in the XML schema definition, are discussed below.

The root tag is <entity-mappings>. It contains the following four types of elements:

The persistence-unit-metadata element contains metadata for the entire persistence unit. It is undefined if this element occurs in multiple mapping files within the same persistence unit.
The package, schema, catalog and access elements apply to all the entity, mapped-superclass and embeddable elements defined in the same file in which they occur.
The sequence-generator, table-generator, named-query, named-native-query and sql-result-set-mapping elements are global to the persistence unit.
The entity, mapped-superclass and embeddable elements each define the mapping information for a managed persistent class.
The mapping information contained in these elements may be complete or partial.

<description>: one-line string information about the entity classes in the application.
<package>: specifies the package of the classes listed within the subelements and attributes of the same mapping file only.
<entity>: the class attribute defines the fully qualified class name of the entity class, and the name attribute defines the name of the entity. The <entity> tag can be repeated to embed mapping information for all the entity classes in the application.

The subtags of the <entity> tag are:

<table>: maps the database table to which the entity class will be persisted.
<id-class>: overrides or creates a new id-class setting.
<inheritance>: overrides or creates a new inheritance setting.
<discriminator-value>: overrides or creates a new discriminator value setting; useful with the SINGLE_TABLE inheritance strategy.
<discriminator-column>: overrides or creates a new discriminator column setting; used when configuring the superclass with the SINGLE_TABLE inheritance strategy.
<sequence-generator>: a sequence generator, unique by name.
<table-generator>: a table generator, unique by name.
<named-query>: defines a named query for the entity class. It can be repeated to define multiple named queries.
<named-native-query>: defines a native named query for the entity class.
<pre-persist>: creates or overrides a pre-persist setting, i.e. the callback method to be invoked before the entity instance is persisted.
<post-persist>: creates or overrides a post-persist setting, i.e. the callback method to be invoked after the entity instance is persisted.
<pre-remove>: creates or overrides a pre-remove setting, i.e. the callback method to be invoked before the entity instance is removed.
<post-remove>: creates or overrides a post-remove setting, i.e. the callback method to be invoked after the entity instance is removed.
<pre-update>: creates or overrides a pre-update setting, i.e. the callback method to be invoked before the entity instance is updated.
<post-update>: creates or overrides a post-update setting, i.e. the callback method to be invoked after the entity instance is updated.
<post-load>: creates or overrides a post-load setting, i.e. the callback method to be invoked after the entity instance state is loaded from the database.
<attributes>: defines the attributes of the entity class that will be persisted in the database table.

The subtags of <attributes> are:

<id>: defines the id attribute of the entity class. An entity class has a single id attribute unless a composite key is declared with <id-class>.
<generated-value>: defines the ID generation strategy to be used for the primary key column.
<basic>: maps an entity attribute to a column in the database table. It is repeated once for each attribute of the entity class you want to configure.
<column>: adds column-level constraints to the entity attributes, e.g. unique, insertable, updatable, length, precision.
<version>: maps the version attribute of the entity class.

Development Environment

We've used NetBeans IDE 6.5.1 to create this Java SE application, which uses XML files for persistence. NetBeans has persistence support for JPA as well as Hibernate, so we can choose either Hibernate or TopLink Essentials as the persistence provider.

Let's name the application "JPAEntity". The entity class is named "Employee" and is placed in the package "jpaentity"; "EmpClient.java" is the client class. The XML configuration files go in the META-INF subdirectory. The directory structure of the application is as follows:

First JPA
    META-INF
        orm.xml
        persistence.xml
    jpaentity
        Employee.java
        EmpClient.java

Developing the Application

Step 1: Start the NetBeans IDE and create a new Java application.

Step 2: Add the jar files for the persistence provider (TopLink Essentials in our case), javaee.jar and the database driver (derbyclient.jar in our case) to the classpath.
Step 3: Add the entity class "Employee.java" and the client class "EmpClient.java" to the package "jpaentity".

This is the code of Employee.java:

package jpaentity;

// Plain Java class with no JPA annotations: all mapping information comes from the XML mapping file.
public class Employee {

    private int empId;
    private String empName;
    private double empSalary;

    // JPA requires a no-argument constructor.
    public Employee() {
    }

    public Employee(int empId, String empName, double empSalary) {
        this.empId = empId;
        this.empName = empName;
        this.empSalary = empSalary;
    }

    public int getEmpId() {
        return empId;
    }

    public void setEmpId(int empId) {
        this.empId = empId;
    }

    public String getEmpName() {
        return empName;
    }

    public void setEmpName(String empName) {
        this.empName = empName;
    }

    public double getEmpSalary() {
        return empSalary;
    }

    public void setEmpSalary(double empSalary) {
        this.empSalary = empSalary;
    }

    @Override
    public String toString() {
        return "Employee Id:=" + empId + " Employee Name:=" + empName
                + " Employee Salary:=" + empSalary;
    }
} // End of Employee.java

This is the code of EmpClient.java:

package jpaentity;

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class EmpClient {

    private EntityManager em;
    private EntityManagerFactory emf;

    // "JPAEntityPU" must match the persistence-unit name in persistence.xml.
    public void initEmfAndEm() {
        emf = Persistence.createEntityManagerFactory("JPAEntityPU");
        em = emf.createEntityManager();
    }

    // Close the EntityManager first, then the factory that created it.
    public void cleanup() {
        em.close();
        emf.close();
    }

    public void insertAndRetrieve() {
        System.out.println("-------------------Creating the Objects---------------------");
        Employee empObj1 = new Employee(1, "Anu", 1000.0);
        Employee empObj2 = new Employee(2, "Rahul", 1500.0);
        System.out.println("-------------------Starting the transaction---------------------");
        em.getTransaction().begin();
        em.persist(empObj1);
        em.persist(empObj2);
        System.out.println("-------------------Committing the transaction---------------------");
        em.getTransaction().commit();
        System.out.println("-------------------Objects saved successfully--------------------");
        System.out.println("*******************************************************************");
        System.out.println("------------------- Reading Objects--------------------");
        // JPA 1.0 queries return a raw List, so this assignment is an unchecked conversion.
        @SuppressWarnings("unchecked")
        List<Employee> emps = em.createQuery("select p from Employee p").getResultList();
        for (Employee current : emps) {
            System.out.println(current);
        }
        System.out.println("-------------------Finished Reading Objects--------------------");
    }

    public static void main(String args[]) {
        EmpClient myClient = new EmpClient();
        System.out.println("-------------------Starting the Client---------------------");
        myClient.initEmfAndEm();
        myClient.insertAndRetrieve();
        myClient.cleanup();
        System.out.println("---------------Shutting down the Client---------------------");
    }
} // End of EmpClient.java

Step 4: Set up the database connection. Go to the Services tab in the NetBeans IDE and expand the Databases node. Right-click the JavaDB node and select Create Database. Enter the database name, username and password and click OK. The database is created and ready to accept connections.

Step 5: Add the persistence unit to the application, and add the orm.xml file. The name of the orm.xml file can be changed; we have named it mapping.xml. Following is the code of the mapping.xml file:

(mapping.xml listing not preserved; only its description, "My First JPA XML Application", survives.)

Add the following entries to the persistence.xml file:

<provider>oracle.toplink.essentials.PersistenceProvider</provider>
<mapping-file>META-INF/mapping.xml</mapping-file>
<class>jpaentity.Employee</class>

Steps for execution

1. Compile the source files.
2. Build the project.
3. Start the database server.
4. Execute the EmpClient class.
5. Connect to the database and verify that the table has been created and the data has been added to it.

Advantages and Disadvantages of Using XML for Configuration

Advantages:
1. No coupling between the metadata and the source code
2. Compatible with the pre-EJB 3.0 development process
3. Support from IDEs like NetBeans, Eclipse, etc.
4. Easy to modify with the help of good editors

Disadvantages:
1. Complexity
2. Difficulty in debugging in the absence of editors

Wrap Up

This article explained entity configuration with XML files as an alternative to embedding annotations in Java code for configuring persistence details.
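The lifecycle elements listed earlier (<pre-persist>, <pre-update>, <post-load> and so on) name callback methods on the entity class itself. As a minimal sketch, assuming a hypothetical AuditedEmployee entity that is not part of the JPAEntity sample and assuming field access (access="FIELD" on its <entity> element), the class could look like the following. Each method would be referenced from the mapping file, for example <pre-persist method-name="beforePersist"/>, and the loadedFromDatabase field would need a <transient name="loadedFromDatabase"/> entry under <attributes>. Callback methods declared on the entity take no arguments, return void, and must not be static or final; because the XML does the wiring, the class compiles without any JPA jar.

```java
// Hypothetical entity (not part of the JPAEntity sample) showing lifecycle callbacks
// that an XML mapping file could wire up. No JPA annotations are used.
public class AuditedEmployee {

    private int empId;
    private String empName;
    private long createdAtMillis;       // set once, just before the first INSERT
    private long updatedAtMillis;       // refreshed before every UPDATE
    private boolean loadedFromDatabase; // intended to be mapped with <transient>

    // JPA needs a no-argument constructor.
    public AuditedEmployee() {
    }

    public AuditedEmployee(int empId, String empName) {
        this.empId = empId;
        this.empName = empName;
    }

    // Wired with <pre-persist method-name="beforePersist"/>.
    void beforePersist() {
        createdAtMillis = System.currentTimeMillis();
        updatedAtMillis = createdAtMillis;
    }

    // Wired with <pre-update method-name="beforeUpdate"/>.
    void beforeUpdate() {
        updatedAtMillis = System.currentTimeMillis();
    }

    // Wired with <post-load method-name="afterLoad"/>.
    void afterLoad() {
        loadedFromDatabase = true;
    }

    public int getEmpId() { return empId; }
    public String getEmpName() { return empName; }
    public long getCreatedAtMillis() { return createdAtMillis; }
    public long getUpdatedAtMillis() { return updatedAtMillis; }
    public boolean isLoadedFromDatabase() { return loadedFromDatabase; }
}
```

Using primitive long timestamps keeps the default mappings simple; a java.util.Date field would additionally need a <temporal> element in the mapping file.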

March 6, 2011

by Anu Bakshi


IBatis (MyBatis): Handling Joins: Advanced Result Mapping, Association, Collections, N+1 Select Problem

This tutorial presents examples of advanced result mapping, including how to handle associations and collections, the N+1 select problem, and more.

March 2, 2011

by Loiane Groner


5 Key Events in the history of Cloud Computing

While we have been evaluating in our blog posts the various features available on popular cloud computing platforms today, I thought it might be a good idea to look back at how all this started and trace some of the key events in the progress of cloud computing.

Amazon, like other Internet companies in the period of the dot-com bubble, was left with large amounts of underutilized computing infrastructure; reports suggest that less than 10% of the server infrastructure of many companies was being used. Amazon may have used cloud computing as a way to offer these unused resources as a utility computing service when it launched S3, the first true cloud computing service, in March 2006.

1. Launch of Amazon Web Services in July 2002

The initial version of AWS in 2002 focused on making information available from Amazon to partners through a web services model with programmatic and developer support, and it was very much centered on Amazon as a retailer. While this set the stage for the next steps, the launch of S3 was the true step towards building a cloud platform.

Amazon Press Release

2. S3 Launches in March 2006

The real breakthrough was the pricing model for S3, which defined the 'pay-per-use' model that has now become the de facto standard for cloud pricing. The launch of S3 also marked Amazon's shift from being just a retailer to a strong player in the technology space. Here are some interesting articles on the launch of S3 in 2006:

Techcrunch Post on S3 on March 14th, 2006
Read Write Web Post on S3 and EC2 on Nov 3rd, 2006
Business Week Article on Jeff Bezos' vision on cloud computing on Nov 13th, 2006

3. EC2 Launches in August 2006

EC2 had a much quieter launch in August 2006, but I would argue it had the bigger impact by making core computing infrastructure available. This closed the loop, making a more complete cloud infrastructure available.
In fact, at the time analysts had some difficulty understanding what the big deal was, and thought it looked similar to other hosting services available online, only with a different pricing model. Some interesting articles from that time on the launch:

Technologyevangelist Blog
Virtualization Info

4. Launch of Google App Engine in April 2008

The launch of Google App Engine in 2008 marked the entry of the first pure-play technology company into the cloud computing market. Google, a dominant Internet company, entering this market was clearly a major step towards widespread adoption of cloud computing. As with all their other products, they introduced radical pricing models, with a free entry-level plan and extremely low-cost computing and storage services that are currently among the cheapest in the market.

Techcrunch post on App Engine Launch
Google App Engine Launch Post

5. Windows Azure launches Beta in Nov 2009

The entry of Microsoft into cloud computing is a clear indication of the growth of the space. For a long time Microsoft did not treat the Internet and the web as a significant market and continued to focus on the desktop market. I think this is a realization that a clear shift is taking place. The launch of Azure is a key event in the history of cloud computing, with the largest software company making a small but significant shift to the web.

Launch of Azure Beta
Azure General Availability - Feb 2010

February 26, 2011

by Kaushik Raghupathi


How to remove getters and setters

Getters and setters are one of the first abstraction steps taught over public fields in object-oriented programming. However, the paradigm was never about encapsulating properties by providing a special reading/writing mechanism via methods, but about objects responding to messages. In other words, encapsulation is (not only, but also) about being able to change private fields without affecting client code. If you write getters and setters, you introduce a leaky abstraction over private fields, since the names and number of your public methods are coupled to them. They aren't really private anymore. By contrast, in service classes I pass dependencies for private methods in the constructor, and the client code is never affected when these dependencies change. With getters and setters, adding, removing or renaming a field changes the contract of the class.

In this article, we're going to take this User Entity class and explore the many ways we have of removing getters and setters. The choice between them depends mainly on what you need them for: it's the client code that should decide what contract it wants from these classes.

        nickname = $nickname;
    }

    public function getNickname()
    {
        return $this->nickname;
    }

    public function setPassword($password)
    {
        $this->password = $password;
    }

    public function getPassword()
    {
        return $this->password;
    }

    // ... you get the picture: another 8 getters or setters
}

The various techniques are ordered by complexity. As always, I express use cases for the User class via a test case. All the code is on Github.

Constructor

First, we can pass some fields in the constructor. If we then do not expose the field, the value will be immutable and the client code will never even know that this value exists.
For example, our User has an immutable nickname, which also serves as a primary key:

public function testPassWriteOnlyDataInTheConstructor()
{
    $user = new User('giorgiosironi'); // should not explode
    // removed the setter as it cannot be changed
}

class User
{
    public function __construct($nickname, $activationKey = '')
    {
        $this->nickname = $nickname;
        $this->activationKey = $activationKey;
    }
}

Information Expert

Next, we have the Information Expert pattern: assign an operation to the object with the greatest knowledge for accomplishing it. If you do so, you won't need to expose private fields via getters and setters, since you will model the behavior as code in that class, which can see its own private fields. For example, when we register a User we want to activate it via email verification, so we send an activation key by mail. But to check the key, we don't need to extract the value from the User object.

// Information Expert
/**
 * @expectedException InvalidArgumentException
 */
public function testAnUserIsActivated()
{
    $user = new User('giorgiosironi', 'ABC');
    $user->activate('AB');
}

class User
{
    public function activate($key)
    {
        if ($this->activationKey === $key) {
            $this->active = true;
            return;
        }
        throw new InvalidArgumentException('Key for activation is incorrect.');
    }
}

This style is an example of the Tell, Don't Ask principle: we tell our User to do something instead of asking it for information and doing it ourselves.

Double Dispatch

Putting behavior inside an Entity class is nice, but sometimes the operation needs an external dependency to work, like a login mechanism needing storage for the identity of the user (usually the session). We have two types of coupling to solve here:

static: the User class should not depend on any infrastructure class, which in turn refers to the database or the session storage.
runtime: the User class cannot hold a reference to an infrastructure object, for instance because we want to serialize it or create it simply with the new operator, or because our ORM does not support injection of collaborators.

The first issue is solved by introducing an interface implemented by the infrastructure class; the second by passing in the dependency via Double Dispatch instead of via the constructor, as we do with service classes.

public function testUsersLogin()
{
    $user = new User('giorgiosironi');
    $user->setPassword('gs'); // will be removed in next tests
    // in reality, we would use a SessionLoginAdapter or something like that
    $loginAdapterMock = $this->getMock('LoginAdapter');
    $loginAdapterMock->expects($this->once())
                     ->method('storeIdentity')
                     ->with('giorgiosironi');
    $user->login('gs', $loginAdapterMock);
}

class User
{
    public function login($password, LoginAdapter $loginAdapter)
    {
        if ($this->password == $password) {
            $loginAdapter->storeIdentity($this->nickname);
            return true;
        } else {
            return false;
        }
    }
}

This is still an example of Tell, Don't Ask, but closer to the real world.

Command and changesets

Sometimes you really have to put data inside this object: the user has just filled in a form and you have to get those input values in. So how do we define that operation? As an atomic method call, by passing in what I call a Changeset, which is a specialization of a Command (not the Command pattern, but the Command of Command-Query Responsibility Segregation). In the simplest cases, it's just a Value Object or a Data Transfer Object with no behavior.
public function testCommandForChangingPassword()
{
    $user = new User('giorgiosironi');
    $passwordChange = new ChangeUserPassword('', 'gs');
    $user->handle($passwordChange);
    $this->assertEquals('gs', $user->getPassword()); // deprecated, will be removed in next tests
}

class User
{
    public function handle($command)
    {
        if ($command instanceof ChangeUserPassword) {
            $this->handleChangeUserPassword($command);
        }
        if ($command instanceof SetUserDetails) {
            $this->handleSetUserDetails($command);
        }
        // support other commands here...
    }

    private function handleChangeUserPassword(ChangeUserPassword $command)
    {
        if ($command->getOldPassword() == $this->password) {
            $this->password = $command->getNewPassword();
        } else {
            throw new Exception('The old password is not correct.');
        }
    }
}

Think about it: you will have to put these getters and setters somewhere, and it's better to put them on an object that is a plain data structure than on your Entity. This way:

you will be coupled to the current fields only in this particular operation, and not whenever you pass a User around;
you will make it clear that you support only a full update operation, and that it's not OK to call an isolated setter.

Actually, in PHP you could just use an array as a Changeset, but a class provides a stricter contract. Public fields are not viable for a contract either, as PHP raises no error if you assign a non-existent field on a Changeset object.

Rendering on a canvas

On the Growing Object-Oriented Software mailing list, there has recently been a discussion on how to emulate getters via callbacks. This solution is the mirror image of our Changeset argument: it is used for extracting data instead of updating it.

public function testCanvasForRenderingAnObject()
{
    $user = new User('giorgiosironi');
    $detailsSet = new SetUserDetails('Italy', 'Pizza'); // THIS may have set/get
    $user->handle($detailsSet);
    // canvas can also be a form, or xml, or json...
    $canvas = new HtmlCanvas('{{location}{{favoriteFood}');
    $user->render($canvas);
    $this->assertEquals('ItalyPizza', (string) $canvas);
}

class User
{
    public function render(Canvas $canvas)
    {
        $canvas->nickname = $this->nickname;
        $canvas->location = $this->location;
        $canvas->favoriteFood = $this->favoriteFood;
    }
}

Again, the canvas is hidden behind an interface and can be anything: an HTML view, a form, a JSON or RSS feed generator...

CQRS

In Command-Query Responsibility Segregation, you use the ORM for mapping your objects to the database, and fill your report screens by querying the database directly, or even by querying another store that is continuously rebuilt from your master database. I don't know of any implementations of CQRS in PHP, but this mechanism promises at least to eliminate getters, as your domain objects will be write-only.

Conclusion

The full code is in this Github repository, as always. You have no excuses now: go and ditch one of your getters and setters immediately. Your code will breathe a bit of fresh air.
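The same ideas carry over to Java. The sketch below is a hypothetical Java rendering of four of the techniques above: constructor-only data, Information Expert (activate), a Changeset handled as one atomic call, Double Dispatch through an interface (login with a LoginAdapter) and rendering on a canvas. The LoginAdapter, Canvas and ChangeUserPassword names mirror those in the PHP tests, but GetterFreeUser and MapCanvas are invented here for illustration; none of this is taken from the author's repository.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical Java translation of a getter-free User; not the author's code.
public class GetterFreeUser {

    /** Infrastructure port; a session-backed implementation would live elsewhere. */
    interface LoginAdapter {
        void storeIdentity(String nickname);
    }

    /** Output port: the User pushes its state here instead of exposing getters. */
    interface Canvas {
        void put(String field, String value);
    }

    /** Changeset: a behavior-free data carrier for one atomic update. */
    static final class ChangeUserPassword {
        final String oldPassword;
        final String newPassword;

        ChangeUserPassword(String oldPassword, String newPassword) {
            this.oldPassword = oldPassword;
            this.newPassword = newPassword;
        }
    }

    private final String nickname;      // immutable, passed in the constructor
    private final String activationKey; // never exposed
    private String password = "";
    private boolean active;

    GetterFreeUser(String nickname, String activationKey) {
        this.nickname = nickname;
        this.activationKey = activationKey;
    }

    // Information Expert / Tell, Don't Ask: the User checks its own key.
    void activate(String key) {
        if (!activationKey.equals(key)) {
            throw new IllegalArgumentException("Key for activation is incorrect.");
        }
        active = true;
    }

    // Command/Changeset: the whole update arrives in one call, no isolated setter.
    void handle(ChangeUserPassword command) {
        if (!command.oldPassword.equals(password)) {
            throw new IllegalStateException("The old password is not correct.");
        }
        password = command.newPassword;
    }

    // Double Dispatch: the collaborator is passed per call, not held as a field.
    boolean login(String attempt, LoginAdapter loginAdapter) {
        if (password.equals(attempt)) {
            loginAdapter.storeIdentity(nickname);
            return true;
        }
        return false;
    }

    // Rendering on a canvas: data flows out without a getter per field.
    void render(Canvas canvas) {
        canvas.put("nickname", nickname);
        canvas.put("active", String.valueOf(active));
    }

    /** Simple Canvas that collects fields into a map (e.g. before producing JSON). */
    static final class MapCanvas implements Canvas {
        final Map<String, String> fields = new LinkedHashMap<>();

        @Override
        public void put(String field, String value) {
            fields.put(field, value);
        }
    }
}
```

Because LoginAdapter has a single method, a test can pass a lambda or method reference instead of a mock, which plays the same role as the PHPUnit mock in testUsersLogin.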

February 22, 2011

by Giorgio Sironi


Getting Started with iBatis (MyBatis): Annotations

This tutorial will walk you through how to set up iBatis (MyBatis) in a simple Java project and will present examples of simple insert, update, select and delete statements using annotations.

This is the third tutorial of the iBatis/MyBatis series; you can read the first two tutorials at the following links:

Introduction to iBatis (MyBatis), An alternative to Hibernate and JDBC
Getting Started with iBatis (MyBatis): XML Configuration

iBatis/MyBatis 3 offers a new feature: annotations. However, there are few examples and little documentation about annotations, so I started to explore and read the MyBatis mailing list archive to write this tutorial. Another thing you will notice is the limitations of annotations. In this tutorial, I am going to demonstrate some of the things you can do with annotations. All the power of iBatis is in its XML configuration.

So let's play a little bit with annotations. They are much simpler, and you can use them for simple queries in small projects. As I already mentioned, if you want something more complex, you will have to use XML configuration.

One more thing before we get started: this tutorial is the same as the previous one, but instead of XML, we are going to use annotations.

Pre-Requisites

For this tutorial I am using:

IDE: Eclipse (you can use your favorite one)
Database: MySQL
Libs/jars: MyBatis, MySQL connector and JUnit (for testing)

This is how your project should look:

Sample Database

Please run the sample database script on your database before getting started with the project implementation. I will not post the sample database here again; you can get it from the previous post about iBatis or download this sample project. The files are the same.

1 – Contact POJO

We will first create a POJO class to represent a contact with id, name, phone number and email address, the same as in the previous post.

2 – ContactMapper

In this file, we are going to set up all the queries using annotations. It is the mapper interface that we will register with the SqlSessionFactory.
A mapper class is simply an interface with method definitions that match up against the SqlSession methods. Mapper interfaces do not need to implement any interface or extend any class, as long as the method signature can be used to uniquely identify a corresponding mapped statement. Mapper interfaces can extend other interfaces. Be sure that you have the statements in the appropriate namespace when using XML binding to mapper interfaces. Also, the only limitation is that you cannot have the same method signature in two interfaces in a hierarchy (a bad idea anyway).

You can create some strings with the SQL code. Remember that it has to be the same code as in the XML configuration.

package com.loiane.data;

import java.util.List;

import org.apache.ibatis.annotations.Delete;
import org.apache.ibatis.annotations.Insert;
import org.apache.ibatis.annotations.Options;
import org.apache.ibatis.annotations.Param;
import org.apache.ibatis.annotations.Result;
import org.apache.ibatis.annotations.Results;
import org.apache.ibatis.annotations.Select;
import org.apache.ibatis.annotations.Update;

import com.loiane.model.Contact;

public interface ContactMapper {

    final String SELECT_ALL = "SELECT * FROM CONTACT";
    final String SELECT_BY_ID = "SELECT * FROM CONTACT WHERE CONTACT_ID = #{id}";
    final String UPDATE = "UPDATE CONTACT SET CONTACT_EMAIL = #{email}, CONTACT_NAME = #{name}, CONTACT_PHONE = #{phone} WHERE CONTACT_ID = #{id}";
    final String UPDATE_NAME = "UPDATE CONTACT SET CONTACT_NAME = #{name} WHERE CONTACT_ID = #{id}";
    final String DELETE = "DELETE FROM CONTACT WHERE CONTACT_ID = #{id}";
    // The VALUES order must match the column list: email, name, phone.
    final String INSERT = "INSERT INTO CONTACT (CONTACT_EMAIL, CONTACT_NAME, CONTACT_PHONE) VALUES (#{email}, #{name}, #{phone})";

    /**
     * Returns the list of all Contact instances from the database.
     * @return the list of all Contact instances from the database.
     */
    @Select(SELECT_ALL)
    @Results(value = {
        @Result(property="id", column="CONTACT_ID"),
        @Result(property="name", column="CONTACT_NAME"),
        @Result(property="phone", column="CONTACT_PHONE"),
        @Result(property="email", column="CONTACT_EMAIL")
    })
    List<Contact> selectAll();

    /**
     * Returns a Contact instance from the database.
     * @param id primary key value used for lookup.
     * @return A Contact instance with a primary key value equal to id; null if there is no matching row.
     */
    @Select(SELECT_BY_ID)
    @Results(value = {
        @Result(property="id", column="CONTACT_ID"),
        @Result(property="name", column="CONTACT_NAME"),
        @Result(property="phone", column="CONTACT_PHONE"),
        @Result(property="email", column="CONTACT_EMAIL")
    })
    Contact selectById(int id);

    /**
     * Updates an instance of Contact in the database.
     * @param contact the instance to be updated.
     */
    @Update(UPDATE)
    void update(Contact contact);

    /**
     * Updates the name of a Contact in the database.
     * @param name name value to be updated.
     * @param id primary key value used for lookup.
     */
    @Update(UPDATE_NAME)
    void updateName(@Param("name") String name, @Param("id") int id);

    /**
     * Deletes an instance of Contact from the database.
     * @param id primary key value of the instance to be deleted.
     */
    @Delete(DELETE)
    void delete(int id);

    /**
     * Inserts an instance of Contact into the database.
     * @param contact the instance to be persisted.
     */
    @Insert(INSERT)
    @Options(useGeneratedKeys = true, keyProperty = "id")
    void insert(Contact contact);
}

@Select

The @Select annotation is very simple. Let's take a look at the first select statement of this class: selectAll. Note that you don't need the mapping if your database table columns match the names of the class attributes. I used different names in order to demonstrate the @Result annotation; if the table columns match the attribute names, you don't need @Result at all.

Now let's take a look at the second method: selectById. Notice that we have a parameter. It is a simple parameter, easy to use when you have a single parameter.
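To make the difference between explicit @Result mappings and matching names concrete, here is a deliberately simplified, hypothetical model in plain Java. A row is represented as a Map from column label to value, the explicit column-to-property pairs play the role of the @Result annotations in ContactMapper, and any other column falls back to a case-insensitive match on the property name. The fallback rule is an assumption of this sketch, the ResultMappingSketch class is invented for illustration, and this is not how MyBatis is implemented internally.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified model of result mapping; not MyBatis internals.
public class ResultMappingSketch {

    static class Contact {
        int id;
        String name;
        String phone;
        String email;
    }

    // Plays the role of @Result(property = "...", column = "...") in ContactMapper.
    static final Map<String, String> EXPLICIT = new HashMap<>();
    static {
        EXPLICIT.put("CONTACT_ID", "id");
        EXPLICIT.put("CONTACT_NAME", "name");
        EXPLICIT.put("CONTACT_PHONE", "phone");
        EXPLICIT.put("CONTACT_EMAIL", "email");
    }

    static Contact map(Map<String, Object> row) {
        Contact contact = new Contact();
        for (Map.Entry<String, Object> column : row.entrySet()) {
            // Use the explicit mapping if there is one; otherwise assume the column
            // name matches the property name, ignoring case.
            String property = EXPLICIT.getOrDefault(
                    column.getKey(), column.getKey().toLowerCase(Locale.ROOT));
            Object value = column.getValue();
            switch (property) {
                case "id":    contact.id = (Integer) value; break;
                case "name":  contact.name = (String) value; break;
                case "phone": contact.phone = (String) value; break;
                case "email": contact.email = (String) value; break;
                default:      break; // columns with no matching property are ignored
            }
        }
        return contact;
    }
}
```

With the CONTACT_* column names, only the explicit table makes the mapping work; with columns named ID or NAME, the fallback alone is enough, which is the situation in which the article says @Result can be omitted.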
@Update

Let's say you want to update all the columns. You can pass the object as a parameter and iBatis will do all the magic for you. Remember to match the #{...} placeholder names with the attribute names, otherwise iBatis can get confused.

Now let's say you want to use 2 or 3 parameters, and they don't belong to an object. If you take a look at the iBatis XML configuration, you will see that you have to set the parameterType option (remember?) and specify your parameter type in it; in other words, you can use only one parameter. If you are using annotations, you can use more than one parameter with the @Param annotation, as updateName does.

@Delete

The @Delete annotation is also very simple. It follows the previous rules related to parameters.

@Insert

The @Insert annotation also follows the rules related to parameters. You can use an object, or use the @Param annotation to specify more than one parameter.

What about key generation? Well, if your database supports auto-generated keys, you can set it up with the @Options annotation. You will need to specify the useGeneratedKeys and keyProperty options. If your database does not support auto-generated keys, I'm sorry, but I still haven't figured out how to do it (as a last resort, you can generate the key manually and then pass it as a parameter to your insert query).

3 – MyBatisConnectionFactory

Every MyBatis application centers around an instance of SqlSessionFactory. A SqlSessionFactory instance can be acquired by using the SqlSessionFactoryBuilder. SqlSessionFactoryBuilder can build a SqlSessionFactory instance from an XML configuration file, or from a custom prepared instance of the Configuration class.

An observation about this file: in the previous example, we set the mappers in the iBatis configuration XML file. Using annotations, we will register the mapper manually. In a future example, I'll show you how to work with XML and annotations together.
package com.loiane.dao;

import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.Reader;

import org.apache.ibatis.io.Resources;
import org.apache.ibatis.session.SqlSessionFactory;
import org.apache.ibatis.session.SqlSessionFactoryBuilder;

import com.loiane.data.ContactMapper;

public class MyBatisConnectionFactory {

    private static SqlSessionFactory sqlSessionFactory;

    static {
        try {
            String resource = "SqlMapConfig.xml";
            Reader reader = Resources.getResourceAsReader(resource);

            if (sqlSessionFactory == null) {
                sqlSessionFactory = new SqlSessionFactoryBuilder().build(reader);
                // The XML file declares no mappers, so register the annotated mapper here
                sqlSessionFactory.getConfiguration().addMapper(ContactMapper.class);
            }
        } catch (FileNotFoundException fileNotFoundException) {
            fileNotFoundException.printStackTrace();
        } catch (IOException iOException) {
            iOException.printStackTrace();
        }
    }

    public static SqlSessionFactory getSqlSessionFactory() {
        return sqlSessionFactory;
    }
}

4 – ContactDAO

Now that we have set up everything we need, let's create our DAO. To call the SQL statements, we need one more step: open a SqlSession from the SqlSessionFactory and get the mapper from that session. Then we just need to call the mapper method. It is a little different from the XML approach, but it does the same thing.

package com.loiane.dao;

import java.util.List;

import org.apache.ibatis.session.SqlSession;
import org.apache.ibatis.session.SqlSessionFactory;

import com.loiane.data.ContactMapper;
import com.loiane.model.Contact;

public class ContactDAO {

    private SqlSessionFactory sqlSessionFactory;

    public ContactDAO() {
        sqlSessionFactory = MyBatisConnectionFactory.getSqlSessionFactory();
    }

    /**
     * Returns the list of all Contact instances from the database.
     * @return the list of all Contact instances from the database.
     */
    public List<Contact> selectAll() {
        SqlSession session = sqlSessionFactory.openSession();
        try {
            ContactMapper mapper = session.getMapper(ContactMapper.class);
            List<Contact> list = mapper.selectAll();
            return list;
        } finally {
            session.close();
        }
    }

    /**
     * Returns a Contact instance from the database.
     * @param id primary key value used for lookup.
     * @return A Contact instance with a primary key value equal to id. null if there is no matching row.
     */
    public Contact selectById(int id) {
        SqlSession session = sqlSessionFactory.openSession();
        try {
            ContactMapper mapper = session.getMapper(ContactMapper.class);
            Contact contact = mapper.selectById(id);
            return contact;
        } finally {
            session.close();
        }
    }

    /**
     * Updates an instance of Contact in the database.
     * @param contact the instance to be updated.
     */
    public void update(Contact contact) {
        SqlSession session = sqlSessionFactory.openSession();
        try {
            ContactMapper mapper = session.getMapper(ContactMapper.class);
            mapper.update(contact);
            session.commit();
        } finally {
            session.close();
        }
    }

    /**
     * Updates the name of a Contact in the database.
     * @param name name value to be updated.
     * @param id primary key value used for lookup.
     */
    public void updateName(String name, int id) {
        SqlSession session = sqlSessionFactory.openSession();
        try {
            ContactMapper mapper = session.getMapper(ContactMapper.class);
            mapper.updateName(name, id);
            session.commit();
        } finally {
            session.close();
        }
    }

    /**
     * Insert an instance of Contact into the database.
     * @param contact the instance to be persisted.
     */
    public void insert(Contact contact) {
        SqlSession session = sqlSessionFactory.openSession();
        try {
            ContactMapper mapper = session.getMapper(ContactMapper.class);
            mapper.insert(contact);
            session.commit();
        } finally {
            session.close();
        }
    }

    /**
     * Delete an instance of Contact from the database.
     * @param id primary key value of the instance to be deleted.
     */
    public void delete(int id) {
        SqlSession session = sqlSessionFactory.openSession();
        try {
            ContactMapper mapper = session.getMapper(ContactMapper.class);
            mapper.delete(id);
            session.commit();
        } finally {
            session.close();
        }
    }
}

5 – Mapper Configuration File

The MyBatis XML configuration file contains settings and properties that have a dramatic effect on how MyBatis behaves. This time, we do not need to configure type aliases or XML mapper files. We did all the magic with annotations, so we have a simple config file with only the information about the database we want to connect to.

Download

I suggest you take a look at the org.apache.ibatis.annotations package and try to find out what each annotation can do. Unfortunately, you won't find much documentation or many examples on the MyBatis website.

I also created a TestCase class. If you want to download the complete sample project, you can get it from my GitHub account: https://github.com/loiane/ibatis-annotations-helloworld

There are more articles about iBatis to come. Stay tuned! In the next articles, I'm going to demonstrate how to implement each feature using XML and then annotations (when possible).

Happy Coding!

From http://loianegroner.com/2011/02/getting-started-with-ibatis-mybatis-annotations/
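Every ContactDAO method in the annotations example repeats the same steps: open a session, get the mapper, commit if the call succeeds, and always close the session. One way to factor that out is a small template method. The sketch below is hypothetical and JDK-only so that it runs on its own: FakeSession is a stand-in for SqlSession that merely records which lifecycle calls happen, and inTransaction is an illustrative helper name, not a MyBatis API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class SessionTemplateSketch {

    /** Hypothetical stand-in for a SqlSession: it only records lifecycle calls. */
    public static final class FakeSession implements AutoCloseable {
        final List<String> events;

        FakeSession(List<String> events) {
            this.events = events;
            events.add("open");
        }

        void commit() {
            events.add("commit");
        }

        @Override
        public void close() {
            events.add("close");
        }
    }

    /** Opens a session, runs the work, commits only if it succeeds, and always closes. */
    public static <T> T inTransaction(List<String> events, Function<FakeSession, T> work) {
        try (FakeSession session = new FakeSession(events)) {
            T result = work.apply(session);
            session.commit();   // skipped when work throws
            return result;
        }                       // close() runs on both the success and the failure path
    }

    public static void main(String[] args) {
        List<String> ok = new ArrayList<>();
        inTransaction(ok, s -> "updated");
        System.out.println(ok);         // [open, commit, close]

        List<String> failed = new ArrayList<>();
        try {
            inTransaction(failed, s -> { throw new IllegalStateException("boom"); });
        } catch (IllegalStateException expected) {
            System.out.println(failed); // [open, close]
        }
    }
}
```

Adapting this to MyBatis would mean opening a real SqlSession with sqlSessionFactory.openSession() instead of FakeSession; read-only methods such as selectAll() would simply skip the commit() call, as the DAO does.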

February 22, 2011

by Loiane Groner


Getting Started with iBatis (MyBatis): XML Configuration

This tutorial will walk you through how to set up iBatis (MyBatis) in a simple Java project and will present examples using simple insert, update, select and delete statements.

February 18, 2011

by Loiane Groner


Solve Foreign-key Problems in DBUnit Test Data

If you create small per-test datasets, as DBUnit advises, you can get intermittent build failures due to foreign-key violations. This post explains (1) why this happens, (2) why small per-test datasets are still a good idea, and (3) one simple way to get around the problem.

NB When I searched for solutions to this problem, I discovered that other kinds of foreign-key problems come up with DBUnit. Some people have circular dependencies in their relational database schemas, which stops DBUnit from loading the test data. If such is your case, I'm sorry to say that this post won't help you with it, and your best option is probably to just take yourself outside and shoot yourself now. (Although some people seem to have chosen instead to disable foreign-key checking during test runs.)

What causes the foreign-key violations

The cause of the problem is simple, and is illustrated by a trivial example. Suppose you have two entity classes, HitchHiker and SpaceShip. The HitchHiker table has a foreign key that references SpaceShip. The test data for HitchHikerDaoTest contains rows from both tables, whereas the test data for SpaceShipDaoTest contains only rows from SpaceShip.

DBUnit's default setup operation, CLEAN_INSERT, wipes data from every table occurring in the test dataset and then inserts the rows listed in that dataset. When SpaceShipDaoTest runs, DBUnit will start by deleting everything in the SpaceShip table. If any HitchHikers are currently riding in the SpaceShips that are about to be deleted, the database will object to their untimely eviction (I'm not sure whether the error message will read like Vogon poetry, though).

If you start from an empty database and execute SpaceShipDaoTest and then HitchHikerDaoTest, you'll be fine; but if you run them in the other order, your build will fail. It's that second-worst kind of bug, the unpredictable kind, since you don't (usually) specify the order in which tests run. After all, they're supposed to be independent!
So you may well find that you have no problems for months on end, until one day you get an error running individual tests in a particular sequence, or Maven changes the order in which it runs your tests on the CI server, and BOOM!

Why you should still use small independent datasets

It's tempting to circumvent the problem by using a single monolithic dataset for all your integration tests. I've tried this, and I advise against it. A big data file is hard to work with: you waste a lot of time scrolling around looking for the row you need, and it's very hard to follow and understand foreign-key relations. Worse still, by modifying the data to make one test pass, you can easily break another one by accident. The larger the dataset and the test suite become, the more fragile they get, and the more painstaking it becomes to modify them.

How to avoid the foreign-key problem with small independent datasets

One working but unsatisfactory solution would be to pad out every XML dataset with the list of all tables touched in the test suite. It's unsatisfactory because the only way to add a table into a FlatXmlDataSet is to list a row of that table (a FlatXmlDataSet can't contain empty tables), and there's no justification for polluting the test data with rows from tables that are not part of the test.

The solution I found was to use a DTD to clean tables before tests. Every XML file has different contents, but they all reference a single DTD which lists all the tables involved in the test suite. The DTD is easy to generate from the database schema, and useful for auto-complete and catching typos in column names, so you should probably already be using one.
The code to exploit its contents is very simple:

private IDataSet loadTestDataWithDtdTableList(String dtdFilename, String xmlFilename)
        throws IOException, DataSetException, SQLException {
    // An empty dataset containing every table listed in the DTD
    // (close this reader after setup; see below)
    Reader dtdReader = new FileReader(new ClassPathResource(dtdFilename).getFile());
    IDataSet dtdDataset = new FlatDtdDataSet(dtdReader);

    // The actual rows for this test; asFile() is a small helper that is not shown here
    FlatXmlDataSetBuilder builder = new FlatXmlDataSetBuilder();
    builder.setMetaDataSet(new DatabaseDataSet(dbUnitConnection, false));
    IDataSet xmlDataset = builder.build(asFile(xmlFilename));

    // The composite contains every DTD table, so the setup operation cleans all of them
    return new CompositeDataSet(dtdDataset, xmlDataset);
}

How it works: DBUnit provides a facility to load a dataset from a DTD. This dataset contains all the tables listed in the DTD, but of course empty of data. The DTD dataset is then combined with a FlatXmlDataSet representing your test data. The graphic below illustrates the composite dataset that would be produced for the SpaceShip example.

If you have dictionary tables whose contents never change, you can and should leave them out of the DTD as well as out of the XML datasets, to improve test performance a little.

One further detail: you should close the FileReader after test setup. I couldn't find a hook into the end of the test setup operation (short of writing my own DatabaseOperation), so I saved the reference as a member variable and hooked the close() call into the tear-down phase of the test.

NB For a more complete code example, see this Gist snippet of a base class for TestNG+Spring+DBUnit tests that adds the above-described DBUnit setup operation to Spring's TestNG helper class.

Happy database testing!

From http://www.andrewspencer.net/2011/solve-foreign-key-problems-in-dbunit-test-data/
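To see why listing every table helps, here is a hypothetical in-memory model (JDK only, not DBUnit code) of a CLEAN_INSERT-style setup for the SpaceShip example. It assumes, as I understand DBUnit's delete-all step to work, that tables are cleaned in reverse dataset order and then filled in forward order, and that the DTD lists SPACESHIP before HITCHHIKER. With only SPACESHIP in the second test's dataset, the delete runs into the foreign key; with the DTD's full table list, the empty HITCHHIKER table is cleaned first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a CLEAN_INSERT-style setup; not DBUnit itself.
public class CleanInsertSketch {

    /** Current row counts per table; HITCHHIKER rows reference SPACESHIP rows. */
    private final Map<String, Integer> rows = new LinkedHashMap<>();

    public CleanInsertSketch() {
        rows.put("SPACESHIP", 0);
        rows.put("HITCHHIKER", 0);
    }

    /** Deletes all rows of a table, enforcing a simplified foreign-key check. */
    private void deleteAll(String table) {
        if (table.equals("SPACESHIP") && rows.get("HITCHHIKER") > 0) {
            throw new IllegalStateException("FK violation: HITCHHIKER still references SPACESHIP");
        }
        rows.put(table, 0);
    }

    /** Cleans the dataset's tables in reverse order, then inserts its rows in order. */
    public void cleanInsert(LinkedHashMap<String, Integer> dataset) {
        List<String> tables = new ArrayList<>(dataset.keySet());
        Collections.reverse(tables);            // child tables are cleaned before parents
        for (String table : tables) {
            deleteAll(table);
        }
        dataset.forEach((table, count) -> rows.merge(table, count, Integer::sum));
    }

    /** Runs HitchHikerDaoTest's setup, then SpaceShipDaoTest's; true if both succeed. */
    public static boolean run(boolean withDtdTableList) {
        CleanInsertSketch db = new CleanInsertSketch();

        LinkedHashMap<String, Integer> hitchHikerData = new LinkedHashMap<>();
        hitchHikerData.put("SPACESHIP", 2);
        hitchHikerData.put("HITCHHIKER", 3);
        db.cleanInsert(hitchHikerData);

        LinkedHashMap<String, Integer> spaceShipData = new LinkedHashMap<>();
        spaceShipData.put("SPACESHIP", 1);
        if (withDtdTableList) {
            spaceShipData.put("HITCHHIKER", 0); // empty table contributed by the DTD
        }
        try {
            db.cleanInsert(spaceShipData);
            return true;
        } catch (IllegalStateException fkViolation) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("XML dataset only:   " + run(false)); // false
        System.out.println("DTD + XML datasets: " + run(true));  // true
    }
}
```

The same reasoning explains why the table order in the DTD matters: parent tables should come before the tables that reference them.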

February 16, 2011

by Andrew Spencer


Table sorting & pagination with jQuery and Razor in ASP.NET MVC

Introduction

jQuery enjoys living inside pages built on top of the ASP.NET MVC Framework. ASP.NET MVC is a place where things are organized very well and it is quite hard to make them dirty, especially because the pattern enforces purity (you can still make it dirty if you want to ;) ). We all know how easy it is to build an HTML table with a header row, a footer row and table rows showing some data. With ASP.NET MVC we can do this pretty easily, but the result will be a plain HTML table that only shows data and does not include sorting, pagination or the other advanced features we were used to having in the ASP.NET WebForms GridView. OK, there is the WebGrid MVC Helper, but what if we want to build something from a plain table in our own clean style?

In one of my recent projects, I've been using the jQuery tablesorter plugin together with the tablesorter.pager plugin that goes along with it. You don't need to know jQuery to make this work... You need to know a little CSS to create a nice design for your table, but of course you can use mine from the demo... So, what you will see in this blog is how to attach these plugins to a plain HTML table and a div for pagination, giving your table advanced sorting and pagination features.

Demo Project Resources

The resources I'm using for this demo project are shown in the following Solution Explorer screenshot:

Content/images – folder that contains all the up/down arrow images, pagination buttons etc. You can freely replace them with your own, but keep the names the same if you don't want to change anything in the CSS we will build later.
Content/Site.css – the main CSS theme, where we will add the theme for our table too
Controllers/HomeController.cs – the controller I'm using for this project
Models/Person.cs – for this demo, I'm using the Person.cs class
Scripts – jquery-1.4.4.min.js, jquery.tablesorter.js, jquery.tablesorter.pager.js – the scripts required to make the magic happen
Views/Home/Index.cshtml – Index view (Razor view engine)

The other items are not important for the demo.

ASP.NET MVC

1. Model

In this demo I use only one Person class, which defines a Person entity with several properties. You can use your own model, maybe one that accesses data from a database or any other resource.

Person.cs

public class Person
{
    public string Name { get; set; }
    public string Surname { get; set; }
    public string Email { get; set; }
    public int? Phone { get; set; }
    public DateTime? DateAdded { get; set; }
    public int? Age { get; set; }

    public Person(string name, string surname, string email,
                  int? phone, DateTime? dateadded, int? age)
    {
        Name = name;
        Surname = surname;
        Email = email;
        Phone = phone;
        DateAdded = dateadded;
        Age = age;
    }
}

2. View

In our example, we have only one Index.cshtml page, which uses the Razor view engine. Razor is my favorite view engine for ASP.NET MVC because it's very intuitive, fluid and keeps your code clean.

3. Controller

Since this is a simple example with one page, we use one HomeController.cs with two methods: Index, which returns an ActionResult, and GetPeople(), which creates and returns the list of people.
HomeController.cs

public class HomeController : Controller
{
    //
    // GET: /Home/
    public ActionResult Index()
    {
        ViewBag.People = GetPeople();
        return View();
    }

    public List<Person> GetPeople()
    {
        List<Person> listPeople = new List<Person>();

        listPeople.Add(new Person("Hajan", "Selmani", "[email protected]", 070070070, DateTime.Now, 25));
        listPeople.Add(new Person("Straight", "Dean", "[email protected]", 123456789, DateTime.Now.AddDays(-5), 35));
        listPeople.Add(new Person("Karsen", "Livia", "[email protected]", 46874651, DateTime.Now.AddDays(-2), 31));
        listPeople.Add(new Person("Ringer", "Anne", "[email protected]", null, DateTime.Now, null));
        listPeople.Add(new Person("O'Leary", "Michael", "[email protected]", 32424344, DateTime.Now, 44));
        listPeople.Add(new Person("Gringlesby", "Anne", "[email protected]", null, DateTime.Now.AddDays(-9), 18));
        listPeople.Add(new Person("Locksley", "Stearns", "[email protected]", 2135345, DateTime.Now, null));
        listPeople.Add(new Person("DeFrance", "Michel", "[email protected]", 235325352, DateTime.Now.AddDays(-18), null));
        listPeople.Add(new Person("White", "Johnson", null, null, DateTime.Now.AddDays(-22), 55));
        listPeople.Add(new Person("Panteley", "Sylvia", null, 23233223, DateTime.Now.AddDays(-1), 32));
        listPeople.Add(new Person("Blotchet-Halls", "Reginald", null, 323243423, DateTime.Now, 26));
        listPeople.Add(new Person("Merr", "South", "[email protected]", 3232442, DateTime.Now.AddDays(-5), 85));
        listPeople.Add(new Person("MacFeather", "Stearns", "[email protected]", null, DateTime.Now, null));

        return listPeople;
    }
}

TABLE CSS/HTML DESIGN

Now, let's start with the implementation. First of all, let's create the table structure and the main CSS.

1. HTML Structure

@{ Layout = null; }

[HTML markup: a table with class "tablesorter" (header row, footer row and rows of placeholder values) followed by a div with id "pager"]

So, this is the main structure you need to create for each table where you want to apply the functionality we will build. Of course, the scripts only need to be referenced once ;). As you can see, our table has the class tablesorter, and we also have a div with id pager.
In the next steps we will use both of these to create the needed functionality. The complete Index.cshtml code that gets the data from the controller and displays it on the page is:

<table class="tablesorter">
    <thead>
        <tr><th>Name</th><th>Surname</th><th>Email</th><th>Phone</th><th>Date Added</th></tr>
    </thead>
    <tbody>
    @{
        foreach (var p in ViewBag.People)
        {
            <tr><td>@p.Name</td><td>@p.Surname</td><td>@p.Email</td><td>@p.Phone</td><td>@p.DateAdded</td></tr>
        }
    }
    </tbody>
    <tfoot>
        <tr><th>Name</th><th>Surname</th><th>Email</th><th>Phone</th><th>Date Added</th></tr>
    </tfoot>
</table>
<div id="pager">
    <!-- pagination buttons and a page-size select list with the options 5, 10, 20, 30 and 40 -->
</div>

So, the structure is mainly the same. I have added Razor code that creates the table rows from ViewBag.People, which was filled with data in the home controller.

2. CSS Design

The CSS code I've created is:

/* DEMO TABLE */
body {
    font-size: 75%;
    font-family: Verdana, Tahoma, Arial, "Helvetica Neue", Helvetica, Sans-Serif;
    color: #232323;
    background-color: #fff;
}
table { border-spacing: 0; border: 1px solid gray; }
table.tablesorter thead tr .header {
    background-image: url(images/bg.png);
    background-repeat: no-repeat;
    background-position: center right;
    cursor: pointer;
}
table.tablesorter tbody td {
    color: #3D3D3D;
    padding: 4px;
    background-color: #FFF;
    vertical-align: top;
}
table.tablesorter tbody tr.odd td { background-color: #F0F0F6; }
table.tablesorter thead tr .headerSortUp { background-image: url(images/asc.png); }
table.tablesorter thead tr .headerSortDown { background-image: url(images/desc.png); }
table th {
    width: 150px;
    border: 1px outset gray;
    background-color: #3C78B5;
    color: White;
    cursor: pointer;
}
table thead th:hover { background-color: Yellow; color: Black; }
table td { width: 150px; border: 1px solid gray; }

PAGINATION AND SORTING

Now that everything is ready and we have the data, let's add the pagination and sorting functionality.

1. jQuery Scripts referencing

2. jQuery Sorting and Pagination script

So, with only two lines of code, I'm using both the tablesorter and tablesorterPager plugins, passing some options to each of them.
Options added:

tablesorter – widthFixed: true – gives the columns a fixed width.
tablesorter – sortList: [[0,0]] – An array of instructions for per-column sorting and direction in the format: [[columnIndex, sortDirection], ... ] where columnIndex is a zero-based index for your columns left-to-right and sortDirection is 0 for Ascending and 1 for Descending. A valid argument that sorts ascending first by column 1 and then column 2 looks like: [[0,0],[1,0]] (source: http://tablesorter.com/docs/)
tablesorterPager – container: $("#pager") – tells the pager which container to use: the div with id pager in our case.
tablesorterPager – size: the default page size. I take the value of the option that is selected, so if you mark any other option in your select list as selected, that number of rows becomes the default page size for the table too.

END RESULTS

1. Table once the page is loaded (the default is 5 results per page, and the table is automatically sorted by the 1st column because sortList is specified)
2. Sorted by Phone descending
3. Changed pagination to 10 items per page
4. Sorted by Phone and Name (use SHIFT to sort on multiple columns)
5. Sorted by Date Added
6. Page 3, 5 items per page

ADDITIONAL ENHANCEMENTS

We can make additional enhancements to the table, such as a search for each column. I will cover this in one of my next blogs. Stay tuned.

DEMO PROJECT

You can download the demo project source code from HERE.

CONCLUSION

Once you finish with the demo, run your page and view its source. You will be amazed by the purity of your code.

Client-side pagination can be very useful. One of its benefits is performance, since changing pages needs no round trip to the server; but if you have thousands of rows in your tables, you will get the opposite result, because every row has to be sent to and handled by the browser. Hence, it is sometimes a better idea to paginate on the back end. A good compromise is to combine both approaches: I keep at most 500 rows on the client side, and once the user reaches the last page, we can trigger an AJAX request that gets the next 500 rows using server-side pagination of the same data. I would like to recommend the following blog post, which will help you understand how to return paged results from a repository: http://weblogs.asp.net/gunnarpeipman/archive/2010/09/14/returning-paged-results-from-repositories-using-pagedresult-lt-t-gt.aspx

I hope this post was helpful for you. Wait for my next posts ;). Please do let me know your feedback.

Best Regards,
Hajan
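The sortList format and the 500-row compromise from the table sorting post both come down to a little arithmetic. The JDK-only Java sketch below is illustrative and hypothetical (it is not how the jQuery plugins are implemented, and all names in it are made up): fromSortList turns [[columnIndex, direction], ...] pairs into a comparator, page slices one zero-based page out of the rows, and chunkForPage works out which 500-row server-side chunk a given page falls into. It compares cells as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TablePagingSketch {

    /** Maximum number of rows kept on the client, as suggested in the post. */
    public static final int CHUNK_SIZE = 500;

    /** Builds a comparator from tablesorter-style pairs {columnIndex, direction}, 0 = asc, 1 = desc. */
    public static Comparator<String[]> fromSortList(int[][] sortList) {
        Comparator<String[]> result = (a, b) -> 0; // no instructions: keep the current order
        for (int[] instruction : sortList) {
            final int column = instruction[0];
            Comparator<String[]> byColumn = Comparator.comparing(row -> row[column]);
            result = result.thenComparing(instruction[1] == 1 ? byColumn.reversed() : byColumn);
        }
        return result;
    }

    /** Returns the rows of a zero-based page (an empty list past the end). */
    public static List<String[]> page(List<String[]> rows, int pageIndex, int pageSize) {
        int from = Math.min(pageIndex * pageSize, rows.size());
        int to = Math.min(from + pageSize, rows.size());
        return rows.subList(from, to);
    }

    /** Zero-based index of the CHUNK_SIZE-row server chunk holding the page's first row. */
    public static int chunkForPage(int pageIndex, int pageSize) {
        return (pageIndex * pageSize) / CHUNK_SIZE;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>(Arrays.asList(
                new String[] {"Ringer", "Anne"},
                new String[] {"Gringlesby", "Anne"},
                new String[] {"DeFrance", "Michel"}));
        // Like sortList [[1,0],[0,0]]: column 1 ascending, then column 0 ascending
        rows.sort(fromSortList(new int[][] {{1, 0}, {0, 0}}));
        for (String[] row : page(rows, 0, 2)) {
            System.out.println(row[0] + " " + row[1]); // Gringlesby Anne, then Ringer Anne
        }
        System.out.println(chunkForPage(100, 5));      // 1: rows 500..504 are in the second chunk
    }
}
```

When chunkForPage changes value, that is the moment the hybrid approach would request the next block of rows from the server.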

February 14, 2011

by Hajan Selmani


Introduction to iBatis (MyBatis), An alternative to Hibernate and JDBC

I started to write a new article series about iBatis / MyBatis. This is the first article, and it will walk you through what iBatis / MyBatis is and why you should use it. For those who do not know iBatis / MyBatis yet, it is a persistence framework, an alternative to JDBC and Hibernate, available for the Java and .NET platforms. I've been working with it for almost two years, and I am enjoying it!

The first thing you may notice in this and the following articles about iBatis/MyBatis is that I am using both names, iBatis and MyBatis. Why? Until June 2010, iBatis was an Apache Software Foundation project; then the framework founders decided to move it to Google Code, and they renamed it MyBatis. The framework is still the same, though; it just has a different name now.

I gathered some resources, so I am just going to quote them:

What is MyBatis/iBatis?

The MyBatis data mapper framework makes it easier to use a relational database with object-oriented applications. MyBatis couples objects with stored procedures or SQL statements using an XML descriptor. Simplicity is the biggest advantage of the MyBatis data mapper over object relational mapping tools. To use the MyBatis data mapper, you rely on your own objects, XML, and SQL. There is little to learn that you don't already know. With the MyBatis data mapper, you have the full power of both SQL and stored procedures at your fingertips. (www.mybatis.org)

iBatis is based on the idea that there is value in relational databases and SQL, and that it is a good idea to embrace the industrywide investment in SQL. We have experiences whereby the database and even the SQL itself have outlived the application source code, and even multiple versions of the source code. In some cases we have seen that an application was rewritten in a different language, but the SQL and database remained largely unchanged. It is for such reasons that iBatis does not attempt to hide SQL or avoid SQL.
It is a persistence layer framework that instead embraces SQL by making it easier to work with and easier to integrate into modern object-oriented software. These days, there are rumors that databases and SQL threaten our object models, but that does not have to be the case. iBatis can help to ensure that it is not. (iBatis in Action book)

So… what is iBatis?

A JDBC framework: developers write SQL, iBatis executes it using JDBC. No more try/catch/finally/try/catch.
An SQL mapper: automatically maps object properties to prepared statement parameters, automatically maps result sets to objects, and supports getting rid of N+1 queries.
A transaction manager: iBatis will provide transaction management for database operations if no other transaction manager is available, and will use external transaction management (Spring, EJB CMT, etc.) if available.
Great integration with Spring, but it can also be used without Spring (the Spring folks were early supporters of iBatis).

What isn't iBatis?

An ORM: it does not generate SQL, does not have a proprietary query language, does not know about object identity, does not transparently persist objects, and does not build an object cache.

Essentially, iBatis is a very lightweight persistence solution that gives you most of the semantics of an O/R mapping toolkit, without all the drama. In other words, iBatis strives to ease the development of data-driven applications by abstracting the low-level details involved in database communication (loading a database driver, obtaining and managing connections, managing transaction semantics, etc.), as well as providing higher-level ORM capabilities (automated and configurable mapping of objects to SQL calls, data type conversion management, support for static queries as well as dynamic queries based upon an object's state, mapping of complex joins to complex object graphs, etc.). iBatis simply maps JavaBeans to SQL statements using a very simple XML descriptor.
Simplicity is the key advantage of iBatis over other frameworks and object relational mapping tools. (http://www.developersbook.com)

Who is using iBatis/MyBatis?

See the list at this link: http://www.apachebookstore.com/confluence/oss/pages/viewpage.action?pageid=25

I think the biggest case is MySpace, with millions of users. Very nice!

This was just an introduction, so in the next articles I will show how to create an application using iBatis/MyBatis, step by step. Enjoy!

From http://loianegroner.com/2011/02/introduction-to-ibatis-mybatis-an-alternative-to-hibernate-and-jdbc/
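One of the iBatis features quoted in the introduction is mapping complex joins to complex object graphs, which is how the N+1 query problem (one query for the parents plus one more query per parent for its children) is avoided: run a single JOIN and group the flat rows in memory. The JDK-only sketch below is a simplified, hypothetical illustration of that grouping step, not iBatis code; the ORDERS/ITEMS tables and the Row class are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of collapsing one JOIN's flat rows into parent -> children.
public class JoinGroupingSketch {

    /** One flat row of a (hypothetical) ORDERS LEFT JOIN ITEMS query. */
    public static final class Row {
        final int orderId;
        final String itemName; // null when the order has no items (LEFT JOIN)

        public Row(int orderId, String itemName) {
            this.orderId = orderId;
            this.itemName = itemName;
        }
    }

    /** Groups rows by order id, keeping the first-seen order of both orders and items. */
    public static Map<Integer, List<String>> group(List<Row> rows) {
        Map<Integer, List<String>> orders = new LinkedHashMap<>();
        for (Row r : rows) {
            List<String> items = orders.computeIfAbsent(r.orderId, id -> new ArrayList<>());
            if (r.itemName != null) {
                items.add(r.itemName);
            }
        }
        return orders;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(1, "towel"),
                new Row(1, "guide"),
                new Row(2, null));           // order 2 has no items
        System.out.println(group(rows));     // {1=[towel, guide], 2=[]}
    }
}
```

The database does the work once (one round trip), and the mapping layer only has to recognize which rows belong to the same parent.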

February 9, 2011

by Loiane Groner


Spring Data with Redis

The Spring Data project provides a solution for accessing data stored in new, emerging technologies such as NoSQL databases and cloud-based services. When we look into the SpringSource git repository, we see a lot of spring-data sub-projects:

spring-data-commons: common interfaces and utility classes for the other spring-data projects.
spring-data-column: support for column-based databases. It has not started yet, but there will be support for Cassandra and HBase.
spring-data-document: support for document databases. Currently MongoDB and CouchDB are supported.
spring-data-graph: support for graph-based databases. Currently Neo4j is supported.
spring-data-keyvalue: support for key-value databases. Currently Redis and Riak are supported, and Membase will probably be supported in the future.
spring-data-jdbc-ext: JDBC extensions; for example, Oracle RAC connection failover is implemented.
spring-data-jpa: simplifies JPA-based data access layers.

I would like to share with you how you can use Redis. The first step is to download it from the redis.io web page. try.redis-db.com is a useful site where we can run Redis commands. It also provides a step-by-step tutorial that shows all the structures Redis supports (list, set, sorted set and hashes) and some useful commands. A lot of reputable sites use Redis today.

After downloading and unpacking, we should compile Redis (version 2.2; the release candidate is the preferable one to use, since some commands do not work in version 2.0.4):

make
sudo make install

Once we run these commands, we are all set to run the following five commands:

redis-benchmark - benchmarks the Redis server.
redis-check-aof - checks the AOF (append-only file) and can repair it.
redis-check-dump - checks RDB files for unprocessable opcodes.
redis-cli - the Redis client.
redis-server - the Redis server.

We can test the Redis server:

redis-server
[1055] 06 Jan 18:19:15 # Warning: no config file specified, using the default config. In order to specify a config file use 'redis-server /path/to/redis.conf'
[1055] 06 Jan 18:19:15 * Server started, Redis version 2.0.4
[1055] 06 Jan 18:19:15 * The server is now ready to accept connections on port 6379
[1055] 06 Jan 18:19:15 - 0 clients connected (0 slaves), 1074272 bytes in use

and the Redis client:

redis-cli
redis> set my-super-key "my-super-value"
OK

Now we create a simple Java project in order to show how simple the spring-data-redis module really is:

mvn archetype:create -DgroupId=info.pietrowski -DpackageName=info.pietrowski.redis -DartifactId=spring-data-redis -Dpackage=jar

Next we have to add the Spring milestone repository to pom.xml and add spring-data-redis as a dependency. After that, all required dependencies will be fetched.

Next we create a resources folder under the main folder and create application.xml, which will hold all the configuration.

We can configure the JedisConnectionFactory in two different ways:
One - we can provide a JedisShardInfo object in the shardInfo property.
Two - we can provide the host (default localhost), port (default 6379), password (default empty) and timeout (default 2000) properties.

One thing to keep in mind is that the JedisShardInfo object takes precedence and allows setting a weight, but it only supports constructor injection. We can set up the factory to use connection pooling by setting the pooling property to 'true' (the default). See the comments in application.xml for three different ways of configuring it.

Note: two different libraries are supported, Jedis and JRedis. They have very similar names, and so do their connection factories. See the difference:
org.springframework.data.keyvalue.redis.connection.jedis.JedisConnectionFactory
org.springframework.data.keyvalue.redis.connection.jredis.JredisConnectionFactory

As usual in Spring, we configure the template object by providing it with a connection factory. We will perform all the operations through this template object.
By default we only need to provide the connection factory, but there are more properties we can set:

exposeConnection (default false) - whether the real connection or a proxy object is exposed.
keySerializer, hashKeySerializer, valueSerializer, hashValueSerializer (default JdkSerializationRedisSerializer), which delegates serialization to the Java serialization mechanism.
stringSerializer (default StringRedisSerializer), a simple String to byte[] (and back) serializer using UTF-8 encoding.

We are ready to execute some code that works with the Redis instance. Spring Data gives us two ways of interacting with it: the first is calling the execute method with a RedisCallback object; the second is using the *Operations helpers (explained later).

When we use RedisCallback, we have access to the low-level Redis commands; see this list of interfaces (I won't list all the methods here because it is a huge list):

RedisConnection - gathers all Redis commands plus connection management.
RedisCommands - gathers all Redis commands (listed below).
RedisHashCommands - Hash-specific Redis commands.
RedisListCommands - List-specific Redis commands.
RedisSetCommands - Set-specific Redis commands.
RedisStringCommands - key/value-specific Redis commands.
RedisTxCommands - Transaction/Batch-specific Redis commands.
RedisZSetCommands - Sorted Set-specific Redis commands.

Check the RedisCallbackExample class. This is the hard way: the problem is that we have to convert our objects into byte arrays in both directions. The second way is easier. Spring Data provides Operations objects, which give us a much simpler API, and all byte<->object conversion is done by the serializer we set up (or the default one). The higher-level API (you will easily recognize the *Commands equivalents of these *Operations):

HashOperations - Redis hash operations.
ListOperations - Redis list operations.
SetOperations - Redis set operations.
ValueOperations - Redis 'string' operations.
ZSetOperations - Redis sorted set operations.
Most methods take the key as their first parameter, so there is an even better API for multiple operations on the same key:
BoundHashOperations - Redis hash operations for a specific key.
BoundListOperations - Redis list operations for a specific key.
BoundSetOperations - Redis set operations for a specific key.
BoundValueOperations - Redis 'string' operations for a specific key.
BoundZSetOperations - Redis sorted set operations for a specific key.
Check the RedisCallbackExample class to see some easy examples of *Operations usage.
One important thing to mention is that you should use string serializers for keys. Otherwise other clients will have trouble with your keys, because standard Java serialization prepends its own header and type information, and you end up with keys such as:
"\xac\xed\x00\x05t\x00\x05atInt"
"\xac\xed\x00\x05t\x00\nmySuperKey"
"\xac\xed\x00\x05t\x00\bsuperKey"
Up until now we have only looked at the Redis API, but Spring Data offers more. All the cool stuff is in the org.springframework.data.keyvalue.redis.support package and its sub-packages. We have:
RedisAtomicInteger - atomic integer (CAS operation) backed by Redis.
RedisAtomicLong - same as the previous one, for Long.
RedisList - Redis extension for List, Queue, Deque, BlockingDeque and BlockingQueue with two additional methods: List range(start, end) and RedisList trim(start, end).
RedisSet - Redis extension for Set with additional methods: diff, diffAndStore, intersect, intersectAndStore, union, unionAndStore.
RedisZSet - Redis extension for SortedSet. Note that a Comparator is not applicable here, so this interface extends the normal Set and provides methods similar to those of SortedSet.
RedisMap - Redis extension for Map with an additional Long increment(key, delta) method.
Every interface currently has one default implementation. Check application-support.xml for configuration examples and RedisSupportClassesExample for usage examples. There is a lot of useful information in the comments as well.
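The odd key prefixes quoted above come from Java serialization itself: the stream starts with the magic number 0xACED and version 0x0005, a String is tagged with TC_STRING (0x74, printed as 't'), and a two-byte length follows, which is why 'mySuperKey' (10 characters) appears after \x00\n. The sketch below reproduces this with ObjectOutputStream, on the assumption that the default JdkSerializationRedisSerializer writes keys through standard Java serialization. KeyBytes and its redis-cli-style printer are illustrative helpers, not part of any library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustrative helper, not part of spring-data-redis.
class KeyBytes {
    // What standard Java serialization writes for a String key:
    // 0xACED 0x0005 (stream header), 0x74 (TC_STRING), 2-byte length, then the chars.
    static byte[] jdkSerialized(String key) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // What a plain String serializer writes: just the UTF-8 bytes.
    static byte[] utf8(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes roughly the way redis-cli prints them: printable ASCII as-is,
    // a few common escapes, everything else as \xNN (lowercase hex).
    static String redisCliStyle(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xff;
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                case 7:    sb.append("\\a"); break;
                case '\b': sb.append("\\b"); break;
                default:
                    if (c >= 0x20 && c < 0x7f) {
                        sb.append((char) c);
                    } else {
                        sb.append(String.format("\\x%02x", c));
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(redisCliStyle(jdkSerialized("mySuperKey"))); // \xac\xed\x00\x05t\x00\nmySuperKey
        System.out.println(redisCliStyle(utf8("mySuperKey")));          // mySuperKey
    }
}
```

Only the UTF-8 form is readable by redis-cli or a non-Java client, which is the practical argument for a string serializer on keys.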
Summary
The library is a first milestone release, so there are minor bugs, the documentation isn't as polished as we are used to, and the current version requires a Redis server version that is not yet stable. But this is definitely a great library, which lets us use all this cool NoSQL stuff in the "standard" Spring Data Access manner. Awesome job!
This post is only useful if you check out the code from bitbucket; for the lazy ones, here is a spring-data-redis zip file as well.
This post is originally from http://pietrowski.info/2011/01/spring-data-redis-tutorial/

February 3, 2011

by Sebastian Pietrowski


Apache Solr: Get Started, Get Excited!

We've all seen them on various websites: crappy search utilities. They are a constant reminder that search is not something you should take lightly when building a website or application. Search is not just Google's game anymore. When a Java library called Lucene was introduced into the Apache ecosystem, and then Solr was built on top of it, open source developers began to wield some serious power when it came to customizing search features. In this article you'll be introduced to Apache Solr and a wealth of applications that have been built with it. The content is divided as follows:
1. Introduction
2. Setup Solr
3. Applications
4. Summary
1. Introduction
Apache Solr is an open source search server. It is based on the full text search engine Apache Lucene, so basically Solr is an HTTP wrapper around an inverted index provided by Lucene. An inverted index can be seen as a list of words where each word entry links to the documents that contain it. That way, getting all documents for the search query "dzone" is a simple 'get' operation. One advantage of Solr in enterprise projects is that you don't need to write any Java code, although Java itself has to be installed. If you are unsure when to use Solr and when to use Lucene, these answers could help. If you need to build your Solr index from websites, you should take a look at the open source crawler Apache Nutch before creating your own solution. To be convinced that Solr is actually used in a lot of enterprise projects, take a look at this amazing list of public projects powered by Solr. If you run into problems, the mailing list or Stack Overflow will help you. To make the introduction complete, I would like to mention my personal link list and the resources page, which lists books, articles and more interesting material.
2. Setup Solr
2.1.
Installation
As the very first step, you should follow the official tutorial, which covers the basic aspects of any search use case:
Indexing - get data of any form into Solr. Examples: JSON, XML, CSV and an SQL database. This step creates the inverted index, i.e. it links every term to its documents.
Querying - ask Solr to return the most relevant documents for the user's query.
To follow the official tutorial you'll have to download Java and the latest version of Solr here. More information about installation is available in the official description. Next you'll have to decide which web server to use for Solr. The official tutorial uses Jetty, but you can also use Tomcat. If you choose Tomcat, be sure to set the UTF-8 encoding in server.xml. I would also research the different versions of Solr, which can be quite confusing for beginners:
The current stable version is 1.4.1. Use it if you need stable search and don't need any of the latest features.
The next stable version of Solr will be 3.x.
Versions 1.5 and 2.x will be skipped so that Solr's version numbers match Lucene's.
Version 4.x is the latest development branch. Solr 4.x handles advanced features like language detection via Tika, spatial search, result grouping (group by field / collapsing), a new "user-facing" query parser (the edismax handler), near real time indexing, huge fuzzy search performance improvements, an SQL join-like feature and more.
2.2. Indexing
If you've followed the official tutorial, you have pushed some XML files into the Solr index. This process is called indexing or feeding. There are many more ways to get data into Solr:
The Data Import Handler (DIH) is a really powerful, language-neutral option. It allows you to read from an SQL database, CSV and XML files, RSS feeds, emails, etc. without any Java knowledge. DIH handles full-imports and delta-imports.
Delta-imports are what you need when only a small number of documents were added, updated or deleted.
The HTTP interface is used by the post tool, which you already used in the official tutorial to index XML files.
Client libraries in different languages also exist (e.g. for Java (SolrJ) or Python).
Before indexing you'll have to decide which data fields should be searchable and how the fields should be indexed. For example, when you have a field containing HTML, you can strip irrelevant characters, tokenize the text into 'searchable terms', lowercase the terms and finally stem them. In contrast, if you have a field with text that should not be interpreted (e.g. URLs), you shouldn't tokenize it and should use the string field type. Please refer to the official documentation about field and field type definitions in the schema.xml file.
When designing an index, keep in mind the advice from Mauricio: "the document is what you will search for." For example, if you have tweets and want to search for similar users, you'll need to set up a user index, created from the tweets; then every document is a user. If you want to search for tweets, set up a tweet index; then every document is a tweet. Of course, you can set up both indices with Solr's multi-index options. Please also note that there is a project called Solr Cell, which lets you extract the relevant information from several different document types with the help of Tika.
2.3. Querying
For debugging it is very convenient to query Solr through the HTTP interface in a browser and get back XML. Firefox displays the XML nicely. You can also use the Velocity contribution, a cross-browser tool, which is covered in more detail in the section about 'search application prototyping'. To query the index you can use the dismax handler or the standard query handler.
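Both handlers are reached over plain HTTP: a query is a GET request whose parameters (such as q, fq, sort and, for dismax, qf) must be URL-encoded. The sketch below assembles such a URL in Java. The class name SolrQueryUrl is hypothetical, the base URL http://localhost:8983/solr/select assumes the example Jetty setup on its default port, and selecting dismax via defType=dismax is one option among others.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that builds a Solr select URL. Parameters may repeat
// (e.g. several fq filters), so they are kept as an ordered list, not a map.
class SolrQueryUrl {
    private final String selectUrl; // assumed, e.g. "http://localhost:8983/solr/select"
    private final List<Map.Entry<String, String>> params = new ArrayList<>();

    SolrQueryUrl(String selectUrl) {
        this.selectUrl = selectUrl;
    }

    SolrQueryUrl param(String name, String value) {
        params.add(Map.entry(name, value));
        return this;
    }

    String build() {
        StringJoiner query = new StringJoiner("&", selectUrl + "?", "");
        for (Map.Entry<String, String> p : params) {
            query.add(encode(p.getKey()) + "=" + encode(p.getValue()));
        }
        return query.toString();
    }

    // Form encoding: ':' becomes %3A, '^' becomes %5E, a space becomes '+'
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr/select";
        // Filter and sort with the standard handler
        System.out.println(new SolrQueryUrl(base)
                .param("q", "superman").param("fq", "type:book").param("sort", "price asc")
                .build()); // http://localhost:8983/solr/select?q=superman&fq=type%3Abook&sort=price+asc
        // Boost the title field with dismax
        System.out.println(new SolrQueryUrl(base)
                .param("defType", "dismax").param("q", "superman").param("qf", "title^2 subject")
                .build()); // http://localhost:8983/solr/select?defType=dismax&q=superman&qf=title%5E2+subject
    }
}
```

Encoding each value separately matters: an unencoded space or '^' in a boost expression would otherwise corrupt the request.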
You can filter and sort the results:
q=superman&fq=type:book&sort=price asc
You can also do a lot more; one other concept is boosting. In Solr you can boost both while indexing and while querying. To prefer the terms in the title, write:
q=title:superman^2 subject:superman
When using the dismax request handler, write:
q=superman&qf=title^2 subject
Check out all the various query options like fuzzy search, spellcheck query input, facets, collapsing and suffix query support.
3. Applications
Now I will list some interesting use cases for Solr, in no particular order, to show how powerful and flexible this open source search server is.
3.1. Drupal integration
The Drupal integration can be seen as a generic use case for integrating Solr into PHP projects. For the PHP integration you can either use the HTTP interface for querying and retrieving XML or JSON, or use the PHP Solr client library. Here is a screenshot of a typical faceted search in Drupal:
For more information about faceted search, look into the Solr wiki. More PHP projects that integrate Solr:
open source Typo3 - Solr module
Magento Enterprise - Solr module. The open source integration is outdated.
Oxid - Solr module. No open source integration available.
3.2. Hathi Trust
The Hathi Trust project is a nice example that proves Solr's ability to search big digital libraries. To quote directly from the article: "... the index for our one million book index is over 200 gigabytes ... so we expect to end up with a two terabyte index for 10 million books"
Other examples of libraries:
VuFind - aims to replace OPAC
Internet Archive
National Library of Australia
3.3. Auto suggestions
There are two main approaches to implementing auto-suggestions (also called auto-completion) with Solr: via facets or via NGramFilterFactory. To push it to the extreme, you can use a Lucene index entirely in RAM. This approach is used in a large music shop in Germany. Live examples of auto suggestions:
kaufda.de
3.4.
Spatial search applications
When spatial search is mentioned, people usually have geographically based applications in mind. With Solr, this ordinary use case is attainable. Some examples are:
City search - city guides
Yellow pages
kaufda.de
Spatial search can be useful in many different ways: for bioinformatics, fingerprint search, facial search, etc. (Getting the fingerprint of a document is important for duplicate detection.) The simplest approach is implemented in Jetwick to reduce duplicate tweets, but it yields a performance of O(n), where n is the number of queried terms. This is okay for 10 or fewer terms, but it can get even better: O(1)! The idea is to use a special hash set to get all similar documents. This technique is called locality-sensitive hashing. Read this nice paper about 'near similarity search and plagiarism analysis' for more information.
3.5. DuckDuckGo
DuckDuckGo is built with open source software, and its "zero click" information is done with the help of Solr using the dismax query handler. The index for that feature contains 18M documents and has a size of ~12 GB. For this case Gabriel had to tune Solr: "I have two requirements that differ a bit from most sites with respect to Solr: I generally only show one result, with sometimes a couple below if you click on them. Therefore, it was really important that the first result is what people expected. False positives are really bad in 0-click, so I needed a way to not show anything if a match wasn't too relevant. I got around these by a) tweaking dismax and schema and b) adding my own relevancy filter on top that would re-order and not show anything in various situations." All the rest is done with tuned open source products. To quote Gabriel again: "The main results are a hybrid of a lot of things, including external APIs, e.g. Bing, WolframAlpha, Yahoo, my own indexes and negative indexes (spam removal), etc. There are a bunch of different types of data I'm working with.
" Check out the other cool features such as privacy or bang searches.
3.6. Clustering support with Carrot2
Carrot2 is one of the "contributed plugins" of Solr. With Carrot2 you can support clustering: "Clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense." See some research papers regarding clustering here. Here is a visual example of applying clustering to the search "pannous" (our company):
3.7. Near real time search
Solr isn't real time yet, but you can tune it to the point where it becomes near real time, which means that the time a document takes to become searchable after it is indexed (the 'real time latency') is less than 60 seconds, even if you need to update frequently. To make this work, you can set up two indices: one write-only index "W" for the indexer and one read-only index "R" for your application. Index R refers to the same data directory as W, which has to be defined in the solrconfig.xml of R via:
<dataDir>/pathto/indexw/data/</dataDir>
To make sure your users and the R index see the indexed documents of W, you have to trigger an empty commit every 60 seconds:
wget -q "http://localhost:port/solr/update?stream.body=%3Ccommit/%3E" -O /dev/null
Every time such a commit is triggered, a new searcher without any cache entries is created. This can harm performance for visitors who hit the empty cache directly after the commit, but you can fill the cache with static searches with the help of the newSearcher entry in your solrconfig.xml. Additionally, the autowarmCount property needs to be tuned; it controls how many entries from the old searcher's caches are used to warm the new searcher. Also take a look at the article 'scaling lucene and solr', where experts explain in detail what to do with large indices (=> 'sharding') and what to do for high query volume (=> 'replicating').
3.8.
Loggly = full text search in logs
Feeding log files into Solr and searching them in near real time shows that Solr can handle massive amounts of data and query it quickly. I've set up a simple project where I'm doing similar things, but Loggly has done a lot more to make the same task real time and distributed. You'll need to keep the write index as small as possible, otherwise commit time will grow too large. Loggly creates a new Solr index every 5 minutes and includes it when searching, using the distributed capabilities of Solr! They merge the cores to keep the number of indices small, but this is not as simple as it sounds. Watch this video to get some details about their work.
3.9. Solandra = Solr + Cassandra
Solandra combines Solr and the distributed database Cassandra, which was created by Facebook for its inbox search and then open sourced. At the moment Solandra is not intended for production use: there are still some bugs, and the distributed limitations of Solr apply to Solandra too. The developers are working very hard to make Solandra better. Jetwick can now run via Solandra just by changing the solrconfig.xml. Solandra also has the advantages of being real time (no optimize, no commit!) and distributed without any major setup involved. The same is true for SolrCloud.
3.10. Category browsing via facets
Solr provides facets, which make it easy to show the user some useful filter options like those shown in the "Drupal integration" example. As I described earlier, it is even possible to browse through a deep category tree. The main advantage here is that the categories depend on the query, so the user can further filter the search results with the category tree you provide. Here is an example where this feature is implemented for one of the biggest second-hand stores in Germany. A click on 'Schauspieler' shows its sub-items:
Other shops:
game-change
3.11.
Jetwick - open Twitter search
You may have noticed that Twitter is using Lucene under the hood. Twitter has a very extreme use case: over 1,000 tweets per second and over 12,000 queries per second, yet the real-time latency is under 10 seconds! However, the relevancy at that volume is often not that good in my opinion: Twitter search results often contain a lot of duplicates and noise. Reducing this was one reason I created Jetwick in my spare time. I'm mentioning Jetwick here because it makes extreme use of facets, which provide all the filters to the user. Facets are used for the RSS-like feature (saved searches), the various filters like language and retweet count on the left, and to get trending terms and links on the right:
To make Jetwick more scalable I'll need to decide which of the following distribution options to choose:
use SolrCloud with ZooKeeper
use Solandra
move from Solr to ElasticSearch, which is also based on Apache Lucene
Other examples with a lot of facets:
CNET Reviews - product reviews. Electronics reviews, computer reviews & more.
Shopper.com - compare prices and shop for computers, cell phones, digital cameras & more.
Zappos - shoes and clothing.
Manta.com - find companies. Connect with customers.
3.12. Plaxo - online address management
Plaxo.com, which is now owned by Comcast, hosts web addresses for more than 40 million people and offers smart search through the addresses with the help of Solr. Plaxo tries to get the latest 'social' information about your contacts from blog posts, tweets, etc. Plaxo also tries to reduce duplicates.
3.13. Replace FAST or Google Search
Several users report that they have migrated from a commercial search solution like FAST or the Google Search Appliance (GSA) to Solr (or Lucene). The reasons for the migration vary: FAST is dropping Linux support, and Google can cause integration problems.
The main reason for me is that Solr isn't a black box: you can tweak the source code, maintain old versions and fix your bugs more quickly!
3.14. Search application prototyping
With the help of the already integrated Velocity plugin and the Data Import Handler, it is possible to create an application prototype for your search within a few hours. The next version of Solr makes using Velocity easier. The GUI is available via http://localhost:port/solr/browse. If you are a Ruby on Rails user, you can take a look at Flare. To learn more about search application prototyping, check out this video introduction and take a look at these slides.
3.15. Solr as a whitelist
Imagine you are the new Google and you have a lot of different types of data to display, e.g. 'news', 'video', 'music', 'maps', 'shopping' and much more. Some of those types can only be retrieved from legacy systems, and you only want to show the most appropriate types based on your business logic. E.g. a query containing 'new york' should select results from 'maps', but 'new yorker' should prefer results from the 'shopping' type. With Solr you can set up such a whitelist index to help decide which type is more important for the search query. For example, if you get more, or more relevant, results for the 'shopping' type, then you should prefer results from that type. Without the whitelist index, i.e. with all data in separate indices or systems, it would be nearly impossible to compare relevancy. The whitelist index can be used as follows:
1. Query the whitelist index.
2. Decide which data types to display.
3. Query the sub-systems.
4. Display results from the selected types only.
3.16. Future
Solr is also useful for scientific applications, such as DNA search systems.
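A word-oriented index can be made to work on DNA by treating short overlapping fragments (k-mers) as the "words". This is one possible approach, assumed here rather than taken from the article. The hypothetical KmerTokenizer below only produces the terms; an actual Solr setup would still need a matching field type and analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical tokenizer: turns a nucleotide sequence into index terms.
class KmerTokenizer {
    // Split a sequence into overlapping substrings of length k (k-mers).
    static List<String> kmers(String sequence, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        String s = sequence.toUpperCase(Locale.ROOT);
        List<String> terms = new ArrayList<>();
        for (int i = 0; i + k <= s.length(); i++) {
            terms.add(s.substring(i, i + k));
        }
        return terms;
    }

    // Join the k-mers with spaces so a whitespace tokenizer would see one term per k-mer.
    static String asIndexableText(String sequence, int k) {
        return String.join(" ", kmers(sequence, k));
    }

    public static void main(String[] args) {
        System.out.println(kmers("gattaca", 3));           // [GAT, ATT, TTA, TAC, ACA]
        System.out.println(asIndexableText("GATTACA", 4)); // GATT ATTA TTAC TACA
    }
}
```

The choice of k is a trade-off: small k gives many matches with little specificity, large k gives precise but rarer matches.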
I believe Solr can also be used for completely different alphabets, so that you can query nucleotide sequences instead of words to get the matching genes and determine which organism the sequence occurs in, something similar to BLAST. Another idea would be to build a very personalized search: every user could drag and drop their websites of choice and query them afterwards. For example, I often only need Stack Overflow, some wikis and some mailing lists with the expected results, but normal web search engines (Google, Bing, etc.) give me results that are too cluttered. My final idea for a future Solr-based app is a Lucene/Solr implementation of desktop search. Solr's facets would be especially handy to quickly filter different sources (files, folders, bookmarks, man pages, ...). It would be a great way to wade through those extra messy desktops.
4. Summary
The next time you think about a problem, think about Solr! Even if you don't know Java and even if you know nothing about search, Solr should be in your toolbox. Solr doesn't only offer professional full text search; it can also add valuable features to your application. I covered some of them in this article, but I'm sure there are still some exciting possibilities waiting for you!

January 25, 2011

by Peter Karussell


Linqer – a nice tool for SQL to LINQ transition

Almost all .NET developers who have worked on several applications to date are probably familiar with writing SQL queries for specific needs within an application. Before LINQ came on the scene, about 60-70% of my programming day was spent writing code either in the front-end (ASPX, JavaScript, jQuery, HTML/CSS etc.) or in the back-end (C#, VB.NET etc.), and about 30-40% writing SQL queries for specific needs within the application. Now that LINQ is here, I feel that the share of my day spent writing SQL queries has dropped to about 10%. I'm not saying it won't change over time, depending on the technology I use in my projects or on which approach works better, but since I'm writing a lot of LINQ code in my latest projects, I wanted to see if there is a tool that can automatically translate SQL to LINQ, so that I can move many queries into the code as LINQ statements. Linqer is a tool that I have tested over the previous two weeks, and it works pretty well. Even though I'm not using it yet to convert SQL to LINQ code (I did the conversion manually before I discovered that Linqer could have really helped me), I would recommend it to those who are just starting with LINQ and know how to write SQL queries. Let's go through several steps to help you get started faster:
1. Go to the http://www.sqltolinq.com/ website and download the version you want. There is Linqer Version 4.0.1 for .NET 4.0 and Linqer Version 3.5.1 for .NET 3.5.
2. Once you have downloaded the zip file, extract it, launch Linqer4Inst.exe and enter an install location. Linqer.exe will be created in the location you specify.
3. Launch Linqer.exe.
When you run it for the first time, the Linqer Connection Pool will be displayed so that you can create a connection to your existing model. Click the Add button. Right after this, the following window will appear:
#1 – The name of the connection string you are creating
#2 – Click “…” to construct your connection string using the wizard window
#3 – Choose your language, either C# or VB
#4 – Model: LINQ to SQL or LINQ to Entities
Right after you select LINQ to SQL, the options to select the files for the model will be displayed. In our case I will select LINQ to SQL, and here is the current progress. You can select an existing model from your application, or you can generate LINQ to SQL files so that the *.dbml and *.designer.cs files are filled in automatically.
#5 – Finally, you can choose the context name of the model, which will be used when generating the LINQ code.
Once you are done, click OK. You will get back to the parent window, filled with all the needed info; click Close.
Note: You can later add more connections to your Linqer Connections Pool from Tools –> Linqer Connections. In the root folder where Linqer.exe is placed, you now have a Linqer.ini file containing the connection string settings.
OK, now let's go to the interesting part. Let's create a first simple SQL query and try to translate it to a LINQ statement.
SQL Query
select * from authors a where a.city = 'Oakland'
If we add this query to Linqer, here is the result:
The LINQ code is similar to the SQL code and is easy to read, since it's simple. Also notice that the tool generates a class (you can set the class name) with code prepared for use in your project. Perfect!
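For readers coming from Java rather than C#, the closest counterpart to this LINQ query is a Java stream filter. This is not Linqer output: the Author class (reduced to two columns of the pubs authors table) and the sample names in main are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical row type: two columns of the pubs "authors" table.
final class Author {
    final String lastName;
    final String city;

    Author(String lastName, String city) {
        this.lastName = lastName;
        this.city = city;
    }
}

class AuthorQueries {
    // Java-streams analogue of: select * from authors a where a.city = 'Oakland'
    static List<Author> inCity(List<Author> authors, String city) {
        return authors.stream()
                .filter(a -> city.equals(a.city))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample rows
        List<Author> authors = List.of(new Author("Smith", "Oakland"), new Author("Jones", "Berkeley"));
        for (Author a : inCity(authors, "Oakland")) {
            System.out.println(a.lastName); // Smith
        }
    }
}
```

One difference worth knowing: SQL Server's default collations are typically case-insensitive, while String.equals is case-sensitive, so equalsIgnoreCase is the closer match if that matters.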
Now, let's try to translate a query with two joined tables (a little more complex):
SQL Query
select * from employee left join publishers on employee.pub_id = publishers.pub_id where employee.fname like '%a'
The generated LINQ code is:
from employee in db.Employee
join publishers in db.Publishers on employee.Pub_id equals publishers.Pub_id into publishers_join
from publishers in publishers_join.DefaultIfEmpty()
where employee.Fname.EndsWith("a")
select new
{
    employee.Emp_id,
    employee.Fname,
    employee.Minit,
    employee.Lname,
    employee.Job_id,
    employee.Job_lvl,
    employee.Pub_id,
    employee.Hire_date,
    Column1 = publishers.Pub_id,
    Pub_name = publishers.Pub_name,
    City = publishers.City,
    State = publishers.State,
    Country = publishers.Country
}
Notice the where clause: in the SQL query we wrote ... like '%a', and the corresponding LINQ code in C# is ... EndsWith("a"). Excellent!
And the class automatically generated by the tool is:
public class EmployeePubClass
{
    private String _Emp_id;
    private String _Fname;
    private String _Minit;
    private String _Lname;
    private Int16? _Job_id;
    private Byte? _Job_lvl;
    private String _Pub_id;
    private DateTime? _Hire_date;
    private String _Column1;
    private String _Pub_name;
    private String _City;
    private String _State;
    private String _Country;

    public EmployeePubClass(
        String AEmp_id, String AFname, String AMinit, String ALname,
        Int16? AJob_id, Byte? AJob_lvl, String APub_id, DateTime? AHire_date,
        String AColumn1, String APub_name, String ACity, String AState, String ACountry)
    {
        _Emp_id = AEmp_id;
        _Fname = AFname;
        _Minit = AMinit;
        _Lname = ALname;
        _Job_id = AJob_id;
        _Job_lvl = AJob_lvl;
        _Pub_id = APub_id;
        _Hire_date = AHire_date;
        _Column1 = AColumn1;
        _Pub_name = APub_name;
        _City = ACity;
        _State = AState;
        _Country = ACountry;
    }

    public String Emp_id { get { return _Emp_id; } }
    public String Fname { get { return _Fname; } }
    public String Minit { get { return _Minit; } }
    public String Lname { get { return _Lname; } }
    public Int16? Job_id { get { return _Job_id; } }
    public Byte? Job_lvl { get { return _Job_lvl; } }
    public String Pub_id { get { return _Pub_id; } }
    public DateTime? Hire_date { get { return _Hire_date; } }
    public String Column1 { get { return _Column1; } }
    public String Pub_name { get { return _Pub_name; } }
    public String City { get { return _City; } }
    public String State { get { return _State; } }
    public String Country { get { return _Country; } }
}

public class List: List
{
    public List(Pubs db)
    {
        var query =
            from employee in db.Employee
            join publishers in db.Publishers on employee.Pub_id equals publishers.Pub_id into publishers_join
            from publishers in publishers_join.DefaultIfEmpty()
            where employee.Fname.EndsWith("a")
            select new
            {
                employee.Emp_id,
                employee.Fname,
                employee.Minit,
                employee.Lname,
                employee.Job_id,
                employee.Job_lvl,
                employee.Pub_id,
                employee.Hire_date,
                Column1 = publishers.Pub_id,
                Pub_name = publishers.Pub_name,
                City = publishers.City,
                State = publishers.State,
                Country = publishers.Country
            };
        foreach (var r in query)
            Add(new EmployeePubClass(
                r.Emp_id, r.Fname, r.Minit, r.Lname, r.Job_id, r.Job_lvl, r.Pub_id,
                r.Hire_date, r.Column1, r.Pub_name, r.City, r.State, r.Country));
    }
}
Great! We have a ready-to-use class for our application, and we don't need to type all this code ourselves. Besides generating code, you can also use the tool to see the database results:
I like this tool mainly because it's very easy to use, lightweight and does the job in a pretty straightforward way. You can try the tool and send me feedback using the comments in this blog post.

January 24, 2011

by Hajan Selmani


Deterring “ToMany” Relationships in JPA models

This article considers the issues with one-to-many relationships in the JPA model and looks at an alternative strategy that provides more efficient and fine-grained data access, in order to build more robust and lightweight applications and web services. A fairly typical use is to have one entity 'owned' by another in such a way that one entity is said to 'have' many instances of the other. A typical example would be customers and orders:
class Customer { @OneToMany(mappedBy="customer") private Set<Order> orders; }
class Order { @ManyToOne private Customer customer; }
In this trivial example, the order belongs to a customer, and the customer has a set of orders. We don't have a problem with the ManyToOne relationship, especially as it is required in order to map the order back to the customer. When we load an order we will get at most a single reference to a customer. No, our problem is with the value we get from customer.getOrders(), as this set of order entities doesn't really serve any useful purpose and can cause more problems than it solves, for the following reasons:
Dumb Relationship – It will contain every order for this particular customer, when you usually only want the subset of orders that match some criteria. You either have to read them all and manually filter out the ones you don't want (which is what SQL is for), or you end up calling a method to get the specific entities you are interested in anyway.
Unbounded dataset – The number of orders per customer can vary, and you could end up with a customer with thousands of orders. Combined with accidental eager fetching, loading a simple list of 10 people could mean loading thousands of entities.
Unsecured Access – Sometimes we may want to restrict the items visible to the user based on their security rights. By making the collection available as a property controlled by JPA, we lose that ability or have to implement it further down in the application stack.
No Pagination – As with the unbounded dataset, you end up throwing the whole list into the pagination components and letting them sort out what to display. In most cases you should treat each dataset as if it will eventually contain more than 30 records, so you really need to consider pagination early.
Overgrown object graph – When you request an entity, how much of the object graph do you need? How do you know which pieces to initialize so you can avoid LIEs (lazy initialization exceptions)? This is a common issue with JPA, but it becomes even more relevant when you take into account the need to serialize object graphs to XML or JSON. Sometimes you need the relationships and sometimes you don't, depending on the context in which you will use the data.
Rife with pitfalls – Who saves and cascades what, and how do you bind one to the other? You create an order and assign the customer; do you then need to add it to the customer's list of orders or not? What happens if you forget to add it to the customer and you save the customer? Whatever strategy you pick for dealing with this will no doubt end up being implemented inconsistently.
(OK, the first four are really different facets of the same problem: you can't control the data you are getting back.)
So what use are they? Well, they make it really tempting to just use customer.orders in the view, which is suitable for some sets of data. They also allow the relationship to be used in EJBQL statements, although the inverse of the relationship can be used in most cases. Specifying this relationship can also let you cascade updates/deletes from the customer to the order, but then so can your database.
Going Granular
The best alternative I've found is to provide additional methods to obtain the relational information separately from the model. This more granular approach gives you plenty of ways of obtaining data from the database without the dangers and temptations of bad practices.
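A minimal in-memory sketch of such a granular data-access layer, assuming plain Java objects in place of JPA entities: Customer has no orders collection, Order keeps its customer reference, and a hypothetical OrderDao offers several ways to fetch a customer's orders. For a self-contained example it uses long and LocalDate, and a paging overload keyed by customer id rather than any criteria object; a JPA-backed version would issue queries and page with setFirstResult/setMaxResults instead of filtering a list.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java stand-ins for the entities (JPA annotations omitted).
// Note that Customer deliberately has no orders collection.
class Customer {
    final long id;
    Customer(long id) { this.id = id; }
}

class Order {
    final long id;
    final Customer customer; // the ManyToOne side is kept
    final LocalDate placedOn;

    Order(long id, Customer customer, LocalDate placedOn) {
        this.id = id;
        this.customer = customer;
        this.placedOn = placedOn;
    }
}

// Hypothetical data-access class holding orders in memory for the sketch.
class OrderDao {
    private final List<Order> orders = new ArrayList<>();

    void save(Order order) { orders.add(order); }

    // All orders for one customer, oldest first
    List<Order> getOrders(long customerId) {
        return orders.stream()
                .filter(o -> o.customer.id == customerId)
                .sorted(Comparator.comparing((Order o) -> o.placedOn))
                .collect(Collectors.toList());
    }

    // Orders placed within [startDate, endDate], both ends inclusive
    List<Order> getOrders(long customerId, LocalDate startDate, LocalDate endDate) {
        return getOrders(customerId).stream()
                .filter(o -> !o.placedOn.isBefore(startDate) && !o.placedOn.isAfter(endDate))
                .collect(Collectors.toList());
    }

    // One page of a customer's orders
    List<Order> getOrders(long customerId, int firstResult, int pageSize) {
        return getOrders(customerId).stream()
                .skip(firstResult)
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Customer alice = new Customer(1);
        OrderDao dao = new OrderDao();
        dao.save(new Order(10, alice, LocalDate.of(2011, 1, 5)));
        dao.save(new Order(12, alice, LocalDate.of(2011, 1, 1)));
        dao.save(new Order(13, alice, LocalDate.of(2011, 2, 1)));
        // Second page of size 2: only order 13 remains
        for (Order o : dao.getOrders(1L, 2, 2)) {
            System.out.println(o.id); // 13
        }
    }
}
```

Because each caller picks the method matching its need, the size, filtering and paging of the result are decided at the call site rather than by a mapping annotation.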
For example, the Order object still has the Customer reference on it, which we use to obtain lists of orders from the data access layer; these lists can be constrained by customer, time frame, or other criteria, depending on where they are being used. This also allows data to be fetched when needed, without having to define a single initialization strategy using annotations or mapping files: the code that knows which pieces of data it needs has access to facilities to fetch exactly that data. Alternatively, the methods that fetch the data can be exposed directly as web services, or DTO objects can be used to build a data payload returned from a single web service that consolidates the calls. Either way, you don’t need to set the JPA fetch policy or the XML/JSON serialization policy permanently in the model. Some examples might be methods that fetch orders for a customer in different ways:

public List<Order> getOrders(Long customerId) {...}
public List<Order> getOrders(Long customerId, Date startDate, Date endDate) {...}
public List<Order> getOrders(SearchCriteria searchCriteria, int firstResult, int pageSize) {...}

What about @ManyToMany?

Good question. In most cases, I find that what starts as a many-to-many relationship is better modeled as a separate entity, because when you create a many-to-many relationship, there is usually additional information stored with it. For example, in a ManyToMany relationship between Users and Groups, many users belong to many groups and vice versa. The membership, however, probably also has start and end dates, and maybe a role within the group. This also exhibits one of the earlier problems, in that user.getGroupMemberships() would return all group memberships, past and present, whereas you probably only want the active ones. Modeling the membership as a separate entity replaces the ManyToMany with two OneToMany relationships (user to membership and group to membership).
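The membership modeling described above can be sketched as follows, again assuming Java 16+ records and in-memory data. User, Group, GroupMembership and MembershipQueries are illustrative names, not part of any library. The link between users and groups becomes its own type that carries a role and a validity period, and "which memberships are active?" is answered by a data-layer query instead of a getGroupMemberships() collection.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative entities; in JPA these would be @Entity classes.
record User(long id, String name) {}
record Group(long id, String name) {}

// The former many-to-many link as its own type: one reference to each side
// (the two ManyToOne ends) plus data that belongs to the relationship itself.
record GroupMembership(User user, Group group, String role,
                       LocalDate startDate, LocalDate endDate) {

    // A null endDate means the membership is still open.
    boolean isActiveOn(LocalDate day) {
        return !startDate.isAfter(day) && (endDate == null || !endDate.isBefore(day));
    }
}

// Data-layer query that returns only what the caller needs, instead of
// user.getGroupMemberships() returning past and present memberships alike.
class MembershipQueries {
    static List<GroupMembership> activeMemberships(List<GroupMembership> all,
                                                   User user, LocalDate day) {
        return all.stream()
                .filter(m -> m.user().id() == user.id())
                .filter(m -> m.isActiveOn(day))
                .collect(Collectors.toList());
    }
}

class MembershipDemo {
    public static void main(String[] args) {
        User alice = new User(1, "alice");
        List<GroupMembership> all = List.of(
                new GroupMembership(alice, new Group(1, "admins"), "owner",
                        LocalDate.of(2009, 1, 1), LocalDate.of(2009, 12, 31)),
                new GroupMembership(alice, new Group(2, "devs"), "member",
                        LocalDate.of(2010, 1, 1), null));
        System.out.println(MembershipQueries.activeMemberships(all, alice, LocalDate.of(2011, 1, 1)));
    }
}
```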
While there are cases where the many-to-many relationship is literally just a pair of ids (think blog post tags: many tags to many posts), you could still benefit later from using an entity if you decide to add more information to the relationship. In summary, moving relational fetches out of the data model and into the data layer removes some of the temptation toward bad practices and creates a library of reusable functions for fetching data that can be called from different points in the code. From http://www.andygibson.net/blog/article/deterring-tomany-relationships-in-jpa-models/

January 18, 2011

by Andy Gibson


Are the JPA callback methods useful?

Definitely not. Section 3.5 of the JPA specification states: “In general, the lifecycle method of a portable application should not invoke EntityManager or Query operations, access other entity instances, or modify relationships within the same persistence context. A lifecycle callback method may modify the non-relationship state of the entity on which it is invoked.” Surely these restrictions have a good technical reason behind them, but from a business application developer’s perspective they mean that JPA callback methods are practically useless. For example, these scenarios are typical: In order to remove some entity, we need to verify whether some data exists, and we want to do it using a JPA query. When an entity is saved, some other entities must be automatically created and saved, and we want to use the JPA EntityManager to do so. Unfortunately, it’s difficult to solve such cases using the standard annotations: @PrePersist, @PostPersist, @PreRemove, @PostRemove, @PreUpdate, @PostUpdate or @PostLoad. What can we do? We have several options, such as: Using JDBC from the callback methods: Horror! Creating a new EntityManager in the callback method: This works sometimes, but you can have problems with isolation levels. Moreover, you lose the transactional behavior. Putting the on-save or on-remove logic in the controller layer, that is, in the actions: Of course, this works just fine, but if you access the entities from other actions, from a batch process, or from a web service, the on-save or on-remove logic will not be executed.
Obviously, these options are dirty and unnatural, and, even worse, they mean more work for us.

Create your own callback annotations

In OpenXava, we have opted for the simplest solution for the poor application developer: creating some new callback annotations that allow you to use JPA inside them. OpenXava 4.0.1 includes the following new annotations: @PreCreate, @PostCreate and @PreDelete. For example, if we need to create a customer and assign it to an invoice when no customer is specified, we can write:

@PreCreate
public void onPreCreate() {
    // Automatically create a new customer
    if (getCustomer() == null) {
        Customer cust = new Customer();
        cust.setName(getName());
        cust.setAddress(getAddress());
        cust = XPersistence.getManager().merge(cust); // Here we use the EntityManager
        setCustomer(cust); // and here we change a relationship
    }
}

If you want to enjoy these annotations, just use OpenXava to develop your application. If you are not yet ready for rapid development, you can easily create these annotations yourself: just use the decorator pattern over the EntityManager, or use AOP to refine the behavior of the persist() and remove() methods.
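A minimal sketch of the do-it-yourself route mentioned above: a decorator that runs custom @PreCreate methods before delegating persist(). To keep it runnable without a JPA provider, Store is a hypothetical one-method stand-in for EntityManager, ListStore is a toy in-memory backend, and the @PreCreate annotation here is our own, not OpenXava's. Unlike the OpenXava example, which reaches the manager through XPersistence.getManager(), this sketch assumes the callback receives the store as a parameter.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Our own callback annotation, kept at runtime so reflection can find it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface PreCreate {}

// Hypothetical stand-in for the EntityManager method we want to refine.
interface Store {
    <T> T persist(T entity);
}

// Toy in-memory backend so the sketch runs without a JPA provider.
class ListStore implements Store {
    final List<Object> saved = new ArrayList<>();

    @Override
    public <T> T persist(T entity) {
        saved.add(entity);
        return entity;
    }
}

// Decorator: invokes every @PreCreate method on the entity, then delegates.
// Callbacks receive this store, so they may persist other entities too.
class CallbackStore implements Store {
    private final Store delegate;

    CallbackStore(Store delegate) {
        this.delegate = delegate;
    }

    @Override
    public <T> T persist(T entity) {
        for (Method m : entity.getClass().getDeclaredMethods()) {
            if (m.isAnnotationPresent(PreCreate.class)) {
                try {
                    m.setAccessible(true);
                    m.invoke(entity, this); // assumes signature (Store)
                } catch (IllegalAccessException | InvocationTargetException e) {
                    throw new IllegalStateException("@PreCreate callback failed", e);
                }
            }
        }
        return delegate.persist(entity);
    }
}

class Customer {
    final String name;
    Customer(String name) { this.name = name; }
}

class Invoice {
    final String name;
    Customer customer;

    Invoice(String name) { this.name = name; }

    @PreCreate
    void onPreCreate(Store store) {
        // Create and save a customer when none was assigned.
        if (customer == null) {
            customer = store.persist(new Customer(name));
        }
    }
}

class CallbackDemo {
    public static void main(String[] args) {
        ListStore backing = new ListStore();
        Store store = new CallbackStore(backing);
        store.persist(new Invoice("ACME"));
        System.out.println(backing.saved.size() + " objects saved");
    }
}
```

Because the decorator, not the JPA provider, invokes the callback, the spec's restrictions on lifecycle methods do not apply to it. A fuller version would wrap a real EntityManager and treat remove() the same way.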

January 12, 2011

by Javier Paniza


Interview: Troy Giunipero, Author of NetBeans E-commerce Tutorial

Troy Giunipero is a student at the University of Edinburgh studying toward an MSc in Computer Science. Formerly, he was one of the NetBeans Docs writers based in Prague, Czech Republic, where he spent most of his time writing Java web tutorials. In this interview, Troy introduces you to The NetBeans E-commerce Tutorial. This is a very detailed tutorial describing just about everything you need to know when creating an e-commerce web application in Java. It has received a lot of very positive feedback. Let's find out about the background of this tutorial and what Troy learnt in writing it.

Hi Troy! During your time on the NetBeans team, you wrote a very large tutorial describing how to create an e-commerce site. How and why did you start writing it?

Well, there’s a short answer and a long answer to this. The short answer is that I was lucky to take part in Sun’s SEED (Sun Engineering Enrichment and Development) program. I wanted to focus on technical aspects, so I based my curriculum on developing an e-commerce application using Java technologies. I documented my efforts and applied them toward deliverables for the IDE’s 6.8 and 6.9 releases, resulting in the 13-part NetBeans E-commerce Tutorial.

The long answer is that I had previously been tasked with creating an e-commerce application for my degree project (I was studying toward a BSc in IT and Computing), and ran into loads of trouble trying to integrate the various technologies into a cohesive, functioning application. I was coming from a non-technical background and found there was a steep learning curve involved in web development. My work was fraught with problems, which I can now attribute to poor time management and a lack of good, practical, hands-on learning resources. So in a way, working on the AffableBean project (the project used in the NetBeans E-commerce Tutorial) was a way for me to go back and attempt to do the whole thing right.
With the tutorial, I had two goals in mind: one, I wanted to consolidate my understanding of everything by writing about it, and two, I wanted to help others avoid the problems and pitfalls that I had earlier run into, by designing a piece of documentation that puts everything together.

Can you run us through the basic parts and what they provide?

Certainly. First I want to point out that there’s a live demo application (http://dot.netbeans.org:8080/AffableBean/), which I managed to get up and running with help from Honza Pirek from the NetBeans Web Team (thanks Honza!). The application is modeled on the well-known MVC architecture. The tutorial refers to an MVC architecture diagram at various points, and covers a bunch of different concepts and technologies along the way, including:

Project design and planning (unit 2)
Designing a data model using MySQL Workbench (unit 4)
Forward-engineering the data model into a database schema (unit 4)
Database connectivity (units 3, 4, 6)
Servlet, JSP/EL and JSTL technologies (units 3, 5, 6)
EJB 3 and JPA 2 technologies (unit 7), and transactional support (unit 9)
Session management, i.e., for the shopping cart mechanism (unit 8)
Form validation and data conversion (unit 9)
Multilingual support (unit 10)
Security, i.e., using form-based authentication and encrypted communication (unit 11)
Load testing with JMeter (unit 12)
Monitoring the application with the IDE’s Profiler (unit 12)
Tips for deployment to a production server (units 12, 13)

Also, the tutorial aims to provide ample background information on the whole “Java specifications” concept, with an introduction to the Java Community Process, how final releases include reference implementations, and how these relate to the tutorial application using the IDE’s bundled GlassFish server (units 1, 7). Finally, the tutorial is as much about the above concepts and technologies as it is about learning to make the best use of the IDE.
I really tried to squeeze as much IDE-centric information in there as possible. So for example you’ll find:

An introduction to the IDE’s main windows and their functions (unit 3)
A section dedicated to editor tips and tricks (unit 5), and abundant usage of keyboard shortcuts in steps throughout the tutorial
Use of the debugger (unit 8)
Special “tip boxes” that discuss IDE functionality that is sometimes difficult to fit into conventional documentation. For example, there are tips on using the IDE’s Template Manager (unit 5), GUI support for database tables (unit 6), Javadoc support (unit 8), and support for code templates (unit 9).

Did you learn any new things yourself while writing it?

Yes! Three things immediately come to mind:

EJB 3 technology. Initially this was a big hurdle for me. Using EJB 3 effectively seems to be something of an art form. If you know what you’re doing and understand exactly how to use the EntityManager to handle persistence operations on a database, EJB lets you do some amazingly smart things with just a few lines of code. But there seems to be a lack of good free documentation online, especially since EJB 3 is a significant departure from EJB 2. Therefore, almost all of the tutorial’s information on EJB comes from the excellent book EJB 3 in Action by Debu Panda and Reza Rahman.

Interpreting the NetBeans Profiler. The final hands-on unit, Testing and Profiling, was the most difficult for me to write, primarily because I just wasn’t familiar with the Profiler at all. I spent an unhealthy amount of time just watching the Telemetry graph run against my JMeter test plan, which is only slightly more stimulating than watching water come to a boil. That being said, I feel that by just examining the graphs and other windows over time, critical logical associations start to jump out at you after a while. Likewise with JMeter. Hopefully unit 12 was able to capture and relay some of these.
How to search online for decent articles and learning materials. The old Sun Technical Articles site was a great resource. Many of the links in the See Also sections at the bottom of tutorial units were found by adding site:java.sun.com/developer/technicalArticles/ to a Google search. Also, the official forums (found at forums.sun.com) became a good place for questions I couldn’t find ready answers to. I had both the Java EE 5 and 6 Tutorials bookmarked. And Marty Hall’s Core Servlets and JavaServer Pages became an invaluable resource for the first half of the tutorial.

What are your personal favorite features of the technologies discussed in the tutorial?

I particularly liked learning about session management: using the HttpSession object to carry user-specific data between requests, and working with JSP’s corresponding implicit objects in the front-end pages. Session management is a defining aspect of e-commerce applications, as they need to provide some sort of shopping cart mechanism, and so the Managing Sessions unit (unit 8) was a key chapter in the tutorial. It’s extremely useful to be able to suspend the debugger on a portion of code that includes session-scoped variables, then hover the mouse over a given variable in the editor to determine its current value. I used the debugger continuously during this phase, so I went so far as to incorporate use of the debugger throughout the Managing Sessions unit.

What kind of background does someone starting the tutorial need to have?

Someone can come to the tutorial with little or absolutely no experience using NetBeans. I’ve tried to be particularly careful in providing clear and easy-to-follow instructions in this respect. But one would be best off having some background or knowledge in basic web technologies, and at least some exposure to relational databases.
With this foundation, I think that the topics covered in the second half of the tutorial, like applying entity classes and session beans, language support and security, won’t seem too daunting. I’ve noticed that the vast majority of feedback that comes in relates to the first half of the tutorial, and I sometimes get the impression that people feel they need to follow the tutorial units consecutively. Not so. The units are 90% modular. In other words, if somebody just wants to run through the security unit (unit 11), they can do so by downloading the associated project snapshot, following the setup instructions, and then just following along without needing to even look at other parts of the tutorial.

What will they be able to do at the end of it?

Naturally, anybody who completes individual tutorial units will be able to apply the concepts and technologies to their own work. But anyone who completes the tutorial in its entirety will gain an insight into the development process as a whole, and I think will also get a certain confidence that comes with knowing how “all the pieces fit together”, from gathering customer requirements all the way to deploying the completed app to a production server. They’ll also have gained a solid familiarity with the NetBeans IDE, and be in a good position to explore popular Java web frameworks that work on top of servlet technology or impose an MVC architecture of their own, such as JSF, Spring, Struts, or Wicket.

Do you see any problems in the technologies discussed, and what would be your suggestions for enhancements?

Well, there’s one thing that comes to mind. When I started working on this project, I was studying the Duke’s BookStore example from the Java EE 5 Tutorial. It’s a wonderful example that demonstrates how to progressively implement the same application using various technologies and combinations thereof. So for example, you start out with an all-servlet implementation, then move on to a JSP/servlet version.
Then there’s a JSTL implementation and, ultimately, a version using JavaServer Faces. It’s great learning material, but also terrifically outdated. Right around this time, Sun was gearing up for the big Java EE 6 release (Dec. 2009), and I was also trying to learn about the new upcoming technologies, namely CDI, JSF 2, and EJB 3, for my regular NetBeans documentation work. I was getting the definite sense that JSP and JSTL were slowly being pushed aside; in the case of JavaServer Faces, Facelets templating was the new page authoring technology. So really, the E-commerce Tutorial application has become a sort of EE 5/EE 6 hybrid by combining JSP/JSTL with EJB 3 and JPA 2. Now the problem I see, from the perspective of a student trying to learn this stuff from scratch, is that the leap from basic servlet technology to a full-blown JSF/EJB/JPA solution is tremendous, and cannot readily be taught through a single tutorial. Naturally, others may disagree with me here. I’m not sure if there’s a solution other than to compensate by producing a lot of quality learning material that covers lots of different use cases. I’d suggest that the E-commerce Tutorial puts one in a very advantageous position to begin learning about Java-based frameworks, such as GWT, Spring, and JSF, which is a natural course of action for people looking to get a job with this knowledge.

Planning any more parts to the tutorial, or a new one?

No more parts. The E-commerce Tutorial is done. Upon committing the final installments and changes last November, I rejoiced. However, I’m still actively responding to feedback [the ‘Send Us Your Feedback’ links at the bottom of tutorials] and plan to maintain it indefinitely, so if anyone spots any typos, or has questions, comments, or recommendations for improvement, please write in! :-)

January 9, 2011

by Geertjan Wielenga


A simple and intuitive approach to interface your database with Java

Introduction

In recent years, I have experienced the same developer's need again and again: the need for better persistence support. After many years of experience with Java, I have grown tired of all the solutions that are "standard" and "J2EE compliant" but, in the end, just so complicated. I don't deny that there are many good ideas around that have eventually produced excellent tools, such as Hibernate, JPA/EJB3, iBatis, etc. But all of those tools seem to head in a single direction without ever questioning it: Object-relational Mapping. So you end up using a high-performance database that costs $100k+ in license fees every year, just to abstract it away with a "standard" persistence layer. I wanted to go in a different direction, and take the best of OR-Mapping (code generation, type safety, object-oriented query construction, SQL dialect abstraction, etc.) without denying the fact that, beneath it all, I'm running an RDBMS. That's right. R as in Relational. Read on to see how jOOQ (Java Object Oriented Querying) succeeds in bringing "the relational to the object".

Abstract

Many companies and software projects seem to implement one of the following two approaches to interfacing Java with SQL:

The very basic approach: Using JDBC directly or adding a home-grown abstraction on top of it. There is a lot of manual work associated with the creation, maintenance, and extension of the data layer code base. Developers can easily use the full functionality of the underlying database, but will always operate on a very low level, concatenating Strings all over the place.

The very sophisticated approach: There is a lot of configuration and a steep learning curve associated with the introduction of sophisticated database abstraction layers, such as the ones created by Hibernate, JPA, iBatis, or even plain old EJB entity beans. While the generated objects and APIs may allow for easy manipulation of data, the setup and maintenance of the abstraction layer may become very complex.
Besides, these abstraction layers provide so much abstraction on top of SQL that SQL-experienced developers have to rethink.

A different paradigm

I tried to find a new solution addressing many issues that I think most developers face every day. With jOOQ - Java Object Oriented Querying, I want to embrace the following paradigm:

SQL is a good thing. Many things can be expressed quite nicely in SQL.
The relational data model is a good thing. It should not be abstracted by OR-Mapping.
SQL has a structure and syntax. It should not be expressed using "low-level" String concatenation.
Variable binding tends to be very complex when dealing with major queries.
POJOs (or data transfer objects) are great when writing Java code that manipulates database data.
POJOs are a pain to write and maintain manually. Source code generation is the way to go.
The database comes first. Then the code on top of it.
Yes, you do have stored procedures and user-defined types (UDTs) in your legacy database. Your database tool should support that.

I think that these key ideas are useful for a very specific type of developer. That specific developer:

interfaces Java with huge legacy databases.
knows SQL well and wants to use it extensively.
doesn't want to learn any new language (HQL, JPQL, etc.).
doesn't want to spend one minute fine-tuning some sophisticated XML configuration.
wants little abstraction over SQL, because his software is tightly coupled with his database. Something that I think the guys at Hibernate or JPA seem to have ignored.
needs a strong but lightweight library for database access, for instance to develop for mobile devices.

How does jOOQ fit in this paradigm?

Not only does jOOQ completely address the above paradigm, it does so quite elegantly. Let's say you have a database that models your bookstore, and you need to run a query selecting all books by authors named Paulo who were born after 1920.
You know how to do this in SQL:

-- Select all books by authors born after 1920, named "Paulo", from a catalogue:
SELECT *
  FROM t_author a
  JOIN t_book b ON a.id = b.author_id
 WHERE a.year_of_birth > 1920
   AND a.first_name = 'Paulo'
 ORDER BY b.title

The same query expressed with jOOQ objects:

// Instantiate your factory using a JDBC connection
// and specify the SQL dialect you're using. Of course, you can
// have several factories in your application.
Factory create = new Factory(connection, SQLDialect.MYSQL);

// Create the query using generated, type-safe objects. You could
// write even less code than that with static imports!
SelectQuery q = create.selectQuery();
q.addFrom(TAuthor.T_AUTHOR);
q.addJoin(TBook.T_BOOK, TAuthor.ID, TBook.AUTHOR_ID);

// Note how you do not need to worry about variable binding.
// jOOQ does that for you, dynamically
q.addCompareCondition(TAuthor.YEAR_OF_BIRTH, 1920, Comparator.GREATER);
// The AND operator and EQUALS comparator are implicit here
q.addCompareCondition(TAuthor.FIRST_NAME, "Paulo");
q.addOrderBy(TBook.TITLE);

The jOOQ query object model uses generated classes, such as TAuthor or TBook. Like many other code generation tools, jOOQ generates static final objects for the fields contained in each table. In this case, TAuthor holds a member called TAuthor.T_AUTHOR to represent the table itself, and members such as TAuthor.ID, TAuthor.YEAR_OF_BIRTH, etc. to hold the table's fields. But you could also use the jOOQ DSL API to stay closer to SQL:

// Do it all "on one line".
SelectQuery q = create.select()
    .from(T_AUTHOR)
    .join(T_BOOK).on(TAuthor.ID.equal(TBook.AUTHOR_ID))
    .where(TAuthor.YEAR_OF_BIRTH.greaterThan(1920)
    .and(TAuthor.FIRST_NAME.equal("Paulo")))
    .orderBy(TBook.TITLE).getQuery();

jOOQ ships with a DSL (Domain Specific Language), somewhat similar to LINQ, that facilitates query creation. The strength of the DSL becomes obvious when you are using jOOQ constructs such as the decode function:

// Create a case statement.
// Unfortunately "case" is a reserved word in Java.
// Hence the method is called DECODE, after its related Oracle function
Field<String> nationality = create.decode()
    .when(TAuthor.FIRST_NAME.equal("Paulo"), "brazilian")
    .when(TAuthor.FIRST_NAME.equal("George"), "english")
    .otherwise("unknown"); // "else" is also a reserved word ;-)

The above will render this SQL code:

CASE WHEN T_AUTHOR.FIRST_NAME = 'Paulo' THEN 'brazilian'
     WHEN T_AUTHOR.FIRST_NAME = 'George' THEN 'english'
     ELSE 'unknown'
END

Use the DSL API when:
You want your Java code to look like SQL.
You want your IDE to help you with auto-completion (you will not be able to write select .. order by .. where .. join or any of that stuff).

Use the regular API when:
You want to create your query step by step, creating query parts one by one.
You need to assemble your query from various places, passing the query around, adding new conditions and joins on the way.

In any case, all APIs will construct the same underlying implementation object, and in many cases, you can combine the two approaches.

Once you have established the query, execute it and fetch the results:

// Execute the query and fetch the results
q.execute();
Result<Record> result = q.getResult();

// Result is Iterable, so you can loop over the resulting records like this:
for (Record record : result) {
    // Type safety assured with generics
    String firstName = record.getValue(TAuthor.FIRST_NAME);
    String lastName = record.getValue(TAuthor.LAST_NAME);
    String title = record.getValue(TBook.TITLE);
    Integer publishedIn = record.getValue(TBook.PUBLISHED_IN);

    System.out.println(title + " (published in " + publishedIn + ") by " + firstName + " " + lastName);
}

Or simply write:

for (Record record : q.fetch()) {
    // [...]
}

Fetch data from a single table and use jOOQ as a simple OR-Mapper

// Similar query, but don't join books to authors.
// Note the generic record type that is added to your query:
SimpleSelectQuery<TAuthorRecord> q = create.select(T_AUTHOR)
    .where(TAuthor.YEAR_OF_BIRTH.greaterThan(1920)
    .and(TAuthor.FIRST_NAME.equal("Paulo")))
    .orderBy(TAuthor.LAST_NAME).getQuery();

// When executing this query, Result also holds a generic type:
q.execute();
Result<TAuthorRecord> result = q.getResult();

for (TAuthorRecord record : result) {
    // With generated record classes, you can use generated getters and setters:
    String firstName = record.getFirstName();
    String lastName = record.getLastName();

    System.out.println("Author : " + firstName + " " + lastName + " wrote : ");

    // Use generated foreign key navigation methods
    for (TBookRecord book : record.getTBooks()) {
        System.out.println("  Book : " + book.getTitle());
    }
}

jOOQ not only generates code to model your schema, but also generates domain model classes to represent tuples in your schema. In the above example, you can see how selecting from the TAuthor.T_AUTHOR table produces results containing well-defined TAuthorRecord types. These types hold getters and setters like any POJO, but also some more advanced OR code, such as foreign key navigator methods like:

// Return all books for an author that are obtained through the
// T_AUTHOR.ID = T_BOOK.AUTHOR_ID foreign key relationship
public List<TBookRecord> getTBooks()

Now, for true OR-mapping, you would probably prefer mature and established frameworks such as Hibernate or iBATIS. Don't panic. Better integration with Hibernate and JPA is on the feature roadmap. The goal of jOOQ is not to reimplement things that are already done well, but to bring true SQL to Java.

Execute CRUD operations with jOOQ as an OR-mapper

// Create a new record and insert it into the database
TBookRecord book = create.newRecord(T_BOOK);
book.setTitle("My first book");
book.store();

// Update it with new values
book.setPublishedIn(2010);
book.store();

// Delete it
book.delete();

Nothing new in the OR-mapping world.
These ideas have been around since EJB entity beans or even before, and they are still quite useful for simple purposes.

Execute CRUD operations the way you're used to

You don't need to go into that OR-mapping business. You can create your own INSERT, "INSERT SELECT", UPDATE and DELETE queries. Some examples:

InsertQuery i = create.insertQuery(T_AUTHOR);
i.addValue(TAuthor.FIRST_NAME, "Hermann");
i.addValue(TAuthor.LAST_NAME, "Hesse");
i.execute();

UpdateQuery u = create.updateQuery(T_AUTHOR);
u.addValue(TAuthor.FIRST_NAME, "Hermie");
u.addCompareCondition(TAuthor.LAST_NAME, "Hesse");
u.execute();

// etc...

Now for the advanced stuff

Many tools can do much of what we have seen so far. Hibernate and JPA in particular have a feature called criteria queries, which provides type safety and query-object building using DSLs, based on a solid (but bloated) underlying architecture. An important goal for jOOQ is to provide you with all (or at least most) of the SQL features that you are missing in other frameworks but would like to use, because you think SQL is a great thing but JDBC is too primitive for the year 2010, 2011, or whatever year it is when you're reading this. So jOOQ comes with aliasing, nested selects, unions and many other SQL features. Check out the following sections:

Aliasing

That's a very important feature. How could you have self-joins or IN/EXISTS clauses without aliasing? Let's say we have a "T_TREE" table with fields "ID", "PARENT_ID", and "NAME". If we want to find all parent/child NAME couples, we need to execute a self-join on T_TREE. In SQL, this reads:

SELECT parent.NAME parent_name,
       child.NAME child_name
  FROM T_TREE parent
  JOIN T_TREE child ON (parent.ID = child.PARENT_ID)

No problem for jOOQ.
We'll write:

// Create table aliases
Table<TTreeRecord> parent = TTree.T_TREE.as("parent");
Table<TTreeRecord> child = TTree.T_TREE.as("child");

// Create field aliases from aliased table
Field<String> parentName = parent.getField(TTree.NAME).as("parent_name");
Field<String> childName = child.getField(TTree.NAME).as("child_name");

// Execute the above select
Record record = create.select(parentName, childName)
    .from(parent)
    .join(child).on(parent.getField(TTree.ID).equal(child.getField(TTree.PARENT_ID)))
    .fetchAny();

// The aliased fields can be read from the record as in the simpler examples:
record.getValue(parentName);

Functionally, it is easy to see how this works. Look out for future releases of jOOQ for improvements in the DSL support of field and table aliasing.

IN clause

The org.jooq.Field class provides many methods to construct conditions. In previous examples, we have seen how to create regular compare conditions with = < <= >= > != operators. Field also has a couple of methods to create IN conditions:

// Create IN conditions with constant values that are bound to the
// query via JDBC's '?' bind variable placeholders
Condition in(T... values);
Condition in(Collection<T> values);
Condition notIn(T... values);
Condition notIn(Collection<T> values);

// Create IN conditions with a sub-select
Condition in(QueryProvider query);
Condition notIn(QueryProvider query);

The constant set of values for IN conditions is an obvious feature, but the sub-select is quite nice:

-- Select authors with books that are sold out
SELECT *
  FROM T_AUTHOR
 WHERE T_AUTHOR.ID IN (SELECT DISTINCT T_BOOK.AUTHOR_ID
                         FROM T_BOOK
                        WHERE T_BOOK.STATUS = 'SOLD OUT');

In jOOQ, this translates to:

create.select(T_AUTHOR)
      .where(TAuthor.ID.in(create.selectDistinct(TBook.AUTHOR_ID)
                                 .from(T_BOOK)
                                 .where(TBook.STATUS.equal(TBookStatus.SOLD_OUT))));

EXISTS clause

Very similar statements can be expressed with the EXISTS clause.
The above set of authors could also be obtained with this statement:

-- Select authors with books that are sold out
SELECT *
  FROM T_AUTHOR a
 WHERE EXISTS (SELECT 1
                 FROM T_BOOK
                WHERE T_BOOK.STATUS = 'SOLD OUT'
                  AND T_BOOK.AUTHOR_ID = a.ID);

In jOOQ (as of version 1.5.0), this translates to:

// Alias the author table
Table<TAuthorRecord> a = T_AUTHOR.as("a");

// Use the aliased table in the select statement
create.selectFrom(a)
      .where(create.exists(create.select(create.constant(1))
                                 .from(T_BOOK)
                                 .where(TBook.STATUS.equal(TBookStatus.SOLD_OUT)
                                 .and(TBook.AUTHOR_ID.equal(a.getField(TAuthor.ID))))));

UNION clauses

SQL knows four types of "UNION operators": UNION, UNION ALL, EXCEPT and INTERSECT. All of these operators are supported by all types of select queries. So in order to write things like:

SELECT TITLE FROM T_BOOK WHERE PUBLISHED_IN > 1945
UNION
SELECT TITLE FROM T_BOOK WHERE AUTHOR_ID = 1

you can write the following jOOQ logic:

create.select(TBook.TITLE).from(T_BOOK).where(TBook.PUBLISHED_IN.greaterThan(1945)).union(
    create.select(TBook.TITLE).from(T_BOOK).where(TBook.AUTHOR_ID.equal(1)));

Of course, you can then nest the union query in another one (but be careful to use aliases correctly):

-- alias_38173 is an example of an alias
-- generated by jOOQ for union queries
SELECT alias_38173.TITLE FROM (
  SELECT T_BOOK.TITLE, T_BOOK.AUTHOR_ID FROM T_BOOK WHERE T_BOOK.PUBLISHED_IN > 1945
  UNION
  SELECT T_BOOK.TITLE, T_BOOK.AUTHOR_ID FROM T_BOOK WHERE T_BOOK.AUTHOR_ID = 1
) alias_38173
ORDER BY alias_38173.AUTHOR_ID DESC

In jOOQ:

Select union = create.select(TBook.TITLE, TBook.AUTHOR_ID).from(T_BOOK).where(TBook.PUBLISHED_IN.greaterThan(1945)).union(
    create.select(TBook.TITLE, TBook.AUTHOR_ID).from(T_BOOK).where(TBook.AUTHOR_ID.equal(1)));

create.select(union.getField(TBook.TITLE))
      .from(union)
      .orderBy(union.getField(TBook.AUTHOR_ID).descending());

Note that a UNION query will automatically generate an alias if you use it as a nested table.
In order to nest this query correctly, you need to get the aliased field from the query, as seen in the example above.

Other, non-standard SQL features

See more examples about stored procedures, UDTs, enums, etc. on https://sourceforge.net/apps/trac/jooq/wiki/Examples

Summary

jOOQ brings the relational world to Java without trying to cover up its origins. jOOQ is relational. And object oriented. Just in a different way. Try it for yourself; I would be very glad to hear any feedback you may have. Find jOOQ on http://jooq.sourceforge.net

Cheers
Lukas Eder
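The four set operators listed in the UNION clauses section of the jOOQ article above differ only in which rows they keep. The following is a small plain-Java sketch, not jOOQ code and not how a database actually executes them, that applies each operator's semantics to single-column rows held in lists; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustration of SQL set-operator semantics on single-column rows.
// Hypothetical helper, not part of jOOQ.
class SetOperators {
    // UNION: rows from both inputs, duplicates removed
    static List<String> union(List<String> a, List<String> b) {
        Set<String> out = new LinkedHashSet<>(a);
        out.addAll(b);
        return new ArrayList<>(out);
    }

    // UNION ALL: plain concatenation, duplicates kept
    static List<String> unionAll(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // INTERSECT: distinct rows present in both inputs
    static List<String> intersect(List<String> a, List<String> b) {
        Set<String> out = new LinkedHashSet<>(a);
        out.retainAll(new LinkedHashSet<>(b));
        return new ArrayList<>(out);
    }

    // EXCEPT: distinct rows of the first input that do not appear in the second
    static List<String> except(List<String> a, List<String> b) {
        Set<String> out = new LinkedHashSet<>(a);
        out.removeAll(new LinkedHashSet<>(b));
        return new ArrayList<>(out);
    }
}
```

With a = [A, B, B] and b = [B, C], union gives [A, B, C], unionAll gives [A, B, B, B, C], intersect gives [B] and except gives [A]. Real SQL results have no guaranteed order without ORDER BY; the sketch keeps first-seen order only to make the output easy to read.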

December 14, 2010

by Lukas Eder


Getting MySQL to work with Entity Framework 4.0

Does MySQL work with Entity Framework 4.0? The answer is: yes, it works! I put together an experimental project to play with MySQL and Entity Framework 4.0, and in this post I will show you how to get MySQL data into EF. I will also give some suggestions on how to deploy your applications to hosting and cloud environments.

MySQL stuff

As you may guess, you need MySQL running somewhere. I have MySQL installed on my development machine so I can also develop when I'm offline. The other thing you need is MySQL Connector for .NET Framework. Currently a development version of MySQL Connector/NET, 6.3.5, is available that supports Visual Studio 2010. Before you start, download MySQL and Connector/NET:

- MySQL Community Server
- Connector/NET 6.3.5

If you are not a big fan of phpMyAdmin, you can try out a free desktop client for MySQL, HeidiSQL. I am using it and I am really happy with this program.

NB! If you have just installed MySQL, also create a database with a couple of tables in it. To use all features of Entity Framework 4.0, I suggest using InnoDB or another engine that supports foreign keys.

Connecting MySQL to Entity Framework 4.0

Now create a simple console project using Visual Studio 2010 and go through the following steps.

1. Add a new ADO.NET Entity Data Model to your project

For the model, enter a name that is informative and that you will be able to recognize later. Now you can choose how you want to create your model. Select “Generate from database” and click OK.

2. Set up the database connection

Change the data connection and select MySQL Database as the data source. You may also need to set the provider; there is only one choice, so select it if the data provider combo box shows an empty value. Click OK and enter the connection information you are asked for. Don't forget to click the Test Connection button to see if your connection data is okay. If everything works, click OK.

3. Enter the context name

Now you should see the following dialog.
Enter your data model name for the application configuration file and click OK, then click the Next button.

4. Select tables for the model

Now you can select the tables and views your classes are based on. I have a small database with events data. Uncheck the checkbox “Include foreign key columns in the model”; it is damn annoying to get them out of the model later. Also enter an informative and easy-to-remember name for your model. Click the Finish button.

5. Define your classes

Now it's time to define your classes. Here you can see what Entity Framework generated for you. Relations were detected automatically; that's why we needed foreign keys. The names of the classes and their members are not nice yet. After some modifications, my class model looks like the following diagram. Note that I removed the attendees navigation property from the person class. Now my classes look nice and follow the conventions I use when naming classes and their members.

NB! Don't forget to check the properties of your classes (in the Properties window) and modify their set names if the set names contain numbers (I changed the set name for Entity from Entity1 to Entities).

6. Let's test!

Now let's write a simple test program to see if MySQL data runs through Entity Framework 4.0 as expected. My program looks for events I attended.

using (var context = new MySqlEntities())
{
    var myEvents = from e in context.Events
                   from a in e.Attendees
                   where a.Person.FirstName == "Gunnar" &&
                         a.Person.LastName == "Peipman"
                   select e;

    Console.WriteLine("My events: ");

    foreach (var e in myEvents)
    {
        Console.WriteLine(e.Title);
    }
}

Console.ReadKey();

When I ran it, I got the list of events I attended, and I checked in the database that these results are correct. On the first run the connector seems slow, but this is only a first-run effect: once Entity Framework has loaded the connector into memory, it works fast from then on.
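The two from clauses in the test query above behave like a nested loop: every event is paired with each of its attendees, the pairs are filtered by name, and the event is selected. As an in-memory illustration only (Entity Framework translates the real query to SQL for MySQL), here is the same shape expressed with Java streams; the Person, Attendee and Event records are hypothetical stand-ins for the generated entity classes and need Java 16 or later.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-ins for the generated entity classes
record Person(String firstName, String lastName) {}
record Attendee(Person person) {}
record Event(String title, List<Attendee> attendees) {}

class MyEventsQuery {
    // Same shape as: from e in Events from a in e.Attendees where ... select e
    static List<String> titlesAttendedBy(List<Event> events, String firstName, String lastName) {
        return events.stream()
                // second "from": pair each event with each of its attendees
                .flatMap(e -> e.attendees().stream()
                        // "where": keep pairs whose attendee is the wanted person
                        .filter(a -> a.person().firstName().equals(firstName)
                                && a.person().lastName().equals(lastName))
                        // "select e": project each surviving pair back to its event
                        .map(a -> e))
                .map(Event::title)
                .collect(Collectors.toList());
    }
}
```

Because of this flattening, an event would be listed twice if the same person appeared twice among its attendees; the LINQ query with two from clauses behaves the same way, so add Distinct() if your data allows that.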
Now let’s see what we have to do to get our program working in hosting and cloud environments where the MySQL connector is not installed.

Deploying the application to hosting and cloud environments

If your hosting or cloud environment has no MySQL connector installed, you have to provide the MySQL connector assemblies with your project. Add the following assemblies to your project’s bin folder and include them in your project (otherwise they are not packaged by WebDeploy and the Azure tools):

- MySql.Data
- MySql.Data.Entity
- MySql.Web

You can also add references to these assemblies and mark the references as local (Copy Local), so these assemblies are copied to the binary folder of your application. If you have references to these assemblies, you don't have to include them in your project from the bin folder.

Also add the following block to your application configuration file.

...
...

Conclusion

It was not hard to get the MySQL connector installed and MySQL connected to Entity Framework 4.0. To use the full power of Entity Framework, we used the InnoDB engine because it supports foreign keys. It was also easy to query our model. To get our project online, we needed only some easy modifications to our project and configuration files.

December 10, 2010

by Gunnar Peipman


Using Sphinx and Java to Implement Free Text Search

As promised, I am going to provide an article on how we can use Sphinx with Java to perform full-text search. I will begin with an introduction to Sphinx.

Introduction to Sphinx

Databases keep growing; some hold around 100M records and need an external solution to perform full-text search. I have picked Sphinx, an open source full-text search engine distributed under GPL version 2, to perform full-text search on such a huge amount of data. It is a standalone search engine meant to provide fast, size-efficient and relevant full-text search functions to other applications, and it works very well alongside an SQL database. My example will be based on the MySQL database. As we cannot produce millions of records to evaluate the real power of Sphinx, we will use a small amount of data, and I think that should not be a problem.

Here are a few of Sphinx's distinguishing features:

- high indexing speed (up to 10 MB/sec on modern CPUs)
- high search speed (avg query is under 0.1 sec on 2-4 GB text collections)
- high scalability (up to 100 GB of text, up to 100 M documents on a single CPU)
- provides distributed searching capabilities
- provides searching from within MySQL through a pluggable storage engine
- supports boolean, phrase, and word proximity queries
- supports multiple full-text fields per document (up to 32 by default)
- supports multiple additional attributes per document (e.g. groups, timestamps, etc.)
- supports MySQL natively (MyISAM and InnoDB tables are both supported)

The features that made me adopt Sphinx for full-text search are its Java API, which integrates easily with a web application, and its considerably high indexing and searching speed: on average 4-10 MB/sec indexing and 20-30 ms per query on a 5 GB, 3.5M-document collection (Wikipedia).

Sphinx Terms & How It Works

The first principal component of Sphinx is the indexer. It is solely responsible for gathering the data that will be searchable.
From Sphinx's point of view, the data it indexes is a set of structured documents, each of which has the same set of fields. This is biased towards SQL, where each row corresponds to a document and each column to a field. From the data provided, Sphinx builds a special data structure optimized for our queries. This structure is called an index, the process of building an index from data is called indexing, and the component of Sphinx that carries out these tasks is called the indexer. The indexer can be executed either from a regular script or from the command line.

Sphinx documents are equivalent to records in a database:

- A document is a set of text fields and numeric attributes plus a unique ID, similar to a row in a database.
- The set of fields and attributes is constant for an index, similar to a table in a database.
- Fields are searchable with full-text queries.
- Attributes may be used for filtering, sorting and grouping.

searchd is the second principal tool in Sphinx. It is the part of the system that actually handles searches: it functions as a server and is responsible for receiving queries, processing them and returning result sets to the client APIs used by applications. Unlike the indexer, searchd is not designed to be run from a regular script or a command-line call, but instead as a daemon started from init.d (on Unix/Linux-type systems) or as a service (on Windows-type systems). I am going to focus on the Windows environment, so later I will show you how to install Sphinx on Windows as a service.

Finally, search is one of the helper tools in the Sphinx package. Whereas searchd is responsible for searches in a server-type environment, search is aimed at testing the index quickly without building a framework to connect to the server and process its response. We will use it only for testing Sphinx from the command line; as the application requires, the searchd service will be used to query the index pre-built from the MySQL data.
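To make the document, field and attribute vocabulary concrete, here is a deliberately tiny Java sketch of what an index does conceptually: it splits the full-text fields into words, records which document IDs contain each word (an inverted index), and keeps an attribute separately for filtering. This is a hypothetical toy for illustration, not Sphinx's on-disk format, tokenizer or ranking; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model: document = unique ID + full-text fields + one numeric attribute (DOJ).
class ToyIndex {
    // word -> IDs of the documents whose fields contain it, kept sorted
    private final Map<String, SortedSet<Long>> postings = new HashMap<>();
    // attribute storage: document ID -> DOJ value (never tokenized)
    private final Map<Long, Long> dojAttribute = new HashMap<>();

    // "Indexing": tokenize every field and remember where each word occurs
    void addDocument(long id, long doj, String... fields) {
        dojAttribute.put(id, doj);
        for (String field : fields) {
            for (String word : field.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
                if (!word.isEmpty()) {
                    postings.computeIfAbsent(word, w -> new TreeSet<>()).add(id);
                }
            }
        }
    }

    // Full-text lookup on fields, then an attribute range filter [minDoj, maxDoj]
    List<Long> search(String word, long minDoj, long maxDoj) {
        List<Long> result = new ArrayList<>();
        SortedSet<Long> ids = postings.getOrDefault(word.toLowerCase(Locale.ROOT), new TreeSet<>());
        for (long id : ids) {
            long doj = dojAttribute.get(id);
            if (doj >= minDoj && doj <= maxDoj) {
                result.add(id);
            }
        }
        return result;
    }
}
```

Words are only looked up in fields; the DOJ value is stored but never tokenized, mirroring the rule that attributes are for filtering, sorting and grouping rather than full-text matching.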
Installation on Windows

So now we come to installing Sphinx on Windows:

1. Download Sphinx from the official Sphinx download site, http://sphinxsearch.com (I downloaded the Win32 release binaries with MySQL support: sphinx-0.9.9-win32.zip).
2. Unzip the file to some folder. I unzipped it to C:\devel\sphinx-0.9.9-win32 and added the bin directory to the Windows PATH variable.

Well, Sphinx is installed. Nice, simple, easy. Later I will show how to set up indexes and search.

Sample Application

By now I guess the aim of this article is clear to you, so let's move ahead and define our sample application. We all use an address book to search for people by name or e-mail address when we want to address an e-mail message to a specific person, group of people, or distribution list. We also search for people using other basic information, such as e-mail alias, office location and telephone number. I think most people on this planet are quite familiar with this kind of search, so let's use the Outlook address book as our sample database schema. Most of the fields are mapped from Microsoft Outlook; the only additional column is the date of joining, so that we can filter our queries by the employees' joining dates.

The example I am going to put forth uses Sphinx to search for a particular address entry using free-text search, meaning the user is free to type in anything. On our search screen, the DOJ (date of joining) search parameter is optional. The screen is self-explanatory, so let's move ahead and define our database.
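The optional DOJ parameter on the search screen needs one conversion step. The Sphinx configuration in this article stores DOJ as a timestamp attribute built with MySQL's UNIX_TIMESTAMP(), which interprets a DATE as midnight in the session time zone, so a date picked on the screen should be converted the same way before it is used as a range filter (the official Sphinx client APIs expose a SetFilterRange call for this; check its exact signature in the sphinxapi.java shipped with your release). A minimal sketch, assuming you know the MySQL session time zone; DojFilter and its methods are hypothetical names:

```java
import java.time.LocalDate;
import java.time.ZoneId;

// Hypothetical helper: turns the optional DOJ search parameter into
// Unix-timestamp bounds comparable with UNIX_TIMESTAMP(ab.DOJ) values.
class DojFilter {
    // Seconds since the epoch at midnight of the given day in the given zone;
    // assumes this zone matches the MySQL session time zone used by the indexer
    static long toUnixTimestamp(LocalDate day, ZoneId zone) {
        return day.atStartOfDay(zone).toEpochSecond();
    }

    // Inclusive [min, max] bounds covering the whole day; computing max from the
    // next day's start keeps the range correct on daylight-saving transition days
    static long[] wholeDay(LocalDate day, ZoneId zone) {
        long min = toUnixTimestamp(day, zone);
        long max = toUnixTimestamp(day.plusDays(1), zone) - 1;
        return new long[] {min, max};
    }
}
```

Because every DOJ value is indexed as midnight, filtering with the whole-day range and filtering on the exact midnight value select the same rows; the range form simply stays correct if the attribute ever carries a time of day.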
As Sphinx works well with MySQL, and MySQL is also free, let's create our database scripts around a MySQL database (those who wish to install MySQL can download it from http://www.mysql.com). Let's create our sample database 'addressbook':

mysql> create database addressbook;
Query OK, 1 row affected (0.03 sec)

mysql> use addressbook;
Database changed

Note: The fields defined in the following tables are for the purpose of learning only and may not contain the complete set of fields that the Microsoft address book or any similar software may provide.

mysql> CREATE TABLE AddressBook (
  Id int(11) NOT NULL,
  FirstName varchar(30) NOT NULL,
  LastName varchar(30) NOT NULL,
  OfficeId int(11) DEFAULT NULL,
  Title varchar(20) DEFAULT NULL,
  Alias varchar(20) NOT NULL,
  Email varchar(50) NOT NULL,
  DOJ date NOT NULL,
  PhoneNo varchar(20) DEFAULT NULL,
  PRIMARY KEY (Id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

mysql> CREATE TABLE CompanyLocations (
  Id int(11) NOT NULL,
  Location varchar(60) NOT NULL,
  Country varchar(20) NOT NULL,
  PRIMARY KEY (Id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

It's time to put some dummy data into the tables, so let's fill them. Our virtual company 'gogs.it' has six offices across India and Singapore, as defined in the following insert script.

mysql> insert into CompanyLocations (Id, Location, Country) VALUES (1, 'Tower One, Harbour Front, Singapore', 'SG');
insert into CompanyLocations (Id, Location, Country) VALUES (2, 'DLF Phase 3, Gurgaon, India', 'IN');
insert into CompanyLocations (Id, Location, Country) VALUES (3, 'Hiranandani Gardens, Powai, Mumbai, India', 'IN');
insert into CompanyLocations (Id, Location, Country) VALUES (4, 'Hinjwadi, Pune, India', 'IN');
insert into CompanyLocations (Id, Location, Country) VALUES (5, 'Toll Post, Nagrota, Jammu, India', 'IN');
insert into CompanyLocations (Id, Location, Country) VALUES (6, 'Bani (Kathua), India', 'IN');

Now comes the real stuff...
The data Sphinx is going to index, let's populate that as well... wooooo

mysql> INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (1,'Aabheer','Kumar',1,'Mr','u534','[email protected]','2008-9-3', '+911234599990'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (2,'Aadarsh','Gupta',6,'Mr','u668','[email protected]','2007-2-23','+911234599991'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (3,'Aachman','Singh',5,'Mr','u2766','[email protected]','2006-12-18','+911234599992'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (4,'Aadesh','Shrivastav',5,'Mr','u3198','[email protected]','2007-11-23','+911234599993'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (5,'Aadi','manav',1,'Mr','u2686','[email protected]','2010-7-20','+911234599994'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (6,'Aadidev','singh',4,'Mr','u572','[email protected]','2010-8-18','+911234599995'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (7,'Aafreen','sheikh',4,'Smt','u1092','[email protected]','2007-7-11','+911234599996'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (8,'Aakar','Sherpa',5,'Mr','u1420','[email protected]','2009-10-3','+911234599997'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (9,'Aakash','Singh',4,'Mrs','u2884','[email protected]','2008-6-11','+911234599998'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (10,'Aalap','Singhania',4,'Mrs','u609','[email protected]','2010-10-8','+911234599999'); INSERT INTO AddressBook(Id, FirstName,
LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (11,'Aandaleeb','mahajan',1,'Smt','u131','[email protected]','2010-10-21','+911234580001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (12,'Mamata','kumari',5,'Sh','u2519','[email protected]','2009-6-12','+911234580002'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (13,'Mamta','sharma',6,'Smt','u4123','[email protected]','2009-2-8','+911234580003'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (14,'Manali','singh',6,'Mr','u1078','[email protected]','2008-6-14','+911234580004'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (15,'Manda','saxena',1,'Mrs','u196','[email protected]','2010-9-4','+911234580005'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (16,'Salila','shetty',3,'Miss','u157','[email protected]','2009-11-15','+911234580006'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (17,'Salima','happy',3,'Mrs','u3445','[email protected]','2006-7-14','+911234580007'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (18,'Salma','haik',5,'Sh','u4621','[email protected]','2008-6-23','+911234580008'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (19,'Samita','patil',3,'Smt','u3156','[email protected]','2006-6-7','+911234580009'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (20,'Sameena','sheikh',5,'Mrs','u952','[email protected]','2008-8-13','+911234580010'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (21,'Ranita','gupta',5,'Mrs','u2664','[email 
protected]','2008-10-20','+911234580011'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (22,'Ranjana','sharma',1,'Sh','u3085','[email protected]','2010-6-21','+911234580012'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (23,'Ranjini','singh',6,'Mrs','u4200','[email protected]','2007-4-13','+911234580013'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (24,'Ranjita','vyapari',2,'Smt','u1109','[email protected]','2008-1-22','+911234580014'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (25,'Rashi','gupta',6,'Mrs','u3492','[email protected]','2006-2-2','+911234580015'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (26,'Rashmi','sehgal',3,'Mr','u3248','[email protected]','2008-9-9','+911234580016'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (27,'Rashmika','sexy',1,'Mrs','u4599','[email protected]','2009-3-12','+911234580017'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (28,'Rasika','dulari',3,'Smt','u2089','[email protected]','2009-1-24','+911234580018'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (29,'Dilber','lover',6,'Mr','u4241','[email protected]','2007-10-11','+911234580019'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (30,'Dilshad','happy',1,'Mr','u1564','[email protected]','2007-4-8','+911234580020'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (31,'Dipali','lights',5,'Sh','u1127','[email protected]','2006-11-1','+911234580021'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, 
Email, DOJ, PhoneNo) VALUES (32,'Dipika','lamp',1,'Sh','u2271','[email protected]','2010-12-17','+911234580022'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (33,'Dipti','brightness',5,'Smt','u422','[email protected]','2010-9-25','+911234580023'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (34,'Disha','singh',3,'Sh','u4604','[email protected]','2006-5-2','+911234580024'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (35,'Maadhav','Krishna',1,'Miss','u2561','[email protected]','2007-11-6','+911234580025'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (36,'Maagh','month',5,'Miss','u874','[email protected]','2008-5-8','+911234580026'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (37,'Maahir','Skilled',4,'Mr','u3372','[email protected]','2007-8-4','+911234580027'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (38,'Maalolan','Ahobilam',5,'Mrs','u3498','[email protected]','2007-7-9','+911234580028'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (39,'Maandhata','King',1,'Smt','u2089','[email protected]','2009-9-3','+911234580029'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (40,'Maaran','Brave',2,'Miss','u4020','[email protected]','2008-4-5','+9112345606001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (41,'Maari','Rain',2,'Sh','u3593','[email protected]','2007-12-5','+9112345606002'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (42,'Madan','Cupid',4,'Mrs','u795','[email protected]','2007-11-11','+9112345606003'); 
INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (43,'Madangopal','Krishna',3,'Sh','u438','[email protected]','2007-2-19','+9112345606004'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (44,'sahil','gogna',1,'Sh','u2273','[email protected]','2007-10-7','+9112345606005'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (45,'nikhil','gogna',2,'Mr','u1240','[email protected]','2009-9-14','+9112345606006'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (46,'amit','gogna',5,'Sh','u3879','[email protected]','2006-2-8','+9112345606007'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (47,'krishan','gogna',4,'Miss','u3632','[email protected]','2010-9-20','+9112345606008'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (48,'anil','kashyap',4,'Smt','u3939','[email protected]','2010-3-15','+9112345606009'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (49,'sunil','kashyap',5,'Mrs','u3493','[email protected]','2008-3-16','+9112345606010'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (50,'sandy','singh',6,'Mrs','u4691','[email protected]','2009-6-2','+9112345606011'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (51,'vishal','kapoor',3,'Mr','u1087','[email protected]','2010-5-13','+9112345606012'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (52,'bala','ji',5,'Mrs','u4762','[email protected]','2007-8-9','+9112345606013'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES 
(53,'karan','sarin',4,'Miss','u3030','[email protected]','2008-4-8','+9112345606014'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (54,'abhishek','kumar',4,'Miss','u1093','[email protected]','2008-12-21','+9112345605001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (55,'babu','the',1,'Miss','u1055','[email protected]','2008-7-2','+9112345506001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (56,'sandeep','gainda',3,'Miss','u1320','[email protected]','2010-5-14','+9112345606301'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (57,'dheeraj','kumar',3,'Miss','u3685','[email protected]','2007-10-14','+9112345606091'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (58,'dharmendra','chauhan',1,'Smt','u3235','[email protected]','2008-8-1','+9112345806001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (59,'max','alan',3,'Smt','u3465','[email protected]','2009-5-5','+9112345608011'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (60,'hidayat','khan',3,'Smt','u958','[email protected]','2007-11-18','+911234599101'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (61,'himnashu','singh',4,'Miss','u2027','[email protected]','2008-3-2','+911234599102'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (62,'dinesh','kumar',6,'Sh','u3233','[email protected]','2008-5-9','+911234599103'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (63,'toshi','prakash',1,'Mr','u3766','[email protected]','2010-9-17','+911234599104'); INSERT INTO 
AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (64,'niti','puri',3,'Mr','u3575','[email protected]','2009-11-15','+911234599105'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (65,'pawan','tikki',3,'Sh','u3919','[email protected]','2006-3-19','+911234599106'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (66,'gaurav','sharma',2,'Sh','u413','[email protected]','2010-4-2','+911234599107'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (67,'himanshu','verma',2,'Mrs','u4732','[email protected]','2009-3-20','+911234599108'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (68,'priyanshu','verma',3,'Sh','u183','[email protected]','2010-8-12','+911234599109'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (69,'nitika','luthra',2,'Mrs','u4259','[email protected]','2010-7-12','+911234599110'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (70,'neeru','gogna',2,'Sh','u1633','[email protected]','2010-6-23','+91532110000'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (71,'bindu','gupta',1,'Sh','u1859','[email protected]','2006-11-10','+91532110001'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (72,'gurleen','bakshi',5,'Miss','u1423','[email protected]','2007-7-1','+91532110003'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (73,'rahul','gupta',3,'Sh','u1223','[email protected]','2009-8-11','+91532110004'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES 
(74,'jagdish','salgotra',3,'Mr','u12','[email protected]','2008-5-19','+91532110005'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (75,'vikas','sharma',3,'Smt','u465','[email protected]','2006-6-2','+91532110006'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (76,'poonam','mahendra',2,'Sh','u1744','[email protected]','2009-12-2','+91532110007'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (77,'pooja','kulkarni',3,'Mrs','u1903','[email protected]','2008-10-6','+91532110008'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (78,'priya','mahajan',6,'Sh','u4205','[email protected]','2010-8-5','+91532110009'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (79,'manoj','zerger',1,'Mrs','u3369','[email protected]','2009-12-4','+91532110010'); INSERT INTO AddressBook(Id, FirstName, LastName, OfficeId, Title, Alias, Email, DOJ, PhoneNo) VALUES (80,'mohan','master',5,'Mr','u2841','[email protected]','2010-10-7','+91532110011');

Please note that the employee data above is dummy data only: I generated it with a small Java program using random number generators and a file of names, so you may find the titles messed up :(

Next we create a procedure that we will use from Java to fetch the records we just inserted.
DELIMITER $$

DROP PROCEDURE IF EXISTS search_address_book $$

CREATE PROCEDURE search_address_book(IN address_ids VARCHAR(1000))
BEGIN
  DECLARE search_address_query VARCHAR(2000) DEFAULT '';
  -- turn '1,6,7' into '1','6','7' for the IN list; the argument is spliced into
  -- dynamic SQL, so callers must pass only comma-separated numeric ids
  SET address_ids = CONCAT('\'', REPLACE(address_ids, ',', '\',\''), '\'');
  SET search_address_query = CONCAT(search_address_query, ' select ab.Id as Id, ab.FirstName as FName, ab.LastName as LName, cl.Location as Location, ab.Title as Title, ab.Alias as Alias, ab.Email as Email, ab.DOJ as DOJ, ab.PhoneNo as PhoneNo ');
  SET search_address_query = CONCAT(search_address_query, ' from AddressBook ab left join CompanyLocations cl on ab.OfficeId = cl.Id ');
  SET search_address_query = CONCAT(search_address_query, ' where ab.Id IN (', address_ids, ') ');
  SET @statement = search_address_query;
  PREPARE dynquery FROM @statement;
  EXECUTE dynquery;
  DEALLOCATE PREPARE dynquery;
END $$

DELIMITER ;

To get the records for ids 1, 6 and 7, we run the following command:

call search_address_book('1,6,7');

Configuring Sphinx

It turns out that it was not terribly difficult to set up Sphinx, but I had a hard time finding instructions on the web, so I'll post my steps here.
By default, Sphinx looks for a configuration file named 'sphinx.conf' (the distribution ships a commented sample, sphinx.conf.in) that defines the sources, indexes and other settings. Let's create our own file, addressbook.conf, and define a source and an index for our sample application. Because the file has a different name, the commands below pass it explicitly with --config. Read the comments between the lines:

```
#############################################################################
## data source definition
#############################################################################

source addressBookSource
{
    ## SQL settings for 'mysql'
    ##
    type = mysql

    # some straightforward parameters for SQL source types
    sql_host = localhost
    sql_user = root
    sql_pass = root
    sql_db = addressbook
    sql_port = 3306 # optional, default is 3306

    # pre-query, executed before the main fetch query
    sql_query_pre = SET NAMES utf8

    # main document fetch query, integer document ID field MUST be the first selected column
    sql_query = \
        select ab.Id as Id , ab.FirstName as FName, ab.LastName as LName, cl.Location as Location, \
        ab.Title as Title, ab.Alias as Alias, ab.Email as Email, UNIX_TIMESTAMP(ab.DOJ) as DOJ, ab.PhoneNo as PhoneNo \
        from AddressBook ab left join CompanyLocations cl on ab.OfficeId=cl.Id

    sql_attr_timestamp = DOJ

    # document info query, ONLY for CLI search (ie. testing and debugging)
    # optional, default is empty
    # must contain $id macro and must fetch the document by that id
    sql_query_info = SELECT * FROM AddressBook WHERE id=$id
}

#############################################################################
## index definition
#############################################################################

# local index example, this is an index which is stored locally in the filesystem
index addressBookIndex
{
    # document source(s) to index
    source = addressBookSource

    # index files path and file name, without extension, make sure you have this folder
    path = C:\devel\sphinx-0.9.9-win32\data\addressBookIndex

    # document attribute values (docinfo) storage mode
    docinfo = extern

    # memory locking for cached data (.spa and .spi), to prevent swapping
    mlock = 0

    morphology = none

    # make sure this file exists
    exceptions = C:\devel\sphinx-0.9.9-win32\data\exceptions.txt

    enable_star = 1
}

#############################################################################
## indexer settings
#############################################################################

indexer
{
    # memory limit, in bytes, kilobytes (16384K) or megabytes (256M)
    # optional, default is 32M, max is 2047M, recommended is 256M to 1024M
    mem_limit = 32M

    # maximum IO calls per second (for I/O throttling)
    # optional, default is 0 (unlimited)
    #
    # max_iops = 40

    # maximum IO call size, bytes (for I/O throttling)
    # optional, default is 0 (unlimited)
    #
    # max_iosize = 1048576

    # maximum xmlpipe2 field length, bytes
    # optional, default is 2M
    #
    # max_xmlpipe2_field = 4M

    # write buffer size, bytes
    # several (currently up to 4) buffers will be allocated
    # write buffers are allocated in addition to mem_limit
    # optional, default is 1M
    #
    # write_buffer = 1M
}

#############################################################################
## searchd settings
#############################################################################

searchd
{
    # hostname, port, or hostname:port, or /unix/socket/path to listen on
    listen = 9312

    # log file, searchd run info is logged here
    # optional, default is 'searchd.log'
    log = C:\devel\sphinx-0.9.9-win32\data\log\searchd.log

    # query log file, all search queries are logged here
    # optional, default is empty (do not log queries)
    query_log = C:\devel\sphinx-0.9.9-win32\data\log\query.log

    # client read timeout, seconds
    # optional, default is 5
    read_timeout = 5

    # request timeout, seconds
    # optional, default is 5 minutes
    client_timeout = 300

    # maximum amount of children to fork (concurrent searches to run)
    # optional, default is 0 (unlimited)
    max_children = 30

    # PID file, searchd process ID file name
    # mandatory
    pid_file = C:\devel\sphinx-0.9.9-win32\data\log\searchd.pid

    # max amount of matches the daemon ever keeps in RAM, per-index
    # WARNING, THERE'S ALSO PER-QUERY LIMIT, SEE SetLimits() API CALL
    # default is 1000 (just like Google)
    max_matches = 1000

    # seamless rotate, prevents rotate stalls if precaching huge datasets
    # optional, default is 1
    seamless_rotate = 1

    # whether to forcibly preopen all indexes on startup
    # optional, default is 0 (do not preopen)
    preopen_indexes = 0
}

# --eof--
```

Once the configuration is done, it's time to index our SQL data. The command to use is 'indexer', as shown below.

```
C:\devel\sphinx-0.9.9-win32\bin>indexer.exe --all --config C:\devel\sphinx-0.9.9-win32\addressbook.conf
```

CONSOLE:

```
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file 'C:\devel\sphinx-0.9.9-win32\addressbook.conf'...
indexing index 'addressBookIndex'...
collected 80 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 80 docs, 5514 bytes
total 0.057 sec, 96386 bytes/sec, 1398.43 docs/sec
total 2 reads, 0.000 sec, 3.5 kb/call avg, 0.0 msec/call avg
total 7 writes, 0.000 sec, 2.5 kb/call avg, 0.0 msec/call avg
```

Note: As I mentioned earlier, Sphinx creates one document for each row; since we had 80 rows in the database, a total of 80 documents were created.
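The sql_query above converts the joining date with MySQL's UNIX_TIMESTAMP() and declares it with sql_attr_timestamp, so Sphinx stores DOJ as seconds since the Unix epoch (the Java code later in this article divides Date.getTime() by 1000 for the same reason). Any date range used as a filter must use the same unit and the same time zone, and MySQL interprets a DATE passed to UNIX_TIMESTAMP() in the session time zone. The sketch below is a hypothetical helper (the class and method names are made up), uses java.time from Java 8+, which is newer than the JDK this article targets, and assumes the MySQL session runs in UTC.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

/** Hypothetical helper: epoch-second values comparable to a timestamp attribute built with UNIX_TIMESTAMP(). */
public class DojBounds {

    /** Seconds since 1970-01-01T00:00Z for midnight of the given date in the given zone. */
    static long startOfDay(LocalDate date, ZoneId zone) {
        return date.atStartOfDay(zone).toEpochSecond();
    }

    /** Inclusive [min, max] range that covers every second of the given year in the given zone. */
    static long[] yearRange(int year, ZoneId zone) {
        long min = startOfDay(LocalDate.of(year, 1, 1), zone);
        // one second before the next year starts, so both ends are inclusive
        long max = startOfDay(LocalDate.of(year + 1, 1, 1), zone) - 1;
        return new long[] {min, max};
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneOffset.UTC; // assumption: MySQL session time zone is UTC
        System.out.println(startOfDay(LocalDate.of(2009, 3, 20), zone)); // 1237507200
        long[] range = yearRange(2010, zone);
        System.out.println(range[0] + " .. " + range[1]); // 1262304000 .. 1293839999
    }
}
```

Two such numbers are what a SetFilterRange call on DOJ would receive; Sphinx treats both ends of a range filter as inclusive, which is why the upper bound stops one second before the next year.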
The time taken is also very small; believe me, I tried it with half a million rows and it took around 3-4 seconds :) Cool, isn't it?

Once the index is up, let's try to search a few records. The utility command to perform a search is 'search'.

OK, Sphinx maharaj*, please search for the employee whose alias is u4732:

```
C:\devel\sphinx-0.9.9-win32\bin>search.exe --config C:\devel\sphinx-0.9.9-win32\addressbook.conf u4732
```

CONSOLE:

```
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file 'C:\devel\sphinx-0.9.9-win32\addressbook.conf'...
index 'addressBookIndex': query 'u4732 ': returned 1 matches of 1 total in 0.001 sec

displaying matches:
1. document=67, weight=1, doj=Fri Mar 20 00:00:00 2009
    Id=67
    FirstName=himanshu
    LastName=verma
    OfficeId=2
    Title=Mrs
    Alias=u4732
    [email protected]
    DOJ=2009-03-20
    PhoneNo=+911234599108

words:
1. 'u4732': 1 documents, 1 hits
```

As you can see above, this is a unique record for Himanshu.

Note: You see a lot of information in the result because of the following line in our configuration file:

```
sql_query_info = SELECT * FROM AddressBook WHERE id=$id
```

If you want to see fewer columns, change sql_query_info in the configuration file.

Let's try another search. Sphinx maharaj*, please tell me which rows have gurleen or toshi in them:

```
C:\devel\sphinx-0.9.9-win32\bin>search.exe --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --any toshi gurleen
```

CONSOLE:

```
displaying matches:
1. document=63, weight=2, doj=Fri Sep 17 00:00:00 2010
    Id=63
    FirstName=toshi
    LastName=prakash
    OfficeId=1
    Title=Mr
    Alias=u3766
    [email protected]
    DOJ=2010-09-17
    PhoneNo=+911234599104
2. document=72, weight=2, doj=Sun Jul 01 00:00:00 2007
    Id=72
    FirstName=gurleen
    LastName=bakshi
    OfficeId=5
    Title=Miss
    Alias=u1423
    [email protected]
    DOJ=2007-07-01
    PhoneNo=+91532110003
```

Exactly two records were returned, which is what we were expecting.
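The two searches above use different matching modes: by default the search utility requires all query words, while --any accepts a document that contains at least one of them. Conceptually, each word resolves to a set of document IDs, which are then intersected (all) or unioned (any). The toy inverted index below only illustrates that idea and is not how Sphinx is implemented internally; it reuses three documents from the console output above, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index illustrating "match all" versus "match any" (not Sphinx's real implementation). */
public class ToyIndex {

    private final Map<String, Set<Long>> postings = new HashMap<>();

    /** Splits on whitespace, lowercases, and records the document ID under every word. */
    void add(long docId, String text) {
        for (String word : text.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, w -> new TreeSet<>()).add(docId);
            }
        }
    }

    private Set<Long> lookup(String word) {
        return postings.getOrDefault(word.toLowerCase(), new TreeSet<>());
    }

    /** Default mode: every query word must occur, so the ID sets are intersected. */
    Set<Long> matchAll(String query) {
        Set<Long> result = null;
        for (String word : query.trim().split("\\s+")) {
            Set<Long> docs = lookup(word);
            if (result == null) {
                result = new TreeSet<>(docs); // copy, so the index itself is never modified
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    /** --any mode: at least one query word must occur, so the ID sets are unioned. */
    Set<Long> matchAny(String query) {
        Set<Long> result = new TreeSet<>();
        for (String word : query.trim().split("\\s+")) {
            result.addAll(lookup(word));
        }
        return result;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(63, "toshi prakash Mr u3766");
        index.add(67, "himanshu verma Mrs u4732");
        index.add(72, "gurleen bakshi Miss u1423");
        System.out.println(index.matchAny("toshi gurleen")); // [63, 72]
        System.out.println(index.matchAll("toshi gurleen")); // []
        System.out.println(index.matchAll("u4732"));         // [67]
    }
}
```

Without --any, the second search would have returned nothing, because no single record contains both names.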
The following special operators and modifiers can be used in the extended matching mode:

operator OR: nikhil | sahil
operator NOT: hello -sandy (or hello !sandy)
field search operator: @Email [email protected]

For the complete set of search features, I advise you to go through http://sphinxsearch.com/docs/manual-0.9.9.html#searching.

Sphinx as Windows Service

Our main aim is to use Sphinx through its Java API, so let's move in that direction. Before Java can use the true power of Sphinx, we need to run 'searchd' as a Windows service, so that our Java program can connect to this daemon and query the index we just created. The command to install the service is:

```
C:\devel\sphinx-0.9.9-win32\bin>searchd.exe --install --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --servicename SphinxSearch --port 9312
```

CONSOLE:

```
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

Installing service...
Service 'SphinxSearch' installed succesfully.
```

Sphinx is now ready to serve us on port 9312.

Note: If you try to install the service without admin rights, you may get the following error message:

```
C:\devel\sphinx-0.9.9-win32\bin>searchd.exe --install --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --servicename SphinxSearch --port 9312
```

CONSOLE:

```
Installing service...
FATAL: OpenSCManager() failed: code=5, error=Access is denied.
```

Once it is installed, you can start the service with the command below (or from the Services screen: run 'services.msc' from the Windows Run dialog):

```
c:\>sc start SphinxSearch
```

If you ever want to delete the service, use:

```
c:\>sc delete SphinxSearch
```

Let's create an adapter to fetch data from the database.
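A short aside before the adapter code: the extended-mode operators listed above are plain query-string syntax, so a query can be assembled with ordinary string handling, which is also what the SphinxInstance class later in this article does. The helper below is a hypothetical sketch (its names are not part of the Sphinx Java API); it wraps field-limited expressions in parentheses, which extended mode supports for grouping, and it does not escape special characters in user input, which real code would have to handle.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that assembles Sphinx extended-mode query strings from the operators above. */
public class ExtendedQuery {

    /** OR operator, e.g. "nikhil | sahil". */
    static String or(List<String> terms) {
        return String.join(" | ", terms);
    }

    /** NOT operator, e.g. "hello -sandy" ("hello !sandy" is the equivalent spelling). */
    static String andNot(String required, String excluded) {
        return required + " -" + excluded;
    }

    /** Field search operator; the parentheses keep the whole expression inside the field limit. */
    static String inField(String field, String expression) {
        return "@" + field + " (" + expression + ")";
    }

    public static void main(String[] args) {
        System.out.println(or(Arrays.asList("nikhil", "sahil")));                    // nikhil | sahil
        System.out.println(andNot("hello", "sandy"));                               // hello -sandy
        System.out.println(inField("FName", or(Arrays.asList("toshi", "gurleen")))); // @FName (toshi | gurleen)
    }
}
```

Strings like these only work when the client uses an extended match mode, which is why the Java code below sets SPH_MATCH_EXTENDED2 before querying.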
```java
package it.gogs.sphinx.util;

import it.gogs.sphinx.AddressBoook;
import it.gogs.sphinx.exception.AddressBookBizException;
import it.gogs.sphinx.exception.AddressBookTechnicalException;

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

import org.apache.log4j.Logger;

/**
 * Adapter to fetch data from the database.
 *
 * @author Munish Gogna
 *
 */
public class AddressBookAdapter {

    private static Logger logger = Logger.getLogger(AddressBookAdapter.class);

    private AddressBookAdapter() {
        // use in static way..
    }

    private static Connection getConnection() throws AddressBookTechnicalException {
        String userName = "root";
        String password = "root";
        String url = "jdbc:mysql://localhost/addressbook";
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            return DriverManager.getConnection(url, userName, password);
        } catch (Exception e) {
            throw new AddressBookTechnicalException("could not get connection");
        }
    }

    public static List<AddressBoook> getAddressBookList(List<String> addressIds)
            throws AddressBookTechnicalException, AddressBookBizException {
        List<AddressBoook> addressBoookList = new ArrayList<AddressBoook>();
        if (addressIds == null || addressIds.size() == 0) {
            logger.error("AddressIds was null or empty, returning empty list");
            return addressBoookList;
        }
        Connection connection = null;
        CallableStatement callableStatement = null;
        try {
            connection = getConnection();
            callableStatement = connection.prepareCall("{ call search_address_book(?)}");
            callableStatement.setString(1, Utils.toCommaString(addressIds));
            callableStatement.execute();
            ResultSet resultSet = callableStatement.getResultSet();
            prepareResults(resultSet, addressBoookList);
        } catch (SQLException e) {
            logger.error("Problem connecting MYSQL - " + e.getMessage());
            throw new AddressBookTechnicalException(e.getMessage());
        } catch (AddressBookTechnicalException e) {
            logger.error("Problem connecting MYSQL - " + e.getMessage());
            throw e;
        } finally {
            // closing the statement also closes its ResultSet; the connection is closed only here
            if (callableStatement != null) {
                try {
                    callableStatement.close();
                } catch (SQLException e) {
                    logger.error("Problem closing statement - " + e.getMessage());
                }
            }
            if (connection != null) {
                try {
                    connection.close();
                } catch (SQLException e) {
                    logger.error("Problem closing connection - " + e.getMessage());
                }
            }
        }
        return addressBoookList;
    }

    private static void prepareResults(ResultSet resultSet, List<AddressBoook> addressBoookList)
            throws SQLException {
        AddressBoook addressBoook;
        while (resultSet.next()) {
            addressBoook = new AddressBoook();
            addressBoook.setAlias(resultSet.getString("Alias"));
            addressBoook.setEmail(resultSet.getString("Email"));
            addressBoook.setfName(resultSet.getString("FName"));
            addressBoook.setlName(resultSet.getString("LName"));
            addressBoook.setOfficeLocation(resultSet.getString("Location"));
            addressBoook.setPhoneNo(resultSet.getString("PhoneNo"));
            addressBoook.setTitle(resultSet.getString("Title"));
            addressBoook.setDateOfJoining(resultSet.getDate("DOJ"));
            addressBoook.setId(resultSet.getLong("Id"));
            addressBoookList.add(addressBoook);
        }
    }
}
```

Next, we create the SphinxInstance class, which parses the keywords and the date range and gives us the list of IDs that match the search.

```java
package it.gogs.sphinx.util;

import it.gogs.sphinx.DateRange;
import it.gogs.sphinx.SearchCriteria;
import it.gogs.sphinx.api.SphinxClient;
import it.gogs.sphinx.api.SphinxException;
import it.gogs.sphinx.api.SphinxMatch;
import it.gogs.sphinx.api.SphinxResult;
import it.gogs.sphinx.exception.AddressBookBizException;

import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import org.apache.log4j.Logger;

/**
 * Instance that will parse our free text and provide the results.
 *
 * Note: Make sure that 'searchd' is up and running before you use this class
 * @author Munish Gogna
 *
 */
public class SphinxInstance {

    private static final String SPHINX_HOST = "localhost";
    private static final String SPHINX_INDEX = "addressBookIndex";
    private static final int SPHINX_PORT = 9312;
    private static SphinxClient sphinxClient;
    private static Logger logger = Logger.getLogger(SphinxInstance.class);

    static {
        sphinxClient = new SphinxClient(SPHINX_HOST, SPHINX_PORT);
    }

    public static List<String> getAddressBookIds(SearchCriteria criteria)
            throws AddressBookBizException, SphinxException {
        List<String> addressIdsList = new ArrayList<String>();
        if (Utils.isNull(criteria)) {
            logger.error("criteria is null");
            throw new AddressBookBizException("criteria is null");
        }
        if (Utils.isNull(criteria.getKeywords())) {
            logger.error("keyword is a required field");
            throw new AddressBookBizException("keyword is a required field");
        }
        // the client is shared between calls, so drop any date filter left over from a previous search
        sphinxClient.ResetFilters();
        DateRange dateRange = criteria.getDateRage();
        if (!Utils.isNull(dateRange)) {
            if (Utils.isDateRangeValid(dateRange)) {
                // this is to filter results based on joining dates if they are provided
                sphinxClient.SetFilterRange("DOJ", getTimeInSeconds(dateRange.getFromDate()),
                        getTimeInSeconds(dateRange.getToDate()), false);
            } else {
                String message = "fromDate/toDate should not be empty and 'fromDate' should be less than or equal to 'toDate'";
                logger.error(message);
                throw new AddressBookBizException(message);
            }
        }
        sphinxClient.SetMatchMode(SphinxClient.SPH_MATCH_EXTENDED2);
        sphinxClient.SetSortMode(SphinxClient.SPH_SORT_RELEVANCE, "");
        SphinxResult result = sphinxClient.Query(buildSearchQuery(criteria), SPHINX_INDEX,
                "building query for address book search");
        if (result == null) {
            // Query() returns null on failure (e.g. searchd is not running); GetLastError() has the reason
            throw new SphinxException(sphinxClient.GetLastError());
        }
        for (SphinxMatch match : result.matches) {
            addressIdsList.add(String.valueOf(match.docId));
        }
        logger.info("Total record(s):" + addressIdsList.size());
        return addressIdsList;
    }

    private static long getTimeInSeconds(Date time) {
        // Sphinx timestamp attributes hold seconds since the Unix epoch
        return time.getTime() / 1000;
    }

    private static String buildSearchQuery(SearchCriteria criteria) throws AddressBookBizException {
        String[] keywords = criteria.getKeywords().split(" ");
        // joins the non-empty keywords as key1*|*key2*|*...|*keyN*
        StringBuilder searchFor = new StringBuilder();
        for (String key : keywords) {
            if (!Utils.isEmpty(key)) {
                if (searchFor.length() > 0) {
                    searchFor.append("*|*");
                }
                searchFor.append(key);
            }
        }
        if (searchFor.length() == 0) {
            throw new AddressBookBizException("keyword is a required field");
        }
        searchFor.append("*");

        String query = searchFor.toString();
        StringBuilder queryBuilder = new StringBuilder();
        queryBuilder.append("@FName *" + query + " | ");
        queryBuilder.append("@LName *" + query + " | ");
        queryBuilder.append("@Title *" + query + " | ");
        queryBuilder.append("@Location *" + query + " | ");
        queryBuilder.append("@Alias *" + query + " | ");
        queryBuilder.append("@Email *" + query + " | ");
        queryBuilder.append("@PhoneNo *" + query);
        logger.info("Sphinx Query: " + queryBuilder.toString());
        return queryBuilder.toString();
    }
}
```

Here is the interface that I will expose to the outside world (in a future article I will expose this interface as a Web Service):

```java
package it.gogs.sphinx.addressbook;

import it.gogs.sphinx.AddressBoook;
import it.gogs.sphinx.SearchCriteria;
import it.gogs.sphinx.api.SphinxException;
import it.gogs.sphinx.exception.AddressBookBizException;
import it.gogs.sphinx.exception.AddressBookTechnicalException;

import java.util.List;

/**
 *
 * @author Munish Gogna
 *
 */
public interface AddressBook {

    /**
     * Returns the list of AddressBook objects based on search criteria.
     *
     * @param criteria the search keywords and an optional joining-date range
     * @throws AddressBookTechnicalException
     * @throws AddressBookBizException
     * @throws SphinxException
     */
    public List<AddressBoook> getAddressBookList(SearchCriteria criteria)
            throws AddressBookTechnicalException, AddressBookBizException, SphinxException;
}
```

And here is the implementation class for it.
```java
package it.gogs.sphinx.addressbook.impl;

import java.util.List;

import it.gogs.sphinx.AddressBoook;
import it.gogs.sphinx.SearchCriteria;
import it.gogs.sphinx.addressbook.AddressBook;
import it.gogs.sphinx.api.SphinxException;
import it.gogs.sphinx.exception.AddressBookBizException;
import it.gogs.sphinx.exception.AddressBookTechnicalException;
import it.gogs.sphinx.util.AddressBookAdapter;
import it.gogs.sphinx.util.SphinxInstance;

/**
 * Implementation for our Address Book example
 *
 * @author Munish Gogna
 *
 */
public class AddressBookImpl implements AddressBook {

    public List<AddressBoook> getAddressBookList(SearchCriteria criteria)
            throws AddressBookTechnicalException, AddressBookBizException, SphinxException {
        // Sphinx returns the matching document IDs; the database supplies the full records
        List<String> addressIds = SphinxInstance.getAddressBookIds(criteria);
        return AddressBookAdapter.getAddressBookList(addressIds);
    }
}
```

OK, so far so good; let's run some tests now...

```java
package it.gogs.sphinx.test;

import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.List;

import it.gogs.sphinx.AddressBoook;
import it.gogs.sphinx.DateRange;
import it.gogs.sphinx.SearchCriteria;
import it.gogs.sphinx.addressbook.AddressBook;
import it.gogs.sphinx.addressbook.impl.AddressBookImpl;
import it.gogs.sphinx.exception.AddressBookBizException;
import junit.framework.TestCase;

/**
 *
 * @author Munish Gogna
 *
 */
public class AddressBookTest extends TestCase {

    private AddressBook addressBook;

    @Override
    protected void setUp() throws Exception {
        super.setUp();
        addressBook = new AddressBookImpl();
    }

    @Override
    protected void tearDown() throws Exception {
        super.tearDown();
    }

    /** this should be a unique record for Himanshu */
    public void test_search_for_himanshu() throws Exception {
        SearchCriteria criteria = new SearchCriteria();
        // remember the first 'search' example??
        criteria.setKeywords("u4732");
        List<AddressBoook> addressList = addressBook.getAddressBookList(criteria);
        assertEquals(1, addressList.size());
        assertTrue("expecting himanshu here", "himanshu".equals(addressList.get(0).getfName()));
    }

    /** only two employees have name gurleen or toshi */
    public void test_search_for_gurleen_or_toshi() throws Exception {
        SearchCriteria criteria = new SearchCriteria();
        // remember the second 'search' example??
        criteria.setKeywords("gurleen toshi");
        List<AddressBoook> addressList = addressBook.getAddressBookList(criteria);
        assertEquals(2, addressList.size());
        assertTrue("expecting toshi here", "toshi".equals(addressList.get(0).getfName()));
        assertTrue("expecting gurleen here", "gurleen".equals(addressList.get(1).getfName()));
    }

    /** there are 16 people from jammu location */
    public void test_search_for_people_from_jammu_location() throws Exception {
        SearchCriteria criteria = new SearchCriteria();
        criteria.setKeywords("jammu");
        List<AddressBoook> addressList = addressBook.getAddressBookList(criteria);
        assertEquals(16, addressList.size());
    }

    /** only Aalap, Manda and nitika have the title Mrs and joined in 2010 */
    public void test_joined_in_2010_with_title_Mrs() throws Exception {
        DateRange dateRange = new DateRange();
        // from midnight on 1 January 2010 to the last second of 31 December 2010 (JVM default time zone)
        dateRange.setFromDate(new GregorianCalendar(2010, Calendar.JANUARY, 1).getTime());
        dateRange.setToDate(new GregorianCalendar(2010, Calendar.DECEMBER, 31, 23, 59, 59).getTime());

        SearchCriteria criteria = new SearchCriteria();
        criteria.setKeywords("Mrs");
        criteria.setDateRage(dateRange);
        List<AddressBoook> addressList = addressBook.getAddressBookList(criteria);
        assertEquals("expecting 3 records here", 3, addressList.size());
    }

    /** should get a business exception here */
    public void test_without_specifying_keywords() throws Exception {
        SearchCriteria criteria = new SearchCriteria();
        //criteria.setKeywords("Mrs");
        try {
            addressBook.getAddressBookList(criteria);
            fail("expected an AddressBookBizException");
        } catch (AddressBookBizException e) {
            assertTrue(e.getMessage().indexOf("keyword is a required field") > -1);
        }
    }
}
```

How do we update the index once the database changes?

For these kinds of requirements, we can set up two sources and two indexes: one "main" index for the data that changes rarely (if ever), and one "delta" index for new documents. The initial data goes into the "main" index, and newly inserted address book entries go into the "delta" index. The delta index can then be reindexed very frequently, so new documents become searchable within minutes.

One more thing to take from this article: once the 'searchd' daemon is running, we can't index the data the normal way; we have to use the --rotate option. For applications where the data is updated in periodic batches, we can configure a cron job (or a Windows scheduled task) to reindex our documents in Sphinx, as shown below.

```
C:\devel\sphinx-0.9.9-win32\bin>indexer.exe --all --config C:\devel\sphinx-0.9.9-win32\addressbook.conf --rotate
```

Capsule

We asked Sphinx for the document IDs matching our search parameters and then used those IDs to query the database. If the data we want to return is already included in the index (the DOJ attribute, for example, in our case), we can skip the database step, so choose wisely how much information (attributes) you include when indexing your SQL data.

Well, that's all... it's time to say goodbye. Take good care of your health and don't forget to vote, it's a must :)

- Munish Gogna
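The main + delta arrangement described above hinges on one number: the highest document ID included when the main index was last built. Rows above that watermark belong to the delta index, and a search queries both. The Sphinx manual's "delta index updates" section keeps this watermark in a small counter table written by a sql_query_pre statement; the Java sketch below only models the partitioning and the combined lookup in memory, with hypothetical names, so it illustrates the idea rather than showing Sphinx configuration.

```java
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory model of a main + delta split by a max-ID watermark (hypothetical names, not Sphinx code). */
public class MainDeltaModel {

    private final TreeMap<Long, String> table = new TreeMap<>(); // stands in for the AddressBook table
    private long maxIdInMain = 0;                                // stands in for the counter table
    private Set<Long> mainIndex = new TreeSet<>();
    private Set<Long> deltaIndex = new TreeSet<>();

    void insertRow(long id, String text) {
        table.put(id, text);
    }

    /** Rare, expensive rebuild: remember the current max ID, then index every row up to it. */
    void rebuildMain() {
        maxIdInMain = table.isEmpty() ? 0 : table.lastKey();
        mainIndex = new TreeSet<>(table.headMap(maxIdInMain, true).keySet());
    }

    /** Frequent, cheap rebuild: index only the rows added after the last main rebuild. */
    void rebuildDelta() {
        deltaIndex = new TreeSet<>(table.tailMap(maxIdInMain, false).keySet());
    }

    /** A search runs against both indexes, so it sees the union of their documents. */
    Set<Long> searchableIds() {
        Set<Long> ids = new TreeSet<>(mainIndex);
        ids.addAll(deltaIndex);
        return ids;
    }

    int mainSize() { return mainIndex.size(); }

    int deltaSize() { return deltaIndex.size(); }

    public static void main(String[] args) {
        MainDeltaModel model = new MainDeltaModel();
        for (long id = 1; id <= 80; id++) {
            model.insertRow(id, "row " + id);
        }
        model.rebuildMain();                                      // 80 documents, like the indexer run
        model.insertRow(81, "new employee");
        System.out.println(model.searchableIds().contains(81L)); // false: not indexed yet
        model.rebuildDelta();                                     // only row 81 is indexed
        System.out.println(model.searchableIds().contains(81L)); // true
        System.out.println(model.mainSize() + " + " + model.deltaSize()); // 80 + 1
    }
}
```

In a real deployment the delta would be reindexed often with indexer --rotate and the main index only occasionally (for example nightly), after which the next delta rebuild picks up just the rows added since.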

December 7, 2010

by Munish Gogna


Implementing Retries with a MDB or an MQ Batch Job? (WAS 7, MQ 6)

Both approaches have some advantages and disadvantages and so it’s a question of the likelihood of particular problems and business requirements and priorities.

November 10, 2010

by Jakub Holý
