Data Engineering Resources

The Latest Data Engineering Topics

Practical PHP Testing Patterns: Fake Object

The purpose of a Fake Object, a kind of Test Double, is to replace a collaborator with a functional copy. While Mocks prefer a specification of the behavior to check, Fake Objects are really a simplified version of the production object they substitute. A good Fake Object, however, will be very lightweight, easy to create and throw away to ensure test isolation. Usually it won't satisfy some other functional or non-functional requirements, otherwise we would use him instead of the real object. For example a UserRepository, which normally stores users in a database, may be substitued by a Fake that: does not send mails when an user is added. Doesn't really save anything in the database, but keeps a list in an array. When some complex method is called, just throws a NonImplementedException. Utility Of course performance and less brittleness of tests are the main selling points of a Fake Objects: think about testing with a real database and without one; however this is true for all Test Doubles and I don't need to repeat it. Often the Fake is used when the contract between the SUT and the collaborator is too complex to effectively build a Stub or Mock: many methods, or many calls to the same method are made. Return parameters are difficult to setup. A Fake avoids overspecifying a test for a very invasive contract, like the order of calls and precise parameters; a real implementation is more robust to changes in the contract. For example if the Fake is a collection, you can store and retrieve objects of another type, and still not change the Fake; expectations and configured return value instead would change (together). As an example, an embryonal Fake is PHPUnit $this->returnArgument() for the will() Mock expectation. It's a piece of functionality which substitutes an hard-coded expectation, in order to return one of the arguments of the call instead of adding the argument to the expectation itself. Note that many times we don't have to create new code to leverage Fakes: we can use the existing one with a different configuration (like a Cache maximumCachedItems=0, or with an adapter that use an in-memory cache instead of APC or Memcache) or something provided for us, like an sqlite in-memory database created by PDO. NullObjects are sometimes Fake, but are used in production more than in testing. Implementation Several steps are necessary for building an hand-rolled Fake: create the Fake class by hand, and define the simplest implementation that could do the job for this test. Remember, you're not testing the Fake, you're testing the SUT, so the Fake doesn't have to be perfect, but just passable for the test case at end. Instance an object from the Fake class: the constructor may be different from the real collaborator's one as it's not part of the contract. Install the Fake into the SUT, and proceed as normal. The steps for generating a Fake with PHPUnit are a little different; however, you will still have to write its businss logic: create the "mock" as always via getMock() or getMockBuilder(). Define expectations for its methods with will($this->returnCallback()) and some anonymous functions (or via self-shunting like we did for "Stubs"). Often objects passed in this closures (such as ArrayObject instances) can aid in making the Fake methods communicate with each other. Install the Fake and proceed with the test. The second approach is invaluable when you have a single method on the Fake: it's very fast. Variations Fake Database: an alternative implementation of the Database Facade that lets you test classes that depend on the database without actually using it. For exaple, a DAO which internally use an SplObjectStorage instead of the connection. In-Memory Database: sqlite3 in-memory database used with ORMs for tests that isolate you from the real database, but not from using PDO. High Return On Investment. Fake Web Service: mimics the interface of some web service like Google Analytics, when you have to interact also in write and not only in read. Fake Service Layer: simplified implementations of Service Layer classes, which avoid checking concurrency, authorization, authentication and so on in order to simplify functional testing. Example In this code sample, the test target a ForumManager class which needs to manipulate Posts. It's not a simple Facade: it moves Post around, merges threads and so on. So we inject in it a FakePostDao: ForumManager will call it instead of the database. When you I have many tests of this type, the time for writing the Fake implementation is well spent. array( new Post('Hello'), new Post('Hello!'), new Post('') ), 2 => array( new Post('Hi'), new Post('Hi!') ), 3 => array( new Post('Good morning.') ) )); $forumManager = new ForumManager($dao); $forumManager->mergeThreadsByIds(1, 2); $thread = $dao->getThread(1); $this->assertEquals(5, count($thread)); } } /** * The SUT. */ class ForumManager { private $dao; public function __construct(PostsDao $dao) { $this->dao = $dao; } public function mergeThreadsByIds($originalId, $toBeMergedId) { $original = $this->dao->getThread($originalId); $toBeMerged = $this->dao->getThread($toBeMergedId); $newOne = array_merge($original, $toBeMerged); $this->dao->removeThread($originalId); $this->dao->removeThread($toBeMergedId); $this->dao->addThread($originalId, $newOne); } } /** * Interface for the collaborator to substitute with the Test Double. */ interface PostsDao { public function getThread($id); public function removeThread($id); public function addThread($id, array $thread); } /** * Fake implementation. */ class FakePostDao implements PostsDao { private $threads; public function __construct(array $initialState) { $this->threads = $initialState; } public function getThread($id) { return $this->threads[$id]; } public function removeThread($id) { unset($this->threads[$id]); } /** * We model Thread as array of Posts for simplicity. */ public function addThread($id, array $thread) { $this->threads[$id] = $thread; } } /** * Again a Dummy object: minimal implementation, to make this test pass. */ class Post { }

March 16, 2011

by Giorgio Sironi

· 4,037 Views

SQL Server Driver for PHP Connection Options: Encrypt

This short post adds to my series on connection options in the SQL Server Driver for PHP. I’ll go into a bit more detail on the Encrypt and TrustServerCertificate options than the driver documentation does. I’ll start with three important points related to these options, then I’ll go into a couple of hypothetical situations that should shed more light on what these options actually do. The first thing to note is that these two options, Encrypt and TrustServerCertificate, are often used together. The Encrypt option is used to specify whether or not the connection to the server is encrypted (the default is false). The TrustServerCertificate option is used to indicate whether the certificate on the SQL Server instance should be trusted (the default is false). Note: Setting the Encrypt option does not mean that data is stored in an encrypted form. It simply means that data is encrypted while in transport to the server. Also note that this setting does not apply to authentication credentials such as a SQL Server password – authentication credentials are always encrypted. For information about storing encrypted data, see Encryption Hierarchy in the SQL Server documentation. The second thing to note is that, by default, when SQL Server is installed it creates a self-signed certificate that it will use to encrypt connections. Of course, self-signed certificates are not ideal for secure connections (they are vulnerable to man-in-the-middle attacks), so it is best to replace this certificate with one from a certificate authority (CA). The third thing to know is how to use these options when connecting to SQL Server. If you are using the SQLSRV API, your connection code might look something like this: $serverName = "serverName"; $connectionInfo = array( "Database"=>"DbName", "UID"=>"UserName", "PWD"=>"Password", "Encrypt"=>true, "TrustServerCertificate"=>false); $conn = sqlsrv_connect( $serverName, $connectionInfo); If you are using the PDO_SQLSRV API, your connection code might look something like this: $serverName = "serverName"; $conn = new PDO("sqlsrv:Server = $serverName; Database = DBName; Encrypt = true; TrustServerCertificate = false", "UserName", "Password"); Now, with those three things in mind, let’s look at a couple of examples. First suppose the following: You did not replace the self-signed certificate created when SQL Server was installed. You set Encrypt = true. You set TrustServerCertificate = false. In this scenario, your connection will fail. When you set TrustServerCertificate = false, you are asking for some 3rd party verification of the certificate. Because this is a self-signed certificate, there is no 3rd party to do the verification. However, if you set TrustServerCertificate = true, then your connection will succeed because you are trusting the certificate. (Note that, as mentioned earlier, this connection would be vulnerable to man-in-the-middle attacks.) Now consider the following: You replaced the self-signed certificate created when SQL Server was installed with a certificate from a certificate authority (CA). You set Encrypt = true. In this case (assuming there are no problems with your certificate), regardless of your setting for TrustServerCertificate, your connection will succeed. However, for a more secure connection (one not vulnerable to man-in-the-middle attacks), you should set TrustServerCertificate = false. Doing so will force third party verification of the certificate. For information about installing a certificate on SQL Server, see How to: Enable Encrypted Connections to the Database Engine. Note that that topic references setting the Force Encryption option on the database server. Setting this option to Yes on the server does the same thing as setting Encryption = true on the client – it forces the connection to be encrypted using the server’s certificate. That’s it for today. Thanks. -Brian

March 14, 2011

by Brian Swan

· 12,710 Views

Clustering Tomcat Servers with High Availability and Disaster Fallback

There has been a lot of buzz lately on high-availability and clustering. Most developers don't care and why should they? These features should be transparent to the application architecture and not something of concern to the developers of that application. But knowledge never hurts, so I emerged myself into the world of load balancing, heartbeats and virtual IP addresses. And you know what? Next time we need a infrastructure like this, I can at least sit down with the guys from the infrastructure department and at least know what the hell they are talking about. So what exactly is a high-availability clustered infrastructure (HACI, as I'll call it from now on) ? In essence, it should be a zero-downtime infrastructure (or at least perceived as one by the end user, which means never ever returning a default browser 404 page), capable of horizontal scaling when the need for it arises and without a single point of failure. It's the SLA writer's dream. A basic HACI setup looks like this: The users enters through a virtual IP address, assigned to one of the two load balancers. Only one of the load-balancers is active (the active master, LB1), the other one is there in the event LB1 fails ((LB2, a passive slave). The two load balancers are redundant, ie. having the exact same configuration. The load balancers redirect all traffic to the real servers. This can be done through round-robin assignment or through other means like sticky sessions, where the same user is redirected to the same server each and every time within a session. Servers can be added at any moment and configured on the load balancers. Ideally, the load balancer configuration is aware of the hardware specification and balances the load accordingly, but that's beyond the scope of this article (it involves adding weights). If all servers balanced by the load balancer fail, a backup server should be used to redirect all traffic coming from the load balancer. This can be a very lightweight server, which purpose is only to provide a sensible error page to the user (something like 'Sorry, we are performing maintenance'). Again, perception and immediate feedback to the user is key. You don't want to show the user a plain 404 page. Off course, if the backup server goes down too, you're in trouble (off course, by that time, warning bells should have gone off on every level in the hierarchy). So how to achieve this with as little effort as possible? If you want to try this out, I suggest you start by installing a virtual machine like VirtualBox or VMWare. This way you can try out the configuration yourself. In this example, I'll be load-balancing 3 Tomcat servers using sticky sessions using 2 load balancers in active-passive mode. I'm assuming all 3 Tomcat servers share the same hardware configuration, so they are all able to handle the same amount of traffic each. I'm also throwing in a backup server, in case all 3 Tomcat servers go down (serving a custom 503 page kindly informing the user of a catastrophic failure, instead of dropping the standard 404 bomb). You want to start off by assigning IP addresses to the servers. This will make your life a bit easier. We'll need 7 addresses: 3 for the tomcat server, 1 for the backup server, 2 for the loadbalancers and 1 virtual IP address to be shared between the load balancers (and which will be the entry point for your users). So our assignment will be: Virtual IP 10.0.5.99 www.haci.local LB1 10.0.5.100 lb1.haci.local #MASTER LB2 10.0.5.101 lb2.haci.local #SLAVE WEB1 10.0.5.102 web1.haci.local WEB2 10.0.5.103 web2.haci.local WEB3 10.0.5.104 web3.haci.local BACKUP 10.0.5.105 backup.haci.local Setting up the web servers is easy. You just install Tomcat on each server and create a simple JSP file to be served to users (make a small change, like the background color, on each server to distinguish the servers). I won't be covering session replication between the Tomcat servers, as it'll take me too far. If you want, you can configure the appropriate session replication and storage (using multicast or JDBC for example). The backup server I'm using is a basic LAMP server that returns a simple 503 page on every request it gets. The 503 error code is important, because it reflects the current state of the system: currently unavailable. For the loadbalancers I'll be using 2 applications: HAProxy and keepalived. HAProxy is going to handle load balancing, while keepalived will handle the failover between the two load balancers. First, we're going to configure HAProxy for both LB1 and LB2. Installing HAProxy is quite easy on an ubuntu system. Just do a sudo apt-get install haproxy and you're off. After the install, backup the current HAProxy config and start editing away. cp /etc/haproxy.cfg /etc/haproxy.cfg_orig cat /dev/null > /etc/haproxy.cfg vi /etc/haproxy.cfg The content of the config to reflect our setup should become something like this (same config on LB1 and LB2): global log 127.0.0.1 local0 log 127.0.0.1 local1 notice #log loghost local0 info maxconn 4096 #debug #quiet user haproxy group haproxy defaults log global mode http option httplog option dontlognull retries 3 redispatch maxconn 2000 contimeout 5000 clitimeout 50000 srvtimeout 50000 frontend http-in bind 10.0.5.99:80 default_backend servers backend servers mode http stats enable stats auth someuser:somepassword balance roundrobin cookie JSESSIONID prefix option httpclose option forwardfor option httpchk HEAD /check.txt HTTP/1.0 server web1 10.0.5.102:80 cookie haci_web1 check server web2 10.0.5.103:80 cookie haci_web2 check server web3 10.0.5.104:80 cookie haci_web3 check server webbackup 10.0.5.105:80 backup After this, enable HAProxy on both LB1 and LB2 by editing /etc/defaults/haproxy # Set ENABLED to 1 if you want the init script to start haproxy. ENABLED=1 # Add extra flags here. #EXTRAOPTS="-de -m 16" So far for the HAProxy configuration. We can't start it up yet, as LB1 and LB2 aren't listening yet on the virtual IP address. Next we'll configure the failover of the loadbalancers using keepalived. Installing it on Ubuntu is as easy as it was for HAProxy: sudo apt-get install keepalived. But its configuration is slightly different on both load balancers. First, we need to configure the both servers to be able to listen to the shared IP address. Add the following line to /etc/sysctl.conf: net.ipv4.ip_nonlocal_bind=1 And run sysctl -p Now, we configure keepalived so that LB1 is configured as the main load balancer and binds to the shared IP address, while LB2 is on standby, ready to take over whenever LB1 goes down. The configuration for LB1 looks like this (edit /etc/keepalived/keepalived.conf): vrrp_script chk_haproxy { # Requires keepalived-1.1.13 script "killall -0 haproxy" # cheaper than pidof interval 2 # check every 2 seconds weight 2 # add 2 points of prio if OK } vrrp_instance VI_1 { interface eth0 state MASTER virtual_router_id 51 priority 101 # 101 on master, 100 on backup virtual_ipaddress { 10.0.5.99 } track_script { chk_haproxy } } Start up keepalived and check whether it is listening to the virtual IP address. /etc/init.d/keepalived start ip addr sh eth0 It should return something like this, indicating it is listening to the virtual IP address 2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000 link/ether 00:0c:29:a5:5b:93 brd ff:ff:ff:ff:ff:ff inet 10.0.5.100/24 brd 10.0.5.255 scope global eth0 inet 10.0.5.99/32 scope global eth0 inet6 fe80::20c:29ff:fea5:5b93/64 scope link valid_lft forever preferred_lft forever Next, we configure LB2. The configuration is almost the same, exception for the priority. vrrp_script chk_haproxy { # Requires keepalived-1.1.13 script "killall -0 haproxy" # cheaper than pidof interval 2 # check every 2 seconds weight 2 # add 2 points of prio if OK } vrrp_instance VI_1 { interface eth0 state MASTER virtual_router_id 51 priority 100 # 101 on master, 100 on backup virtual_ipaddress { 10.0.5.99 } track_script { chk_haproxy } } Start up keepalived and check the network interface. /etc/init.d/keepalived start ip addr sh eth0 It should return something like this, indicating it is not listening to the virtual IP address 2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000 link/ether 00:0c:29:a5:5b:93 brd ff:ff:ff:ff:ff:ff inet 10.0.5.101/24 brd 10.0.5.255 scope global eth0 inet6 fe80::20c:29ff:fea5:5b93/64 scope link valid_lft forever preferred_lft forever Now, start up HAProxy on both LB1 and LB2. /etc/init.d/haproxy start Now you can issue requests to 10.0.5.99 (or www.haci.local), which will go to LB1, which in turn will load-balance the request to either WEB1, WEB2 and WEB3. You can test the load balancing by turning off WEB1 (or the server you're currently on). You can also the backup server by turning all main webservers (WEB1, WEB2 and WEB3). And you can test the loadbalancer failover by turning off LB1. At that point LB2 will kick in and act as the master, loadbalancing all requests. When you turn LB1 back on, it'll take over the master role once again. HAProxy allows you to add extra servers very easily, reloading the configuration without breaking existing sessions. See the HAProxy documentation for more info or on ServerFault. (http://serverfault.com/questions/165883/is-there-a-way-to-add-more-backend-server-to-haproxy-without-restarting-haproxy). Cheap and effective. While most enterprise shops have hardware load balancers, which also have these possibilities and more, if you're on a tight budget or need to simulate a HACI environment for development purposes (a lesson here: always simulate your production environment when you're testing during development), this might be the sane option. To finish, I'll quickly explain how to set up the backup server (a simple LAMP server). Create a vhost configuration on the apache for www.haci.local or any other domain pointing to the virtual IP address and set up mod_rewrite for it: RewriteEngine On RewriteCond %{REQUEST_URI} !\.(css|gif|ico|jpg|js|png|swf|txt)$ [NC] RewriteConf %{REQUEST_URI} !/503.php RewriteRule .* /503.php [L] Then create the 503.php file and add this to the top of it: Sorry, our servers are currently undergoing maintenance. Please check back with us in a while. Thank you for your patience. You can decorate the 503.php file any way you like. You can even use CSS, JavaScript and image files in the php file. Now, back to my IDE. I'm getting withdrawal symptoms.

March 11, 2011

by Lieven Doclo

· 58,008 Views

Persisting Entity Classes using XML in JPA

Preface This document explains the working of a JPA application purely based on XML configurations. This explains how to create entity classes without using any annotations in Java classes. Entire configuration is done using xml files. Introduction Persistence is one of the major challenges for enterprise applications. JPA is the persistence standard from Sun Microsystems. JPA supports two ways of configuring persistence information- through annotations and XML files. This article discusses the XML based approach towards embedding persistence information for the entity classes. Setting up the environment JPA is light-weight persistence model. It works for both JavaSE and JavaEE applications. All the required class files are present in the JavaEE.jar file which should be added to the classpath. Along with the JavaEE.jar file, we need to add the Persistence Provider jar file as well to the classpath. The provider can be Toplink/ Hibernate/ App server specific persistence provider. In this application, we’ve used Toplink as the persistence provider. So we need to add toplink-essentials.jar file as well in the class path. To summarize, the jar files required are: 1.JavaEE.jar 2.ToplinkEssentials.jar (Replace with the jar file for Persistence Provider, in case you want to use other persistence provider) 3.derbyClient.jar (For JavaDB (Derby) Database, replace this with the jar file of your database) Instead of adding the configuration details in the entity class in the form of annotations, we’ll add the configuration information in the orm.xml file. Before we look into the code, let us discuss the orm.xml file in detail. orm.xml orm.xml file contains all the configuration details required for mapping a class to a relational database table. These details include the primary key of the entity class and the various constraints/ rules to be applied for the primary key. Other details about the entity class include the various attributes of the entity class and columns to which the attributes should be mapped. We can also specify multiple constraints for the same attribute. Other configurable things include information about the various relationships the entity may have with other entities (one-to-one, many-to-one, many-to-many or one-to-many), embedded attributes, information about the version and transient attributes. One application can have multiple entities. The configuration details about all these entity classes, embeddable classes go inside the same orm.xml file. Name of this configuration file can be anything, not mandatorily orm.xml. It should be placed in META-INF subdirectory along with the persistence.xml file. Persistence.xml file should include the orm.xml file. Please find below sample orm.xml and persistence.xml file. The xml schema definitions are highlighted for both. Also, please note the entry required in persistence.xml for including the orm.xml file. PFB the entries made in a sample orm.xml, which’ll persist employee instances to the database. My First JPA XML Application entity Different properties in orm.xml file The various properties in the orm.xml in the order defined in xml schema definition are discussed below: Root tag is It contains the following four types of elements: The persistence-unit-metadata element contains metadata for the entire persistence unit. It is undefined if this element occurs in multiple mapping files within the same persistence unit. The package, schema, catalog and access elements apply to all the entity, mapped-superclass and embeddable elements defined in the same file in which they occur. The sequence-generator, table-generator, named-query, named-native-query and sql-result-set-mapping elements are global to the persistence unit. The entity, mapped-superclass and embeddable elements each define the mapping information for a managed persistent class. The mapping information contained in these elements may be complete or it may be partial. One line string information about the entity classes in the application. specifies the package of the classes listed within the sub elements and attributes of the same mapping file only. classAttribute defines the fully qualified class name of the entity class name: Attribute defines the name of the entity entity: tag can be repeated to embed mapping information for all the entity classes in the application. The Sub-Tags of entity tag are maps the database table to which the entity class shall be persisted overrides or creates a new id-class setting overrides or creates a new inheritance setting overrides or creates a new discriminator value setting, useful in Single_Table Inheritance strategy. overrides or creates a new discriminator column setting, used while configuring Super class in Single_Table inheritance strategy. A sequence-generator is unique by name A table-generator is unique by name used to define named-query for the entity class. Can be repeated to define multiple named queries for the entity class. used to define native named query for the entity class. creates or overrides a pre-persist setting i.e. the entity listener method to be invoked before persisting the entity instance. creates or overrides a post-persist setting i.e. the entity listener method to be invoked after persisting the entity instance. creates or overrides a pre-remove setting i.e. the entity listener method to be invoked before removing the entity instance. creates or overrides a post-remove setting i.e. the entity listener method to be invoked after removing the entity instance. creates or overrides a pre-update setting i.e. the entity listener method to be invoked before updating the entity instance. creates or overrides a post-update setting i.e. the entity listener method to be invoked before updating the entity instance. creates or overrides a post-load setting i.e. the entity listener method to be invoked after loading the entity instance state from database defines the attributes of the entity class which shall be persisted in the database table. The sub tags of the attributes are: :- this tag defines the id column of the entity class. Cannot be repeated for an entity. An entity class can have only one Id attribute. :-this is used to define the ID generation strategy to be used for the primary key column. :-this tag is used to map the entity columns to the columns in the database table. Should be repeated to provide configurations for all the attributes of the entity class. :- This tag is used to add the various column-level constraints on the entity attributes. E.g. unique, insertable, updatable, length, precision etc. :- maps the Version attribute of the entity class. Development Environment We’ve used NetBeans IDE 6.5.1 for creating this JavaSE application for persistence through XML files. NetBeans has persistence support for JPA as well as Hibernate. So, we can choose either Hibernate or Toplink Essentials as the Persistence Provider. Let’s name the application as “JPAEntity”. The name of the entity class is “Employee” and is placed in the package “entity”. “EmpClient.java” is the client class. The xml configuration files are put in META-INF sub directory. The directory structure of the application is as follows:- First JPA META-INF orm.xml persistence.xml entity Employee.java EmpClient.java Developing the Application Step1:Start the NetBeans IDE. Create a new Java Application Step 2:Add the jar files for the Persistence Provider (Toplink Essentials in our case), JavaEE.jar and the database driver(DerbyClient.jar in our case) to the classpath. Step 3: Add the entity class “Employee.java” and client class “EmpClient.java” in the package "jpaentity" This is the code of Employee.java package jpaentity; public class Employee { private int empId; private String empName; private double empSalary; public Employee() { } public Employee(int empId, String empName, double empSalary) { this.empId = empId; this.empName = empName; this.empSalary = empSalary; } public int getEmpId() { return empId; } public void setEmpId(int empId) { this.empId = empId; } public String getEmpName() { return empName; } public void setEmpName(String empName) { this.empName = empName; } public double getEmpSalary() { return empSalary; } public void setEmpSalary(double empSalary) { this.empSalary = empSalary; } @Override public String toString() { return "Employee Id:="+empId+ +" Employee Name:="+empName+" Employee Salary:="+empSalary; } }//End of Employee.java This is the code of EmpClient.java package jpaentity; import java.util.List; import javax.persistence.EntityManager; import javax.persistence.EntityManagerFactory; import javax.persistence.Persistence; public class EmpClient { private EntityManager em; private EntityManagerFactory emf; public void initEmfAndEm() { emf=Persistence.createEntityManagerFactory("JPAEntityPU"); em=emf.createEntityManager(); } public void cleanup() { em.close(); } public void insertAndRetrieve() { System.out.println("-------------------Creating the Objects---------------------"); Employee empObj1=new Employee(1, "Anu", 1000.0); Employee empObj2=new Employee(2, "Rahul", 1500.0); System.out.println("-------------------Starting the transaction---------------------"); em.getTransaction().begin(); em.persist(empObj1); em.persist(empObj2); System.out.println("-------------------Committing the transaction---------------------"); em.getTransaction().commit(); System.out.println("-------------------Objects saved successfully--------------------"); System.out.println("*******************************************************************"); System.out.println("------------------- Reading Objects--------------------"); List emps=em.createQuery("select p from Employee p").getResultList(); for (Employee current:emps) System.out.println(current); System.out.println("-------------------Finished Reading Objects--------------------"); } public static void main(String args[]) { EmpClient myClient=new EmpClient(); System.out.println("-------------------Starting the Client---------------------"); myClient.initEmfAndEm(); myClient.insertAndRetrieve(); myClient.cleanup(); System.out.println("---------------Shutting down the Client---------------------"); } }//End of EmpClient.java Step 4: Set up the database connection. Go to Services Tab in the NetBeans IDE and expand the Databases node. Right click on JavaDB node and select Create Database. Give the database name, username and password and click on OK. The database is created and ready to accept connections. Step 5: Add the persistence unit to the application. Also, add the orm.xml file. The name of the orm.xml file can be changed. We have named it mapping.xml. Following is the code of mapping.xml file My First JPA XML Application entity Add the following code to the persistence.xml file oracle.toplink.essentials.PersistenceProvider \META-INF\orm.xml jpaentity.Employee Steps for execution Compile the source files Build the project Start the Database Server Execute the “EmpClient” class. Connect to the database & verify the table has been created & data is also added in the table. Advantages and Disadvantages of Using XML for Configuration Advantages No coupling between the metadata and the source code Compatible with pre EJB3.0 development process Support from IDEs like NetBeans, Eclipse etc Easy to modify with the help of good editors. Disadvantages Complexity Difficulty in debugging in absence of editors Wrap Up This article helps in understanding entity configuration with XML files as an alternative to embedding annotations in Java code for configuring persistence details.

March 6, 2011

by Anu Bakshi

· 131,267 Views · 1 Like

IBatis (MyBatis): Handling Joins: Advanced Result Mapping, Association, Collections, N+1 Select Problem

This tutorial will present examples using advanced result mappings, how to handle mappings with association, collections, the n+1 problem, and more.

March 2, 2011

by Loiane Groner

· 121,371 Views · 3 Likes

A Custom Float PropertyEditor

Both Java SE and the NetBeans Platform have default property editors for several primitive and common data types. These are suitable for most cases and most of us almost never need to worry about it. There are, however, those few moments where the one-size-fits-all approach does not actually fit. For instance, the default float editor is a text editor: For general-purpose cases, this should be fine. Now, what if you want to restrict the values your property can have or have other control over it? Or simply give the user a more comfortable control for data input, like a JSpinner: Most of what I'll show was taken from the NetBeans tutorials and Javadoc, so I will not extend in details what you can find easily there. First, lets implement the PropertyEditor and the InlineEditor (don't forget to to fix imports): public abstract class FloatPropertyEditor extends PropertyEditorSupport implements ExPropertyEditor, InplaceEditor.Factory{ protected InplaceEditor ed = null; protected SpinnerNumberModel model; public FloatPropertyEditor(Object source, SpinnerNumberModel model) { super(source); this.model = model; } public FloatPropertyEditor(SpinnerNumberModel model) { this.model = model; } @Override public String getAsText() { Float d = (Float)getValue(); if (d == null) { return "0.0"; } return NumberFormat.getNumberInstance().format(d.floatValue()); } @Override public void setAsText(String s) { try { setValue(new Float( NumberFormat.getNumberInstance().parse(s).floatValue())); } catch (ParseException ex) { setValue(Float.valueOf(0.0f)); } } @Override public void attachEnv(PropertyEnv env) { env.registerInplaceEditorFactory(this); } @Override public InplaceEditor getInplaceEditor() { if (ed == null) { ed = new FloatInplaceEditor(model); } return ed; } protected static class FloatInplaceEditor implements InplaceEditor { private final JSpinner spinner; private PropertyEditor editor = null; private PropertyModel model; public FloatInplaceEditor(SpinnerNumberModel model) { this.spinner = new JSpinner(model); } @Override public void connect(PropertyEditor propertyEditor, PropertyEnv env) { editor = propertyEditor; reset(); } @Override public JComponent getComponent() { return spinner; } @Override public void clear() { //avoid memory leaks: editor = null; model = null; } @Override public Object getValue() { return spinner.getValue(); } @Override public void setValue(Object object) { try { spinner.setValue(object); } catch (IllegalArgumentException e) { spinner.setValue(null); } } @Override public boolean supportsTextEntry() { return true; } @Override public void reset() { Float d = (Float) editor.getValue(); if (d != null) { setValue(d); } } @Override public KeyStroke[] getKeyStrokes() { return new KeyStroke[0]; } @Override public PropertyEditor getPropertyEditor() { return editor; } @Override public PropertyModel getPropertyModel() { return model; } @Override public void setPropertyModel(PropertyModel propertyModel) { this.model = propertyModel; } @Override public boolean isKnownComponent(Component component) { return component == spinner || spinner.isAncestorOf(component); } @Override public void addActionListener(ActionListener actionListener) { //do nothing - not needed for this component } @Override public void removeActionListener(ActionListener actionListener) { //do nothing - not needed for this component } } } So far, we've been closely following the tutorials. Notice that the FloatPropertyEditor class is abstract. This is because PropertyEditor classes should have a default constructor (more on this below). Not much of a gain, maybe, but now you have a JSpinner as an editor. Now imagine you are developing a 3D "bodies in space" application. You can select an item and you can change its dimensions and coordinates at will. Coordinates can have any real value, whether negative, zero, or positive (besides the practical dimensional limits your universe might have), while dimensions must be at least be zero. This is where the abstract plays its role. By inheriting the class above, you can decide which values our properties can have. For coordinate properties, set this class as PropertyEditor: public class CoordinateFloatPropertyEditor extends FloatPropertyEditor { public CoordinateFloatPropertyEditor() { super(new SpinnerNumberModel(0.0, -100000.0, 100000.0, 0.5)); } } And, for dimension properties, use this one, instead: public class DimensionFloatPropertyEditor extends FloatPropertyEditor { public DimensionFloatPropertyEditor() { super(new SpinnerNumberModel(0.0, 0.09999, 100000.0, 0.5)); } } It's up to you to decide the maximum, minimum, and step values, and tweak this sample for your needs. I want also to add that this is part of a real application, from where I was inspired to write this article. I'll be further extending this subject in the next posts. Thanks for you attention. Muchas gracias por su atención.

March 1, 2011

by Alied Pérez

· 11,632 Views · 1 Like

5 Key Events in the history of Cloud Computing

While we have been evaluating in our blog posts the various features available on popular Cloud Computing platforms today, I thought it might be a good idea to understand when and how all this started and look back at where this began and trace some of the key events in the progress of cloud computing. Amazon like all other Internet companies in the period of the dot com bubble were left with large amounts of underutilized computing infrastructure, reports suggest less than 10% of the server infrastructure of many companies were being used. Amazon may have use cloud computing as a way to provide this unused resources as utility computing service when they launched S3 as the first true cloud computing service in March 2006. 1. Launch of Amazon Web Services in July 2002 The initial version of AWS in 2002 was focused more on making information available from Amazon to partners through a web services model with programmatic and developer support and was very focused on Amazon as a retailer. While this set the stage for the next steps the launch of S3 was the true step towards building a cloud platform. Amazon Press Release 2. S3 Launches in March 2006 Here are some interesting articles on the launch of S3 in 2006. The real breakthrough however was the pricing model for S3 which defined the model of 'pay-per-use' which has now become the defacto standard for cloud pricing. Also the launch of S3 really defined the shift of Amazon from being just a retailer to a strong player in the technology space. Techcrunch Post on S3 on March 14th, 2006 Read Write Web Post on S3 and EC2 on Nov 3rd, 2006 Business Week Article on Jeff Bezos vision on cloud computing on Nov 13th, 2006 3. EC2 Launches in August 2006 EC2 had a much quieter launch in August 2006 but i would think had the bigger impact by making core computing infrastructure available. This completed the loop on enabling a more complete cloud infrastructure being available. In fact at that time analysts had some difficulty in understanding what the big deal is, and thought it looks similar to other hosting services available online only with a different pricing model. Some interesting articles from that time on the launch: Technologyevangelist Blog Virtualization Info 4. Launch of Google App Engine in April 2008 The launch of Google App Engine in 2008 was the entry of the first pure play technology company into the Cloud Computing market. Google a dominant Internet company entering into this market was clearly a major step towards wide spread adoption of cloud computing. As with all their other products they introduced radical pricing models with a free entry level plan and extremely low cost computing and storage services which are currently among the lowest in the market. Techcrunch post on App Engine Launch Google App Engine Launch Post 5. Windows Azure launches Beta in Nov 2009 The entry of Microsoft into Cloud Computing is a clear indication of the growth of the space. Microsoft for long has not accepted the Internet and the web as a significant market and has continued to focus on the desktop market for all these years. I think this is a realization that a clear shift is taking place. The launch of Azure is a key event in the history of cloud computing with the largest software company making a small but significant shift to the web. Launch of Azure Beta Azure General Availability - Feb 2010 You might also like: Cloud Computing, Google App Engine: How big is the market Really ? Comparing Google App Engine with Amazon EC2 Comparing Amazon EC2 and Microsoft Azure Languages Supported by Google App Engine Cloud Computing: What is it really ?

February 26, 2011

by Kaushik Raghupathi

· 48,086 Views

How to remove getters and setters

Getters and setters are one of the first abstraction step that is thought over public fields in object-oriented programming. However, the paradigm was never about encapsulating properties by providing a special reading/writing mechanism via methods, but about objects responding to messages. In other words, encapsulation is (not only, but also) about being capable of changing private fields. If you write getters and setters, you introduce a leaky abstraction over private fields, since the names and number of your public methods are influenced coupled to them. They aren't really private anymore: for example in service classes I pass dependencies for private methods in the constructor, and the client code is never affected when these dependencies change. With getters and setters, a field addition, removal or renaming affects the contract of the class. In this article, we're going to take this User Entity class and explore the many ways we have for removing getters and setters. The choice between them depends mainly on what you need them for: it's the client code that should decide what contracts wants from these classes. nickname = $nickname; } public function getNickname() { return $nickname; } public function setPassword($password) { $this->password = $password; } public function getPassword() { return $this->password; } // ... you get the picture: other 8 getters or setters } The various techniques are ordered by complexity. As always, I express use cases for the User class via a test case. All the code is on Github. Constructor First, we can pass some fields in the constructor. If we do not expose the field then, the value will be immutable and the client code will never even know that this value exists. For example, our User has an immutable nickname, which serves also as a primary key: public function testPassWriteOnlyDataInTheConstructor() { $user = new User('giorgiosironi'); // should not explode //removed the setter as it cannot be changed } class User { public function __construct($nickname, $activationKey = '') { $this->nickname = $nickname; $this->activationKey = $activationKey; } } Information Expert Next, we have the Information Expert pattern: assign an operation to the object with the greatest knowledge for accomplishing it. If you do so, you won't need to expose private fields via getters and setters, since you will model some behavior as code in that class, which can see private fields. For example, when we register an User we want to activate it via email verification, so we send an activation key via mail. But to check it, we don't need to extract the value from the User object. // Information Expert /** * @expectedException InvalidArgumentException */ public function testAnUserIsActivated() { $user = new User('giorgiosironi', 'ABC'); $user->activate('AB'); } class User { public function activate($key) { if ($this->activationKey === $key) { $this->active = true; return; } throw new InvalidArgumentException('Key for activation is incorrect.'); } } This style is an example of the Tell, Don't Ask principle: we tell our User to do something instead of asking it for information and doing it ourselves. Double Dispatch Putting behavior inside an Entity class is nice, but sometimes the operation needs some external dependency to work, like a login mechanism needing a storage for the identity of the user (usually the session). We have two types of coupling to solve here: static: the User class should not depend on any other infrastructure class, which in turn refers the database or the session storage. runtime: the User class cannot hold a reference to an infrastructure object, for instance because we want to serialize it or to create it simply with a new operator, or our ORM does not support injection of collaborators. The first issues is solved by introducing an interface implemented by the infrastructure class; the second by passing in the dependency via Double Dispatch instead of via the constructor like we do with service classes. public function testUsersLogin() { $user = new User('giorgiosironi'); $user->setPassword('gs'); // will be removed in next tests // in reality, we would use a SessionLoginAdapter or something like that $loginAdapterMock = $this->getMock('LoginAdapter'); $loginAdapterMock->expects($this->once()) ->method('storeIdentity') ->with('giorgiosironi'); $user->login('gs', $loginAdapterMock); } class User { public function login($password, LoginAdapter $loginAdapter) { if ($this->password == $password) { $loginAdapter->storeIdentity($this->nickname); return true; } else { return false; } } } Still an example of Tell, Don't Ask, but more real world now. Command and changesets You really have to put data inside this object: the user has just compiled a form and you have to get thos input values in. So how do we define that operation? As an atomic method call, by passing in what I call a Changeset but it's a specialization of a Command (not Command pattern but CommandQueryResponsibilitySegregation). In the simplest cases, it's just a Value Object or a Data Transfer Object with no behavior. public function testCommandForChangingPassword() { $user = new User('giorgiosironi'); $passwordChange = new ChangeUserPassword('', 'gs'); $user->handle($passwordChange); $this->assertEquals('gs', $user->getPassword()); //deprecated, will be removed in next tests } class User { public function handle($command) { if ($command instanceof ChangeUserPassword) { $this->handleChangeUserPassword($command); } if ($command instanceof SetUserDetails) { $this->handleSetUserDetails($command); } // support other commands here... } private function handleChangeUserPassword(ChangeUserPassword $command) { if ($command->getOldPassword() == $this->password) { $this->password = $command->getNewPassword(); } else { throw new Exception('The old password is not correct.'); } } } Think about it: you will have to put these getters and setters somewhere; it's best to put them on an object which is a data structure than that on your Entity. This way: you will be coupled to the current fields only on this particular operation and not when you pass an User around. you will make it clear that you support only a full update operation, and it's not ok to call an isolate setter. Actually in PHP you could just use an array as a Changeset, but a class provides a stricter contract. Also public fields are not viable for a contract as no error will be raised by PHP if you assign a non-existent field on a Changeset object. Rendering on a canvas In the Growing Object-Oriented Software mailing list, there has been recently a discussion on how to emulate getters via callbacks. This solution is the specular of our Changeset argument used for extracting data instead of updating it. public function testCanvasForRenderingAnObject() { $user = new User('giorgiosironi'); $detailsSet = new SetUserDetails('Italy', 'Pizza'); //THIS may have set/get $user->handle($detailsSet); // canvas can also be a form, or xml, or json... $canvas = new HtmlCanvas('{{location}{{favoriteFood}'); $user->render($canvas); $this->assertEquals('ItalyPizza', (string) $canvas); } class User { public function render(Canvas $canvas) { $canvas->nickname = $this->nickname; $canvas->location = $this->location; $canvas->favoriteFood = $this->favoriteFood; } } Again, the canvas is hidden behind an interface and can be anything: an HTML view, a form, a JSON or RSS feed generator... CQRS In Command-Query Responsibility Segregation, you use the ORM for mapping your objects in the database, and fill your report screens by querying it directly, or even by querying another store which is continuously rebuilt from your master database. I don't know of any implementations of CQRS in PHP, but this mechanism promises to at least eliminate getters, as your domain objects will be write-only. Conclusion The full code is in this Github repository, as always. You have no excuses now: go and ditch one of your getters and setters immediately. Your code will breath a bit of fresh air.

February 22, 2011

by Giorgio Sironi

· 22,203 Views

Getting Started with iBatis (MyBatis): Annotations

This tutorial will walk you through how to setup iBatis (MyBatis) in a simple Java project and will present examples using simple insert, update, select and delete statements using annotations. This is the third tutorial of the iBatis/MyBatis series, you can read the first 2 tutorials on the following links: Introduction to iBatis (MyBatis), An alternative to Hibernate and JDBC Getting Started with iBatis (MyBatis): XML Configuration iBatis/ MyBatis 3 offers a new feature: annotations. But it lacks examples and documentation about annotations. I started to explore and read the MyBatis mailing list archive to write this tutorial. Another thing you will notice is the limitation related to annotations. I am going to demonstrate some examples that you can do with annotations in this tutorial. All the power of iBatis is in its XMl configuration. So let’s play a little bit with annotations. It is much simpler and you can use it for simple queries in small projects. As I already mentioned, if you want something more complex, you will have to use XML configuration. One more thing before we get started: this tutorial is the same as the previous one, but instead of XML, we are going to use annotations. Pre-Requisites For this tutorial I am using: IDE: Eclipse (you can use your favorite one) DataBase: MySQL Libs/jars: Mybatis, MySQL conector and JUnit (for testing) This is how your project should look like: Sample Database Please run this script into your database before getting started with the project implementation: I will not post the sample database here again. You can get it from the previous post about iBatis or download this sample project. The files are the same. 1 – Contact POJO We will create a POJO class first to respresent a contact with id, name, phone number and email address – same as previous post. 2 – ContactMapper In this file, we are going to set up all the queries using annotations. It is the MyBatis-Interface for the SQLSessionFactory. A mapper class is simply an interface with method definitions that match up against the SqlSession methods. Mapper interfaces do not need to implement any interface or extend any class. As long as the method signature can be used to uniquely identify a corresponding mapped statement. Mapper interfaces can extend other interfaces. Be sure that you have the statements in the appropriate namespace when using XML binding to Mapper interfaces. Also, the only limitation is that you cannot have the same method signature in two interfaces in a hierarchy (a bad idea anyway). A mapperclass is simply an interface with method definitions that match up against the SqlSession methods. You can create some strings with the SQL code. Remember it has to be the same code as in XML Configuration. package com.loiane.data; import java.util.List; import org.apache.ibatis.annotations.Delete;import org.apache.ibatis.annotations.Insert;import org.apache.ibatis.annotations.Options;import org.apache.ibatis.annotations.Param;import org.apache.ibatis.annotations.Result;import org.apache.ibatis.annotations.Results;import org.apache.ibatis.annotations.Select;import org.apache.ibatis.annotations.Update; import com.loiane.model.Contact; public interface ContactMapper { final String SELECT_ALL = "SELECT * FROM CONTACT"; final String SELECT_BY_ID = "SELECT * FROM CONTACT WHERE CONTACT_ID = #{id}"; final String UPDATE = "UPDATE CONTACT SET CONTACT_EMAIL = #{email}, CONTACT_NAME = #{name}, CONTACT_PHONE = #{phone} WHERE CONTACT_ID = #{id}"; final String UPDATE_NAME = "UPDATE CONTACT SET CONTACT_NAME = #{name} WHERE CONTACT_ID = #{id}"; final String DELETE = "DELETE FROM CONTACT WHERE CONTACT_ID = #{id}"; final String INSERT = "INSERT INTO CONTACT (CONTACT_EMAIL, CONTACT_NAME, CONTACT_PHONE) VALUES (#{name}, #{phone}, #{email})"; /** * Returns the list of all Contact instances from the database. * @return the list of all Contact instances from the database. */ @Select(SELECT_ALL) @Results(value = { @Result(property="id", column="CONTACT_ID"), @Result(property="name", column="CONTACT_NAME"), @Result(property="phone", column="CONTACT_PHONE"), @Result(property="email", column="CONTACT_EMAIL") }) List selectAll(); /** * Returns a Contact instance from the database. * @param id primary key value used for lookup. * @return A Contact instance with a primary key value equals to pk. null if there is no matching row. */ @Select(SELECT_BY_ID) @Results(value = { @Result(property="id"), @Result(property="name", column="CONTACT_NAME"), @Result(property="phone", column="CONTACT_PHONE"), @Result(property="email", column="CONTACT_EMAIL") }) Contact selectById(int id); /** * Updates an instance of Contact in the database. * @param contact the instance to be updated. */ @Update(UPDATE) void update(Contact contact); /** * Updates an instance of Contact in the database. * @param name name value to be updated. * @param id primary key value used for lookup. */ void updateName(@Param("name") String name, @Param("id") int id); /** * Delete an instance of Contact from the database. * @param id primary key value of the instance to be deleted. */ @Delete(DELETE) void delete(int id); /** * Insert an instance of Contact into the database. * @param contact the instance to be persisted. */ @Insert(INSERT) @Options(useGeneratedKeys = true, keyProperty = "id") void insert(Contact contact);} @Select The @Select annotation is very simple. Let’s take a look at the first select statment of this class: selectAll. Note that you don’t need to use the mapping if your database table columns mach the name of the class atributes. I used different names to use the @Result annotation. If table columns match with atribute names, you don’t need to use the @Result annotation. Not let’s take a look on the second method: selectById. Notice that we have a parameter. It is a simple parameter – easy to use when you have a single parameter. @Update Let’s say you want to update all the columns. You can pass the object as parameter and iBatis will do all the magic for you. Remember to mach the parameter name with the atribute name, otherwise iBatis can get confused, Now let’s say you want to use 2 or 3 paramaters, and they don’t belong to an object. If you take a look at iBatis XML configuration you will see you have to set the option parameterType (remember?) and specify you parameter type in it, in another words, you can use only one parameter. If you are using annotation, you can use more than one parameter using the @Param annotation. @Delete The @Delete annotation is also very simple. It follows the previous rules related to parameters. @Insert The @Insert annotation also follows the rules related to parameters. You can use an object or use the @Param annotation to specify more than one parameter. What about the generation key? Well, if your database supports auto generation key, you can set it up using annotations with the @Options annotation. You will need to specify the option useGeneratedKeys and keyProperty. If your database does not support auto generation key, sorry, but I still did not figure out how to do it (in last case, use can do it manually and then pass as parameter to your insert query). 3 – MyBatisConnectionFactory Every MyBatis application centers around an instance of SqlSessionFactory. A SqlSessionFactory instance can be acquired by using the SqlSessionFactoryBuilder. SqlSessionFactoryBuilder can build a SqlSessionFactory instance from an XML configuration file, of from a custom prepared instance of the Configuration class. An observation about this file: on the previous example, we set the mappers on the iBatis Configuration XML file. Using annotations, we will set the mapper manually. In a future example, I’ll show you how to work with XML and Annotations together. package com.loiane.dao; import java.io.FileNotFoundException;import java.io.IOException;import java.io.Reader; import org.apache.ibatis.io.Resources;import org.apache.ibatis.session.SqlSessionFactory;import org.apache.ibatis.session.SqlSessionFactoryBuilder; import com.loiane.data.ContactMapper; public class MyBatisConnectionFactory { private static SqlSessionFactory sqlSessionFactory; static { try { String resource = "SqlMapConfig.xml"; Reader reader = Resources.getResourceAsReader(resource); if (sqlSessionFactory == null) { sqlSessionFactory = new SqlSessionFactoryBuilder().build(reader); sqlSessionFactory.getConfiguration().addMapper(ContactMapper.class); } } catch (FileNotFoundException fileNotFoundException) { fileNotFoundException.printStackTrace(); } catch (IOException iOException) { iOException.printStackTrace(); } } public static SqlSessionFactory getSqlSessionFactory() { return sqlSessionFactory; } } 4 – ContactDAO Now that we set up everything needed, let’s create our DAO. To call the sql statments, we need to do one more configuration, that is to set and get the mapper from the SqlSessionFactory. Then we just need to call the mapper method. It is a little bit different, but it does the same thing. package com.loiane.dao; import java.util.List; import org.apache.ibatis.session.SqlSession;import org.apache.ibatis.session.SqlSessionFactory; import com.loiane.data.ContactMapper;import com.loiane.model.Contact; public class ContactDAO { private SqlSessionFactory sqlSessionFactory; public ContactDAO(){ sqlSessionFactory = MyBatisConnectionFactory.getSqlSessionFactory(); } /** * Returns the list of all Contact instances from the database. * @return the list of all Contact instances from the database. */ public List selectAll(){ SqlSession session = sqlSessionFactory.openSession(); try { ContactMapper mapper = session.getMapper(ContactMapper.class); List list = mapper.selectAll(); return list; } finally { session.close(); } } /** * Returns a Contact instance from the database. * @param id primary key value used for lookup. * @return A Contact instance with a primary key value equals to pk. null if there is no matching row. */ public Contact selectById(int id){ SqlSession session = sqlSessionFactory.openSession(); try { ContactMapper mapper = session.getMapper(ContactMapper.class); Contact list = mapper.selectById(id); return list; } finally { session.close(); } } /** * Updates an instance of Contact in the database. * @param contact the instance to be updated. */ public void update(Contact contact){ SqlSession session = sqlSessionFactory.openSession(); try { ContactMapper mapper = session.getMapper(ContactMapper.class); mapper.update(contact); session.commit(); } finally { session.close(); } } /** * Updates an instance of Contact in the database. * @param name name value to be updated. * @param id primary key value used for lookup. */ public void updateName(String name, int id){ SqlSession session = sqlSessionFactory.openSession(); try { ContactMapper mapper = session.getMapper(ContactMapper.class); mapper.updateName(name, id); session.commit(); } finally { session.close(); } } /** * Insert an instance of Contact into the database. * @param contact the instance to be persisted. */ public void insert(Contact contact){ SqlSession session = sqlSessionFactory.openSession(); try { ContactMapper mapper = session.getMapper(ContactMapper.class); mapper.insert(contact); session.commit(); } finally { session.close(); } } /** * Delete an instance of Contact from the database. * @param id primary key value of the instance to be deleted. */ public void delete(int id){ SqlSession session = sqlSessionFactory.openSession(); try { ContactMapper mapper = session.getMapper(ContactMapper.class); mapper.delete(id); session.commit(); } finally { session.close(); } } 5 – Mapper Configuration File The MyBatis XML configuration file contains settings and properties that have a dramatic effect on how MyBatis behaves. This time, we do not need to configure alias or xml config files. We did all the magic with annotations, so we have a simple config file with only the information about the database we want to connect with. Download I suggest you to take a look at the org.apache.ibatis.annotations package and try to find out what each annotation can do. Unfortunatelly, you won’t find much documentation or examples on MyBatis website. I also created a TestCase class. If you want to download the complete sample project, you can get it from my GitHub account: https://github.com/loiane/ibatis-annotations-helloworld If you want to download the zip file of the project, just click on download: There are more articles about iBatis to come. Stay tooned! In next articles, I’m going to demonstrate how to implement the feature using XML and then Annotations (when it is possible). Happy Coding! From http://loianegroner.com/2011/02/getting-started-with-ibatis-mybatis-annotations/

February 22, 2011

by Loiane Groner

· 72,190 Views

Getting Started with iBatis (MyBatis): XML Configuration

This tutorial will walk you through how to setup iBatis (MyBatis) in a simple Java project and will present examples using simple insert, update, select and delete statements.

February 18, 2011

by Loiane Groner

· 112,623 Views

Solve Foreign-key Problems in DBUnit Test Data

If you create small per-test datasets, as DBUnit advises, you’ll get intermittent build failures due to foreign-key violations. This post explains (1) why this happens, (2) why small per-test datasets are still a good idea, and (3) one simple way to get around the problem. NB When I searched for solutions to this problem, I discovered that other kinds of foreign-key problem come up with DBUnit. Some people have circular dependencies in their relational database schemas, which stops DBUnit from loading the test data. If such is your case, I’m sorry to say that this post won’t help you with it, and your best option is probably to just take yourself outside and shoot yourself now. (Although some people seem to chosen instead to disable foreign key checking during test runs.) What causes the foreign-key violations The cause of the problem is simple, and illustrated by a trivial example. Suppose you have two entity classes, HitchHiker and SpaceShip. The HitchHiker table has a foreign key that references SpaceShip. The test data for HitchHikerDaoTest contains lines from both tables, whereas the test data for SpaceShipDaoTest contains only lines from SpaceShip. DBUnit’s default setup operation, CLEAN_INSERT, wipes data from every table occurring in the test dataset and then inserts the lines listed in that dataset. When SpaceShipDaoTest runs, DBUnit will start by deleting everything in the SpaceShip table. If any HitchHikers are currently riding in the SpaceShips that are about to be deleted, the database will object to their untimely eviction (I’m not sure whether the error message will read like Vogon poetry, though). If you start from an empty database, and execute SpaceShipDaoTest and then HitchHikerDaoTest, you’ll be fine; but if you do it in the other order, your build will fail. It’s that second-worst kind of bug, the unpredictable kind, since you don’t (usually) specify the order in which tests run. After all, they’re supposed to be independent! So you may well find that you have no problems for months on end, until one day you get an error running individual tests in a particular sequence, or Maven changes the order in which it runs your tests on the CI server, and BOOM! Why you should still use small independent datasets It’s tempting to circumvent the problem by using a single monolithic dataset for all your integration tests. I’ve tried this, and I advise against it. A big data file is hard to work with: you waste a lot of time scrolling around looking for the line you need, and it’s very hard to follow and understand foreign-key relations. Worse still: by modifying the data to make one test pass, you can easily accidentally break another one. The larger the dataset and the test suite become, the more fragile they get, and the more painstaking it becomes to modify them. How to avoid the foreign-key problem with small independent datasets One working but unsatisfactory solution would be to pad out every XML dataset with the list of all tables touched in the test suite. It’s unsatisfactory because the only way to add a table into a FlatXmlDataSet is to list a line of that table — a FlatXmlDataSet can’t contain empty tables — and there’s no justification for polluting the test data with lines from tables that are not part of the test. The solution I found was to use a DTD to clean tables before tests. Every XML file has different contents, but they all reference a single DTD which lists all the tables involved in the test suite. The DTD is easy to generate from the database schema, and useful for auto-complete and catching typos in column names, so you should probably already be using one. The code to exploit its contents is very simple: private IDataSet loadTestDataWithDtdTableList(String dtdFilename) throws IOException, DataSetException, SQLException { Reader dtdReader = new FileReader(new ClassPathResource(dtdFilename).getFile()); IDataSet dtdDataset = new FlatDtdDataSet(dtdReader); FlatXmlDataSetBuilder builder = new FlatXmlDataSetBuilder(); builder.setMetaDataSet(new DatabaseDataSet(dbUnitConnection, false)); IDataSet xmlDataset = builder.build(asFile(xmlFilename)); return new CompositeDataSet(dtdDataset, xmlDataset);} How it works: DBUnit provides a facility to load a dataset from a DTD. This dataset contains all the tables listed in the DTD, but of course empty of data. The DTD dataset is then combined with a FlatXmlDataSet representing your test data. The graphic below illustrates the composite dataset that would be produced for the SpaceShip example. If you have dictionary tables whose contents never change, you can and should leave them out of the DTD as well as out of the XML datasets, to improve test performance a little. One further detail: you should close the FileReader after test setup. I couldn’t find a hook into the end of the test setup operation (short of writing my own DatabaseOperation), so I saved the reference as a member variable and hooked the close() call into the tear-down phase of the test. NB For a more complete code example, see this Gist snippet of a base class for TestNG+Spring+DBUnit tests that adds the above-described DBUnit setup operation to Spring’s TestNG helper class. Happy database testing! From http://www.andrewspencer.net/2011/solve-foreign-key-problems-in-dbunit-test-data/

February 16, 2011

by Andrew Spencer

· 27,875 Views

Table sorting & pagination with jQuery and Razor in ASP.NET MVC

Introduction jQuery enjoys living inside pages which are built on top of ASP.NET MVC Framework. The ASP.NET MVC is a place where things are organized very well and it is quite hard to make them dirty, especially because the pattern enforces you on purity (you can still make it dirty if you want so ;) ). We all know how easy is to build a HTML table with a header row, footer row and table rows showing some data. With ASP.NET MVC we can do this pretty easy, but, the result will be pure HTML table which only shows data, but does not includes sorting, pagination or some other advanced features that we were used to have in the ASP.NET WebForms GridView. Ok, there is the WebGrid MVC Helper, but what if we want to make something from pure table in our own clean style? In one of my recent projects, I’ve been using the jQuery tablesorter and tablesorter.pager plugins that go along. You don’t need to know jQuery to make this work… You need to know little CSS to create nice design for your table, but of course you can use mine from the demo… So, what you will see in this blog is how to attach this plugin to your pure html table and a div for pagination and make your table with advanced sorting and pagination features. Demo Project Resources The resources I’m using for this demo project are shown in the following solution explorer window print screen: Content/images – folder that contains all the up/down arrow images, pagination buttons etc. You can freely replace them with your own, but keep the names the same if you don’t want to change anything in the CSS we will built later. Content/Site.css – The main css theme, where we will add the theme for our table too Controllers/HomeController.cs – The controller I’m using for this project Models/Person.cs – For this demo, I’m using Person.cs class Scripts – jquery-1.4.4.min.js, jquery.tablesorter.js, jquery.tablesorter.pager.js – required script to make the magic happens Views/Home/Index.cshtml – Index view (razor view engine) the other items are not important for the demo. ASP.NET MVC 1. Model In this demo I use only one Person class which defines Person entity with several properties. You can use your own model, maybe one which will access data from database or any other resource. Person.cs public class Person { public string Name { get; set; } public string Surname { get; set; } public string Email { get; set; } public int? Phone { get; set; } public DateTime? DateAdded { get; set; } public int? Age { get; set; } public Person(string name, string surname, string email, int? phone, DateTime? dateadded, int? age) { Name = name; Surname = surname; Email = email; Phone = phone; DateAdded = dateadded; Age = age; } } 2. View In our example, we have only one Index.chtml page where Razor View engine is used. Razor view engine is my favorite for ASP.NET MVC because it’s very intuitive, fluid and keeps your code clean. 3. Controller Since this is simple example with one page, we use one HomeController.cs where we have two methods, one of ActionResult type (Index) and another GetPeople() used to create and return list of people. HomeController.cs public class HomeController : Controller { // // GET: /Home/ public ActionResult Index() { ViewBag.People = GetPeople(); return View(); } public List GetPeople() { List listPeople = new List(); listPeople.Add(new Person("Hajan", "Selmani", "[email protected]", 070070070,DateTime.Now, 25)); listPeople.Add(new Person("Straight", "Dean", "[email protected]", 123456789, DateTime.Now.AddDays(-5), 35)); listPeople.Add(new Person("Karsen", "Livia", "[email protected]", 46874651, DateTime.Now.AddDays(-2), 31)); listPeople.Add(new Person("Ringer", "Anne", "[email protected]", null, DateTime.Now, null)); listPeople.Add(new Person("O'Leary", "Michael", "[email protected]", 32424344, DateTime.Now, 44)); listPeople.Add(new Person("Gringlesby", "Anne", "[email protected]", null, DateTime.Now.AddDays(-9), 18)); listPeople.Add(new Person("Locksley", "Stearns", "[email protected]", 2135345, DateTime.Now, null)); listPeople.Add(new Person("DeFrance", "Michel", "[email protected]", 235325352, DateTime.Now.AddDays(-18), null)); listPeople.Add(new Person("White", "Johnson", null, null, DateTime.Now.AddDays(-22), 55)); listPeople.Add(new Person("Panteley", "Sylvia", null, 23233223, DateTime.Now.AddDays(-1), 32)); listPeople.Add(new Person("Blotchet-Halls", "Reginald", null, 323243423, DateTime.Now, 26)); listPeople.Add(new Person("Merr", "South", "[email protected]", 3232442, DateTime.Now.AddDays(-5), 85)); listPeople.Add(new Person("MacFeather", "Stearns", "[email protected]", null, DateTime.Now, null)); return listPeople; } } TABLE CSS/HTML DESIGN Now, lets start with the implementation. First of all, lets create the table structure and the main CSS. 1. HTML Structure @{ Layout = null; } value value value So, this is the main structure you need to create for each of your tables where you want to apply the functionality we will create. Of course the scripts are referenced once ;). As you see, our table has class tablesorter and also we have a div with id pager. In the next steps we will use both these to create the needed functionalities. The complete Index.cshtml coded to get the data from controller and display in the page is: NameSurnameEmailPhoneDate Added @{ foreach (var p in ViewBag.People) { @[email protected]@[email protected]@p.DateAdded } } NameSurnameEmailPhoneDate Added 510203040 So, mainly the structure is the same. I have added @Razor code to create table with data retrieved from the ViewBag.People which has been filled with data in the home controller. 2. CSS Design The CSS code I’ve created is: /* DEMO TABLE */ body { font-size: 75%; font-family: Verdana, Tahoma, Arial, "Helvetica Neue", Helvetica, Sans-Serif; color: #232323; background-color: #fff; } table { border-spacing:0; border:1px solid gray;} table.tablesorter thead tr .header { background-image: url(images/bg.png); background-repeat: no-repeat; background-position: center right; cursor: pointer; } table.tablesorter tbody td { color: #3D3D3D; padding: 4px; background-color: #FFF; vertical-align: top; } table.tablesorter tbody tr.odd td { background-color:#F0F0F6; } table.tablesorter thead tr .headerSortUp { background-image: url(images/asc.png); } table.tablesorter thead tr .headerSortDown { background-image: url(images/desc.png); } table th { width:150px; border:1px outset gray; background-color:#3C78B5; color:White; cursor:pointer; } table thead th:hover { background-color:Yellow; color:Black;} table td { width:150px; border:1px solid gray;} PAGINATION AND SORTING Now, when everything is ready and we have the data, lets make pagination and sorting functionalities 1. jQuery Scripts referencing 2. jQuery Sorting and Pagination script So, with only two lines of code, I’m using both tablesorter and tablesorterPager plugins, giving some options to both these. Options added: tablesorter - widthFixed: true – gives fixed width of the columns tablesorter - sortList[[0,0]] – An array of instructions for per-column sorting and direction in the format: [[columnIndex, sortDirection], ... ] where columnIndex is a zero-based index for your columns left-to-right and sortDirection is 0 for Ascending and 1 for Descending. A valid argument that sorts ascending first by column 1 and then column 2 looks like: [[0,0],[1,0]] (source: http://tablesorter.com/docs/) tablesorterPager – container: $(“#pager”) – tells the pager container, the div with id pager in our case. tablesorterPager – size: the default size of each page, where I get the default value selected, so if you put selected to any other of the options in your select list, you will have this number of rows as default per page for the table too. END RESULTS 1. Table once the page is loaded (default results per page is 5 and is automatically sorted by 1st column as sortList is specified) 2. Sorted by Phone Descending 3. Changed pagination to 10 items per page 4. Sorted by Phone and Name (use SHIFT to sort on multiple columns) 5. Sorted by Date Added 6. Page 3, 5 items per page ADDITIONAL ENHANCEMENTS We can do additional enhancements to the table. We can make search for each column. I will cover this in one of my next blogs. Stay tuned. DEMO PROJECT You can download demo project source code from HERE. CONCLUSION Once you finish with the demo, run your page and open the source code. You will be amazed of the purity of your code. Working with pagination in client side can be very useful. One of the benefits is performance, but if you have thousands of rows in your tables, you will get opposite result when talking about performance. Hence, sometimes it is nice idea to make pagination on back-end. So, the compromise between both approaches would be best to combine both of them. I use at most up to 500 rows on client-side and once the user reach the last page, we can trigger ajax postback which can get the next 500 rows using server-side pagination of the same data. I would like to recommend the following blog post http://weblogs.asp.net/gunnarpeipman/archive/2010/09/14/returning-paged-results-from-repositories-using-pagedresult-lt-t-gt.aspx, which will help you understand how to return page results from repository. I hope this was helpful post for you. Wait for my next posts ;). Please do let me know your feedback. Best Regards, Hajan

February 14, 2011

by Hajan Selmani

· 77,354 Views

Introduction to iBatis (MyBatis), An alternative to Hibernate and JDBC

i started to write a new article series about ibatis / mybatis . this is the first article and it will walk you through what is ibatis / mybatis and why you should use it. for those who does not know ibatis / mybatis yet, it is a persistence framework – an alternative to jdbc and hibernate , available for java and .net platforms. i’ve been working with it for almost two years, and i am enjoying it! the first thing you may notice in this and following articles about ibatis/mybatis is that i am using both ibatis and mybatis terms. why? until june 2010, ibatis was under apache license and since then, the framework founders decided to move it to google code and they renamed it to mybatis. the framework is still the same though, it just has a different name now. i gathered some resources, so i am just going to quote them: what is mybatis/ibatis? the mybatis data mapper framework makes it easier to use a relational database with object-oriented applications. mybatis couples objects with stored procedures or sql statements using a xml descriptor. simplicity is the biggest advantage of the mybatis data mapper over object relational mapping tools.to use the mybatis data mapper, you rely on your own objects, xml, and sql. there is little to learn that you don’t already know. with the mybatis data mapper, you have the full power of both sql and stored procedures at your fingertips. ( www.mybatis.org ) ibatis is based on the idea that there is value in relational databases and sql, and that it is a good idea to embrace the industrywide investment in sql. we have experiences whereby the database and even the sql itself have outlived the application source code, and even multiple versions of the source code. in some cases we have seen that an application was rewritten in a different language, but the sql and database remained largely unchanged. it is for such reasons that ibatis does not attempt to hide sql or avoid sql. it is a persistence layer framework that instead embraces sql by making it easier to work with and easier to integrate into modern object-oriented software. these days, there are rumors that databases and sql threaten our object models, but that does not have to be the case. ibatis can help to ensure that it is not. ( ibatis in action book) so… what is ibatis ? a jdbc framework developers write sql, ibatis executes it using jdbc. no more try/catch/finally/try/catch. an sql mapper automatically maps object properties to prepared statement parameters. automatically maps result sets to objects. support for getting rid of n+1 queries. a transaction manager ibatis will provide transaction management for database operations if no other transaction manager is available. ibatis will use external transaction management (spring, ejb cmt, etc.) if available. great integration with spring, but can also be used without spring (the spring folks were early supporters of ibatis). what isn’t ibatis ? an orm does not generate sql does not have a proprietary query language does not know about object identity does not transparently persist objects does not build an object cache essentially, ibatis is a very lightweight persistence solution that gives you most of the semantics of an o/r mapping toolkit, without all the drama. in other words ,ibatis strives to ease the development of data-driven applications by abstracting the low-level details involved in database communication (loading a database driver, obtaining and managing connections, managing transaction semantics, etc.), as well as providing higher-level orm capabilities (automated and configurable mapping of objects to sql calls, data type conversion management, support for static queries as well as dynamic queries based upon an object’s state, mapping of complex joins to complex object graphs, etc.). ibatis simply maps javabeans to sql statements using a very simple xml descriptor. simplicity is the key advantage of ibatis over other frameworks and object relational mapping tools.( http://www.developersbook.com ) who is using ibatis/mybatis? see the list in this link: http://www.apachebookstore.com/confluence/oss/pages/viewpage.action?pageid=25 i think the biggest case is myspace , with millions of users. very nice! this was just an introduction, so in next articles i will show how to create an application using ibatis/mybatis – step-by-step. enjoy! from http://loianegroner.com/2011/02/introduction-to-ibatis-mybatis-an-alternative-to-hibernate-and-jdbc/

February 9, 2011

by Loiane Groner

· 42,479 Views · 5 Likes

Spring Data with Redis

The Spring Data project provides a solution for accessing data stored in new emerging technologies like NoSQL databases and cloud based services. When we look into the SpringSource git repository we see a lot of spring-data sub-projects: spring-data-commons: common interfaces and utility class for other spring-data projects. spring-data-column: support for column based databases. It has not started yet, but there will be support for Cassandra and HBase spring-data-document: support for document databases. Currently MongoDB and CouchDB are supported. spring-data-graph: support for graph based databases. Currently Neo4j is supported. spring-data-keyvalue: support for key-value databases. Currently Redis and Riak are supported and probably Membase will be supported in future. spring-data-jdbc-ext: JDBC extensions, as example Oracle RAC connection failover is implemented. spring-data-jpa: simplifies JPA based data access layer. I would like to share with you how you can use Redis. The first step is to download it from the redis.io web page. try.redis-db.com is a useful site where we can run Redis commands. It also provides a step by step tutorial. This tutorial shows us all structures that Redis supports (list, set, sorted set and hashes) and some useful commands. A lot of reputable sites use Redis today. After download and unpacking we should compile Redis (version 2.2, the release candidate is the preferable one to use since some commands do not work in version 2.0.4). make sudo make install Once we run these commands we are all set to run the following five commands: redis-benchmark - for benchmarking Redis server redis-check-aof - check the AOF (Aggregate Objective Function), and it can repair that. redis-check-dump - check rdb files for unprocessable opcodes. redis-cli - Redis client. redis-server - Redis server. We can test Redis server. redis-server [1055] 06 Jan 18:19:15 # Warning: no config file specified, using the default config. In order to specify a config file use 'redis-server /path/to/redis.conf' [1055] 06 Jan 18:19:15 * Server started, Redis version 2.0.4 [1055] 06 Jan 18:19:15 * The server is now ready to accept connections on port 6379 [1055] 06 Jan 18:19:15 - 0 clients connected (0 slaves), 1074272 bytes in use and Redis client. redis-cli redis> set my-super-key "my-super-value" OK Now we create a simple Java project in order to show how simple a spring-data-redis module really is. mvn archetype:create -DgroupId=info.pietrowski -DpackageName=info.pietrowski.redis -DartifactId=spring-data-redis -Dpackage=jar Next we have to add in pom.xml milestone spring repository, and add spring-data-redis as a dependency. After that all required dependencies will be fetched. Next we create a resources folder under the main folder, and create application.xml which will have all the configuration. We can configure the JedisConnectionFactory, in two different ways, One - we can provide a JedisShardInfo object in shardInfo property. Two - we can provide host (default localhost), port (default 6379), password (default empty) and timeout (default 2000) properties. One thing to keep in mind is that the JedisShardInfo object has precedence and allows to setup weight, but only allows constructor injection. We can setup the factory to use connection pooling by setting the value of the pooling property to 'true' (default). See application.xml comments to see three different way of configuration. Note: There are two different libraries supported: Jedis and JRedis. They have very similar names and both have the same factory name. See the difference: org.springframework.data.keyvalue.redis.connection.jedis.JedisConnectionFactory org.springframework.data.keyvalue.redis.connection.jredis.JredisConnectionFactory Similar to what we do in Spring, we configure the template object by providing it with a connection factory. We will perform all the operations through this template object. By default we need to provide only Connection Factory, but there are more properties we can provide: exposeConnection (default false) - if we return real connection or proxy object. keySerializer, hashKeySerializer, valueSerializer, hashValueSerializer (default JdkSerializationRedisSerializer) which delegates serialization to Java serialization mechanism. stringSerializer (default StringRedisSerializer) which is simple String to byte[] (and back) serializer with UTF8 encoding. We are ready to execute some code which will be cooperating with the Redis instance. Spring-Data provides us with two ways of interaction, First is by using the execute method and providing a RedisCallback object. Second is by using *Operations helpers (these will be explained later) When we are using RedisCallback we have access to low level Redis commands, see this list of interfaces (I won't put all the methods here because it is huge list): RedisConnection - gathers all Redis commands plus connection management. RedisCommands - gathers all Redis commands (listed beloved). RedisHashCommands - Hash-specific Redis commands. RedisListCommands - List-specific Redis commands. RedisSetCommands - Set-specific Redis commands. RedisStringCommands - key/value specific Redis commands. RedisTxCommands - Transaction/Batch specific Redis commands. RedisZSetCommands - Sorted Set-specific Redis commands. Check RedisCallbackExample class, this was the hard way and the problem is we have to convert our objects into byte arrays in both directions, the second way is easier. Spring Data provides for us with Operations objects, so we have much more simpler API and all byte<->object conversion is made by serializer we setup (or the default one). Higher level API (you will easily recognize *Operation *Commands equivalents): HashOperations - Redis hash operations. ListOperations - Redis list operations. SetOperations - Redis set operations. ValueOperations - Redis 'string' operations. ZSetOperations - Redis sorted set operations. Most of methods get key as first parameters so we have an even better API for multiple operations on the same key: BoundHashOperations - Redis hash operations for specific key. BoundListOperations - Redis list operations for specific key. BoundSetOperations - Redis set operations for specific key. BoundValueOperations - Redis 'string' operations for specific key. BoundZSetOperations - Redis sorted set operations for specific key. Check RedisCallbackExample class to see some easy examples of *Operations usage. One important thing to mention is that you should use stringSerializers for keys, otherwise you will have problems from other clients, because standard serialization adds class information. Otherwise you end up keys such as: "\xac\xed\x00\x05t\x00\x05atInt" "\xac\xed\x00\x05t\x00\nmySuperKey" "\xac\xed\x00\x05t\x00\bsuperKey" Up until now we have just checked the API for Redis, but Spring Data offers more for us. All the cool stuff is in org.springframework.data.keyvalue.redis.support package and all sub-packages. We have: RedisAtomicInteger - Atomic integer (CAS operation) backed by Redis. RedisAtomicLong - Same as previous for Long. RedisList - Redis extension for List, Queue, Deque, BlockingDeque and BlockingQueue with two additional methods List range(start, end) and RedisList trim(start, end). RedisSet - Redis extension for Set with additional methods: diff, diffAndStore, intersect, intersectAndStore, union, unionAndStore. RedisZSet - Redis extension for SortedSet. Note that Comparator is not applicable here so this interface extends normal Set and provide proper methods similar to SortedSet. RedisMap - Redis extension for Map with additional Long increment(key, delta) method Every interface currently has one Default implementation. Check application-support.xml for examples of configuration and RedisSupportClassesExample for examples of use. There is lot of useful information in the comments as well. Summary The library is a first milestone release so there are minor bugs, the documentation isn't as perfect as we used to and the current version needs no stable Redis server. But this is definitely a great library which allows us to use all this cool NoSQL stuff in a "standard" Spring Data Access manner. Awesome job! This post is only useful if you checkout the code: from bitbucket , for the lazy ones here is spring-data-redis zip file as well. This post is originally from http://pietrowski.info/2011/01/spring-data-redis-tutorial/

February 3, 2011

by Sebastian Pietrowski

· 30,998 Views

Apache Solr: Get Started, Get Excited!

we've all seen them on various websites. crappy search utilities. they are a constant reminder that search is not something you should take lightly when building a website or application. search is not just google's game anymore. when a java library called lucene was introduced into the apache ecosystem, and then solr was built on top of that, open source developers began to wield some serious power when it came to customizing search features. in this article you'll be introduced to apache solr and a wealth of applications that have been built with it. the content is divided as follows: introduction setup solr applications summary 1. introduction apache solr is an open source search server. it is based on the full text search engine called apache lucene . so basically solr is an http wrapper around an inverted index provided by lucene. an inverted index could be seen as a list of words where each word-entry links to the documents it is contained in. that way getting all documents for the search query "dzone" is a simple 'get' operation. one advantage of solr in enterprise projects is that you don't need any java code, although java itself has to be installed. if you are unsure when to use solr and when lucene, these answers could help. if you need to build your solr index from websites, you should take a look into the open source crawler called apache nutch before creating your own solution. to be convinced that solr is actually used in a lot of enterprise projects, take a look at this amazing list of public projects powered by solr . if you encounter problems then the mailing list or stackoverflow will help you. to make the introduction complete i would like to mention my personal link list and the resources page which lists books, articles and more interesting material. 2. setup solr 2.1. installation as the very first step, you should follow the official tutorial which covers the basic aspects of any search use case: indexing - get the data of any form into solr. examples: json, xml, csv and sql-database. this step creates the inverted index - i.e. it links every term to its documents. querying - ask solr to return the most relevant documents for the users' query to follow the official tutorial you'll have to download java and the latest version of solr here . more information about installation is available at the official description . next you'll have to decide which web server you choose for solr. in the official tutorial, jetty is used, but you can also use tomcat. when you choose tomcat be sure you are setting the utf-8 encoding in the server.xml . i would also research the different versions of solr, which can be quite confusing for beginners: the current stable version is 1.4.1. use this if you need a stable search and don't need one of the latest features. the next stable version of solr will be 3.x the versions 1.5 and 2.x will be skipped in order to reach the same versioning as lucene. version 4.x is the latest development branch. solr 4.x handles advanced features like language detection via tika, spatial search , results grouping (group by field / collapsing), a new "user-facing" query parser ( edismax handler ), near real time indexing, huge fuzzy search performance improvements, sql join-a like feature and more. 2.2. indexing if you've followed the official tutorial you have pushed some xml files into the solr index. this process is called indexing or feeding. there are a lot more possibilities to get data into solr: using the data import handler (dih) is a really powerful language neutral option. it allows you to read from a sql database, from csv, xml files, rss feeds, emails, etc. without any java knowledge. dih handles full-imports and delta-imports. this is necessary when only a small amount of documents were added, updated or deleted. the http interface is used from the post tool, which you have already used in the official tutorial to index xml files. client libraries in different languages also exist. (e.g. for java (solrj) or python ). before indexing you'll have to decide which data fields should be searchable and how the fields should get indexed. for example, when you have a field with html in it, then you can strip irrelevant characters , tokenize the text into 'searchable terms', lower case the terms and finally stem the terms . in contrast, if you would have a field with text in it that should not be interpreted (e.g. urls) you shouldn't tokenize it and use the default field type string. please refer to the official documentation about field and field type definitions in the schema.xml file. when designing an index keep in mind the advice from mauricio : "the document is what you will search for. " for example, if you have tweets and you want to search for similar users, you'll need to setup a user index - created from the tweets. then every document is a user. if you want to search for tweets, then setup a tweet index; then every document is a tweet. of course, you can setup both indices with the multi index options of solr. please also note that there is a project called solr cell which lets you extract the relevant information out of several different document types with the help of tika. 2.3. querying for debugging it is very convenient to use the http interface with a browser to query solr and get back xml. use firefox and the xml will be displayed nicely: you can also use the velocity contribution , a cross-browser tool, which will be covered in more detail in the section about 'search application prototyping' . to query the index you can use the dismax handler or standard query handler . you can filter and sort the results: q=superman&fq=type:book&sort=price asc you can also do a lot more ; one other concept is boosting. in solr you can boost while indexing and while querying. to prefer the terms in the title write: q=title:superman^2 subject:superman when using the dismax request handler write: q=superman&qf=title^2 subject check out all the various query options like fuzzy search , spellcheck query input , facets , collapsing and suffix query support . 3. applications now i will list some interesting use cases for solr - in no particular order. to see how powerful and flexible this open source search server is. 3.1. drupal integration the drupal integration can be seen as generic use case to integrate solr into php projects. for the php integration you have the choice to either use the http interface for querying and retrieving xml or json. or to use the php solr client library . here is a screenshot of a typical faceted search in drupal : for more information about faceted search look into the wiki of solr . more php projects which integrates solr: open source typo3- solr module magento enterprise - solr module . the open source integration is out dated. oxid - solr module . no open source integration available. 3.2. hathi trust the hathi trust project is a nice example that proves solr's ability to search big digital libraries. to quote directly from the article : "... the index for our one million book index is over 200 gigabytes ... so we expect to end up with a two terabyte index for 10 million books" other examples for libraries: vufind - aims to replace opac internet archive national library of australia 3.3. auto suggestions mainly, there are two approaches to implement auto-suggestions (also called auto-completion) with solr: via facets or via ngramfilterfactory . to push it to the extreme you can use a lucene index entirely in ram. this approach is used in a large music shop in germany. live examples for auto suggestions: kaufda.de 3.4. spatial search applications when mentioning spatial search, people have geographical based applications in mind. with solr, this ordinary use case is attainable . some examples for this are : city search - city guides yellow pages kaufda.de spatial search can be useful in many different ways : for bioinformatics, fingerprints search, facial search, etc. (getting the fingerprint of a document is important for duplicate detection). the simplest approach is implemented in jetwick to reduce duplicate tweets, but this yields a performance of o(n) where n is the number of queried terms. this is okay for 10 or less terms, but it can get even better at o(1)! the idea is to use a special hash set to get all similar documents. this technique is called local sensitive hashing . read this nice paper about 'near similarity search and plagiarism analysis' for more information. 3.5. duckduckgo duckduckgo is made with open source and its "zero click" information is done with the help of solr using the dismax query handler: the index for that feature contains 18m documents and has a size of ~12gb. for this case had to tune solr: " i have two requirements that differ a bit from most sites with respect to solr: i generally only show one result, with sometimes a couple below if you click on them. therefore, it was really important that the first result is what people expected. false positives are really bad in 0-click, so i needed a way to not show anything if a match wasn't too relevant. i got around these by a) tweaking dismax and schema and b) adding my own relevancy filter on top that would re-order and not show anything in various situations. " all the rest is done with tuned open source products. to quote gabriel again: "the main results are a hybrid of a lot of things, including external apis, e.g. bing, wolframalpha, yahoo, my own indexes and negative indexes (spam removal), etc. there are a bunch of different types of data i'm working with. " check out the other cool features such as privacy or bang searches . 3.6. clustering support with carrot2 carrot2 is one of the "contributed plugins" of solr. with carrot2 you can support clustering : " clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. " see some research papers regarding clustering here . here is one visual example when applying clustering on the search "pannous" - our company : 3.7. near real time search solr isn't real time yet, but you can tune solr to the point where it becomes near real time, which means that the time ('real time latency') that a document takes to be searchable after it gets indexed is less than 60 seconds even if you need to update frequently. to make this work, you can setup two indices. one write-only index "w" for the indexer and one read-only index "r" for your application. index r refers to the same data directory of w, which has to be defined in the solrconfig.xml of r via: /pathto/indexw/data/ to make sure your users and the r index see the indexed documents of w, you have to trigger an empty commit every 60 seconds: wget -q http://localhost:port/solr/update?stream.body=%3ccommit/%3e -o /dev/null everytime such a commit is triggered a new searcher without any cache entries is created. this can harm performance for visitors hitting the empty cache directly after this commit, but you can fill the cache with static searches with the help of the newsearcher entry in your solrconfig.xml. additionally, the autowarmcount property needs to be tuned, which fills the cache with a newsearcher from old entries. also, take a look at the article 'scaling lucene and solr' , where experts explain in detail what to do with large indices (=> 'sharding') and what to do for high query volume (=> 'replicating'). 3.8. loggly = full text search in logs feeding log files into solr and searching them at near real-time shows that solr can handle massive amounts of data and queries the data quickly. i've setup a simple project where i'm doing similar things , but loggly has done a lot more to make the same task real-time and distributed. you'll need to keep the write index as small as possible otherwise commit time will increase too great. loggly creates a new solr index every 5 minutes and includes this when searching using the distributed capabilities of solr ! they are merging the cores to keep the number of indices small, but this is not as simple as it sounds. watch this video to get some details about their work. 3.9. solandra = solr + cassandra solandra combines solr and the distributed database cassandra , which was created by facebook for its inbox search and then open sourced. at the moment solandra is not intended for production use. there are still some bugs and the distributed limitations of solr apply to solandra too. tthe developers are working very hard to make solandra better. jetwick can now run via solandra just by changing the solrconfig.xml. solandra also has the advantages of being real-time (no optimize, no commit!) and distributed without any major setup involved. the same is true for solr cloud. 3.10. category browsing via facets solr provides facets , which make it easy to show the user some useful filter options like those shown in the "drupal integration" example. like i described earlier , it is even possible to browse through a deep category tree. the main advantage here is that the categories depend on the query. this way the user can further filter the search results with this category tree provided by you. here is an example where this feature is implemented for one of the biggest second hand stores in germany. a click on 'schauspieler' shows its sub-items: other shops: game-change 3.11. jetwick - open twitter search you may have noticed that twitter is using lucene under the hood . twitter has a very extreme use case: over 1,000 tweets per second, over 12,000 queries per second, but the real-time latency is under 10 seconds! however, the relevancy at that volume is often not that good in my opinion. twitter search often contains a lot of duplicates and noise. reducing this was one reason i created jetwick in my spare time. i'm mentioning jetwick here because it makes extreme use of facets which provides all the filters to the user. facets are used for the rss-alike feature (saved searches), the various filters like language and retweet-count on the left, and to get trending terms and links on the right: to make jetwick more scalable i'll need to decide which of the following distribution options to choose: use solr cloud with zookeeper use solandra move from solr to elasticsearch which is also based on apache lucene other examples with a lot of facets: cnet reviews - product reviews. electronics reviews, computer reviews & more. shopper.com - compare prices and shop for computers, cell phones, digital cameras & more. zappos - shoes and clothing. manta.com - find companies. connect with customers. 3.12. plaxo - online address management plaxo.com , which is now owned by comcast, hosts web addresses for more than 40 million people and offers smart search through the addresses - with the help of solr. plaxo is trying to get the latest 'social' information of your contacts through blog posts, tweets, etc. plaxo also tries to reduce duplicates . 3.13. replace fast or google search several users report that they have migrated from a commercial search solution like fast or google search appliance (gsa) to solr (or lucene). the reasons for that migration are different: fast drops linux support and google can make integration problems. the main reason for me is that solr isn't a black box —you can tweak the source code, maintain old versions and fix your bugs more quickly! 3.14. search application prototyping with the help of the already integrated velocity plugin and the data import handler it is possible to create an application prototype for your search within a few hours. the next version of solr makes the use of velocity easier. the gui is available via http://localhost:port/solr/browse if you are a ruby on rails user, you can take a look into flare. to learn more about search application prototyping, check out this video introduction and take a look at these slides. 3.15. solr as a whitelist imagine you are the new google and you have a lot of different types of data to display e.g. 'news', 'video', 'music', 'maps', 'shopping' and much more. some of those types can only be retrieved from some legacy systems and you only want to show the most appropriated types based on your business logic . e.g. a query which contains 'new york' should result in the selection of results from 'maps', but 'new yorker' should prefer results from the 'shopping' type. with solr you can set up such a whitelist-index that will help to decide which type is more important for the search query. for example if you get more or more relevant results for the 'shopping' type then you should prefer results from this type. without the whitelist-index - i.e. having all data in separate indices or systems, would make it nearly impossible to compare the relevancy. the whitelist-index can be used as illustrated in the next steps. 1. query the whitelist-index, 2. decide which data types to display, 3. query the sub-systems and 4. display results from the selected types only. 3.16. future solr is also useful for scientific applications, such as a dna search systems. i believe solr can also be used for completely different alphabets so that you can query nucleotide sequences - instead of words - to get the matching genes and determine which organism the sequence occurs in, something similar to blast . another idea you could harness would be to build a very personalized search. every user can drag and drop their websites of choice and query them afterwards. for example, often i only need stackoverflow, some wikis and some mailing lists with the expected results, but normal web search engines (google, bing, etc.) give me results that are too cluttered. my final idea for a future solr-based app could be a lucene/solr implementation of desktop search. solr's facets would be especially handy to quickly filter different sources (files, folders, bookmarks, man pages, ...). it would be a great way to wade through those extra messy desktops. 4. summary the next time you think about a problem, think about solr! even if you don't know java and even if you know nothing about search: solr should be in your toolbox. solr doesn't only offer professional full text search, it could also add valuable features to your application. some of them i covered in this article, but i'm sure there are still some exciting possibilities waiting for you!

January 25, 2011

by Peter Karussell

· 147,309 Views

Linqer – a nice tool for SQL to LINQ transition

Almost all .NET developers who have been working in several applications up to date are probably familiar with writing SQL queries for specific needs within the application. Before LINQ as a technology came on scene, my daily programming life was about 60-70% of the day writing code either in the front-end (ASPX, JavaScript, jQuery, HTML/CSS etc…) or in the back-end (C#, VB.NET etc…), and about 30-40% writing SQL queries for specific needs used within the application. Now, when LINQ is there, I feel that the percentage for writing SQL queries got down to about 10% per day. I don’t say it won’t change with time depending what technology I use within the projects or what way would be better, but since I’m writing a lot LINQ code in the latest projects, I thought to see if there is a tool that can automatically translate SQL to LINQ so that I can transfer many queries as a LINQ statements within the code. Linqer is a tool that I have tested in the previous two weeks and I see it works pretty good. Even I’m not using it yet to convert SQL to LINQ code because I did it manually before I discovered that Linqer could have really helped me, I would recommend it for those who are just starting with LINQ and have knowledge of writing SQL queries. Let’s pass through several steps so that I will help you get started faster… 1. Go to http://www.sqltolinq.com/ website and download the version you want. There is a Linqer Version 4.0.1 for .NET 4.0 or Linqer Version 3.5.1 for .NET 3.5. 2. Once you download the zip file, extract it and launch the Linqer4Inst.exe then add install location. In the location you will add, the Linqer.exe will be created. 3. Launch the Linqer.exe. Once you run it for first time, the Linqer Connection Pool will be displayed so that you can create connection to your existing Model Click the Add button Right after this, the following window will appear #1 – The name of the connection string you are creating #2 – Click “…” to construct your connection string using Wizard window #3 – Chose your language, either C# or VB #4 – Model LINQ to SQL or LINQ to Entities Right after you select LINQ to SQL, the options to select the files for the Model will be displayed. In our case I will select LINQ to SQL, and here is the current progress So, you can select existing model from your application or you can Generate LINQ to SQL Files so that the *.dbml and *.designer.cs will be automatically filled #5 – At the end, you can chose your context name of the model which will be used when generating the LINQ code Once you are done, click OK. You will get back to the parent window filled with all needed info and click Close. Note: You can later add additional connections in your Linqer Connections Pool from Tools –> Linqer Connections In the root folder where your Linqer.exe is placed, now you have Linqer.ini file containing the Connection string settings. Ok, now lets go to the interesting part. Lets create one (first) simple SQL query and try to translate it to LINQ statement. SQL Query select * from authors a where a.city = 'Oakland' If we add this query to Linqer, here is the result: So, the LINQ code is similar to the SQL code and is easy to read since it’s simple. Also, if you notice, the tool generates class (you can add class name) with prepared code for using in your project. Perfect! Now, lets try to translate a query with two joined tables (little bit more complex): SQL Query select * from employee left join publishers on employee.pub_id = publishers.pub_id where employee.fname like '%a' The LINQ generated code is: from employee in db.Employee join publishers in db.Publishers on employee.Pub_id equals publishers.Pub_id into publishers_join from publishers in publishers_join.DefaultIfEmpty() where employee.Fname.EndsWith("a") select new { employee.Emp_id, employee.Fname, employee.Minit, employee.Lname, employee.Job_id, employee.Job_lvl, employee.Pub_id, employee.Hire_date, Column1 = publishers.Pub_id, Pub_name = publishers.Pub_name, City = publishers.City, State = publishers.State, Country = publishers.Country } So, if you can notice the where clause, we said in the SQL query: ... like "%a" and the corresponding LINQ code in C# is ... EndsWith("a"); - Excellent! And the Class automatically generated by the tool is public class EmployeePubClass { private String _Emp_id; private String _Fname; private String _Minit; private String _Lname; private Int16? _Job_id; private Byte? _Job_lvl; private String _Pub_id; private DateTime? _Hire_date; private String _Column1; private String _Pub_name; private String _City; private String _State; private String _Country; public EmployeePubClass( String AEmp_id, String AFname, String AMinit, String ALname, Int16? AJob_id, Byte? AJob_lvl, String APub_id, DateTime? AHire_date, String AColumn1, String APub_name, String ACity, String AState, String ACountry) { _Emp_id = AEmp_id; _Fname = AFname; _Minit = AMinit; _Lname = ALname; _Job_id = AJob_id; _Job_lvl = AJob_lvl; _Pub_id = APub_id; _Hire_date = AHire_date; _Column1 = AColumn1; _Pub_name = APub_name; _City = ACity; _State = AState; _Country = ACountry; } public String Emp_id { get { return _Emp_id; } } public String Fname { get { return _Fname; } } public String Minit { get { return _Minit; } } public String Lname { get { return _Lname; } } public Int16? Job_id { get { return _Job_id; } } public Byte? Job_lvl { get { return _Job_lvl; } } public String Pub_id { get { return _Pub_id; } } public DateTime? Hire_date { get { return _Hire_date; } } public String Column1 { get { return _Column1; } } public String Pub_name { get { return _Pub_name; } } public String City { get { return _City; } } public String State { get { return _State; } } public String Country { get { return _Country; } } } public class List: List { public List(Pubs db) { var query = from employee in db.Employee join publishers in db.Publishers on employee.Pub_id equals publishers.Pub_id into publishers_join from publishers in publishers_join.DefaultIfEmpty() where employee.Fname.EndsWith("a") select new { employee.Emp_id, employee.Fname, employee.Minit, employee.Lname, employee.Job_id, employee.Job_lvl, employee.Pub_id, employee.Hire_date, Column1 = publishers.Pub_id, Pub_name = publishers.Pub_name, City = publishers.City, State = publishers.State, Country = publishers.Country }; foreach (var r in query) Add(new EmployeePubClass( r.Emp_id, r.Fname, r.Minit, r.Lname, r.Job_id, r.Job_lvl, r.Pub_id, r.Hire_date, r.Column1, r.Pub_name, r.City, r.State, r.Country)); } } Great! We have ready-to-use class for our application and we don't need to type all this code. Besides this way to generate code, you can in same time use this tool to see the db results I like this tool because mainly it’s very easy to use, lightweight and does the job pretty straight forward. You can try the tool and send me feedback using the comments in this blog post.

January 24, 2011

by Hajan Selmani

· 67,483 Views

Deterring “ToMany” Relationships in JPA models

This article considers the issues of one to many relationships from the JPA model, and looks at an alternative strategy to provide more efficient and fine grained data access, to build more robust and lightweight applications and web services. A fairly typical use is to have one entity ‘owned’ by the other in such a way that one entity is said to ‘have’ many instances of the other one. A typical example would be customer and orders : class Customer { @OneToMany(mappedBy="customer") private Set orders; } class Order { @ManyToOne private Customer customer; } In this trivial example, the order belongs to a customer, and the customer has a set of orders. We don’t have a problem with the ManyToOne relationship, especially as it is required in order to map the order back to the customer. When we load an order we will at most get a single reference to a customer. No, our problem is with the value we get from customer.getOrders() as this set of order entities doesn’t really serve any useful purpose and can cause more problems than it solves for the following reasons : Dumb Relationship – It will contain every order for this particular customer when you usually only want a subset of the orders that match a set of criteria. You either have to read them all and filter the ones you don’t want manually (which is what SQL is for) or you end up having to make a call to a method to get the specific entities you are interested in. Unbounded dataset – How many orders a customer has could vary and you could end up with a customer with thousands of orders. Combined with accidental eager fetching and loading a simple list of 10 people could mean loading thousands of entities. Unsecured Access – Sometimes we may want to restrict the items visible to the user based on their security rights. By making it available as a property controlled by JPA we lose that ability or have to implement it further down in the application stack. No Pagination – Similar to the unbounded dataset, you end up throwing the whole list into the pagination components and letting them sort out what to display. In most cases, you need to treat each dataset like it will eventually contain more than 30 records so you really need to consider pagination early. Overgrown object graph – When you request an entity, how much of the object graph do you need? How do you know which pieces to initialize so you can avoid LIEs? This is often the case with JPA, but is also more relevant when you take account of the needs to serialize object graphs to XML or JSON. Sometimes you might need the relationships and sometimes you do not depending on the context you will be using the data in. Rife with pitfalls – Who saves and cascades what and how do you bind one to the other? You create an order, and assign the customer, do you need to then add it to the customers list of orders or not. What happens if you forget to add it to the customer and you save the customer? Whatever strategy you pick for dealing with this will no doubt end up being implemented inconsistently. (Ok, the first four are really different facets of the same problem, that you can’t control the data you are getting back.) So what use are they? Well, they make it really tempting just to use customer.orders in the view which is suitable for some sets of data. They also allow the relationship to be used in ejbql statements, although the inverse of the relationship can also be used in most cases. Specifying this relationship can also allow you to cascade updates/deletes from the customer to the order, but then so can your database. Going Granular The best alternative I’ve found is to provide additional methods to obtain the relational information separate from the model. This more granular approach gives you plenty of ways of obtaining data from the database without the dangers and temptations of bad practices. For example, the Order object still has the Customer reference on it, which we use to obtain lists of orders from the data access layer which can be constrained by customer, time frame, or other criteria depending on where it is being used. Also, it allows data to be fetched when needed without having to define a single initialization strategy using annotations or mapping files. The code that knows what pieces of data it needs will have access to facilities to fetch the specific data it needs. Alternatively, the methods to fetch the data can either be exposed as web services directly or DTO objects can be used to build a data payload to be returned from a single web service that consolidates the calls. Regardless, you don’t need to worry about setting the JPA fetch or XML/JSON serialization policy permanently in the model. Some examples might be to fetch orders for a customer in different ways. public List getOrders(Long customerId) {...} public List getOrders(Long customerId,Date startDate,Date endDate) {...} public List getOrders(SearchCriteria searchCriteria,int firstResult,int pageSize) {...} What about @ManyToMany Good question. In most cases I find that what starts as a many to many relationship can usually be modeled as a separate entity because when you create a many to many relationship, there is usually additional information stored with that relationship. For example, a Users and Groups ManyToMany relationship has many users belonging to many groups and vice versa. The membership however also probably has start and end dates and also maybe a role within that group. This also exhibits one of the earlier problems in that user.getGroupMemberships() would return all group memberships past and present whereas you probably only want the active ones. Modeling it as a separate entity means it becomes an entity with two OneToMany relationships. While there are cases where the many to many relationship is literally just a pair of ids (think blog post tags, many tags to many posts), you could benefit at a later date by using an entity if you decide to add additional information into the relationship. In summary, moving relational fetches out of the data model and into the data layer means you remove some of the temptations of bad practices and create a library of reusable functions for fetching the data that can be used from different code points. From http://www.andygibson.net/blog/article/deterring-tomany-relationships-in-jpa-models/

January 18, 2011

by Andy Gibson

· 23,333 Views · 1 Like

EMF Tips: Accessing Model Meta Data, Serializing into Element/Attribute

Two tips for the Eclipse Modeling Framework (EMF) 2.2.1: Accessing model’s meta model – accessing EClass/attribute by name – so that you can set an attribute when you only know its name and haven’t its EAttribute How to force EMF to serialize an object as an XML element instead of an XML attribute Excuse my rather liberal and slightly confusing use of the terms model, model instance and meta model. Tip 1: Accessing model’s meta model – accessing EClass/attribute by name Normally you cannot do any operation on a dynamic EMF model instance – such as instantiating an EObject or settings its property via eSet – without the corresponding meta model objects such as EClass and EAttribute. But there is a solution – you can build an ExtendedMetaData instance and use its methods to find the meta model objects based on search criteria such as element name and namespace. Examples Create ExtendedMetaData from a model One way to build a meta data instance is to instantiate BasicExtendedMetaData based on a Registry, containing all registered packages. This is usually available either via ResourceSet.getPackageRegistry() or globally, via EPackage.Registry.INSTANCE. import org.eclipse.emf.ecore.util.*; ... ExtendedMetaData modelMetaData = new BasicExtendedMetaData(myResourceSet.getPackageRegistry()); Get EClass for a namespace and name Provided that your model contains an element with the name Book and namespace http://example.com/book: EClass bookEClass = (EClass) modelMetaData.getType("http://example.com/book", "Book"); Get EClass’ attribute by name Beware: Some properties (such as those described by named complex types) are not represented by an EAttribute but an EReference (both extend EStructuralFeature) and are accessible as the EClass’ elements, not attributes, even though from developer’s points of view they’re attributes of the owning class. Let’s suppose that the book has the attribute name: EStructuralFeature nameAttr = modelMetaData.getElement(bookEClass, null, "name"); The namespace is null because normally attributes/nested elements are not classified with a schema. Here is how you would print the name and namespace URI of an attribute/element: System.out.println("attr: " + modelMetaData.getNamespace(nameAttr) + ":" + nameAttr.getName()); //prints "null:name" Tip 2: How to force EMF to serialize an object as an XML element instead of an XML attribute Normally EMF stores simple Java properties as attributes of the XML element representing the owning class: but you might prefer to have it rather as a nested element: The Book of Songs To achieve that: Enable the save option OPTION_EXTENDED_META_DATA (so that extended meta data such as annotations and an XML map aren’t ignored) Tell EMF that you want this property to be stored as an element By attaching an eAnnotation to it (not shown) By supplying a XML Map with this information upon save To enable the extended metadata: Map saveOptions = new HashMap(); saveOptions.put(XMLResource.OPTION_EXTENDED_META_DATA, Boolean.TRUE); According to some documentation the value should be an implementation of ExtendedMetaData, according to others Boolean.TRUE is the correct choice – I use the latter for it’s easier and works for me. To tell EMF to write a property as an element when serailizing to XML: import org.eclipse.emf.ecore.xmi.impl.*; ... EAttribute bookNameEAttribute = ...; // retrieved e.g. from meta data, see Tip 1 XMLMapImpl map = new XMLMapImpl(); XMLInfoImpl x = new XMLInfoImpl(); x.setXMLRepresentation(XMLInfoImpl.ELEMENT); map.add(bookNameEAttribute, x); saveOptions.put(XMLResource.OPTION_XML_MAP, map); The XMLInfoImpl enables you to customize the namespace, name and representation of the element. When saving you then just supply the save options: EObject target = ...; org.eclipse.emf.ecore.resource.Resource outResource = ...; outResource.getContents().add(target); outResource.save(saveOptions); Reference: the Redbook Eclipse Development using the Graphical Editing Framework and the Eclipse Modeling Framework, page 74, section 2.3.4 From http://theholyjava.wordpress.com/2011/01/11/emf-tips-accessing-model-meta-data-serializing-into-elementattribute/

January 12, 2011

by Jakub Holý

· 14,524 Views

Are the JPA callback methods useful?

Definitely no. The section 3.5 of JPA specification states: “In general, the lifecycle method of a portable application should not invoke EntityManager or Query operations, access other entity instances, or modify relationships within the same persistence context. A lifecycle callback method may modify the non-relationship state of the entity on which it is invoked.” Surely these restrictions has a good technical reason behind them, but from a business application developer perspective they mean that JPA callback methods are practically useless. For example, these scenarios are typical: In order to remove some entity we need to verify if some data exists, and we want do it using a JPA query. When an entity is saved, some other entities must be automatically created and saved, and we want to use the JPA EntityManager to do so. Unfortunately, it’s difficult to solve such cases using the standard annotations: @PrePersist, @PostPersist, @PreRemove, @PostRemove, @PreUpdate, @PostUpdate or @PostLoad. In order to remove some entity we need to verify if some data exists, and we want do it using a JPA query. When an entity is saved, some other entities must be automatically created and saved, and we want to use the JPA EntityManager to do so. Unfortunately, it’s difficult to solve such cases using the standard annotations: @PrePersist, @PostPersist, @PreRemove, @PostRemove, @PreUpdate, @PostUpdate or @PostLoad. What can we do? We have several options such as: Using JDBC from the callback methods: Horror! Create a new EntityManager in the callback method: This works sometimes, but you can have problems with isolation levels. Moreover, you lose the transactional behavior. Put the on-save or on-remove logic in the controller layer, that is in the actions: Of course, this works just fine, but if you access to the entities from other actions, from a batch process, or from a web service, the on-logic or on-remove will not be executed. Obviously, these options are dirty and unnatural, and even even worse, they mean more work for us. Create your own callback annotations In OpenXava, we have opted for the simplest solution for the poor application developer, just creating some new callback annotations that allow to use JPA inside them. OpenXava 4.0.1 includes the next new annotations: @PreCreate, @PostCreate and @PreDelete. For example, if we need to create a customer and assign it to an invoice when the customer is not specified, you can write: @PreCreate public void onPreCreate() { // Automatically create a new customer if (getCustomer() == null) { Customer cust = new Customer(); cust.setName(getName()); cust.setAddress(getAddress()); cust = XPersistence.getManager().merge(cust); // Here we use the EntityManager setCustomer(cust); // and here we change a relationship } } If you want to enjoy these annotations just use OpenXava for developing your application. Although if you are not still ready for rapid development, you can create these annotations yourself easily, just use the decorator pattern over the EntityManager or use AOP to refine the behavior of persist() and remove() methods. Learn more about these annotations

January 12, 2011

by Javier Paniza

· 15,283 Views · 1 Like

Interview: Troy Giunipero, Author of NetBeans E-commerce Tutorial

Troy Giunipero (pictured, right) is a student at the University of Edinburgh studying toward an MSc in Computer Science. Formerly, he was one of the NetBeans Docs writers based in Prague, Czech Republic, where he spent most of his time writing Java web tutorials. In this interview, Troy introduces you to The NetBeans E-commerce Tutorial. This is a very detailed tutorial describing just about everything you need to know when creating an e-commerce web application in Java. It has received a lot of very positive feedback. Let's find out about the background of this tutorial and what Troy learnt in writing it. Hi Troy! During your time on the NetBeans team, you wrote a very large tutorial describing how to create an e-commerce site. How and why did you start writing it? Well, there’s a short answer and a long answer to this. The short answer is that I was lucky to take part in Sun’s SEED (Sun Engineering Enrichment and Development) program. I wanted to focus on technical aspects, so I based my curriculum on developing an e-commerce application using Java technologies. I documented my efforts and applied them toward deliverables for the IDE’s 6.8 and 6.9 releases, resulting in the 13-part NetBeans E-commerce Tutorial. The long answer is that I had previously been tasked with creating an e-commerce application for my degree project (I was studying toward a BSc in IT and Computing), and ran into loads of trouble trying to integrate the various technologies into a cohesive, functioning application. I was coming from a non-technical background and found there was a steep learning curve involved in web development. My work was fraught with problems which I can now attribute to poor time-management, and a lack of good, practical, hands-on learning resources. So in a way, working on the AffableBean project (this is the project used in the NetBeans E-commerce Tutorial) was a way for me to go back and attempt to do the whole thing right. With the tutorial, I had two goals in mind: one, I wanted to consolidate my understanding of everything by writing about it, and two, I wanted to help others avoid the problems and pitfalls that I’d earlier ran into by designing a piece of documentation that puts everything together. Can you run us through the basic parts and what they provide? Certainly. First I want to point out that there’s a live demo application (http://dot.netbeans.org:8080/AffableBean/) which I managed to get up and running with help from Honza Pirek from the NetBeans Web Team (thanks Honza!): The application is modeled on the well-known MVC architecture: The tutorial refers to the above diagram at various points, and covers a bunch of different concepts and technologies along the way, including: Project design and planning (unit 2) Designing a data model (using MySQL WorkBench) (unit 4) Forward-engineering the data model into a database schema (unit 4) Database connectivity (units 3, 4, 6) Servlet, JSP/EL and JSTL technologies (units 3, 5, 6) EJB 3 and JPA 2 technologies (unit 7), and transactional support (unit 9) Session management (i.e., for the shopping cart mechanism) (unit 8) Form validation and data conversion (unit 9) Multilingual support (unit 10) Security (i.e., using form-based authentication and encrypted communication) (unit 11) Load testing with JMeter (unit 12) Monitoring the application with the IDE’s Profiler (unit 12) Tips for deployment to a production server (units 12, 13) Also, the tutorial aims to provide ample background information on the whole “Java specifications” concept, with an introduction to the Java Community Process, how final releases include reference implementations, and how these relate to the tutorial application using the IDE’s bundled GlassFish server (units 1, 7). Finally, the tutorial is as much about the above concepts and technologies as it is about learning to make best use of the IDE. I really tried to squeeze as much IDE-centric information in there as possible. So for example you’ll find: An introduction to the IDE’s main windows and their functions (unit 3) A section dedicated to editor tips and tricks (unit 5), and abundant usage of keyboard shortcuts in steps throughout the tutorial Use of the debugger (unit 8) Special “tip boxes” that discuss IDE functionality that is sometimes difficult to fit into conventional documentation. For example, there are tips on using the IDE’s Template Manager (unit 5), GUI support for database tables (unit 6), Javadoc support (unit 8), and support for code templates (unit 9). Did you learn any new things yourself while writing it? Yes! Three things immediately come to mind: EJB 3 technology. Initially this was a big hurdle for me. Using EJB 3 effectively seems to be something of an art form. If you know what you’re doing and understand exactly how to use the EntityManager to handle persistence operations on a database, EJB lets you do some amazingly smart things with just a few lines of code. But there seems to be a lack of good free documentation online—especially since EJB 3 is a significant departure from EJB 2. Therefore, almost all of the tutorial’s information on EJB comes from the very excellent book, EJB in Action by Debu Panda and Reza Rahman. Interpreting the NetBeans Profiler. The final hands-on unit, Testing an Profiling, was the most difficult for me to write, primarily because I just wasn’t familiar with the Profiler at all. I spent an unhealthy amount of time just watching the Telemetry graph run against my JMeter test plan, which is only slightly more stimulating than watching water come to boil. That being said, I feel that by just examining the graphs and other windows over time, critical logical associations start to jump out at you after a while. Likewise with JMeter. Hopefully unit 12 was able to capture and relay some of these. How to search online for decent articles and learning materials. The old Sun Technical Articles site was a great resource. Many of the links in the See Also sections at the bottom of tutorial units were found by adding site:java.sun.com/developer/technicalArticles/ to a Google search. Also the official forums (found at forums.sun.com) became a good place for questions I couldn’t find ready answers to. I had both the Java EE 5 and 6 Tutorials bookmarked. And Marty Hall’s Core Servlets and JavaServer Pages became an invaluable resource for the first half of the tutorial. What are your personal favorite features of the technologies discussed in the tutorial? I particularly liked learning about session management—using the HttpSession object to carry user-specific data between requests, and working with JSP’s corresponding implicit objects in the front-end pages. Session management is a defining aspect for e-commerce applications, as they need to provide some sort of shopping cart mechanism... ...and so the Managing Sessions unit (unit 8) was a key chapter in the tutorial. It’s extremely useful to be able to suspend the debugger on a portion of code that includes session-scoped variables, then hover the mouse over a given variable in the editor to determine its current value. I used the debugger continuously during this phase, and so I went so far as to incorporate use of the debugger throughout the Managing Sessions unit. What kind of background does someone starting the tutorial need to have? Someone can come to the tutorial with little or absolutely no experience using NetBeans. I’ve tried to be particularly careful in providing clear and easy-to-follow instructions in this respect. But one would be best off having some background or knowledge in basic web technologies, and at least some exposure to relational databases. With this foundation, I think that the topics covered in the second half of the tutorial, like applying entity classes and session beans, language support and security, won’t seem too daunting. I’ve noticed that the vast majority of feedback that comes in relates to the first half of the tutorial, and I sometimes get the impression that people feel they need to follow the tutorial units consecutively. Not so. The units are 90% modular. In other words, if somebody just wants to run through the security unit (unit 11), they can do so by downloading the associated project snapshot, follow the setup instructions, and then just follow along without needing to even look at other parts of the tutorial. What will they be able to do at the end of it? Naturally, anybody who completes individual tutorial units will be able to apply the concepts and technologies to their own work. But anyone who completes the tutorial in its entirety will gain an insight into the development process as a whole, and I think will also get a certain confidence that comes with knowing how “all the pieces fit together”—from gathering customer requirements all the way to deployment of the completed app to a production server. They’ll also have gained a solid familiarity with the NetBeans IDE, and be in a good position to explore popular Java web frameworks that work on top of servlet technology or impose an MVC architecture on their own, such as JSF, Spring, Struts, or Wicket. Do you see any problems in the technologies discussed and what would be your suggestions for enhancements? Well there’s one thing that comes to mind. When I started working on this project, I was studying the Duke’s BookStore example from the Java EE 5 Tutorial. A wonderful example that demonstrates how to progressively implement the same application using various technologies and combinations thereof. So for example you start out with an all-servlet implementation, then move on to a JSP/servlet version. Then there’s a JSTL implementation and ultimately, a version using JavaServer Faces. It’s great learning material, but also terrifically outdated. Right around this time, Sun was gearing up for the big Java EE 6 release (Dec. 2009), and I was also trying to learn about the new upcoming technologies, namely CDI, JSF 2, and EJB 3, for my regular NetBeans documentation work. I was getting the definite sense that JSP and JSTL were slowly being pushed aside—in the case of JavaServer Faces, Facelets templating was the new page authoring technology. So really, the E-commerce Tutorial application has become a sort of EE 5/EE 6 hybrid by combining JSP/JSTL with EJB 3 and JPA 2. Now the problem I see from the perspective of a student trying to learn this stuff from scratch, is that the leap from basic servlet technology to a full-blown JSF/EJB/JPA solution is tremendous, and cannot readily be taught through a single tutorial. Naturally, others may disagree with me here. I’m not sure if there’s a solution other than to compensate by producing a lot of quality learning material that covers lots of different use-cases. I’d suggest that the E-commerce Tutorial puts one in a very advantageous position to begin learning about Java-based frameworks, such as GWT, Spring, and JSF, which is a natural course of action for people looking to get a job with this knowledge. Planning any more parts to the tutorial or a new one? No more parts. The E-commerce Tutorial is done. Upon committing the final installments and changes last November, I rejoiced. However, I’m still actively responding to feedback [the ‘Send Us Your Feedback’ links at the bottom of tutorials] and plan to maintain it indefinitely, so if anyone spots any typos, has questions or comments, recommendations for improvement, etc., please write in! :-)

January 9, 2011

by Geertjan Wielenga

· 30,180 Views