DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Databases Topics

article thumbnail
The Goal of software development
The Goal by Eli Goldratt is a business book in the form of a novel, where the protagonist must save his factory from closing due to very low productivity. The Goal is not limited to the management of a large organization (not even to for-profit companies): you simply have to define different units of measurement, like goal units instead of making money, the default goal. In fact, from the applications of the Theory of Constraints in our field I think it applies to software development too. What follows is my translation of the themes of The Goal to our field. The Goal is... Mking money, of course. For the ones of you with knowledge in accounting, the original goal is: raising throghput, the amounts of items sold (not produced) in the unit of time. Lowering investment/inventory, all the money tied up in the system in the form of assets that could be sold or products that stay in a warehouse. Lowering operational expense, all the money that we have spent as support and that cannot be recovered. How does these measurements apply to software development? A team does not always have an impact on contract negotiation, so often talking about money is far from everyday reality (kudos to you if you can apply that point of the Agile Manifesto.) The goal for software development can be translated, in my opinion, to: raising the throughput, the amount of features delivered (deployed, not implemented or tested) in the unit of time. You can measure this amount in story points, since feature vary in size. lowering investment/inventory, all the time tied up in the system in the form of undeployed or untested features that clutter the code base. In a minor part, also investment in the form of hardware, but that's by far less important than the team's time. lower operational expense, the time spent by developers every day in order to support the development. Automation is a kind of time investment that will bring more time (and quality) in the future, lowering operational expense. Like for material products, WIP has storage and opportunity costs which goes into the operational expense. Kanban is a tool that tries to reduce WIP in order to foster the two latter points. Throughput accounting This kind of throughput accounting is emphasized in the novel, over the use of cost accounting, where each developer (ehm, factory worker) has to be occupied all the time, even if the work he is doing isn't moving towards the goal: neverending refactoring. Specification of a feature which cannot be implemented until two months are gone, and will have to be rewritten. Implementation of features which won't be merged with the main branch any time soon. With Test-Driven Development, we are getting good at moving a feature from implemented to tested directly in the same commit. Yet the missing step is getting the feature to the users: maybe that's also what Continuous Deployment is all about... Dependent events Dependent events and statistical fluctuations are production systems topics that make a balanced plant close to bankruptcy: however, we're not at the point in which we can model our team as precisely as a factory. The basic point is that a plant in which everyone is working all the time is inefficient: when an early stage (like defining a specification or implementing a feature) gets delayed, downstream step such as deployment are dalayed too. Converely, when an upstream step finish earlier, the downstream stage is already at maximum efficiency and cannot process the intermediate result faster. I wonder if this applies to software development too. In a factory, workers are specialized and can do just a few jobs across the plant. Since workers and machines have different production rates, there will be just one bottleneck: the slowest one. If products have to pass from the bottleneck, anyone producing faster than the bottleneck will just accumulate WIP in front of him. Continuing with our example, if the analyst or domain expert is churning out specifications for new features every day, most of them are just WIP in front of the development team. Once there is an established buffer, any additional specification won't raise throughput any faster; instead, it will raise the inventory (partial features) and the time spent in managing it. I think this is not always true in the most technical phases of development instead. For example, in a small team a developer may be moved to testing or refactoring, or setting up Continuous Integration or evaluation of a new library. Unless you have a DBA which can just manage databases, your developer is not fixed into a stage of the system. The bottleneck The previous example featured a bottleneck, the most famous concept of the Theory of Constraints introduced in the book. This translation to the software development case is mine and could be incomplete. A feature (or a user story) has to pass in a series of stations where different people will work on it to make it real: specification from a domain expert, implementation from a technical team, extended testing with optional customer validation, deployment which should be fast but at the same time must not kill the current version of the application. Each station has an average velocity. By definition, there is a station which is the slowest and can process fewest story points in the unit of time. This is the bottleneck. (This is not always true, as velocity may vary greatly in time with the addition of new people or hidden lines discovered in a feature. Becoming good at estimation and stabilizing a team are two objectives that help reach the assumption.) You can identify a bottleneck by looking at where is the WIP: it will accumulate in front of it. If you already have a kanban board, this phase is simpler... Once identified, the throughput of the system can only be improved by raising the bottleneck's capacity enough so that it is no more a bottleneck. You can move people to it (keeping an eye on communication costs); ensure it is used at maximum efficiency by freeing the specialized developers from other mundane tasks. Now you can restart and find a new bottleneck... Conclusions This is just a little introduction to the themes of the Theory of Constraints and Goldratt's teachings; I don't pretend to explain the whole book in an article. It is also against the Socratic method: you should reach the answers yourself, and these are just examples from my experience. There is more to Goldratt and The Goal than bottlenecks and throughput, such as continuous improvement. If you're working or managing in a software development team, I suggest you to read this book if you have the opportunity. Even when freelancing, it is an eye-opener in moving towards a Goal instead of busyworking; and it's written with a never boring teaching method.
October 4, 2011
by Giorgio Sironi
· 21,586 Views
article thumbnail
Hibernate by Example - Part 1 (Orphan removal)
So i thought to do a series of hibernate examples showing various features of hibernate. In the first part i wanted to show about the Delete Orphan feature and how it may be used with the use of a story line. So let us begin :) Prerequisites: In order for you to try out the following example you will need the below mentioned JAR files: org.springframework.aop-3.0.6.RELEASE.jar org.springframework.asm-3.0.6.RELEASE.jar org.springframework.aspects-3.0.6.RELEASE.jar org.springframework.beans-3.0.6.RELEASE.jar org.springframework.context.support-3.0.6.RELEASE.jar org.springframework.context-3.0.6.RELEASE.jar org.springframework.core-3.0.6.RELEASE.jar org.springframework.jdbc-3.0.6.RELEASE.jar org.springframework.orm-3.0.6.RELEASE.jar org.springframework.transaction-3.0.6.RELEASE.jar. org.springframework.expression-3.0.6.RELEASE.jar commons-logging-1.0.4.jar log4j.jar aopalliance-1.0.jar dom4j-1.1.jar hibernate-commons-annotations-3.2.0.Final.jar hibernate-core-3.6.4.Final.jar hibernate-jpa-2.0-api-1.0.0.Final.jar javax.persistence-2.0.0.jar jta-1.1.jar javassist-3.1.jar slf4j-api-1.6.2.jar mysql-connector-java-5.1.13-bin.jar commons-collections-3.0.jar For anyone who want the eclipse project to try this out, you can download it with the above mentioned JAR dependencies here. Introduction: Its year 2011. And The Justice League has grown out of proportion and are searching for a developer to help with creating a super hero registering system. A developer competent in Hibernate and ORM is ready to do the system and handle the persistence layer using Hibernate. For simplicity, He will be using a simple stand alone application to persist super heroes. This is how this example will layout: Table design Domain classes and Hibernate mappings DAO & Service classes Spring configuration for the application A simple main class to show how it all works Let the Journey Begin...................... Table Design: The design consists of three simple tables as illustrated by the diagram below; As you can see its a simple one-to-many relationship linked by a Join Table. The Join Table will be used by Hibernate to fill the Super hero list which is in the domain class which we will go on to see next. Domain classes and Hibernate mappings: There are mainly only two domain classes as the Join table in linked with the primary owning entity which is the Justice League entity. So let us go on to see how the domain classes are constructed with annotations; package com.justice.league.domain; import java.io.Serializable; import javax.persistence.Column; import javax.persistence.Entity; import javax.persistence.GeneratedValue; import javax.persistence.GenerationType; import javax.persistence.Id; import javax.persistence.Table; import org.hibernate.annotations.Type; @Entity @Table(name = "SuperHero") public class SuperHero implements Serializable { /** * */ private static final long serialVersionUID = -6712720661371583351L; @Id @GeneratedValue(strategy = GenerationType.AUTO) @Column(name = "super_hero_id") private Long superHeroId; @Column(name = "super_hero_name") private String name; @Column(name = "power_description") private String powerDescription; @Type(type = "yes_no") @Column(name = "isAwesome") private boolean isAwesome; public Long getSuperHeroId() { return superHeroId; } public void setSuperHeroId(Long superHeroId) { this.superHeroId = superHeroId; } public String getName() { return name; } public void setName(String name) { this.name = name; } public String getPowerDescription() { return powerDescription; } public void setPowerDescription(String powerDescription) { this.powerDescription = powerDescription; } public boolean isAwesome() { return isAwesome; } public void setAwesome(boolean isAwesome) { this.isAwesome = isAwesome; } } As i am using MySQL as the primary database, i have used the GeneratedValue strategy as GenerationType.AUTO which will do the auto incrementing whenever a new super hero is created. All other mappings are familiar to everyone with the exception of the last variable where we map a boolean to a Char field in the database. We use Hibernate's @Type annotation to represent true & false as Y & N within the database field. Hibernate has many @Type implementations which you can read about here. In this instance we have used this type. Ok now that we have our class to represent the Super Heroes, lets go on to see how our Justice League domain class looks like which keeps tab of all super heroes who have pledged allegiance to the League. package com.justice.league.domain; import java.io.Serializable; import java.util.ArrayList; import java.util.List; import javax.persistence.CascadeType; import javax.persistence.Column; import javax.persistence.Entity; import javax.persistence.FetchType; import javax.persistence.GeneratedValue; import javax.persistence.GenerationType; import javax.persistence.Id; import javax.persistence.JoinColumn; import javax.persistence.JoinTable; import javax.persistence.OneToMany; import javax.persistence.Table; @Entity @Table(name = "JusticeLeague") public class JusticeLeague implements Serializable { /** * */ private static final long serialVersionUID = 763500275393020111L; @Id @GeneratedValue(strategy = GenerationType.AUTO) @Column(name = "justice_league_id") private Long justiceLeagueId; @Column(name = "justice_league_moto") private String justiceLeagueMoto; @Column(name = "number_of_members") private Integer numberOfMembers; @OneToMany(cascade = { CascadeType.ALL }, fetch = FetchType.EAGER, orphanRemoval = true) @JoinTable(name = "JUSTICE_LEAGUE_SUPER_HERO", joinColumns = { @JoinColumn(name = "justice_league_id") }, inverseJoinColumns = { @JoinColumn(name = "super_hero_id") }) private List superHeroList = new ArrayList(0); public Long getJusticeLeagueId() { return justiceLeagueId; } public void setJusticeLeagueId(Long justiceLeagueId) { this.justiceLeagueId = justiceLeagueId; } public String getJusticeLeagueMoto() { return justiceLeagueMoto; } public void setJusticeLeagueMoto(String justiceLeagueMoto) { this.justiceLeagueMoto = justiceLeagueMoto; } public Integer getNumberOfMembers() { return numberOfMembers; } public void setNumberOfMembers(Integer numberOfMembers) { this.numberOfMembers = numberOfMembers; } public List getSuperHeroList() { return superHeroList; } public void setSuperHeroList(List superHeroList) { this.superHeroList = superHeroList; } } The important fact to note here is the annotation @OneToMany(cascade = { CascadeType.ALL }, fetch = FetchType.EAGER, orphanRemoval = true). Here we have set orphanRemoval = true. So what does that do exactly? Ok so say that you have a group of Super Heroes in your League. And say one Super Hero goes haywire. So we need to remove Him/Her from the League. With JPA cascade this is not possible as it does not detect Orphan records and you will wind up with the database having the deleted Super Hero(s) whereas your collection still has a reference to it. Prior to JPA 2.0 you did not have the orphanRemoval support and the only way to delete orphan records was to use the following Hibernate specific(or ORM specific) annotation which is now deprecated; @org.hibernate.annotations.Cascade(org.hibernate.annotations.CascadeType.DELETE_ORPHAN) But with the introduction of the attribute orphanRemoval, we are now able to handle the deletion of orphan records through JPA. Now that we have our Domain classes DAO & Service classes: To keep with good design standards i have separated the DAO(Data access object) layer and the service layer. So let us see the DAO interface and implementation. Note that i have used HibernateTemplatethrough HibernateDAOSupportso as to keep away any Hibernate specific detail out and access everything in a unified manner using Spring. package com.justice.league.dao; import org.springframework.transaction.annotation.Propagation; import org.springframework.transaction.annotation.Transactional; import com.justice.league.domain.JusticeLeague; @Transactional(propagation = Propagation.REQUIRED, readOnly = false) public interface JusticeLeagueDAO { public void createOrUpdateJuticeLeagure(JusticeLeague league); public JusticeLeague retrieveJusticeLeagueById(Long id); } In the interface layer i have defined the Transaction handling as Required. This is done so that whenever you do not need a transaction you can define that at the method level of that specific method and in more situations you will need a transaction with the exception of data retrieval methods. According to the JPA spec you need a valid transaction for insert/delete/update functions. So lets take a look at the DAO implementation; package com.justice.league.dao.hibernate; import org.springframework.beans.factory.annotation.Qualifier; import org.springframework.orm.hibernate3.support.HibernateDaoSupport; import org.springframework.transaction.annotation.Propagation; import org.springframework.transaction.annotation.Transactional; import com.justice.league.dao.JusticeLeagueDAO; import com.justice.league.domain.JusticeLeague; @Qualifier(value="justiceLeagueHibernateDAO") public class JusticeLeagueHibernateDAOImpl extends HibernateDaoSupport implements JusticeLeagueDAO { @Override public void createOrUpdateJuticeLeagure(JusticeLeague league) { if (league.getJusticeLeagueId() == null) { getHibernateTemplate().persist(league); } else { getHibernateTemplate().update(league); } } @Transactional(propagation = Propagation.NOT_SUPPORTED, readOnly = false) public JusticeLeague retrieveJusticeLeagueById(Long id){ return getHibernateTemplate().get(JusticeLeague.class, id); } } Here i have defined an @Qualifier to let Spring know that this is the Hibernate implementation of the DAO class. Note the package name which ends with hibernate. This as i see is a good design concept to follow where you separate your implementation(s) into separate packages to keep the design clean. Ok lets move on to the service layer implementation. The service layer in this instance is just acting as a mediation layer to call the DAO methods. But in a real world application you will probably have other validations, security related procedures etc handled within the service layer. package com.justice.league.service; import com.justice.league.domain.JusticeLeague; public interface JusticeLeagureService { public void handleJusticeLeagureCreateUpdate(JusticeLeague justiceLeague); public JusticeLeague retrieveJusticeLeagueById(Long id); } package com.justice.league.service.impl; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.beans.factory.annotation.Qualifier; import org.springframework.stereotype.Component; import com.justice.league.dao.JusticeLeagueDAO; import com.justice.league.domain.JusticeLeague; import com.justice.league.service.JusticeLeagureService; @Component("justiceLeagueService") public class JusticeLeagureServiceImpl implements JusticeLeagureService { @Autowired @Qualifier(value = "justiceLeagueHibernateDAO") private JusticeLeagueDAO justiceLeagueDAO; @Override public void handleJusticeLeagureCreateUpdate(JusticeLeague justiceLeague) { justiceLeagueDAO.createOrUpdateJuticeLeagure(justiceLeague); } public JusticeLeague retrieveJusticeLeagueById(Long id){ return justiceLeagueDAO.retrieveJusticeLeagueById(id); } } Few things to note here. First of all the @Component binds this service implementation with the name justiceLeagueService within the spring context so that we can refer to the bean as a bean with an id of name justiceLeagueService. And we have auto wired the JusticeLeagueDAO and defined an @Qualifier so that it will be bound to the Hibernate implementation. The value of the Qualifier should be the same name we gave the class level Qualifier within the DAO Implementation class. And Lastly let us look at the Spring configuration which wires up all these together; Spring configuration for the application: com.justice.league.**.* org.hibernate.dialect.MySQLDialect com.mysql.jdbc.Driver jdbc:mysql://localhost:3306/my_test root password true org.hibernate.dialect.MySQLDialect Note that i have used the HibernateTransactionManager in this instance as i am running it stand alone. If you are running it within an application server you will almost always use a JTA Transaction manager. I have also used auto creation of tables by hibernate for simplicity purposes. The packagesToScan property instructs to scan through all sub packages(including nested packaged within them) under the root package com.justice.league.**.* to be scanned for @Entity annotated classes. We have also bounded the session factory to the justiceLeagueDAO so that we can work with the Hibernate Template. For testing purposes you can have the tag create initially if you want, and let hibernate create the tables for you. Ok so now that we have seen the building blocks of the application, lets see how this all works by first creating some super heroes within the Justice League A simple main class to show how it all works: As the first example lets see how we are going to persist the Justice League with a couple of Super Heroes; package com.test; import java.util.ArrayList; import java.util.List; import org.springframework.context.ApplicationContext; import org.springframework.context.support.ClassPathXmlApplicationContext; import com.justice.league.domain.JusticeLeague; import com.justice.league.domain.SuperHero; import com.justice.league.service.JusticeLeagureService; public class TestSpring { /** * @param args */ public static void main(String[] args) { ApplicationContext ctx = new ClassPathXmlApplicationContext( "spring-context.xml"); JusticeLeagureService service = (JusticeLeagureService) ctx .getBean("justiceLeagueService"); JusticeLeague league = new JusticeLeague(); List superHeroList = getSuperHeroList(); league.setSuperHeroList(superHeroList); league.setJusticeLeagueMoto("Guardians of the Galaxy"); league.setNumberOfMembers(superHeroList.size()); service.handleJusticeLeagureCreateUpdate(league); } private static List getSuperHeroList() { List superHeroList = new ArrayList(); SuperHero superMan = new SuperHero(); superMan.setAwesome(true); superMan.setName("Clark Kent"); superMan.setPowerDescription("Faster than a speeding bullet"); superHeroList.add(superMan); SuperHero batMan = new SuperHero(); batMan.setAwesome(true); batMan.setName("Bruce Wayne"); batMan.setPowerDescription("I just have some cool gadgets"); superHeroList.add(batMan); return superHeroList; } } And if we go to the database and check this we will see the following output; mysql> select * from superhero; +---------------+-----------+-----------------+-------------------------------+ | super_hero_id | isAwesome | super_hero_name | power_description | +---------------+-----------+-----------------+-------------------------------+ | 1 | Y | Clark Kent | Faster than a speeding bullet | | 2 | Y | Bruce Wayne | I just have some cool gadgets | +---------------+-----------+-----------------+-------------------------------+ mysql> select * from justiceleague; +-------------------+-------------------------+-------------------+ | justice_league_id | justice_league_moto | number_of_members | +-------------------+-------------------------+-------------------+ | 1 | Guardians of the Galaxy | 2 | +-------------------+-------------------------+-------------------+ So as you can see we have persisted two super heroes and linked them up with the Justice League. Now let us see how that delete orphan works with the below example; package com.test; import java.util.ArrayList; import java.util.List; import org.springframework.context.ApplicationContext; import org.springframework.context.support.ClassPathXmlApplicationContext; import com.justice.league.domain.JusticeLeague; import com.justice.league.domain.SuperHero; import com.justice.league.service.JusticeLeagureService; public class TestSpring { /** * @param args */ public static void main(String[] args) { ApplicationContext ctx = new ClassPathXmlApplicationContext( "spring-context.xml"); JusticeLeagureService service = (JusticeLeagureService) ctx .getBean("justiceLeagueService"); JusticeLeague league = service.retrieveJusticeLeagueById(1l); List superHeroList = league.getSuperHeroList(); /** * Here we remove Batman(a.k.a Bruce Wayne) out of the Justice League * cos he aint cool no more */ for (int i = 0; i < superHeroList.size(); i++) { SuperHero superHero = superHeroList.get(i); if (superHero.getName().equalsIgnoreCase("Bruce Wayne")) { superHeroList.remove(i); break; } } service.handleJusticeLeagureCreateUpdate(league); } } Here we first retrieve the Justice League record by its primary key. Then we loop through and remove Batman off the League and again call the createOrUpdate method. As we have the remove orphan defined, any Super Hero not in the list which is in the database will be deleted. Again if we query the database we will see that batman has been removed now as per the following; mysql> select * from superhero; +---------------+-----------+-----------------+-------------------------------+ | super_hero_id | isAwesome | super_hero_name | power_description | +---------------+-----------+-----------------+-------------------------------+ | 1 | Y | Clark Kent | Faster than a speeding bullet | +---------------+-----------+-----------------+-------------------------------+ So that's it. The story of how Justice League used Hibernate to remove Batman automatically without being bothered to do it themselves. Next up look forward to how Captain America used Hibernate Criteria to build flexible queries in order to locate possible enemies. Watch out!!!! Have a great day people and thank you for reading!!!! If you have any suggestions or comments pls do leave them by. From http://dinukaroshan.blogspot.com/2011/09/hibernate-by-example-part-1-orphan.html
September 24, 2011
by Dinuka Arseculeratne
· 47,773 Views
article thumbnail
Practical PHP Refactoring: Replace Record with Data Class
We often find ourselves tempted by the shortcut of using directly a record-like data structure provided by the language or a framework. There are many example scenarios where a record emerges: when you use a database for persistence (not only a with relational one) there can be a data structure containing the results of a query. when you use associative arrays, often they have a small number of fixed keys. Record-like structures are the equivalent of C structs, or Ruby hashes. This refactoring is a generalization of Replace Array with Object: in this case the starting point is not just an array which always has the same number and type of fields, but any data structure homoegeneous to it: Zend_Db_Table_Row, implementing a Row Data Gateway and giving access to a row in the database. stdClass instances (or arrays) fetched by PDOStatement. In some languages (see C) the record is a particular data structure which is not an object; in PHP, it is always an array or an object of some vendor class. Even ORMs based on Active Record are more advanced than this kind of usage: they usually let you add methods on the model classes, which are populated with data by the ORM itself. In this refactoring, we are talking about a data structure managed by the language of the persistence layer, and whose code you cannot modify. Why replacing a record-like structure? Generic classes do not let you add methods to manipulate their data: when using directly a Zend_Db_Table_Row or an associative array for storage of the result of a query, you have to resort continuously to foreign methods. Each time you need new logic, you have to compromise encapsulation on the object itself, and these methods get duplicated in various instance of client code. Solutions There are different ways to eliminate the coupling to a record-like structure. The first is to refactor to subclassing, where the data structure becomes an Active Record. You extends the vendor class with your own one: this option is only available if the structure is defined as an object. Moreover, the name of the class to instantiate must be configurable in the persistence mechanism. Zend_Db_Table_Record supports this kind of usage. A second option is to refactor to composition: Zend_Db examples exist also for this case. This approach can be used to avoid large hierarchies: your model classes compose the Zend_Db objects and hide the database; methods can be added for once at your own models. A third and final alternative is to use hydration: data is copied to your model objects, and records are thrown away after the fact. Doctrine 2 and Data Mappers in general choose this approach. Steps I will describe a short procedure for refactoring to hydration since it is the most complex approach and it is always applicable. The other are instead specific for the particular data structure (for example with Zend_Db you have to write some subclasses and configure some protected fields to contain the right class names.) Create a new class, as one of your models. Its state should be represented by one row (or more rows joined into one) in the database. This class should gain a private field for each of the record's fields, usually with getters and setters. This class should accept in the constructor or in a Factory Method an instance of the record, so that it can produce a new instance. If you want to decouple from the persistsnce, or you want to go two ways (also save and not only visualize, since this is not just a presentation model), look for a Data Mapper such as Doctrine 2, which will even hide all the record structures from you and manager associations where objects compose other ones. Example In the example, the data of a single user are returned in an array. I chose an array as the record-like structure to minimize the external dependencies of this code. find(42); $this->assertEquals('Giorgio', $giorgio['name']); } } /** * This is a Fake Table Data Gateway. The machinery for making it work with * a database will be distracting for our purposes, so they will be omitted. */ class UsersTable { /** * @return mixed the returned value can be a Zend_Db_Table_Row, * an Active Record, a stdClass, an associative array... * It should just represent a single entity. */ public function find($id) { // execute a PDOStatement and fetch the data return array('id' => 42, 'name' => 'Giorgio'); } } After the refactoring, we have a User class where we can add all the methods we need: find(42); $this->assertEquals('Giorgio', $giorgio->getName()); } } /** * This is a Fake Table Data Gateway. The machinery for making it work with * a database will be distracting for our purposes, so they will be omitted. */ class UsersTable { /** * @return mixed the returned value can be a Zend_Db_Table_Row, * an Active Record, a stdClass, an associative array... * It should just represent a single entity. */ public function find($id) { // execute a PDOStatement and fetch the data return User::fromRecord(array('id' => 42, 'name' => 'Giorgio')); } } class User { private $id; private $name; public static function fromRecord(array $record) { $object = new self(); $object->id = $record['id']; $object->name = $record['name']; return $object; } public function getName() { return $this->name; } }
September 19, 2011
by Giorgio Sironi
· 11,039 Views
article thumbnail
EC2 Interview – AWS Interview – Cloud Interview – 8 Questions
If you're looking for a cloud expert, specifically someone who knows Amazon Web Services and EC2, you'll want to have a battery of questions to assess their knowledge.
September 15, 2011
by Sean Hull
· 111,875 Views · 1 Like
article thumbnail
Memory Barriers/Fences
In this article I'll discuss the most fundamental technique in concurrent programming known as memory barriers, or fences, that make the memory state within a processor visible to other processors. CPUs have employed many techniques to try and accommodate the fact that CPU execution unit performance has greatly outpaced main memory performance. In my “Write Combining” article I touched on just one of these techniques. The most common technique employed by CPUs to hide memory latency is to pipeline instructions and then spend significant effort, and resource, on trying to re-order these pipelines to minimise stalls related to cache misses. When a program is executed it does not matter if its instructions are re-ordered provided the same end result is achieved. For example, within a loop it does not matter when the loop counter is updated if no operation within the loop uses it. The compiler and CPU are free to re-order the instructions to best utilise the CPU provided it is updated by the time the next iteration is about to commence. Also over the execution of a loop this variable may be stored in a register and never pushed out to cache or main memory, thus it is never visible to another CPU. CPU cores contain multiple execution units. For example, a modern Intel CPU contains 6 execution units which can do a combination of arithmetic, conditional logic, and memory manipulation. Each execution unit can do some combination of these tasks. These execution units operate in parallel allowing instructions to be executed in parallel. This introduces another level of non-determinism to program order if it was observed from another CPU. Finally, when a cache-miss occurs, a modern CPU can make an assumption on the results of a memory load and continue executing based on this assumption until the load returns the actual data. Provided “program order” is preserved the CPU, and compiler, are free to do whatever they see fit to improve performance. Figure 1. Loads and stores to the caches and main memory are buffered and re-ordered using the load, store, and write-combining buffers. These buffers are associative queues that allow fast lookup. This lookup is necessary when a later load needs to read the value of a previous store that has not yet reached the cache. Figure 1 above depicts a simplified view of a modern multi-core CPU. It shows how the execution units can use the local registers and buffers to manage memory while it is being transferred back and forth from the cache sub-system. In a multi-threaded environment techniques need to be employed for making program results visible in a timely manner. I will not cover cache coherence in this article. Just assume that once memory has been pushed to the cache then a protocol of messages will occur to ensure all caches are coherent for any shared data. The techniques for making memory visible from a processor core are known as memory barriers or fences. Memory barriers provide two properties. Firstly, they preserve externally visible program order by ensuring all instructions either side of the barrier appear in the correct program order if observed from another CPU and, secondly, they make the memory visible by ensuring the data is propagated to the cache sub-system. Memory barriers are a complex subject. They are implemented very differently across CPU architectures. At one end of the spectrum there is a relatively strong memory model on Intel CPUs that is more simple than say the weak and complex memory model on a DEC Alpha with its partitioned caches in addition to cache layers. Since x86 CPUs are the most common for multi-threaded programming I’ll try and simplify to this level. Store Barrier A store barrier, “sfence” instruction on x86, forces all store instructions prior to the barrier to happen before the barrier and have the store buffers flushed to cache for the CPU on which it is issued. This will make the program state visible to other CPUs so they can act on it if necessary. A good example of this in action is the following simplified code from the BatchEventProcessor in the Disruptor. When the sequence is updated other consumers and producers know how far this consumer has progressed and thus can take appropriate action. All previous updates to memory that happened before the barrier are now visible. private volatile long sequence = RingBuffer.INITIAL_CURSOR_VALUE; // from inside the run() method T event = null; long nextSequence = sequence + 1L; while (running) { try { final long availableSequence = dependencyBarrier.waitFor(nextSequence); while (nextSequence <= availableSequence) { event = dependencyBarrier.getEvent(nextSequence); eventHandler.onEvent(event, nextSequence == availableSequence); nextSequence++; } sequence = event.getSequence(); // store barrier inserted here !!! } catch (final Exception ex) { exceptionHandler.handle(ex, event); sequence = event.getSequence(); // store barrier inserted here !!! nextSequence = event.getSequence() + 1L; } } Load Barrier A load barrier, “lfence” instruction on x86, forces all load instructions after the barrier to happen after the barrier and then wait on the load buffer to drain for that CPU. This makes program state exposed from other CPUs visible to this CPU before making further progress. A good example of this is when the BatchEventProcessor sequence referenced above is read by producers, or consumers, in the corresponding barriers of the Disruptor. Full Barrier A full barrier, "mfence" instruction on x86, is a composite of both load and store barriers happening on a CPU. Java Memory Model In the Java Memory Model a volatile field has a store barrier inserted after a write to it and a load barrier inserted before a read of it. Qualified final fields of a class have a store barrier inserted after their initialisation to ensure these fields are visible once the constructor completes when a reference to the object is available. Atomic Instructions and Software Locks Atomic instructions, such as the “lock ...” instructions on x86, are effectively a full barrier as they lock the memory sub-system to perform an operation and have guaranteed total order, even across CPUs. Software locks usually employ memory barriers, or atomic instructions, to achieve visibility and preserve program order. Performance Impact of Memory Barriers Memory barriers prevent a CPU from performing a lot of techniques to hide memory latency therefore they have a significant performance cost which must be considered. To achieve maximum performance it is best to model the problem so the processor can do units of work, then have all the necessary memory barriers occur on the boundaries of these work units. Taking this approach allows the processor to optimise the units of work without restriction. There is an advantage to grouping necessary memory barriers in that buffers flushed after the first one will be less costly because no work will be under way to refill them. From http://mechanical-sympathy.blogspot.com/2011/07/memory-barriersfences.html
September 12, 2011
by Martin Thompson
· 26,040 Views · 8 Likes
article thumbnail
On DTOs
DTOs, or data-transfer objects, are commonly used. What is not со commonly-known is that they originate from DDD (Domain-driven design). There it makes a lot of sense – domain objects have state, identity and business logic while DTOs have only state. But many projects today are using the anemic data model approach (my opinion) and still use DTOs. They are used whenever an object “leaves” the service layer or “leaves” the system (through web services, rmi, etc.). There are three approaches: every entity has at least one corresponding DTO. Usually more than one, for different scenarios in the view layer. When you display a user in a list you have one DTO, when you display it in a “user details” window you need a more extended DTO. I am not in favour of this approach because in too many cases the DTO and the domain structure have exactly the same structure and as a result there’s a lot of duplicated code + redundant mapping. Another thing is the variability of multiple DTOs. Even if they differ from the entity, they differ from one another with one or two fields. Why duplication is a bad thing? Because changes are to be made in two places, issues are traced harder when data passes through multiple objects, and because..it is duplication. Copy & paste within the same project is a sin. DTOs are only created when their structure significantly differs from the that of the entity. In all other cases the entity itself is used. The cases when you don’t want to show some fields (especially when exposing via web services to 3rd parties) exist, but are not that common. This can sometimes be handled via the serialization mechanism – mark them as @JsonIgnore or @XmlTransient for example – but in other cases the structures are just different. In these cases a DTO is due. For example you have a User and UserDetails, where UserDetails holds the details + the relations of the currently logged user to the given user. The latter has nothing to do with the entity, so you create a DTO. However in the case of a DirectMessage you have sender, recipient, text and datetime both in the DB and in the UI. No need to have a DTO. One caveat with this approach (as well as with the next one). Anemic entities usually come with an ORM (JPA in the case of Java). Whenever they exit the service layer they may be invalid, because of lazy collections that require an open session. You have two options here: use OpenSessionInView / OpenEntityManagerInView – thus your session stays open until you are finished preparing the response. This is easy to configure but is not my preferred option – it violates layer boundaries in a subtle way, and this sometimes leads to problems especially for novice developers Don’t use lazy collections. Lazy collections are unneeded. Either make them eager, if they are supposed to hold a small list of items (for example – the list of roles for a user), or if the data is likely to grow use queries. Yes, you are not going to show 1000 records at on go anyway, you will have to page it. Without lazy associations (@*ToOne are eager by default) you won’t have invalid objects when the session is closed Don’t use DTOs at all. Applicable a soon as there aren’t significantly varying structures. For smaller projects this is usually a good way to go. Everything mentioned in the above point applies here. So my preferred approach is the “middle way”. But it requires a lot of consideration in each case, which may not be applicable for bigger and/or less experienced teams. So one of the two “extremes” should be picked. Since the “no DTOs” approach also requires consideration – what to make @Transient, how does lazy collections affect the flow, etc, the “All DTOs” is usually chosen. But even though it is seemingly the safest approach, it has many pitfalls. First, how do you map from DTOs to entities and vice-versa? Three options: dedicated mapper classes constructors – the DTO constructor takes the entity and fills itself, and vice-versa (remember to also provide a default constructor) declarative mapping (e.g. Dozer). This is practically the same as the first option – it externalizes the mapping. It can even be used together with a dedicated mapper class map them in-line (whenever needed). This can generate unmaintainable code and is not preferred I prefer the constructor approach, at least because fewer classes are created. But they are essentially the same (DTOs are not famous for encapsulation, so all of your properties are exposed anyway). Here is a list of guidelines when using DTOs and either of the “mapping” approaches: Don’t generate too much redundant code. If two scenarios require slightly different DTOs, reuse. No need to create a new DTO for a difference of one or two fields Don’t put presentation logic in mappers/constructors. For example if (entity.isActive()) dto.setStatus("Active"); This should happen in the view layer Don’t sneak entities together with DTOs. DTOs should not have members which are entities. Generally, entities should not be used outside the service layer (this is a bit extreme, but if we use DTOs everywhere we should be consistent and stick to that practice) Don’t use the mappers/entity-to-dto constructors in controllers, use them in the service layer. The reason DTOs are used in the first place is that entities may be ORM-bound, and they may not valid outside a session (i.e. outside the service layer). If using mappers, prefer static mapper methods. Mappers don’t have state, so no need for them to be instantiated. (And they don’t have to be mocked, wrapped, etc). If using mappers, there’s no need for a separate mapper for each entity(+its multiple DTOs). Related entities can be grouped in one mapper. For example Company, CompanyProfile, CompanySubsidiary can use the same mapper class Just make sure you make all these decisions at the beginning of a project and figure out which is applicable in your scenario (team size and experience, project size, domain complexity). From http://techblog.bozho.net/?p=427
September 10, 2011
by Bozhidar Bozhanov
· 27,669 Views · 3 Likes
article thumbnail
JSON data migration
JSON data format is simple and still powerful. Nowadays you can encounter more and more web applications communicating using JSON format then a couple of years ago. It is simple for a developer to read the format, it is effective for a web browser to parse the format and there are databases using it as its primary data format. But what happens when the data structure changes? You need to migrate. And that is where acris-json-migration might help you! Example situation Let's shed a light into it and assume we have a data like this: { "firstName":"John" "secondName":"Doe" "street":"Over the rainbow" "streetNr":21 } Such data can be represented by following Java domain object: public class Person { String firstName; String secondName; String street; Integer streetNr; // ... and getters and setters... } Well, this seems like data about a person named John Doe. We stored it in a database and you can clearly see, that secondName is probably not the field name we really like to have. But a developer made a mistake and in second version of our domain model we are going to fix it: { "firstName":"John" "surname":"Doe" "street":"Over the rainbow" "streetNr":21 } Now you can see the point - thousands of data stored in the format defined by Person class in its version #1 but our program communicating in version #2 with changed secondName to surname in Person class. Clients can wonder why the don't see surnames, can't they? ;) One thing to remember (for the following context) - the class Person changed and there is only Person class in version #2. Simple migration script In this situation I would like to write a script: public class PersonV1toV2Script extends JacksonTransformationScript { @Override public void process(ObjectNode node) { rename(node, "secondName", "surname"); } } From the above example it is clear that the script will do the job. And you can do pretty anything with the whole tree of JSON data - adding new nodes, removing existing ones, transforming here and there - all thanks to Jackson's tree model. How can I execute it? There is a Transformer abstract class representing a transofmer responsible for passing JSON data to a script and writing it back. Currently there are tow kinds of transformers: Jackson-based JSONT-based Jackson-based is the preferred one and is more developed then JSONT-based. So to execute a transformation on a data set you have to specify only two lines of code: JacksonTransformer t = new JacksonTransformer(input, output); t.transform(PersonV1toV2Script.class.getName()); ... where input and output represent directories. In the input directory all files are treated as files containing JSON data and are transformed and written to the output directory. For a detailed test you can look into TransformerTest in the project. Conclusion The script's helper API is evolving and provides you with nice methods like removeIfExists or addNonExistent methods. We would like to hear about your use-cases which are not handled by acris-json-migration yet so the project can generally serve the purpose of JSON data migration.
September 9, 2011
by Ladislav Gažo
· 12,527 Views
article thumbnail
Testing Databases with JUnit and Hibernate Part 1: One to Rule them
There is little support for testing the database of your enterprise application. We will describe some problems and possible solutions based on Hibernate and JUnit.
September 6, 2011
by Jens Schauder
· 122,927 Views · 2 Likes
article thumbnail
Cloud Integration with Apache Camel and Amazon Web Services (AWS): S3, SQS and SNS
The integration framework Apache Camel already supports several important cloud services (see my overview article at http://www.kai-waehner.de/blog/2011/07/09/cloud-computing-heterogeneity-will-require-cloud-integration-apache-camel-is-already-prepared for more details). This article describes the combination of Apache Camel and the Amazon Web Services (AWS) interfaces of Simple Storage Service (S3), Simple Queue Service (SQS) and Simple Notification Service (SNS). Thus, The concept of Infrastructure as a Service (IaaS) is used to access messaging systems and data storage without any need for configuration. Registration to AWS and Setup of Camel First, you have to register to the Amazon Web Services (for free). Most AWS services include a free monthly quota, which is absolutely sufficient to play around and develop some simple applications. As its name states, AWS uses technology-independent web services. Besides, APIs for several different programming languages are available to ease development. By the way, Camel uses the AWS SDK for Java (http://aws.amazon.com/sdkforjava), of course. The documentation is detailed and easy to understand, including tutorials, screenshots and code examples . Hint 1: You should read the introductions to S3, SQS and SNS (go to http://aws.amazon.com and click on „products“) and play around with the AWS Management Console (http://aws.amazon.com/console) before you continue. This step is very easy and takes less than one hour. Then, you will have a much better understanding about AWS and where Camel can help you! Hint 2: It really helps to look at the source code of the camel-aws component, It helps you to understand how Camel uses the AWS Java API internally. If you want to write tests, you can do it the same way. In the past, I was afraid of looking at „complex“ source code of open source frameworks. But there is no need to be scared! The camel-aws component (and most other camel components) contain only of a few classes. Everything is easy to understand. It helps you to understand Camel internals, the AWS API, and to spot and solve errors due to exceptions in your code. In the meanwhile, the current Camel version 2.8 supports three AWS services: S3, SQS and SNS. All of them use similar concepts. Therefore, they are included in one single camel component: „camel-aws“. You have to add the libraries to your existing Camel project. As always, the simplest way is to use Maven and add the following dependency to the pom.xml: org.apache.camel camel-aws ${camel-version} Configuration of the Camel Endpoint The implementation and configuration of all three services is very similar. The URI looks like this (the code shows the SQS service): aws-sqs://queue-name[?options] There are two alternatives to configure your endpoint. Using Parameters The easy way is to use two paramters in the URI of your endpoint: „accessKey“ and „secretKey“ (you receive both after your AWS registration). “aws-sqs://unique-queue-name?accessKey=“INSERT_ME“&secretKey=INSERT_ME” Be aware of the following problem, which can result in a strange, non-speaking exception (thanks to Brendan Long): You’ll need to URL encode any +’s in your secret key (otherwise, they’ll be treated as spaces). + = %2B, so if your secretkey was “my+secret\key”, your Camel URL should have “secretKey=my%2Bsecret\key”. “Within the query string, the plus sign is reserved as shorthand notation for a space. Therefore, real plus signs must be encoded. This method was used to make query URIs easier to pass in systems which did not allow spaces.” Source: WC3 URI Recommendations Adding a configured AmazonClient to the Registry If you need to do more configuration (e.g. because your system is behind a firewall), you have to add an AmazonClient object to your registry. The following code shows an example using SQS, but SNS and S3 use exactly the same concept. @Override protected JndiRegistry createRegistry() throws Exception { JndiRegistry registry = super.createRegistry(); AWSCredentials awsCredentials = new BasicAWSCredentials(“INSERT_ME”, “INSERT_ME”); ClientConfiguration clientConfiguration = new ClientConfiguration(); clientConfiguration.setProxyHost(“http://myProxyHost”); clientConfiguration.setProxyPort(8080); AmazonSQSClient client = new AmazonSQSClient(awsCredentials, clientConfiguration); registry.bind(“amazonSQSClient”, client); return registry; } This example overwrites the createRegistry() method of a JUnit test (extending CamelTestSupport). You can also add this information to your runtime Camel application, of course. Apache Camel and the Simple Storage Service (S3) Simple Storage Service (S3) is a key-value-store. You can store small to very large data. The usage is very easy. You create buckets and put key-value data into these buckets. You can also create folders within buckets to organize your data. That’s it. You can monitor your buckets using the AWS Management Console – an intuitive GUI supporting most AWS services. The following example shows both alternatives for accessing the Amazon services (as described above): Paramenters and the AmazonClient. // Transfer data from your file inbox to the AWS S3 service from(“file:files/inbox”) // This is the key of your key-value data .setHeader(S3Constants.KEY, simple(“This is a static key”)) // Using parameters for accessing the AWS service .to(“aws-s3://camel-integration-bucket-mwea-kw?accessKey=INSERT_ME&secretKey=INSERT_ME&region=eu-west-1″); // Transfer data from the AWS S3 service to your file outbox from(“aws-s3://camel-integration-bucket-mwea-kw?amazonS3Client=#amazonS3Client&region=eu-wes”) .to(“file:files/outbox”); There are some additional parameters, for instance you can submit the desired AWS region or delete data after receiving it (see http://camel.apache.org/aws-s3.html and the corresponding SQS and SNS sites for more details about parameters and message headers). As you see in the code, you can use the AWS-S3 endpoint for producing and for consuming messages. Each bucket must be unique, thus you have to add some specific information such as your company to its name. Hint: If a bucket does not exist, Camel is creating it automatically (as the AWS API does). This concept is also used for SQS queues and SNS topics. Apache Camel and the Simple Queue Service (SQS) The Simple Queue Service (SQS) is similar to a JMS provider such as WebSphere MQ or ActiveMQ (but with some differences). You create queues and send messages to them. Consumers receive the messages. Contrary to most other AWS services, you cannot monitor queues by using the AWS management console directly. You have to use the service „Cloudwatch“ (http://aws.amazon.com/cloudwatch) and start an EC2 instance to monitor queues and its content. As you can see in the following code example, the syntax and concepts are almost the same as for the S3 service: from(“file:inbox”) .to(“aws-sqs://camel-integration-queue-mwea-kw?accessKey=INSERT_ME&secretKey=INSERT_ME”); from(“aws-sqs://camel-integration-queue-mwea-kw?amazonSQSClient=#amazonSQSClient”) .to(“file:outbox?fileName=sqs-${date:now:yyyy.MM.dd-hh:mm:ss:SS}”); Again, you can use the AWS-SQS endpoint for producing and for consuming messages. Each queue name must be unique. There exist two important differences to JMS (copy & paste from the AWS documentation): Q: How many times will I receive each message? Amazon SQS is engineered to provide “at least once” delivery of all messages in its queues. Although most of the time each message will be delivered to your application exactly once, you should design your system so that processing a message more than once does not create any errors or inconsistencies. Q: Why are there separate ReceiveMessage and DeleteMessage operations? When Amazon SQS returns a message to you, that message stays in the queue, whether or not you actually received the message. You are responsible for deleting the message; the delete request acknowledges that you’re done processing the message. If you don’t delete the message, Amazon SQS will deliver it again on another receive request. Apache Camel and the Simple Notification Service (SNS) The Simple Notification Service (SNS) acts like JMS topics. You create a topic, consumers subscribe to the topic and then receive notifications. Several transport protocols are supported: HTTP(S), Email and SQS. Further interfaces will be added in the future, e.g. the Short Message Service (SMS) for mobile phones. Contrary to S3 and SQS, Camel only offers a producer endpoint for this AWS service. You can only create topics and send messages via Camel. The reason is simple: Camel already offers endpoints for consuming these messages: HTTP, Email and SQS are already available. There is one tradeoff: A consumer cannot subscribe to topics using Camel – at the moment. The AWS Management Console has to be used. A very interesting discussion can be read on the Camel JIRA issue regarding the following questions: Should Camel be able to subscribe to topics? Should the producer contain this feature or should there be a consumer? In my opinion, there should be a consumer which is able to subscribe to topics, otherwise Camel is missing a key part of the AWS SNS service! Please read the discussion and contribute your opinion: https://issues.apache.org/jira/browse/CAMEL-3476. Apache Camel is already ready for the Cloud Computing Era AWS offers many more services for the cloud. Probably, it does not make sense to integrate everyone into Camel, but more AWS services will be supported in the future. For instance, SimpleDB and the Relational Database Service (RDS) are already planned and make sende, too: http://camel.apache.org/aws.html. The conclusion is easy: Apache Camel is already ready for the cloud computing era. Several important cloud services are already supported. Cloud integration will become very important in the future. Thus, Camel is on a very good way. Hopefully, we will see more cloud components, soon. I will continue to write articles about other Camel cloud components (and new AWS addons, ouf course). For instance, a component for the Platform as a Service (PaaS) product Google App Engine (GAE) is already available. If you have any additional important information, questions or other feedback, please write a comment. Thank you in advance… Best regards, Kai Wähner (Twitter: @KaiWaehner) [Content from my Blog: Cloud Integration with Apache Camel and Amazon Web Services (AWS): S3, SQS and SNS]
August 30, 2011
by Kai Wähner DZone Core CORE
· 26,149 Views
article thumbnail
Concurrency Pattern: Producer and Consumer
In my career spanning 15 years, the problem of Producer and Consumer is one that I have come across only a few times. In most programming cases, what we are doing is performing functions in a synchronous fashion where the JVM or the web container handles the complexities of multi-threading on its own. However, when writing certain kinds of use cases where we need this. Last week, I came acros one such use case that sent me 3 years back when I last did it. However, the way it was done last time was very different. When I first heard the problem statement, I knew instantly what was needed. However, my approach to doing it this time was going to be different from last time. It had simply to do with how I am viewing technology in my life today. I will not go into any non-technical side and will jump straight into the problem and its solution. I started to look at what existed in the market and did come across a couple of posts that helped me in channelizing my thoughts in the right way. Problem Statement We need a solution for a batch migration. We are migrating data form System 1 to System 2 and in the process we need to do three tasks: Load data from Database based on groups Process the data Update the records loaded in step#1 with modifications We have to handle 100s of groups and each group will have around 40K records. You can imagine the amount of time it would take if we were to perform this exercise in a synchronous fashion. Image here explains this problem in an effective way. Producer Consumer: The Problem Producer and Consumer Pattern Let us take a look at the Producer Consumer pattern to begin with. If you refer to the problem statement above and look at the image, we see that there are so many entities who are ready with their part of data. However, there are not enough workers who can process all the data. Hence, as the producers continue to line-up in a queue it just continues to grow. We see that the systems start to hog up threads and take a lot of time. Intermediate Solution Producer Consumer: The Intermediate approch We do have an intermediate solution. Refer to the image and you will immediately notice that the producers are piling up their work in a filing cabinet and the worker continues to pick it up as they get done with the previous task. However, this approach does have some glaring shortcomings: There is still one worker who has to do all the work. The external systems may be happy, but the task will still continue to exist until the worker has completed all of the tasks The producers will pile up their data in a queue and it needs resources to hold the same. Just as in this example the cabinet can fill up, the same can happen with the JVM resources too. We need to be careful how much data we are going to place in memory and in some cases it may not be much. The Solution Producer Consumer: The Solution The solution is what we see everyday in many places – like the cinema hall queue, Petrol Pumps etc. There are so many people who come in to book a ticket and based on how many people come in, the more people are added to issue tickets. Essentially, refer to image here and you will notice that Producers will keep adding their jobs to the cabinet and we have more workers to handle the work load. Java provided concurrency package to solve this issue. Till now, I have always worked on threading at a much lower level and this was first time I was going to work with this package. As I started to explore the web and read fellow bloggers with what they have to say, I came across one very good article. It helped in understanding the use of BlockingQueue in a very effective manner. However, the solutions provided by Dhruba would not have helped me in achieving the high throughput which is needed. So, I started to explore the use of ArrayBlockingQueue for the same. The Controller This is the first class where the contract between the producers and consumers are managed. The controller will setup 1 thread for the Producer and 2 threads for the consumer. Based on the needs we can create as many threads as we need; and even can even read the data from a properties or do some dynamic magic. For now, we will keep this simple. package com.kapil.techieforever.producerconsumer; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; import java.util.concurrent.Future; public class TestProducerConsumer { public static void main(String args[]) { try { Broker broker = new Broker(); ExecutorService threadPool = Executors.newFixedThreadPool(3); threadPool.execute(new Consumer("1", broker)); threadPool.execute(new Consumer("2", broker)); Future producerStatus = threadPool.submit(new Producer(broker)); // this will wait for the producer to finish its execution. producerStatus.get(); threadPool.shutdown(); } catch (Exception e) { e.printStackTrace(); } } } I am using ExecuteService to create a thread pool and manage it. Instead of using the basic Thread implementation, this is a more effective way as it will handle the exiting and restarting the threads as needed. You will also notice that I am using Future class to get the status of the producer thread. This class is very effective and will halt my program from further execution. This is a nice way of replacing the “.join” method on the threads. Note: I am not using Future very effectively in this example; so you may have to try a few things as you feel fit. Also, you should note the Broker class which is being used as filing cabinet between the producers and consumers. We will see its implementation in just a little while. The Producer This class is responsible for producing the data that needs to be worked upon. package com.kapil.techieforever.producerconsumer; public class Producer implements Runnable { private Broker broker; public Producer(Broker broker) { this.broker = broker; } @Override public void run() { try { for (Integer i = 1; i < 5 + 1; ++i) { System.out.println("Producer produced: " + i); Thread.sleep(100); broker.put(i); } this.broker.continueProducing = Boolean.FALSE; System.out.println("Producer finished its job; terminating."); } catch (InterruptedException ex) { ex.printStackTrace(); } } } This class is doing the most simplest of things that it can do – adding an integer to the broker. Some key areas to note are: 1. There is a property on Broker which is updated in the end by the producer when its done producing. This is also known as the “final” or “poison” entry. This is used by the consumers to know that there are no more data coming up 2. I have used Thread.sleep to simulate that some producers may take more time to produce the data. You can tweak this value and see the consumers act The Consumer This class is responsible for reading the data from the broker and doing its job package com.kapil.techieforever.producerconsumer; public class Consumer implements Runnable { private String name; private Broker broker; public Consumer(String name, Broker broker) { this.name = name; this.broker = broker; } @Override public void run() { try { Integer data = broker.get(); while (broker.continueProducing || data != null) { Thread.sleep(1000); System.out.println("Consumer " + this.name + " processed data from broker: " + data); data = broker.get(); } System.out.println("Comsumer " + this.name + " finished its job; terminating."); } catch (InterruptedException ex) { ex.printStackTrace(); } } } This is again a simple class that reads the Integer and prints it on the console. However, key points to note are: 1. The loop to process data is an endless loop, that runs on two conditions – until the producer is consuming and there is some data with the broker 2. Again, the Thread.sleep is used to create effective and different scenarios The Broker package com.kapil.techieforever.producerconsumer; import java.util.concurrent.ArrayBlockingQueue; import java.util.concurrent.TimeUnit; public class Broker { public ArrayBlockingQueue queue = new ArrayBlockingQueue(100); public Boolean continueProducing = Boolean.TRUE; public void put(Integer data) throws InterruptedException { this.queue.put(data); } public Integer get() throws InterruptedException { return this.queue.poll(1, TimeUnit.SECONDS); } } The very first thing to note is that we are using ArrayBlockingQueue as the data holder. I am not going to say what this does, but insist you to read it on the JavaDocs here. however, I will explain that the producers are going to place the data in the queue and the consumers will fetch from the queue in FIFO format. But, if the producers are slow, the consumers will wait for data to come in and if the array is full, the producers will wait for it to fill up. Also, note that I am using the ‘poll’ function instead of get in the queue. This is to ensure that the consumers will not keep waiting for ever and the waiting will time out after a few seconds. This helps us in inter-communication and kill the consumers when all the data is processed. (Note: try replacing poll with get and you will see some interesting outputs). Code I have the code sitting on Google project hosting. Feel free to go across and download it from there. It is essentially an eclipse (Spring STS) project. You may also get additional packages and classes when you download it based on when you are downloading it. Feel free to look into those too and share your comments - You can browse the source code on the SVN browser or; - You can download it from the project itself From http://scratchpad101.com/2011/08/22/concurrency-pattern-producer-consumer/
August 29, 2011
by Kapil Viren Ahuja
· 68,531 Views · 2 Likes
article thumbnail
Java NIO vs. IO
when studying both the java nio and io api's, a question quickly pops into mind: when should i use io and when should i use nio? in this text i will try to shed some light on the differences between java nio and io, their use cases, and how they affect the design of your code. main differences of java nio and io the table below summarizes the main differences between java nio and io. i will get into more detail about each difference in the sections following the table. io nio stream oriented buffer oriented blocking io non blocking io selectors stream oriented vs. buffer oriented the first big difference between java nio and io is that io is stream oriented, where nio is buffer oriented. so, what does that mean? java io being stream oriented means that you read one or more bytes at a time, from a stream. what you do with the read bytes is up to you. they are not cached anywhere. furthermore, you cannot move forth and back in the data in a stream. if you need to move forth and back in the data read from a stream, you will need to cache it in a buffer first. java nio's buffer oriented approach is slightly different. data is read into a buffer from which it is later processed. you can move forth and back in the buffer as you need to. this gives you a bit more flexibility during processing. however, you also need to check if the buffer contains all the data you need in order to fully process it. and, you need to make sure that when reading more data into the buffer, you do not overwrite data in the buffer you have not yet processed. blocking vs. non-blocking io java io's various streams are blocking. that means, that when a thread invokes a read() or write(), that thread is blocked until there is some data to read, or the data is fully written. the thread can do nothing else in the meantime. java nio's non-blocking mode enables a thread to request reading data from a channel, and only get what is currently available, or nothing at all, if no data is currently available. rather than remain blocked until data becomes available for reading, the thread can go on with something else. the same is true for non-blocking writing. a thread can request that some data be written to a channel, but not wait for it to be fully written. the thread can then go on and do something else in the mean time. what threads spend their idle time on when not blocked in io calls, is usually performing io on other channels in the meantime. that is, a single thread can now manage multiple channels of input and output. selectors java nio's selectors allow a single thread to monitor multiple channels of input. you can register multiple channels with a selector, then use a single thread to "select" the channels that have input available for processing, or select the channels that are ready for writing. this selector mechanism makes it easy for a single thread to manage multiple channels. how nio and io influences application design whether you choose nio or io as your io toolkit may impact the following aspects of your application design: the api calls to the nio or io classes. the processing of data. the number of thread used to process the data. the api calls of course the api calls when using nio look different than when using io. this is no surprise. rather than just read the data byte for byte from e.g. an inputstream, the data must first be read into a buffer, and then be processed from there. the processing of data the processing of the data is also affected when using a pure nio design, vs. an io design. in an io design you read the data byte for byte from an inputstream or a reader. imagine you were processing a stream of line based textual data. for instance: name: anna age: 25 email: [email protected] phone: 1234567890 this stream of text lines could be processed like this: inputstream input = ... ; // get the inputstream from the client socket bufferedreader reader = new bufferedreader(new inputstreamreader(input)); string nameline = reader.readline(); string ageline = reader.readline(); string emailline = reader.readline(); string phoneline = reader.readline(); notice how the processing state is determined by how far the program has executed. in other words, once the first reader.readline() method returns, you know for sure that a full line of text has been read. the readline() blocks until a full line is read, that's why. you also know that this line contains the name. similarly, when the second readline() call returns, you know that this line contains the age etc. as you can see, the program progresses only when there is new data to read, and for each step you know what that data is. once the executing thread have progressed past reading a certain piece of data in the code, the thread is not going backwards in the data (mostly not). this principle is also illustrated in this diagram: java io: reading data from a blocking stream. a nio implementation would look different. here is a simplified example: bytebuffer buffer = bytebuffer.allocate(48); int bytesread = inchannel.read(buffer); notice the second line which reads bytes from the channel into the bytebuffer. when that method call returns you don't know if all the data you need is inside the buffer. all you know is that the buffer contains some bytes. this makes processing somewhat harder. imagine if, after the first read(buffer) call, that all what was read into the buffer was half a line. for instance, "name: an". can you process that data? not really. you need to wait until at leas a full line of data has been into the buffer, before it makes sense to process any of the data at all. so how do you know if the buffer contains enough data for it to make sense to be processed? well, you don't. the only way to find out, is to look at the data in the buffer. the result is, that you may have to inspect the data in the buffer several times before you know if all the data is inthere. this is both inefficient, and can become messy in terms of program design. for instance: bytebuffer buffer = bytebuffer.allocate(48); int bytesread = inchannel.read(buffer); while(! bufferfull(bytesread) ) { bytesread = inchannel.read(buffer); } the bufferfull() method has to keep track of how much data is read into the buffer, and return either true or false, depending on whether the buffer is full. in other words, if the buffer is ready for processing, it is considered full. the bufferfull() method scans through the buffer, but must leave the buffer in the same state as before the bufferfull() method was called. if not, the next data read into the buffer might not be read in at the correct location. this is not impossible, but it is yet another issue to watch out for. if the buffer is full, it can be processed. if it is not full, you might be able to partially process whatever data is there, if that makes sense in your particular case. in many cases it doesn't. the is-data-in-buffer-ready loop is illustrated in this diagram: java nio: reading data from a channel until all needed data is in buffer. summary nio allows you to manage multiple channels (network connections or files) using only a single (or few) threads, but the cost is that parsing the data might be somewhat more complicated than when reading data from a blocking stream. if you need to manage thousands of open connections simultanously, which each only send a little data, for instance a chat server, implementing the server in nio is probably an advantage. similarly, if you need to keep a lot of open connections to other computers, e.g. in a p2p network, using a single thread to manage all of your outbound connections might be an advantage. this one thread, multiple connections design is illustrated in this diagram: java nio: a single thread managing multiple connections. if you have fewer connections with very high bandwidth, sending a lot of data at a time, perhaps a classic io server implementation might be the best fit. this diagram illustrates a classic io server design: java io: a classic io server design - one connection handled by one thread. from http://tutorials.jenkov.com/java-nio/nio-vs-io.html
August 28, 2011
by Jakob Jenkov
· 133,839 Views · 19 Likes
article thumbnail
Dissecting the Disruptor: Why it's so fast (part one) - Locks Are Bad
martin fowler has written a really good article describing not only the disruptor , but also how it fits into the architecture at lmax . this gives some of the context that has been missing so far, but the most frequently asked question is still "what is the disruptor?". i'm working up to answering that. i'm currently on question number two: "why is it so fast?". these questions do go hand in hand, however, because i can't talk about why it's fast without saying what it does, and i can't talk about what it is without saying why it is that way. so i'm trapped in a circular dependency. a circular dependency of blogging. to break the dependency, i'm going to answer question one with the simplest answer, and with any luck i'll come back to it in a later post if it still needs explanation: the disruptor is a way to pass information between threads. as a developer, already my alarm bells are going off because the word "thread" was just mentioned, which means this is about concurrency, and concurrency is hard. concurrency 101 imagine two threads are trying to change the same value. case one: thread 1 gets there first: the value changes to "blah" then the value changes to "blahy" when thread 2 gets there. case two: thread 2 gets there first: the value changes to "fluffy" then the value changes to "blah" when thread 1 gets there. case three: thread 1 interrupts thread 2: thread 2 gets the value "fluff" and stores it as myvalue thread 1 goes in and updates value to "blah" then thread 2 wakes up and sets the value to "fluffy". case three is probably the only one which is definitely wrong, unless you think the naive approach to wiki editing is ok ( google code wiki, i'm looking at you...). in the other two cases it's all about intentions and predictability. thread 2 might not care what's in value, the intention might be to append "y" to whatever is in there regardless. in this circumstance, cases one and two are both correct. but if thread 2 only wanted to change "fluff" to "fluffy", then both cases two and three are incorrect. assuming that thread 2 wants to set the value to "fluffy", there are some different approaches to solving the problem. approach one: pessimistic locking (does the "no entry" sign make sense to people who don't drive in britain?) the terms pessimistic and optimistic locking seem to be more commonly used when talking about database reads and writes, but the principal applies to getting a lock on an object. thread 2 grabs a lock on entry as soon as it knows it needs it and stops anything from setting it. then it does its thing, sets the value, and lets everything else carry on. you can imagine this gets quite expensive, with threads hanging around all over the place trying to get hold of objects and being blocked. the more threads you have, the more chance that things are going to grind to a halt. approach two: optimistic locking in this case thread 2 will only lock entry when it needs to write to it. in order to make this work, it needs to check if entry has changed since it first looked at it. if thread 1 came in and changed the value to "blah" after thread 2 had read the value, thread 2 couldn't write "fluffy" to the entry and trample all over the change from thread 1. thread 2 could either re-try (go back, read the value, and append "y" onto the end of the new value), which you would do if thread 2 didn't care what the value it was changing was; or it could throw an exception or return some sort of failed update flag if it was expecting to change "fluff" to "fluffy". an example of this latter case might be if you have two users trying to update a wiki page, and you tell the user on the other end of thread 2 they'll need to load the new changes from thread 1 and then reapply their changes. potential problem: deadlock locking can lead to all sorts of issues, for example deadlock. imagine two threads that need access to two resources to do whatever they need to do: if you've used an over-zealous locking technique, both threads are going to sit there forever waiting for the other one to release its lock on the resource. that's when you reboot windows your computer. definite problem: locks are sloooow... the thing about locks is that they need the operating system to arbitrate the argument. the threads are like siblings squabbling over a toy, and the os kernel is the parent that decides which one gets it. it's like when you run to your dad to tell him your sister has nicked the transformer when you wanted to play with it - he's got bigger things to worry about than you two fighting, and he might finish off loading the dishwasher and putting on the laundry before settling the argument. if you draw attention to yourself with a lock, not only does it take time to get the operating system to arbitrate, the os might decide the cpu has better things to do than servicing your thread. the disruptor paper talks about an experiment we did. the test calls a function incrementing a 64-bit counter in a loop 500 million times. for a single thread with no locking, the test takes 300ms. if you add a lock (and this is for a single thread, no contention, and no additional complexity other than the lock) the test takes 10,000ms. that's, like, two orders of magnitude slower. even more astounding, if you add a second thread (which logic suggests should take maybe half the time of the single thread with a lock) it takes 224,000ms. incrementing a counter 500 million times takes nearly a thousand times longer when you split it over two threads instead of running it on one with no lock. concurrency is hard and locks are bad i'm just touching the surface of the problem, and obviously i'm using very simple examples. but the point is, if your code is meant to work in a multi-threaded environment, your job as a developer just got a lot more difficult: naive code can have unintended consequences. case three above is an example of how things can go horribly wrong if you don't realise you have multiple threads accessing and writing to the same data. selfish code is going to slow your system down. using locks to protect your code from the problem in case three can lead to things like deadlock or simply poor performance. this is why many organisations have some sort of concurrency problems in their interview process (certainly for java interviews). unfortunately it's very easy to learn how to answer the questions without really understanding the problem, or possible solutions to it. how does the disruptor address these issues? for a start, it doesn't use locks. at all. instead, where we need to make sure that operations are thread-safe (specifically, updating the next available sequence number in the case of multiple producers ), we use a cas (compare and swap/set) operation. this is a cpu-level instruction, and in my mind it works a bit like optimistic locking - the cpu goes to update a value, but if the value it's changing it from is not the one it expects, the operation fails because clearly something else got in there first. note this could be two different cores rather than two separate cpus. cas operations are much cheaper than locks because they don't involve the operating system, they go straight to the cpu. but they're not cost-free - in the experiment i mentioned above, where a lock-free thread takes 300ms and a thread with a lock takes 10,000ms, a single thread using cas takes 5,700ms. so it takes less time than using a lock, but more time than a single thread that doesn't worry about contention at all. back to the disruptor - i talked about the claimstrategy when i went over the producers . in the code you'll see two strategies, a singlethreadedstrategy and a multithreadedstrategy. you could argue, why not just use the multi-threaded one with only a single producer? surely it can handle that case? and it can. but the multi-threaded one uses an atomiclong (java's way of providing cas operations), and the single-threaded one uses a simple long with no locks and no cas. this means the single-threaded claim strategy is as fast as possible, given that it knows there is only one producer and therefore no contention on the sequence number. i know what you're thinking: turning one single number into an atomiclong can't possibly have been the only thing that is the secret to the disruptor's speed. and of course, it's not - otherwise this wouldn't be called "why it's so fast (part one )". but this is an important point - there's only one place in the code where multiple threads might be trying to update the same value. only one place in the whole of this complicated data-structure-slash-framework. and that's the secret. remember everything has its own sequence number? if you only have one producer then every sequence number in the system is only ever written to by one thread. that means there is no contention. no need for locks. no need even for cas. the only sequence number that is ever written to by more than one thread is the one on the claimstrategy if there is more than one producer. this is also why each variable in the entry can only be written to by one consumer . it ensures there's no write contention, therefore no need for locks or cas. back to why queues aren't up to the job so you start to see why queues, which may implemented as a ring buffer under the covers, still can't match the performance of the disruptor. the queue, and the basic ring buffer , only has two pointers - one to the front of the queue and one to the end: if more than one producer wants to place something on the queue, the tail pointer will be a point of contention as more than one thing wants to write to it. if there's more than one consumer, then the head pointer is contended, because this is not just a read operation but a write, as the pointer is updated when the item is consumed from the queue. but wait, i hear you cry foul! because we already knew this, so queues are usually single producer and single consumer (or at least they are in all the queue comparisons in our performance tests). there's another thing to bear in mind with queues/buffers. the whole point is to provide a place for things to hang out between producers and consumers, to help buffer bursts of messages from one to the other. this means the buffer is usually full (the producer is out-pacing the consumer) or empty (the consumer is out-pacing the producer). it's rare that the producer and consumer will be so evenly-matched that the buffer has items in it but the producers and consumers are keeping pace with each other. so this is how things really look. an empty queue: ...and a full queue: the queue needs a size so that it can tell the difference between empty and full. or, if it doesn't, it might determine that based on the contents of that entry, in which case reading an entry will require a write to erase it or mark it as consumed. whichever implementation is chosen, there's quite a bit of contention around the tail, head and size variables, or the entry itself if a consume operation also includes a write to remove it. on top of this, these three variables are often in the same cache line , leading to false sharing . so, not only do you have to worry about the producer and the consumer both causing a write to the size variable (or the entry), updating the tail pointer could lead to a cache-miss when the head pointer is updated because they're sat in the same place. i'm going to duck out of going into that in detail because this post is quite long enough as it is. so this is what we mean when we talk about "teasing apart the concerns" or a queue's "conflated concerns". by giving everything its own sequence number and by allowing only one consumer to write to each variable in the entry, the only case the disruptor needs to manage contention is where more than one producer is writing to the ring buffer. in summary the disruptor a number of advantages over traditional approaches: no contention = no locks = it's very fast. having everything track its own sequence number allows multiple producers and multiple consumers to use the same data structure. tracking sequence numbers at each individual place (ring buffer, claim strategy, producers and consumers), plus the magic cache line padding , means no false sharing and no unexpected contention. from http://mechanitis.blogspot.com/2011/07/dissecting-disruptor-why-its-so-fast.html
July 23, 2011
by Trisha Gee
· 12,648 Views · 1 Like
article thumbnail
Testing Entity Validations with a Mock Entity - Roo in Action Corner
In Spring Roo in Action, Chapter 3, I discuss how Roo automatically executes the Bean Validators when persisting a live entity. However, when running unit tests, we don't have a live entity at all, nor do we have a Spring container - so how can we exercise the validation without actually hitting our Roo application and the database? The following post is ancillary material from the upcoming book Spring Roo in Action, by Ken Rimple and Srini Penchikala, with Gordon Dickens. You can purchase the MEAP edition of the book, and participate in the author forum, at www.manning.com/rimple. The answer is that we have to bootstrap the validation framework within the test ourselves. We can use the CourseDataOnDemand class's getNewTransientEntityName method to generate a valid, transient JPA entity. Then, we can: Mock static entity methods, such as findById, to bring back pre-fabricated class instances of your entity Initialize the validation engine, bootstrapping a JSR-303 bean validation framework engine, and perform validation on your entity Set any appropriate properties to apply to a particular test condition Initialize a test instance of the entity validator and assert the appropriate validation results are returned The concept in action... Given a Student entity with the following definition: @RooEntity @RooJavaBean @RooToString public class Student { @NotNull private String emergencyContactInfo; ... } The listing below shows a unit test method that ensures the NotNull validation fires against missing emergency contact information on the Student entity: @Test public void testStudentMissingEmergencyContactValidation() { // setup our test data StudentDataOnDemand dod = new StudentDataOnDemand(); // tell the mock to expect this call Student.findStudent(1L); // tell the mocking API to expect a return from the prior call in the form of // a new student from the test data generator, dod AnnotationDrivenStaticEntityMockingControl.expectReturn( dod.getNewTransientStudent(0)); // put our mock in playback mode AnnotationDrivenStaticEntityMockingControl.playback(); // Setup the validator API in our unit test LocalValidatorFactoryBean validator = new LocalValidatorFactoryBean(); validator.afterPropertiesSet(); // execute the call from the mock, set the emergency contact field // to an invalid value Student student = Student.findStudent(1L); student.setEmergencyContactInfo(null); // execute validation, check for violations Set> violations = validator.validate(student, Default.class); // do we have one? Assert.assertEquals(1, violations.size()); // now, check the constraint violations to check for our specific error ConstraintViolation violation = violations.iterator().next(); // contains the right message? Assert.assertEquals("{javax.validation.constraints.NotNull.message}", violation.getMessageTemplate()); // from the right field? Assert.assertEquals("emergencyContactInfo", violation.getPropertyPath().toString()); } Analysis The test starts with a declaration of a StudentOnDemand object, which we'll use to generate our test data. We'll get into the more advanced uses of the DataOnDemand Framework later in the chapter. For now, keep in mind that we can use this class to create an instance of an Entity, with randomly assigned, valid data. We then require that the test calls the Student.findStudent method, passing it a key of 1L. Next, we'll tell the entity mocking framework that the call should return a new transient Student instance. At this point, we've defined our static mocking behavior, so we'll put the mocking framework into playback mode. Next, we issue the actual Student.findById(1L) call, this time storing the result as the member variable student. This call will trip the mock, which will return a new transient instance. We then set the emergencyContactInfo field to null, so that it becomes invalid, as it is annotated with a @NotNull annotation. Now we are ready to set up our bean validation framework. We create a LocalValidatorFactoryBean instance, which will boot the Bean Validation Framework in the afterPropertiesSet() method, which is defined for any Spring Bean implementing InitializingBean. We must call this method ourselves, because Spring is not involved in our unit test. Now we're ready to run our validation and assert the proper behavior has occurred. We call our validator's validate method, passing it the student instance and the standard Default validation group, which will trigger validation. We'll then check that we only have one validation failure, and that the message template for the error is the same as the one for the @NotNull validation. We also check to ensure that the field that caused the validation was our emergencyContactInfo field. In our answer callback, we can launch the Bean Validation Framework, and execute the validate method against our entity. In this way, we can exercise our bean instance any way we want, and instead of persisting the entity, can perform the validation phase and exit gracefully. Caveats... There are a few things slightly wrong here. First of all, the Data on Demand classes actually use Spring to inject relationships to each other, which I've logged a bug against as ROO-2497. You can override the setup of the data on demand class and manually create the DoD of the referring one, which is fine. They have slated to work on this bug for Roo 1.2, so it should be fixed sometime in the next few months. Also, realize that this is NOT easy to do, compared to writing an integration test. However, this test runs markedly faster. If you have some sophisticated logic that you've attached to a @AssertTrue annotation, this is the way to test it in isolation. From http://www.rimple.com/tech/2011/7/17/testing-entity-validations-with-a-mock-entity-roo-in-action.html
July 20, 2011
by Ken Rimple
· 14,573 Views
article thumbnail
Lucene's near-real-time search is fast!
Lucene's near-real-time (NRT) search feature, available since 2.9, enables an application to make index changes visible to a new searcher with fast turnaround time. In some cases, such as modern social/news sites (e.g., LinkedIn, Twitter, Facebook, Stack Overflow, Hacker News, DZone, etc.), fast turnaround time is a hard requirement. Fortunately, it's trivial to use. Just open your initial NRT reader, like this: // w is your IndexWriter IndexReader r = IndexReader.open(w, true); (That's the 3.1+ API; prior to that use w.getReader() instead). The returned reader behaves just like one opened with IndexReader.open: it exposes the point-in-time snapshot of the index as of when it was opened. Wrap it in an IndexSearcher and search away! Once you've made changes to the index, call r.reopen() and you'll get another NRT reader; just be sure to close the old one. What's special about the NRT reader is that it searches uncommitted changes from IndexWriter, enabling your application to decouple fast turnaround time from index durability on crash (i.e., how often commit is called), something not previously possible. Under the hood, when an NRT reader is opened, Lucene flushes indexed documents as a new segment, applies any buffered deletions to in-memory bit-sets, and then opens a new reader showing the changes. The reopen time is in proportion to how many changes you made since last reopening that reader. Lucene's approach is a nice compromise between immediate consistency, where changes are visible after each index change, and eventual consistency, where changes are visible "later" but you don't usually know exactly when. With NRT, your application has controlled consistency: you decide exactly when changes must become visible. Recently there have been some good improvements related to NRT: New default merge policy, TieredMergePolicy, which is able to select more efficient non-contiguous merges, and favors segments with more deletions. NRTCachingDirectory takes load off the IO system by caching small segments in RAM (LUCENE-3092). When you open an NRT reader you can now optionally specify that deletions do not need to be applied, making reopen faster for those cases that can tolerate temporarily seeing deleted documents returned, or have some other means of filtering them out (LUCENE-2900). Segments that are 100% deleted are now dropped instead of inefficiently merged (LUCENE-2010). How fast is NRT search? I created a simple performance test to answer this. I first built a starting index by indexing all of Wikipedia's content (25 GB plain text), broken into 1 KB sized documents. Using this index, the test then reindexes all the documents again, this time at a fixed rate of 1 MB/second plain text. This is a very fast rate compared to the typical NRT application; for example, it's almost twice as fast as Twitter's recent peak during this year's superbowl (4,064 tweets/second), assuming every tweet is 140 bytes, and assuming Twitter indexed all tweets on a single shard. The test uses updateDocument, replacing documents by randomly selected ID, so that Lucene is forced to apply deletes across all segments. In addition, 8 search threads run a fixed TermQuery at the same time. Finally, the NRT reader is reopened once per second. I ran the test on modern hardware, a 24 core machine (dual x5680 Xeon CPUs) with an OCZ Vertex 3 240 GB SSD, using Oracle's 64 bit Java 1.6.0_21 and Linux Fedora 13. I gave Java a 2 GB max heap, and used MMapDirectory. The test ran for 6 hours 25 minutes, since that's how long it takes to re-index all of Wikipedia at a limited rate of 1 MB/sec; here's the resulting QPS and NRT reopen delay (milliseconds) over that time: The search QPS is green and the time to reopen each reader (NRT reopen delay in milliseconds) is blue; the graph is an interactive Dygraph, so if you click through above, you can then zoom in to any interesting region by clicking and dragging. You can also apply smoothing by entering the size of the window into the text box in the bottom left part of the graph. Search QPS dropped substantially with time. While annoying, this is expected, because of how deletions work in Lucene: documents are merely marked as deleted and thus are still visited but then filtered out, during searching. They are only truly deleted when the segments are merged. TermQuery is a worst-case query; harder queries, such as BooleanQuery, should see less slowdown from deleted, but not reclaimed, documents. Since the starting index had no deletions, and then picked up deletions over time, the QPS dropped. It looks like TieredMergePolicy should perhaps be even more aggressive in targeting segments with deletions; however, finally around 5:40 a very large merge (reclaiming many deletions) was kicked off. Once it finished the QPS recovered somewhat. Note that a real NRT application with deletions would see a more stable QPS since the index in "steady state" would always have some number of deletions in it; starting from a fresh index with no deletions is not typical. Reopen delay during merging The reopen delay is mostly around 55-60 milliseconds (mean is 57.0), which is very fast (i.e., only 5.7% "duty cycle" of the every 1.0 second reopen rate). There are random single spikes, which is caused by Java running a full GC cycle. However, large merges can slow down the reopen delay (once around 1:14, again at 3:34, and then the very large merge starting at 5:40). Many small merges (up to a few 100s of MB) were done but don't seem to impact reopen delay. Large merges have been a challenge in Lucene for some time, also causing trouble for ongoing searching. I'm not yet sure why large merges so adversely impact reopen time; there are several possibilities. It could be simple IO contention: a merge keeps the IO system very busy reading and writing many bytes, thus interfering with any IO required during reopen. However, if that were the case, NRTCachingDirectory (used by the test) should have prevented it, but didn't. It's also possible that the OS is [poorly] choosing to evict important process pages, such as the terms index, in favor of IO caching, causing the term lookups required when applying deletes to hit page faults; however, this also shouldn't be happening in my test since I've set Linux's swappiness to 0. Yet another possibility is Linux's write cache becomes temporarily too full, thus stalling all IO in the process until it clears; in this case perhaps tuning some of Linux's pdflush tunables could help, although I'd much rather find a Lucene-only solution so this problem can be fixed without users having to tweak such advanced OS tunables, even swappiness. Fortunately, we have an active Google Summer of Code student, Varun Thacker, working on enabling Directory implementations to pass appropriate flags to the OS when opening files for merging (LUCENE-2793 and LUCENE-2795). From past testing I know that passing O_DIRECT can prevent merges from evicting hot pages, so it's possible this will fix our slow reopen time as well since it bypasses the write cache. Finally, it's always possible other OSs do a better job managing the buffer cache, and wouldn't see such reopen delays during large merges. This issue is still a mystery, as there are many possibilities, but we'll eventually get to the bottom of it. It could be we should simply add our own IO throttling, so we can control net MB/sec read and written by merging activity. This would make a nice addition to Lucene! Except for the slowdown during merging, the performance of NRT is impressive. Most applications will have a required indexing rate far below 1 MB/sec per shard, and for most applications reopening once per second is fast enough. While there are exciting ideas to bring true real-time search to Lucene, by directly searching IndexWriter's RAM buffer as Michael Busch has implemented at Twitter with some cool custom extensions to Lucene, I doubt even the most demanding social apps actually truly need better performance than we see today with NRT. NIOFSDirectory vs MMapDirectory Out of curiosity, I ran the exact same test as above, but this time with NIOFSDirectory instead of MMapDirectory: There are some interesting differences. The search QPS is substantially slower -- starting at 107 QPS vs 151, though part of this could easily be from getting different compilation out of hotspot. For some reason TermQuery, in particular, has high variance from one JVM instance to another. The mean reopen time is slower: 67.7 milliseconds vs 57.0, and the reopen time seems more affected by the number of segments in the index (this is the saw-tooth pattern in the graph, matching when minor merges occur). The takeaway message seems clear: on Linux, use MMapDirectory not NIOFSDirectory! Optimizing your NRT turnaround time My test was just one datapoint, at a fixed fast reopen period (once per second) and at a high indexing rate (1 MB/sec plain text). You should test specifically for your use-case what reopen rate works best. Generally, the more frequently you reopen the faster the turnaround time will be, since fewer changes need to be applied; however, frequent reopening will reduce the maximum indexing rate. Most apps have relatively low required indexing rates compared to what Lucene can handle and can thus pick a reopen rate to suit the application's turnaround time requirements. There are also some simple steps you can take to reduce the turnaround time: Store the index on a fast IO system, ideally a modern SSD. Install a merged segment warmer (see IndexWriter.setMergedSegmentWarmer). This warmer is invoked by IndexWriter to warm up a newly merged segment without blocking the reopen of a new NRT reader. If your application uses Lucene's FieldCache or has its own caches, this is important as otherwise that warming cost will be spent on the first query to hit the new reader. Use only as many indexing threads as needed to achieve your required indexing rate; often 1 thread suffices. The fewer threads used for indexing, the faster the flushing, and the less merging (on trunk). If you are using Lucene's trunk, and your changes include deleting or updating prior documents, then use the Pulsing codec for your id field since this gives faster lookup performance which will make your reopen faster. Use the new NRTCachingDirectory, which buffers small segments in RAM to take load off the IO system (LUCENE-3092). Pass false for applyDeletes when opening an NRT reader, if your application can tolerate seeing deleted doccs from the returned reader. While it's not clear that thread priorities actually work correctly (see this Google Tech Talk), you should still set your thread priorities properly: the thread reopening your readers should be highest; next should be your indexing threads; and finally lowest should be all searching threads. If the machine becomes saturated, ideally only the search threads should take the hit. Happy near-real-time searching!
July 11, 2011
by Michael Mccandless
· 19,008 Views
article thumbnail
SEVERE: Error in xpath:java.lang.RuntimeException: solrconfig.xml missing luceneMatchVersion
One of the things that changed from Solr 1.4.1 to 1.5+ was the introduction of a parameter to tell Solr / Lucene which kind of compability version its index files should be created and used in. Solr now refuses to start if you do not provide this setting (if you’re upgrading a previous installation from 1.4.1 or earlier). The fix isn’t really straight forward, and you’ll probably have to recreate your index files if you’re just arriving at the scene with Solr / Lucene 3.2 and 4.0. Solr 3.0 (1.5) might be able to upgrade the files from the 2.9 version, but if you’re jumping from Lucene 2.9 to 4.0, the easiest solution seems to be to delete the current index and reindex (set up replication, disable replication from the master, query the slave while reindexing the master, etc.. and you’ll have no downtime while doing this!). You’ll need to add a parameter to your solrconfig.xml file as well in the section. LUCENE_CURRENT Other valid values are LUCENE_30, LUCENE_31, LUCENE_32 and LUCENE_40. These values represent specific versions of the index structure, while LUCENE_CURRENT will use the version depending on which particular release of Lucene you’re using. The version format will be upgraded automagically between most releases, so you’ll probably be fine by using LUCENE_CURRENT. If you however are trying to load index files that are more than one version older, you may have to use one of the other values.
July 9, 2011
by Mats Lindh
· 10,268 Views
article thumbnail
The era of Object-Document Mapping
The Data Mapper pattern is a mechanism for persistence where the application model and the data source have no dependencies between each other. For example, a group of PHP classes and a relational database may be used together without having the PHP classes extends a base Active Record class, thanks to a Data Mapper like Doctrine 2. But everytime we talk about the Data Mapper pattern, we assume there is a relational database on the other side of the persistence boundary. We always save objects; we always map them to MySQL or Postgres tables; but it's not mandatory. In fact, you can store objects also in NoSQL stores, with no more impedance mismatch that the one already existing with relational databases. Doctrine is expanding with two projects which have the goal of replicating the easy object-relational mapping of Doctrine 2 to other stores. These two projects are the MongoDB and CouchDB Object-Document mappers. Since the basic unit of persistence is the document in both cases, objects (or group of them) are stored as documents. The retrieval process depends on the features of the underlying database: it may consists of simple queries (MongoDB) or just of a unique identifier (CouchDB). How do an ODM work? If you use Doctrine 2, you'll find out that many of its concepts translates well to the other mappers. Instead of Entities, we have Documents in our model: id = "myuser-1234"; $user->username = "test"; $this->dm->persist($user); $this->dm->flush(); $this->dm->clear(); $userNew = $this->dm->find($this->type, $user->id); There is a lot of reuse between the various projects: Doctrine\Common is a dependency for them all, and you will be able to use the same ArrayCollection with any of the mappers. Features There's really all I would need to get started: storage of new documents, update, removal; pretty much what you expect; Unit Of Work pattern for tracking what has been modified and execute a single flush() to save changes; support for collections and both one-to-many or many-to-one associations; support for embedded documents (read Value Objects); both embedding of one or many objects as document fields is possible. This is a striking difference with respect to ORMs: Doctrine 2 does not support Value Objects mapping. cascade of persistence and removal of objects; specification of the mapping metadata via annotations, XML, YAML or PHP (like for Doctrine 2). Projects status The concept of Object-Document Mapper, at least in this name, is pretty unique to PHP and Ruby. Mandango is a PHP alternative and Mongoid a Ruby one, for the MongoDB case. Both projects are in the midst of releasing their first stable version: we are talking about bleeding edge technology here. The MongoDB mapper has been around for more than an year and is now in beta3. The CouchDB has an alpha release, but I found out that the current version shipped via PEAR is broken: if you want to play with it, checkout it from github and initialize its submodules (the standalone Symfony components and Doctrine\Common): git clone https://github.com/doctrine/couchdb-odm.git couchdb_odm cd couchdb_odm git submodule init git submodule update For the MongoDB mapper, the PEAR installation should suffice: pear install pear.doctrine-project.org/DoctrineMongoDBODM-1.0.0BETA3 Conclusions I think I'm going to explore more the CouchDB ODM, since it is a departure from the relational model. MongoDB is still similar to traditional databases in the way queries are satisfied: specifying a set of constraints like in SQL WHERE clauses. I'm more interested instead in approached that skip the Mapper for reporting (like Command-Query Responsibility Separation), and maybe CouchDB could be a good fit. By the way, object-document mapping is an alternative to explore: Data Mappers are all the rage today in PHP (they already were yesterday in other languages.) If we already accept that relational databases aren't always the solution, extending NoSQL with Data Mappers is a natural choice.
July 7, 2011
by Giorgio Sironi
· 20,870 Views
article thumbnail
Is Object Serialization Evil?
In my daily work, I use both an RDBMS and MarkLogic, an XML database. MarkLogic can be considered akin to the newer NoSQL databases, but it has the added structure of XML and standard languages in XQuery and XPath. The NoSQL databases are typically storing documents or key-value pairs, and some other things in between. Given that any datastore will be searched at some point, you will always care how the data is actually stored or whether there is some way to query it easily. Once you start thinking about the problem, you quickly generalize to the “how do I persist any type of data” question. However, my focus is not going to be the comparison of the various data stores, but the comparison of how data is stored. More specifically, I want to show the object serialization, mainly the Java built in method, as a data persistence format is evil. Given what you normally read on this blog, this may seem like an oddly timed post, but I have run into serialization issues lately in some production code and Mark Needham recently wrote an interesting post about this as well. Coincidentally, Mark is also working with MarkLogic, and there is an interesting item in his post: The advantage of doing things this way [using lightweight wrappers] is that it means we have less code to write than we would with the serialisation/deserialisation approach although it does mean that we’re strongly coupled to the data format that our storage mechanism uses. However, since this is one bit of the architecture which is not going to change it seems to makes sense to accept the leakage of that layer. The interesting part of this is that he has accepted using the data format of the storage mechanism, XML in MarkLogic in this case. Why is this interesting? First, it is a move away from the ORM technologies that try to hide the complexities of converting data into objects in the RDBMS world. Also, this is a glimpse into the types of issues that could arise from non-RDBMS storage choices as well as how to persist objects in general. So, an RDBMS is typically used to map object attributes to a table and columns. The mapping is mostly straightforward with some defined relationship for child objects and collections. This is a well-known area, called Object-Relational Mapping (ORM), and several open source and commercial options exist. In this scenario, object attributes are stored in a similar datatype, meaning a String is stored as a varchar and an int is stored as an integer. But, what happens when you move away from an RDBMS for data persistence? If you look at Java and its session objects, pure object serialization is used. Assuming that an application session is fairly short-lived, meaning at most a few hours, object serialization is simple, well supported and built into the Java concept of a session. However, when the data persistence is over a longer period of time, possibly days or weeks, and you have to worry about new releases of the application, serialization quickly becomes evil. As any good Java developer knows, if you plan to serialize an object, even in a session, you need a real serialization ID (serialVersionUID), not just a 1L, and you need to implement the Serializable interface. However, most developers do not know the real rules behind the Java deserialization process. If your object has changed, more than just adding simple fields to the object, it is possible that Java cannot deserialize the object correctly even if the serialization ID has not changed. Suddenly, you cannot retrieve your data any longer, which is inherently bad. Now, may developers reading this may say that they would never write code that would have this problem. That may be true, but what about a library that you use or some other developer no longer employed by your company? Can you guarantee that this problem will never happen? The only way to guarantee that is to use a different serialization method. What options do we have? Obviously, there are the NoSQL datastores but the actual object format is the relevant question not which solution to choose. Besides the obvious serialized object, some NoSQL datastores use JSON to store objects, MarkLogic uses XML and there are others that store just key-value pairs. Key-value pairs are typically a mapping of a text key to a value that is a serialized object, either a binary or textual format. So, that leaves us with XML, JSON and other textual formats. One of the benefits of a structured format like XML or JSON is that they can be made searchable and provide some level of context. I have talked about data formats before, so I won’t go into a comparison again. However, do these types of formats avoid the issues that native Java object serialization has? This is really dependent upon what library you are using for serialization. Some libraries will deserialize an object without any issues regardless of whether the object field list has changed. Other libraries could have problems depending upon whether a serialized field exists in the target object, or there might not be solid support for collections (though that is doubtful at this point). Given that even structured formats could have serialization issues, is the only safe path hand-coded mappings like those used by ORM tools? Some JSON and XML serialization tools use the same mapping methods as the ORM tools in order to avoid these problems. However, once you define these mappings, you are explicitly stating how an object gets translated. This explicit definition will require maintenance, but that is definitely cleaner than trying to trace down a serialization defect in some random stack trace. So is implicit object serialization really worth the potential headaches? Or should we just consider it evil and never speak of it again? From http://regulargeek.com/2011/07/06/is-object-serialization-evil/
July 7, 2011
by Robert Diana
· 19,946 Views · 1 Like
article thumbnail
Using Advantage data providers to read DBF-files
In one of my projects I have to read FoxPro DBF-files and import data from them. As this code must run in server and customer doesn’t want to install FoxPro there we found another solution that seems at least to me way better. In this posting I will show you how to read DBF-files using Sybase Advantage data providers. Getting Advantage data providers Here are the download links to data providers: Advantage .NET Data Provider Release 10.1 for Windows (32-bit and 64-bit) Platforms for Advantage OLE DB Provider Release 10.1 Platforms for Advantage ODBC Driver Release 10.1 I downloaded and installed .NET data provider and my example here is fully based on this. Configuring application If you run application without configuring some data providers stuff before you will get the following error: Error 5185: Local server connections are restricted in this environment. See the 5185 error code documentation for details. Go to your application bin folder and add there usual text file called ads.ini. Here is the content for this file: [SETTINGS] MTIER_LOCAL_CONNECTIONS=1 Make sure you add reference to Advantage data provider assembly and include ads.ini to your project like shown on image above. Getting data to DataTable Here is short code example about how to get data from DBF-file to DataTable. static void Main(string[] args) { var tableName = "TABLENAME_WITHOUT_EXTENSION"; var connStr = "data source={0};tabletype=vfp;servertype= local;"; connStr = string.Format(connStr, "c:\\temp\\"); var table = new DataTable(); using (var conn = new AdsConnection(connStr)) using (var adapter = new AdsDataAdapter()) using (var cmd = new AdsCommand()) { cmd.Connection = conn; cmd.CommandText = "select * from " + tableName; adapter.SelectCommand = cmd; conn.Open(); adapter.Fill(table); conn.Close(); } Console.WriteLine("Table fields:"); foreach (DataColumn col in table.Columns) Console.WriteLine(col.ColumnName); Console.WriteLine(" "); Console.WriteLine("Rows: " + table.Rows.Count); Console.Read(); } If Advantage data providers were installed correctly and there are no errors in table names, locations and your SQL query then you should see list of table column names and row count on console window when you run the application.
May 17, 2011
by Gunnar Peipman
· 10,424 Views
article thumbnail
Java Web Application Security - Part I: Java EE 6 Login Demo
Back in February, I wrote about my upcoming conferences: In addition to Vegas and Poland, there's a couple other events I might speak at in the next few months: the Utah Java Users Group (possibly in April), Jazoon and ÜberConf (if my proposals are accepted). For these events, I'm hoping to present the following talk: Webapp Security: Develop. Penetrate. Protect. Relax. In this session, you'll learn how to implement authentication in your Java web applications using Spring Security, Apache Shiro and good ol' Java EE Container Managed Authentication. You'll also learn how to secure your REST API with OAuth and lock it down with SSL. After learning how to develop authentication, I'll introduce you to OWASP, the OWASP Top 10, its Testing Guide and its Code Review Guide. From there, I'll discuss using WebGoat to verify your app is secure and commercial tools like webapp firewalls and accelerators. Fast forward a couple months and I'm happy to say that I've completed my talk at the Utah JUG and it's been accepted at Jazoon and Über Conf. For this talk, I created a presentation that primarily consists of demos implementing basic, form and Ajax authentication using Java EE 6, Spring Security and Apache Shiro. In the process of creating the demos, I learned (or re-educated myself) how to do a number of things in all 3 frameworks: Implement Basic Authentication Implement Form-based Authentication Implement Ajax HTTP -> HTTPS Authentication (with programmatic APIs) Force SSL for certain URLs Implement a file-based store of users and passwords (in Jetty/Maven and Tomcat standalone) Implement a database store of users and passwords (in Jetty/Maven and Tomcat standalone) Encrypt Passwords Secure methods with annotations For the demos, I showed the audience how to do almost all of these, but skipped Tomcat standalone and securing methods in the interest of time. In July, when I do this talk at ÜberConf, I plan on adding 1) hacking the app (to show security holes) and 2) fixing it to protect it against vulnerabilities. I told the audience at UJUG that I would post the presentation and was planning on recording screencasts of the various demos so the online version of the presentation would make more sense. Today, I've finished the first screencast showing how to implement security with Java EE 6. Below is the presentation (with the screencast embedded on slide 10) as well as a step-by-step tutorial. * You can also watch the screencast on YouTube or download the presentation PDF. Java EE 6 Login Tutorial Download and Run the Application Implement Basic Authentication Implement Form-based Authentication Force SSL Store Users in a Database Summary Download and Run the Application To begin, download the application you'll be implementing security in. This app is a stripped-down version of the Ajax Login application I wrote for my article on Implementing Ajax Authentication using jQuery, Spring Security and HTTPS. You'll need Java 6 and Maven installed to run the app. Run it using mvn jetty:run and open http://localhost:8080 in your browser. You'll see it's a simple CRUD application for users and there's no login required to add or delete users. Implement Basic Authentication The first step is to protect the list screen so people have to login to view users. To do this, add the following to the bottom of src/main/webapp/WEB-INF/web.xml: users /users GET POST ROLE_ADMIN BASIC Java EE Login ROLE_ADMIN At this point, if you restart Jetty (Ctrl+C and jetty:run again), you'll get an error about a missing LoginService. This happens because Jetty doesn't know where the "Java EE Login" realm is located. Add the following to pom.xml, just after in the Jetty plugin's configuration. Java EE Login ${basedir}/src/test/resources/realm.properties The realm.properties file already exists in the project and contains user names and passwords. Start the app again using mvn jetty:run and you should be prompted to login when you click on the "Users" tab. Enter admin/admin to login. After logging in, you can try to logout by clicking the "Logout" link in the top-right corner. This calls a LogoutController with the following code that logs the user out. public void logout(HttpServletResponse response) throws ServletException, IOException { request.getSession().invalidate(); response.sendRedirect(request.getContextPath()); } You'll notice that clicking this link doesn't log you out, even though the session is invalidated. The only way to logout with basic authentication is to close the browser. In order to get the ability to logout, as well as to have more control over the look-and-feel of the login, you can implement form-based authentication. Implement Form-based Authentication To change from basic to form-based authentication, you simply have to replace the in your web.xml with the following: FORM /login.jsp /login.jsp?error=true The login.jsp page already exists in the project, in the src/main/webapp directory. This JSP has 3 important elements: 1) a form that submits to "${contextPath}/j_security_check", 2) an input element named "j_username" and 3) an input element named "j_password". If you restart Jetty, you'll now be prompted to login with this JSP instead of the basic authentication dialog. Force SSL Another thing you might want to implement to secure your application is forcing SSL for certain URLs. To do this on the same you already have in web.xml, add the following after : CONFIDENTIAL To configure Jetty to listen on an SSL port, add the following just after in your pom.xml: true 8080 true 8443 60000 ${project.build.directory}/ssl.keystore appfuse appfuse The keystore must be generated for Jetty to start successfully, so add the keytool-maven-plugin just above the jetty-maven-plugin in pom.xml. org.codehaus.mojo keytool-maven-plugin 1.0 generate-resources clean clean generate-resources genkey genkey ${project.build.directory}/ssl.keystore cn=localhost appfuse appfuse appfuse RSA Now if you restart Jetty, go to http://localhost:8080 and click on the "Users" tab, you'll get a 403. What the heck?! When this first happened to me, it took me a while to figure out. It turns out that Jetty doesn't redirect to HTTPS when using Java EE authentication, so you have to manually type in https://localhost:8443/ (or add a filter to redirect for you). If you deployed this same application on Tomcat (after enabling SSL), it would redirect for you. Store Users in a Database Finally, to store your users in a database instead of file, you'll need to change the in the Jetty plugin's configuration. Replace the existing element with the following: Java EE Login ${basedir}/src/test/resources/jdbc-realm.properties The jdbc-realm.properties file already exists in the project and contains the database settings and table/column names for the user and role information. jdbcdriver = com.mysql.jdbc.Driver url = jdbc:mysql://localhost/appfuse username = root password = usertable = app_user usertablekey = id usertableuserfield = username usertablepasswordfield = password roletable = role roletablekey = id roletablerolefield = name userroletable = user_role userroletableuserkey = user_id userroletablerolekey = role_id cachetime = 300 Of course, you'll need to install MySQL for this to work. After installing it, you should be able to create an "appfuse" database and populate it using the following commands: mysql -u root -p -e 'create database appfuse' curl https://gist.github.com/raw/958091/ceecb4a6ae31c31429d5639d0d1e6bfd93e2ea42/create-appfuse.sql > create-appfuse.sql mysql -u root -p appfuse < create-appfuse.sql Next you'll need to configure Jetty so it has MySQL's JDBC Driver in its classpath. To do this, add the following dependency just after the element (before ) in pom.xml: mysql mysql-connector-java 5.1.14 Now run the jetty-password.sh file in the root directory of the project to generate a password of your choosing. For example: $ sh jetty-password.sh javaeelogin javaeelogin OBF:1vuj1t2v1wum1u9d1ugo1t331uh21ua51wts1t3b1vur MD5:53b176e6ce1b5183bc970ef1ebaffd44 The last two lines are obfuscated and MD5 versions of the password. Update the admin user's password to this new value. You can do this with the following SQL statement. UPDATE app_user SET password='MD5:53b176e6ce1b5183bc970ef1ebaffd44' WHERE username = 'admin'; Now if you restart Jetty, you should be able to login with admin/javaeelogin and view the list of users. Summary In this tutorial, you learned how to implement authentication using standard Java EE 6. In addition to the basic XML configuration, there's also some new methods in HttpServletRequest for Java EE 6 and Servlet 3.0: authenticate(response) login(user, pass) logout() This tutorial doesn't show you how to use them, but I did play with them a bit as part of my UJUG demo when implementing Ajax authentication. I found that login() did work, but it didn't persist the authentication for the users session. I also found that after calling logout(), I still needed to invalidate the session to completely logout the user. There are some additional limitations I found with Java EE authentication, namely: No error messages for failed logins No Remember Me No auto-redirect from HTTP to HTTPS Container has to be configured Doesn’t support regular expressions for URLs Of course, no error messages indicating why login failed is probably a good thing (you don't want to tell users why their credentials failed). However, when you're trying to figure out if your container is configured properly, the lack of container logging can be a pain. In the next couple weeks, I'll post Part II of this series, where I'll show you how to implement this same set of features using Spring Security. In the meantime, please let me know if you have any questions. From http://raibledesigns.com/rd/entry/java_web_application_security_part
May 15, 2011
by Matt Raible
· 41,981 Views · 3 Likes
article thumbnail
Reset MySQL Root Password On Linux
Five easy steps to reset MySQL root password. Stop the MySQL server. Start the MySQL server with the --skip-grant-tables option. (it will not prompt for password) Connect to MySQL server as the root user. Setup new MySQL root password. Exit and restart the MySQL server. ### Shell Commands ### /etc/init.d/mysql stop mysqld_safe --skip-grant-tables & mysql -u root ### SQL Commands ### use mysql; UPDATE user SET password=PASSWORD("new-password") WHERE user='root'; flush privileges; exit ### Shell Commands ### /etc/init.d/mysql stop /etc/init.d/mysql start mysql -u root -pnew-password
May 13, 2011
by Artur Mkrtchyan
· 5,779 Views
  • Previous
  • ...
  • 515
  • 516
  • 517
  • 518
  • 519
  • 520
  • 521
  • 522
  • 523
  • 524
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×