The Latest and Popular SDLC Topics


I want to share something that every software developer or operations person runs into now and then, and which I ran into yesterday. It all started with our nightly Maven build failing last Saturday, Jan 28, on a java.lang.NoClassDefFoundError: javax/xml/bind/ValidationEventLocator. The interesting snippet from the console output of this build looked like this:

[INFO] [jaxb2:generate {execution: xxx-schema-gen}]
[FATAL ERROR] org.jvnet.mjiip.v_2.XJC2Mojo#execute() caused a linkage error (java.lang.NoClassDefFoundError) and may be out-of-date. Check the realms:
[FATAL ERROR] Plugin realm = app0.child-container[org.jvnet.jaxb2.maven2:maven-jaxb2-plugin]
urls[0] = file:/home/hudson/.m2/repository/org/jvnet/jaxb2/maven2/maven-jaxb2-plugin/0.8.1/maven-jaxb2-plugin-0.8.1.jar
urls[1] = file:/home/hudson/.m2/repository/org/jvnet/jaxb2/maven2/maven-jaxb2-plugin-core/0.8.1/maven-jaxb2-plugin-core-0.8.1.jar
urls[2] = file:/home/hudson/.m2/repository/com/sun/org/apache/xml/internal/resolver/20050927/resolver-20050927.jar
urls[3] = file:/home/hudson/.m2/repository/org/sonatype/plexus/plexus-build-api/0.0.7/plexus-build-api-0.0.7.jar
urls[4] = file:/home/hudson/.m2/repository/org/codehaus/plexus/plexus-utils/1.5.15/plexus-utils-1.5.15.jar
urls[5] = file:/home/hudson/.m2/repository/org/jfrog/maven/annomojo/maven-plugin-anno/1.3.1/maven-plugin-anno-1.3.1.jar
urls[6] = file:/home/hudson/.m2/repository/org/jvnet/jaxb2/maven2/maven-jaxb22-plugin/0.8.1/maven-jaxb22-plugin-0.8.1.jar
urls[7] = file:/home/hudson/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.5-b10/jaxb-impl-2.2.5-b10.jar
urls[8] = file:/home/hudson/.m2/repository/com/sun/xml/bind/jaxb-xjc/2.2.5-b10/jaxb-xjc-2.2.5-b10.jar
[FATAL ERROR] Container realm = plexus.core.maven
[INFO] ------------------------------------------------------------------------
[ERROR] FATAL ERROR
[INFO] ------------------------------------------------------------------------
[INFO] javax/xml/bind/ValidationEventLocator
[INFO] ------------------------------------------------------------------------
[INFO] Trace
java.lang.NoClassDefFoundError: javax/xml/bind/ValidationEventLocator
at com.sun.tools.xjc.reader.internalizer.DOMForestScanner.scan(DOMForestScanner.java:84)

It seems that the maven-jaxb2-plugin, which we use to let JAXB generate Java sources for a certain schema, suddenly triggers a (missing or incompatible?) dependency. The weird thing is the “suddenly” part, since the last commit was done on Friday, after which the normal continuous builds ran fine, and even the nightly build at midnight from Friday to Saturday. We use Maven to take care of these things, right? Always have a known set of dependencies, in each part of the process? Of course, at the time things didn’t seem so obvious.

The first thing I wondered: could it have something to do with a Java 5 vs 6 problem? As a matter of fact, on our project we’re trying to move to Java 6 after a long time, but I’m the only one who has already upgraded my local developer machine to JDK 6, to work with it for a while and see whether anything strange happens. You know, keeping a lot of files from your workspace in a special “do not commit” working set or hidden from view, in order to prevent them from prematurely ending up in version control. So naturally, I first explored all changesets leading up to the failed build to see if I had accidentally committed a file I shouldn’t have. But I couldn’t find anything.

Next I thought: could Hudson have been configured incorrectly last week by a departing co-worker (who had a somewhat unofficial ownership of our CI infrastructure until he left), and did this only surface now because, e.g., a server was restarted or Hudson was upgraded?
I checked a few things. Our Hudson startup.sh looked like this:

/usr/java/latest/bin/java -jar hudson.war --httpPort=9090 --ajp13Port=9091 --deamon --logfile=hudson.log &

Here /usr/java/latest/bin/java pointed to “Java(TM) SE Runtime Environment (build 1.6.0_24-b07)”, while the normal “java” on the command line pointed to “Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_22-b03)”. Could it be that the “latest” directory earlier pointed to Java 5 and (by some mysterious upgrade on the server) now suddenly links to Java 6? Could the Java version we start Hudson with affect the Java version we use for our Hudson jobs? I hope not!

Consequently, I checked the Hudson configuration itself: which JDKs did we configure? Luckily, only one: named “1_5_0” with JAVA_HOME pointing to “/usr/java/default”. Again it was not clear from the path which Java version it points to, until I figured out that this symlink leads to “/usr/java/jdk1.5.0_22”. Good, JDK 5 should be used as the default for all our Maven builds. To verify, I added a second, bogus JDK entry in the Hudson configuration, so that I was able to explicitly choose the JDK in the job configuration, and after triggering a new build I was sure Hudson was configured correctly JDK-wise.

Man, how one can digress! So let’s take a look at the offending library at hand. From the stack trace one can spot the maven-jaxb2-plugin artifact from the group org.jvnet.jaxb2.maven2. Did something happen to this dependency itself? I opened up our Nexus console, searched for it and saw the screen below:

Wait a minute! Why do we have two versions of this plugin? In Nexus I opened the Artifact Information tab of the 0.8.1 version and saw that the uploaded date listed last Saturday, the 28th of January. I know we shouldn’t have more than one version of this plugin in Nexus. I did a small check on the JIRA release notes page of this plugin… well then.
So a new release of the plugin was published online last Saturday, and the nightly build triggered the download of the new version 0.8.1, which now seemed to be compiled against Java 6, and that caused the java.lang.NoClassDefFoundError. This naturally led me to the offending part in our pom.xml:

<groupId>org.jvnet.jaxb2.maven2</groupId>
<artifactId>maven-jaxb2-plugin</artifactId>

Argh. No version! I changed it into:

<groupId>org.jvnet.jaxb2.maven2</groupId>
<artifactId>maven-jaxb2-plugin</artifactId>
<version>0.8.0</version>

I committed the pom.xml and triggered a manual build, and… it worked. Of course.

In conclusion, there are some lessons to be learned, I think:

When debugging a problem, first list the possible explanations that come to mind before following one, and don’t get hung up on the very first idea that pops up. I think this sidetracked me for too long on finding out whether I had accidentally committed Java 6 files or whether Hudson had somehow been misconfigured into not using Java 5 anymore.

Troubleshooting skills depend heavily on previous experience in a field, and of course, if a similar problem ever arises, I’d be able to recognize it faster now.

Write it down, so a colleague or you yourself can find it more easily later on.

Explicitly set dependency versions in your pom.xml!

Don’t blame Maven for everything.

Does anyone else have a troubleshooting tale of going in the wrong direction?

Original posting: http://tedvinke.wordpress.com/2012/02/01/why-does-my-maven-build-fail
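The "no version" mistake above is easy to check for mechanically. The following is only a minimal sketch in Python, not something from the original post: it scans a single pom.xml for `<plugin>` entries without a `<version>` child. The sample POM, the compiler plugin version in it, and the function name `unversioned_plugins` are made up for illustration, and the check deliberately ignores versions inherited from a parent POM or from `<pluginManagement>`, so it can report false positives.

```python
import xml.etree.ElementTree as ET

# Namespace used by Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical POM fragment: the first plugin has no <version>, the second does.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <build>
    <plugins>
      <plugin>
        <groupId>org.jvnet.jaxb2.maven2</groupId>
        <artifactId>maven-jaxb2-plugin</artifactId>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.3.2</version>
      </plugin>
    </plugins>
  </build>
</project>"""


def unversioned_plugins(pom_text):
    """Return 'groupId:artifactId' for every <plugin> lacking a <version> child."""
    root = ET.fromstring(pom_text)
    missing = []
    for plugin in root.iterfind(".//m:plugin", POM_NS):
        if plugin.find("m:version", POM_NS) is None:
            # Maven falls back to this groupId when none is given.
            group = plugin.findtext(
                "m:groupId", default="org.apache.maven.plugins", namespaces=POM_NS
            )
            artifact = plugin.findtext("m:artifactId", default="?", namespaces=POM_NS)
            missing.append(f"{group}:{artifact}")
    return missing


if __name__ == "__main__":
    for coordinate in unversioned_plugins(SAMPLE_POM):
        print("plugin without explicit version:", coordinate)
```

Run against a real project, a check like this could be part of the nightly build itself, so that an unpinned plugin is reported before a new upstream release changes the build.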

February 4, 2012

by Ted Vinke

· 16,134 Views

Marshalling / Unmarshalling Java Objects: Serialization vs Externalization

We all know the Java platform allows us to create reusable objects in memory. However, all of those objects exist only as long as the Java virtual machine remains running. It would be nice if the objects we create could exist beyond the lifetime of the virtual machine. Well, with object serialization, you can flatten your objects and reuse them in powerful ways. Object serialization is the process of saving an object’s state to a sequence of bytes, as well as the process of rebuilding those bytes into a live object at some future time. The Java Serialization API provides a standard mechanism for developers to handle object serialization. The API is small and easy to use, provided the classes and methods are understood.

By implementing java.io.Serializable, you get “automatic” serialization capability for objects of your class. There is no need to implement any other logic; it’ll just work. The Java runtime will use reflection to figure out how to marshal and unmarshal your objects.

In earlier versions of Java, reflection was very slow, and so serializing large object graphs (e.g. in client-server RMI applications) was a bit of a performance problem. To handle this situation, the java.io.Externalizable interface was provided, which is like java.io.Serializable but with custom-written mechanisms to perform the marshalling and unmarshalling functions (you need to implement the readExternal and writeExternal methods on your class). This gives you the means to get around the reflection performance bottleneck.

In recent versions of Java (1.3 onwards, certainly) the performance of reflection is vastly better than it used to be, and so this is much less of a problem. I suspect you’d be hard-pressed to get a meaningful benefit from Externalizable with a modern JVM. Also, the built-in Java serialization mechanism isn’t the only one; you can get third-party replacements, such as JBoss Serialization, which is considerably quicker, and is a drop-in replacement for the default.
A big downside of Externalizable is that you have to maintain this logic yourself – if you add, remove or change a field in your class, you have to change your writeExternal/readExternal methods to account for it. In summary, Externalizable is a relic of the Java 1.1 days. There’s really no need for it any more. References http://java.sun.com/developer/technicalArticles/Programming/serialization http://docs.oracle.com/javase/6/docs/api/java/io/Serializable.html http://docs.oracle.com/javase/6/docs/api/java/io/Externalizable.html From http://singztechmusings.in/marshalling-unmarshalling-java-objects-serialization-vs-externalization/

February 1, 2012

by Singaram Subramanian

· 45,257 Views · 1 Like

Access Server Side Variable In Javascript

Add the following JavaScript function to the .aspx page:

function check() {
    alert("this is check value " + '<%= checkvalue %>');
}

Add the following variable declaration on the server side, i.e. in the .aspx.cs code-behind file:

public string checkvalue = "indrnilhafa";

January 31, 2012

by Snippets Manager

· 7,662 Views

Gentle introduction to WADL (in Java)

WADL (Web Application Description Language) is to REST what WSDL is to SOAP. The mere existence of this language causes a lot of controversy (see: Do we need WADL? and To WADL or not to WADL). I can think of few legitimate use cases for using WADL, but if you are here already, you are probably not seeking for yet another discussion. So let us move forward to the WADL itself. In principle WADL is similar to WSDL, but the structure of the language is much different. Whilst WSDL defines a flat list of messages and operations either consuming or producing some of them, WADL emphasizes the hierarchical nature of RESTful web services. In REST, the primary artifact is the resource. Each resource (noun) is represented as an URI. Every resource can define both CRUD operations (verbs, implemented as HTTP methods) and nested resources. The nested resource has a strong relationship with a parent resource, typically representing an ownership. A simple example would be http://example.com/api/books resource representing a list of books. You can (HTTP) GET this resource, meaning to retrieve the whole list. You can also GET the http://example.com/api/books/7 resource, fetching the details of 7th book inside books resource. Or you can even PUT new version or DELETE the resource altogether using the same URI. You are not limited to a single level of nesting: GETting http://example.com/api/books/7/reviews?page=2&size=10 will retrieve the second page (up to 10 items) of reviews of 7th book. Obviously you can also place other resources next to books, like http://example.com/api/readers The requirement arose to formally and precisely describe every available resource, method, request and response, just like WSDL guys were able to do. WADL is one of the options to describe “available URIs", although some believe that well-written REST service should be self-descriptive (see HATEOAS). Nevertheless here is a simple, empty WADL document: Nothing fancy here. 
Note that the <resources> tag defines the base API address. All named resources, which we are just about to add, are relative to this address. You can also define several <resources> tags to describe more than one API.

So, let's add a simple resource:

This defines a resource under http://example.com/api/books with two possible methods: GET to retrieve the whole list and POST to create (add) a new item. Depending on your requirements you might want to allow the DELETE method as well (to delete all items), and it is the responsibility of WADL to document what is allowed. Remember our example at the beginning: /books/7? Obviously 7 is just an example and we won't declare every possible book id in WADL. Instead there is a handy placeholder syntax:

There are two important aspects you should note. First, the {bookId} place-holder was used as the path of the nested resource. Secondly, to make it clear, we are documenting this place-holder using the <param> tag. We will see soon how it can be used in combination with methods. Just to make sure you are still with me, the document above describes the GET /books and GET /books/some_id resources.

The web service description is getting complex, however it describes quite a lot of operations. First of all, GET /books/42/reviews is a valid operation. But the interesting part is the nested <request> tag. As you can see, we can describe the parameters of each method independently. In our case optional query parameters (as opposed to the template parameters used previously for URI place-holders) were defined. This gives the client additional knowledge about the acceptable page and size query parameters. This means that /books/7/reviews?page=2&size=10 is a valid resource identifier. And did I mention that every resource, method and parameter can have documentation attached as per the WADL specification?

We will stop here and only mention the remaining pieces of WADL. First of all, as you have probably guessed so far, there is also a <response> child tag possible for each <method>. Both request and response can define exact grammar (e.g. 
in XML Schema) that either the request or the response must follow. The response can also document possible HTTP response codes. But since we will be using the knowledge you have gained so far in a code-first application, I intentionally left the definition. WADL is agile and it allows you to define as little (or as much) information as you need. So we know the basics of WADL, now we would like to use it, maybe as a consumer or as a producer in a Java-based application. Fortunately there is a wadl.xsd XML Schema description of the language itself, which we can use to generate JAXB-annotated POJOs to work with (using xjc tool in the JDK): $ wget http://www.w3.org/Submission/wadl/wadl.xsd $ xjc wadl.xsd And there it... hangs! The life of a software developer is full of challenges and non-trivial problems. And sometimes it is just an annoying network filter that makes suspicious packets (together with half hour of your life) disappear. It is not hard to spot the problem, once you recall that article written around 2008: W3C’s Excessive DTD Traffic: Accessing xml.xsd from the browser returns an HTML page instantly, but xjc tool waits forever. Downloading this file locally and correcting the schemaLocation attribute in wadl.xsd helped. It's always the little things... $ xjc wadl.xsd parsing a schema... compiling a schema... 
net/java/dev/wadl/_2009/_02/Application.java net/java/dev/wadl/_2009/_02/Doc.java net/java/dev/wadl/_2009/_02/Grammars.java net/java/dev/wadl/_2009/_02/HTTPMethods.java net/java/dev/wadl/_2009/_02/Include.java net/java/dev/wadl/_2009/_02/Link.java net/java/dev/wadl/_2009/_02/Method.java net/java/dev/wadl/_2009/_02/ObjectFactory.java net/java/dev/wadl/_2009/_02/Option.java net/java/dev/wadl/_2009/_02/Param.java net/java/dev/wadl/_2009/_02/ParamStyle.java net/java/dev/wadl/_2009/_02/Representation.java net/java/dev/wadl/_2009/_02/Request.java net/java/dev/wadl/_2009/_02/Resource.java net/java/dev/wadl/_2009/_02/ResourceType.java net/java/dev/wadl/_2009/_02/Resources.java net/java/dev/wadl/_2009/_02/Response.java net/java/dev/wadl/_2009/_02/package-info.java Since we'll be using these classes in a maven based project (and I hate committing generated classes to source repository), let's move xjc execution to maven lifecycle: org.codehaus.mojo jaxb2-maven-plugin 1.3 net.java.dev.jaxb2-commons jaxb-fluent-api 2.0.1 com.sun.xml jaxb-xjc xjc -Xfluent-api bindings.xjb net.java.dev.wadl Well, pom.xml isn't the most concise format ever... Never mind, this will generate WADL XML classes during every build, before the source code is compiled. I also love the fluent-api plugin that adds with*() methods along with ordinary setters, returning this to allow chaining. Pretty convenient. Finally we define more pleasant package name for generated artifacts (if you find net.java.dev.wadl._2009._02 package name pleasant enough, you can skip this step) and add Wadl prefix to all generated classes bindings.xjb file: We are now ready to produce and consume WADL in XML format using JAXB and POJO classes. Equipped with that knowledge and the foundation we are ready to develop some interesting library – which will be the subject of the next article. From http://nurkiewicz.blogspot.com/2012/01/gentle-introduction-to-wadl-in-java.html
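The XML listings of the article did not survive in this copy, so here is a rough sketch of the document structure it describes. It is written in Python with the standard library rather than with the JAXB classes generated above, purely for illustration: it builds a WADL tree for /books, /books/{bookId} and /books/{bookId}/reviews with the optional page and size query parameters. The helper names `q`, `build_books_wadl` and `resource_paths` are assumptions of this sketch; the element and attribute names follow the WADL 2009/02 schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the WADL 2009/02 schema (the one wadl.xsd declares).
WADL_NS = "http://wadl.dev.java.net/2009/02"


def q(tag):
    """Qualify a tag name with the WADL namespace (ElementTree notation)."""
    return f"{{{WADL_NS}}}{tag}"


def build_books_wadl(base="http://example.com/api"):
    """Build the books example: nested resources, a template and two query params."""
    app = ET.Element(q("application"))
    resources = ET.SubElement(app, q("resources"), base=base)

    books = ET.SubElement(resources, q("resource"), path="books")
    ET.SubElement(books, q("method"), name="GET")
    ET.SubElement(books, q("method"), name="POST")

    # {bookId} is a URI template placeholder, documented by a template param.
    book = ET.SubElement(books, q("resource"), path="{bookId}")
    ET.SubElement(book, q("param"), name="bookId", style="template", required="true")
    ET.SubElement(book, q("method"), name="GET")

    # Optional query parameters belong to the request of a specific method.
    reviews = ET.SubElement(book, q("resource"), path="reviews")
    get_reviews = ET.SubElement(reviews, q("method"), name="GET")
    request = ET.SubElement(get_reviews, q("request"))
    for name in ("page", "size"):
        ET.SubElement(request, q("param"), name=name, style="query", required="false")
    return app


def resource_paths(element, prefix=""):
    """Walk nested <resource> elements and return their full relative paths."""
    paths = []
    for res in element.findall(q("resource")):
        path = f"{prefix}/{res.get('path')}"
        paths.append(path)
        paths.extend(resource_paths(res, path))
    return paths


if __name__ == "__main__":
    wadl = build_books_wadl()
    print(ET.tostring(wadl, encoding="unicode"))
    print(resource_paths(wadl.find(q("resources"))))
```

The recursion in `resource_paths` mirrors the hierarchical nature of WADL that the article contrasts with the flat operation list of WSDL.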

January 31, 2012

by Tomasz Nurkiewicz

· 29,809 Views

Algorithm of the Week: Data Compression with Relative Encoding

Overview

Relative encoding is another data compression algorithm. While run-length encoding, bitmap encoding and diagram and pattern substitution were trying to reduce repeating data, with relative encoding the goal is a bit different. Indeed run-length encoding was searching for long runs of repeating elements, while pattern substitution and bitmap encoding were trying to “map” where the repetitions happen to occur. The only problem with these algorithms is that the input stream of data is not always constructed out of repeating elements. It is clear that if the input stream contains many repeating elements there must be some way of reducing them. However that doesn’t mean that we cannot compress data if there are no repetitions. It all depends on the data. Let’s say we have the following stream to compress.

1, 2, 3, 4, 5, 6, 7

It's hard to imagine how this stream of data can be compressed. The same problem may occur when trying to compress the alphabet. Indeed the letters of the alphabet are the very base of words, so they are the minimal part for word construction and therefore hard to compress. Fortunately this isn’t always true. An algorithm that tries to deal with non-repeating data is relative encoding. Let’s see the following input stream: years from a given decade (the 90s).

1991, 1991, 1999, 1998, 1991, 1993, 1992, 1992

Here we have 39 characters (not counting spaces) and we can reduce them. A natural approach is to remove the leading “19” as we humans often do.

91, 91, 99, 98, 91, 93, 92, 92

Now we have a shorter string, but we can go even further by keeping only the first year. All other years will be stored relative to this year.

91, 0, 8, 7, 0, 2, 1, 1

Now the volume of transferred data is reduced a lot (from 39 to 16 characters, more than 50%). However there are some questions we need to answer first, because the stream won’t always be formatted in such a pretty way. How about the next character stream? 
91, 94, 95, 95, 98, 100, 101, 102, 105, 110

We see that the value 100 is somehow in the middle of the interval and it is handy to use it as a base value for the relative encoding. Thus the stream above will become:

-9, -6, -5, -5, -2, 100, 1, 2, 5, 10

The problem is that we can’t always decide which value will be the base value so easily. What if the data was dispersed in a different way:

96, 97, 98, 99, 100, 101, 102, 103, 999, 1000, 1001, 1002

Now the value of “100” isn’t useful, because compressing the stream will get something like this:

-4, -3, -2, -1, 100, 1, 2, 3, 899, 900, 901, 902

To group the relative values around “some” base values will be far more handy.

(-4, -3, -2, -1, 100, 1, 2, 3) (-1, 1000, 1, 2)

However, to decide which value will be the base value isn’t that easy. Also the encoding format is not so trivial. On the other hand, this type of encoding can be useful in some specific cases as we can see below.

Implementation

The implementation of this algorithm depends on the specific task and the format of the data stream. Assuming that we have to transfer the stream of years in JSON from a web server to a browser, here’s a short PHP snippet.

// JSON: [1991,1991,1999,1998,1999,1998,1995,1997,1994,1993]
$years = array(1991,1991,1999,1998,1999,1998,1995,1997,1994,1993);

function relative_encoding($input)
{
    $output = array();
    $inputLength = count($input);
    $base = $input[0];
    $output[] = $base;
    for ($i = 1; $i < $inputLength; $i++) {
        $output[] = $input[$i] - $base;
    }
    return $output;
}

// JSON: [1991,0,8,7,8,7,4,6,3,2]
echo json_encode(relative_encoding($years));

Application

This algorithm may be very useful in many cases, such as this one: there are plenty of map applications around the web. Some products such as Google Maps, Yahoo! Maps, Bing Maps are quite famous, while there are also very useful open source projects like OpenStreetMap. The web sites using these apps number in the thousands. 
A typical use case is to transfer lots of Geo coordinates from a web server to a browser using JSON. Indeed any GEO point on Earth is relative to the point (0,0), which is located near the west coast of Africa, however on large zoom levels, when there are tons of markers we can transfer the information with relative encoding. For instance the following diagram shows San Francisco with some markers on it. The coordinates are relative to the point (0,0) on Earth. Map markers can be relative to the (0, 0) point on Earth, which can occasionally be useless. Far more useful may be to encode those markers, relative to the center of the city, thus we can save some space. Relative encoding can be useful for map markers on a large zoom level, however this type of compression can be tricky. For example, when dragging the map and updating the marker array. On the other hand, we must group markers if we have to load more than one city. That’s why we must be careful when implementing it. But it can be very useful – for instance on initial load of the map we can reduce data and speed up the load time. The thing is that with relative encoding we can save only changes to base value (data) – something like version control systems and thus reducing data transfer and load. Here’s a graphical example. In the first case on the diagram below we can see that each item is stored on its own. It doesn’t depend on the adjacent items and it can be completely independent of them. However we can keep full info only for the first item and any other item will be relative to it, like on the diagram bellow. Source: http://www.stoimen.com/blog/2012/01/30/computer-algorithms-data-compression-with-relative-encoding/
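The article only shows the encoding direction, so here is a small Python sketch of the full round trip, plus one possible answer to the "which base value?" question. It is an illustration under stated assumptions, not the author's implementation: `split_into_groups` simply starts a new group when two neighbouring values are further apart than a chosen `max_gap`, and each group uses its first element as the base (the article picks 100 and 1000 instead). The marker coordinates are hypothetical and stored as integer microdegrees to keep the arithmetic exact.

```python
def relative_encode(values):
    """Keep the first value as the base; store every later value as an offset."""
    if not values:
        return []
    base = values[0]
    return [base] + [v - base for v in values[1:]]


def relative_decode(encoded):
    """Inverse of relative_encode: add the base back to every offset."""
    if not encoded:
        return []
    base = encoded[0]
    return [base] + [base + offset for offset in encoded[1:]]


def split_into_groups(values, max_gap):
    """Start a new group whenever two neighbours differ by more than max_gap."""
    groups = []
    for v in values:
        if groups and abs(v - groups[-1][-1]) <= max_gap:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


# The year stream from the PHP snippet.
years = [1991, 1991, 1999, 1998, 1999, 1998, 1995, 1997, 1994, 1993]
encoded_years = relative_encode(years)

# The dispersed stream: one base per cluster instead of a single base.
stream = [96, 97, 98, 99, 100, 101, 102, 103, 999, 1000, 1001, 1002]
encoded_groups = [relative_encode(g) for g in split_into_groups(stream, max_gap=50)]

# Hypothetical map markers, relative to a hypothetical city centre (microdegrees).
center = (37774900, -122419400)
markers = [(37775000, -122419000), (37774000, -122420400)]
marker_offsets = [(lat - center[0], lng - center[1]) for lat, lng in markers]

if __name__ == "__main__":
    print(encoded_years)
    print(encoded_groups)
    print(marker_offsets)
```

The gap threshold is the weak point of this sketch: it has to be chosen per data set, which is the same difficulty the article points out when it says that deciding on the base value isn't easy.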

January 31, 2012

by Stoimen Popov

· 17,714 Views

Mapping Mongodb ISODate to Spring Roo Entity

I have been inserting log4j entries into a mongodb database and each entry has been given an ISODate timestamp: "timestamp" : ISODate("2012-01-17T22:30:19.839Z") To create a mapping for this, I had to manually add the timestamp as Spring Roo did not allow timestamp to be used as it was a reserved word. So I manually added: @DateTimeFormat(style="MM/dd/yyyy") private java.util.Date timestamp; But I started getting the following error: Invalid style specification: MM/dd/yyyy The stack trace for that error was: org.joda.time.format.DateTimeFormat.createFormatterForStyle(DateTimeFormat.java:702) org.joda.time.format.DateTimeFormat.patternForStyle(DateTimeFormat.java:212) com.comcast.uivr.web.LoggingController_Roo_Controller.ajc$interMethod$com_comcast_uivr_web_LoggingController_Roo_Controller$com_comcast_uivr_web_LoggingController$addDateTimeFormatPatterns(LoggingController_Roo_Controller.aj:98) com.comcast.uivr.web.LoggingController.ajc$interMethodDispatch2$com_comcast_uivr_web$addDateTimeFormatPatterns(LoggingController.java:1) com.comcast.uivr.web.LoggingController_Roo_Controller.ajc$interMethodDispatch1$com_comcast_uivr_web_LoggingController_Roo_Controller$com_comcast_uivr_web_LoggingController$addDateTimeFormatPatterns(LoggingController_Roo_Controller.aj) com.comcast.uivr.web.LoggingController_Roo_Controller.ajc$interMethod$com_comcast_uivr_web_LoggingController_Roo_Controller$com_comcast_uivr_web_LoggingController$list(LoggingController_Roo_Controller.aj:66) com.comcast.uivr.web.LoggingController.list(LoggingController.java:1) sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) java.lang.reflect.Method.invoke(Method.java:597) org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:212) 
org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:126) org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96) org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617) org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:578) org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80) org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:900) org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:827) org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882) org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:778) javax.servlet.http.HttpServlet.service(HttpServlet.java:617) javax.servlet.http.HttpServlet.service(HttpServlet.java:717) org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:77) org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76) org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:88) org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76) org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857) org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) java.lang.Thread.run(Thread.java:662)

To fix this, I attempted to use the ISO date format in the @DateTimeFormat annotation:

@DateTimeFormat(style="yyyyMMdd'T'HHmmss.SSSZ") private java.util.Date timestamp;

This still did not work and produced the same error. To resolve this, I switched from the style attribute to the iso attribute with ISO.DATE_TIME:

@DateTimeFormat(iso=ISO.DATE_TIME) private java.util.Date timestamp;

From http://www.baselogic.com/blog/development/springframework/mapping-mongodb-isodate-spring-roo-entity/

January 30, 2012

by Mick Knutson

· 23,918 Views · 2 Likes

JavaScript to Convert Date to MM/DD/YYYY Format

In this post, you'll find a quick, 7-line code block of JavaScript that you can use to convert dates to the MM/DD/YYYY format.

January 27, 2012

by Snippets Manager

· 479,656 Views · 8 Likes

Visualize Maven Project Dependencies with dependency:tree and Dot Diagram Output

The dependency:tree goal of the Maven dependency plugin supports various graphical output formats from version 2.4 onwards. This is how you would create a diagram showing all dependencies in the com.example group in the dot format:

mvn dependency:tree -Dincludes=com.example -DappendOutput=true -DoutputType=dot -DoutputFile=/path/to/output.dot

To actually produce an image from the .dot file you can use one of the .dot renderers, for example this online dot renderer (paste into the right text box, press enter). You could also generate the output in, for example, the graphml format and visualize it in Eclipse.

From http://theholyjava.wordpress.com/2012/01/13/visualize-maven-project-dependencies-with-dependencytree-and-dot-diagram-output/
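If no renderer is at hand, the edges in the generated file can also be inspected with a few lines of script. The sketch below is an assumption-laden illustration in Python: `SAMPLE_DOT` is a made-up file in the general shape of the plugin's dot output (quoted Maven coordinates joined by `->`), and `parse_edges` is a hypothetical helper that only understands that simple shape, not the full dot grammar.

```python
import re
from collections import defaultdict

# Hypothetical sample; real output comes from the file passed to -DoutputFile.
SAMPLE_DOT = '''digraph "com.example:app:jar:1.0" {
    "com.example:app:jar:1.0" -> "com.example:core:jar:1.0:compile" ;
    "com.example:app:jar:1.0" -> "com.example:web:jar:1.0:compile" ;
    "com.example:web:jar:1.0:compile" -> "com.example:core:jar:1.0:compile" ;
}'''

# One edge per match: "parent" -> "child"
EDGE = re.compile(r'"([^"]+)"\s*->\s*"([^"]+)"')


def parse_edges(dot_text):
    """Return {parent: [children, ...]} for every quoted edge in the text."""
    graph = defaultdict(list)
    for parent, child in EDGE.findall(dot_text):
        graph[parent].append(child)
    return dict(graph)


if __name__ == "__main__":
    for parent, children in parse_edges(SAMPLE_DOT).items():
        print(parent)
        for child in children:
            print("   ->", child)
```

Because `-DappendOutput=true` appends to an existing file, a multi-module build can leave several `digraph` blocks in one file; this regex-based reader simply merges their edges.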

January 25, 2012

by Jakub Holý

· 31,628 Views

Algorithm of the Week: Data Compression with Diagram Encoding and Pattern Substitution

Two variants of run-length encoding are the diagram encoding and the pattern substitution algorithms. The diagram encoding is actually a very simple algorithm. Unlike run-length encoding, where the input stream must consist of many repeating elements, “aaaaaaaa” for instance, which are very rare in a natural language, there are many so-called “diagrams” in almost any natural language. In plain English there are some diagrams such as “the”, “and”, “ing” (in the word “waiting” for example), “ a”, “ t”, “ e” and many doubled letters. Actually we can extend those diagrams by adding surrounding spaces. Thus we can encode not only “the”, but “ the “, which is 5 characters (2 spaces and 3 letters), with something shorter. On the other hand, as I said, in plain English there are many doubled letters, which unfortunately aren’t anything special for run-length encoding, and the compression ratio will be small. Even worse, the encoded text may happen to be longer than the input message. Let’s see some examples.

Let’s say we have to encode the message “successfully accomplished”, which contains four doubled letters. However, to compress it with run-length encoding we’ll need at least 8 characters, which doesn’t help us a lot.

// 8 chars replaced by 8 chars!?
input: "successfully accomplished"
output: "su2ce2sfu2ly a2complished"

The problem is that if the input text contains numbers, “2” in particular, we have to choose an escape symbol (“@” for example), which we’ll use to mark where the encoded run begins. Thus if the input message is “2 successfully accomplished tasks”, it will be encoded as “2 su@2ce@2sfu@2ly a@2complished tasks”. Now the output message is longer than the input string.

// the compressed message is longer!!!
input: "2 successfully accomplished tasks"
output: "2 su@2ce@2sfu@2ly a@2complished tasks"

Again, if the input stream contains the escape symbol, we have to find another one, and the problem is that it is often too difficult to find a short escape symbol that doesn’t appear in the input text without a full scan of the text. That is why run-length encoding isn’t a good solution when compressing plain text, where long runs rarely appear. Well, of course, there are exceptions. One such exception is lossy text compression with run-length encoding. It is intuitively clear that compressing text with loss is rarely useful, especially when you have to decompress exactly the same text. However, there are some cases where lossy compression may be useful. One such case can be removing spaces. Indeed the text “successfully      accomplished” brings us exactly the same information as “successfully accomplished”. In this case we can simply remove those spaces. Indeed we can use a marker to indicate the long run of spaces, like “successfully@6 accomplished”, in order to decompress the input string with absolutely no loss, but we can also throw those symbols away. This decision depends on the goal. With exactly the same goal in mind we can remove new lines and tabs, but only if we’re sure that the sense of the text is preserved. Yet again, a problem is that such long runs don’t happen to occur in random texts. That is why it’s better to use diagram encoding for plain text compression instead of run-length encoding.

A Few Questions

After understanding the principles of the diagram encoding, let’s see some examples. In the example above it is better to replace doubled letters with something shorter. Let’s say # for “cc”, @ for “ss” and % for “ll”. Thus the input text will be compressed as “su#e@fu%y a#omplished”, which is shorter. But yet again, what will happen if the input message contains one of the substitutions? 
Also, we can’t say whether there are many doubled letters and enough reasonable substitutions for them. A better approach is to replace patterns.

Run-length encoding isn't a good approach for text compression, because long runs rarely appear in a natural language.

Pattern Substitution

The pattern substitution algorithm is a variant of diagram encoding. As I said above, in plain English a very commonly used pattern is “ the ”, which is five characters long. We can now replace it with something like “$%”, for example. In this case the message “I send the message” will become “I send$%message”.

However, there are some obstacles to overcome. The first problem is that we need to know the language and somehow define the commonly used patterns in a dictionary. What would happen with a message written in a language we know nothing about? Let’s say Latin, like the example below.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras venenatis, sapien eget suscipit placerat, justo quam blandit mauris, quis tempor ante sapien sodales augue. Praesent ut mauris quam. Phasellus scelerisque, ante quis consequat tristique, metus turpis consectetur leo, vitae facilisis sapien mi eu sapien. Praesent vitae ligula elit, et faucibus augue. Sed rhoncus sodales dolor ut gravida. In quis augue ac nulla auctor mattis sed sed libero. Donec eget purus eget enim tempor porta vitae eget diam. Mauris aliquet malesuada ipsum, non pulvinar urna vestibulum ac. Donec feugiat velit vitae nunc cursus imperdiet. Donec accumsan faucibus dictum. Phasellus sed mauris sapien. Maecenas mi metus, tincidunt sed rhoncus nec, sodales non sapien.

Clearly, without knowing Latin it isn’t easy to determine which patterns are commonly used. The point is that pattern substitution works best if you know the set of words and characters in advance.

The second problem is related to decompression.
It is obvious that we need to define a dictionary, and the same dictionary must be used when decoding the message. It would also be great to find more patterns longer than three characters. If not, the compression ratio will be low. Unfortunately, such patterns aren’t very common in any natural language.

Diagram encoding and pattern substitution are far more suitable for text compression than run-length encoding. In fact, pattern substitution is very effective at compressing programming languages.

Application

It is interesting to ask how to use diagram encoding or pattern substitution to compress text in a natural language, especially when we don’t know the language in detail. The answer hides in the question. We won’t compress natural languages, but machine languages. Machine (programming) languages are limited to a smaller set of words and symbols. Isn’t that true for any programming language? Take PHP, where words like “function”, “while”, “for”, “break”, “switch” and “foreach” are used very often, or HTML with its defined set of tags. Perhaps the best example is CSS, where only the values of the properties can vary. CSS files also tend to have many new lines, tabs and spaces, which only humans read.

The question here is why we should compress those file types. It’s clear that after the compression they will be completely useless, both for humans and machines. Yes, that is true, but what if we have to store versions of those files in a database, as a kind of backup? Imagine you’re working for a web hosting company that has to store daily versions of the sites it’s hosting. The volume of stored information, even for small companies hosting only a few sites, can be enormous. The problem is that compressing those files with some conventional compression tool isn’t a good idea. We would have to save a copy of the entire site every day, but as we know the difference between daily versions of a site can be small.
A version control system is another solution, but then you have to store the plain text of the files. Perhaps a better approach is to compress the text using pattern substitution and then save only the differences, a kind of version control that can be done with “relative encoding”. Using the above method we can save lots of disk space and at the same time compress and decompress easily. Another good thing is that you can save only the changes to the initial files, as in version control, and those changes can also be compressed.

Implementation

The implementation of this algorithm is again in PHP and only tries to describe the main principles of the compression. In this case I tried to compress a CSS file using the approach above. Although this example is quite primitive, we can see some interesting facts. First of all, you only need encoding and decoding dictionaries. Practically, the encoding and decoding processes are the same, so you don’t need to implement two different functions. This example uses a native PHP function, str_replace, because the purpose here is not to describe string replacement techniques, but pattern substitution itself. It assumes that today’s programming languages have string manipulation functions suitable for this task.

    $str = file_get_contents('large_style_file.css');

    // pattern => replacement; the last entry drops spaces entirely
    $encoding_dict = array(
        "\n" => '$0',
        'text' => '$1',
        'color' => '$2',
        'display' => '$3',
        'font' => '$4',
        'width' => '$5',
        'height' => '$6',
        ' ' => '',
    );

    // applies every dictionary entry to the input, one pattern after another
    function replace_patterns($input, $dict) {
        foreach ($dict as $pattern => $replace) {
            $input = str_replace($pattern, $replace, $input);
        }
        return $input;
    }

    $result = replace_patterns($str, $encoding_dict);

By replacing only a few CSS properties I reduced the file size by about 35% (as shown in the diagram below): the initial file is 202 KB, while compressed it’s only 131 KB. Of course, it all depends on the CSS file, but how about replacing all property names with shorter ones?
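The claim that encoding and decoding are the same operation with the dictionary turned around can be illustrated with a small Java sketch. It is not a port of the PHP snippet: the class and method names are hypothetical, the dictionary keeps only the property-name entries (the whitespace entries are left out so that the round trip is lossless), and it assumes the input contains no “$” character.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PatternSubstitution {

    // Applies every dictionary entry in order; used for encoding and decoding alike.
    static String replacePatterns(String input, Map<String, String> dict) {
        String result = input;
        for (Map.Entry<String, String> entry : dict.entrySet()) {
            // String.replace treats both arguments literally, so "$2" is safe here
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    // Swaps keys and values to obtain the decoding dictionary.
    static Map<String, String> invert(Map<String, String> dict) {
        Map<String, String> inverse = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : dict.entrySet()) {
            inverse.put(entry.getValue(), entry.getKey());
        }
        return inverse;
    }

    // Reversible subset of the article's dictionary (assumption: no "$" in the input).
    static Map<String, String> cssDictionary() {
        Map<String, String> dict = new LinkedHashMap<>();
        dict.put("text", "$1");
        dict.put("color", "$2");
        dict.put("display", "$3");
        dict.put("font", "$4");
        dict.put("width", "$5");
        dict.put("height", "$6");
        return dict;
    }

    public static void main(String[] args) {
        String css = "h1 {\n  color: red;\n  font-size: 12px;\n  width: 100px;\n}\n";
        Map<String, String> dict = cssDictionary();

        String encoded = replacePatterns(css, dict);
        String decoded = replacePatterns(encoded, invert(dict));

        System.out.println(encoded);
        System.out.println("round trip ok: " + css.equals(decoded));
    }
}
```

If the input may contain “$”, the codes would need an escaping scheme or a different marker, which is the same escape-symbol problem discussed for run-length encoding.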
Perhaps then the compression will be even better. Source: http://www.stoimen.com/blog/2012/01/23/computer-algorithms-data-compression-with-diagram-encoding-and-pattern-substitution/

January 24, 2012

by Stoimen Popov

· 23,820 Views

Modular Java Apps - A Microkernel Approach

Software Engineering is all about reuse. We programmers therefore love to split applications up into smaller components so that each of them can be reused or extended independently. A keyword here is "loose coupling". Slightly simplified, this means each component should have as few dependencies on other components as possible. Most importantly, if I have a component B which relies on component A, I don't want A to need to know about B. Component A should just provide a clean interface which can be used and extended by B.

In Java there are many frameworks which provide this exact functionality: JavaEE, Spring, OSGi. However, each of those frameworks comes with its own way of doing things and provides lots and lots of additional functionality, whether you want it or not! Since we here at scireum love modularity (we build 4 products out of a set of about 10 independent modules) we built our own little framework. I factored out the most important parts and now have a single class with fewer than 250 lines of code and comments!

I call this a microkernel approach, since it nicely compares to the situation we have with operating systems: there are monolithic kernels like the one of Linux with about 11,430,712 lines of code. And there is a concept called a microkernel, like the one of Minix with about 6,000 lines of executable kernel code. There is still an ongoing discussion about which of the two solutions is better. A monolithic kernel is faster; a microkernel has far less critical code (critical code means: a bug there will crash the complete system). If you haven't already, you should read more about microkernels on Wikipedia.

Whatever one might think about operating systems, when it comes to Java I prefer fewer dependencies and, if possible, no black magic I don't understand, especially if this magic involves complex ClassLoader structures. Therefore, here comes Nucleus...

How does this work?
The framework (Nucleus) solves two problems of modular applications: (1) I want to provide a service to other components, but I only want to show an interface, and they should be provided with my implementation at runtime without knowing (referencing) it. (2) I want to provide a service or callback for other components: I provide an interface, and I want to know all classes implementing it, so I can invoke them.

Ok, we probably need examples for this. Say we want to implement a simple timer service. It provides an interface:

    public interface EveryMinute {
        void runTimer() throws Exception;
    }

All classes implementing this interface should be invoked every minute. Additionally we provide some information, namely when the timer was last executed.

    public interface TimerInfo {
        String getLastOneMinuteExecution();
    }

Ok, next we need a client for our services:

    @Register(classes = EveryMinute.class)
    public class ExampleNucleus implements EveryMinute {

        private static Part<TimerInfo> timerInfo = Part.of(TimerInfo.class);

        public static void main(String[] args) throws Exception {
            Nucleus.init();
            while (true) {
                Thread.sleep(10000);
                System.out.println("Last invocation: "
                        + timerInfo.get().getLastOneMinuteExecution());
            }
        }

        @Override
        public void runTimer() throws Exception {
            System.out.println("The time is: "
                    + DateFormat.getTimeInstance().format(new Date()));
        }
    }

The static field timerInfo is of type Part, a simple helper class which fetches the registered instance from Nucleus on the first call and stores it in a private field. So accessing this part has almost no overhead compared to a normal field access, yet we only reference an interface, not an implementation. The main method first initializes Nucleus (this performs the classpath scan etc.) and then simply goes into an infinite loop, printing the last execution of our timer every ten seconds.
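The "fetch on first call, then cache" behaviour described for Part can be sketched as follows. This is not the Nucleus source: the REGISTRY map and the lookup method are hypothetical stand-ins for the Nucleus registry, and the lookup counter exists only to make the caching visible. The sketch is also not thread-safe, which the real helper may or may not be.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class LazyPartSketch {

    /** Service interface, as in the article. */
    interface TimerInfo {
        String getLastOneMinuteExecution();
    }

    // Hypothetical stand-in for the Nucleus registry: interface -> implementation.
    static final Map<Class<?>, Object> REGISTRY = new ConcurrentHashMap<>();

    // Counts registry lookups so the effect of caching can be observed.
    static int lookups = 0;

    static <T> T lookup(Class<T> type) {
        lookups++;
        return type.cast(REGISTRY.get(type));
    }

    /** Holder that resolves its implementation lazily and remembers it. */
    static final class Part<T> {
        private final Class<T> type;
        private T cached;

        private Part(Class<T> type) {
            this.type = type;
        }

        static <T> Part<T> of(Class<T> type) {
            return new Part<>(type);
        }

        T get() {
            if (cached == null) {
                // first access: ask the registry, later accesses reuse the field
                cached = lookup(type);
            }
            return cached;
        }
    }

    public static void main(String[] args) {
        REGISTRY.put(TimerInfo.class, (TimerInfo) () -> "-");
        Part<TimerInfo> timerInfo = Part.of(TimerInfo.class);
        System.out.println(timerInfo.get().getLastOneMinuteExecution());
        System.out.println(timerInfo.get().getLastOneMinuteExecution());
        System.out.println("registry lookups: " + lookups);
    }
}
```

The caller only ever names the TimerInfo interface; which implementation sits behind it is decided by whatever was put into the registry.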
Since our class wears a @Register annotation, it will be discovered by a special ClassLoadAction (not by Nucleus itself), instantiated and registered for the EveryMinute interface. Its method runTimer will then be invoked by our timer service every minute.

Ok, but what would our TimerService look like?

    @Register(classes = { TimerInfo.class })
    public class TimerService implements TimerInfo {

        @InjectList(EveryMinute.class)
        private List<EveryMinute> everyMinute;

        private long lastOneMinuteExecution = 0;
        private Timer timer;

        public TimerService() {
            start();
        }

        public void start() {
            timer = new Timer(true);
            // Schedule the task to wait 60 seconds and then invoke
            // every 60 seconds.
            timer.schedule(new InnerTimerTask(), 1000 * 60, 1000 * 60);
        }

        private class InnerTimerTask extends TimerTask {
            @Override
            public void run() {
                // Iterate over all instances registered for
                // EveryMinute and invoke its runTimer method.
                for (EveryMinute task : everyMinute) {
                    try {
                        task.runTimer();
                    } catch (Exception e) {
                        // runTimer() declares a checked exception, which
                        // TimerTask.run() cannot rethrow: report it, go on.
                        e.printStackTrace();
                    }
                }
                // Update lastOneMinuteExecution
                lastOneMinuteExecution = System.currentTimeMillis();
            }
        }

        @Override
        public String getLastOneMinuteExecution() {
            if (lastOneMinuteExecution == 0) {
                return "-";
            }
            return DateFormat.getDateTimeInstance().format(
                    new Date(lastOneMinuteExecution));
        }
    }

This class also wears a @Register annotation so that it will also be loaded by the ClassLoadAction named above (the ServiceLoadAction, actually). As above, it will be instantiated and put into Nucleus (as the implementation of TimerInfo). Additionally it wears an @InjectList annotation on the everyMinute field. This will be processed by another class named Factory which performs simple dependency injection. Since its constructor starts a Java Timer for the InnerTimerTask, from that point on all instances registered for EveryMinute will be invoked by this timer, as the name says, every minute.

How is it implemented?

The good thing about Nucleus is that it is powerful on the one hand, but very simple and small on the other hand.
As you could see, there is no inner part for special or privileged services. Everything is built around the kernel, the class Nucleus. Here is what it does:

1. It scans the classpath and looks for files called "component.properties". Those need to be in the root folder of a JAR or in the /src folder of each Eclipse project, respectively.
2. For each identified JAR / project / classpath element, it then collects all contained class files and loads them using Class.forName.
3. For each class, it checks if it implements ClassLoadAction; if yes, it is put into a special list.
4. Each ClassLoadAction is instantiated and each previously seen class is sent to it using: void handle(Class<?> clazz)
5. Finally each ClassLoadAction is notified that Nucleus is complete so that final steps (like annotation based dependency injection) can be performed.

That's it. The only other thing Nucleus provides is a registry which can be used to register and retrieve objects for a class. (An in-depth description of the process above can be found here: http://andreas.haufler.info/2012/01/iterating-over-all-classes-with.html).

Now, to make this framework usable as shown above, there is a set of classes around Nucleus. Most important is the class ServiceLoadAction, which will instantiate each class which wears a @Register annotation, run Factory.inject (our mini DI tool) on it, and throw it into Nucleus for the listed classes. What's important: the ServiceLoadAction has no specific rights or privileges; you can easily write your own implementation which does smarter stuff.

Next to some annotations, there are three other handy classes when it comes to retrieving instances from Nucleus: Factory, Part and Parts. As noted above, the Factory is a simple dependency injector. Currently only the ServiceLoadAction automatically uses the Factory, as all classes wearing the @Register annotation are scanned for required injections.
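What a load action along the lines of ServiceLoadAction does with each class can be sketched in plain Java. This is a simplified illustration under stated assumptions, not the Nucleus implementation: the classes are handed in explicitly instead of being found by a classpath scan, dependency injection is skipped, and MiniRegistry, its nested Register annotation and the example classes are all hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MiniRegistry {

    // Hypothetical counterpart of the article's @Register annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Register {
        Class<?>[] classes();
    }

    interface EveryMinute {
        void runTimer() throws Exception;
    }

    @Register(classes = EveryMinute.class)
    static final class Clock implements EveryMinute {
        @Override
        public void runTimer() {
            System.out.println("tick");
        }
    }

    // Carries no annotation, so handle() ignores it.
    static final class Unrelated {
    }

    private final Map<Class<?>, List<Object>> registry = new HashMap<>();

    // Called once per discovered class: instantiate it if it is annotated
    // and file the instance under every class listed in the annotation.
    void handle(Class<?> clazz) {
        Register register = clazz.getAnnotation(Register.class);
        if (register == null) {
            return;
        }
        try {
            Object instance = clazz.getDeclaredConstructor().newInstance();
            for (Class<?> key : register.classes()) {
                registry.computeIfAbsent(key, k -> new ArrayList<>()).add(instance);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + clazz.getName(), e);
        }
    }

    // Returns all instances registered for the given type.
    <T> List<T> find(Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Object candidate : registry.getOrDefault(type, Collections.<Object>emptyList())) {
            result.add(type.cast(candidate));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        MiniRegistry nucleus = new MiniRegistry();
        nucleus.handle(Clock.class);
        nucleus.handle(Unrelated.class);
        for (EveryMinute task : nucleus.find(EveryMinute.class)) {
            task.runTimer();
        }
    }
}
```

The point of the design is that the code calling find only knows the EveryMinute interface, while the annotated class never has to register itself by hand.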
You can however use this factory to run injections on your own classes, or in other ClassLoadActions to do the same as ServiceLoadAction. If you can't or don't want to rely on annotation based dependency magic, you can use the two helper classes Part and Parts. Those are used like normal fields (see ExampleNucleus.timerInfo above) and fetch the appropriate object or list of objects automatically. Since the result is cached, repeated invocations have almost no overhead compared to a normal field.

Nucleus and the example shown above are open source (MIT License) and available here:
https://github.com/andyHa/scireumOpen/blob/master/src/examples/ExampleNucleus.java
https://github.com/andyHa/scireumOpen/tree/master/src/com/scireum/open/nucleus

If you're interested in using Nucleus, I could put the relevant sources into a separate repository and also provide a release jar. Just write a comment below and let me know.

This post is the fourth part of my series "Enterprisy Java", in which we share our hints and tricks on how to overcome the obstacles when trying to build several multi-tenant web applications out of a set of common modules.

January 23, 2012

by Andreas Haufler

· 9,738 Views · 1 Like

Datatype Conversion in Java: XMLGregorianCalendar to java.util.Date / java.util.Date to XMLGregorianCalendar

    package singz.test;

    import java.util.Date;
    import java.util.GregorianCalendar;

    import javax.xml.datatype.DatatypeConfigurationException;
    import javax.xml.datatype.DatatypeFactory;
    import javax.xml.datatype.XMLGregorianCalendar;

    /**
     * A utility class for converting objects between java.util.Date and
     * XMLGregorianCalendar types
     */
    public class XMLGregorianCalendarConversionUtil {

        // DatatypeFactory creates new javax.xml.datatype Objects that map XML
        // to/from Java Objects.
        private static DatatypeFactory df = null;

        static {
            try {
                df = DatatypeFactory.newInstance();
            } catch (DatatypeConfigurationException e) {
                throw new IllegalStateException(
                        "Error while trying to obtain a new instance of DatatypeFactory", e);
            }
        }

        // Converts a java.util.Date into an instance of XMLGregorianCalendar
        public static XMLGregorianCalendar asXMLGregorianCalendar(java.util.Date date) {
            if (date == null) {
                return null;
            } else {
                GregorianCalendar gc = new GregorianCalendar();
                gc.setTimeInMillis(date.getTime());
                return df.newXMLGregorianCalendar(gc);
            }
        }

        // Converts an XMLGregorianCalendar to an instance of java.util.Date
        public static java.util.Date asDate(XMLGregorianCalendar xmlGC) {
            if (xmlGC == null) {
                return null;
            } else {
                return xmlGC.toGregorianCalendar().getTime();
            }
        }

        public static void main(String[] args) {
            Date currentDate = new Date(); // Current date

            // java.util.Date to XMLGregorianCalendar
            XMLGregorianCalendar xmlGC =
                    XMLGregorianCalendarConversionUtil.asXMLGregorianCalendar(currentDate);
            System.out.println(
                    "Current date in XMLGregorianCalendar format: " + xmlGC.toString());

            // XMLGregorianCalendar to java.util.Date
            System.out.println(
                    "Current date in java.util.Date format: "
                    + XMLGregorianCalendarConversionUtil.asDate(xmlGC).toString());
        }
    }

Why do we need XMLGregorianCalendar?

Java Architecture for XML Binding (JAXB) allows Java developers to map Java classes to XML representations.
JAXB provides two main features: the ability to marshal Java objects into XML and the inverse, i.e. to unmarshal XML back into Java objects. In JAXB's default data type bindings, i.e. the mappings of XML Schema (XSD) data types to Java data types, the XML Schema types xsd:dateTime, xsd:time, xsd:date and so on (mostly used in web service definitions) map to the Java type javax.xml.datatype.XMLGregorianCalendar.

From http://singztechmusings.in/datatype-conversion-in-java-xmlgregoriancalendar-to-java-util-date-java-util-date-to-xmlgregoriancalendar/

January 21, 2012

by Singaram Subramanian

· 101,145 Views

The Persistence Layer with Spring Data JPA

This is the fourth of a series of articles about Persistence with Spring. This article will focus on the configuration and implementation of the persistence layer with Spring 3.1, JPA and Spring Data. For a step by step introduction to setting up the Spring context using Java based configuration and the basic Maven pom for the project, see this article.

The Persistence with Spring series:
Part 1 – The Persistence Layer with Spring 3.1 and Hibernate
Part 3 – The Persistence Layer with Spring 3.1 and JPA
Part 5 – Transaction configuration with JPA and Spring 3.1

No More DAO implementations

As I discussed in a previous post, the DAO layer usually consists of a lot of boilerplate code that can and should be simplified. The advantages of such a simplification are manifold: a decrease in the number of artifacts that need to be defined and maintained, simplification and consistency of data access patterns, and consistency of configuration. Spring Data takes this simplification one step further and makes it possible to remove the DAO implementations entirely: the interface of the DAO is now the only artifact that needs to be explicitly defined.

The Spring Data managed DAO

In order to start leveraging the Spring Data programming model with JPA, a DAO interface needs to extend the JPA specific Repository interface, JpaRepository, in Spring’s interface hierarchy. This will enable Spring Data to find this interface and automatically create an implementation for it. Also, by extending the interface we get most if not all of the relevant generic CRUD methods for standard data access available in the DAO.

Defining custom access method and queries

As discussed, by extending one of the Repository interfaces, the DAO will already have some basic CRUD methods (and queries) defined and implemented.
To define more specific access methods, Spring JPA supports quite a few options: you can either simply define a new method in the interface, or you can provide the actual JPQL query by using the @Query annotation. A third option to define custom queries is to make use of JPA Named Queries, but this has the disadvantage that it either involves XML or burdens the domain class with the queries. In addition to these, Spring Data introduces a more flexible and convenient API, similar to the JPA Criteria API, only more readable and reusable. The advantages of this API will become more pronounced when dealing with a large number of fixed queries that could potentially be more concisely expressed through a smaller number of reusable blocks that keep occurring in different combinations.

Automatic Custom Queries

When Spring Data creates a new Repository implementation, it analyzes all the methods defined by the interfaces and tries to automatically generate queries from the method names. While this has limitations, it is a very powerful and elegant way of defining new custom access methods with very little effort. For example, if the managed entity has a name field (and the Java Bean standard getter and setter for that field), defining the findByName method in the DAO interface will automatically generate the correct query:

    public interface IFooDAO extends JpaRepository< Foo, Long >{
        Foo findByName( final String name );
    }

This is a relatively simple example; a much larger set of keywords is supported by the query creation mechanism. In case the parser cannot match the property with a domain object field, the following exception is thrown:

    java.lang.IllegalArgumentException: No property nam found for type class org.rest.model.Foo

Manual Custom Queries

In addition to deriving the query from the method name, a custom query can be manually specified with the method level @Query annotation.
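To make the method-name derivation described under "Automatic Custom Queries" more tangible, here is a deliberately simplified sketch of the idea. It is not Spring Data's parser and does not reproduce the query text Spring Data actually generates: the class and method names are hypothetical, only the plain findBy<Property> form is handled, and the property is not checked against the entity (which is the step that would raise the "No property ... found" exception quoted above).

```java
final class DerivedQuerySketch {

    private static final String PREFIX = "findBy";

    // Turns e.g. ("findByName", "Foo") into a JPQL-style query string with
    // one positional parameter. Only the simplest naming form is supported.
    static String deriveJpql(String methodName, String entityName) {
        if (!methodName.startsWith(PREFIX) || methodName.length() == PREFIX.length()) {
            throw new IllegalArgumentException("Unsupported method name: " + methodName);
        }
        String property = methodName.substring(PREFIX.length());
        // "Name" -> "name", following the Java Bean naming convention
        property = Character.toLowerCase(property.charAt(0)) + property.substring(1);
        return "select e from " + entityName + " e where e." + property + " = ?1";
    }

    public static void main(String[] args) {
        System.out.println(deriveJpql("findByName", "Foo"));
    }
}
```

The real mechanism additionally understands keywords such as And, Or and ordering clauses, as mentioned above; the reference documentation lists the supported set.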
For even more fine grained control over the creation of queries, such as using named parameters or modifying existing queries, the reference is a good place to start.

Spring Data transaction configuration

The actual implementation of the Spring Data managed DAO, SimpleJpaRepository, uses annotations to define and configure transactions. A read only @Transactional annotation is used at the class level, which is then overridden for the non read-only methods. The rest of the transaction semantics are default, but these can easily be overridden manually per method.

Exception Translation without the template

One of the responsibilities of the Spring ORM templates (JpaTemplate, HibernateTemplate) is exception translation: translating JPA exceptions, which tie the API to JPA, into Spring’s DataAccessException hierarchy. Without the template to do that, exception translation can still be enabled by annotating the DAOs with the @Repository annotation. That, coupled with a Spring bean postprocessor, will advise all @Repository beans with all the implementations of PersistenceExceptionTranslator found in the container, to provide exception translation without using the template. The fact that exception translation is indeed active can easily be verified with an integration test:

    @Test( expected = DataAccessException.class )
    public void whenAUniqueConstraintIsBroken_thenSpringSpecificExceptionIsThrown(){
        String name = "randomName";
        this.service.save( new Foo( name ) );
        this.service.save( new Foo( name ) );
    }

Exception translation is done through proxies; in order for Spring to be able to create proxies around the DAO classes, these must not be declared final.

Spring Data Configuration

To activate the Spring JPA repository support, the jpa namespace is defined and used to specify the package where the DAO interfaces are located:

At this point, there is no equivalent Java based configuration; support for it is however in the works.
The Spring Java or XML configuration

The JPA configuration with Spring 3.1 has already been discussed in detail in the previous article of this series. Spring Data also takes advantage of the Spring support for the JPA @PersistenceContext annotation, which it uses to wire the EntityManager into the Spring factory bean responsible for creating the actual DAO implementations, JpaRepositoryFactoryBean.

In addition to the already discussed configuration, there is one last missing piece: including the Spring Data XML configuration in the overall persistence configuration:

    @Configuration
    @EnableTransactionManagement
    @ImportResource( "classpath*:*springDataConfig.xml" )
    public class PersistenceJPAConfig{
        ...
    }

The Maven configuration

In addition to the Maven configuration for JPA defined in a previous article, the spring-data-jpa dependency is added:

    <dependency>
        <groupId>org.springframework.data</groupId>
        <artifactId>spring-data-jpa</artifactId>
        <version>1.0.2.RELEASE</version>
    </dependency>

Conclusion

This article covered the configuration and implementation of the persistence layer with Spring 3.1, JPA 2 and Spring JPA (part of the Spring Data umbrella project), using both XML and Java based configuration. The various methods of defining more advanced custom queries were discussed, as well as configuration with the new jpa namespace and transactional semantics. The final result is a new and elegant take on data access with Spring, with almost no actual implementation work. You can check out the full implementation in the github project.

From the original The Persistence Layer with Spring Data JPA of the Persistence with Spring series

January 20, 2012

by Eugen Paraschiv

· 154,836 Views · 2 Likes

Which Integration Framework Should You Use – Spring Integration, Mule ESB or Apache Camel?

Data exchanges between companies are increasing a lot. The number of applications that must be integrated is increasing, too. The interfaces use different technologies, protocols and data formats. Nevertheless, the integration of these applications must be modeled in a standardized way, realized efficiently and supported by automatic tests.

Three integration frameworks that fulfil these requirements are available in the JVM environment: Spring Integration, Mule ESB and Apache Camel. They implement the well-known Enterprise Integration Patterns (EIP, http://www.eaipatterns.com) and therefore offer a standardized, domain-specific language to integrate applications. These integration frameworks can be used in almost every integration project within the JVM environment, no matter which technologies, transport protocols or data formats are used. All integration projects can be realized in a consistent way without redundant boilerplate code.

This article compares all three alternatives and discusses their pros and cons. If you want to know when to use a more powerful Enterprise Service Bus (ESB) instead of one of these lightweight integration frameworks, then you should read this blog post: http://www.kai-waehner.de/blog/2011/06/02/when-to-use-apache-camel/ (it explains when to use Apache Camel, but the title could also be “When to use a lightweight integration framework”).

Comparison Criteria

Several criteria can be used to compare these three integration frameworks: open source; basic concepts / architecture; testability; deployment; popularity; commercial support; IDE support; error handling; monitoring; enterprise readiness; domain specific language (DSL); number of components for interfaces, technologies and protocols; expandability.

Similarities

All three frameworks have many similarities. Therefore, many of the above comparison criteria are even! All implement the EIPs and offer a consistent model and messaging architecture to integrate several technologies.
No matter which technologies you have to use, you always do it the same way, i.e. same syntax, same API, same automatic tests. The only difference is the configuration of each endpoint (e.g. JMS needs a queue name while JDBC needs a database connection URL). IMO, this is the most significant feature. Each framework uses different names, but the idea is the same. For instance, “Camel routes” are equivalent to “Mule flows”, and “Camel components” are called “adapters” in Spring Integration.

Besides, several other similarities exist which set them apart from heavyweight ESBs. You just have to add some libraries to your classpath. Therefore, you can use each framework everywhere in the JVM environment, no matter if your project is a Java SE standalone application, or if you want to deploy it to a web container (e.g. Tomcat), a JEE application server (e.g. Glassfish), an OSGi container or even to the cloud. Just add the libraries, do some simple configuration, and you are done. Then you can start implementing your integration stuff (routing, transformation, and so on).

All three frameworks are open source and offer familiar, public features such as source code, forums, mailing lists, issue tracking and voting for new features. Good communities write documentation, blogs and tutorials (IMO Apache Camel has the most noticeable community). Only the number of released books could be better for all three. Commercial support is available via different vendors:
Spring Integration: SpringSource (http://www.springsource.com)
Mule ESB: MuleSoft (http://www.mulesoft.org)
Apache Camel: FuseSource (http://fusesource.com) and Talend (http://www.talend.com)

IDE support is very good; visual designers are even available for all three alternatives to model integration problems (and let them generate the code). Each of the frameworks is enterprise ready, because all offer required features such as error handling, automatic testing, transactions, multithreading, scalability and monitoring.
Differences

If you know one of these frameworks, you can learn the others very easily because they share the same concepts and many other similarities. Next, let’s discuss their differences to be able to decide when to use which one. The two most important differences are the number of supported technologies and the DSL(s) used. Thus, I will concentrate especially on these two criteria in the following. I will use code snippets implementing the well-known EIP “Content-based Router” in all examples. Judge for yourself which one you prefer.

Spring Integration

Spring Integration is based on the well-known Spring project and extends the programming model with integration support. You can use Spring features such as dependency injection, transactions or security as you do in other Spring projects. Spring Integration is awesome if you already have a Spring project and need to add some integration stuff. It takes almost no effort to learn Spring Integration if you know Spring itself. Nevertheless, Spring Integration only offers very rudimentary support for technologies, just “basic stuff” such as File, FTP, JMS, TCP, HTTP or Web Services. Mule and Apache Camel offer many, many further components!

Integrations are implemented by writing a lot of XML code (without a real DSL), as you can see in the following code snippet:

You can also use Java code and annotations for some stuff, but in the end, you need a lot of XML. Honestly, I do not like too much XML declaration. It is fine for configuration (such as JMS connection factories), but not for complex integration logic. At least, it should be a DSL with better readability, but more complex Spring Integration examples are really tough to read. Besides, the visual designer for Eclipse (called integration graph) is ok, but not as good and intuitive as its competitors.
Therefore, I would only use Spring Integration if I already have an existing Spring project and just need to add some integration logic requiring only “basic technologies” such as File, FTP, JMS or JDBC.

Mule ESB

Mule ESB is, as the name suggests, a full ESB including several additional features instead of just an integration framework (you can compare it to Apache ServiceMix, which is an ESB based on Apache Camel). Nevertheless, Mule can be used as a lightweight integration framework, too, by just not adding and using any additional features besides the EIP integration stuff. Like Spring Integration, Mule only offers an XML DSL. At least, it is much easier to read than Spring Integration, in my opinion. Mule Studio offers a very good and intuitive visual designer. Compare the following code snippet to the Spring Integration code from above. It is more like a DSL than Spring Integration. This matters if the integration logic is more complex.

The major advantage of Mule is some very interesting connectors to important proprietary interfaces such as SAP, Tibco Rendezvous, Oracle Siebel CRM, Paypal or IBM’s CICS Transaction Gateway. If your integration project requires some of these connectors, then I would probably choose Mule! A disadvantage for some projects might be that Mule says no to OSGi: http://blogs.mulesoft.org/osgi-no-thanks/

Apache Camel

Apache Camel is almost identical to Mule. It offers many, many components (even more than Mule) for almost every technology you could think of. If there is no component available, you can create your own component very easily, starting with a Maven archetype! If you are a Spring guy: Camel has awesome Spring integration, too. Like the other two, it offers an XML DSL:

    ${in.header.type} is 'com.kw.DvdOrder'
    ${in.header.type} is 'com.kw.VideogameOrder'

Readability is better than Spring Integration and almost identical to Mule.
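Independent of any of the three frameworks, the Content-based Router pattern used for these snippets boils down to "inspect the message, pick the first matching destination, otherwise use a default". The following plain Java sketch shows only that idea. It is not the API of Camel, Mule or Spring Integration: the class name, the when/route methods and the two order classes are hypothetical, and the endpoint names are treated as plain strings rather than real endpoints.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

final class ContentBasedRouterSketch {

    // Hypothetical message types standing in for com.kw.DvdOrder etc.
    static final class DvdOrder {
    }

    static final class VideogameOrder {
    }

    // Rules are checked in insertion order; the first match wins.
    private final Map<Predicate<Object>, String> rules = new LinkedHashMap<>();
    private final String fallbackEndpoint;

    ContentBasedRouterSketch(String fallbackEndpoint) {
        this.fallbackEndpoint = fallbackEndpoint;
    }

    ContentBasedRouterSketch when(Predicate<Object> condition, String endpoint) {
        rules.put(condition, endpoint);
        return this;
    }

    // Returns the name of the endpoint the message should be sent to.
    String route(Object message) {
        for (Map.Entry<Predicate<Object>, String> rule : rules.entrySet()) {
            if (rule.getKey().test(message)) {
                return rule.getValue();
            }
        }
        return fallbackEndpoint;
    }

    public static void main(String[] args) {
        ContentBasedRouterSketch router = new ContentBasedRouterSketch("mock:OtherOrders")
                .when(m -> m instanceof DvdOrder, "file:incoming/dvdOrders")
                .when(m -> m instanceof VideogameOrder, "jms:videogameOrdersQueue");

        System.out.println(router.route(new DvdOrder()));
        System.out.println(router.route(new VideogameOrder()));
        System.out.println(router.route("something else"));
    }
}
```

What the frameworks add on top of this is the part the sketch leaves out: the actual transport to file, JMS and other endpoints, plus error handling and testing support.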
Besides, a very good (but commercial) visual designer called Fuse IDE is available from FuseSource – generating XML DSL code. Nevertheless, it is a lot of XML, no matter whether you use a visual designer or just your XML editor. Personally, I do not like this. Therefore, let me show you another awesome feature: Apache Camel also offers DSLs for Java, Groovy and Scala. You do not have to write so much ugly XML. Personally, I prefer using one of these fluent DSLs instead of XML for integration logic. I only do configuration stuff such as JMS connection factories or JDBC properties using XML. Here you can see the same example using a Java DSL code snippet:

    from("file:incomingOrders")
        .choice()
            .when(body().isInstanceOf(com.kw.DvdOrder.class))
                .to("file:incoming/dvdOrders")
            .when(body().isInstanceOf(com.kw.VideogameOrder.class))
                .to("jms:videogameOrdersQueue")
            .otherwise()
                .to("mock:OtherOrders");

The fluent programming DSLs are very easy to read (even in more complex examples). Besides, these programming DSLs have better IDE support than XML (code completion, refactoring, etc.). Due to these awesome fluent DSLs, I would always use Apache Camel if I do not need some of Mule’s excellent connectors to proprietary products. Due to its very good integration with Spring, I would even prefer Apache Camel to Spring Integration in most use cases. By the way: Talend offers a visual designer generating Java DSL code, but it generates a lot of boilerplate code and does not allow vice-versa editing (i.e. you cannot edit the generated code). This is a no-go criterion and has to be fixed soon (hopefully)!

And the winner is…

… all three integration frameworks, because they are all lightweight and easy to use – even for complex integration projects. It is awesome to integrate several different technologies by always using the same syntax and concepts – including very good testing support.
My personal favorite is Apache Camel due to its awesome Java, Groovy and Scala DSLs, combined with many supported technologies. I would only use Mule if I need some of its unique connectors to proprietary products. I would only use Spring Integration in an existing Spring project and if I only need to integrate “basic technologies” such as FTP or JMS. Nevertheless: no matter which of these lightweight integration frameworks you choose, you will have a lot of fun realizing complex integration projects easily and with little effort. Remember: often, a fat ESB has too much functionality, and therefore too much unnecessary complexity and effort. Use the right tool for the right job!

Best regards, Kai Wähner (Twitter: @KaiWaehner)

http://www.kai-waehner.de/blog/2012/01/10/spoilt-for-choice-which-integration-framework-to-use-spring-integration-mule-esb-or-apache-camel/
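To make the Content-based Router pattern itself concrete, independent of any of the three frameworks, here is a minimal plain-Java sketch. It is not how Camel, Mule or Spring Integration implement routing internally. The class names (`DvdOrder`, `VideogameOrder`, `ContentBasedRouter`) are hypothetical stand-ins, and the endpoint strings are simply borrowed from the Camel example as labels.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical message types, named after the classes used in the article's examples.
class DvdOrder {}
class VideogameOrder {}

// Framework-free sketch of the Content-based Router EIP:
// conditions are checked in registration order and the first match wins.
class ContentBasedRouter {
    private final Map<Predicate<Object>, String> routes = new LinkedHashMap<>();
    private final String fallback;

    ContentBasedRouter(String fallback) {
        this.fallback = fallback;
    }

    // Register a condition and the destination used when it matches.
    ContentBasedRouter when(Predicate<Object> condition, String destination) {
        routes.put(condition, destination);
        return this;
    }

    // Return the destination of the first matching condition, or the fallback.
    String route(Object message) {
        for (Map.Entry<Predicate<Object>, String> entry : routes.entrySet()) {
            if (entry.getKey().test(message)) {
                return entry.getValue();
            }
        }
        return fallback;
    }

    // Mirrors the routing decisions of the article's order example.
    static ContentBasedRouter orderRouter() {
        return new ContentBasedRouter("mock:OtherOrders")
            .when(m -> m instanceof DvdOrder, "file:incoming/dvdOrders")
            .when(m -> m instanceof VideogameOrder, "jms:videogameOrdersQueue");
    }
}
```

The sketch only decides where a message should go. Actually delivering it to a file, queue or mock endpoint is exactly the part the integration frameworks provide through their components.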

January 19, 2012

by Kai Wähner

CORE

· 110,861 Views · 9 Likes

Java Garbage Collection Algorithm Design Choices And Metrics To Evaluate Garbage Collector Performance

Memory Management in the Java HotSpot Virtual Machine (white paper)

Serial vs Parallel

With serial collection, only one thing happens at a time. For example, even when multiple CPUs are available, only one is utilized to perform the collection. When parallel collection is used, the task of garbage collection is split into parts and those subparts are executed simultaneously, on different CPUs. The simultaneous operation enables the collection to be done more quickly, at the expense of some additional complexity and potential fragmentation.

Concurrent versus Stop-the-world

When stop-the-world garbage collection is performed, execution of the application is completely suspended during the collection. Alternatively, one or more garbage collection tasks can be executed concurrently, that is, simultaneously, with the application. Typically, a concurrent garbage collector does most of its work concurrently, but may also occasionally have to do a few short stop-the-world pauses. Stop-the-world garbage collection is simpler than concurrent collection, since the heap is frozen and objects are not changing during the collection. Its disadvantage is that it may be undesirable for some applications to be paused. Correspondingly, the pause times are shorter when garbage collection is done concurrently, but the collector must take extra care, as it is operating over objects that might be updated at the same time by the application. This adds some overhead to concurrent collectors that affects performance and requires a larger heap size.

Compacting versus Non-compacting versus Copying

After a garbage collector has determined which objects in memory are live and which are garbage, it can compact the memory, moving all the live objects together and completely reclaiming the remaining memory. After compaction, it is easy and fast to allocate a new object at the first free location.
A simple pointer can be utilized to keep track of the next location available for object allocation. In contrast with a compacting collector, a non-compacting collector releases the space utilized by garbage objects in-place, i.e., it does not move all live objects to create a large reclaimed region in the same way a compacting collector does. The benefit is faster completion of garbage collection, but the drawback is potential fragmentation. In general, it is more expensive to allocate from a heap with in-place deallocation than from a compacted heap. It may be necessary to search the heap for a contiguous area of memory sufficiently large to accommodate the new object. A third alternative is a copying collector, which copies (or evacuates) live objects to a different memory area. The benefit is that the source area can then be considered empty and available for fast and easy subsequent allocations, but the drawback is the additional time required for copying and the extra space that may be required. Performance Metrics Several metrics are utilized to evaluate garbage collector performance, including: Throughput—the percentage of total time not spent in garbage collection, considered over long periods of time. Garbage collection overhead—the inverse of throughput, that is, the percentage of total time spent in garbage collection. Pause time—the length of time during which application execution is stopped while garbage collection is occurring. Frequency of collection—how often collection occurs, relative to application execution. Footprint—a measure of size, such as heap size. Promptness—the time between when an object becomes garbage and when the memory becomes available. 
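As a small worked example of the metrics above, the following sketch computes throughput, overhead, maximum pause time and collection frequency from a list of pause durations. It rests on a simplifying assumption that is not from the white paper: all garbage collection time is taken to be stop-the-world pause time, so concurrent work is ignored. The class name `GcMetrics` and the sample numbers are made up for illustration.

```java
// Sketch: deriving GC metrics from measured pause durations.
// Assumption: every second of GC work shows up as a stop-the-world pause.
class GcMetrics {
    // Garbage collection overhead: percentage of total time spent in GC.
    static double overheadPercent(double totalSeconds, double[] pauseSeconds) {
        double gcSeconds = 0.0;
        for (double pause : pauseSeconds) {
            gcSeconds += pause;
        }
        return 100.0 * gcSeconds / totalSeconds;
    }

    // Throughput: percentage of total time NOT spent in GC,
    // i.e. what remains after subtracting the overhead from 100%.
    static double throughputPercent(double totalSeconds, double[] pauseSeconds) {
        return 100.0 - overheadPercent(totalSeconds, pauseSeconds);
    }

    // Pause time: the longest single interruption of the application.
    static double maxPauseSeconds(double[] pauseSeconds) {
        double max = 0.0;
        for (double pause : pauseSeconds) {
            max = Math.max(max, pause);
        }
        return max;
    }

    // Frequency of collection, expressed as collections per minute of run time.
    static double collectionsPerMinute(double totalSeconds, double[] pauseSeconds) {
        return pauseSeconds.length / (totalSeconds / 60.0);
    }
}
```

For a hypothetical 60-second run with pauses of 0.5 s, 1.0 s and 1.5 s, the total GC time is 3 s, which gives an overhead of 5%, a throughput of 95%, a maximum pause of 1.5 s and a frequency of 3 collections per minute. Two collectors with the same throughput can still differ widely in maximum pause, which is why the metrics are evaluated separately.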
If you’d like to explore this topic further, and Java’s garbage collection and memory management in general, have a look at these slides: Java Garbage Collection, Monitoring, and Tuning (by Carol McDonald).

Related articles

Practical Garbage Collection – Part 1: Introduction (worldmodscode.wordpress.com)
Reducing memory churn when processing large data set (stackoverflow.com)
The Top Java Memory Problems – Part 2 (dynatrace.com)
imabonehead: Performance Tuning the JVM for Running Apache Tomcat | TomcatExpert (tomcatexpert.com)
When Does the Garbage Collector Run in JVM ? (javacircles.wordpress.com)
Why Garbage Collection Paranoia is Still (sometimes) Justified (prog21.dadgum.com)
Adventures in Java Garbage Collection Tuning (rapleaf.com)

From http://singztechmusings.in/java-garbage-collection-algorithm-design-choices-and-metrics-to-evaluate-garbage-collector-performance/

January 19, 2012

by Singaram Subramanian

· 14,418 Views

Algorithm of the Week: Data Compression with Bitmaps

In my previous post we saw how to compress data consisting of very long runs of repeating elements. This type of compression is known as “run-length encoding” and can be very handy when transferring data with no loss. The problem is that the data must follow a specific format. Thus the string “aaaaaaaabbbbbbbb” can be compressed as “a8b8”. Now a string with length 16 can be compressed as a string with length 4, which is 25% of its initial length, without losing any information. A problem arises when the characters (elements) are distributed differently. What happens if the characters are the same, but they don’t form long runs? What if the string was “abababababababab”? The same length, the same characters, but we cannot use run-length encoding! Indeed, using this algorithm we’ll get at best the same string. In this case, however, we can notice something else. The string consists of many repeating elements, although they are not arranged one after another. We can compress this string with a bitmap. This means that we can save the positions of the occurrences of a given element as a sequence of bits, which can easily be converted into a decimal value. In the example above the string “abababababababab” can be compressed as “1010101010101010”, which is 43690 in decimal, and even better AAAA in hexadecimal. Thus the long string can be compressed. When decompressing (decoding) the message we can convert again from decimal/hexadecimal into binary and match the occurrences of the characters. Well, the example above is too simple, so let’s say only one of the characters is repeating and the rest of the string consists of different characters, like this: “abacadaeafagahai”. Then we can use a bitmap only for the character “a” – “1010101010101010” – and compress the string as “AAAA bcdefghi”. As you can see, all the example strings are exactly 16 characters long, and that is a limitation.
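The string examples above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class `CharBitmap` and its method names are invented here, and passing the original length to `decode` is one possible way to deal with the fixed-length limitation, since leading zero bits disappear once the bitmap is stored as a number.

```java
import java.math.BigInteger;

// Sketch of single-character bitmap compression for strings.
class CharBitmap {
    // Bit string with '1' where target occurs and '0' everywhere else.
    static String bits(String message, char target) {
        StringBuilder sb = new StringBuilder(message.length());
        for (int i = 0; i < message.length(); i++) {
            sb.append(message.charAt(i) == target ? '1' : '0');
        }
        return sb.toString();
    }

    // The non-target characters, in their original order.
    static String rest(String message, char target) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < message.length(); i++) {
            char c = message.charAt(i);
            if (c != target) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Convert a bit string to upper-case hexadecimal.
    static String toHex(String bits) {
        return new BigInteger(bits, 2).toString(16).toUpperCase();
    }

    // Rebuild the message. The original length is required because
    // leading zero bits vanish when the bitmap is stored as a number.
    static String decode(String hex, int length, char target, String rest) {
        StringBuilder bits = new StringBuilder(new BigInteger(hex, 16).toString(2));
        while (bits.length() < length) {
            bits.insert(0, '0');
        }
        StringBuilder out = new StringBuilder(length);
        int next = 0;
        for (int i = 0; i < length; i++) {
            out.append(bits.charAt(i) == '1' ? target : rest.charAt(next++));
        }
        return out.toString();
    }
}
```

With these helpers, `CharBitmap.toHex(CharBitmap.bits("abababababababab", 'a'))` yields `AAAA`, and `CharBitmap.decode("AAAA", 16, 'a', "bcdefghi")` restores `abacadaeafagahai`. `BigInteger` is used so that bitmaps longer than 64 bits keep every bit.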
Using bitmaps with data of variable length is a bit tricky, and it is not always easy (or even possible) to decompress it. Basically, bitmap compression saves the positions of an element that is repeated very often in the message! On the other hand, bitmap compression is not only applicable to strings. We can also compress arrays, objects or any kind of data. The example from my previous post is a good fit here. There we had to transfer a large array from a server to the client (browser) using JSON. That data was very suitable for “run-length encoding”. Now let’s assume we have the same data – a set of different years – which this time is distributed differently.

    $data = array(
        0 => 1991, 1 => 1992, 2 => 1993, 3 => 1994,
        4 => 1991, 5 => 1992, 6 => 1993, 7 => 1992,
        8 => 1991, 9 => 1991, 10 => 1991, 11 => 1992,
        12 => 1992, 13 => 1991, 14 => 1991, 15 => 1992,
        ...
    );

The JSON-encoded message will be the following (a simple yet very large JavaScript array).

    [1991,1992,1993,1994,1991,1992,1993,1992,1991,1991,1991,1992,1992,1991,1991,1992, ...]

However, if we use bitmap compression we’ll get a “shorter” array.

    $data = array(
        0 => array(1991, '1000100011100110'),
        1 => array(1992, '0100010100011001'),
        2 => array(1993, '0010001000000000'),
        3 => array(1994, '0001000000000000'),
    );

Now the JSON is:

    [[1991,"1000100011100110"],[1992,"0100010100011001"],[1993,"0010001000000000"],[1994,"0001000000000000"]]

It is obvious that the compression ratio gets better and better as the uncompressed data grows. In fact, most of us know bitmap compression from images, because this algorithm is widely used for image compression. We can imagine how successful it can be when compressing black and white images (as black and white can be represented as 0s and 1s). Actually it is used for more than two colors (256 for instance), and again the level of compression is very high.
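The per-value bitmaps for the years array can be produced and reversed with a few lines of Java. This is a sketch rather than the author's implementation; the class name `ValueBitmaps` and its methods are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: one bitmap per distinct value, marking the positions where it occurs.
class ValueBitmaps {
    // Returns value -> bit string, in order of first appearance.
    static Map<Integer, String> encode(int[] data) {
        Map<Integer, char[]> rows = new LinkedHashMap<>();
        for (int i = 0; i < data.length; i++) {
            char[] row = rows.get(data[i]);
            if (row == null) {
                // First time this value is seen: start with all zeros.
                row = new char[data.length];
                Arrays.fill(row, '0');
                rows.put(data[i], row);
            }
            row[i] = '1';
        }
        Map<Integer, String> encoded = new LinkedHashMap<>();
        for (Map.Entry<Integer, char[]> entry : rows.entrySet()) {
            encoded.put(entry.getKey(), new String(entry.getValue()));
        }
        return encoded;
    }

    // Rebuilds the original array: each '1' writes its value back to that position.
    static int[] decode(Map<Integer, String> encoded, int length) {
        int[] data = new int[length];
        for (Map.Entry<Integer, String> entry : encoded.entrySet()) {
            String bits = entry.getValue();
            for (int i = 0; i < length; i++) {
                if (bits.charAt(i) == '1') {
                    data[i] = entry.getKey();
                }
            }
        }
        return data;
    }
}
```

For the 16 years listed above, `encode` produces the same four bit strings as the PHP array, and `decode` restores the original order. Note that the saving depends on the number of distinct values: every additional distinct value adds one more full-length bitmap.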
Implementation

The following PHP implementation aims only to illustrate the bitmap compression algorithm. As we know, this algorithm can be applied to any kind of data structure.

    // too many repeating "a" characters
    $msg = 'aazahalavaatalawacamaahakafaaaqaaaiauaacaaxaauaxaaaaaapaayatagaaoafaawayazavaaaazaaabararaaaaakakaaqaarazacajaazavanazaaaeanaaoajauaaaaaxalaraaapabataaavaaab';

    function bitmap($message) {
        $bits = $rest = '';
        $length = strlen($message);
        // iterate by index; testing the character itself for truthiness
        // would stop early on a "0" and read past the end of the string
        for ($i = 0; $i < $length; $i++) {
            $v = $message[$i];
            if ($v == 'a') {
                $bits .= '1';
            } else {
                $bits .= '0';
                $rest .= $v;
            }
        }
        // note: bindec() returns a float for values that do not fit into an
        // integer, so very long bit strings lose precision here
        return number_format(bindec($bits), 0, '.', '') . $rest;
    }

    echo bitmap($msg);

    // uncompressed: acaaaaadaaaabalaaeaaaaganaaxakaavawamaasavajawaaaayaauaaadalanagaeaeamaarafalaazaaaiasaanaahaaazaraxaalaahaaawaaajasamahaajaakarapanaakaoakaanawalaacamauaamaal
    // compressed: 152299251941730035874325065523548237677352452096zhlvtlwcmhkfqiucxuxpytgofwyzvzbrrkkqrzcjzvnzenojuxlrpbtvb

Application

This algorithm is very useful when there is an element in our data that repeats very often, so you need to investigate the nature of the data you want to compress. Actually, because of this fact, this algorithm is used for image compression, as in PNG8 or GIF.

Source: http://www.stoimen.com/blog/2012/01/16/computer-algorithms-data-compression-with-bitmaps/

January 17, 2012

by Stoimen Popov

· 20,342 Views

Method Validation With Spring 3.1 and Hibernate Validator 4.2

JSR 303 and Hibernate Validator have been awesome additions to the Java ecosystem, giving you a standard way to validate your domain model across application layers.

January 13, 2012

by Gunnar Hillert

· 32,863 Views

From Java to Node.js

I’ve been developing for quite a while and in quite a few languages. Somehow though, I’ve always seemed to fall back to Java when doing my own stuff – maybe partly from habit, partly because it has in my opinion the best open source selection out there, and partly because I liked its mix of features and performance.

Originally authored by Matan Amir

Specifically though, in the web arena things have been moving fast and furious with new languages, approaches, and methods like RoR, Play!, and Lift (and many others!). While I “get it” regarding the benefits of these frameworks, I never felt the need to give them more than an initial deep dive to see how they work. I nod a few times at their similarities and move on back to plain REST-ful services in Java with Spring, maybe an ORM (I try to avoid them these days), and a JS-rich front-end. Recently, two factors made me deep dive into Node.js. First is my growing appreciation of the progress of the JavaScript front-end. What used to be little JavaScript snippets to validate a form “onSubmit” has evolved into a complete ecosystem of technologies, frameworks, and stacks. My personal favorite these days is Backbone.js, and I try to use it whenever I can. Second was the first-hand feedback I got from a friend at Voxer about their success in using and deploying Node.js at real scale (they have a pretty big Node.js deployment). So I jumped in. In the short time I’ve been using it for some (real) projects, I can say that it’s my new favorite language of choice. First of all, the event-driven (and non-blocking) model is a perfect fit for server-side development. While this has existed in other languages and forms (Java Servlet 3.0, Event Machine, Twisted to name a few), I connected with the ease of use and natural maturity of it in JavaScript. We’re all used to using callbacks anyway from your typical run-of-the-mill AJAX work, right? Second is the community and how large it’s gotten in the short time Node has been around.
There are lots of useful open source libraries to use to solve your problems and the quality level is high. Third is that it was just so easy to pick up. I have to admit that my core JavaScript was decent before I started using Node, but in terms of the Node core library and feature set itself, it’s pretty lean and mean and provides a good starting point to build what you need in case you can’t find someone who already did (which you most likely can). So having said all that, I wanted to share what resources helped me get up to speed in Node.js coming from a Java background.

Getting to know JavaScript

There are lots of good resources out there on Node.js and I’ve listed them below. For me, the most critical piece was getting my JavaScript knowledge to the next level. Because you’ll be spending all your time writing your server code in JavaScript, it pays huge dividends to understand how to take full advantage of it. As a Java developer you’re probably trained to think in OO designs. For me, this was the part that I focused on the most. Fortunately (or unfortunately), JavaScript is not a classic OO language. It can be if you shoehorn it, but I think that defeats the purpose. Here is my short list of JavaScript resources:

JavaScript: The Good Parts – Definitely a requirement. Chapters 4 and 5 (functions, objects, and inheritance) are probably the most important parts to understand well for those with an OO background.
Learning Javascript with Object Graphs (part 2, and part 3) – howtonode.org has lots of good Node material, but this 3 part post is a good resource to top off your knowledge from the book.
Simple “Class” Instantiation – Another good post I read recently. Worth digesting.

Learning Node.js

With a good JavaScript background, starting to use Node.js is pretty straightforward. The main part to understand is the asynchronous nature of the I/O and the need to produce and consume events to get stuff done.
Here is a list of resources I used to get up to speed: DailyJS’s Node Tutorial - A multipart tutorial for Node on DailyJS’s blog. It’s a great resource and worth going through all the posts. Mixu’s Node Book - Not complete, but still worth it. I look forward to reading the future chapters. Node Beginner Book – A good starter read. How To Node – A blog dedicated to Node.js. Bookmark it. I felt that going through these was more than enough to give me the push I needed get started. Hopefully it does for you too (thanks to the authors for taking the time to share their knowledge!). Frameworks? Java is all about open source frameworks. That’s part of why it’s so popular. While Node.js is much newer, lots of people have already done quite a bit of heavy-lifting and have shared their code with the world. Below is what I think is a good mapping of popular Java frameworks to their Node.js equivalents (from what I know so far). Web MVC In Java land, most people are familiar with Web MVC frameworks like Spring MVC, Struts, Wicket, and JSF. More recently though, the trend towards client-side JS MVC frameworks like Ember.js (SproutCore) and Backbone.js reduces the required feature-set of some of these frameworks. Nonetheless, a good comparable Node.js web framework is Express. In a sense, it’s even more than a web framework because it also provides most of the “web server” functionality most Java developers are used to through Tomcat, Jetty, etc (more specifically, Express builds on top of Connect) It’s well thought out and provides the feature set needed to get things done. Application Lifecycle Framework (DI Framework) Spring is a popular framework in Java to provide a great deal of glue and abstraction functionality. It allows easy dependency injection, testing, object lifecycle management, transaction management, etc. It’s usually one of the first things I slap into a new project in Java. In Node.js… I haven’t really missed it. 
JavaScript allows much more flexible ways of decoupling dependencies and I personally haven’t felt the need to find a replacement for Spring. Maybe that’s a good thing?

Object-Relational Mapping (ORMs)

I have mixed feelings regarding ORMs in general, but they do make your life easier sometimes. There is no shortage of ORM libraries for Node.js. Take your pick.

Package Management Tools

Maven is probably the most popular build management tool for Java. While it is very flexible and powerful with a wide variety of plug-ins, it can get very cumbersome. Npm is the mainstream package manager for Node.js. It’s light, fast, and useful.

Testing Framework

Java has lots of these for sure, with jUnit and company as the standard. There are also mock libraries, stub libraries, db test libraries, etc. Node.js has quite a few as well. Pick your poison. I see that nodeunit is popular and is similar to jUnit. I’m personally testing Mocha. Testing tools are a more personal and subjective choice, but the good thing is that there definitely are good choices out there.

Logging

Java developers have quite a list of choices when choosing a logger library. Commons logging, log4j, logback, and slf4j (wrapper) are some of the more popular ones. Node.js also has a few. I’m currently using winston and have no complaints so far. It has the logging levels, multiple transports (appenders to log4j people), and does it all asynchronously as well.

Hopefully this will help someone save some time when peeking into the world of Node. Good luck!

Source: http://n0tw0rthy.wordpress.com/2012/01/08/from-java-to-node-js/

January 10, 2012

by Chris Smith

· 28,760 Views · 2 Likes

Object-oriented Clojure

Clojure is a LISP dialect, and as such a functional language based on a large set of functions and a small set of data structures that they operate on. However, it is possible to implement classes and objects in Clojure, with constructs well-supported in the language itself. I do not claim you should program in Clojure only with the techniques described in this article: they are just an attempt to bridge Java libraries with Clojure, and to introduce objects and interfaces where needed.

Java interoperability

Even from the Clojure REPL, you can easily instantiate Java classes as long as they are in the classpath (so use lein if they are in a library):

    (def ford (Car. arg1 arg2))

Since classes usually reside in packages, in real code you would use their fully qualified name:

    (def ford (com.dzone.example.Car. arg1 arg2))

You can also call methods:

    (.brake ford arg1 arg2)

The first argument of the evaluation of a .methodName is always the object, while additional arguments are placed after it, in the same s-expression (a fancy name for a list that can be evaluated). One of the first exercises in the book The Joy of Clojure is an instantiation of a java.awt.Frame object and a rendering done on it, all from the Clojure interactive interpreter. Exploratory testing of Java classes and objects from the REPL is as fast as writing code in a JUnit test.

Define your own: interfaces

defprotocol takes an identifier as its first argument, followed by a variable number of parameters. Each of these parameters is a method definition like the ones you would write with fn, but without the fn keyword itself and without a body to evaluate. The first argument of the methods should always be this, representing the current object (remember Python?).
(defprotocol Position (description [this]) (translateX [this dx]) (translateY [this dy]) (doubleCoords [this]) (average [this another])) Define your own: classes Defining classes in Clojure is simple, at least when they implement immutable Value Objects. defrecord takes a class name, a list of parameters for the constructor, an optional protocol name and a variable number of arguments representing the methods. Again, the methods definition are similar to the ones made with fn, but this time with a body, an s-expression to evaluate. There is a catch: you can only implement methods defined in a protocol. (defrecord CartesianPoint [x y] Position (description [this] (str "(x=" x ", y=" y ")")) (translateX [this dx] (CartesianPoint. (+ x dx) y)) (translateY [this dy] (CartesianPoint. x (+ y dy))) (doubleCoords [this] (CartesianPoint. (* 2 x) (* 2 y))) (average [this another] (let [mean (fn [a b] (/ (+ a b) 2))] (CartesianPoint. (mean x (:x another)) (mean y (:y another)))))) The method access the (immutable) fields of the current object with their names, so this is not probably necessary if not for calling other methods on the object. Note that records are not strictly objects in the OO sense, since they do not encapsulate their fields: fields can be accessed anywhere with (:fieldname object). Here's how to work with CartesianPoint objects: (deftest test-point-x-field (is (= 1 (:x (CartesianPoint. 1 2))))) (deftest test-point-y-field (is (= 2 (:y (CartesianPoint. 1 2))))) (deftest test-point-to-string (is (= "(x=1, y=2)" (description (CartesianPoint. 1 2))))) (deftest test-point-translation-x (is (= (CartesianPoint. 11 2) (translateX (CartesianPoint. 1 2) 10)))) (deftest test-point-translation-y (is (= (CartesianPoint. 1 12) (translateY (CartesianPoint. 1 2) 10)))) (deftest test-point-doubling (is (= (CartesianPoint. 2 4) (doubleCoords (CartesianPoint. 1 2))))) (deftest test-points-average (is (= (CartesianPoint. 3 4) (average (CartesianPoint. 1 2) (CartesianPoint. 
5 6)))))
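For readers coming from Java, the CartesianPoint record corresponds roughly to an immutable value class like the one below. This is an illustrative analogue written for this comparison, not code generated by Clojure or taken from the article. One deliberate difference is called out in a comment: Java integer division truncates, whereas Clojure's `/` would return a ratio for odd sums.

```java
// Rough Java analogue of the CartesianPoint record:
// immutable fields, methods that return new instances, value-based equality.
final class CartesianPoint {
    final int x;
    final int y;

    CartesianPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    String description() {
        return "(x=" + x + ", y=" + y + ")";
    }

    CartesianPoint translateX(int dx) {
        return new CartesianPoint(x + dx, y);
    }

    CartesianPoint translateY(int dy) {
        return new CartesianPoint(x, y + dy);
    }

    CartesianPoint doubleCoords() {
        return new CartesianPoint(2 * x, 2 * y);
    }

    // Integer division truncates here; Clojure's / would yield a ratio instead.
    CartesianPoint average(CartesianPoint another) {
        return new CartesianPoint((x + another.x) / 2, (y + another.y) / 2);
    }

    // Records compare by value in Clojure, so equality is defined on the fields.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof CartesianPoint)) {
            return false;
        }
        CartesianPoint that = (CartesianPoint) other;
        return x == that.x && y == that.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}
```

The amount of boilerplate needed for equality and immutability is the main contrast with the defrecord form, which provides both without extra code.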

January 10, 2012

by Giorgio Sironi

· 13,758 Views · 2 Likes

Searching relational content with Lucene's BlockJoinQuery

Lucene's 3.4.0 release adds a new feature called index-time join (also sometimes called sub-documents, nested documents or parent/child documents), enabling efficient indexing and searching of certain types of relational content. Most search engines can't directly index relational content, as documents in the index logically behave like a single flat database table. Yet, relational content is everywhere! A job listing site has each company joined to the specific listings for that company. Each resume might have a separate list of skills, education and past work experience. A music search engine has an artist/band joined to albums and then joined to songs. A source code search engine would have projects joined to modules and then files. Perhaps the PDF documents you need to search are immense, so you break them up and index each section as a separate Lucene document; in this case you'll have common fields (title, abstract, author, date published, etc.) for the overall document, joined to the sub-document (section) with its own fields (text, page number, etc.). XML documents typically contain nested tags, representing joined sub-documents; emails have attachments; office documents can embed other documents. Nearly all search domains have some form of relational content, often requiring more than one join. If such content is so common then how do search applications handle it today? One obvious "solution" is to simply use a relational database instead of a search engine! If relevance scores are less important and you need to do substantial joining, grouping, sorting, etc., then using a database could be best overall. Most databases include some form of text search, some even using Lucene. If you still want to use a search engine, then one common approach is to denormalize the content up front, at index-time, by joining all tables and indexing the resulting rows, duplicating content in the process.
For example, you'd index each song as a Lucene document, copying over all fields from the song's joined album and artist/band. This works correctly, but can be horribly wasteful as you are indexing identical fields, possibly including large text fields, over and over. Another approach is to do the join yourself, outside of Lucene, by indexing songs, albums and artist/band as separate Lucene documents, perhaps even in separate indices. At search-time, you first run a query against one collection, for example the songs. Then you iterate through all hits, gathering up (joining) the full set of corresponding albums and then run a second query against the albums, with a large OR'd list of the albums from the first query, repeating this process if you need to join to artist/band as well. This approach will also work, but doesn't scale well as you may have to create possibly immense follow-on queries. Yet another approach is to use a software package that has already implemented one of these approaches for you! elasticsearch, Apache Solr, Apache Jackrabbit, Hibernate Search and many others all handle relational content in some way. With BlockJoinQuery you can now directly search relational content yourself! Let's work through a simple example: imagine you sell shirts online. Each shirt has certain common fields such as name, description, fabric, price, etc. For each shirt you have a number of separate stock keeping units or SKUs, which have their own fields like size, color, inventory count, etc. The SKUs are what you actually sell, and what you must stock, because when someone buys a shirt they buy a specific SKU (size and color). 
Maybe you are lucky enough to sell the incredible Mountain Three-wolf Moon Short Sleeve Tee, with these SKUs (size, color): small, blue small, black medium, black large, gray Perhaps a user first searches for "wolf shirt", gets a bunch of hits, and then drills down on a particular size and color, resulting in this query: name:wolf AND size=small AND color=blue which should match this shirt. name is a shirt field while the size and color are SKU fields. But if the user drills down instead on a small gray shirt: name:wolf AND size=small AND color=gray then this shirt should not match because the small size only comes in blue and black. How can you run these queries using BlockJoinQuery? Start by indexing each shirt (parent) and all of its SKUs (children) as separate documents, using the new IndexWriter.addDocuments API to add one shirt and all of its SKUs as a single document block. This method atomically adds a block of documents into a single segment as adjacent document IDs, which BlockJoinQuery relies on. You should also add a marker field to each shirt document (e.g. type = shirt), as BlockJoinQuery requires a Filter identifying the parent documents. To run a BlockJoinQuery at search-time, you'll first need to create the parent filter, matching only shirts. Note that the filter must use FixedBitSet under the hood, like CachingWrapperFilter: Filter shirts = new CachingWrapperFilter( new QueryWrapperFilter( new TermQuery( new Term("type", "shirt")))); Create this filter once, up front and re-use it any time you need to perform this join. 
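Why adjacent document IDs matter can be shown without Lucene at all. The sketch below assumes, as I understand Lucene's block-join convention, that each block is indexed with the children first and the parent last, so the parent of any child is the next set bit in the parent bitmap. The class `BlockJoinSketch` and the document IDs are hypothetical, and `java.util.BitSet` merely stands in for the bit set behind the parent filter.

```java
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch of the doc-ID arithmetic behind an index-time (block) join.
// Assumption: within each block the children come first and the parent is last.
class BlockJoinSketch {
    // Maps matching child doc IDs up to their parent doc IDs.
    // `parents` has one bit set per parent document (the "parent filter").
    static Set<Integer> joinToParents(int[] childHits, BitSet parents) {
        Set<Integer> parentHits = new LinkedHashSet<>();
        for (int child : childHits) {
            // The parent is the first parent document at or after the child.
            int parent = parents.nextSetBit(child);
            if (parent >= 0) {
                parentHits.add(parent);
            }
        }
        return parentHits;
    }
}
```

As a usage example, suppose docs 0 to 3 are the SKUs of one shirt stored as doc 4, and docs 5 and 6 are the SKUs of a second shirt stored as doc 7. With parent bits {4, 7}, SKU hits {1, 2} join to shirt 4 only, while SKU hits {0, 6} join to shirts 4 and 7. Because the lookup is a simple bit scan, the join needs no per-query lookup tables, which is also why any change inside a block forces the whole block to be re-indexed.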
Then, for each query that requires a join, because it involves both SKU and shirt fields, start with the child query matching only SKU fields: BooleanQuery skuQuery = new BooleanQuery(); skuQuery.add(new TermQuery(new Term("size", "small")), Occur.MUST); skuQuery.add(new TermQuery(new Term("color", "blue")), Occur.MUST); Next, use BlockJoinQuery to translate hits from the SKU document space up to the shirt document space: BlockJoinQuery skuJoinQuery = new BlockJoinQuery( skuQuery, shirts, ScoreMode.None); The ScoreMode enum decides how scores for multiple SKU hits should be aggregated to the score for the corresponding shirt hit. In this query you don't need scores from the SKU matches, but if you did you can aggregate with Avg, Max or Total instead. Finally you are now free to build up an arbitrary shirt query using skuJoinQuery as a clause: BooleanQuery query = new BooleanQuery(); query.add(new TermQuery(new Term("name", "wolf")), Occur.MUST); query.add(skuJoinQuery, Occur.MUST); You could also just run skuJoinQuery as-is if the query doesn't have any shirt fields. Finally, just run this query like normal! The returned hits will be only shirt documents; if you'd also like to see which SKUs matched for each shirt, use BlockJoinCollector: BlockJoinCollector c = new BlockJoinCollector( Sort.RELEVANCE, // sort 10, // numHits true, // trackScores false // trackMaxScore ); searcher.search(query, c); The provided Sort must use only shirt fields (you cannot sort by any SKU fields). When each hit (a shirt) is competitive, this collector will also record all SKUs that matched for that shirt, which you can retrieve like this: TopGroups hits = c.getTopGroups( skuJoinQuery, skuSort, 0, // offset 10, // maxDocsPerGroup 0, // withinGroupOffset true // fillSortFields ); Set skuSort to the sort order for the SKUs within each shirt. The first offset hits are skipped (use this for paging through shirt hits). Under each shirt, at most maxDocsPerGroup SKUs will be returned. 
Use withinGroupOffset if you want to page within the SKUs. If fillSortFields is true then each SKU hit will have values for the fields from skuSort. The hits returned by BlockJoinCollector.getTopGroups are SKU hits, grouped by shirt. You'd get the exact same results if you had denormalized up-front and then used grouping to group results by shirt. You can also do more than one join in a single query; the joins can be nested (parent to child to grandchild) or parallel (parent to child1 and parent to child2). However, there are some important limitations of index-time joins: The join must be computed at index-time and "compiled" into the index, in that all joined child documents must be indexed along with the parent document, as a single document block. Different document types (for example, shirts and SKUs) must share a single index, which is wasteful as it means non-sparse data structures like FieldCache entries consume more memory than they would if you had separate indices. If you need to re-index a parent document or any of its child documents, or delete or add a child, then the entire block must be re-indexed. This is a big problem in some cases, for example if you index "user reviews" as child documents then whenever a user adds a review you'll have to re-index that shirt as well as all its SKUs and user reviews. There is no QueryParser support, so you need to programmatically create the parent and child queries, separating according to parent and child fields. The join can currently only go in one direction (mapping child docIDs to parent docIDs), but in some cases you need to map parent docIDs to child docIDs. For example, when searching songs, perhaps you want all matching songs sorted by their title. You can't easily do this today because the only way to get song hits is to group by album or band/artist. The join is a one (parent) to many (children), inner join. As usual, patches are welcome! 
There is work underway to create a more flexible, but likely less performant, query-time join capability, which should address a number of the above limitations.

Source: http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html

January 9, 2012

by Michael McCandless

· 14,587 Views

Java Sequential IO Performance

Many applications record a series of events to file-based storage for later use. This can be anything from logging and auditing, through to keeping a transaction redo log in an event sourced design or its close relative CQRS. Java has a number of means by which a file can be sequentially written to, or read back again. This article explores some of these mechanisms to understand their performance characteristics.

For the scope of this article I will be using pre-allocated files because I want to focus on performance. Constantly extending a file imposes a significant performance overhead and adds jitter to an application, resulting in highly variable latency.

"Why does a pre-allocated file perform better?", I hear you ask. Well, on disk a file is made up of a series of blocks/pages containing the data. Firstly, it is important that these blocks are contiguous to provide fast sequential access. Secondly, meta-data must be allocated to describe this file on disk and saved within the file-system. A typical large file will have a number of "indirect" blocks allocated to describe the chain of data-blocks containing the file contents that make up part of this meta-data. I'll leave it as an exercise for the reader, or maybe a later article, to explore the performance impact of not preallocating the data files. If you have used a database you may have noticed that it preallocates the files it will require.

The Test

I want to experiment with 2 file sizes. One that is sufficiently large to test sequential access, but can easily fit in the file-system cache, and another that is much larger so that the cache subsystem is forced to retire pages so that new ones can be loaded. For these two cases I'll use 400MB and 8GB respectively. I'll also loop over the files a number of times to show the pre and post warm-up characteristics.

I'll test 4 means of writing and reading back files sequentially:

1. RandomAccessFile using a vanilla byte[] of page size.
2. Buffered FileInputStream and FileOutputStream.
3. NIO FileChannel with ByteBuffer of page size.
4. Memory mapping a file using NIO and direct MappedByteBuffer.

The tests are run on a 2.0GHz Sandy Bridge CPU with 8GB RAM, an Intel 230 SSD on Fedora Core 15 64-bit Linux with an ext4 file system, and Oracle JDK 1.6.0_30.

The Code

    import java.io.*;
    import java.nio.ByteBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    import static java.lang.Integer.MAX_VALUE;
    import static java.lang.System.out;
    import static java.nio.channels.FileChannel.MapMode.READ_ONLY;
    import static java.nio.channels.FileChannel.MapMode.READ_WRITE;

    public final class TestSequentialIoPerf {
        public static final int PAGE_SIZE = 1024 * 4;
        // 8,192,000,000 bytes: the 8GB case
        public static final long FILE_SIZE = PAGE_SIZE * 2000L * 1000L;
        public static final String FILE_NAME = "test.dat";
        public static final byte[] BLANK_PAGE = new byte[PAGE_SIZE];

        public static void main(final String[] arg) throws Exception {
            preallocateTestFile(FILE_NAME);

            for (final PerfTestCase testCase : testCases) {
                // Repeat each case to show pre and post warm-up behaviour.
                for (int i = 0; i < 5; i++) {
                    System.gc();
                    long writeDurationMs = testCase.test(PerfTestCase.Type.WRITE, FILE_NAME);

                    System.gc();
                    long readDurationMs = testCase.test(PerfTestCase.Type.READ, FILE_NAME);

                    long bytesReadPerSec = (FILE_SIZE * 1000L) / readDurationMs;
                    long bytesWrittenPerSec = (FILE_SIZE * 1000L) / writeDurationMs;

                    out.format("%s\twrite=%,d\tread=%,d bytes/sec\n",
                               testCase.getName(), bytesWrittenPerSec, bytesReadPerSec);
                }
            }

            deleteFile(FILE_NAME);
        }

        private static void preallocateTestFile(final String fileName) throws Exception {
            RandomAccessFile file = new RandomAccessFile(fileName, "rw");

            for (long i = 0; i < FILE_SIZE; i += PAGE_SIZE) {
                file.write(BLANK_PAGE, 0, PAGE_SIZE);
            }

            file.close();
        }

        private static void deleteFile(final String testFileName) throws Exception {
            File file = new File(testFileName);
            if (!file.delete()) {
                out.println("Failed to delete test file=" + testFileName);
                out.println("Windows does not allow mapped files to be deleted.");
            }
        }

        public abstract static class PerfTestCase {
            public enum Type { READ, WRITE }

            private final String name;
            private int checkSum;

            public PerfTestCase(final String name) {
                this.name = name;
            }

            public String getName() {
                return name;
            }

            // Runs one read or write pass and returns its duration in milliseconds.
            public long test(final Type type, final String fileName) {
                long start = System.currentTimeMillis();

                try {
                    switch (type) {
                        case WRITE: {
                            checkSum = testWrite(fileName);
                            break;
                        }

                        case READ: {
                            // The read must reproduce the checksum of the preceding write.
                            final int checkSum = testRead(fileName);
                            if (checkSum != this.checkSum) {
                                final String msg = getName() +
                                    " expected=" + this.checkSum +
                                    " got=" + checkSum;
                                throw new IllegalStateException(msg);
                            }
                            break;
                        }
                    }
                } catch (Exception ex) {
                    ex.printStackTrace();
                }

                return System.currentTimeMillis() - start;
            }

            public abstract int testWrite(final String fileName) throws Exception;
            public abstract int testRead(final String fileName) throws Exception;
        }

        private static PerfTestCase[] testCases = {
            new PerfTestCase("RandomAccessFile") {
                public int testWrite(final String fileName) throws Exception {
                    RandomAccessFile file = new RandomAccessFile(fileName, "rw");
                    final byte[] buffer = new byte[PAGE_SIZE];
                    int pos = 0;
                    int checkSum = 0;

                    for (long i = 0; i < FILE_SIZE; i++) {
                        byte b = (byte)i;
                        checkSum += b;
                        buffer[pos++] = b;

                        if (PAGE_SIZE == pos) {
                            file.write(buffer, 0, PAGE_SIZE);
                            pos = 0;
                        }
                    }

                    file.close();
                    return checkSum;
                }

                public int testRead(final String fileName) throws Exception {
                    RandomAccessFile file = new RandomAccessFile(fileName, "r");
                    final byte[] buffer = new byte[PAGE_SIZE];
                    int checkSum = 0;
                    int bytesRead;

                    while (-1 != (bytesRead = file.read(buffer))) {
                        for (int i = 0; i < bytesRead; i++) {
                            checkSum += buffer[i];
                        }
                    }

                    file.close();
                    return checkSum;
                }
            },

            new PerfTestCase("BufferedStreamFile") {
                public int testWrite(final String fileName) throws Exception {
                    int checkSum = 0;
                    OutputStream out = new BufferedOutputStream(new FileOutputStream(fileName));

                    for (long i = 0; i < FILE_SIZE; i++) {
                        byte b = (byte)i;
                        checkSum += b;
                        out.write(b);
                    }

                    out.close();
                    return checkSum;
                }

                public int testRead(final String fileName) throws Exception {
                    int checkSum = 0;
                    InputStream in = new BufferedInputStream(new FileInputStream(fileName));
                    int b;

                    while (-1 != (b = in.read())) {
                        checkSum += (byte)b;
                    }

                    in.close();
                    return checkSum;
                }
            },

            new PerfTestCase("BufferedChannelFile") {
                public int testWrite(final String fileName) throws Exception {
                    FileChannel channel = new RandomAccessFile(fileName, "rw").getChannel();
                    ByteBuffer buffer = ByteBuffer.allocate(PAGE_SIZE);
                    int checkSum = 0;

                    for (long i = 0; i < FILE_SIZE; i++) {
                        byte b = (byte)i;
                        checkSum += b;
                        buffer.put(b);

                        if (!buffer.hasRemaining()) {
                            // Flip before writing: without it position == limit
                            // and the channel would write zero bytes.
                            buffer.flip();
                            channel.write(buffer);
                            buffer.clear();
                        }
                    }

                    channel.close();
                    return checkSum;
                }

                public int testRead(final String fileName) throws Exception {
                    FileChannel channel = new RandomAccessFile(fileName, "rw").getChannel();
                    ByteBuffer buffer = ByteBuffer.allocate(PAGE_SIZE);
                    int checkSum = 0;

                    while (-1 != (channel.read(buffer))) {
                        buffer.flip();

                        while (buffer.hasRemaining()) {
                            checkSum += buffer.get();
                        }

                        buffer.clear();
                    }

                    channel.close();
                    return checkSum;
                }
            },

            new PerfTestCase("MemoryMappedFile") {
                public int testWrite(final String fileName) throws Exception {
                    FileChannel channel = new RandomAccessFile(fileName, "rw").getChannel();
                    MappedByteBuffer buffer =
                        channel.map(READ_WRITE, 0, Math.min(channel.size(), MAX_VALUE));
                    int checkSum = 0;

                    for (long i = 0; i < FILE_SIZE; i++) {
                        // A single mapping is limited to Integer.MAX_VALUE bytes,
                        // so map the next window once this one is exhausted.
                        if (!buffer.hasRemaining()) {
                            buffer = channel.map(READ_WRITE, i,
                                                 Math.min(channel.size() - i, MAX_VALUE));
                        }

                        byte b = (byte)i;
                        checkSum += b;
                        buffer.put(b);
                    }

                    channel.close();
                    return checkSum;
                }

                public int testRead(final String fileName) throws Exception {
                    FileChannel channel = new RandomAccessFile(fileName, "rw").getChannel();
                    MappedByteBuffer buffer =
                        channel.map(READ_ONLY, 0, Math.min(channel.size(), MAX_VALUE));
                    int checkSum = 0;

                    for (long i = 0; i < FILE_SIZE; i++) {
                        if (!buffer.hasRemaining()) {
                            // Reading only, so remap the next window as READ_ONLY too.
                            buffer = channel.map(READ_ONLY, i,
                                                 Math.min(channel.size() - i, MAX_VALUE));
                        }

                        checkSum += buffer.get();
                    }

                    channel.close();
                    return checkSum;
                }
            },
        };
    }

Results

    400MB file
    ===========
    RandomAccessFile     write=379,610,750  read=1,452,482,269 bytes/sec
    RandomAccessFile     write=294,041,636  read=1,494,890,510 bytes/sec
    RandomAccessFile     write=250,980,392  read=1,422,222,222 bytes/sec
    RandomAccessFile     write=250,366,748  read=1,388,474,576 bytes/sec
    RandomAccessFile     write=260,394,151  read=1,422,222,222 bytes/sec

    BufferedStreamFile   write=98,178,331   read=286,433,566 bytes/sec
    BufferedStreamFile   write=100,244,738  read=288,857,545 bytes/sec
    BufferedStreamFile   write=82,948,562   read=154,100,827 bytes/sec
    BufferedStreamFile   write=108,503,311  read=153,869,271 bytes/sec
    BufferedStreamFile   write=113,055,478  read=152,608,047 bytes/sec

    BufferedChannelFile  write=388,246,445  read=358,041,958 bytes/sec
    BufferedChannelFile  write=390,467,111  read=375,091,575 bytes/sec
    BufferedChannelFile  write=321,759,622  read=1,539,849,624 bytes/sec
    BufferedChannelFile  write=318,259,518  read=1,539,849,624 bytes/sec
    BufferedChannelFile  write=322,265,932  read=1,534,082,397 bytes/sec

    MemoryMappedFile     write=300,955,180  read=305,899,925 bytes/sec
    MemoryMappedFile     write=313,149,847  read=310,538,286 bytes/sec
    MemoryMappedFile     write=326,374,501  read=303,857,566 bytes/sec
    MemoryMappedFile     write=327,680,000  read=304,535,315 bytes/sec
    MemoryMappedFile     write=326,895,450  read=303,632,320 bytes/sec

    8GB File
    ============
    RandomAccessFile     write=167,402,321  read=251,922,012 bytes/sec
    RandomAccessFile     write=193,934,802  read=257,052,307 bytes/sec
    RandomAccessFile     write=192,948,159  read=248,460,768 bytes/sec
    RandomAccessFile     write=191,814,180  read=245,225,408 bytes/sec
    RandomAccessFile     write=190,635,762  read=275,315,073 bytes/sec

    BufferedStreamFile   write=154,823,102  read=248,355,313 bytes/sec
    BufferedStreamFile   write=152,083,913  read=253,418,301 bytes/sec
    BufferedStreamFile   write=133,099,369  read=146,056,197 bytes/sec
    BufferedStreamFile   write=131,065,708  read=146,217,827 bytes/sec
    BufferedStreamFile   write=132,694,052  read=148,116,004 bytes/sec

    BufferedChannelFile  write=406,147,744  read=304,693,892 bytes/sec
    BufferedChannelFile  write=397,457,668  read=298,183,671 bytes/sec
    BufferedChannelFile  write=364,672,364  read=414,281,379 bytes/sec
    BufferedChannelFile  write=371,266,711  read=404,343,534 bytes/sec
    BufferedChannelFile  write=373,705,579  read=406,934,578 bytes/sec

    MemoryMappedFile     write=123,023,322  read=231,530,156 bytes/sec
    MemoryMappedFile     write=121,961,023  read=230,403,600 bytes/sec
    MemoryMappedFile     write=123,317,778  read=229,899,250 bytes/sec
    MemoryMappedFile     write=121,472,738  read=231,739,745 bytes/sec
    MemoryMappedFile     write=120,362,615  read=231,190,382 bytes/sec

Analysis

For years I was a big fan of using RandomAccessFile directly because of the control it gives and the predictable execution. I never found using buffered streams to be useful from a performance perspective and this still seems to be the case.

In more recent testing I've found that NIO FileChannel and ByteBuffer are the clear winners from a performance perspective. With Java 7 the flexibility of this programming approach has been improved for random access with SeekableByteChannel.

I've seen these results vary greatly depending on platform. File system, OS, storage devices, and available memory all have a significant impact. In a few cases I've seen memory-mapped files perform significantly better than the others but this needs to be tested on your platform because your mileage may vary...

A special note should be made for the use of memory-mapped large files when pushing for maximum throughput. I've often found the OS can become unresponsive due to the pressure put on the virtual memory sub-system.

Conclusion

There is a significant difference in performance for the different means of doing sequential file IO from Java. Not all methods are even remotely equal. For most IO I've found the use of ByteBuffers and Channels to be the best optimised parts of the IO libraries. If buffered streams are your IO libraries of choice, then it is worth branching out and getting familiar with the sub-classes of Channel and Buffer.

From http://mechanical-sympathy.blogspot.com/2011/12/java-sequential-io-performance.html
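As a pointer for readers who want to try the SeekableByteChannel mentioned in the analysis, the following is a minimal sketch of random access over fixed-size records on Java 7 or later. It is not part of the benchmark and makes no performance claim. The class name SeekableChannelSketch, the 8-byte RECORD_SIZE, and the helpers writeRecords and readRecord are hypothetical and exist only for this illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: fixed-size records addressed by index through a SeekableByteChannel.
final class SeekableChannelSketch {
    // Hypothetical record size: each record holds a single long.
    static final int RECORD_SIZE = 8;

    /** Sequentially writes count records, each containing its own index. */
    static void writeRecords(final Path path, final int count) throws IOException {
        try (SeekableByteChannel channel = Files.newByteChannel(
                path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            final ByteBuffer buffer = ByteBuffer.allocate(RECORD_SIZE);
            for (long i = 0; i < count; i++) {
                buffer.clear();
                buffer.putLong(i);
                buffer.flip(); // switch the buffer from filling to draining
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
            }
        }
    }

    /** Seeks directly to the record at index and reads it back. */
    static long readRecord(final Path path, final long index) throws IOException {
        try (SeekableByteChannel channel =
                 Files.newByteChannel(path, StandardOpenOption.READ)) {
            channel.position(index * RECORD_SIZE);
            final ByteBuffer buffer = ByteBuffer.allocate(RECORD_SIZE);
            while (buffer.hasRemaining()) {
                if (channel.read(buffer) == -1) {
                    throw new EOFException("No complete record at index " + index);
                }
            }
            buffer.flip();
            return buffer.getLong();
        }
    }

    public static void main(final String[] args) throws IOException {
        final Path path = Files.createTempFile("seekable-sketch", ".dat");
        try {
            writeRecords(path, 100);
            System.out.println("record 42 = " + readRecord(path, 42));
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

The same channel type is returned for other file system providers, so code written against SeekableByteChannel is not tied to RandomAccessFile.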

January 9, 2012

by Martin Thompson

· 19,091 Views · 2 Likes