The Latest Java Topics

Java 8 Friday: 10 Subtle Mistakes When Using the Streams API
at data geekery , we love java. and as we’re really into jooq’s fluent api and query dsl , we’re absolutely thrilled about what java 8 will bring to our ecosystem. java 8 friday every friday, we’re showing you a couple of nice new tutorial-style java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. you’ll find the source code on github . 10 subtle mistakes when using the streams api we’ve done all the sql mistakes lists: 10 common mistakes java developers make when writing sql 10 more common mistakes java developers make when writing sql yet another 10 common mistakes java developers make when writing sql (you won’t believe the last one) but we haven’t done a top 10 mistakes list with java 8 yet! for today’s occasion ( it’s friday the 13th ), we’ll catch up with what will go wrong in your application when you’re working with java 8. (it won’t happen to us, as we’re stuck with java 6 for another while) 1. accidentally reusing streams wanna bet, this will happen to everyone at least once. like the existing “streams” (e.g. inputstream ), you can consume streams only once. the following code won’t work: intstream stream = intstream.of(1, 2); stream.foreach(system.out::println); // that was fun! let's do it again! stream.foreach(system.out::println); you’ll get a java.lang.illegalstateexception: stream has already been operated upon or closed so be careful when consuming your stream. it can be done only once 2. accidentally creating “infinite” streams you can create infinite streams quite easily without noticing. take the following example: // will run indefinitely intstream.iterate(0, i -> i + 1) .foreach(system.out::println); the whole point of streams is the fact that they can be infinite, if you design them to be. the only problem is, that you might not have wanted that. so, be sure to always put proper limits: // that's better intstream.iterate(0, i -> i + 1) .limit(10) .foreach(system.out::println); 3. accidentally creating “subtle” infinite streams we can’t say this enough. you will eventually create an infinite stream, accidentally. take the following stream, for instance: intstream.iterate(0, i -> ( i + 1) % 2) .distinct() .limit(10) .foreach(system.out::println); so… we generate alternating 0′s and 1′s then we keep only distinct values, i.e. a single 0 and a single 1 then we limit the stream to a size of 10 then we consume it well… the distinct() operation doesn’t know that the function supplied to the iterate() method will produce only two distinct values. it might expect more than that. so it’ll forever consume new values from the stream, and the limit(10) will never be reached. tough luck, your application stalls. 4. accidentally creating “subtle” parallel infinite streams we really need to insist that you might accidentally try to consume an infinite stream. let’s assume you believe that the distinct() operation should be performed in parallel. you might be writing this: intstream.iterate(0, i -> ( i + 1) % 2) .parallel() .distinct() .limit(10) .foreach(system.out::println); now, we’ve already seen that this will turn forever. but previously, at least, you only consumed one cpu on your machine. now, you’ll probably consume four of them, potentially occupying pretty much all of your system with an accidental infinite stream consumption. that’s pretty bad. you can probably hard-reboot your server / development machine after that. 
have a last look at what my laptop looked like prior to exploding: if i were a laptop, this is how i’d like to go. 5. mixing up the order of operations so, why did we insist on your definitely accidentally creating infinite streams? it’s simple. because you may just accidentally do it. the above stream can be perfectly consumed if you switch the order of limit() and distinct() : intstream.iterate(0, i -> ( i + 1) % 2) .limit(10) .distinct() .foreach(system.out::println); this now yields: 0 1 why? because we first limit the infinite stream to 10 values (0 1 0 1 0 1 0 1 0 1), before we reduce the limited stream to the distinct values contained in it (0 1). of course, this may no longer be semantically correct, because you really wanted the first 10 distinct values from a set of data (you just happened to have “forgotten” that the data is infinite). no one really wants 10 random values, and only then reduce them to be distinct. if you’re coming from a sql background, you might not expect such differences. take sql server 2012, for instance. the following two sql statements are the same: -- using top selectdistincttop10 * fromi orderby.. -- using fetch select* fromi orderby.. offset 0 rows fetchnext10 rowsonly so, as a sql person, you might not be as aware of the importance of the order of streams operations. 6. mixing up the order of operations (again) speaking of sql, if you’re a mysql or postgresql person, you might be used to the limit .. offset clause. sql is full of subtle quirks, and this is one of them. the offset clause is applied first , as suggested in sql server 2012′s (i.e. the sql:2008 standard’s) syntax. if you translate mysql / postgresql’s dialect directly to streams, you’ll probably get it wrong: intstream.iterate(0, i -> i + 1) .limit(10) // limit .skip(5) // offset .foreach(system.out::println); the above yields 5 6 7 8 9 yes. it doesn’t continue after 9 , because the limit() is now applied first , producing (0 1 2 3 4 5 6 7 8 9). skip() is applied after, reducing the stream to (5 6 7 8 9). not what you may have intended. beware of the limit .. offset vs. "offset .. limit" trap! 7. walking the file system with filters we’ve blogged about this before . what appears to be a good idea is to walk the file system using filters: files.walk(paths.get(".")) .filter(p -> !p.tofile().getname().startswith(".")) .foreach(system.out::println); the above stream appears to be walking only through non-hidden directories, i.e. directories that do not start with a dot. unfortunately, you’ve again made mistake #5 and #6. walk() has already produced the whole stream of subdirectories of the current directory. lazily, though, but logically containing all sub-paths. now, the filter will correctly filter out paths whose names start with a dot “.”. e.g. .git or .idea will not be part of the resulting stream. but these paths will be: .\.git\refs , or .\.idea\libraries . not what you intended. now, don’t fix this by writing the following: files.walk(paths.get(".")) .filter(p -> !p.tostring().contains(file.separator + ".")) .foreach(system.out::println); while that will produce the correct output, it will still do so by traversing the complete directory subtree, recursing into all subdirectories of “hidden” directories. i guess you’ll have to resort to good old jdk 1.0 file.list() again. the good news is, filenamefilter and filefilter are both functional interfaces. 8. 
modifying the backing collection of a stream while you’re iterating a list , you must not modify that same list in the iteration body. that was true before java 8, but it might become more tricky with java 8 streams. consider the following list from 0..9: // of course, we create this list using streams: list list = intstream.range(0, 10) .boxed() .collect(tocollection(arraylist::new)); now, let’s assume that we want to remove each element while consuming it: list.stream() // remove(object), not remove(int)! .peek(list::remove) .foreach(system.out::println); interestingly enough, this will work for some of the elements! the output you might get is this one: 0 2 4 6 8 null null null null null java.util.concurrentmodificationexception if we introspect the list after catching that exception, there’s a funny finding. we’ll get: [1, 3, 5, 7, 9] heh, it “worked” for all the odd numbers. is this a bug? no, it looks like a feature. if you’re delving into the jdk code, you’ll find this comment in arraylist.arralistspliterator : /* * if arraylists were immutable, or structurally immutable (no * adds, removes, etc), we could implement their spliterators * with arrays.spliterator. instead we detect as much * interference during traversal as practical without * sacrificing much performance. we rely primarily on * modcounts. these are not guaranteed to detect concurrency * violations, and are sometimes overly conservative about * within-thread interference, but detect enough problems to * be worthwhile in practice. to carry this out, we (1) lazily * initialize fence and expectedmodcount until the latest * point that we need to commit to the state we are checking * against; thus improving precision. (this doesn't apply to * sublists, that create spliterators with current non-lazy * values). (2) we perform only a single * concurrentmodificationexception check at the end of foreach * (the most performance-sensitive method). when using foreach * (as opposed to iterators), we can normally only detect * interference after actions, not before. further * cme-triggering checks apply to all other possible * violations of assumptions for example null or too-small * elementdata array given its size(), that could only have * occurred due to interference. this allows the inner loop * of foreach to run without any further checks, and * simplifies lambda-resolution. while this does entail a * number of checks, note that in the common case of * list.stream().foreach(a), no checks or other computation * occur anywhere other than inside foreach itself. the other * less-often-used methods cannot take advantage of most of * these streamlinings. */ now, check out what happens when we tell the stream to produce sorted() results: list.stream() .sorted() .peek(list::remove) .foreach(system.out::println); this will now produce the following, “expected” output 0 1 2 3 4 5 6 7 8 9 and the list after stream consumption? it is empty: [] so, all elements are consumed, and removed correctly. the sorted() operation is a “stateful intermediate operation” , which means that subsequent operations no longer operate on the backing collection, but on an internal state. it is now “safe” to remove elements from the list! well… can we really? let’s proceed with parallel() , sorted() removal: list.stream() .sorted() .parallel() .peek(list::remove) .foreach(system.out::println); this now yields: 7 6 2 5 8 4 1 0 9 3 and the list contains [8] eek. we didn’t remove all elements!? 
free beers ( and jooq stickers ) go to anyone who solves this streams puzzler! this all appears quite random and subtle, we can only suggest that you never actually do modify a backing collection while consuming a stream. it just doesn’t work. 9. forgetting to actually consume the stream what do you think the following stream does? intstream.range(1, 5) .peek(system.out::println) .peek(i -> { if(i == 5) thrownewruntimeexception("bang"); }); when you read this, you might think that it will print (1 2 3 4 5) and then throw an exception. but that’s not correct. it won’t do anything. the stream just sits there, never having been consumed. as with any fluent api or dsl, you might actually forget to call the “terminal” operation. this might be particularly true when you use peek() , as peek() is an aweful lot similar to foreach() . this can happen with jooq just the same, when you forget to call execute() or fetch() : dsl.using(configuration) .update(table) .set(table.col1, 1) .set(table.col2, "abc") .where(table.id.eq(3)); oops. no execute() yes, the “best” way – with 1-2 caveats ;-) 10. parallel stream deadlock this is now a real goodie for the end! all concurrent systems can run into deadlocks, if you don’t properly synchronise things. while finding a real-world example isn’t obvious, finding a forced example is. the following parallel() stream is guaranteed to run into a deadlock: object[] locks = { newobject(), newobject() }; intstream .range(1, 5) .parallel() .peek(unchecked.intconsumer(i -> { synchronized(locks[i % locks.length]) { thread.sleep(100); synchronized(locks[(i + 1) % locks.length]) { thread.sleep(50); } } })) .foreach(system.out::println); note the use of unchecked.intconsumer() , which transforms the functional intconsumer interface into a org.jooq.lambda.fi.util.function.checkedintconsumer , which is allowed to throw checked exceptions. well. tough luck for your machine. those threads will be blocked forever :-) the good news is, it has never been easier to produce a schoolbook example of a deadlock in java! for more details, see also brian goetz’s answer to this question on stack overflow . conclusion with streams and functional thinking, we’ll run into a massive amount of new, subtle bugs. few of these bugs can be prevented, except through practice and staying focused. you have to think about how to order your operations. you have to think about whether your streams may be infinite. streams (and lambdas) are a very powerful tool. but a tool which we need to get a hang of, first.
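One common way around mistake #1 above (accidentally reusing a consumed stream) is to hold on to a Supplier and ask it for a fresh stream on every consumption. The following is a minimal sketch of that idea, not part of the original article; the class and variable names are made up for illustration:

```java
import java.util.function.Supplier;
import java.util.stream.IntStream;

public class StreamReuseSketch {
    public static void main(String[] args) {
        // A stream can be consumed only once, so keep a Supplier around
        // and request a brand new stream each time one is needed.
        Supplier<IntStream> numbers = () -> IntStream.of(1, 2);

        numbers.get().forEach(System.out::println); // first consumption
        numbers.get().forEach(System.out::println); // second consumption, no IllegalStateException
    }
}
```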
June 16, 2014
by Lukas Eder
· 10,347 Views · 2 Likes
Converting ListenableFutures to CompletableFutures and Back
Java 8 introduced CompletableFutures. They build on standard Futures and add completion callbacks, chaining, and other useful features. But the world did not wait for Java 8, and a lot of libraries added their own variants of ListenableFutures that serve the same purpose. Some library authors are reluctant to add support for CompletableFutures even today. It makes sense: Java 8 is quite new, and it's not easy to add support for CompletableFutures and be compatible with Java 7 at the same time. Luckily, it's easy to convert to CompletableFutures and back. Let's take Spring 4 ListenableFutures as an example. How do we convert one to a CompletableFuture?

```java
static <T> CompletableFuture<T> buildCompletableFuture(final ListenableFuture<T> listenableFuture) {
    // Create an instance of CompletableFuture
    CompletableFuture<T> completable = new CompletableFuture<T>() {
        @Override
        public boolean cancel(boolean mayInterruptIfRunning) {
            // Propagate cancel to the listenable future
            boolean result = listenableFuture.cancel(mayInterruptIfRunning);
            super.cancel(mayInterruptIfRunning);
            return result;
        }
    };

    // Add callback
    listenableFuture.addCallback(new ListenableFutureCallback<T>() {
        @Override
        public void onSuccess(T result) {
            completable.complete(result);
        }

        @Override
        public void onFailure(Throwable t) {
            completable.completeExceptionally(t);
        }
    });
    return completable;
}
```

We just create a CompletableFuture instance and add a callback to the ListenableFuture. In the callback methods we simply notify the CompletableFuture that the underlying task has finished. We can even propagate calls to the cancel method if we want to. That's all you need to convert to a CompletableFuture. What about the opposite direction? The approach is a bit different, but it's more or less straightforward as well:

```java
class ListenableCompletableFutureWrapper<T> implements ListenableFuture<T> {

    private final ListenableFutureCallbackRegistry<T> callbackRegistry =
            new ListenableFutureCallbackRegistry<>();

    private final Future<T> wrappedFuture;

    ListenableCompletableFutureWrapper(CompletableFuture<T> wrappedFuture) {
        this.wrappedFuture = wrappedFuture;
        wrappedFuture.whenComplete((result, ex) -> {
            if (ex != null) {
                if (ex instanceof CompletionException && ex.getCause() != null) {
                    callbackRegistry.failure(ex.getCause());
                } else {
                    callbackRegistry.failure(ex);
                }
            } else {
                callbackRegistry.success(result);
            }
        });
    }

    @Override
    public void addCallback(ListenableFutureCallback<? super T> callback) {
        callbackRegistry.addCallback(callback);
    }

    @Override
    public boolean cancel(boolean mayInterruptIfRunning) {
        return wrappedFuture.cancel(mayInterruptIfRunning);
    }

    @Override
    public boolean isCancelled() {
        return wrappedFuture.isCancelled();
    }

    @Override
    public boolean isDone() {
        return wrappedFuture.isDone();
    }

    @Override
    public T get() throws InterruptedException, ExecutionException {
        return wrappedFuture.get();
    }

    @Override
    public T get(long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        return wrappedFuture.get(timeout, unit);
    }
}
```

We just wrap the CompletableFuture and again register a callback. The only non-obvious part is the use of ListenableFutureCallbackRegistry, which keeps track of the registered ListenableFutureCallbacks. We also have to do some exception processing, but that's all. If you need to do something like this, I have good news: I have wrapped the code into a reusable library, so you do not have to copy and paste it; you can just use it as described in the library documentation.
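As a usage illustration, the sketch below adapts a CompletableFuture to Spring's ListenableFuture and registers a callback. It assumes the ListenableCompletableFutureWrapper class shown above is visible from the calling code; the class and variable names in the sketch itself are made up:

```java
import java.util.concurrent.CompletableFuture;

import org.springframework.util.concurrent.ListenableFuture;
import org.springframework.util.concurrent.ListenableFutureCallback;

public class WrapperUsageSketch {
    public static void main(String[] args) {
        CompletableFuture<String> completable = new CompletableFuture<>();

        // Adapt the CompletableFuture to Spring's ListenableFuture via the wrapper above.
        ListenableFuture<String> listenable = new ListenableCompletableFutureWrapper<>(completable);

        listenable.addCallback(new ListenableFutureCallback<String>() {
            @Override
            public void onSuccess(String result) {
                System.out.println("Got: " + result);
            }

            @Override
            public void onFailure(Throwable t) {
                t.printStackTrace();
            }
        });

        completable.complete("hello"); // triggers onSuccess
    }
}
```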
June 16, 2014
by Lukas Krecan
· 28,882 Views · 18 Likes
What's Wrong in Java 8, Part V: Tuples
In part 5 of our "What's Wrong in Java 8" series, we turn to Tuples.
June 10, 2014
by Pierre-Yves Saumont
· 132,251 Views · 4 Likes
Building a Simple RESTful API with Java Spark
Disclaimer: This post is about the Java micro web framework named Spark and not about the data processing engine Apache Spark. In this blog post we will see how Spark can be used to build a simple web service. As mentioned in the disclaimer, Spark is a micro web framework for Java inspired by the Ruby framework Sinatra. Spark aims for simplicity and provides only a minimal set of features. However, it provides everything needed to build a web application in a few lines of Java code. Getting Started Let's assume we have a simple domain class with a few properties and a service that provides some basic CRUDfunctionality: public class User { private String id; private String name; private String email; // getter/setter } public class UserService { // returns a list of all users public List getAllUsers() { .. } // returns a single user by id public User getUser(String id) { .. } // creates a new user public User createUser(String name, String email) { .. } // updates an existing user public User updateUser(String id, String name, String email) { .. } } We now want to expose the functionality of UserService as a RESTful API (For simplicity we will skip the hypermedia part of REST ;-)). For accessing, creating and updating user objects we want to use following URL patterns: GET /users Get a list of all users GET /users/ Get a specific user POST /users Create a new user PUT /users/ Update a user The returned data should be in JSON format. To get started with Spark we need the following Maven dependencies: com.sparkjava spark-core 2.0.0 org.slf4j slf4j-simple 1.7.7 Spark uses SLF4J for logging, so we need to a SLF4J binder to see log and error messages. In this example we use the slf4j-simple dependency for this purpose. However, you can also use Log4j or any other binder you like. Having slf4j-simple in the classpath is enough to see log output in the console. We will also use GSON for generating JSON output and JUnit to write a simple integration tests. You can find these dependencies in the complete pom.xml. Returning All Users Now it is time to create a class that is responsible for handling incoming requests. We start by implementing the GET /users request that should return a list of all users. import static spark.Spark.*; public class UserController { public UserController(final UserService userService) { get("/users", new Route() { @Override public Object handle(Request request, Response response) { // process request return userService.getAllUsers(); } }); // more routes } } Note the static import of spark.Spark.* in the first line. This gives us access to various static methods including get(), post(), put() and more. Within the constructor the get() method is used to register aRoute that listens for GET requests on /users. A Route is responsible for processing requests. Whenever aGET /users request is made, the handle() method will be called. Inside handle() we return an object that should be sent to the client (in this case a list of all users). Spark highly benefits from Java 8 Lambda expressions. Route is a functional interface (it contains only one method), so we can implement it using a Java 8 Lambda expression. Using a Lambda expression the Routedefinition from above looks like this: get("/users", (req, res) -> userService.getAllUsers()); To start the application we have to create a simple main() method. 
Inside main() we create an instance of our service and pass it to our newly created UserController: public class Main { public static void main(String[] args) { new UserController(new UserService()); } } If we now run main(), Spark will start an embedded Jetty server that listens on Port 4567. We can test our first route by initiating a GET http://localhost:4567/users request. In case the service returns a list with two user objects the response body might look like this: [com.mscharhag.sparkdemo.User@449c23fd, com.mscharhag.sparkdemo.User@437b26fe] Obviously this is not the response we want. Spark uses an interface called ResponseTransformer to convert objects returned by routes to an actual HTTP response. ReponseTransformer looks like this: public interface ResponseTransformer { String render(Object model) throws Exception; } ResponseTransformer has a single method that takes an object and returns a String representation of this object. The default implementation of ResponseTransformer simply calls toString() on the passed object (which creates output like shown above). Since we want to return JSON we have to create a ResponseTransformer that converts the passed objects to JSON. We use a small JsonUtil class with two static methods for this: public class JsonUtil { public static String toJson(Object object) { return new Gson().toJson(object); } public static ResponseTransformer json() { return JsonUtil::toJson; } } toJson() is an universal method that converts an object to JSON using GSON. The second method makes use of Java 8 method references to return a ResponseTransformer instance. ResponseTransformer is again a functional interface, so it can be satisfied by providing an appropriate method implementation (toJson()). So whenever we call json() we get a new ResponseTransformer that makes use of our toJson()method. In our UserController we can pass a ResponseTransformer as a third argument to Spark's get()method: import static com.mscharhag.sparkdemo.JsonUtil.*; public class UserController { public UserController(final UserService userService) { get("/users", (req, res) -> userService.getAllUsers(), json()); ... } } Note again the static import of JsonUtil.* in the first line. This gives us the option to create a newResponseTransformer by simply calling json(). Our response looks now like this: [{ "id": "1866d959-4a52-4409-afc8-4f09896f38b2", "name": "john", "email": "john@foobar.com" },{ "id": "90d965ad-5bdf-455d-9808-c38b72a5181a", "name": "anna", "email": "anna@foobar.com" }] We still have a small problem. The response is returned with the wrong Content-Type. To fix this, we can register a Filter that sets the JSON Content-Type: after((req, res) -> { res.type("application/json"); }); Filter is again a functional interface and can therefore be implemented by a short Lambda expression. After a request is handled by our Route, the filter changes the Content-Type of every response toapplication/json. We can also use before() instead of after() to register a filter. Then, the Filterwould be called before the request is processed by the Route. The GET /users request should be working now :-) Returning a Specific User To return a specific user we simply create a new route in our UserController: get("/users/:id", (req, res) -> { String id = req.params(":id"); User user = userService.getUser(id); if (user != null) { return user; } res.status(400); return new ResponseError("No user with id '%s' found", id); }, json()); With req.params(":id") we can obtain the :id path parameter from the URL. 
We pass this parameter to our service to get the corresponding user object. We assume the service returns null if no user with the passed id is found. In this case, we change the HTTP status code to 400 (Bad Request) and return an error object. ResponseError is a small helper class we use to convert error messages and exceptions to JSON. It looks like this: public class ResponseError { private String message; public ResponseError(String message, String... args) { this.message = String.format(message, args); } public ResponseError(Exception e) { this.message = e.getMessage(); } public String getMessage() { return this.message; } } We are now able to query for a single user with a request like this: GET /users/5f45a4ff-35a7-47e8-b731-4339c84962be If an user with this id exists we will get a response that looks somehow like this: { "id": "5f45a4ff-35a7-47e8-b731-4339c84962be", "name": "john", "email": "john@foobar.com" } If we use an invalid user id, a ResponseError object will be created and converted to JSON. In this case the response looks like this: { "message": "No user with id 'foo' found" } Creating and Updating Users Creating and updating users is again very easy. Like returning the list of all users it is done using a single service call: post("/users", (req, res) -> userService.createUser( req.queryParams("name"), req.queryParams("email") ), json()); put("/users/:id", (req, res) -> userService.updateUser( req.params(":id"), req.queryParams("name"), req.queryParams("email") ), json()); To register a route for HTTP POST or PUT requests we simply use the static post() and put() methods of Spark. Inside a Route we can access HTTP POST parameters using req.queryParams(). For simplicity reasons (and to show another Spark feature) we do not do any validation inside the routes. Instead we assume that the service will throw an IllegalArgumentException if we pass in invalid values. Spark gives us the option to register ExceptionHandlers. An ExceptionHandler will be called if anException is thrown while processing a route. ExceptionHandler is another single method interface we can implement using a Java 8 Lambda expression: exception(IllegalArgumentException.class, (e, req, res) -> { res.status(400); res.body(toJson(new ResponseError(e))); }); Here we create an ExceptionHandler that is called if an IllegalArgumentException is thrown. The caught Exception object is passed as the first parameter. We set the response code to 400 and add an error message to the response body. If the service throws an IllegalArgumentException when the email parameter is empty, we might get a response like this: { "message": "Parameter 'email' cannot be empty" } The complete source the controller can be found here. Testing Because of Spark's simple nature it is very easy to write integration tests for our sample application. Let's start with this basic JUnit test setup: public class UserControllerIntegrationTest { @BeforeClass public static void beforeClass() { Main.main(null); } @AfterClass public static void afterClass() { Spark.stop(); } ... } In beforeClass() we start our application by simply running the main() method. After all tests finished we call Spark.stop(). This stops the embedded server that runs our application. After that we can send HTTP requests within test methods and validate that our application returns the correct response. 
A simple test that sends a request to create a new user can look like this: @Test public void aNewUserShouldBeCreated() { TestResponse res = request("POST", "/users?name=john&email=john@foobar.com"); Map json = res.json(); assertEquals(200, res.status); assertEquals("john", json.get("name")); assertEquals("john@foobar.com", json.get("email")); assertNotNull(json.get("id")); } request() and TestResponse are two small self made test utilities. request() sends a HTTP request to the passed URL and returns a TestResponse instance. TestResponse is just a small wrapper around some HTTP response data. The source of request() and TestResponse is included in the complete test classfound on GitHub. Conclusion Compared to other web frameworks Spark provides only a small amount of features. However, it is so simple you can build small web applications within a few minutes (even if you have not used Spark before). If you want to look into Spark you should clearly use Java 8, which reduces the amount of code you have to write a lot. You can find the complete source of the sample project on GitHub.
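If you just want to see the moving parts in one place, here is a minimal, self-contained sketch along the lines of the article (Spark 2.x plus GSON); the route path and payload are made up for illustration:

```java
import static spark.Spark.*;

import java.util.Collections;

import com.google.gson.Gson;

public class HelloSpark {
    public static void main(String[] args) {
        Gson gson = new Gson();

        // GET /hello returns a small map, rendered as JSON by the ResponseTransformer (gson::toJson)
        get("/hello", (req, res) -> Collections.singletonMap("message", "Hello, Spark!"), gson::toJson);

        // Make sure every response carries the JSON content type
        after((req, res) -> res.type("application/json"));
    }
}
```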
June 9, 2014
by Michael Scharhag
· 111,033 Views · 3 Likes
Working with ZeroMQ, Java, and JZMQ on a CentOS Platform
Recently I decided to port some of my development using ZeroMQ onto my CentOS development machine and I ran into some challenges. I’m documenting those challenges so that if someone else runs into the same pitfalls I did, they can avoid it. In this example today, we will work with the first “HelloWorld” examples in the ZeroMQ guide found here. I added a few modifications to the sample such as a package name and a try-catch around the Thread and an exception.tostring() to display any stack-trace. Source code for src/zmq/hwserver.java package zmq; import java.io.PrintWriter; import java.io.StringWriter; import org.zeromq.ZMQ; // // Hello World server in Java // Binds REP socket to tcp://*:5555 // Expects "Hello" from client, replies with "World" // public class hwserver { /** * @param args */ public static void main(String[] args) { ZMQ.Context context = ZMQ.context(1); // Socket to talk to clients ZMQ.Socket socket = context.socket(ZMQ.REP); socket.bind ("tcp://*:5555"); try { while (!Thread.currentThread ().isInterrupted ()) { byte[] reply = socket.recv(0); System.out.println("Received Hello"); String request = "World" ; socket.send(request.getBytes (), 0); Thread.sleep(1000); // Do some 'work' } } catch(Exception e) { StringWriter sw = new StringWriter(); PrintWriter pw = new PrintWriter(sw); e.printStackTrace(pw); System.out.println(sw.toString()); } socket.close(); context.term(); } } Similarly, source code for the client, src/zmq/hwclient.java package zmq; import org.zeromq.ZMQ; public class hwclient { /** * @param args */ public static void main(String[] args) { ZMQ.Context context = ZMQ.context(1); // Socket to talk to server System.out.println("Connecting to hello world server"); ZMQ.Socket socket = context.socket(ZMQ.REQ); socket.connect ("tcp://localhost:5555"); for(int requestNbr = 0; requestNbr != 10; requestNbr++) { String request = "Hello" ; System.out.println("Sending Hello " + requestNbr ); socket.send(request.getBytes (), 0); byte[] reply = socket.recv(0); System.out.println("Received " + new String (reply) + " " + requestNbr); } socket.close(); context.term(); } } Now that you have the sample code, how do you compile using the ZeroMQ? Assumption: You have installed Java (1.7 or above) Step-1: Installing ZeroMQ onto CentOS [Following steps are performed under root account] Install “Development Tools” if it’s not already installed on your CentOS as root: yum groupinstall “Development Tools” Download the “POSIX tarball” ZeroMQ source code onto your CentOS development machine from here. At the time of writing this article, ZeroMQ version 3.2.3 was the stable release. You might want to download the latest stable release. Unpack the .tar.gz source archive. Run ./configure, followed by “make” then “make install“. Run ldconfig after installation. Step-2: Installing a Language Binding for Java. In this case, we will use JZMQ from https://github.com/zeromq/jzmq Download the latest stable release from GITHub link above. (git clone git://github.com/zeromq/jzmq.git) Change directory, cd jzmq Compile and Install: 1 2 3 4 ./autogen.sh ./configure make make install Where did it install? 1 2 # JAR is located here: /usr/local/share/java/zmq.jar # .so link files are located here: /usr/local/lib Important Step: Add /usr/local/lib to a line in /etc/ld.so.conf (here is my copy after editing) 1 2 include ld.so.conf.d/*.conf /usr/local/lib Reload “ldconfig“. This clears the cache. Step-3: Compile and run the Java examples above. 
cd ~/dev/zeromq/example/ # Compile hwserver.java javac -classpath /usr/local/share/java/zmq.jar ./zmq/hwserver.java # Compile hwclient.java javac -classpath /usr/local/share/java/zmq.jar ./zmq/hwclient.java # Run hwserver in a separate prompt java -classpath .: /usr/local/share/java/zmq.jar -Djava.library.path=/usr/local/lib zmq.hwserver # Run hwclient in a seperate prompt java -classpath .:/usr/local/share/java/zmq.jar -Djava.library.path=/usr/local/lib zmq.hwclient Output on the hwserver console: Received Hello Received Hello Received Hello Received Hello Received Hello Received Hello Received Hello Received Hello Received Hello Received Hello output on the hwclient console: Connecting to hello world server Sending Hello 0 Received World 0 Sending Hello 1 Received World 1 Sending Hello 2 Received World 2 Sending Hello 3 Received World 3 Sending Hello 4 Received World 4 Sending Hello 5 Received World 5 Sending Hello 6 Received World 6 Sending Hello 7 Received World 7 Sending Hello 8 Received World 8 Sending Hello 9 Received World 9 Few interesting points to note are as follows: What happens if you started the client first and then the server? Well, the client waits until the server becomes available (or in other words, until some process connects to socket port 5555) and then sends the message. When you say socket.send(…), ZeroMQ actually enqueues a message to be sent later by a dedicated communication thread and this thread waits until a bind on port 5555 happens by “server”. Also observe that the “server” is doing the connecting, and the “client” is doing the binding. What is ZeroMQ (ØMQ)? (Excerpt from the ZeroMQ website!) ØMQ (also seen as ZeroMQ, 0MQ, zmq) looks like an embeddable networking library but acts like a concurrency framework. It gives you sockets that carry atomic messages across various transports like in-process, inter-process, TCP, and multicast. You can connect sockets N-to-N with patterns like fanout, pub-sub, task distribution, and request-reply. It’s fast enough to be the fabric for clustered products. Its asynchronous I/O model gives you scalable multicore applications, built as asynchronous message-processing tasks. It has a score of language APIs and runs on most operating systems. ØMQ is from iMatix and is LGPLv3 open source.
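Building on the client above, here is a hypothetical one-shot variant (not part of the guide) that sends a single request and prints the round-trip time; it uses only the jzmq calls already shown:

```java
package zmq;

import org.zeromq.ZMQ;

// Hypothetical one-shot client: sends a single "Hello" and prints the round-trip time.
public class hwping {
    public static void main(String[] args) {
        ZMQ.Context context = ZMQ.context(1);
        ZMQ.Socket socket = context.socket(ZMQ.REQ);
        socket.connect("tcp://localhost:5555");

        long start = System.nanoTime();
        socket.send("Hello".getBytes(), 0);
        byte[] reply = socket.recv(0);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("Received " + new String(reply) + " in " + elapsedMs + " ms");

        socket.close();
        context.term();
    }
}
```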
June 6, 2014
by Venkatt Guhesan
· 24,702 Views
MapDB: The Agile Java Data Engine
MapDB is a pure Java database, specifically designed for the Java developer. The fundamental concept for MapDB is very clever yet natural to use: provide a reliable, full-featured and “tune-able” database engine using the Java Collections API. MapDB 1.0 has just been released, this is the culmination of years of research and development to get the project to this point. Jan Kotek, the primary developer for MapDB, also worked on predecessor projects (JDBM), starting MapDB as an entire from-scratch rewrite. Jan’s expertise and dedication to low-level debugging has yielded excellent results, producting an easy-to-use database for Java with comparable performance to many C-based engines. What sets MapDB apart is the “map” concept. The idea is to leverage the totally natural Java Collections API – so familiar to Java developers that most of them literally use it daily in their work. For most database interactions with a Java application, some sort of translator is required. There are many Object-Relational Mapping (ORM) tools to name just one category of such components. The goal has always been in the direction of making it natural to code in objects in the Java language, and translate them to a specific database syntax (such as SQL). However, such efforts have always come up short, adding complexity for both the application developer and the data architect. When using MapDB there is no object “translation layer” – developers just access data in familiar structures like Maps, Sets, Queues, etc. There is no change in syntax from typical Java coding, other than a brief initialization syntax and transaction management. A developer can literally transform memory-limited maps into a high-speed persistent store in seconds (typically changing just one line of code). A MapDB Example Here is a simple MapDB example, showing how easy and intuitive it is to use in a Java application: // Initialize a MapDB database DB db = DBMaker.newFileDB(new File("testdb")) .closeOnJvmShutdown() .make(); // Create a Map: Map myMap = db.getTreeMap(“testmap”); // Work with the Map using the normal Map API. myMap.put(“key1”, “value1”); myMap.put(“key2”, “value2”); String value = myMap.get(“key1”); ... That’s all you need to do, now you have a file-backed Map of virtually any size. Note the “builder-style” initialization syntax, enabling MapDB as the agile database choice for Java. There are many builder options that let you tune your database for the specific requirements at hand. Just a small subset of options include: In-memory implementation Enable transactions Configurable caching This means that you can configure your database just for what you need, effectively making MapDB serve the job of many other databases. MapDB comes with a set of powerful configuration options, and you can even extend the product to make your own data implementations if necessary. Another very powerful feature is that MapDB utilizes some of the advanced Java Collections variants, such as ConcurrentNavigableMap. With this type of Map you can go beyond simple key-value semantics, as it is also a sorted Map allowing you to access data in order, and find values near a key. Not many people are aware of this extension to the Collections API, but it is extremely powerful and allows you to do a lot with your MapDB database (I will cover more of these capabilities in a future article). 
The Agile Aspect of MapDB When I first met Jan and started talking with him about MapDB he said something that made a very important impression: If you know what data structure you want, MapDB allows you to tailor the structure and database characteristics to your exact application needs. In other words, the schema and ways you can structure your data is very flexible. The configuration of the physical data store is just as flexible, making a perfect combination for meeting almost any database need. They key to this capability is inherent in MapDB’s architecture, and how it translates to the MapDB API itself. Here is a simple diagram of the MapDB architecture: As you can see from the diagram, there are 3 tiers in MapDB: Collections API: This is the familiar Java Collections API that every Java developer uses for maintaining application state. It has a simple builder-style extension to allow you to control the exact characteristics of a given database (including its internal format or record structure). Engine: The Engine is the real key to MapDB, this is where the records for a database – including their internal structure, concurrency control, transactional semantics – are controlled. MapDB ships with several engines already, and it is straightforward to add your own Engine if needed for specialized data handling. Volume: This is the physical storage layer (e.g., on-disk or in-memory). MapDB has a few standard Volume implementations, and they should suffice for most projects. The main point is that the development API is completely distinct from the Engine implementation (the heart of MapDB), and both are separate from the actual physical storage layer. This offers a very agile approach, allowing developers to exactly control what type of internal structure is needed for a given database, and what the actual data structure looks like from the top-level Collections API. To make things even more extensible and agile, MapDB uses a concept of Engine Wrappers. An Engine Wrapper allows adding additional features and options on top of a specific engine layer. For example, if the standard Map engine is utilized for creating a B-Tree backed Map, it is feasible to enable (or disable) caching support. This caching feature is done through an Engine Wrapper, and that is what shows up in the builder-style API used to configure a given database. While a whole article could be written just about this, the point here is that this adds to MapDB’s inherent agile nature. By way of example, here is how you configure a pure in-memory database, without transactional capabilities: // Initialize an in-memory MapDB database // without transactions DB db = DBMaker.newMemoryDB() .transactionDisable() .closeOnJvmShutdown() .make(); // Create a Map: Map myMap = db.getTreeMap(“testmap”); // Work with the Map using the normal Map API. myMap.put(“key1”, “value1”); myMap.put(“key2”, “value2”); String value = myMap.get(“key1”); ... That’s it! All that was needed was to change the DBMaker call to add the new options, everything else works exactly the same as in the example shown earlier. Agile Data Model In addition to customizing the features and performance characteristics of a given database instance, MapDB allows you to create an agile data model, with a schema exactly matching your application requirements. This is probably similar to how you write your code when creating standard Java in-memory structures. For example, let’s say you need to lookup a Person object by username, or by personID. 
Simply create a Person object and two Maps to meet your needs: public class Person { private Integer personID; private String username; ... // Setters and getters go here ... } // Create a Map of Person by username. Map personByUsernameMap = ... // Create a Map of Person by personID. Map personByPersonIDMap = ... This is a very trivial example, but now you can easily write to both maps for each new Person instance, and subsequently retrieve a Person by either key. Another interesting concept with MapDB data structures are some key extensions to the normal Java Collections API. A common requirement in applications is to have a Map with a key/value, and in addition to finding the value for a key to be able to perform the inverse: lookup the key for a given value. We can easily do this using the MapDB extension for bi-directional maps: // Create a primary map HTreeMap map = DBMaker.newTempHashMap(); // Create the inverse mapping for primary map NavigableSet> inverseMapping = new TreeSet>(); // Bind the inverse mapping to primary map, so it is auto-updated each time the primary map gets a new key/value Bind.mapInverse(map, inverseMapping); map.put(10L,"value2"); map.put(1111L,"value"); map.put(1112L,"value"); map.put(11L,"val"); // Now find a key by a given value. Long keyValue = Fun.filter(inverseMapping.get(“value2”); MapDB supports many constructs for the interaction of Maps or other collections, allowing you to create a schema of related structures that can automatically be kept in sync. This avoids a lot of scanning of structures, makes coding fast and convenient, and can keep things very fast. Wrapping it up I have shown a very brief introduction on MapDB and how the product works. As you can see its strengths are its use of the natural Java Collections API, the agile nature of the engine itself, and the support for virtually any type of data model or schema that your application needs. MapDB is freely available for any use under the Apache 2.0 license. To learn more, check out: www.mapdb.org.
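To round this out, here is a brief sketch of how commits and rollbacks might look with the file-backed map from the first example. It assumes the MapDB 1.0 API described above; the map and file names are placeholders:

```java
import java.io.File;
import java.util.concurrent.ConcurrentNavigableMap;

import org.mapdb.DB;
import org.mapdb.DBMaker;

public class MapDbTransactionSketch {
    public static void main(String[] args) {
        DB db = DBMaker.newFileDB(new File("testdb"))
                .closeOnJvmShutdown()
                .make();

        // getTreeMap returns a sorted, concurrent map backed by the file store
        ConcurrentNavigableMap<String, String> map = db.getTreeMap("users");

        map.put("key1", "value1");
        db.commit();   // persist the change

        map.put("key2", "value2");
        db.rollback(); // discard the uncommitted put

        System.out.println(map.containsKey("key2")); // should print false if the rollback discarded the put

        db.close();
    }
}
```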
June 5, 2014
by Cory Isaacson
· 28,027 Views · 3 Likes
Spring Integration Java DSL Sample
A new Java based DSL has now been introduced for Spring Integration which makes it possible to define the Spring Integration message flows using pure java based configuration instead of using the Spring XML based configuration. I tried the DSL for a sample Integration flow that I have - I call it the Rube Goldberg flow, for it follows a convoluted path in trying to capitalize a string passed in as input. The flow looks like this and does some crazy things to perform a simple task: It takes in a message of this type - "hello from spring integ" splits it up into individual words(hello, from, spring, integ) sends each word to a ActiveMQ queue from the queue the word fragments are picked up by a enricher to capitalize each word placing the response back into a response queue It is picked up, resequenced based on the original sequence of the words aggregated back into a sentence("HELLO FROM SPRING INTEG") and returned back to the application. To start with Spring Integration Java DSL, a simple Xml based configuration to capitalize a String would look like this: There is nothing much going on here, a messaging gateway takes in the message passed in from the application, capitalizes it in a transformer and this is returned back to the application. Expressing this in Spring Integration Java DSL: @Configuration @EnableIntegration @IntegrationComponentScan @ComponentScan public class EchoFlow { @Bean public DirectChannel requestChannel() { return new DirectChannel(); } @Bean public IntegrationFlow simpleEchoFlow() { return IntegrationFlows.from(requestChannel()) .transform((String s) -> s.toUpperCase()) .get(); } } @MessagingGateway public interface EchoGateway { @Gateway(requestChannel = "requestChannel") String echo(String message); } Do note that @MessagingGateway annotation is not a part of Spring Integration Java DSL, it is an existing component in Spring Integration and serves the same purpose as the gateway component in XML based configuration. I like the fact that the transformation can be expressed using typesafe Java 8 lambda expressions rather than the Spring-EL expression. Note that the transformation expression could have coded in quite few alternate ways: ??.transform((String s) -> s.toUpperCase()) Or: ??.transform(s -> s.toUpperCase()) Or using method references: ??.transform(String::toUpperCase) Moving onto the more complicated Rube Goldberg flow to accomplish the same task, again starting with XML based configuration. There are two configurations to express this flow: rube-1.xml: This configuration takes care of steps 1, 2, 3, 6, 7, 8 : It takes in a message of this type - "hello from spring integ" splits it up into individual words(hello, from, spring, integ) sends each word to a ActiveMQ queue from the queue the word fragments are picked up by a enricher to capitalize each word placing the response back into a response queue It is picked up, resequenced based on the original sequence of the words aggregated back into a sentence("HELLO FROM SPRING INTEG") and returned back to the application. and rube-2.xml for steps 4, 5: It takes in a message of this type - "hello from spring integ" splits it up into individual words(hello, from, spring, integ) sends each word to a ActiveMQ queue from the queue the word fragments are picked up by a enricher to capitalize each word placing the response back into a response queue It is picked up, resequenced based on the original sequence of the words aggregated back into a sentence("HELLO FROM SPRING INTEG") and returned back to the application. 
Now, expressing this Rube Goldberg flow using Spring Integration Java DSL, the configuration looks like this, again in two parts: EchoFlowOutbound.java: @Bean public DirectChannel sequenceChannel() { return new DirectChannel(); } @Bean public DirectChannel requestChannel() { return new DirectChannel(); } @Bean public IntegrationFlow toOutboundQueueFlow() { return IntegrationFlows.from(requestChannel()) .split(s -> s.applySequence(true).get().getT2().setDelimiters("\\s")) .handle(jmsOutboundGateway()) .get(); } @Bean public IntegrationFlow flowOnReturnOfMessage() { return IntegrationFlows.from(sequenceChannel()) .resequence() .aggregate(aggregate -> aggregate.outputProcessor(g -> Joiner.on(" ").join(g.getMessages() .stream() .map(m -> (String) m.getPayload()).collect(toList()))) , null) .get(); } and EchoFlowInbound.java: @Bean public JmsMessageDrivenEndpoint jmsInbound() { return new JmsMessageDrivenEndpoint(listenerContainer(), messageListener()); } @Bean public IntegrationFlow inboundFlow() { return IntegrationFlows.from(enhanceMessageChannel()) .transform((String s) -> s.toUpperCase()) .get(); } Again here the code is completely typesafe and is checked for any errors at development time rather than at runtime as with the XML based configuration. Again I like the fact that transformation, aggregation statements can be expressed concisely using Java 8 lamda expressions as opposed to Spring-EL expressions. What I have not displayed here is some of the support code, to set up the activemq test infrastructure, this configuration continues to remain as xml and I have included this code in a sample github project. All in all, I am very excited to see this new way of expressing the Spring Integration messaging flow using pure Java and I am looking forward to seeing its continuing evolution and may be even try and participate in its evolution in small ways. Here is the entire working code in a github repo: https://github.com/bijukunjummen/rg-si References and Acknowledgement: Spring Integration Java DSL introduction blog article by Artem Bilan: https://spring.io/blog/2014/05/08/spring-integration-java-dsl-milestone-1-released Spring Integration Java DSL website and wiki: https://github.com/spring-projects/spring-integration-extensions/wiki/Spring-Integration-Java-DSL-Reference. A lot of code has been shamelessly copied over from this wiki by me :-). Also, a big thanks to Artem for guidance on a question that I had Webinar by Gary Russell on Spring Integration 4.0 in which Spring Integration Java DSL is covered in great detail.
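For completeness, a hypothetical bootstrap might look like the sketch below. It assumes the EchoFlow configuration and EchoGateway interface from the simple example are on the classpath and picked up by component scanning:

```java
import org.springframework.context.annotation.AnnotationConfigApplicationContext;

public class EchoFlowDemo {
    public static void main(String[] args) {
        AnnotationConfigApplicationContext context =
                new AnnotationConfigApplicationContext(EchoFlow.class);
        try {
            // The gateway sends the payload through the flow and returns the transformed result.
            EchoGateway gateway = context.getBean(EchoGateway.class);
            System.out.println(gateway.echo("hello from spring integ")); // HELLO FROM SPRING INTEG
        } finally {
            context.close();
        }
    }
}
```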
June 3, 2014
by Biju Kunjummen
· 43,411 Views
InterruptedException and Interrupting Threads Explained
Let's take a simple example of a thread that periodically does some cleanup but spends most of its time in between sleeping.
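The article's own example is not reproduced here, but the pattern it discusses looks roughly like the following sketch: a worker that sleeps between cleanup runs and stops cleanly when interrupted (class and method names are made up):

```java
public class CleanupWorker implements Runnable {

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                cleanUp();
                Thread.sleep(10_000); // sleep most of the time; wakes with InterruptedException when interrupted
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag so code further up the stack can see it, then exit.
            Thread.currentThread().interrupt();
        }
    }

    private void cleanUp() {
        System.out.println("cleaning up...");
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(new CleanupWorker(), "cleanup-worker");
        worker.start();
        Thread.sleep(1_000);
        worker.interrupt(); // asks the worker to stop; sleep() throws InterruptedException
        worker.join();
    }
}
```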
June 1, 2014
by Tomasz Nurkiewicz
· 18,265 Views · 1 Like
Connecting to Cassandra from Java
In this post, I look at the basics of connecting to a Cassandra database from a Java client. I will use the DataStax Java Client JAR in order to do so.
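As a rough idea of what that looks like with the DataStax Java driver of that era (2.x), here is a minimal sketch; the contact point is a placeholder and the query simply reads the node's release version:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CassandraConnectSketch {
    public static void main(String[] args) {
        // Connect to a locally running Cassandra node (contact point is a placeholder).
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .build();
        Session session = cluster.connect();

        ResultSet rows = session.execute("SELECT release_version FROM system.local");
        Row row = rows.one();
        System.out.println("Cassandra version: " + row.getString("release_version"));

        cluster.close();
    }
}
```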
May 30, 2014
by Dustin Marx
· 120,931 Views · 3 Likes
SOAP Webservices Using Apache CXF: Adding Custom Object as Header in Outgoing Requests
What is CXF?

Apache CXF is an open source services framework. CXF helps you build and develop services using frontend programming APIs like JAX-WS and JAX-RS. These services can speak a variety of protocols, such as SOAP, XML/HTTP, RESTful HTTP, or CORBA, and work over a variety of transports, such as HTTP and JMS.

How Does CXF Work?

As you can see from how CXF service calls are processed, most of the functionality in the Apache CXF runtime is implemented by interceptors. Every endpoint created by the Apache CXF runtime has potential interceptor chains for processing messages. The interceptors in these chains are responsible for transforming messages between the raw data transported across the wire and the Java objects handled by the endpoint's implementation code.

Interceptors in CXF

When a CXF client invokes a CXF server, there is an outgoing interceptor chain for the client and an incoming chain for the server. When the server sends the response back to the client, there is an outgoing chain for the server and an incoming one for the client. Additionally, in the case of SOAP faults, a CXF web service will create a separate outbound error handling chain and the client will create an inbound error handling chain. The interceptors are organized into phases to ensure that processing happens in the proper order. The various phases involved in the interceptor chains are listed in the CXF documentation.

Adding your own custom interceptor involves extending one of the abstract interceptor classes that CXF provides and supplying the phase in which that interceptor should be invoked.

The AbstractPhaseInterceptor class provides implementations for the phase management methods of the PhaseInterceptor interface, as well as a default implementation of the handleFault() method. Developers need to provide an implementation of the handleMessage() method and can also provide their own implementation of handleFault(). The developer-provided implementations can manipulate the message data using the methods of the generic org.apache.cxf.message.Message interface.

For applications that work with SOAP messages, Apache CXF provides an AbstractSoapInterceptor class. Extending this class gives the handleMessage() and handleFault() methods access to the message data as an org.apache.cxf.binding.soap.SoapMessage object. SoapMessage objects have methods for retrieving the SOAP headers, the SOAP envelope, and other SOAP metadata from the message.

The interceptor below shows how we can add a custom object as a header to an outgoing request (the accompanying Spring configuration is not reproduced here):

```java
public class SoapHeaderInterceptor extends AbstractSoapInterceptor {

    public SoapHeaderInterceptor() {
        super(Phase.POST_LOGICAL);
    }

    @Override
    public void handleMessage(SoapMessage message) throws Fault {
        List<Header> headers = message.getHeaders();
        TestHeader testHeader = new TestHeader();
        JAXBElement<TestHeader> testHeaders = new ObjectFactory().createTestHeader(testHeader);
        try {
            Header header = new Header(testHeaders.getName(), testHeader,
                    new JAXBDataBinding(TestHeader.class));
            headers.add(header);
            message.put(Header.HEADER_LIST, headers);
        } catch (JAXBException e) {
            e.printStackTrace();
        }
    }
}
```
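How the interceptor gets attached depends on your setup. As one hypothetical example, it can be added to a JAX-WS client proxy's outgoing chain programmatically; the port parameter stands in for whatever proxy your generated service class returns:

```java
import org.apache.cxf.endpoint.Client;
import org.apache.cxf.frontend.ClientProxy;

public class InterceptorRegistration {

    // Attach the custom header interceptor to the outgoing chain of a JAX-WS proxy.
    public static void register(Object port) {
        Client client = ClientProxy.getClient(port);
        client.getOutInterceptors().add(new SoapHeaderInterceptor());
    }
}
```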
May 29, 2014
by Saurabh Chhajed
· 15,019 Views · 1 Like
Running the Maven Release Plugin with Jenkins
Learn more about using the Maven Release plugin on Jenkins, including subversion source control, artifactory, continuous integration, and more.
May 23, 2014
by $$anonymous$$
· 104,465 Views · 6 Likes
Quick Note: SSL with SOAP and SOAPUI
For doing SSL with SOAP, there are a few things you need to set up.

  • Security directory of the SoapUI JRE: C:\Program Files (x86)\SmartBear\soapUI-Pro-4.5.1\jre\lib\security (also did it at the main JRE at C:\Program Files (x86)\Java\jre7\lib\security)
  • keytool -genkey -alias svs -keyalg RSA -keystore keystore.jks -keysize 2048
  • git config --global core.autocrlf true
  • javax.net.ssl.trustStore=<> and javax.net.ssl.trustStorePassword=<> (if these properties are not set, the default ones will be picked up from the default locations: $JAVA_HOME/lib/security/jssecacerts, $JAVA_HOME/lib/security/cacerts)
  • To view the contents of a keystore file, use: keytool -list -v -keystore file.keystore -storepass changeit
  • To debug the SSL handshake process and view the certificates, set the VM parameter -Djavax.net.debug=all
  • keytool -genkey -keyalg RSA -alias selfsigned -keystore keystore.jks -storepass changeit -validity 360 -keysize 2048
  • -Djava.net.preferIPv4Stack=true added to soapui.bat (C:\Program Files (x86)\SmartBear\SoapUI-4.6.3\bin)
  • -Djavax.net.debug=ssl,trustmanager
  • openssl s_client -showcerts -host webservices-cert.storedvalue.com -port 443
  • keytool -keystore clientkeystore -genkey -alias client
  • wsdl2java.bat -uri my.wsdl -o svsproj -p com.agilemobiledeveloper.service -d xmlbeans -t -ss -ssi -sd -g -ns2p
  • System.setProperty("javax.net.ssl.keyStore", "keystore.jks");
  • System.setProperty("javax.net.ssl.keyStorePassword", "changeit");
  • System.setProperty("javax.net.ssl.trustStore", "clientkeystore");
  • System.setProperty("javax.net.ssl.trustStorePassword", "changeit");
  • setx -m JAVA_HOME "C:\Program Files\Java\jdk1.7.0_51"
  • setx -m javax.net.ssl.keyStore "keystore.jks"
  • setx -m javax.net.ssl.keyStorePassword "changeit"
  • setx -m javax.net.ssl.trustStore "keystore.jks"
  • setx -m javax.net.ssl.trustStorePassword "passwordislong"

Useful links:
  • http://docs.oracle.com/cd/E19509-01/820-3503/ggfgo/index.html
  • http://www.sslshopper.com/article-most-common-java-keytool-keystore-commands.html
  • http://ianso.blogspot.com/2009/12/building-ws-security-enabled-soap.html
  • http://javarevisited.blogspot.com/2012/09/difference-between-truststore-vs-keyStore-Java-SSL.html
  • http://javarevisited.blogspot.com/2012/03/add-list-certficates-java-keystore.html
  • http://www.ibm.com/developerworks/java/library/j-jws17/index.html
  • http://www.coderanch.com/t/223027/Web-Services/java/SOAP-HTTPS-SSL
  • http://ruchirawageesha.blogspot.in/2010/07/how-to-create-clientserver-keystores.html
  • http://stackoverflow.com/questions/11001102/how-to-programmatically-set-the-sslcontext-of-a-jax-ws-client
  • http://busylog.net/ssl-java-keytool-soap-and-eclipse/
  • http://www.sslshopper.com/article-how-to-create-a-self-signed-certificate-using-java-keytool.html
May 23, 2014
by Tim Spann
· 19,445 Views · 1 Like
article thumbnail
What's Wrong in Java 8, Part IV: Monads
Further exploring what's wrong with Java 8, we turn our discussion to monads.
May 21, 2014
by Pierre-Yves Saumont
· 111,676 Views · 36 Likes
article thumbnail
What's Wrong in Java 8, Part III: Streams and Parallel Streams
When the first early access versions of Java 8 were made available, what seemed to be the most important (r)evolution was lambdas. This is now changing, and many developers seem to think that streams are the most valuable Java 8 feature, because they believe that by changing a single word in their programs (replacing stream with parallelStream) they will make those programs run in parallel. Many Java 8 evangelists have demonstrated amazing examples of this. Is there something wrong with this? No. Not something. Many things:

Running in parallel may or may not be a benefit. It depends on what you are using this feature for.
Java 8 parallel streams may make your programs run faster. Or not. Or even slower.
Thinking about streams as a way to achieve parallel processing at low cost will prevent developers from understanding what is really happening. Streams are not directly linked to parallel processing.

Most of the above problems are based upon a misunderstanding: parallel processing is not the same thing as concurrent processing, and most examples shown about "automatic parallelization" with Java 8 are in fact examples of concurrent processing. Thinking about map, filter and other operations as "internal iteration" is complete nonsense (although this is not a problem with Java 8, but with the way we use it).

So, what are streams? According to Wikipedia: "a stream is a potentially infinite analog of a list, given by the inductive definition: data Stream a = Cons a (Stream a). Generating and computing with streams requires lazy evaluation, either implicitly in a lazily evaluated language or by creating and forcing thunks in an eager language."

One important thing to notice is that Java is what Wikipedia calls an "eager" language, which means Java is mostly strict (as opposed to lazy) in evaluating things. For example, if you create a List in Java, all elements are evaluated when the list is created. This may surprise you, since you may create an empty list and add elements afterward. This is only because either the list is mutable (and you are replacing a null reference with a reference to something) or you are creating a new list from the old one with the new element appended. Lists are created from something producing their elements. For example:

List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);

Here the producer is an array, and all elements of the array are strictly evaluated. It is also possible to create a list in a recursive way, for example the list starting with 1 in which each element is equal to 1 plus the previous element, and smaller than 6. In Java < 8, this translates into:

List<Integer> list = new ArrayList<>();
for (int i = 0; i < 6; i++) {
    list.add(i);
}

One may argue that the for loop is one of the rare examples of lazy evaluation in Java, but the result is a list in which all elements are evaluated. What happens if we want to apply a function to all elements of this list? We may do this in a loop. For example, if we want to multiply all elements by 2, we may do this:

for (int i = 0; i < list.size(); i++) {
    list.set(i, list.get(i) * 2);
}

However, this does not allow using an operation that changes the type of the elements, for example increasing all elements by 10%.
The following solution solves this problem:

List<Double> list2 = new ArrayList<>();
for (int i = 0; i < list.size(); i++) {
    list2.add(list.get(i) * 1.2);
}

This form allows the use of the Java 5 for-each syntax:

List<Double> list2 = new ArrayList<>();
for (Integer i : list) {
    list2.add(i * 1.2);
}

or the Java 8 syntax:

List<Double> list2 = new ArrayList<>();
list.forEach(x -> list2.add(x * 1.2));

So far, so good. But what if we want to increase the value by 10% and then divide it by 3? The trivial answer would be to do:

List<Double> list2 = new ArrayList<>();
list.forEach(x -> list2.add(x * 1.2));
List<Double> list3 = new ArrayList<>();
list2.forEach(x -> list3.add(x / 3));

This is far from optimal because we are iterating twice on the list. A much better solution is:

List<Double> list2 = new ArrayList<>();
for (Integer i : list) {
    list2.add(i * 1.2 / 3);
}

Let's set aside the autoboxing/unboxing problem for now. In Java 8, this can be written as:

List<Double> list2 = new ArrayList<>();
list.forEach(x -> list2.add(x * 1.2 / 3));

But wait... This is only possible because we see the internals of the Consumer bound to the list, so we are able to manually compose the operations. If we had:

List<Double> list2 = new ArrayList<>();
list.forEach(consumer1);
List<Double> list3 = new ArrayList<>();
list2.forEach(consumer2);

how could we know how to compose them? No way. In Java 8, the Consumer interface has a default method andThen. We could be tempted to compose the consumers this way:

list.forEach(consumer1.andThen(consumer2));

but this will result in an error, because andThen is defined as:

default Consumer<T> andThen(Consumer<? super T> after) {
    Objects.requireNonNull(after);
    return (T t) -> { accept(t); after.accept(t); };
}

This means that we can't use andThen to compose consumers of different types. In fact, we have had it all wrong since the beginning. What we need is to bind the list to a function in order to get a new list, such as:

Function<Integer, Double> function1 = x -> x * 1.2;
Function<Double, Double> function2 = x -> x / 3;
list.bind(function1).bind(function2);

where the bind method would be defined in a special FList class like:

public class FList<T> {

    final List<T> list;

    public FList(List<T> list) {
        this.list = list;
    }

    public <U> FList<U> bind(Function<T, U> f) {
        List<U> newList = new ArrayList<>();
        for (T t : list) {
            newList.add(f.apply(t));
        }
        return new FList<>(newList);
    }
}

and we would use it as in the following example:

new FList<>(list).bind(function1).bind(function2);

The only trouble we have then is that binding twice requires iterating twice on the list. This is because bind is evaluated strictly. What we would need is lazy evaluation, so that we could iterate only once. The problem here is that the bind method is not a real binding. It is in reality a composition of a real binding and a reduce. "Reducing" is applying an operation to each element of the list, resulting in the combination of this element and the result of the same operation applied to the previous element. As there is no previous element when we start from the first element, we start with an initial value. For example, applying (x) -> r + x, where r is the result of the operation on the previous element, or 0 for the first element, gives the sum of all elements of the list. Applying () -> r + 1 to each element, starting with r = 0, gives the length of the list. (This may not be the most efficient way to get the length of the list, but it is totally functional!) Here, the operation is add(element) and the initial value is an empty list. And this occurs only because the function application is strictly evaluated.
What Java 8 streams give us is the same, but lazily evaluated, which means that when binding a function to a stream, no iteration is involved! Binding a function to a stream gives us a new stream with no iteration occurring. The resulting stream is not evaluated, and this does not depend upon whether the initial stream was built with evaluated or non-evaluated data. In functional languages, binding a function to a stream is itself a function. In Java 8, it is a method, which means its arguments are strictly evaluated, but this has nothing to do with the evaluation of the resulting stream. To understand what is happening, we can imagine that the functions to bind are stored somewhere and become part of the data producer for the new (non-evaluated) resulting stream. In Java 8, the method binding a function T -> U to a Stream<T>, resulting in a Stream<U>, is called map. The method binding a function T -> Stream<U> to a Stream<T>, resulting in a Stream<U>, is called flatMap.

Where is flatten? Most functional languages also offer a flatten function converting a Stream<Stream<T>> into a Stream<T>, but this is missing in Java 8 streams. It may not look like big trouble since it is so easy to define a method for doing this. For example, given the following function:

Function<Integer, Stream<Integer>> f = x -> Stream.iterate(1, y -> y + 1).limit(x);

Stream<Integer> stream = Stream.iterate(1, x -> x + 1);
Stream<Integer> stream2 = stream.limit(5).flatMap(f);
System.out.println(stream2.collect(toList()));

produces:

[1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5]

Using map instead of flatMap:

Stream<Integer> stream = Stream.iterate(1, x -> x + 1);
Stream<Stream<Integer>> stream2 = stream.limit(5).map(f);
System.out.println(stream2.collect(toList()));

will produce a stream of streams:

[java.util.stream.SliceOps$1@12133b1, java.util.stream.SliceOps$1@ea2f77, java.util.stream.SliceOps$1@1c7353a, java.util.stream.SliceOps$1@1a9515, java.util.stream.SliceOps$1@f49f1c]

Converting this stream of streams of integers to a stream of integers is very straightforward using the functional paradigm: one just needs to flatMap the identity function to it:

System.out.println(stream2.flatMap(x -> x).collect(toList()));

It is, however, strange that a flatten method has not been added to Stream, knowing the strong relation that ties map, flatMap, unit and flatten, where unit is the function from T to Stream<T>, represented by the method:

Stream<T> Stream.of(T... t)

When are streams evaluated? Streams are evaluated when we apply to them certain operations called terminal operations. This may be done only once. Once a terminal operation is applied to a stream, it is no longer usable. Terminal operations are: forEach, forEachOrdered, toArray, reduce, collect, min, max, count, anyMatch, allMatch, noneMatch, findFirst, findAny, iterator, spliterator. Some of these methods are short-circuiting. For example, findFirst returns as soon as the first element is found. Non-terminal operations are called intermediate and can be stateful (if the evaluation of an element depends upon the evaluation of the previous one) or stateless. Intermediate operations are: filter, map, mapTo... (Int, Long or Double), flatMap, flatMapTo... (Int, Long or Double), distinct, sorted, peek, limit, skip, sequential, parallel, unordered, onClose. Several intermediate operations may be applied to a stream, but only one terminal operation may be used.

So what about parallel processing? One of the most advertised features of streams is that they allow automatic parallelization of processing.
And one can find amazing demonstrations on the web, mainly based on the same example of a program contacting a server to get the values corresponding to a list of stocks and finding the highest one not exceeding a given limit value. Such an example may show an increase in speed of 400% or more. But this example has little to do with parallel processing. It is an example of concurrent processing, which means that the increase in speed will also be observed on a single-processor computer. This is because the main part of each "parallel" task is waiting. Parallel processing is about running, at the same time, tasks that do not wait, such as intensive calculations.

Automatic parallelization will generally not give the expected result, for at least two reasons:

The increase in speed is highly dependent upon the kind of task and the parallelization strategy. And above all, the best strategy is dependent upon the type of task.
The increase in speed is highly dependent upon the environment. In some environments, it is easy to obtain a decrease in speed by parallelizing.

Whatever the kind of tasks to parallelize, the strategy applied by parallel streams will be the same, unless you devise this strategy yourself, which removes much of the appeal of parallel streams. Parallelization requires:

A pool of threads to execute the subtasks,
Dividing the initial task into subtasks,
Distributing subtasks to threads,
Collating the results.

Without entering into the details, all this implies some overhead. It will show amazing results when:

Some tasks imply blocking for a long time, such as accessing a remote service, or
There are not many threads running at the same time, and in particular no other parallel stream.

If all subtasks imply intense calculation, the potential gain is limited by the number of available processors. Java 8 will by default use as many threads as there are processors on the computer, so for intensive tasks the result is highly dependent upon what other threads may be doing at the same time. Of course, if each subtask is essentially waiting, the gain may appear to be huge. The worst case is if the application runs in a server or a container alongside other applications, and the subtasks do not imply waiting. In such a case (for example running in a J2EE server), parallel streams will often be slower than serial ones. Imagine a server serving hundreds of requests each second. There is a great chance that several streams are being evaluated at the same time, so the work is already parallelized. A new layer of parallelization at the business level will most probably make things slower. Worse: there is a great chance that business applications will see a speed increase in the development environment and a decrease in production. And that is the worst possible situation.

Edit: for a better understanding of why parallel streams in Java 8 (and the Fork/Join pool in Java 7) are broken, refer to these excellent articles by Edward Harned:

A Java Fork-Join Calamity
A Java Parallel Calamity

What streams are good for: Streams are a useful tool because they allow lazy evaluation. This is very important in several respects:

They allow a functional programming style using bindings.
They allow better performance by removing iteration. Iteration occurs with evaluation. With streams, we can bind dozens of functions without iterating.
They allow easy parallelization for tasks involving long waits.
Streams may be infinite (since they are lazy).
Functions may be bound to infinite streams without problem. Upon evaluation, there must be some way to make them finite. This is often done through a short-circuiting operation.

What streams are not good for: Streams should be used with great caution when processing computation-intensive tasks. In particular, by default, all streams use the same ForkJoinPool, configured to use as many threads as there are cores in the computer on which the program is running. If the evaluation of one parallel stream results in a very long-running task, it may be split into long-running sub-tasks that are distributed to every thread in the pool. From there, no other parallel stream can be processed because all threads are occupied. So, for computation-intensive stream evaluation, one should always use a specific ForkJoinPool in order not to block other streams. To do this, one may create a Callable from the stream and submit it to the pool:

List list = ... // A list of objects
Stream stream = list.parallelStream().map(this::veryLongProcessing);
Callable<List> task = () -> stream.collect(toList());
ForkJoinPool forkJoinPool = new ForkJoinPool(4);
List newList = forkJoinPool.submit(task).get();

This way, other parallel streams (using their own ForkJoinPool) will not be blocked by this one. In other words, we would need a pool of ForkJoinPools in order to avoid this problem. If a program is to be run inside a container, one must be very careful when using parallel streams. Never use the default pool in such a situation unless you know for sure that the container can handle it. In a Java EE container, do not use parallel streams.

Previous articles:

What's Wrong with Java 8, Part I: Currying vs Closures
What's Wrong in Java 8, Part II: Functions & Primitives
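Returning to the dedicated ForkJoinPool technique described above, here is a small self-contained sketch; the element type, the pool size of 4 and the simulated slow operation are illustrative assumptions, not taken from the article:

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public class IsolatedParallelStream {

    // Simulates a slow per-element operation (illustrative).
    private static int veryLongProcessing(int value) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return value * 2;
    }

    public static void main(String[] args) throws Exception {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

        // The terminal operation runs inside our own pool, so the common
        // ForkJoinPool stays free for other parallel streams.
        Callable<List<Integer>> task = () -> list.parallelStream()
                .map(IsolatedParallelStream::veryLongProcessing)
                .collect(Collectors.toList());

        ForkJoinPool forkJoinPool = new ForkJoinPool(4);
        try {
            List<Integer> newList = forkJoinPool.submit(task).get();
            System.out.println(newList);
        } finally {
            forkJoinPool.shutdown();
        }
    }
}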
May 20, 2014
by Pierre-Yves Saumont
· 184,457 Views · 17 Likes
article thumbnail
Too Fast, Too Megamorphic: what influences method call performance in Java?
What's this all about then? Let's start with a short story. A few weeks back I proposed a change on the Java core libs mailing list to override some methods which are currently final. This stimulated several discussion topics - one of which was the extent to which a performance regression would be introduced by taking a method which was final and stopping it from being final. I had some ideas about whether there would be a performance regression or not, but I put these aside to try and enquire as to whether there were any sane benchmarks published on the subject. Unfortunately I couldn't find any. That's not to say that they don't exist or that other people haven't investigated the situation, but I didn't see any public peer-reviewed code. So - time to write some benchmarks.

Benchmarking methodology

So I decided to use the ever-awesome JMH framework in order to put together these benchmarks. If you aren't convinced that a framework will help you get accurate benchmarking results then you should look at this talk by Aleksey Shipilev, who wrote the framework, or Nitsan Wakart's really cool blog post which explains how it helps. In my case I wanted to understand what influenced the performance of method invocation. I decided to try out different variations of method calls and measure the cost. By having a set of benchmarks and changing only one factor at a time, we can individually rule out or understand how different factors or combinations of factors influence method invocation costs.

Inlining

Let's squish these method callsites down. Simultaneously the most and least obvious influencing factor is whether there is a method call at all! It's possible for the actual cost of a method call to be optimized away entirely by the compiler. There are, broadly speaking, two ways to reduce the cost of the call. One is to directly inline the method itself, the other is to use an inline cache. Don't worry - these are pretty simple concepts, but there's a bit of terminology involved which needs to be introduced. Let's pretend that we have a class called Foo, which defines a method called bar.

class Foo {
    void bar() { ... }
}

We can call the bar method by writing code that looks like this:

Foo foo = new Foo();
foo.bar();

The important thing here is the location where bar is actually invoked - foo.bar() - this is referred to as a callsite. When we say a method is being "inlined", what it means is that the body of the method is taken and plopped into the callsite, in place of a method call. For programs which consist of lots of small methods (I'd argue, a properly factored program) the inlining can result in a significant speedup. This is because the program doesn't end up spending most of its time calling methods and not actually doing work! We can control whether a method is inlined or not in JMH by using the CompilerControl annotations. We'll come back to the concept of an inline cache a bit later.

Hierarchy depth and overriding methods

Do parents slow their children down? If we're choosing to remove the final keyword from a method, it means that we'll be able to override it. This is another factor which we consequently need to take into account. So I took methods and called them at different levels of a class hierarchy and also had methods which were overridden at different levels of the hierarchy. This allowed me to understand or eliminate how deep class hierarchies interfere with overriding costs.

Polymorphism

Animals: how any OO concept is described.
When I mentioned the idea of a callsite earlier, I sneakily avoided a fairly important issue. Since it's possible to override a non-final method in a subclass, our callsites can end up invoking different methods. So perhaps I pass in a Foo or its child - Baz - which also implements a bar(). How does your compiler know which method to invoke? Methods are by default virtual (overridable) in Java, so it has to look up the correct method in a table, called a vtable, for every invocation. This is pretty slow, so optimizing compilers are always trying to reduce the lookup costs involved. One approach we mentioned earlier is inlining, which is great if your compiler can prove that only one method can be called at a given callsite. This is called a monomorphic callsite.

Unfortunately, much of the time the analysis required to prove a callsite is monomorphic can end up being impractical. JIT compilers tend to take an alternative approach of profiling which types are called at a callsite and guessing that if the callsite has been monomorphic for its first N calls then it's worth speculatively optimising based on the assumption that it always will be monomorphic. This speculative optimisation is frequently correct, but because it's not always right the compiler needs to inject a guard before the method call in order to check the type of the method.

Monomorphic callsites aren't the only case we want to optimise for, though. Many callsites are what is termed bimorphic - there are two methods which can be invoked. You can still inline bimorphic callsites by using your guard code to check which implementation to call and then jumping to it. This is still cheaper than a full method invocation. It's also possible to optimise this case using an inline cache. An inline cache doesn't actually inline the method body into a callsite, but it has a specialised jump table which acts like a cache on a full vtable lookup. The HotSpot JIT compiler supports bimorphic inline caches and declares that any callsite with 3 or more possible implementations is megamorphic. This splits out 3 more invocation situations for us to benchmark and investigate: the monomorphic case, the bimorphic case and the megamorphic case.

Results

Let's group up the results so it's easier to see the wood from the trees; I've presented the raw numbers along with a bit of analysis around them. The specific numbers/costs aren't really of that much interest. What is interesting is the ratios between different types of method call and that the associated error rates are low. There's quite a significant difference going on - 6.26x between the fastest and slowest. In reality the difference is probably larger because of the overhead associated with measuring the time of an empty method. The source code for these benchmarks is available on GitHub. The results aren't all presented in one block to avoid confusion. The polymorphic benchmarks at the end come from running PolymorphicBenchmark, whilst the others are from JavaFinalBenchmark.

Simple callsites

Benchmark                                        Mode  Samples  Mean   Mean error  Units
c.i.j.JavaFinalBenchmark.finalInvoke             avgt  25       2.606  0.007       ns/op
c.i.j.JavaFinalBenchmark.virtualInvoke           avgt  25       2.598  0.008       ns/op
c.i.j.JavaFinalBenchmark.alwaysOverriddenMethod  avgt  25       2.609  0.006       ns/op

Our first set of results compares the call costs of a virtual method, a final method and a method which has a deep hierarchy and gets overridden. Note that in all these cases we've forced the compiler to not inline the methods.
As we can see, the difference between the times is pretty minimal and our mean error rates show it to be of no great importance. So we can conclude that simply adding the final keyword isn't going to drastically improve method call performance. Overriding the method doesn't seem to make much difference either.

Inlining simple callsites

Benchmark                                                 Mode  Samples  Mean   Mean error  Units
c.i.j.JavaFinalBenchmark.inlinableFinalInvoke             avgt  25       0.782  0.003       ns/op
c.i.j.JavaFinalBenchmark.inlinableVirtualInvoke           avgt  25       0.780  0.002       ns/op
c.i.j.JavaFinalBenchmark.inlinableAlwaysOverriddenMethod  avgt  25       1.393  0.060       ns/op

Now, we've taken the same three cases and removed the inlining restriction. Again the final and virtual method calls end up being of a similar time to each other. They are about 4x faster than the non-inlineable case, which I would put down to the inlining itself. The always overridden method call here ends up being between the two. I suspect that this is because the method itself has multiple possible subclass implementations and consequently the compiler needs to insert a type guard. The mechanics of this are explained above in more detail under polymorphism.

Class hierarchy impact

Benchmark                                           Mode  Samples  Mean   Mean error  Units
c.i.j.JavaFinalBenchmark.parentMethod1              avgt  25       2.600  0.008       ns/op
c.i.j.JavaFinalBenchmark.parentMethod2              avgt  25       2.596  0.007       ns/op
c.i.j.JavaFinalBenchmark.parentMethod3              avgt  25       2.598  0.006       ns/op
c.i.j.JavaFinalBenchmark.parentMethod4              avgt  25       2.601  0.006       ns/op
c.i.j.JavaFinalBenchmark.inlinableParentMethod1     avgt  25       1.373  0.006       ns/op
c.i.j.JavaFinalBenchmark.inlinableParentMethod2     avgt  25       1.368  0.004       ns/op
c.i.j.JavaFinalBenchmark.inlinableParentMethod3     avgt  25       1.371  0.004       ns/op
c.i.j.JavaFinalBenchmark.inlinableParentMethod4     avgt  25       1.371  0.005       ns/op

Wow - that's a big block of methods! Each of the numbered method calls (1-4) refers to how deep up a class hierarchy a method was invoked upon. So parentMethod4 means we called a method declared on the 4th parent of the class. If you look at the numbers there is very little difference between 1 and 4, so we can conclude that hierarchy depth makes no difference. The inlineable cases all follow the same pattern: hierarchy depth makes no difference. Our inlineable method performance is comparable to inlinableAlwaysOverriddenMethod, but slower than inlinableVirtualInvoke. I would again put this down to the type guard being used. The JIT compiler can profile the methods to figure out only one is inlined, but it can't prove that this holds forever.

Class hierarchy impact on final methods

Benchmark                                                Mode  Samples  Mean   Mean error  Units
c.i.j.JavaFinalBenchmark.parentFinalMethod1              avgt  25       2.598  0.007       ns/op
c.i.j.JavaFinalBenchmark.parentFinalMethod2              avgt  25       2.596  0.007       ns/op
c.i.j.JavaFinalBenchmark.parentFinalMethod3              avgt  25       2.640  0.135       ns/op
c.i.j.JavaFinalBenchmark.parentFinalMethod4              avgt  25       2.601  0.009       ns/op
c.i.j.JavaFinalBenchmark.inlinableParentFinalMethod1     avgt  25       1.373  0.004       ns/op
c.i.j.JavaFinalBenchmark.inlinableParentFinalMethod2     avgt  25       1.375  0.016       ns/op
c.i.j.JavaFinalBenchmark.inlinableParentFinalMethod3     avgt  25       1.369  0.005       ns/op
c.i.j.JavaFinalBenchmark.inlinableParentFinalMethod4     avgt  25       1.371  0.003       ns/op

This follows the same pattern as above - the final keyword seems to make no difference. I would have thought it was possible here, theoretically, for inlinableParentFinalMethod4 to be proved inlineable with no type guard, but it doesn't appear to be the case.
Polymorphism

Monomorphic:           2.816 +- 0.056 ns/op
Bimorphic:             3.258 +- 0.195 ns/op
Megamorphic:           4.896 +- 0.017 ns/op
Inlinable monomorphic: 1.555 +- 0.007 ns/op
Inlinable bimorphic:   1.555 +- 0.004 ns/op
Inlinable megamorphic: 4.278 +- 0.013 ns/op

Finally we come to the case of polymorphic dispatch. Monomorphic call costs are roughly the same as our regular virtual invoke call costs above. As we need to do lookups on larger vtables, they become slower, as the bimorphic and megamorphic cases show. Once we enable inlining, the type profiling kicks in and our monomorphic and bimorphic callsites come down to the cost of our "inlined with guard" method calls. So similar to the class hierarchy cases, just a bit slower. The megamorphic case is still very slow. Remember that we've not told HotSpot to prevent inlining here; it just doesn't implement a polymorphic inline cache for callsites more complex than bimorphic.

What did we learn?

I think it's worth noting that there are plenty of people who don't have a performance mental model that accounts for different types of method calls taking different amounts of time, and plenty of people who understand they take different amounts of time but don't really have it quite right. I know I've been there before and made all sorts of bad assumptions. So I hope this investigation has been helpful to people. Here's a summary of claims I'm happy to stand by.

There is a big difference between the fastest and slowest types of method invocation.
In practice the addition or removal of the final keyword doesn't really impact performance, but, if you then go and refactor your hierarchy, things can start to slow down.
Deeper class hierarchies have no real influence on call performance.
Monomorphic calls are faster than bimorphic calls.
Bimorphic calls are faster than megamorphic calls.
The type guard that we see in the case of profile-ably, but not provably, monomorphic callsites does slow things down quite a bit over a provably monomorphic callsite.

I would say that the cost of the type guard is my personal "big revelation". It's something that I rarely see talked about and it is often dismissed as being irrelevant.

Caveats and further work

Of course this isn't a conclusive treatment of the topic area! This blog has just focussed on type-related factors surrounding method invoke performance. One factor I've not mentioned is the heuristics surrounding method inlining due to body size or call stack depth. If your method is too large it won't get inlined at all, and you'll still end up paying the cost of the method call. Yet another reason to write small, easy to read methods. I've not looked into how invoking over an interface affects any of these situations. If you've found this interesting then there's an investigation of invoke interface performance on the Mechanical Sympathy blog. One factor that we've completely ignored here is the impact of method inlining on other compiler optimisations. When compilers are performing optimisations which only look at one method (intra-procedural optimisation) they really want as much information as they can get in order to optimize effectively. The limitations of inlining can significantly reduce the scope that other optimisations have to work with. Another avenue would be tying the explanation right down to the assembly level to dive into more detail on the issue. Perhaps these are topics for a future blog post.
Thanks to Aleksey Shipilev for feedback on the benchmarks and to Martin Thompson, Aleksey, Martijn Verburg, Sadiq Jaffer and Chris West for the very helpful feedback on the blog post.
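For readers who have not used JMH before, a harness along the lines described above might look roughly like the following sketch; the class, method and field names are illustrative and not the actual benchmark source, which is linked on GitHub:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.CompilerControl;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MethodCallBenchmark {

    private int value = 42;

    // Kept as a real call: HotSpot is told not to inline it, so we measure
    // invocation cost rather than the cost of the inlined body.
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    private int nonInlinableTarget() {
        return value;
    }

    // Left to HotSpot's normal heuristics, so it will almost certainly be inlined.
    private int inlinableTarget() {
        return value;
    }

    @Benchmark
    public int nonInlinedCall() {
        return nonInlinableTarget();
    }

    @Benchmark
    public int inlinedCall() {
        return inlinableTarget();
    }
}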
May 15, 2014
by Richard Warburton
· 11,041 Views
article thumbnail
Java 8 Optional: How to Use it
Java 8 comes with a new Optional type, similar to what is available in other languages. This post will go over how this new type is meant to be used, namely what its main use case is.

What is the Optional type? Optional is a new container type that wraps a single value, if the value is available. So it's meant to convey the meaning that the value might be absent. Take for example this method:

public Optional<Customer> findCustomerWithSSN(String ssn) { ... }

Returning Optional explicitly adds the possibility that there might not be a customer for that given social security number. This means that the caller of the method is explicitly forced by the type system to think about and deal with the possibility that there might not be a customer with that SSN. The caller will have to do something like this:

Optional<Customer> optional = findCustomerWithSSN(ssn);
if (optional.isPresent()) {
    Customer customer = optional.get();
    ... use customer ...
} else {
    ... deal with absence case ...
}

Or, if applicable, provide a default value:

Long value = findOptionalLong(ssn).orElse(0L);

This use of Optional is somewhat similar to the more familiar case of throwing checked exceptions. By throwing a checked exception, we use the compiler to force callers of the API to somehow handle an exceptional case.

What is Optional trying to solve? Optional is an attempt to reduce the number of null pointer exceptions in Java systems, by adding the possibility to build more expressive APIs that account for the possibility that sometimes return values are missing. If Optional had been there since the beginning, most libraries and applications would likely deal better with missing return values, reducing the number of null pointer exceptions and the overall number of bugs in general.

What is Optional not trying to solve? Optional is not meant to be a mechanism to avoid all types of null pointers. The mandatory input parameters of methods and constructors still have to be tested, for example. As with null, Optional does not help with conveying the meaning of an absent value. In a similar way that null can mean many different things (value not found, etc.), so can an absent Optional value. The caller of the method will still have to check the javadoc of the method to understand the meaning of the absent Optional, in order to deal with it properly. Also, in a similar way that a checked exception can be caught in an empty block, nothing prevents the caller from calling get() and moving on.

What is wrong with just returning null? The problem is that the caller of the function might not have read the javadoc for the method, and might forget to handle the null case. This happens frequently and is one of the main causes of null pointer exceptions, although not the only one.

How should Optional be used then? Optional should be used mostly as the return type of functions that might not return a value. In the context of domain-driven development, this means certain service, repository or utility methods such as the one shown above.

How should Optional NOT be used? Optional is not meant to be used in these contexts, as it won't buy us anything:

in the domain model layer (not serializable)
in DTOs (same reason)
in input parameters of methods
in constructor parameters

How does Optional help with functional programming?
In chained function calls, Optional provides the method ifPresent(), which allows chaining functions that might not return values:

findCustomerWithSSN(ssn).ifPresent(customer -> System.out.println("customer exists!"));

Useful links:

This blog post from Oracle goes further into Optional and its uses, comparing it with similar functionality in other languages - Tired of Null Pointer Exceptions
This cheat sheet provides a thorough overview of Optional - Optional in Java 8 Cheat Sheet
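Beyond ifPresent(), Optional's map, filter and orElse methods compose nicely. A small self-contained sketch (the Customer type and the in-memory lookup are stand-ins invented for the example, not from the post):

import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class OptionalChaining {

    static class Customer {
        final String name;
        Customer(String name) { this.name = name; }
        String getName() { return name; }
    }

    private static final Map<String, Customer> CUSTOMERS = new HashMap<>();
    static {
        CUSTOMERS.put("123-45-6789", new Customer("Ada"));
    }

    // Returns an empty Optional when no customer matches the SSN.
    static Optional<Customer> findCustomerWithSSN(String ssn) {
        return Optional.ofNullable(CUSTOMERS.get(ssn));
    }

    public static void main(String[] args) {
        // map/orElse chain: extract a value or fall back to a default.
        String name = findCustomerWithSSN("123-45-6789")
                .map(Customer::getName)
                .orElse("unknown");
        System.out.println(name); // Ada

        // filter turns a present-but-unwanted value into an empty Optional.
        boolean longName = findCustomerWithSSN("123-45-6789")
                .map(Customer::getName)
                .filter(n -> n.length() > 10)
                .isPresent();
        System.out.println(longName); // false
    }
}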
May 12, 2014
by Vasco Cavalheiro
· 91,728 Views · 11 Likes
article thumbnail
Tracking Exceptions - Part 6 - Building an Executable Jar
If you’ve read the previous five blogs in this series, you’ll know that I’ve been building a Spring application that runs periodically to check a whole bunch of error logs for exceptions and then email you the results. Having written the code and the tests, and being fairly certain it’ll work, the next and final step is to package the whole thing up and deploy it to a production machine. The actual deployment and packaging methods will depend upon your own organisation's processes and procedures. In this example, however, I’m going to choose the simplest way possible to create and deploy an executable JAR file.

The first step was completed several weeks ago, and that’s defining our output as a JAR file in the Maven POM file, which, as you’ll probably already know, is done using the packaging element:

<packaging>jar</packaging>

It’s okay having a JAR file, but in this case there’s a further step involved: making it executable. To make a JAR file executable you need to add a MANIFEST.MF file and place it in a directory called META-INF. The manifest file describes the JAR file to both the JVM and human readers. As usual, there are a couple of ways of doing this. For example, if you wanted to make life difficult for yourself, you could hand-craft your own file and place it in the META-INF directory inside the project’s src/main/resources directory. On the other hand, you could use the maven-jar-plugin and do it automatically. To do that, you need to add the following to your POM file:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-jar-plugin</artifactId>
    <version>2.4</version>
    <configuration>
        <archive>
            <manifest>
                <addClasspath>true</addClasspath>
                <mainClass>com.captaindebug.errortrack.Main</mainClass>
                <classpathPrefix>lib/</classpathPrefix>
            </manifest>
        </archive>
    </configuration>
</plugin>

The interesting point here is the configuration element. It contains three sub-elements:

addClasspath: this means that the plug-in will add the classpath to the MANIFEST.MF file so that the JVM can find all the support jars when running the app.
mainClass: this tells the plug-in to add a Main-Class attribute to the MANIFEST.MF file, so that the JVM knows where to find the entry point to the application. In this case it’s com.captaindebug.errortrack.Main.
classpathPrefix: this is really useful. It allows you to locate all the support jars in a different directory to the main part of the application. In this case I’ve chosen the very simple and short name of lib.

If you run the build and then open up the resulting JAR file and extract and examine the /META-INF/MANIFEST.MF file, you’ll find something rather like this:

Manifest-Version: 1.0
Built-By: Roger
Build-Jdk: 1.7.0_09
Class-Path: lib/spring-context-3.2.7.RELEASE.jar lib/spring-aop-3.2.7.RELEASE.jar lib/aopalliance-1.0.jar lib/spring-beans-3.2.7.RELEASE.jar lib/spring-core-3.2.7.RELEASE.jar lib/spring-expression-3.2.7.RELEASE.jar lib/slf4j-api-1.6.6.jar lib/slf4j-log4j12-1.6.6.jar lib/log4j-1.2.16.jar lib/guava-13.0.1.jar lib/commons-lang3-3.1.jar lib/commons-logging-1.1.3.jar lib/spring-context-support-3.2.7.RELEASE.jar lib/spring-tx-3.2.7.RELEASE.jar lib/quartz-1.8.6.jar lib/mail-1.4.jar lib/activation-1.1.jar
Created-By: Apache Maven 3.0.4
Main-Class: com.captaindebug.errortrack.Main
Archiver-Version: Plexus Archiver

The last step is to marshall all the support jars into one directory, in this case the lib directory, so that the JVM can find them when you run the application. Again, there are two ways of approaching this: the easy way and the hard way. The hard way involves manually collecting together all the JAR files as defined by the POM (both direct and transitive dependencies) and copying them to an output directory.
The easy way involves getting the maven-dependency-plugin to do it for you by adding the following to your POM file:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-dependency-plugin</artifactId>
    <version>2.5.1</version>
    <executions>
        <execution>
            <id>copy-dependencies</id>
            <phase>package</phase>
            <goals>
                <goal>copy-dependencies</goal>
            </goals>
            <configuration>
                <outputDirectory>${project.build.directory}/lib/</outputDirectory>
            </configuration>
        </execution>
    </executions>
</plugin>

In this case you’re using the copy-dependencies goal, executed in the package phase, to copy all the project dependencies to the ${project.build.directory}/lib/ directory - note that the final part of the directory path, lib, matches the classpathPrefix setting from the previous step. In order to make life easier, I’ve also created a small run script, runme.sh:

#!/bin/bash
echo Running Error Tracking...
java -jar error-track-1.0-SNAPSHOT.jar com.captaindebug.errortrack.Main

And that’s about it. The application is just about complete. I’ve copied it to my build machine, where it now monitors the Captain Debug Github sample apps and build. I could, and indeed may, add a few more features to the app. There are a few rough edges that need knocking off the code: for example, is it best to run it as a separate app, or would it be a better idea to turn it into a web app? Furthermore, wouldn’t it be a good idea to ensure that the same errors aren’t reported twice? I may get around to that soon... or maybe I'll talk about something else; so much to blog about, so little time...

The code for this blog is available on Github at: https://github.com/roghughe/captaindebug/tree/master/error-track.

If you want to look at other blogs in this series take a look here:

Tracking Application Exceptions With Spring
Tracking Exceptions With Spring - Part 2 - Delegate Pattern
Error Tracking Reports - Part 3 - Strategy and Package Private
Tracking Exceptions - Part 4 - Spring's Mail Sender
Tracking Exceptions - Part 5 - Scheduling With Spring
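To sanity-check the packaged JAR without unzipping it, a small sketch like the following can read the generated manifest and print the Main-Class and Class-Path entries (the JAR path is an assumption based on the artifact name used in runme.sh):

import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class ManifestChecker {
    public static void main(String[] args) throws Exception {
        try (JarFile jar = new JarFile("target/error-track-1.0-SNAPSHOT.jar")) {
            Manifest manifest = jar.getManifest();
            Attributes attributes = manifest.getMainAttributes();
            // These are the attributes the maven-jar-plugin configuration above generates.
            System.out.println("Main-Class: " + attributes.getValue(Attributes.Name.MAIN_CLASS));
            System.out.println("Class-Path: " + attributes.getValue(Attributes.Name.CLASS_PATH));
        }
    }
}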
May 9, 2014
by Roger Hughes
· 9,883 Views
article thumbnail
Understanding the Cloud Foundry Java Buildpack Code with Tomcat Example
Cloud Foundry's Java buildpack supports several popular JVM-based applications. This article is aimed at readers who already have experience with Cloud Foundry/Heroku buildpacks and who want a better understanding of how a buildpack and Cloud Foundry work internally.

cf push app -p app.war -b build-pack-url

The above command demonstrates pushing a war file to Cloud Foundry using a custom buildpack (e.g. https://github.com/cloudfoundry/java-buildpack). However, what exactly happens inside, and how does Cloud Foundry bootstrap the war file with Tomcat? There are three contract phases that bridge communication between the buildpack and Cloud Foundry: detect, compile and release, implemented as three Ruby scripts. The Java buildpack has multiple sub-components, and each of them implements all three phases (e.g. Tomcat is one of the sub-components, and it contains another layer of sub-components).

Detect Phase: the detect phase checks whether a particular buildpack/component applies to the deployed application. Taking the war file example, Tomcat applies only when the supports? check in https://github.com/cloudfoundry/java-buildpack/blob/master/lib/java_buildpack/container/tomcat.rb is true:

def supports?
  web_inf? && !JavaBuildpack::Util::JavaMainUtils.main_class(@application)
end

The above code means that Tomcat applies when the application has a WEB-INF folder and this is not a main-class bootstrapped application.

Compile Phase: the compile phase is where the major, comprehensive work of a customized buildpack happens; it builds a file system on an LXC container. Taking our war application and Tomcat example, in https://github.com/cloudfoundry/java-buildpack/blob/master/lib/java_buildpack/container/tomcat/tomcat_instance.rb:

def compile
  download(@version, @uri) { |file| expand file }
  link_to(@application.root.children, root)
  @droplet.additional_libraries << tomcat_datasource_jar if tomcat_datasource_jar.exist?
  @droplet.additional_libraries.link_to web_inf_lib
end

def expand(file)
  with_timing "Expanding Tomcat to #{@droplet.sandbox.relative_path_from(@droplet.root)}" do
    FileUtils.mkdir_p @droplet.sandbox
    shell "tar xzf #{file.path} -C #{@droplet.sandbox} --strip 1 --exclude webapps 2>&1"
    @droplet.copy_resources
  end
end

The above code is all about preparing Tomcat and linking the application files so that they are available on the Tomcat classpath. Before going into the code, we have to understand the working directory when it executes:

. => working directory
.app => @application, contains the extracted war archive
.buildpack/tomcat => @droplet.sandbox
.buildpack/jdk
.buildpack/other needed components

Inside the compile method:

The download method downloads the Tomcat binary (version specified here: https://github.com/cloudfoundry/java-buildpack/blob/master/config/tomcat.yml) and then extracts the archive into the @droplet.sandbox directory.
It then copies the files from https://github.com/cloudfoundry/java-buildpack/tree/master/resources/tomcat/conf to @droplet.sandbox/conf.
It symlinks @droplet.sandbox/webapps/ROOT to .app/.
It symlinks additional libraries (coming from other components rather than the application) into WEB-INF/lib.

Note: all the symlinks use relative paths, since the absolute paths will differ once the container is deployed to a DEA.

Release Phase: the release phase sets up the instructions for how to start Tomcat.
Look at the code in https://github.com/cloudfoundry/java-buildpack/blob/master/lib/java_buildpack/container/tomcat.rb:

def command
  @droplet.java_opts.add_system_property 'http.port', '$PORT'
  [
    @droplet.java_home.as_env_var,
    @droplet.java_opts.as_env_var,
    "$PWD/#{(@droplet.sandbox + 'bin/catalina.sh').relative_path_from(@droplet.root)}",
    'run'
  ].flatten.compact.join(' ')
end

The above code does two things:

It adds the Java system property http.port (referenced in Tomcat's server.xml), set to the environment property $PORT; this is the port on the DEA bridging to the LXC container, already set up when the container was provisioned.
It builds the instruction for how to run Tomcat, e.g. "./bin/catalina.sh run".
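On the application side, a port set this way can be read like any other system property. A tiny illustrative sketch (not part of the buildpack; the 8080 fallback is only for local runs):

public class PortResolver {
    public static void main(String[] args) {
        // The buildpack's release command passes -Dhttp.port=$PORT to the JVM.
        int port = Integer.getInteger("http.port", 8080);
        System.out.println("Binding HTTP listener to port " + port);
    }
}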
May 9, 2014
by Shaozhen Ding
· 22,835 Views · 1 Like
article thumbnail
Simple Binary Encoding
Financial systems communicate by sending and receiving vast numbers of messages in many different formats. When people use terms like "vast" I normally think, "really... how many?" So let's quantify "vast" for the finance industry. Market data feeds from financial exchanges can typically emit tens or hundreds of thousands of messages per second, and aggregate feeds like OPRA can peak at over 10 million messages per second, with volumes growing year-on-year. This presentation gives a good overview. In this crazy world we still see significant use of ASCII encoded presentations, such as FIX tag value, and some slightly more sane binary encoded presentations like FAST. Some markets even commit the sin of sending out market data as XML! Well, I cannot complain too much, as they have at times provided me a good income writing ultra fast XML parsers.

Last year the CME, who are a member of the FIX community, commissioned Todd Montgomery, of 29West LBM fame, and myself to build the reference implementation of the new FIX Simple Binary Encoding (SBE) standard. SBE is a codec aimed at addressing the efficiency issues in low-latency trading, with a specific focus on market data. The CME, working within the FIX community, have done a great job of coming up with an encoding presentation that can be so efficient. Maybe a suitable atonement for the sins of past FIX tag value implementations. Todd and I worked on the Java and C++ implementations, and later we were helped on the .Net side by the amazing Olivier Deheurles at Adaptive. Working on a cool technical problem with such a team is a dream job.

SBE Overview: SBE is an OSI layer 6 presentation for encoding/decoding messages in binary format to support low-latency applications. Of the many applications I profile with performance issues, message encoding/decoding is often the most significant cost. I've seen many applications that spend significantly more CPU time parsing and transforming XML and JSON than executing business logic. SBE is designed to make this part of a system the most efficient it can be. SBE follows a number of design principles to achieve this goal. Adhering to these design principles sometimes means that features available in other codecs are not offered. For example, many codecs allow strings to be encoded at any field position in a message; SBE only allows variable length fields, such as strings, grouped at the end of a message.

The SBE reference implementation consists of a compiler that takes a message schema as input and then generates language specific stubs. The stubs are used to directly encode and decode messages from buffers. The SBE tool can also generate a binary representation of the schema that can be used for the on-the-fly decoding of messages in a dynamic environment, such as for a log viewer or network sniffer.

The design principles drive the implementation of a codec that ensures messages are streamed through memory without backtracking, copying, or unnecessary allocation. Memory access patterns should not be underestimated in the design of a high-performance application. Low-latency systems in any language especially need to consider all allocation to avoid the resulting issues in reclamation. This applies to both managed runtime and native languages. SBE is totally allocation free in all three language implementations. The end result of applying these design principles is a codec that has ~25X greater throughput than Google Protocol Buffers (GPB) with very low and predictable latency.
This has been observed in micro-benchmarks and real-world application use. A typical market data message can be encoded, or decoded, in ~25ns compared to ~1000ns for the same message with GPB on the same hardware. XML and FIX tag value messages are orders of magnitude slower again. The sweet spot for SBE is as a codec for structured data that is mostly fixed-size fields which are numbers, bitsets, enums, and arrays. While it does work for strings and blobs, many may find some of the restrictions a usability issue. Those users would be better off with another codec more suited to string encoding.

Message Structure: A message must be capable of being read or written sequentially to preserve the streaming access design principle, i.e. with no need to backtrack. Some codecs insert location pointers for variable length fields, such as string types, that have to be indirected for access. This indirection comes at a cost of extra instructions plus losing the support of the hardware prefetchers. SBE's design allows for pure sequential access and copy-free native access semantics.

Figure 1

SBE messages have a common header that identifies the type and version of the message body to follow. The header is followed by the root fields of the message, which are all fixed length with static offsets. The root fields are very similar to a struct in C. If the message is more complex, then one or more repeating groups similar to the root block can follow. Repeating groups can nest other repeating group structures. Finally, variable length strings and blobs come at the end of the message. Fields may also be optional. The XML schema describing the SBE presentation can be found here.

SbeTool and the Compiler: To use SBE it is first necessary to define a schema for your messages. SBE provides a language independent type system supporting integers, floating point numbers, characters, arrays, constants, enums, bitsets, composites, grouped structures that repeat, and variable length strings and blobs. A message schema can be input into the SbeTool and compiled to produce stubs in a range of languages, or to generate binary metadata suitable for decoding messages on-the-fly.

java [-Doption=value] -jar sbe.jar

SbeTool and the compiler are written in Java. The tool can currently output stubs in Java, C++, and C#.

Programming with Stubs: A full example of messages defined in a schema with supporting code can be found here. The generated stubs follow a flyweight pattern with instances reused to avoid allocation. The stubs wrap a buffer at an offset and then read it sequentially and natively.

// Write the message header first
MESSAGE_HEADER.wrap(directBuffer, bufferOffset, messageTemplateVersion)
    .blockLength(CAR.sbeBlockLength())
    .templateId(CAR.sbeTemplateId())
    .schemaId(CAR.sbeSchemaId())
    .version(CAR.sbeSchemaVersion());

// Then write the body of the message
car.wrapForEncode(directBuffer, bufferOffset)
    .serialNumber(1234)
    .modelYear(2013)
    .available(BooleanType.TRUE)
    .code(Model.A)
    .putVehicleCode(VEHICLE_CODE, srcOffset);

Messages can be written via the generated stubs in a fluent manner. Each field appears as a generated pair of methods to encode and decode.
// Read the header and look up the appropriate template to decode
MESSAGE_HEADER.wrap(directBuffer, bufferOffset, messageTemplateVersion);
final int templateId = MESSAGE_HEADER.templateId();
final int actingBlockLength = MESSAGE_HEADER.blockLength();
final int schemaId = MESSAGE_HEADER.schemaId();
final int actingVersion = MESSAGE_HEADER.version();

// Once the template is located then the fields can be decoded.
car.wrapForDecode(directBuffer, bufferOffset, actingBlockLength, actingVersion);

final StringBuilder sb = new StringBuilder();
sb.append("\ncar.templateId=").append(car.sbeTemplateId());
sb.append("\ncar.schemaId=").append(schemaId);
sb.append("\ncar.schemaVersion=").append(car.sbeSchemaVersion());
sb.append("\ncar.serialNumber=").append(car.serialNumber());
sb.append("\ncar.modelYear=").append(car.modelYear());
sb.append("\ncar.available=").append(car.available());
sb.append("\ncar.code=").append(car.code());

The generated code in all languages gives performance similar to casting a C struct over the memory.

On-The-Fly Decoding: The compiler produces an intermediate representation (IR) for the input XML message schema. This IR can be serialised in the SBE binary format to be used for later on-the-fly decoding of messages that have been stored. It is also useful for tools, such as a network sniffer, that will not have been compiled with the stubs. A full example of the IR being used can be found here.

Direct Buffers: SBE provides an abstraction to Java, via the DirectBuffer class, to work with buffers that are byte[], heap or direct ByteBuffer buffers, and off-heap memory addresses returned from Unsafe.allocateMemory(long) or JNI. In low-latency applications, messages are often encoded/decoded in memory-mapped files via MappedByteBuffer and can thus be transferred to a network channel by the kernel, avoiding user-space copies. C++ and C# have built-in support for direct memory access and do not require such an abstraction as the Java version does. A DirectBuffer abstraction was added for C# to support endianness and encapsulate the unsafe pointer access.

Message Extension and Versioning: SBE schemas carry a version number that allows for message extension. A message can be extended by adding fields at the end of a block. Fields cannot be removed or reordered, for backwards compatibility. Extension fields must be optional, otherwise a newer template reading an older message would not work. Templates carry metadata for min, max, null, timeunit, character encoding, etc.; these are accessible via static (class level) methods on the stubs.

Byte Ordering and Alignment: The message schema allows for precise alignment of fields by specifying offsets. Fields are by default encoded in LittleEndian form unless otherwise specified in a schema. For maximum performance, native encoding with fields on word-aligned boundaries should be used. The penalty for accessing non-aligned fields on some processors can be very significant. For alignment one must consider the framing protocol and buffer locations in memory.

Message Protocols: I often see people complain that a codec cannot support a particular presentation in a single message. However, this is often possible to address with a protocol of messages. Protocols are a great way to split an interaction into its component parts; these parts are then often composable for many interactions between systems. For example, the IR implementation of schema metadata is more complex than can be supported by the structure of a single message.
We encode IR by first sending a template message providing an overview, followed by a stream of messages, each encoding the tokens from the compiler IR. This allows for the design of a very fast OTF decoder which can be implemented as a threaded interpreter with much less branching than the typical switch-based state machines. Protocol design is an area that most developers don't seem to get an opportunity to learn. I feel this is a great loss. The fact that so many developers will call an "encoding" such as ASCII a "protocol" is very telling. The value of protocols is so obvious when one gets to work with a programmer like Todd who has spent his life successfully designing protocols.

Stub Performance: The stubs provide a significant performance advantage over dynamic OTF decoding. For accessing primitive fields we believe the performance is reaching the limits of what is possible from a general purpose tool. The generated assembly code is very similar to what a compiler will generate for accessing a C struct, even from Java! Regarding the general performance of the stubs, we have observed that C++ has a very marginal advantage over the Java, which we believe is due to runtime-inserted safepoint checks. The C# version lags a little further behind due to its runtime not being as aggressive with inlining methods as the Java runtime. Stubs for all three languages are capable of encoding or decoding typical financial messages in tens of nanoseconds. This effectively makes the encoding and decoding of messages almost free for most applications relative to the rest of the application logic.

Feedback: This is the first version of SBE and we would welcome feedback. The reference implementation is constrained by the FIX community specification. It is possible to influence the specification, but please don't expect pull requests to be accepted that significantly go against the specification. Support for JavaScript, Python, Erlang, and other languages has been discussed and would be very welcome.
May 8, 2014
by Martin Thompson
· 18,237 Views · 1 Like
article thumbnail
Hibernate Debugging - Finding the origin of a Query
It's not always immediately obvious why, and in which part of the program, Hibernate is generating a given SQL query, especially if we are dealing with code that we did not write ourselves. This post will go over how to configure Hibernate query logging, and how to use that together with a few other tricks to find out why and where in the program a given query is being executed.

What does the Hibernate query log look like? Hibernate has built-in query logging that looks like this:

select /* load your.package.Employee */ this_.code, ... from employee this_ where this_.employee_id=?
TRACE 12-04-2014@16:06:02 BasicBinder - binding parameter [1] as [NUMBER] - 1000

Why can't Hibernate log the actual query? Notice that what is logged by Hibernate is the prepared statement sent by Hibernate to the JDBC driver plus its parameters. The prepared statement has ? in the place of the query parameters, and the parameter values themselves are logged just below the prepared statement. This is not the same as the actual query sent to the database, as there is no way for Hibernate to log the actual query. The reason for this is that Hibernate only knows about the prepared statements and the parameters that it sends to the JDBC driver, and it's the driver that will build the actual queries and then send them to the database. In order to produce a log with the real queries, a tool like log4jdbc is needed, which will be the subject of another post.

How to find out the origin of the query: The logged query above contains a comment that allows us to identify, in most cases, the origin of the query: if the query is due to a load by ID the comment is /* load your.entity.Name */; if it's a named query then the comment will contain the name of the query; if it's a one-to-many lazy initialization the comment will contain the name of the class and the property that triggered it, etc. In many cases the query comment created by Hibernate is enough to identify the origin of the query.

Setting up the Hibernate query log: In order to obtain a query log, the following flags need to be set in the configuration of the session factory:

<prop key="hibernate.show_sql">true</prop>
<prop key="hibernate.format_sql">true</prop>
<prop key="hibernate.use_sql_comments">true</prop>

The example above is for Spring configuration of an entity manager factory. This is the meaning of the flags:

show_sql enables query logging
format_sql pretty prints the SQL
use_sql_comments adds an explanatory comment

In order to log the query parameters, the org.hibernate.type logger needs to be set to TRACE in log4j or an equivalent logging configuration.

If everything else fails: If the query comment added by the option use_sql_comments is not sufficient, then we can start by identifying the entity returned by the query based on the table names involved, and put a breakpoint in the constructor of the returned entity. If the entity does not have a constructor, then we can create one and put the breakpoint in the call to super():

@Entity
public class Employee {
    public Employee() {
        super(); // put the breakpoint here
    }
    ...
}

When the breakpoint is hit, go to the IDE debug view containing the call stack of the program and go through it from top to bottom. The place where the query was made in the program will be there in the call stack.
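If you are bootstrapping JPA without Spring, the same flags can be passed programmatically. A minimal sketch, assuming a persistence unit named "example-pu" (the name is a placeholder, not from the post):

import java.util.HashMap;
import java.util.Map;

import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class QueryLogBootstrap {
    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("hibernate.show_sql", "true");          // log the prepared statements
        props.put("hibernate.format_sql", "true");        // pretty print them
        props.put("hibernate.use_sql_comments", "true");  // add the origin comment

        EntityManagerFactory emf =
                Persistence.createEntityManagerFactory("example-pu", props);
        // ... create an EntityManager and run queries; the SQL and its comments
        // will show up in the configured logger.
        emf.close();
    }
}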
May 8, 2014
by Vasco Cavalheiro
· 43,341 Views · 3 Likes