The Latest Coding Topics

All About Java Modifier Keywords
I've been a Java programmer for a while now; however, recently someone asked me a question about one of the Java modifier keywords and I had no clue what it was. This made it obvious to me that I needed to brush up on some Java that goes beyond actual coding and algorithms. After a few Google searches, I got bits and pieces on the topic, but never really the full story, so I'm using this post as a way to document the subject. This is a great interview question to test your computer science book-smarts.

Modifiers in Java are keywords that you add to variables, classes, and methods in order to change their meaning. They can be broken into two groups:

- Access control modifiers
- Non-access modifiers

Let's first take a look at the access control modifiers and see some code examples on how to use them.

- public: visible to the world
- private: visible to the class
- protected: visible to the package and all subclasses

So how do you use these three access control modifiers? Let's take the following two classes. Please ignore how inefficient they may or may not be, as that is beside the point for this tutorial.

Create a file called project/mypackage/Person.java and add the following code:

```java
package mypackage;

class Person {

    private String firstname;
    private String lastname;

    protected void setFirstname(String firstname) {
        this.firstname = firstname;
    }

    protected void setLastname(String lastname) {
        this.lastname = lastname;
    }

    protected String getFirstname() {
        return this.firstname;
    }

    protected String getLastname() {
        return this.lastname;
    }

}
```

The above Person class has private variables and protected methods. This means that the variables are only accessible from within the class, and the methods are only accessible from the mypackage package and from subclasses.

Next create a file called project/mypackage/Company.java and add the following code:

```java
package mypackage;

import java.util.*;

public class Company {

    private ArrayList<Person> people;

    public Company() {
        this.people = new ArrayList<Person>();
    }

    public void addPerson(String firstname, String lastname) {
        Person p = new Person();
        p.setFirstname(firstname);
        p.setLastname(lastname);
        this.people.add(p);
    }

    public void printPeople() {
        for (int i = 0; i < this.people.size(); i++) {
            System.out.println(this.people.get(i).getFirstname() + " " + this.people.get(i).getLastname());
        }
    }

}
```

The above class is public, so it can be accessed from any future classes inside and outside of the package. It has a private variable that is only accessible from within the class, and it has a bunch of public methods. Because the Person class and the Company class share the same package, the Company class can access the Person class as well as all its methods.

To complete the demonstration of the access control modifiers, let's create a driver class in a new project/MainDriver.java file:

```java
import mypackage.*;

public class MainDriver {

    public static void main(String[] args) {
        Company c = new Company();
        c.addPerson("Nic", "Raboy");
        c.printPeople();
        Person p = new Person();
        p.setFirstname("Maria");
        p.setLastname("Campos");
    }

}
```

Remember, because the Company class is public, we won't have issues adding and printing people. However, because the Person class is package-private (it has no access modifier) and its methods are protected, we're going to get a compile-time error, since MainDriver is not part of the mypackage package.

Now let's take a look at the available non-access modifiers and some example code on how to use them.

- static: used for creating class methods and variables
- final: used for finalizing implementations of classes, variables, and methods
- abstract: used for creating abstract methods and classes
- synchronized: used in threads; locks the method or variable so it can only be used by one thread at a time
- volatile: used in threads; keeps the variable in main memory rather than caching it locally in each thread

So how do you use these five non-access modifiers? A good example of the static modifier is the following in Java:

```java
int max = Integer.MAX_VALUE;
int numeric = Integer.parseInt("1234");
```

Notice in the above example we make use of variables and methods in the Integer class without first instantiating it. This is because those particular methods and variables are static.

The abstract modifier is a little different. You can create a class with methods, but they are essentially nothing more than definitions; you cannot add logic to them. For example:

```java
abstract class Shape {
    abstract int getArea(int width, int height);
}
```

Then inside a child class you would add code similar to this:

```java
class Rectangle extends Shape {
    int getArea(int width, int height) {
        return width * height;
    }
}
```

This brings us to the synchronized and volatile modifiers. Let's take a look at a threading example where we try to access the same method from two different threads:

```java
import java.lang.*;

public class ThreadExample {

    public static void main(String[] args) {
        Thread thread1 = new Thread(new Runnable() {
            public void run() {
                print("THREAD 1");
            }
        });
        Thread thread2 = new Thread(new Runnable() {
            public void run() {
                print("THREAD 2");
            }
        });
        thread1.start();
        thread2.start();
    }

    public static void print(String s) {
        for (int i = 0; i < 5; i++) {
            System.out.println(s + ": " + i);
        }
    }

}
```

Running the above code will result in output that is printed in a random order. It could be sequential, or not; it depends on the CPU. However, if we make use of the synchronized modifier, the first thread must complete before the second one can start printing. The print(String s) method will now look like this:

```java
public static synchronized void print(String s) {
    for (int i = 0; i < 5; i++) {
        System.out.println(s + ": " + i);
    }
}
```

Next let's take a look at an example using the volatile modifier:

```java
import java.lang.*;

public class ThreadExample {

    public static volatile boolean isActive;

    public static void main(String[] args) {
        isActive = true;
        Thread thread1 = new Thread(new Runnable() {
            public void run() {
                while (true) {
                    if (isActive) {
                        System.out.println("THREAD 1");
                        isActive = false;
                    }
                }
            }
        });
        Thread thread2 = new Thread(new Runnable() {
            public void run() {
                while (true) {
                    if (!isActive) {
                        System.out.println("THREAD 2");
                        try {
                            Thread.sleep(100);
                        } catch (Exception e) { }
                        isActive = true;
                    }
                }
            }
        });
        thread1.start();
        thread2.start();
    }

}
```

Running the above code will print the thread number and alternate between the two threads, because our volatile variable is a status flag that is kept in main memory. If we remove the volatile keyword, the threads will only alternate once, because each thread may work with its own locally cached copy of the flag and the two threads' updates are essentially hidden from each other.

Conclusion

Java modifiers can be a bit tricky to understand, and it is actually common for programmers to be unfamiliar with a lot of them. This is a great interview question to test your book knowledge, too. If I've missed any, or you think my explanations could be better, definitely share in the comments section.
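One keyword from the non-access list that didn't get its own snippet above is final. As a minimal illustrative sketch (the class and field names here are made up for the example), final prevents subclassing, overriding, and reassignment:

```java
// A final class cannot be extended
final class MathConstants {
    // A final variable must be assigned exactly once and cannot be reassigned
    static final double PI_APPROX = 3.14159;
}

class Greeter {
    // A final method cannot be overridden in a subclass
    final String greet(String name) {
        return "Hello, " + name;
    }
}

// class ExtendedConstants extends MathConstants {} // compile-time error: cannot inherit from a final class
```

Trying to extend MathConstants, override greet, or assign a new value to PI_APPROX all fail at compile time.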
May 15, 2015
by Nic Raboy
· 12,767 Views
Log Collection With Graylog on AWS
Log collection is essential to properly analyzing issues in production. An interface to search and be notified about exceptions on all your servers is a must. Well, if you have one server, you can easily ssh to it and check the logs, of course, but for larger deployments, collecting logs centrally is far preferable to logging into 10 machines in order to find "what happened". There are many options to do that, roughly separated into two groups: 3rd-party services and software you install yourself.

3rd-party (or "cloud-based", if you want) log collection services include Splunk, Loggly, Papertrail and Sumologic. They are very easy to set up and you pay for what you use. Basically, you send each message (e.g. via a custom logback appender) to a provider's endpoint and then use the dashboard to analyze the data. In many cases that would be the preferred way to go. In other cases, however, company policy may frown upon using 3rd-party services to store company-specific data, or the additional costs may be undesired. In these cases extra effort needs to be put into installing and managing internal log collection software. They work in a similar way, but implementation details may differ (e.g. instead of sending messages with an appender to a target endpoint, the software collects local logs using some sort of agent and aggregates them). Open-source options include Graylog, FluentD, Flume and Logstash. After some very quick research, I decided Graylog fit our needs best, so below is a description of the installation procedure on AWS (though the first part applies regardless of the infrastructure).

The first thing to look at is the set of ready-to-use images provided by Graylog, including Docker, OpenStack, Vagrant and AWS. Unfortunately, the AWS version has two drawbacks. The first is that it uses Ubuntu rather than the Amazon AMI. That's not a huge issue, although some generic scripts you use in your stack may have to be rewritten. The other was the dealbreaker: when you start it, it doesn't run a web interface, although it claims it should. Only mongodb, elasticsearch and graylog-server are started. Having two instances, one for the web interface and one for the rest, would complicate things, so I opted for manual installation.

Graylog has two components: the server, which handles the input, indexing and searching, and the web interface, which is a nice UI that communicates with the server. The web interface uses mongodb for metadata, and the server uses elasticsearch to store the incoming logs. Below is a bash script (CentOS) that handles the installation. Note that there is no "sudo", because initialization scripts are executed as root on AWS.

```bash
#!/bin/bash

# install pwgen for password-generation
yum upgrade ca-certificates --enablerepo=epel
yum --enablerepo=epel -y install pwgen

# mongodb
cat >/etc/yum.repos.d/mongodb-org.repo <<'EOT'
[mongodb-org]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
gpgcheck=0
enabled=1
EOT

yum -y install mongodb-org
chkconfig mongod on
service mongod start

# elasticsearch
rpm --import https://packages.elasticsearch.org/GPG-KEY-elasticsearch
cat >/etc/yum.repos.d/elasticsearch.repo <<'EOT'
[elasticsearch-1.4]
name=Elasticsearch repository for 1.4.x packages
baseurl=http://packages.elasticsearch.org/elasticsearch/1.4/centos
gpgcheck=1
gpgkey=http://packages.elasticsearch.org/GPG-KEY-elasticsearch
enabled=1
EOT

yum -y install elasticsearch
chkconfig --add elasticsearch

# configure elasticsearch
sed -i -- 's/#cluster.name: elasticsearch/cluster.name: graylog2/g' /etc/elasticsearch/elasticsearch.yml
sed -i -- 's/#network.bind_host: localhost/network.bind_host: localhost/g' /etc/elasticsearch/elasticsearch.yml
service elasticsearch stop
service elasticsearch start

# java
yum -y update
yum -y install java-1.7.0-openjdk
update-alternatives --set java /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java

# graylog
wget https://packages.graylog2.org/releases/graylog2-server/graylog-1.0.1.tgz
tar xvzf graylog-1.0.1.tgz -C /opt/
mv /opt/graylog-1.0.1/ /opt/graylog/
cp /opt/graylog/bin/graylogctl /etc/init.d/graylog
sed -i -e 's/GRAYLOG2_SERVER_JAR=\${GRAYLOG2_SERVER_JAR:=graylog.jar}/GRAYLOG2_SERVER_JAR=\${GRAYLOG2_SERVER_JAR:=\/opt\/graylog\/graylog.jar}/' /etc/init.d/graylog
sed -i -e 's/LOG_FILE=\${LOG_FILE:=log\/graylog-server.log}/LOG_FILE=\${LOG_FILE:=\/var\/log\/graylog-server.log}/' /etc/init.d/graylog
cat >/etc/init.d/graylog <<'EOT'
#!/bin/bash
# chkconfig: 345 90 60
# description: graylog control
sh /opt/graylog/bin/graylogctl $1
EOT
chkconfig --add graylog
chkconfig graylog on
chmod +x /etc/init.d/graylog

# graylog web
wget https://packages.graylog2.org/releases/graylog2-web-interface/graylog-web-interface-1.0.1.tgz
tar xvzf graylog-web-interface-1.0.1.tgz -C /opt/
mv /opt/graylog-web-interface-1.0.1/ /opt/graylog-web/
cat >/etc/init.d/graylog-web <<'EOT'
#!/bin/bash
# chkconfig: 345 91 61
# description: graylog web interface
sh /opt/graylog-web/bin/graylog-web-interface > /dev/null 2>&1 &
EOT
chkconfig --add graylog-web
chkconfig graylog-web on
chmod +x /etc/init.d/graylog-web

# configure
mkdir --parents /etc/graylog/server/
cp /opt/graylog/graylog.conf.example /etc/graylog/server/server.conf
sed -i -e 's/password_secret =.*/password_secret = '$(pwgen -s 96 1)'/' /etc/graylog/server/server.conf
sed -i -e 's/root_password_sha2 =.*/root_password_sha2 = '$(echo -n password | shasum -a 256 | awk '{print $1}')'/' /etc/graylog/server/server.conf
sed -i -e 's/application.secret=""/application.secret="'$(pwgen -s 96 1)'"/g' /opt/graylog-web/conf/graylog-web-interface.conf
sed -i -e 's/graylog2-server.uris=""/graylog2-server.uris="http:\/\/127.0.0.1:12900\/"/g' /opt/graylog-web/conf/graylog-web-interface.conf

service graylog start
sleep 30
service graylog-web start
```

You may also want to set a TTL (auto-expiration) for messages, so that you don't store old logs forever. Here's how:

```bash
# wait for the index to be created
INDEXES=$(curl --silent "http://localhost:9200/_cat/indices")
until [[ "$INDEXES" =~ "graylog2_0" ]]; do
    sleep 5
    echo "Index not yet created. Indexes: $INDEXES"
    INDEXES=$(curl --silent "http://localhost:9200/_cat/indices")
done

# set each indexed message auto-expiration (ttl)
curl -XPUT "http://localhost:9200/graylog2_0/message/_mapping" -d '{"message": {"_ttl" : { "enabled" : true, "default" : "15d" }}}'
```

Now you have everything running on the instance. Then you have to do some AWS-specific things (if using CloudFormation, that would include a pile of JSON). Here's the list:

- You can either have an auto-scaling group with one instance, or a single instance. I prefer the ASG, though the other one is a bit simpler. The ASG gives you auto-respawn if the instance dies.
- Set the above script to be invoked in the UserData of the launch configuration of the instance/ASG (e.g. by getting it from S3 first).
- Allow UDP port 12201 (the default logging port). That should happen for the instance/ASG security group (inbound), for the application nodes' security group (outbound), and also in a network ACL of your VPC. Test the UDP connection to make sure it really goes through. Keep access restricted for all sources except your instances.
- You need to pass the private IP address of your Graylog server instance to all the application nodes. That's tricky on AWS, as private IP addresses change, so you need something stable. You can't use an ELB (load balancer), because it doesn't support UDP. There are two options:
  - Associate an Elastic IP with the node on startup and pass that IP to the application nodes. But there's a catch: if they connect to the Elastic IP, traffic would go via NAT (if you have one), and you may have to open your instance "to the world". So you must turn the Elastic IP into its corresponding public DNS; the DNS will then be resolved to the private IP. You can do that manually (and hackily):

```bash
GRAYLOG_ADDRESS="ec2-${GRAYLOG_ADDRESS//./-}.us-west-1.compute.amazonaws.com"
```

  or you can use the AWS EC2 CLI to obtain the details of the instance that the Elastic IP is associated with, and then with another call obtain its public DNS.
  - Instead of using an Elastic IP, which limits you to a single instance, you can use Route53 (the AWS DNS manager). That way, when a Graylog server instance starts, it can append itself to a Route53 record, allowing for round-robin DNS of multiple Graylog instances in a cluster. Manipulating the Route53 records is again done via the AWS CLI. Then you just pass the domain name to the application nodes so that they can send messages.
- Alternatively, you can install graylog-server on all the nodes (as an agent) and point them to an elasticsearch cluster. But that's more complicated and probably not the intended way to do it.
- Configure your logging framework to send messages to Graylog. There are standard GELF (the Graylog format) appenders, e.g. this one, and the only thing you have to do is use the public DNS environment variable in the logback.xml (which supports environment variable resolution).
- You should make the web interface accessible outside the network, so you can use an ELB for that, or the round-robin DNS mentioned above. Just make sure the security rules are tight and do not allow external tampering with your log data.
- If you are not running a Graylog cluster (which I won't cover), then the single instance can potentially fail. That isn't a great loss, as log messages can be obtained from the instances, and they are short-lived anyway. But the metadata of the web interface is important: dashboards, alerts, etc. So it's good to do regular backups (e.g. with mongodump). Using an EBS volume is also an option.
- Even though you send your log messages to the centralized log collector, it's a good idea to also keep local logs, with proper log rotation and cleanup.

It's not a trivial process, but it's essential to have log collection, so I hope the guide has been helpful.
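Once the stack is up, it can be handy to push a test message before wiring in a real appender. The snippet below is only a smoke-test sketch: the host name is a placeholder for your instance's public DNS, it assumes a GELF UDP input listening on the default port 12201, and it sends a minimal uncompressed GELF payload (version, host and short_message are the required fields):

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class GelfSmokeTest {

    public static void main(String[] args) throws Exception {
        // Placeholder: substitute the public DNS / Route53 name of your Graylog instance
        String graylogHost = "graylog.example.com";
        int gelfUdpPort = 12201; // default GELF UDP input port

        // Minimal uncompressed GELF payload; version, host and short_message are required
        String payload = "{\"version\":\"1.1\",\"host\":\"test-client\","
                + "\"short_message\":\"hello from the smoke test\",\"level\":6}";
        byte[] bytes = payload.getBytes("UTF-8");

        DatagramSocket socket = new DatagramSocket();
        try {
            socket.send(new DatagramPacket(bytes, bytes.length,
                    InetAddress.getByName(graylogHost), gelfUdpPort));
            System.out.println("GELF message sent; search for it in the Graylog web interface");
        } finally {
            socket.close();
        }
    }
}
```

If the message does not show up in the search UI, the usual suspects are the security group, network ACL or VPC routing described in the list above.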
May 14, 2015
by Bozhidar Bozhanov
· 19,629 Views
HashMap Custom Implementation in Java
Contents of page:
- Custom HashMap
- Entry
- Putting 5 key-value pairs in HashMap (step-by-step)
- Methods used in custom HashMap
- What will happen if the map already contains a mapping for the key?
- Complexity calculation of put and get methods in HashMap (put method worst-case and best-case complexity, get method worst-case and best-case complexity)
- Summary of complexity of methods in HashMap

Custom HashMap

This is a very important and trending topic. In this post I will be explaining a custom HashMap implementation in lots of detail, with diagrams which will help you in visualizing the HashMap implementation. I will be explaining how we put and get a key-value pair in HashMap by overriding:
- the equals method, which helps in checking equality of entry objects, and
- the hashCode method, which helps in finding the bucket index on which the data will be stored.

We will maintain a bucket (an ArrayList) which will store Entry objects (as a linked list).

Entry

We store each key-value pair using an Entry. An Entry contains K key, V value and Entry next (i.e. the next entry at that location of the bucket).

```java
static class Entry<K, V> {
    K key;
    V value;
    Entry<K, V> next;

    public Entry(K key, V value, Entry<K, V> next) {
        this.key = key;
        this.value = value;
        this.next = next;
    }
}
```

Putting 5 key-value pairs in the custom HashMap (step-by-step)

I will explain the whole concept of HashMap by putting 5 key-value pairs into it. Initially, we have a bucket of capacity 4 (all indexes of the bucket, i.e. 0, 1, 2, 3, are pointing to null).

Let's put the first key-value pair in the HashMap: key=21, value=12. A newEntry object will be formed. We calculate the hash by using our hash(K key) method; in this case it returns key % capacity = 21 % 4 = 1. So 1 will be the index of the bucket on which the newEntry object will be stored. We go to the 1st index; as it is pointing to null, we put our newEntry object there.

Let's put the second key-value pair in the HashMap: key=25, value=121. A newEntry object will be formed. We calculate the hash: key % capacity = 25 % 4 = 1. So 1 will be the index of the bucket on which the newEntry object will be stored. We go to the 1st index; it contains the entry with key=21, so we compare the two keys (i.e. compare 21 with 25 using the equals method). As the two keys are different, we check whether the next of the entry with key=21 is null; since it is, we put our newEntry object on next.

Let's put the third key-value pair in the HashMap: key=30, value=151. A newEntry object will be formed. We calculate the hash: key % capacity = 30 % 4 = 2. So 2 will be the index of the bucket on which the newEntry object will be stored. We go to the 2nd index; as it is pointing to null, we put our newEntry object there.

Let's put the fourth key-value pair in the HashMap: key=33, value=15. An Entry object will be formed. We calculate the hash: key % capacity = 33 % 4 = 1, so 1 will be the index of the bucket on which the newEntry object will be stored. We go to the 1st index:
- It contains the entry with key=21; we compare the two keys (compare 21 with 33 using the equals method). As the two keys are different, we proceed to the next of the entry with key=21 (we proceed only if next is not null).
- Now next contains the entry with key=25; we compare the two keys (compare 25 with 33 using the equals method). As the two keys are different and the next of the entry with key=25 is pointing to null, we don't proceed further and put our newEntry object on next.

Let's put the fifth key-value pair in the HashMap: key=35, value=89. Repeat the above-mentioned steps.

Must read: LinkedHashMap Custom implementation

Methods used in the custom HashMap

- public void put(K newKey, V data): allows you to put a key-value pair in the HashMap. If the map already contains a mapping for the key, the old value is replaced. Shows the complete functionality of how to override the equals and hashCode methods.
- public V get(K key): returns the value corresponding to the key.
- public boolean remove(K deleteKey): removes the key-value pair from HashMapCustom.
- public void display(): displays all key-value pairs present in HashMapCustom. Insertion order is not guaranteed; for maintaining insertion order refer to LinkedHashMapCustom.
- private int hash(K key): implements the hashing functionality, which helps in finding the appropriate bucket location to store our data. This is a very important method, as the performance of HashMapCustom is very much dependent on its implementation.

What will happen if the map already contains a mapping for the key? The old value is replaced.

Complexity calculation of put and get methods in HashMap

put method, worst-case complexity: O(n). But how is the complexity O(n)? Initially, let's say the map already contains the entry with key=21 in bucket 1, and we have to insert a newEntry object with key=25, value=121. We calculate the hash: key % capacity = 25 % 4 = 1, so 1 will be the index of the bucket on which the newEntry object will be stored. We go to the 1st index; it contains the entry with key=21, so we compare the two keys using the equals method. As the two keys are different, we check whether the next of the entry with key=21 is null; since it is, we put our newEntry object on next. Now let's do the complexity calculation: earlier there was 1 element in the HashMap and for putting the newEntry object we iterated over it. Hence the complexity was O(n). Note: we may calculate complexity by adding more elements to the HashMap as well, but to keep the explanation simple I kept fewer elements.

put method, best-case complexity: O(1). But how is the complexity O(1)? Let's say we have to insert a newEntry object with key=30, value=151. We calculate the hash: key % capacity = 30 % 4 = 2, so 2 will be the index of the bucket on which the newEntry object will be stored. We go to the 2nd index; as it is pointing to null, we put our newEntry object there. Now let's do the complexity calculation: earlier there were 2 elements in the HashMap, but we were able to put the newEntry object in the first go. Hence the complexity was O(1).

get method, worst-case complexity: O(n). But how is the complexity O(n)? Let's say we have to get the Entry object with key=25, value=121. We calculate the hash: key % capacity = 25 % 4 = 1, so 1 will be the index of the bucket on which the Entry object is stored. We go to the 1st index; it contains the entry with key=21, so we compare the two keys using the equals method. As the two keys are different, we check whether the next of the entry with key=21 is null; it is not null, so we repeat the same process and ultimately get the Entry object. Now let's do the complexity calculation: there were 2 elements in the HashMap and for getting the Entry object we iterated over both of them. Hence the complexity was O(n). Note: we may calculate complexity by using a HashMap of larger size, but to keep the explanation simple I kept fewer elements.

get method, best-case complexity: O(1). But how is the complexity O(1)? Let's say we have to get the Entry object with key=30, value=151. We calculate the hash: key % capacity = 30 % 4 = 2, so 2 will be the index of the bucket on which the Entry object is stored. We go to the 2nd index and get the Entry object. Now let's do the complexity calculation: there were 3 elements in the HashMap, but we were able to get the Entry object in the first go. Hence the complexity was O(1).

Summary of complexity of methods in HashMap

- put(K key, V value): worst case O(n), best case O(1)
- get(Object key): worst case O(n), best case O(1)

REFER: http://javamadesoeasy.com/2015/02/hashmap-custom-implementation.html (HashMap Custom implementation - put, get, remove Employee object).
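To tie the walkthrough together, here is a compact sketch of the structure described above: a small bucket array of Entry chains, a hash method based on hashCode modulo capacity, and put/get that walk the chain with equals. The class and method names mirror the description but this is my own sketch, not the linked implementation:

```java
public class HashMapCustomSketch<K, V> {

    static class Entry<K, V> {
        K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final int capacity = 4; // bucket capacity used in the walkthrough

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = new Entry[capacity];

    // hashCode() modulo capacity picks the bucket index (21 % 4 = 1, 30 % 4 = 2, ...)
    private int hash(K key) {
        return Math.abs(key.hashCode()) % capacity;
    }

    public void put(K newKey, V data) {
        int index = hash(newKey);
        Entry<K, V> current = table[index];
        while (current != null) {
            if (current.key.equals(newKey)) {     // existing mapping: replace the old value
                current.value = data;
                return;
            }
            if (current.next == null) {           // end of the chain: append the new entry
                current.next = new Entry<K, V>(newKey, data, null);
                return;
            }
            current = current.next;
        }
        table[index] = new Entry<K, V>(newKey, data, null); // empty bucket
    }

    public V get(K key) {
        Entry<K, V> current = table[hash(key)];
        while (current != null) {
            if (current.key.equals(key)) {
                return current.value;
            }
            current = current.next;
        }
        return null;                              // no mapping for this key
    }
}
```

With capacity 4, Integer keys 21, 25 and 33 all land in bucket 1 and chain together exactly as in the step-by-step diagrams.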
May 13, 2015
by Ankit Mittal
· 74,203 Views · 10 Likes
Docker Machine on Windows - How to Set Up Your Hosts
I've been playing around with Docker a lot lately. There are many reasons for that; one for sure is that I love to play around with the latest technology and even help out building a demo or two, or a lab. The main difference between me and most of my coworkers is that I run my setup on Windows, like most of the middleware developers out there. So, if you followed Arun's blog about "Docker Machine to Setup Docker Host" you might have tried to make this work on Windows already. Here is the ultimate short how-to guide on using Docker Machine to administer and spin up your Docker hosts.

Docker Machine

Machine lets you create Docker hosts on your computer, on cloud providers, and inside your own data center. It creates servers, installs Docker on them, then configures the Docker client to talk to them. You basically don't have to have anything installed on your machine prior to this, which is a whole lot easier than having to manually install boot2docker first. So, let's try this out.

You want to have at least one thing in place before starting with anything Docker or Machine: go and get Git for Windows (aka msysgit). It has all kinds of helpful Unix tools in its belly, which you need anyway.

Prerequisites - The One-For-All Solution

The first option is to install the Windows boot2docker distribution, which I showed in an earlier blog. It contains the following bits configured and ready for you to use:

- VirtualBox
- Docker Windows Client

Prerequisites - The Bits and Pieces

I dislike the boot2docker installer for a variety of reasons, mostly because I want to know what exactly is going on on my machine. So I played around a bit, and here is the bits-and-pieces installation if you decide against the one-for-all solution.

Start with the virtualization solution. We need something like that on Windows, because Windows just can't run Linux, and that is what Docker is based on, at least for now. So, get VirtualBox and ensure that version 4.3.18 is correctly installed on your system (VirtualBox-4.3.18-96516-Win.exe, 105 MB). WARNING: There is a strange issue when you run Windows itself in VirtualBox; you might run into a problem with starting the host.

And while you're at it, go and get the Docker Windows Client. One way is to grab the final release from the test servers as a direct download (docker-1.6.0.exe, x86_64, 7.5 MB). Rename it to "docker" and put it into a folder of your choice (I assume it will be c:\docker\). Now you also need to download Docker Machine, which is another single executable (docker-machine_windows-amd64.exe, 11.5 MB). Rename it to "docker-machine" and put it into the same folder. Now add this folder to your PATH:

```
set PATH=%PATH%;C:\docker
```

If you change your standard PATH environment variable, this might save you a lot of typing. That's it. Now you're ready to create your first Machine-managed Docker host.

Create Your Docker Host With Machine

All you need is a simple command:

```
docker-machine create --driver virtualbox dev
```

And the output should state:

```
INFO[0000] Creating SSH key...
INFO[0001] Creating VirtualBox VM...
INFO[0016] Starting VirtualBox VM...
INFO[0022] Waiting for VM to start...
INFO[0076] "dev" has been created and is now the active machine.
INFO[0076] To point your Docker client at it, run this in your shell: eval "$(docker-machine.exe env dev)"
```

This means you just created a Docker host using the VirtualBox provider and the name "dev". Now you need to find out which IP address the host is running on:

```
docker-machine ip
192.168.99.102
```

If you want an easier way to configure the environment variables needed by the client, just use the following command:

```
docker-machine env dev
export DOCKER_TLS_VERIFY=1
export DOCKER_CERT_PATH="C:\\Users\\markus\\.docker\\machine\\machines\\dev"
export DOCKER_HOST=tcp://192.168.99.102:2376
```

This outputs the Linux form of the environment variable definitions. All you have to do is change the "export" keyword to "set", remove the quotes and the double backslashes, and you are ready to go:

```
C:\Users\markus\Downloads>set DOCKER_TLS_VERIFY=1
C:\Users\markus\Downloads>set DOCKER_CERT_PATH=C:\Users\markus\.docker\machine\machines\dev
C:\Users\markus\Downloads>set DOCKER_HOST=tcp://192.168.99.102:2376
```

Time to Test Our Docker Client

And here we go: now run WildFly on your freshly created host:

```
docker run -it -p 8080:8080 jboss/wildfly
```

Watch the container being downloaded and check that it is running by pointing your browser to http://192.168.99.102:8080/. Congratulations on having set up your very first Docker host with Machine on Windows.
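If you would rather script that last check than click around in a browser, a few lines of plain Java can poll the published port. The IP address below is the one from the example output and will differ on your machine; this is only an illustrative check, not part of the original walkthrough:

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class WildFlyCheck {

    public static void main(String[] args) throws Exception {
        // Host IP taken from the docker-machine output above; substitute your own
        URL url = new URL("http://192.168.99.102:8080/");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setConnectTimeout(5000);
        connection.setReadTimeout(5000);
        connection.setRequestMethod("GET");

        int status = connection.getResponseCode(); // the WildFly welcome page answers with 200
        System.out.println("WildFly responded with HTTP " + status);
        connection.disconnect();
    }
}
```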
May 12, 2015
by Markus Eisele
· 19,835 Views
Collecting Transaction Per Minute from SQL Server and HammerDB
A SQL Server script file can be created to run in a loop, collecting transactions-per-minute data for a given amount of time at a specified interval.
May 11, 2015
by Greg Schulz
· 9,995 Views
Groovy vs Java for Testing [video]
Video and slideshow for Trisha Gee's lecture on whether Groovy is better for testing than Java and why, along with other resources to learn more about testing.
May 10, 2015
by Trisha Gee
· 10,854 Views
Python: Equivalent to flatMap for Flattening an Array of Arrays
I found myself wanting to flatten an array of arrays while writing some Python code earlier this afternoon, and being lazy, my first attempt involved building the flattened array manually:

```python
episodes = [
    {"id": 1, "topics": [1, 2, 3]},
    {"id": 2, "topics": [4, 5, 6]}
]

flattened_episodes = []
for episode in episodes:
    for topic in episode["topics"]:
        flattened_episodes.append({"id": episode["id"], "topic": topic})

for episode in flattened_episodes:
    print episode
```

If we run that we'll see this output:

```
$ python flatten.py
{'topic': 1, 'id': 1}
{'topic': 2, 'id': 1}
{'topic': 3, 'id': 1}
{'topic': 4, 'id': 2}
{'topic': 5, 'id': 2}
{'topic': 6, 'id': 2}
```

What I was really looking for was the Python equivalent to the flatMap function, which I learnt can be achieved in Python with a list comprehension like so:

```python
flattened_episodes = [{"id": episode["id"], "topic": topic}
                      for episode in episodes
                      for topic in episode["topics"]]

for episode in flattened_episodes:
    print episode
```

We could also choose to use itertools, in which case we'd have the following code:

```python
from itertools import chain, imap

flattened_episodes = chain.from_iterable(
    imap(lambda episode: [{"id": episode["id"], "topic": topic}
                          for topic in episode["topics"]], episodes))

for episode in flattened_episodes:
    print episode
```

We can then simplify this approach a little by wrapping it up in a 'flatmap' function:

```python
def flatmap(f, items):
    return chain.from_iterable(imap(f, items))

flattened_episodes = flatmap(
    lambda episode: [{"id": episode["id"], "topic": topic} for topic in episode["topics"]],
    episodes)

for episode in flattened_episodes:
    print episode
```

I think the list comprehension approach still works, but I need to look into itertools more; it looks like it could work well for other list operations.
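For readers coming from the JVM side, the flatMap name in the title will be familiar from Java 8 streams (and Scala). For comparison only, a rough Java 8 equivalent of the episode example could look like this; the Episode holder class is my own stand-in for the Python dictionaries:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FlatMapExample {

    // Simple holder mirroring the {"id": ..., "topics": [...]} dictionaries
    static class Episode {
        final int id;
        final List<Integer> topics;

        Episode(int id, List<Integer> topics) {
            this.id = id;
            this.topics = topics;
        }
    }

    public static void main(String[] args) {
        List<Episode> episodes = Arrays.asList(
                new Episode(1, Arrays.asList(1, 2, 3)),
                new Episode(2, Arrays.asList(4, 5, 6)));

        // flatMap turns each episode into a stream of "id -> topic" pairs and flattens them
        List<String> flattened = episodes.stream()
                .flatMap(episode -> episode.topics.stream()
                        .map(topic -> "{id: " + episode.id + ", topic: " + topic + "}"))
                .collect(Collectors.toList());

        flattened.forEach(System.out::println);
    }
}
```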
May 9, 2015
by Mark Needham
· 35,457 Views · 1 Like
8 Questions You Need to Ask About Microservices, Containers & Docker in 2015
In containers and microservices, we’re facing the greatest potential change in how we deliver and run software services since the arrival of virtual machines.
May 9, 2015
by Andrew Phillips
· 14,730 Views · 1 Like
Break Single Responsibility Principle
The Single Responsibility Principle (SRP) is not absolute. It exists to help code maintainability and readability. But from time to time you may see solutions and patterns that break SRP and are kind of OK. This is also true for other principles, but this time I would like to talk about SRP.

Singleton breaks SRP

The oldest and simplest pattern that breaks SRP is the singleton pattern. This pattern restricts the creation of an object so that there is a single instance of a certain class. Many think that singleton is actually an anti-pattern, and I also tend to believe that it is better to use some container to manage the lifecycle of objects than to hard-code singletons or other home-made factories. The anti-pattern-ness of singleton generally comes from the fact that it breaks SRP. A singleton has two responsibilities:

- Manage the creation of the instance of the class
- Do something that is the original responsibility of the class

You can easily create a singleton that does not violate SRP by keeping the first responsibility and dropping the second one:

```java
public class Singleton {
    private static final Singleton instance = new Singleton();

    public static Singleton getInstance() {
        return instance;
    }

    private Singleton() {}
}
```

but there is not much use for such a beast. Singletons are simple and discussed more than enough in blogs. Let me look at something more complex that breaks SRP.

Mockito breaks SRP

Mockito is a mocking framework which we usually use in unit tests. I assume that you are familiar with mocking and Mockito. A typical test looks like the following:

```java
import static org.mockito.Mockito.*;

List mockedList = mock(List.class);
when(mockedList.get(0)).thenReturn("first");
System.out.println(mockedList.get(0));

mockedList.add("one");
mockedList.clear();

verify(mockedList).add("one");
verify(mockedList).clear();
```

(The sample is taken from the Mockito page, actually mixing two examples.) The mock object is created using the static call

```java
List mockedList = mock(List.class);
```

and after that it is used for three different things:

- Setting up the mock object for its mocking task.
- Behaving as a mock, mocking the real-life object during testing.
- Helping verification of the mock usage.

The call

```java
when(mockedList.get(0)).thenReturn("first");
```

sets up the mock object. The calls

```java
System.out.println(mockedList.get(0));
mockedList.add("one");
mockedList.clear();
```

use the core responsibility of the mock object, and finally the lines

```java
verify(mockedList).add("one");
verify(mockedList).clear();
```

act as verification. These are three different tasks, not one. I get the point that they are closely related to each other. You can even say that they are just three aspects of a single responsibility. One could argue that verification only uses the mock object as a parameter and is not the functionality of the mock object. The fact is that the mock object keeps track of its usage and actively takes part in the verification process behind the scenes. Okay, okay: these all may be true, more or less. The real question is: does it matter? So what? Does the readability of the code of Mockito suffer from treating SRP this way? Does the usability of the Mockito API suffer from this? The answer is a definite NO for both of the questions. The code is as readable as it gets (imho it is more readable than many other open source projects) and it is not hurt by the fact that the mock objects have multiple responsibilities. As for the API you can even say more: it is even more readable and usable with this approach.

Former mocking frameworks used strings to specify the method calls, like

```java
mailer.expects(once()).method("send");
warehouse.expects(once()).method("hasInventory");
```

(fragment from the page), which is less readable and error prone. A typo in the name of the method is discovered at test run time instead of compile time.

What is the moral? Don't be dogmatic. Care about programming principles, since they are there to help you write good code. I do not urge anyone to ignore them every day. On the other hand, if you feel that some of the principles restrict you and your code would be better without them, do not hesitate to consider writing code that breaks the principles. Discuss it with your peers (programming is teamwork anyway) and come to a conclusion. The conclusion will be that you were wrong to consider breaking SRP in 90% of the cases. In 10%, however, you may come up with brilliant ideas.
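To make the "let a container manage the lifecycle" alternative mentioned at the start concrete, here is a minimal sketch using Spring's Java configuration; the class and bean names are illustrative, not from the article. The class itself keeps a single responsibility, and the container guarantees there is only one instance:

```java
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// The class only does its real work; it knows nothing about being a singleton
class PriceCalculator {
    double priceWithTax(double net) {
        return net * 1.27;
    }
}

@Configuration
class AppConfig {
    // Spring beans are singletons by default, so instance management lives in the container
    @Bean
    PriceCalculator priceCalculator() {
        return new PriceCalculator();
    }
}

public class ContainerManagedSingleton {
    public static void main(String[] args) {
        AnnotationConfigApplicationContext context = new AnnotationConfigApplicationContext(AppConfig.class);
        PriceCalculator first = context.getBean(PriceCalculator.class);
        PriceCalculator second = context.getBean(PriceCalculator.class);
        System.out.println("Same instance: " + (first == second)); // true: singleton scope
        context.close();
    }
}
```

The creation responsibility moves entirely into the configuration class, which is exactly the split the hand-rolled singleton cannot offer.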
May 8, 2015
by Peter Verhas
· 6,782 Views
Logging Stop-the-world Pauses in JVM
Different events can cause the JVM to pause all the application threads. Such pauses are called Stop-The-World (STW) pauses. The most common cause for an STW pause to be triggered is garbage collection (example in github), but different JIT actions (example), biased lock revocation (example), certain JVMTI operations, and many more also require the application to be stopped. The points at which the application threads may be safely stopped are called, surprise, safepoints. This term is also often used to refer to all the STW pauses.

It is more or less common for GC logs to be enabled. However, this does not capture information on all the safepoints. To get it all, use these JVM options:

```
-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime
```

If you are wondering about the naming explicitly referring to GC, don't be alarmed: turning on these options logs all of the safepoints, not just garbage collection pauses. If you run the following example (source in github) with the flags specified above:

```java
public class FullGc {
    private static final Collection
```
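The two flags above make the JVM print a line for every safepoint pause and for every stretch of uninterrupted application time. Assuming the usual wording of those lines ("Total time for which application threads were stopped: ... seconds"), a small helper, which is my own sketch and not part of the article's example, can total the stopped time from a log file:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StoppedTimeSummary {

    // Matches lines such as:
    // Total time for which application threads were stopped: 0.0001140 seconds
    private static final Pattern STOPPED =
            Pattern.compile("Total time for which application threads were stopped: ([0-9.]+) seconds");

    public static void main(String[] args) throws IOException {
        String logFile = args.length > 0 ? args[0] : "gc.log"; // the default path is an assumption
        double totalStoppedSeconds = 0;
        int pauses = 0;

        for (String line : Files.readAllLines(Paths.get(logFile))) {
            Matcher matcher = STOPPED.matcher(line);
            if (matcher.find()) {
                totalStoppedSeconds += Double.parseDouble(matcher.group(1));
                pauses++;
            }
        }

        System.out.printf("%d safepoint pauses, %.3f seconds stopped in total%n", pauses, totalStoppedSeconds);
    }
}
```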
May 7, 2015
by Nikita Salnikov-Tarnovski
· 15,851 Views
Binding to Data Services with Spring Boot in Cloud Foundry
Written by Dave Syer on the Spring blog In this article we look at how to bind a Spring Boot application to data services (JDBC, NoSQL, messaging etc.) and the various sources of default and automatic behaviour in Cloud Foundry, providing some guidance about which ones to use and which ones will be active under what conditions. Spring Boot provides a lot of autoconfiguration and external binding features, some of which are relevant to Cloud Foundry, and many of which are not. Spring Cloud Connectors is a library that you can use in your application if you want to create your own components programmatically, but it doesn’t do anything “magical” by itself. And finally there is the Cloud Foundry java buildpack which has an “auto-reconfiguration” feature that tries to ease the burden of moving simple applications to the cloud. The key to correctly configuring middleware services, like JDBC or AMQP or Mongo, is to understand what each of these tools provides, how they influence each other at runtime, and and to switch parts of them on and off. The goal should be a smooth transition from local execution of an application on a developer’s desktop to a test environment in Cloud Foundry, and ultimately to production in Cloud Foundry (or otherwise) with no changes in source code or packaging, per the twelve-factor application guidelines. There is some simple source code accompanying this article. To use it you can clone the repository and import it into your favourite IDE. You will need to remove two dependencies from the complete project to get to the same point where we start discussing concrete code samples, namely spring-boot-starter-cloud-connectors and auto-reconfiguration. NOTE: The current co-ordinates for all the libraries being discussed are org.springframework.boot:spring-boot-*:1.2.3.RELEASE,org.springframework.boot:spring-cloud-*-connector:1.1.1.RELEASE,org.cloudfoundry:auto-reconfiguration:1.7.0.RELEASE. TIP: The source code in github includes a docker-compose.yml file (docs here). You can use that to create a local MySQL database if you don’t have one running already. You don’t actually need it to run most of the code below, but it might be useful to validate that it will actually work. Punchline for the Impatient If you want to skip the details, and all you need is a recipe for running locally with H2 and in the cloud with MySQL, then start here and read the rest later when you want to understand in more depth. (Similar options exist for other data services, like RabbitMQ, Redis, Mongo etc.) Your first and simplest option is to simply do nothing: do not define a DataSource at all but put H2 on the classpath. Spring Boot will create the H2 embedded DataSource for you when you run locally. The Cloud Foundry buildpack will detect a database service binding and create a DataSource for you when you run in the cloud. If you add Spring Cloud Connectors as well, your app will also work in other cloud platforms, as long as you include a connector. That might be good enough if you just want to get something working. If you want to run a serious application in production you might want to tweak some of the connection pool settings (e.g. the size of the pool, various timeouts, the important test on borrow flag). In that case the buildpack auto-reconfiguration DataSource will not meet your requirements and you need to choose an alternative, and there are a number of more or less sensible choices. 
The best choice is probably to create a DataSource explicitly using Spring Cloud Connectors, but guarded by the “cloud” profile: @Configuration @Profile("cloud") public class DataSourceConfiguration { @Bean public Cloud cloud() { return new CloudFactory().getCloud(); } @Bean @ConfigurationProperties(DataSourceProperties.PREFIX) public DataSource dataSource() { return cloud().getSingletonServiceConnector(DataSourceclass, null); } } You can use spring.datasource.* properties (e.g. in application.properties or a profile-specific version of that) to set the additional properties at runtime. The “cloud” profile is automatically activated for you by the buildpack. Now for the details. We need to build up a picture of what’s going on in your application at runtime, so we can learn from that how to make a sensible choice for configuring data services. Layers of Autoconfiguration Let’s take a a simple app with DataSource (similar considerations apply to RabbitMQ, Mongo, Redis): @SpringBootApplication public class CloudApplication { @Autowired private DataSource dataSource; public static void main(String[] args) { SpringApplication.run(CloudApplication.class, args); } } This is a complete application: the DataSource can be @Autowired because it is created for us by Spring Boot. The details of the DataSource (concrete class, JDBC driver, connection URL, etc.) depend on what is on the classpath. Let’s assume that the application uses Spring JDBC via the spring-boot-starter-jdbc (or spring-boot-starter-data-jpa), so it has aDataSource implementation available from Tomcat (even if it isn’t a web application), and this is what Spring Boot uses. Consider what happens when: Classpath contains H2 (only) in addition to the starters: the DataSource is the Tomcat high-performance pool from DataSourceAutoConfiguration and it connects to an in memory database “testdb”. Classpath contains H2 and MySQL: DataSource is still H2 (same as before) because we didn’t provide any additional configuration for MySQL and Spring Boot can’t guess the credentials for connecting. Add spring-boot-starter-cloud-connectors to the classpath: no change inDataSource because the Spring Cloud Connectors do not detect that they are running in a Cloud platform. The providers that come with the starter all look for specific environment variables, which they won’t find unless you set them, or run the app in Cloud Foundry, Heroku, etc. Run the application in “cloud” profile with spring.profiles.active=cloud: no change yet in the DataSource, but this is one of the things that the Java buildpack does when your application runs in Cloud Foundry. Run in “cloud” profile and provide some environment variables to simulate running in Cloud Foundry and binding to a MySQL service: VCAP_APPLICATION={"name":"application","instance_id":"FOO"} VCAP_SERVICES={"mysql":[{"name":"mysql","tags":["mysql"],"credentials":{"uri":"mysql://root:root@localhost/test"}]} (the “tags” provides a hint that we want to create a MySQL DataSource, the “uri” provides the location, and the “name” becomes a bean ID). The DataSource is now using MySQL with the credentials supplied by the VCAP_* environment variables. Spring Boot has some autoconfiguration for the Connectors, so if you looked at the beans in your application you would see a CloudFactory bean, and also the DataSource bean (with ID “mysql”). Theautoconfiguration is equivalent to adding @ServiceScan to your application configuration. 
It is only active if your application runs in the “cloud” profile, and only if there is no existing @Bean of type Cloud, and the configuration flagspring.cloud.enabled is not “false”. Add the “auto-reconfiguration” JAR from the Java buildpack (Maven co-ordinatesorg.cloudfoundry:auto-reconfiguration:1.7.0.RELEASE). You can add it as a local dependency to simulate running an application in Cloud Foundry, but it wouldn’t be normal to do this with a real application (this is just for experimenting with autoconfiguration). The auto-reconfiguration JAR now has everything it needs to create a DataSource, but it doesn’t (yet) because it detects that you already have a bean of type CloudFactory, one that was added by Spring Boot. Remove the explicit “cloud” profile. The profile will still be active when your app starts because the auto-reconfiguration JAR adds it back again. There is still no change to theDataSource because Spring Boot has created it for you via the @ServiceScan. Remove the spring-boot-starter-cloud-connectors dependency, so that Spring Boot backs off creating a CloudFactory. The auto-reconfiguration JAR actually has its own copy of Spring Cloud Connectors (all the classes with different package names) and it now uses them to create a DataSource (in a BeanFactoryPostProcessor). The Spring Boot autoconfigured DataSource is replaced with one that binds to MySQL via theVCAP_SERVICES. There is no control over pool properties, but it does still use the Tomcat pool if available (no support for Hikari or DBCP2). Remove the auto-reconfiguration JAR and the DataSource reverts to H2. TIP: use web and actuator starters with endpoints.health.sensitive=false to inspect the DataSource quickly through “/health”. You can also use the “/beans”, “/env” and “/autoconfig” endpoints to see what is going in in the autoconfigurations and why. NOTE: Running in Cloud Foundry or including auto-reconfiguration JAR in classpath locally both activate the “cloud” profile (for the same reason). The VCAP_* env vars are the thing that makes Spring Cloud and/or the auto-reconfiguration JAR create beans. NOTE: The URL in the VCAP_SERVICES is actually not a “jdbc” scheme, which should be mandatory for JDBC connections. This is, however, the format that Cloud Foundry normally presents it in because it works for nearly every language other than Java. Spring Cloud Connectors or the buildpack auto-reconfiguration, if they are creating a DataSource, will translate it into a jdbc:* URL for you. NOTE: The MySQL URL also contains user credentials and a database name which are valid for the Docker container created by the docker-compose.yml in the sample source code. If you have a local MySQL server with different credentials you could substitute those. TIP: If you use a local MySQL server and want to verify that it is connected, you can use the “/health” endpoint from the Spring Boot Actuator (included in the sample code already). Or you could create a schema-mysql.sql file in the root of the classpath and put a simple keep alive query in it (e.g. SELECT 1). Spring Boot will run that on startupso if the app starts successfully you have configured the database correctly. The auto-reconfiguration JAR is always on the classpath in Cloud Foundry (by default) but it backs off creating any DataSource if it finds a org.springframework.cloud.CloudFactorybean (which is provided by Spring Boot if the CloudAutoConfiguration is active). 
Thus the net effect of adding it to the classpath, if the Connectors are also present in a Spring Boot application, is only to enable the “cloud” profile. You can see it making the decision to skip auto-reconfiguration in the application logs on startup: 015-04-14 15:11:11.765 INFO 12727 --- [ main] urceCloudServiceBeanFactoryPostProcessor : Skipping auto-reconfiguring beans of type javax.sql.DataSource 2015-04-14 15:11:57.650 INFO 12727 --- [ main] ongoCloudServiceBeanFactoryPostProcessor : Skipping auto-reconfiguring beans of type org.springframework.data.mongodb.MongoDbFactory 2015-04-14 15:11:57.650 INFO 12727 --- [ main] bbitCloudServiceBeanFactoryPostProcessor : Skipping auto-reconfiguring beans of type org.springframework.amqp.rabbit.connection.ConnectionFactory 2015-04-14 15:11:57.651 INFO 12727 --- [ main] edisCloudServiceBeanFactoryPostProcessor : Skipping auto-reconfiguring beans of type org.springframework.data.redis.connection.RedisConnectionFactory ... etc. Create your own DataSource The last section walked through most of the important autoconfiguration features in the various libraries. If you want to take control yourself, one thing you could start with is to create your own instance of DataSource. You could do that, for instance, using aDataSourceBuilder which is a convenience class and comes as part of Spring Boot (it chooses an implementation based on the classpath): @SpringBootApplication public class CloudApplication { @Bean public DataSource dataSource() { return DataSourceBuilder.create().build(); } ... } The DataSource as we’ve defined it is useless because it doesn’t have a connection URL or any credentials, but that can easily be fixed. Let’s run this application as if it was in Cloud Foundry: with the VCAP_* environment variables and the auto-reconfiguration JAR but not Spring Cloud Connectors on the classpath and no explicit “cloud” profile. The buildpack activates the “cloud” profile, creates a DataSource and binds it to the VCAP_SERVICES. As already described briefly, it removes your DataSource completely and replaces it with a manually registered singleton (which doesn’t show up in the “/beans” endpoint in Spring Boot). Now add Spring Cloud Connectors back into the classpath the application and see what happens when you run it again. It actually fails on startup! What has happened? The@ServiceScan (from Connectors) goes and looks for bound services, and creates bean definitions for them. That’s a bit like the buildpack, but different because it doesn’t attempt to replace any existing bean definitions of the same type. So you get an autowiring error because there are 2 DataSources and no way to choose one to inject into your application in various places where one is needed. To fix that we are going to have to take control of the Cloud Connectors (or simply not use them). Using a CloudFactory to create a DataSource You can disable the Spring Boot autoconfiguration and the Java buildpack auto-reconfiguration by creating your own Cloud instance as a @Bean: @Bean public Cloud cloud() { return new CloudFactory().getCloud(); } @Bean @ConfigurationProperties(DataSourceProperties.PREFIX) public DataSource dataSource() { return cloud().getSingletonServiceConnector(DataSource.class, null); } Pros: The Connectors autoconfiguration in Spring Boot backed off so there is only oneDataSource. It can be tweaked using application.properties via spring.datasource.*properties, per the Spring Boot User Guide. 
Cons: It doesn’t work without VCAP_* environment variables (or some other cloud platform). It also relies on user remembering to ceate the Cloud as a @Bean in order to disable the autoconfiguration. Summary: we are still not in a comfortable place (an app that doesn’t run without some intricate wrangling of environment variables is not much use in practice). Dual Running: Local with H2, in the Cloud with MySQL There is a local configuration file option in Spring Cloud Connectors, so you don’t have to be in a real cloud platform to use them, but it’s awkward to set up despite being boiler plate, and you also have to somehow switch it off when you are in a real cloud platform. The last point there is really the important one because you end up needing a local file to run locally, but only running locally, and it can’t be packaged with the rest of the application code (for instance violates the twelve factor guidelines). So to move forward with our explicit @Bean definition it’s probably better to stick to mainstream Spring and Spring Boot features, e.g. using the “cloud” profile to guard the explicit creation of a DataSource: @Configuration @Profile("cloud") public class DataSourceConfiguration { @Bean public Cloud cloud() { return new CloudFactory().getCloud(); } @Bean @ConfigurationProperties(DataSourceProperties.PREFIX) public DataSource dataSource() { return cloud().getSingletonServiceConnector(DataSource.class, null); } } With this in place we have a solution that works smoothly both locally and in Cloud Foundry. Locally Spring Boot will create a DataSource with an H2 embedded database. In Cloud Foundry it will bind to a singleton service of type DataSource and switch off the autconfigured one from Spring Boot. It also has the benefit of working with any platform supported by Spring Cloud Connectors, so the same code will run on Heroku and Cloud Foundry, for instance. Because of the @ConfigurationProperties you can bind additional configuration to the DataSource to tweak connection pool properties and things like that if you need to in production. NOTE: We have been using MySQL as an example database server, but actually PostgreSQL is at least as compelling a choice if not more. When paired with H2 locally, for instance, you can put H2 into its “Postgres compatibility” mode and use the same SQL in both environments. Manually Creating a Local and a Cloud DataSource If you like creating DataSource beans, and you want to do it both locally and in the cloud, you could use 2 profiles (“cloud” and “local”), for example. But then you would have to find a way to activate the “local” profile by default when not in the cloud. There is already a way to do that built into Spring because there is always a default profile called “default” (by default). So this should work: @Configuration @Profile("default") // or "!cloud" public class LocalDataSourceConfiguration { @Bean @ConfigurationProperties(DataSourceProperties.PREFIX) public DataSource dataSource() { return DataSourceBuilder.create().build(); } } @Configuration @Profile("cloud") public class CloudDataSourceConfiguration { @Bean public Cloud cloud() { return new CloudFactory().getCloud(); } @Bean @ConfigurationProperties(DataSourceProperties.PREFIX) public DataSource dataSource() { return cloud().getSingletonServiceConnector(DataSource.class, null); } } The “default” DataSource is actually identical to the autoconfigured one in this simple example, so you wouldn’t do this unless you needed to, e.g. 
to create a custom concrete DataSource of a type not supported by Spring Boot. You might think it’s all getting a bit complicated, but in fact Spring Boot is not making it any harder; we are just dealing with the consequences of needing to control the DataSource construction in two environments.

Using a Non-Embedded Database Locally

If you don’t want to use H2 or any in-memory database locally, then you can’t really avoid having to configure it (Spring Boot can guess a lot from the URL, but it will need that at least). So at a minimum you need to set some spring.datasource.* properties (the URL, for instance). That isn’t hard to do, and you can easily set different values in different environments using additional profiles, but as soon as you do that you need to switch off the default values when you go into the cloud. To do that you could define the spring.datasource.* properties in a profile-specific file (or document in YAML) for the “default” profile, e.g. application-default.properties, and these will not be used in the “cloud” profile.

A Purely Declarative Approach

If you prefer not to write Java code, or don’t want to use Spring Cloud Connectors, you might want to try using Spring Boot autoconfiguration and external properties (or YAML) files for everything. For example, Spring Boot creates a DataSource for you if it finds the right stuff on the classpath, and it can be completely controlled through application.properties, including all the granular features on the DataSource that you need in production (like pool sizes and validation queries). So all you need is a way to discover the location and credentials for the service from the environment. The buildpack translates Cloud Foundry VCAP_* environment variables into usable property sources in the Spring Environment. Thus, for instance, a DataSource configuration might look like this:

spring.datasource.url: ${cloud.services.mysql.connection.jdbcurl:jdbc:h2:mem:testdb}
spring.datasource.username: ${cloud.services.mysql.connection.username:sa}
spring.datasource.password: ${cloud.services.mysql.connection.password:}
spring.datasource.testOnBorrow: true

The “mysql” part of the property names is the service name in Cloud Foundry (so it is set by the user). And of course the same pattern applies to all kinds of services, not just a JDBC DataSource. Generally speaking it is good practice to use external configuration, and in particular @ConfigurationProperties, since they allow maximum flexibility, for instance to override using System properties or environment variables at runtime.

Note: similar features are provided by Spring Boot, which provides vcap.services.* instead of cloud.services.*, so you actually end up with more than one way to do this. However, the JDBC URLs are not available from the vcap.services.* properties (non-JDBC services work fine with the corresponding vcap.services.*.credentials.url).

One limitation of this approach is that it doesn’t apply if the application needs to configure beans that are not provided by Spring Boot out of the box (e.g. if you need two DataSources), in which case you have to write Java code anyway, and may or may not choose to use properties files to parameterize it. Before you try this yourself, though, beware that it actually doesn’t work unless you also disable the buildpack auto-reconfiguration (and Spring Cloud Connectors if they are on the classpath). If you don’t do that, then they create a new DataSource for you and Spring Boot cannot bind it to your properties file.
Thus even for this declarative approach, you end up needing an explicit @Bean definition, and you need this part of your “cloud” profile configuration:

@Configuration
@Profile("cloud")
public class CloudDataSourceConfiguration {

    @Bean
    public Cloud cloud() {
        return new CloudFactory().getCloud();
    }

}

This is purely to switch off the buildpack auto-reconfiguration (and the Spring Boot autoconfiguration, but that could have been disabled with a properties file entry).

Mixed Declarative and Explicit Bean Definition

You can also mix the two approaches: declare a single @Bean definition so that you control the construction of the object, but bind additional configuration to it using @ConfigurationProperties (and do the same locally and in Cloud Foundry). Example:

@Configuration
public class LocalDataSourceConfiguration {

    @Bean
    @ConfigurationProperties(DataSourceProperties.PREFIX)
    public DataSource dataSource() {
        return DataSourceBuilder.create().build();
    }

}

(where the DataSourceBuilder would be replaced with whatever fancy logic you need for your use case). And the application.properties would be the same as above, with whatever additional properties you need for your production settings.

A Third Way: Discover the Credentials and Bind Manually

Another approach that lends itself to platform and environment independence is to declare explicit bean definitions for the @ConfigurationProperties beans that Spring Boot uses to bind its autoconfigured connectors. For instance, to set the default values for a DataSource you can declare a @Bean of type DataSourceProperties:

@Bean
@Primary
public DataSourceProperties dataSourceProperties() {
    DataSourceProperties properties = new DataSourceProperties();
    properties.setInitialize(false);
    return properties;
}

This sets a default value for the “initialize” flag, and allows other properties to be bound from application.properties (or other external properties). Combine this with the Spring Cloud Connectors and you can control the binding of the credentials when a cloud service is detected:

@Autowired(required = false)
Cloud cloud;

@Bean
@Primary
public DataSourceProperties dataSourceProperties() {
    DataSourceProperties properties = new DataSourceProperties();
    properties.setInitialize(false);
    if (cloud != null) {
        List<ServiceInfo> infos = cloud.getServiceInfos(RelationalServiceInfo.class);
        if (infos.size() == 1) {
            RelationalServiceInfo info = (RelationalServiceInfo) infos.get(0);
            properties.setUrl(info.getJdbcUrl());
            properties.setUsername(info.getUserName());
            properties.setPassword(info.getPassword());
        }
    }
    return properties;
}

and you still need to define the Cloud bean in the “cloud” profile. It ends up being quite a lot of code, and it is quite unnecessary in this simple use case, but it might be handy if you have more complicated bindings, or need to implement some logic to choose a DataSource at runtime. Spring Boot has similar *Properties beans for the other middleware you might commonly use (e.g. RabbitProperties, RedisProperties, MongoProperties). An instance of such a bean marked as @Primary is enough to reset the defaults for the autoconfigured connector.

Deploying to Multiple Cloud Platforms

So far, we have concentrated on Cloud Foundry as the only cloud platform in which to deploy the application. One of the nice features of Spring Cloud Connectors is that it supports other platforms, either out of the box or via extension points. The spring-boot-starter-cloud-connectors even includes Heroku support (a typical dependency declaration is sketched below).
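A minimal sketch of pulling that starter into a Maven build, assuming the usual Spring Boot group id and a parent (or dependency management section) that supplies the version:

<dependency>
    <!-- brings in Spring Cloud Connectors, including Cloud Foundry and Heroku support -->
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-cloud-connectors</artifactId>
</dependency>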
If you do nothing at all and rely on the autoconfiguration (the lazy programmer’s approach), then your application will be deployable in all clouds where you have a connector on the classpath (i.e. Cloud Foundry and Heroku if you use the starter). If you take the explicit @Bean approach then you need to ensure that the “cloud” profile is active in the non-Cloud Foundry platforms, e.g. through an environment variable. And if you use the purely declarative approach (or any combination involving properties files) you need to activate the “cloud” profile and probably also another profile specific to your platform, so that the right properties files end up in the Environment at runtime.

Summary of Autoconfiguration and Provided Behaviour

Spring Boot provides a DataSource (also a RabbitMQ or Redis ConnectionFactory, Mongo, etc.) if it finds all the right stuff on the classpath. Using the “spring-boot-starter-*” dependencies is sufficient to activate the behaviour.

Spring Boot also provides an autowirable CloudFactory if it finds Spring Cloud Connectors on the classpath (but it switches off only if it finds a @Bean of type Cloud).

The CloudAutoConfiguration in Spring Boot also effectively adds a @CloudScan to your application, which you would want to switch off if you ever needed to create your own DataSource (or similar).

The Cloud Foundry Java buildpack detects a Spring Boot application and activates the “cloud” profile, unless it is already active. Adding the buildpack auto-reconfiguration JAR does the same thing if you want to try it locally.

Through the auto-reconfiguration JAR, the buildpack also kicks in and creates a DataSource (ditto RabbitMQ, Redis, Mongo, etc.) if it does not find a CloudFactory bean or a Cloud bean (amongst others). So including Spring Cloud Connectors in a Spring Boot application switches off this part of the “auto-reconfiguration” behaviour (the bean creation).

Switching off the Spring Boot CloudAutoConfiguration is easy, but if you do that, you have to remember to switch off the buildpack auto-reconfiguration as well if you don’t want it. The only way to do that is to define a bean definition (it can be of type Cloud or CloudFactory, for instance).

Spring Boot binds application.properties (and other sources of external properties) to @ConfigurationProperties beans, including but not limited to the ones that it autoconfigures. You can use this feature to tweak pool properties and other settings that need to be different in production environments.

General Advice and Conclusion

We have seen quite a few options and autoconfigurations in this short article, and we’ve only really used three libraries (Spring Boot, Spring Cloud Connectors, and the Cloud Foundry buildpack auto-reconfiguration JAR) and one platform (Cloud Foundry), not counting local deployment. The buildpack features are really only useful for very simple applications because there is no flexibility to tune the connections in production. That said, it is a nice thing to be able to do when prototyping. There are only three main approaches if you want to achieve the goal of deploying the same code locally and in the cloud, yet still be able to make the necessary tweaks in production:

1. Use Spring Cloud Connectors to explicitly create DataSource and other middleware connections and protect those @Beans with @Profile("cloud"). This approach always works, but leads to more code than you might need for many applications.
2. Use the Spring Boot default autoconfiguration and declare the cloud bindings using application.properties (or in YAML). To take full advantage you have to explicitly switch off the buildpack auto-reconfiguration as well.

3. Use Spring Cloud Connectors to discover the credentials, and bind them to the Spring Boot @ConfigurationProperties as default values if present.

The three approaches are actually not incompatible, and can be mixed using @ConfigurationProperties to provide profile-specific overrides of default configuration (e.g. for setting up connection pools in a different way in a production environment). If you have a relatively simple Spring Boot application, the only way to choose between the approaches is probably personal taste. If you have a non-Spring Boot application then the explicit @Bean approach will win, and it may also win if you plan to deploy your application to more than one cloud platform (e.g. Heroku and Cloud Foundry).

NOTE: This blog has been a journey of discovery (who knew there was so much to learn?). Thanks go to all those who helped with reviews and comments, in particular Scott Frederick, who spotted most of the mistakes in the drafts and always had time to look at a new revision.
May 6, 2015
by Pieter Humphrey
· 26,559 Views · 2 Likes
article thumbnail
Adding Version Details on MANIFEST.MF via Jenkins
Article Purpose: The following article suggests a solution for adding custom information to your web application using the MANIFEST.MF file and exposing that information via an API. The necessity for this solution came from solving "blindness" in deployment; in simpler words, I want to know which WAR I deployed, what build version it has, and any other useful information I might need. Since the information is exposed via an API, it can also function as an application "health check", which is a nice little addition to the solution.

Techs: WAR (your web application), Maven (build tool), maven-war-plugin, a Jenkins job.

Step 1 - Adding the Maven WAR plugin

First you should add the maven-war-plugin to your pom.xml. In its manifest entries tag you should add the custom information you need; in case you want to use the default information the plugin gives you out of the box, just set the corresponding value to true. In the following example you can see I added fields like buildVersion, build time and so on; the build tool, in this case Jenkins, is going to provide this data (a fuller sketch of this plugin block appears after Step 5): org.apache.maven.plugins maven-war-plugin false ${project.version} ${build.number} ${maven.build.timestamp} ${agent.name} ${user.name} ${[yourWarName].war.finalName}

Step 2 - Define Maven variables

The properties you want to add should be provided from outside, so you should add variables in your pom that Jenkins can assign values to. In the example below you can see the build.number variable I used in the previous step, with a default value of SNAPSHOT: SNAPSHOT

Step 3 - Adding Maven goals to your Jenkins job definition

Jenkins has environment variables available that you can pass to the Maven goal, such as BUILD_NUMBER, BUILD_ID and so on. Assign the Jenkins variable to the Maven variable in the build step (see the example goals line after Step 5). When Jenkins builds the code, it is going to stamp the MANIFEST.MF file with the custom information we wanted.

Step 4 - Exposing the information of the WAR

We did all this hard work for one purpose: so that this information is available at the deployment stage. You should expose this information via an API. In this example I used a Spring MVC controller and mapped the data to JSON format, but you can have your pick.

package your.package;

@Controller
@RequestMapping(value = "/version", produces = MediaType.APPLICATION_JSON_VALUE)
public class VersionController {

    @Autowired
    ApplicationContext applicationContext;

    @RequestMapping(method = RequestMethod.GET)
    @ResponseBody
    public JSONObject getVersion() {
        JSONObject result = new JSONObject();
        Resource resource = applicationContext.getResource("/META-INF/MANIFEST.MF");
        try {
            Manifest manifest = new Manifest(resource.getInputStream());
            if (manifest != null) {
                Attributes mainAttributes = manifest.getMainAttributes();
                if (mainAttributes != null) {
                    for (Object key : mainAttributes.keySet()) {
                        result.put(key, mainAttributes.get(key));
                    }
                }
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return result;
    }
}

Step 5 - Enjoy :)
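Since the pom.xml snippet in Step 1 lost its XML markup, here is a rough sketch of what such a maven-war-plugin block could look like. The manifest entry names (Build-Version, Build-Number and so on) are illustrative placeholders of my own; only the plugin coordinates and the property values come from the article:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-war-plugin</artifactId>
    <configuration>
        <archive>
            <manifest>
                <!-- set to true if the plugin's default implementation entries are enough -->
                <addDefaultImplementationEntries>false</addDefaultImplementationEntries>
            </manifest>
            <manifestEntries>
                <!-- entry names below are illustrative; pick your own -->
                <Build-Version>${project.version}</Build-Version>
                <Build-Number>${build.number}</Build-Number>
                <Build-Time>${maven.build.timestamp}</Build-Time>
                <Build-Agent>${agent.name}</Build-Agent>
                <Built-By>${user.name}</Built-By>
                <War-Name>${[yourWarName].war.finalName}</War-Name>
            </manifestEntries>
        </archive>
    </configuration>
</plugin>

For Step 3, the Goals field of the Jenkins Maven build step would then pass the Jenkins environment variables into those Maven properties, along these lines (BUILD_NUMBER and NODE_NAME are standard Jenkins variables; the property names must match the ones in your pom):

clean package -Dbuild.number=${BUILD_NUMBER} -Dagent.name=${NODE_NAME}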
May 5, 2015
by Amazia Gur
· 17,776 Views
article thumbnail
How to Create Multiple Workspaces with NetBeans
Examples

Windows:
netbeans.exe --userdir C:\MyOtherUserdir --cachedir "%HOME%\Locale Settings\Application Data\NetBeans\7.1\cache"

Unix:
./netbeans --userdir ~/my-other-userdir

Mac OS:
/Applications/NetBeans.app/Contents/MacOS/executable --userdir ~/my-other-userdir

Ref: http://wiki.netbeans.org/FaqAlternateUserdir
May 3, 2015
by Zemian Deng
· 8,312 Views
article thumbnail
How To: Neo4j Data Import - Minimal Example
The easiest way to import data from relational or legacy systems, like plain CSV files without headers, into Neo4j with Cypher and a graph model.
April 30, 2015
by Michael Hunger
· 7,483 Views
article thumbnail
Introduction to Probabilistic Data Structures
When processing large data sets, we often want to do some simple checks, such as the number of unique items, the most frequent items, and whether some items exist in the data set. The common approach is to use a deterministic data structure like a HashSet or Hashtable for such purposes. But when the data set we are dealing with becomes very large, such data structures are simply not feasible because the data is too big to fit in memory. It becomes even more difficult for streaming applications, which typically require data to be processed in one pass with incremental updates.

Probabilistic data structures are a group of data structures that are extremely useful for big data and streaming applications. Generally speaking, these data structures use hash functions to randomize and compactly represent a set of items. Collisions are ignored, but errors can be well controlled under a certain threshold. Compared with error-free approaches, these algorithms use much less memory and have constant query time. They usually support union and intersection operations and can therefore be easily parallelized. This article will introduce three commonly used probabilistic data structures: Bloom filter, HyperLogLog, and Count-Min sketch.

Membership Query - Bloom filter

A Bloom filter is a bit array of m bits initialized to 0. To add an element, feed it to k hash functions to get k array positions and set the bits at these positions to 1. To query an element, feed it to the k hash functions to obtain k array positions. If any of the bits at these positions is 0, then the element is definitely not in the set. If the bits are all 1, then the element might be in the set. A Bloom filter with a 1% false positive rate only requires 9.6 bits per element, regardless of the size of the elements.

For example, suppose we have inserted x, y and z into the Bloom filter with k = 3 hash functions. Each of these three elements has three bits set to 1 in the bit array. When we look up w in the set, because one of its bits is not set to 1, the Bloom filter will tell us that it is not in the set.

A Bloom filter has the following properties:
False positives are possible when the queried positions are already set to 1, but false negatives are impossible.
Query time is O(k).
Union and intersection of Bloom filters with the same size and hash functions can be implemented with bitwise OR and AND operations.
An element cannot be removed from the set.

A Bloom filter requires the following inputs:
m: size of the bit array
n: estimated number of insertions
p: false positive probability

The optimum number of hash functions k can be determined using the formula:

k = (m / n) * ln 2

Given the false positive probability p and the estimated number of insertions n, the length of the bit array can be calculated as:

m = - (n * ln p) / (ln 2)^2

The hash functions used for a Bloom filter should generally be faster than cryptographic hash algorithms, with good distribution and collision resistance. Commonly used hash functions for Bloom filters include Murmur hash, the FNV series of hashes, and Jenkins hashes. Murmur hash is the fastest among them. MurmurHash3 is used by the Google Guava library's Bloom filter implementation (a short usage sketch appears before the Summary below).

Cardinality - HyperLogLog

HyperLogLog is a streaming algorithm used for estimating the number of distinct elements (the cardinality) of very large data sets. A HyperLogLog counter can count one billion distinct items with an accuracy of 2% using only 1.5 KB of memory.
It is based on the bit pattern observation that, for a stream of randomly distributed numbers, if there is a number x whose maximum number of leading 0 bits is k, the cardinality of the stream is very likely equal to 2^k. For each element si in the stream, a hash function h(si) transforms si into a string of random bits (0 or 1, each with probability 1/2). The probability P of the bit patterns:

0xxxx... → P = 1/2
01xxx... → P = 1/4
001xx... → P = 1/8

The intuition is that when we see a prefix 0^k 1..., it is likely that there are n ≥ 2^(k+1) different strings. By keeping track of the prefixes 0^k 1... that have appeared in the data stream, we can estimate the cardinality to be 2^p, where p is the length of the largest prefix.

Because the variance is very high when using a single counter, in order to get a better estimate the data is split into m sub-streams using the first few bits of the hash. The counters are maintained in m registers, each with a memory space of a multiple of 4 bytes. If the standard deviation for each sub-stream is σ, then the standard deviation of the averaged value is only σ/√m. This is called stochastic averaging. For instance, for m = 4, the elements are split into m sub-streams using the first 2 bits (00, 01, 10, 11), which are then discarded. Each register stores the rest of the hash bits that contain the largest 0^k 1 prefix. The values in the m registers are then averaged to obtain the cardinality estimate.

The HyperLogLog algorithm uses the harmonic mean to normalize the result. The algorithm also makes adjustments for small and very large values. The resulting error is equal to 1.04/√m. Each of the m registers uses at most log2 log2 n + O(1) bits when cardinalities ≤ n need to be estimated. The union of two HyperLogLog counters can be calculated by first taking the maximum value of the two counters for each of the m registers, and then calculating the estimated cardinality.

Frequency - Count-Min Sketch

Count-Min sketch is a probabilistic sub-linear space streaming algorithm. It is somewhat similar to a Bloom filter. The main difference is that a Bloom filter represents a set as a bitmap, while a Count-Min sketch represents a multi-set, which keeps a frequency distribution summary. The basic data structure is a two-dimensional d x w array of counters with d pairwise independent hash functions h1 ... hd of range w. Given parameters (ε, δ), set w = ⌈e/ε⌉ and d = ⌈ln(1/δ)⌉. ε is the accuracy we want to have and δ is the certainty with which we reach that accuracy. The two-dimensional array consists of w·d counters.

To increment the counts, calculate the hash positions with the d hash functions and update the counts at those positions. The estimate of the count for an item is the minimum of the counts at the array positions determined by the d hash functions. The space used by a Count-Min sketch is the array of w·d counters. By choosing appropriate values for d and w, very small error and high probability can be achieved.

Example Count-Min sketch sizes for different error and probability combinations:

ε      1 - δ    w      d    w·d
0.1    0.9      28     3    84
0.1    0.99     28     5    140
0.1    0.999    28     7    196
0.01   0.9      272    3    816
0.01   0.99     272    5    1360
0.01   0.999    272    7    1904
0.001  0.999    2719   7    19033

Count-Min sketch has the following properties:
Union can be performed by a cell-wise ADD operation.
O(d) query time (one counter lookup per hash function).
Better accuracy for higher frequency items (heavy hitters).
It can only over-count, never under-count.

Count-Min sketch can be used for querying a single item's count or the "heavy hitters", which can be obtained by keeping a heap structure of all the counts.
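To make the Bloom filter section above concrete, here is a minimal usage sketch with Google Guava's BloomFilter (the implementation mentioned above and in the Summary below). The expected insertion count and false positive rate are arbitrary example values:

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

import java.nio.charset.StandardCharsets;

public class BloomFilterExample {

    public static void main(String[] args) {
        // 1,000,000 expected insertions with a 1% false positive rate;
        // Guava derives the bit array size m and the number of hash functions k from these.
        BloomFilter<String> seen = BloomFilter.create(
                Funnels.stringFunnel(StandardCharsets.UTF_8), 1000000, 0.01);

        seen.put("x");
        seen.put("y");
        seen.put("z");

        System.out.println(seen.mightContain("x")); // true (no false negatives)
        System.out.println(seen.mightContain("w")); // false with very high probability
    }
}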
Summary

Probabilistic data structures have many applications in modern web and data applications where the data arrives in a streaming fashion and needs to be processed on the fly using limited memory. Bloom filter, HyperLogLog, and Count-Min sketch are the most commonly used probabilistic data structures. There is a lot of research on various streaming algorithms, synopsis data structures and optimization techniques that is worth investigating and studying. If you haven't tried these data structures yet, you will be amazed how powerful they can be once you start using them. It may be a little intimidating to understand the concepts initially, but the implementations are actually quite simple. Google Guava has a Bloom filter implementation using Murmur hash. Clearspring's Java library stream-lib and Twitter's Scala library Algebird have implementations of all three data structures, plus other useful data structures that you can play with. I have included the links below.

Links
http://bigsnarf.wordpress.com/2013/02/08/probabilistic-data-structures-for-data-analytics/
http://en.wikipedia.org/wiki/Bloom_filter
http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf
http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/40671.pdf
http://research.neustar.biz/2012/10/25/sketch-of-the-day-hyperloglog-cornerstone-of-a-big-data-infrastructure/
http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf
http://www.moneyscience.com/pg/blog/ThePracticalQuant/read/438348/realtime-analytics-hokusai-adds-a-temporal-component-to-countmin-sketch
http://people.cs.umass.edu/~mcgregor/711S12/sketches1.pdf
https://github.com/addthis/stream-lib
https://github.com/twitter/algebird
April 30, 2015
by Yin Niu
· 34,998 Views · 6 Likes
article thumbnail
Linux - Simulating Degraded Network Conditions
When testing an application or service, it can be very useful to simulate degraded network conditions. This allows you to see how they might perform for users.
April 30, 2015
by Corey Goldberg
· 4,863 Views
article thumbnail
Gradle Goodness: Use Git Commit Id in Build Script
The nice thing about Gradle is that we can use Java libraries in our build script. This way we can add extra functionality to our build script in an easy way. We must use the classpath dependency configuration for our build script to include the library. For example, we can include the library Grgit, which provides an easy way to interact with Git from Java or Groovy code. This library is also the basis for the Gradle Git plugin.

In the next example build file we add the Grgit library to our build script classpath. Then we use the open method of the Grgit class. From the returned object we invoke the head() method and use its id property to get the commit id. With the abbreviatedId property we get the shorter version of the Git commit id. The build file also includes the application plugin. We customize the applicationDistribution CopySpec from the plugin and expand the properties in a VERSION file. This way our distribution always includes a plain text file VERSION with the Git commit id of the code.

buildscript {
    repositories {
        jcenter()
    }
    dependencies {
        // Add dependency for build script,
        // so we can access Git from our
        // build script.
        classpath 'org.ajoberstar:grgit:1.1.0'
    }
}

apply plugin: 'java'
apply plugin: 'application'

ext {
    // Open the Git repository in the current directory.
    git = org.ajoberstar.grgit.Grgit.open(file('.'))

    // Get commit id of HEAD.
    revision = git.head().id

    // Alternative is using abbreviatedId of head() method.
    // revision = git.head().abbreviatedId
}

// Use abbreviatedId commit id in the version.
version = "2.0.1.${git.head().abbreviatedId}"

// application plugin extension properties.
mainClassName = 'sample.Hello'
applicationName = 'sample'

// Customize applicationDistribution
// CopySpec from application plugin extension.
applicationDistribution.with {
    from('src/dist') {
        include 'VERSION'
        expand(
            buildDate: new Date(),
            // Use revision with Git commit id:
            revision : revision,
            version : project.version,
            appName : applicationName)
    }
}

// Contents for src/dist/VERSION:
/*
Version: ${version}
Revision: ${revision}
Build-date: ${buildDate.format('dd-MM-yyyy HH:mm:ss')}
Application-name: ${appName}
*/

assemble.dependsOn installDist

When we run the build task for our project we get the following contents in our VERSION file:

Version: 2.0.1.e2ab261
Revision: e2ab2614011ff4be18c03e4dc1f86ab9ec565e6c
Build-date: 22-04-2015 13:53:31
Application-name: sample

Written with Gradle 2.3.
April 28, 2015
by Hubert Klein Ikkink
· 11,125 Views · 1 Like
article thumbnail
Implementing Filter and Bakery Locks in Java
Implementing custom locks is a good way to understand how locks work. This post will show how to implement Filter and Bakery locks (which are spin locks) in Java and will compare their performance with Java's ReentrantLock. Filter and Bakery locks satisfy mutual exclusion and are starvation-free algorithms; the Bakery lock is also a first-come-first-served lock [1]. For performance testing, a counter value is incremented up to 10000000 with different lock types, different numbers of threads and different numbers of runs. The test system configuration is: Intel Core i7 (8 cores, 4 of them physical), Ubuntu 14.04 LTS and Java 1.7.0_60.

The Filter lock has n-1 levels, which may be considered "waiting rooms". A thread must traverse these waiting rooms before acquiring the lock. There are two important properties for levels [2]:

1) At least one thread trying to enter level l succeeds.
2) If more than one thread is trying to enter level l, then at least one is blocked (i.e., continues to wait at that level).

The Filter lock is implemented as follows:

/**
 * @author Furkan KAMACI
 */
public class Filter extends AbstractDummyLock implements Lock {

    /* Due to Java Memory Model, int[] not used for level and victim variables.
       Java programming language does not guarantee linearizability, or even sequential consistency,
       when reading or writing fields of shared objects
       [The Art of Multiprocessor Programming. Maurice Herlihy, Nir Shavit, 2008, pp.61.] */
    private AtomicInteger[] level;
    private AtomicInteger[] victim;

    private int n;

    /**
     * Constructor for Filter lock
     *
     * @param n thread count
     */
    public Filter(int n) {
        this.n = n;
        level = new AtomicInteger[n];
        victim = new AtomicInteger[n];
        for (int i = 0; i < n; i++) {
            level[i] = new AtomicInteger();
            victim[i] = new AtomicInteger();
        }
    }

    /**
     * Acquires the lock.
     */
    @Override
    public void lock() {
        int me = ConcurrencyUtils.getCurrentThreadId();
        for (int i = 1; i < n; i++) {
            level[me].set(i);
            victim[i].set(me);
            for (int k = 0; k < n; k++) {
                while ((k != me) && (level[k].get() >= i && victim[i].get() == me)) {
                    //spin wait
                }
            }
        }
    }

    /**
     * Releases the lock.
     */
    @Override
    public void unlock() {
        int me = ConcurrencyUtils.getCurrentThreadId();
        level[me].set(0);
    }
}

The Bakery lock algorithm maintains the first-come-first-served property by using a distributed version of the number-dispensing machines often found in bakeries: each thread takes a number in the doorway, and then waits until no thread with an earlier number is trying to enter [3]. The Bakery lock is implemented as follows:

/**
 * @author Furkan KAMACI
 */
public class Bakery extends AbstractDummyLock implements Lock {

    /* Due to Java Memory Model, int[] not used for flag and label variables.
       Java programming language does not guarantee linearizability, or even sequential consistency,
       when reading or writing fields of shared objects
       [The Art of Multiprocessor Programming. Maurice Herlihy, Nir Shavit, 2008, pp.61.] */
    private AtomicBoolean[] flag;
    private AtomicInteger[] label;

    private int n;

    /**
     * Constructor for Bakery lock
     *
     * @param n thread count
     */
    public Bakery(int n) {
        this.n = n;
        flag = new AtomicBoolean[n];
        label = new AtomicInteger[n];
        for (int i = 0; i < n; i++) {
            flag[i] = new AtomicBoolean();
            label[i] = new AtomicInteger();
        }
    }

    /**
     * Acquires the lock.
     */
    @Override
    public void lock() {
        int i = ConcurrencyUtils.getCurrentThreadId();
        flag[i].set(true);
        label[i].set(findMaximumElement(label) + 1);
        for (int k = 0; k < n; k++) {
            while ((k != i) && flag[k].get()
                    && ((label[k].get() < label[i].get())
                    || ((label[k].get() == label[i].get()) && k < i))) {
                //spin wait
            }
        }
    }

    /**
     * Releases the lock.
     */
    @Override
    public void unlock() {
        flag[ConcurrencyUtils.getCurrentThreadId()].set(false);
    }

    /**
     * Finds the maximum element within an {@link java.util.concurrent.atomic.AtomicInteger} array
     *
     * @param elementArray element array
     * @return maximum element
     */
    private int findMaximumElement(AtomicInteger[] elementArray) {
        int maxValue = Integer.MIN_VALUE;
        for (AtomicInteger element : elementArray) {
            if (element.get() > maxValue) {
                maxValue = element.get();
            }
        }
        return maxValue;
    }
}

For these kinds of algorithms, a thread id scheme is needed that starts from 0 or 1 and increments one by one. The threads' names are set appropriately for that purpose (a small driver sketch appears after the references below). It should also be considered that the Java programming language does not guarantee linearizability, or even sequential consistency, when reading or writing fields of shared objects [4]. So, the level and victim variables for the Filter lock, and the flag and label variables for the Bakery lock, are defined as atomic variables. Anyone who wants to test the effects of the Java Memory Model can change those variables into int[] and boolean[] and run the algorithm with more than 2 threads. They will then see that the algorithm hangs for either Filter or Bakery, even though the threads are alive.

To test lock performance, a custom counter class is implemented with a getAndIncrement method as follows:

/**
 * gets and increments value up to a maximum number
 *
 * @return value before increment if it didn't exceed a defined maximum number. Otherwise returns maximum number.
 */
public long getAndIncrement() {
    long temp;
    lock.lock();
    try {
        if (value >= maxNumber) {
            return value;
        }
        temp = value;
        value = temp + 1;
    } finally {
        lock.unlock();
    }
    return temp;
}

There is a maximum-number barrier so that multiple application configurations can be tested fairly. The idea is that there is a fixed amount of work (incrementing a variable up to a desired number), and the question is how fast it can be finished with different numbers of threads. So, for the comparison, the "job" should be equal. This approach also measures the extra workload of that piece of code:

if (value >= maxNumber) { return value; }

for multiple threads, compared to an approach that calculates the unit work performance of threads (i.e. not putting up a maximum barrier, iterating in a loop up to a maximum number and then dividing the last value by the thread count).

This configuration was used for the performance comparison:

Threads: 1, 2, 3, 4, 5, 6, 7, 8
Retry count: 20
Maximum number: 10000000

This is the chart of results, which includes standard errors:

First of all, when you run a block of code in Java several times, there is internal optimization of the code. When the algorithm is run multiple times and the first output is compared to the second, the effect of this optimization can be seen: the first elapsed time should usually be greater than the second. For example:

currentTry = 0, threadCount = 1, maxNumber = 10000000, lockType = FILTER, elapsedTime = 500 (ms)
currentTry = 1, threadCount = 1, maxNumber = 10000000, lockType = FILTER, elapsedTime = 433 (ms)

Conclusion: From the chart it can be seen that the Bakery lock is faster than the Filter lock, with a low standard error. The reason is the Filter lock's lock method.
In the Bakery lock, as a fair approach, threads run one by one, but in the Filter lock they compete with each other. Java's ReentrantLock performs best compared to the others. On the other hand, the Filter lock gets worse linearly, while Bakery and ReentrantLock do not (the Filter lock's linear trend may become even clearer when run with many more threads). A higher thread count does not mean less elapsed time: 2 threads may be worse than 1 thread because of thread creation and locking/unlocking overhead. When the thread count starts to increase, elapsed time gets better for Bakery and ReentrantLock. However, when the thread count keeps increasing, it gets worse again. The reason is the number of real cores of the test computer running the algorithms.

The source code for the Filter and Bakery locks in Java can be downloaded from here: https://github.com/kamaci/filbak

[1] The Art of Multiprocessor Programming. Maurice Herlihy, Nir Shavit, 2008, pp.31-33.
[2] The Art of Multiprocessor Programming. Maurice Herlihy, Nir Shavit, 2008, pp.28.
[3] The Art of Multiprocessor Programming. Maurice Herlihy, Nir Shavit, 2008, pp.31.
[4] The Art of Multiprocessor Programming. Maurice Herlihy, Nir Shavit, 2008, pp.61.
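To make the thread-naming requirement concrete, here is a minimal driver sketch. It assumes, as the article describes, that ConcurrencyUtils.getCurrentThreadId() derives the id from the thread's name, so the threads are simply named "0", "1", and so on; the Counter constructor shown here (taking the lock and the maximum number) is hypothetical:

public class FilterLockDriver {

    public static void main(String[] args) throws InterruptedException {
        final int n = 4;                       // number of threads
        final long maxNumber = 10000000L;      // same barrier as in the article
        final Filter lock = new Filter(n);     // the spin lock under test
        final Counter counter = new Counter(lock, maxNumber); // hypothetical constructor

        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    // getAndIncrement() acquires and releases the lock internally, as shown above
                    while (counter.getAndIncrement() < maxNumber - 1) {
                        // keep incrementing until the barrier is reached
                    }
                }
            });
            threads[i].setName(String.valueOf(i)); // thread ids "0".."n-1" for ConcurrencyUtils
            threads[i].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
    }
}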
April 28, 2015
by Furkan Kamaci
· 8,826 Views · 2 Likes
article thumbnail
uniVocity-parsers: A powerful CSV/TSV/Fixed-width file parser library for Java
uniVocity-parsers is an open-source CSV/TSV/fixed-width file parser library for Java, providing many capabilities to read and write files with a simplified API, plus the powerful features shown below. Unlike other libraries out there, uniVocity-parsers built its own architecture for parsing text files, which focuses on maximum performance and flexibility while making it easy to extend and build new parsers.

Contents
1. Overview
2. Installation
3. Features Overview
4. Reading CSV/TSV/Fixed-width Files
5. Writing CSV/TSV/Fixed-width Files
6. Performance and Flexibility
7. Design and Implementations

1. Overview

I'm a Java developer working on a web-based system to evaluate telecommunication carriers' networks and produce reports. In the system, the CSV format was heavily involved for the network-related data, such as real-time network status (online/offline) for the broadband subscribers, and real-time traffic for each subscriber. Generally the size of a single CSV file would exceed 1 GB, with millions of rows included, and we were using the library JavaCSV as the CSV file parser. As the capacity of the carriers' networks and the time span our system monitors grew, the amount of data in CSV grew as well. My team and I had to work out a solution to achieve better performance (even by seconds) in CSV file processing, and better extensibility to provide much more customized functionality.

We came across the library uniVocity-parsers as our final solution after a lot of testing and analysis, and we found it great. In addition to better performance and extensibility, the library provides developers with simplified APIs, detailed documentation and tutorials, and commercial support for highly customized functionality. The project is hosted at GitHub with 62 stars and 8 forks (at the time of writing). Extensive documentation and tutorials are provided here and here. You can find more examples and news here as well. In addition, the well-known open-source project Apache Camel integrates uniVocity-parsers for reading and writing CSV/TSV/fixed-width files. Find more details here.

2. Installation

I'm using version 1.5.1, but refer to the official download page to see if there's a more recent version available. The project is also available in the Maven central repository, so you can add this to your pom.xml:

<dependency>
    <groupId>com.univocity</groupId>
    <artifactId>univocity-parsers</artifactId>
    <version>1.5.1</version>
</dependency>

3. Features Overview

uniVocity-parsers provides a list of powerful features which can fulfill all the requirements you might have for processing tabular representations of data. Check the following overview chart for the features:

4. Reading CSV/TSV/Fixed-width Files

Read all rows of a CSV:

CsvParser parser = new CsvParser(new CsvParserSettings());
List<String[]> allRows = parser.parseAll(getReader("/examples/example.csv"));

For the full list of demos of the reading features, refer to: https://github.com/uniVocity/univocity-parsers#reading-csv

5. Writing CSV/TSV/Fixed-width Files

Write data in CSV format with just 2 lines of code:

List<Object[]> rows = someMethodToCreateRows();
CsvWriter writer = new CsvWriter(outputWriter, new CsvWriterSettings());
writer.writeRowsAndClose(rows);

For the full list of demos of the writing features, refer to: https://github.com/uniVocity/univocity-parsers/blob/master/README.md#writing

6.
Performance and Flexibility

Here is the performance comparison we measured for uniVocity-parsers and JavaCSV in our system:

File size              Duration for JavaCSV parsing    Duration for uniVocity-parsers parsing
10 MB, 145453 rows     1138 ms                         836 ms
100 MB, 809008 rows    23 s                            6 s
434 MB, 4499959 rows   91 s                            28 s
1 GB, 23803502 rows    245 s                           70 s

Here are some performance comparison tables for almost all CSV parser libraries in existence, and you can see that uniVocity-parsers comes out significantly ahead of the other libraries in performance. uniVocity-parsers achieves its performance and flexibility with the following mechanisms (a short settings sketch appears at the end of this article):

Read input on a separate thread (enabled by invoking CsvParserSettings.setReadInputOnSeparateThread())
Concurrent row processor (refer to ConcurrentRowProcessor, which implements RowProcessor)
Extend ColumnProcessor to process columns with your own business logic
Extend RowProcessor to read rows with your own business logic

7. Design and Implementations

A bunch of processors in uniVocity-parsers are core modules, which are responsible for reading/writing data in rows and columns and executing data conversions. Here is the diagram of processors:

You can create your own processors easily by implementing the RowProcessor interface or extending the provided implementations. In the following example I simply used an anonymous class:

CsvParserSettings settings = new CsvParserSettings();
settings.setRowProcessor(new RowProcessor() {

    /**
     * initialize whatever you need before processing the first row, with your own business logic
     **/
    @Override
    public void processStarted(ParsingContext context) {
        System.out.println("Started to process rows of data.");
    }

    /**
     * process the row with your own business logic
     **/
    StringBuilder stringBuilder = new StringBuilder();

    @Override
    public void rowProcessed(String[] row, ParsingContext context) {
        System.out.println("The row in line #" + context.currentLine() + ": ");
        for (String col : row) {
            stringBuilder.append(col).append("\t");
        }
    }

    /**
     * After all rows were processed, perform any cleanup you need
     **/
    @Override
    public void processEnded(ParsingContext context) {
        System.out.println("Finished processing rows of data.");
        System.out.println(stringBuilder);
    }
});

CsvParser parser = new CsvParser(settings);
List<String[]> allRows = parser.parseAll(new FileReader("/myFile.csv"));

The library offers a whole lot more features. I recommend you have a look, as it really made a difference in our project.
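As referenced in the feature list above, here is a minimal sketch of enabling the separate reading thread before parsing. The method name comes from the article; the boolean argument and the file path are assumptions of mine:

import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

import java.io.FileReader;
import java.util.List;

public class FastCsvRead {

    public static void main(String[] args) throws Exception {
        CsvParserSettings settings = new CsvParserSettings();
        // read the input on a separate thread, as listed in the performance features above
        settings.setReadInputOnSeparateThread(true);

        CsvParser parser = new CsvParser(settings);
        // parse the whole file into memory, as in the reading example above
        List<String[]> allRows = parser.parseAll(new FileReader("/myFile.csv"));
        System.out.println("Parsed " + allRows.size() + " rows");
    }
}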
April 27, 2015
by Jerry Joe
· 8,516 Views
article thumbnail
Diagnosing SST Errors with Percona XtraDB Cluster for MySQL
[This article was written by Stephane Combaudon]

State Snapshot Transfer (SST) is used in Percona XtraDB Cluster (PXC) when a new node joins the cluster, or to resync a failed node if Incremental State Transfer (IST) is no longer available. SST is triggered automatically, but there is no magic: if it is not configured properly, it will not work and new nodes will never be able to join the cluster. Let’s have a look at a few classic issues.

Port for SST is not open

The donor and the joiner communicate on port 4444, and if the port is closed on one side, SST will always fail. You will see in the error log of the donor that SST is started:

[...]
141223 16:08:48 [Note] WSREP: Node 2 (node1) requested state transfer from '*any*'. Selected 0 (node3)(SYNCED) as donor.
141223 16:08:48 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 6)
141223 16:08:48 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
141223 16:08:48 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.101:4444/xtrabackup_sst' --auth 'sstuser:s3cret' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '04c085a1-89ca-11e4-b1b6-6b692803109b:6''
[...]

But then nothing happens, and some time later you will see a bunch of errors:

[...]
2014/12/23 16:09:52 socat[2965] E connect(3, AF=2 192.168.234.101:4444, 16): Connection timed out
WSREP_SST: [ERROR] Error while getting data from donor node: exit codes: 0 1 (20141223 16:09:52.057)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20141223 16:09:52.064)
WSREP_SST: [INFO] Cleaning up temporary directories (20141223 16:09:52.068)
141223 16:09:52 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.101:4444/xtrabackup_sst' --auth 'sstuser:s3cret' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '04c085a1-89ca-11e4-b1b6-6b692803109b:6'
[...]

On the joiner side, you will see a similar sequence: SST is started, then hangs and is finally aborted:

[...]
141223 16:08:48 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 6)
141223 16:08:48 [Note] WSREP: Requesting state transfer: success, donor: 0
141223 16:08:49 [Note] WSREP: (f9560d0d, 'tcp://0.0.0.0:4567') turning message relay requesting off
141223 16:09:52 [Warning] WSREP: 0 (node3): State transfer to 2 (node1) failed: -32 (Broken pipe)
141223 16:09:52 [ERROR] WSREP: gcs/src/gcs_group.cpp:long int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():717: Will never receive state. Need to abort.

The solution is of course to make sure that the ports are open on both sides.

SST is not correctly configured

Sometimes you will see an error like this on the donor:

141223 21:03:15 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.102:4444/xtrabackup_sst' --auth 'sstuser:s3cretzzz' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid 'e63f38f2-8ae6-11e4-a383-46557c71f368:0''
[...]
WSREP_SST: [ERROR] innobackupex finished with error: 1. Check /var/lib/mysql//innobackup.backup.log (20141223 21:03:26.973)

And if you look at innobackup.backup.log:

141223 21:03:26 innobackupex: Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock' as 'sstuser' (using password: YES).
innobackupex: got a fatal error with the following stacktrace: at /usr//bin/innobackupex line 2995
main::mysql_connect('abort_on_error', 1) called at /usr//bin/innobackupex line 1530
innobackupex: Error: Failed to connect to MySQL server: DBI connect(';mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock','sstuser',...) failed: Access denied for user 'sstuser'@'localhost' (using password: YES) at /usr//bin/innobackupex line 2979

What happened? The default SST method is xtrabackup-v2, and for it to work you need to specify a username/password in the my.cnf file:

[mysqld]
wsrep_sst_auth=sstuser:s3cret

And you also need to create the corresponding MySQL user:

mysql> GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost' IDENTIFIED BY 's3cret';

So you should check that the user has been correctly created in MySQL and that wsrep_sst_auth is correctly set.

Galera versions do not match

Here is another set of errors you may see in the error log of the donor:

141223 21:14:27 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error) at gcomm/src/gcomm/datagram.hpp:unserialize():101
141223 21:14:30 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error) at gcomm/src/gcomm/datagram.hpp:unserialize():101
141223 21:14:33 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error) at gcomm/src/gcomm/datagram.hpp:unserialize():101

Here the issue is that you are trying to connect a node using Galera 2.x to a node running Galera 3.x. This can happen if you try to use a PXC 5.5 node and a PXC 5.6 node. The right solution is probably to understand why you ended up with such inconsistent versions and make sure all nodes are using the same Percona XtraDB Cluster version and Galera version. But if you know what you are doing, you can also instruct the node using Galera 3.x to communicate with Galera 2.x nodes by specifying the following in the my.cnf file:

[mysqld]
wsrep_provider_options="socket.checksum=1"

Conclusion

SST errors can occur for multiple reasons, and the best way to diagnose an issue is to have a look at the error logs of the donor and the joiner. Galera is in general quite verbose, so you can follow the progress of SST on both nodes and see where it fails. Then it is mostly about being able to interpret the error messages.
April 27, 2015
by Peter Zaitsev
· 11,560 Views