DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Popular Topics

article thumbnail
Swiss Java Knife - A useful tool to add to your diagnostic tool-kit?
As a support consultant at C2B2 I am always looking for handy tools that may be able to help me or my team in diagnosing our customers middleware issues. So, when I came across a project called Swiss Java Knife promising tools for 'JVM monitoring, profiling and tuning' I figured I should take a look. It's basically a single jar file that allows you to run a number of tools most of which are similar to the ones that come bundled with the JDK. If you're interested in those tools my colleague Matt Brasier did a good introductory webinar which is available here: http://www.c2b2.co.uk/jvm_webinar_video Downloading Firstly I downloaded the latest jar file from github: https://github.com/aragozin/jvm-tools The source code is also available but for the purposes of this look into what it can offer the jar will suffice. What does it offer? Swiss Java Knife offers a number of commands: jps - Similar to the jps tool that comes with the JDK. ttop - Similar to the linux top command. hh - Similar to running the jmap tool that comes with the JDK with the -histo option. gc - Reports information about GC in real time. mx - Allows you to do basic operations with MBeans from the command line. mxdump - Dumps all MBeans of the target java process to JSON. Testing In order to test out the commands that are available I set up a Weblogic server and deployed an app containing a number of servlets that have known issues. These are then called via JMeter to show certain server behaviour: excessive Garbage Collection high CPU usage a memory leak Finding the process ID Normally to find the process ID I'd use the jps command that comes with the JDK. Swiss Java Knife has it's own version of the jps command so I tried that instead. Running the command: java -jar sjk-plus-0.1-2013-09-06.jar jps gives the following output: 5402org.apache.derby.drda.NetworkServerControl start 3250weblogic.Server 4032./ApacheJMeter.jar 3172weblogic.NodeManager -v 5427weblogic.Server 6523sjk-plus-0.1-2013-09-06.jar jps Which is basically the same as running the jps command with the -l option. There are a couple of additions where you can add filter options allowing you to pass in wild cards to match process descriptions or JVM system properties but overall it adds very little to the standard jps tool. jps -lv will generally give you everything you need. OK, so now we've got the process ID of our server we can start to look at what is going on. First of all, lets check garbage collection. Checking garbage collection OK. Now this one looks more promising. Swiss Java Knife has a command for collecting real time GC statistics. Let's give it a go. So, running the following command without my dodgy servlet running should give us a 'standard' reading: java -jar sjk-plus-0.1-2013-09-06.jar gc -p 3016 [GC: PS Scavenge#10471 time: 6ms interval: 113738ms mem: PS Survivor Space: 0k+96k->96k[max:128k,rate:0.84kb/s] PS Old Gen: 78099k+0k->78099k[max:349568k,rate:0.00kb/s] PS Eden Space: 1676k-1676k->0k[max:174464k,rate:-14.74kb/s]] [GC: PS MarkSweep#10436 time: 192ms interval: 40070ms mem: PS Survivor Space: 96k-96k->0k[max:128k,rate:-2.40kb/s] PS Old Gen: 78099k+7k->78106k[max:349568k,rate:0.19kb/s] PS Eden Space: 0k+0k->0k[max:174400k,rate:0.00kb/s]] PS Scavenge[ collections: 31 | avg: 0.0057 secs | total: 0.2 secs ] PS MarkSweep[ collections: 9 | avg: 0.1980 secs | total: 1.8 secs ] OK. Looks good. Useful to be able to get runtime GC info without having to rely on GC logs which are often not available. After running my dodgy servlet (containing a number System.gc() calls) we see the following: [GC: PS Scavenge#9787 time: 5ms interval: 38819ms mem: PS Survivor Space: 0k+64k->64k[max:192k,rate:1.65kb/s] PS Old Gen: 78062k+0k->78062k[max:349568k,rate:0.00kb/s] PS Eden Space: 204k-204k->0k[max:174336k,rate:-5.28kb/s]] [GC: PS MarkSweep#10200 time: 155ms interval: 112488ms mem: PS Survivor Space: 64k-64k->0k[max:192k,rate:-0.57kb/s] PS Old Gen: 78071k+0k->78071k[max:349568k,rate:0.00kb/s] PS Eden Space: 0k+0k->0k[max:174336k,rate:0.00kb/s]] PS Scavenge[ collections: 666 | avg: 0.0046 secs | total: 3.1 secs ] PS MarkSweep[ collections: 689 | avg: 0.1588 secs | total: 109.4 secs ] A big difference and although not a particularly realistic scenario it's certainly a useful tool for being able to quickly view runtime GC info. Next up we'll take a look at CPU usage. Checking CPU usage Swiss Java Knife has a command that works in a similar way to the linux top command which displays the top CPU processes. Running the following command should give us the top 10 CPU processes when running normally: java -jar sjk-plus-0.1-2013-09-06.jar ttop -n 10 -p 5427 -o CPU 2014-03-11T08:56:33.120-0700 Process summary process cpu=2.21% application cpu=0.67% (user=0.30% sys=0.37%) other: cpu=1.54% heap allocation rate 245kb/s [000001] user= 0.00% sys= 0.00% alloc= 0b/s - main [000002] user= 0.00% sys= 0.00% alloc= 0b/s - Reference Handler [000003] user= 0.00% sys= 0.00% alloc= 0b/s - Finalizer [000004] user= 0.00% sys= 0.00% alloc= 0b/s - Signal Dispatcher [000010] user= 0.00% sys= 0.00% alloc= 0b/s - Timer-0 [000011] user= 0.00% sys= 0.01% alloc= 96b/s - Timer-1 [000012] user= 0.00% sys= 0.01% alloc= 20b/s - [ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' [000013] user= 0.00% sys= 0.00% alloc= 0b/s - weblogic.time.TimeEventGenerator [000014] user= 0.00% sys= 0.04% alloc= 245b/s - weblogic.timers.TimerThread [000017] user= 0.00% sys= 0.00% alloc= 0b/s - Thread-7 So far so good, minimal CPU usage. Now I'll run my dodgy servlet and run it again: Hmmm, not so good: Unexpected error: java.lang.IllegalArgumentException: Comparison method violates its general contract! Try once again and we get the following: 2014-03-11T09:00:10.625-0700 Process summary process cpu=199.14% application cpu=189.87% (user=181.57% sys=8.30%) other: cpu=9.27% heap allocation rate 4945kb/s [000040] user=83.95% sys= 2.82% alloc= 0b/s - [ACTIVE] ExecuteThread: '5' for queue: 'weblogic.kernel.Default (self-tuning)' [000038] user=93.71% sys=-0.44% alloc= 0b/s - [ACTIVE] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)' [000044] user= 3.90% sys= 4.91% alloc= 4855kb/s - RMI TCP Connection(5)-127.0.0.1 [000001] user= 0.00% sys= 0.00% alloc= 0b/s - main [000002] user= 0.00% sys= 0.00% alloc= 0b/s - Reference Handler [000003] user= 0.00% sys= 0.00% alloc= 0b/s - Finalizer [000004] user= 0.00% sys= 0.00% alloc= 0b/s - Signal Dispatcher [000010] user= 0.00% sys= 0.00% alloc= 0b/s - Timer-0 [000011] user= 0.00% sys= 0.04% alloc= 1124b/s - Timer-1 [000012] user= 0.00% sys= 0.00% alloc= 0b/s - [STANDBY] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' So, the CPU usage is now through the roof (as expected). The main issue with this is that similar to the jps command it doesn't really offer much more than the top command. It also threw the exception above many times when trying to run commands ordered by CPU. Overall, it doesn't really add much to the command already available and unexpected errors are never good. Finally, we'll take a look at memory usage. Checking memory usage For checking memory usage Swiss Java Knife has a tool called hh which it claims is an extended version of jmap -histo. For those not familiar with jmap, it's another of the tools that comes with the JDK which prints shared object memory maps or heap memory details for a process. So, first of all I run my JMeter test that repeatedly calls my dodgy servlet. This time one that allocates multiple byte arrays each time it's called to simulate a memory leak. Although it claims to be an extended version of jmap -histo the only real addition is the ability to state how many buckets to view but this can be easily achieved by piping the output of jmap -histo through head. Aside from that the output is virtually identical. Output from jmap: num #instances #bytes class name ---------------------------------------------- 1: 42124 234260776 [B 2: 161472 24074512 3: 161472 21970928 4: 12853 15416848 5: 12853 10250656 6: 84735 9020400 [C 7: 10896 8943104 8: 91873 2939936 java.lang.String 9: 14021 1675576 java.lang.Class 10: 10311 1563520 [Ljava.lang.Object; Output from sjk: java -jar sjk-plus-0.1-2013-09-06.jar hh -n 10 -p 5427 1: 56626 386286072 [B 2: 161493 24076192 3: 161493 21973784 4: 12850 15409912 5: 12850 10249384 6: 10891 8936672 7: 83336 8577720 [C 8: 90525 2896800 java.lang.String 9: 14018 1675264 java.lang.Class 10: 9819 1579400 [Ljava.lang.Object; Total 996089 500086120 The only other tools available are the commands mxdump and mx which allow access to MBean attributes and operations. However, trying to run either of these resulted in a Null pointer exception. At this point I would generally download the code and start to poke about but by now I'd seen enough. Conclusion Although a nice idea it's very limited in what it offers. Under the covers it uses the Attach API so requires the JDK and not just the JRE in order to run so the majority of tools available are already provided with the standard JDK. There are a few additions to those tools but nothing that really makes it worthwhile using this instead. The only tool I could see myself using would be the real-time GC data gathering tool but this would only be of use where GC logs were unavailable and no other monitoring tools were available. The number of errors seen when running basic commands was also a concern, although this is just a project on github not a commercial offering and doesn't appear to be a particularly active project. So, a useful tool to add to your diagnostic tool-kit? Not in my opinion. It's certainly an interesting idea and with further work could be useful but for now I'd stick with the tools that are already available.
July 24, 2014
by Andy Overton
· 13,443 Views
article thumbnail
Building Extremely Large In-Memory InputStream for Testing Purposes
For some reason I needed extremely large, possibly even infinite InputStream that would simply return the same byte[]over and over. This way I could produce insanely big stream of data by repeating small sample. Sort of similar functionality can be found in Guava: Iterable Iterables.cycle(Iterable) and Iterator Iterators.cycle(Iterator). For example if you need an infinite source of 0 and 1, simply sayIterables.cycle(0, 1) and get 0, 1, 0, 1, 0, 1... infinitely. Unfortunately I haven't found such utility forInputStream, so I jumped into writing my own. This article documents many mistakes I made during that process, mostly due to overcomplicating and overengineering straightforward solution. We don't really need an infinite InputStream, being able to create very large one (say, 32 GiB) is enough. So we are after the following method: public static InputStream repeat(byte[] sample, int times) It basically takes sample array of bytes and returns an InputStream returning these bytes. However when sample runs out, it rolls over, returning the same bytes again - this process is repeated given number of times, until InputStreamsignals end. One solution that I haven't really tried but which seems most obvious: public static InputStream repeat(byte[] sample, int times) { final byte[] allBytes = new byte[sample.length * times]; for (int i = 0; i < times; i++) { System.arraycopy(sample, 0, allBytes, i * sample.length, sample.length); } return new ByteArrayInputStream(allBytes); } I see you laughing there! If sample is 100 bytes and we need 32 GiB of input repeating these 100 bytes, generatedInputStream shouldn't really allocate 32 GiB of memory, we must be more clever here. As a matter of fact repeat()above has another subtle bug. Arrays in Java are limited to 231-1 entries (int), 32 GiB is way above that. The reason this program compiles is a silent integer overflow here: sample.length * times. This multiplication doesn't fit in int. OK, let's try something that at least theoretically can work. My first idea was as follows: what if I create manyByteArrayInputStreams sharing the same byte[] sample (they don't do an eager copy) and somehow join them together? Thus I needed some InputStream adapter that could take arbitrary number of underlying InputStreams and chain them together - when first stream is exhausted, switch to next one. This awkward moment when you look for something in Apache Commons or Guava and apparently it was in the JDK forever... java.io.SequenceInputStream is almost ideal. However it can only chain precisely two underlying InputStreams. Of course since SequenceInputStreamis an InputStream itself, we can use it recursively as an argument to outer SequenceInputStream. Repeating this process we can chain arbitrary number of ByteArrayInputStreams together: public static InputStream repeat(byte[] sample, int times) { if (times <= 1) { return new ByteArrayInputStream(sample); } else { return new SequenceInputStream( new ByteArrayInputStream(sample), repeat(sample, times - 1) ); } } If times is 1, just wrap sample in ByteArrayInputStream. Otherwise use SequenceInputStream recursively. I think you can immediately spot what's wrong with this code: too deep recursion. Nesting level is the same as times argument, which will reach millions or even billions. There must be a better way. Luckily minor improvement changes recursion depth from O(n) to O(logn): public static InputStream repeat(byte[] sample, int times) { if (times <= 1) { return new ByteArrayInputStream(sample); } else { return new SequenceInputStream( repeat(sample, times / 2), repeat(sample, times - times / 2) ); } } Honestly this was the first implementation I tried. It's a simple application of divide and conquer principle, where we produce result by evenly splitting it into two smaller sub-problems. Looks clever, but there is one issue: it's easy to prove we create t (t = times) ByteArrayInputStreams and O(t) SequenceInputStreams. While sample byte array is shared, millions of various InputStream instances are wasting memory. This leads us to alternative implementation, creating just one InputStream, regardless value of times: import com.google.common.collect.Iterators; import org.apache.commons.lang3.ArrayUtils; public static InputStream repeat(byte[] sample, int times) { final Byte[] objArray = ArrayUtils.toObject(sample); final Iterator infinite = Iterators.cycle(objArray); final Iterator limited = Iterators.limit(infinite, sample.length * times); return new InputStream() { @Override public int read() throws IOException { return limited.hasNext() ? limited.next() & 0xFF : -1; } }; } We will use Iterators.cycle() after all. But before we have to translate byte[] into Byte[] since iterators can only work with objets, not primitives. There is no idiomatic way to turn array of primitives to array of boxed types, so I useArrayUtils.toObject(byte[]) from Apache Commons Lang. Having an array of objects we can create an infiniteiterator that cycles through values of sample. Since we don't want an infinite stream, we cut off infinite iterator usingIterators.limit(Iterator, int), again from Guava. Now we just have to bridge from Iterator toInputStream - after all semantically they represent the same thing. This solution suffers two problems. First of all it produces tons of garbage due to unboxing. Garbage collection is not that much concerned about dead, short-living objects, but still seems wasteful. Second issue we already faced previously:sample.length * times multiplication can cause integer overflow. It can't be fixed because Iterators.limit() takesint, not long - for no good reason. BTW we avoided third problem by doing bitwise and with 0xFF - otherwise byte with value -1 would signal end of stream, which is not the case. x & 0xFF is correctly translated to unsigned 255 (int). So even though implementation above is short and sweet, declarative rather than imperative, it's too slow and limited. If you have a C background, I can imagine how uncomfortable you were seeing me struggle. After all the most straightforward, painfully simple and low-level implementation was the one I came up with last: public static InputStream repeat(byte[] sample, int times) { return new InputStream() { private long pos = 0; private final long total = (long)sample.length * times; public int read() throws IOException { return pos < total ? sample[(int)(pos++ % sample.length)] : -1; } }; } GC free, pure JDK, fast and simple to understand. Let this be a lesson for you: start with the simplest solution that jumps to your mind, don't overengineer and don't be too smart. My previous solutions, declarative, functional, immutable, etc. - maybe they looked clever, but they were neither fast nor easy to understand. The utility we just developed was not just a toy project, it will be used later in subsequent article.
July 23, 2014
by Tomasz Nurkiewicz
· 7,527 Views
article thumbnail
VelocityEngine Spring Java Config
This is a first post in a series of short code snippets that will present the configuration of Spring beans from XML to Java. XML: resource.loader=class class.resource.loader.class=org.apache.velocity.runtime.resource.loader.ClasspathResourceLoader Java @Bean public VelocityEngine velocityEngine() throws VelocityException, IOException{ VelocityEngineFactoryBean factory = new VelocityEngineFactoryBean(); Properties props = new Properties(); props.put("resource.loader", "class"); props.put("class.resource.loader.class", "org.apache.velocity.runtime.resource.loader." + "ClasspathResourceLoader"); factory.setVelocityProperties(props); return factory.createVelocityEngine(); }
July 23, 2014
by Adrian Matei
· 13,192 Views
article thumbnail
Time - Memory Tradeoff With the Example of Java Maps
this article illustrates the general time - memory tradeoff with the example of different hash table implementations in java. the more memory a hash table takes, the faster each operation (e. g. getting a value by key or putting an entry) is performed. benchmarking method hash maps with int keys and int values were tested. memory measure is relative usage over theoretical minimum. for example, 1000 entries of int key and value take at least (4 (size of int) + 4) * 1000 = 8000 bytes. if the hash map implementation takes 20 000 bytes, it's memory overuse is (20 000 - 8000) / 8000 = 1.5. each implementation was benchmarked on 9 different load levels (load factors). on each load level, each map was filled with 10 numbers of entries, logariphmically evenly distributed bewteen 1000 and 10 000 000 (to study caching effects). then, for the same implementation and load level, memory metrics and average operation throughputs are averaged independently, over 3 smallest sizes (small sizes), 3 largest sizes (large sizes) and all 10 sizes from 1000 to 10 000 000 (all sizes). implementations: higher frequency trading collections (hftc) high performance primitive collections (hppc) fastutil collections goldman sachs collections (gs) trove collections mahout collections java.util.hashmap as a reference get value by key (successful) only looking at these charts, you can suppose that hftc, trove and mahout on the one size, fastutil, hppc and gs on the another use the same hash table algorithm. (in fact, it is not quite true.) sparser hash table on average performs less lookups during key search, therefore less memory reads, therefore the operation finishes earlier. notice, that on small sizes the largest maps are the fastest for all implementations, but on large and all sizes there isn't visible progress starting from memory overuse ~4. that's because when the total memory taken by the map goes beyond cpu cache capacity, cache misses become more often when the map is getting larger. this effect compensates algorithic trend. update (increment) value by key update operation behaves pretty similar to get(). fastutil wasn't benchmarked, because there aren't fairly performant method for this task in it's api. put an entry (key was absent) in this case, maps were gradually filled in with the entries from the size 0 to the target size (1000 - 10 000 000). rehash shouldn't occur, because maps were constructed with the target size provided. for small sizes, plots still looks like hyperbolas, but i can't explain so dramatic change on large sizes and differences between hftc and other primitive implementations. internal iteration (foreach) iteration is getting slower with memory usage growth. interesting thing about external iteration: for all open hash table implementations throughtput depends only memory usage, not even on load factor (which differs for implementations for the same memory usage). also, foreach throughput don't depend on open hash table size. external iteration (via iterator or cursor) external iteration performance is more varying than internal, because there is more freedom for optimization. hftc and trove employ own iteration interfaces, other libraries use standard java.util.iterator . footnote raw benchmark results from which the pictures were built with a link to the benchmarking code and information about the runsite in description.
July 22, 2014
by Roman Leventov
· 25,937 Views
article thumbnail
5 Reasons to Use a Java Data Grid in Your Application
In this post we explore 5 reasons to use a Java Data Grid for caching Java objects in-memory in your applications. In a later post we will explore some of the other data grid capabilities, beyond data storage, that can revolutionize your Java architectures, like on-grid computation and events. Memory is Fast Java Data Grids store Java objects in memory. Memory access is fast with low latency. So if access to data storage either disk or database is the primary bottleneck in your application then using a data grid as an in-memory cache in front of your storage tier will give you a performance boost. Scale out your Application Shared State If you need to share state across JVMs to scale out your application then using a Java Data Grid rather than a database will increase your scalability. A typical shared state architecture is shown below, the application server tier stores shared Java objects in the data grid and these objects are available to all application server nodes in your architecture. Separating the data grid tier from the application server tier has a number of advantages; Applications can be redeployed and restarted without losing the shared state Data Grid JVMs and Application JVMs can be tuned separately State can be shared across multiple different applications. Each tier can be scaled horizontally separately depending on work load Typical use cases for shared state include; PCI compliant storage of card security codes; In-game state in online games; web session data; prices and catalogues in ecommerce. Anything that needs low latency access can be stored in the shared data grid. High Availability for In-Memory Data As well as low latency access and scaling out shared state. Java Data Grids also provide high availability for your in-memory data. When storing Java objects in a data grid a primary object is stored in one of the Data Grid JVMs and secondary back up copies of the object are stored in different Data Grid JVM node, ensuring that if you lose a node then you don't lose any data. Clients of the data grid do not need to know where data is to access it so high availability is transparent to your application. Scale Out In-Memory Data Volumes Java objects, in data grids, aren't fully replicated across all Data Grid JVMs but are stored as a primary object and a secondary object. This means the more Data Grid JVM nodes we add the more JVM heap we have for storing Java objects in-memory (and remember memory is fast). For example if we build a Data Grid with 20 JVMs each with 4Gb free heap (after per JVM overhead) we could theoretically store 80Gb (4 times 20) of shared Java objects. If we assume we have 1 duplicate for high availability this cuts our storage in half so we can store 40Gb (.5 time 4 times 20 ) of Java Objects in memory. Native Integration with JPA Java Data Grids have native integration with JPA frameworks like TopLink and Hibernate whereby the Data Grid can act as a second level cache between JPA and the database. This can give a large performance boost to your database driven application if latency associated with database access is a key performance bottleneck.
July 22, 2014
by Steve Millidge
· 7,382 Views
article thumbnail
Tailing a File - Spring Websocket Sample
This is a sample that I have wanted to try for sometime - A Websocket application to tail the contents of a file. The following is the final view of the web-application: There are a few parts to this application: Generating a File to tail: I chose to use a set of 100 random quotes as a source of the file content, every few seconds the application generates a quote and writes this quote to the temporary file. Spring Integration is used for wiring this flow for writing the contents to the file: Just a quick note, Spring Integration flows can now also be written using a Java Based DSL, and this flow using Java is available here Tailing the file and sending the content to a broker The actual tailing of the file itself can be accomplished by OS specific tail command or by using a library like Apache Commons IO. Again in my case I decided to use Spring Integration which provides Inbound channel adapters to tail a file purely using configuration, this flow looks like this: and its working Java equivalent There is a reference to a "fileContentRecordingService" above, this is the component which will direct the lines of the file to a place where the Websocket client will subscribe to. Websocket server configuration Spring Websocket support makes it super simple to write a Websocket based application, in this instance the entire working configuration is the following: @Configuration @EnableWebSocketMessageBroker public class WebSocketDefaultConfig extends AbstractWebSocketMessageBrokerConfigurer { @Override public void configureMessageBroker(MessageBrokerRegistry config) { //config.enableStompBrokerRelay("/topic/", "/queue/"); config.enableSimpleBroker("/topic/", "/queue/"); config.setApplicationDestinationPrefixes("/app"); } @Override public void registerStompEndpoints(StompEndpointRegistry registry) { registry.addEndpoint("/tailfilesep").withSockJS(); } } This may seem a little over the top, but what these few lines of configuration does is very powerful and the configuration can be better understood by going through the reference here. In brief, it sets up a websocket endpoint at '/tailfileep' uri, this endpoint is enhanced with SockJS support, Stomp is used as a sub-protocol, endpoints `/topic` and `/queue` is configured to a real broker like RabbitMQ or ActiveMQ but in this specific to an in-memory one. Going back to the "fileContentRecordingService" once more, this component essentially takes the line of the file and sends it this in-memory broker, SimpMessagingTemplate facilitates this wiring: public class FileContentRecordingService { @Autowired private SimpMessagingTemplate simpMessagingTemplate; public void sendLinesToTopic(String line) { this.simpMessagingTemplate.convertAndSend("/topic/tailfiles", line); } } Websocket UI configuration The UI is angularjs based, the client controller is set up this way and internally uses the javascript libraries for sockjs and stomp support: var tailFilesApp = angular.module("tailFilesApp",[]); tailFilesApp.controller("TailFilesCtrl", function ($scope) { function init() { $scope.buffer = new CircularBuffer(20); } $scope.initSockets = function() { $scope.socket={}; $scope.socket.client = new SockJS("/tailfilesep); $scope.socket.stomp = Stomp.over($scope.socket.client); $scope.socket.stomp.connect({}, function() { $scope.socket.stomp.subscribe("/topic/tailfiles", $scope.notify); }); $scope.socket.client.onclose = $scope.reconnect; }; $scope.notify = function(message) { $scope.$apply(function() { $scope.buffer.add(angular.fromJson(message.body)); }); }; $scope.reconnect = function() { setTimeout($scope.initSockets, 10000); }; init(); $scope.initSockets(); }); The meat of this code is the "notify" function which the callback acting on the messages from the server, in this instance the new lines coming into the file and showing it in a textarea. This wraps up the entire application to tail a file. A complete working sample without any external dependencies is available at this github location, instructions to start it up is also available at that location. Conclusion Spring Websockets provides a concise way to create Websocket based applications, this sample provides a good demonstration of this support. I had presented on this topic recently at my local JUG (IndyJUG) and a deck with the presentation is available here
July 20, 2014
by Biju Kunjummen
· 12,964 Views · 2 Likes
article thumbnail
Grouping, Sampling and Batching - Custom Collectors in Java 8
Continuing from the first article, this time we will write some more useful custom collectors: for grouping by given criteria, sampling input, batching and sliding over with fixed size window. Grouping (counting occurrences, histogram) Imagine you have a collection of some items and you want to calculate how many times each item (with respect to equals()) appears in this collection. This can be achieved using CollectionUtils.getCardinalityMap() from Apache Commons Collections. This method takes an Iterable and returns Map, counting how many times each item appeared in the collection. However sometimes instead of usingequals() we would like to group by an arbitrary attribute of input T. For example say we have a list of Person objects and we would like to compute the number of males vs. females (i.e. Map) or maybe an age distribution. There is a built-in collector Collectors.groupingBy(Function classifier) - however it returns a map from key to all items mapped to that key. See: import static java.util.stream.Collectors.groupingBy; //... final List people = //... final Map> bySex = people .stream() .collect(groupingBy(Person::getSex)); It's valuable, but in our case unnecessarily builds two List. I only want to know the number of people. There is no such collector built-in, but we can compose it in a fairly simple manner: import static java.util.stream.Collectors.counting; import static java.util.stream.Collectors.groupingBy; //... final Map bySex = people .stream() .collect( groupingBy(Person::getSex, HashMap::new, counting())); This overloaded version of groupingBy() takes three parameters. First one is the key (classifier) function, as previously. Second argument creates a new map, we'll see shortly why it's useful. counting() is a nested collector that takes all people with same sex and combines them together - in our case simply counting them as they arrive. Being able to choose map implementation is useful e.g. when building age histogram. We would like to know how many people we have at given age - but age values should be sorted: final TreeMap byAge = people .stream() .collect( groupingBy(Person::getAge, TreeMap::new, counting())); byAge .forEach((age, count) -> System.out.println(age + ":\t" + count)); We ended up with a TreeMap from age (sorted) to count of people having that age. Sampling, batching and sliding window IterableLike.sliding() method in Scala allows to view a collection through a sliding fixed-size window. This window starts at the beginning and in each iteration moves by given number of items. Such functionality, missing in Java 8, allows several useful operators like computing moving average, splitting big collection into batches (compare with Lists.partition() in Guava) or sampling every n-th element. We will implement collector for Java 8 providing similar behaviour. Let's start from unit tests, which should describe briefly what we want to achieve: import static com.nurkiewicz.CustomCollectors.sliding @Unroll class CustomCollectorsSpec extends Specification { def "Sliding window of #input with size #size and step of 1 is #output"() { expect: input.stream().collect(sliding(size)) == output where: input | size | output [] | 5 | [] [1] | 1 | [[1]] [1, 2] | 1 | [[1], [2]] [1, 2] | 2 | [[1, 2]] [1, 2] | 3 | [[1, 2]] 1..3 | 3 | [[1, 2, 3]] 1..4 | 2 | [[1, 2], [2, 3], [3, 4]] 1..4 | 3 | [[1, 2, 3], [2, 3, 4]] 1..7 | 3 | [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]] 1..7 | 6 | [1..6, 2..7] } def "Sliding window of #input with size #size and no overlapping is #output"() { expect: input.stream().collect(sliding(size, size)) == output where: input | size | output [] | 5 | [] 1..3 | 2 | [[1, 2], [3]] 1..4 | 4 | [1..4] 1..4 | 5 | [1..4] 1..7 | 3 | [1..3, 4..6, [7]] 1..6 | 2 | [[1, 2], [3, 4], [5, 6]] } def "Sliding window of #input with size #size and some overlapping is #output"() { expect: input.stream().collect(sliding(size, 2)) == output where: input | size | output [] | 5 | [] 1..4 | 5 | [[1, 2, 3, 4]] 1..7 | 3 | [1..3, 3..5, 5..7] 1..6 | 4 | [1..4, 3..6] 1..9 | 4 | [1..4, 3..6, 5..8, 7..9] 1..10 | 4 | [1..4, 3..6, 5..8, 7..10] 1..11 | 4 | [1..4, 3..6, 5..8, 7..10, 9..11] } def "Sliding window of #input with size #size and gap of #gap is #output"() { expect: input.stream().collect(sliding(size, size + gap)) == output where: input | size | gap | output [] | 5 | 1 | [] 1..9 | 4 | 2 | [1..4, 7..9] 1..10 | 4 | 2 | [1..4, 7..10] 1..11 | 4 | 2 | [1..4, 7..10] 1..12 | 4 | 2 | [1..4, 7..10] 1..13 | 4 | 2 | [1..4, 7..10, [13]] 1..13 | 5 | 1 | [1..5, 7..11, [13]] 1..12 | 5 | 3 | [1..5, 9..12] 1..13 | 5 | 3 | [1..5, 9..13] } def "Sampling #input taking every #nth th element is #output"() { expect: input.stream().collect(sliding(1, nth)) == output where: input | nth | output [] | 1 | [] [] | 5 | [] 1..3 | 5 | [[1]] 1..6 | 2 | [[1], [3], [5]] 1..10 | 5 | [[1], [6]] 1..100 | 30 | [[1], [31], [61], [91]] } } Using data driven tests in Spock I managed to write almost 40 test cases in no-time, succinctly describing all requirements. I hope these are clear for you, even if you haven't seen this syntax before. I already assumed existence of handy factory methods: public class CustomCollectors { public static Collector>> sliding(int size) { return new SlidingCollector<>(size, 1); } public static Collector>> sliding(int size, int step) { return new SlidingCollector<>(size, step); } } The fact that collectors receive items one after another makes are job harder. Of course first collecting the whole list and sliding over it would have been easier, but sort of wasteful. Let's build result iteratively. I am not even pretending this task can be parallelized in general, so I'll leave combiner() unimplemented: public class SlidingCollector implements Collector>, List>> { private final int size; private final int step; private final int window; private final Queue buffer = new ArrayDeque<>(); private int totalIn = 0; public SlidingCollector(int size, int step) { this.size = size; this.step = step; this.window = max(size, step); } @Override public Supplier>> supplier() { return ArrayList::new; } @Override public BiConsumer>, T> accumulator() { return (lists, t) -> { buffer.offer(t); ++totalIn; if (buffer.size() == window) { dumpCurrent(lists); shiftBy(step); } }; } @Override public Function>, List>> finisher() { return lists -> { if (!buffer.isEmpty()) { final int totalOut = estimateTotalOut(); if (totalOut > lists.size()) { dumpCurrent(lists); } } return lists; }; } private int estimateTotalOut() { return max(0, (totalIn + step - size - 1) / step) + 1; } private void dumpCurrent(List> lists) { final List batch = buffer.stream().limit(size).collect(toList()); lists.add(batch); } private void shiftBy(int by) { for (int i = 0; i < by; i++) { buffer.remove(); } } @Override public BinaryOperator>> combiner() { return (l1, l2) -> { throw new UnsupportedOperationException("Combining not possible"); }; } @Override public Set characteristics() { return EnumSet.noneOf(Characteristics.class); } } I spent quite some time writing this implementation, especially correct finisher() so don't be frightened. The crucial part is a buffer that collects items until it can form one sliding window. Then "oldest" items are discarded and window slides forward by step. I am not particularly happy with this implementation, but tests are passing. sliding(N)(synonym to sliding(N, 1)) will allow calculating moving average of N items.sliding(N, N) splits input into batches of size N. sliding(1, N) takes every N-th element (samples). I hope you'll find this collector useful, enjoy!
July 18, 2014
by Tomasz Nurkiewicz
· 21,922 Views · 2 Likes
article thumbnail
From JPA to Hibernate's Legacy and Enhanced Identifier Generators
Read about enhanced identifier generators, like JPA and Hibernate.
July 16, 2014
by Vlad Mihalcea
· 19,431 Views
article thumbnail
Retrieving JMX information programmatically
Retrieving JMX information for a Java process is very easy when using a tool such as JConsole or JVisualVM. These provide an interface that allows viewing of information such as CPU usage, memory usage, threads active and more. This blog post gives an example of how to retrieve such information programmatically. In order to retrieve JMX information from a Java application, the target application must be configured to expose JMX information. This link shows how to do this. As an example, we shall be retrieving CPU and memory usage from a standalone Mule instance. In Mule, the JMX agent may be configured from a Mule configuration file. Through this, one may set the address that a JMX client can use to retrieve information; this is how. The following Java code allows for polling the JMX agent and retrieving memory, CPU usage and also shows how to remotely invoke the garbage collector: // create jmx connection with mules jmx agent JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:1098/server"); JMXConnector jmxc = JMXConnectorFactory.connect(url, null); jmxc.connect(); //create object instances that will be used to get memory and operating system Mbean objects exposed by JMX; create variables for cpu time and system time before Object memoryMbean = null; Object osMbean = null; long cpuBefore = 0; long tempMemory = 0; CompositeData cd = null; cpuBefore = Long.parseLong(a.toString()); // call the garbage collector before the test using the Memory Mbean jmxc.getMBeanServerConnection().invoke(new ObjectName("java.lang:type=Memory"), "gc", null, null); //create a loop to get values every second (optional) for (int i = 0; i < samplesCount; i++) { //get an instance of the HeapMemoryUsage Mbean memoryMbean = jmxc.getMBeanServerConnection().getAttribute(new ObjectName("java.lang:type=Memory"), "HeapMemoryUsage"); cd = (CompositeData) memoryMbean; //get an instance of the OperatingSystem Mbean osMbean = jmxc.getMBeanServerConnection().getAttribute(new ObjectName("java.lang:type=OperatingSystem"),"ProcessCpuTime"); System.out.println("Used memory: " + " " + cd.get("used") + " Used cpu: " + osMbean); //print memory usage tempMemory = tempMemory + Long.parseLong(cd.get("used").toString()); Thread.sleep(1000); //delay for one second } //get system time and cpu time from last poll long cpuAfter = Long.parseLong(osMbean.toString()); long cpuDiff = cpuAfter - cpuBefore; //find cpu time between our first and last jmx poll System.out.println("Cpu diff in milli seconds: " + cpuDiff / 1000000); //print cpu time in miliseconds System.out.println("average memory usage is: " + tempMemory / samplesCount);//print average memory usage The above example prints: ... Used memory: 23376624 Used cpu: 38060000000 Used memory: 24020624 Used cpu: 38080000000 Used memory: 24621920 Used cpu: 38090000000 Cpu diff in milli seconds: 4230 average memory usage is: 28028204 When the JMX agent may not be enabled for a Java process, it is also possible to retrieve information by fetching the process by id, for example. The following code shows how to do this using Sigar API: //create a sigar object Sigar sigar = new Sigar(); for (int i = 0; i < 100; i++) { ProcessFinder find = new ProcessFinder(sigar); //get the list of current java processes, and optionally query the list to choose which process to monitor long[] pidList = sigar.getProcList(); //assuming we know the process id, we may query the process finder long pid = find.findSingleProcess("Pid.Pid.eq=54730"); //get memory info for the process id ProcMem memory = new ProcMem(); memory.gather(sigar, pid); //get cou info for the oricess id ProcCpu cpu = new ProcCpu(); cpu.gather(sigar, pid); //print the memory used by the process id System.out.println("Current memory used: " + Long.toString(memory.getSize())); //print all memory info System.out.println(memory.toMap()); //print all cpu info System.out.println(cpu.toMap()); Thread.sleep(1000); } This is displayed when running the above example: Current memory used: 3257659392 {Resident=258789376, PageFaults=109613, Size=3257659392} {User=34973, LastTime=1404467787774, Percent=0.0, StartTime=1404467383826, Total=38121, Sys=3148}
July 16, 2014
by Gabriel Dimech
· 31,478 Views
article thumbnail
The Observer Pattern in Java
Get an overview of the Java observer pattern using an inventory example.
July 16, 2014
by Roohi Agrawala
· 113,183 Views · 27 Likes
article thumbnail
The Java Origins of Angular JS: Angular vs JSF vs GWT
Get familiar with the Angular JS origin story.
July 15, 2014
by Vasco Cavalheiro
· 85,896 Views · 5 Likes
article thumbnail
Hibernate Identity, Sequence and Table (Sequence) Generator
Learn about Identity, Sequence, and Table in Hibernate.
July 9, 2014
by Vlad Mihalcea
· 178,000 Views · 2 Likes
article thumbnail
Turning Recursive File System Traversal Into Stream
When I was learning programming, back in the days of Turbo Pascal, I managed to list files in directory using FindFirst,FindNext and FindClose functions. First I came up with a procedure printing contents of a given directory. You can imagine how proud I was to discover I can actually call that procedure from itself to traverse file system recursively. Well, I didn't know the term recursion back then, but it worked. Similar code in Java would look something like this: public void printFilesRecursively(final File folder) { for (final File entry : listFilesIn(folder)) { if (entry.isDirectory()) { printFilesRecursively(entry); } else { System.out.println(entry.getAbsolutePath()); } } } private File[] listFilesIn(File folder) { final File[] files = folder.listFiles(); return files != null ? files : new File[]{}; } Didn't know File.listFiles() can return null, did ya? That's how it signals I/O errors, like if IOException never existed. But that's not the point. System.out.println() is rarely what we need, thus this method is neither reusable nor composable. It is probably the best counterexample of Open/Closed principle. I can imagine several use cases for recursive traversal of file system: Getting a complete list of all files for display purposes Looking for all files matching given pattern/property (also check out File.list(FilenameFilter)) Searching for one particular file Processing every single file, e.g. sending it over network Every use case above has a unique set of challenges. For example we don't want to build a list of all files because it will take a significant amount of time and memory before we can start processing it. We would like to process files as they are discovered and lazily - by pipe-lining computation (but without clumsy visitor pattern). Also we want to short-circuit searching to avoid unnecessary I/O. Luckily in Java 8 some of these issues can be addressed with streams: final File home = new File(FileUtils.getUserDirectoryPath()); final Stream files = Files.list(home.toPath()); files.forEach(System.out::println); Remember that Files.list(Path) (new in Java 8) does not look into subdirectories - we'll fix that later. The most important lesson here is: Files.list() returns a Stream - a value that we can pass around, compose, map, filter, etc. It's extremely flexible, e.g. it's fairly simple to count how many files I have in a directory per extension: import org.apache.commons.io.FilenameUtils; //... final File home = new File(FileUtils.getUserDirectoryPath()); final Stream files = Files.list(home.toPath()); final Map> byExtension = files .filter(path -> !path.toFile().isDirectory()) .collect(groupingBy(path -> getExt(path))); byExtension. forEach((extension, matchingFiles) -> System.out.println( extension + "\t" + matchingFiles.size())); //... private String getExt(Path path) { return FilenameUtils.getExtension(path.toString()).toLowerCase(); } OK, just another API, you might say. But it becomes really interesting once we need to go deeper, recursively traversing subdirectories. One amazing feature of streams is that you can combine them with each other in various ways. Old Scala saying "flatMap that shit" is applicable here as well, check out this recursive Java 8 code: //WARNING: doesn't compile, yet: private static Stream filesInDir(Path dir) { return Files.list(dir) .flatMap(path -> path.toFile().isDirectory() ? filesInDir(path) : singletonList(path).stream()); } Stream lazily produced by filesInDir() contains all files within directory including subdirectories. You can use it as any other stream by calling map(), filter(), anyMatch(), findFirst(), etc. But how does it really work?flatMap() is similar to map() but while map() is a straightforward 1:1 transformation, flatMap() allows replacing single entry in input Stream with multiple entries. If we had used map(), we would have end up with Stream>(or maybe Stream>). But flatMap() flattens this structure, in a way exploding inner entries. Let's see a simple example. Imagine Files.list() returned two files and one directory. For files flatMap() receives a one-element stream with that file. We can't simply return that file, we have to wrap it, but essentially this is no-operation. It gets way more interesting for a directory. In that case we call filesInDir() recursively. As a result we get a stream of contents of that directory, which we inject into our outer stream. Code above is short, sweet and... doesn't compile. These pesky checked exceptions again. Here is a fixed code, wrapping checked exceptions for sanity: public static Stream filesInDir(Path dir) { return listFiles(dir) .flatMap(path -> path.toFile().isDirectory() ? filesInDir(path) : singletonList(path).stream()); } private static Stream listFiles(Path dir) { try { return Files.list(dir); } catch (IOException e) { throw Throwables.propagate(e); } } Unfortunately this quite elegant code is not lazy enough. flatMap() evaluates eagerly, thus it always traverses all subdirectories, even if we barely ask for first file. You can try with my tiny LazySeq library that tries to provide even lazier abstraction, similar to streams in Scala or lazy-seq in Clojure. But even standard JDK 8 solution might be really helpful and simplify your code significantly.
July 9, 2014
by Tomasz Nurkiewicz
· 6,934 Views
article thumbnail
Spring Security Run-As example using annotations and namespace configuration
Spring Security offers an authentication replacement feature, often referred to as Run-As, that can replace the current user's authentication (and thus permissions) during a single secured object invocation. Using this feature makes sense when a backend system invoked during request processing requires different privileges than the current application. For example, an application might want to expose a financial transaction log to the currently logged in user, but the backend system that provides it only permits this action to the members of a special "auditor" role. The application can not simply assign this role to the user as that would potentially permit them to execute other restricted actions. Instead, the user can be given this right exclusively for viewing their transaction log. Only two classes are used to implement this feature. Instances of RunAsManager are tasked with producing the actual replacement authentication tokens. A sensible default implementation is already provided by Spring Security. As with other types of authentication, it is also necessary to register an instance of an appropriate AuthenticationProvider. Tokens produced by runAsManager are signed with the provided key (my_run_as_key in the example above) and are later checked against the same key by runAsAuthenticationProvider, in order to mitigate the risk of fake tokens being provided. These keys can have any value, but need to be the same in both objects. Otherwise, runAsAuthenticationProvider will reject the produced tokens as invalid. If an instance is registered, RunAsManager will be invoked by AbstractSecurityInterceptor for every intercepted object invocation for which the user has already been given access. If RunAsManager returns a token, this token will be used be used instead of the original one for the duration of the invocation, thus granting the user different privileges. There are two key points here. In order for the authentication replacement feature to do anything, the call has to actually be secured (and thus intercepted), and the user has to already have been granted access. To register a RunAsManager instance with the method security interceptor, something similar to the following is needed: Now, all methods secured by the @Secured annotation will be able to trigger RunAsManager. One important point here is that global-method-security will only work in the Spring context in which it is defined. In Spring MVC applications, there usually are two Spring contexts: the parent context, attached to ContextLoaderListener, and the child context, attached toDispatcherServlet. To secure Controller methods in this way, global-method-security must be added to DispatcherServlet's context. To secure methods in beans not in this context, global-method-security should also be added to ContextLoaderListener's context. Otherwise, security annotations will be ignored. The default implementation of RunAsManager (RunAsManagerImpl) will inspect the secured object's configuration and if it finds any attributes prefixed with RUN_AS_, it will create a token identical to the original, with the addition of one new GrantedAuthorty per RUN_AS_ attribute found. The new GrantedAuthority will be a role (prefixed by ROLE_ by default) named like the found attribute without the RUN_AS_ prefix. So, if a user with a role ROLE_REGISTERED_USER invokes a method annotated with @Secured({"ROLE_REGISTERED_USER","RUN_AS_AUDITOR"}), e.g. @Controller public class TransactionLogController { @Secured({"ROLE_REGISTERED_USER","RUN_AS_AUDITOR"}) //Authorities needed for method access and authorities added by RunAsManager prefixed with RUN_AS_ @RequestMapping(value = "/transactions", method = RequestMethod.GET) //Spring MVC configuration. Not related to security @ResponseBody //Spring MVC configuration. Not related to security public List getTransactionLog(...) { ... //Invoke something in the backend requiring ROLE_AUDITOR { ... //User does not have ROLE_AUDITOR here } the resulting token created by RunAsManagerImpl with be granted ROLE_REGISTERED_USER and ROLE_AUDITOR. Thus, the user will also be allowed actions, normally reserved for ROLE_AUDITOR members, during the current invocation, permitting them, in this case, to access the transaction log.To enable runAsAuthenticationProvider, register it as usual: ... other authentication-providers used by the application ... This is all that is necessary to have the default implementation activated. Still, this setting will not work for methods secured by @PreAuthorize and @PostAuthorize annotations as their configuration attributes are differently evaluated (they are SpEL expressions and not a simple list or required authorities like with @Secured) and will not be recognized by RunAsManagerImpl. For this scenario to work, a custom RunAsManager implementation is required, as, at least at the time of writing, no applicable implementation is provided by Spring. A custom RunAsManager implementation for use with @PreAuthorize/@PostAuthorize A convenient implementation relying on a custom annotation is provided below: public class AnnotationDrivenRunAsManager extends RunAsManagerImpl { @Override public Authentication buildRunAs(Authentication authentication, Object object, Collection attributes) { if(!(object instanceof ReflectiveMethodInvocation) || ((ReflectiveMethodInvocation)object).getMethod().getAnnotation(RunAsRole.class) == null) { return super.buildRunAs(authentication, object, attributes); } String roleName = ((ReflectiveMethodInvocation)object).getMethod().getAnnotation(RunAsRole.class).value(); if (roleName == null || roleName.isEmpty()) { return null; } GrantedAuthority runAsAuthority = new SimpleGrantedAuthority(roleName); List newAuthorities = new ArrayList(); // Add existing authorities newAuthorities.addAll(authentication.getAuthorities()); // Add the new run-as authority newAuthorities.add(runAsAuthority); return new RunAsUserToken(getKey(), authentication.getPrincipal(), authentication.getCredentials(), newAuthorities, authentication.getClass()); } } This implementation will look for a custom @RunAsRole annotation on a protected method (e.g. @RunAsRole("ROLE_AUDITOR")) and, if found, will add the given authority (ROLE_AUDITOR in this case) to the list of granted authorities. RunAsRole itself is just a simple custom annotation: @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) public @interface RunAsRole { String value(); } This new implementation would be instantiated in the same way as before: And registered in a similar fashion: The expression-handler is always required for pre-post-annotations to work. It is a part of the standard Spring Security configuration, and not related to the topic described here. Both pre-post-annotations and secured-annotations can be enabled at the same time, but should never be used in the same class. The protected controller method from above could now look like this: @Controller public class TransactionLogController { @PreAuthorize("hasRole('ROLE_REGISTERED_USER')") //Authority needed to access the method @RunAsRole("ROLE_AUDITOR") //Authority added by RunAsManager @RequestMapping(value = "/transactions", method = RequestMethod.GET) //Spring MVC configuration. Not related to security @ResponseBody //Spring MVC configuration. Not related to security public List getTransactionLog(...) { ... //Invoke something in the backend requiring ROLE_AUDITOR { ... //User does not have ROLE_AUDITOR here }
July 7, 2014
by Bojan Tomić
· 23,049 Views · 1 Like
article thumbnail
SpringBoot: Introducing SpringBoot
SpringBoot...there is a lot of buzz about SpringBoot nowadays. So what is SpringBoot? SpringBoot is a new spring portfolio project which takes opinionated view of building production-ready Spring applications by drastically reducing the amount of configuration required. Spring Boot is taking the convention over configuration style to the next level by registering the default configurations automatically based on the classpath libraries available at runtime. Well.. you might have already read this kind of introduction to SpringBoot on many blogs. So let me elaborate on what SpringBoot is and how it helps developing Spring applications more quickly. Spring framework was created by Rod Johnson when many of the Java developers are struggling with EJB 1.x/2.x for building enterprise applications. Spring framework makes developing the business components easy by using Dependency Injection and Aspect Oriented Programming concepts. Spring became very popular and many more Spring modules like SpringSecurity, Spring Batch, Spring Data etc become part of Spring portfolio. As more and more features added to Spring, configuring all the spring modules and their dependencies become a tedious task. Adding to that Spring provides atleast 3 ways of doing anything :-). Some people see it as flexibility and some others see it as confusing. Slowly, configuring all the Spring modules to work together became a big challenge. Spring team came up with many approaches to reduce the amount of configuration needed by introducing Spring XML DSLs, Annotations and JavaConfig. In the very beginning I remember configuring a big pile of jar version declarations in section and lot of declarations. Then I learned creating maven archetypes with basic structure and minimum required configurations. This reduced lot of repetitive work, but not eliminated completely. Whether you write the configuration by hand or generate by some automated ways, if there is code that you can see then you have to maintain it. So whether you use XML or Annotations or JavaConfig, you still need to configure(copy-paste) the same infrastructure setup one more time. On the other hand, J2EE (which is dead long time ago) emerged as JavaEE and since JavaEE6 it became easy (compared to J2EE and JavaEE5) to develop enterprise applications using JavaEE platform. Also JavaEE7 released with all the cool CDI, WebSockets, Batch, JSON support etc things became even more simple and powerful as well. With JavaEE you don't need so much XML configuration and your war file size will be in KBs (really??? for non-helloworld/non-stageshow apps also :-)). Naturally this "convention over configuration" and "you no need to glue APIs together appServer already did it" arguments became the main selling points for JavaEE over Spring. Then Spring team addresses this problem with SpringBoot :-). Now its time to JavaEE to show whats the SpringBoot's counterpart in JavaEE land :-) JBoss Forge?? I love this Spring vs JavaEE thing which leads to the birth of powerful tools which ultimately simplify the developers life :-). Many times we need similar kind of infrastructure setup using same libraries. For example, take a web application where you map DispatcherServlet url-pattern to "/", implement RESTFul webservices using Jackson JSON library with Spring Data JPA backend. Similarly there could be batch or spring integration applications which needs similar infrastructure configuration. SpringBoot to the rescue. SpringBoot look at the jar files available to the runtime classpath and register the beans for you with sensible defaults which can be overridden with explicit settings. Also SpringBoot configure those beans only when the jars files available and you haven't define any such type of bean. Altogether SpringBoot provides common infrastructure without requiring any explicit configuration but lets the developer overrides if needed. To make things more simpler, SpringBoot team provides many starter projects which are pre-configured with commonly used dependencies. For example Spring Data JPA starter project comes with JPA 2.x with Hibernate implementation along with Spring Data JPA infrastructure setup. Spring Web starter comes with Spring WebMVC, Embedded Tomcat, Jackson JSON, Logback setup. Aaah..enough theory..lets jump onto coding. I am using latest STS-3.5.1 IDE which provides many more starter project options like Facebbok, Twitter, Solr etc than its earlier version. Create a SpringBoot starter project by going to File -> New -> Spring Starter Project -> select Web and Actuator and provide the other required details and Finish. This will create a Spring Starter Web project with the following pom.xml and Application.java 4.0.0 com.sivalabs hello-springboot 1.0-SNAPSHOT jar hello-springboot Spring Boot Hello World org.springframework.boot spring-boot-starter-parent 1.1.3.RELEASE org.springframework.boot spring-boot-starter-actuator org.springframework.boot spring-boot-starter-web org.springframework.boot spring-boot-starter-test test UTF-8 com.sivalabs.springboot.Application 1.7 org.springframework.boot spring-boot-maven-plugin package com.sivalabs.springboot; import org.springframework.boot.SpringApplication; import org.springframework.boot.autoconfigure.EnableAutoConfiguration; import org.springframework.context.annotation.ComponentScan; import org.springframework.context.annotation.Configuration; @Configuration @ComponentScan @EnableAutoConfiguration public class Application { public static void main(String[] args) { SpringApplication.run(Application.class, args); } } Go ahead and run this class as a standalone Java class. It will start the embedded Tomcat server on 8080 port. But we haven't added any endpoints to access, lets go ahead and add a simple REST endpoint. @Configuration @ComponentScan @EnableAutoConfiguration @Controller public class Application { public static void main(String[] args) { SpringApplication.run(Application.class, args); } @RequestMapping(value="/") @ResponseBody public String bootup() { return "SpringBoot is up and running"; } } Now point your browser to http://localhost:8080/ and you should see the response "SpringBoot is up and running". Remember while creating project we have added Actuator starter module also. With Actuator you can obtain many interesting facts about your application. Try accessing the following URLs and you can see lot of runtime environment configurations that are provided by SpringBoot. http://localhost:8080/beans http://localhost:8080/metrics http://localhost:8080/trace http://localhost:8080/env http://localhost:8080/mappings http://localhost:8080/autoconfig http://localhost:8080/dump SpringBoot actuator deserves a dedicated blog post to cover its vast number of features, I will cover it in my upcoming posts. I hope this article provides some basic introduction to SpringBoot and how it simplifies the Spring application development. More on SpringBoot in upcoming articles. - See more at: http://www.sivalabs.in/2014/07/springboot-introducing-springboot.html#sthash.7syCIt8V.dpuf
July 4, 2014
by Siva Prasad Reddy Katamreddy
· 12,389 Views · 5 Likes
article thumbnail
What is Scalable Machine Learning?
scalability has become one of those core concept slash buzzwords of big data. it’s all about scaling out, web scale, and so on. in principle, the idea is to be able to take one piece of code and then throw any number of computers at it to make it fast. the terms “scalable” and “large scale” have been used in machine learning circles long before there was big data. there had always been certain problems which lead to a large amount of data, for example in bioinformatics, or when dealing with large number of text documents. so finding learning algorithms, or more generally data analysis algorithms which can deal with a very large set of data was always a relevant question. interestingly, this issue of scalability were seldom solved using actual scaling in in machine learning, at least not in the big data kind of sense. part of the reason is certainly that multicore processors didn’t yet exist at the scale they do today and that the idea of “just scaling out” wasn’t as pervasive as it is today. instead, “scalable” machine learning is almost always based on finding more efficient algorithms, and most often, approximations to the original algorithm which can be computed much more efficiently. to illustrate this, let’s search for nips papers (the annual advances in neural information processing systems, short nips, conference is one of the big ml community meetings) for papers which have the term “scalable” in the title. here are some examples: scalable inference for logistic-normal topic models … this paper presents a partially collapsed gibbs sampling algorithm that approaches the provably correct distribution by exploring the ideas of data augmentation … partially collapsed gibbs sampling is a kind of estimation algorithm for certain graphical models. a scalable approach to probabilistic latent space inference of large-scale networks … with […] an efficient stochastic variational inference algorithm, we are able to analyze real networks with over a million vertices […] on a single machine in a matter of hours … stochastic variational inference algorithm is both an approximation and an estimation algorithm. scalable kernels for graphs with continuous attributes … in this paper, we present a class of path kernels with computational complexity $o(n^2(m + \delta^2 ))$ … and this algorithm has squared runtime in the number of data points, so wouldn’t even scale out well even if you could. usually, even if there is potential for scalability, it usually something that is “embarassingly parallel” (yep, that’s a technical term), meaning that it’s something like a summation which can be parallelized very easily. still, the actual “scalability” comes from the algorithmic side. so how do scalable ml algorithms look like? a typical example are the stochastic gradient descent (sgd) class of algorithms. these algorithms can be used, for example, to train classifiers like linear svms or logistic regression. one data point is considered at each iteration. the prediction error on that point is computed and then the gradient is taken with respect to the model parameters, giving information about how to adapt these parameters slightly to make the error smaller. vowpal wabbit is one program based on this approach and it has a nice definition of what it considers to mean scalable in machine learning: there are two ways to have a fast learning algorithm: (a) start with a slow algorithm and speed it up, or (b) build an intrinsically fast learning algorithm. this project is about approach (b), and it’s reached a state where it may be useful to others as a platform for research and experimentation. so “scalable” means having a learning algorithm which can deal with any amount of data, without consuming ever growing amounts of resources like memory. for sgd type algorithms this is the case, because all you need to store are the model parameters, usually a few ten to hundred thousand double precision floating point value, so maybe a few megabytes in total. the main problem to speed this kind of computation up is how to stream the data by fast enough. to put it differently, not only does this kind of scalability not rely on scaling out, it’s actually not even necessary or possible to scale the computation out because the main state of the computation easily fits into main memory and computations on it cannot be distributed easily. i know that gradient descent is often taken as an example for map reduce and other approaches like in this paper on the architecture of spark , but that paper discusses a version of gradient descent where you are not taking one point at a time, but aggregate the gradient information for the whole data set before making the update to the model parameters. while this can be easily parallelized, it does not perform well in practice because the gradient information tends to average out when computed over the whole data set. if you want to know more, this large scale learning challenge sören sonnneburg organized in 2008 still has valuable information on how to deal with massive data sets. of course, there are things which can be easily scaled well using hadoop or spark, in particular any kind of data preprocessing or feature extraction where you need to apply the same operation to each data point in your data set. another area where parallelization is easy and useful is when you are using cross validation to do model selection where you usually have to train a large number of models for different parameter sets to find the combination which performs best. again, even here there is more potential for even speeding up such computations using better algorithms like in this paper of mine . i’ve just scratched the surface of this, but i hope you got the idea that scalability can mean quite different things. in big data (meaning the infrastructure side of it) what you want to compute is pretty well defined, for example some kind of aggregate over your data set, so you’re left with the question of how to parallelize that computation well. in machine learning, you have much more freedom because data is noisy and there’s always some freedom in how you model your data, so you can often get away with computing some variation of what you originally wanted to do and still perform well. often, this allows you to speed up your computations significantly by decoupling computations. parallelization is important, too, but alone it won’t get you very far. luckily, there are projects like spark and stratosphere/flink which work on providing more useful abstractions beyond map and reduce to make the last part easier for data scientists, but you won’t get rid of the algorithmic design part any time soon.
July 3, 2014
by Mikio Braun
· 18,507 Views · 1 Like
article thumbnail
Spring Integration Java DSL sample - Further Simplification With JMS Namespace Factories
In an earlier blog entry I had touched on a fictitious rube goldberg flow for capitalizing a string through a complicated series of steps, the premise of the article was to introduce Spring Integration Java DSL as an alternative to defining integration flows through xml configuration files. I learned a few new things after writing that blog entry, thanks to Artem Bilan and wanted to document those learnings here: So, first my original sample, here I have the following flow(the one's in bold): Take in a message of this type - "hello from spring integ" Split it up into individual words(hello, from, spring, integ) Send each word to a ActiveMQ queue Pick up the word fragments from the queue and capitalize each word Place the response back into a response queue Pick up the message, re-sequence based on the original sequence of the words Aggregate back into a sentence("HELLO FROM SPRING INTEG") and Return the sentence back to the calling application. EchoFlowOutbound.java: @Bean public DirectChannel sequenceChannel() { return new DirectChannel(); } @Bean public DirectChannel requestChannel() { return new DirectChannel(); } @Bean public IntegrationFlow toOutboundQueueFlow() { return IntegrationFlows.from(requestChannel()) .split(s -> s.applySequence(true).get().getT2().setDelimiters("\\s")) .handle(jmsOutboundGateway()) .get(); } @Bean public IntegrationFlow flowOnReturnOfMessage() { return IntegrationFlows.from(sequenceChannel()) .resequence() .aggregate(aggregate -> aggregate.outputProcessor(g -> Joiner.on(" ").join(g.getMessages() .stream() .map(m -> (String) m.getPayload()).collect(toList()))) , null) .get(); } @Bean public JmsOutboundGateway jmsOutboundGateway() { JmsOutboundGateway jmsOutboundGateway = new JmsOutboundGateway(); jmsOutboundGateway.setConnectionFactory(this.connectionFactory); jmsOutboundGateway.setRequestDestinationName("amq.outbound"); jmsOutboundGateway.setReplyChannel(sequenceChannel()); return jmsOutboundGateway; } It turns out, based on Artem Bilan's feedback, that a few things can be optimized here. First notice how I have explicitly defined two direct channels, "requestChannel" for starting the flow that takes in the string message and the "sequenceChannel" to handle the message once it returns back from the jms message queue, these can actually be totally removed and the flow made a little more concise this way: @Bean public IntegrationFlow toOutboundQueueFlow() { return IntegrationFlows.from("requestChannel") .split(s -> s.applySequence(true).get().getT2().setDelimiters("\\s")) .handle(jmsOutboundGateway()) .resequence() .aggregate(aggregate -> aggregate.outputProcessor(g -> Joiner.on(" ").join(g.getMessages() .stream() .map(m -> (String) m.getPayload()).collect(toList()))) , null) .get(); } @Bean public JmsOutboundGateway jmsOutboundGateway() { JmsOutboundGateway jmsOutboundGateway = new JmsOutboundGateway(); jmsOutboundGateway.setConnectionFactory(this.connectionFactory); jmsOutboundGateway.setRequestDestinationName("amq.outbound"); return jmsOutboundGateway; } "requestChannel" is now being implicitly created just by declaring a name for it. The sequence channel is more interesting, quoting Artem Bilan - do not specify outputChannel for AbstractReplyProducingMessageHandler and rely on DSL , what it means is that here jmsOutboundGateway is a AbstractReplyProducingMessageHandler and its reply channel is implicitly derived by the DSL. Further, two methods which were earlier handling the flows for sending out the message to the queue and then continuing once the message is back, is collapsed into one. And IMHO it does read a little better because of this change. The second good change and the topic of this article is the introduction of the Jms namespace factories, when I had written the previous blog article, DSL had support for defining the AMQ inbound/outbound adapter/gateway, now there is support for Jms based inbound/adapter adapter/gateways also, this simplifies the flow even further, the flow now looks like this: @Bean public IntegrationFlow toOutboundQueueFlow() { return IntegrationFlows.from("requestChannel") .split(s -> s.applySequence(true).get().getT2().setDelimiters("\\s")) .handle(Jms.outboundGateway(connectionFactory) .requestDestination("amq.outbound")) .resequence() .aggregate(aggregate -> aggregate.outputProcessor(g -> Joiner.on(" ").join(g.getMessages() .stream() .map(m -> (String) m.getPayload()).collect(toList()))) , null) .get(); } The inbound Jms part of the flow also simplifies to the following: @Bean public IntegrationFlow inboundFlow() { return IntegrationFlows.from(Jms.inboundGateway(connectionFactory) .destination("amq.outbound")) .transform((String s) -> s.toUpperCase()) .get(); } Thus, to conclude, Spring Integration Java DSL is an exciting new way to concisely configure Spring Integration flows. It is already very impressive in how it simplifies the readability of flows, the introduction of the Jms namespace factories takes it even further for JMS based flows.
July 2, 2014
by Biju Kunjummen
· 17,807 Views
article thumbnail
Making Operations on Volatile Fields Atomic
The expected behaviour for volatile fields is that they should behave in a multi-threaded application the same as they do in a single threaded application. They are not forbidden to behave the same way, but they are not guaranteed to behave the same way. The solution in Java 5.0+ is to use AtomicXxxx classes however these are relatively inefficient in terms of memory (they add a header and padding), performance (they add a references and little control over their relative positions), and syntactically they are not as clear to use. IMHO A simple solution if for volatile fields to act as they might be expected to do, the way JVM must support in AtomicFields which is not forbidden in the current JMM (Java- Memory Model) but not guaranteed. Why make fields volatile? The benefit of volatile fields is that they are visible across threads and some optimisations which avoid re-reading them are disabled so you always check again the current value even if you didn't change them. e.g. without volatile Thread 2: int a = 5; Thread 1: a = 6; (later) Thread 2: System.out.println(a); // prints 5 With volatile Thread 2: volatile int a = 5; Thread 1: a = 6; (later) Thread 2: System.out.println(a); // prints 6 Why not use volatile all the time? Volatile read and write access is substantially slower. When you write to a volatile field it stalls the entire CPU pipeline to ensure the data has been written to cache. Without this, there is a risk the next read of the value sees an old value, even in the same thread (See AtomicLong.lazySet() which avoids stalling the pipeline) The penalty can be in the order of 10x slower which you don't want to be doing on every access. What are the limitations of volatile? A significant limitation is that operations on the field is not atomic, even when you might think it is. Even worse than that is that usually, there is no difference. I.e. it can appear to work for a long time even years and suddenly/randomly break due to an incidental change such as the version of Java used, or even where the object is loaded into memory. e.g. which programs you loaded before running the program. e.g. updating a value Thread 2: volatile int a = 5; Thread 1: a += 1; Thread 2: a += 2; (later) Thread 2: System.out.println(a); // prints 6, 7 or 8 This is an issue because the read of a and the write of a are done separately and you can get a race condition. 99%+ of the time it will behave as expect, but sometimes it won't/ What can you do about it? You need to use AtomicXxxx classes. These wrap volatile fields with operations which behave as expected. Thread 2: AtomicInteger a = new AtomicInteger(5); Thread 1: a.incrementAndGet(); Thread 2: a.addAndGet(2); (later) Thread 2: System.out.println(a); // prints 8 What do I propose? The JVM has a means to behave as expected, the only surprising thing is you need to use a special class to do what the JMM won't guarantee for you. What I propose is that the JMM be changed to support the behaviour currently provided by the concurrency AtomicClasses. In each case the single threaded behaviour is unchanged. A multi-threaded program which does not see a race condition will behave the same. The difference is that a multi-threaded program does not have to see a race condition but changing the underlying behaviour current method suggested syntax notes x.getAndIncrement() x++ or x += 1 x.incrementAndGet() ++x x.getAndDecrment() x-- or x -= 1 x.decrementAndGet() --x x.addAndGet(y) (x += y) x.getAndAdd(y) ((x += y)-y) x.compareAndSet(e, y) (x == e ? x = y, true : false) Need to add the comma syntax used in other languages. These operations could be supported for all the primitive types such as boolean, byte, short, int, long, float and double. Additional assignment operators could be supported such as current method suggested syntax notes Atomic multiplication x *= 2; Atomic subtraction x -= y; Atomic division x /= y; Atomic modulus x %= y; Atomic shift x <<= y; Atomic shift x >>= z; Atomic shift x >>>= w; Atomic and x &= ~y; clears bits Atomic or x |= z; sets bits Atomic xor x ^= w; flips bits What is the risk? This could break code which relies on these operations occasionally failing due to race conditions. It might not be possible to support more complex expressions in a thread safe manner. This could lead to surprising bugs as the code can look like the works, but it doesn't. Never the less it will be no worse than the current state. JEP 193 - Enhanced Volatiles There is a JEP 193 to add this functionality to Java. An example is; class Usage { volatile int count; int incrementCount() { return count.volatile.incrementAndGet(); } } IMHO there is a few limitations in this approach. The syntax is fairly significant change. Changing the JMM might not require many changes the the Java syntax and possibly no changes to the compiler. It is a less general solution. It can be useful to support operations like volume += quantity; where these are double types. It places more burden on the developer to understand why he/she should use this instead of x++; Conclusion Much of the syntactic and performance overhead of using AtomicInteger and AtomicLong could be removed if the JMM guaranteed the equivalent single threaded operations behaved as expected for multi-threaded code. This feature could be added to earlier versions of Java by using byte code instrumentation.
June 27, 2014
by Peter Lawrey
· 11,897 Views
article thumbnail
Option.fold() Considered Unreadable
We had a lengthy discussion recently during code review whether scala.Option.fold() is idiomatic and clever or maybe unreadable and tricky? Let's first describe what the problem is. Option.fold does two things: maps a function f overOption's value (if any) or returns an alternative alt if it's absent. Using simple pattern matching we can implement it as follows: val option: Option[T] = //... def alt: R = //... def f(in: T): R = //... val x: R = option match { case Some(v) => f(v) case None => alt } If you prefer one-liner, fold is actually a combination of map and getOrElse val x: R = option map f getOrElse alt Or, if you are a C programmer that still wants to write in C, but using Scala compiler: val x: R = if (option.isDefined) f(option.get) else alt Interestingly this is similar to how fold() is actually implemented, but that's an implementation detail. OK, all of the above can be replaced with single Option.fold(): val x: R = option.fold(alt)(f) Technically you can even use /: and \: operators (alt /: option) - but that would be simply masochistic. I have three problems with option.fold() idiom. First of all - it's anything but readable. We are folding (reducing) over Option - which doesn't really make much sense. Secondly it reverses the ordinary positive-then-negative-case flow by starting with failure (absence, alt) condition followed by presence block (f function; see also: Refactoring map-getOrElse to fold). Interestingly this method would work great for me if it was named mapOrElse: ** * Hypothetical in Option */ def mapOrElse[B](f: A => B, alt: => B): B = this map f getOrElse alt Actually there is already such method in Scalaz, called OptionW.cata. cata. Here is whatMartin Odersky has to say about it: "I personally find methods like cata that take two closures as arguments are often overdoing it. Do you really gain in readability over map + getOrElse? Think of a newcomer to your code[...]"While cata has some theoretical background, Option.fold just sounds like a random name collision that doesn't bring anything to the table, apart from confusion. I know what you'll say, that TraversableOnce has fold and we are sort-of doing the same thing. Why it's a random collision rather than extending the contract described inTraversableOnce? fold() method in Scala collections typically just delegates to one offoldLeft()/foldRight() (the one that works better for given data structure), thus it doesn't guarantee order and folding function has to be associative. But inOption.fold() the contract is different: folding function takes just one parameter rather than two. If you read my previous article about folds you know that reducing function always takes two parameters: current element and accumulated value (initial value during first iteration). But Option.fold() takes just one parameter: current Option value! This breaks the consistency, especially when realizing Option.foldLeft() andOption.foldRight() have correct contract (but it doesn't mean they are more readable). The only way to understand folding over option is to imagine Option as a sequence with0 or 1 elements. Then it sort of makes sense, right? No. def double(x: Int) = x * 2 Some(21).fold(-1)(double) //OK: 42 None.fold(-1)(double) //OK: -1 but: Some(21).toList.fold(-1)(double) : error: type mismatch; found : Int => Int required: (Int, Int) => Int Some(21).toList.fold(-1)(double) ^ If we treat Option[T] as a List[T], awkward Option.fold() breaks because it has different type than TraversableOnce.fold(). This is my biggest concern. I can't understand why folding wasn't defined in terms of the type system (trait?) and implemented strictly. As an example take a look at: Data.Foldable in Haskell (advanced) Data.Foldable typeclass describes various flavours of folding in Haskell. There are familiar foldl/foldr/foldl1/foldr1, in Scala namedfoldLeft/foldRight/reduceLeft/reduceRight accordingly. They have the same type as Scala and behave unsurprisingly with all types that you can fold over, including Maybe, lists, arrays, etc. There is also a function named fold, but it has a completely different meaning: class Foldable t where fold :: Monoid m => t m -> m While other folds are quite complex, this one barely takes a foldable container of ms (which have to be Monoids) and returns the same Monoid type. A quick recap: a type can be aMonoid if there exists a neutral value of that type and an operation that takes two values and produces just one. Applying that function with one of the arguments being neutral value yields the other argument. String ([Char]) is a good example with empty string being neutral value (mempty) and string concatenation being such operation (mappend). Notice that there are two different ways you can construct monoids for numbers: under addition with neutral value being 0 (x + 0 == 0 + x == x for any x) and under multiplication with neutral 1 (x * 1 == 1 * x == x for any x). Let's stick to strings. If I fold empty list of strings, I'll get an empty string. But when a list contains many elements, they are being concatenated: > fold ([] :: [String]) "" > fold [] :: String "" > fold ["foo", "bar"] "foobar" In the first example we have to explicitly say what is the type of empty list []. Otherwise Haskell compiler can't figure out what is the type of elements in a list, thus which monoid instance to choose. In second example we declare that whatever is returned from fold [], it should be a String. From that the compiler infers that [] actually must have a type of [String]. Last fold is the simplest: the program folds over elements in list and concatenates them because concatenation is the operation defined in Monoid Stringtypeclass instance. Back to options (or more precisely Maybe). Folding over Maybe monad having type parameter being Monoid (I can't believe I just said it) has an interesting interpretation: it either returns value inside Maybe or a default Monoid value: > fold (Just "abc") "abc" > fold Nothing :: String "" Just "abc" is same as Some("abc") in Scala. You can see here that if Maybe Stringis Nothing, neutral String monoid value is returned, that is an empty string. Summary Haskell shows that folding (also over Maybe) can be at least consistent. In ScalaOption.fold is unrelated to List.fold, confusing and unreadable. I advise avoiding it and staying with slightly more verbose map/getOrElse transformations or pattern matching. PS: Did I mention there is also Either.fold() (with even different contract) but noTry.fold()?
June 26, 2014
by Tomasz Nurkiewicz
· 9,581 Views
article thumbnail
Eclipse Community Survey 2014 Results
We have published the results of the Eclipse Community Survey 2014. Thank you to everyone who participated in the survey this year. The complete results and data are available for anyone to download [xls] [ods]. As in other years, I think the results provide an interesting perspective on what tools software developers are using and the type of applications they are building. Here are some key highlights from the results this year: 1) Git #1 Code Management Tool. Git has finally surpassed Subversion to be the top code management tool used by software developers. A third of developers (33.3%) report they use Git as their primary code management tool compared to 30.7% using Subversion. Subversion continues to show a downward trend from previous years when it was used by more than half the developers. Of note, 9.6% claim GitHub is their primary code management tool so the prevalence of overall Git usage is becoming dominate. 2) Maven and Jenkins Key Tools. For Build and Release tools, Maven and Jenkins continue to be key tools used by developers. Of interest is the growth of Gradle from 2013 (4.5%) to 2014 (11%). 3) Top 3 Application Servers. Tomcat (32.6%), JBoss (11.8%) and Jetty (7.2%) continue to be the top 3 application servers. 4) Java 8 Adoption. Java 8 was released in March 2014 and already 9.2% of Java developers have migrated to Java 8 as their primary version of Java. 59.2% are using Java 7 but close to a quarter are using Java 6 or before. 5) Majority of Developers Use JavaScript. More and more software developers use multiple languages to develop software. Due to the Eclipse biased of the survey, Java is not surprisingly the top primary language. However, when asked what other languages developers might use, JavaScript stands out to be a popular language with a the majority of developers (56.2%) claiming it as a secondary language. 6) Developers Experimenting With Open Hardware. The Internet of Things (IoT) and Open Hardware have become important industry trends in the last couple of years. Over a third (35.7%) of software developers are spending their own personal time learning about devices like the BeagleBone, Arduino and Raspberry Pi. Thanks again to everyone who participated in the survey. I hope everyone finds the results of interest.
June 25, 2014
by Ian Skerrett
· 14,190 Views
  • Previous
  • ...
  • 527
  • 528
  • 529
  • 530
  • 531
  • 532
  • 533
  • 534
  • 535
  • 536
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×