DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Data Topics

article thumbnail
R/dplyr: Extracting Data Frame Column Value for Filtering With %in%
I’ve been playing around with dplyr over the weekend and wanted to extract the values from a data frame column to use in a later filtering step. I had a data frame: library(dplyr) df = data.frame(userId = c(1,2,3,4,5), score = c(2,3,4,5,5)) And wanted to extract the userIds of those people who have a score greater than 3. I started with: highScoringPeople = df %>% filter(score > 3) %>% select(userId) > highScoringPeople userId 1 3 2 4 3 5 And then filtered the data frame expecting to get back those 3 people: > df %>% filter(userId %in% highScoringPeople) [1] userId score <0 rows> (or 0-length row.names) No rows! I created vector with the numbers 3-5 to make sure that worked: > df %>% filter(userId %in% c(3,4,5)) userId score 1 3 4 2 4 5 3 5 5 That works as expected so highScoringPeople obviously isn’t in the right format to facilitate an ‘in lookup’. Let’s explore: > str(c(3,4,5)) num [1:3] 3 4 5 > str(highScoringPeople) 'data.frame': 3 obs. of 1 variable: $ userId: num 3 4 5 Now it’s even more obvious why it doesn’t work – highScoringPeople is still a data frame when we need it to be a vector/list. One way to fix this is to extract the userIds using the $ syntax instead of the select function: highScoringPeople = (df %>% filter(score > 3))$userId > str(highScoringPeople) num [1:3] 3 4 5 > df %>% filter(userId %in% highScoringPeople) userId score 1 3 4 2 4 5 3 5 5 Or if we want to do the column selection using dplyr we can extract the values for the column like this: highScoringPeople = (df %>% filter(score > 3) %>% select(userId))[[1]] > str(highScoringPeople) num [1:3] 3 4 5 Not so difficult after all.
March 12, 2015
by Mark Needham
· 15,161 Views
article thumbnail
Streaming Big Data: Storm, Spark and Samza
There are a number of distributed computation systems that can process Big Data in real time or near-real time. This article will start with a short description of three Apache frameworks, and attempt to provide a quick, high-level overview of some of their similarities and differences. Apache Storm In Storm, you design a graph of real-time computation called a topology, and feed it to the cluster where the master node will distribute the code among worker nodes to execute it. In a topology, data is passed around between spouts that emit data streams as immutable sets of key-value pairs called tuples, and bolts that transform those streams (count, filter etc.). Bolts themselves can optionally emit data to other bolts down the processing pipeline. Apache Spark Spark Streaming (an extension of the core Spark API) doesn’t process streams one at a time like Storm. Instead, it slices them in small batches of time intervals before processing them. The Spark abstraction for a continuous stream of data is called a DStream (for Discretized Stream). A DStream is a micro-batch of RDDs (Resilient Distributed Datasets). RDDs are distributed collections that can be operated in parallel by arbitrary functions and by transformations over a sliding window of data (windowed computations). Apache Samza Samza ’s approach to streaming is to process messages as they are received, one at a time. Samza’s stream primitive is not a tuple or a Dstream, but a message. Streams are divided into partitions and each partition is an ordered sequence of read-only messages with each message having a unique ID (offset). The system also supports batching, i.e. consuming several messages from the same stream partition in sequence. Samza`s Execution & Streaming modules are both pluggable, although Samza typically relies on Hadoop’s YARN (Yet Another Resource Negotiator) and Apache Kafka. Common Ground All three real-time computation systems are open-source, low-latency, distributed, scalable and fault-tolerant. They all allow you to run your stream processing code through parallel tasks distributed across a cluster of computing machines with fail-over capabilities. They also provide simple APIs to abstract the complexity of the underlying implementations. The three frameworks use different vocabularies for similar concepts: Comparison Matrix A few of the differences are summarized in the table below: There are three general categories of delivery patterns: At-most-once: messages may be lost. This is usually the least desirable outcome. At-least-once: messages may be redelivered (no loss, but duplicates). This is good enough for many use cases. Exactly-once: each message is delivered once and only once (no loss, no duplicates). This is a desirable feature although difficult to guarantee in all cases. Another aspect is state management. There are different strategies to store state. Spark Streaming writes data into the distributed file system (e.g. HDFS). Samza uses an embedded key-value store. With Storm, you’ll have to either roll your own state management at your application layer, or use a higher-level abstraction called Trident. Use Cases All three frameworks are particularly well-suited to efficiently process continuous, massive amounts of real-time data. So which one to use? There are no hard rules, at most a few general guidelines. If you want a high-speed event processing system that allows for incremental computations, Storm would be fine for that. If you further need to run distributed computations on demand, while the client is waiting synchronously for the results, you’ll have Distributed RPC (DRPC) out-of-the-box. Last but not least, because Storm uses Apache Thrift, you can write topologies in any programming language. If you need state persistence and/or exactly-once delivery though, you should look at the higher-level Trident API, which also offers micro-batching. A few companies using Storm: Twitter, Yahoo!, Spotify, The Weather Channel... Speaking of micro-batching, if you must have stateful computations, exactly-once delivery and don’t mind a higher latency, you could consider Spark Streaming…specially if you also plan for graph operations, machine learning or SQL access. The Apache Spark stack lets you combine several libraries with streaming (Spark SQL, MLlib, GraphX) and provides a convenient unifying programming model. In particular, streaming algorithms (e.g. streaming k-means) allow Spark to facilitate decisions in real-time. A few companies using Spark: Amazon, Yahoo!, NASA JPL, eBay Inc., Baidu… If you have a large amount of state to work with (e.g. many gigabytes per partition), Samza co-locates storage and processing on the same machines, allowing to work efficiently with state that won’t fit in memory. The framework also offers flexibility with its pluggable API: its default execution, messaging and storage engines can each be replaced with your choice of alternatives. Moreover, if you have a number of data processing stages from different teams with different codebases, Samza ‘s fine-grained jobs would be particularly well-suited, since they can be added/removed with minimal ripple effects. A few companies using Samza: LinkedIn, Intuit, Metamarkets, Quantiply, Fortscale… Conclusion We only scratched the surface of The Three Apaches. We didn’t cover a number of other features and more subtle differences between these frameworks. Also, it’s important to keep in mind the limits of the above comparisons, as these systems are constantly evolving.
February 28, 2015
by Tony Siciliani
· 32,717 Views · 5 Likes
article thumbnail
A JAXB Nuance: String Versus Enum from Enumerated Restricted XSD String
Although Java Architecture for XML Binding (JAXB) is fairly easy to use in nominal cases (especially since Java SE 6), it also presents numerous nuances. Some of the common nuances are due to the inability to exactlymatch (bind) XML Schema Definition (XSD) types to Java types. This post looks at one specific example of this that also demonstrates how different XSD constructs that enforce the same XML structure can lead to different Java types when the JAXB compiler generates the Java classes. The next code listing, for Food.xsd, defines a schema for food types. The XSD mandates that valid XML will have a root element called "Food" with three nested elements "Vegetable", "Fruit", and "Dessert". Although the approach used to specify the "Vegetable" and "Dessert" elements is different than the approach used to specify the "Fruit" element, both approaches result in similar "valid XML." The "Vegetable" and "Dessert" elements are declared directly as elements of the prescribed simpleTypes defined later in the XSD. The "Fruit" element is defined via reference (ref=) to another defined element that consists of a simpleType. Food.xsd Although Vegetable and Dessert elements are defined in the schema differently than Fruit, the resulting valid XML is the same. A valid XML file is shown next in the code listing for food1.xml. food1.xml Spinach Watermelon Pie At this point, I'll use a simple Groovy script to validate the above XML against the above XSD. The code for this Groovy XML validation script (validateXmlAgainstXsd.groovy) is shown next. validateXmlAgainstXsd.groovy #!/usr/bin/env groovy // validateXmlAgainstXsd.groovy // // Accepts paths/names of two files. The first is the XML file to be validated // and the second is the XSD against which to validate that XML. if (args.length < 2) { println "USAGE: groovy validateXmlAgainstXsd.groovy " System.exit(-1) } String xml = args[0] String xsd = args[1] import javax.xml.validation.Schema import javax.xml.validation.SchemaFactory import javax.xml.validation.Validator try { SchemaFactory schemaFactory = SchemaFactory.newInstance(javax.xml.XMLConstants.W3C_XML_SCHEMA_NS_URI) Schema schema = schemaFactory.newSchema(new File(xsd)) Validator validator = schema.newValidator() validator.validate(new javax.xml.transform.stream.StreamSource(xml)) } catch (Exception exception) { println "\nERROR: Unable to validate ${xml} against ${xsd} due to '${exception}'\n" System.exit(-1) } println "\nXML file ${xml} validated successfully against ${xsd}.\n" The next screen snapshot demonstrates running the above Groovy XML validation script against food1.xmland Food.xsd. The objective of this post so far has been to show how different approaches in an XSD can lead to the same XML being valid. Although these different XSD approaches prescribe the same valid XML, they lead to different Java class behavior when JAXB is used to generate classes based on the XSD. The next screen snapshot demonstrates running the JDK-provided JAXB xjc compiler against the Food.xsd to generate the Java classes. The output from the JAXB generation shown above indicates that Java classes were created for the "Vegetable" and "Dessert" elements but not for the "Fruit" element. This is because "Vegetable" and "Dessert" were defined differently than "Fruit" in the XSD. The next code listing is for the Food.java class generated by the xjc compiler. From this we can see that the generated Food.java class references specific generated Java types for Vegetable and Dessert, but references simply a generic Java String for Fruit. Food.java (generated by JAXB jxc compiler) // // This file was generated by the JavaTM Architecture for XML Binding(JAXB) Reference Implementation, v2.2.8-b130911.1802 // See http://java.sun.com/xml/jaxb // Any modifications to this file will be lost upon recompilation of the source schema. // Generated on: 2015.02.11 at 10:17:32 PM MST // package com.blogspot.marxsoftware.foodxml; import javax.xml.bind.annotation.XmlAccessType; import javax.xml.bind.annotation.XmlAccessorType; import javax.xml.bind.annotation.XmlElement; import javax.xml.bind.annotation.XmlRootElement; import javax.xml.bind.annotation.XmlSchemaType; import javax.xml.bind.annotation.XmlType; /** * Java class for anonymous complex type. * * The following schema fragment specifies the expected content contained within this class. * * * * * * * * * * * * * * * * */ @XmlAccessorType(XmlAccessType.FIELD) @XmlType(name = "", propOrder = { "vegetable", "fruit", "dessert" }) @XmlRootElement(name = "Food") public class Food { @XmlElement(name = "Vegetable", required = true) @XmlSchemaType(name = "string") protected Vegetable vegetable; @XmlElement(name = "Fruit", required = true) protected String fruit; @XmlElement(name = "Dessert", required = true) @XmlSchemaType(name = "string") protected Dessert dessert; /** * Gets the value of the vegetable property. * * @return * possible object is * {@link Vegetable } * */ public Vegetable getVegetable() { return vegetable; } /** * Sets the value of the vegetable property. * * @param value * allowed object is * {@link Vegetable } * */ public void setVegetable(Vegetable value) { this.vegetable = value; } /** * Gets the value of the fruit property. * * @return * possible object is * {@link String } * */ public String getFruit() { return fruit; } /** * Sets the value of the fruit property. * * @param value * allowed object is * {@link String } * */ public void setFruit(String value) { this.fruit = value; } /** * Gets the value of the dessert property. * * @return * possible object is * {@link Dessert } * */ public Dessert getDessert() { return dessert; } /** * Sets the value of the dessert property. * * @param value * allowed object is * {@link Dessert } * */ public void setDessert(Dessert value) { this.dessert = value; } } The advantage of having specific Vegetable and Dessert classes is the additional type safety they bring as compared to a general Java String. Both Vegetable.java and Dessert.java are actually enums because they come from enumerated values in the XSD. The two generated enums are shown in the next two code listings. Vegetable.java (generated with JAXB xjc compiler) // // This file was generated by the JavaTM Architecture for XML Binding(JAXB) Reference Implementation, v2.2.8-b130911.1802 // See http://java.sun.com/xml/jaxb // Any modifications to this file will be lost upon recompilation of the source schema. // Generated on: 2015.02.11 at 10:17:32 PM MST // package com.blogspot.marxsoftware.foodxml; import javax.xml.bind.annotation.XmlEnum; import javax.xml.bind.annotation.XmlEnumValue; import javax.xml.bind.annotation.XmlType; /** * Java class for Vegetable. * * The following schema fragment specifies the expected content contained within this class. * * * * * * * * * * * * */ @XmlType(name = "Vegetable") @XmlEnum public enum Vegetable { @XmlEnumValue("Carrot") CARROT("Carrot"), @XmlEnumValue("Squash") SQUASH("Squash"), @XmlEnumValue("Spinach") SPINACH("Spinach"), @XmlEnumValue("Celery") CELERY("Celery"); private final String value; Vegetable(String v) { value = v; } public String value() { return value; } public static Vegetable fromValue(String v) { for (Vegetable c: Vegetable.values()) { if (c.value.equals(v)) { return c; } } throw new IllegalArgumentException(v); } } Dessert.java (generated with JAXB xjc compiler) // // This file was generated by the JavaTM Architecture for XML Binding(JAXB) Reference Implementation, v2.2.8-b130911.1802 // See http://java.sun.com/xml/jaxb // Any modifications to this file will be lost upon recompilation of the source schema. // Generated on: 2015.02.11 at 10:17:32 PM MST // package com.blogspot.marxsoftware.foodxml; import javax.xml.bind.annotation.XmlEnum; import javax.xml.bind.annotation.XmlEnumValue; import javax.xml.bind.annotation.XmlType; /** * Java class for Dessert. * * The following schema fragment specifies the expected content contained within this class. * * * * * * * * * * * */ @XmlType(name = "Dessert") @XmlEnum public enum Dessert { @XmlEnumValue("Pie") PIE("Pie"), @XmlEnumValue("Cake") CAKE("Cake"), @XmlEnumValue("Ice Cream") ICE_CREAM("Ice Cream"); private final String value; Dessert(String v) { value = v; } public String value() { return value; } public static Dessert fromValue(String v) { for (Dessert c: Dessert.values()) { if (c.value.equals(v)) { return c; } } throw new IllegalArgumentException(v); } } Having enums generated for the XML elements ensures that only valid values for those elements can be represented in Java. Conclusion JAXB makes it relatively easy to map Java to XML, but because there is not a one-to-one mapping between Java and XML types, there can be some cases where the generated Java type for a particular XSD prescribed element is not obvious. This post has shown how two different approaches to building an XSD to enforce the same basic XML structure can lead to very different results in the Java classes generated with the JAXB xjccompiler. In the example shown in this post, declaring elements in the XSD directly on simpleTypes restricting XSD's string to a specific set of enumerated values is preferable to declaring elements as references to other elements wrapping a simpleType of restricted string enumerated values because of the type safety that is achieved when enums are generated rather than use of general Java Strings.
February 25, 2015
by Dustin Marx
· 21,439 Views · 1 Like
article thumbnail
Sneak Peek into the JCache API (JSR 107)
This post covers the JCache API at a high level and provides a teaser – just enough for you to (hopefully) start itching about it ;-) In this post …. JCache overview JCache API, implementations Supported (Java) platforms for JCache API Quick look at Oracle Coherence Fun stuff – Project Headlands (RESTified JCache by Adam Bien) , JCache related talks at Java One 2014, links to resources for learning more about JCache What is JCache? JCache (JSR 107) is a standard caching API for Java. It provides an API for applications to be able to create and work with in-memory cache of objects. Benefits are obvious – one does not need to concentrate on the finer details of implementing the Caching and time is better spent on the core business logic of the application. JCache components The specification itself is very compact and surprisingly intuitive. The API defines high level components (interfaces) some of which are listed below Caching Provider – used to control Caching Managers and can deal with several of them, Cache Manager – deals with create, read, destroy operations on a Cache Cache – stores entries (the actual data) and exposes CRUD interfaces to deal with the entries Entry – abstraction on top of a key-value pair akin to a java.util.Map Hierarchy of JCache API components JCache Implementations JCache defines the interfaces which of course are implemented by different vendors a.k.a Providers. Oracle Coherence Hazelcast Infinispan ehcache Reference Implementation – this is more for reference purpose rather than a production quality implementation. It is per the specification though and you can be rest assured of the fact that it does in fact pass the TCK as well From the application point of view, all that’s required is the implementation to be present in the classpath. The API also provides a way to further fine tune the properties specific to your provider via standard mechanisms. You should be able to track the list of JCache reference implementations from the JCP website link public class JCacheUsage{ public static void main(String[] args){ //bootstrap the JCache Provider CachingProvider jcacheProvider = Caching.getCachingProvider(); CacheManager jcacheManager = jcacheProvider.getCacheManager(); //configure cache MutableConfiguration jcacheConfig = new MutableConfiguration<>(); jcacheConfig.setTypes(String.class, MyPreciousObject.class); //create cache Cache cache = jcacheManager.createCache("PreciousObjectCache", jcacheConfig); //play around String key = UUID.randomUUID().toString(); cache.put(key, new MyPreciousObject()); MyPreciousObject inserted = cache.get(key); cache.remove(key); cache.get(key); //will throw javax.cache.CacheException since the key does not exist } } JCache provider detection JCache provider detection happens automatically when you only have a single JCache provider on the class path You can choose from the below options as well //set JMV level system property -Djavax.cache.spi.cachingprovider=org.ehcache.jcache.JCacheCachingProvider //code level config System.setProperty("javax.cache.spi.cachingprovider","org.ehcache.jcache.JCacheCachingProvider //you want to choose from multiple JCache providers at runtime CachingProvider ehcacheJCacheProvider = Caching.getCachingProvider("org.ehcache.jcache.JCacheCachingProvider"); //which JCache providers do I have on the classpath? Iterable jcacheProviders = Caching.getCachingProviders(); Java Platform support Compliant with Java SE 6 and above Does not define any details in terms of Java EE integration. This does not mean that it cannot be used in a Java EE environment – it’s just not standardized yet. Could not be plugged into Java EE 7 as a tried and tested standard Candidate for Java EE 8 Project Headlands: Java EE and JCache in tandem By none other than Adam Bien himself ! Java EE 7, Java SE 8 and JCache in action Exposes the JCache API via JAX-RS (REST) Uses Hazelcast as the JCache provider Highly recommended ! Oracle Coherence This post deals with high level stuff w.r.t JCache in general. However, a few lines about Oracle Coherence in general would help put things in perspective Oracle Coherence is a part of Oracle’s Cloud Application Foundation stack It is primarily an in-memory data grid solution Geared towards making applications more scalable in general What’s important to know is that from version 12.1.3 onwards, Oracle Coherence includes a reference implementation for JCache (more in the next section) JCache support in Oracle Coherence Support for JCache implies that applications can now use a standard API to access the capabilities of Oracle Coherence This is made possible by Coherence by simply providing an abstraction over its existing interfaces (NamedCache etc). Application deals with a standard interface (JCache API) and the calls to the API are delegated to the existing Coherence core library implementation Support for JCache API also means that one does not need to use Coherence specific APIs in the application resulting in vendor neutral code which equals portability How ironic – supporting a standard API and always keeping your competitors in the hunt ;-) But hey! That’s what healthy competition and quality software is all about ! Talking of healthy competition – Oracle Coherence does support a host of other features in addition to the standard JCache related capabilities. The Oracle Coherence distribution contains all the libraries for working with the JCache implementation The service definition file in the coherence-jcache.jar qualifies it as a valid JCache provider implementation Curious about Oracle Coherence ? Quick Starter page Documentation Installation Further reading about Coherence and JCache combo – Oracle Coherence documentation JCache at Java One 2014 Couple of great talks revolving around JCache at Java One 2014 Come, Code, Cache, Compute! by Steve Millidge Using the New JCache by Brian Oliver and Greg Luck Hope this was fun :-) Cheers !
February 23, 2015
by Abhishek Gupta DZone Core CORE
· 6,303 Views · 1 Like
article thumbnail
Getting Started with Dropwizard: Authentication, Configuration and HTTPS
Basic Authentication is the simplest way to secure access to a resource.
February 10, 2015
by Dmitry Noranovich
· 48,818 Views · 1 Like
article thumbnail
The API Gateway Pattern: Angular JS and Spring Security Part IV
Written by Dave Syer in the Spring blog In this article we continue our discussion of how to use Spring Security with Angular JS in a “single page application”. Here we show how to build an API Gateway to control the authentication and access to the backend resources using Spring Cloud. This is the fourth in a series of articles, and you can catch up on the basic building blocks of the application or build it from scratch by reading the first article, or you can just go straight to the source code in Github. In the last article we built a simple distributed application that used Spring Session to authenticate the backend resources. In this one we make the UI server into a reverse proxy to the backend resource server, fixing the issues with the last implementation (technical complexity introduced by custom token authentication), and giving us a lot of new options for controlling access from the browser client. Reminder: if you are working through this article with the sample application, be sure to clear your browser cache of cookies and HTTP Basic credentials. In Chrome the best way to do that for a single server is to open a new incognito window. Creating an API Gateway An API Gateway is a single point of entry (and control) for front end clients, which could be browser based (like the examples in this article) or mobile. The client only has to know the URL of one server, and the backend can be refactored at will with no change, which is a significant advantage. There are other advantages in terms of centralization and control: rate limiting, authentication, auditing and logging. And implementing a simple reverse proxy is really simple with Spring Cloud. If you were following along in the code, you will know that the application implementation at the end of the last article was a bit complicated, so it’s not a great place to iterate away from. There was, however, a halfway point which we could start from more easily, where the backend resource wasn’t yet secured with Spring Security. The source code for this is a separate project in Github so we are going to start from there. It has a UI server and a resource server and they are talking to each other. The resource server doesn’t have Spring Security yet so we can get the system working first and then add that layer. Declarative Reverse Proxy in One Line To turn it into an API Gateawy, the UI server needs one small tweak. Somewhere in the Spring configuration we need to add an @EnableZuulProxy annotation, e.g. in the main (only)application class: @SpringBootApplication @RestController @EnableZuulProxy public class UiApplication { ... } and in an external configuration file we need to map a local resource in the UI server to a remote one in the external configuration (“application.yml”): security: ... zuul: routes: resource: path: /resource/** url: http://localhost:9000 This says “map paths with the pattern /resource/** in this server to the same paths in the remote server at localhost:9000”. Simple and yet effective (OK so it’s 6 lines including the YAML, but you don’t always need that)! All we need to make this work is the right stuff on the classpath. For that purpose we have a few new lines in our Maven POM: org.springframework.cloud spring-cloud-starter-parent 1.0.0.BUILD-SNAPSHOT pom import org.springframework.cloud spring-cloud-starter-zuul ... Note the use of the “spring-cloud-starter-zuul” - it’s a starter POM just like the Spring Boot ones, but it governs the dependencies we need for this Zuul proxy. We are also using because we want to be able to depend on all the versions of transitive dependencies being correct. Consuming the Proxy in the Client With those changes in place our application still works, but we haven’t actually used the new proxy yet until we modify the client. Fortunately that’s trivial. We just need to go from this implementation of the “home” controller: angular.module('hello', [ 'ngRoute' ]) ... .controller('home', function($scope, $http) { $http.get('http://localhost:9000/').success(function(data) { $scope.greeting = data; }) }); to a local resource: angular.module('hello', [ 'ngRoute' ]) ... .controller('home', function($scope, $http) { $http.get('resource/').success(function(data) { $scope.greeting = data; }) }); Now when we fire up the servers everything is working and the requests are being proxied through the UI (API Gateway) to the resource server. Further Simplifications Even better: we don’t need the CORS filter any more in the resource server. We threw that one together pretty quickly anyway, and it should have been a red light that we had to do anything as technically focused by hand (especially where it concerns security). Fortunately it is now redundant, so we can just throw it away, and go back to sleeping at night! Securing the Resource Server You might remember in the intermediate state that we started from there is no security in place for the resource server. Aside: Lack of software security might not even be a problem if your network architecture mirrors the application architecture (you can just make the resource server physically inaccessible to anyone but the UI server). As a simple demonstration of that we can make the resource server only accessible on localhost. Just add this to application.properties in the resource server: server.address: 127.0.0.1 Wow, that was easy! Do that with a network address that’s only visible in your data center and you have a security solution that works for all resource servers and all user desktops. Suppose that we decide we do need security at the software level (quite likely for a number of reasons). That’s not going to be a problem, because all we need to do is add Spring Security as a dependency (in the resource server POM): org.springframework.boot spring-boot-starter-security That’s enough to get us a secure resource server, but it won’t get us a working application yet, for the same reason that it didn’t in Part III: there is no shared authentication state between the two servers. Sharing Authentication State We can use the same mechanism to share authentication (and CSRF) state as we did in the last, i.e. Spring Session. We add the dependency to both servers as before: org.springframework.session spring-session 1.0.0.RELEASE org.springframework.boot spring-boot-starter-redis but this time the configuration is much simpler because we can just add the same Filterdeclaration to both. First the UI server (adding @EnableRedisHttpSession): @SpringBootApplication @RestController @EnableZuulProxy @EnableRedisHttpSession public class UiApplication { ... } and then the resource server. There are two changes to make: one is adding@EnableRedisHttpSession and a HeaderHttpSessionStrategy bean to theResourceApplication: @SpringBootApplication @RestController @EnableRedisHttpSession class ResourceApplication { ... @Bean HeaderHttpSessionStrategy sessionStrategy() { new HeaderHttpSessionStrategy(); } } and the other is to explicitly ask for a non-stateless session creation policy inapplication.properties: security.sessions: NEVER As long as redis is still running in the background (use the fig.yml if you like to start it) then the system will work. Load the homepage for the UI at http://localhost:8080 and login and you will see the message from the backend rendered on the homepage. How Does it Work? What is going on behind the scenes now? First we can look at the HTTP requests in the UI server (and API Gateway): VERB PATH STATUS RESPONSE GET / 200 index.html GET /css/angular-bootstrap.css 200 Twitter bootstrap CSS GET /js/angular-bootstrap.js 200 Bootstrap and Angular JS GET /js/hello.js 200 Application logic GET /user 302 Redirect to login page GET /login 200 Whitelabel login page (ignored) GET /resource 302 Redirect to login page GET /login 200 Whitelabel login page (ignored) GET /login.html 200 Angular login form partial POST /login 302 Redirect to home page (ignored) GET /user 200 JSON authenticated user GET /resource 200 (Proxied) JSON greeting That’s identical to the sequence at the end of Part II except for the fact that the cookie names are slightly different (“SESSION” instead of “JSESSIONID”) because we are using Spring Session. But the architecture is different and that last request to “/resource” is special because it was proxied to the resource server. We can see the reverse proxy in action by looking at the “/trace” endpoint in the UI server (from Spring Boot Actuator, which we added with the Spring Cloud dependencies). Go tohttp://localhost:8080/trace in a browser and scroll to the end (if you don’t have one already get a JSON plugin for your browser to make it nice and readable). You will need to authenticate with HTTP Basic (browser popup), but the same credentials are valid as for your login form. At or near the end you should see a pair of requests something like this: { "timestamp": 1420558194546, "info": { "method": "GET", "path": "/", "query": "" "remote": true, "proxy": "resource", "headers": { "request": { "accept": "application/json, text/plain, */*", "x-xsrf-token": "542c7005-309c-4f50-8a1d-d6c74afe8260", "cookie": "SESSION=c18846b5-f805-4679-9820-cd13bd83be67; XSRF-TOKEN=542c7005-309c-4f50-8a1d-d6c74afe8260", "x-forwarded-prefix": "/resource", "x-forwarded-host": "localhost:8080" }, "response": { "Content-Type": "application/json;charset=UTF-8", "status": "200" } }, } }, { "timestamp": 1420558200232, "info": { "method": "GET", "path": "/resource/", "headers": { "request": { "host": "localhost:8080", "accept": "application/json, text/plain, */*", "x-xsrf-token": "542c7005-309c-4f50-8a1d-d6c74afe8260", "cookie": "SESSION=c18846b5-f805-4679-9820-cd13bd83be67; XSRF-TOKEN=542c7005-309c-4f50-8a1d-d6c74afe8260" }, "response": { "Content-Type": "application/json;charset=UTF-8", "status": "200" } } } }, The second entry there is the request from the client to the gateway on “/resource” and you can see the cookies (added by the browser) and the CSRF header (added by Angular as discussed inPart II). The first entry has remote: true and that means it’s tracing the call to the resource server. You can see it went out to a uri path “/” and you can see that (crucially) the cookies and CSRF headers have been sent too. Without Spring Session these headers would be meaningless to the resource server, but the way we have set it up it can now use those headers to re-constitute a session with authentication and CSRF token data. So the request is permitted and we are in business! Conclusion We covered quite a lot in this article but we got to a really nice place where there is a minimal amount of boilerplate code in our two servers, they are both nicely secure and the user experience isn’t compromised. That alone would be a reason to use the API Gateway pattern, but really we have only scratched the surface of what that might be used for (Netflix uses it for a lot of things). Read up on Spring Cloud to find out more on how to make it easy to add more features to the gateway. The next article in this series will extend the application architecture a bit by extracting the authentication responsibilities to a separate server (the Single Sign On pattern).
February 9, 2015
by Pieter Humphrey
· 16,299 Views
article thumbnail
Introducing the Database Selection Matrix
Originally Written by Mat Keep For the better part of a generation, the database landscape had changed very little. No one could say “this is not your father’s database.” They had become, in a word, boring. Then a combination of factors catalyzed an era of innovation in database technologies: cheap storage and compute resources; pervasive connectivity; social networks; smartphones; the proliferation of sensors; open source software. Data volumes grew (and are growing) at exponential rates. Over 80% of today’s data no longer fits neatly into the normalized row and column table formats of the past. And so developers began engineering solutions to a new set of problems with a very different set of resources and assumptions. Today these new options include a variety of database architectures built around diverse data models – from key-value to document to wide-column and graph. And of course you still have the option of the venerable relational database. For the enterprise these new technologies hold great promise. They open the door to new applications that could not be imagined before, or to more efficiently solve existing problems. They attract new technical talent. They facilitate the migration of systems to more cost effective infrastructure based on commodity hardware and cloud platforms. But at the same time, evaluation of these new options requires careful consideration. Selecting the appropriate database for a new project requires evaluation against multiple criteria, including: Development considerations: includes the data model, query functionality, available drivers, data consistency. These factors dictate the functionality of your application, and how quickly you can build it. Operational considerations: performance and scalability, high availability, data center awareness, security, management and backups. Over the application’s lifetime, operational costs will contribute a significant percentage to the project’s Total Cost of Ownership (TCO), and so these factors constitute your ability to meet SLAs while minimizing administrative overhead. Commercial considerations: licensing, pricing and support. You need to know that the database you choose is available in a way that is aligned with how you do business. Each these considerations need to be evaluated in context of specific application requirements as well as internal technology standards, skills availability and integration with your existing enterprise architecture. So, where to start? The Database Selection Matrix is designed to serve as a decision framework by teams responsible for database selection. It has been developed in collaboration with several large enterprises who have the choice of running multiple databases in production, and who wanted to institute a systematic methodology for database evaluation. Responses to questions in the matrix helped them identify key requirements and guide selection. And it can do the same for you. Lets illustrate how the Database Selection Matrix can be used by working through a practical example. The Database Selection Matrix in Action! ACME Retail Corporation runs a large vehicle fleet to distribute produce to its nationwide network of stores. The CEO is intent on improving distribution efficiency and so tasks her enterprise architects to build a new platform that can utilize sensor data generated by the company’s trucks. By capturing and analyzing this data, the organization believes it can optimize route planning, improve delivery times, cut wastage and reduce business interruptions caused by breakdowns. ACME Retail Corp is typical of many enterprises that see the opportunity to unlock new efficiencies by leveraging the “Internet of Things”. As Morgan Stanley stated in the “Internet of Things is Now” research “We do not believe traditional data storage architectures are well- suited to accommodate the volume, velocity, and variety of IoT data”. For this reason, enterprises are looking beyond traditional RDBMS technology to the swathe of new database options available to them. Bosch SI did exactly this when it took the decision to use MongoDB to power the Bosch IoT Suite. Of course, MongoDB may not be a perfect fit for every IoT project. There are many choices available – as there for every new type of project – and the ACME architecture group needs a way to navigate the complex landscape of modern databases. Using the Database Selection Matrix, they have the framework to ask the key questions that will guide their technology decisions. So lets put it into practice. Development Considerations In this opening phase, the architects need to evaluate how their shortlisted database options meet the functional requirements of the app that is being built. This is impacted directly by multiple factors – and these are the questions they will need to ask. The Data Model: Will the application need to handle data of varying structure and types? How large can each data type be – is our data made up of simple integers, strings and timestamps or can it also be large binary files such as images or videos? Can our data just be represented as a set of opaque values, or does it need to be typed so other applications can make sense of it? Do we know the data structure will remain constant, or will it vary as we introduce new sensor data and as the business updates application requirements? Does the application require its data to be strongly consistent (i.e. read our own writes), or can eventually consistent data be tolerated (and do our developers know how to handle the complexity it introduces?). Do we end up trading performance and availability if we configure the database to only return the freshest data? The Query Model: What sort of queries are we going to run against the database? Is it simple key-value lookups that we know in advance or do we need to execute ad-hoc queries and complex aggregations to support real-time analytics that the business wants to see? Can we run analytics directly against the database, or do we need to replicate data to dedicated search or analytics engines? Will the application be handling geospatial queries and text search? Does the data need to be integrated with our BI & analytics tools, and what about our new Hadoop cluster, or the data warehouse? Which languages will our engineers be using to develop the application, and does the database have drivers available for them? Operational Considerations In this second phase, the ACME Retail architects need to evaluate how each database would run in production. No-one wants to hand-feed a custom technology, so they need to understand if the database can meet the availability, scalability and security needs of the business, and interoperate with the existing management frameworks. Service Availability: What is the application’s availability SLA? What are our RTO and RPO objectives? Will our operations teams manage failure recovery, or is this something that should be fully automated by the database? What capabilities does the database offer to maintain availability during routine maintenance? Are there tools available to manage this or do we need to script something ourselves? Are there specific requirements to replicate data between our data centers to support disaster recovery? Scalability: How do we expect this application to grow? Will the database need to scale beyond the limits of just a few servers? If data is to be distributed across multiple nodes, will it be partitioned in such a way that it is still optimized for the application’s query patterns? Do we need to scale this across data centers? Can we write and read data locally to reduce the effects of geographic latency? Can we scale storage capacity and I/O by compressing the data, and are different compression algorithms available to optimize compression ratio to CPU overhead? Security: What types of data access control do we need? Can we just use authentication controls within the database or do we need to integrate with our existing LDAP infrastructure? What type of authorization controls are available, and how granular can we get? Do these controls needs to extend down to the level of individual attributes within a document? Is encryption needed, and will those pesky compliance officers need to audit every action taken against the database? Administration: How are going to run this thing? Does the database provide tools to automate provisioning and upgrades or do we need to create our own scripts? How about backups? Can we get incremental backups. How about point in time backups? And then monitoring. We need to know, for example, if disk utilization is peaking above 60% so we can take action before we hit a problem. Can we add these alerts into our existing operational workflow tools? Can we integrate the database’s management platform into our own operational tooling so we don’t need to leave our single screen? Commercial Considerations Once the ACME architects have profiled their technology requirements, they will need to understand how the database is licensed and priced, before legal and procurement come knocking at the door: Licensing: what license is used, and is this acceptable to our legal team? Are commercial licenses available? Support: What support options are open to me? Can I get support SLAs from my vendor, even if I use a community version of their product? What is the SLA I can expect if I do hit an issue? What sort of training is available? Is my only option to send my engineers to public classes, or can we get trained on-demand, at our own pace? What’s Next? The ACME example is designed to illustrate some of the key questions engineering teams need to ask. It is true that the database landscape is more complex than ever. But it needn’t be bewildering – the Database Selection Matrix is designed to help you identify and compare what is most critical as you build your next app, so go ahead and download it now. Looking for additional information about database selection? Learn why organizations choose MongoDB to deliver applications and outcomes that were never previously possible. Download the white paper below: THE VALUE OF DATABASE SELECTION
February 9, 2015
by Francesca Krihely
· 11,351 Views · 1 Like
article thumbnail
NetBeans in the Classroom: MySQL JDBC Connection Pool & JDBC Resource for GlassFish
This tutorial assumes that you have installed the Java EE version of NetBeans 8.02. It is further assumed that you have replaced the default GlassFish server instance in NetBeans with a new instance with its own folder for the domain. This is required for all platforms. See my previous article “Creating a New Instance of GlassFish in NetBeans IDE” The other day I presented to my students the steps necessary to make GlassFish responsible for JDBC connections. As I went through the steps I realized that I needed to record the steps for my students to reference. Here then are these steps. Steps 1 through 4 are required, Step 5 is optional. Step 1a: Manually Adding the MySQL driver to the GlassFish Domain Before we even start NetBeans we must do a bit of preliminary work. With the exception of Derby, GlassFish does not include the MySQL driver or any other driver in its distribution. Go to the MySql Connector/J download site at http://dev.mysql.com/downloads/connector/j/ and download the latest version. I recommend downloading the Platform Independent version. If your OS is Windows download the ZIP archive otherwise download the TAR archive. You are looking for the driver file named mysql-connector-java-5.1.34-bin.jar in the archive. Copy the driver file to the lib folder in the directory where you placed your domain. On my system the folder is located at C:\Users\Ken\personal_domain\lib. If GlassFish is already running then you will have to restart it so that it picks up the new library. Step 1b: Automatically Adding the MySQL driver to the GlassFish Domain NetBeans has a feature that deploys the database driver to the domain’s lib folder if that driver is in NetBeans’ folder of drivers. On my Windows 8.1 system the MySQL driver can be found in C:\Program Files\NetBeans 8.0.2\ide\modules\ext. Start NetBeans and go to the Services tab, expand Servers and right mouse click on your GlassFish Server. Click on Properties and the Servers dialog will appear. On this dialog you will see a check box labelled Enable JDBC Driver Deployment. By default it is checked. NetBeans determines the driver to copy to GlassFish from the file glassfish-resources.xml that we will create in Step 4 of this tutorial. Without this file and if you have not copied the driver into GlassFish manually then GlassFish will not be able to connect to the database. Any code in your web application will not work and all you will likely see are blank pages. Step 1a or Step 1b? I recommend Step 1a and manually add the driver. The reason I prefer this approach is that I can be certain that the most recent driver is in use. As of this writing NetBeans contains version 5.1.23 of the connector but the current version is 5.1.34. If you copy a driver into the lib folder then NetBeans will not replace it with an older driver even if the check box on the Server dialog is checked. NetBeans does not replace a driver if one is already in place. If you need a driver that NetBeans does have a copy of then Step 1b is your only choice. Step 2: Create a Database Connection in NetBeans One feature I have always liked in NetBeans is that it has an interface for working with databases. All that is required is that you create a connection to the database. It also has additional features for managing a MySQL server but we won’t need those. If you have not already started your MySQL DBMS then do that now. I assume that the database you wish to connect to already exists. Go to the Services tab and right mouse click on New Connection. In the next dialog you must choose the database driver you wish to use. It defaults to Java DB (Embedded). Pull down the combobox labeled Driver: and select MySQL (Connector/J driver). Click on Next and you will now see the Customize Connection dialog. Here you can enter the details of the connection. On my system the server is localhost and the database name is Aquarium. Here is what my dialog looks like. Notice the Test Connection button. I have clicked on mine and so I have the message Connection Succeeded. Click on Next. There is nothing to do on this dialog so click on Next. On this last dialog you have the option of assigning a name to the connection. By default it uses the URL but I prefer a more meaningful name. I have used AquariumMySQL. Click on Finish and the connection will appear under Databases. If the icon next to AquariumMySQL has what looks like a crack in it similar to the jdbc:derby connection then this means that a connection to the database could not be made. Verify that the database is running and is accessible. If it is then delete the connection and start over. Having a connection to the database in NetBeans is invaluable. You can interact with the database directly and issue SQL commands. As a MySQL user this means that I do not need to run the MySQL command line program to interact with the database. Step 3: Create a Web Application Project in NetBeans If you have not already done so create a New Project in NetBeans. I require my students to create a New Project in the Maven category of a Web Application project. Click on Next. In this dialog you can give the project a name and a location in your file system. The Artifact Id, Group Id and Version are used by Maven. The final dialog lets you select the application server that your application will use and the version of Java EE that your code must be compliant with. Here is my project ready for the next step. Step 4: Create the GlassFish JDBC Resource For GlassFish to manage your database connection you need to set up two resources, a JDBC Connection Pool and a JDBC Resource. You can create both in one step by creating a GlassFish JDBC Resource because you can create the Connection Pool as part of the same operation. Right mouse click on the project name and select New and then Other … Scroll down the Categories list and select GlassFish. In the File Types list select JDBC Resource. Click on Next. The next dialog is the General Attributes. Click on the radio button for Create New JDBC Connection Pool. In the text field JNDI Name enter a name that is unique for the project. JNDI names for connection resources always begin with jdbc/ followed by a name that starts with a lower case letter. I have used jdbc/myAquarium. Do not prefix the name with java:app/ as some tutorials suggest. An upcoming article will explain why. Click on Next. There is nothing for us to enter on the Properties dialog. Click on Next. On the Choose Database Connection dialog we will give our connection pool a name and select the database connection we created in Step 2. Notice that in the list of available connections you are shown the connection URL and not the name you assigned to it back in Step 2. Click on Next. On the Add Connection Pool Properties dialog you will see the connection URL and the user name and password. We do need to make one change. The resource type shows javax.sql.DataSource and we must change it to javax.sql.ConnectionPoolDataSource. Click on Next. There is nothing we need to change on Add Connection Pool Optional Properties so click on Finish. A new folder has appeared in the Projects view named Other Sources. It contains a sub folder named setup. In this folder is the file glassfish-resources.xml. The glassfish-resources.xml file will contain the following. I have reformatted the file for easier viewing. OPTIONAL Step 5: Configure GlassFish with glassfish-resources.xml The glassfish-resources.xml file, when included in the application’s WAR file in the WEB-INF folder, can configure the resource and pool for the application when it is deployed in GlassFish. When the application is un-deployed the resource and pool are removed. If you want to set up the resource and pool permanently in GlassFish then follow these steps. Go to the Services tab and select Servers and then right mouse click on GlassFish. If GlassFish is not running then click on Start. With the server started click on View Domain Admin Console. Your web browser will now open and show you the GlassFish console. If you assigned a user name and password to the server you will have to enter this information before you see the console. In the Common Tasks tree select Resources. You should now see in the panel adjacent to the tree the following: Click on Add Resources. You should now see: In the Location click on Choose File and locate your glassfish-resources.xml file. Mine is found at D:\NetBeansProjects\GlassFishTutorial\src\main\setup. You should now see: Click on OK. If everything has gone well you should see: The final task in this step is to test if the connection works. In the Common Tasks tree select Resources, JDBC, JDBC Connection Pools and aquariumPool. Click on Ping. You should see: The most common reason for the Ping to fail is that the database driver is not in the domain’s lib folder. Go to Step 1a and manually add the driver. The resources are now visible in NetBeans. Having the resource and pool add to GlassFish permanently will allow other applications to share this same resource and pool. You are now ready to code!
February 9, 2015
by Ken Fogel
· 52,608 Views · 3 Likes
article thumbnail
Microservices: Five Architectural Constraints
Microservices is a new software architecture and delivery paradigm, where applications are composed of several small runtime services. The current mainstream approach for software delivery is to build, integrate, and test entire applications as a monolith. This approach requires any software change, however small, to require a full test cycle of the entire application. With Microservices a software module is delivered as an independent runtime service with a well defined API. The Microservices approach allow faster delivery of smaller incremental changes to an application. There are several tradeoffs to consider with the Microservices architecture. On one hand, the Microservices approach builds on several best practices and patterns for software design, architecture, and DevOps style organization. On the other hand, Microservices requires expertise in distributed programming and can become an operational nightmare without proper tooling in place. There are several good posts that highlight the pros-and-cons of Microservices, and I have added in the references section. In the remainder of this post, I will define five architectural constraints (principles that drive desired properties) for the Microservices architectural style. To be a Microservice, a service must be: Elastic Resilient Composable Minimal, and; Complete Microservice Constraint #1 - Elastic A microservice must be able to scale, up or down, independently of other services in the same application. This constraint implies that based on load, or other factors, you can fine tune your applications performance, availability, and resource usage. This constraint can be realized in different ways, but a popular pattern is to architect the system so that you can run multiple stateless instances of each microservice, and there is a mechanism for Service naming, registration, and discovery along with routing and load-balancing of requests. Microservice Constraint #2 - Resilient A microservice must fail without impacting other services in the same application. A failure of a single service instance should have minimal impact on the application. A failure of all instances of a microservice, should only impact a single application function and users should be able to continue using the rest of the application without impact. Adrian Cockroft describes Microservices as loosely coupled service oriented architecture with bounded contexts [3]. To be resilient a service has to be loosely coupled with other services, and a bounded context limits a service’s failure domain. Microservice Constraint #3 - Composable A microservice must offer an interface that is uniform and is designed to support service composition. Microservice APIs should be designed with a common way of identifying, representing, and manipulating resources, describing the API schema and supported API operations. The ‘Uniform Interfaces constraint of the REST architectural style describes this in detail. Service Composition is a SOA principle that has fairly obvious benefits, but few guidelines on how it can be achieved. A Microservice interface should be designed to support composition patterns like aggregation, linking, and higher-level functions such as caching, proxies and gateways. I previously discussed REST constraints and elements in as two part blog post: REST is not about APIs Microservice Constraint #4 - Minimal A microservice must only contain highly cohesive entities In software, cohesion is a measure of whether things belong together. A module is said to have high cohesion if all objects and functions in it are focused on the same tasks. Higher cohesion leads to more maintainable software. A Microservice should perform a single business function, which implies that all of its components are highly cohesive. This is also an Single Responsibility Principle (SRP) of object-oriented design [5] Microservice Constraint #5 - Complete A microservice must be functionally complete Bjarne Stroustrup, the creator of C++, stated that a good interface must be, “minimal but complete” i.e. as small as possible, and no smaller. Similarly, a Microservice must offer a complete function, with minimal dependencies (loose coupling) to other services in the application. This is important, as otherwise its becomes impossible to version and upgrade individual services. This constraint is designed to oppose the minimal constraint. Put together a microservice must be “minimal but complete.” Conclusions Designing a Microservices application requires application of several principles, patterns, and best practices of modular design and service-oriented architectures. In this post, I've outlined five architectural constraints which can help guide and retain the key benefits of a Microservices-style architecture. For example, Microservices Constraint# 1 - Elastic steers implementations towards separating the data tier from the application tier, and leads to stateless services. At Nirmata we have built our solution, that makes it easy to deploy and operate microservices applications, using these very same principles. We believe that Microservices style applications, running in containers, will power the next generation of software innovation. If you are using, or interested in using microservices, I would love to hear from you. Jim Bugwadia Founder and CEO Nirmata -- For additional content and articles follow us at @NirmataCloud. -- If you are in the San Francisco Bay Area, come join our Microservices meetup group. References [1] Microservices, Martin Fowler and James Lewis, http://martinfowler.com/articles/microservices.html [2] Microservices Are Not a free lunch!, Benjamin Wootton, http://contino.co.uk/microservices-not-a-free-lunch/ [3] State of the Art in Microservices, Adrian Cockroft, http://thenewstack.io/dockercon-europe-adrian-cockcroft-on-the-state-of-microservices/ [4] The Principles of Object-Oriented Design, Robert C. Martin, http://butunclebob.com/ArticleS.UncleBob.PrinciplesOfOod
February 5, 2015
by Jim Bugwadia
· 13,211 Views · 7 Likes
article thumbnail
Dropwizard vs Spring Boot—A Comparison Matrix
Of late, I have been looking into Microservice containers that are available out there to help speed up the development. Although, Microservice is a generic term however there is some consensus with respect to what it means. Hence, we may conveniently refer to the definition Microservice as an "architectural design pattern, in which complex applications are composed of small, independent processes communicating with each other using language-agnostic APIs. These services are small, highly decoupled and focus on doing a small task." There are several Microservice containers out there. However, in my experience I have found Dropwizard and Spring-boot to have had received more attention and they appear to be widely used compared to the rest. In my current role, I was asked create a comparison matrix between the two, so it's here below. Dropwizard Spring-Boot What is it? Dropwizard pulls together stable, mature libraries from the Java ecosystem into a simple, light-weight package that lets you focus on getting things done. [more...] Takes an opinionated view of building production-ready Spring applications. Spring Boot favours convention over configuration and is designed to get you up and running as quickly as possible. [more...] Overview? Dropwizard straddles the line between being a library and a framework. Provide performant, reliable implementations of everything a production-ready web application needs. [more...] Spring-boot takes an opinionated view of the Spring platform and third-party libraries so you can get started with minimum fuss. Most Spring Boot applications need very little Spring configuration. [more...] Out of the box features? Dropwizard has out-of-the-box support for sophisticated configuration, application metrics, logging, operational tools, and much more, allowing you and your team to ship a production-quality web service in the shortest time possible. [more...] Spring-boot provides a range of non-functional features that are common to large classes of projects (e.g. embedded servers, security, metrics, health checks, externalized configuration). [more...] Libraries Core: Jetty, Jersey, Jackson and Matrics Others: Guava, Liquibase and Joda Time. Spring, JUnit, Logback, Guava. There are several starter POM files covering various use cases, which can be included in the POM to get started. Dependency Injection? No built in Dependency Injection. Requires a 3rd party dependency injection framework such as Guice, CDI or Dagger. [Ref...] Built in Dependency Injection provided by Spring Dependency Injection container. [Ref...] Types of Services i.e. REST, SOAP Has some support for other types of services but primarily is designed for performant HTTP/REST LAYER. If ever need to integrate SOAP, there is a dropwizard bundle for building SOAP web services using JAX-WS API is provided here but it’s not official drop-wizard sub project. [more...] As well as supporting REST Spring-boot has support for other types of services such as JMS, Advanced Message Queuing Protocol, SOAP based Web Services to name a few. [more...] Deployment? How it creates the Executable Jar? Uses Shading to build executable fat jars, where a shaded jar spackages all classes, from all jars, into a single 'uber jar'. [Ref...] Spring-boot adopts a different approach and avoids shaded jars, as it becomes hard to see which libraries you are actually using in your application. It can also be problematic if the same filename is used in Shaded jars. Instead it uses “Nested Jar” approach where all classes from all jars do not need to be included into a single “uber jar” instead all dependent jars should be in the “lib” folder, spring loader loads them appropriately. [Ref...] Contract First Web Services? No built in support. Would have to refer to 3rd party library (CXF or any other JAX-WS implementation) if needed a solution for the Contract First SOAP based services. Contract First services support is available with the help of spring-boot-starter-ws starter application. [Ref...] Externalised Configuration for properties and YAML Supports both Properties and YAML Supports both Properties and YAML Concluding Remarks If dealing with only REST micro services, drop wizard is an excellent choice. Where Spring-boot shines is the types of services supported i.e. REST, JMS, Messaging, and Contract First Services. Not least a fully built in Dependency Injection container. Disclaimer: The matrix is purely based on my personal views and experiences, having tried both frameworks and is by no means an exhaustive guide. Readers are requested to do their own research before making a strategic decision between the two very formidable frameworks.
February 2, 2015
by Rizwan Ullah
· 74,003 Views · 9 Likes
article thumbnail
Andrews Curves
Andrews curves are a method for visualizing multidimensional data by mapping each observation onto a function. This function is defined as It has been shown the Andrews curves are able to preserve means, distance (up to a constant) and variances. Which means that Andrews curves that are represented by functions close together suggest that the corresponding data points will also be close together. Now, we will demonstrate the effectiveness of the Andrew curves on the iris dataset (which we already used here). Let's create a function to compute the values of the functions give a single sample: import numpy as np def andrew_curve4(x,theta):# iris has 4 four dimensions base_functions =[lambda x : x[0]/np.sqrt(2.),lambda x : x[1]*np.sin(theta),lambda x : x[2]*np.cos(theta),lambda x : x[3]*np.sin(2.*theta)] curve = np.zeros(len(theta))for f in base_functions: curve = curve + f(x)return curve At this point we can load the dataset and plot the curves for a subset of samples: samples = np.loadtxt('iris.csv', usecols=[0,1,2,3], delimiter=',')#samples = samples - np.mean(samples)#samples = samples / np.std(samples) classes = np.loadtxt('iris.csv', usecols=[4], delimiter=',',dtype=np.str) theta = np.linspace(-np.pi,np.pi,100)import pylab as pl for s in samples[:20]:# setosa pl.plot(theta, andrew_curve4(s,theta),'r')for s in samples[50:70]:# versicolor pl.plot(theta, andrew_curve4(s,theta),'b')for s in samples[100:120]:# virginica pl.plot(theta, andrew_curve4(s,theta),'g') pl.xlim(-np.pi,np.pi) pl.show() In the plot above, the each color used represents a class and we can easily note that the lines that represent samples from the same class have similar curves.
January 29, 2015
by Giuseppe Vettigli
· 14,007 Views · 6 Likes
article thumbnail
Bulk Data Insertion into Oracle Database in C#
Bulk insertion of data into database table is a big overhead in application development.
January 28, 2015
by Ayobami Adewole
· 44,742 Views
article thumbnail
Code Coverage for Embedded Target with Eclipse, gcc and gcov
The great thing with open source tools like Eclipse and GNU (gcc, gdb) is that there is a wealth of excellent tools: one thing I had in mind to explore for a while is how to generate code coverage of my embedded application. Yes, GNU and Eclipse come with code profiling and code coverage tools, all for free! The only downside seems to be that these tools seem to be rarely used for embedded targets. Maybe that knowledge is not widely available? So here is my attempt to change this :-). Or: How cool is it to see in Eclipse how many times a line in my sources has been executed? Line Coverage in Eclipse And best of all, it does not stop here…. Coverage with Eclipse To see how much percentage of my files and functions are covered? gcov in Eclipse Or even to show the data with charts? Coverage Bar Graph View Outline In this tutorial I’m using a Freescale FRDM-K64F board: this board has ARM Cortex-M4F on it, with 1 MByte FLASH and 256 KByte of RAM. The approach used in this tutorial can be used with any embedded target, as long there is enough RAM to store the coverage data on the target. I’m using Eclipse Kepler with the ARM Launchpad GNU tools (q3 2014 release), but with small modifications any Eclipse version or GNU toolchain could be used. To generate the Code Coverage information, I’m using gcov. Freescale FRDM-K64F Board Generating Code Coverage Information with gcov gcov is an open source program which can generate code coverage information. It tells me how often each line of a program is executed. This is important for testing, as that way I can know which parts of my application actually has been executed by the testing procedures. Gcov can be used as well for profiling, but in this post I will use it to generate coverage information only. The general flow to generate code coverage is: Instrument code: Compile the application files with a special option. This will add (hidden) code and hooks which records how many times a piece of code is executed. Generate Instrumentation Information: as part of the previous steps, the compiler generates basic block and line information. This information is stored on the host as *.gcno (Gnu Coverage Notes Object?) files. Run the application: While the application is running on the target, the instrumented code will record how many the lines or blocks in the application are executed. This information is stored on the target (in RAM). Dump the recorded information: At application exit (or at any time), the recorded information needs to be stored and sent to the host. By default gcov stores information in files. As a file system might not be alway available, other methods can be used (serial connection, USB, ftp, …) to send and store the information. In this tutorial I show how the debugger can be used for this. The information is stored as *.gcda (Gnu Coverage Data Analysis?) files. Generate the reports and visualize them with gcov. General gcov Flow gcc does the instrumentation and provides the library for code coverage, while gcov is the utility to analyze the generated data. Coverage: Compiler and Linker Options To generate the *.gcno files, the following option has to be added for each file which should generate coverage information: -fprofile-arcs -ftest-coverage :idea: There is as well the ‘–coverage’ option (which is a shortcut option) which can be used both for the compiler and linker. But I prefer the ‘full’ options so I know what is behind the options. -fprofile-arcs Compiler Option The option -fprofile-arcs adds code to the program flow to so execution of source code lines are counted. It does with instrumenting the program flow arcs. From https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html: -fprofile-arcs Add code so that program flow arcs are instrumented. During execution the program records how many times each branch and call is executed and how many times it is taken or returns. When the compiled program exits it saves this data to a file called auxname.gcda for each source file. The data may be used for profile-directed optimizations (-fbranch-probabilities), or for test coverage analysis (-ftest-coverage). Each object file’s auxname is generated from the name of the output file, if explicitly specified and it is not the final executable, otherwise it is the basename of the source file. In both cases any suffix is removed (e.g. foo.gcda for input file dir/foo.c, or dir/foo.gcda for output file specified as -o dir/foo.o). See Cross-profiling. If you are not familiar with compiler technology or graph theory: An ‘Arc‘ (alternatively ‘edge’ or ‘branch’) is a directed link between a pair ‘Basic Blocks‘. A Basic is a sequence of code which has no branching in it (it is executed in a single sequence). For example if you have the following code: k = 0; if (i==10) { i += j; j++; } else { foo(); } bar(); Then this consists of the following four basic blocks: Basic Blocks The ‘Arcs’ are the directed edges (arrows) of the control flow. It is important to understand that not every line of the source gets instrumented, but only the arcs: This means that the instrumentation overhead (code size and data) depends how ‘complicated’ the program flow is, and not how many lines the source file has. However, there is an important aspect to know about gcov: it provides ‘condition coverage‘ if a full expression evaluates to TRUE or FALSE. Consider the following case: if (i==0 || j>=20) { In other words: I get coverage how many times the ‘if’ has been executed, but *not* how many times ‘i==0′ or ‘j>=20′ (which would be ‘decision coverage‘, which is not provided here). See http://www.bullseye.com/coverage.html for all the details. -ftest-coverage Compiler Option The second option for the compiler is -ftest-coverage (from https://gcc.gnu.org/onlinedocs/gcc-3.4.5/gcc/Debugging-Options.html): -ftest-coverage Produce a notes file that the gcov code-coverage utility (see gcov—a Test Coverage Program) can use to show program coverage. Each source file’s note file is called auxname.gcno. Refer to the -fprofile-arcs option above for a description of auxname and instructions on how to generate test coverage data. Coverage data will match the source files more closely, if you do not optimize. So this option generates the *.gcno file for each source file I decided to instrument: gcno file generated This file is needed later to visualize the data with gcov. More about this later. Adding Compiler Options So with this knowledge, I need to add -fprofile-arcs -ftest-coverage as compiler option to every file I want to profile. It is not necessary profile the full application: to save ROM and RAM and resources, I can add this option only to the files needed. Actually as a starter, I recommend to instrument a single source file only at the beginning. For this I select the properties (context menu) of my file Test.c I add the options in ‘other compiler flags': Coverage Added to Compilation File -fprofile-arcs Linker Option Profiling not only needs a compiler option: I need to tell the linker that it needs to link with the profiler library. For this I add -fprofile-arcs to the linker options: -fprofile-arcs Linker Option Coverage Stubs Depending on your library settings, you might now get a lot of unresolved symbol linker errors. This is because by default the profiling library assumes to write the profiling information to a file system. However, most file systems do *not* have a file system. To overcome this, I add a stubs for all the needed functions. I have them added with a file to my project (see latest version of that file on GitHub): /* * coverage_stubs.c * * These stubs are needed to generate coverage from an embedded target. */ #include #include #include #include #include #include #include "UTIL1.h" #include "coverage_stubs.h" /* prototype */ void gcov_exit(void); /* call the coverage initializers if not done by startup code */ void static_init(void) { void (**p)(void); extern uint32_t __init_array_start, __init_array_end; /* linker defined symbols, array of function pointers */ uint32_t beg = (uint32_t)&__init_array_start; uint32_t end = (uint32_t)&__init_array_end; while(begst_mode = S_IFCHR; return 0; } int _getpid(void) { return 1; } int _isatty(int file) { switch (file) { case STDOUT_FILENO: case STDERR_FILENO: case STDIN_FILENO: return 1; default: errno = EBADF; return 0; } } int _kill(int pid, int sig) { (void)pid; (void)sig; errno = EINVAL; return (-1); } int _lseek(int file, int ptr, int dir) { (void)file; (void)ptr; (void)dir; return 0; /* return offset in file */ } #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Wreturn-type" __attribute__((naked)) static unsigned int get_stackpointer(void) { __asm volatile ( "mrs r0, msp \r\n" "bx lr \r\n" ); } #pragma GCC diagnostic pop void *_sbrk(int incr) { extern char __HeapLimit; /* Defined by the linker */ static char *heap_end = 0; char *prev_heap_end; char *stack; if (heap_end==0) { heap_end = &__HeapLimit; } prev_heap_end = heap_end; stack = (char*)get_stackpointer(); if (heap_end+incr > stack) { _write (STDERR_FILENO, "Heap and stack collision\n", 25); errno = ENOMEM; return (void *)-1; } heap_end += incr; return (void *)prev_heap_end; } int _read(int file, char *ptr, int len) { (void)file; (void)ptr; (void)len; return 0; /* zero means end of file */ } :idea: In this code I’m using the UTIL1 (Utility) Processor Expert component, available on SourceForge. If you do not want/need this, you can remove the lines with UTIL1. - Coverage Stubs File in Project Coverage Constructors There is one important thing to mention: the coverage data structures need to be initialized, similar to constructors for C++. Depending on your startup code, this might *not* be done automatically. Check your linker .map file for some _GLOBAL__ symbols: .text._GLOBAL__sub_I_65535_0_TEST_Test 0x0000395c 0x10 ./Sources/Test.o Such a symbol should exist for every source file which has been instrumented with coverage information. These are functions which need to be called as part of the startup code. Set a breakpoint in your code at the given address to check if it gets called. If not, you need to call it yourself. :!: Typically I use the linker option ‘-nostartfiles’), and I have my startup code. In that case, these constructors are not called by default, so I need to do myself. See http://stackoverflow.com/questions/6343348/global-constructor-call-not-in-init-array-section In my linker file I have this: .init_array : { PROVIDE_HIDDEN (__init_array_start = .); KEEP (*(SORT(.init_array.*))) KEEP (*(.init_array*)) PROVIDE_HIDDEN (__init_array_end = .); } > m_text This means that there is a list of constructor function pointers put together between __init_array_start and __init_array_end. So all what I need is to iterate through this array and call the function pointers: /* call the coverage initializers if not done by startup code */ void static_init(void) { void (**p)(void); extern uint32_t __init_array_start, __init_array_end; /* linker defined symbols, array of function pointers */ uint32_t beg = (uint32_t)&__init_array_start; uint32_t end = (uint32_t)&__init_array_end; while(beg stack) { _write (STDERR_FILENO, "Heap and stack collision\n", 25); errno = ENOMEM; return (void *)-1; } heap_end += incr; return (void *)prev_heap_end; } :!: It might be that several kBytes of heap are needed. So if you are running in a memory constraint system, be sure that you have enough RAM available. The above implementation assumes that I have space between my heap end and the stack area. :!: If your memory mapping/linker file is different, of course you will need to change that _sbrk() implementation. Compiling and Building Now the application should compile and link without errors.Check that the .gcno files are generated: :idea: You might need to refresh the folder in Eclipse. - .gcno files generated In the next steps I’m showing how to get the coverage data as *.gcda files to the host using gdb. Using Debugger to get the Coverage Data The coverage data gets dumped when _exit() gets called by the application. Alternatively I could call gcov_exit() or __gcov_flush() any time. What it then does is Open the *.gcda file with _open() for every instrumented source file. Write the data to the file with _write(). So I can set a breakpoint in the debugger to both _open() and _write() and have all the data I need :-) With _open() I get the file name, and I store it in a global pointer so I can reference it in _write(): static const unsigned char *fileName; /* file name used for _open() */ int _open (const char *ptr, int mode) { (void)mode; fileName = (const unsigned char*)ptr; /* store file name for _write() */ return 0; } In _write() I get a pointer to the data and the length of the data. Here I can dump the data to a file using the gdb command: dump binary memory I could use a calculator to calculate the memory dump range, but it is much easier if I let the program generate the command line for gdb :-): int _write(int file, char *ptr, int len) { static unsigned char gdb_cmd[128]; /* command line which can be used for gdb */ (void)file; /* construct gdb command string */ UTIL1_strcpy(gdb_cmd, sizeof(gdb_cmd), (unsigned char*)"dump binary memory "); UTIL1_strcat(gdb_cmd, sizeof(gdb_cmd), fileName); UTIL1_strcat(gdb_cmd, sizeof(gdb_cmd), (unsigned char*)" 0x"); UTIL1_strcatNum32Hex(gdb_cmd, sizeof(gdb_cmd), (uint32_t)ptr); UTIL1_strcat(gdb_cmd, sizeof(gdb_cmd), (unsigned char*)" 0x"); UTIL1_strcatNum32Hex(gdb_cmd, sizeof(gdb_cmd), (uint32_t)(ptr+len)); return 0; } That way I can copy the string in the gdb debugger: Generated GDB Memory Dump Command That command gets pasted and executed in the gdb console: gdb command line After execution of the program, the *.gcda file gets created (refresh might be necessary to show it up): gcda file created Repeat this for all instrumented files as necessary. Showing Coverage Information To show the coverage information, I need the *.gcda, the *.gcno plus the .elf file. :idea: Use Refresh if not all files are shown in the Project Explorer view Files Ready to Show Coverage Information Then double-click on the gcda file to show coverage results: Double Click on gcda File Press OK, and it opens the gcov view. Double click on file in that view to show the details: gcov Views Use the chart icon to create a chart view: Chart view Bar Graph View Video of Steps to Create and Use Coverage The following video summarizes the steps needed: Data and Code Overhead Instrumenting code to generate coverage information means that it is an intrusive method: it impacts the application execution speed, and needs extra RAM and ROM. How much heavily depends on the complexity of the control flow and on the number of arcs. Higher compiler optimizations would reduce the code size footprint, however optimizations are not recommended for coverage sessions, as this might make the job of the coverage much harder. I made a quick comparison using my test application. I used the ‘size’ GNU command (see “Printing Code Size Information in Eclipse”). Without coverage enabled, the application footprint is: arm-none-eabi-size --format=berkeley "FRDM-K64F_Coverage.elf" text data bss dec hex filename 6360 1112 5248 12720 31b0 FRDM-K64F_Coverage.elf With coverage enabled only for Test.c gave: arm-none-eabi-size --format=berkeley "FRDM-K64F_Coverage.elf" text data bss dec hex filename 39564 2376 9640 51580 c97c FRDM-K64F_Coverage.elf Adding main.c to generate coverage gives: arm-none-eabi-size --format=berkeley "FRDM-K64F_Coverage.elf" text data bss dec hex filename 39772 2468 9700 51940 cae4 FRDM-K64F_Coverage.elf So indeed there is some initial add-up because of the coverage library, but afterwards adding more source files does not add up much. Summary It took me a while and reading many articles and papers to get code coverage implemented for an embedded target. Clearly, code coverage is easier if I have a file system and plenty of resources available. But I’m now able to retrieve coverage information from a rather small embedded system using the debugger to dump the data to the host. It is not practical for large sets of files, but at least a starting point :-). I have committed my Eclipse Kepler/Launchpad project I used in this tutorial on GitHub. Ideas I have in my mind: Instead using the debugger/gdb, use FatFS and SD card to store the data Exploring how to use profiling Combining multiple coverage runs Happy Covering :-) Links: Blog article who helped me to explore gcov for embedded targets: http://simply-embedded.blogspot.ch/2013/08/code-coverage-introduction.html Paper about using gcov for Embedded Systems: http://sysrun.haifa.il.ibm.com/hrl/greps2007/papers/gcov-on-an-embedded-system.pdf Article about coverage options for GNU compiler and linker: http://bobah.net/d4d/tools/code-coverage-with-gcov How to call static constructor methods manually: http://stackoverflow.com/questions/6343348/global-constructor-call-not-in-init-array-section Article about using gcov with lcov: https://qiaomuf.wordpress.com/2011/05/26/use-gcov-and-lcov-to-know-your-test-coverage/ Explanation of different coverage methods and terminology: http://www.bullseye.com/coverage.html
January 28, 2015
by Erich Styger
· 15,716 Views
article thumbnail
Improving Lock Performance in Java
After we introduced locked thread detection to Plumbr couple of months ago, we have started to receive queries similar to “hey, great, now I understand what is causing my performance issues, but what I am supposed to do now?” We are working hard to build the solution instructions into our own product, but in this post I am going to share several common techniques you can apply independent of the tool used for detecting the lock. The methods include lock splitting, concurrent data structures, protecting the data instead of the code and lock scope reduction. Locking is not evil, lock contention is Whenever you face a performance problem with the threaded code there is a chance that you will start blaming locks. After all, common “knowledge” is that locks are slow and limit scalability. So if you are equipped with this “knowledge” and start to optimize the code and getting rid of locks there is a chance that you end up introducing nasty concurrency bugs that will surface later on. So it is important to understand the difference between contended and uncontended locks. Lock contention occurs when a thread is trying to enter the synchronized block/method currently executed by another thread. This second thread is now forced to wait until the first thread has completed executing the synchronized block and releases the monitor. When only one thread at a time is trying to execute the synchronized code, the lock stays uncontended. As a matter of fact, synchronization in JVM is optimized for the uncontended case and for the vast majority of the applications, uncontended locks pose next to no overhead during execution. So, it is not locks you should blame for performance, but contended locks. Equipped with this knowledge, lets see what we can do to reduce either the likelihood of contention or the length of the contention. Protect the data not the code A quick way to achieve thread-safety is to lock access to the whole method. For example, take look at the following example, illustrating a naive attempt to build an online poker server: class GameServer { public Map<> tables = new HashMap>(); public synchronized void join(Player player, Table table) { if (player.getAccountBalance() > table.getLimit()) { List tablePlayers = tables.get(table.getId()); if (tablePlayers.size() < 9) { tablePlayers.add(player); } } } public synchronized void leave(Player player, Table table) {/*body skipped for brevity*/} public synchronized void createTable() {/*body skipped for brevity*/} public synchronized void destroyTable(Table table) {/*body skipped for brevity*/} } The intentions of the author have been good - when new players join() the table, there must be a guarantee that the number of players seated at the table would not exceed the table capacity of nine. But whenever such a solution would actually be responsible for seating players to tables - even on a poker site with moderate traffic, the system would be doomed to constantly trigger contention events by threads waiting for the lock to be released. Locked block contains account balance and table limit checks which potentially can involve expensive operations both increasing the likelihood and length of the contention. First step towards solution would be making sure we are protecting the data, not the code by moving the synchronization from the method declaration to the method body. In the minimalistic example above, it might not change much at the first place. But lets consider the whole GameServerinterface, not just the single join() method: class GameServer { public Map> tables = new HashMap>(); public void join(Player player, Table table) { synchronized (tables) { if (player.getAccountBalance() > table.getLimit()) { List tablePlayers = tables.get(table.getId()); if (tablePlayers.size() < 9) { tablePlayers.add(player); } } } } public void leave(Player player, Table table) {/* body skipped for brevity */} public void createTable() {/* body skipped for brevity */} public void destroyTable(Table table) {/* body skipped for brevity */} } What originally seemed to be a minor change, now affects the behaviour of the whole class. Whenever players were joining tables, the previously synchronized methods locked on theGameServer instance (this) and introduced contention events to players trying to simultaneouslyleave() tables. Moving the lock from the method signature to the method body postpones the locking and reduces the contention likelihood. Reduce the lock scope Now, after making sure it is the data we actually protect, not the code, we should make sure our solution is locking only what is necessary - for example when the code above is rewritten as follows: public class GameServer { public Map> tables = new HashMap>(); public void join(Player player, Table table) { if (player.getAccountBalance() > table.getLimit()) { synchronized (tables) { List tablePlayers = tables.get(table.getId()); if (tablePlayers.size() < 9) { tablePlayers.add(player); } } } } //other methods skipped for brevity } then the potentially time-consuming operation of checking player account balance (which potentially can involve IO operations) is now outside the lock scope. Notice that the lock was introduced only to protect against exceeding the table capacity and the account balance check is not anyhow part of this protective measure. Split your locks When we look at the last code example, you can clearly notice that the whole data structure is protected by the same lock. Considering that we might hold thousands of poker tables in this structure, it still poses a high risk for contention events as we have to protect each table separately from overflowing in capacity. For this there is an easy way to introduce individual locks per table, such as in the following example: public class GameServer { public Map> tables = new HashMap>(); public void join(Player player, Table table) { if (player.getAccountBalance() > table.getLimit()) { List tablePlayers = tables.get(table.getId()); synchronized (tablePlayers) { if (tablePlayers.size() < 9) { tablePlayers.add(player); } } } } //other methods skipped for brevity } Now, if we synchronize the access only to the same table instead of all the tables, we have significantly reduced the likelihood of locks becoming contended. Having for example 100 tables in our data structure, the likelihood of the contention is now 100x smaller than before. Use concurrent data structures Another improvement is to drop the traditional single-threaded data structures and use data structures designed explicitly for concurrent usage. For example, when picking ConcurrentHashMapto store all your poker tables would result in code similar to following: public class GameServer { public Map> tables = new ConcurrentHashMap>(); public synchronized void join(Player player, Table table) {/*Method body skipped for brevity*/} public synchronized void leave(Player player, Table table) {/*Method body skipped for brevity*/} public synchronized void createTable() { Table table = new Table(); tables.put(table.getId(), table); } public synchronized void destroyTable(Table table) { tables.remove(table.getId()); } } The synchronization in join() and leave() methods is still behaving as in our previous example, as we need to protect the integrity of individual tables. So no help from ConcurrentHashMap in this regards. But as we are also creating new tables and destroying tables in createTable() and destroyTable()methods, all these operations to the ConcurrentHashMap are fully concurrent, permitting to increase or reduce the number of tables in parallel. Other tips and tricks Reduce the visibility of the lock. In the example above, the locks are declared public and are thus visible to the world, so there is there is a chance that someone else will ruin your work by also locking on your carefully picked monitors. Check out java.util.concurrent.locks to see whether any of the locking strategies implemented there will improve the solution. Use atomic operations. The simple counter increase we are actually conducting in example above does not actually require a lock. Replacing the Integer in count tracking withAtomicInteger would most suit this example just fine. Hope the article helped you to solve the lock contention issues, independent of whether you are using Plumbr automatic lock detection solution or manually extracting the information from thread dumps.
January 22, 2015
by Nikita Salnikov-Tarnovski
· 11,528 Views
article thumbnail
Implementing Ehcache using Spring context and Annotations
Part 1 – Configuring Ehcache with Spring Ehcache is most widely used open source cache for boosting performance, offloading your database, and simplifying scalability. It is very easy to configure Ehcache with Spring context xml. Assuming you have a working Spring configured application, here we discuss how to integrate Ehcache with Spring. In order to configure it, we need to add a CacheManager into the Spring context file. Code snippet added below In ‘ehcache’ bean, we are referring to ‘Ehcache.xml’ file which will contain all ehcache configurations. Sample ehcache file is added below. You can configure this file according to your project needs. Here you can two caches, defaultCache and one defined with name ‘TestCache’. DefaultCache configuration is applied to any cache that is not explicitly configured. Part 2 - Enabling Caching Annotations In order to used Spring provided java annotations , we need to declarative enable cache annotations in Spring config file. Code snippet added below. In order to use above annotation declaration, we need to add cache namespace into the spring file. Code snippet added below. Spring Annotations for caching @Cacheable triggers cache population @CacheEvict triggers cache eviction @CachePut updates the cache without interfering with the method execution I will give brief description and practical implementation of these annotations. For detailed explanations please refer spring documentation for caching . @Cacheable - This annotation means that the return value of the corresponding method can be cached and multiple invocation of this method with same arguments must populate the return value from cache without executing the method. Code snippet added below. @Cacheable("customer") public Customer findCustomer(String custId) {...} Each time the method is called, the cache with name ‘customer’ is checked to see whether the invocation has been already executed for the same ‘custId’. @CachePut - This annotation is used to update the cache without interfering the method execution. Method will always be executed and its result will be replaced in the cache. This annoatation is used for cache population. @CachePut("customer") public Customer updateBook(String custId){...} @CacheEvict – This annotation is used for cache eviction, used to remove stale or unused data from the cache. @CacheEvict(value="customer", allEntries=true) public void unloadCustomer(String newBatch) With above configurations, all entries in the ‘customer’ cache will be removed. Above description will give head start to configure Ehcachewith spring and use some of the caching annotations.
January 21, 2015
by Roshan Thomas
· 12,226 Views
article thumbnail
Lambda Architecture for Big Data
An increasing number of systems are being built to handle the Volume, Velocity and Variety of Big Data, and hopefully help gain new insights and make better business decisions. Here, we will look at ways to deal with Big Data’s Volume and Velocity simultaneously, within a single architecture solution. Volume + Velocity Apache Hadoop provides both reliable storage (HDFS) and a processing system (MapReduce) for large data sets across clusters of computers. MapReduce is a batch query processor that is targeted at long-running background processes. Hadoop can handle Volume. But to handle Velocity, we need real-time processing tools that can compensate for the high-latency of batch systems, and serve the most recent data continuously, as new data arrives and older data is progressively integrated into the batch framework. Therefore we need both batch and real-time to run in parallel, and add a real-time computational system (e.g. Apache Storm) to our batch framework. This architectural combination of batch and real-time computation is referred to as a Lambda Architecture (λ). Generic Lambda λ has three layers: The Batch Layer manages the master data and precomputes the batch views The Speed Layer serves recent data only and increments the real-time views The Serving Layer is responsible for indexing and exposing the views so that they can be queried. The three layers are outlined in the below diagram along with a sample choice of technology stacks: Incoming data is dispatched to both Batch and Speed layers for processing. At the other end, queries are answered by merging both batch and real-time views. Note that real-time views are transient by nature and their data is discarded (making room for newer data) once propagated through the Batch and Serving layers. Most of the complexity is pushed onto the much smaller Speed layer where the results are only temporary, a process known as “complexity isolation“. We are indeed isolating the complexity of concurrent data updates in a layer that is regularly purged and kept small in size. λ is technology agnostic. The data pipeline is broken down into layers with clear demarcation of responsibilities, and at each layer, we can choose from a number of technologies. The Speed layer for instance could use either Apache Storm, or Apache Spark Streaming, or Spring “XD” ( eXtreme Data) etc. How do we recover from mistakes in λ ? Basically, we recompute the views. If that takes too long, we just revert to the previous, non-corrupted versions of our data. We can do that because of data immutability in the master dataset: data is never updated, only appended to (time-based ordering). The system is therefore Human Fault-Tolerant: if we write bad data, we can just remove that data altogether and recompute. Unified Lambda The downside of λ is its inherent complexity. Keeping in sync two already complex distributed systems is quite an implementation and maintenance challenge. People have started to look for simpler alternatives that would bring just about the same benefits and handle the full problem set. There are basically three approaches: 1) Adopt a pure streaming approach, and use a flexible framework such as Apache Samza to provide some type of batch processing. Although its distributed streaming layer is pluggable, Samza typically relies on Apache Kafka. Samza’s streams are replayable, ordered partitions. Samza can be configured for batching, i.e. consume several messages from the same stream partition in sequence. 2) Take the opposite approach, and choose a flexible Batch framework that would also allow micro-batches, small enough to be close to real-time, with Apache Spark/Spark Streaming or Storm’s Trident. Spark streaming is essentially a sequence of small batch processes that can reach latency as low as one second.Trident is a high-level abstraction on top of Storm that can process streams as small batches as well as do batch aggregation. 3) Use a technology stack already combining batch and real-time, such as Spring “XD”, Summingbird or Lambdoop. Summingbird (“Streaming MapReduce”) is a hybrid system where both batch/real-time workflows can be run at the same time and the results merged automatically.The Speed layer runs on Storm and the Batch layer on Hadoop, Lambdoop (Lambda-Hadoop, with HBase, Storm and Redis) also combines batch/real-time by offering a single API for both processing paradigms: The integrated approach (unified λ) seeks to handle Big Data’s Volume and Velocity by featuring a hybrid computation model, where both batch and real-time data processing are combined transparently. And with a unified framework, there would be only one system to learn, and one system to maintain.
January 17, 2015
by Tony Siciliani
· 40,451 Views · 6 Likes
article thumbnail
Structurizr: System Context Diagram as Code
as i said in resolving the conflict between software architecture and code , my focus for this year is representing a software architecture model as code. in simple sketches for diagramming your software architecture , i showed an example system context diagram for my techtribes.je website. it's a simple diagram that shows techtribes.je in the middle, surrounded by the key types of users and system dependencies. it's your typical "big picture" view. this diagram was created using omnigraffle (think microsoft visio for mac os x) and it's exactly that - a static diagram that needs to be manually kept up to date. instead, wouldn't it be great if this diagram was based upon a model that we could better version control, collaborate on and visualize? if you're not sure what i mean by a "model", take a look at models, sketches and everything in between . this is basically what the aim of structurizr is. it's a way to describe a software architecture model as code, and then visualize it in a simple way. the structurizr java library is available on github and you can download a prebuilt binary . just as a warning, this is very much a work in progress and so don't be surprised if things change! here's some java code to recreate the techtribes.je system context diagram. package com.structurizr.example; import com.structurizr.io.json.jsonwriter; import com.structurizr.model.location; import com.structurizr.model.model; import com.structurizr.model.person; import com.structurizr.model.softwaresystem; import com.structurizr.view.systemcontextview; import com.structurizr.view.viewset; import java.io.stringwriter; /** * this is a model of the system context for the techtribes.je system, * the code for which can be found at https://github.com/techtribesje/techtribesje */ public class techtribessystemcontext { public static void main(string[] args) throws exception { // create a model and the software system we want to describe model model = new model("techtribes.je", "this is a model of the system context for the techtribes.je system, the code for which can be found at https://github.com/techtribesje/techtribesje"); softwaresystem techtribes = model.addsoftwaresystem(location.internal, "techtribes.je", "techtribes.je is the only way to keep up to date with the it, tech and digital sector in jersey and guernsey, channel islands"); // create the various types of people (roles) that use the software system person anonymoususer = model.addperson(location.external, "anonymous user", "anybody on the web."); anonymoususer.uses(techtribes, "view people, tribes (businesses, communities and interest groups), content, events, jobs, etc from the local tech, digital and it sector."); person authenticateduser = model.addperson(location.external, "aggregated user", "a user or business with content that is aggregated into the website."); authenticateduser.uses(techtribes, "manage user profile and tribe membership."); person adminuser = model.addperson(location.external, "administration user", "a system administration user."); adminuser.uses(techtribes, "add people, add tribes and manage tribe membership."); // create the various software systems that techtribes.je has a dependency on softwaresystem twitter = model.addsoftwaresystem(location.external, "twitter", "twitter.com"); techtribes.uses(twitter, "gets profile information and tweets from."); softwaresystem github = model.addsoftwaresystem(location.external, "github", "github.com"); techtribes.uses(github, "gets information about public code repositories from."); softwaresystem blogs = model.addsoftwaresystem(location.external, "blogs", "rss and atom feeds"); techtribes.uses(blogs, "gets content using rss and atom feeds from."); // now create the system context view based upon the model viewset viewset = new viewset(model); systemcontextview contextview = viewset.createcontextview(techtribes); contextview.addallsoftwaresystems(); contextview.addallpeople(); // and output the model and view to json (so that we can render it using structurizr.com) jsonwriter jsonwriter = new jsonwriter(true); stringwriter stringwriter = new stringwriter(); jsonwriter.write(viewset, stringwriter); system.out.println(stringwriter.tostring()); } } executing this code creates this json , which you can then copy and paste into the try it page of structurizr. the result (if you move the boxes around) is something like this. don't worry, there will eventually be an api for uploading software architecture models and the diagrams will get some styling, but it proves the concept. what we have then is an api that implements the various levels in my c4 software architecture model, with a simple browser-based rendering tool. hopefully that's a nice simple introduction of how to represent a software architecture model as code, and gives you a flavour for the sort of direction i'm taking it. having the software architecture as code provides some interesting opportunities that you don't get with static diagrams from visio, etc and the ability to keep the models up to date automatically by scanning the codebase is what i find particularly exciting. if you have any thoughts on this, please do drop me a note.
January 16, 2015
by Simon Brown
· 6,599 Views · 4 Likes
article thumbnail
AngularJS Two-Way Data Binding
Traditional web development builds a bridge between the front end, where the user performs their manipulations of the application’s data, and the back end, where that data is stored. In traditional web development, this process is driven by successive networking calls, communicating changes between the server and the client via re-rendering the involved pages. AngularJS enhances this with two-way data binding. Below we’ll look at what two-way data binding is, and how it differs from the traditional data processing approach. The Traditional Approach Most web frameworks focus on one-way data binding. This involves reading the input from the DOM, serializing the data, sending it to the server, waiting for the process to finish, then modifying the DOM to indicate any errors, or reloading the DOM if the call is successful. While this provides a traditional web application all the time it needs to perform data processing, this benefit is only really applicable to web apps with highly complex data structures. If your application has a simpler data format, with relatively flat models, then the extra work can needlessly complicate the process. Furthermore, all models need to wait for server confirmation before their data can be updated, meaning that related data depending upon those models won’t have the latest information. Tying Together the UI and the Model AngularJS addresses this with two-way data binding. With two-way data binding, the user interface changes are immediately reflected in the underlying data model, and vice-versa. This allows the data model to serve as an atomic unit that the view of the application can always depend upon to be accurate. Many web frameworks implement this type of data binding with a complex series of event listeners and event handlers – an approach that can quickly become fragile. AngularJS, on the other hand, makes this approach to data a primary part of its architecture. Instead of creating a series of callbacks to handle the changing data, AngularJS does this automatically without any needed intervention by the programmer Benefits and Considerations The primary benefit of two-way data binding is that updates to (and retrievals from) the underlying data store happen more or less automatically. When the data store updates, the UI updates as well. This allows you to remove a lot of logic from the front-end display code, particularly when making effective use of AngularJS’s declarative approach to UI presentation. In essence, it allows for true data encapsulation on the front-end, reducing the need to do complex and destructive manipulation of the DOM. While this solves a lot of problems with a website’s presentation architecture, there are some disadvantages to take into consideration. First, AngularJS uses a dirty-checking approach that can be slow in some browsers – not a problem for thin presentation pages, but any page with heavy processing may run into problems in older browsers. Additionally, two-way binding is only truly beneficial for relatively simple objects. Any data that requires heavy parsing work, or extensive manipulation and processing, will simply not work well with two-way binding. Additionally, some uses of Angular – such as using the same binding directive more than once – can break the data binding process. Conclusion While the traditional approach to data binding has a lot of benefits when it comes to performing complex data manipulations and calculations, it can introduce some problems with respect to the design of the web application’s front-end architecture. With AngularJS’s use of two-way data binding, your application can greatly simplify its presentation layer, allowing the UI to be built off of a cleaner, less-destructive approach to DOM presentation. While it isn’t useful in every situation, the two-way data binding AngularJS provides can greatly ease web application development, and reduce the pain faced by your front-end developers. Build your Angular app and connect it to any database with Backand today. – Get started now.
January 13, 2015
by Itay Herskovits
· 6,702 Views
article thumbnail
Byte Buddy, an alternative to cglib and Javassist
Byte Buddy is a code generation library for creating Java classes during the runtime of a Java application and without the help of a compiler. Other than the code generation utilities that ship with the Java Class Library, Byte Buddy allows the creation of arbitrary classes and is not limited to implementing interfaces for the creation of runtime proxies. In order to use Byte Buddy, one does not require an understanding of Java byte code or the class file format. In contrast, Byte Buddy’s API aims for code that is concise and easy to understand for everybody. Nevertheless, Byte Buddy remains fully customizable down to the possibility of defining custom byte code. Furthermore, the API was designed to be as non-intrusive as possible and as a result, Byte Buddy does not leave any trace in the classes that were created by it. For this reason, the generated classes can exist without requiring Byte Buddy on the class path. Because of this feature, Byte Buddy’s mascot was chosen to be a ghost. Byte Buddy is written in Java 6 but supports the generation of classes for any Java version. Byte Buddy is a light-weight library and only depends on the visitor API of the Java byte code parser library ASM which does itself not require any further dependencies. At first sight, runtime code generation can appear to be some sort of black magic that should be avoided and only few developers write applications that explicitly generate code during their runtime. However, this picture changes when creating libraries that need to interact with arbitrary code and types that are unknown at compile time. In this context, a library implementer must often choose between either requiring a user to implement library-proprietary interfaces or to generate code at runtime when the user’s types becomes first known to the library. Many known libraries such as for example Spring or Hibernate choose the latter approach which is popular among their users under the term of using Plain Old Java Objects. As a result, code generation has become an ubiquitous concept in the Java space. Byte Buddy is an attempt to innovate the runtime creation of Java types in order to provide a better tool set to those relying on code generation. Hello World Saying Hello World with Byte Buddy is as easy as it can get. Any creation of a Java class starts with an instance of the ByteBuddy class which represents a configuration for creating new types: Class dynamicType = new ByteBuddy() .subclass(Object.class) .method(named("toString")).intercept(FixedValue.value("Hello World!")) .make() .load(getClass().getClassLoader(), ClassLoadingStrategy.Default.WRAPPER) .getLoaded(); assertThat(dynamicType.newInstance().toString(), is("Hello World!")); The default ByteBuddy configuration which is used in the above example creatse a Java class in the newest version of the class file format that is understood by the processing Java virtual machine. As hopefully obvious from the example code, the created type will extend the Object class and intercept its toString method which should return a fixed value of Hello World!. The method to be intercepted is identified by a so-called method matcher. In the above example, a predefined method matcher named(String) is used which identifies a method by its exact name. Byte Buddy comes with numerous predefined and well-tested method matchers which are collected in the MethodMatchers class. The creation of custom matchers is however as simple as implementing the (functional) MethodMatcher interface. For implementing the toString method, the FixedValue class defines a constant return value for the intercepted method. Defining a constant value is only one example of many method interceptors that ship with Byte Buddy. By implementing the Instrumentation interface, a method could however even be defined by custom byte code. Finally, the described Java class is created and then loaded into the Java virtual machine. For this purpose, a target class loader is required as well as a class loading strategy where we choose a wrapper strategy. The latter creates a new child class loader which wraps the given class loader and only knows about the newly created dynamic type. Eventually, we can convince ourselves of the result by calling the toString method on an instance of the created class and finding the return value to represent the constant value we expected. Where to go from here? Byte Buddy is a comprehensive library and we only scratched the surface of Byte Buddy's capabilities. However, Byte Buddy aims for being easy to use by providing a domain-specific language for creating classes. Most runtime code generation can be done by writing readable code and without any knowledge of Java's class file format. If you want to learn more about Byte Buddy, you can find such a tutorial on Byte Buddy's web page. Furthermore, Byte Buddy comes with a detailed in-code documentation and extensive test case coverage which can also serve as example code. you can also find the source code of Byte Buddy on GitHub .
January 11, 2015
by Rafael Winterhalter
· 10,680 Views · 6 Likes
article thumbnail
Need Micro Caching? Memoization to the Rescue
Caching solves wide sort of performance problems. There are many ways to integrate caching into our applications. For example when we use Spring there is easy to use @Cacheable support. Quite easy but we still have to configure cache manager, cache regions, etc. Sometimes it's unfortunately like taking a sledgehammer to crack a nut. So what can we do to "go lighter"? There is a technique called memoization. Technically it's as easy as pie but true genius lies in simplicity. Model solution looks as follows: public Foo getValue() { if (storedValue == null) { storedValue = retrieveFoo(); } return storedValue; } As you can see there is no problem in implementing it manually, but as long as we remember about DRY rule we can use already implemented solutions which additionally provides thread safety. Pretty good idea is to use Guava library. // create Supplier memoizer = Suppliers.memoize(this::retrieveFoo); // and use Foo variable = memoizer.get(); Sometimes it's enough but what can we do if we need to specify TTL for our value? We have to store (cache) retrieved value only for few seconds and after exceeding defined duration get this value one more time? One more time we can use functionality provided by Guava. Supplier memoizer = Suppliers.memoizeWithExpiration(this::retrieveFoo, 5, TimeUnit.SECONDS); The above line builds memoizer with TTL = 5 seconds. As you can see - simple... but powerful :)
January 6, 2015
by Jakub Kubrynski
· 17,764 Views
  • Previous
  • ...
  • 409
  • 410
  • 411
  • 412
  • 413
  • 414
  • 415
  • 416
  • 417
  • 418
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×