DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Databases Topics

article thumbnail
To Shard, or Not to Shard
When I talk with customers about sharding decisions I often start by telling the following true story… A couple of years ago, a customer came to me looking for advice on how to shard his system. He told me he was already convinced he needed to do that since he read that some smart people at MySQL giants like Facebook and Twitter were sharding—so naturally this was something he should be doing, too. I paused for a moment and then I asked him what the size of his database was. “10GB,” he said. I nodded and asked if he handles many queries or if they were very complicated. “No,” he said. “Just a few hundred queries per second, and they have not been loading down the system by more than a few percent.” I asked him whether he was expecting exponential growth in the near future—looking to double every week or something like that. “No, our load and data size grew about 7 percent last year and we expect about the same growth this year and for the foreseeable future.” My recommendation to him was not to waste time and effort on sharding because it is just not needed in his company’s case. Before you decide how to shard, you’d best understand whether or not you really need to shard to begin with. Yes, on the extremely large-scale side of database demands, sharding is the only game in town. And not just for MySQL, but for pretty much any technology out there. Yet thanks to emerging technologies there is an increasing amount of applications that can run databases without sharding. Today we can easily run with terabytes of data per MySQL instance and serve tens of thousands of queries in many OLTP environments. This allows organizations to build very large applications without needing to shard. And keep this in mind: Sharding is a pain under all circumstances. Even if you have sharding provided out of the box by the database system, it is a pain because it introduces more components and complexity. Creating good distributed query execution plans is a very complicated task that needs to take network topology and load into account in addition to the data distribution and load of individual nodes. Before you decide if you need to shard, you should look at alternatives to scale your application. In the MySQL world, the solutions are typically as follows: Alternatives to Sharding Functional Partitioning: In many environments a single MySQL instance becomes a dumping ground for all kinds of databases—you might end up having your main application share a database instance with Drupal, which powers your website, with WordPress, which powers your blog, and with vBulletin, which powers your forums. Splitting those pieces into different database instances is something you should consider before you look into sharding. Custom-made systems will often have many applications using different data sets that can be easily split out. Replication: Many applications are read-heavy, so scaling reads becomes the issue earlier than it does with scaling writes. Replication is a great solution for this. MySQL’s built-in replication is very robust, though due to its asynchronous nature it adds complexity to the application. The developer must decide which of the reads can be done from the replica servers and which can’t, because you must be absolutely certain that you’re reading the most recent, actual data. This is the reason that alternative, synchronous replication technologies for MySQL like Percona XtraDB Cluster, are gaining popularity: They provide single database-like behavior from the cluster in most cases. Caching and Queueing: Caching is a great technology for reducing the amount of reads that hit the database. There are many applications that have reduced read load on the database by 80-95% using this technology. Queueing, in contrast, optimizes writes. It does this by merging multiple write operations together so they hit the database efficiently. Most large-scale applications should rely heavily on both of these technologies. Memcached and Redis are two popular caching technologies in the MySQL space. For queueing, the most popular technologies are ActiveMQ and RabbitMQ [1]. Supplemental Technologies: MySQL is great at many things but not at everything. If you’re looking for high-performance full-text search, consider ElasticSearch, Sphinx, or Lucene. If you’re looking at large-scale data analytics, a Hadoop-based infrastructure or Vertica might work well for you. You should let MySQL handle the things it is good at, and leave the rest to supporting tools. Optimizations to Make Before Sharding Scaling isn’t just about architecture either. You also need to make sure your system is reasonably optimized. Many people decide sharding is inevitable for them even though there are much easier and more cost-effective ways to get the performance and scale they are looking for. All of which, I might add, are also going to be valuable if sharding is indeed eventually needed. Hardware: Are you using the right hardware? I’ve seen many people looking into sharding when in fact simply purchasing decent hardware would solve their problems for years to come. Make sure you have plenty of memory and high-performance flash storage if you’re working with a large database. In many cases it can transform your system so much it will look like magic. MySQL version and Configuration: Use a recent MySQL version. By that I mean the latest GA version (MySQL 5.6 at the time of this article’s publication). Percona Server, which is free, often offers additional performance improvements for demanding workloads. Use the most recent operating system too, especially if you’re using modern hardware. Finally, make sure MySQL is configured properly. The difference in MySQL performance between poorly configured MySQL and well-tuned MySQL can be 10x or more. Schema and Queries: The same application logic can be expressed using a variety of schema and queries. I’ve seen a lot of similar applications approaching things differently, and the difference in the performance between an optimal approach and a poor one (but still used in production) can be 100x or more. Many of the changes can be retrofitted to existing schema—such as minor query changes and changes to the index structure—however, if your schema doesn’t fit your application needs well, then you might be looking at a complete redesign. So it is a good idea to think things through early. When to Shard So when should you start thinking about sharding? Basically, if none of the measures listed above have given you the performance you need, it might be time to consider sharding. Sharding does have the advantage of allowing you to potentially use lower-cost hardware or cheaper cloud instances. Most developers are using agile development methods these days and there is a common term, “Architectural Runway,” which defines how far the application can go with its current architecture. If you’ve already found success using replication in particular, it might be a bad decision to add sharding because it will force your developers to deal with the complexity of sharding and asynchronous replication. However, replication is still typically used to achieve high availability even if you’re already sharding, but in this case it’s not for scaling reads. If you’ve come to the point where you’re sure you need to shard, here are some of the questions you need to ask about how you’ll implement your sharding strategy: Shard Level: At which level should we shard? It does not have to be at the database level. Many applications, SaaS in particular, often “shard” on higher levels, deploying multiple copies of their full stack to offer complete isolation for availability, performance, security etc. In many large scale applications you will see multiple copies of a full stack deployed, each having its own sharded MySQL environment. Shard Key: How do we shard? In many cases the choice depends on whether you’re authenticating for user accounts or your organization, but in other cases it is not so obvious. When making a sharding choice, you need to think about two things: 1) as many data access points as possible should go into a single shard, because cross-shard access is expensive if supported at all, and 2) making sure such sharding does not produce a shard that is too large to handle either in terms of data size or traffic. For example, sharding by country is a poor idea because the requirements to handle Belgium traffic won’t be the same for the United States or China, which require a lot more resources. Shard by Schema or Instance: What is the unit of your shard? The typical choices are MySQL instance or database (schema). I like the shard = database approach, which doesn’t limit you to a single MySQL instance per physical box. That way you do not have to run too many MySQL instances, but you can run more than one if the application works better that way. Shard Unit: If you shard by a single MySQL server, you will run into a problem with high availability very soon. When you have 100 MySQL servers there are roughly 100 more chances for one of them to crash compared with having only one, so ensuring there is a high availability solution becomes critical. Instead of sharing across MySQL servers you will usually be sharding across “Replication Clusters,” such as one MySQL primary node and one or several replica or PXC (Percona XtraDB Cluster) nodes. Shard Technology: What technology can you use to assist you with sharding? Within the MySQL world there is no standard sharding technology as of yet that everyone uses. Most of the large web properties have implemented something in-house for their sharding needs, and some have released their solutions as open source projects. One example is Vitess, contributed by Google, and another is JetPants, contributed by Tumblr. Rolling out your own simple sharding framework might look easy for some developers until you have to deal with operational issues like balancing the shards, resharding, etc., on a large scale. There are a number of purpose-built technologies that can help you with sharding if this doesn’t sound like something your team can manage. Sharding Technologies Here are technologies that you should consider: MySQL Fabric: This is the sharding technology being developed by the MySQL team at Oracle. MySQL Fabric is GA, but its functionality right now is rather limited, especially in terms of their support for multi-sharded queries. Given more time however, it has the potential to become the standard sharding technology for MySQL. Tesora: Tesora has a proxy-based solution for MySQL sharding that became open source some time ago. I would be especially looking at Tesora if you’re also looking at deploying OpenStack, as they’ve invested a lot into the integration. ScaleArc: ScaleArc is a commercial database proxy solution that can do caching, filtering, routing, and sharding. It is a pretty mature solution that handles multiple database technologies and not just MySQL. ScaleBase: ScaleBase is a sharding solution designed specifically for MySQL and the cloud, which similarly to MySQL, operates at the proxy level. There are many technologies in the MySQL space that can help you scale your application without sharding. If you’re going to build the next “Facebook,” however, you will surely need to shard, and there are a number of technologies that can help you do it as painlessly as possible. Large-scale applications on large-scale databases will always introduce complexity, which makes them more complicated to develop against and manage. Success comes with cost. [1] http://dzone.com/research/guide-to-enterprise-integration Peter Zaitsev co-founded Percona in 2006, assuming the role of CEO. Percona helps companies of all sizes maximize their success with MySQL. Peter enjoys mixing business leadership with hands on technical expertise. Peter is also the co-author of O’Reilly’s High Performance MySQL, one of the most popular books on MySQL performance.
April 2, 2015
by Peter Zaitsev
· 20,875 Views · 1 Like
article thumbnail
Dismantling invokedynamic
Many Java developers regarded the JDK's version seven release as somewhat a disappointment. On the surface, merely a few language and library extensions made it into the release, namely Project Coin and NIO2. But under the covers, the seventh version of the platform shipped the single biggest extension to the JVM's type system ever introduced after its initial release. Adding the invokedynamic instruction did not only lay the foundation for implementing lambda expressions in Java 8, it also was a game changer for translating dynamic languages into the Java byte code format. While the invokedynamic instruction is an implementation detail for executing a language on the Java virtual machine, understanding the functioning of this instruction gives true insights into the inner workings of executing a Java program. This article gives a beginner's view on what problem the invokedynamic instruction solves and how it solves it. Method handles Method handles are often described as a retrofitted version of Java's reflection API, but this is not what they are meant to represent. While method handles can represent a method, constructor or field, they are not intended to describe properties of these class members. It is for example not possible to directly extract metadata from a method handle such as modifiers or annotation values of the represented method. And while method handles allow for the invocation of a referenced method, their main purpose is to be used together with an invokedynamic call site. For gaining a better understanding of method handles, looking at them as an imperfect replacement for the reflection API is however a reasonable starting point. Method handles cannot be instantiated. Instead, method handles are created by using a designated lookup object. These objects are themselves created by using a factory method that is provided by the MethodHandles class. Whenever this factory is invoked, it first creates a security context which ensures that the resulting lookup object can only locate methods that are also visible to the class from which the factory method was invoked. A lookup object can then be created as follows: class Example { void doSomething() { MethodHandles.Lookup lookup = MethodHandles.lookup(); } private void foo() { /* ... */ } } As argued before, the above lookup object could only be used to locate methods that are also visible to the Example class such asfoo. It would for example be impossible to look up a private method of another class. This is a first major difference to using the reflection API where private methods of outside classes can be located just as any other method and where these methods can even be invoked after marking such a method as accessible. Method handles are therefore sensible of their creation context which is a first major difference to the reflection API. Apart from that, a method handle is more specific than the reflection API by describing a specific type of method rather than representing just any method. In a Java program, a method's type is a composite of both the method's return type and the types of its parameters. For example, the only method of the following Counter class returns an int representing the number of characters of the only String-typed argument: class Counter { static int count(String name) { return name.length(); } } A representation of this method's type can be created by using another factory. This factory is found in the MethodType class which also represents instances of created method types. Using this factory, the method type for Counter::count can be created by handing over the method's return type and its parameter types bundled as an array: MethodType methodType = MethodType.methodType(int.class, new Class[] {String.class}); By using the lookup object that was created before and the above method type, it is now possible to locate a method handle that represents the Counter::count method as depicted in the following code: MethodType methodType = MethodType.methodType(int.class, new Class[] {String.class}); MethodHandles.Lookup lookup = MethodHandles.lookup(); MethodHandle methodHandle = lookup.findStatic(Counter.class, "count", methodType); int count = methodHandle.invokeExact("foo"); assertThat(count, is(3)); At first glance, using a method handle might seem like an overly complex version of using the reflection API. However, keep in mind that the direct invocation of a method using a handle is not the main intent of its use. The main difference of the above example code and of invoking a method via the reflection API is only revealed when looking into the differences of how the Java compiler translates both invocations into Java byte code. When a Java program invokes a method, this method is uniquely identified by its name and by its (non-generic) parameter types and even by its return type. It is for this reason that it is possible to overload methods in Java. And even though the Java programming language does not allow it, the JVM does in theory allow to overload a method by its return type. Following this principle, a reflective method call is executed as a common method call of the Method::invoke method. This method is identified by its two parameters which are of the types Object and Object[]. In addition to this, the method is identified by its Object return type. Because of this signature, all arguments to this method need to always be boxed and enclosed in an array. Similarly, the return value needs to be boxed if it was primitive or null is returned if the method was void. Method handles are the exception to this rule. Instead of invoking a method handle by referring to the signature ofMethodHandle::invokeExact signature which takes an Object[] as its single argument and returns Object, method handles are invoked by using a so-called polymorphic signature. A polymorphic signature is created by the Java compiler dependant on the types of the actual arguments and the expected return type at a call site. For example, when invoking the method handle as above with int count = methodHandle.invokeExact("foo"); the Java compiler translates this invocation as if the invokeExact method was defined to accept a single single argument of typeString and returning an int type. Obviously, such a method does not exist and for (almost) any other method, this would result in a linkage error at runtime. For method handles, the Java Virtual Machine does however recognize this signature to be polymorphic and treats the invocation of the method handle as if the Counter::count method that the handle refers to was inset directly into the call site. Thus, the method can be invoked without the overhead of boxing primitive values or the return type and without placing the argument values inside an array. At the same time, when using the invokeExact invocation, it is guaranteed to the Java virtual machine that the method handle always references a method at runtime that is compatible to the polymorphic signature. For the example, the JVM expected that the referenced method actually accepts a String as its only argument and that it returns a primitive int. If this constraint was not fulfilled, the execution would instead result in a runtime error. However, any other method that accepts a single String and that returns a primitive int could be successfully filled into the method handle's call site to replace Counter::count. In contrast, using the Counter::count method handle at the following three invocations would result in runtime errors, even though the code compiles successfully: int count1 = methodHandle.invokeExact((Object) "foo"); int count2 = (Integer) methodHandle.invokeExact("foo"); methodHandle.invokeExact("foo"); The first statement results in an error because the argument that is handed to the handle is too general. While the JVM expected a String as an argument to the method, the Java compiler suggested that the argument would be an Object type. It is important to understand that the Java compiler took the casting as a hint for creating a different polymorphic signature with anObject type as a single parameter type while the JVM expected a String at runtime. Note that this restriction also holds for handing too specific arguments, for example when casting an argument to an Integer where the method handle required aNumber type as its argument. In the second statement, the Java compiler suggested to the runtime that the handle's method would return an Integer wrapper type instead of the primitive int. And without suggesting a return type at all in the third statement, the Java compiler implicitly translated the invocation into a void method call. Hence, invokeExact really does mean exact. This restriction can sometimes be too harsh. For this reason, instead of requiring an exact invocation, the method handle also allows for a more forgiving invocation where conversions such as type castings and boxings are applied. This sort of invocation can be applied by using the MethodHandle::invoke method. Using this method, the Java compiler still creates a polymorphic signature. This time, the Java virtual machine does however test the actual arguments and the return type for compatibility at run time and converts them by applying boxings or castings, if appropriate. Obviously, these transformations can sometimes add a runtime overhead. Fields, methods and constructors: handles as a unified interface Other than Method instances of the reflection API, method handles can equally reference fields or constructors. The name of theMethodHandle type could therefore be seen as too narrow. Effectively, it does not matter what class member is referenced via a method handle at runtime as long as its MethodType, another type with a misleading name, matches the arguments that are passed at the associated call site. Using the appropriate factories of a MethodHandles.Lookup object, a field can be looked up to represent a getter or a setter. Using getters or setters in this context does not refer to invoking an actual method that follows the Java bean specification. Instead, the field-based method handle directly reads from or writes to the field but in shape of a method call via invoking the method handle. By representing such field access via method handles, field access or method invocations can be used interchangeably. As an example for such interchange, take the following class: class Bean { String value; void print(String x) { System.out.println(x); } } For the above Bean class, the following method handles can be used for either writing a string to the value field or for invoking the print method with the same string as an argument: MethodHandle fieldHandle = lookup.findSetter(Bean.class, "value", String.class); MethodType methodType = MethodType.methodType(void.class, new Class[] {String.class}); MethodHandle methodHandle = lookup.findVirtual(Bean.class, "print", methodType); As long as the method handle call site is handed an instance of Bean together with a String while returning void, both method handles could be used interchangeably as shown here: anyHandle.invokeExact((Bean) mybean, (String) myString); Note that the polymorphic signature of the above call site does not match the method type of the above handle. However, within Java byte code, non-static methods are invoked as if they were static methods with where the this reference is handed as a first, implicit argument. A non-static method's nominal type does therefore diverge from its actual runtime type. Similarly, access to a non-static field requires an instance to be access. Similarly to fields and methods, it is possible to locate and invoke constructors which are considered as methods with a voidreturn value for their nominal type. Furthermore, one can not only invoke a method directly but even invoke a super method as long as this super method is reachable for the class from which the lookup factory was created. In contrast, invoking a super method is not possible at all when relying on the reflection API. If required, it is even possible to return a constant value from a handle. Performance metrics Method handles are often described as being a more performant as the Java reflection API. At least for recent releases of the HotSpot virtual machine, this is not true. The simplest way of proving this is writing an appropriate benchmark. Then again, is not all too simple to write a benchmark for a Java program which is optimized while it is executed. The de facto standard for writing a benchmark has become using JMH, a harness that ships under the OpenJDK umbrella. The full benchmark can be found as a gist in my GitHub profile. In this article, only the most important aspects of this benchmark are covered. From the benchmark, it becomes obvious that reflection is already implemented quite efficiently. Modern JVMs know a concept named inflation where a frequently invoked reflective method call is replaced with runtime generated Java byte code. What remains is the overhead of applying the boxing for passing arguments and receiving a return values. These boxings can sometimes be eliminated by the JVM's Just-in-time compiler but this is not always possible. For this reason, using method handles can be more performant than using the reflection API if method calls involve a significant amount of primitive values. This does however require that the exact method signatures are already known at compile time such that the appropriate polymorphic signature can be created. For most use cases of the reflection API, this guarantee can however not be given because the invoked method's types are not known at compile time. In this case, using method handles does not offer any performance benefits and should not be used to replace it. Creating an invokedynamic call site Normally, invokedynamic call sites are created by the Java compiler only when it needs to translate a lambda expression into byte code. It is worthwhile to note that lambda expressions could have been implemented without invokedynamic call sites altogether, for example by converting them into anonymous inner classes. As a main difference to the suggested approach, using invokedynamic delays the creation of a similar class to runtime. We are looking into class creation in the next section. For now, bear however in mind that invokedynamic does not have anything to do with class creation, it only allows to delay the decision of how to dispatch a method until runtime. For a better understanding of invokedynamic call sites, it helps to create such call sites explicitly in order to look at the mechanic in isolation. To do so, the following example makes use of my code generation framework Byte Buddy which provides explicit byte code generation of invokedynamic call sites without requiring a any knowledge of the byte code format. Any invokedynamic call site eventually yields a MethodHandle that references the method to be invoked. Instead of invoking this method handle manually, it is however up to the Java runtime to do so. Because method handles have become a known concept to the Java virtual machine, these invocations are then optimized similarly to a common method call. Any such method handle is received from a so-called bootstrap method which is nothing more than a plain Java method that fulfills a specific signature. For a trivial example of a bootstrap method, look at the following code: class Bootstrapper { public static CallSite bootstrap(Object... args) throws Throwable { MethodType methodType = MethodType.methodType(int.class, new Class[] {String.class}) MethodHandles.Lookup lookup = MethodHandles.lookup(); MethodHandle methodHandle = lookup.findStatic(Counter.class, "count", methodType); return new ConstantCallSite(methodHandle); } } For now, we do not care much about the arguments of the method. Instead, notice that the method is static what is as a matter of fact a requirement. Within Java byte code, an invokedynamic call site references the full signature of a bootstrap method but not a specific object which could have a state and a life cycle. Once the invokedynamic call site is invoked, control flow is handed to the referenced bootstrap method which is now responsible for identifying a method handle. Once this method handle is returned from the bootstrap method, it is invoked by the Java runtime. As obvious from the above example, a MethodHandle is not returned directly from a bootstrap method. Instead, the handle is wrapped inside of a CallSite object. Whenever a bootstrap method is invoked, the invokedynamic call site is later permanently bound to the CallSite object that is returned from this method. Consequently, a bootstrap method is only invoked a single time for any call site. Thanks to this intermediate CallSite object, it is however possible to exchange the referenced MethodHandle at a later point. For this purpose, the Java class library already offers different implementations of CallSite. We have already seen a ConstantCallSite in the example code above. As the name suggests, a ConstantCallSite always references the same method handle without a possibility of a later exchange. Alternatively, it is however also possible to for example use aMutableCallSite which allows to change the referenced MethodHandle at a later point in time or it is even possible to implement a custom CallSite class. With the above bootstrap method and Byte Buddy, we can now implement a custom invokedynamic instruction. For this, Byte Buddy offers the InvokeDynamic instrumentation that accepts a bootstrap method as its only mandatory argument. Such instrumentations are then fed to Byte Buddy. Assuming the following class: abstract class Example { abstract int method(); } we can use Byte Buddy to subclass Example in order to override method. We are then going to implement this method to contain a single invokedynamic call site. Without any further configuration, Byte Buddy creates a polymorphic signature that resembles the method type of the overridden method. However, for non-static methods, the this reference is set as a first, implicit argument. Assuming that we want to bind the Counter::count method which expects a String as a single argument, we could not bind this handle to Example::method because of this type mismatch. Therefore, we need to create a different call site without the implicit argument but with an String in its place. This can be achieved by using Byte Buddy's domain specific language: Instrumentation invokeDynamic = InvokeDynamic .bootstrap(Bootstrapper.class.getDeclaredMethod(“bootstrap”, Object[].class)) .withoutImplicitArguments() .withValue("foo"); With this instrumentation in place, we can finally extend the Example class and override method to implement the invokedynamic call site as in the following code snippet: Example example = new ByteBuddy() .subclass(Example.class) .method(named(“method”)).intercept(invokeDynamic) .make() .load(Example.class.getClassLoader(), ClassLoadingStrategy.Default.INJECTION) .getLoaded() .newInstance(); int result = example.method(); assertThat(result, is(3)); As obvious from the above assertion, the characters of the "foo" string were counted correctly. By setting appropriate break points in the code, it is further possible to validate that the bootstrap method is called and that control flow further reaches theCounter::count method. So far, we did not gain much from using an invokedynamic call site. The above bootstrap method would always bindCounter::count and can therefore only produce a valid result if the invokedynamic call site really wanted to transform a Stringinto an int. Obviously, bootstrap methods can however be more flexible thanks to the arguments they receive from the invokedynamic call site. Any bootstrap method receives at least three arguments: As a first argument, the bootstrap method receives a MethodHandles.Lookup object. The security context of this object is that of the class that contains the invokedynamic call site that triggered the bootstrapping. As discussed before, this implies that private methods of the defining class could be bound to the invokedynamic call site using this lookup instance. The second argument is a String representing a method name. This string serves as a hint to indicate from the call site which method should be bound to it. Strictly speaking, this argument is not required as it is perfectly legal to bind a method with another name. Byte Buddy simply serves the the name of the overridden method as this argument, if not specified differently. Finally, the MethodType of the method handle that is expected to be returned is served as a third argument. For the example above, we specified explicitly that we expect a String as a single parameter. At the same time, Byte Buddy derived that we require an int as a return value from looking at the overridden method, as we again did not specify any explicit return type. It is up to the implementor of a bootstrap method what exact signature this method should portray as long as it can at least accept these three arguments. If the last parameter of a bootstrap method represents an Object array, this last parameter is treated as a varargs and can therefore accept any excess arguments. This is also the reason why the above example bootstrap method is valid. Additionally, a bootstrap method can receive several arguments from an invokedynamic call site as long as these arguments can be stored in a class's constant pool. For any Java class, a constant pool stores values that are used inside of a class, largely numbers or string values. As of today, such constants can be primitive values of at least 32 bit size, Strings, Classes,MethodHandles and MethodTypes. This allows bootstrap methods to be used more flexible, if locating a suitable method handle requires additional information in form of such arguments. Lambda expressions Whenever the Java compiler translates a lambda expression into byte code, it copies the lambda's body into a private method inside of the class in which the expression is defined. These methods are named lambda$X$Y with X being the name of the method that contains the lambda expression and with Y being a zero-based sequence number. The parameters of such a method are those of the functional interface that the lambda expression implements. Given that the lambda expression makes no use of non-static fields or methods of the enclosing class, the method is also defined to be static. For compensation, the lambda expression is itself substituted by an invokedynamic call site. On its invocation, this call site requests the binding of a factory for an instance of the functional interface. As arguments to this factory, the call site supplies any values of the lambda expression's enclosing method which are used inside of the expression and a reference to the enclosing instance, if required. As a return type, the factory is required to provide an instance of the functional interface. For bootstrapping a call site, any invokedynamic instruction currently delegates to the LambdaMetafactory class which is included in the Java class library. This factory is then responsible for creating a class that implements the functional interface and which invokes the appropriate method that contains the lambda's body which, as described before, is stored in the original class. In the future, this bootstrapping process might however change which is one of the major advantages of using invokedynamic for implementing lambda expressions. If one day, a better suited language feature was available for implementing lambda expressions, the current implementation could simply be swapped out. In order to being able to create a class that implements the functional interface, any call site representing a lambda expression provides additional arguments to the bootstrap method. For the obligatory arguments, it already provides the name of the functional interface's method. Also, it provides a MethodType of the factory method that the bootstrapping is supposed to yield as a result. Additionally, the bootstrap method is supplied another MethodType that describes the signature of the functional interface's method. To that, it receives a MethodHandle referencing the method that contains the lambda's method body. Finally, the call site provides a MethodType of the generic signature of the functional interface's method, i.e. the signature of the method at the call site before type-erasure was applied. When invoked, the bootstrap method looks at these arguments and creates an appropriate implementation of a class that implements the functional interface. This class is created using the ASM library, a low-level byte code parser and writer that has become the de facto standard for direct Java byte code manipulation. Besides implementing the functional interface's method, the bootstrap method also adds an appropriate constructor and a static factory method for creating instances of the class. It is this factory method that is later bound to the invokedyanmic call site. As arguments, the factory receives an instance to the lambda method's enclosing instance, in case it is accessed and also any values that are read from the enclosing method. As an example, consider the following lambda expression: class Foo { int i; void bar(int j) { Consumer consumer = k -> System.out.println(i + j + k); } } In order to be executed, the lambda expression requires access to both the enclosing instance of Foo and to the value j of its enclosing method. Therefore, the desugared version of the above class looks something like the following where the invokedynamic instruction is represented by some pseudo-code: class Foo { int i; void bar(int j) { Consumer consumer = ; } private /* non-static */ void lambda$foo$0(int j, int k) { System.out.println(this.i + j + k); } } In order to being able to invoke lambda$foo$0, both the enclosing Foo instance and the j variable are handed to the factory that is bound by the invokedyanmic instruction. This factory then receives the variables it requires in order to create an instance of the generated class. This generated class would then look something like the following: class Foo$$Lambda$0 implements Consumer { private final Foo _this; private final int j; private Foo$$Lambda$0(Foo _this, int j) { this._this = _this; this.j = j; } private static Consumer get$Lambda(Foo _this, int j) { return new Foo$$Lambda$0(_this, j); } public void accept(Object value) { // type erasure _this.lambda$foo$0(_this, j, (Integer) value); } } Eventually, the factory method of the generated class is bound to the invokedynamic call site via a method handle that is contained by a ConstantCallSite. However, if the lambda expression is fully stateless, i.e. it does not require access to the instance or method in which it is enclosed, the LambdaMetafactory returns a so-called constant method handle that references an eagerly created instance of the generated class. Hence, this instance serves as a singleton to be used for every time that the lambda expression's call site is reached. Obviously, this optimization decision affects your application's memory footprint and is something to keep in mind when writing lambda expressions. Also, no factory method is added to a class of a stateless lambda expression. You might have noticed that the lambda expression's method body is contained in a private method which is now invoked from another class. Normally, this would result in an illegal access error. To overcome this limitation, the generated classes are loaded using so-called anonymous class loading. Anonymous class loading can only be applied when a class is loaded explicitly by handing a byte array. Also, it is not normally possible to apply anonymous class loading in user code as it is hidden away in the internal classes of the Java class library. When a class is loaded using anonymous class loading, it receives a host class of which it inherits its full security context. This involves both method and field access rights and the protection domain such that a lambda expression can also be generated for signed jar files. Using this approch, lambda expression can be considered more secure than anonymous inner classes because private methods are never reachable from outside of a class. Under the covers: lambda forms Lambda forms are an implementation detail of how MethodHandles are executed by the virtual machine. Because of their name, lambda forms are however often confused with lambda expressions. Instead, lambda forms are inspired by lambda calculus and received their name for that reason, not for their actual usage to implement lambda expressions in the OpenJDK. In earlier versions of the OpenJDK 7, method handles could be executed in one of two modes. Method handles were either directly rendered as byte code or they were dispatched using explicit assembly code that was supplied by the Java runtime. The byte code rendering was applied to any method handle that was considered to be fully constant throughout the lifetime of a Java class. If the JVM could however not prove this property, the method handle was instead executed by dispatching it to the supplied assembly code. Unfortunately, because assembly code cannot be optimized by Java's JIT-compiler, this lead to non-constant method handle invocations to "fall off the performance cliff". As this also affected the lazily bound lambda expressions, this was obviously not a satisfactory solution. LambdaForms were introduced to solve this problem. Roughly speaking, lambda forms represent byte code instructions which, as stated before, can be optimized by a JIT-compiler. In the OpenJDK, a MethodHandle's invocation semantics are today represented by a LambdaForm to which the handle carries a reference. With this optimizable intermediate representation, the use of non-constant MethodHandles has become significantly more performant. As a matter of fact, it is even possible to see a byte-code compiled LambdaForm in action. Simply place a break point inside of a bootstrap method or inside of a method that is invoked via a MethodHandle. Once the break point kicks it, the byte code-translated LambdaForms can be found on the call stack. Why this matters for dynamic languages Any language that should be executed on the Java virtual machine needs to be translated to Java byte code. And as the name suggests, Java byte code aligns rather close to the Java programming language. This includes the requirement to define a strict type for any value and before invokedynamic was introduced, a method call required to specify an explicit target class for dispatching a method. Looking at the following JavaScript code, specifying either information is however not possible when translating the method into byte code: 1 2 3 function (foo) { foo.bar(); } Using an invokedynamic call site, it has become possible to delay the identification of the method's dispatcher until runtime and furthermore, to rebind the invocation target, in case that a previous decision needs to be corrected. Before, using the reflection API with all of its performance drawbacks was the only real alternative to implementing a dynamic language. The real profiteer of the invokedynamic instruction are therefore dynamic programming languages. Adding the instruction was a first step away from aligning the byte code format to the Java programming language, making the JVM a powerful runtime even for dynamic languages. And as lambda expressions proved, this stronger focus on hosting dynamic languages on the JVM does neither interfere with evolving the Java language. In contrast, the Java programming languages gained from these efforts.
April 2, 2015
by Rafael Winterhalter
· 13,669 Views · 7 Likes
article thumbnail
Fluent Nhibernate: Create Table from Code(Class- CodeFirst)
With Fluent Nhibernate we don’t have to write and maintain mapping in xml. You can write classes to map your domain objects to your database tables. If you want to learn how we can do this with an existing database and Fluent Nhibernate, I have also written a blog post about CRUD operation with Fluent Nhibernate and ASP.NET MVC .After writing this blog post I was getting a lot of emails about how we can create database from the Fluent Nhibernate, as we can do the same with Entity Framework Code First. So I thought it was a good idea to write a blog post about it instead of writing individual emails. How to create tables from class via Fluent Nhibernate: To Demonstrate how we can create tables based mapping classes create we’re going to create a console application via adding new project like below. Once you are done with creating application. It’s time to add reference for Fluent Nhibernate. You can do this via your library package manager like following. Now once, done with adding Fluent Nhibernate code, I have created a simple class “Customer” like below. namespace FluentNhibernateCodeFirst { public class Customer { public virtual int CustomerId { get; set; } public virtual string FirstName { get; set; } public virtual string LastName { get; set; } } } Here you can see I have created three properties which will be also column of our database table. Now as we know we need to write a mapping class for customer so below is my customer map class. using FluentNHibernate.Mapping; namespace FluentNhibernateCodeFirst { public class CustomerMap : ClassMap { public CustomerMap() { Id(c => c.CustomerId); Map(c => c.FirstName); Map(c => c.LastName); } } } Here, I map ID Customer Id as customer Id will be primary key. Now it’s time to write code creating database and saving some data into customer table created. using System; using System.Configuration; using FluentNHibernate.Cfg; using FluentNHibernate.Cfg.Db; using NHibernate; using NHibernate.Tool.hbm2ddl; namespace FluentNhibernateCodeFirst { class Program { private static ISessionFactory _sessionFactory; static void Main(string[] args) { //creating database string connectionString = ConfigurationManager.ConnectionStrings["DefaultConnectionString"].ConnectionString; CreateDatabase(connectionString); Console.WriteLine("Database Created sucessfully"); //creating a object of customer Customer customer=new Customer { CustomerId = 1, FirstName = "Jalpesh", LastName = "Vadgama" }; //saving customer in database. using(ISession session = _sessionFactory.OpenSession()) session.Save(customer); Console.WriteLine("Customer Saved"); } static void CreateDatabase(string connectionString) { var configuration = Fluently.Configure() .Database(MsSqlConfiguration.MsSql2012.ConnectionString(connectionString).ShowSql) .Mappings(m => m.FluentMappings.AddFromAssemblyOf()) .BuildConfiguration(); var exporter = new SchemaExport(configuration); exporter.Execute(true, true, false); _sessionFactory = configuration.BuildSessionFactory(); } } } Here in the above code, If you above code care fully then, I have created function called CreateDatabase. In this function First it will create a mapping and then that schema mapping will be executed against database to create table. After creating table, I have initialize the customer object and saved it into database. Now when you run this application. You will get output like below as expected. And now if you see database in the SQL Management Studio, Customer table has created like below. And If you see that table, Data is also inserted like below. That’s it. Hope you like it. Stay tuned for more!. You can find complete sourcecode of this example at github on -https://github.com/dotnetjalps/FluentHinbernateCodeFirst
March 30, 2015
by Jalpesh Vadgama
· 21,646 Views · 3 Likes
article thumbnail
Spark and ZooKeeper: Fault-Tolerant Job Manager out of the Box
Apache Spark, Solr, and Zookeeper work together to create a fault-tolerant, distributed ETL system that converts RDBMS data into Solr documents.
March 28, 2015
by Konstantin Smirnov
· 12,779 Views
article thumbnail
Using Google Protocol Buffers with Spring MVC-based REST Services
Written by Josh Long on the Spring blog This week I’m in São Paulo, Brazil presenting at QCon SP. I had an interesting discussion with someone who loves Spring’s REST stack, but wondered if there was something more efficient than plain-ol’ JSON. Indeed, there is! I often get asked about Spring’s support for high-speed binary based encoding of messages. Spring’s long supported RPC encoding with the likes of Hessian, Burlap, etc., and Spring Framework 4.1 introduced support for Google Protocol Buffers which can be used with REST services as well. From the Google Protocol Buffer website: Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages… Google uses Protocol Buffers extensively in their own, internal, service-centric architecture. A .proto document describes the types (_messages_) to be encoded and contains a definition language that should be familiar to anyone who’s used C structs. In the document, you define types, fields in those types, and their ordering (memory offsets!) in the type relative to each other. The .proto files aren’t implementations - they’re declarative descriptions of messages that may be conveyed over the wire. They can prescribe and validate constraints - the type of a given field, or the cardinatlity of that field - on the messages that are encoded and decoded. You must use the Protobuf compiler to generate the appropriate client for your language of choice. You can use Google Protocol Buffers anyway you like, but in this post we’ll look at using it as a way to encode REST service payloads. This approach is powerful: you can use content-negotiation to serve high speed Protocol Buffer payloads to the clients (in any number of languages) that accept it, and something more conventional like JSON for those that don’t. Protocol Buffer messages offer a number of improvements over typical JSON-encoded messages, particularly in a polyglot system where microservices are implemented in various technologies but need to be able to reason about communication between services in a consistant, long-term manner. Protocol Buffers are several nice features that promote stable APIs: Protocol Buffers offer backward compatibility for free. Each field is numbered in a Protocol Buffer, so you don’t have to change the behavior of the code going forward to maintain backward compatability with older clients. Clients that don’t know about new fields won’t bother trying to parse them. Protocol Buffers provide a natural place to specify validation using the required,optional, and repeated keywords. Each client enforces these constraints in their own way. Protocol Buffers are polyglot, and work with all manner of technologies. In the example code for this blog alone there is a Ruby, Python and Java client for the Java service demonstrated. It’s just a matter of using one of the numerous supported compilers. You might think that you could just use Java’s inbuilt serialization mechanism in a homogeneous service environment but, as the Protocol Buffers team were quick to point out whent hey first introduced the technology, there are some problems even with that. Java language luminary Josh Bloch’s epic tome, Effective Java, on page 213, provides further details. Let’s first look at our .proto document: package demo; option java_package = "demo"; option java_outer_classname = "CustomerProtos"; message Customer { required int32 id = 1; required string firstName = 2; required string lastName = 3; enum EmailType { PRIVATE = 1; PROFESSIONAL = 2; } message EmailAddress { required string email = 1; optional EmailType type = 2 [default = PROFESSIONAL]; } repeated EmailAddress email = 5; } message Organization { required string name = 1; repeated Customer customer = 2; } You then pass this definition to the protoc compiler and specify the output type, like this: protoc -I=$IN_DIR --java_out=$OUT_DIR $IN_DIR/customer.proto Here’s the little Bash script I put together to code-generate my various clients: #!/usr/bin/env bash SRC_DIR=`pwd` DST_DIR=`pwd`/../src/main/ echo source: $SRC_DIR echo destination root: $DST_DIR function ensure_implementations(){ # Ruby and Go aren't natively supported it seems # Java and Python are gem list | grep ruby-protocol-buffers || sudo gem install ruby-protocol-buffers go get -u github.com/golang/protobuf/{proto,protoc-gen-go} } function gen(){ D=$1 echo $D OUT=$DST_DIR/$D mkdir -p $OUT protoc -I=$SRC_DIR --${D}_out=$OUT $SRC_DIR/customer.proto } ensure_implementations gen java gen python gen ruby This will generate the appropriate client classes in the src/main/{java,ruby,python}folders. Let’s first look at the Spring MVC REST service itself. A Spring MVC REST Service In our example, we’ll register an instance of Spring framework 4.1’s org.springframework.http.converter.protobuf.ProtobufHttpMessageConverter. This type is an HttpMessageConverter. HttpMessageConverters encode and decode the requests and responses in REST service calls. They’re usually activated after some sort of content negotiation has occurred: if the client specifies Accept: application/x-protobuf, for example, then our REST service will send back the Protocol Buffer-encoded response. package demo; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.boot.SpringApplication; import org.springframework.boot.autoconfigure.SpringBootApplication; import org.springframework.context.annotation.Bean; import org.springframework.http.converter.protobuf.ProtobufHttpMessageConverter; import org.springframework.web.bind.annotation.PathVariable; import org.springframework.web.bind.annotation.RequestMapping; import org.springframework.web.bind.annotation.RestController; import java.util.Arrays; import java.util.Collection; import java.util.Map; import java.util.concurrent.ConcurrentHashMap; import java.util.stream.Collectors; @SpringBootApplication public class DemoApplication { public static void main(String[] args) { SpringApplication.run(DemoApplication.class, args); } @Bean ProtobufHttpMessageConverter protobufHttpMessageConverter() { return new ProtobufHttpMessageConverter(); } private CustomerProtos.Customer customer(int id, String f, String l, Collection emails) { Collection emailAddresses = emails.stream().map(e -> CustomerProtos.Customer.EmailAddress.newBuilder() .setType(CustomerProtos.Customer.EmailType.PROFESSIONAL) .setEmail(e).build()) .collect(Collectors.toList()); return CustomerProtos.Customer.newBuilder() .setFirstName(f) .setLastName(l) .setId(id) .addAllEmail(emailAddresses) .build(); } @Bean CustomerRepository customerRepository() { Map customers = new ConcurrentHashMap<>(); // populate with some dummy data Arrays.asList( customer(1, "Chris", "Richardson", Arrays.asList("[email protected]")), customer(2, "Josh", "Long", Arrays.asList("[email protected]")), customer(3, "Matt", "Stine", Arrays.asList("[email protected]")), customer(4, "Russ", "Miles", Arrays.asList("[email protected]")) ).forEach(c -> customers.put(c.getId(), c)); // our lambda just gets forwarded to Map#get(Integer) return customers::get; } } interface CustomerRepository { CustomerProtos.Customer findById(int id); } @RestController class CustomerRestController { @Autowired private CustomerRepository customerRepository; @RequestMapping("/customers/{id}") CustomerProtos.Customer customer(@PathVariable Integer id) { return this.customerRepository.findById(id); } } Most of this code is pretty straightforward. It’s a Spring Boot application. Spring Boot automatically registers HttpMessageConverter beans so we need only define the ProtobufHttpMessageConverter bean and it gets configured appropriately. The @Configuration class seeds some dummy date and a mock CustomerRepository object. I won’t reproduce the Java type for our Protocol Buffer, demo/CustomerProtos.java, here as it is code-generated bit twiddling and parsing code; not all that interesting to read. One convenience is that the Java implementation automatically provides builder methods for quickly creating instances of these types in Java. The code-generated types are dumb struct like objects. They’re suitable for use as DTOs, but should not be used as the basis for your API. Do not extend them using Java inheritance to introduce new functionality; it’ll break the implementation and it’s bad OOP practice, anyway. If you want to keep things cleaner, simply wrapt and adapt them as appropriate, perhaps handling conversion from an ORM entity to the Protocol Buffer client type as appropriate in that wrapper. HttpMessageConverters may also be used with Spring’s REST client, the RestTemplate. Here’s the appropriate Java-language unit test: package demo; import org.junit.Test; import org.junit.runner.RunWith; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.boot.test.IntegrationTest; import org.springframework.boot.test.SpringApplicationConfiguration; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; import org.springframework.http.ResponseEntity; import org.springframework.http.converter.protobuf.ProtobufHttpMessageConverter; import org.springframework.test.context.junit4.SpringJUnit4ClassRunner; import org.springframework.test.context.web.WebAppConfiguration; import org.springframework.web.client.RestTemplate; import java.util.Arrays; @RunWith(SpringJUnit4ClassRunner.class) @SpringApplicationConfiguration(classes = DemoApplication.class) @WebAppConfiguration @IntegrationTest public class DemoApplicationTests { @Configuration public static class RestClientConfiguration { @Bean RestTemplate restTemplate(ProtobufHttpMessageConverter hmc) { return new RestTemplate(Arrays.asList(hmc)); } @Bean ProtobufHttpMessageConverter protobufHttpMessageConverter() { return new ProtobufHttpMessageConverter(); } } @Autowired private RestTemplate restTemplate; private int port = 8080; @Test public void contextLoaded() { ResponseEntity customer = restTemplate.getForEntity( "http://127.0.0.1:" + port + "/customers/2", CustomerProtos.Customer.class); System.out.println("customer retrieved: " + customer.toString()); } } Things just work as you’d expect, not only in Java and Spring, but also in Ruby and Python. For completeness, here is a simple client using Ruby (client types omitted): #!/usr/bin/env ruby require './customer.pb' require 'net/http' require 'uri' uri = URI.parse('http://localhost:8080/customers/3') body = Net::HTTP.get(uri) puts Demo::Customer.parse(body) ..and here’s a client in Python (client types omitted): #!/usr/bin/env python import urllib import customer_pb2 if __name__ == '__main__': customer = customer_pb2.Customer() customers_read = urllib.urlopen('http://localhost:8080/customers/1').read() customer.ParseFromString(customers_read) print customer Where to go from Here If you want very high speed message encoding that works with multiple languages, Protocol Buffers are a compelling option. There are other encoding technologies like Avro or Thrift, but none nearly so mature and entrenched as Protocol Buffers. You don’t necessarily need to use Protocol Buffers with REST, either. You could plug it into some sort of RPC service, if that’s your style. There are almost as many client implementations as there are buildpacks for Cloud Foundry - so you could run almost anything on Cloud Foundry and enjoy the same high speed, consistent messaging across all your services! The code for this example is available online, as well, so don’t hesitate to check it out! Also.. Hi gang, in 2015, I’ve been trying to do a random tech-tip style post every week based on things that I see garnering interest in the community, either here or on the Pivotal blog. I use these weekly-_ish_ (OK! OK! - it’s not been easy doing them as regularly as This Week in Spring, but so far I haven’t missed a week! :-) ) posts as a chance to focus not on a specific new release, per se, but on the application of Spring in service to some community use case that might be cross-cutting or just might benefit from having a spotlight shined on it. So far we’ve looked at all manner of things - Vaadin, Activiti, 12-Factor App Style Configuration, Smarter Service to Service Invocations, Couchbase, and much more, etc. - and we’ve got some interesting stuff lined up, too. I wondered what else you want to see talked about, however. If you’ve got some ideas about what you’d like to see covered, or a community post of your own to contribute, reach out to me on Twitter (@starbuxman) or via email (jlong [at] pivotal [dot] io). I remain, as always, at your service.
March 27, 2015
by Pieter Humphrey
· 15,149 Views
article thumbnail
How to Read Call Logs Programmatically From Android
It’s fairly easy. You need to add the following uses-permission in the Android manifest to get call history programmatically. interface in your activity. It has three methods. abstract Loader onCreateLoader(int id, Bundle args) //Instantiate and return a new Loader for the given ID. abstract void onLoadFinished(Loader loader, D data) //Called when a previously created loader has finished its load. abstract void onLoaderReset(Loader loader) //Called when a previously created loader is being reset, and thus making its data unavailable. To initialize a query, we need to call LoaderManager.initLoader() at the very first place. We are going to add a button and call this in that button events here and after this background framework will be initialized. As soon as the background framework is initialized, it calls your implementation of onCreateLoader(). To start the query, we have to return a CursorLoader from this method. @Override public Loader onCreateLoader(int loaderID, Bundle args) { Log.d(TAG, "onCreateLoader() >> loaderID : " + loaderID); switch (loaderID) { case URL_LOADER: // Returns a new CursorLoader return new CursorLoader( this, // Parent activity context CallLog.Calls.CONTENT_URI, // Table to query null, // Projection to return null, // No selection clause null, // No selection arguments null // Default sort order ); default: return null; } } We are going access our expected data from a Cursor. And we will get this in theonLoadFinished() method. @Override public void onLoadFinished(Loader loader, Cursor managedCursor) { Log.d(TAG, "onLoadFinished()"); StringBuilder sb = new StringBuilder(); int number = managedCursor.getColumnIndex(CallLog.Calls.NUMBER); int type = managedCursor.getColumnIndex(CallLog.Calls.TYPE); int date = managedCursor.getColumnIndex(CallLog.Calls.DATE); int duration = managedCursor.getColumnIndex(CallLog.Calls.DURATION); sb.append("Call Log Details "); sb.append("\n"); sb.append("\n"); sb.append(""); while (managedCursor.moveToNext()) { String phNumber = managedCursor.getString(number); String callType = managedCursor.getString(type); String callDate = managedCursor.getString(date); Date callDayTime = new Date(Long.valueOf(callDate)); String callDuration = managedCursor.getString(duration); String dir = null; int callTypeCode = Integer.parseInt(callType); switch (callTypeCode) { case CallLog.Calls.OUTGOING_TYPE: dir = "Outgoing"; break; case CallLog.Calls.INCOMING_TYPE: dir = "Incoming"; break; case CallLog.Calls.MISSED_TYPE: dir = "Missed"; break; } sb.append("") .append("Phone Number: ") .append("") .append(phNumber) .append(""); sb.append(""); sb.append(""); sb.append("") .append("Call Type:") .append("") .append(dir) .append(""); sb.append(""); sb.append(""); sb.append("") .append("Date & Time:") .append("") .append(callDayTime) .append(""); sb.append(""); sb.append(""); sb.append("") .append("Call Duration (Seconds):") .append("") .append(callDuration) .append(""); sb.append(""); sb.append(""); sb.append(""); } sb.append(""); managedCursor.close(); callLogsTextView.setText(Html.fromHtml(sb.toString())); } Output: Full Source code: https://github.com/rokon12/call-log
March 27, 2015
by A N M Bazlur Rahman DZone Core CORE
· 40,403 Views · 2 Likes
article thumbnail
Getting Started with Couchbase and Spring Data Couchbase
Written by Josh Long on the Spring blog. This blog was inspired by a talk that Laurent Doguin, a developer advocate over at Couchbase, and I gave at Couchbase Connect last year. Merci Laurent! This is a demo of the Spring Data Couchbase integration. From the project page, Spring Data Couchbase is: The Spring Data Couchbase project provides integration with the Couchbase Server database. Key functional areas of Spring Data Couchbase are a POJO centric model for interacting with Couchbase Buckets and easily writing a Repository style data access layer. What is Couchbase? Couchbase is a distributed data-store that enjoys true horizontal scaling. I like to think of it as a mix of Redis and MongoDB: you work with documents that are accessed through their keys. There are numerous client APIs for all languages. If you’re using Couchbase for your backend and using the JVM, you’ll love Spring Data Couchbase. The bullets on the project home page best enumerate its many features: Spring configuration support using Java based @Configuration classes or an XML namespace for the Couchbase driver. CouchbaseTemplate helper class that increases productivity performing common Couchbase operations. Includes integrated object mapping between documents and POJOs. Exception translation into Spring’s portable Data Access Exception hierarchy. Feature Rich Object Mapping integrated with Spring’s Conversion Service. Annotation based mapping metadata but extensible to support other metadata formats. Automatic implementation of Repository interfaces including support for custom finder methods (backed by Couchbase Views). JMX administration and monitoring Transparent @Cacheable support to cache any objects you need for high performance access. Running Couchbase Use Vagrant to Run Couchbase Locally You will need to have Couchbase installed if you don’t already (naturally). Michael Nitschinger (@daschl, also lead of the Spring Data Couchbase project), blogged about how to get a simple4-node Vagrant cluster up and running here. I’ve reproduced his example here in the vagrantdirectory. To use it, you’ll need to install Virtual Box and Vagrant, of course, but then simply run vagrant up in the vagrant directory. To get the most up-to-date version of this configuration script, I went to Michael’s GitHub vagrants project and found that, beyond this example, there are numerous other Vagrant scripts available. I have a submodule in this code’s project directory that points to that, but be sure to consult that for the latest-and-greatest. To get everything running on my machine, I chose the Ubuntu 12 installation of Couchbase 3.0.2. You can change how many nodes are started by configuring the VAGRANT_NODES environment variable before startup: VAGRANT_NODES=2 vagrant up You’ll need to administer and configure Couchbase on initial setup. Point your browser to the right IP for each node. The rules for determining that IP are well described in the README. The admin interface, in my case, was available at 192.168.105.101:8091 and192.168.105.102:8091. For more on this process, I recommend that you follow theguidelines here for the details. Here’s how I did it. I hit the admin interface on the first node and created a new cluster. I usedadmin for the username and password for the password. On all subsequent management pages, I simply joined the existing cluster by pointing the nodes to 192.168.105.101 and using the aforementioned admin credential. Once you’ve joined all nodes, look for theRebalance button in the Server Nodes panel and trigger a cluster rebalance. If you are done with your Vagrant cluster, you can use the vagrant halt command to shut it down cleanly. Very handy is also vagrant suspend, which will save the state of the nodes instead of shutting them down completely. If you want to administer the Couchbase cluster from the command line there is the handycouchbase-cli. You can simply use the vagrant ssh command to get into each of the nodes (by their node-names: node1, node2, etc..). Once there, you can run cluster configuration commands. For example the server-list command will enumerate cluster nodes. /opt/couchbase/bin/couchbase-cli server-list -c 192.168.56.101-u admin -p password It’s easy to trigger a rebalance using: /opt/couchbase/bin/couchbase-cli rebalance -c 192.168.56.101-u admin -p password Couchbase In the Cloud and on Cloud Foundry Couchbase lends itself to use in the cloud. It’s horizontally scalable (like Gemfire or Cassandra) in that there’s no single point of failure. It does not employ a master-slave or active/passive system. There are a few ways to get it up and running where your applications are running. If you’re running a Cloud Foundry installation, then you can install the the Cumulogic Service Broker which then lets your Cloud Foundry installation talk to the Cumulogic platform which itself can manage Couchbase instances. Service brokers are the bit of integration code that teach Cloud Foundry how to provision, destroy and generally interact with a managed service, like Couchbase, in this case. Using Spring Data Couchbase to Store Facebook Places Let’s look at a simple example that reads data (in this case from the Facebook Places API using Spring Social Facebook’s FacebookTemplate API) and then loads it into the Couchbase server. Get a Facebook Access Token You’ll also need a Facebook access token. The easiest way to do this is to go to the Facebook Developer Portal and create a new application and then get an application ID and an application secret. Take these two values and concatenate them with a pike character (|). Thus, you’ll have something of the form: appID|appSecret. The sample application uses Spring’s Environment mechanism to resolve the facebook.accessToken key. You can provide a value for it in the src/main/resources/application.properties file or using any of the other supported Spring Boot property resolution mechanisms. You could even provide the value as a -D argument: -Dfacebook.accessToken=...|... Telling Spring Data Couchbase About our Cluster Data in Couchbase is stored in buckets. It’s logically the same as a database in a SQL RDBMS. It is typically replicated across nodes and has its own configuration. We’ll be using the defaultbucket, but it’s a snap to create more buckets. Let’s look at the basic configuration required to use Spring Data Couchbase (in this case, in terms of a Spring Boot application): @SpringBootApplication @EnableScheduling @EnableCaching public class Application { @EnableCouchbaseRepositories @Configuration static class CouchbaseConfiguration extends AbstractCouchbaseConfiguration { @Value("${couchbase.cluster.bucket}") private String bucketName; @Value("${couchbase.cluster.password}") private String password; @Value("${couchbase.cluster.ip}") private String ip; @Override protected List bootstrapHosts() { return Arrays.asList(this.ip); } @Override protected String getBucketName() { return this.bucketName; } @Override protected String getBucketPassword() { return this.password; } } // more beans } A Spring Data Couchbase Repository Spring Data provides the notion of repositories - objects that handle typical data-access logic and provide convention-based queries. They can be used to map POJOs to data in the backing data store. Our example simply stores the information on businesses it reads from Facebook’s Places API. To acheive this we’ve created a simple Place entity that Spring Data Couchbase repositories will know how to persist: @Document(expiry = 0) class Place { @Id private String id; @Field private Location location; @Field @NotNull private String name; @Field private String affilitation, category, description, about; @Field private Date insertionDate; // .. getters, constructors, toString, etc } The Place entity references another entity, Location, which is basically the same. In the case of Spring Data Couchbase, repository finder methods map to views - queries written in JavaScript - in a Couchbase server. You’ll need to setup views on the Couchbase servers. Go to any Couchbase server’s admin console and visit the Views screen, then clickCreate Development View and name it place, as our entity will be demo.Place (the development view name is adapted from the entity’s class name by default). We’ll create two views, the generic all, which is required for any Spring Data Couchbase POJO, and the byName view, which will be used to drive the repository’s findByName finder method. This mapping is by convention, though you can override which view is employed with the @View annotation on the finder method’s declaration. First, all: Now, byName: When you’re done, be sure to Publish each view! Now you can use Spring Data repositories as you’d expect. The only thing that’s a bit different about these repositories is that we’re declaring a Spring Data Couchbase Query type for the argument to the findByName finder method, not a String. Using the @Query is straightforward: Query query = new Query(); query.setKey("Philz Coffee"); Collection places = placeRepository.findByName(query); places.forEach(System.out::println); Where to go from Here We’ve only covered some of the basics here. Spring Data Couchbase supports the Java bean validation API, and can be configured to honor validation constraints on its entities. Spring Data Couchbase also provides lower-level access to the CouchbaseClient API, if you want it. Spring Data Couchbase also implements the Spring CacheManager abstraction - you can use@Cacheable and friends with data on service methods and it’ll be transparently persisted to Couchbase for you. The code for this example is in my Github repository, co-developed with my pal Laurent Doguin (@ldoguin) over at Couchbase.
March 24, 2015
by Pieter Humphrey
· 19,285 Views
article thumbnail
Getting Started With Activiti and Spring Boot
This post is a guest post by Activiti co-founder and community member Joram Barrez (@jbarrez) who works for Alfresco. Thanks Joram! I’d like to see more of these community guest posts, so - as usual - don’t hesitate to ping me (@starbuxman) with ideas and contributions! -Josh Introduction Activiti is an Apache-licensed business process management (BPM) engine. Such an engine has as core goal to take a process definition comprised of human tasks and service calls and execute those in a certain order, while exposing various API’s to start, manage and query data about process instances for that definition. Contrary to many of its competitors, Activiti is lightweight and integrates easily with any Java technology or project. All that, and it works at any scale - from just a few dozen to many thousands or even millions of process executions. The source code of Activiti can be found on Github. The project was founded and is sponsored by Alfresco, but enjoys contributions from all across the globe and industries. A process definition is typically visualized as a flow-chart-like diagram. In recent years, the BPMN 2.0 standard (an OMG standard, like UML) has become the de-facto ‘language’ of these diagrams. This standard defines how a certain shape on the diagram should be interpreted, both technically and business-wise and how it is stored, as a not-so-hipster XML file.. but luckily most of the tooling hides this for you. This is a standard, and you can use any number of compliant tools to design (and even run) your BPMN processes. That said, if you’re asking me, there is no better choice than Activiti! Spring Boot integration Activiti and Spring play nicely together. The convention-over-configuration approach in Spring Boot works nicely with Activiti’s process engine is setup and use. Out of the box, you only need a database, as process executions can span anywhere from a few seconds to a couple of years. Obviously, as an intrinsic part of a process definition is calling and consuming data to and from various systems with all kinds of technologies. The simplicity of adding the needed dependencies and integrating various pieces of (boiler-plate) logic with Spring Boot really makes this child’s play. Using Spring Boot and Activiti in a microservice approach also makes a lot of sense. Spring Boot makes it easy to get a production-ready service up and running in no time and - in a distributed microservice architecture - Activiti processes can glue together various microservices while also weaving in human workflow (tasks and forms) to achieve a certain goal. The Spring Boot integration in Activiti was created by Spring expert Josh Long. Josh and I did a webinar a couple of months ago that should give you a good insight into the basics of the Activiti integration for Spring Boot. The Activiti user guide section on Spring Boot is also a great starting place to get more information. Getting Started The code for this example can be found in my Github repository. The process we’ll implement here is a hiring process for a developer. It’s simplified of course (as it needs to fit on this web page), but you should get the core concepts. Here’s the diagram: As said in the introduction, all shapes here have a very specific interpretation thanks to the BPMN 2.0 standard. But even without knowledge of BPMN, the process is pretty easy to understand: When the process starts, the resume of the job applicant is stored in an external system. The process then waits until a telephone interview has been conducted. This is done by a user (see the little icon of a person in the corner). If the telephone interview wasn’t all that, a polite rejection email is sent. Otherwise, both a tech interview and financial negotiation should happen. Note that at any point, the applicant can cancel. That’s shown in the diagram as the event on the boundary of the big rectangle. When the event happens, everything inside will be killed and the process halts. If all goes well, a welcome email is sent. This is the BPMN for this process Let’s create a new Maven project, and add the dependencies needed to get Spring Boot, Activiti and a database. We’ll use an in memory database to keep things simple. org.activiti spring-boot-starter-basic ${activiti.version} com.h2database h2 1.4.185 So only two dependencies is what is needed to create a very first Spring Boot + Activiti application: @SpringBootApplication public class MyApp { public static void main(String[] args) { SpringApplication.run(MyApp.class, args); } } You could already run this application, it won’t do anything functionally but behind the scenes it already creates an in-memory H2 database creates an Activiti process engine using that database exposes all Activiti services as Spring Beans configures tidbits here and there such as the Activiti async job executor, mail server, etc. Let’s get something running. Drop the BPMN 2.0 process definition into thesrc/main/resources/processes folder. All processes placed here will automatically be deployed (ie. parsed and made to be executable) to the Activiti engine. Let’s keep things simple to start, and create a CommandLineRunner that will be executed when the app boots up: @Bean CommandLineRunner init( final RepositoryService repositoryService, final RuntimeService runtimeService, final TaskService taskService) { return new CommandLineRunner() { public void run(String... strings) throws Exception { Map variables = new HashMap(); variables.put("applicantName", "John Doe"); variables.put("email", "[email protected]"); variables.put("phoneNumber", "123456789"); runtimeService.startProcessInstanceByKey("hireProcess", variables); } }; } So what’s happening here is that we create a map of all the variables needed to run the process and pass it when starting process. If you’d check the process definition you’ll see we reference those variables using ${variableName} in many places (such as the task description). The first step of the process is an automatic step (see the little cogwheel icon), implemented using an expression that uses a Spring Bean: which is implemented with activiti:expression="${resumeService.storeResume()}" Of course, we need that bean or the process would not start. So let’s create it: @Component public class ResumeService { public void storeResume() { System.out.println("Storing resume ..."); } } When running the application now, you’ll see that the bean is called: . ____ _ __ _ _ /\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \ ( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \ \\/ ___)| |_)| | | | | || (_| | ) ) ) ) ' |____| .__|_| |_|_| |_\__, | / / / / =========|_|==============|___/=/_/_/_/ :: Spring Boot :: (v1.2.0.RELEASE) 2015-02-16 11:55:11.129 INFO 304 --- [ main] MyApp : Starting MyApp on The-Activiti-Machine.local with PID 304 ... Storing resume ... 2015-02-16 11:55:13.662 INFO 304 --- [ main] MyApp : Started MyApp in 2.788 seconds (JVM running for 3.067) And that’s it! Congrats with running your first process instance using Activiti in Spring Boot! Let’s spice things up a bit, and add following dependency to our pom.xml: org.activiti spring-boot-starter-rest-api ${activiti.version} Having this on the classpath does a nifty thing: it takes the Activiti REST API (which is written in Spring MVC) and exposes this fully in your application. The REST API of Activiti is fully documented in the Activiti User Guide. The REST API is secured by basic auth, and won’t have any users by default. Let’s add an admin user to the system as shown below (add this to the MyApp class). Don’t do this in a production system of course, there you’ll want to hook in the authentication to LDAP or something else. @Bean InitializingBean usersAndGroupsInitializer(final IdentityService identityService) { return new InitializingBean() { public void afterPropertiesSet() throws Exception { Group group = identityService.newGroup("user"); group.setName("users"); group.setType("security-role"); identityService.saveGroup(group); User admin = identityService.newUser("admin"); admin.setPassword("admin"); identityService.saveUser(admin); } }; } Start the application. We can now start a process instance as we did in the CommandLineRunner, but now using REST: curl -u admin:admin -H "Content-Type: application/json" -d '{"processDefinitionKey":"hireProcess", "variables": [ {"name":"applicantName", "value":"John Doe"}, {"name":"email", "value":"[email protected]"}, {"name":"phoneNumber", "value":"1234567"} ]}' http://localhost:8080/runtime/process-instances Which returns us the json representation of the process instance: { "tenantId": "", "url": "http://localhost:8080/runtime/process-instances/5", "activityId": "sid-42BAE58A-8FFB-4B02-AAED-E0D8EA5A7E39", "id": "5", "processDefinitionUrl": "http://localhost:8080/repository/process-definitions/hireProcess:1:4", "suspended": false, "completed": false, "ended": false, "businessKey": null, "variables": [], "processDefinitionId": "hireProcess:1:4" } I just want to stand still for a moment how cool this is. Just by adding one dependency, you’re getting the whole Activiti REST API embedded in your application! Let’s make it even cooler, and add following dependency org.activiti spring-boot-starter-actuator ${activiti.version} This adds a Spring Boot actuator endpoint for Activiti. If we restart the application, and hithttp://localhost:8080/activiti/, we get some basic stats about our processes. With some imagination that in a live system you’ve got many more process definitions deployed and executing, you can see how this is useful. The same actuator is also registered as a JMX bean exposing similar information. { completedTaskCountToday: 0, deployedProcessDefinitions: [ "hireProcess (v1)" ], processDefinitionCount: 1, cachedProcessDefinitionCount: 1, runningProcessInstanceCount: { hireProcess (v1): 0 }, completedTaskCount: 0, completedActivities: 0, completedProcessInstanceCount: { hireProcess (v1): 0 }, openTaskCount: 0 } To finish our coding, let’s create a dedicated REST endpoint for our hire process, that could be consumed by for example a javascript web application (out of scope for this article). So most likely, we’ll have a form for the applicant to fill in the details we’ve been passing programmatically above. And while we’re at it, let’s store the applicant information as a JPA entity. In that case, the data won’t be stored in Activiti anymore, but in a separate table and referenced by Activiti when needed. You probably guessed it by now, JPA support is enabled by adding a dependency: org.activiti spring-boot-starter-jpa ${activiti.version} and add the entity to the MyApp class: @Entity class Applicant { @Id @GeneratedValue private Long id; private String name; private String email; private String phoneNumber; // Getters and setters We’ll also need a Repository for this Entity (put this in a separate file or also in MyApp). No need for any methods, the Repository magic from Spring will generate the methods we need for us. public interface ApplicantRepository extends JpaRepository { // .. } And now we can create the dedicated REST endpoint: @RestController public class MyRestController { @Autowired private RuntimeService runtimeService; @Autowired private ApplicantRepository applicantRepository; @RequestMapping(value="/start-hire-process", method= RequestMethod.POST, produces= MediaType.APPLICATION_JSON_VALUE) public void startHireProcess(@RequestBody Map data) { Applicant applicant = new Applicant(data.get("name"), data.get("email"), data.get("phoneNumber")); applicantRepository.save(applicant); Map variables = new HashMap(); variables.put("applicant", applicant); runtimeService.startProcessInstanceByKey("hireProcessWithJpa", variables); } } Note we’re now using a slightly different process called ‘hireProcessWithJpa’, which has a few tweaks in it to cope with the fact the data is now in a JPA entity. So for example, we can’t use ${applicantName} anymore, but we now have to use ${applicant.name}. Let’s restart the application and start a new process instance: curl -u admin:admin -H "Content-Type: application/json" -d '{"name":"John Doe", "email": "[email protected]", "phoneNumber":"123456789"}' http://localhost:8080/start-hire-process We can now go through our process. You could create a custom endpoints for this too, exposing different task queries with different forms … but I’ll leave this to your imagination and use the default Activiti REST end points to walk through the process. Let’s see which task the process instance currently is at (you could pass in more detailed parameters here, for example the ‘processInstanceId’ for better filtering): curl -u admin:admin -H "Content-Type: application/json" http://localhost:8080/runtime/tasks which returns { "order": "asc", "size": 1, "sort": "id", "total": 1, "data": [{ "id": "14", "processInstanceId": "8", "createTime": "2015-02-16T13:11:26.078+01:00", "description": "Conduct a telephone interview with John Doe. Phone number = 123456789", "name": "Telephone interview" ... }], "start": 0 } So, our process is now at the Telephone interview. In a realistic application, there would be a task list and a form that could be filled in to complete this task. Let’s complete this task (we have to set the telephoneInterviewOutcome variable as the exclusive gateway uses it to route the execution): curl -u admin:admin -H "Content-Type: application/json" -d '{"action" : "complete", "variables": [ {"name":"telephoneInterviewOutcome", "value":true} ]}' http://localhost:8080/runtime/tasks/14 When we get the tasks again now, the process instance will have moved on to the two tasks in parallel in the subprocess (big rectangle): { "order": "asc", "size": 2, "sort": "id", "total": 2, "data": [ { ... "name": "Tech interview" }, { ... "name": "Financial negotiation" } ], "start": 0 } We can now continue the rest of the process in a similar fashion, but I’ll leave that to you to play around with. Testing One of the strengths of using Activiti for creating business processes is that everything is simply Java. As a consequence, processes can be tested as regular Java code with unit tests. Spring Boot makes writing such test a breeze. Here’s how the unit test for the “happy path” looks like (while omitting @Autowired fields and test e-mail server setup). The code also shows the use of the Activiti API’s for querying tasks for a given group and process instance. @RunWith(SpringJUnit4ClassRunner.class) @SpringApplicationConfiguration(classes = {MyApp.class}) @WebAppConfiguration @IntegrationTest public class HireProcessTest { @Test public void testHappyPath() { // Create test applicant Applicant applicant = new Applicant("John Doe", "[email protected]", "12344"); applicantRepository.save(applicant); // Start process instance Map variables = new HashMap(); variables.put("applicant", applicant); ProcessInstance processInstance = runtimeService.startProcessInstanceByKey("hireProcessWithJpa", variables); // First, the 'phone interview' should be active Task task = taskService.createTaskQuery() .processInstanceId(processInstance.getId()) .taskCandidateGroup("dev-managers") .singleResult(); Assert.assertEquals("Telephone interview", task.getName()); // Completing the phone interview with success should trigger two new tasks Map taskVariables = new HashMap(); taskVariables.put("telephoneInterviewOutcome", true); taskService.complete(task.getId(), taskVariables); List tasks = taskService.createTaskQuery() .processInstanceId(processInstance.getId()) .orderByTaskName().asc() .list(); Assert.assertEquals(2, tasks.size()); Assert.assertEquals("Financial negotiation", tasks.get(0).getName()); Assert.assertEquals("Tech interview", tasks.get(1).getName()); // Completing both should wrap up the subprocess, send out the 'welcome mail' and end the process instance taskVariables = new HashMap(); taskVariables.put("techOk", true); taskService.complete(tasks.get(0).getId(), taskVariables); taskVariables = new HashMap(); taskVariables.put("financialOk", true); taskService.complete(tasks.get(1).getId(), taskVariables); // Verify email Assert.assertEquals(1, wiser.getMessages().size()); // Verify process completed Assert.assertEquals(1, historyService.createHistoricProcessInstanceQuery().finished().count()); } Next steps We haven’t touched any of the tooling around Activiti. There is a bunch more than just the engine, like the Eclipse plugin to design processes, a free web editor in the cloud (also included in the .zip download you can get from Activiti's site, a web application that showcases many of the features of the engine, … The current release of Activiti (version 5.17.0) has integration with Spring Boot 1.1.6. However, the current master version is compatible with 1.2.1. Using Spring Boot 1.2.0 brings us sweet stuff like support for XA transactions with JTA. This means you can hook up your processes easily with JMS, JPA and Activiti logic all in the same transaction! ..Which brings us to the next point … In this example, we’ve focussed heavily on human interactions (and barely touched it). But there’s many things you can do around orchestrating systems too. The Spring Boot integration also has Spring Integration support you could leverage to do just that in a very neat way! And of course there is much much more about the BPMN 2.0 standard. Read more about itin the Activiti docs.
March 20, 2015
by Pieter Humphrey
· 50,955 Views · 4 Likes
article thumbnail
The State of the Storage Engine
This article by Baron Schwartz comes to you from the DZone Guide to Database and Persistence Management.
March 16, 2015
by B Jones
· 16,546 Views · 1 Like
article thumbnail
A Beginner's Guide to JPA and Hibernate Cascade Types
Introduction JPA translates entity state transitions to database DML statements. Because it’s common to operate on entity graphs, JPA allows us to propagate entity state changes from Parents to Child entities. This behavior is configured through the CascadeType mappings. JPA vs Hibernate Cascade Types Hibernate supports all JPA Cascade Types and some additional legacy cascading styles. The following table draws an association between JPA Cascade Types and their Hibernate native API equivalent: JPA EntityManager action JPA CascadeType Hibernate native Session action Hibernate native CascadeType Event Listener detach(entity) DETACH evict(entity) DETACH or EVICT Default Evict Event Listener merge(entity) MERGE merge(entity) MERGE Default Merge Event Listener persist(entity) PERSIST persist(entity) PERSIST Default Persist Event Listener refresh(entity) REFRESH refresh(entity) REFRESH Default Refresh Event Listener remove(entity) REMOVE delete(entity) REMOVE orDELETE Default Delete Event Listener saveOrUpdate(entity) SAVE_UPDATE Default Save Or Update Event Listener replicate(entity, replicationMode) REPLICATE Default Replicate Event Listener lock(entity, lockModeType) buildLockRequest(entity, lockOptions) LOCK Default Lock Event Listener All the above EntityManager methods ALL All the above Hibernate Session methods ALL From this table we can conclude that: There’s no difference between calling persist, merge or refresh on the JPAEntityManager or the Hibernate Session. The JPA remove and detach calls are delegated to Hibernate delete and evict native operations. Only Hibernate supports replicate and saveOrUpdate. While replicate is useful for some very specific scenarios (when the exact entity state needs to be mirrored between two distinct DataSources), the persist and merge combo is always a better alternative than the native saveOrUpdate operation. As a rule of thumb, you should always use persist for TRANSIENT entities and merge for DETACHED ones.The saveOrUpdate shortcomings (when passing a detached entity snapshot to aSession already managing this entity) had lead to the merge operation predecessor: the now extinct saveOrUpdateCopy operation. The JPA lock method shares the same behavior with Hibernate lock request method. The JPA CascadeType.ALL doesn’t only apply to EntityManager state change operations, but to all Hibernate CascadeTypes as well. So if you mapped your associations with CascadeType.ALL, you can still cascade Hibernate specific events. For example, you can cascade the JPA lock operation (although it behaves as reattaching, instead of an actual lock request propagation), even if JPA doesn’t define a LOCK CascadeType. Cascading best practices Cascading only makes sense only for Parent – Child associations (the Parent entity state transition being cascaded to its Child entities). Cascading from Child to Parent is not very useful and usually, it’s a mapping code smell. Next, I’m going to take analyse the cascading behaviour of all JPA Parent – Childassociations. One-To-One The most common One-To-One bidirectional association looks like this: @Entity public class Post { @Id @GeneratedValue(strategy = GenerationType.AUTO) private Long id; private String name; @OneToOne(mappedBy = "post", cascade = CascadeType.ALL, orphanRemoval = true) private PostDetails details; public Long getId() { return id; } public PostDetails getDetails() { return details; } public String getName() { return name; } public void setName(String name) { this.name = name; } public void addDetails(PostDetails details) { this.details = details; details.setPost(this); } public void removeDetails() { if (details != null) { details.setPost(null); } this.details = null; } } @Entity public class PostDetails { @Id @GeneratedValue(strategy = GenerationType.AUTO) private Long id; @Column(name = "created_on") @Temporal(TemporalType.TIMESTAMP) private Date createdOn = new Date(); private boolean visible; @OneToOne @PrimaryKeyJoinColumn private Post post; public Long getId() { return id; } public void setVisible(boolean visible) { this.visible = visible; } public void setPost(Post post) { this.post = post; } } The Post entity plays the Parent role and the PostDetails is the Child. The bidirectional associations should always be updated on both sides, therefore the Parent side should contain the addChild andremoveChild combo. These methods ensure we always synchronize both sides of the association, to avoid Object or Relational data corruption issues. In this particular case, the CascadeType.ALL and orphan removal make sense because the PostDetails life-cycle is bound to that of its Post Parent entity. Cascading the one-to-one persist operation The CascadeType.PERSIST comes along with the CascadeType.ALL configuration, so we only have to persist the Post entity, and the associated PostDetails entity is persisted as well: Post post = new Post(); post.setName("Hibernate Master Class"); PostDetails details = new PostDetails(); post.addDetails(details); session.persist(post); Generating the following output: INSERT INTO post(id, NAME) VALUES (DEFAULT, Hibernate Master Class'') insert into PostDetails (id, created_on, visible) values (default, '2015-03-03 10:17:19.14', false) Cascading the one-to-one merge operation The CascadeType.MERGE is inherited from the CascadeType.ALL setting, so we only have to merge the Post entity and the associated PostDetails is merged as well: Post post = newPost(); post.setName("Hibernate Master Class Training Material"); post.getDetails().setVisible(true); doInTransaction(session -> { session.merge(post); }); The merge operation generates the following output: SELECT onetooneca0_.id AS id1_3_1_, onetooneca0_.NAME AS name2_3_1_, onetooneca1_.id AS id1_4_0_, onetooneca1_.created_on AS created_2_4_0_, onetooneca1_.visible AS visible3_4_0_ FROM post onetooneca0_ LEFT OUTER JOIN postdetails onetooneca1_ ON onetooneca0_.id = onetooneca1_.id WHERE onetooneca0_.id = 1 UPDATE postdetails SET created_on = '2015-03-03 10:20:53.874', visible = true WHERE id = 1 UPDATE post SET NAME = 'Hibernate Master Class Training Material' WHERE id = 1 Cascading the one-to-one delete operation The CascadeType.REMOVE is also inherited from the CascadeType.ALL configuration, so the Post entity deletion triggers a PostDetails entity removal too: Post post = newPost(); doInTransaction(session -> { session.delete(post); }); Generating the following output: delete from PostDetails where id = 1 delete from Post where id = 1 The one-to-one delete orphan cascading operation If a Child entity is dissociated from its Parent, the Child Foreign Key is set to NULL. If we want to have the Child row deleted as well, we have to use the orphan removalsupport. doInTransaction(session -> { Post post = (Post) session.get(Post.class, 1L); post.removeDetails(); }); The orphan removal generates this output: SELECT onetooneca0_.id AS id1_3_0_, onetooneca0_.NAME AS name2_3_0_, onetooneca1_.id AS id1_4_1_, onetooneca1_.created_on AS created_2_4_1_, onetooneca1_.visible AS visible3_4_1_ FROM post onetooneca0_ LEFT OUTER JOIN postdetails onetooneca1_ ON onetooneca0_.id = onetooneca1_.id WHERE onetooneca0_.id = 1 delete from PostDetails where id = 1 Unidirectional one-to-one association Most often, the Parent entity is the inverse side (e.g. mappedBy), the Child controling the association through its Foreign Key. But the cascade is not limited to bidirectional associations, we can also use it for unidirectional relationships: @Entity public class Commit { @Id @GeneratedValue(strategy = GenerationType.AUTO) private Long id; private String comment; @OneToOne(cascade = CascadeType.ALL) @JoinTable( name = "Branch_Merge_Commit", joinColumns = @JoinColumn( name = "commit_id", referencedColumnName = "id"), inverseJoinColumns = @JoinColumn( name = "branch_merge_id", referencedColumnName = "id") ) private BranchMerge branchMerge; public Commit() { } public Commit(String comment) { this.comment = comment; } public Long getId() { return id; } public void addBranchMerge( String fromBranch, String toBranch) { this.branchMerge = new BranchMerge( fromBranch, toBranch); } public void removeBranchMerge() { this.branchMerge = null; } } @Entity public class BranchMerge { @Id @GeneratedValue(strategy = GenerationType.AUTO) private Long id; private String fromBranch; private String toBranch; public BranchMerge() { } public BranchMerge( String fromBranch, String toBranch) { this.fromBranch = fromBranch; this.toBranch = toBranch; } public Long getId() { return id; } } Cascading consists in propagating the Parent entity state transition to one or more Child entities, and it can be used for both unidirectional and bidirectional associations. One-To-Many The most common Parent – Child association consists of a one-to-many and a many-to-one relationship, where the cascade being useful for the one-to-many side only: @Entity public class Post { @Id @GeneratedValue(strategy = GenerationType.AUTO) private Long id; private String name; @OneToMany(cascade = CascadeType.ALL, mappedBy = "post", orphanRemoval = true) private List comments = new ArrayList<>(); public void setName(String name) { this.name = name; } public List getComments() { return comments; } public void addComment(Comment comment) { comments.add(comment); comment.setPost(this); } public void removeComment(Comment comment) { comment.setPost(null); this.comments.remove(comment); } } @Entity public class Comment { @Id @GeneratedValue(strategy = GenerationType.AUTO) private Long id; @ManyToOne private Post post; private String review; public void setPost(Post post) { this.post = post; } public String getReview() { return review; } public void setReview(String review) { this.review = review; } } Like in the one-to-one example, the CascadeType.ALL and orphan removal are suitable because the Comment life-cycle is bound to that of its Post Parent entity. Cascading the one-to-many persist operation We only have to persist the Post entity and all the associated Comment entities are persisted as well: Post post = new Post(); post.setName("Hibernate Master Class"); Comment comment1 = new Comment(); comment1.setReview("Good post!"); Comment comment2 = new Comment(); comment2.setReview("Nice post!"); post.addComment(comment1); post.addComment(comment2); session.persist(post); The persist operation generates the following output: insert into Post (id, name) values (default, 'Hibernate Master Class') insert into Comment (id, post_id, review) values (default, 1, 'Good post!') insert into Comment (id, post_id, review) values (default, 1, 'Nice post!') Cascading the one-to-many merge operation Merging the Post entity is going to merge all Comment entities as well: Post post = newPost(); post.setName("Hibernate Master Class Training Material"); post.getComments() .stream() .filter(comment -> comment.getReview().toLowerCase() .contains("nice")) .findAny() .ifPresent(comment -> comment.setReview("Keep up the good work!") ); doInTransaction(session -> { session.merge(post); }); Generating the following output: SELECT onetomanyc0_.id AS id1_1_1_, onetomanyc0_.NAME AS name2_1_1_, comments1_.post_id AS post_id3_1_3_, comments1_.id AS id1_0_3_, comments1_.id AS id1_0_0_, comments1_.post_id AS post_id3_0_0_, comments1_.review AS review2_0_0_ FROM post onetomanyc0_ LEFT OUTER JOIN comment comments1_ ON onetomanyc0_.id = comments1_.post_id WHERE onetomanyc0_.id = 1 update Post set name = 'Hibernate Master Class Training Material' where id = 1 update Comment set post_id = 1, review='Keep up the good work!' where id = 2 Cascading the one-to-many delete operation When the Post entity is deleted, the associated Comment entities are deleted as well: Post post = newPost(); doInTransaction(session -> { session.delete(post); }); Generating the following output: delete from Comment where id = 1 delete from Comment where id = 2 delete from Post where id = 1 The one-to-many delete orphan cascading operation The orphan-removal allows us to remove the Child entity whenever it’s no longer referenced by its Parent: newPost(); doInTransaction(session -> { Post post = (Post) session.createQuery( "select p " + "from Post p " + "join fetch p.comments " + "where p.id = :id") .setParameter("id", 1L) .uniqueResult(); post.removeComment(post.getComments().get(0)); }); The Comment is deleted, as we can see in the following output: SELECT onetomanyc0_.id AS id1_1_0_, comments1_.id AS id1_0_1_, onetomanyc0_.NAME AS name2_1_0_, comments1_.post_id AS post_id3_0_1_, comments1_.review AS review2_0_1_, comments1_.post_id AS post_id3_1_0__, comments1_.id AS id1_0_0__ FROM post onetomanyc0_ INNER JOIN comment comments1_ ON onetomanyc0_.id = comments1_.post_id WHERE onetomanyc0_.id = 1 delete from Comment where id = 1 If you enjoy reading this article, you might want to subscribe to my newsletter and get a discount for my book as well. Many-To-Many The many-to-many relationship is tricky because each side of this association plays both the Parent and the Child role. Still, we can identify one side from where we’d like to propagate the entity state changes. We shouldn’t default to CascadeType.ALL, because the CascadeTpe.REMOVE might end-up deleting more than we’re expecting (as you’ll soon find out): @Entity public class Author { @Id @GeneratedValue(strategy=GenerationType.AUTO) private Long id; @Column(name = "full_name", nullable = false) private String fullName; @ManyToMany(mappedBy = "authors", cascade = {CascadeType.PERSIST, CascadeType.MERGE}) private List books = new ArrayList<>(); private Author() {} public Author(String fullName) { this.fullName = fullName; } public Long getId() { return id; } public void addBook(Book book) { books.add(book); book.authors.add(this); } public void removeBook(Book book) { books.remove(book); book.authors.remove(this); } public void remove() { for(Book book : new ArrayList<>(books)) { removeBook(book); } } } @Entity public class Book { @Id @GeneratedValue(strategy=GenerationType.AUTO) private Long id; @Column(name = "title", nullable = false) private String title; @ManyToMany(cascade = {CascadeType.PERSIST, CascadeType.MERGE}) @JoinTable(name = "Book_Author", joinColumns = { @JoinColumn( name = "book_id", referencedColumnName = "id" ) }, inverseJoinColumns = { @JoinColumn( name = "author_id", referencedColumnName = "id" ) } ) private List authors = new ArrayList<>(); private Book() {} public Book(String title) { this.title = title; } } Cascading the many-to-many persist operation Persisting the Author entities will persist the Books as well: Author _John_Smith = new Author("John Smith"); Author _Michelle_Diangello = new Author("Michelle Diangello"); Author _Mark_Armstrong = new Author("Mark Armstrong"); Book _Day_Dreaming = new Book("Day Dreaming"); Book _Day_Dreaming_2nd = new Book("Day Dreaming, Second Edition"); _John_Smith.addBook(_Day_Dreaming); _Michelle_Diangello.addBook(_Day_Dreaming); _John_Smith.addBook(_Day_Dreaming_2nd); _Michelle_Diangello.addBook(_Day_Dreaming_2nd); _Mark_Armstrong.addBook(_Day_Dreaming_2nd); session.persist(_John_Smith); session.persist(_Michelle_Diangello); session.persist(_Mark_Armstrong); The Book and the Book_Author rows are inserted along with the Authors: insert into Author (id, full_name) values (default, 'John Smith') insert into Book (id, title) values (default, 'Day Dreaming') insert into Author (id, full_name) values (default, 'Michelle Diangello') insert into Book (id, title) values (default, 'Day Dreaming, Second Edition') insert into Author (id, full_name) values (default, 'Mark Armstrong') insert into Book_Author (book_id, author_id) values (1, 1) insert into Book_Author (book_id, author_id) values (1, 2) insert into Book_Author (book_id, author_id) values (2, 1) insert into Book_Author (book_id, author_id) values (2, 2) insert into Book_Author (book_id, author_id) values (3, 1) Dissociating one side of the many-to-many association To delete an Author, we need to dissociate all Book_Author relations belonging to the removable entity: doInTransaction(session -> { Author _Mark_Armstrong = getByName(session, "Mark Armstrong"); _Mark_Armstrong.remove(); session.delete(_Mark_Armstrong); }); This use case generates the following output: SELECT manytomany0_.id AS id1_0_0_, manytomany2_.id AS id1_1_1_, manytomany0_.full_name AS full_nam2_0_0_, manytomany2_.title AS title2_1_1_, books1_.author_id AS author_i2_0_0__, books1_.book_id AS book_id1_2_0__ FROM author manytomany0_ INNER JOIN book_author books1_ ON manytomany0_.id = books1_.author_id INNER JOIN book manytomany2_ ON books1_.book_id = manytomany2_.id WHERE manytomany0_.full_name = 'Mark Armstrong' SELECT books0_.author_id AS author_i2_0_0_, books0_.book_id AS book_id1_2_0_, manytomany1_.id AS id1_1_1_, manytomany1_.title AS title2_1_1_ FROM book_author books0_ INNER JOIN book manytomany1_ ON books0_.book_id = manytomany1_.id WHERE books0_.author_id = 2 delete from Book_Author where book_id = 2 insert into Book_Author (book_id, author_id) values (2, 1) insert into Book_Author (book_id, author_id) values (2, 2) delete from Author where id = 3 The many-to-many association generates way too many redundant SQL statements and often, they are very difficult to tune. Next, I’m going to demonstrate the many-to-many CascadeType.REMOVE hidden dangers. The many-to-many CascadeType.REMOVE gotchas The many-to-many CascadeType.ALL is another code smell, I often bump into while reviewing code. The CascadeType.REMOVE is automatically inherited when usingCascadeType.ALL, but the entity removal is not only applied to the link table, but to the other side of the association as well. Let’s change the Author entity books many-to-many association to use theCascadeType.ALL instead: @ManyToMany(mappedBy = "authors", cascade = CascadeType.ALL) private List books = new ArrayList<>(); When deleting one Author: doInTransaction(session -> { Author _Mark_Armstrong = getByName(session, "Mark Armstrong"); session.delete(_Mark_Armstrong); Author _John_Smith = getByName(session, "John Smith"); assertEquals(1, _John_Smith.books.size()); }); All books belonging to the deleted Author are getting deleted, even if other Authorswe’re still associated to the deleted Books: SELECT manytomany0_.id AS id1_0_, manytomany0_.full_name AS full_nam2_0_ FROM author manytomany0_ WHERE manytomany0_.full_name = 'Mark Armstrong' SELECT books0_.author_id AS author_i2_0_0_, books0_.book_id AS book_id1_2_0_, manytomany1_.id AS id1_1_1_, manytomany1_.title AS title2_1_1_ FROM book_author books0_ INNER JOIN book manytomany1_ ON books0_.book_id = manytomany1_.id WHERE books0_.author_id = 3 delete from Book_Author where book_id=2 delete from Book where id=2 delete from Author where id=3 Most often, this behavior doesn’t match the business logic expectations, only being discovered upon the first entity removal. We can push this issue even further, if we set the CascadeType.ALL to the Book entity side as well: @ManyToMany(cascade = CascadeType.ALL) @JoinTable(name = "Book_Author", joinColumns = { @JoinColumn( name = "book_id", referencedColumnName = "id" ) }, inverseJoinColumns = { @JoinColumn( name = "author_id", referencedColumnName = "id" ) } ) This time, not only the Books are being deleted, but Authors are deleted as well: doInTransaction(session -> { Author _Mark_Armstrong = getByName(session, "Mark Armstrong"); session.delete(_Mark_Armstrong); Author _John_Smith = getByName(session, "John Smith"); assertNull(_John_Smith); }); The Author removal triggers the deletion of all associated Books, which further triggers the removal of all associated Authors. This is a very dangerous operation, resulting in a massive entity deletion that’s rarely the expected behavior. If you enjoyed this article, I bet you are going to love my book as well. SELECT manytomany0_.id AS id1_0_, manytomany0_.full_name AS full_nam2_0_ FROM author manytomany0_ WHERE manytomany0_.full_name = 'Mark Armstrong' SELECT books0_.author_id AS author_i2_0_0_, books0_.book_id AS book_id1_2_0_, manytomany1_.id AS id1_1_1_, manytomany1_.title AS title2_1_1_ FROM book_author books0_ INNER JOIN book manytomany1_ ON books0_.book_id = manytomany1_.id WHERE books0_.author_id = 3 SELECT authors0_.book_id AS book_id1_1_0_, authors0_.author_id AS author_i2_2_0_, manytomany1_.id AS id1_0_1_, manytomany1_.full_name AS full_nam2_0_1_ FROM book_author authors0_ INNER JOIN author manytomany1_ ON authors0_.author_id = manytomany1_.id WHERE authors0_.book_id = 2 SELECT books0_.author_id AS author_i2_0_0_, books0_.book_id AS book_id1_2_0_, manytomany1_.id AS id1_1_1_, manytomany1_.title AS title2_1_1_ FROM book_author books0_ INNER JOIN book manytomany1_ ON books0_.book_id = manytomany1_.id WHERE books0_.author_id = 1 SELECT authors0_.book_id AS book_id1_1_0_, authors0_.author_id AS author_i2_2_0_, manytomany1_.id AS id1_0_1_, manytomany1_.full_name AS full_nam2_0_1_ FROM book_author authors0_ INNER JOIN author manytomany1_ ON authors0_.author_id = manytomany1_.id WHERE authors0_.book_id = 1 SELECT books0_.author_id AS author_i2_0_0_, books0_.book_id AS book_id1_2_0_, manytomany1_.id AS id1_1_1_, manytomany1_.title AS title2_1_1_ FROM book_author books0_ INNER JOIN book manytomany1_ ON books0_.book_id = manytomany1_.id WHERE books0_.author_id = 2 delete from Book_Author where book_id=2 delete from Book_Author where book_id=1 delete from Author where id=2 delete from Book where id=1 delete from Author where id=1 delete from Book where id=2 delete from Author where id=3 This use case is wrong in so many ways. There are a plethora of unnecessary SELECT statements and eventually we end up deleting all Authors and all their Books. That’s why CascadeType.ALL should raise your eyebrow, whenever you spot it on a many-to-many association. When it comes to Hibernate mappings, you should always strive for simplicity. TheHibernate documentation confirms this assumption as well: Practical test cases for real many-to-many associations are rare. Most of the time you need additional information stored in the “link table”. In this case, it is much better to use two one-to-many associations to an intermediate link class. In fact, most associations are one-to-many and many-to-one. For this reason, you should proceed cautiously when using any other association style. Conclusion Cascading is a handy ORM feature, but it’s not free of issues. You should only cascade from Parent entities to Children and not the other way around. You should always use only the casacde operations that are demanded by your business logic requirements, and not turn the CascadeType.ALL into a default Parent-Child association entity state propagation configuration. Code available on GitHub.
March 13, 2015
by Vlad Mihalcea
· 97,371 Views · 8 Likes
article thumbnail
How to Write a "Hello, World!" Microservice
What does implementing microservices mean for a software developer? Especially, for the rookies, greenhorns, and newbs out there? I’m not talking about microservice software architecture here; this is about microservices software development. And not just that, the ultimate implementation goal should be “microservices done right”. For this post, I’ll go with Java. Yes, it’s wordy. Yes, it’s resource intensive (especially when used for the sole purpose of returning a single string). However the concept of classes and objects goes well with my intention of explaining how to do microservices correctly. Plus, it makes sense to use microservices in environments that are heavily biased towards Java. Anyway, please feel free to add your own “Hello, World!” microservice in your favorite language in the comments section below. Hello, monolith! As a prerequisite, you should be familiar with the following piece of code, what it does, and why it has to look the way it does (read this tutorial if you don’t): class Starter { public static void main(String[] args) { System.out.println(“Hello, World!”); } } This is a simple console application that yields the string “Hello, World!” This is not written in the microservice way. This is an example of when not to use the microservices approach: if all you need on your console is a single string, this is all you need. Hello, code duplication! In addition to this console application, I want this string to be available on the web by calling http://localhost:80/helloWorld.servlet from a browser. Here is the required code, implemented as plain HTTP servlet (yes, it’s wordy. Get over it.) class HelloWorldServlet extends HttpServlet { public void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { response.getWriter().println(“Hello, World!”); } } The string “Hello, World!” has to be “implemented” again. Sure, this is no big deal. But this simple string could be so much more. It could be the result of a complex calculation or it could be the result of a time consuming search query. So, just imagine that the string “Hello, World!” is the result of a week’s worth of hard work (If you’re new to programming, it may very well be...). How should you go about making it available to apps and services that you create? Step 1: HelloWorldService.java To save yourself from duplicating a week’s worth of coding, allow me to introduce the HelloWorldService class: class HelloWorldService { public String greet() { return “Hello, World!”; } } You can re-use this fine piece of software craftmanship in all your apps and classes without re-implementing or duplicating code. Here’s our console application again: class Starter { HelloWorldService helloWorldService = new HelloWorldService(); public static void main(String[] args) { String message = helloWorldService.greet(); System.out.println(message); } } The same goes for servlets: class HelloWorldServlet extends HttpServlet { HelloWorldService helloWorldService = new HelloWorldService(); public void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { String message = helloWorldService.greet(); response.getWriter().println(message); } } It also works great for Spring MVC controllers: @Controller class HelloWorldController { HelloWorldService helloWorldService = new HelloWorldService(); @RequestMapping("/helloWorld") public String greet() { String message = helloWorldService.greet(); return message; } } I could go on and show more examples, but I think you get the point (spoiler: it’s the bold lines that matter). Those of you who are familiar with microservices could point out that this may be fine for getting rid of code duplication, but this is no microservice. You’re right, but to get to “microservices done right,” you have to be able to separate you app’s concerns, which is what I did here in the most possible basic way: I separated the app’s frontend concerns from its backend concerns. The frontend is either a console app or a servlet, the backend is HelloWorldService. Serviceward, ho! To go down microservice lane from here, all we have to do is wrap HelloWorldService into some kind of web component that makes it accessible via HTTP, right? Let’s see… First, we could just use our servlet code from above, as it conveniently returns the string as a response to any HTTP request. But we won’t. Why? Because there’s something missing: fault tolerance. What could possibly fail when returning a simple string? That’s not the point. What matters is that the client side (the code that calls HelloWorldService) should be given enough information to effectively react to failures. We face two possible problems: The service as a whole may be unavailable The service may be unable to return a proper response The service is unavailable If a service is unavailable, it’s the client that is responsible for dealing with the situation. Frameworks like unirest.io save you the effort of writing many lines of code when dealing with HTTP requests. Future> future = Unirest.post("HTTP://helloworld.myservices.local/greet") .header("accept", "application/json") .asJsonAsync(new Callback() { public void failed(UnirestException e) { //tell them UI folks that the request went south } public void completed(HttpResponse response) { //extract data from response and fulfill it’s destiny } public void cancelled() { //shot a note to UI dept that the request got cancelled } } ); With this code, the client now knows when the service is not available or has timed out following no response. Wee can easily have an error message displayed in place of the string we expected to receive. Try/catch is probably the right solution here. Invalid responses however pose more of a challenge. The service fails If the service fails, we can just return a string with an appropriate error message. But how can you know if a message is an error message or a correct response? Yes, you can start every error message with [ERROR] or invent another “smart” (read: not-so-smart) workaround, but this won’t be a solution you’ll be proud of. And, there’s always the possibility that even valid responses may begin with ERROR because it’s simply part of the message. I’d go with JSON or XML for wrapping the answer. I prefer JSON because it’s a little less wordy than XML. And I really like using the JSON-HTML tool over at json.bloople.net for visualizing results during development. Of course, you might go for any of the numerous alternatives, like protobuf or a proprietary solution of your own. The main point is that you need to be able to apply structure to responses: { “status”:”ok”, ”message”:”Hello, World!” } By checking the status attribute, you can easily decide whether to handle an error or to display an appropriate message. { “status”:”error”, ”message”:”Invalid input parameter” } The possibilities are endless here. You can add an error code or additional properties. This all boils down to a single important point: apply structure to your responses. Structure, why? Because structure not only helps you keep your code maintainable, it also serves as the foundation of the API of your service. An API definition consists of more than a URL like this: GET HTTP://helloworld.myservices.local/greet API definitions also consist of the response structures that can be expected as a response (you know this already from a few lines back): { “status”:”ok”, ”message”:”Hello, World!” } Most important takeaway Keeping the API specifications of a service’s request and response stable is a key requirement for succeeding with microservices. Conclusion Are you (and your project) ready for microservices? If you read this and kept asking yourself, what good is all the overhead of microservices, then either your project won’t benefit from microservices or you’re just not there yet (for mindset perspective see my previous post about the value of microservices ). If you can’t stop thinking about microservices, then you probably are ready.
March 13, 2015
by Martin Goodwell
· 31,224 Views · 7 Likes
article thumbnail
How to Test a REST API With JUnit
RESTEasy (and Jersey as well) contain a minimal web server within their libraries which enables their users to start up a tiny web server.
March 13, 2015
by Mark Paluch
· 311,527 Views · 6 Likes
article thumbnail
R/dplyr: Extracting Data Frame Column Value for Filtering With %in%
I’ve been playing around with dplyr over the weekend and wanted to extract the values from a data frame column to use in a later filtering step. I had a data frame: library(dplyr) df = data.frame(userId = c(1,2,3,4,5), score = c(2,3,4,5,5)) And wanted to extract the userIds of those people who have a score greater than 3. I started with: highScoringPeople = df %>% filter(score > 3) %>% select(userId) > highScoringPeople userId 1 3 2 4 3 5 And then filtered the data frame expecting to get back those 3 people: > df %>% filter(userId %in% highScoringPeople) [1] userId score <0 rows> (or 0-length row.names) No rows! I created vector with the numbers 3-5 to make sure that worked: > df %>% filter(userId %in% c(3,4,5)) userId score 1 3 4 2 4 5 3 5 5 That works as expected so highScoringPeople obviously isn’t in the right format to facilitate an ‘in lookup’. Let’s explore: > str(c(3,4,5)) num [1:3] 3 4 5 > str(highScoringPeople) 'data.frame': 3 obs. of 1 variable: $ userId: num 3 4 5 Now it’s even more obvious why it doesn’t work – highScoringPeople is still a data frame when we need it to be a vector/list. One way to fix this is to extract the userIds using the $ syntax instead of the select function: highScoringPeople = (df %>% filter(score > 3))$userId > str(highScoringPeople) num [1:3] 3 4 5 > df %>% filter(userId %in% highScoringPeople) userId score 1 3 4 2 4 5 3 5 5 Or if we want to do the column selection using dplyr we can extract the values for the column like this: highScoringPeople = (df %>% filter(score > 3) %>% select(userId))[[1]] > str(highScoringPeople) num [1:3] 3 4 5 Not so difficult after all.
March 12, 2015
by Mark Needham
· 15,152 Views
article thumbnail
Why I Use OrientDB on Production Applications
Like many other Java developers, when i start a new Java development project that requires a database, i have hopes and dreams of what my database looks like: Java API (of course) Embeddable Pure Java Simple jar file for inclusion in my project Database stored in a directory on disk Faster than a rocket First I’m going to review these points, and then i’m going to talk about the database i chose for my latest project, which is in production now with hundreds of users accessing the web application each month. What I Want from My Database Here’s what i’m looking for in my database. These are the things that literally make me happy and joyous when writing code. Java API I code in Java. It’s natural for me to want to use a modern Java API for my database work. Embeddable My development productivity and programming enjoyment skyrocket when my database is embedded. The database starts and stops with my application. It’s easy to destroy my database and restart from scratch. I can upgrade my database by updating my database jar file. It’s easy to deploy my application into testing and production, because there’s no separate database server to startup and manage. (I know about the issue with clustering and an embedded database, but i’ll get to that.) Pure Java Back when i developed software that would be deployed on all manner of hardware, i was a stickler that all my code be pure Java, so that i could be confident that my code would run wherever customers and users deployed it. In this day of SaaS, i’m less picky. I develop on the Mac. I test and run in production on Linux. Those are the systems i care about, so if my database has some platform-specific code in it to make it run fast and well, i’m fine with that. Just as long as that platform-specific configuration is not exposed to me as the developer. Simple Jar File for Inclusion in My Project I really just want one database jar file to add to my project. And i don’t want that jar file messing with my code or the dependencies i include in my project. If the database uses Guava 1.2, and i’m using Guava 0.8, that can mess me up. I want my database to not interfere with jars that i use by introducing newer or older versions of class files that i already reference in my project’s jars. Database Stored in a Directory on Disk I like to destroy my database by deleting a directory. I like to run multiple, simultaneous databases by configuring each database to use a separate directory. That makes me super productive during development, and it makes it more fun for me to program to a database. Faster Than a Rocket I think that’s just a given. My Latest Project That Needs a Database My latest project is Floify.com. Floify is a Mortgage Borrower Portal, automating the process of collecting mortgage loan documents from borrowers and emailing milestone loan status updates to real estate agents and borrowers. Mortgage loan originators use Floify to automate the labor-intensive parts of their loan processes. The web application receives about 500 unique visitors per month. Floify experienced 28% growth in january 2015. Floify’s vital statistics are: 38,301 loan documents under management 3,619 registered users 3,113 loan packages under management The Database I Chose for My Latest Project When i started Floify, i looked for a database that met all the criteria i’ve described above. I decided against databases that were server-based (Postgres, etc). I decided against databases that weren’t Java-based (MongoDB, etc). I decided against databases that didn’t support ACID transactions. I narrowed my choices to OrientDB and Neo4j. It’s been a couple years since that decision process occurred, but i distinctly remember a few reasons why i ultimately chose OrientDB over Neo4j: Performance benchmarks for OrientDB were very impressive. The OrientDB development team was very active. Cost. OrientDB is free. Neo4j cost more than what i was willing to pay or what i could afford. I forget which it was. My Favourite OrientDB Features Here are some of my favourite features in OrientDB. These are not competitive advantages to OrientDB. It’s just some of the things that make me happy when coding against an embeddable database. I can create the database in code. I don’t have to use SQL for querying, but most of the time, i do. I already know SQL, and it’s just easy for me. I use the document database, and it’s very pleasant inserting new documents in Java. I can store multi-megabyte binary objects directly in the database. My database is stored in a directory on disk. When scalability demands it, i can upgrade to a two-server distributed database. I haven’t been there yet. Speed. For me, OrientDB is very fast, and in the few years i’ve been using it, it’s become faster. OrientDB doesn’t come in a single jar file, as would be my ideal. I have to include a few different jars, but that’s an easy tradeoff for me. Future In the future, as Floify’s performance and scalability needs demand it, i’ll investigate a multi-server database configuration on OrientDB. In the meantime, i’m preparing to upgrade to OrientDB 2.0, which was recently released and promises even more speed. Go speed. :-)
March 5, 2015
by Dave Sims
· 18,270 Views · 6 Likes
article thumbnail
Using MongoDB with Hadoop & Spark: Part 2 - Hive Example
Originally Written by Matt Kalan Welcome to part two of our three-part series on MongoDB and Hadoop. In part one, we introduced Hadoop and how to set it up. In this post, we'll look at a Hive example. Introduction & Setup of Hadoop and MongoDB Hive Example Spark Example & Key Takeaways For more detail on the use case, see the first paragraph of part 1. Summary Use case: aggregating 1 minute intervals of stock prices into 5 minute intervals Input:: 1 minute stock prices intervals in a MongoDB database Simple Analysis: performed in: - Hive - Spark Output: 5 minute stock prices intervals in Hadoop Hive Example I ran the following example from the Hive command line (simply typing the command “hive” with no parameters), not Cloudera’s Hue editor, as that would have needed additional installation steps. I immediately noticed the criticism people have with Hive, that everything is compiled into MapReduce which takes considerable time. I ran most things with just 20 records to make the queries run quickly. This creates the definition of the table in Hive that matches the structure of the data in MongoDB. MongoDB has a dynamic schema for variable data shapes but Hive and SQL need a schema definition. CREATE EXTERNAL TABLE minute_bars ( id STRUCT, Symbol STRING, Timestamp STRING, Day INT, Open DOUBLE, High DOUBLE, Low DOUBLE, Close DOUBLE, Volume INT ) STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler' WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id", "Symbol":"Symbol", "Timestamp":"Timestamp", "Day":"Day", "Open":"Open", "High":"High", "Low":"Low", "Close":"Close", "Volume":"Volume"}') TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/marketdata.minbars'); Recent changes in the Apache Hive repo make the mappings necessary even if you are keeping the field names the same. This should be changed in the MongoDB Hadoop Connector soon if not already by the time you read this. Then I ran the following command to create a Hive table for the 5 minute bars: CREATE TABLE five_minute_bars ( id STRUCT, Symbol STRING, Timestamp STRING, Open DOUBLE, High DOUBLE, Low DOUBLE, Close DOUBLE ); This insert statement uses the SQL windowing functions to group 5 1-minute periods and determine the OHLC for the 5 minutes. There are definitely other ways to do this but here is one I figured out. Grouping in SQL is a little different from grouping in the MongoDB aggregation framework (in which you can pull the first and last of a group easily), so it took me a little while to remember how to do it with a subquery. The subquery takes each group of 5 1-minute records/documents, sorts them by time, and takes the open, high, low, and close price up to that record in each 5-minute period. Then the outside WHERE clause selects the last 1-minute bar in that period (because that row in the subquery has the correct OHLC information for its 5-minute period). I definitely welcome easier queries to understand but you can run the subquery by itself to see what it’s doing too. INSERT INTO TABLE five_minute_bars SELECT m.id, m.Symbol, m.OpenTime as Timestamp, m.Open, m.High, m.Low, m.Close FROM (SELECT id, Symbol, FIRST_VALUE(Timestamp) OVER ( PARTITION BY floor(unix_timestamp(Timestamp, 'yyyy-MM-dd HH:mm')/(5*60)) ORDER BY Timestamp) as OpenTime, LAST_VALUE(Timestamp) OVER ( PARTITION BY floor(unix_timestamp(Timestamp, 'yyyy-MM-dd HH:mm')/(5*60)) ORDER BY Timestamp) as CloseTime, FIRST_VALUE(Open) OVER ( PARTITION BY floor(unix_timestamp(Timestamp, 'yyyy-MM-dd HH:mm')/(5*60)) ORDER BY Timestamp) as Open, MAX(High) OVER ( PARTITION BY floor(unix_timestamp(Timestamp, 'yyyy-MM-dd HH:mm')/(5*60)) ORDER BY Timestamp) as High, MIN(Low) OVER ( PARTITION BY floor(unix_timestamp(Timestamp, 'yyyy-MM-dd HH:mm')/(5*60)) ORDER BY Timestamp) as Low, LAST_VALUE(Close) OVER ( PARTITION BY floor(unix_timestamp(Timestamp, 'yyyy-MM-dd HH:mm')/(5*60)) ORDER BY Timestamp) as Close FROM minute_bars) as m WHERE unix_timestamp(m.CloseTime, 'yyyy-MM-dd HH:mm') - unix_timestamp(m.OpenTime, 'yyyy-MM-dd HH:mm') = 60*4; I can definitely see the benefit of being able to use SQL to access data in MongoDB and optionally in other databases and file formats, all with the same commands, while the mapping differences are handled in the table declarations. The downside is that the latency is quite high, but that could be made up some with the ability to scale horizontally across many nodes. I think this is the appeal of Hive for most people - they can scale to very large data volumes using traditional SQL, and latency is not a primary concern. Post #3 in this blog series shows similar examples using Spark. Introduction & Setup of Hadoop and MongoDB Hive Example Spark Example & Key Takeaways To learn more, watch our video on MongoDB and Hadoop. We will take a deep dive into the MongoDB Connector for Hadoop and how it can be applied to enable new business insights. WATCH MONGODB & HADOOP << Read Part 1
March 2, 2015
by Francesca Krihely
· 10,847 Views
article thumbnail
Standing Up a Local Netflix Eureka
Here I will consider two different ways of standing up a local instance of Netflix Eureka. If you are not familiar with Eureka, it provides a central registry where (micro)services can register themselves and client applications can use this registry to look up specific instances hosting a service and to make the service calls. Approach 1: Native Eureka Library The first way is to simply use the archive file generated by the Netflix Eureka build process: 1. Clone the Eureka source repository here: https://github.com/Netflix/eureka 2. Run "./gradlew build" at the root of the repository, this should build cleanly generating a war file in eureka-server/build/libs folder 3. Grab this file, rename it to "eureka.war" and place it in the webapps folder of either tomcat or jetty. For this exercise I have used jetty. 4. Start jetty, by default jetty will boot up at port 8080, however I wanted to instead bring it up at port 8761, so you can start it up this way, "java -jar start.jar -Djetty.port=8761" The server should start up cleanly and can be verified at this endpoint - "http://localhost:8761/eureka/v2/apps" Approach 2: Spring-Cloud-Netflix Spring-Cloud-Netflix provides a very neat way to bootstrap Eureka. To bring up Eureka server using Spring-Cloud-Netflix the approach that I followed was to clone the sample Eureka server application available here: https://github.com/spring-cloud-samples/eureka 1. Clone this repository 2. From the root of the repository run "mvn spring-boot:run", and that is it!. The server should boot up cleanly and the REST endpoint should come up here: "http://localhost:8761/eureka/apps". As a bonus, Spring-Cloud-Netflix provides a neat UI showing the various applications who have registered with Eureka at the root of the webapp at "http://localhost:8761/". Just a few small issues to be aware of, note that the context url's are a little different in the two cases "eureka/v2/apps" vs "eureka/apps", this can be adjusted on the configurations of the services which register with Eureka. Conclusion Your mileage with these approaches may vary. I have found Spring-Cloud-Netflix a little unstable at times but it has mostly worked out well for me. The documentation at the Spring-Cloud site is also far more exhaustive than the one provided at the Netflix Eureka site.
February 26, 2015
by Biju Kunjummen
· 13,292 Views
article thumbnail
Redirecting All Kinds of stdout in Python
A common task in Python (especially while testing or debugging) is to redirect sys.stdout to a stream or a file while executing some piece of code. However, simply "redirecting stdout" is sometimes not as easy as one would expect; hence the slightly strange title of this post. In particular, things become interesting when you want C code running within your Python process (including, but not limited to, Python modules implemented as C extensions) to also have its stdout redirected according to your wish. This turns out to be tricky and leads us into the interesting world of file descriptors, buffers and system calls. But let's start with the basics. Pure Python The simplest case arises when the underlying Python code writes to stdout, whether by calling print, sys.stdout.write or some equivalent method. If the code you have does all its printing from Python, redirection is very easy. With Python 3.4 we even have a built-in tool in the standard library for this purpose - contextlib.redirect_stdout. Here's how to use it: from contextlib import redirect_stdout f = io.StringIO() with redirect_stdout(f): print('foobar') print(12) print('Got stdout: "{0}"'.format(f.getvalue())) When this code runs, the actual print calls within the with block don't emit anything to the screen, and you'll see their output captured by in the stream f. Incidentally, note how perfect the with statement is for this goal - everything within the block gets redirected; once the block is done, things are cleaned up for you and redirection stops. If you're stuck on an older and uncool Python, prior to 3.4 [1], what then? Well, redirect_stdout is really easy to implement on your own. I'll change its name slightly to avoid confusion: from contextlib import contextmanager @contextmanager def stdout_redirector(stream): old_stdout = sys.stdout sys.stdout = stream try: yield finally: sys.stdout = old_stdout So we're back in the game: f = io.StringIO() with stdout_redirector(f): print('foobar') print(12) print('Got stdout: "{0}"'.format(f.getvalue())) Redirecting C-level streams Now, let's take our shiny redirector for a more challenging ride: import ctypes libc = ctypes.CDLL(None) f = io.StringIO() with stdout_redirector(f): print('foobar') print(12) libc.puts(b'this comes from C') os.system('echo and this is from echo') print('Got stdout: "{0}"'.format(f.getvalue())) I'm using ctypes to directly invoke the C library's puts function [2]. This simulates what happens when C code called from within our Python code prints to stdout - the same would apply to a Python module using a C extension. Another addition is the os.system call to invoke a subprocess that also prints to stdout. What we get from this is: this comes from C and this is from echo Got stdout: "foobar 12 " Err... no good. The prints got redirected as expected, but the output from puts and echo flew right past our redirector and ended up in the terminal without being caught. What gives? To grasp why this didn't work, we have to first understand what sys.stdout actually is in Python. Detour - on file descriptors and streams This section dives into some internals of the operating system, the C library, and Python [3]. If you just want to know how to properly redirect printouts from C in Python, you can safely skip to the next section (though understanding how the redirection works will be difficult). Files are opened by the OS, which keeps a system-wide table of open files, some of which may point to the same underlying disk data (two processes can have the same file open at the same time, each reading from a different place, etc.) File descriptors are another abstraction, which is managed per-process. Each process has its own table of open file descriptors that point into the system-wide table. Here's a schematic, taken from The Linux Programming Interface: File descriptors allow sharing open files between processes (for example when creating child processes with fork). They're also useful for redirecting from one entry to another, which is relevant to this post. Suppose that we make file descriptor 5 a copy of file descriptor 4. Then all writes to 5 will behave in the same way as writes to 4. Coupled with the fact that the standard output is just another file descriptor on Unix (usually index 1), you can see where this is going. The full code is given in the next section. File descriptors are not the end of the story, however. You can read and write to them with the read and write system calls, but this is not the way things are typically done. The C runtime library provides a convenient abstraction around file descriptors - streams. These are exposed to the programmer as the opaque FILE structure with a set of functions that act on it (for example fprintf and fgets). FILE is a fairly complex structure, but the most important things to know about it is that it holds a file descriptor to which the actual system calls are directed, and it provides buffering, to ensure that the system call (which is expensive) is not called too often. Suppose you emit stuff to a binary file, a byte or two at a time. Unbuffered writes to the file descriptor with write would be quite expensive because each write invokes a system call. On the other hand, using fwrite is much cheaper because the typicall call to this function just copies your data into its internal buffer and advances a pointer. Only occasionally (depending on the buffer size and flags) will an actual write system call be issued. With this information in hand, it should be easy to understand what stdout actually is for a C program. stdout is a global FILE object kept for us by the C library, and it buffers output to file descriptor number 1. Calls to functions like printf and puts add data into this buffer. fflush forces its flushing to the file descriptor, and so on. But we're talking about Python here, not C. So how does Python translate calls to sys.stdout.write to actual output? Python uses its own abstraction over the underlying file descriptor - a file object. Moreover, in Python 3 this file object is further wrapper in an io.TextIOWrapper, because what we pass to print is a Unicode string, but the underlying write system calls accept binary data, so encoding has to happen en route. The important take-away from this is: Python and a C extension loaded by it (this is similarly relevant to C code invoked via ctypes) run in the same process, and share the underlying file descriptor for standard output. However, while Python has its own high-level wrapper around it - sys.stdout, the C code uses its own FILE object. Therefore, simply replacing sys.stdout cannot, in principle, affect output from C code. To make the replacement deeper, we have to touch something shared by the Python and C runtimes - the file descriptor. Redirecting with file descriptor duplication Without further ado, here is an improved stdout_redirector that also redirects output from C code [4]: from contextlib import contextmanager import ctypes import io import os, sys import tempfile libc = ctypes.CDLL(None) c_stdout = ctypes.c_void_p.in_dll(libc, 'stdout') @contextmanager def stdout_redirector(stream): # The original fd stdout points to. Usually 1 on POSIX systems. original_stdout_fd = sys.stdout.fileno() def _redirect_stdout(to_fd): """Redirect stdout to the given file descriptor.""" # Flush the C-level buffer stdout libc.fflush(c_stdout) # Flush and close sys.stdout - also closes the file descriptor (fd) sys.stdout.close() # Make original_stdout_fd point to the same file as to_fd os.dup2(to_fd, original_stdout_fd) # Create a new sys.stdout that points to the redirected fd sys.stdout = io.TextIOWrapper(os.fdopen(original_stdout_fd, 'wb')) # Save a copy of the original stdout fd in saved_stdout_fd saved_stdout_fd = os.dup(original_stdout_fd) try: # Create a temporary file and redirect stdout to it tfile = tempfile.TemporaryFile(mode='w+b') _redirect_stdout(tfile.fileno()) # Yield to caller, then redirect stdout back to the saved fd yield _redirect_stdout(saved_stdout_fd) # Copy contents of temporary file to the given stream tfile.flush() tfile.seek(0, io.SEEK_SET) stream.write(tfile.read()) finally: tfile.close() os.close(saved_stdout_fd) There are a lot of details here (such as managing the temporary file into which output is redirected) that may obscure the key approach: using dup and dup2 to manipulate file descriptors. These functions let us duplicate file descriptors and make any descriptor point at any file. I won't spend more time on them - go ahead and read their documentation, if you're interested. The detour section should provide enough background to understand it. Let's try this: f = io.BytesIO() with stdout_redirector(f): print('foobar') print(12) libc.puts(b'this comes from C') os.system('echo and this is from echo') print('Got stdout: "{0}"'.format(f.getvalue().decode('utf-8'))) Gives us: Got stdout: "and this is from echo this comes from C foobar 12 " Success! A few things to note: The output order may not be what we expected. This is due to buffering. If it's important to preserve order between different kinds of output (i.e. between C and Python), further work is required to disable buffering on all relevant streams. You may wonder why the output of echo was redirected at all? The answer is that file descriptors are inherited by subprocesses. Since we rigged fd 1 to point to our file instead of the standard output prior to forking to echo, this is where its output went. We use a BytesIO here. This is because on the lowest level, the file descriptors are binary. It may be possible to do the decoding when copying from the temporary file into the given stream, but that can hide problems. Python has its in-memory understanding of Unicode, but who knows what is the right encoding for data printed out from underlying C code? This is why this particular redirection approach leaves the decoding to the caller. The above also makes this code specific to Python 3. There's no magic involved, and porting to Python 2 is trivial, but some assumptions made here don't hold (such as sys.stdout being a io.TextIOWrapper). Redirecting the stdout of a child process We've just seen that the file descriptor duplication approach lets us grab the output from child processes as well. But it may not always be the most convenient way to achieve this task. In the general case, you typically use the subprocess module to launch child processes, and you may launch several such processes either in a pipe or separately. Some programs will even juggle multiple subprocesses launched this way in different threads. Moreover, while these subprocesses are running you may want to emit something to stdout and you don't want this output to be captured. So, managing the stdout file descriptor in the general case can be messy; it is also unnecessary, because there's a much simpler way. The subprocess module's swiss knife Popen class (which serve as the basis for much of the rest of the module) accepts a stdout parameter, which we can use to ask it to get access to the child's stdout: import subprocess echo_cmd = ['echo', 'this', 'comes', 'from', 'echo'] proc = subprocess.Popen(echo_cmd, stdout=subprocess.PIPE) output = proc.communicate()[0] print('Got stdout:', output) The subprocess.PIPE argument can be used to set up actual child process pipes (a la the shell), but in its simplest incarnation it captures the process's output. If you only launch a single child process at a time and are interested in its output, there's an even simpler way: output = subprocess.check_output(echo_cmd) print('Got stdout:', output) check_output will capture and return the child's standard output to you; it will also raise an exception if the child exist with a non-zero return code. Conclusion I hope I covered most of the common cases where "stdout redirection" is needed in Python. Naturally, all of the same applies to the other standard output stream - stderr. Also, I hope the background on file descriptors was sufficiently clear to explain the redirection code; squeezing this topic in such a short space is challenging. Let me know if any questions remain or if there's something I could have explained better. Finally, while it is conceptually simple, the code for the redirector is quite long; I'll be happy to hear if you find a shorter way to achieve the same effect. [1] Do not despair. As of February 2015, a sizable chunk of the worldwide Python programmers are in the same boat. [2] Note that bytes passed to puts. This being Python 3, we have to be careful since libc doesn't understand Python's unicode strings. [3] The following description focuses on Unix/POSIX systems; also, it's necessarily partial. Large book chapters have been written on this topic - I'm just trying to present some key concepts relevant to stream redirection. [4] The approach taken here is inspired by this Stack Overflow answer.
February 23, 2015
by Eli Bendersky
· 19,599 Views
article thumbnail
Sneak Peek into the JCache API (JSR 107)
This post covers the JCache API at a high level and provides a teaser – just enough for you to (hopefully) start itching about it ;-) In this post …. JCache overview JCache API, implementations Supported (Java) platforms for JCache API Quick look at Oracle Coherence Fun stuff – Project Headlands (RESTified JCache by Adam Bien) , JCache related talks at Java One 2014, links to resources for learning more about JCache What is JCache? JCache (JSR 107) is a standard caching API for Java. It provides an API for applications to be able to create and work with in-memory cache of objects. Benefits are obvious – one does not need to concentrate on the finer details of implementing the Caching and time is better spent on the core business logic of the application. JCache components The specification itself is very compact and surprisingly intuitive. The API defines high level components (interfaces) some of which are listed below Caching Provider – used to control Caching Managers and can deal with several of them, Cache Manager – deals with create, read, destroy operations on a Cache Cache – stores entries (the actual data) and exposes CRUD interfaces to deal with the entries Entry – abstraction on top of a key-value pair akin to a java.util.Map Hierarchy of JCache API components JCache Implementations JCache defines the interfaces which of course are implemented by different vendors a.k.a Providers. Oracle Coherence Hazelcast Infinispan ehcache Reference Implementation – this is more for reference purpose rather than a production quality implementation. It is per the specification though and you can be rest assured of the fact that it does in fact pass the TCK as well From the application point of view, all that’s required is the implementation to be present in the classpath. The API also provides a way to further fine tune the properties specific to your provider via standard mechanisms. You should be able to track the list of JCache reference implementations from the JCP website link public class JCacheUsage{ public static void main(String[] args){ //bootstrap the JCache Provider CachingProvider jcacheProvider = Caching.getCachingProvider(); CacheManager jcacheManager = jcacheProvider.getCacheManager(); //configure cache MutableConfiguration jcacheConfig = new MutableConfiguration<>(); jcacheConfig.setTypes(String.class, MyPreciousObject.class); //create cache Cache cache = jcacheManager.createCache("PreciousObjectCache", jcacheConfig); //play around String key = UUID.randomUUID().toString(); cache.put(key, new MyPreciousObject()); MyPreciousObject inserted = cache.get(key); cache.remove(key); cache.get(key); //will throw javax.cache.CacheException since the key does not exist } } JCache provider detection JCache provider detection happens automatically when you only have a single JCache provider on the class path You can choose from the below options as well //set JMV level system property -Djavax.cache.spi.cachingprovider=org.ehcache.jcache.JCacheCachingProvider //code level config System.setProperty("javax.cache.spi.cachingprovider","org.ehcache.jcache.JCacheCachingProvider //you want to choose from multiple JCache providers at runtime CachingProvider ehcacheJCacheProvider = Caching.getCachingProvider("org.ehcache.jcache.JCacheCachingProvider"); //which JCache providers do I have on the classpath? Iterable jcacheProviders = Caching.getCachingProviders(); Java Platform support Compliant with Java SE 6 and above Does not define any details in terms of Java EE integration. This does not mean that it cannot be used in a Java EE environment – it’s just not standardized yet. Could not be plugged into Java EE 7 as a tried and tested standard Candidate for Java EE 8 Project Headlands: Java EE and JCache in tandem By none other than Adam Bien himself ! Java EE 7, Java SE 8 and JCache in action Exposes the JCache API via JAX-RS (REST) Uses Hazelcast as the JCache provider Highly recommended ! Oracle Coherence This post deals with high level stuff w.r.t JCache in general. However, a few lines about Oracle Coherence in general would help put things in perspective Oracle Coherence is a part of Oracle’s Cloud Application Foundation stack It is primarily an in-memory data grid solution Geared towards making applications more scalable in general What’s important to know is that from version 12.1.3 onwards, Oracle Coherence includes a reference implementation for JCache (more in the next section) JCache support in Oracle Coherence Support for JCache implies that applications can now use a standard API to access the capabilities of Oracle Coherence This is made possible by Coherence by simply providing an abstraction over its existing interfaces (NamedCache etc). Application deals with a standard interface (JCache API) and the calls to the API are delegated to the existing Coherence core library implementation Support for JCache API also means that one does not need to use Coherence specific APIs in the application resulting in vendor neutral code which equals portability How ironic – supporting a standard API and always keeping your competitors in the hunt ;-) But hey! That’s what healthy competition and quality software is all about ! Talking of healthy competition – Oracle Coherence does support a host of other features in addition to the standard JCache related capabilities. The Oracle Coherence distribution contains all the libraries for working with the JCache implementation The service definition file in the coherence-jcache.jar qualifies it as a valid JCache provider implementation Curious about Oracle Coherence ? Quick Starter page Documentation Installation Further reading about Coherence and JCache combo – Oracle Coherence documentation JCache at Java One 2014 Couple of great talks revolving around JCache at Java One 2014 Come, Code, Cache, Compute! by Steve Millidge Using the New JCache by Brian Oliver and Greg Luck Hope this was fun :-) Cheers !
February 23, 2015
by Abhishek Gupta DZone Core CORE
· 6,278 Views · 1 Like
article thumbnail
Getting Started with Dropwizard: Authentication, Configuration and HTTPS
Basic Authentication is the simplest way to secure access to a resource.
February 10, 2015
by Dmitry Noranovich
· 48,809 Views · 1 Like
article thumbnail
The API Gateway Pattern: Angular JS and Spring Security Part IV
Written by Dave Syer in the Spring blog In this article we continue our discussion of how to use Spring Security with Angular JS in a “single page application”. Here we show how to build an API Gateway to control the authentication and access to the backend resources using Spring Cloud. This is the fourth in a series of articles, and you can catch up on the basic building blocks of the application or build it from scratch by reading the first article, or you can just go straight to the source code in Github. In the last article we built a simple distributed application that used Spring Session to authenticate the backend resources. In this one we make the UI server into a reverse proxy to the backend resource server, fixing the issues with the last implementation (technical complexity introduced by custom token authentication), and giving us a lot of new options for controlling access from the browser client. Reminder: if you are working through this article with the sample application, be sure to clear your browser cache of cookies and HTTP Basic credentials. In Chrome the best way to do that for a single server is to open a new incognito window. Creating an API Gateway An API Gateway is a single point of entry (and control) for front end clients, which could be browser based (like the examples in this article) or mobile. The client only has to know the URL of one server, and the backend can be refactored at will with no change, which is a significant advantage. There are other advantages in terms of centralization and control: rate limiting, authentication, auditing and logging. And implementing a simple reverse proxy is really simple with Spring Cloud. If you were following along in the code, you will know that the application implementation at the end of the last article was a bit complicated, so it’s not a great place to iterate away from. There was, however, a halfway point which we could start from more easily, where the backend resource wasn’t yet secured with Spring Security. The source code for this is a separate project in Github so we are going to start from there. It has a UI server and a resource server and they are talking to each other. The resource server doesn’t have Spring Security yet so we can get the system working first and then add that layer. Declarative Reverse Proxy in One Line To turn it into an API Gateawy, the UI server needs one small tweak. Somewhere in the Spring configuration we need to add an @EnableZuulProxy annotation, e.g. in the main (only)application class: @SpringBootApplication @RestController @EnableZuulProxy public class UiApplication { ... } and in an external configuration file we need to map a local resource in the UI server to a remote one in the external configuration (“application.yml”): security: ... zuul: routes: resource: path: /resource/** url: http://localhost:9000 This says “map paths with the pattern /resource/** in this server to the same paths in the remote server at localhost:9000”. Simple and yet effective (OK so it’s 6 lines including the YAML, but you don’t always need that)! All we need to make this work is the right stuff on the classpath. For that purpose we have a few new lines in our Maven POM: org.springframework.cloud spring-cloud-starter-parent 1.0.0.BUILD-SNAPSHOT pom import org.springframework.cloud spring-cloud-starter-zuul ... Note the use of the “spring-cloud-starter-zuul” - it’s a starter POM just like the Spring Boot ones, but it governs the dependencies we need for this Zuul proxy. We are also using because we want to be able to depend on all the versions of transitive dependencies being correct. Consuming the Proxy in the Client With those changes in place our application still works, but we haven’t actually used the new proxy yet until we modify the client. Fortunately that’s trivial. We just need to go from this implementation of the “home” controller: angular.module('hello', [ 'ngRoute' ]) ... .controller('home', function($scope, $http) { $http.get('http://localhost:9000/').success(function(data) { $scope.greeting = data; }) }); to a local resource: angular.module('hello', [ 'ngRoute' ]) ... .controller('home', function($scope, $http) { $http.get('resource/').success(function(data) { $scope.greeting = data; }) }); Now when we fire up the servers everything is working and the requests are being proxied through the UI (API Gateway) to the resource server. Further Simplifications Even better: we don’t need the CORS filter any more in the resource server. We threw that one together pretty quickly anyway, and it should have been a red light that we had to do anything as technically focused by hand (especially where it concerns security). Fortunately it is now redundant, so we can just throw it away, and go back to sleeping at night! Securing the Resource Server You might remember in the intermediate state that we started from there is no security in place for the resource server. Aside: Lack of software security might not even be a problem if your network architecture mirrors the application architecture (you can just make the resource server physically inaccessible to anyone but the UI server). As a simple demonstration of that we can make the resource server only accessible on localhost. Just add this to application.properties in the resource server: server.address: 127.0.0.1 Wow, that was easy! Do that with a network address that’s only visible in your data center and you have a security solution that works for all resource servers and all user desktops. Suppose that we decide we do need security at the software level (quite likely for a number of reasons). That’s not going to be a problem, because all we need to do is add Spring Security as a dependency (in the resource server POM): org.springframework.boot spring-boot-starter-security That’s enough to get us a secure resource server, but it won’t get us a working application yet, for the same reason that it didn’t in Part III: there is no shared authentication state between the two servers. Sharing Authentication State We can use the same mechanism to share authentication (and CSRF) state as we did in the last, i.e. Spring Session. We add the dependency to both servers as before: org.springframework.session spring-session 1.0.0.RELEASE org.springframework.boot spring-boot-starter-redis but this time the configuration is much simpler because we can just add the same Filterdeclaration to both. First the UI server (adding @EnableRedisHttpSession): @SpringBootApplication @RestController @EnableZuulProxy @EnableRedisHttpSession public class UiApplication { ... } and then the resource server. There are two changes to make: one is adding@EnableRedisHttpSession and a HeaderHttpSessionStrategy bean to theResourceApplication: @SpringBootApplication @RestController @EnableRedisHttpSession class ResourceApplication { ... @Bean HeaderHttpSessionStrategy sessionStrategy() { new HeaderHttpSessionStrategy(); } } and the other is to explicitly ask for a non-stateless session creation policy inapplication.properties: security.sessions: NEVER As long as redis is still running in the background (use the fig.yml if you like to start it) then the system will work. Load the homepage for the UI at http://localhost:8080 and login and you will see the message from the backend rendered on the homepage. How Does it Work? What is going on behind the scenes now? First we can look at the HTTP requests in the UI server (and API Gateway): VERB PATH STATUS RESPONSE GET / 200 index.html GET /css/angular-bootstrap.css 200 Twitter bootstrap CSS GET /js/angular-bootstrap.js 200 Bootstrap and Angular JS GET /js/hello.js 200 Application logic GET /user 302 Redirect to login page GET /login 200 Whitelabel login page (ignored) GET /resource 302 Redirect to login page GET /login 200 Whitelabel login page (ignored) GET /login.html 200 Angular login form partial POST /login 302 Redirect to home page (ignored) GET /user 200 JSON authenticated user GET /resource 200 (Proxied) JSON greeting That’s identical to the sequence at the end of Part II except for the fact that the cookie names are slightly different (“SESSION” instead of “JSESSIONID”) because we are using Spring Session. But the architecture is different and that last request to “/resource” is special because it was proxied to the resource server. We can see the reverse proxy in action by looking at the “/trace” endpoint in the UI server (from Spring Boot Actuator, which we added with the Spring Cloud dependencies). Go tohttp://localhost:8080/trace in a browser and scroll to the end (if you don’t have one already get a JSON plugin for your browser to make it nice and readable). You will need to authenticate with HTTP Basic (browser popup), but the same credentials are valid as for your login form. At or near the end you should see a pair of requests something like this: { "timestamp": 1420558194546, "info": { "method": "GET", "path": "/", "query": "" "remote": true, "proxy": "resource", "headers": { "request": { "accept": "application/json, text/plain, */*", "x-xsrf-token": "542c7005-309c-4f50-8a1d-d6c74afe8260", "cookie": "SESSION=c18846b5-f805-4679-9820-cd13bd83be67; XSRF-TOKEN=542c7005-309c-4f50-8a1d-d6c74afe8260", "x-forwarded-prefix": "/resource", "x-forwarded-host": "localhost:8080" }, "response": { "Content-Type": "application/json;charset=UTF-8", "status": "200" } }, } }, { "timestamp": 1420558200232, "info": { "method": "GET", "path": "/resource/", "headers": { "request": { "host": "localhost:8080", "accept": "application/json, text/plain, */*", "x-xsrf-token": "542c7005-309c-4f50-8a1d-d6c74afe8260", "cookie": "SESSION=c18846b5-f805-4679-9820-cd13bd83be67; XSRF-TOKEN=542c7005-309c-4f50-8a1d-d6c74afe8260" }, "response": { "Content-Type": "application/json;charset=UTF-8", "status": "200" } } } }, The second entry there is the request from the client to the gateway on “/resource” and you can see the cookies (added by the browser) and the CSRF header (added by Angular as discussed inPart II). The first entry has remote: true and that means it’s tracing the call to the resource server. You can see it went out to a uri path “/” and you can see that (crucially) the cookies and CSRF headers have been sent too. Without Spring Session these headers would be meaningless to the resource server, but the way we have set it up it can now use those headers to re-constitute a session with authentication and CSRF token data. So the request is permitted and we are in business! Conclusion We covered quite a lot in this article but we got to a really nice place where there is a minimal amount of boilerplate code in our two servers, they are both nicely secure and the user experience isn’t compromised. That alone would be a reason to use the API Gateway pattern, but really we have only scratched the surface of what that might be used for (Netflix uses it for a lot of things). Read up on Spring Cloud to find out more on how to make it easy to add more features to the gateway. The next article in this series will extend the application architecture a bit by extracting the authentication responsibilities to a separate server (the Single Sign On pattern).
February 9, 2015
by Pieter Humphrey
· 16,292 Views
  • Previous
  • ...
  • 490
  • 491
  • 492
  • 493
  • 494
  • 495
  • 496
  • 497
  • 498
  • 499
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×