The Latest Databases Topics

How to Set Up a Multi-Node Hadoop Cluster on Amazon EC2, Part 1
Learn how to set up a four node Hadoop cluster using AWS EC2, PuTTy(gen), and WinSCP.
January 23, 2014
by Hardik Pandya
· 135,934 Views · 3 Likes
Big Data Search, Part 2: Setting Up
The interesting thing about this problem is that I was very careful in how I phrased things. I said what I wanted to happen, but didn’t specify what needs to be done. That was quite intentional. For that matter, the fact that I am posting about what is going to be our acceptance criteria is also intentional. The idea is to have a non-trivial task, but one that should be very well understood and easy to research. It also means that the candidate needs to be able to write some non-trivial code, and I can tell a lot about a dev from such a project. At the same time, this is a very self-contained scenario: something that you can do in a short amount of time.
The reason that this is an interesting exercise is that it is actually at least two totally different but related problems. First, in a 15 TB file, we obviously cannot rely on just scanning the entire file. That means that we have to have an index, and that means we have to build it. Interestingly enough, an index being a sorted structure, that means that we have to solve the problem of sorting more data than can fit in main memory. The second problem is probably easier, since it is just an implementation of external sort, and there are plenty of algorithms around to handle that.
Note that I am not really interested in actual efficiencies for this particular scenario. I care about being able to see the code, see that it works, etc. My solution, for example, is a single-threaded system that makes no attempt at parallelism or I/O optimizations. It clocks in at over 1 GB / minute, and memory consumption is under 150 MB. Queries for a unique value return the result in 0.0004 seconds. Queries that returned 153K results completed in about 2 seconds. When increasing the used memory to about 650 MB, there isn’t really any difference in performance, which surprised me a bit. Then again, the entire code is probably highly inefficient. But that is good enough for now.
The process is kicked off with indexing:
    var options = new DirectoryExternalStorageOptions("/path/to/index/files");
    var input = File.OpenRead(@"/path/to/data/crimes_-_2001_to_present.csv");
    var sorter = new ExternalSorter(input, options, new int[]
    {
        1, // case number
        4, // IUCR
    });

    sorter.Sort();
I am actually using the Chicago crime data for this. This is a 1 GB file that I downloaded from the Chicago city portal in CSV format. [Image: sample of the data]
The ExternalSorter will read and parse the file, and start reading it into a buffer. When the buffer reaches a certain size (usually about 64 MB of source data), it sorts the values in memory and writes them to a temporary file. [Image: sample of a temporary file]
Initially, I tried to do that with binary data, but that turned out to be too complex to be easy, and writing this in a human-readable format made it much easier to work with. The format is pretty simple: the value is on the left, and on the right is the start position of the row for this value.
We generate about 17 such temporary files for the 1 GB file, one temporary file for each 64 MB of the original file. This lets us keep our actual memory consumption very low, but for larger data sets, we’ll probably want to do the sort every 1 GB or maybe more. Our test machine has 16 GB of RAM, so doing a sort and outputting a temporary file every 8 GB can be a good way to handle things. But that is beside the point.
The end result is that we have multiple sorted files, but they aren’t sequential. In other words, in file #1 we have values 1,4,6,8 and in file #2 we have 1,2,6,7. We need to merge all of them together. Luckily, this is easy enough to do: we basically have a heap that we feed entries from the files into, and that pretty much takes care of this. See merge sort if you want more details about this.
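The heap-based merge mentioned above can be sketched as follows. This is a minimal illustration and not the author's code: it is written in Java rather than C#, it merges in-memory lists instead of temporary files, and the names KWayMerge and Head are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: merge several sorted runs with a min-heap.
final class KWayMerge {
    // One heap entry: the current value of a run and the iterator that yields the rest of it.
    private static final class Head {
        final int value;
        final Iterator<Integer> rest;

        Head(int value, Iterator<Integer> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    static List<Integer> merge(List<List<Integer>> sortedRuns) {
        PriorityQueue<Head> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.value, b.value));
        // Seed the heap with the first value of every non-empty run.
        for (List<Integer> run : sortedRuns) {
            Iterator<Integer> it = run.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            // The smallest remaining value across all runs is always at the top.
            Head smallest = heap.poll();
            merged.add(smallest.value);
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // The two example files from the text.
        List<Integer> merged = merge(Arrays.asList(
                Arrays.asList(1, 4, 6, 8),
                Arrays.asList(1, 2, 6, 7)));
        System.out.println(merged); // [1, 1, 2, 4, 6, 6, 7, 8]
    }
}
```

Only one value per run sits in the heap at any time, which is why the merge phase needs very little memory regardless of the total data size.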
The end result of merging all of those files is… another file, just like them, that contains all of the data sorted. Then it is time to handle the other issue: actually searching the data. We can do that using a simple binary search, with the caveat that because this is a text file, with no fixed-size records or pages, it is a bit hard to figure out where to start reading. In effect, what I am doing is selecting an arbitrary byte position, then walking backward until I find a ‘\n’. Once I have found the newline character, I can read the full line, check the value, and decide where I need to look next. Assuming that I actually found my value, I can now go to the byte position of the value in the original file and read the original line, giving it to the user.
Assuming an indexing rate of 1 GB / minute, a 15 TB file would take about 10 days to index. There are ways around that as well, but I’ll touch on them in my next post. What all of this did was bring home just how much we usually don’t have to worry about such things. But I consider this research time well spent; we’ll be using this in the future.
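The binary search over a text index described in this post can be sketched roughly as below. This is an assumption-laden illustration in Java, not the author's implementation: the index is held in a byte array rather than read from disk, each line is assumed to have the hypothetical form "value,position", values are compared as plain strings, and only one match is returned even if the value occurs several times.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: binary search over newline-separated "value,position" lines.
final class SortedLineSearch {
    // Returns the row position stored next to the target value, or -1 if it is absent.
    static long find(byte[] index, String target) {
        int lo = 0;            // always the first byte of some line
        int hi = index.length; // exclusive upper bound
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            // Walk backward from the arbitrary position to the start of its line.
            int start = mid;
            while (start > 0 && index[start - 1] != '\n') {
                start--;
            }
            // Walk forward to the end of that line.
            int end = start;
            while (end < index.length && index[end] != '\n') {
                end++;
            }
            String line = new String(index, start, end - start, StandardCharsets.UTF_8);
            int comma = line.lastIndexOf(',');
            int cmp = line.substring(0, comma).compareTo(target);
            if (cmp == 0) {
                return Long.parseLong(line.substring(comma + 1));
            }
            if (cmp < 0) {
                lo = end + 1; // continue after this line
            } else {
                hi = start;   // continue before this line
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] index = "HA100,0\nHB200,57\nHC300,120\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(find(index, "HB200")); // 57
        System.out.println(find(index, "HZ999")); // -1
    }
}
```

With a real file the same loop would seek to the chosen byte position (for example with a random access file) instead of indexing into an array.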
January 21, 2014
by Oren Eini
· 3,447 Views
Node.js and N1QL
This post was originally written by Brett Lawson.
So, recently I added support to our Node.js client for executing N1QL queries against your cluster, provided you are running an instance of the N1QL engine (to get hold of the updated version of the Node.js client with this support, point npm to our GitHub master branch at https://github.com/couchbase/couchnode). When I implemented it, I didn’t have very much to test against at the time, so I figured it would be an interesting endeavor to see how nice the Node.js beer-sample example would look if we used N1QL queries exclusively rather than any views. I started by converting the basic queries, which simply selected all beers or breweries from the sample data, and then moved on to converting the live-search querying to use N1QL as well. I figured I would write a little blog post on the conversions and make some remarks about what I noticed along the way.
Here is our first query:
    var q = {
      limit : ENTRIES_PER_PAGE,
      stale : false
    };
    db.view( "beer", "by_name", q).query(function(err, values) {
      var keys = _.pluck(values, 'id');
      db.getMulti( keys, null, function(err, results) {
        var beers = _.map(results, function(v, k) {
          v.value.id = k;
          return v.value;
        });
        res.render('beer/index', {'beers':beers});
      })
    });
and the converted version:
    db.query(
      "SELECT META().id AS id, * FROM beer-sample WHERE type='beer' LIMIT " + ENTRIES_PER_PAGE,
      function(err, beers) {
        res.render('beer/index', {'beers':beers});
      });
As you can see, we no longer need to do two separate operations to retrieve the list. We can execute our N1QL query, which returns all the information that we need and formats it appropriately; rather than needing to reformat the data and add our id values, we can simply select them as part of the result set. I find the N1QL version here much more concise and appreciate how simple it was to construct the query.
I then converted the brewery listing function following a similar path, and here is what I ended up with. As you can see, it is similarly beautiful and concise:
    db.query(
      "SELECT META().id AS id, name FROM beer-sample WHERE type='brewery' LIMIT " + ENTRIES_PER_PAGE,
      function(err, breweries) {
        res.render('brewery/index', {'breweries':breweries});
      });
Next I converted the searching methods. These were a bit more of a challenge: looking at the original code directly, without thinking about what it was trying to achieve, the semantics were not immediately obvious. Here is what it looked like:
    var q = {
      startkey : value,
      endkey : value + JSON.parse('"\u0FFF"'),
      stale : false,
      limit : ENTRIES_PER_PAGE
    }
    db.view( "beer", "by_name", q).query(function(err, values) {
      var keys = _.pluck(values, 'id');
      db.getMulti( keys, null, function(err, results) {
        var beers = [];
        for(var k in results) {
          beers.push({
            'id': k,
            'name': results[k].value.name,
            'brewery_id': results[k].value.brewery_id
          });
        }
        res.send(beers);
      });
    });
Again, we have quite a bit of code to achieve something that you would expect to be quite simple. In case you can’t tell, the map/reduce query above retrieves a listing of beers whose names begin with the value entered by the user. We are going to convert this to a N1QL LIKE clause, and as an added bonus, we will allow the search term to appear anywhere in the string, instead of requiring it at the beginning:
    db.query(
      "SELECT META().id, name, brewery_id FROM beer-sample WHERE type='beer' AND LOWER(name) LIKE '%" + term + "%' LIMIT " + ENTRIES_PER_PAGE,
      function(err, beers) {
        res.send(beers);
      });
We have again collapsed a large amount of vaguely understandable code down to a simple and concise query. I believe this begins to show the power of N1QL and why I am personally so excited to see it.
There is, however, one caveat I noticed while doing this: similar to SQL, you need to be careful about what kind of user data you are passing into your queries. I wrote a simple cleaning function to try to prevent any malicious intent (though N1QL is currently read-only anyway), but my cleaning code is by no means extensive. Another issue I noticed is that our second query with the LIKE clause executed significantly slower as a N1QL query than it did when using map/reduce. I believe this is simply a result of N1QL still being in developer preview, and there are lots of optimizations left to be done by the N1QL team. If you want to see the fully converted source code, take a look at the n1ql branch of the beersample-node repository, available at https://github.com/couchbaselabs/beersample-node/tree/n1ql.
Thanks! Brett
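The cleaning function mentioned in the caveat above is not shown in the post. One conservative way to approach it is an allow-list that keeps only letters, digits, and spaces before the term is concatenated into the query string. The sketch below is an assumption for illustration only: it is written in Java rather than the post's JavaScript, the name SearchTermCleaner is hypothetical, and it is not the author's cleaning code.

```java
// Hypothetical sketch: allow-list cleaning of a user-supplied search term.
final class SearchTermCleaner {
    // Keeps letters, digits and spaces; drops quotes, wildcards and every other character.
    // The result is lower-cased because the query compares against LOWER(name).
    static String clean(String term) {
        StringBuilder cleaned = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (Character.isLetterOrDigit(c) || c == ' ') {
                cleaned.append(Character.toLowerCase(c));
            }
        }
        return cleaned.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("IPA' OR '1'='1")); // ipa or 11
        System.out.println(clean("50%_off"));        // 50off
    }
}
```

An allow-list is deliberately strict: it also removes the LIKE wildcards % and _, so users cannot widen the pattern themselves. Parameterized queries, where the client supports them, are generally a more robust option than string cleaning.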
January 17, 2014
by Don Pinto
· 7,904 Views · 1 Like
A Beginner's Guide to ACID and Database Transactions
Read the original article here.
January 7, 2014
by Vlad Mihalcea
· 20,396 Views
CGLib: The Missing Manual
The byte code instrumentation library cglib is a popular choice among many well-known Java frameworks such as Hibernate (not anymore) or Spring for doing their dirty work. Byte code instrumentation makes it possible to manipulate or create classes after the compilation phase of a Java application. Since Java classes are linked dynamically at run time, it is possible to add new classes to an already running Java program. Hibernate, for example, uses cglib to generate dynamic proxies. Instead of returning the full object that you stored in a database, Hibernate returns an instrumented version of your stored class that lazily loads some values from the database only when they are requested. Spring uses cglib, for example, when adding security constraints to your method calls. Instead of calling your method directly, Spring Security first checks whether a specified security check passes and only delegates to your actual method after this verification. Another popular use of cglib is within mocking frameworks such as Mockito, where mocks are nothing more than instrumented classes whose methods were replaced with empty implementations (plus some tracking logic).
Unlike ASM, a very low-level byte code manipulation library on top of which cglib is built, cglib offers rather high-level byte code transformers that can be used without even knowing the details of a compiled Java class. Unfortunately, the documentation of cglib is rather short, not to say that there is basically none. Besides a single blog article from 2005 that demonstrates the Enhancer class, there is not much to find. This blog article is an attempt to demonstrate cglib and its unfortunately often awkward API.
Enhancer
Let's start with the Enhancer class, probably the most used class of the cglib library. An enhancer allows the creation of Java proxies for non-interface types. The Enhancer can be compared with the Java standard library's Proxy class, which was introduced in Java 1.3.
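For comparison, here is a minimal sketch of the JDK's own Proxy class, which can only proxy interfaces. The interface Greeter and the class JdkProxyExample are hypothetical names chosen for this illustration; only java.lang.reflect.Proxy and InvocationHandler come from the standard library.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical sketch: a JDK dynamic proxy that answers every String method with a fixed value.
final class JdkProxyExample {
    // The JDK Proxy class needs an interface to implement; it cannot subclass a concrete class.
    public interface Greeter {
        String test(String input);
    }

    static Greeter fixedValueProxy(String value) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getReturnType() == String.class) {
                return value;
            }
            throw new UnsupportedOperationException(method.getName());
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] {Greeter.class},
                handler);
    }

    public static void main(String[] args) {
        Greeter greeter = fixedValueProxy("Hello proxy!");
        System.out.println(greeter.test(null)); // Hello proxy!
    }
}
```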
The Enhancer dynamically creates a subclass of a given type but intercepts all method calls. Unlike the Proxy class, this works for both class and interface types. The following example and some of the examples after it are based on this simple Java POJO:
    public class SampleClass {
      public String test(String input) {
        return "Hello world!";
      }
    }
Using cglib, the return value of the test(String) method can easily be replaced by another value using an Enhancer and a FixedValue callback:
    @Test
    public void testFixedValue() throws Exception {
      Enhancer enhancer = new Enhancer();
      enhancer.setSuperclass(SampleClass.class);
      enhancer.setCallback(new FixedValue() {
        @Override
        public Object loadObject() throws Exception {
          return "Hello cglib!";
        }
      });
      SampleClass proxy = (SampleClass) enhancer.create();
      assertEquals("Hello cglib!", proxy.test(null));
    }
In the above example, the enhancer returns an instance of an instrumented subclass of SampleClass where all method calls return a fixed value, which is generated by the anonymous FixedValue implementation above. The object is created by Enhancer#create(Object...), where the method takes any number of arguments which are used to pick a constructor of the enhanced class. (Even though constructors are only methods on the Java byte code level, the Enhancer class cannot instrument constructors. Neither can it instrument static or final classes.) If you only want to create a class, but no instance, Enhancer#createClass will create a Class instance which can be used to create instances dynamically. All constructors of the enhanced class will be available as delegation constructors in this dynamically generated class.
Be aware that any method call will be delegated in the above example, including calls to the methods defined in java.lang.Object. As a result, a call to proxy.toString() will also return "Hello cglib!".
In contrast, a call to proxy.hashCode() will result in a ClassCastException since the FixedValue interceptor always returns a String even though the Object#hashCode signature requires a primitive integer.
Another observation that can be made is that final methods are not intercepted. An example of such a method is Object#getClass, which will return something like "SampleClass$$EnhancerByCGLIB$$e277c63c" when it is invoked. This class name is generated randomly by cglib in order to avoid naming conflicts. Be aware of the different class of the enhanced instance when you are making use of explicit types in your program code. The class generated by cglib will, however, be in the same package as the enhanced class (and therefore be able to override package-private methods). Similar to final methods, the subclassing approach makes it impossible to enhance final classes. Therefore, frameworks such as Hibernate cannot persist final classes.
Next, let us look at a more powerful callback class, the InvocationHandler, that can also be used with an Enhancer:
    @Test
    public void testInvocationHandler() throws Exception {
      Enhancer enhancer = new Enhancer();
      enhancer.setSuperclass(SampleClass.class);
      enhancer.setCallback(new InvocationHandler() {
        @Override
        public Object invoke(Object proxy, Method method, Object[] args)
            throws Throwable {
          if(method.getDeclaringClass() != Object.class
              && method.getReturnType() == String.class) {
            return "Hello cglib!";
          } else {
            throw new RuntimeException("Do not know what to do.");
          }
        }
      });
      SampleClass proxy = (SampleClass) enhancer.create();
      assertEquals("Hello cglib!", proxy.test(null));
      assertNotEquals("Hello cglib!", proxy.toString());
    }
This callback allows us to answer depending on the invoked method. However, you should be careful when calling a method on the proxy object that is passed to the InvocationHandler#invoke method. All calls on this object will be dispatched to the same InvocationHandler and might therefore result in an endless loop.
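The re-dispatch pitfall can be reproduced without cglib, because the JDK's InvocationHandler behaves the same way in this respect. The sketch below is an illustration under that assumption: the names ProxyReentryExample and Greeter are hypothetical, and a depth counter is added so the demonstration stops instead of overflowing the stack.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical sketch: calling a method on the proxy from inside the handler re-enters the handler.
final class ProxyReentryExample {
    public interface Greeter {
        String greet();
    }

    static String callThroughProxy(int maxDepth) {
        int[] depth = {0}; // counts how often the handler was entered
        InvocationHandler handler = (proxy, method, args) -> {
            depth[0]++;
            if (depth[0] < maxDepth) {
                // This call is dispatched to this very handler again.
                // Without the depth guard it would recurse until the stack overflows.
                return ((Greeter) proxy).greet();
            }
            return "stopped after " + depth[0] + " dispatches";
        };
        Greeter greeter = (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] {Greeter.class},
                handler);
        return greeter.greet();
    }

    public static void main(String[] args) {
        System.out.println(callThroughProxy(3)); // stopped after 3 dispatches
    }
}
```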
In order to avoid this, we can use yet another callback dispatcher:
    @Test
    public void testMethodInterceptor() throws Exception {
      Enhancer enhancer = new Enhancer();
      enhancer.setSuperclass(SampleClass.class);
      enhancer.setCallback(new MethodInterceptor() {
        @Override
        public Object intercept(Object obj, Method method, Object[] args, MethodProxy proxy)
            throws Throwable {
          if(method.getDeclaringClass() != Object.class
              && method.getReturnType() == String.class) {
            return "Hello cglib!";
          } else {
            // Delegate to the original implementation in the enhanced class.
            return proxy.invokeSuper(obj, args);
          }
        }
      });
      SampleClass proxy = (SampleClass) enhancer.create();
      assertEquals("Hello cglib!", proxy.test(null));
      assertNotEquals("Hello cglib!", proxy.toString());
      proxy.hashCode(); // Does not throw an exception or result in an endless loop.
    }
The MethodInterceptor allows full control over the intercepted method and offers some utilities for calling the method of the enhanced class in its original state. But why would one want to use the other callbacks at all? Because the other callbacks are more efficient, and cglib is often used in edge-case frameworks where efficiency plays a significant role. The creation and linkage of the MethodInterceptor requires, for example, the generation of a different type of byte code and the creation of some runtime objects that are not required with the InvocationHandler. Because of that, there are other classes that can be used with the Enhancer:
LazyLoader: Even though the LazyLoader's only method has the same method signature as FixedValue, the LazyLoader is fundamentally different from the FixedValue interceptor. The LazyLoader is actually supposed to return an instance of a subclass of the enhanced class. This instance is requested only when a method is called on the enhanced object and is then stored for future invocations of the generated proxy. This makes sense if your object is expensive to create and you do not know whether it will ever be used.
Be aware that some constructor of the enhanced class must be called both for the proxy object and for the lazily loaded object. Thus, make sure that there is another cheap (maybe protected) constructor available, or use an interface type for the proxy. You can choose the invoked constructor by supplying arguments to Enhancer#create(Object...).
Dispatcher: The Dispatcher is like the LazyLoader but will be invoked on every method call without storing the loaded object. This makes it possible to change the implementation of a class without changing the reference to it. Again, be aware that some constructor must be called for both the proxy and the generated objects.
ProxyRefDispatcher: This class carries a reference to the proxy object it is invoked from in its signature. This makes it possible, for example, to delegate method calls to another method of this proxy. Be aware that this can easily cause an endless loop and will always cause an endless loop if the same method is called from within ProxyRefDispatcher#loadObject(Object).
NoOp: The NoOp class does not do what its name suggests. Instead, it delegates each method call to the enhanced class's method implementation.
At this point, the last two interceptors might not make sense to you. Why would you even want to enhance a class when you will always delegate method calls to the enhanced class anyway? And you are right. These interceptors should only be used together with a CallbackFilter, as demonstrated in the following code snippet:
    @Test
    public void testCallbackFilter() throws Exception {
      Enhancer enhancer = new Enhancer();
      CallbackHelper callbackHelper = new CallbackHelper(SampleClass.class, new Class[0]) {
        @Override
        protected Object getCallback(Method method) {
          if(method.getDeclaringClass() != Object.class
              && method.getReturnType() == String.class) {
            return new FixedValue() {
              @Override
              public Object loadObject() throws Exception {
                return "Hello cglib!";
              }
            };
          } else {
            return NoOp.INSTANCE; // A singleton provided by NoOp.
          }
        }
      };
      enhancer.setSuperclass(SampleClass.class);
      enhancer.setCallbackFilter(callbackHelper);
      enhancer.setCallbacks(callbackHelper.getCallbacks());
      SampleClass proxy = (SampleClass) enhancer.create();
      assertEquals("Hello cglib!", proxy.test(null));
      assertNotEquals("Hello cglib!", proxy.toString());
      proxy.hashCode(); // Does not throw an exception or result in an endless loop.
    }
The Enhancer instance accepts a CallbackFilter in its Enhancer#setCallbackFilter(CallbackFilter) method, where it expects methods of the enhanced class to be mapped to indices of an array of Callback instances. When a method is invoked on the created proxy, the Enhancer will then choose the corresponding interceptor and dispatch the called method to that Callback (which is a marker interface for all the interceptors that were introduced so far). To make this API less awkward, cglib offers a CallbackHelper which represents a CallbackFilter and which can create an array of Callbacks for you. The enhanced object above is functionally equivalent to the one in the example for the MethodInterceptor, but it allows you to write specialized interceptors whilst keeping the dispatching logic to these interceptors separate.
How does it work?
When the Enhancer creates a class, it will create a private static field for each interceptor that was registered as a Callback for the enhanced class after its creation. This also means that class definitions that were created with cglib cannot be reused after their creation, since the registration of callbacks does not become a part of the generated class's initialization phase but is performed manually by cglib after the class was already initialized by the JVM. This also means that classes created with cglib are not technically ready after their initialization and, for example, cannot be sent over the wire since the callbacks would not exist for the class loaded on the target machine.
Depending on the registered interceptors, cglib might register additional fields. For the MethodInterceptor, for example, two private static fields (one holding a reflective Method and the other holding a MethodProxy) are registered per method that is intercepted in the enhanced class or any of its subclasses. Be aware that the MethodProxy makes excessive use of the FastClass, which triggers the creation of additional classes and is described in further detail below.
For all these reasons, be careful when using the Enhancer. And always register callback types defensively, since the MethodInterceptor will, for example, trigger the creation of additional classes and register additional static fields in the enhanced class. This is especially dangerous since the callback variables are also stored as static variables in the enhanced class: this implies that the callback instances are never garbage collected (unless their ClassLoader is, which is unusual). This is particularly dangerous when using anonymous classes, which silently carry a reference to their outer class. Recall the example above:
    @Test
    public void testFixedValue() throws Exception {
      Enhancer enhancer = new Enhancer();
      enhancer.setSuperclass(SampleClass.class);
      enhancer.setCallback(new FixedValue() {
        @Override
        public Object loadObject() throws Exception {
          return "Hello cglib!";
        }
      });
      SampleClass proxy = (SampleClass) enhancer.create();
      assertEquals("Hello cglib!", proxy.test(null));
    }
The anonymous subclass of FixedValue would become strongly referenced from the enhanced SampleClass, such that neither the anonymous FixedValue instance nor the class holding the @Test method would ever be garbage collected. This can introduce nasty memory leaks in your applications. Therefore, do not use non-static inner classes with cglib. (I only use them in this blog entry to keep the examples short.)
Finally, you should never intercept Object#finalize().
Due to the subclassing approach of cglib, intercepting finalize is implemented by overriding it, which is in general a bad idea. Enhanced instances that intercept finalize will be treated differently by the garbage collector and will also cause these objects to be queued in the JVM's finalization queue. Also, if you (accidentally) create a hard reference to the enhanced class in your intercepted call to finalize, you have effectively created a non-collectable instance. This is in general nothing you want. Note that final methods are never intercepted by cglib. Thus, Object#wait, Object#notify and Object#notifyAll do not impose the same problems. Be aware, however, that Object#clone can be intercepted, which is something you might not want to do.
Immutable Bean
cglib's ImmutableBean allows you to create an immutability wrapper similar to, for example, Collections#unmodifiableSet. All changes to the underlying bean through the wrapper will be prevented by an IllegalStateException (however, not by an UnsupportedOperationException as recommended by the Java API). Looking at some bean
    public class SampleBean {
      private String value;
      public String getValue() {
        return value;
      }
      public void setValue(String value) {
        this.value = value;
      }
    }
we can make this bean immutable:
    @Test(expected = IllegalStateException.class)
    public void testImmutableBean() throws Exception {
      SampleBean bean = new SampleBean();
      bean.setValue("Hello world!");
      SampleBean immutableBean = (SampleBean) ImmutableBean.create(bean);
      assertEquals("Hello world!", immutableBean.getValue());
      bean.setValue("Hello world, again!");
      assertEquals("Hello world, again!", immutableBean.getValue());
      immutableBean.setValue("Hello cglib!"); // Causes exception.
    }
As is obvious from the example, the immutable bean prevents all state changes by throwing an IllegalStateException. However, the state of the bean can still be changed through the original object. All such changes will be reflected by the ImmutableBean.
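The same "read-only view, mutable original" behavior can be observed with the standard library's Collections.unmodifiableSet, which the ImmutableBean is compared with above. The following sketch uses only JDK classes; the class name UnmodifiableViewExample is hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: an unmodifiable view rejects writes but reflects changes to its backing set.
final class UnmodifiableViewExample {
    // Returns true if the given set refuses to be modified.
    static boolean rejectsWrites(Set<String> set) {
        try {
            set.add("Hello cglib!");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Set<String> backing = new HashSet<>();
        backing.add("Hello world!");
        Set<String> view = Collections.unmodifiableSet(backing);

        // Changing the original is still possible and is visible through the view.
        backing.add("Hello world, again!");
        System.out.println(view.contains("Hello world, again!")); // true

        // Changing the view itself is rejected.
        System.out.println(rejectsWrites(view)); // true
    }
}
```

The difference noted in the text remains: the JDK view throws an UnsupportedOperationException, whereas the ImmutableBean throws an IllegalStateException.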
Bean Generator
The BeanGenerator is another bean utility of cglib. It will create a bean for you at run time:
    @Test
    public void testBeanGenerator() throws Exception {
      BeanGenerator beanGenerator = new BeanGenerator();
      beanGenerator.addProperty("value", String.class);
      Object myBean = beanGenerator.create();

      Method setter = myBean.getClass().getMethod("setValue", String.class);
      setter.invoke(myBean, "Hello cglib!");
      Method getter = myBean.getClass().getMethod("getValue");
      assertEquals("Hello cglib!", getter.invoke(myBean));
    }
As is obvious from the example, the BeanGenerator first takes some properties as name-type pairs. On creation, the BeanGenerator creates the accessors <type> get<name>() and void set<name>(<type>) for you. This might be useful when another library expects beans which it resolves by reflection but you do not know these beans at compile time. (An example would be Apache Wicket, which works a lot with beans.)
Bean Copier
The BeanCopier is another bean utility that copies beans by their property values. Consider another bean with similar properties to SampleBean:
    public class OtherSampleBean {
      private String value;
      public String getValue() {
        return value;
      }
      public void setValue(String value) {
        this.value = value;
      }
    }
Now you can copy properties from one bean to another:
    @Test
    public void testBeanCopier() throws Exception {
      BeanCopier copier = BeanCopier.create(SampleBean.class, OtherSampleBean.class, false);
      SampleBean bean = new SampleBean();
      bean.setValue("Hello cglib!");
      OtherSampleBean otherBean = new OtherSampleBean();
      copier.copy(bean, otherBean, null);
      assertEquals("Hello cglib!", otherBean.getValue());
    }
without being restricted to a specific type. The BeanCopier#copy method takes an optional Converter which makes it possible to do some further manipulation of each bean property. If the BeanCopier is created with false as the third argument, the Converter is ignored and can therefore be null.
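To make the idea of copying by property more concrete, here is a rough equivalent written with plain reflection. It is only a sketch of the concept: cglib generates byte code for this instead of using reflection, and the class ReflectiveBeanCopier with its nested beans is a hypothetical name for this illustration.

```java
import java.lang.reflect.Method;

// Hypothetical sketch: copy every getter value of one bean into the matching setter of another.
final class ReflectiveBeanCopier {
    public static class SampleBean {
        private String value;
        public String getValue() { return value; }
        public void setValue(String value) { this.value = value; }
    }

    public static class OtherSampleBean {
        private String value;
        public String getValue() { return value; }
        public void setValue(String value) { this.value = value; }
    }

    static void copy(Object source, Object target) throws ReflectiveOperationException {
        for (Method getter : source.getClass().getMethods()) {
            String name = getter.getName();
            // Only consider getX() accessors; getClass() is not a bean property.
            if (!name.startsWith("get") || name.equals("getClass")
                    || getter.getParameterCount() != 0) {
                continue;
            }
            try {
                Method setter = target.getClass()
                        .getMethod("set" + name.substring(3), getter.getReturnType());
                setter.invoke(target, getter.invoke(source));
            } catch (NoSuchMethodException e) {
                // The target has no matching setter, so this property is skipped.
            }
        }
    }

    public static void main(String[] args) throws Exception {
        SampleBean bean = new SampleBean();
        bean.setValue("Hello cglib!");
        OtherSampleBean otherBean = new OtherSampleBean();
        copy(bean, otherBean);
        System.out.println(otherBean.getValue()); // Hello cglib!
    }
}
```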
Bulk Bean
A BulkBean makes it possible to use a specified set of a bean's accessors through arrays instead of method calls:
    @Test
    public void testBulkBean() throws Exception {
      BulkBean bulkBean = BulkBean.create(SampleBean.class,
          new String[]{"getValue"},
          new String[]{"setValue"},
          new Class[]{String.class});
      SampleBean bean = new SampleBean();
      bean.setValue("Hello world!");
      assertEquals(1, bulkBean.getPropertyValues(bean).length);
      assertEquals("Hello world!", bulkBean.getPropertyValues(bean)[0]);
      bulkBean.setPropertyValues(bean, new Object[] {"Hello cglib!"});
      assertEquals("Hello cglib!", bean.getValue());
    }
The BulkBean takes an array of getter names, an array of setter names and an array of property types as the arguments of its factory method. A bean's property values can then be extracted as an array by BulkBean#getPropertyValues(Object). Similarly, a bean's properties can be set by BulkBean#setPropertyValues(Object, Object[]).
Bean Map
This is the last bean utility within the cglib library. The BeanMap converts all properties of a bean to a String-to-Object Java Map:
    @Test
    public void testBeanMap() throws Exception {
      SampleBean bean = new SampleBean();
      BeanMap map = BeanMap.create(bean);
      bean.setValue("Hello cglib!");
      assertEquals("Hello cglib!", map.get("value"));
    }
Additionally, the BeanMap#newInstance(Object) method makes it possible to create maps for other beans by reusing the same Class.
Key Factory
The KeyFactory allows the dynamic creation of keys that are composed of multiple values and that can be used in, for example, Map implementations. For doing so, the KeyFactory requires some interface that defines the values that should be used in such a key. This interface must contain a single method by the name newInstance that returns an Object.
For example:
    public interface SampleKeyFactory {
      Object newInstance(String first, int second);
    }
Now an instance of a key can be created by:
    @Test
    public void testKeyFactory() throws Exception {
      SampleKeyFactory keyFactory = (SampleKeyFactory) KeyFactory.create(SampleKeyFactory.class);
      Object key = keyFactory.newInstance("foo", 42);
      Map<Object, String> map = new HashMap<Object, String>();
      map.put(key, "Hello cglib!");
      assertEquals("Hello cglib!", map.get(keyFactory.newInstance("foo", 42)));
    }
The KeyFactory will ensure the correct implementation of the Object#equals(Object) and Object#hashCode methods such that the resulting key objects can be used in a Map or a Set. The KeyFactory is also used quite a lot internally in the cglib library.
Mixin
Some might already know the concept of the mixin from other programming languages such as Ruby or Scala (where mixins are called traits). cglib Mixins allow the combination of several objects into a single object. However, in order to do so, those objects must be backed by interfaces:
    public interface Interface1 {
      String first();
    }

    public interface Interface2 {
      String second();
    }

    public class Class1 implements Interface1 {
      @Override
      public String first() {
        return "first";
      }
    }

    public class Class2 implements Interface2 {
      @Override
      public String second() {
        return "second";
      }
    }
Now the classes Class1 and Class2 can be combined into a single class by an additional interface:
    public interface MixinInterface extends Interface1, Interface2 { /* empty */ }

    @Test
    public void testMixin() throws Exception {
      Mixin mixin = Mixin.create(new Class[]{Interface1.class, Interface2.class,
          MixinInterface.class}, new Object[]{new Class1(), new Class2()});
      MixinInterface mixinDelegate = (MixinInterface) mixin;
      assertEquals("first", mixinDelegate.first());
      assertEquals("second", mixinDelegate.second());
    }
Admittedly, the Mixin API is rather awkward since it requires the classes used for a mixin to implement some interface, such that the problem could also be solved by non-instrumented Java.
String Switcher
The StringSwitcher emulates a String-to-int Java Map:
    @Test
    public void testStringSwitcher() throws Exception {
      String[] strings = new String[]{"one", "two"};
      int[] values = new int[]{10, 20};
      StringSwitcher stringSwitcher = StringSwitcher.create(strings, values, true);
      assertEquals(10, stringSwitcher.intValue("one"));
      assertEquals(20, stringSwitcher.intValue("two"));
      assertEquals(-1, stringSwitcher.intValue("three"));
    }
The StringSwitcher makes it possible to emulate a switch command on Strings, as is possible with the built-in Java switch statement since Java 7. Whether using the StringSwitcher in Java 6 or earlier really adds a benefit to your code remains doubtful, however, and I would personally not recommend its use.
Interface Maker
The InterfaceMaker does what its name suggests: it dynamically creates a new interface.
    @Test
    public void testInterfaceMaker() throws Exception {
      Signature signature = new Signature("foo", Type.DOUBLE_TYPE, new Type[]{Type.INT_TYPE});
      InterfaceMaker interfaceMaker = new InterfaceMaker();
      interfaceMaker.add(signature, new Type[0]);
      Class iface = interfaceMaker.create();
      assertEquals(1, iface.getMethods().length);
      assertEquals("foo", iface.getMethods()[0].getName());
      assertEquals(double.class, iface.getMethods()[0].getReturnType());
    }
Unlike any other class of cglib's public API, the interface maker relies on ASM types. The creation of an interface in a running application will hardly make sense since an interface only represents a type which can be used by a compiler to check types. It can, however, make sense when you are generating code that is to be used in later development.
Method Delegate
A MethodDelegate makes it possible to emulate a C#-like delegate to a specific method by binding a method call to some interface.
For example, the following code would bind the SampleBean#getValue method to a delegate: public interface BeanDelegate { String getValueFromDelegate(); } @Test public void testMethodDelegate() throws Exception { SampleBean bean = new SampleBean(); bean.setValue("Hello cglib!"); BeanDelegate delegate = (BeanDelegate) MethodDelegate.create( bean, "getValue", BeanDelegate.class); assertEquals("Hello cglib!", delegate.getValueFromDelegate()); } There are however some things to note: The factory method MethodDelegate#create takes exactly one method name as its second argument. This is the method the MethodDelegate will proxy for you. There must be a method without arguments defined for the object which is given to the factory method as its first argument. Thus, the MethodDelegate is not as strong as it could be. The third argument must be an interface with exactly one method. The MethodDelegate implements this interface and can be cast to it. When the method is invoked, it will call the proxied method on the object that is the first argument. Furthermore, consider these drawbacks: cglib creates a new class for each proxy. Eventually, this will litter up your permanent generation heap space. You cannot proxy methods that take arguments. If your interface takes arguments, the method delegation will simply not work without an exception thrown (the return value will always be null). If your interface requires another return type (even if that is more general), you will get an IllegalArgumentException. Multicast Delegate The MulticastDelegate works a little differently than the MethodDelegate even though it aims at similar functionality. 
To use the MulticastDelegate, we require an object that implements an interface: public interface DelegatationProvider { void setValue(String value); } public class SimpleMulticastBean implements DelegatationProvider { private String value; public String getValue() { return value; } public void setValue(String value) { this.value = value; } } Based on this interface-backed bean we can create a MulticastDelegate that dispatches all calls to setValue(String) to several objects that implement the DelegatationProvider interface: @Test public void testMulticastDelegate() throws Exception { MulticastDelegate multicastDelegate = MulticastDelegate.create( DelegatationProvider.class); SimpleMulticastBean first = new SimpleMulticastBean(); SimpleMulticastBean second = new SimpleMulticastBean(); multicastDelegate = multicastDelegate.add(first); multicastDelegate = multicastDelegate.add(second); DelegatationProvider provider = (DelegatationProvider)multicastDelegate; provider.setValue("Hello world!"); assertEquals("Hello world!", first.getValue()); assertEquals("Hello world!", second.getValue()); } Again, there are some drawbacks: The objects need to implement a single-method interface. This sucks for third-party libraries and is awkward when you use cglib to do some magic where this magic gets exposed to the normal code. Also, you could implement your own delegate easily (without byte code though, but I doubt that you win so much over manual delegation). When your delegates return a value, you will receive only that of the last delegate you added. All other return values are lost (but retrieved at some point by the multicast delegate). Constructor Delegate A ConstructorDelegate allows you to create a byte-instrumented factory method. For that, we first require an interface with a single method newInstance which returns an Object and takes any number of parameters to be used for a constructor call of the specified class. 
For example, in order to create a ConstructorDelegate for the SampleBean, we require the following to call SampleBean's default (no-argument) constructor: public interface SampleBeanConstructorDelegate { Object newInstance(); } @Test public void testConstructorDelegate() throws Exception { SampleBeanConstructorDelegate constructorDelegate = (SampleBeanConstructorDelegate) ConstructorDelegate.create( SampleBean.class, SampleBeanConstructorDelegate.class); SampleBean bean = (SampleBean) constructorDelegate.newInstance(); assertTrue(SampleBean.class.isAssignableFrom(bean.getClass())); } Parallel Sorter The ParallelSorter claims to be a faster alternative to the Java standard library's array sorters when sorting arrays of arrays: @Test public void testParallelSorter() throws Exception { Integer[][] value = { {4, 3, 9, 0}, {2, 1, 6, 0} }; ParallelSorter.create(value).mergeSort(0); for(Integer[] row : value) { int former = -1; for(int val : row) { assertTrue(former < val); former = val; } } } The ParallelSorter takes an array of arrays and allows you to apply either a merge sort or a quick sort on every row of the array. Be careful, however, when you use it: When using arrays of primitives, you have to call merge sort with explicit sorting ranges (e.g. ParallelSorter.create(value).mergeSort(0, 0, 3) in the example). Otherwise, the ParallelSorter has a pretty obvious bug where it tries to cast the primitive array to an Object[] array, which will cause a ClassCastException. If the array rows are uneven, the first argument will determine the length of what row to consider. Uneven rows will either lead to the extra values not being considered for sorting or to an ArrayIndexOutOfBoundsException. Personally, I doubt that the ParallelSorter really offers a time advantage. Admittedly, I did however not yet try to benchmark it. If you tried it, I'd be happy to hear about it in the comments. 
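For anyone taking up the benchmarking invitation, the standard-library baseline to compare against is short. The sketch below is plain Java without cglib; the names RowSortBaseline, sortRows and isStrictlyIncreasing are made up for illustration, and it simply sorts each row independently with Arrays.sort and applies the same "every value exceeds its predecessor" check as the test above.

```java
import java.util.Arrays;

// Plain-Java baseline (no cglib): sort every row of a two-dimensional array.
// All names here are illustrative and not part of cglib.
class RowSortBaseline {

    // Sorts each row in place using the standard library.
    static void sortRows(Integer[][] rows) {
        for (Integer[] row : rows) {
            Arrays.sort(row);
        }
    }

    // Mirrors the check from the test above: every value must exceed its predecessor.
    static boolean isStrictlyIncreasing(Integer[] row) {
        for (int i = 1; i < row.length; i++) {
            if (row[i - 1] >= row[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Integer[][] value = { {4, 3, 9, 0}, {2, 1, 6, 0} };
        sortRows(value);
        System.out.println(Arrays.deepToString(value));
    }
}
```

Timing this loop against ParallelSorter on realistically sized arrays would be one way to check the doubt expressed above; no measurements are claimed here.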
Fast Class and Fast Members The FastClass promises a faster invocation of methods than the Java reflection API by wrapping a Java class and offering similar methods to the reflection API: @Test public void testFastClass() throws Exception { FastClass fastClass = FastClass.create(SampleBean.class); FastMethod fastMethod = fastClass.getMethod(SampleBean.class.getMethod("getValue")); SampleBean myBean = new SampleBean(); myBean.setValue("Hello cglib!"); assertEquals("Hello cglib!", fastMethod.invoke(myBean, new Object[0])); } Besides the demonstrated FastMethod, the FastClass can also create FastConstructors but no fast fields. But how can the FastClass be faster than normal reflection? Java reflection is executed by JNI where method invocations are executed by some C-code. The FastClass on the other side creates some byte code that calls the method directly from within the JVM. However, the newer versions of the HotSpot JVM (and probably many other modern JVMs) know a concept called inflation where the JVM will translate reflective method calls into native versions of FastClass when a reflective method is executed often enough. You can even control this behavior (at least on a HotSpot JVM) by setting the sun.reflect.inflationThreshold property to a lower value. (The default is 15.) This property determines after how many reflective invocations a JNI call should be substituted by a byte code instrumented version. I would therefore recommend not using FastClass on modern JVMs; it can however fine-tune performance on older Java virtual machines. cglib Proxy The cglib Proxy is a reimplementation of the Java Proxy class mentioned in the beginning of this article. It is intended to allow using the Java library's proxy in Java versions before Java 1.3 and differs only in minor details. The better documentation of the cglib Proxy can however be found in the Java standard library's Proxy javadoc where an example of its use is provided. 
For this reason, I will skip a more detailed discussion of cglib's Proxy at this place. A Final Word of Warning After this overview of cglib's functionality, I want to speak a final word of warning. All cglib classes generate byte code which results in additional classes being stored in a special section of the JVM's memory: the so-called perm space. This permanent space is, as the name suggests, used for permanent objects that do not usually get garbage collected. This is however not completely true: Once a Class is loaded, it cannot be unloaded until the loading ClassLoader becomes available for garbage collection. This is only the case if the Class was loaded with a custom ClassLoader which is not a native JVM system ClassLoader. This ClassLoader can be garbage collected if itself, all Classes it ever loaded and all instances of all Classes it ever loaded become available for garbage collection. This means: If you create more and more classes throughout the life of a Java application and if you do not take care of the removal of these classes, you will sooner or later run out of perm space, which will result in your application's death by the hands of an OutOfMemoryError. Therefore, use cglib sparingly. However, if you use cglib wisely and carefully, you can really do amazing things with it that go beyond what you can do with non-instrumented Java applications. Lastly, when creating projects that depend on cglib, you should be aware of the fact that the cglib project is not as well maintained and active as it should be, considering its popularity. The missing documentation is a first hint. The often messy public API a second. But then there are also broken deploys of cglib to Maven central. The mailing list reads like an archive of spam messages. And the release cycles are rather unstable. You might therefore want to have a look at javassist, the only real low-level alternative to cglib. 
Javassist comes bundled with a pseudo-java compiler which allows you to create quite amazing byte code instrumentations without even understanding Java byte code. If you like to get your hands dirty, you might also like ASM on top of which cglib is built. ASM comes with great documentation of both the library and Java class files and their byte code. Note that these examples only run with cglib 2.2.2 and are not compatible with the newest release 3 of cglib. Unfortunately, I experienced the newest cglib version to occasionally produce invalid byte code which is why I considered an old version and also use this version in production. Also, note that most projects using cglib move the library to their own namespace in order to avoid version conflicts with other dependencies, as demonstrated for example by the Spring project. You should do the same with your project when making use of cglib. Tools such as jarjar can help you with the automation of this good practice.
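Since the cglib Proxy section above defers to the standard library's Proxy javadoc, here is a minimal sketch of that standard API for orientation. It uses only java.lang.reflect.Proxy and InvocationHandler from the JDK; the Greeter interface and the ProxySketch class are hypothetical names introduced for this example and do not come from cglib.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Minimal use of the JDK's own dynamic proxy, which the cglib Proxy mirrors.
// Greeter and ProxySketch are illustrative names, not part of cglib or the JDK.
class ProxySketch {

    interface Greeter {
        String greet(String name);
    }

    // Creates a Greeter whose calls are all routed through one InvocationHandler.
    static Greeter createGreeter() {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args) {
                if ("greet".equals(method.getName())) {
                    return "Hello " + args[0] + "!";
                }
                // Object methods such as toString are dispatched here as well.
                throw new UnsupportedOperationException(method.getName());
            }
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[]{Greeter.class},
                handler);
    }

    public static void main(String[] args) {
        System.out.println(createGreeter().greet("cglib"));
    }
}
```

Like the proxies discussed in this article, the JDK proxy is a generated class, so the warning about class creation applies to it in the same way.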
January 7, 2014
by Rafael Winterhalter
· 76,778 Views · 18 Likes
Hunting for an SWT Test Framework? Say Hello to Red Deer
This is the first in a series of posts on the new “Red Deer” (https://github.com/jboss-reddeer/reddeer) open source testing framework for Eclipse. In this post, we’ll introduce Red Deer, and take a look at some of the advantages that it offers by building a sample test program from scratch. Some of the features that Red Deer offers are: An easy to use, high-level API for testing standard Eclipse components; Support for creating custom extensions for your own applications; A requirements validation mechanism to assist you in configuring complex tests; Eclipse tooling to assist in creating new projects; A record and playback tool to enable you to quickly create automated tests; An integration with Selenium for testing web based applications; Support for running tests in a Jenkins CI environment. Note that as of this writing, Red Deer is in an incubation stage. The current release is at level 0.5. The target date for the 1.0 release of Red Deer is late 2014. But, as a community-based, open source project, now is a great time to try Red Deer and make suggestions or even contribute code! A Look at Red Deer’s Architecture The Red Deer project itself is comprised of utilities and the API that supports the development and execution of automated tests. The API (the parts of the above diagram that are enclosed in dashed line boxes) can be thought of as having three layers: The top layer consists of extensions to Red Deer’s abstract classes or implementations for Eclipse components such as Views, Editors, Wizards, or Shells. For example, if you are writing tests for a feature that uses a custom Eclipse View, you can extend Red Deer’s View class by adding support for the specific functions of the feature. The advantage that this API layer gives you is that your test programs do not have to focus on manipulating the individual UI elements directly to perform operations. 
Your programs can instead instantiate an instance of an Eclipse component such as a View, and then use that instance’s methods to perform operations on the View. This layer of abstraction makes your test programs easier to write, understand, and maintain. The middle layer consists of the Red Deer implementations for SWT UI elements such as: Button, Combo, Label, Menu, Shell, TabItem, Table, ToolBar, Tree. This API layer supports the API’s higher level by providing the building blocks for the API’s Views, Editors, Shells, and WIzards. This middle layer of the API also provides Red Deer packages that enable your tests to enforce requirements, so that necessary setup tasks are performed before a test is run. The bottom layer consists of Red Deer packages that support the execution of tests such as: Conditions, Matchers, Widgets, Workbench, and Red Deer extensions to JUnit. What Makes Red Deer different from other Tools? A Layer of Abstraction The top-most layer of the API enables you to instantiate Eclipse UI elements as objects, and then manipulate them through their methods. The resulting code is easier to read and maintain, instead of being brittle and subject to failures when the UI changes. For example, for a test that has to open a view and press a button, without Red Deer, the test would have to navigate the top level menu, find the view menu, then the view type in that menu, then find the view open dialog, then locate the “OK” button, etc. Your test would have to spend a lot of time navigating through the UI elements before it could even begin to perform the test’s steps. 
With Red Deer, the code to open a view (in this case, the servers view) is simply: ServersView view = new ServersView(); view.open(); Furthermore, within that ServersView, your test program can perform operations on the View through methods which are defined in the view (and are incidentally also well debugged by the Red Deer team), instead of having to explicitly locate and manipulate the UI elements directly. For example, to obtain a list of all the servers, instead of locating the UI tree that contains the server list, and extracting that list of servers into an array, your Red Deer program can simply call the “getServers()” method. Likewise, the code to open a PackageExplorer, and then select a project within that PackageExplorer is as follows: PackageExplorer packageExplorer = new PackageExplorer(); packageExplorer.open(); packageExplorer.getProject("myTestProject").select(); And, the code to retrieve all the projects within that PackageExplorer is simply: packageExplorer.getProjects(); The result is that your tests are easier to write and maintain and you can focus on testing your application’s logic instead of writing brittle code to navigate through the application. Installing Red Deer The only prerequisites to using Red Deer are Eclipse and Java. In this post, we’ll use Eclipse Kepler and OpenJDK 1.7, running on Red Hat Enterprise Linux (RHEL) 6. To install Red Deer 0.4 (this is the latest stable milestone version as of this writing) follow these steps: Open up Eclipse. Navigate to: Help->Install New Software. Define a new download site using the Red Deer update site URL: http://download.jboss.org/jbosstools/updates/stable/kepler/core/reddeer/0.4.0/ Select Red Deer, click on the Finish button and Red Deer will install. Now that you have Red Deer installed, let’s move on to building a new Red Deer test. 
Building your First Red Deer Test To create a new Red Deer test project, you make use of the Red Deer UI tooling and select New->Project->Other->Red Deer Test: Before we move on, let’s take a look at the META-INF/MANIFEST.MF file that is created in the project: Manifest-Version: 1.0 Bundle-ManifestVersion: 2 Bundle-Name: com.example.reddeer.sample Bundle-SymbolicName: com.example.reddeer.sample;singleton:=true Bundle-Version: 1.0.0.qualifier Bundle-ActivationPolicy: lazy Bundle-Vendor: Sample Co Bundle-RequiredExecutionEnvironment: JavaSE-1.6 Require-Bundle: org.junit, org.jboss.reddeer.junit, org.jboss.reddeer.swt, org.jboss.reddeer.eclipse The line we’re interested in is the final line in the file. These are the bundles that are required by Red Deer. After the empty project is created by the wizard, you can define a package and create a test class. Here's the code for a minimal functional test. The test will verify that the Eclipse configuration is not empty. package com.example.reddeer.sample; import static org.junit.Assert.assertFalse; import java.util.List; import org.jboss.reddeer.swt.api.TreeItem; import org.jboss.reddeer.swt.impl.button.PushButton; import org.jboss.reddeer.swt.impl.menu.ShellMenu; import org.jboss.reddeer.swt.impl.tree.DefaultTree; import org.junit.Test; import org.junit.runner.RunWith; import org.jboss.reddeer.junit.runner.RedDeerSuite; @RunWith(RedDeerSuite.class) public class SimpleTest { @Test public void TestIt() { new ShellMenu("Help", "About Eclipse Platform").select(); new PushButton("Installation Details").click(); DefaultTree ConfigTree = new DefaultTree(); List<TreeItem> ConfigItems = ConfigTree.getAllItems(); assertFalse ("The list is empty!", ConfigItems.isEmpty()); for (TreeItem item : ConfigItems) { System.out.println ("Found: " + item.getText()); } } } After you save the test's source file, you can run the test. To run the test, select the Run As->Red Deer Test option: And - there's the green bar! 
Simplifying Tests with Requirements Red Deer requirements enable you to define actions that you want to happen before a test is executed. The advantage to using requirements is that you define the actions with annotations instead of using a @BeforeClass method. The result is that your test code is easier to read and maintain. The biggest difference between a Red Deer requirement and the @BeforeClass annotation from the JUnit framework is that if a requirement cannot be fulfilled the test is not executed. Like everything else in Red Deer, you can make use of predefined requirements, or you can extend the feature by adding your own custom requirements. These custom requirements can be made complex and for convenience can be stored in external properties files. (We’ll take a look at defining custom requirements in a later post in this series when we examine how to create and contribute extensions to Red Deer.) The current milestone release of Red Deer provides predefined requirements that enable you to clean out your current workspace and open a perspective. Let’s add these to our example. 
To do this, we need to add these import statements: import org.jboss.reddeer.eclipse.ui.perspectives.JavaBrowsingPerspective; import org.jboss.reddeer.requirements.cleanworkspace.CleanWorkspaceRequirement.CleanWorkspace; import org.jboss.reddeer.requirements.openperspective.OpenPerspectiveRequirement.OpenPerspective; And these annotations: @CleanWorkspace @OpenPerspective(JavaBrowsingPerspective.class) And, we also have to add a reference to org.jboss.reddeer.requirements to the required bundle list in our example’s MANIFEST.MF file: Require-Bundle: org.junit, org.jboss.reddeer.junit, org.jboss.reddeer.swt, org.jboss.reddeer.eclipse, org.jboss.reddeer.requirements When we’re done, our example looks like this: package com.example.reddeer.sample; import static org.junit.Assert.assertFalse; import java.util.List; import org.jboss.reddeer.swt.api.TreeItem; import org.jboss.reddeer.swt.impl.button.PushButton; import org.jboss.reddeer.swt.impl.menu.ShellMenu; import org.jboss.reddeer.swt.impl.tree.DefaultTree; import org.junit.Test; import org.junit.runner.RunWith; import org.jboss.reddeer.junit.runner.RedDeerSuite; import org.jboss.reddeer.eclipse.ui.perspectives.JavaBrowsingPerspective; import org.jboss.reddeer.requirements.cleanworkspace.CleanWorkspaceRequirement.CleanWorkspace; import org.jboss.reddeer.requirements.openperspective.OpenPerspectiveRequirement.OpenPerspective; @RunWith(RedDeerSuite.class) @CleanWorkspace @OpenPerspective(JavaBrowsingPerspective.class) public class SimpleTest { @Test public void TestIt() { new ShellMenu("Help", "About Eclipse Platform").select(); new PushButton("Installation Details").click(); DefaultTree ConfigTree = new DefaultTree(); List<TreeItem> ConfigItems = ConfigTree.getAllItems(); assertFalse ("The list is empty!", ConfigItems.isEmpty()); for (TreeItem item : ConfigItems) { System.out.println ("Found: " + item.getText()); } } } Notice how we were able to add those functions to the test code, while only adding a very small amount of actual 
new code? Yes, it can pay to be a lazy programmer. ;-) What’s Next? What’s next for Red Deer is its continued development as it progresses through its incubation stage until its 1.0 release. What’s next for this series of posts will be discussions about: The Red Deer Recorder - To enable you to capture manual actions and convert them into test programs How you can Extend Red Deer - To provide test coverage for your plugins’ specific functions. And How you can Contribute these extensions to the Red Deer project. How you can Define Complex Requirements - To enable you to perform setup tasks for your tests. Red Deer’s Integration with Selenium - To enable you to test web interfaces provided by your plugins. Running Red Deer tests with Jenkins - To enable you to take advantage of Jenkins’ Continuous Integration (CI) test framework. Author’s Acknowledgements I’d like to thank all the contributors to Red Deer for their vision and contributions. It’s a new project, but it is growing fast! The contributors (in alphabetic order) are: Stefan Bunciak, Radim Hopp, Jaroslav Jankovic, Lucia Jelinkova, Marian Labuda, Martin Malina, Jan Niederman, Vlado Pakan, Jiri Peterka, Andrej Podhradsky, Milos Prchlik, Radoslav Rabara, Petr Suchy, and Rastislav Wagner.
January 7, 2014
by Len DiMaggio
· 7,712 Views
Bulk Fetching with Hibernate
If you need to process large database result sets from Java, you can opt for JDBC to give you the low level control required. On the other hand, if you are already using an ORM in your application, falling back to JDBC might imply some extra pain. You would be losing features such as optimistic locking, caching, automatic fetching when navigating the domain model and so forth. Fortunately most ORMs, like Hibernate, have some options to help you with that. While these techniques are not new, there are a couple of possibilities to choose from. A simplified example: let's assume we have a table (mapped to class "DemoEntity") with 100.000 records. Each record consists of a single column (mapped to the property "property" in DemoEntity) holding some random alphanumerical data of about 2KB. The JVM is run with -Xmx250m. Let's assume that 250MB is the overall maximum memory that can be assigned to the JVM on our system. Your job is to read all records currently in the table, do some processing that is not further specified, and finally store the result. We'll assume that the entities resulting from our bulk operation are not modified. To start we'll try the obvious first, performing a query to simply retrieve all data: new TransactionTemplate(txManager).execute(new TransactionCallback<Void>() { @Override public Void doInTransaction(TransactionStatus status) { Session session = sessionFactory.getCurrentSession(); List<DemoEntity> demoEntities = (List<DemoEntity>) session.createQuery("from DemoEntity").list(); for(DemoEntity demoEntity : demoEntities){ //Process and write result } return null; } }); After a couple of seconds: Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded Clearly this won't cut it. To fix this we will be switching to Hibernate scrollable result sets, which most developers are probably aware of. The above example instructs Hibernate to execute the query, map the entire results to entities and return them. 
When using scrollable result sets records are transformed to entities one at a time: new TransactionTemplate(txManager).execute(new TransactionCallback() { @Override public Void doInTransaction(TransactionStatus status) { Session session = sessionFactory.getCurrentSession(); ScrollableResults scrollableResults = session.createQuery("from DemoEntity").scroll(ScrollMode.FORWARD_ONLY); int count = 0; while (scrollableResults.next()) { if (++count > 0 && count % 100 == 0) { System.out.println("Fetched " + count + " entities"); } DemoEntity demoEntity = (DemoEntity) scrollableResults.get()[0]; //Process and write result } return null; } }); After running this we get: ... Fetched 49800 entities Fetched 49900 entities Fetched 50000 entities Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded Although we are using a scrollable result set, every returned object is an attached object and becomes part of the persistence context (aka session). The result is actually the same as our first example in which we used "session.createQuery("from DemoEntity").list()". However, with that approach we had no control; everything happens behind the scenes and you get a list back with all the data if hibernate has done its job. using a scrollable result set on the other hand gives us a hook into the retrieval process and allows us to free memory up when needed. As we have seen it does not free up memory automatically, you have to instruct Hibernate to actually do it. Following options exist: Evicting the object from the persistent context after processing it Clearing the entire session every now and then We will opt for the first. 
In the above example, directly below the //Process and write result comment, we'll add: session.evict(demoEntity); Important: If you were to perform any modification to the entity (or entities it has associations with that are cascade evicted alongside), make sure to flush the session PRIOR to evicting or clearing, otherwise queries held back by Hibernate's write-behind will not be sent to the database. Evicting or clearing does not remove the entities from the second level cache. If you enabled the second level cache and are using it and you want to remove them as well, use the desired sessionFactory.getCache().evictXxx() method. From the moment you evict an entity it will no longer be attached (no longer associated with a session). Any modification done to the entity at that stage will no longer be reflected to the database automatically. If you are using lazy loading, accessing any property that was not loaded prior to the eviction will yield the famous org.hibernate.LazyInitializationException. So basically, make sure the processing for that entity is done (or it is at least initialized for further needs) before you evict or clear. After we run the application again, we see that it now successfully executes: ... Fetched 99800 entities Fetched 99900 entities Fetched 100000 entities Btw, you can also set the query read-only, allowing Hibernate to perform some extra optimizations: ScrollableResults scrollableResults = session.createQuery("from DemoEntity").setReadOnly(true).scroll(ScrollMode.FORWARD_ONLY); Doing this only gives a very marginal difference in memory usage; in this specific test setup it enabled us to read about 300 entities extra with the given amount of memory. Personally I would not use this feature merely for memory optimizations alone but only if it fits your overall immutability strategy. With Hibernate you have different options to make entities read-only: on the entity itself, the overall session read-only and so forth. 
Setting read-only on the query individually is probably the least preferred approach (e.g. entities loaded in the session before will remain unaffected, possibly modifiable; lazy associations will be loaded modifiable even if the root objects returned by the query are read-only). Ok, we were able to process our 100.000 records, life is good. But as it turns out Hibernate has another option for bulk operations: the stateless session. You can obtain a scrollable result set from a stateless session the same way as from a normal session. A stateless session lies directly above JDBC. Hibernate will run in nearly "all features disabled" mode. This means no persistent context, no 2nd level caching, no dirty detection, no lazy loading, basically no nothing. From the javadoc: /** * A command-oriented API for performing bulk operations against a database. * A stateless session does not implement a first-level cache nor interact with any * second-level cache, nor does it implement transactional write-behind or automatic * dirty checking, nor do operations cascade to associated instances. Collections are * ignored by a stateless session. Operations performed via a stateless session bypass * Hibernate's event model and interceptors. Stateless sessions are vulnerable to data * aliasing effects, due to the lack of a first-level cache. For certain kinds of * transactions, a stateless session may perform slightly faster than a stateful session. * * @author Gavin King */ The only thing it does is transform records to objects. 
This might be an appealing alternative because it helps you get rid of that manual evicting/flushing: new TransactionTemplate(txManager).execute(new TransactionCallback() { @Override public Void doInTransaction(TransactionStatus status) { sessionFactory.getCurrentSession().doWork(new Work() { @Override public void execute(Connection connection) throws SQLException { StatelessSession statelessSession = sessionFactory.openStatelessSession(connection); try { ScrollableResults scrollableResults = statelessSession.createQuery("from DemoEntity").scroll(ScrollMode.FORWARD_ONLY); int count = 0; while (scrollableResults.next()) { if (++count > 0 && count % 100 == 0) { System.out.println("Fetched " + count + " entities"); } DemoEntity demoEntity = (DemoEntity) scrollableResults.get()[0]; //Process and write result } } finally { statelessSession.close(); } } }); return null; } }); Besides the fact that the stateless session has the most optimal memory usage, using it has some side effects. You might have noticed that we are opening a stateless session and closing it explicitly: there is no sessionFactory.getCurrentStatelessSession() nor (at the time of writing) any Spring integration for managing the stateless session. Opening a stateless session allocates a new java.sql.Connection by default (if you use openStatelessSession()) to perform its work and therefore indirectly spawns a second transaction. You can mitigate these side effects by using the Hibernate work API as in the example, which supplies the current Connection, and passing it along to openStatelessSession(Connection connection). Closing the session in the finally block has no impact on the physical connection since that is captured by the Spring infrastructure: only the logical connection handle is closed and a new logical connection handle was created when opening the stateless session. 
Also note that you have to deal with closing the stateless session yourself and that the above example is only good for read-only operations. From the moment you are going to modify using the stateless session there are some more caveats. As said before, Hibernate runs in "all features disabled" mode and as a direct consequence entities are returned in detached state. For each entity you modify, you'll have to call: statelessSession.update(entity) explicitly. First I tried this for modifying an entity: new TransactionTemplate(txManager).execute(new TransactionCallback() { @Override public Void doInTransaction(TransactionStatus status) { sessionFactory.getCurrentSession().doWork(new Work() { @Override public void execute(Connection connection) throws SQLException { StatelessSession statelessSession = sessionFactory.openStatelessSession(connection); try { DemoEntity demoEntity = (DemoEntity) statelessSession.createQuery("from DemoEntity where id = 1").uniqueResult(); demoEntity.setProperty("test"); statelessSession.update(demoEntity); } finally { statelessSession.close(); } } }); return null; } }); The idea is that we open a stateless session with the existing database Connection. As the StatelessSession javadoc indicates that no write behind occurs, I was convinced that each statement performed by the stateless session would be sent directly to the database. Eventually when the transaction (started by the TransactionTemplate) would be committed the results would become visible in the database. However, Hibernate does BATCH statements using a stateless session. I'm not 100% sure what the difference is between batching and write behind, but the result is the same and thus contradictory to the javadoc, as statements are queued and flushed at a later time. So, if you don't do anything special, statements that are batched will not be flushed and this is what happened in my case: the "statelessSession.update(demoEntity);" was batched and never flushed. 
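The queue-then-flush behaviour described above can be pictured with a small plain-Java model. This is not Hibernate code: BatchingSketch, its update, flush and pending methods, and the list standing in for the database are all hypothetical, chosen only to show why a statement that is queued but never flushed never reaches the database.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of statement batching; none of these names exist in Hibernate.
class BatchingSketch {

    // Stands in for the database: statements that were actually sent.
    final List<String> database = new ArrayList<String>();

    // Statements that were issued but are still waiting in the batch.
    private final List<String> batch = new ArrayList<String>();

    // Queues the statement instead of sending it right away.
    void update(String statement) {
        batch.add(statement);
    }

    // Sends everything queued so far and empties the batch.
    void flush() {
        database.addAll(batch);
        batch.clear();
    }

    // Number of statements still waiting to be sent.
    int pending() {
        return batch.size();
    }

    public static void main(String[] args) {
        BatchingSketch session = new BatchingSketch();
        session.update("update DemoEntity set property = 'test' where id = 1");
        System.out.println("sent before flush: " + session.database.size());
        session.flush();
        System.out.println("sent after flush: " + session.database.size());
    }
}
```

In this model, closing the session without calling flush() would simply discard the pending statement, which matches the symptom observed with the stateless session.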
One way to force the flush is to use the Hibernate transaction API:

StatelessSession statelessSession = sessionFactory.openStatelessSession();
statelessSession.beginTransaction();
...
statelessSession.getTransaction().commit();
...

While this works, you probably don't want to start controlling your transactions programmatically just because you are using a stateless session. Also, by doing this we are again running our stateless session work in a second transaction, since we didn't pass along our Connection and thus a new database connection will be acquired. We can't pass along the outer Connection: if we committed the inner transaction (the "stateless session transaction") while it was using the same connection as the outer transaction (started by the TransactionTemplate), it would break the atomicity of the outer transaction, because statements from the outer transaction already sent to the database would be committed along with the inner transaction. So not passing along the connection means opening a new connection and thus creating a second transaction.

A better alternative would be simply to trigger Hibernate to flush the stateless session. However, StatelessSession has no "flush" method to trigger a flush manually. A solution here is to depend a bit on the Hibernate internal API.
This solution makes the manual transaction handling and the second transaction obsolete: all statements become part of our (one and only) outer transaction:

StatelessSession statelessSession = sessionFactory.openStatelessSession(connection);
try {
    DemoEntity demoEntity = (DemoEntity) statelessSession.createQuery("from DemoEntity where id = 1").uniqueResult();
    demoEntity.setProperty("test");
    statelessSession.update(demoEntity);
    // Relies on Hibernate's internal API to force the flush
    ((TransactionContext) statelessSession).managedFlush();
} finally {
    statelessSession.close();
}

Fortunately, there is an even better solution, very recently posted on the Spring JIRA: https://jira.springsource.org/browse/SPR-2495

This is not yet part of Spring, but the factory bean implementation is pretty straightforward: StatelessSessionFactoryBean.java. When using it, you can simply inject the StatelessSession:

@Autowired
private StatelessSession statelessSession;

It will inject a stateless session proxy, which is equivalent to the way the normal "current" session works (with the minor difference that there you inject a SessionFactory and need to obtain the current session each time). When the proxy is invoked, it looks up the stateless session bound to the running transaction. If none exists yet, it creates one with the same connection as the normal session (like we did in the example) and registers a custom transaction synchronization for the stateless session. When the transaction is committed, the stateless session is flushed thanks to the synchronization, and finally closed. Using this, you can inject the stateless session directly and use it as a current session (or the same way as you would inject a JPA PersistenceContext, for that matter). This relieves you from opening and closing the stateless session and from having to find one way or another to make it flush. The implementation is aimed at JPA, but the JPA part is limited to obtaining the physical connection in obtainPhysicalConnection().
You can easily leave out the EntityManagerFactory and get the physical connection directly from the Hibernate session.

Very careful conclusion: it is clear that the best approach will depend on your situation. If you use the normal session, you will have to deal with eviction yourself when reading or persisting entities. Besides the fact that you have to do this manually, it might also impact further use of the session if you have a mixed transaction, in which you perform both 'bulk' and 'normal' operations. If you continue with the normal operations, you will have detached entities in your session, which might lead to unexpected results (as dirty detection will no longer work, and so forth). On the other hand, you will still have the major Hibernate benefits (as long as the entity isn't evicted), such as lazy loading, caching, dirty detection and the like.

Using the stateless session at the time of writing requires some extra attention to managing it (opening, closing and flushing), which can also be error-prone. Assuming you can proceed with the proposed factory bean, you have a very bare-bones session which is separate from your normal session but still participates in the same transaction. With this you have a powerful tool to perform bulk operations without having to think about memory management. The downside is that you don't have any other Hibernate functionality available.
January 6, 2014
by Koen Serneels
· 90,733 Views · 14 Likes
Top Posts of 2013: Google's Big Data Papers
I’ll review Google’s most important Big Data publications and discuss where they are (as far as they’ve disclosed).
December 30, 2013
by Mikio Braun
· 117,082 Views
Java: Using the Specification Pattern With JPA
This article is an introduction to using the specification pattern in Java. We will also see how we can combine classic specifications with JPA Criteria queries to retrieve objects from a relational database.

Within this post we will use the following Poll class as an example entity for creating specifications. It represents a poll that has a start and end date. In the time between those two dates users can vote among different choices. A poll can also be locked by an administrator before the end date has been reached. In this case, a lock date will be set.

@Entity
public class Poll {
    @Id
    @GeneratedValue
    private long id;
    private DateTime startDate;
    private DateTime endDate;
    private DateTime lockDate;
    @OneToMany(cascade = CascadeType.ALL)
    private List votes = new ArrayList<>();
}

For better readability I skipped getters, setters, JPA annotations for mapping Joda DateTime instances, and fields that aren't needed in this example (like the question being asked in the poll).

Now assume we have two constraints we want to implement:

A poll is currently running if it is not locked and if startDate < now < endDate
A poll is popular if it contains more than 100 votes and is not locked

We could start by adding appropriate methods to Poll, like poll.isCurrentlyRunning(). Alternatively, we could use a service method like pollService.isCurrentlyRunning(poll). However, we also want to be able to query the database to get all currently running polls, so we might add a DAO or repository method like pollRepository.findAllCurrentlyRunningPolls(). If we go this way, we implement the isCurrentlyRunning constraint twice, in two different locations. Things become worse if we want to combine constraints. What if we want to query the database for a list of all popular polls that are currently running?

This is where the specification pattern comes in handy. When using the specification pattern we move business rules into extra classes called specifications.
To get started with specifications we create a simple interface and an abstract class:

public interface Specification<T> {
    boolean isSatisfiedBy(T t);
    Predicate toPredicate(Root<T> root, CriteriaBuilder cb);
    Class<T> getType();
}

abstract public class AbstractSpecification<T> implements Specification<T> {
    @Override
    public boolean isSatisfiedBy(T t) {
        throw new NotImplementedException();
    }

    @Override
    public Predicate toPredicate(Root<T> poll, CriteriaBuilder cb) {
        throw new NotImplementedException();
    }

    @Override
    public Class<T> getType() {
        // Resolve the type argument the concrete subclass supplied for T
        ParameterizedType type = (ParameterizedType) this.getClass().getGenericSuperclass();
        return (Class<T>) type.getActualTypeArguments()[0];
    }
}

Please ignore the AbstractSpecification class with the mysterious getType() method for a moment (we will come back to it later). The central part of a specification is the isSatisfiedBy() method, which is used to check whether an object satisfies the specification. toPredicate() is an additional method we use in this example to return the constraint as a javax.persistence.criteria.Predicate instance, which can be used to query a database. For each constraint we create a new specification class that extends AbstractSpecification and implements isSatisfiedBy() and toPredicate().

The specification implementation to check if a poll is currently running looks like this:

public class IsCurrentlyRunning extends AbstractSpecification<Poll> {
    @Override
    public boolean isSatisfiedBy(Poll poll) {
        return poll.getStartDate().isBeforeNow()
            && poll.getEndDate().isAfterNow()
            && poll.getLockDate() == null;
    }

    @Override
    public Predicate toPredicate(Root<Poll> poll, CriteriaBuilder cb) {
        DateTime now = new DateTime();
        return cb.and(
            cb.lessThan(poll.get(Poll_.startDate), now),
            cb.greaterThan(poll.get(Poll_.endDate), now),
            cb.isNull(poll.get(Poll_.lockDate))
        );
    }
}

Within isSatisfiedBy() we check if the passed object matches the constraint. In toPredicate() we construct a Predicate using JPA's CriteriaBuilder.
We will use the resulting Predicate instance later to build a CriteriaQuery for querying the database.

The specification for checking if a poll is popular looks similar:

public class IsPopular extends AbstractSpecification<Poll> {
    @Override
    public boolean isSatisfiedBy(Poll poll) {
        return poll.getLockDate() == null && poll.getVotes().size() > 100;
    }

    @Override
    public Predicate toPredicate(Root<Poll> poll, CriteriaBuilder cb) {
        // Same threshold as isSatisfiedBy(): more than 100 votes
        return cb.and(
            cb.isNull(poll.get(Poll_.lockDate)),
            cb.greaterThan(cb.size(poll.get(Poll_.votes)), 100)
        );
    }
}

If we now want to test whether a Poll instance matches one of these constraints, we can use our newly created specifications:

boolean isPopular = new IsPopular().isSatisfiedBy(poll);
boolean isCurrentlyRunning = new IsCurrentlyRunning().isSatisfiedBy(poll);

For querying the database we need to extend our DAO / repository to support specifications. This can look like the following:

public class PollRepository {
    private EntityManager entityManager = ...

    public <T> List<T> findAllBySpecification(Specification<T> specification) {
        CriteriaBuilder criteriaBuilder = entityManager.getCriteriaBuilder();

        // use specification.getType() to create a Root<T> instance
        CriteriaQuery<T> criteriaQuery = criteriaBuilder.createQuery(specification.getType());
        Root<T> root = criteriaQuery.from(specification.getType());

        // get predicate from specification
        Predicate predicate = specification.toPredicate(root, criteriaBuilder);

        // set predicate and execute query
        criteriaQuery.where(predicate);
        return entityManager.createQuery(criteriaQuery).getResultList();
    }
}

Here we finally use the getType() method implemented in AbstractSpecification to create CriteriaQuery and Root instances. getType() returns the generic type of the AbstractSpecification instance defined by the subclass. For IsPopular and IsCurrentlyRunning it returns the Poll class. Without getType() we would have to create the CriteriaQuery and Root instances inside toPredicate() of every specification we create.
So it is just a small helper to reduce boilerplate code inside specifications. Feel free to replace this with your own implementation if you come up with a better approach.

Now we can use our repository to query the database for polls that match a certain specification:

List<Poll> popularPolls = pollRepository.findAllBySpecification(new IsPopular());
List<Poll> currentlyRunningPolls = pollRepository.findAllBySpecification(new IsCurrentlyRunning());

At this point the specifications are the only components that contain the constraint definitions. We can use them to query the database or to check if an object fulfills the required rules.

However, one question remains: how do we combine two or more constraints? For example, we would like to query the database for all popular polls that are still running. The answer to this is a variation of the composite design pattern called composite specifications. Using a composite specification we can combine specifications in different ways. To query the database for all running and popular polls we need to combine the IsCurrentlyRunning specification with the IsPopular specification using the logical and operation. Let's create another specification for this. We name it AndSpecification:

public class AndSpecification<T> extends AbstractSpecification<T> {
    private Specification<T> first;
    private Specification<T> second;

    public AndSpecification(Specification<T> first, Specification<T> second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public boolean isSatisfiedBy(T t) {
        return first.isSatisfiedBy(t) && second.isSatisfiedBy(t);
    }

    @Override
    public Predicate toPredicate(Root<T> root, CriteriaBuilder cb) {
        return cb.and(
            first.toPredicate(root, cb),
            second.toPredicate(root, cb)
        );
    }

    @Override
    public Class<T> getType() {
        return first.getType();
    }
}

An AndSpecification is created out of two other specifications. In isSatisfiedBy() and toPredicate() we return the result of both specifications combined by a logical and operation.
We can use our new specification like this:

Specification<Poll> popularAndRunning = new AndSpecification<>(new IsPopular(), new IsCurrentlyRunning());
List<Poll> polls = myRepository.findAllBySpecification(popularAndRunning);

To improve readability we can add an and() method to the Specification interface:

public interface Specification<T> {
    Specification<T> and(Specification<T> other);
    // other methods
}

and implement it within our abstract implementation:

abstract public class AbstractSpecification<T> implements Specification<T> {
    @Override
    public Specification<T> and(Specification<T> other) {
        return new AndSpecification<>(this, other);
    }
    // other methods
}

Now we can chain multiple specifications by using the and() method:

Specification<Poll> popularAndRunning = new IsPopular().and(new IsCurrentlyRunning());
boolean isPopularAndRunning = popularAndRunning.isSatisfiedBy(poll);
List<Poll> polls = myRepository.findAllBySpecification(popularAndRunning);

When needed, we can easily extend this further with other composite specifications (for example OrSpecification or NotSpecification).

Conclusion

When using the specification pattern we move business rules into separate specification classes. These specification classes can easily be combined by using composite specifications. In general, specifications improve reusability and maintainability. Additionally, specifications can easily be unit tested. For more detailed information about the specification pattern I recommend this article by Eric Evans and Martin Fowler. You can find the source of this example project on GitHub.
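As a rough illustration of how the or and not composites mentioned above could look, here is a minimal sketch that only covers the in-memory side of the pattern. It deliberately leaves out toPredicate() and getType(), and it uses lambdas and default methods instead of separate composite classes, which is a different style from the article's AndSpecification. The names Spec, SimplePoll and CompositeSpecDemo are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, in-memory-only variant of the Specification interface.
// toPredicate() and getType() are left out on purpose.
interface Spec<T> {
    boolean isSatisfiedBy(T candidate);

    // Satisfied when both specifications are satisfied.
    default Spec<T> and(Spec<T> other) {
        return candidate -> isSatisfiedBy(candidate) && other.isSatisfiedBy(candidate);
    }

    // Satisfied when at least one specification is satisfied.
    default Spec<T> or(Spec<T> other) {
        return candidate -> isSatisfiedBy(candidate) || other.isSatisfiedBy(candidate);
    }

    // Satisfied when this specification is not satisfied.
    default Spec<T> not() {
        return candidate -> !isSatisfiedBy(candidate);
    }
}

// Hypothetical stand-in for the Poll entity: a vote count and a locked flag.
class SimplePoll {
    final int votes;
    final boolean locked;

    SimplePoll(int votes, boolean locked) {
        this.votes = votes;
        this.locked = locked;
    }
}

class CompositeSpecDemo {
    static final Spec<SimplePoll> IS_LOCKED = poll -> poll.locked;
    static final Spec<SimplePoll> HAS_MANY_VOTES = poll -> poll.votes > 100;

    // "Popular" as defined in the article: more than 100 votes and not locked.
    static final Spec<SimplePoll> IS_POPULAR = HAS_MANY_VOTES.and(IS_LOCKED.not());

    // Returns the items that satisfy the given specification, in input order.
    static <T> List<T> filter(List<T> items, Spec<T> spec) {
        List<T> result = new ArrayList<>();
        for (T item : items) {
            if (spec.isSatisfiedBy(item)) {
                result.add(item);
            }
        }
        return result;
    }
}
```

For the database side, an or composite would presumably combine the two predicates with CriteriaBuilder.or(...) and a not composite would wrap one with CriteriaBuilder.not(...), mirroring what AndSpecification does with cb.and(...).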
December 30, 2013
by Michael Scharhag
· 116,619 Views · 8 Likes
Extracting Tables from PDFs in Javascript with PDF.js
A common and difficult problem in acquiring data is extracting tables from a PDF. Previously, I described how to extract the text from a PDF with PDF.js, a PDF rendering library made by Mozilla Labs. The rendering process requires an HTML canvas object, and then draws each object (character, line, rectangle, etc.) on it. The easiest way to get a list of these is to intercept all the calls PDF.js makes to drawing functions on the canvas object (see “Self Modifying Javascripts” for a similar technique). The “replace” function below adds a wrapper closure to each function, which logs the call.

function replace(ctx, key) {
    var val = ctx[key];
    if (typeof(val) == "function") {
        ctx[key] = function() {
            var args = Array.prototype.slice.call(arguments);
            console.log("called " + key + "(" + args.join(",") + ")");
            return val.apply(ctx, args);
        }
    }
}

for (var k in context) {
    replace(context, k);
}

var renderContext = { canvasContext: context, viewport: viewport };
page.render(renderContext);

This lets us see a series of calls:

called transform(1,0,0,1,150.42,539.67)
called translate(0,0)
called scale(1,-1)
called scale(0.752625,0.752625)
called measureText(c)
called save()
called scale(0.9701818181818181,1)
called fillText(c,0,0)
called restore()
called restore()
called save()
called transform(1,0,0,1,150.42,539.6

We can easily retrieve the text by noting the first argument to each “fillText” call:

"congregations ranked by growth and decline in membership and worship attendance, 2006 to 2011philadelphia presbytery - table 16net membership changenet worship changepercent changepercent changeworship 2006worship 2011membership 2006membership 2011abington, abington- 143(74)-13.18%(57)0(15)0.00%(22)numberrank3003001,085942anchor, wrightstown0(23)0.00%(27)-12(25)-21.43%(52)numberrank56449797arch street, philadelphia-117(71)-68.42%(117)27(5)90.00% (2)numberrank305717154aston, aston3(21)3.53%(22)-5(19)-9.43% (31)numberrank53488588beaconno reportboth yearsno reportboth yearsnumberrankbensalem, bensalem-23(39)-13.94%(62)-28(36)-28.57% (64)numberrank9870165142berean, philadelphia106(4)44.92%(4)no reportboth yearsnumberrank00236342bethany collegiate, havertown- 188(76)-42.44%(110)43(3)21.29%(7)numberrank202245443255bethel, philadelphia-13(33)-13.68%(60)-27(35)-35.06% (71)numberrank77509582bethesda, philadelphia9(18)5.56%(18)no reportboth yearsnumberrank1150162171beverly hills, upper darby-3(26)-3.03% (32)-11(24)-20.00%(48)numberrank55449996bridesburg, philadelphia0(23)0.00%(27)no reportboth yearsnumberrank004444bristol, bristolno reportboth yearsno reportboth yearsnumberrankpage 1 of 10report prepared by research services, presbyterian church (u.s.a.)1- 800-728-7228, ext #204006-oct-12"

Notably, this doesn’t track line endings, and not all the characters are recorded in the expected order (the first line is rendered after the second). The calls to transform, translate, and scale control where text is placed. The fillText method also takes an (x, y) parameter set that moves the individual letters between words. The exact position is a combination of successive operations, which are modeled as a stack of matrix operations. Thankfully, PDF.js tracks the output of these operations as it renders, so we don’t have to recalculate it.

Thus, we can make a method that records the letters and their real positions. This method takes the internal context object, the type of state transition, and the arguments to the transition. It is then called from the wrapper installed by the “replace” function listed above.

var chars = [];
var cur = {};

function record(ctx, state, args) {
    if (state === 'fillText') {
        var c = args[0];
        cur.c = c;
        cur.x = ctx._transformMatrix[4] + args[1];
        cur.y = ctx._transformMatrix[5] + args[2];
        chars[chars.length] = cur;
        cur = {};
    }
}

These results can be sorted by position (x and y). The sort method arranges letters by position: if they are shifted up or down a small amount, they are considered to be on one line.

chars.sort(
    function(a, b) {
        var dx = b.x - a.x;
        var dy = b.y - a.y;
        if (Math.abs(dy) < 0.5) {
            return dx * -1;
        } else {
            return dy * -1;
        }
    }
);

This presents several difficulties: it doesn’t detect right-to-left text, and it’s becoming clear that we’re going to have a hard time knowing when we’re in a table and when we aren’t.

To approximate the table structure, we define a function which transforms the array of letters and positions into CSV-style output. It tracks from letter to letter: if it sees a “large” change in y, it starts a new line, and if it sees a “large” change in x, it treats it as a new column. The real challenge is defining “large”, which for my test PDF was around 15 for dx and 20 for dy.

function getText(marks, ex, ey, v) {
    var x = marks[0].x;
    var y = marks[0].y;
    var txt = '';
    for (var i = 0; i < marks.length; i++) {
        var c = marks[i];
        var dx = c.x - x;
        var dy = c.y - y;
        if (Math.abs(dy) > ey) {
            txt += "\"\n\"";
            if (marks[i+1]) {
                // line feed - start from position of next line
                x = marks[i+1].x;
            }
        }
        if (Math.abs(dx) > ex) {
            txt += "\",\"";
        }
        if (v) {
            console.log(dx + ", " + dy);
        }
        txt += c.c;
        x = c.x;
        y = c.y;
    }
    return txt;
}

This algorithm doesn’t handle newlines in rows, and, oddly, the columns don’t come out in the right order, although they appear to be consistently out of order. Lines with large spaces (e.g. an em-dash) are detected as having multiple columns, but this can be cleaned up later. Here is some sample output; the final source is available on GitHub.
congregations ranked by growth and decline in m","embership and w","orship attendance, 2006 to 2011" "","philadelphia presbytery"," - table 16" "","net ","membership ","change" "","net worship ","change","percent ","change","percent ","change","worship"," 2006","worship"," 2011","membership"," 2006","membership"," 2011" "","abington, abington","-143","(74)","-13.18%(57)","0","(15)","0.00%(22)","number","rank","300","300","1,085","942" "","anchor, wrightstown","0","(23)","0.00%(27)","-12","(25)","-21.43%(52)","number","rank","56","44","97","97" "","arch street, philadelphia","-117","(71)","-68.42%","(117)","27(5)","90.00%(2)","number","rank","30","57","171","54" "","aston, aston","3","(21)","3.53%(22)","-5","(19)","-9.43%(31)","number","rank","53","48","85","88" "","beacon","no report","both years","no report","both years","number","rank" "","bensalem, bensalem","-23","(39)","-13.94%(62)","-28","(36)","-28.57%(64)","number","rank","98","70","165","142" "","berean, philadelphia","106(4)","44.92%(4)","no report","both years","number","rank","0","0","236","342" "","bethany collegiate, havertown","-188","(76)","-42.44%","(110)","43(3)","21.29%(7)","number","rank","202","245","443","255" "","bethel, philadelphia","-13","(33)","-13.68%(60)","-27","(35)","-35.06%(71)","number","rank","77","50","95","82" "","bethesda, philadelphia","9","(18)","5.56%(18)","no report","both years","number","rank","115","0","162","171" "","beverly hills, upper darby","-3","(26)","-3.03%(32)","-11","(24)","-20.00%(48)","number","rank","55","44","99","96" "","bridesburg, philadelphia","0","(23)","0.00%(27)","no report","both years","number","rank","0","0","44","44" "","bristol, bristol","no report","both years","no report","both years","number","rank" "","page 1 of 10","report prepared by research services, presbyterian church (u.s.a.)","1-800-728-7228, ext #2040","06-oct-12"
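The row and column detection described above can be sketched in Java, the language used by most other listings on this page. This is a simplified variant under stated assumptions, not a port of the author's exact function: a new row resets the column check instead of peeking at the next mark, and the output is wrapped in opening and closing quotes. The names Mark, TableText and toCsv are hypothetical; ex and ey play the same role as in the article (the "large" change in x and y).

```java
import java.util.ArrayList;
import java.util.List;

// One recorded character (or text run) and the position where it was drawn.
class Mark {
    final String c;
    final double x;
    final double y;

    Mark(String c, double x, double y) {
        this.c = c;
        this.x = x;
        this.y = y;
    }
}

class TableText {
    // Walks the marks in order and emits CSV-style text:
    // a jump larger than ey in y starts a new row,
    // otherwise a jump larger than ex in x starts a new column.
    static String toCsv(List<Mark> marks, double ex, double ey) {
        if (marks.isEmpty()) {
            return "";
        }
        StringBuilder txt = new StringBuilder("\"");
        double x = marks.get(0).x;
        double y = marks.get(0).y;
        for (Mark m : marks) {
            if (Math.abs(m.y - y) > ey) {
                txt.append("\"\n\"");   // close the cell, start a new row
            } else if (Math.abs(m.x - x) > ex) {
                txt.append("\",\"");    // close the cell, start a new column
            }
            txt.append(m.c);
            x = m.x;
            y = m.y;
        }
        return txt.append('"').toString();
    }

    // Small fixed input used to try the function out.
    static List<Mark> sample() {
        List<Mark> marks = new ArrayList<>();
        marks.add(new Mark("a", 0, 0));
        marks.add(new Mark("b", 5, 0));
        marks.add(new Mark("c", 40, 0));
        marks.add(new Mark("d", 0, 30));
        return marks;
    }
}
```

With thresholds of 15 for x and 20 for y, the sample marks above produce two rows, the first of which has two columns. As in the article, picking good thresholds for a real document remains the hard part.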
December 26, 2013
by Gary Sieling
· 21,483 Views
Storing Objects in Android
One alternative to using SQLite on Android is to store Java objects in SharedPreferences.
December 19, 2013
by Tony Siciliani
· 47,677 Views · 1 Like
Handling Big Data with HBase Part 4: The Java API
Editor's note: Be sure to check out part 2 as well.

This is the fourth of an introductory series of blogs on Apache HBase. In the third part, we saw a high-level view of HBase architecture. In this part, we'll use the HBase Java API to create tables, insert new data, and retrieve data by row key. We'll also see how to set up a basic table scan which restricts the columns retrieved and also uses a filter to page the results.

Having just learned about HBase high-level architecture, now let's look at the Java client API, since it is the way your applications interact with HBase. As mentioned earlier, you can also interact with HBase via several flavors of RPC technologies like Apache Thrift plus a REST gateway, but we're going to concentrate on the native Java API. The client APIs provide both DDL (data definition language) and DML (data manipulation language) semantics, very much like what you find in SQL for relational databases.

Suppose we are going to store information about people in HBase, and we want to start by creating a new table. The following listing shows how to create a new table using the HBaseAdmin class.

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("people"));
tableDescriptor.addFamily(new HColumnDescriptor("personal"));
tableDescriptor.addFamily(new HColumnDescriptor("contactinfo"));
tableDescriptor.addFamily(new HColumnDescriptor("creditcard"));
admin.createTable(tableDescriptor);

The people table defined in the preceding listing contains three column families: personal, contactinfo, and creditcard. To create a table you create an HTableDescriptor and add one or more column families by adding HColumnDescriptor objects. You then call createTable to create the table. Now we have a table, so let's add some data.
The next listing shows how to use the Put class to insert data on John Doe, specifically his name and email address (omitting proper error handling for brevity).

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Put put = new Put(Bytes.toBytes("doe-john-m-12345"));
put.add(Bytes.toBytes("personal"), Bytes.toBytes("givenName"), Bytes.toBytes("John"));
put.add(Bytes.toBytes("personal"), Bytes.toBytes("mi"), Bytes.toBytes("M"));
put.add(Bytes.toBytes("personal"), Bytes.toBytes("surname"), Bytes.toBytes("Doe"));
put.add(Bytes.toBytes("contactinfo"), Bytes.toBytes("email"), Bytes.toBytes("[email protected]"));
table.put(put);
table.flushCommits();
table.close();

In the above listing we instantiate a Put, providing the unique row key to the constructor. We then add values, which must include the column family, column qualifier, and the value, all as byte arrays. As you probably noticed, the HBase API's utility Bytes class is used a lot; it provides methods to convert to and from byte[] for primitive types and strings. (Adding a static import for the toBytes() method would cut out a lot of boilerplate code.) We then put the data into the table, flush the commits to ensure locally buffered changes take effect, and finally close the table.

Updating data is also done via the Put class, in exactly the same manner as shown in the prior listing. Unlike relational databases, in which updates must update entire rows even if only one column changed, if you only need to update a single column then that's all you specify in the Put, and HBase will only update that column. There is also a checkAndPut operation, which is essentially a form of optimistic concurrency control: the operation will only put the new data if the current values are what the client says they should be.

Retrieving the row we just created is accomplished using the Get class, as shown in the next listing.
(From this point forward, listings will omit the boilerplate code to create a configuration, instantiate the HTable, and the flush and close calls.)

Get get = new Get(Bytes.toBytes("doe-john-m-12345"));
get.addFamily(Bytes.toBytes("personal"));
get.setMaxVersions(3);
Result result = table.get(get);

The code in the previous listing instantiates a Get instance, supplying the row key we want to find. Next we use addFamily to instruct HBase that we only need data from the personal column family, which also cuts down the amount of work HBase must do when reading information from disk. We also specify that we'd like up to three versions of each column in our result, perhaps so we can list historical values of each column. Finally, calling get returns a Result instance which can then be used to inspect all the column values returned.

In many cases you need to find more than one row. HBase lets you do this by scanning rows, as shown in the second part, which used a scan in an HBase shell session. The corresponding class is the Scan class. You can specify various options, such as the start and end row keys to scan, which columns and column families to include, and the maximum versions to retrieve. You can also add filters, which allow you to implement custom filtering logic to further restrict which rows and columns are returned. A common use case for filters is pagination. For example, we might want to scan through all people whose last name is Smith one page (e.g. 25 people) at a time. The next listing shows how to perform a basic scan.

Scan scan = new Scan(Bytes.toBytes("smith-"));
scan.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("givenName"));
scan.addColumn(Bytes.toBytes("contactinfo"), Bytes.toBytes("email"));
scan.setFilter(new PageFilter(25));
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    // ...
}

In the above listing we create a new Scan that starts from the row key smith-, and we then use addColumn to restrict the columns returned (thus reducing the amount of disk transfer HBase must perform) to personal:givenName and contactinfo:email. A PageFilter is set on the scan to limit the number of rows scanned to 25. (An alternative to using the page filter would be to specify a stop row key when constructing the Scan.) We then get a ResultScanner for the Scan just created, and loop through the results performing whatever actions are necessary. Since the only method in HBase to retrieve multiple rows of data is scanning by sorted row keys, how you design the row key values is very important. We'll come back to this topic later.

You can also delete data in HBase using the Delete class, which is analogous to the Put class, to delete all columns in a row (thus deleting the row itself), delete column families, delete columns, or some combination of those.

Connection Handling

In the above examples not much attention was paid to connection handling and RPCs (remote procedure calls). HBase provides the HConnection class, which offers functionality similar to connection pool classes to share connections; for example, you use the getTable() method to get a reference to an HTable instance. There is also an HConnectionManager class, which is how you get instances of HConnection. Similar to avoiding network round trips in web applications, effectively managing the number of RPCs and the amount of data returned when using HBase is important, and something to consider when writing HBase applications.

Conclusion to Part 4

In this part we used the HBase Java API to create a people table, insert a new person, and find the newly inserted person's information. We also used the Scan class to scan the people table for people with last name "Smith" and showed how to restrict the data retrieved and, finally, how to use a filter to limit the number of results.
In the next part, we'll learn how to deal with the absence of SQL and relations when modeling schemas in HBase.

References

HBase web site, http://hbase.apache.org/
HBase wiki, http://wiki.apache.org/hadoop/Hbase
HBase Reference Guide, http://hbase.apache.org/book/book.html
HBase: The Definitive Guide, http://bit.ly/hbase-definitive-guide
Google Bigtable Paper, http://labs.google.com/papers/bigtable.html
Hadoop web site, http://hadoop.apache.org/
Hadoop: The Definitive Guide, http://bit.ly/hadoop-definitive-guide
Fallacies of Distributed Computing, http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing
HBase lightning talk slides, http://www.slideshare.net/scottleber/hbase-lightningtalk
Sample code, https://github.com/sleberknight/basic-hbase-examples
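As an addendum to the scan discussion in this part, the following sketch shows why row key design matters when rows can only be read in sorted key order. It does not use the HBase API: a java.util.TreeMap stands in for the sorted table, and SortedRowStore and scanPrefix are hypothetical names invented for this illustration. It also assumes plain ASCII keys, for which string order and byte order agree. The loop stops both at the page size (the role PageFilter plays above) and at the first key outside the prefix (the role a stop row key would play).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a table whose rows are kept sorted by row key.
class SortedRowStore {
    private final TreeMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Returns up to pageSize row keys that start with the given prefix, in sorted order.
    // Scanning starts at the prefix and ends as soon as a key no longer matches it.
    List<String> scanPrefix(String prefix, int pageSize) {
        List<String> page = new ArrayList<>();
        for (String key : rows.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix) || page.size() >= pageSize) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    // Small fixed data set, with keys shaped like the ones used in this part.
    static SortedRowStore sample() {
        SortedRowStore store = new SortedRowStore();
        store.put("doe-john-m-12345", "John");
        store.put("smith-alice-a-1", "Alice");
        store.put("smith-bob-b-2", "Bob");
        store.put("smith-carl-c-3", "Carl");
        store.put("smythe-dan-d-4", "Dan");
        return store;
    }
}
```

Because the surname comes first in the key, all "smith-" rows sit next to each other and one short range scan finds them. A key that started with, say, a numeric id would scatter them across the table.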
December 18, 2013
by Scott Leberknight
· 56,806 Views · 3 Likes
Top 24 Java-Based Content Management Systems
Content management systems (CMSes) are platforms for managing and administering website content. There is no denying that they are important in today's web ecosystem: they provide an easy way to build and maintain websites, and they let you update and edit content without spending hours or days writing and altering code and scripts. Leading CMSes are built on PHP, Ruby on Rails, ASP.NET, and Java. Among these, Java-based CMSes have been getting a lot of attention lately, especially for enterprise websites, because of the scalable, modern, open-source technology behind most of them.

There are plenty of CMS tools based on Java to help developers create multi-lingual and multi-channel websites. But how do we decide on the best one for our use case? In this article, we're going to explore the top 24 content management systems based on Java. Let's have a look at each of them in detail:

1. Alfresco: Alfresco is one of the top open-source content management systems written in Java. It comes with enterprise repository and portlet capabilities along with document management, collaboration, records management, knowledge management, web content management, imaging, and a lot more. Alfresco has a modular architecture and enables end users to efficiently manage websites across cloud, mobile, hybrid and on-premise environments using open-source Java technologies such as Spring, Hibernate, Lucene and JSF.

2. Magnolia: Magnolia is a well-documented, easy-to-use, enterprise-grade open-source CMS based on the Java Content Repository standard. It is a highly popular CMS due to its out-of-the-box functionality and ease of use under an open-source license. Moreover, Magnolia supports content delivery in a search-engine-optimized manner and follows W3C standards. Magnolia CMS has been deployed by enterprises and governments in more than 100 countries across the world. Here's a case study on Magnolia-based website development.

3. LogicalDOC: Though less known than other software such as Alfresco, LogicalDOC is emerging as a powerful and more affordable alternative. With its primary focus on document management, it offers content management, knowledge management and collaboration features, all in an efficient way. A distinctive aspect of the interface is its use of Google GWT, which makes the user interface very responsive while keeping data transfer with the server to a minimum. The availability of free apps for Android and Apple devices (iPhone and iPad) is another interesting feature.

4. Asbru: Asbru is another powerful, fully-featured, easy-to-use content management system with database-driven capabilities. It is built on the Spring framework with integrated community, database, eCommerce and statistics modules, which help developers create, publish and manage rich and user-friendly internet, extranet and intranet websites on the go. Available in various editions, Asbru provides users with a simple, user-friendly platform to manage websites along with a host of other features such as custom templates and data, password-protected content, multi-lingual content, communities, eCommerce and website analytics, a WYSIWYG content editor and a lot more.

5. OpenCMS: OpenCMS is based on Java and XML technology and allows you to build highly customizable and interactive websites and portals. It comes integrated with a WYSIWYG editor and a fully-featured template engine that is compliant with W3C standards. OpenCMS can be deployed both in an open-source environment (Linux, Apache, Tomcat, MySQL) and in a commercial environment (Windows NT, IIS, BEA Weblogic, Oracle).

6. Walrus: Walrus is yet another Spring-based CMS that provides effective content management capabilities with a smart administrative interface and drag-and-drop facilities. Easy setup and undo/redo features make Walrus a highly preferred and suitable CMS for government and non-profit enterprises.

7. Pulse: Pulse is a Java-based framework and portal solution that offers easy-to-use and extensible patterns for creating rich browser web applications and responsive websites. It brings a bunch of powerful components including content management, web shops, user management and more. A few of its key features include a WebDAV-based virtual file system for digital asset management, mature user and role management, and built-in internationalization.

8. MeshCMS: MeshCMS is an easy-to-use online editing system written in Java. It comes with a host of features that you would expect in a content management system; however, it uses a conventional approach to managing and editing website content. It is considered one of the fastest CMSes for editing files online, managing files, and building common components like menus, breadcrumbs, mail forms and so on. MeshCMS comes with cross-browser capabilities, a WYSIWYG editor, hot-linking prevention, and a tag library.

9. Liferay: Liferay is one of the most popular CMSes based on Java, and is recommended by many industry experts. It comes with features that can make your content management tasks simple. Liferay is very popular for developing personal as well as professional websites with ease.

10. DotCMS: DotCMS is a next-gen enterprise CMS that wears an open-source hat. It is a highly popular and widely used CMS thanks to its open APIs and its extensible, scalable architecture, which is used to create personalized and engaging websites, intranets, extranets and applications with ease.

11. Jease: Jease, widely known as 'Java with ease', is another open-source content management system built on popular Java technologies like db4o, Perst, Lucene and ZK. It is an extremely lightweight CMS with an excellent Ajax interface. Thanks to its intuitive and interactive interface, it is simple to customize and deploy websites in Jease even for inexperienced Java developers.

12. Hippo: Hippo is a powerful open-source CMS made in Java with enterprise-level capabilities that help in delivering personalized websites and channels. Hippo aims to set itself apart from its competitors by delivering an outstanding customer experience. Hippo has come a long way since 1999, serving medium to large organizations with a personalized multichannel content distribution platform covering websites, mobile, tablets, extranets and intranets. Its last major version update was in December 2012, and it has seen minor updates every couple of months since.

13. Apache Lenya: Apache Lenya is another open-source Java CMS that features revision control, multisite management, scheduling, search, WYSIWYG editors, and workflow, which make website development and management easier for developers. Available in a variety of languages, Apache Lenya is a highly preferred CMS among enterprises that want to develop multi-lingual websites.

14. Contelligent: Contelligent is another smart CMS solution built on the Java technology stack. It is fully compliant with J2EE and offers a great solution for creating and managing personalized websites.

15. InfoGlue: InfoGlue is a Java-based CMS that is known for its advanced, scalable and robust open-source architecture. It is a highly flexible CMS built on JSR-168 and comes with full multi-language support, excellent information reuse and high integration capabilities.

16. OpenEdit: OpenEdit CMS is a dynamic tool for managing website content with online editing capabilities. Built on an open-source architecture, OpenEdit provides facilities like a user manager, file manager, version control and notification tools for managing media-rich websites. OpenEdit features enterprise-grade plugins such as eCommerce, Content Management, Blog, Events Calendar, Social Networking Tools and more.

17. AtLeap: AtLeap is a multi-lingual CMS based on Java which offers content delivery assistance with SEO and full-text search functionality. AtLeap, a product of Blandware, is not only a CMS but also a robust framework for developing websites and web applications.

18. Weceem: Weceem is yet another open-source content management system. Unlike the other CMSes here, it is built on Grails, on top of Spring and Java itself. Weceem has garnered positive reviews and is an ideal CMS when it comes to Grails, but faces tough competition in the best Java CMS category. I came across a LinkedIn discussion which was enough for me to put this CMS in the Best Java CMS list.

19. Nuxeo: Nuxeo is a powerful open-source CMS built on a Java-based architecture. It offers solutions for document management, case management and digital asset management. It is free of licensing fees, but support and maintenance cost extra. It has a strong customer base including Electronic Arts and the U.S. Navy and, as stated on the company website, it has been used in over 145 countries across thousands of organizations.

20. XperienCentral: XperienCentral is currently the only CMS that offers unique content to a visitor based on their earlier journey, so you can tailor the content to increase conversion. It offers multi-channel content delivery across websites, mobile, social media channels and applications. It is built on Java, and hence it is extremely scalable and agile.

21. Atex: Atex is a web CMS that uses Polopoly technology to deliver content. According to its own claims, it is the only industry-leading CMS with a built-in paywall. Atex is one of the premium CMSes; it offers solutions for managing websites and helps marketers deliver the right content to relevant audiences. It has a rich set of clientele.

22. Escenic: Customers of Escenic include News of the World, The Sun, The Times, and the Independent titles. It's a closed-source Java framework. Both Atex and Escenic are highly popular in Sweden, where some of the biggest sites use these CMSes: idg.se uses Atex and Aftonbladet.se uses Escenic.

23. Adobe Experience Manager / CQ5: A best-CMS list cannot be complete without Adobe Experience Manager. It is an all-round CMS which offers all kinds of agility and flexibility an organization may want. It helps deliver a unique customer experience by delivering different content on different channels. Adobe Experience Manager was recently named a leader in web content management by Gartner's Magic Quadrant. It was earlier known as CQ5, and was acquired by Adobe in 2010.

24. SDL Tridion: Again a well-known CMS, highly recommended by industry experts. Its simple, intuitive UI makes it easy to manage content and deliver it uniformly across all channels. It recently received the top score in overall content management experience according to an independent research firm, Forrester Research, Inc.

This completes the list of the top 24 Java-based content management systems. Hopefully, after reading about all of these CMSes, you have enough insight to decide which one would be best for your website development project.
December 9, 2013
by Boni Satani
· 328,590 Views · 4 Likes
MongoDB and its locks
Sometimes, you need your jobs to be persisted to a database. Existing solutions such as Gearman only used relational or file-based persistence, so they were a no-go for us and we went with MongoDB. Fast-forward a few months, and we have some problems with the database load. However, it's not that workers are pestering it too much: the problem was related to locks.

MongoDB locking model

As of 2.4, MongoDB holds write locks on an entire database for each write operation. Since atomicity is guaranteed only on a single document, this isn't usually a problem: even if you are inserting thousands of documents, you are doing so in thousands of different operations that can be interleaved with queries and other inserts with a fair policy. This sometimes results in count() queries being inconsistent as documents are moved and indexes are asynchronously updated. However, write corruption is nonexistent, as documents are a very cohesive entity.

That said, atomic operations over a single document still lock the whole database, as in the case of findAndModify(), which looks for a document matching a certain query and updates it with a $set operation before returning it; all in a single shot and with the guarantee that no other process will be able to perform the same read-and-write operation at the same time. This operation is ideal for implementing workers based on a pull model, each asking the database for a new job to do and locking it with '$set: {locked: true}'. However, once the number of workers increases a little bit, locks become a problem.

Lock duration

We cleaned up the working space collection of our MongoDB database by keeping in it only the unfinished jobs, and moving all the rest (completed or failed) to a different collection for archival. As the load increased due to new contracts, we saw the locking time increase as well: the application and the workers were hitting the same database.
The first of the problems was that after reducing the specs of our primary server, we started seeing timeouts in unrelated code even though CPU and IO usage were low. The locks taken by workers to pick jobs were starting to take seconds or tens of seconds. Moreover, the MongoDB server started filling the logs with:

    Fri Dec 6 00:01:07 [conn280998] warning: ClientCursor::yield can't unlock b/c of recursive lock...

I'm a user, not a MongoDB guru, but that doesn't look good, especially given that hundreds of these messages were written every day (although the queues continued to work correctly). We did not find any explanation for these messages in the documentation, but I suppose they mean some operations are taking so long that they have to yield to make room for others, while atomic operations can't yield, in order to preserve consistency.

An easy solution

Since MongoDB does not have collection-wide locks yet, we decided to move the job pool and the completed job collections to a different database. In this way, we had a main database with the usual collections and one containing just these two, named with a '_queue' suffix. Note that we're still writing to the same database server: there is still the same number of connections being created by each process. This solution preallocates more space given that two databases are involved, but as you know space is cheap nowadays.

Both insertion of jobs and worker reads must take place on the same database. Here is where we discovered that cohesion pays: if you have this information in a single place, it is very easy to change configuration. If you have a singleton database, because "we should only have one database in this application, it will never change", this feature would cost you a lot. Fortunately, in our case it was about 10 lines of code, including the refactoring of the Factory Methods that created MongoDB database objects.
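As a rough illustration of why keeping this decision in one place makes the change cheap, here is a minimal Java sketch of a routing helper. It is not the authors' code: the class name, the method name, and the collection names "jobs" and "completed_jobs" are all assumptions made for the example, and it reads the '_queue' suffix as applying to the database name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: the single place that decides which database a collection lives in.
final class DatabaseRouting {
    // Assumed collection names; the article does not give the real ones.
    private static final Set<String> QUEUE_COLLECTIONS =
            new HashSet<>(Arrays.asList("jobs", "completed_jobs"));

    private DatabaseRouting() {}

    // Queue collections go to "<main>_queue"; everything else stays in the main database.
    static String databaseFor(String mainDatabase, String collection) {
        return QUEUE_COLLECTIONS.contains(collection)
                ? mainDatabase + "_queue"
                : mainDatabase;
    }
}
```

A factory method that builds database objects could call databaseFor("app", "jobs") before opening the database, so producers and workers always agree on where the queue lives.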
Long term

This solution is not for the long term, as we know the number of machines and their worker pools will increase in the future; a sufficiently high number of workers will saturate the connections available on the MongoDB server and lock the common collection until picking a job takes tens of seconds. The design towards which we are moving includes one "foreman" on each machine, with many workers under its control; only the foreman polls the database and may lock the common collection.

Distributing the job pool is not what we want, because we need to be able to retrieve a job easily in case something goes bad (ever done a query on multiple databases?). Also, we don't want a push solution, as it would involve registering workers or foremen with a central point of failure that assigns them their jobs. Since most of our servers are shut down and rebooted according to the user load, we prefer a dynamic solution where a server can start picking jobs whenever it wants and stop without notifying remote machines.
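The pull model that makes this dynamic joining and leaving possible can be pictured with a small in-memory analogy. The sketch below is not MongoDB code and does not use the MongoDB driver: Job and JobPool are hypothetical names, and compareAndSet stands in for the atomic "find an unlocked job and set locked: true" step that findAndModify() performs on the server.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical in-memory stand-in for a job document with a "locked" flag.
final class Job {
    final String id;
    final AtomicBoolean locked = new AtomicBoolean(false);

    Job(String id) {
        this.id = id;
    }
}

// Hypothetical job pool; in the article this role is played by a MongoDB collection.
final class JobPool {
    private final List<Job> jobs = new CopyOnWriteArrayList<>();

    void add(Job job) {
        jobs.add(job);
    }

    // Find the first unlocked job and lock it in one atomic step,
    // so two workers can never claim the same job.
    Optional<Job> claimNext() {
        for (Job job : jobs) {
            if (job.locked.compareAndSet(false, true)) {
                return Optional.of(job);
            }
        }
        return Optional.empty();
    }
}
```

A worker or foreman only ever calls claimNext(); nothing has to register it beforehand, which is why machines can come and go freely. The cost, as described above, is that every claim contends for the same lock.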
December 6, 2013
by Giorgio Sironi
· 27,600 Views
Adding Java 8 Lambda Goodness to JDBC
Data access, specifically SQL access from within Java, has never been nice. This is in large part due to the fact that the JDBC API has a lot of ceremony. Java 7 vastly improved things with ARM blocks (try-with-resources) by taking away a lot of the ceremony around managing database objects such as Statements and ResultSets, but fundamentally the code flow is still the same. Java 8 lambdas give us a very nice tool for improving the flow of JDBC.

Our first attempt at improving things here is very simply to make it easy to work with a java.sql.ResultSet. Here we simply wrap the ResultSet iteration and then delegate it to a lambda function. This is very similar in concept to Spring's JdbcTemplate.

NOTE: I've released all the code snippets you see here under an Apache 2.0 license on GitHub.

First we create a functional interface called ResultSetProcessor as follows:

    @FunctionalInterface
    public interface ResultSetProcessor {
        void process(ResultSet resultSet, long currentRow) throws SQLException;
    }

Very straightforward. This interface takes the ResultSet and the current row of the ResultSet as parameters. Next we write a simple utility which executes a query and then calls our ResultSetProcessor each time we iterate over the ResultSet:

    public static void select(Connection connection, String sql,
                              ResultSetProcessor processor, Object... params) {
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            // Bind positional parameters; JDBC indexes start at 1.
            int cnt = 0;
            for (Object param : params) {
                ps.setObject(++cnt, param);
            }
            try (ResultSet rs = ps.executeQuery()) {
                long rowCnt = 0;
                while (rs.next()) {
                    processor.process(rs, rowCnt++);
                }
            }
        } catch (SQLException e) {
            throw new DataAccessException(e);
        }
    }

Note I've wrapped the SQLException in my own unchecked DataAccessException.
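The post doesn't show DataAccessException itself. A minimal sketch of what such an unchecked wrapper might look like is below; this is an assumed shape, and the author's real class in the GitHub project may differ.

```java
// Assumed shape of the unchecked wrapper; the author's real class may differ.
class DataAccessException extends RuntimeException {
    // Keep the original exception as the cause so the SQL error is not lost.
    DataAccessException(Throwable cause) {
        super(cause);
    }
}
```

Because it extends RuntimeException, lambdas passed to select don't need to declare or catch anything beyond the SQLException that ResultSetProcessor already allows.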
Now when we write a query it's as simple as calling the select method with a connection and a query:

    select(connection, "select * from MY_TABLE", (rs, cnt) -> {
        System.out.println(rs.getInt(1) + " " + cnt);
    });

So that's great, but I think we can do more...

One of the nifty lambda additions in Java is the new Streams API. This would allow us to add very powerful functionality with which to process a ResultSet. Using the Streams API over a ResultSet, however, creates a bit more of a challenge than the simple select with a lambda in the previous example. The way I decided to go about this is to create my own Tuple type which represents a single row from a ResultSet. My Tuple here is the relational version, where a Tuple is a collection of elements and each element is identified by an attribute: basically a collection of key-value pairs. In our case the Tuple is ordered in terms of the order of the columns in the ResultSet. The code for the Tuple ended up being quite a bit, so if you want to take a look, see the GitHub project in the resources at the end of the post.

The Java 8 API provides the java.util.stream.StreamSupport class, which offers a set of static methods for creating instances of java.util.stream.Stream. We can use this class to create an instance of a Stream. But in order to create a Stream it needs an instance of java.util.Spliterator. This is a specialised type for iterating and partitioning a sequence of elements, which the Stream needs for handling operations in parallel. Fortunately the Java 8 API also provides the java.util.Spliterators class, which can wrap existing collection and enumeration types, one of those types being a java.util.Iterator.
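Before bringing a ResultSet into it, the Iterator-to-Stream wiring described above can be tried on its own. This is a minimal sketch using a plain in-memory Iterator; the class and method names (IteratorStreams, streamOf) are made up for the example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class IteratorStreams {
    private IteratorStreams() {}

    // Wrap any Iterator in a sequential Stream. The size is unknown up front,
    // and ORDERED says elements arrive in iteration order.
    static <T> Stream<T> streamOf(Iterator<T> iterator) {
        Spliterator<T> spliterator =
                Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED);
        return StreamSupport.stream(spliterator, false); // false = not parallel
    }

    public static void main(String[] args) {
        Iterator<Integer> ids = Arrays.asList(1, 2, 3, 4, 5, 6).iterator();
        // Count the even values, mirroring the kind of filter you might run on rows.
        long evens = streamOf(ids).filter(i -> i % 2 == 0).count();
        System.out.println(evens);
    }
}
```

Passing false as the second argument to StreamSupport.stream keeps the stream sequential, which matters once the Iterator is backed by a database cursor.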
Now we wrap a query and ResultSet in an Iterator:

    public class ResultSetIterator implements Iterator<Tuple> {

        private ResultSet rs;
        private PreparedStatement ps;
        private final Connection connection;
        private final String sql;

        public ResultSetIterator(Connection connection, String sql) {
            assert connection != null;
            assert sql != null;
            this.connection = connection;
            this.sql = sql;
        }

        // Prepare and execute the query; called lazily from hasNext().
        public void init() {
            try {
                ps = connection.prepareStatement(sql);
                rs = ps.executeQuery();
            } catch (SQLException e) {
                close();
                throw new DataAccessException(e);
            }
        }

        // Note: this advances the cursor, so call it exactly once per next().
        @Override
        public boolean hasNext() {
            if (ps == null) {
                init();
            }
            try {
                boolean hasMore = rs.next();
                if (!hasMore) {
                    close();
                }
                return hasMore;
            } catch (SQLException e) {
                close();
                throw new DataAccessException(e);
            }
        }

        // Close both resources independently; either may be null if init() failed.
        private void close() {
            try {
                if (rs != null) {
                    rs.close();
                }
            } catch (SQLException e) {
                // nothing we can do here
            }
            try {
                if (ps != null) {
                    ps.close();
                }
            } catch (SQLException e) {
                // nothing we can do here
            }
        }

        @Override
        public Tuple next() {
            try {
                return SQL.rowAsTuple(sql, rs);
            } catch (DataAccessException e) {
                close();
                throw e;
            }
        }
    }

This class basically delegates the iterator methods to the underlying result set and then, on the next() call, transforms the current row in the ResultSet into my Tuple type. And that's the basics done (this class will need a little bit more work though). All that's left is to wire it all together to make a Stream object. Note that due to the nature of a ResultSet it's not a good idea to try to process it in parallel, so our stream cannot process in parallel.

    public static Stream<Tuple> stream(final Connection connection,
                                       final String sql,
                                       final Object... parms) {
        // parms is accepted but not yet bound to the statement in this version.
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(
                        new ResultSetIterator(connection, sql), 0),
                false);
    }

Now it's straightforward to stream a query.
In the usage example below I've got a table TEST_TABLE with an integer column TEST_ID. The stream filters out all the odd numbers and then runs a count:

    long result = stream(connection, "select TEST_ID from TEST_TABLE")
            .filter((t) -> t.asInt("TEST_ID") % 2 == 0)
            .limit(100)
            .count();

And that's it! We now have a very powerful way of working with a ResultSet.

All this code is available under an Apache 2.0 license on GitHub. I've rather lamely dubbed the project "lambda tuples," and the purpose really is to experiment and see where you can take Java 8 and relational DB access, so please download or feel free to contribute.
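Since the Tuple type lives in the GitHub project rather than in the post, here is a hypothetical, much smaller stand-in that is enough to make sense of calls like t.asInt("TEST_ID"). The name SimpleTuple and its with method are inventions for this sketch; only asInt mirrors a method used in the post, and its behaviour here is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the post's Tuple: an ordered set of column/value pairs.
final class SimpleTuple {
    // LinkedHashMap keeps columns in insertion order, like columns in a ResultSet.
    private final Map<String, Object> values = new LinkedHashMap<>();

    SimpleTuple with(String column, Object value) {
        values.put(column, value);
        return this;
    }

    // Read a numeric column as an int; fail loudly for anything else.
    int asInt(String column) {
        Object value = values.get(column);
        if (!(value instanceof Number)) {
            throw new IllegalArgumentException(
                    "Column " + column + " is not numeric: " + value);
        }
        return ((Number) value).intValue();
    }
}
```

The ordered map is the design point: it keeps lookup by attribute name while preserving column order, which is the "relational" tuple the post describes.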
December 5, 2013
by Julian Exenberger
· 78,293 Views · 6 Likes
Implementing the “Card” UI Pattern in PhoneGap/HTML5 Applications
The Card UI pattern is a common look used by Pinterest and many other content sites. See how you can make a PhoneGap app with this look.
December 2, 2013
by Andrew Trice
· 116,272 Views · 2 Likes
Deconstructing the Azure Point-to-Site VPN for Command Line usage
When configuring an Azure virtual network, one of the most common things you'll want to do is set up a point-to-site VPN so that you can actually get to your servers to manage and maintain them. Azure point-to-site VPNs use client certificates to secure connections, which can be quite complicated to configure, so Microsoft has gone the extra mile to make it easy for you to get set up. Sadly, this comes at the cost of losing the ability to connect through the command line or through PowerShell. Let's change that.

Current state of play == no command line VPN connections

Normally, when you want to launch a VPN from the CLI or PowerShell in Windows, you can simply use the following command:

    rasdial "my home vpn"

The Azure pre-packaged VPN doesn't allow this because it's really just not a normal VPN. It's something else, something mysterious, not a normal native Windows VPN connection. When you try to dial the Azure VPN from the command line, the connection fails: Azure VPNs don't appear to support this. If you want to keep your servers behind a private network in Azure and use continuous deployment to get your code into production, this makes it hard to deploy without a human being around. That is not really the best case scenario, especially when you remind yourself that automated builds aim to do away with human error altogether.

What the Azure point-to-site looks like out of the box

When you first go to set up a point-to-site VPN into your Azure virtual network, Microsoft points you at a page that walks you through creating a client certificate on your local machine to use as authentication. They then get you to download a package for setting up the Azure VPN RAS dialler on your local machine. This is accessed from within the Azure "Networks" page for your virtual network. You install this package, and then whenever connecting you're greeted with a connection screen that you might have seen in a previous life.
And by seen I don't mean that Windows Azure virtual networks have been around for ages, but more that the login screen may look familiar. This is because it is a Microsoft "Connection Manager" login screen, which has been around for a while (TechNet has examples, complete with extremely dated bitmap awesomeness). Connection Manager is used to pre-package VPN and dial-up connections for easy-install distribution in a large organisation. This also means we can reconstruct the underlying VPN connection and use it as a normal VPN, claiming back our CLI super powers.

Digging through the details

So what we really want to know is: what is this mystical VPN technology the people at Microsoft have bestowed upon us? Here's how I started getting more information about the implementation. Connect once successfully, then disconnect. Open it up again to connect, click on Properties, then click on View Log. You'll then be greeted by something that looks like this:

    ******************************************************************
    operating system : windows nt 6.2
    dialler version : 7.2.9200.16384
    connection name : my azure virtual network
    all users/single user : single user
    start date/time : 24/11/2013, 7:50:31
    ******************************************************************
    module name, time, log id, log item name, other info
    for connection type, 0=dial-up, 1=vpn, 2=vpn over dial-up
    ******************************************************************
    [cmdial32] 7:50:31 03 pre-init event callingprocess = c:\windows\system32\cmmon32.exe
    [cmdial32] 7:50:39 04 pre-connect event connectiontype = 1
    [cmdial32] 7:50:39 06 pre-tunnel event username = myclientsslcertificate domain = dunsetting = [obfuscated azure gateway id] tunnel devicename = tunneladdress = [obfuscated azure gateway id].cloudapp.net
    [cmdial32] 7:50:44 07 connect event
    [cmdial32] 7:50:44 08 custom action dll actiontype = connect actions description = to update your routing table actionpath = c:\users\doug\appdata\roaming\microsoft\network\connections\cm\[obfuscated azure gateway id]\cmroute.dll returnvalue = 0x0
    [cmmon32] 7:56:21 23 external disconnect
    [cmdial32] 7:56:21 13 disconnect event callingprocess = c:\windows\explorer.exe

More importantly, you'll see this path included in the log:

    c:\users\doug\appdata\roaming\microsoft\network\connections\cm\[obfuscated azure gateway id]\

Within this folder is all the magic Connection Manager odds and ends. Apologies for the [obfuscated]; the path simply contains information about my Azure endpoint. Within this folder you'll see a bunch of files. Most importantly, there is a PBK file, a personal phonebook. This is what stores the connection settings for the VPN and is a commonly distributed way of sending out connection settings in the enterprise. If you run this on its own you'll actually be able to connect to the VPN directly (without your network routes being updated). This phonebook is where we can steal our settings from to recreate a command line driven connection.

Setting it up

Open up the properties of your Azure point-to-site VPN phonebook above, and copy the connection address. It will look like this:

    azuregateway-[guid].cloudapp.net

Open the Network and Sharing Centre and create a new connection. Then select Connect to a workplace. Select that you'll "use my Internet connection". Then enter your Azure point-to-site VPN address and give your new connection a name. Remember this name for later, then click Create to save your VPN.

Now open the connection properties for your newly created VPN. This is where we'll use the settings in your Azure dialler's config to set up your connection. I'll save you the hassle of watching me copy the settings from one connection to another and instead just focus on what you need to set them to. Flick over to the Options tab and then click PPP Settings. Tick the two missing options, Enable software compression and Negotiate multi-link for single-link connections.
Set the type of VPN to Secure Socket Tunneling Protocol (SSTP), turn on EAP and select Microsoft: Smart Card or other certificate as the authentication type. Then click on Properties. Select "Use a certificate on this computer", un-tick "Connect to these servers", then select the certificate that uses your Azure endpoint URI as its certificate name and save. Then flick over to the Network tab. Open TCP/IPv4, then Advanced, then untick Use default gateway on remote network. This setting stops internet traffic going over the VPN while you're connected, so you can still surf Reddit while managing your Azure environment. Close the VPN configuration panel.

You now have a working VPN connection to Azure. When you connect using Windows you'll be asked to select the name of the client certificate you'll be authenticating with. Select the certificate you created and uploaded into Azure before you set up your connection. When you connect using the command line you don't need to specify your certificate:

    rasdial "azure vpn"

But there's one catch: your local machine's route table doesn't know when to send any traffic to your Azure virtual network. The network link is there, but Windows doesn't know what to send over your internet link and what to send over the VPN link. You see, Microsoft did a few things when they packaged your Connection Manager, and one of these things was to also copy a file called "cmroute.dll" and call it after connection to route your traffic onto your virtual network. This file altered your routing table to route traffic to your virtual network subnets through the VPN connection. We can do the same thing, so let's go about it.

What's this about routing... rooting (for the English speakers in the room)

My Azure virtual network consists of the following network range: 10.0.0.0/8. I also have the following subnets for different machine groups:
    10.0.1.0/24 (web servers)
    10.0.2.0/24 (application servers)
    10.0.3.0/24 (management services)

My PPTP connections, or point-to-site connections, sit on the range 172.16.0.0/24. This means that when I connect to the Azure VPN I will get an IP address in this range, for example 172.16.0.17. When this happens we need to tell Windows to route all traffic going to my 10.0.x.x range IP addresses through the IP address that has been given to us by Azure's VPN RRAS service. You can see your current routing table by entering route print into a command prompt or PowerShell console.

Automating the routing additions

Luckily the Windows Task Scheduler supports event listeners that allow us to watch for VPN connections and run commands off the back of them. Take the PowerShell script below and save it, for argument's sake, in c:\scripts\updateroutetableforazurevpn.ps1:

    #############################################################
    # adds ip routes to azure vpn through the point-to-site vpn
    #############################################################

    # define your azure subnets
    $ips = @("10.0.1.0", "10.0.2.0", "10.0.3.0")

    # point-to-site ip address range
    # should be the first 3 octets of the ip address: '172.16.0.14' == '172.16.0.'
    $azurepptprange = "172.16.0."

    # find the current new dhcp assigned ip address from azure
    $azureipaddress = ipconfig | findstr $azurepptprange

    # if azure hasn't given us one yet, exit and let you know
    if (!$azureipaddress) {
        "you do not currently have an ip address in your azure subnet."
        exit 1
    }

    $azureipaddress = $azureipaddress.split(": ")
    $azureipaddress = $azureipaddress[$azureipaddress.length-1]
    $azureipaddress = $azureipaddress.trim()

    # delete any previous configured routes for these ip ranges
    foreach ($ip in $ips) {
        $routeexists = route print | findstr $ip
        if ($routeexists) {
            "deleting route to azure: " + $ip
            route delete $ip
        }
    }

    # add our new routes to azure virtual network
    foreach ($subnet in $ips) {
        "adding route to azure: " + $subnet
        echo "route add $subnet mask 255.255.255.0 $azureipaddress"
        route add $subnet mask 255.255.255.0 $azureipaddress
    }

Now execute the following from an elevated command prompt window. This tells Windows to add an event-listener-based task that looks for events from our "azure vpn" connection and, if it sees them, runs our PowerShell script.

    schtasks /create /f /tn "vpn connection update" /tr "powershell.exe -noninteractive -command c:\scripts\updateroutetableforazurevpn.ps1" /sc onevent /ec application /mo "*[system[(level=4 or level=0) and (eventid=20225)]] and *[eventdata[data='azure vpn']] "

If I then connect to my VPN, the above script should execute. After connecting, if I check my routing table by entering route print into a console, the routes to Azure have been added correctly.

We're done!

With that we're now able to fully use an Azure point-to-site VPN simply from the command line. This means we can use it as part of a build server deployment, or if you're working on it all the time you can simply set it up to connect every time you log in to Windows.

Command line usage

    rasdial "[connection name]"
    rasdial "[connection name]" /disconnect

For my connection named "azure vpn" this command line usage becomes:

    rasdial "azure vpn"
    rasdial "azure vpn" /disconnect
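The routing script above hard-codes a 255.255.255.0 mask, which is right for the /24 subnets in this example. If your subnets use other prefix lengths, the mask has to be derived from the CIDR notation. The following is a small sketch of that arithmetic in Java (used here only to keep one language across the added examples; it is not part of the original PowerShell setup, and the class and method names are hypothetical).

```java
// Hypothetical helper that turns CIDR subnets into Windows "route add" command lines.
final class RouteCommands {
    private RouteCommands() {}

    // Convert a prefix length (0..32) into a dotted-decimal subnet mask.
    static String maskFor(int prefixLength) {
        if (prefixLength < 0 || prefixLength > 32) {
            throw new IllegalArgumentException("Invalid prefix length: " + prefixLength);
        }
        // A shift by 32 is a no-op for ints in Java, so handle /0 explicitly.
        int bits = prefixLength == 0 ? 0 : 0xFFFFFFFF << (32 - prefixLength);
        return ((bits >>> 24) & 0xFF) + "." + ((bits >>> 16) & 0xFF) + "."
                + ((bits >>> 8) & 0xFF) + "." + (bits & 0xFF);
    }

    // Build the same command shape the script runs: route add <net> mask <mask> <gateway>.
    static String routeAdd(String cidr, String gateway) {
        String[] parts = cidr.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected network/prefix, got: " + cidr);
        }
        return "route add " + parts[0] + " mask "
                + maskFor(Integer.parseInt(parts[1])) + " " + gateway;
    }
}
```

For the subnets in this post, routeAdd("10.0.1.0/24", "172.16.0.17") yields the same command the script echoes: route add 10.0.1.0 mask 255.255.255.0 172.16.0.17.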
November 29, 2013
by Douglas Rathbone
· 10,549 Views
New in Neo4j: Optional Relationships with OPTIONAL MATCH
One of the breaking changes in Neo4j 2.0.0-RC1 compared to previous versions is that the -[?]-> syntax for matching optional relationships has been retired and replaced with the OPTIONAL MATCH construct. An example where we might want to match an optional relationship is finding colleagues that we haven’t worked with.

Suppose we have the following data set:

    CREATE (steve:Person {name: "Steve"})
    CREATE (john:Person {name: "John"})
    CREATE (david:Person {name: "David"})
    CREATE (paul:Person {name: "Paul"})
    CREATE (sam:Person {name: "Sam"})
    CREATE (londonOffice:Office {name: "London Office"})
    CREATE UNIQUE (steve)-[:WORKS_IN]->(londonOffice)
    CREATE UNIQUE (john)-[:WORKS_IN]->(londonOffice)
    CREATE UNIQUE (david)-[:WORKS_IN]->(londonOffice)
    CREATE UNIQUE (paul)-[:WORKS_IN]->(londonOffice)
    CREATE UNIQUE (sam)-[:WORKS_IN]->(londonOffice)
    CREATE UNIQUE (steve)-[:COLLEAGUES_WITH]->(john)
    CREATE UNIQUE (steve)-[:COLLEAGUES_WITH]->(david)

We might write the following query to find people from the same office as Steve but that he hasn’t worked with:

    MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague)
    WHERE person.name = "Steve" AND office.name = "London Office"
    WITH person, potentialColleague
    MATCH (potentialColleague)-[c?:COLLEAGUES_WITH]-(person)
    WHERE c IS null
    RETURN potentialColleague

    ==> +----------------------+
    ==> | potentialColleague   |
    ==> +----------------------+
    ==> | Node[4]{name:"Paul"} |
    ==> | Node[5]{name:"Sam"}  |
    ==> +----------------------+

We first find which office Steve works in and find the people who also work in that office. Then we optionally match the ‘COLLEAGUES_WITH’ relationship and only return people who Steve doesn’t have that relationship with.
If we run that query in 2.0.0-RC1 we get this exception: ==> SyntaxException: Question mark is no longer used for optional patterns - use OPTIONAL MATCH instead (line 1, column 199) ==> "MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague) WHERE person.name = "Steve" AND office.name = "London Office" WITH person, potentialColleague MATCH (potentialColleague)-[c?:COLLEAGUES_WITH]-(person) WHERE c IS null RETURN potentialColleague" ==> Based on that advice we might translate our query to read like this: MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague) WHERE person.name = "Steve" AND office.name = "London Office" WITH person, potentialColleague OPTIONAL MATCH (potentialColleague)-[c:COLLEAGUES_WITH]-(person) WHERE c IS null RETURN potentialColleague If we run that we get back more people than we’d expect: ==> +------------------------+ ==> | potentialColleague | ==> +------------------------+ ==> | Node[15]{name:"John"} | ==> | Node[14]{name:"David"} | ==> | Node[13]{name:"Paul"} | ==> | Node[12]{name:"Sam"} | ==> +------------------------+ The reason this query doesn’t work as we’d expect is because the WHERE clause immediately following OPTIONAL MATCH is part of the pattern rather than being evaluated afterwards as we’ve become used to. The OPTIONAL MATCH part of the query matches a ‘COLLEAGUES_WITH’ relationship where the relationship is actually null, something of a contradiction! However, since the match is optional a row is still returned. 
If we include ‘c’ in the RETURN part of the query we can see that this is the case: MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague) WHERE person.name = "Steve" AND office.name = "London Office" WITH person, potentialColleague OPTIONAL MATCH (potentialColleague)-[c:COLLEAGUES_WITH]-(person) WHERE c IS null RETURN potentialColleague, c ==> +---------------------------------+ ==> | potentialColleague | c | ==> +---------------------------------+ ==> | Node[15]{name:"John"} | | ==> | Node[14]{name:"David"} | | ==> | Node[13]{name:"Paul"} | | ==> | Node[12]{name:"Sam"} | | ==> +---------------------------------+ If we take out the WHERE part of the OPTIONAL MATCH the query is a bit closer to what we want: MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague) WHERE person.name = "Steve" AND office.name = "London Office" WITH person, potentialColleague OPTIONAL MATCH (potentialColleague)-[c:COLLEAGUES_WITH]-(person) RETURN potentialColleague, c ==> +-----------------------------------------------+ ==> | potentialColleague | c | ==> +-----------------------------------------------+ ==> | Node[2]{name:"John"} | :COLLEAGUES_WITH[5]{} | ==> | Node[3]{name:"David"} | :COLLEAGUES_WITH[6]{} | ==> | Node[4]{name:"Paul"} | | ==> | Node[5]{name:"Sam"} | | ==> +-----------------------------------------------+ If we introduce a WITH after the OPTIONAL MATCH we can choose to filter out those people that we’ve already worked with: MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague) WHERE person.name = "Steve" AND office.name = "London Office" WITH person, potentialColleague OPTIONAL MATCH (potentialColleague)-[c:COLLEAGUES_WITH]-(person) WITH potentialColleague, c WHERE c IS null RETURN potentialColleague If we evaluate that query it returns the same output as our original query: ==> +----------------------+ ==> | potentialColleague | ==> +----------------------+ ==> | Node[4]{name:"Paul"} | ==> | 
Node[5]{name:"Sam"} | ==> +----------------------+
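The difference between a WHERE that belongs to the optional pattern and a filter applied afterwards has a familiar relational counterpart: a predicate in the ON clause of a LEFT JOIN versus one in the WHERE clause. The sketch below is only an analogy, run against SQLite from Python, not Cypher. The table and column names are hypothetical stand-ins for the graph above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (name TEXT PRIMARY KEY);
    CREATE TABLE colleagues_with (a TEXT, b TEXT);
    INSERT INTO person VALUES ('John'), ('David'), ('Paul'), ('Sam');
    INSERT INTO colleagues_with VALUES ('Steve', 'John'), ('Steve', 'David');
""")

# Predicate inside the join condition, like WHERE attached to OPTIONAL MATCH:
# no colleagues_with row can satisfy "c.a IS NULL", so the optional side never
# matches and every person is still returned with an empty c.
inside = conn.execute("""
    SELECT p.name
    FROM person p
    LEFT JOIN colleagues_with c
      ON c.a = 'Steve' AND c.b = p.name AND c.a IS NULL
    ORDER BY p.name
""").fetchall()

# Predicate applied after the join, like WITH ... WHERE c IS null:
# only people without a matching relationship remain.
after = conn.execute("""
    SELECT p.name
    FROM person p
    LEFT JOIN colleagues_with c
      ON c.a = 'Steve' AND c.b = p.name
    WHERE c.a IS NULL
    ORDER BY p.name
""").fetchall()

inside_names = [row[0] for row in inside]
after_names = [row[0] for row in after]
print(inside_names)  # ['David', 'John', 'Paul', 'Sam']
print(after_names)   # ['Paul', 'Sam']
```

The first query mirrors the surprising four-row result, and the second mirrors the fixed query that filters after the optional match.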
November 26, 2013
by Mark Needham
· 21,476 Views · 7 Likes
How to Create a Range From 1 to 10 in SQL
How do you create a range from 1 to 10 in SQL? Have you ever thought about it? This is such an easy problem to solve in any imperative language, it’s ridiculous. Take Java (or C, whatever) for instance:

    for (int i = 1; i <= 10; i++)
        System.out.println(i);

This was easy, right? Things look even leaner when using functional programming. Take Scala, for instance:

    (1 to 10) foreach { t => println(t) }

We could fill about 25 pages about various ways to do the above in Scala, agreeing on how awesome Scala is (or what hipsters we are). But how to create a range in SQL? … And we’ll exclude using stored procedures, because that would be no fun.

In SQL, the data sources we operate on are tables. If we want a range from 1 to 10, we’d probably need a table containing exactly those ten values. Here are a couple of good, bad, and ugly options of doing precisely that in SQL. OK, they’re mostly bad and ugly.

By creating a table

The dumbest way to do this would be to create an actual temporary table just for that purpose:

    CREATE TABLE "1 to 10" AS
    SELECT 1 value FROM DUAL UNION ALL
    SELECT 2 FROM DUAL UNION ALL
    SELECT 3 FROM DUAL UNION ALL
    SELECT 4 FROM DUAL UNION ALL
    SELECT 5 FROM DUAL UNION ALL
    SELECT 6 FROM DUAL UNION ALL
    SELECT 7 FROM DUAL UNION ALL
    SELECT 8 FROM DUAL UNION ALL
    SELECT 9 FROM DUAL UNION ALL
    SELECT 10 FROM DUAL

See also this SQLFiddle

This table can then be used in any type of select. Now that’s pretty dumb but straightforward, right? I mean, how many actual records are you going to put in there?

By using a VALUES() table constructor

This solution isn’t that much better. You can create a derived table and manually add the values from 1 to 10 to that derived table using the VALUES() table constructor.
In SQL Server, you could write:

    SELECT V
    FROM (
      VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)
    ) [1 to 10](V)

See also this SQLFiddle

By creating enough self-joins of a sufficient number of values

Another “dumb”, yet a bit more generic solution would be to create only a certain number of constant values in a table, view or CTE (e.g. two) and then self-join that table enough times to reach the desired range length (e.g. four times). The following example will produce values from 1 to 10, “easily”:

    WITH T(V) AS (
      SELECT 0 FROM DUAL UNION ALL
      SELECT 1 FROM DUAL
    )
    SELECT V FROM (
      SELECT 1 + T1.V + 2 * T2.V + 4 * T3.V + 8 * T4.V V
      FROM T T1, T T2, T T3, T T4
    )
    WHERE V <= 10
    ORDER BY V

See also this SQLFiddle

By using grouping sets

Another way to generate large tables is by using grouping sets, or more specifically by using the CUBE() function. This works in much the same way as the previous example of self-joining a table with two records:

    SELECT ROWNUM FROM (
      SELECT 1 FROM DUAL GROUP BY CUBE(1, 2, 3, 4)
    )
    WHERE ROWNUM <= 10

See also this SQLFiddle

By just taking random records from a “large enough” table

In Oracle, you could probably use ALL_OBJECTS. If you’re only counting to 10, you’ll certainly get enough results from that table:

    SELECT ROWNUM FROM ALL_OBJECTS WHERE ROWNUM <= 10

See also this SQLFiddle

What’s so “awesome” about this solution is that you can cross join that table several times to be sure to get enough values:

    SELECT ROWNUM
    FROM ALL_OBJECTS, ALL_OBJECTS, ALL_OBJECTS, ALL_OBJECTS
    WHERE ROWNUM <= 10

OK. Just kidding. Don’t actually do that. Or if you do, don’t blame me if your production system runs low on memory.

By using the awesome PostgreSQL GENERATE_SERIES() function

Incredibly, this isn’t part of the SQL standard. Nor is it available in most databases; the exception is PostgreSQL, which has the GENERATE_SERIES() function.
This is much like Scala’s range notation, (1 to 10):

    SELECT * FROM GENERATE_SERIES(1, 10)

See also this SQLFiddle

By using CONNECT BY

If you’re using Oracle, then there’s a really easy way to create such a table using the CONNECT BY clause, which is almost as convenient as PostgreSQL’s GENERATE_SERIES() function:

    SELECT LEVEL FROM DUAL CONNECT BY LEVEL <= 10

See also this SQLFiddle

By using a recursive CTE

Recursive common table expressions are cool, yet utterly unreadable. The equivalent of the above Oracle CONNECT BY clause when written using a recursive CTE would look like this:

    WITH "1 to 10"(V) AS (
      SELECT 1 FROM DUAL
      UNION ALL
      SELECT V + 1 FROM "1 to 10"
      WHERE V < 10
    )
    SELECT * FROM "1 to 10"

See also this SQLFiddle

By using Oracle’s MODEL clause

A decent “best of” comparison of how to do things in SQL wouldn’t be complete without at least one example using Oracle’s MODEL clause (see this awesome use-case for Oracle’s spreadsheet feature). Use this clause only to make your coworkers really angry when maintaining your SQL code. Bow before this beauty!

    SELECT V
    FROM (
      SELECT 1 V FROM DUAL
    ) T
    MODEL DIMENSION BY (ROWNUM R)
          MEASURES (V)
          RULES ITERATE (10) (
            V[ITERATION_NUMBER] = CV(R) + 1
          )
    ORDER BY 1

See also this SQLFiddle

Conclusion

There aren’t actually many nice solutions to do such a simple thing in SQL. Clearly, PostgreSQL’s GENERATE_SERIES() table function is the most beautiful solution. Oracle’s CONNECT BY clause comes close. For all other databases, some trickery has to be applied in one way or another. Unfortunately.
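Two of the techniques above, the recursive CTE and the self-join of a two-row table, can be tried without an Oracle instance. The sketch below runs them against an in-memory SQLite database from Python. It assumes SQLite's dialect, which needs no DUAL table and spells the recursive form WITH RECURSIVE, and the names one_to_ten, numbers and n are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Recursive CTE: start at 1 and keep adding 1 while the value is below 10.
recursive = conn.execute("""
    WITH RECURSIVE one_to_ten(v) AS (
        SELECT 1
        UNION ALL
        SELECT v + 1 FROM one_to_ten WHERE v < 10
    )
    SELECT v FROM one_to_ten ORDER BY v
""").fetchall()

# Self-join: four copies of a two-row table act as four binary digits,
# giving the values 1..16, which are then cut off at 10.
self_join = conn.execute("""
    WITH t(v) AS (SELECT 0 UNION ALL SELECT 1)
    SELECT n FROM (
        SELECT 1 + t1.v + 2 * t2.v + 4 * t3.v + 8 * t4.v AS n
        FROM t t1, t t2, t t3, t t4
    ) AS numbers
    WHERE n <= 10
    ORDER BY n
""").fetchall()

recursive_values = [row[0] for row in recursive]
self_join_values = [row[0] for row in self_join]
print(recursive_values)
print(self_join_values)
```

Both queries should produce the same ten values, which is a quick way to check an off-by-one in the termination condition.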
November 20, 2013
by Lukas Eder
· 30,270 Views
Neo4j: Modeling Hyper Edges in a Property Graph
At the Graph Database meetup in Antwerp, we discussed how you would model a hyper edge in a property graph like Neo4j, and I realized that I’d already done this in my football graph without noticing. A hyper edge is defined as follows:

A hyperedge is a connection between two or more vertices, or nodes, of a hypergraph. A hypergraph is a graph in which generalized edges (called hyperedges) may connect more than two nodes with discrete properties.

In Neo4j, an edge (or relationship) can only connect a node to itself or to another node; there’s no way of creating a relationship between more than two nodes. I had problems when trying to model the relationship between a player and a football match because I wanted to say that a player participated in a match and represented a specific team in that match. I started out with the following model:

Unfortunately, creating a direct relationship from the player to the match means that there’s no way to work out which team they played for. This information is useful because sometimes players transfer teams in the middle of a season and we want to analyze how they performed for each team. In a property graph, we need to introduce an extra node which links the match, player and team together:

Although we are forced to adopt this design, it actually helps us recognize an extra entity in our domain which wasn’t visible before: a player’s performance in a match. If we want to capture information about a player’s performance in a match, we can store it on this node. We can also easily aggregate a player’s stats by following the played relationship without needing to worry about the matches they played in.

The Neo4j manual has a few more examples of domain models containing hyper edges which are worth having a look at if you want to learn more.
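The idea of replacing a three-way relationship with an intermediate node can be sketched in plain Python. This is a minimal illustration rather than Neo4j code. The Performance class, the goals property, and the sample player, match and team names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Performance:
    """Intermediate node linking one player, one match and one team."""
    player: str
    match: str
    team: str
    goals: int  # hypothetical property stored on the performance node


# Hypothetical data: the player changes teams between the second and third match.
performances = [
    Performance("Player A", "Match 1", "Team X", 1),
    Performance("Player A", "Match 2", "Team X", 0),
    Performance("Player A", "Match 3", "Team Y", 2),
]


def goals_per_team(player, performances):
    """Aggregate a player's goals by team without looking at the matches."""
    totals = defaultdict(int)
    for performance in performances:
        if performance.player == player:
            totals[performance.team] += performance.goals
    return dict(totals)


print(goals_per_team("Player A", performances))
```

Because each performance carries its own team, a mid-season transfer needs no special handling in the aggregation.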
November 19, 2013
by Mark Needham
· 7,030 Views