Performance Resources

The Latest Performance Topics

Introducing BIRT iHub F-Type: Installing on Windows

Originally written by Virgil Dodson Actuate recently released a new, free BIRT server called the BIRT iHub F-Type. It incorporates all the functionality of BIRT iHub and is limited only by the capacity of output it can deliver on a daily basis. It is ideal for departmental and smaller scale applications. When BIRT F-Type reaches its maximum output capacity, additional capacity can be purchased on a subscription based model. Some of the key features of BIRT iHub F-Type that will help improve your BIRT content applications are: Interactivity – Allow end-users to modify and personalize reports, and answer questions themselves. Scheduling – Automate report generation based on rules and calendar, and then notify users. Sharing – Secure document management and distribution that allows users to only access content/data they are entitled to. Excel Emitter – Export as native Excel (not CSV) with formulas/pivot tables/worksheets/charts. Integration – JavaScript API to embed dynamic reports and visualizations in your web app. Downloading BIRT iHub F-Type Before we get started with the installation process, we need to download BIRT iHub F-Type. There are three downloads available: Windows, Linux, and a VMware image. This blog will cover the Windows installation. If you’re installing either of the other types, you’ll find links to guides for them at the bottom of this blog post. Once you click on your chosen download, you’ll be asked to register. If you’ve already registered, click the “Click to Login” button. If not, fill out the short registration form to get started. Next, read and accept the license agreement. Once you’ve done that, click the checkbox, and a link for the download will appear. Click that to start your download. At this point, you should also receive an email with an activation code. Be sure to check your spam folder if you don’t see it in your inbox. Installing BIRT iHub F-Type After the download is complete, launch the executable file named ActuateBIRTiHubFType.exe. A welcome message will appear. Press Next to continue. You must read and accept the license agreement on the next screen. Choose a destination folder for the installation. The default is C:\Actuate\BIRTiHub. If you have existing BIRT designs that depend on a JDBC database driver, you can optionally specify the folder where these drivers are located. Press Next to continue. Once the installation has finished, press Finish to launch the BIRT iHub F-Type. A desktop shortcut is also created that points to the iHub F-Type URL at http://localhost:8700/iportal. The first time you launch the BIRT iHub F-Type, you will need to activate it. Enter the activation code that you should have received in an e-mail. After entering a valid activation code, you should receive a message that the code was accepted and the BIRT iHub F-Type should start initializing services. Once that has completed, you will be presented with the login screen. The default user name is “administrator” and the password is blank for your first log in. You’ll be able to change this after you have logged in. Press “Log In” to continue. The first time you launch the BIRT iHub F-Type, you will be in tutorial mode which will help you get started loading your BIRT content and required resources. You can bypass the tutorial mode at any time by pressing the “Exit Tutorial” button at the top right. Select a BIRT design (*.rptdesign) file and press the Upload button. If you don’t have a BIRT design, you can download a sample from the link on the same page. The BIRT design file is automatically inspected and if there are any dependent files needed, like images, data files, BIRT report libraries, CSS styles, or other linked BIRT designs, you will be asked to upload those files as well. Once your BIRT design and dependent files are uploaded, your BIRT report will be displayed in the BIRT iHub F-Type and is now ready to explore. Thanks for reading. Now, it’s time to unleash the full power of BIRT into your application. If you have any questions or comments, please feel free to use the comments section below or visit the BIRT iHub F-Type forum. -Virgil For more blogs in the “Introducing BIRT iHub F-Type” series, see the list below: Installing iHub F-Type: Linux | VMWare Image

September 12, 2014

by Michael Singer

· 7,010 Views

Jar Hell Made Easy - Demystifying the Classpath

Some of the hardest problems a Java Developer will ever have to face are classpath errors: ClassNotFoundException, NoClassDefFoundError, Jar Hell, Xerces Hell and company. In this post we will go through the root causes of these problems, and see how a minimal tool (JHades) can help solving them quickly. We will see why Maven cannot (always) prevent classpath duplicates, and also: The only way to deal with Jar Hell Class loaders The Class loader chain Class loader priority: Parent First vs Parent Last Debugging server startup problems Making sense of Jar Hell with jHades Simple strategy for avoiding classpath problems The classpath gets fixed in Java 9? The only way to deal with Jar Hell Classpath problems can be time-consuming to debug, and tend to happen at the worst possible times and places: before releases, and often in environments where there is little to no access by the development team. They can also happen at the IDE level, and become a source of reduced productivity. We developers tend to find these problems early and often, and this is the usual response: Let's try to save us some hair and get to the bottom of this. These type of problems are hard to approach via trial and error. The only real way to solve them is to really understand what is going on, but where to start? It turns out that Jar Hell problems are simpler than what they look, and only a few concepts are needed to solve them. In the end, the common root causes for Jar Hell problems are: a Jar is missing there is one Jar too many a class is not visible where it should be But if it's that simple, then why are classpath problems so hard to debug? Jar Hell stack traces are incomplete One reason is that the stack traces for classpath problems have a lot of information missing that is needed to troubleshoot the problem. Take for example this stack trace: java.lang.IncompatibleClassChangeError: Class org.jhades.SomeServiceImpl does not implement the requested interfaceorg.jhades.SomeService org.jhades.TestServlet.doGet(TestServlet.java:19) It says that a class does not implement a certain interface. But if we look at the class source: publicclassSomeServiceImpl implementsSomeService { @Override publicvoiddoSomething() { System.out.println( "Call successful!"); } Well, the class clearly implements the missing interface! So what is going on then? The problem is that the stack trace is missing a lot of information that is critical to understanding the problem. The stack trace should have probably contained an error message such as this (we will learn what this means): The Class SomeServiceImpl of class loader /path/to/tomcat/lib does not implement the interface SomeService loaded from class loader Tomcat - WebApp - /path/to/tomcat/webapps/test This would be at least an indication of where to start: Someone new learning Java would at least know that there is this notion of class loader that is essential to understand what is going on It would make clear that one class involved was not being loaded from a WAR, but somehow from some directory on the server (SomeServiceImpl). What is a Class Loader? To start, a Class Loader is just a Java class, more exactly an instance of a class at runtime. It is NOT an inaccessible internal component of the JVM like for example the garbage collector. Take for example the WebAppClassLoader of Tomcat, here is it's javadoc. As you can see it's just a plain Java class, we can even write our own class loader if needed. Any subclass of ClassLoader will qualify as a class loader. The main responsibilities of a class loader is to known where class files are located, and then load classes on JVM demand. Everything is linked to a class loader Each object in the JVM is linked to it's Class via getClass(), and each class is linked to a class loader via getClassLoader(). This means that: Every object in the JVM is linked to a class loader! Let's see how this fact can be used to troubleshoot a classpath error scenario. How-To find where a class file really is Let's take an object and see where it's class file is located in the file system: System.out.println(service.getClass() .getClassLoader() .getResource("org/jhades/SomeServiceImpl.class")); This is the full path to the class file: jar:file:/Users/user1/.m2/repository/org/jhades/jar-2/1.0-SNAPSHOT/jar-2-1.0-SNAPSHOT.jar!/org/jhades/SomeServiceImpl.class As we can see the class loader is just a runtime component that knowns where in the file system to look for class files and how to load them. But what happens if the class loader cannot find a given class? The Class loader Chain By default in the JVM, if a class loader does not find a class, it will then ask it's parent class loader for that same class and so forth. This continues all the way up until the JVM bootstrap class loader (more on this later). This chain of class loaders is the class loader delegation chain. Class loader priority: Parent First vs Parent Last Some class loaders delegate requests immediately to the parent class loader, without searching first in their own known set of directories for the class file. A class loader operating on this mode is said to be in Parent First mode. If a class loader first looks for a class locally and only after queries the parent if the class is not found, then that class loader is said to be working in Parent Last mode. Do all applications have a class loader chain ? Even the most simple Hello World main method has 3 class loaders: The Application class loader, responsible for loading the application classes (parent first) The Extensions class loader, that loads jars from $JAVA_HOME/jre/lib/ext (parent first) The Bootstrap class loader, that loads any class shipped with the JDK such as java.lang.String (no parent class loader) What does the class loader chain of a WAR application look like? In the case of application servers like Tomcat or Websphere, the class loader chain is configured differently than a simple Hello World main method program. Take for example the case of the Tomcat class loader chain: Here we wee that each WAR runs in a WebAppClassLoader, that works in parent last mode (it can be set to parent first as well). The Common class loader loads libraries installed at the level of the server. What does the Servlet spec say about class loading? Only a small part of the class loader chain behavior is defined by the Servlet container specification: The WAR application runs on it's own application class loader, that might be shared with other applications or not The files in WEB-INF/classes take precedence over everything else After that, it's anyones guess! The rest is completely open for interpretation by container providers. Why isn't there a common approach for class loading across vendors? Usually open source containers like Tomcat or Jetty are configured by default to look for classes in the WAR first, and only then search in server class loaders. This allows for applications to use their own versions of libraries that override the ones available on the server. What about the big iron servers? Commercial products like Websphere will try to 'sell' you their own server provided libraries, that by default take precedence over the ones installed on the WAR. This is done assuming that if you bought the server you want also to use the JEE libraries and versions it provides, which is often NOT the case. This makes deploying to certain commercial products a huge hassle, as they behave differently then the Tomcat or Jetty that developers use to run applications in their workstation. We will see further on a solution for this. Common Problem: duplicate class versions At this moment you probably have a huge question: What if there are two jars inside a WAR that contain the exact same class? The answer is that the behavior is undetermined and only at runtime one of the two classes will be chosen. Which one gets chosen depends on the internal implementation of the class loader, there is no way to know upfront. But luckily most projects these days use Maven, and Maven solves this problem by ensuring only one version of a given jar is added to the WAR. So a Maven project is immune to this particular type of Jar Hell, right? Why Maven does not prevent classpath duplicates Unfortunately Maven cannot help in all Jar Hell situations. In fact, many Maven projects that don't use certain quality control plugins can have hundreds of duplicate class files on the classpath (I saw trunks with over 500 duplicates). There are several reasons for that: Library publishers occasionally change the artifact name of a jar: This happens due to re-branding or other reasons. Take for example the example of the JAXB jar. There is no way Maven can identify those artifacts as being the same jar! Some jars are published with and without dependencies: Some library providers provide a 'with dependencies' version of a jar, which includes other jars inside. If we have transitive dependencies with the two versions, we will end up with duplicates. Some classes are copied between jars: Some library creators, when faced with the need for a certain class will just grab it from another project and copy it to a new jar without changing the package name. Are all class files duplicates dangerous? If the duplicate class files exist inside the same class loader, and the two duplicate class files are exactly identical then it does not matter which one gets chosen first - this situation is not dangerous. If the two class files are inside the same class loader and they are not identical, then there is no way which one will be chosen at runtime - this is problematic and can manifest itself when deploying to different environments. If the class files are in two different class loaders, then they are never considered identical (see the class identity crisis section further on). How can WAR classpath duplicates be avoided? This problem can be avoided for example by using the Maven Enforcer Plugin, with the extra rule of Ban Duplicate Classes turned on. You can quickly check if your WAR is clean using the JHades WAR duplicate classes report as well. This tool has an option to filter 'harmless' duplicates (same class file size). But even a clean WAR might have deployment problems: Classes missing, classes taken from the server instead of the WAR and thus with the wrong version, class cast exceptions, etc. Debugging the classpath with JHades Classpath problems often show up when the application server is starting up, which is a particularly bad moment specially when deploying to an environment where there is limited access. JHades is a tool to help deal it with Jar Hell (disclaimer: I wrote it). It's a single Jar with no dependencies other than the JDK7 itself. This is an example of how to use it: newJHades() .printClassLoaders() .printClasspath() .overlappingJarsReport() .multipleClassVersionsReport() .findClassByName("org.jhades.SomeServiceImpl") This prints to the screen the class loader chain, jars, duplicate classes, etc. Debugging server startup problems JHades works works well in scenarios where the server does not start properly. A servlet listener is provided that allows to print classpath debugging information even before any other component of the application starts running. ClassCastException and the Class Identity Crisis When troubleshooting Jar Hell, beware of ClassCastExceptions. A class is identified in the JVM not only by it's fully qualified class name, but also by it's class loader. This is counterintuitive but in hindsight makes sense: We can create two different classes with the same package and name, ship them in two jars and put them in two different class loaders. One let's say extends ArrayList and the other is a Map. The classes are therefore completely different (despite the same name) and cannot be cast to each other! The runtime will throw a CCE to prevent this potential error case, because there is no guarantee that the classes are castable. Adding the class loader to the class identifier was the outcome of the Class Identity Crisis that occurred in earlier Java days. A Strategy for Avoiding Classpath Problems This is easier said then done, but the best way to avoid classpath related deployment problems is to run the production server in Parent Last mode. This way the class versions of the WAR take precedence over the ones on the server, and the same classes are used in production and in a developer workstation where it's likely that Tomcat, Jetty or other open source Parent Last server is being used. In certain servers like Websphere, this is not sufficient and you also have to provide special properties on the manifest file to explicitly turn off certain libraries like for example JAX-WS. Fixing the classpath in Java 9 In Java 9 the classpath gets completely revamped with the new Jigsaw modularity system. In Java 9 a jar can be declared as a module and it will run in it's own isolated class loader, that reads class files from other similar module class loaders in an OSGI sort of way. This will allow multiple versions of the same Jar to coexist in the same application if needed. Conclusions In the end, Jar Hell problems are not that low level or unapproachable as they might seem at first. It's all about zip files (jars) being present/ not being present in certain directories, how to find those directories, and how to debug the classpath in environments with limited access. By knowing a limited set of concepts such as Class Loaders, the Class Loader Chain and Parent First / Parent Last modes, these problems can be tackled effectively. External links This presentation Do you really get class loaders from Jevgeni Kabanov of ZeroTurnaround (JRebel company) is a great resource about Jar Hell and the different type of classpath related exceptions.

September 8, 2014

by Vasco Cavalheiro

· 55,126 Views · 7 Likes

Fibonacci Tutorial with Java 8 Examples: recursive and corecursive

Learn Fibonacci Series patterns and best practices with easy Java 8 source code examples in this outstanding tutorial by Pierre-Yves Saumont

September 5, 2014

by Pierre-Yves Saumont

· 49,825 Views · 6 Likes

Securing JBoss EAP 6 - Implementing SSL

Security is one of the most important features while running a JBoss server in a production environment. Implementing SSL and securing communications is a must do, to avoid malicious use. This blogs details the steps you could take to secure JBoss EAP 6 running in Domain mode. These are probably documented by RedHat but the documentation seems a bit scattered. The idea behind this blog is to put together everything in one place. In Order to enhance security in JBoss EAP 6, SSL/encryption can be implemented for the following Admin console access – enable https access for admin console Domain Controller – Host controller communication – Communication between the main domain controller and all the other host controllers should be secured. Jboss CLI – enable ssl for the command line interface The below example uses a single keystore being both the key and truststore and also uses CA signed certificates. You could use self-signed certificates and/or separated keystores and truststores if required. Create the keystores (certificates for each of the servers) keytool -genkeypair -alias testServer.prd -keyalg RSA -keysize 2048 -validity 730 -keystore testServer.prd.jks Generate a certificate signing request (CSR) for the Java keystore keytool -certreq -alias testServer.prd -keystore testServer.prd.jks -file testServer.prd.csr Get the CSR signed by the Certificate Authorities Import a root or intermediate CA certificate to the existing Java keystore keytool -import -trustcacerts -alias root -file rootCA.crt -keystore testServer.prd.jks Import the signed primary certificate to the existing Java keystore. Keytool -importcert -keystore testServer.prd.jks -trustcacerts -alias testServer.prd -file testServer.prd.crt Repeat steps 1-6 for each of the servers. In order to establish trust between the master and slave hosts, Import the signed certificates of all the (slave) servers that the Domain Controller must trust onto the Domain Controllers Keystore keytool -importcert -keystore testServer.prd.jks -trustcacerts -alias slaveServer.prd -file slaveServers.prd.crt repeat step for all slave hosts. Import the signed certificate of the Domain controller onto the slave hosts keytool -importcert -keystore slaveServer.prd.jks -trustcacerts -alias testServer.prd -file testServer.prd.crt repeat steps for all slave hosts This has be to done because (as per RedHat’s Documentation) There is a problem with this methodology when trying to configure one way SSL between the servers, because there the HC's and the DC (depending on what action is being performed) switch roles (client, server). Because of this one way SSL configuration will not work and it is recommended that if you need SSL between these two endpoints that you configure two way SSL Once this is done, we now have signed certificates loaded onto the java keystore. In Jboss EAP 6 , the http-interface which provides access to the admin console, by default uses the ManagementRealm to provide file based authentication. (mgmt.-users.properties).The next step is to modify the configurations in the host.xml, to make the ManagementRealm use the certificates we created above. The host.xml should be modified to look like: view source print? 01. 02. 03. 04. 05. 06. 07. 08. 09. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36. 37. 38. 39. 40. 41. 42. 43. On the Slave hosts, In addition to the above configuration, the following needs to be changed view source print? 1. 2. 3. " 4. 5. Once you make the above changes and restart the servers, you should be able to access the admin console via https. https://testServer.prd:9443/console Finally, in order to secure cli authentication Modify /opt/jboss/jboss-eap-6.1/bin/jboss-cli.xml for each server and add view source print? 01. 02. 03. testServer.prd 04. 05. /opt/jboss/jboss-eap-6.1/domain/configuration/testServer.prd.jks 06. 07. xxxx 08. 09. /opt/jboss/jboss-eap-6.1/domain/configuration/testServer.prd.jks 10. 11. xxxx 12. 13. true 14. 15.

August 28, 2014

by Arvind Anandam

· 11,479 Views

Setting up Java Applications to Communicate with MongoDB, Kerberos and SSL

By Alex Komyagin, Technical Services Engineer at MongoDB Setting up Kerberos authentication and SSL encryption in a MongoDB Java application is not as simple as other languages. In this post, I’m going to show you how to create a Kerberos and SSL enabled Java application that communicates with MongoDB. My original setup consists of the following: 1) KDC server: kdc.mongotest.com kerberos config file (/etc/krb5.conf): [logging] default = FILE:/var/log/krb5libs.log kdc = FILE:/var/log/krb5kdc.log admin_server = FILE:/var/log/kadmind.log [libdefaults] default_realm = MONGOTEST.COM dns_lookup_realm = false dns_lookup_kdc = false ticket_lifetime = 24h renew_lifetime = 7d forwardable = true [realms] MONGOTEST.COM = { kdc = kdc.mongotest.com admin_server = kdc.mongotest.com } [domain_realm] .mongotest.com = MONGOTEST.COM mongotest.com = MONGOTEST.COM KDC has the following principals: [email protected] - user principle (for java app) mongodb/[email protected] - service principle (for mongodb server) 2) MongoDB server: rhel64.mongotest.com MongoDB version: 2.6.0 MongoDB config file: dbpath= logpath= fork=true auth = true setParameter = authenticationMechanisms=GSSAPI sslOnNormalPorts = true sslPEMKeyFile = /etc/ssl/mongodb.pem This server also has the global environment variable $KRB5_KTNAME set to the keytab file exported from KDC. Application user is configured in the admin database like this: { "_id" : "[email protected]", "user" : "[email protected]", "db" : "$external", "credentials" : { "external" : true }, "roles" : [ { "role" : "readWrite", "db" : "test" } ] } Download the Java driver: wget http://central.maven.org/maven2/org/mongodb/mongo-java-driver/2.12.1/mongo-java-driver-2.12.1.jar Install java and jdk: sudo yum install java-1.7.0 sudo yum install java-1.7.0-devel Create a certificate store for Java and store the server certificate there, so that Java knows who it should trust: keytool -importcert -file mongodb.crt -alias mongoCert -keystore firstTrustStore (mongodb.crt is just a public certificate part of mongodb.pem) Copy kerberos config file to the application server: /etc/krb5.conf or ““C:\WINDOWS\krb5.ini“` (otherwise you’ll have to specify kdc and realm as Java runtime options) Use kinit to store the principal password on the application server: kinit [email protected] As an alternative to kinit, you can use JAAS to cache kerberos credentials. Compile and run the Java program javac -cp ../mongo-java-driver-2.12.1.jar SSLApp.java java -cp .:../mongo-java-driver-2.12.1.jar -Djavax.net.ssl.trustStore=firstTrustStore -Djavax.net.ssl.trustStorePassword=changeme -Djavax.security.auth.useSubjectCredsOnly=false SSLApp It is important to specify useSubjectCredsOnly=false, otherwise you’ll get the “No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)” exception from Java. As we discovered, this is not strictly necessary in all cases, but it is if you are relying on kinit to get the service ticket. The Java driver needs to construct MongoDB service principal name in order to request the Kerberos ticket. The service principal is constructed based on the server name you provide (unless you explicitly asked to canonicalize server name). For example, if I change rhel64.mongotest.com to the host IP address in the connection URI, I would be getting Kerberos exceptions No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)]. So be sure you specify the same server host name as you used in the Kerberos principal (). Adding -Dsun.security.krb5.debug=true to Java runtime options helps a lot in debugging kerberos auth issues. These steps should help simplify the process of connecting Java applications with SSL. Before deploying any application with MongoDB, be sure to read through our Security Checklist which outlines recommended security measures to protect your MongoDB installation. More information on configuring MongoDB Security can be found in the MongoDB Manual. For further questions, feel free to reach out to the MongoDB team through google-groups.

August 26, 2014

by Francesca Krihely

· 8,290 Views

Introducing BIRT iHub F-Type

Actuate recently released a new, free BIRT server called the BIRT iHub F-Type. It incorporates all the functionality of BIRT iHub and is limited only by the capacity of output it can deliver on a daily basis. It is ideal for departmental and smaller scale applications. When BIRT F-Type reaches its maximum output capacity, additional capacity can be purchased on a subscription based model. Some of the key features of BIRT iHub F-Type that will help improve your BIRT content applications are: Interactivity – Allow end-users to modify and personalize reports, and answer questions themselves. Scheduling – Automate report generation based on rules and calendar, and then notify users. Sharing – Secure document management and distribution that allows users to only access content/data they are entitled to. Excel Emitter – Export as native Excel (not CSV) with formulas/pivot tables/worksheets/charts. Integration – JavaScript API to embed dynamic reports and visualizations in your web app. Downloading BIRT iHub F-Type Before we get started with the installation process, we need to download BIRT iHub F-Type. There are three downloads available: Windows, Linux, and a VMware image. This blog will cover the Windows installation. If you’re installing either of the other types, you’ll find links to guides for them at the bottom of this blog post. Once you click on your chosen download, you’ll be asked to register. If you’ve already registered, click the “Click to Login” button. If not, fill out the short registration form to get started. Next, read and accept the license agreement. Once you’ve done that, click the checkbox, and a link for the download will appear. Click that to start your download. At this point, you should also receive an email with an activation code. Be sure to check your spam folder if you don’t see it in your inbox. Installing BIRT iHub F-Type After the download is complete, launch the executable file named ActuateBIRTiHubFType.exe. A welcome message will appear. Press Next to continue. You must read and accept the license agreement on the next screen. Choose a destination folder for the installation. The default is C:\Actuate\BIRTiHub. If you have existing BIRT designs that depend on a JDBC database driver, you can optionally specify the folder where these drivers are located. Press Next to continue. Once the installation has finished, press Finish to launch the BIRT iHub F-Type. A desktop shortcut is also created that points to the iHub F-Type URL at http://localhost:8700/iportal. The first time you launch the BIRT iHub F-Type, you will need to activate it. Enter the activation code that you should have received in an e-mail. After entering a valid activation code, you should receive a message that the code was accepted and the BIRT iHub F-Type should start initializing services. Once that has completed, you will be presented with the login screen. The default user name is “administrator” and the password is blank for your first log in. You’ll be able to change this after you have logged in. Press “Log In” to continue. The first time you launch the BIRT iHub F-Type, you will be in tutorial mode which will help you get started loading your BIRT content and required resources. You can bypass the tutorial mode at any time by pressing the “Exit Tutorial” button at the top right. Select a BIRT design (*.rptdesign) file and press the Upload button. If you don’t have a BIRT design, you can download a sample from the link on the same page. The BIRT design file is automatically inspected and if there are any dependent files needed, like images, data files, BIRT report libraries, CSS styles, or other linked BIRT designs, you will be asked to upload those files as well. Once your BIRT design and dependent files are uploaded, your BIRT report will be displayed in the BIRT iHub F-Type and is now ready to explore. Thanks for reading. Now, it’s time to unleash the full power of BIRT into your application. If you have any questions or comments, please feel free to use the comments section below or visit the BIRT iHub F-Type forum. -Virgil For more blogs in the “Introducing BIRT iHub F-Type” series, see the list below: Installing iHub F-Type: Linux | VMWare Image - See more at: http://blogs.actuate.com/introducing-birt-ihub-f-type-installing-on-windows/#sthash.QPJhv2gw.dpuf

August 6, 2014

by Michael Singer

· 1,988 Views

JBoss Data Grid: Installation and Development

In this blog, we will discuss one particular data grid platform from Redhat namely JBoss Data Grid (JDG). We will firstly cover how to access and install this data grid platform and then we will demonstrate how to develop and deploy a simple remote client/server data grid application which utilises the HotRod protocol. We will be using the latest release JDG 6.2 from Redhat in this article. Installation Overview To start using JDG, firstly log on to the redhat site https://access.redhat.com/home and download the software from the Downloads section of the site. We wish to download JDG 6.2 server by clicking on the appropriate links in the Downloads section. For future reference, it is also useful to download the quickstart and maven repository zip files. To install JDG, we simply unzip the JDG server package into an appropriate directory in your environment. JDG Overview In this section, we will provide a brief overview of the contents of the JDG installation package and the most notable configuration options available to users. Out of the box, users are provided with two runtime options either to run JDG in standalone or clustered mode. We can start JDG in either mode by invoking the stanadalone or clustered start up scripts in the / bin directory. To configure the JDG in either mode we need to configure the files standalone.xml and clustered.xml. In our case we will creating a distributed cache which will run on 3 node JDG cluster so we will be utilizing the clustered startup script. In order to set up and add new cache instances to JDG, we modify the infinispan subsystems in the appropriate xml configuration file above. We should also note the principal difference between the standalone and clustered configuration file is that in the clustered configuration file there is a JGroups subsystem configured element which allows for communication and messaging between configured cache instances running in a JDG cluster. Development Environment Setup and Configuration In this section, we will detail how to develop and configure a simple datagrid application which will be deployed to a 3 node JDG cluster. We will demonstrate how to configure and deploy a distributed cache in JDG and also show how to develop a HotRod Java client application which will be used to insert, update and display entries in the distributed cache. We will firstly discuss setting a new distributed cache on a 3 node JDG cluster. In this example, we will run our JDG cluster on a single machine by running each JDG instance on different ports. Firstly, we will create 3 instances of JDG by creating 3 directories (server1, server2, server3) on our host machine and unzipping each JDG installation into each directory. We will now configure each node in our cluster by copying and renaming the clustered.xml configuration file in the \server1\jboss-datagrid-6.2.0-server\standalone\configuration directory. We will name each of the cluster configuration files as "clustered1.xml", "clustered2.xml" and "clustered3.xml" for the JDG instances denoted by "server1", "server2" and "server3" respectively. We will now set up a new distributed cache on our JDG cluster by modifying the infinispan subsystem element in each clustered.xml file. We will demonstrate this for the node denoted "server1" here by modifying the file "clustered1.xml". The cache configuration shown here will be the same across all 3 nodes. To setup a new distributed cache named "directory-dist-cache", we configure the following elements in the file named "clustered1.xml" ......... ...... .............. ...... ...... /socket-binding-group> We will discuss the key elements and attributes relating to the configuration above. In the infinispan endpoint subsystem, we will configure hotrod clients to connect to the JDG server instance on socket 11222. The name of the cache container to host each of the cache instances will be held in the container named "clusteredcache". We have configured the infinispan core subsystem to the default cache container named "clusteredcacahe" whereby we will allow for jmx statistics to be collected relating the configured cache entries i.e statistics="true" We have created a new distributed cache named "directory-dist-cache" whereby there will be two copies of each cache entry held on two of the 3 cluster nodes. We have also set up an eviction policy whereby should there be more than 20 entries in our cache then cache entries will be removed using the LRU algorithm We should have configured nodes "server2" and "server3" to start up with a port offset of 100 and 200 respectively by configuring the socketing binding group element appropriately. Please view the socket bindings noted below. To set the socket binding element with a port offset of 100 on "server2", we configure "clustered2.xml" with the following entry: ...... ...... /socket-binding-group> To set the socket binding element with a port offset of 200 on "server3", we configure "clustered3.xml" with the following entry: ...... ...... /socket-binding-group> Before discussing the setup and configuration of our Hotrod client which will be used to interact with our JDG clustered HotRod server, we will start up each server instance to ensure our newly configured JDG distributed cache starts up correctly. Open up 3 Windows or Linux consoles and execute the following start up commands: Console 1: 1) Navigate to \server1\jboss-datagrid-6.2.0-server\bin 2) Execute this command to start the first instance of our JDG cluster denoted "server1": clustered -c=clustered1.xml -Djboss.node.name=server1 Console 2: 1) Navigate to \server2\jboss-datagrid-6.2.0-server\bin 2) Execute this command to start the second instance of our JDG cluster denoted "server2": clustered -c=clustered2.xml -Djboss.node.name=server2 Console 3: 1) Navigate to \server3\jboss-datagrid-6.2.0-server\bin 2) Execute this command to start the third instance of our JDG cluster denoted "server3": clustered -c=clustered3.xml -Djboss.node.name=server3 Providing all 3 JDG instances have started up correctly, you should see output in the console window whereby we can see there are 3 JDG instances in the JGroups view: HotRod Client Development Setup Now that the Hotrod server is up and running, we need to develop a Hotrod Java client which will interact with the clustered server application. The development environment consists of the following tools. 1) JDK Hotspot 1.7.0_45 2) IDE - Eclipse Kepler Build id: 20130919-0819 The HotRod client application is a simple application consisting of two Java classes. The application allows users to retrieve a reference to the distributed cache from the JDG server and then perform these actions: a) add new cinema objects. b) add and remove shows to each cinema object. c) print the list of all cinemas and shows stored in our distributed cache. The source code can be downloaded from github @ https://github.com/davewinters/JDG. We could use maven here to build and execute our application by configuring the maven settings.xml to point to the maven repository files we downloaded earlier and set up a maven project file (pom.xml) to build and execute the client application. In this article we will build our application using the Eclipse IDE and run the client application on the command line. To create a HotRod client application and execute the sample application, one should complete the following steps: 1) Create a new Java Project in Eclipse 2) Create a new package named uk.co.c2b2.jdg.hotrod and import the source code that has been downloaded from Github mentioned previously. 3) Now we need to configure the build path in Eclipse to contain the appropriate JDG client jar files which are required to compile the application. You should include all the client jar files in the project build path. These jar files are contained in the JDG installation zip file. For example on my machine these jar files are located in the directory: \server1\jboss-datagrid-6.2.0-server\client\hotrod\java 4. Providing the Eclipse build path has been configured appropriately, the application source should compile without issue. 5. We will need to execute the Hotrod application by opening the console window and executing the following command. Note the path specified here will differ depending on where the JDG client jar files and application class files are located in your environment: java -classpath ".;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\commons-pool-1.6-redhat-4.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-client-hotrod-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-commons-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-query-dsl-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-remote-query-client-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\jboss-logging-3.1.2.GA-redhat-1.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\jboss-marshalling-1.4.2.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\jboss-marshalling-river-1.4.2.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\protobuf-java-2.5.0.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\protostream-1.0.0.CR1-redhat-1.jar" uk/co/c2b2/jdg/hotrod/CinemaDirectory 6. The Hotrod client at runtime provides the end user with a number of different options to interact with the distributed cache as we can view from the console window below. Client Application Principal API Details We will not provide a detailed overview of the Hotrod application code however we will describe the principal API and code details briefly. In order to interact with the distributed cache on the JDG cluster using the Hotrod protocol, we will use the RemoteCacheManager Object which will allow us to retrieve a remote reference to the distributed cache. We have initialised a Properties object with the list of JDG instances and the associated with HotRod server port on each instance. We can add Cinema objects into the distributed cache using the RemoteCache.put() method. private RemoteCacheManager cacheManager; private RemoteCache cache; ..... Properties properties = new Properties(); properties.setProperty(ConfigurationProperties.SERVER_LIST, "127.0.0.1:11222;127.0.0.1:11322;127.0.0.1:11422"); cacheManager = new RemoteCacheManager(properties); cache = cacheManager.getCache("directory-dist-cache"); ..... cache.put(cinemaKey, cinemalist); In the webinar below, I describe in further detail how to set up a JDG cluster and how to develop and run the JDG application discussed above. For further details on JDG please visit: http://www.redhat.com/products/jbossenterprisemiddleware/data-grid/ Webinar: Introduction to JBoss Data Grid -- Installation, Configuration and Development In this webinar we will look at the basics of setting up JBoss Data Grid covering installation, configuration and development. We will look at practical examples of storing data, viewing the data in the cache and removing it. We will also take a look at the different clustered modes and what effect these have on the storage of your data:

July 25, 2014

by David Winters

· 16,109 Views

Option.fold() Considered Unreadable

We had a lengthy discussion recently during code review whether scala.Option.fold() is idiomatic and clever or maybe unreadable and tricky? Let's first describe what the problem is. Option.fold does two things: maps a function f overOption's value (if any) or returns an alternative alt if it's absent. Using simple pattern matching we can implement it as follows: val option: Option[T] = //... def alt: R = //... def f(in: T): R = //... val x: R = option match { case Some(v) => f(v) case None => alt } If you prefer one-liner, fold is actually a combination of map and getOrElse val x: R = option map f getOrElse alt Or, if you are a C programmer that still wants to write in C, but using Scala compiler: val x: R = if (option.isDefined) f(option.get) else alt Interestingly this is similar to how fold() is actually implemented, but that's an implementation detail. OK, all of the above can be replaced with single Option.fold(): val x: R = option.fold(alt)(f) Technically you can even use /: and \: operators (alt /: option) - but that would be simply masochistic. I have three problems with option.fold() idiom. First of all - it's anything but readable. We are folding (reducing) over Option - which doesn't really make much sense. Secondly it reverses the ordinary positive-then-negative-case flow by starting with failure (absence, alt) condition followed by presence block (f function; see also: Refactoring map-getOrElse to fold). Interestingly this method would work great for me if it was named mapOrElse: ** * Hypothetical in Option */ def mapOrElse[B](f: A => B, alt: => B): B = this map f getOrElse alt Actually there is already such method in Scalaz, called OptionW.cata. cata. Here is whatMartin Odersky has to say about it: "I personally find methods like cata that take two closures as arguments are often overdoing it. Do you really gain in readability over map + getOrElse? Think of a newcomer to your code[...]"While cata has some theoretical background, Option.fold just sounds like a random name collision that doesn't bring anything to the table, apart from confusion. I know what you'll say, that TraversableOnce has fold and we are sort-of doing the same thing. Why it's a random collision rather than extending the contract described inTraversableOnce? fold() method in Scala collections typically just delegates to one offoldLeft()/foldRight() (the one that works better for given data structure), thus it doesn't guarantee order and folding function has to be associative. But inOption.fold() the contract is different: folding function takes just one parameter rather than two. If you read my previous article about folds you know that reducing function always takes two parameters: current element and accumulated value (initial value during first iteration). But Option.fold() takes just one parameter: current Option value! This breaks the consistency, especially when realizing Option.foldLeft() andOption.foldRight() have correct contract (but it doesn't mean they are more readable). The only way to understand folding over option is to imagine Option as a sequence with0 or 1 elements. Then it sort of makes sense, right? No. def double(x: Int) = x * 2 Some(21).fold(-1)(double) //OK: 42 None.fold(-1)(double) //OK: -1 but: Some(21).toList.fold(-1)(double) : error: type mismatch; found : Int => Int required: (Int, Int) => Int Some(21).toList.fold(-1)(double) ^ If we treat Option[T] as a List[T], awkward Option.fold() breaks because it has different type than TraversableOnce.fold(). This is my biggest concern. I can't understand why folding wasn't defined in terms of the type system (trait?) and implemented strictly. As an example take a look at: Data.Foldable in Haskell (advanced) Data.Foldable typeclass describes various flavours of folding in Haskell. There are familiar foldl/foldr/foldl1/foldr1, in Scala namedfoldLeft/foldRight/reduceLeft/reduceRight accordingly. They have the same type as Scala and behave unsurprisingly with all types that you can fold over, including Maybe, lists, arrays, etc. There is also a function named fold, but it has a completely different meaning: class Foldable t where fold :: Monoid m => t m -> m While other folds are quite complex, this one barely takes a foldable container of ms (which have to be Monoids) and returns the same Monoid type. A quick recap: a type can be aMonoid if there exists a neutral value of that type and an operation that takes two values and produces just one. Applying that function with one of the arguments being neutral value yields the other argument. String ([Char]) is a good example with empty string being neutral value (mempty) and string concatenation being such operation (mappend). Notice that there are two different ways you can construct monoids for numbers: under addition with neutral value being 0 (x + 0 == 0 + x == x for any x) and under multiplication with neutral 1 (x * 1 == 1 * x == x for any x). Let's stick to strings. If I fold empty list of strings, I'll get an empty string. But when a list contains many elements, they are being concatenated: > fold ([] :: [String]) "" > fold [] :: String "" > fold ["foo", "bar"] "foobar" In the first example we have to explicitly say what is the type of empty list []. Otherwise Haskell compiler can't figure out what is the type of elements in a list, thus which monoid instance to choose. In second example we declare that whatever is returned from fold [], it should be a String. From that the compiler infers that [] actually must have a type of [String]. Last fold is the simplest: the program folds over elements in list and concatenates them because concatenation is the operation defined in Monoid Stringtypeclass instance. Back to options (or more precisely Maybe). Folding over Maybe monad having type parameter being Monoid (I can't believe I just said it) has an interesting interpretation: it either returns value inside Maybe or a default Monoid value: > fold (Just "abc") "abc" > fold Nothing :: String "" Just "abc" is same as Some("abc") in Scala. You can see here that if Maybe Stringis Nothing, neutral String monoid value is returned, that is an empty string. Summary Haskell shows that folding (also over Maybe) can be at least consistent. In ScalaOption.fold is unrelated to List.fold, confusing and unreadable. I advise avoiding it and staying with slightly more verbose map/getOrElse transformations or pattern matching. PS: Did I mention there is also Either.fold() (with even different contract) but noTry.fold()?

June 26, 2014

by Tomasz Nurkiewicz

· 9,624 Views

Anomaly Detection : A Survey

this post is summary of the “anomaly detection : a survey”. anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior. these non-conforming patterns are often referred to as anomalies, outliers, discordant observations, exceptions, aberrations, surprises, peculiarities or contaminants in different application domains. anomalies are patterns in data that do not conform to a well defined notion of normal behavior. interesting to analyze unwanted noise in the data also can be found in there. novelty detection which aims at detecting previously unobserved (emergent, novel) patterns in the data challenges for anomaly detection drawing the boundary between normal and anomalous behavior availability of labeled data noisy data type of anomaly anomalies can be classified into following three categories point anomalies - an individual data instance can be considered as anomalous with respect to the rest of data contextual anomalies - a data instance is anomalous in a specific context (but not otherwise), then it is termed as a contextual anomaly (also referred as conditional anomaly). each data instance is defined using following two sets of attributes contextual attributes. the contextual attributes are used to determine the context (or neighborhood) for that instance eg: in time- series data, time is a contextual attribute which determines the position of an instance on the entire sequence behavioral attributes. the behavioral attributes define the non-contextual characteristics of an instance eg: in a spatial data set describing the average rainfall of the entire world, the amount of rainfall at any location is a behavioral attribute to explain this we will look into "exchange rate history for converting united states dollar (usd) to sri lankan rupee (lkr)"[1] contextual anomaly t2 in a exchange rate time series. note that the exchange rate at time t1 is same as that at time t2 but occurs in a different context and hence is not considered as an anomaly 3. collective anomalies - a collection of related data instances is anomalous with respect to the entire data set data labels the labels associated with a data instance denote if that instance is normal or anomalous. depending labels availability, anomaly detection techniques can be operated in one of the following three modes supervised anomaly detection - techniques trained in supervised mode assume the availability of a training data set which has labeled instances for normal as well as anomaly class semi-supervised anomaly detection - techniques that operate in a semi-supervised mode, assume that the training data has labeled instances for only the normal class. since they do not require labels for the anomaly class unsupervised anomaly detection - techniques that operate in unsupervised mode do not require training data, and thus are most widely applicable. the techniques implicit assume that normal instances are far more frequent than anomalies in the test data. if this assumption is not true then such techniques suffer from high false alarm rate output of anomaly detection anomaly detection have two types of output techniques scores. scoring techniques assign an anomaly score to each instance in the test data depending on the degree to which that instance is considered an anomaly labels. techniques in this category assign a label (normal or anomalous) to each test instance applications of anomaly detection intrusion detection intrusion detection refers to detection of malicious activity. the key challenge for anomaly detection in this domain is the huge volume of data. thus, semi-supervised and unsupervised anomaly detection techniques are preferred in this domain.denning[3] classifies intrusion detection systems into host based and net- work based intrusion detection systems. host based intrusion detection systems - this deals with operating system call traces network intrusion detection systems - these systems deal with detecting intrusions in network data. the intrusions typically occur as anomalous patterns (point anomalies) though certain techniques model[4] the data in a sequential fashion and detect anomalous subsequences (collective anomalies). a challenge faced by anomaly detection techniques in this domain is that the nature of anomalies keeps changing over time as the intruders adapt their network attacks to evade the existing intrusion detection solutions. fraud detection fraud detection refers to detection of criminal activities occurring in commercial organizations such as banks, credit card companies, insurance agencies, cell phone companies, stock market, etc. the organizations are interested in immediate detection of such frauds to prevent economic losses. detection techniques used for credit card fraud and network intrusion detection as below. statistical profiling using histograms parametric statisti- cal modeling non-parametric sta- tistical modeling bayesian networks neural networks support vector ma- chines rule-based clustering based nearest neighbor based spectral information theoretic here are some domain in fraud detections credit card fraud detection mobile phone fraud detection insurance claim fraud detection insider trading detection medical and public health anomaly detection anomaly detection in the medical and public health domains typically work with pa- tient records. the data can have anomalies due to several reasons such as abnormal patient condition or instrumentation errors or recording errors. thus the anomaly detection is a very critical problem in this domain and requires high degree of accuracy. industrial damage detection such damages need to be detected early to prevent further escalation and losses. fault detection in mechanical units structural defect detection image processing anomaly detection techniques dealing with images are either interested in any changes in an image over time (motion detection) or in regions which appear ab- normal on the static image. this domain includes satellite imagery. anomaly detection in text data anomaly detection techniques in this domain primarily detect novel topics or events or news stories in a collection of documents or news articles. the anomalies are caused due to a new interesting event or an anomalous topic. sensor networks since the sensor data collected from various wireless sensors has several unique characteristics. references [1] http://themoneyconverter.com/usd/lkr.aspx [2] varun chandola, arindam banerjee, and vipin kumar. 2009. anomaly detection: a survey. acm comput. surv. 41, 3, article 15 (july 2009), 58 pages. doi=10.1145/1541880.1541882 http://doi.acm.org/10.1145/1541880.1541882 [3] denning, d. e. 1987. an intrusion detection model. ieee transactions of software engineer-ing 13, 2, 222–232. [4]gwadera, r., atallah, m. j., and szpankowski, w. 2004. detection of significant sets of episodes in event sequences. in proceedings of the fourth ieee international conference on data mining. ieee computer society, washington, dc, usa, 3–10.

June 16, 2014

by Madhuka Udantha

· 12,943 Views

Introducing Partitioned Collections for MongoDB Applications

TokuMX 1.5 is around the corner. The big feature will be something we discussed briefly when talking about replication changes in 1.4: partitioned collections. Before introducing the feature, I wanted to mention the following. Although TokuMX 1.5 is not available as of this writing, we would love to hear feedback on partitioned collections, which we think are wonderful for time-series data, as I describe below. If you are interested in trying out the feature, email [email protected] for a pre-release version of TokuMX 1.5. What is a partitioned collection? A partitioned collection is analogous to a partitioned table in relational databases. Oracle, MySQL, SQL Server, and Postgres all support partitioned tables. We are happy to bring this functionality to TokuMX. So, if the remainder of this blog is unclear, and you have friends in the office who are familiar with relational databases, you may want to ask them for more information :). Nevertheless, a partitioned collection is a collection that underneath the covers is broken into (or partitioned into) several individual collections, based on ranges of a “partition key”. From the application developer’s point of view, the collection is just another collection. Queries, inserts, updates, and deletes just work with no syntactical changes. Secondary indexes and replication work as well. But underneath the covers, the data will be broken into several collections, with each collection responsible for all data for a range of the partition key. If you are running TokuMX 1.4, a simple example is the oplog, which is a partitioned collection. Any normal query works just fine on the oplog. However, if you look in your data directory, you will see several .tokumx files named “local_oplog_rs_p…”. These files are the individual partitions that break up the data. Each partition stores a range of _id fields in the oplog. Why should I bother using a partitioned collection? This will be its own post with longer examples, but here is a summary. Partitioned collections have two big advantages: Large chunks of data can be deleted very efficiently by dropping partitions. The cost is that of performing an “rm” on some files in the filesystem. This is really fast and efficient. Queries that include the partition key may be isolated to individual partitions, and therefore run faster. This is similar to “query isolation” for shard keys. So, one scenario you may want a partitioned collection for is where the oldest data gets dropped periodically, and many queries benefit from a time based key. That will be a good fit. In short: time series data. If you have a time-series application where you want to keep a rolling period of data (e.g. the last 6 months worth), then using a partitioned collection will be great, and is preferable to using a TTL index or a capped collection. In a future blog post, I will expand on this. How do I use a partitioned collection in TokuMX 1.5? Basically, just like a normal collection, except with some commands added to create a partitioned collection, add partitions, and drop partitions. Below, I explain the shell commands added for this functionality. Our documentation contains the full commands so that they may be called by any driver’s runCommand method. Ok, so how do I create a partitioned collection? The first thing to consider is what your partition key should be. That is, what key do you want to use ranges of to partition your data? This key has similarities with a shard key. It should be a key that can be used to isolate partitions, the way a shard key is used to isolate shards (as explained here). Also, it should be a key that contains a range of data you would like to delete all at once. With time series data, that key will likely be a timestamp. In TokuMX, the partition key is always the primary key. To create a partitioned collection, “foo”, with a timestamp field, “ts”, used for the partitioning run the following: > db.createCollection("foo", { partitioned : 1 , primaryKey : { ts : 1 , _id : 1 } }) { "ok" : 1 } Note that in TokuMX, the primary key must have the _id field appended to it to ensure uniqueness. As a side note, we do not support hash based partitioning, only range based partitioning. Adding partitions? In TokuMX, partitions can only be appended to the end. Individual partitions cannot be split. So, say we have a collection that partitions on the _id field, where all _id’s happen to be integers. Suppose we have three partitions with the following ranges: _id <= 0 0 < _id <= 1000 _id > 1000 With this collection we cannot create a partition with the range 500 < _id <= 1000, because that would split the second partition. All we can do is add a new partition to the end, and “cap” the current last partition with a new maximum value. This new maximum value must be greater than or equal to the primary key (or in this case, _id) of the last partition’s last document. So, if the last partition’s last document has an _id of 2500, we can only partitions that create a range whose maximum is at least 2500. There are two ways to add a partition. The first method peeks at the last document in the current last partition, caps the partition with the primary key of that last document, and creates a new partition. To do so, one does: > db.foo.addPartition() { "ok" : 1 } In the above example, the partitioned collection would now have partitions with the following ranges: _id <= 0 0 < _id <= 1000 1000 < _id <= 2500 _id > 2500 Alternatively, we can specify what the new maximum of the existing last partition may be, provided it is greater than the last document in last partition (which in this example is 2500). To do so, we simply pass in the new maximum as a parameter: > db.foo.addPartition({ _id : 3000 }); { "ok" : 1 } This would make the collection have partitions with the following ranges: _id <= 0 0 < _id <= 1000 1000 < _id <= 3000 _id > 3000 Dropping partitions? Dropping partitions is simple. First, see what the partitions are with the following shell command: > db.foo.getPartitionInfo() { "numPartitions" : NumberLong(4), "partitions" : [ { "_id" : NumberLong(0), "max" : { "_id" : 0 }, "createTime" : ISODate("2014-05-29T01:50:15.839Z") }, { "_id" : NumberLong(1), "max" : { "_id" : 1000 }, "createTime" : ISODate("2014-05-29T01:50:27.049Z") }, { "_id" : NumberLong(2), "max" : { "_id" : 2500 }, "createTime" : ISODate("2014-05-29T01:50:30.549Z") }, { "_id" : NumberLong(3), "max" : { "_id" : { "$maxKey" : 1 } }, "createTime" : ISODate("2014-05-29T01:50:35.903Z") } ], "ok" : 1 } This lists each partition, what the maximum value that each partition may hold (thus defining the range of the partition), and the id of the partition (in the _id field). So, in the example we used for adding partitions, we have four partitions with _ids 0 through 3. To drop a partition, we run the following command and pass the _id of the partition we want to drop. To drop partition 0, we run: > db.foo.dropPartition(0) { "ok" : 1 } Looking at the list of partitions after this operation, we see the partition is dropped: > db.foo.getPartitionInfo() { "numPartitions" : NumberLong(3), "partitions" : [ { "_id" : NumberLong(1), "max" : { "_id" : 1000 }, "createTime" : ISODate("2014-05-29T01:50:27.049Z") }, { "_id" : NumberLong(2), "max" : { "_id" : 2500 }, "createTime" : ISODate("2014-05-29T01:50:30.549Z") }, { "_id" : NumberLong(3), "max" : { "_id" : { "$maxKey" : 1 } }, "createTime" : ISODate("2014-05-29T01:50:35.903Z") } ], "ok" : 1 } This covers how to use partitioned collections. We hope users in the MongoDB ecosystem find this feature as useful as relational database users do. In the comments section below, feel free to leave questions and/or feedback.

June 4, 2014

by Zardosht Kasheff

· 12,509 Views

Open Session In View Design Tradeoffs

The Open Session in View (OSIV) pattern gives rise to different opinions in the Java development community. Let's go over OSIV and some of the pros and cons of this pattern. The problem The problem that OSIV solves is a mismatch between the Hibernate concept of session and it's lifecycle and the way that many server-side view technologies work. In a typical Java frontend application the service layer starts by querying some of the data needed to build the view. The remaining data needed can be lazy-loaded later, with the condition that the Hibernate session remains open - and there lies the problem. Between the moment that the service layer method finishes it's execution and the moment that the view is rendered, Hibernate has already committed the transaction and closed the session. When the view tries to lazy load the extra data that it needs, if finds the Hibernate session closed, causing a LazyInitializationException. The OSIV solution OSIV tackles this problem by ensuring that the Hibernate session is kept open all the way up to the rendering of the view - hence the name of the pattern. Because the session is kept open, no more LazyInitializationExceptions occur. The session or entity manager is kept open by means of a filter that is added to the request processing chain. In the case of JPA the OpenEntityManagerInViewFilter will create an entity manager at the beginning of the request, and then bind it to the request thread. The service layer will then be executed and the business transaction committed or rolled back, but the transaction manager will not remove the entity manager from the thread after the commit. When the view rendering starts, the transaction manager will then check if there is already an entity manager binded to the thread, and if so use it instead of creating a new one. After the request is processed, the filter will then unbind the entity manager from the thread. The end result is that the same entity manager used to commit the business transaction was kept around in the request thread, allowing the view rendering code to lazy load the needed data. Going back to the original problem Let's step back a moment and go back to the initial problem: the LazyInitializationException. Is this exception really a problem? This exception can also be seen as a warning sign of a wrongly written query in the service layer. When building a view and it's backing services, the developer knows upfront what data is needed, and can make sure that the needed data is loaded before the rendering starts. Several relation types such as one-to-many use lazy-loading by default, but that default setting can be overridden if needed at query time using the following syntax: select p FROM Person p left join fetch p.invoices This means that the lazy loading can be turned off on a case by case basis depending on the data needed by the view. OSIV in projects I've worked In projects I have worked that used OSIV, we could see via query logging that the database was getting hit with a high number of SQL queries, sometimes to the point that developers had to turn off the Hibernate SQL logging. The performance of these application was impacted, but it was kept manageable using second-level caches, and due to the fact that these where intranet-based applications with a limited number of users. Pros of OSIV The main advantage of OSIV is that it makes working with ORM and the database more transparent: Less queries need to be manually written Less awareness is required about the Hibernate session and how to solve LazyInitializationExceptions. Cons of OSIV OSIV seems to be easy to misuse and can accidentally introduce N+1 performance problems in the application. On projects I've worked OSIV did not work out well in the long-term. The alternative of writing custom queries that eager fetch data depending on the use case is manageable and turned out well in other projects I've worked. Alternatives to OSIV Besides the application-level solution of writing custom queries to pre-fetch the needed data, there are other framework-level aproaches to OSIV. The Seam Framework was built by some of the same developers as Hibernate , and solves the problem by introducing the notion of conversation. Can you let me know in the comments bellow your thoughts and experiences with OSIV, thanks for reading.

April 30, 2014

by Vasco Cavalheiro

· 19,197 Views · 3 Likes

Circuit Breaker Pattern in Apache Camel

Camel is very often used in distributed environments for accessing remote resources. Remote services may fail for various reasons and periods. For services that are temporarily unavailable and recoverable after short period of time, a retry strategy may help. But some services can fail or hang for longer period of time making the calling application unresponsive and slow. A good strategy to prevent from cascading failures and exhaustion of critical resources is the Circuit Breaker pattern described by Michael Nygard in the Release It! book. Circuit Breaker is a stateful pattern that wraps the failure-prone resource and monitors for errors. Initially the Circuit Breaker is in closed state and passes all calls to the wrapped resource. When the failures reaches a certain threshold, the circuit moves to open state where it returns error to the caller without actually calling the wrapped resource. This prevents from overloading the already failing resource. While at this state, we need a mechanism to detect whether the failures are over and start calling the protected resource. This is where the third state called half-open comes into play. This state is reached after a certain time following the last failure. At this state, the calls are passed through to the protected resource, but the result of the call is important. If the call is successful, it is assumed that the protected resource has recovered and the circuit is moved into closed state, and if the call fails, the timeout is reset, and the circuit is moved back to open state where all calls are rejected. Here is the state diagram of Circuit Breaker from Martin Fowler's post: How Circuit Breaker is implemented in Camel? Circuit Breaker is available in the latest snapshot version of Camel as a Load balancer policy. Camel Load Balancer already has policies for Round Robin, Random, Failover, etc. and now also CircuiBreaker policy. Here is an example load balancer that uses Circuit Breaker policy with threshold of 2 errors and halfOpenAfter timeout of 1 second. Notice also that this policy applies only to errors caused by MyCustomException. new RouteBuilder() { public void configure() { from("direct:start").loadBalance() .circuitBreaker(2, 1000L, MyCustomException.class) .to("mock:result"); } }; And here is the same example using Spring XML DSL: MyCustomException

April 16, 2014

by Bilgin Ibryam

· 18,443 Views · 1 Like

Be a Lazy But Productive Android Developer, Part 5: Image Loading Library

Welcome to part 5 of “Be a lazy but a productive android developer” series. If you are a lazy Android developer and looking for image loading library, which could help you to load image(s) asynchronously without writing a logic for downloading and caching images then this article is for you. This series so far: Part 1: We looked at RoboGuice, a dependency injection library by which we can reduce the boiler plate code, save time and there by achieve productivity during Android app development. Part 2: We saw and explored about Genymotion, which is a rocket speed emulator and super-fast emulator as compared to native emulator. And we can use Genymotion while developing apps and can quickly test apps and there by can achieve productivity. Part 3: We understood and explored about JSON Parsing libraries (GSON and Jackson), using which we can increase app performance, we can decrease boilerplate code and there by can optimize productivity. Part 4: We talked about Card UI and explored card library, also created a basic card and simple card list demo. In this part In this part, we are going to talk about some image libraries using which we can load image(s) asynchronously, can cache images and also can download images into the local storage. Required features for loading images Almost every android app has a need to load remote images. While loading remote images, we have to take care of below things: Image loading process must be done in background (i.e. asynchronously) to avoid blocking UI main thread. Image recycling image should be done. Image should be displayed once its loaded successfully. Images should be cached in local memory for the later use. If remote image gets failed (due to network connection or bad url or any other reasons) to load then it should be managed perfectly for avoiding duplicate requests to load the same again, instead it should load if and only if net connection is available. Memory management should be done efficiently. In short, we have to write a code to manage each and every aspects of image loading but there are some awesome libraries available, using which we can load/download image asynchronously. We just have to call the load image method and success/failure callbacks. Asynchronous image loading Consider a case where we are having 50 images and 50 titles and we try to load all the images/text into the listview, it won’t display anything until all the images get downloaded. Here Asynchronous image loading process comes in picture. Asynchronous image loading is nothing but a loading process which happens in background so that it doesn’t block main UI thread and let user to play with other loaded data on the screen. Images will be getting displayed as and when it gets downloaded from background threads. Asynchronous image loading libraries Nostra’s Universal Image loader – https://github.com/nostra13/Android-Universal-Image-Loader Picasso – http://square.github.io/picasso/ UrlImageViewHelper by Koush Volley - By Android team members @ Google Novoda’s Image loader – https://github.com/novoda/ImageLoader Let’s have a look at examples using Picasso and Universal Image loader libraries. Example 1: Nostra’s Universal Image loader Step 1: Initialize ImageLoader configuration ? public class MyApplication extends Application{ @Override public void onCreate() { // TODO Auto-generated method stub super.onCreate(); // Create global configuration and initialize ImageLoader with this configuration ImageLoaderConfiguration config = new ImageLoaderConfiguration.Builder(getApplicationContext()).build(); ImageLoader.getInstance().init(config); } } Step 2: Declare application class inside Application tag in AndroidManifest.xml file ? Step 3: Load image and display into ImageView ? ImageLoader.getInstance().displayImage(objVideo.getThumb(), holder.imgVideo); Now, Universal Image loader also provides a functionality to implement success/failure callback to check whether image loading is failed or successful. ? ImageLoader.getInstance().displayImage(photoUrl, imgView, new ImageLoadingListener() { @Override public void onLoadingStarted(String arg0, View arg1) { // TODO Auto-generated method stub findViewById(R.id.EL3002).setVisibility(View.VISIBLE); } @Override public void onLoadingFailed(String arg0, View arg1, FailReason arg2) { // TODO Auto-generated method stub findViewById(R.id.EL3002).setVisibility(View.GONE); } @Override public void onLoadingComplete(String arg0, View arg1, Bitmap arg2) { // TODO Auto-generated method stub findViewById(R.id.EL3002).setVisibility(View.GONE); } @Override public void onLoadingCancelled(String arg0, View arg1) { // TODO Auto-generated method stub findViewById(R.id.EL3002).setVisibility(View.GONE); } }); Example 2: Picasso Image loading straight way: ? Picasso.with(context).load("http://postimg.org/image/wjidfl5pd/").into(imageView); Image re-sizing: ? Picasso.with(context) .load(imageUrl) .resize(100, 100) .centerCrop() .into(imageView) Example 3: UrlImageViewHelper library It’s an android library that sets an ImageView’s contents from a url, manages image downloading, caching, and makes your coffee too. UrlImageViewHelper will automatically download and manage all the web images and ImageViews. Duplicate urls will not be loaded into memory twice. Bitmap memory is managed by using a weak reference hash table, so as soon as the image is no longer used by you, it will be garbage collected automatically. Image loading straight way: ? UrlImageViewHelper.setUrlDrawable(imgView, "http://yourwebsite.com/image.png"); Placeholder image when image is being downloaded: ? UrlImageViewHelper.setUrlDrawable(imgView, "http://yourwebsite.com/image.png", R.drawable.loadingPlaceHolder); Cache images for a minute only: ? UrlImageViewHelper.setUrlDrawable(imgView, "http://yourwebsite.com/image.png", null, 60000); Example 4: Volley library Yes Volley is a library developed and being managed by some android team members at Google, it was announced by Ficus Kirkpatrick during the last I/O. I wrote an article about Volley library 10 months back , read it and give it a try if you haven’t used it yet. Let’s look at an example of image loading using Volley. Step 1: Take a NetworkImageView inside your xml layout. ? Step 2: Define a ImageCache class Yes you are reading title perfectly, we have to define an ImageCache class for initializing ImageLoader object. ? public class BitmapLruCache extends LruCache implements ImageLoader.ImageCache { public BitmapLruCache() { this(getDefaultLruCacheSize()); } public BitmapLruCache(int sizeInKiloBytes) { super(sizeInKiloBytes); } @Override protected int sizeOf(String key, Bitmap value) { return value.getRowBytes() * value.getHeight() / 1024; } @Override public Bitmap getBitmap(String url) { return get(url); } @Override public void putBitmap(String url, Bitmap bitmap) { put(url, bitmap); } public static int getDefaultLruCacheSize() { final int maxMemory = (int) (Runtime.getRuntime().maxMemory() / 1024); final int cacheSize = maxMemory / 8; return cacheSize; } } Step 3: Create an ImageLoader object and load image Create an ImageLoader object and initialize it with ImageCache object and RequestQueue object. ? ImageLoader.ImageCache imageCache = new BitmapLruCache(); ImageLoader imageLoader = new ImageLoader(Volley.newRequestQueue(context), imageCache); Step 4: Load an image into ImageView ? NetworkImageView imgAvatar = (NetworkImageView) findViewById(R.id.imgDemo); imageView.setImageUrl(url, imageLoader); Which library to use? Can you decide which library you would use? Let us know which and what are the reasons? Selection of the library is always depends on the requirement. Let’s look at the few fact points about each library so that you would able to compare exactly and can take decision. Picasso: It’s just a one liner code to load image using Picasso. No need to initialize ImageLoader and to prepare a singleton instance of image loader. Picasso allows you to specify exact target image size. It’s useful when you have memory pressure or performance issues, you can trade off some image quality for speed. Picasso doesn’t provide a way to prepare and store thumbnails of local images. Sometimes you need to check image loading process is in which state, loading, finished execution, failed or cancelled image loading. Surprisingly It doesn’t provide a callback functionality to check any state. “fetch()” dose not pass back anything. “get()” is for synchronously read, and “load()” is for asynchronously draw a view. Universal Image loader (UIL): It’s the most popular image loading library out there. Actually, it’s based on the Fedor Vlasov’s project which was again probably a very first complete solution and also a most voted answer (for the image loading solution) on Stackoverflow. UIL library is better in documentation and even there’s a demo example which highlights almost all the features. UIL provides an easy way to download image. UIL uses builders for customization. Almost everything can be configured. UIL doesn’t not provide a way to specify image size directly you want to load into a view. It uses some rules based on the size of the view. Indirectly you can do it by mentioning ImageSize argument in the source code and bypass the view size checking. It’s not as flexible as Picasso. Volley: It’s officially by Android dev team, Google but still it’s not documented. It’s just not an image loading library only but an asynchronous networking library Developer has to define ImageCache class their self and has to initialize ImageLoader object with RequestQueue and ImageCache objects. So now I am sure now you can be able to compare libraries. Choosing library is a bit difficult talk because it always depends on the requirement and type of projects. If the project is large then you should go for Picasso or Universal Image loader. If the project is small then you can consider to use Volley librar, because Volley isn’t an image loading library only but it tries to solve a more generic solution.). I suggest you to start with Picasso. If you want more control and customization, go for UIL. Read more: http://blog.bignerdranch.com/3177-solving-the-android-image-loading-problem-volley-vs-picasso/ http://stackoverflow.com/questions/19995007/local-image-caching-solution-for-android-square-picasso-vs-universal-image-load https://plus.google.com/103583939320326217147/posts/bfAFC5YZ3mq Hope you liked this part of “Lazy android developer: Be productive” series. Till the next part, keep exploring image loading libraries mentioned above and enjoy!

April 11, 2014

by Paresh Mayani

· 64,026 Views · 2 Likes

Measuring Code Coverage by Protractor End-to-End Tests

Was just setting up new JavaScript project based on Grunt. I scaffolded the project template by Yeoman with usage of angular-fullstack generator. I decided to try MEAN stack without MongoDB for my new project (DB isn't needed). Next step was integrating Require.JS and configuring measurement of code coverage on client and server by Instanbul. When was this all done I was wondering if it is possible to measure code coverage by Protractor end-to-end testing. After quick search I found that Ryan Bridges recently released grunt-protractor-coverage plugin. Interesting coincidence. So I decided to try it and can confirm that it's working fine with mentioned stack. Configuration was smooth and Ryan fixed small issue very promptly. It's based on grunt-protractor-runner plugin. I created separate Grunt configuration file just for this purpose not to mess around with normal build. I had also problems to run 'makeReport' task of grunt-istanbul plugin for two different directories (Mocha server side code coverage measurement is using same task). So here is the Grunt flow. First we need to copy non JavaScript files into target directory. It needs to be in separate directory because JavaScript files will need to be instrumented. copy: { coverageE2E: { files: [{ expand: true, dot: true, cwd: '', dest: '/app', src: [ '*.{ico,png,txt}', '.htaccess', 'bower_components/**/*', 'images/**/*', 'fonts/**/*', 'views/**/*', 'styles/**/*', ] }] }, }, Next step is instrumentation of the code. It is needed for gathering coverage stats. Each line is decorated by special instructions that helps during measurement. Pay attention to fact that we are instrumenting server and client side code. Instrumented code is placed into target directory represented by placeholder . instrument: { files: ['server/**/*.js', 'app/scripts/**/*.js'], options: { lazy: true, basePath: '/' } }, Next we start Express from target directory. express: { options: { port: process.env.PORT || 9000 }, coverageE2E: { options: { script: '/lib/server.js', debug: true } }, }, And the protractor_coverage task of grunt-protractor-coverage plugin. Configuration should be the same as for grunt-protractor-runner plugin. protractor_coverage: { options: { configFile: 'test/protractor/protractorConf.js', // Default config file keepAlive: true, // If false, the grunt process stops when the test fails. noColor: false, // If true, protractor will not use colors in its output. coverageDir: '', args: {} }, chrome: { options: { args: { baseUrl: 'http://localhost:3000/', // Arguments passed to the command 'browser': 'chrome' } } }, }, Last step is generation of coverage report. makeReport: { src: '/*.json', options: { type: 'html', dir: '/reports', print: 'detail' } }, Finally, this is grunt task gathering all steps. grunt.registerTask('default', [ 'clean:coverageE2E', 'copy:coverageE2E', 'instrument', 'express:coverageE2E', 'protractor_coverage:chrome', 'makeReport' ]); EDIT: Notice that following Github link was changed to branch, because project structure was significantly changed: Source code for this project can be found on GitHub. To run this Grunt configuration file: grunt --gruntfile Gruntfile-e2e.js For running end to end Protractor test you have to have webdriver-manager running. See Protractor documentation.

April 9, 2014

by Lubos Krnac

· 17,738 Views · 1 Like

How to Use NodeManager to Control WebLogic Servers

In my previous post, you have seen how we can start a WebLogic admin and multiple managed servers. One downside with that instruction is that those processes will start in foreground and the STDOUT are printed on terminal. If you intended to run these severs as background services, you might want to try the WebLogic node manager wlscontrol.sh tool. I will show you how you can get Node Manager started here. The easiest way is still to create the domain directory with the admin server running temporary and then create all your servers through the /console application as described in last post. Once you have these created, then you may shut down all these processes and start it with Node Manager. 1. cd $WL_HOME/server/bin && startNodeManager.sh & 3. $WL_HOME/common/bin/wlscontrol.sh -d mydomain -r $HOME/domains/mydomain -c -f startWebLogic.sh -s myserver START 4. $WL_HOME/common/bin/wlscontrol.sh -d mydomain -r $HOME/domains/mydomain -c -f startManagedWebLogic.sh -s appserver1 START The first step above is to start and run your Node Manager. It is recommended you run this as full daemon service so even OS reboot can restart itself. But for this demo purpose, you can just run it and send to background. Using the Node Manager we can then start the admin in step 2, and then to start the managed server on step 3. The NodeManager can start not only just the WebLogic server for you, but it can also monitor them and automatically restart them if they were terminated for any reasons. If you want to shutdown the server manually, you may use this command using Node Manager as well: $WL_HOME/common/bin/wlscontrol.sh -d mydomain -s appserver1 KILL The Node Manager can also be used to start servers remotely through SSH on multiple machines. Using this tool effectively can help managing your servers across your network. You may read more details here: http://docs.oracle.com/cd/E23943_01/web.1111/e13740/toc.htm TIPS1: If there is problem when starting server, you may wnat to look into the log files. One log file is the/servers//logs/.out of the server you trying to start. Or you can look into the Node Manager log itself at $WL_HOME/common/nodemanager/nodemanager.log TIPS2: You add startup JVM arguments to each server starting with Node Manager. You need to create a file under /servers//data/nodemanager/startup.properties and add this key value pair:Arguments = -Dmyapp=/foo/bar TIPS3: If you want to explore Windows version of NodeManager, you may want to start NodeManager without native library to save yourself some trouble. Try adding NativeVersionEnabled=false to$WL_HOME/common/nodemanager/nodemanager.properties file.

March 24, 2014

by Zemian Deng

· 14,293 Views

Time Series Feature Design: The Consensus has dRafted a Decision

So, after reaching the conclusion that replication is going to be hard, I went back to the office and discussed those challenges and was in general pretty annoyed by it. Then Michael made a really interesting suggestion. Why not put it on RAFT? And once he explained what he meant, I really couldn’t hold my excitement. We now have a major feature for 4.0. But before I get excited about that (we’ll only be able to actually start working on that in a few months, anyway), let us talk about what the actual suggestion was. Raft is a consensus algorithm. It allows a distributed set of computers to arrive into a mutually agreed upon set of sequential log records. Hm… I wonder where else we can find sequential log records, and yes, I am looking at you Voron.Journal. The basic idea is that we can take the concept of log shipping, but instead of having a single master/slave relationship, we change things so we can put Raft in the middle. When committing a transaction, we’ll hold off committing the transaction until we have a Raft consensus that it should be committed. The advantage here is that we won’t be constrained any longer by the master/slave issue. If there is a server down, we can still process requests (maybe need to elect a new cluster leader, but that is about it). That means that from an architectural standpoint, we’ll have the ability to process write requests for any quorum (N/2+1). That is a pretty standard requirement for distributed databases, so that is perfectly fine. That is a pretty awesome thing to have, to be honest, and more importantly, this is happening at the low level storage layer. That means that we can apply this behavior not just to a single database solution, but to many database solutions. I’m pretty excited about this.

March 19, 2014

by Oren Eini

· 2,186 Views

How to "Backcast" a Time Series in R

Sometimes it is useful to “backcast” a time series — that is, forecast in reverse time. Although there are no in-built R functions to do this, it is very easy to implement. Suppose x is our time series and we want to backcast for periods. Here is some code that should work for most univariate time series. The example is non-seasonal, but the code will also work with seasonal data. library(forecast) x <- WWWusage h <- 20 f <- frequency(x) # Reverse time revx <- ts(rev(x), frequency=f) # Forecast fc <- forecast(auto.arima(revx), h) plot(fc) # Reverse time again fc$mean <- ts(rev(fc$mean),end=tsp(x)[1] - 1/f, frequency=f) fc$upper <- fc$upper[h:1,] fc$lower <- fc$lower[h:1,] fc$x <- x # Plot result plot(fc, xlim=c(tsp(x)[1]-h/f, tsp(x)[2]))

February 28, 2014

by Rob J Hyndman

· 5,763 Views

Voron & Time Series Data: Getting Real Data Outputs

So far, we have just put the data in and out. And we have had a pretty good track record doing so. However, what do we do with the data now that we have it? As you can expect, we need to read it out. Usually by specific date ranges. The interesting thing is that we usually are not interested in just a single channel, we care about multiple channels. And for fun, those channel might be synchronized or not. An example of the first might be the current speed and the current engine temperature in a car. They are generally share the exact same timestamps. An example of out of sync is when you have a sensor on a rooftop measuring rainfall, and another sensor in the sewer measuring water flow rates. (Again, thanks to Dan for helping me with the domain). This is interesting, because it present quite a few interesting problems: We need to merge different streams into a unified view. We need to handle both matching and non matching sequences. We need to handle erroneous data, what happens when we have two reading for the same time for the same sensor? Yes, that shouldn’t happen, but it does. I solved this with the following API: public class RangeEntry { public DateTime Timestamp; public double?[] Values; } IEnumerable results = dts.ScanRanges(DateTime.MinValue, DateTime.MaxValue, new[] { "6febe146-e893-4f64-89f8-527f2dbaae9b", "707dcb42-c551-4f1a-9203-e4b0852516cf", "74d5bee8-9a7b-4d4e-bd85-5f92dfc22edb", "7ae29feb-6178-4930-bc38-a90adf99cfd3", }); This API gives me the results in the time order, with the same positions as the ids requested for the values. With nulls if there isn’t a value matching the value from that time in that particular sensor channel. The actual implementation relies on this method: IEnumerable ScanRange(DateTime start, DateTime end, string id) All this does it provide the entries all the entries in a particular date range, for a particular channel. Let us see how we implement multi channel scanning on top of this: private class PendingEnumerator { public IEnumerator Enumerator; public int Index; } private class PendingEnumerators { private readonly SortedDictionary> _values = new SortedDictionary>(); public void Enqueue(PendingEnumerator entry) { List list; var dateTime = entry.Enumerator.Current.Timestamp; if (_values.TryGetValue(dateTime, out list) == false) { _values.Add(dateTime, list = new List()); } list.Add(entry); } public bool IsEmpty { get { return _values.Count == 0; } } public List Dequeue() { if (_values.Count == 0) return new List(); var kvp = _values.First(); _values.Remove(kvp.Key); return kvp.Value; } } public IEnumerable ScanRanges(DateTime start, DateTime end, string[] ids) { if (ids == null || ids.Length == 0) yield break; var pending = new PendingEnumerators(); for (int i = 0; i < ids.Length; i++) { var enumerator = ScanRange(start, end, ids[i]).GetEnumerator(); if(enumerator.MoveNext() == false) continue; pending.Enqueue(new PendingEnumerator { Enumerator = enumerator, Index = i }); } var result = new RangeEntry { Values = new double?[ids.Length] }; while (pending.IsEmpty == false) { Array.Clear(result.Values,0,result.Values.Length); var entries = pending.Dequeue(); if (entries.Count == 0) break; foreach (var entry in entries) { var current = entry.Enumerator.Current; result.Timestamp = current.Timestamp; result.Values[entry.Index] = current.Value; if(entry.Enumerator.MoveNext()) pending.Enqueue(entry); } yield return result; } } We are getting a single entry from each channel into the pending enumerators. Then, we collate all the entries that share the same time into a single entry. We use the Index property to track the actual expected index of the entry in the output. And we handle duplicate times in the same channel by outputting multiple entries. Testing this on my 1.1 million records data set, we can get 185 thousands records back in 0.15 seconds.

February 25, 2014

by Oren Eini

· 5,414 Views

The best code coverage for Scala

The best code coverage metric for Scala is statement coverage. Simple as that. It suits the typical programming style in Scala best. Scala is a chameleon and it can look like anything you wish, but very often more statements are written on a single line and conditional “if” statements are used rarely. In other words, line coverage and branch coverage metrics are not helpful. Java tools Scala runs on JVM and therefore many existing tools for Java can be used for Scala as well. But for code coverage it’s a mistake to do so. One wrong option is to use tools that measure coverage looking at bytecode like JaCoCo. Even though it gives you a coverage rate number, JaCoCo knows nothing about Scala and therefore it doesn’t tell you which piece of code you forgot to cover. Another misfortune are tools that natively support line and branch coverage metrics only. Cobertura is a standard in Java world and XML coverage report that it generates is supported by many tools. Some Scala code coverage tools decided to use Cobertura XML report format because of its popularity. Sadly, it doesn’t support statement coverage. Statement coverage Why? Because a typical Scala statement looks like this (a single line of code): def f(l: List[Int]) = l.filter(_ > 0).filter(_ < 42).takeWhile(_ != 3).foreach(println(_)) Neither line nor branch coverage works here. When would you consider this single line as being covered by a test? If at least one statement of that line has been called? Maybe. Or all of them? Also maybe. Where is a branch? Yes, there are statements that are executed conditionally, but the decision logic is hidden in internal implementation of List. Branch coverage tools are helpless, because they don't see this kind of conditional execution. What we need to know instead is whether individual statements like _ > 0, _ < 42 or println(_) have beed executed by an automated test. This is the statement coverage. Scoverage to the rescue! Luckily there is a tool named Scoverage. It is a plugin for Scala compiler. There is also a plugin for SBT. It does exactly what we need. It generates HTML report and also own XML report containing detailed information about covered statements. Scoverage plugin for SonarQube Recently I have implemented a plugin for Sonar 4 so that statement coverage measurement can become an integral part of your team's continuous integration process and a required quality standard. It allows you to review overall project statement coverage as well as dig deeper into sub-modules, directories and source code files to see uncovered statements. Project dashboard with Scoverage plugin: Multi-module project overview: Columns with statement coverage, total number of statements and number of covered statements: Source code markup with covered and uncovered lines:

February 12, 2014

by Rado Buranský

· 34,098 Views · 1 Like

To ServiceMix or Not to ServiceMix

This morning an interesting topic was posted to the Apache ServiceMix user forum, asking the question: To ServiceMix or not ServiceMix. In my mind the short answer is: NO Guillaume Nodet one of the key architects and long time committer on Apache ServiceMix already had his mind set 3 years ago when he wrong this blog post - Thoughts about ServiceMix. What has happened on the ServiceMix project was that the ServiceMix kernel was pulled out of ServiceMix into its own project - Apache Karaf. That happened in spring 2009, which Guillaume also blogged about. So is all that bad? No its IMHO all great. In fact having the kernel as a separate project, and Camel and CXF as the integration and WS/RS frameworks, would allow the ServiceMix team to focus on building the ESB that truly had value-add. But that did not happen. ServiceMix did not create a cross product security model, web console, audit and trace tooling, clustering, governance, service registry, and much more that people were looking for in an ESB (or related to a SOA suite). There were only small pieces of it, but never really baked well into the project. That said its not too late. I think the ServiceMix project is dying, but if a lot of people in the community step up, and contribute and work on these things, then it can bring value to some users. But I seriously doubt this will happen. PS: 6 years ago I was working as a consultant and looked at the next integration platform for a major Danish organization, and we looked at ServiceMix back then and dismissed it due its JBI nature, and the new OSGi based architecture was only just started. And frankly it has taken a long long time to mature Apache Karaf / Felix / Aries and the other pieces in OSGi to what they are today to offer a stable and sound platform for users to build their integration applications. That was not the case 4-6 years ago. Okay No to ServiceMix - what are my options then? So what should use you instead of ServiceMix? Well in my mind you have at least these two options. 1) Use Apache Karaf and add the pieces you need, such as Camel, CXF, ActiveMQ and build your own ESB. These individual projects have regular releases, and you can upgrade as you need. The ServiceMix project only has the JBI components in additional, that you should NOT use. Only legacy users that got on the old ServiceMix 3.x wagon may need to use this in a graceful upgrade from JBI to Karaf based containers. 2) Take a look at fabric8. IMHO fabric8 is all that value-add the ServiceMix project did not create, and a lot more. James Strachan, just blogged today about some of his thoughts on fabric8, JBoss Fuse, and Karaf. I encourage you to take a read. For example he talks about how fabric becomes poly container, so you have a much wider choice of which containers/JVM to run your integration applications. OSGi is no longer a requirement. (IMHO that is very very existing and potentially a changer). I encourage you to check out fabric8 web-site, and also read the overview and motivation sections of the documentation. And then check out some of the videos. After the upcoming JBoss Fuse 6.1 release, the Fuse team at Red Hat will have more time and focus to bring the documentation at fabric8 up to date covering all the functionality we have (there is a lot more), and as well bring out a 1.0 community released using pure community releases. This gives end users a 100% free to use out of the box release. And users looking for a commercial release can then use JBoss Fuse. Best of both worlds. Summary Okay back to the question - to ServiceMix or not. Then NO. Innovation happens outside ServiceMix, and also more and more outside Apache. If you have thoughts then you can share those in comments to this blog, or better yet, get involved in the discussion forum at the ServiceMix user forum. PPS: The thoughts on this blog is mine alone, and are not any official words from my employer.

February 12, 2014

by Claus Ibsen

· 16,977 Views