Deployment Resources

The Latest Deployment Topics

There are several good ways to upload content to an S3 bucket in the Java world – in this article we’ll look at what the jclouds library provides for this purpose. To use jclouds – specifically the APIs discussed in this article, this simple Maven dependency should be added to the pom of the project: org.jclouds jclouds-allblobstore 1.5.9 1. Uploading to Amazon S3 The first step, in order to access any of these APIs, is to create a BlobStoreContext: BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(BlobStoreContext.class); This represents the entry-point to a general key-value storage service, such as Amazon S3 – but not limited to it. For the more specific S3 only implementation, the context can be created similarly: BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(S3BlobStoreContext.class); And even more specifically: BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); When the authenticated context is no longer needed, closing it is required to release all resources – threads and connections – associated to it. 2. The four S3 APIs of jclouds The jclouds library provides four different APIs to upload content to S3 bucket, ranging from simple but inflexible to complex and powerful, all obtained via the BlobStoreContext. Let’s start with the simplest. 2.1. Upload via the Map API The easiest way jclouds can be used to interact with an S3 bucket is by representing that bucket as a Map. The API is obtained from the context: InputStreamMap bucket = context.createInputStreamMap("bucketName"); Then, to upload a simple HTML file: bucket.putString("index1.html", "hello world1"); The InputStreamMap API exposes several other types of PUT operations – files, raw bytes – both for single and bulk. A simple integration test can be used as an example: @Test public void whenFileIsUploadedToS3WithMapApi_thenNoExceptions() { BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); InputStreamMap bucket = context.createInputStreamMap("bucketName"); bucket.putString("index1.html", "hello world1"); context.close(); } 2.2. Upload via BlobMap Using the simple Map API is straightforward but ultimately limited – for example, there is no way to pass in metadata about the content being uploaded. When more flexibility and customization is necessary, this simplified approach to uploading data to S3 via a Map is no longer enough. The next API we’ll look at is the Blob Map API – this is obtained from the context: BlobMap bucket = context.createBlobMap("bucketName"); The API allows the client to access more lower level details, such as Content-Length, Content-Type, Content-Encoding, eTag hash and others; to upload new content in the bucket: Blob blob = bucket.blobBuilder().name("index2.html"). payload("hello world2"). contentType("text/html").calculateMD5().build(); The API also allows setting a variety of payloads on the create request. A simple integration test for uploading a basic HTML file to S3 via the Blob Map API: @Test public void whenFileIsUploadedToS3WithBlobMap_thenNoExceptions() throws IOException { BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); BlobMap bucket = context.createBlobMap("bucketName"); Blob blob = bucket.blobBuilder().name("index2.html"). payload("hello world2"). contentType("text/html").calculateMD5().build(); bucket.put(blob.getMetadata().getName(), blob); context.close(); } 2.3. Upload via BlobStore The previous APIs had no way to upload content using multipart upload – this makes them ill suited when working with large files. This limitation is addressed by the next API we’re going to look at – the synchronous BlobStore API. This is obtained from the context: BlobStore blobStore = context.getBlobStore(); To use the multipart support and upload a file to S3: Blob blob = blobStore.blobBuilder("index3.html"). payload("hello world3").contentType("text/html").build(); blobStore.putBlob("bucketName", blob, PutOptions.Builder.multipart()); The payload builder is the same one that was being used by the BlobMap API, so the same flexibility in specifying lower level metadata information about the blob is available here. The difference is the PutOptions supported by the PUT operation of the API – namely the multipart support. The previous integration test now has multipart enabled: @Test public void whenFileIsUploadedToS3WithBlobStore_thenNoExceptions() { BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); BlobStore blobStore = context.getBlobStore(); Blob blob = blobStore.blobBuilder("index3.html"). payload("hello world3").contentType("text/html").build(); blobStore.putBlob("bucketName", blob, PutOptions.Builder.multipart()); context.close(); } 2.4. Upload via AsyncBlobStore While the previous BlobStore API was synchronous, there is also an asynchronous API for BlobStore – AsyncBlobStore. The API is similarly obtained from the context: AsyncBlobStore blobStore = context.getAsyncBlobStore(); The only difference between the two is that the async API is returning ListenableFuture for the PUT asynchronous operation: Blob blob = blobStore.blobBuilder("index4.html"). .payload("hello world4").build(); blobStore.putBlob("bucketName", blob).get(); The integration test displaying this operation is similar to the synchronous one: @Test public void whenFileIsUploadedToS3WithBlobStore_thenNoExceptions() { BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); BlobStore blobStore = context.getBlobStore(); Blob blob = blobStore.blobBuilder("index4.html"). payload("hello world4").contentType("text/html").build(); Future putOp = blobStore.putBlob("bucketName", blob, PutOptions.Builder.multipart()); putOp.get(); context.close(); } 3. Conclusion In this article, we analysed the four APIs that the jclouds library provides to upload content to Amazon S3. These four APIs are generic and they work with other key-value storage services as well – such as Microsoft Azure Storage for example. In the next article we’ll look at the Amazon specific S3 API available in jclouds – the AWSS3Client. We’ll implement the operation of uploading a large file, dynamically calculate the optimal number of parts for any given file, and perform the upload of all parts in parallel. P.S. You might dig following me on Twitter.

April 18, 2013

by Eugen Paraschiv

· 8,876 Views · 1 Like

Grails Goodness: Using Wrapper for Running Grails Commands Without Grails Installation

Since Grails 2.1 we can create a Grails wrapper. The wrapper allows developer to run Grails commands in a project without installing Grails first. The wrapper concept is also available in other projects from the Groovy ecosystem like Gradle or Griffon. A wrapper is a shell script for Windows, OSX or Linux named grailsw.bat or grailsw and a couple of JAR files to automatically download a specific version of Grails. We can check in the shell scripts and supporting files into a version control system and make it part of the project. Developers working on the project simply check out the code and execute the shell script. If there is no Grails installation available then it will be downloaded. To create the shell scripts and supporting files someone on the project must run the wrapper command for the first time. This developer must have a valid Grails installation. The files that are generated can then be added to version control and from then one developers can use the grailsw or grailsw.bat shell scripts. $ grails wrapper | Wrapper installed successfully $ In the root of the project we have two new files grailsw and grailsw.bat. Windows users can uss grailsw.bat and on other operating systems we use grailsw. Also a new directory wrapper is created with three files: grails-wrapper-runtime-2.2.0.jar grails-wrapper.properties springloaded-core-1.1.1.jar When we run the grailsw or grailsw.bat scripts for the first time we see how Grails is downloaded and installed into the $USER_HOME/.grails/wrapper directory. The following output shows that the file is downloaded and extracted when we didn't run the grailsw script before: $ ./grailsw --version Downloading http://dist.springframework.org.s3.amazonaws.com/release/GRAILS/grails-2.2.0.zip to /Users/mrhaki/.grails/wrapper/grails-2.2.0-download.zip ..................................................................................... ................................................................ Extracting /Users/mrhaki/.grails/wrapper/grails-2.2.0-download.zip to /Users/mrhaki/.grails/wrapper/2.2.0 Grails version: 2.2.0 When we want to use a new version of Grails one of the developers needs to run to run $ grails upgrade followed by $ grails wrapper with the new Grails version. Notice this developer needs to have a locally installed Grails installation of the version we want to create a wrapper for. The newly generated files can be checked in to version control and all developers on the project will have the new Grails version when they run the grails or grailsw.bat shell scripts. $ ./grailsw --version Downloading http://dist.springframework.org.s3.amazonaws.com/release/GRAILS/grails-2.2.1.zip to /Users/mrhaki/.grails/wrapper/grails-2.2.1-download.zip ..................................................................................... ... ................................................................ Extracting /Users/mrhaki/.grails/wrapper/grails-2.2.1-download.zip to /Users/mrhaki/.grails/wrapper/2.2.1 Grails version: 2.2.1 We can change the download location of Grails to for example a company intranet URL. In the wrapper/ directory we see the file grails-wrapper.properties. The file has one property wrapper.dist.url, which by default refers to http://dist.springframework.org.s3.amazonaws.com/release/GRAILS/. We can change this to another URL, add the change to version control so other developers will get the change automatically. And when the grailsw shell script is executed the download location will be another URL. To set a different download URL when generating the wrapper we can use the command-line option --distributionUrl: $ grails wrapper --distributionUrl=http://company.intranet/downloads/grails-releases/ If we don't like the default name for the directory to store the supporting files we can use the command-line option --wrapperDir. The files are then stored in the given directory and the grailsw and grailsw.bat shell scripts will contain the given directory name. Written with Grails 2.2.0 and 2.2.1

April 16, 2013

by Hubert Klein Ikkink

· 5,854 Views

Introduction to SmartSVN

SmartSVN is a powerful and easy-to-use graphical client for Apache Subversion. There are several clients for Subversion, but here are just a few reasons you should try SmartSVN: It’s cross-platform – SmartSVN runs on Windows, Linux and Mac OS X, so you can continue using the operating system (OS) that works the best for you. It can also be integrated into your OS, via Mac’s Finder Integration or Windows Shell. Everything you need, out of the box – SmartSVN comes complete with all the tools you need to manage your Subversion projects: Conflict solver – this feature combines the freedom of a general, three-way-merge with the ability to detect and resolve any conflicts that occur during the development lifecycle. File compare – this allows you to make inner-line comparisons and directly edit the compared files. Built-in SSH client – allows users to access servers using the SSH protocol. This security-conscious protocol encrypts every piece of communication between the client and the server, for additional protection. A complete view of your project at a glance – the most important files (such as conflicted, modified or missing files) are placed at the top of the file list. SmartSVN also highlights which directories contain local modifications, which directories have been changed in the repository, and whether individual files have been modified locally or in the central repo. This makes it easy to get a quick overview of the state of your project. Fully customizable – maximize productivity by fine-tuning your SmartSVN installation to suit your particular needs: Change keyboard shortcuts, write your own plugin with the SmartSVN API, group revisions to personalize your display, create Change Sets, and alter the context menus and toolbars to suit you. You can learn more about customizing SmartSVN at our ‘5 Ways to Customize SmartSVN’ blog post. Comprehensive bug tracker support – Trac and JIRA are both fully supported. Multitude of support options – SmartSVN users have access to a range of free support, from refcards to blogsand documentation, the SmartSVN forum and a Twitter account maintained by our open source experts. If you need extra support with your SmartSVN installation, expert email support is included with SmartSVN Professional licenses. Want to learn more about SmartSVN? On April 18th, WANdisco will be be holding a free ‘Introduction to SmartSVN’ webinar covering everything you need to get off to a great start with this popular client: Repository basics Checkouts, working folders, editing files and commits Reporting on changes Simple branching Simple merging This webinar is free so register now.

April 13, 2013

by Jessica Thornsby

· 6,613 Views

Android Tutorial: Using the ViewPager

I've put together a quick tutorial that gets a ViewPager up and running (with the Support Library), in just a few steps.

April 10, 2013

by Isaac Taylor

· 240,184 Views · 5 Likes

Capture a Signature on iOS

Originally authored by Jason Harwig The Square Engineering Blog has a great article on Smoother Signatures for Android, but I didn't find anything specifically about iOS. So, what is the best way to capture a users signature on an iOS device? Although I didn't find any articles on signature capture, there are good implementations on the App Store. My target user experience was the iPad application Paper by 53, a drawing application with beautiful and responsive brushes. All code is available in the Github repository: SignatureDemo. Connecting the Dots The simplest approach is to capture the touches and connect them with straight lines. In the initializer of a UIView subclass, create the path and gesture recognizer to capture touch events. // Create a path to connect lines path = [UIBezierPath bezierPath]; // Capture touches UIPanGestureRecognizer *pan = [[UIPanGestureRecognizer alloc] initWithTarget:self action:@selector(pan:)]; pan.maximumNumberOfTouches = pan.minimumNumberOfTouches = 1; [self addGestureRecognizer:pan]; Capture the pan events into a bézier path by connecting the points with lines. - (void)pan:(UIPanGestureRecognizer *)pan { CGPoint currentPoint = [pan locationInView:self]; if (pan.state == UIGestureRecognizerStateBegan) { [path moveToPoint:currentPoint]; } else if (pan.state == UIGestureRecognizerStateChanged) [path addLineToPoint:currentPoint]; [self setNeedsDisplay]; } Stroke the path - (void)drawRect:(CGRect)rect { [[UIColor blackColor] setStroke]; [path stroke]; } An example "J" character rendered using this technique reveals some issues. At slow velocities iOS captures enough touch resolution that the lines aren't noticeable, but faster movement shows large gaps between touches that accentuates the lines. The 2012 Apple Developer Conference included a session Building Advanced Gesture Recognizers that addresses this issue using math. Quadratic Bézier Curves Instead of connected lines between the touch points, quadratic bézier curves connect the points using the technique discussed in the aforementioned WWDC session (Seek to 42:15.) Connect the touch points with a quadratic curve using the touch points as the control points and the mid points as start and end. Adding quadratic curves to the previous code requires the storing the previous touch point, so add an instance variable for that. CGPoint previousPoint; Create a function to calculate the midpoint of two points. static CGPoint midpoint(CGPoint p0, CGPoint p1) { return (CGPoint) { (p0.x + p1.x) / 2.0, (p0.y + p1.y) / 2.0 }; } Update the pan gesture handler to add quadratic curves instead of straight lines - (void)pan:(UIPanGestureRecognizer *)pan { CGPoint currentPoint = [pan locationInView:self]; CGPoint midPoint = midpoint(previousPoint, currentPoint); if (pan.state == UIGestureRecognizerStateBegan) { [path moveToPoint:currentPoint]; } else if (pan.state == UIGestureRecognizerStateChanged) { [path addQuadCurveToPoint:midPoint controlPoint:previousPoint]; } previousPoint = currentPoint; [self setNeedsDisplay]; } Not much code and already we see a big difference. The touch points are no longer visible, but it looks a little bland when drawing a signature. Every curve is the same width, which doesn't match the physics of a real pen. Variable Stroke Width The width can be varied based on the touch velocity to create a more natural stroke. The UIPanGestureRecognizer already includes a method called velocityInView: that returns the current touch velocity as a CGPoint. To render a stroke of varying width, I switched to OpenGL ES and a technique called tesselation to convert the stroke into triangles – specifically, triangle strips (OpenGL has support for drawing lines, but iOS doesn't support variable line widths with smoothing.) The quadratic points along a curve also need to be calculated, but is beyond the scope of this article. Check the source on github for details. Given two points, a perpendicular vector is calculated and its magnitude set to half the current thickness. Given the nature of GL_TRIANGLE_STRIP only two points are needed to create the next rectangle segment with two triangles. Here is an example of the final output using quadratic bézier curves, and velocity based stroke thickness creating a visually appealing and natural signature.

April 8, 2013

by Scott Leberknight

· 20,828 Views

94 Expert Tips for Agile Teams

Here are 10 articles from 10 different authors that provide valuable advice for Scrum teams. These articles are in no particular order, so feel free to skim down the list and start with the ones that are most relevant to you. 10 Tips for a Great Daily Scrum Meeting by Platinum Edge – The daily Scrum meeting is a powerful tool that keeps your project moving. At the same time, it is also easy for the meetings to not bring any added value. Tips for Effective Backlog Grooming by Charles Bradley – Are you wasting time in your Sprint Planning Meetings? Increase the value of your team’s Sprint Planning Meetings by grooming your Product Backlog. Yoda’s top 10 tips for a new Scrum Master by Nigel Steane – As a new Scrum Master, you face unfamiliar challenges and your success is very much based on your ability to utilise coaching and soft skills to gently guide your team and colleagues. Top ten tips for distributed Scrum team teleconferences By Jon Archer – After acting as a Scrum Master for several months on a distributed team with people in six different locations, three different countries, learn ten tips to help get past those inevitable awkward silences. 10 tips for adopting Scrum to save your project by Matthew Hodgson – Are you interested in adopting Scrum for your next project? Here are 10 tips from his experience with moving a number of projects from their existing project management frameworks to Scrum. Five Tips for Impediment Resolution with Scrum by Stefan Roock – Impediments can slow down or even halt the progress of an otherwise well-functioning Scrum team. Take a look at the most common challenges that crop up on teams and what steps you can take to resolve them. 10 Tips for Succeeding with Enterprise Agile Development by Tools Journal – Many enterprises are experimenting with agile development approaches like Scrum, Kanban, Lean, and XP hoping that introducing a new development approach will help. Yet, agile development has struggled to achieve critical mass in large enterprises. 6 Tips for Good Scrum by Martin Harris – If you are doing these 6 tips, then you are doing very well and are likely to get better over time. 9 Tips for Creating a Good Sprint Backlog by Luciano Felix – Giving attention to the sprint backlog creation process is fundamental to the team’s understanding of what should be done and how to better plan during the sprint. 7 Tips for a More Effective Daily Scrum by Richard Lawrence – The main purpose of the Daily Scrum is for team members to make and follow-up on commitments to one another that work towards the team’s shared sprint commitment. Here are seven ways to get your Daily Scrum back on focus If your it has become unfocused, too long, or otherwise ineffective. If you have any other good articles related to agile, please share them in the comments. Thanks.

April 5, 2013

by Hamid Shojaee

· 15,916 Views

Configuring Apache SolrCloud on Amazon VPC

We are going to construct an Apache SolrCloud (4.1) with 12 node EC2 instance(s) inside Amazon VPC in this post. Since the search data stored inside the SolrCloud is critical, we are going to build High availability at Solr Node level as well as AZ level. This setup will be done inside private subnet of Amazon VPC and will leverage 3 Availability Zones of the Amazon EC2 Region. Deployment architecture of the setup is given below: A small brief about setup: 3 Zookeepers will be deployed on 3 Availability Zones. ZK EC2 instances will be deployed on the Private subnet of the Amazon VPC. 3 Solr Shard EC2 instances will be deployed on Private subnet of Availability Zone 1 inside Amazon VPC. 3 Solr Replica EC2 instances will be deployed on Private subnet of Availability Zone 2 inside Amazon VPC. 3 Solr Replica EC2 instances will be deployed on Private subnet of Availability Zone 3 inside Amazon VPC. EBS optimized + PIOPS EC2 instances can be used for Solr EC2 Nodes To know more about SolrCloud Deployment best practices on Amazon VPC, Refer article: http://harish11g.blogspot.in/2013/03/Apache-Solr-cloud-on-Amazon-EC2-AWS-VPC-implementation-deployment.html Step 1: Creating Virtual Private Cloud on AWS Create a VPC with Public and Private Subnets. Assume the Load balancer and Web/App Servers can reside on the public subnet and Apache Solr Cloud will reside on the private subnet of the VPC. Step 2: Assigning the IP for the Subnets Create the subnet with its IP range. Chose the Availability zone for this subnet. Step 3: Multiple Subnets on Multiple AZ’s Create multiple subnets in Multiple AZ for building a Highly available setup for SolCloud Step 4: Install Java for Zookeeper & Solr Amazon Linux is chosen as the EC2 OS variant. Execute the following instructions on the respective EC2 nodes after their launch. EC2 instances should be launched in Multi-AZ in Multiple VPC Private Subnets. Solr uses Zookeeper as the cluster configuration and coordinator. Zookeeper is a distributed file system containing information about all the Solr Nodes. Solrconfig.xml, Schema.xml etc are stored in the repository.We have used Oracle-Sun Java over OpenJDK “sudo -s” “cd /opt” “wget --no-cookies --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2Ftechnetwork%2Fjava%2Fjavase%2Fdownloads%2Fjdk-7u3-download-1501626.html;" http://download.oracle.com/otn-pub/java/jdk/7u13-b20/jdk-7u13-linux-x64.rpm” “mv jdk-7u10-linux-x64.rpm?AuthParam=1357217677_76ec3d8d9a3644f4b9ec1ea79e1fcf33 jdk-7u10-linux-x64.rpm jdk-7u10-linux-x64.rpm” “sudo rpm -ivh jdk-7u10-linux-x64.rpm” “alternatives --install /usr/bin/java java /usr/java/jdk1.7.0_10/jre/bin/java 20000” “alternatives --install /usr/bin/javaws javaws /usr/java/jdk1.7.0_10/jre/bin/javaws 20000” “alternatives --install /usr/bin/javac javac /usr/java/jdk1.7.0_10/bin/javac 20000” “alternatives --install /usr/bin/jar jar /usr/java/jdk1.7.0_10/bin/jar 20000” “alternatives --install /usr/bin/java java /usr/java/jre1.7.0_10/bin/java 20000” “alternatives --install /usr/bin/javaws javaws /usr/java/jre1.7.0_10/bin/javaws 20000” “alternatives --configure java” Add JAVA_HOME in .bash_profile: “vim ~/.bash_profile” export JAVA_HOME="/usr/java/jdk1.7.0_09" export PATH=$PATH:$JAVA_HOME/bin Restart the instance. “init 6” Check the version of Java installed using “java -version” command Step 5: Configure the ZooKeeper (v3.4.5) Ensemble: Since single Zookeeper is not ideal for a large Solr cluster (because of SPOF), it is recommended to configure multiple Zookeepers in concert as an ensemble .In this step we will install and configure 3 ZooKeeper EC2 nodes spanning across 3 different Availability Zones in respective Private Subnets inside a VPC.Zookeeper will be configured on Amazon Linux. “sudo yum update” “sudo -s” “ cd /opt” “wget http://apache.techartifact.com/mirror/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz” “tar -xzvf zookeeper-3.4.5.tar.gz” “rm zookeeper-3.4.5.tar.gz” “cd zookeeper-3.4.5” “cp conf/zoo_sample.cfg conf/zoo.cfg” Add the following lines in zoo.cfg “vim conf/zoo.cfg” dataDir=/data server.1=[zk-server01-ip]:2888:3888 server.2=[zk-server02-ip]:2888:3888 server.3=[zk-server03-ip]:2888:3888 “cd /opt/zookeeper/data” “vim myid” 1 or 2 or 3 respectively on each ZooKeeper EC2 instances in Multi-AZ #Starting ZooKeeper Program. “bin/zkServer.sh start” Follow the above steps in all the ZooKeeper servers. ReferClustered (Multi-Server) SetupandConfiguration Parameters for understandingquorum_port,leader_election_port and the filemyid. Every ZooKeeper node needs to know about every other ZK EC2 node in the ensemble, and a majority of EC2’s (called a Quorum) are needed to provide the service. Make sure the VPC IP of all the Zookeepers are given in every ZK node, like the one in following command. server.1=:: server.2=:: server.3=:: Step 6: Configuring Solr 4.1 EC2 node In this step we will install and configure 3 Apache Solr4.1 Shard EC2 instances in a single Amazon AZ and 2 Solr Replicas in another AZ in their respective Private subnets. Please note that we have to specify all the ZooKeeper (ZK) hosts on every Solr instance as below. Note: Solr gets comes with jetty in default, it is suggested to use tomcat for production nodes. Perform the following after launching EC2 instances in Multi-AZ in Multiple VPC Private Subnets. “sudo -s” “yum update” “cd /opt” “wget http://apache.techartifact.com/mirror/lucene/solr/4.1.0/apache-solr-4.1.0.tgz” “tar -xzvf apache-solr-4.1.0.tgz” “rm -f apache-solr-4.1.0.tgz” On Solr Shard/Replica Instances: “cd /opt/apache-solr-4.0.0/example/” “vim /opt/apache-solr-4.0.0/example/solr/collection1/conf/solrconfig.xml” Change /var/data/solr to /data Starting Solr4.1 Shard/Replica Java Program. “java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=SolrCloud4.1-Conf -DnumShards=3 -DzkHost=[zk-server01-ip]:2181,[zk-server02-ip]:2181,[zk-server03-ip]:2181 -jar start.jar “java -DzkHost= DzkHost=:,:,: -jar start.jar” -DnumShards: the number of shards that will be present. Note that once set, this number cannot be increased or decreased without re-indexing the entire data set. (Dynamically changing the number of shards is part of the Solr roadmap!) -DzkHost: a comma-separated list of ZooKeeper servers. -Dbootstrap_confdir, -Dcollection.configName: these parameters are specified only when starting up the first Solr instance. This will enable the transfer of configuration files to ZooKeeper. Subsequent Solr instances need to just point to the ZooKeeper ensemble. The above command with –DnumShards=3 specifies that it is a 3-shard cluster. The first Solr EC2 node automatically becomes shard1 and the second Solr EC2 node automatically becomes shard2 …. What happens when we launch fourth Solr instance in this cluster? Since it’s a 3-shard cluster, the fourth Solr EC2 node automatically becomes a replica of shard1 and the fifth Solr EC2 node becomes a replica of shard2. Step 7: AWS Security Group TCP Ports to be enabled: Configure the following TCP ports on the AWS security group to allow access between Solr and ZK nodes deployed in Multiple AZ. Solr Shards/Replicas will connect to ZK through TCP Port 2181 Solr Web Interface with Jetty container through TCP Port 8983 Solr Web Interface with Tomcat container through TCP Port 8080 Every instance that is part of the ZooKeeper ensemble should know about every other machine in the ensemble. We can accomplish this with the series of lines of the form server.id=host:port:port For example, server.1=[vpc-ip]:2888:3888 server.2=[vpc-ip]:2888:3888 server.3=[vpc-ip]:2888:3888 TCP Ports 2888, 3888 should be opened for ZK Ensemble.

April 5, 2013

by Harish Ganesan

· 7,798 Views

Getting Real with Scrumban

I've been working as a Scrum Master and as an Agile Coach for a good few years now, mainly as a contractor. Each time I am interviewed for a new contract I always like to ask if I can meet the teams I’d be working with. You see, time and again it will be a manager who does the interviewing, while the team members themselves are left with little say in whether I should be hired. I think it’s important that they reckon they can get along with me. Of course, it also gives me an opportunity to see them, and to gain a fuller understanding of the situation I’d really be walking into. As we head towards the desks of my prospective team, one of the first things I look for is the board, whether it be a Scrum task board or a Kanban board. Most teams with agile aspirations…or agile pretensions…will have set up a board of some kind. A board is the "grand old dame" of information radiators. No matter how much the details of a sordid past are glossed over, the truth always seems to come out. It's in the nature of a board to tell the truth, since any untruths can be quickly exposed. The story I can piece together from dubious lanes and columns, misplaced or missing tickets, misplaced or missing avatars, and a host of other shibboleths can be far more telling than anything I get to hear from people in an interview situation. Another of the things I look for is a "fast track" lane on a Scrum Team's board. These are very common; you could say it is almost unusual not to see them. From a certain perspective they are good things to have, and they can imply a level of maturity - or at least of pragmatism - on the part of a team. They suggest that the team accepts that not everything can be predicted in Sprint planning. A fast track lane is a nod to the fact that emergencies happen, that support work and unforeseen defect fixes still need to be done, and most importantly, that the team has a way of dealing with all of this. However it also shows that they aren't doing Scrum. There...I've said it. Fast track lanes aren't part of Scrum. It's that simple. I don't mean to say that they are bad practice, or in some sense un-agile. On the contrary, they are part of the Lean Kanban approach to varying the Quality of Service provided to certain backlog items. That's what a fast track lane is...a way of varying the quality of service that a Scrum team gives to certain items. When something hits a fast track lane, a well-trained team will swarm over it and decide who is best qualified to progress the matter. While they do this, their own tasks will be marked as impeded or blocked. Then, the decision made, all others return to their work in progress. So if fast track lanes are a widely understood and practical way of managing operational issues, what is wrong with them, Scrum-wise? The answer is that Scrum - unlike Lean Kanban - doesn't provide for variations in quality of service. Each piece of work is prioritized and negotiated into a Sprint backlog. The team then self-organizes to deliver a corresponding increment of functionality. The team will plan with the Product Owner what it intends to do during a sprint, and the sprint backlog they agree to belongs to them. No-one, not even the CEO of the organization, can override their sprint backlog by introducing work to be "fast tracked". The team wholly owns their sprint backlog. That's Scrum. When I point this out, teams can become crestfallen or even defensive. “What else are we supposed to do”, they say. “We aren’t dedicated 100% to doing project work. We still have support work to do, and serious issues always trump development. We have to fix them and put project work on hold.” My answer to that is that under the circumstances the team is facing, it may indeed be right to vary the quality of service by fast-tracking support work. It just isn’t Scrum, that’s all. It’s a type of "Scrumban", a Scrum variant that includes Kanban characteristics. This is no fault of the team, but it could suggest a problem higher up. Perhaps a dedicated Kanban support team hasn’t been properly resourced and trained so that Scrum development can proceed unimpeded. Perhaps the Product Owner is being undermined by other managers who have separate interests impacting the development. Whatever the situation, it needs to be made transparent and acknowledged by all stakeholders. So, the next step…and the one I’ll often indicate as the interview progresses…is to account for fast track work as impediments against product burndown or velocity. Moreover, these are impediments which are external to the team. It’s essentially a type of waste, or unplanned work, being generated from outside. It needs to be made quite transparent where this waste is coming from and what can be done to mitigate it. What can be done about those other teams, or workflows, or managers, who are undercutting this Scrum team’s ability to plan out their Sprints? Often, the source of these impediments will be the people interviewing me...and that’s when things can start to get really interesting!

April 4, 2013

by $$anonymous$$

· 9,685 Views

How to use Mock/Stub in Spring Integration Tests

Generally, you pick up a subset of components in some integration tests to check if they are glued as expected. To achieve this, they are usually really invoked, but sometimes, it is too expensive to do so. For example, Component A invokes Component B, and Component B has a dependency on an external system which does not have a test server. We really want to verify the configurations, it seems the only way is replacing Component B with test double after wiring Component A and B. Let's start with Strategy A: Manual Injecting @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration(locations = "classpath:config.xml") public class SomeAppIntegrationTestsUsingManualReplacing { private Mockery context = new JUnit4Mockery(); （1） private SomeInterface mock = context.mock(SomeInterface.class); （2） @Resource(name = "someApp") private SomeApp someApp; （3） @Before public void replaceDependenceWithMock() { someApp.setDependence(mock); （4） } @DirtiesContext @Test public void returnsHelloWorldIfDependenceIsAvailable() throws Exception { context.checking(new Expectations() { { allowing(mock).isAvailable(); will(returnValue(true)); （5） } }); String actual = someApp.returnHelloWorld(); assertEquals("helloWorld", actual); context.assertIsSatisfied(); （6） } } We get a spring bean someApp(Component A in this case), and it has a denpendence on SomeInterface's(Component B in this case). We inject mock (declare and init at step 4) to someApp, thus the test passes without sending request to the external system. The context.assertIsSatisfied()(at step 6 ) is very important as we use SpringJUnit4ClassRunner as junit runner instead of JMock, so you have to explictly assert that all expectations are satisfied. There are two downsides of the previous strategy: Firstly, if there are more than one mock, you have to inject them one by one, which is very tedious especially when you need to inject mocks into serveral spring bean. Secondly, the wiring is not tested. For example, if I forget to write the integration tests using manual inject strategy is not going to tell. Strategy B: Using predefined BeanPostProcessor Spring provides BeanPostProcessor which is very useful when you want to replace some bean after the wiring is done. According to the reference, application context will auto detect all BeanPostProcessor registered in metadata(usually in xml format). public class PredefinedBeanPostProcessor implements BeanPostProcessor { public Mockery context = new JUnit4Mockery(); （1） public SomeInterface mock = context.mock(SomeInterface.class); （2） @Override public Object postProcessBeforeInitialization(Object bean, String beanName) throws BeansException { return bean; } @Override public Object postProcessAfterInitialization(Object bean, String beanName) throws BeansException { if ("dependence".equals(beanName)) { return mock; } else { return bean; } } } @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration(locations = { "classpath:config.xml", "classpath:predefined.xml" }) （1） public class SomeAppIntegrationTestsUsingPredefinedReplacing { @Resource(name = "someApp") private SomeApp someApp; @Resource(name = "predefined") private PredefinedBeanPostProcessor fixture; @Test public void returnsHelloWorldIfDependenceIsAvailable() throws Exception { fixture.context.checking(new Expectations() { { allowing(fixture.mock).isAvailable(); will(returnValue(true)); } }); String actual = someApp.returnHelloWorld(); assertEquals("helloWorld", actual); fixture.context.assertIsSatisfied(); } } Notice there is an extra config xml in which the PredefinedBeanPostProcessor is registered(at step 1). The predefined.xml is placed in src/test/resources/, so it will not be packed into the artifact for production. For each test, using Strategy B requires inputting both a java file and a xml which is quite verbose. Now we have learned the pros and cons of Strategy A and Strategy B. What about a hybrid version -- killing two birds with one stone. Therefore we have the next strategy. Strategy C：Dynamic Injecting public class TestDoubleInjector implements BeanPostProcessor { private static Map MOCKS = new HashMap(); （1） @Override public Object postProcessBeforeInitialization(Object bean, String beanName) throws BeansException { return bean; } @Override public Object postProcessAfterInitialization(Object bean, String beanName) throws BeansException { if (MOCKS.containsKey(beanName)) { return MOCKS.get(beanName); } return bean; } public void addMock(String beanName, Object mock) { MOCKS.put(beanName, mock); } public void clear() { MOCKS.clear(); } } @RunWith(JMock.class) public class SomeAppIntegrationTestsUsingDynamicReplacing { private Mockery context = new JUnit4Mockery(); private SomeInterface mock = context.mock(SomeInterface.class); private SomeApp someApp; private ConfigurableApplicationContext applicationContext; private TestDoubleInjector fixture = new TestDoubleInjector(); （1） @Before public void replaceDependenceWithMock() { fixture.addMock("dependence", mock); （2） applicationContext = new ClassPathXmlApplicationContext(new String[] { "classpath:config.xml", "classpath:dynamic.xml" }); （3） someApp = (SomeApp) applicationContext.getBean("someApp"); } @Test public void returnsHelloWorldIfDependenceIsAvailable() throws Exception { context.checking(new Expectations() { { allowing(mock).isAvailable(); will(returnValue(true)); } }); String actual = someApp.returnHelloWorld(); assertEquals("helloWorld", actual); } @After public void clean() { applicationContext.close(); fixture.clear(); } } The TestDoubleInjector class is an implementation of Monostate pattern. Mocks are added to the static map before the application context being created. When another TestDoubleInjector instance (defined in dynamic.xml) is initiated, it can share the static map for replacement. Just beware to clear the static map after tests. By the way, you could use Stub instead of Mocks with same strategies. Please do not hesitate to contact me if you might have any questions. And I do appreciate it, if you could let me know you have a better idea. Thanks! Resources： http://www.jmock.org http://www.oracle.com/technetwork/articles/entarch/spring-aop-with-ejb5-093994.html(I saw BeanPostProcessor the first time in this post)

April 3, 2013

by Hippoom Zhou

· 51,789 Views · 1 Like

AWS VPC NAT Instance Failover and High Availability

Amazon Virtual Private Cloud (VPC) is a great way to setup an isolated portion of AWS and control the network topology. It is a great way to extend your data center and use AWS for burst requirements. With the latest VPC for Everyone announcement, what was earlier "Classic" and "VPC" in AWS will soon be only VPC. That is, every deployment in AWS will be on a VPC even though one might not need all the additional features that VPC provides. One might eventually start looking at utilizing VPC features such as multiple Subnets, Network isolation, Network ACLs, etc.. Those who have already worked with VPC's understand the role of NAT Instance in a VPC. When you create a VPC, you create them with multiple Subnets (Public and Private). Instances launched in the Public Subnet have direct internet connectivity to send and receive internet traffic through the internet gateway of the VPC. Typically, internet facing servers such as web servers are kept in the Public Subnet. A Private Subnet can be used to launch Instances that do not require direct access from the internet. Instances in a Private Subnet can access the Internet without exposing their private IP address by routing their traffic through a Network Address Translation (NAT) instance in the Public Subnet. AWS provides an AMI that can be launched as a NAT Instance. Following diagram is the representation of a standard VPC that gets provisioned through the AWS Management Console wizard. Standard Private and Public Subnets in a VPC The above architecture has A Public Subnet that has direct internet connectivity through the Internet Gateway. Web Instances can be placed within the Public Subnet The custom Route Table associated with Public Subnet will have the necessary routing information to route traffic to the Internet Gateway A NAT Instance is also provisioned in the Public Subnet A Private Subnet that has outbound internet connectivity through the NAT Instance in the Public Subnet The Main Route Table is by default associated with the Private Subnet. This will have necessary routing information to route internet traffic to the NAT Instance Instances in the Private Subnet will use the NAT Instance for outbound internet connectivity. For example, DB backups from standby that needs to be stored in S3. Background programs that make external web services calls Of course, the above architecture has limited High Availability since all the Subnets are created within the same Availability Zone. We can avoid this by creating multiple Subnets in multiple Availability Zones. Public and Private Subnets with multiple Availability Zones Additional Subnets (Public and Private) are created in one another Availability Zone Both Private Subnets are attached to the Main Routing Table Both Public Subnets are attached to the same Custom Routing Table Instances in the Private Subnet still continue to use the NAT Instance for outbound internet connectivity Though we increased the High Availability by utilizing multiple Availability Zones, the NAT Instance is still a Single Point of Failure. NAT Instance is just another EC2 Instance that can become unavailable any time. The updated architecture below uses two NAT Instances to provide failover and High Availability for the NAT Instances NAT Instance High Availability Each Subnet is associated with its own Route Table NAT1 is provisioned in Public Subnet 1 NAT2 is provisioned in Public Subnet 2 Private Subnet 1's Route Table (RT) has routing entry to NAT1 for internet traffic Private Subnet 2's Route Table (RT) has routing entry to NAT2 for internet traffic NAT Instance HA Illustration A script can be installed on both the NAT Instances to monitor each other and swap the routing table association if one of them fails. For example, if NAT1 detects that NAT2 is not responding to its ping requests, it can change the Route Table of Private Subnet 2 to NAT1 for internet traffic. Once NAT2 becomes operational again, a reverse swapping can happen. AWS has a pretty good documentation on this and a sample script for the swapping. Apart from HA, the above architecture also provides better overall throughput, since during normal conditions, both NAT Instances can be used to drive the outbound internet requirements of the VPC. If there are workloads that requires a lot of outbound internet connectivity, having more than one NAT Instance would make sense. Of course, you are still limited with one NAT Instance per Subnet.

March 28, 2013

by Raghuraman Balachandran

· 18,783 Views

Accessing AWS Without Key and Secret

If you are using Amazon Web Services(AWS), you are probably aware how to access and use resources like SNS, SQS, S3 using key and secret. With the aws-java-sdk that is straight forward: AmazonSNSClient snsClient = new AmazonSNSClient( new BasicAWSCredentials("your key", "your secret")) One of the difficulties with this approach is storing the key/secret securely especially when there are different set of these for different environments. Using java property files, combined with maven or spring profiles might help a little bit to externalize the key/secret out of your source code, but still doesn't solve the issue of securely accessing these resources. Amazon has another service to help you in this occasion. No, no, this is not one more service to pay for in order to use the previous services. It is a free service, actually it is a feature of the amazon account. AWS Identity and Access Management (IAM) lets you securely control access to AWS services and resources for your users, you can manage users and groups and define permissions for AWS resources. One interesting functionality of IAM is the ability to assign roles to EC2 instances. The idea is you create roles with sets of permissions and you launch an EC2 instance by assigning the role to the instance. And when you deploy an application on that instance, the application doesn't need to have access key and secret in order to access other amazon resource. The application will use the role credentials to sign the requests. This has a number of benefits like a centralized place to control all the instances credentials, reduced risk with auto refreshing credentials and so on. Here is a short video demonstrating how to assign roles to an EC2 instance: Once you have role based security enabled for an instance, to access other resources from that instances you have to create and AwsClient using the chained credential provider: AmazonSNSClient snsClient = new AmazonSNSClient( new DefaultAWSCredentialsProviderChain()) The provider will search your system properties, environment properties and finally call instance metadata API to retrieve the role credentials in chain of responsibility fashion. It will also refresh the credentials in the background periodically depending on its expiration period. And finally, if you want to use role based security from Camel applications running on Amazon, all you have to do is create an instance of the client with configured chained credentials object and don't specify any key or secret: from("direct:start") .to("aws-sns://MyTopic?amazonSNSClient=#snsClient");

March 26, 2013

by Bilgin Ibryam

· 14,404 Views

Connect Apache OFBiz with the Real World

What would you expect from someone who is OFBiz and Camel committer? To integrate them for fun? Fine, here it is. In addition to being fun, I believe this integration will be of real benefit for the OFBiz community, because despite the fact of being a complete ERP software, OFBiz lacks the ability to easily integrate with external systems. The goal of this project is instead of reinventing the wheel and trying to integrate OFBiz with each system separately, integrate it with Camel and let Camel do what it does best: connect your application with every possible protocol and system out there. Quick OFBiz introduction The Apache Open For Business Project is an open source, enterprise automation software. It consist mainly from two parts: A full-stack framework for rapid business application development. It has Entity Engine for the data layer (imagine something like iBATIS and Hibernate combined). It is the same entity engine that powers millions of Attlasian Jira instances. But don't get me wrong, it is not meant for usage outside of OFBiz framework, so use it only as OFBiz data layer. Service Engine - this might be hard to grasp for someone only with DDD background, but OFBiz doesn't have any domain objects. Instead for the service layer it uses SOA and has thousands of services that contains the business logic. A service is an atomic bit of isolated business logic, usually reading and updating the database. If you need you can make services triggering each other using ECAs(event-condition-action) which is kind of rule engine that allows define pre/post conditions for triggering other service calls when a service is executed. The service itself can be written written in java, groovy or simple language (an XML DSL for simple database manipulation) and usually requires authentication, authorisation and finally executed in a transaction. UI widgets - an XML DSL which allows you easily create complex pages with tables, forms and trees. And the really great thing about this framework is that 'The whole is greater than the sum of its parts' - all of the above layers works together amazingly: if you have an entity definition (a table) in your data layer, you can use it in your service layer during your service interface definition or its implementation. It takes one line of code(a long one) to create a service which has as input parameters the table columns and return the primary key as result of the service. Then if you are creating a screen with tables or forms, you can base it on your entity definitions or service definitions. It is again only few lines of code to create a form with fields mapping to a service or entity fields. Out of the box business applications. These are vertical applications for managing the full life cycle of a business domain like ordering, accounting, manufacturing and many more in a horizontally integrated manner. So creating an order from order or ecommerce application will interact with facility to check whether a product is available, and after the order is created will create accounting transaction in accounting application. Before the order is shipped from the facility, it will create invoices and once the invoice is paid, it will close the order. You get the idea. Camel in 30 seconds Apache Camel is an integration framework based on known Enterprise Integration Patterns(EIP). Camel can also be presented as consisting of two artifacts: The routing framework which can be defined using java, scala, xml DSL with all the EIPs like Pipe, Filter, Router, Splitter, Aggregator, Throttler, Normalizer and many more. Components and transformers ie all the different connectors to more than 100 different applications and protocols like: AMQP, AWS, web services, REST, MongoDB, Twitter, Websocket, you name it. If you can imagine a tool, that enables you to consume data from one system, then manipulate the data (transform, filter, split, aggregate) and send it to other systems, using a declarative, concise, English-like DSL without any boilerplate code - that's Apache Camel. Let OFBiz talk to all of the systems Camel do The main interaction point with OFBiz are either by using the Entity Engine for direct data manipulation or by calling services through Service Engine. The latter is preferred because it ensures that the user executing the service is authorised to do so, the operation is transactional to ensure data integrity, and also all the business rules are satisfied (there might be other services that have to be executed with ECA rules). So if we can create an OFBiz endpoint in Camel and execute OFBiz services from Camel messages, that would allow OFBiz to receive notifications from Camel endpoints. What about the other way around - making OFBiz notify Camel endpoints? The ideal way would be to have an OFBiz service that sends the IN parameters to Camel endpoints as message body and headers and return the reply message as OFBiz service response. If you are wondering: why is it so great, what is an endpoint, where is the real world, who is gonna win Euro2012... have a look at the complete list of available Camel components, and you will find out the answer. Running Camel in OFBiz container I've started an experimental ofbiz-camel project on github which allows you to do all of the above. It demonstrates how to poll files from a directory using Camel and create notes in OFBiz with the content of the file using createNote service. The project also has an OFBiz service, that enables sending messages from OFBiz to Camel. For example using that service it is possible to send a message to Camel file://data endpoint, and Camel will create a file in the data folder using the service parameters. The integration between OFBiz and Camel is achieved by running Camel in an OFBiz container as part of the OFBiz framework. This makes quite tight integration, but ensures that there will not be any http, rmi or any other overhead in between. It is still WIP and may change totally. Running Camel and OFBiz separately Another approach is KISS: run Camel and OFBiz as they are - separate applications, and let them interact with RMI, WS* or something else. This doesn't require any much coding, but only configuring both systems to talk to each other. I've created a simple demo camel-ofbiz-rmi which demonstrates how to listen for tweets with a specific keyword and store them in OFBiz as notes by calling createNote service using RMI. It uses Camel's twitter and rmi components and requires only configuration. Notice that this example demonstrates only one way interaction: from Camel to OFBiz. In order to invoke a Camel endpoint from OFBiz you can you have to write some RMI, WS* or other code. PS: I'm looking forward to hear your real world integration requirements for OFBiz.

March 25, 2013

by Bilgin Ibryam

· 18,230 Views · 1 Like

How to Configure diff and Merge Tool in Visual Studio Git Tools

If you are using Visual Studio plugin for Git, but you have also configured Git with MSys git, probably you could be surprised by some Visual Studio behavior.

March 20, 2013

by Ricci Gian Maria

· 75,542 Views · 1 Like

Using Facades to Decouple API Integrations

the most important part of building an integration with an api is actually writing the code that will connect with the web service and invoke its methods. i'll show you why using the façade pattern to decouple calls from your existing code is a good idea and help you identify what kind of problems you might be able to prevent. so, first things first, what is the façade pattern? a façade is an object that provides simple access to complex - or external - functionality. it might be used to group together several methods into a single one, to abstract a very complex method into several simple calls or, more generically, to decouple two pieces of code where there's a strong dependency of one over the other. what happens when you develop api calls inside your code and, suddenly, the api is upgraded and some of its methods or parameters change? you'll have to change your application code to handle those changes. also, by changing your internal application code, you might have to change the way some of your objects behave. it is easy to overlook every instance and can require you to doublecheck multiple lines of code. there's a better way to keep api calls up-to-date. by writing a façade with the single responsibility of interacting with the external web service, you can defend your code from external changes. now, whenever the api changes, all you have to do is update your façade. your internal application code will remain untouched. from a test driven development point-of-view, using a facade offers a big advantage. you're now able to write simple tests against the façade without affecting your internal code test results. by using this strategy, you'll be able to know immediately whenever an api is not working as you expected and make the necessary changes to the façade. this is the approach we follow at cloudwork when building integrations between any third-party apis. api façades act as tight compartments that protect the rest of the application from external changes and simplify the way we interact with different web services. this article is cross-posted at using facades to decouple api integrations .

March 13, 2013

by Bruno Pedro

· 12,236 Views

In-Memory Data Grids

Introduction The IT buzzword of 2012 is without a doubt Big Data. It’s new and here to stay, and for a good reason. Big data is data that exceeds the processing capacity of conventional database systems. Great examples are CERN with the Large Hadron Collider, whose experiments generate 25 petabytes of data annually, or Walmart, which handles more than one million customer transaction every hour. Problems These vast amounts of data leave us with two problems. Problem 1: To gain value from this data, one must choose an alternative way to process it. The value of big data to an organization falls into two categories: analytical use, and enabling new products. Big data analytics can reveal insights hidden previously by data too costly to process, such as peer influence among customers, revealed by analyzing shoppers’ transactions, social and geographical data. Being able to process every item of data in reasonable time removes the troublesome need for sampling and promotes an investigative approach to data, in contrast to the somewhat static nature of running predetermined reports. Problem 2: The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. Remember the CERN case where the LHC produces over 25 Petabytes of data annually? No “classic” database architecture or setup is capable of holding these amounts of data. Solutions Fortunately, both problems can be solved by implementing the correct infrastructure and rethinking data storage. There are two critical factors in Big Data environments: size and speed. We already discussed the vast amounts of data and desire to be able to access and process the data fast. The latter is the main differentiator from more traditional data warehouses. Just imagine what you can do when you can access all your data real-time. Enter big data. A common Big Data implementation is an in-memory data grid that lives in a distributed cluster, ensuring both speed, by storing data in-memory, and capacity by using scalability features provided by a cluster. As a bonus, availability is ensured by using a distributed cluster. As for the data storage, there are typically two kinds: in-memory databases and in-memory data grids. But first some background. It is not a new attempt to use main memory as a storage area instead of a disk. In our daily lives there are numerous examples of main memory databases (MMDB), as they perform much faster than disk-based databases. An every day example is a mobile phone. When you SMS or call someone most mobile service providers use MMDB to get the information on your contact as soon as possible. The same applies to your phone. When someone calls you, the caller details are looked up in the contacts application, usually providing a name and sometimes a picture. In memory data grids In Memory Data Grid (IMDG) is the same as MMDB in that it stores data in main memory, but it has a totally different architecture. The features of IMDG can be summarized as follows: Data is distributed and stored on multiple servers. Each server operates in the active mode. A data model is usually object-oriented (serialized) and non-relational. According to the necessity, you often need to add or reduce servers. No traditional database features such as tables. In other words, IMDG is designed to store data in main memory, ensure scalability and store an object itself. These days, there are many IMDG products, both commercial and open source. Some of the most commonly used products are: Hazelcast (http://www.hazelcast.com) JBoss Infinispan (http://www.jboss.org/infinispan) GridGain DataGrid (http://www.gridgain.com/features/in-memory-data-grid/) VMware Gemfire (http://www.vmware.com/nl/products/application-platform/vfabric-gemfire/overview.html) Oracle Coherence (http://www.oracle.com/technetwork/middleware/coherence/overview/index.html) Gigaspaces XAP (http://www.gigaspaces.com/datagrid) Terracotta Enterprise Suite (http://terracotta.org/products/enterprise-suite) Why Memory? The main reasons for using main memory for data storage are once again the two main themes of Big Data: speed and capacity. The processing performance of main memory is 800 times faster than an HDD and up to 40 times faster than an. Moreover, the latest x86 server supports main memory of hundreds of GB per server. It is said that the limit of a traditional processing database’s (OLTP) data capacity is approximately 1 TB and that the OLTP processing data capacity would not increment well. If servers using main memory of 1 TB or larger become more commonly used, you will be able to conduct operations with the entire data placed in main memory, at least in the field of OLTP. IMDG Architecture To use main memory as a storage area, two weak points should be overcome: Limited capacity: involves data that exceeds the maximum capacity of the main memory of the server Reliability: involves data loss in case of a (system) failure. IMDG overcomes the limit of capacity by ensuring horizontal scalability using a distributed architecture, and resolves the issue of reliability through a replication system as part of the grid (or a distributed cluster). Now let’s discuss how an IMDG actually works. First of all, it is important to understand that an IMDG is not the same as an in-memory database, also referred to as MMDB (main memory databases). Typical examples of MMDBs are Oracle TimesTen or Sap Hana. MMDBs are full database products that simply reside in memory. As a result of being a full-blown database, they also carry the weight and overhead of database management features. IMDG is different. No tables, indexes, triggers, stored procedures, process managers etc. Just plain storage. The data model used in IMDG is key-value pairs. A key-value pair is a list with only two parts: a key and a value. The key can be used for storing and retrieving the values in the list. A key can be compared to the index or primary key of a table in a database. Note that IMDG are closely tied to development environments such as Java as the key-value pairs are represented by the structures provided by such a programming environment. Most IMDGs are written in Java, and can only be used within other Java applications. Therefore, the values of key-value pairs can be anything supported by Java, ranging from simple data types such as a string or number, to complex objects. This overcomes the two important hurdles: as you can store complex Java objects as value, there’s no need to translate these objects into a relational datamodel (which is the case in more traditional applications using a database for storage). Furthermore, the seeming limitation of being able to store only one value per key, is actually no limitation at all. Large memory sizes Most of the products introduced above use Java as an implementation language. Java reserves and uses a part of the RAM (internal memory) for dynamic memory allocation. This reserved memory space is called the Java heap. All runtime objects created by a Java application are stored in heap. Using large amounts of data causes two problems. Size limitation: By default, the heap size is 128 MB, but for current business applications, this limit is reached easily. Once the heap is “full”, no new objects can be created and the Java application will show some nasty errors. Performance: It is possible to increase the size of the heap, but this introduces some new problems. When a heap reaches a size of more than 4 gigabytes, Java will have serious issues with memory managements, causing your application to slow down or even freeze. Java has a feature called Garbage Collector, which periodically scans the heap and checks each object if it is still valid and being used. If not, the garbage collector removes the object and defragments the newly available space. The problem is, the larger the heap size, the more work to do for the garbage collector, resulting in performance degradation. Imagine a large bank has a Java application that manages customers, accounts and transactions. We have seen that an IMDG allows the application to store and access all data very quickly by caching it in memory, instead of storing the data in relatively slow databases. Let’s assume the combined data has a size of 40 gigabytes. Storing it in heap is simply not possible, considering the performance penalties of Java’s memory management capabilities. The graph below illustrates the garbage collection pause time when placing cached data in heap: Terracotta’s BigMemory product has a method to overcome these limitations. The method is to use an off-heap memory (direct buffer). Data will not be stored in Java’s heap, but directly in the available internal memory (RAM). Since, this is not subject to Java’s garbage collector, there are no performance penalties. The differences on performance are significant, as can be seen in the graph below: Using off-heap storage has some major benefits: You can use all the available memory on your machine, not just the memory that is allocated to the heap (usually less that 512 Mb). This allows you to store more data in a in-memory data grid, greatly speeding up your application. The heap can be relieved by storing data in native memory, speeding up Java applications as less heap space has to be garbage collected. Clustering, fail over and high availability So far, we have seen IMDG features that are applicable to a single server. However, the real power of IMDG lies in it’s networking and clustering capabilities, providing features as data replication, data synchronization between clients, fail over and high availability. To achieve this, a cluster of servers (or server array) acts a backbone of the infrastructure. Applications (that still can have their own IMDG or off-heap cache) that are connected to the cluster can share, replicate and backup their data with either the cluster or other applications. The graph below depicts a typical setup using Terracotta's BigMemory: The caches on the application servers are usually referred to as “level 1” cache, while the data cache on the server array is referred to as “level 2” cache. There are many different scenarios possible for storing, clustering, synchronizing and replicating data. Covering all these topics goes far beyond the scope of this article. For more information, consult the technical documentation of the product of your choice. Conclusion Big Data brings us some new challenges. First of all, storing and accessing vast amounts of data makes us rethink traditional methods and technologies. Next, there’s the question what to do with all the available data. The potential value for marketing, financial and other businesses is huge. In order to facilitate Big Data, in-memory data grids are considered the best option. IMDGs with off-heap storage are even more powerful, allowing data centric enterprise application to overcome certain limits of the Java platform, such as memory and performance constraints. As the amount of data that (large) companies produce and store, grows exponentially, databases will hit a limit. Accessing your data without a performance penalty simply will not be possible. The answer to this is using an IMDG.

March 13, 2013

by Roy Prins

· 32,633 Views · 5 Likes

Maven's Non-Resolvable Parent POM Problem

Need help dealing with Maven's non-resolvable parent problem? Check out this post to learn how.

March 12, 2013

by Roger Hughes

· 462,957 Views · 8 Likes

8 Lessons in Deployment Tooling Lessons Learned

It didn’t take long. A few months after we released an open source continuous integration tool (Anthill) in 2001, we were asked, “It’s great that I have the build setup, now how do I deploy to the test lab?” That email started a clear transformation in our thinking. Lesson 1: Builds generally exist to be tested or released to customers.The corollary is: Continuous integration is not about build, it is about quality and checking quality generally requires a deployment or six. In 2005, we updated our AnthillPro 2.5 with a shiny new deployment capability. It worked ok, but with that generation of tool being so build oriented, it was never a clean fit. Lesson 2: Deployments are a serious challenge, and can not be bolted on to a build tool. This lesson has been reinforced over the years as we’ve watched the results of various tools tacking on deployments. In 2006, we released AnthillPro 3. Oh boy, what a change. Now deployments and other stuff you do to builds days and weeks later were their own thing. Demonstrating a stunning lack of marketing savvy, we called them “non-originating workflows” (processes that don’t make builds) and declared a new type of tool, an “Application Lifecycle Automation Server”. Today, you’d call it a Continuous Delivery tool (thank you Jez and Dave for the better name). To my knowledge, AnthillPro was the first tool to really take a build lifecycle or build pipeline seriously and had the ambition to manage from continuous integration build through to production release quickly and with a solid audit trail. That brings us to Lesson 3: When you come from the development side of the house as either an engineer or a vendor, you have a lot to learn about audit and security. Around the same time we also learned a great deal about the Dev / Ops gap. While some of our customers implemented the tool as intended, most of the time it was used either by a dev organization doing continuous integration and deployment to the early test environments or a production support / release management group that only did the “official” builds and focused on the production release. The failure of the developer initiated efforts to get to production provided a key insight. Lesson 4: When it comes to deployment automation, start with production, and work your way to the lower environments, treating them as a simpler case of the hard problem. We also learned about the limitations of a build pipeline approach in complex applications. Lesson 5: Deployments of complex systems often require a coordinated release of many builds. The need to start at production style deployments (and not care as much about the build) and the emphasis on deployment time dependencies focused our efforts on a deployment centric tool, uDeploy to compliment the pipeline approach in AnthillPro. Production and scaled usage come together to make the deployment tool a critical piece of infrastructure. If you lose a datacenter, can you recover the tool and use it to recover other applications into a backup datacenter?Lesson 6: Your release and deployment systems need to be configured with high availability and recoverability in mind. For the better part of the last decade, we’ve been trying to streamline deployments and kill off the release weekend. Many of customers have eliminated or shrunk those big events down to a more manageable size. Big releases though still can require the coordination of dozens of people across various teams. One time events like configuring the deployment tool with the password for a new database in each environment, or routine manual steps like putting eyeballs on the new production system before putting it out to the public still need to be managed. Lesson 7: Deployment is still just one part of release. With that in mind, we’ve added uRelease to the portfolio. Different people in the application delivery value chain need different tools. Developers need continuous integration and great testing. Change management needs great auditing around production deployments. Testers need their environments to be not broken, updated regularly and close enough to production to make their testing worthwhile. The patterns and needs overlap. Everyone needs deployments that work. Everyone benefits from understanding the lifecycle of a build (or other version) as well as what is in a given environment. Everyone benefits when release planning meetings are shorter. Lesson 8: No single tool is enough. An integrated toolchain that exchanges information is required. So there you go. Twelve years of tooling, eight with deployments. Summed up in six over-simplified lessons. I’m sure we’ll be updating this next year. Lesson 1: Builds generally exist to be tested or released to customers. Lesson 2: Deployments are a serious challenge, and can not be bolted on to a build tool. Lesson 3: When you come from the development side of the house as either an engineer or a vendor, you have a lot to learn about audit and security. Lesson 4: When it comes to deployment automation, start with production, and work your way to the lower environments, treating them as a simpler case of the hard problem. Lesson 5: Deployments of complex systems often require a coordinated release of many builds. Lesson 6: Release and deployment systems need to be configured with high availability and recoverability in mind. Lesson 7: Deployment is still just one part of release Lesson 8: No single tool does “DevOps”. An integrated toolchain that exchanges information is required. What are the biggest lessons you’ve learned about deployments in the last few years? Share in the comments below!

March 4, 2013

by Eric Minick

· 7,294 Views

SAP Integration with Talend Components / Connectors (BAPI, RFC, IDoc, BW, SOAP)

talend has several connectors to integrate sap systems. however, this guide is no introduction to talend’s sap components. instead, this guide helps to understand different alternatives to integrate sap systems with talend set up a local sap system configure talend studio for using sap components use talend’s sap wizard run a first talend job which connects to sap all further required information and example use cases for talend’s sap components should be available in the talend component guide at www.help.talend.com . if that’s not the case, please create a jira documentation ticket ( https://jira.talendforge.org/browse/doct )! now let’s take a look at different alternatives for integration of sap systems with talend. alternatives for sap integration three protocols exist for communication between sap and external programs: dynamic information and action gateway (diag): e.g. used by sap gui remote function call (rfc): a function call with input and output parameters (like a java interface) hypertext transfer protocol (http): internet standard the following alternatives are available for integrating sap systems using some of these protocols. file sap supports the direct import of files (call-transaction-program, batch-input, direct input). files have to be in a specific format to be imported. transformation and integration can be realized with talend’s various file components such as tfileinputdelimited. rfc remote function call is the proprietary sap ag interface for communication between a sap system and other sap or third-party compatible system over tcp/ip or cpi-c connections. remote function calls may be associated with sap software and abap programming, and provide a way for an external program (written in languages such as php, asp, java, or c, c++) to use data returned from the server. data transactions are not limited to getting data from the server, but can insert data into server records as well. sap can act as the client or server in an rfc call. a remote function call (rfc) is the call or remote execution of a remote function module in an external system. in the sap system, these functions are provided by the rfc interface system. the rfc interface system enables function calls between two sap systems, or between a sap system and an external system. tsapinput and tsapoutput are talend’s components to use rfcs. business application programming interface (bapi) a bapi is an object-oriented view on most data and transactions of a sap system (called “business objects”). object types of the business objects are stored in the business object repository (bor). bapis are always implemented as rfcs and therefore can be called the same way. additionally, they have the following characteristics (compared to rfcs): stable interface no view layer no exceptions, instead export parameter: “return” most business objects offer the following standard bapis: getlist getdetail change creationfromdata tsapinput and tsapoutput are talend’s components to use bapis. application link enabling (ale) application link enabling (ale) is used for asynchronous messaging between different systems via “intermediate documents (idoc)”. idoc is a sap document format for business transaction data transfers. it is used to realize distributed business processes. idoc is similar to xml in purpose, but differs in syntax. both serve the purpose of data exchange and automation in computer systems, but the idoc technology takes a different approach. while xml allows having some metadata about the document itself, an idoc is obligated to have information at its header like its creator, creation time, etc. while xml has a tag-like tree structure containing data and meta-data, idocs use a table with the data and meta-data. idocs also have a session that explains all the processes which the document passed or will pass, allowing one to debug and trace the status of the document. an idoc consists of control record (it contains the type of idoc, port of the partner, release of sap r/3 which produced the idoc, etc.) data records of different types. the number and type of segments is mostly fixed for each idoc type, but there is some flexibility (for example an sd order can have any number of items). status records containing messages such as 'idoc created', 'the recipient exists', 'idoc was successfully passed to the port', 'could not book the invoice because...' different idoc types are available to handle different types of messages. for example, the idoc format orders01 may be used for both purchase orders and order confirmations. tsapidocinput and tsapidocoutput are talend’s components to use ale / idoc. bapis can also be called asynchronously via ale. all new idocs are even based on bapis. soap web services sap supports soap web services. not just sap as java, but also sap as abap! integration can be realized with talend’s esb / web service components such as tesbrequest, tesbresponse, or tesbconsumer. installation of sap server and client installation can take about 6 to 8 hours, but it is an “all in one installation”, i.e. you can install it overnight. steps for installation: get yourself a windows 7 64 bit laptop or vm with 8+ gb ram and 50+gb free disc space get a sap community account (for free, just register): http://scn.sap.com/welcome download sap netweaver (software downloads --> sap netweaver main releases: http://www.sdn.sap.com/irj/scn/nw-downloads download current version of sap netweaver application server abap 64-bit trial install sap server: follow installation guide – a html website included in the download in root of extracted download folder (start.htm --> there click on “installation” link) install sap gui (rich client frontend): start.htm --> there click on “install sap gui” link and follow instructions download the sap jco for the operating system on which your connector is running. the sap jco is available for download from sap's website at http://service.sap.com/connectors . you must have an sapnet account to access the sap jco (if you do not already have one, contact your local sap basis administrator). usage of sap server hint: you have to use a windows user which has a password (as you need to enter windows credentials when stopping sap). if you have a windows user without a password (for instance if you use windows within a vm on your mac), sap cannot process these credentials (i.e. it cannot process an empty password field) --> change your windows password before starting sap start the management console (windows startmenu --> programs --> sap management console) start and stop the sap server (right click on “nsp” --> start / stop) default user: sap* (sap system super user) password: the one which you entered at installation of sap netweaver, e.g. admin123 usage of sap client a sap client should be used to get information about the sap system (functions, data, etc.) similarly to using e.g. mysql workbench to get information from a mysql database. sap gui (view layer) communicates with sap as abap (business logic layer). the application server communicates with the relational database (db layer). different clients are available for sap: sap gui windows sap gui java web browser external rfc-program for local development demos, sap gui windows is probably the best alternative. start sap gui windows by: clicking shortcut “windows start menu --> sap frontend --> sap logon” entering username and password clicking logon sap transactions in sap, you call sap programs via sap transaction codes. important transactions codes are for example: bapi: bapi explorer, view all sap bapi's se16: data browser, view/add table data se38: program editor here is a list of several other important transaction codes: http://www.sapdev.co.uk/tcodes/tcodes.htm installation of demo data the sap installation includes some demo data. as most people do not want to install “real” sap modules such as sap fi, sap crm or sap bi on their local system, this demo data is perfect for demos using talend’s sap connectors. to install the flight demo on a local sap system, you just have to open the abap editor (transaction: se38) and execute the program sapbc_data_generator. this program generates example data within the flight tables and does some further initializations. here is a good tutorial with more information and how to test the flight application: http://help.sap.com/saphelp_erp60_sp/helpdata/de/db/7c623cf568896be10000000a11405a/content.htm configuration of talend studio to use sap components talend’s sap components are already included in the studio. however, two further steps are required to be able to use them: copy sapjco3.dll to the directory c:/windows/system32 sap java connector jar must be added copy sapjco3.jar to the directory “talend/studio/lib/java” (re-) start talend studio check if sap library is added successfully open view “talend modules” (eclipse --> windows --> show view --> talend --> modules) sort by column “context” look for “tsap*” contexts and check if sapjco3.jar has status “installed” usage of sap components with talend studio this section describes how to use talend’s sap components and the sap wizard in general (using one specific example for calling a bapi). detailed descriptions of all sap components (for using bapis, rfcs, idocs, bw, etc.) are available in the documentation talend_components_rg_x.y.z.pdf at www.help.talend.com . connection to a sap system a connection to a sap system can be done “built-in” or via “metadata --> sap connections” (the latter only in enterprise version). using the latter has several advantages: reuse connection configuration quick check if connection to sap works wizards for retrieving functions from sap (instead of handwriting without wizard) quick test with test parameters if function works before finishing development lifecycle for a sap job development lifecycle for sap job: create connection (if not existing yet) right click on metadata --> sap connection create sap connection follow wizard sap jco version: 3 client: “001” userid: “sap*” password: “admin123” --> as you defined it while installation language: “en” hostname: “localhost” system number: “00” retrieve function (bapi / rfc) right click on created connection click on “retrieve sap function” enter search filter (e.g. bapi_fl*) click on “search” select and double click on your function (e.g bapi_flcust_getlist) you see all input, output and table parameters for this sap function click on “test in” --> here you see parameters in more detail: you now have to define which input and output parameters you want to use --> remove all other by selecting them and clicking “remove” button hint: if you do not remove an input parameter, you usually have to enter a value for it! select the output type - can be a single (single record), a table (list of records), or a structure output hint: difference between table and structure in sap: http://www.sapfans.com/forums/viewtopic.php?f=12&t=119794 if you want to do a quick test: enter values for input parameters (if there are any for your function call), then click “launch” button in this example, there is only an optional input parameter max_rows you should see data in the output fields in this example, you see the record with custname “sap ag” and street “neurottstr. 16” click “finish” button under “metadata --> sap connections --> “your connection” --> sap functions: there you can now see your function (in this example: bapi_flcust_getlist) create sap job drag&drop the created function into a job (without the wizard, you also can enter all data by hand) tsapinput component is proposed automatically. click ok to add it to your job go to “initialize input” and add parameter values in this example, there is just the parameter “max_rows” hint: the parameter value can be changed from a hardcoded value to a variable, of course (just click control space on your keyboard to get access to all available variables via code completion in your studio) go to the tsapinput component and add the desired output mapping (i.e. which values you want to process further with other components scroll to the bottom to “outputs” add the correct table / structure name (in this example: "customer_list") click on mapping (which is empty and has to be filled) click on “mapping”, then click on “…” add the wanted output columns of your sap function add the same names at the column “schema xpathqueries” (do not forget the double quotes here!) click “ok” button connect the tsapinput component to a tlogrowcomponent and synchronize the schema hint: always try out if this works before adding further logic to your job! run and test your job (you will see five rows logged (as you have configured max_rows = 5 that's it. now enjoy talend's sap components :-) best regards, kai wähner (twitter: @kaiwaehner) content from my blog: http://www.kai-waehner.de/blog/2013/03/03/sap-integration-with-talend-components-connectors-bapi-rfc-idoc-bw-soap/

March 4, 2013

by Kai Wähner

CORE

· 32,885 Views · 1 Like

Spring, JMS, Listener Adapters, and Containers

In order to receive JMS messages, Spring provides the concept of message listener containers. These are beans that can be tied to receive messages that arrive at certain destinations. This post will examine the different ways in which containers can be configured. A simple example is below where the DefaultMessageListenerContianer has been configured to watch one queue (the property jms.queue.name) and has a reference to a myMessageListener bean which implements the MessageListener interface (ie onMessage): This is all very well but means that the myMessageListener bean will have to handle the JMS Message object and process accordingly depending upon the type of javax.jms.Message and its payload. For example: if (message instanceof MapMessage) { // cast, get object, do something } An alternative is to use a MessageListenerAdapter. This class abstracts away the above processing and leaves your code to deal with just the message's payload. For example: The delegate is a reference to a myMessageReceiverDelegate bean which has one or more methods called processMessage. It does not need to implement the MessageListener interface. This method can be overload to handle different payload types. Spring behind the scenes will determine which gets called. For example: public void processMessage(final HashMap message) { // do something } public void processMessage(final String message) { // do something } For the given approach though, only one queue can be tied to the container. Another approach is to tie many listeners (therefore many queues) to the one container, The below Spring XML, using the jms namespace, shows how two listeners for different queues can be tied to one container: The myMessageReceiverDelegate bean is treated as an adapter delegate, therefore does not need to implement the MessageListener interface. Each listener can have a different delegate but for the above example, all messages arriving at the two queues are processed by the one receiver bean ie myMessageReceiverDelegate. If there is a need to check the message type and extract the payload, then the listener can use a class which implements the MessageListener interface (eg the myMessageListener bean used in the first example). The onMessage method will then be called when messages arrive at the specified destination:

February 28, 2013

by Geraint Jones

· 72,556 Views · 2 Likes

Synchronising Multithreaded Integration Tests

Testing threads is hard, very hard and this makes writing good integration tests for multithreaded systems under test... hard. This is because in JUnit there's no built in synchronisation between the test code, the object under test and any threads. This means that problems usually arise when you have to write a test for a method that creates and runs a thread. One of the most common scenarios in this domain is in making a call to a method under test, which starts a new thread running before returning. At some point in the future when the thread's job is done you need assert that everything went well. Examples of this scenario could include asynchronously reading data from a socket or carrying out a long and complex set of operations on a database. For example, the ThreadWrapper class below contains a single public method: doWork(). Calling doWork() sets the ball rolling and at some point in the future, at the discretion of the JVM, a thread runs adding data to a database. public class ThreadWrapper { /** * Start the thread running so that it does some work. */ public void doWork() { Thread thread = new Thread() { /** * Run method adding data to a fictitious database */ @Override public void run() { System.out.println("Start of the thread"); addDataToDB(); System.out.println("End of the thread method"); } private void addDataToDB() { // Dummy Code... try { Thread.sleep(4000); } catch (InterruptedException e) { e.printStackTrace(); } } }; thread.start(); System.out.println("Off and running..."); } } A straightforward test for this code would be to call the doWork() method and then check the database for the result. The problem is that, owing to the use of a thread, there's no co-ordination between the object under test, the test and the thread. A common way of achieving some co-ordination when writing this kind of test is to put some kind of delay in between the call to the method under test and checking the results in the database as demonstrated below: public class ThreadWrapperTest { @Test public void testDoWork() throws InterruptedException { ThreadWrapper instance = new ThreadWrapper(); instance.doWork(); Thread.sleep(10000); boolean result = getResultFromDatabase(); assertTrue(result); } /** * Dummy database method - just return true */ private boolean getResultFromDatabase() { return true; } } In the code above there is a simple Thread.sleep(10000) between two method calls. This technique has the benefit of being incredabile simple; however it's also very risky. This is because it introduces a race condition between the test and the worker thread as the JVM makes no guarantees about when threads will run. Often it'll work on a developer's machine only to fail consistently on the build machine. Even if it does work on the build machine it atificially lengthens the duration of the test; remember that quick builds are important. The only sure way of getting this right is to synchronise the two different threads and one technique for doing this is to inject a simple CountDownLatch into the instance under test. In the example below I've modified the ThreadWrapper class's doWork() method adding the CountDownLatch as an argument. public class ThreadWrapper { /** * Start the thread running so that it does some work. */ public void doWork(final CountDownLatch latch) { Thread thread = new Thread() { /** * Run method adding data to a fictitious database */ @Override public void run() { System.out.println("Start of the thread"); addDataToDB(); System.out.println("End of the thread method"); countDown(); } private void addDataToDB() { try { Thread.sleep(4000); } catch (InterruptedException e) { e.printStackTrace(); } } private void countDown() { if (isNotNull(latch)) { latch.countDown(); } } private boolean isNotNull(Object obj) { return latch != null; } }; thread.start(); System.out.println("Off and running..."); } } he Javadoc API describes a count down latch as: A synchronization aid that allows one or more threads to wait until a set of operations being performed in other threads completes. A CountDownLatch is initialized with a given count. The await methods block until the current count reaches zero due to invocations of the countDown() method, after which all waiting threads are released and any subsequent invocations of await return immediately. This is a one-shot phenomenon -- the count cannot be reset. If you need a version that resets the count, consider using a CyclicBarrier. A CountDownLatch is a versatile synchronization tool and can be used for a number of purposes. A CountDownLatch initialized with a count of one serves as a simple on/off latch, or gate: all threads invoking await wait at the gate until it is opened by a thread invoking countDown(). A CountDownLatchinitialized to N can be used to make one thread wait until N threads have completed some action, or some action has been completed N times. A useful property of a CountDownLatch is that it doesn't require that threads calling countDown wait for the count to reach zero before proceeding, it simply prevents any thread from proceeding past an await until all threads could pass. The idea here is that the test code will never check the database for the results until the run() method of the worker thread has called latch.countdown(). This is because the test code thread is blocking at the call to latch.await(). latch.countdown() decrements latch's count and once this is zero the blocking call the latch.await() returns and the test code continues executing, safe in the knowledge that any results which should be in the database, are in the database. The test can then retrieve these results and make a valid assertion. Obviously, the code above merely fakes the database connection and operations. The thing is you may not want to, or need to, inject a CountDownLatch directly into your code; after all it's not used in production and it doesn't look particularly clean or elegant. One quick way around this is to simply make the doWork(CountDownLatch latch) method package private and expose it through a public doWork() method. public class ThreadWrapper { /** * Start the thread running so that it does some work. */ public void doWork() { doWork(null); } @VisibleForTesting void doWork(final CountDownLatch latch) { Thread thread = new Thread() { /** * Run method adding data to a fictitious database */ @Override public void run() { System.out.println("Start of the thread"); addDataToDB(); System.out.println("End of the thread method"); countDown(); } private void addDataToDB() { try { Thread.sleep(4000); } catch (InterruptedException e) { e.printStackTrace(); } } private void countDown() { if (isNotNull(latch)) { latch.countDown(); } } private boolean isNotNull(Object obj) { return latch != null; } }; thread.start(); System.out.println("Off and running..."); } } The code above uses Google's Guava @VisibleForTesting annotation to tell us that the doWork(CountDownLatch latch) method visibility has been relaxed slightly for testing purposes. Now I realise that making a method call package private for testing purposes in highly controversial; some people hate the idea, whilst others include it everywhere. I could write a whole blog on this subject (and may do one day), but for me it should be used judiciously, when there's no other choice, for example when you're writing characterisation tests for legacy code. If possible it should be avoided, but never ruled out. After all tested code is better than untested code. With this in mind the next iteration of ThreadWrapper designs out the need for a method marked as @VisibleForTesting together with the need to inject a CountDownLatch into your production code. The idea here is to use the Strategy Pattern and separate the Runnable implementation from the Thread. Hence, we have a very simple ThreadWrapper public class ThreadWrapper { /** * Start the thread running so that it does some work. */ public void doWork(Runnable job) { Thread thread = new Thread(job); thread.start(); System.out.println("Off and running..."); } } and a separate job: public class DatabaseJob implements Runnable { /** * Run method adding data to a fictitious database */ @Override public void run() { System.out.println("Start of the thread"); addDataToDB(); System.out.println("End of the thread method"); } private void addDataToDB() { try { Thread.sleep(4000); } catch (InterruptedException e) { e.printStackTrace(); } } } You'll notice that the DatabaseJob class doesn't use a CountDownLatch. How is it synchronised? The answer lies in the test code below... public class ThreadWrapperTest { @Test public void testDoWork() throws InterruptedException { ThreadWrapper instance = new ThreadWrapper(); CountDownLatch latch = new CountDownLatch(1); DatabaseJobTester tester = new DatabaseJobTester(latch); instance.doWork(tester); latch.await(); boolean result = getResultFromDatabase(); assertTrue(result); } /** * Dummy database method - just return true */ private boolean getResultFromDatabase() { return true; } private class DatabaseJobTester extends DatabaseJob { private final CountDownLatch latch; public DatabaseJobTester(CountDownLatch latch) { super(); this.latch = latch; } @Override public void run() { super.run(); latch.countDown(); } } } The test code above contains an inner class DatabaseJobTester, which extends DatabaseJob. In this class the run() method has been overridden to include a call to latch.countDown() after our fake database has been updated via the call to super.run(). This works because the test passes a DatabaseJobTester instance to the doWork(Runnable job) method adding in the required thread testing capability. The idea of sub-classing objects under test is something I've mentioned before in one of my blogs on testing techniques and is a really powerful technique. So, to conclude: Testing threads is hard. Testing anonymous inner classes is almost impossible. Using Thead.sleep(...) is a risky idea and should be avoided. You can refactor out these problems using the Strategy Pattern. Programming is the Art of Making the Right Decision ...and that relaxing a method's visibility for testing may or may not be a good idea, but more on that later... The code above is available on Github in the captain debug repository (git://github.com/roghughe/captaindebug.git) under the unit-testing-threads project.

February 13, 2013

by Roger Hughes

· 13,933 Views · 12 Likes