Tools Resources

The Latest Tools Topics

JAX RS: Streaming a Response using StreamingOutput

A couple of weeks ago Jim and I were building out a neo4j unmanaged extension from which we wanted to return the results of a traversal which had a lot of paths. Our code initially looked a bit like this: package com.markandjim @Path("/subgraph") public class ExtractSubGraphResource { private final GraphDatabaseService database; public ExtractSubGraphResource(@Context GraphDatabaseService database) { this.database = database; } @GET @Produces(MediaType.TEXT_PLAIN) @Path("/{nodeId}/{depth}") public Response hello(@PathParam("nodeId") long nodeId, @PathParam("depth") int depth) { Node node = database.getNodeById(nodeId); final Traverser paths = Traversal.description() .depthFirst() .relationships(DynamicRelationshipType.withName("whatever")) .evaluator( Evaluators.toDepth(depth) ) .traverse(node); StringBuilder allThePaths = new StringBuilder(); for (org.neo4j.graphdb.Path path : paths) { allThePaths.append(path.toString() + "\n"); } return Response.ok(allThePaths.toString()).build(); } } We then compiled that into a JAR, placed it in ‘plugins’ and added the following line to ‘conf/neo4j-server.properties’: org.neo4j.server.thirdparty_jaxrs_classes=com.markandjim=/unmanaged After we’d restarted the neo4j server we were able to call this end point using cURL like so: $ curl -v http://localhost:7474/unmanaged/subgraph/1000/10 This approach works quite well but Jim pointed out that it was quite inefficient to load all those paths up into memory so we thought it would be quite cool if we could stream it as we got to each path. Traverser wraps an iterator so we are lazily evaluating the result set in any case. After a bit of searching we came StreamingOutput which is exactly what we need. We adapted our code to use that instead: package com.markandjim @Path("/subgraph") public class ExtractSubGraphResource { private final GraphDatabaseService database; public ExtractSubGraphResource(@Context GraphDatabaseService database) { this.database = database; } @GET @Produces(MediaType.TEXT_PLAIN) @Path("/{nodeId}/{depth}") public Response hello(@PathParam("nodeId") long nodeId, @PathParam("depth") int depth) { Node node = database.getNodeById(nodeId); final Traverser paths = Traversal.description() .depthFirst() .relationships(DynamicRelationshipType.withName("whatever")) .evaluator( Evaluators.toDepth(depth) ) .traverse(node); StreamingOutput stream = new StreamingOutput() { @Override public void write(OutputStream os) throws IOException, WebApplicationException { Writer writer = new BufferedWriter(new OutputStreamWriter(os)); for (org.neo4j.graphdb.Path path : paths) { writer.write(path.toString() + "\n"); } writer.flush(); } }; return Response.ok(stream).build(); } As far as I can tell the only discernible difference between the two approaches is that you get an almost immediate response from the streamed approached whereas the first approach has to put everything in the StringBuilder first. Both approaches make use of chunked transfer encoding which according to tcpdump seems to have a maximum packet size of 16332 bytes: 00:10:27.361521 IP localhost.7474 > localhost.55473: Flags [.], seq 6098196:6114528, ack 179, win 9175, options [nop,nop,TS val 784819663 ecr 784819662], length 16332 00:10:27.362278 IP localhost.7474 > localhost.55473: Flags [.], seq 6147374:6163706, ack 179, win 9175, options [nop,nop,TS val 784819663 ecr 784819663], length 16332

July 10, 2013

by Mark Needham

· 114,333 Views · 4 Likes

Easy Messaging with STOMP over WebSockets using ActiveMQ and HornetQ

Expose two very popular JMS implementations, Apache ActiveMQand JBoss HornetQ, to be available to web front-end (JavaScript) using STOMP over Websockets.

July 2, 2013

by Andriy Redko

· 52,846 Views · 3 Likes

git: Having a branch/tag with the same name (error: dst refspec matches more than one.)

Andres and I recently found ourselves wanting to delete a remote branch which had the same name as a tag and therefore the normal way of doing that wasn’t worked out as well as we’d hoped. I created a dummy repository to recreate the state we’d got ourselves into: $ echo "mark" > README $ git commit -am "readme" $ echo "for the branch" >> README $ git commit -am "for the branch" $ git checkout -b same Switched to a new branch 'same' $ git push origin same Counting objects: 5, done. Writing objects: 100% (3/3), 263 bytes, done. Total 3 (delta 0), reused 0 (delta 0) To ssh://[email protected]/markhneedham/branch-tag-test.git * [new branch] same -> same $ git checkout master $ echo "for the tag" >> README $ git commit -am "for the tag" $ git tag same $ git push origin refs/tags/same Counting objects: 5, done. Writing objects: 100% (3/3), 266 bytes, done. Total 3 (delta 0), reused 0 (delta 0) To ssh://[email protected]/markhneedham/branch-tag-test.git * [new tag] same -> same We wanted to delete the remote ‘same’ branch and the following command would work if we hadn’t created a tag with the same name. Instead it throws an error: $ git push origin :same error: dst refspec same matches more than one. error: failed to push some refs to 'ssh://[email protected]/markhneedham/branch-tag-test.git' We learnt that what we needed to do was refer to the full path for the branch when trying to delete it remotely: $ git push origin :refs/heads/same To ssh://[email protected]/markhneedham/branch-tag-test.git - [deleted] same To delete the tag we could do the same thing: $ git push origin :refs/tags/same remote: warning: Deleting a non-existent ref. To ssh://[email protected]/markhneedham/branch-tag-test.git - [deleted] same Of course the tag and branch still exist locally: $ ls -alh .git/refs/heads/ total 16 drwxr-xr-x 4 markhneedham wheel 136B 13 Jun 23:09 . drwxr-xr-x 5 markhneedham wheel 170B 13 Jun 22:39 .. -rw-r--r-- 1 markhneedham wheel 41B 13 Jun 23:08 master -rw-r--r-- 1 markhneedham wheel 41B 13 Jun 23:08 same $ ls -alh .git/refs/tags/ total 8 drwxr-xr-x 3 markhneedham wheel 102B 13 Jun 23:08 . drwxr-xr-x 5 markhneedham wheel 170B 13 Jun 22:39 .. -rw-r--r-- 1 markhneedham wheel 41B 13 Jun 23:08 same So we got rid of them as well: $ git checkout master Switched to branch 'master' $ git branch -d same Deleted branch same (was 08ad88c). $ git tag -d same Deleted tag 'same' (was 1187891) And now they are gone: $ ls -alh .git/refs/heads/ total 8 drwxr-xr-x 3 markhneedham wheel 102B 13 Jun 23:16 . drwxr-xr-x 5 markhneedham wheel 170B 13 Jun 22:39 .. -rw-r--r-- 1 markhneedham wheel 41B 13 Jun 23:08 master $ ls -alh .git/refs/tags/ total 0 drwxr-xr-x 2 markhneedham wheel 68B 13 Jun 23:16 . drwxr-xr-x 5 markhneedham wheel 170B 13 Jun 22:39 .. Out of interest we’d ended up with this situation by mistake rather than by design but it was still fun to do a little bit of git digging to figure out how to solve the problem we’d created for ourselves.

June 17, 2013

by Mark Needham

· 21,468 Views

NetBeans IDE 7.3.1 Now Available with Java EE 7 Support

NetBeans IDE 7.3.1 is an update to NetBeans IDE 7.3 and includes the following highlights: Support for Java EE 7 development Deployment to GlassFish 4 Support for major Java EE 7 specifications: JSF 2.2, JPA 2.1, JAX-RS 2.0, WebSocket 1.0 and more Support for WebLogic 12.1.2 and JBoss 7.x Integration of recent patches There are two ways to get the recent changes: To use the new Java EE 7 support, it is recommended to download and install NetBeans IDE 7.3.1. To get only the integration of recent patches: Launch your current installation of NetBeans IDE 7.3. An update notification will appear in the IDE. Click the notification box to install the updates. OR to perform the update manually, in the IDE select Help-->Check for Updates. NetBeans IDE 7.3.1 is available in English, Brazilian Portuguese, Japanese, Russian, and Simplified Chinese.

June 12, 2013

by Tinu Awopetu

· 9,448 Views

Hadoop REST API - WebHDFS

Hadoop provides a Java native API to support file system operations..

June 3, 2013

by Istvan Szegedi

· 57,498 Views · 5 Likes

Avro's Built-In Sorting

avro has a little-known gem of a feature which allows you to control which fields in an avro record are used for partitioning , sorting and grouping in mapreduce. the following figure gives a quick refresher as to what these terms mean. oh, and don’t take the placement of the “sorting” literally - sorting actually occurs on both the map and reduce side - but it’s always performed in the context of a specific partition (i.e. for a specific reducer). by default all the fields in an avro map output key are used for partitioning, sorting and grouping in mapreduce. let’s walk through an example and see how this works. you’ll begin with a simple schema github source : {"type": "record", "name": "com.alexholmes.avro.weathernoignore", "doc": "a weather reading.", "fields": [ {"name": "station", "type": "string"}, {"name": "time", "type": "long"}, {"name": "temp", "type": "int"}, {"name": "counter", "type": "int", "default": 0} ] } we’re going to see what happens when we run this code against a small sample data set, which we’ll generate using avro code github source : file input = tmpfolder.newfile("input.txt"); avrofiles.createfile(input, weathernoignore.schema$, arrays.aslist( weathernoignore.newbuilder().setstation("sfo").settime(1).settemp(3).build(), weathernoignore.newbuilder().setstation("iad").settime(1).settemp(1).build(), weathernoignore.newbuilder().setstation("sfo").settime(2).settemp(1).build(), weathernoignore.newbuilder().setstation("sfo").settime(1).settemp(2).build(), weathernoignore.newbuilder().setstation("sfo").settime(1).settemp(1).build() ).toarray()); to understand how avro is partitioning, sorting and grouping the data, we’ll write an identity mapper and reducer, with a small enhancement to the reducer to increment the counter field for each record we see in an individual reducer instance github source : package com.alexholmes.avro.sort.basic; import com.alexholmes.avro.weathernoignore; import org.apache.avro.mapred.avrokey; import org.apache.avro.mapred.avrovalue; import org.apache.avro.mapreduce.avrojob; import org.apache.avro.mapreduce.avrokeyinputformat; import org.apache.avro.mapreduce.avrokeyoutputformat; import org.apache.hadoop.fs.path; import org.apache.hadoop.io.nullwritable; import org.apache.hadoop.mapreduce.job; import org.apache.hadoop.mapreduce.mapper; import org.apache.hadoop.mapreduce.reducer; import org.apache.hadoop.mapreduce.lib.input.fileinputformat; import org.apache.hadoop.mapreduce.lib.output.fileoutputformat; import java.io.ioexception; public class avrosort { private static class sortmapper extends mapper, nullwritable, avrokey, avrovalue> { @override protected void map(avrokey key, nullwritable value, context context) throws ioexception, interruptedexception { context.write(key, new avrovalue(key.datum())); } } private static class sortreducer extends reducer, avrovalue, avrokey, nullwritable> { @override protected void reduce(avrokey key, iterable> values, context context) throws ioexception, interruptedexception { int counter = 1; for (avrovalue weathernoignore : values) { weathernoignore.datum().setcounter(counter++); context.write(new avrokey(weathernoignore.datum()), nullwritable.get()); } } } public boolean runmapreduce(final job job, path inputpath, path outputpath) throws exception { fileinputformat.setinputpaths(job, inputpath); job.setinputformatclass(avrokeyinputformat.class); avrojob.setinputkeyschema(job, weathernoignore.schema$); job.setmapperclass(sortmapper.class); avrojob.setmapoutputkeyschema(job, weathernoignore.schema$); avrojob.setmapoutputvalueschema(job, weathernoignore.schema$); job.setreducerclass(sortreducer.class); avrojob.setoutputkeyschema(job, weathernoignore.schema$); job.setoutputformatclass(avrokeyoutputformat.class); fileoutputformat.setoutputpath(job, outputpath); return job.waitforcompletion(true); } } if you look at the output of the job below, you’ll see that the output is sorted across all the fields, and that the sorting is in field ordinal order. what this means is that when mapreduce is sorting these records, it compares the station field first, then the time field second, and so on according to the ordering of the fields in the avro schema. this is pretty much what you’d expect if you write your own complex writable type, and your comparator compared all the fields in order. {"station": "iad", "time": 1, "temp": 1, "counter": 1} {"station": "sfo", "time": 1, "temp": 1, "counter": 1} {"station": "sfo", "time": 1, "temp": 2, "counter": 1} {"station": "sfo", "time": 1, "temp": 3, "counter": 1} {"station": "sfo", "time": 2, "temp": 1, "counter": 1} oh, and before we move on notice that the value for the counter field is always 1 , meaning that each reducer was only fed a single key/vaue pair, which makes sense since our identity mapper only emitted a single value for each key, the keys are unique, and the mapreduce partitioner, sorter and grouper were using all the fields in the record. excluding fields for sorting avro gives us the ability to indicate that specific fields should be ignored when performing ordering functions. in mapreduce these fields are ignored for sorting/partitioning and grouping in mapreduce, which basically means that we have the ability to perform secondary sorting. let’s examine the following schema github source : {"type": "record", "name": "com.alexholmes.avro.weather", "doc": "a weather reading.", "fields": [ {"name": "station", "type": "string"}, {"name": "time", "type": "long"}, {"name": "temp", "type": "int", "order": "ignore"}, {"name": "counter", "type": "int", "order": "ignore", "default": 0} ] } it’s pretty much identical to the first schema, the only difference being that the last two fields are flagged as being “ignored” for sorting/partitioning/grouping. let’s run the same (other than modified to work with the different schema) mapreduce code github source as above against this new schema and examine the outputs. {"station": "iad", "time": 1, "temp": 1, "counter": 1} {"station": "sfo", "time": 1, "temp": 3, "counter": 1} {"station": "sfo", "time": 1, "temp": 2, "counter": 2} {"station": "sfo", "time": 1, "temp": 1, "counter": 3} {"station": "sfo", "time": 2, "temp": 1, "counter": 1} there are a couple of notable differences between this output, and the output from the previous schema which didn’t have any ignored fields. first, it’s clear that the temp field isn’t being used in the sorting, which makes sense since we specified that it should be ignored in the schema. however, more interestingly, note the value of the counter field. all records that had identical station and time values went to the same reducer invocation, evidenced by the increasing value of counter . this is essentially secondary sort! now, all of this greatness isn’t without some limitations: you can’t support two mapreduce jobs that use the same avro key, but have different sorting/partitioning/grouping requirements. although it’s conceivable that you could create a new instance of the avro schema and set the ignored flags for these fields yourself. the partitioner, sorter and grouping functions in mapreduce all work off of the same fields (i.e. they all ignore fields that set as ignored in the schema). this means that your options for secondary sorting are limited. for example, you wouldn’t be able to partition all stations to the same reducer, and then group by station and time. ordering uses a field’s ordinal position to determine its order within the overall set of fields to be ordered. in other words, in a two-field record, the first field is always compared before the second. there’s no way to change this behavior other than flipping the order of the fields in the record. having said all of that - the “ignoring fields” feature for sorting is pretty awesome, and something that will no doubt come in handy in my future mapreduce work.

May 29, 2013

by Alex Holmes

· 8,148 Views

Amazon S3 Parallel MultiPart File Upload

In this blog post, I will present a simple tutorial on uploading a large file to Amazon S3 as fast as the network supports. Amazon S3 is clustered storage service of Amazon. It is designed to make web-scale computing easier. Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, secure, fast, inexpensive infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize benefits of scale and to pass those benefits on to developers. For using Amazon services, you'll need your AWS access key identifiers, which AWS assigned you when you created your AWS account. The following are the AWS access key identifiers: Access Key ID (a 20-character, alphanumeric sequence) For example: 022QF06E7MXBSH9DHM02 Secret Access Key (a 40-character sequence) For example: kWcrlUX5JEDGM/LtmEENI/aVmYvHNif5zB+d9+ct Caution Your Secret Access Key is a secret, which only you and AWS should know. It is important to keep it confidential to protect your account. Store it securely in a safe place. Never include it in your requests to AWS, and never e-mail it to anyone. Do not share it outside your organization, even if an inquiry appears to come from AWS or Amazon.com. No one who legitimately represents Amazon will ever ask you for your Secret Access Key. The Access Key ID is associated with your AWS account. You include it in AWS service requests to identify yourself as the sender of the request. The Access Key ID is not a secret, and anyone could use your Access Key ID in requests to AWS. To provide proof that you truly are the sender of the request, you also include a digital signature calculated using your Secret Access Key. The sample code handles this for you. Your Access Key ID and Secret Access Key are displayed to you when you create your AWS account. They are not e-mailed to you. If you need to see them again, you can view them at any time from your AWS account. To get your AWS access key identifiers Go to the Amazon Web Services web site at http://aws.amazon.com. Point to Your Account and click Security Credentials. Log in to your AWS account. The Security Credentials page is displayed. Your Access Key ID is displayed in the Access Identifiers section of the page. To display your Secret Access Key, click Show in the Secret Access Key column. You can use your Amazon keys from a properties file in your application. Here is a sample for properties file containing Amazon keys: # Fill in your AWS Access Key ID and Secret Access Key # http://aws.amazon.com/security-credentials accessKey = secretKey = Here is sample AmazonUtil class for getting AWS Credentials from properties file. public class AmazonUtil { private static final Logger logger = LogUtil.getLogger(); private static final String AWS_CREDENTIALS_CONFIG_FILE_PATH = ConfigUtil.CONFIG_DIRECTORY_PATH + File.separator + "aws-credentials.properties"; private static AWSCredentials awsCredentials; static { init(); } private AmazonUtil() { } private static void init() { try { awsCredentials = new PropertiesCredentials(IOUtil.getResourceAsStream(AWS_CREDENTIALS_CONFIG_FILE_PATH)); } catch (IOException e) { logger.error("Unable to initialize AWS Credentials from " + AWS_CREDENTIALS_CONFIG_FILE_PATH); } } public static AWSCredentials getAwsCredentials() { return awsCredentials; } } Amazon S3 has Multipart Upload service which allows faster, more flexible uploads into Amazon S3. Multipart Upload allows you to upload a single object as a set of parts. After all parts of your object are uploaded, Amazon S3 then presents the data as a single object. With this feature you can create parallel uploads, pause and resume an object upload, and begin uploads before you know the total object size. For more information on Multipart Upload, review the Amazon S3 Developer Guide In this tutorial, my sample application uploads each file parts to Amazon S3 with different threads for using network throughput as possible as much. Each file part is associated with a thread and each thread uploads its associated part with Amazon S3 API. Figure 1. Amazon S3 Parallel Multi-Part File Upload Mechanism Amazon S3 API suppots MultiPart File Upload in this way: 1. Send a MultipartUploadRequest to Amazon. 2. Get a response containing a unique id for this upload operation. 3. For i in ${partCount} 3.1. Calculate size and offset of split-i in whole file. 3.2. Build a UploadPartRequest with file offset, size of current split and unique upload id. 3.3. Give this request to a thread and starts upload by running thread. 3.3.1. Send associated UploadPartRequest to Amazon. 3.3.2. Get response after successful upload and save ETag property of response. 4. Wait all threads to terminate 5. Get ETags (ETag is an identifier for successfully completed uploads) of all terminated threads. 6. Send a CompleteMultipartUploadRequest to Amazon with unique upload id and all ETags. So Amazon joins all file parts as target objects. Here is implementation: public class AmazonS3Util { private static final Logger logger = LogUtil.getLogger(); public static final long DEFAULT_FILE_PART_SIZE = 5 * 1024 * 1024; // 5MB public static long FILE_PART_SIZE = DEFAULT_FILE_PART_SIZE; private static AmazonS3 s3Client; private static TransferManager transferManager; static { init(); } private AmazonS3Util() { } private static void init() { // ... s3Client = new AmazonS3Client(AmazonUtil.getAwsCredentials()); transferManager = new TransferManager(AmazonUtil.getAwsCredentials()); } // ... public static void putObjectAsMultiPart(String bucketName, File file) { putObjectAsMultiPart(bucketName, file, FILE_PART_SIZE); } public static void putObjectAsMultiPart(String bucketName, File file, long partSize) { List partETags = new ArrayList(); List uploaders = new ArrayList(); // Step 1: Initialize. InitiateMultipartUploadRequest initRequest = new InitiateMultipartUploadRequest(bucketName, file.getName()); InitiateMultipartUploadResult initResponse = s3Client.initiateMultipartUpload(initRequest); long contentLength = file.length(); try { // Step 2: Upload parts. long filePosition = 0; for (int i = 1; filePosition < contentLength; i++) { // Last part can be less than part size. Adjust part size. partSize = Math.min(partSize, (contentLength - filePosition)); // Create request to upload a part. UploadPartRequest uploadRequest = new UploadPartRequest(). withBucketName(bucketName).withKey(file.getName()). withUploadId(initResponse.getUploadId()).withPartNumber(i). withFileOffset(filePosition). withFile(file). withPartSize(partSize); uploadRequest.setProgressListener(new UploadProgressListener(file, i, partSize)); // Upload part and add response to our list. MultiPartFileUploader uploader = new MultiPartFileUploader(uploadRequest); uploaders.add(uploader); uploader.upload(); filePosition += partSize; } for (MultiPartFileUploader uploader : uploaders) { uploader.join(); partETags.add(uploader.getPartETag()); } // Step 3: complete. CompleteMultipartUploadRequest compRequest = new CompleteMultipartUploadRequest(bucketName, file.getName(), initResponse.getUploadId(), partETags); s3Client.completeMultipartUpload(compRequest); } catch (Throwable t) { logger.error("Unable to put object as multipart to Amazon S3 for file " + file.getName(), t); s3Client.abortMultipartUpload( new AbortMultipartUploadRequest( bucketName, file.getName(), initResponse.getUploadId())); } } // ... private static class UploadProgressListener implements ProgressListener { File file; int partNo; long partLength; UploadProgressListener(File file) { this.file = file; } @SuppressWarnings("unused") UploadProgressListener(File file, int partNo) { this(file, partNo, 0); } UploadProgressListener(File file, int partNo, long partLength) { this.file = file; this.partNo = partNo; this.partLength = partLength; } @Override public void progressChanged(ProgressEvent progressEvent) { switch (progressEvent.getEventCode()) { case ProgressEvent.STARTED_EVENT_CODE: logger.info("Upload started for file " + "\"" + file.getName() + "\""); break; case ProgressEvent.COMPLETED_EVENT_CODE: logger.info("Upload completed for file " + "\"" + file.getName() + "\"" + ", " + file.length() + " bytes data has been transferred"); break; case ProgressEvent.FAILED_EVENT_CODE: logger.info("Upload failed for file " + "\"" + file.getName() + "\"" + ", " + progressEvent.getBytesTransfered() + " bytes data has been transferred"); break; case ProgressEvent.CANCELED_EVENT_CODE: logger.info("Upload cancelled for file " + "\"" + file.getName() + "\"" + ", " + progressEvent.getBytesTransfered() + " bytes data has been transferred"); break; case ProgressEvent.PART_STARTED_EVENT_CODE: logger.info("Upload started at " + partNo + ". part for file " + "\"" + file.getName() + "\""); break; case ProgressEvent.PART_COMPLETED_EVENT_CODE: logger.info("Upload completed at " + partNo + ". part for file " + "\"" + file.getName() + "\"" + ", " + (partLength > 0 ? partLength : progressEvent.getBytesTransfered()) + " bytes data has been transferred"); break; case ProgressEvent.PART_FAILED_EVENT_CODE: logger.info("Upload failed at " + partNo + ". part for file " + "\"" + file.getName() + "\"" + ", " + progressEvent.getBytesTransfered() + " bytes data has been transferred"); break; } } } private static class MultiPartFileUploader extends Thread { private UploadPartRequest uploadRequest; private PartETag partETag; MultiPartFileUploader(UploadPartRequest uploadRequest) { this.s3Client = s3Client; this.uploadRequest = uploadRequest; } @Override public void run() { partETag = s3Client.uploadPart(uploadRequest).getPartETag(); } private PartETag getPartETag() { return partETag; } private void upload() { start(); } } }

May 28, 2013

by Serkan Özal

· 57,424 Views · 3 Likes

Building a full-text index of git commits using lunr.js and Github APIs

Github has a nice API for inspecting repositories – it lets you read gists, issues, commit history, files and so on. Git repository data lends itself to demonstrating the power of combining full text and faceted search, as there is a mix of free text fields (commit messages, code) and enumerable fields (committers, dates, committer employers). Github APIs return JSON, which has the nice property of resembling a tree structure – results can be recursed over without fear of infinite loops. Note that to download the entire commit history for a repository, you need to page through it by sha hash. The API I use here lacks diffs, which must be retrieved elsewhere. To test this, access a URL like so. The configurable arguments are the repository owner and name fields. https://api.github.com/repos/torvalds/linux/commits This is what a commit looks like: { "sha": "7638417db6d59f3c431d3e1f261cc637155684cd", "url": "https://api.github.com/repos/octocat/Hello-World/git/commits/7638417db6d59f3c431d3e1f261cc637155684cd", "author": { "date": "2008-07-09T16:13:30+12:00", "name": "Scott Chacon", "email": "[email protected]" }, "committer": { "date": "2008-07-09T16:13:30+12:00", "name": "Scott Chacon", "email": "[email protected]" }, "message": "my commit message", "tree": { "url": "https://api.github.com/repos/octocat/Hello-World/git/trees/827efc6d56897b048c772eb4087f854f46256132", "sha": "827efc6d56897b048c772eb4087f854f46256132" }, "parents": [ { "url": "https://api.github.com/repos/octocat/Hello-World/git/commits/7d1b31e74ee336d15cbd21741bc88a537ed063a0", "sha": "7d1b31e74ee336d15cbd21741bc88a537ed063a0" } ] } To make the test simple, I download these as JSON locally, then start a python webserver. Were I to make many such calls on a public site, I’d set up a proxy to the github APIs. python -m SimpleHTTPServer This data has a number of nested objects and must be flattened to fit into the lunr.jsfull-text index. This example uses the commit number (0, 1, 2..N) as the location in the index, but a real environment should use the commit hash to allow partitioning the ingestion process. Nested objects are flattened by joining subsequent keys with underscores in between. A production-worthy solution needs to escape these to prevent collisions. var documents = []; function recurse(doc_num, base, obj, value) { if ($.isPlainObject(value)) { $.each(value, function (k, v) { recurse(doc_num, base + obj + "_", k, v); }); } else { process(doc_num, base + obj, value); } } function process(doc_num, key, value) { if (documents.length <= doc_num) documents[doc_num] = {}; if (value !== null) documents[doc_num][key] = value + ''; } $.each(data, function(doc_num, commit) { $.each(commit, function(k, v) { recurse(doc_num, '', k, v) }); }); Normally, one sets up a lunr full-text index by specifying all the fields, much like Solr’s numerous XML config files. Lunr doesn’t have nearly as many configuration options, since you only specify the ‘boost’ parameter to increase the value of certain fields in ranking. I imagine this will change as the project grows, at the very least to include type hints. Given the simplicity of field objects, you can infer infer the field list from JSON payloads. The code below provides two modes, one where you inspect the entire JSON payload, or one where you limit how many commits you check, a good option when JSON data is consistent. The function accepts configuration objects resembling ExtJS config objects, which lets you override as desired. If fields derived from existing data are required, they can be inserted after any documents are inserted. function inferIndex(documents, config) { return lunr(function() { this.ref('id'); var found = {}; var idx = this; $.each(documents, function(doc_num, doc) { if (config && config.limit && config.limit < doc_num) return; $.each(doc, function(k, v) { if (!found[k]) { if (config && config[k]) { idx.field(k, config[k]); } else { idx.field(k); } found[k] = true; } }); }); }); } var index = inferIndex(documents, {limit: 1, 'commit_author_name':{boost:10}); Inserting flattened documents into the index becomes simple. The method below provides a callback, should you desire to add calculated fields fields. $.each(documents, function(doc_num, attrs, doc_cb) { var doc = $.extend( {id: doc_num}, attrs); if (doc_cb) { doc = doc_cb(doc); } index.add(doc); }); At this point we’ve indexed the entire commit history from a git repository, which lets us search for commits by topic. While this is useful, it’d be really nice to be able to facet on fields, which would return the number of documents in a category, like a SQL group by. I’ve found it particularly convenient to facet on author, date, or author’s company. If you have access to the original documents, you can easily construct facets based on the results of a lunr search: function facet(index, query, data, field) { var results = index.search(query); var facets = {}; $.each(results, function(index, searchResult) { var doc = data[searchResult.ref]; facets[doc[field]] = (facets[doc[field]] === undefined ? 0 : facets[doc[field]]) + 1; } ); return facets; } Commit messages in repositories where I work often contain names of clients who requested a feature or bug fix. Consequently doing a search faceted by author provides a list of who worked with each client the most – this can also tell you who has worked with various pieces of technology. The following query demonstrates this technique: var facets = facet(index, 'driver', documents, 'commit_author_name'); {"Wolfram Sang":24,"Linus Torvalds":3} The approach shown here works well, but requires retrieving results requires access to the original document data. If we want to filter the results to a category, we need a richer search API than lunr currently provides, as well as callback options within the search API. In Solr there are also options to skip lower-casing data, as that may be inappropriate for category titles. Mitigating these issues will be explored further in future essays.

May 20, 2013

by Gary Sieling

· 8,190 Views

Azure Blob Storage - "The specified blob or block content is invalid"

If you’re uploading blobs by splitting blobs into blocks and you get the error – The specified blob or block content is invalid, then this post is for you. Short Version If you’re uploading blobs by splitting blobs into blocks and you get the above mentioned error, ensure that your block ids of your blocks are of same length. If the block ids of your blocks are of different length, you’ll get this error. Long Version Now for the longer version of this post . A few days back I was working with storage client library especially around uploading blobs in chunks and with one particular blob I was constantly getting the error – The specified blob or block content is invalid. I tried numerous combinations even resorting to REST API directly but to no avail. It only happened with just one blob. Furthermore if I uploaded the same blob without splitting it into blocks, all was well. I was at my wits’ end. Tried searching the Internet for this error but could not find a conclusive answer to my problem. After much trial and error, I was able to simulate the same problem on other blobs as well. Here’s how you can recreate it: Start uploading the blob by splitting it into blocks. For block id, let’s do a 7 character long string e.g. intValue.ToString(“d7”). This will ensure that my block ids would be “0000001”, “0000002”, …, ”0000010” ….. After one or two blocks are uploaded, cancel the operation. Now re-upload the blob by splitting it into blocks. However this time for block id, let’s do a 6 character long string e.g. intValue.ToString(“d6”). You’ll get the error as soon as you try to upload the 1st block. Possible Solutions Now that we know the root cause of this problem, let’s look at some of the possible solutions to solve this problem. Wait out One possible solution is to wait out. I know its lame but still a possible solution. We know that Windows Azure Blob Storage Service keeps all uncommitted blocks for a duration of 7 days and if within 7 days those uncommitted blocks are not committed, the storage service purges them. I wish storage service provided some mechanism to purge uncommitted blocks programmatically. Commit uncommitted blocks You could possibly commit the blocks which are in uncommitted state so that at least you get a blob (which would not be the blob we wanted to upload in the first place). You can then delete that blob and re-upload the blob by specifying block ids which are of same length. To fetch the list of uncommitted blocks, if you’re using REST API directly you can perform “Get Block List” operation and pass “blocklisttype=uncommitted” as one of the query string parameters. If you’re using storage client library (assuming you’re using the version 2.x of .Net storage client library), you can do something like the code below: private static List GetUncommittedBlockIds(CloudBlockBlob blob) { var sasUri = blob.GetSharedAccessSignature(new SharedAccessBlobPolicy() { SharedAccessExpiryTime = DateTime.UtcNow.AddMinutes(5), Permissions = SharedAccessBlobPermissions.Read, }); var blobUri = new Uri(string.Format("{0}{1}", blob.Uri, sasUri)); List uncommittedBlockIds = new List(); var request = BlobHttpWebRequestFactory.GetBlockList(blobUri, null, null, BlockListingFilter.Uncommitted, null, null); //request.Headers.Add("Authorization", using (var resp = (HttpWebResponse)request.GetResponse()) { using (var stream = resp.GetResponseStream()) { var getBlockListResponse = new GetBlockListResponse(stream); var blocks = getBlockListResponse.Blocks; foreach (var block in blocks.Where(b => !b.Committed)) { uncommittedBlockIds.Add(Encoding.UTF8.GetString(Convert.FromBase64String(block.Name))); } } } return uncommittedBlockIds; } A few things to keep in mind here: Microsoft.WindowsAzure.Storage.Blob namespace does not have the capability to get the list of uncommitted blocks. You would need to make use ofMicrosoft.WindowsAzure.Storage.Blob.Protocol namespace. Because we’re kind of invoking the REST API by executing an HttpWebRequest, I created a shared access signature on the blob so that I don’t have to create “Authorization” header. Fetch uncommitted blocks to see block id length You could fetch the list of uncommitted blocks just to find out the length of the block id used. You could then use that block id length for your new upload session and do the upload. Please see the code snippet above to find this information. Upload another blob with same name without splitting it into blocks You could also upload another blob with the same name without splitting it into blocks. It could very well be a zero byte blob. That way your uncommitted block list will be wiped clean. Then you could delete that dummy blob and re-upload the actual blob. A Few Words About Blocks Since we’re talking about blocks, I thought it might be useful to mention a few points about them: Blocks and block related operations are only applicable for “Block Blobs”. Duh!! You’ll get an error if you’re trying to do these operations on a “Page Blob”. For uploading large blobs, it is recommended that you split your blob into blocks. In fact if your blob size is more than 64 MB, then you have to split it into blocks. Minimum size of a block is 1 Byte and the maximum size of a block is 4 MB. It is recommended that you choose a block size based on your internet connectivity and number of parallel threads you want use to upload these blocks. A blob can be split into a maximum of 50000 blocks. It’s important to remember this limitation because you are reminded of this limit when you’re trying to upload 50001st block. The length of all the block ids must be same. So if you’re using an integer value to denote block id, you make sure that you pad that integer value with “0” so that you get same length. So you could do something likeint.ToString(“d6”). When passing the block id as a parameter, it must be Base64 encoded. While the order in which the blocks are uploaded is not important, the order is important when you commit the block list because that’s when the blob is constructed by the service. For example, let’s say you’re uploading a blob by splitting it into 5 blocks (with ids “000001”, “000002”, “000003”, “000004”, and “000005”). You could upload these blocks in any order – 000004, 000001, 000003, 000005, 000002 however when you commit the block list, ensure that the block ids are passed in proper order i.e. 000001, 000002, 000003, 000004, 000005. Summary That’s it for this post. I hope you’ve found this information useful. I spent considerable amount of time trying to fix this problem so I hope it will help some folks out. As always, if you find any issues with the post please let me know and I’ll fix it ASAP.

May 20, 2013

by Gaurav Mantri

· 10,937 Views

Bucketing, Multiplexing and Combining in Hadoop - Part 1

this is the first blog post in a series which looks at some data organization patterns in mapreduce. we’ll look at how to bucket output across multiple files in a single task, how to multiplex data across multiple files, and also how to coalesce data. these are all common patterns that are useful to have in your mapreduce toolkit. we’ll kick things off with a look at bucketing data outputs in your map or reduce tasks. by default when using a fileoutputformat-derived outputformat (such as textoutputformat), all the outputs for a reduce task (or a map task in a map-only job) are written to a single file in hdfs. imagine a situation where you have user activity logs being streamed into hdfs, and you want to write a mapreduce job to better organize the incoming data. as an example a large organization with multiple products may want to bucket the logs based on the product. to do this you’ll need the ability to write to multiple output files in a single task. let’s take a look at how we can make that happen. multipleoutputformat there are a few ways you can achieve your goal, and the first option we’ll look at is the multipleoutputformat class in hadoop. this is an abstract class that lets you do the following: define the output path for each and every key/value output record being emitted by a task. incorporate the input paths into the output directory for map-only jobs. redefine the key and value that are used to write to the underlying recordwriter . this is useful in situations where you want to remove data from the outputs as it duplicates data in the filename. for each output path, define the recordwriter that should be used to write the outputs. ok enough with the words - let’s look at some data and code. first up is the simple data we’ll use in our example - imagine you work at a fruit market with locations in multiple cities, and you have a purchase transaction stream which contains the store location along with the fruit that was purchased. cupertino apple sunnyvale banana cupertino pear to help bucket your data for future analysis, you want to bin each record into city-specific files. for the simple data set above you don’t want to filter, project or transform your data, just bucket it out, so a simple identity map-only job will do the job. to force more than one mapper, we’ll write the data to two separate files. $ tab="$(printf '\t')" $ hdfs -put - file1.txt << eof cupertino${tab}apple sunnyvale${tab}banana eof $ hdfs -put - file2.txt << eof cupertino${tab}pear eof here’s the code which will let you write city-specific output files. import org.apache.commons.lang.stringutils; import org.apache.hadoop.conf.configuration; import org.apache.hadoop.conf.configured; import org.apache.hadoop.fs.filesystem; import org.apache.hadoop.fs.path; import org.apache.hadoop.io.text; import org.apache.hadoop.mapred.*; import org.apache.hadoop.mapred.lib.identitymapper; import org.apache.hadoop.mapred.lib.multipletextoutputformat; import org.apache.hadoop.util.progressable; import org.apache.hadoop.util.tool; import org.apache.hadoop.util.toolrunner; import java.io.ioexception; import java.util.arrays; /** * an example of how to use {@link org.apache.hadoop.mapred.lib.multipleoutputformat}. */ public class mofexample extends configured implements tool { /** * create output files based on the output record's key name. */ static class keybasedmultipletextoutputformat extends multipletextoutputformat { @override protected string generatefilenameforkeyvalue(text key, text value, string name) { return key.tostring() + "/" + name; } } /** * the main job driver. */ public int run(final string[] args) throws exception { string csvinputs = stringutils.join(arrays.copyofrange(args, 0, args.length - 1), ","); path outputdir = new path(args[args.length - 1]); jobconf jobconf = new jobconf(super.getconf()); jobconf.setjarbyclass(mofexample.class); jobconf.setnumreducetasks(0); jobconf.setmapperclass(identitymapper.class); jobconf.setinputformat(keyvaluetextinputformat.class); jobconf.setoutputformat(keybasedmultipletextoutputformat.class); fileinputformat.setinputpaths(jobconf, csvinputs); fileoutputformat.setoutputpath(jobconf, outputdir); return jobclient.runjob(jobconf).issuccessful() ? 0 : 1; } /** * main entry point for the utility. * * @param args arguments * @throws exception when something goes wrong */ public static void main(final string[] args) throws exception { int res = toolrunner.run(new configuration(), new mofexample(), args); system.exit(res); } } run this code and you’ll see the following files in hdfs, where /output is the job output directory: $ hadoop fs -lsr /output /output/cupertino/part-00000 /output/cupertino/part-00001 /output/sunnyvale/part-00000 if you look at the output files you’ll see that the files contain the correct buckets. $ hadoop fs -lsr /output/cupertino/* cupertino apple cupertino pear $ hadoop fs -lsr /output/sunnyvale/* sunnyvale banana awesome, you have your data bucketed by store. now that we have everything working, let’s look at what we did to get there. we had to do two things to get this working: extend multipletextoutputformat this is where the magic happened - let’s look at that class again. static class keybasedmultipletextoutputformat extends multipletextoutputformat { @override protected string generatefilenameforkeyvalue(text key, text value, string name) { return key.tostring() + "/" + name; } } you are working with text, which is why you extended multipletextoutputformat , a class that in turn extends multipleoutputformat . multipletextoutputformat is a simple class which instructs the multipleoutputformat to use textoutputformat as the underlying output format for writing out the records. if you were to use multipleoutputformat as-is it behaves as if you were using the regular textoutputformat , which is to say that it’ll only write to a single output file. to write data to multiple files you had to extend it, as with the example above. the generatefilenameforkeyvalue method allows you to return the output path for an input record. the third argument, name , is the original fileoutputformat -created filename, which is in the form “part-nnnnn”, where “nnnnn” is the task index, to ensure uniqueness. to avoid file collisions, it’s a good idea to make sure your generated output paths are unique, and leveraging the original output file is certainly a good way of doing this. in our example we’re using the key as the directory name, and then writing to the original fileoutputformat filename within that directory. specify the outputformat the next step was easy - specify that this output format should be used for your job: jobconf.setoutputformat(keybasedmultipletextoutputformat.class); earlier we also mentioned that you can use the input path as part of the output path, which we will look at next. using the input filename as part of the output filename in map-only jobs what if we wanted to keep the input filename as part of the output filename? this only works for map-only jobs, and can be accomplished by overriding the getinputfilebasedoutputfilename method. let’s look at the following code to understand how this method fits into the overall sequence of actions that the multipleoutputformat class performs: public void write(k key, v value) throws ioexception { // get the file name based on the key string keybasedpath = generatefilenameforkeyvalue(key, value, myname); // get the file name based on the input file name string finalpath = getinputfilebasedoutputfilename(myjob, keybasedpath); // get the actual key k actualkey = generateactualkey(key, value); v actualvalue = generateactualvalue(key, value); recordwriter rw = this.recordwriters.get(finalpath); if (rw == null) { // if we don't have the record writer yet for the final path, create // one // and add it to the cache rw = getbaserecordwriter(myfs, myjob, finalpath, myprogressable); this.recordwriters.put(finalpath, rw); } rw.write(actualkey, actualvalue); }; the getinputfilebasedoutputfilename method is called with the output of generatefilenameforkeyvalue , which contains our already-customized output file. our new keybasedmultipletextoutputformat can now be updated to override getinputfilebasedoutputfilename and append the original input filename to the output filename: static class keybasedmultipletextoutputformat extends multipletextoutputformat { @override protected string generatefilenameforkeyvalue(object key, object value, string name) { return key.tostring() + "/" + name; } @override protected string getinputfilebasedoutputfilename(jobconf job, string name) { string infilename = new path(job.get("map.input.file")).getname(); return name + "-" + infilename; } if you run with your modified outputformat class you’ll see the following files in hdfs, confirming that the input filenames are now concatenated to the end of each output file. $ hadoop fs -lsr /output /output/cupertino/part-00000-file1.txt /output/cupertino/part-00001-file2.txt /output/sunnyvale/part-00000-file1.txt the implementation of getinputfilebasedoutputfilename in multipleoutputformat doesn’t do anything interesting by default, but if you set the value of the mapred.outputformat.numoftrailinglegs configurable to an integer greater than 0, then the getinputfilebasedoutputfilename will use part of the input path as the output path. let’s see what happens when we set the value to 1: jobconf.setint("mapred.outputformat.numoftrailinglegs", 1); the output files in hdfs now exactly mirror the input files used for the job: $ hadoop fs -lsr /output /output/file1.txt /output/file2.txt if we set mapred.outputformat.numoftrailinglegs to 2, and our input files exist in the /inputs directory, then our output directory looks like this: $ hadoop fs -lsr /output /output/input/file1.txt /output/input/file2.txt basically as you keep incrementing mapred.outputformat.numoftrailinglegs , then multipleoutputformat will continue to go up the parent directories of the input file and use them in the output path. modifying the output key and value it’s very possible that the actual key and value you want to emit are different from those that were used to determine the output file. in our example, we took the output key and wrote to a directory using the key name. if you do that keeping the key in the output file may be redundant. how would we modify the output record so that the key isn’t written? multipleoutputformat has your back with the generateactualkey method. class keybasedmultipletextoutputformat extends multipletextoutputformat { @override protected string generatefilenameforkeyvalue(text key, text value, string name) { return key.tostring() + "/" + name; } @override protected text generateactualkey(text key, text value) { return null; } } the returned value from this method replaces the key that’s supplied to the underlying recordwriter , so if you return null as in the above example, no key will be written to the file. $ hadoop fs -lsr /output/cupertino/* apple pear $ hadoop fs -lsr /output/sunnyvale/* banana you can achieve the same result for the output value by overriding the generateactualvalue method. changing the recordwriter in our final step we’ll look at how you can leverage multiple recordwriter classes for different output files. this is accomplished by overriding the getrecordwriter method. in the example below we’re leveraging the same textoutputformat for all the files, but it gives you a sense of what can be accomplished. static class keybasedmultipletextoutputformat extends multipletextoutputformat { @override protected string generatefilenameforkeyvalue(text key, text value, string name) { return key.tostring() + "/" + name; } @override public recordwriter getrecordwriter(filesystem fs, jobconf job, string name, progressable prog) throws ioexception { if (name.startswith("apple")) { return new textoutputformat().getrecordwriter(fs, job, name, prog); } else if (name.startswith("banana")) { return new textoutputformat().getrecordwriter(fs, job, name, prog); } return super.getrecordwriter(fs, job, name, prog); } } conclusion when using multipleoutputformat , give some thought to the number of distinct files that each reducer will create. it would be prudent to plan your bucketing so that you have a relatively small number of files. in this post we extended multipletextoutputformat , which is a simple extension of multipleoutputformat that supports text outputs. multiplesequencefileoutputformat also exists to support sequencefiles in a similar fashion. so what are the shortcomings with the multipleoutputformat class? if you have a job that uses both map and reduce phases, then multipleoutputformat can’t be used in the map-side to write outputs. of course, multipleoutputformat works fine in map-only jobs. all recordwriter classes must support exactly the same output record types. for example, you wouldn’t be able to support a recordwriter that emitted for one output file, and have another recordwriter that emitted . multipleoutputformat exists in the mapred package, so it won’t work with a job that requires use of the mapreduce package. all is not lost if you bump into either one of these issues, as you’ll discover in the next blog post.

May 20, 2013

by Alex Holmes

· 6,324 Views

JavaFX Accordion Slide Out Menu for the NetBeans Platform

Let's say you have a NetBeans Platform application that puts a premium on vertical space. Maybe a Heads Up Display on a Touch Screen? Wouldn't it be great to have the menu slide out from the edge of the screen only when you need it? Well the NetBeans Platform provides slide-in TopComponents, of course, but a JMenu just isn't going to work out so well inside one. We can use JavaFX as part of the solution as it provides some capabilities that the base Swing components available in the NetBeans Platform do not. Let's say we take all of our root MenuBar items and place them within an Accordion type pane. Each collapsible TitledPane of the Accordion control could then contain the sub-menu items, maybe represented by a JavaFX MenuButton. This would allow for a recursive Menu like effect but the overall container could be placed anywhere. Something like the screenshot below: What we see here is the described effect sliding out and overlayed on top of the Favorites tab. I sprinkled in some transparency for good measure. Notice how we are able to completely eliminate the Menu Bar and Tool Bar gaining potentially valuable real estate? The rest of this tutorial will explain the steps necessary to achieve something like this. That article was written by Geertjan Wielenga and it will become clear that much of the base code to accomplish this article was extended from Geertjan's example. Thanks again Geertjan! Similar articles to this that may be helpful are below: https://dzone.com/articles/javafx-fxml-meets-netbeans https://dzone.com/articles/how-embed-javafx-chart-visual All these articles are loosely coupled in a tutorial arc towards explaining and demonstrating the advantages of integrating JavaFX into the NetBeans Platform. The following two steps are borrowed exactly as found from Geertjan's tutorial: Step 1. Remove the default menubar and replace with your own: import org.openide.modules.ModuleInstall; public class Installer extends ModuleInstall { @Override public void restored() { System.setProperty("netbeans.winsys.menu_bar.path", "LookAndFeel/MenuBar.instance"); } } Step 2: In the layer.xml file define your Swing replacement menubar. I have also taken the liberty to hide the Toolbars as well. Now why are we replacing the old MenuBar with a new MenuBar if we intend to hide it? Well if you hide the MenuBar via the layer.xml as I did the Toolbars the filesystem folder tree will not be instantiated. That means we won't be able to dynamically determine the Menu Folder tree to rebuild our custom AccordionMenu. The solution? Make an empty Menubar. package polaris.javafxwizard.jfxmenu; import javax.swing.JMenuBar; /** * * @author SPhillips (King of Australia) */ public class HiddenMenuBar extends JMenuBar { public HiddenMenuBar() { super(); } } Step 3: Build an "AccordionMenu" using JavaFX This is where the tutorials diverge and this process gets a bit more complicated. Our task is to use the JavaFX/Swing Interop pattern to create a component that extends JFXPanel yet can give the user access to all the items that were once in the Menu Bar. The basic algorithm is as such: Create a component that extends JFXPanel Implement the standard Platform.runLater() pattern for creating a JavaFX scene Loop through each top level file object in the Menu folder of the application file system: Create a JavaFX Flow Pane for each file object Recursively create JavaFX ButtonMenu items for submenus Add ButtonMenu items to FlowPanes Add FlowPane to JavaFX TitledPane Add TitledPane to JavaFX Accordion component Add Accordion to scene So instead of Menus and SubMenus, we are using MenuButtons which can be recursively added to other MenuButtons and MenuItems. The Accordion control gives us a space saving collapsible view with some nice animation. The FlowPane makes it easy to layout the MenuButtons horizontally in a way that maximizes space. Below is the code for my AccordionMenu class. You will see where I borrowed heavily from Geertjan's example: polaris.javafxwizard.jfxmenu; import java.lang.reflect.InvocationTargetException; import java.util.ArrayList; import java.util.Arrays; import java.util.List; import javafx.application.Platform; import javafx.embed.swing.JFXPanel; import javafx.event.ActionEvent; import javafx.event.EventHandler; import javafx.geometry.Orientation; import javafx.scene.Group; import javafx.scene.Scene; import javafx.scene.control.Accordion; import javafx.scene.control.Button; import javafx.scene.control.MenuButton; import javafx.scene.control.MenuItem; import javafx.scene.control.TitledPane; import javafx.scene.effect.DropShadow; import javafx.scene.layout.FlowPane; import javafx.scene.paint.Color; import javax.swing.Action; import javax.swing.SwingUtilities; import org.openide.awt.Actions; import org.openide.filesystems.FileObject; import org.openide.filesystems.FileUtil; import org.openide.loaders.DataFolder; import org.openide.loaders.DataObject; import org.openide.util.Exceptions; /** * * @author SPhillips (King of Australia) */ public class AccordionMenu extends JFXPanel{ public Accordion accordionPane; public String transparentCSS = "-fx-background-color: rgba(0,100,100,0.1);"; public AccordionMenu() { super(); // create JavaFX scene Platform.setImplicitExit(false); Platform.runLater(new Runnable() { @Override public void run() { createScene(); //Standard Swing Interop Pattern } }); } private void createScene() { FileObject menuFolder = FileUtil.getConfigFile("Menu"); FileObject[] menuKids = menuFolder.getChildren(); //for each Menu folder need to create a TilePane and add it to an Accordion List titledPaneList = new ArrayList<>(); for (FileObject menuKid : FileUtil.getOrder(Arrays.asList(menuKids), true)) { //Build a Flow pane based on menu children //TOP level menu items should all be flow panes FlowPane flowPane = buildFlowPane(menuKid); flowPane.setStyle(transparentCSS); TitledPane newTitledPaneFromFileObject = new TitledPane(menuKid.getName(), flowPane); newTitledPaneFromFileObject.setAnimated(true); newTitledPaneFromFileObject.autosize(); newTitledPaneFromFileObject.setStyle(transparentCSS); titledPaneList.add(newTitledPaneFromFileObject); } Group g = new Group(); Scene scene = new Scene(g, 400, 400,new Color(0.0,0.0,0.0,0.0)); scene.setFill(null); g.setStyle(transparentCSS); accordionPane = new Accordion(); accordionPane.setStyle(transparentCSS); accordionPane.getPanes().addAll(titledPaneList); g.getChildren().add(accordionPane); setScene(scene); validate(); this.setOpaque(true); this.setBackground(new java.awt.Color(0.0f, 0.0f, 0.0f, 0.0f)); } private FlowPane buildFlowPane(FileObject fo) { //FlowPanes are made up of Buttons and MenuButtons built from actions and sub menus FlowPane flowPane = new FlowPane(Orientation.HORIZONTAL,5,5); flowPane.setStyle(transparentCSS); //If anything at the Flow Pane level is an action we need to add it as a button //otherwise we can recursively build it as a MenuButton DataFolder df = DataFolder.findFolder(fo); DataObject[] childs = df.getChildren(); for (DataObject oneChild : childs) { //If child is folder we need to build recursively if (oneChild.getPrimaryFile().isFolder()) { FileObject childFo = oneChild.getPrimaryFile(); MenuButton newMenuButton = new MenuButton(childFo.getName()); buildMenuButton(childFo, newMenuButton); flowPane.getChildren().add(newMenuButton); } else { Object instanceObj = FileUtil.getConfigObject(oneChild.getPrimaryFile().getPath(), Object.class); if (instanceObj instanceof Action) { //If it is an Action we have reached an endpoint final Action a = (Action) instanceObj; String name = (String) a.getValue(Action.NAME); String cutAmpersand = Actions.cutAmpersand(name); Button buttonItem = new Button(cutAmpersand); MenuEventHandler meh = new MenuEventHandler(a); buttonItem.setOnAction(meh); buttonItem.setEffect(new DropShadow()); flowPane.getChildren().add(buttonItem); } } } return flowPane; } private void buildMenuButton(FileObject fo, MenuButton menuButton) { DataFolder df = DataFolder.findFolder(fo); DataObject[] childs = df.getChildren(); for (DataObject oneChild : childs) { //If child is folder we need to build recursively if (oneChild.getPrimaryFile().isFolder()) { FileObject childFo = oneChild.getPrimaryFile(); //Menu newMenu = new Menu(childFo.getName()); MenuButton newMenuButton = new MenuButton(childFo.getName()); //menu.getItems().add(newMenu); buildMenuButton(childFo, newMenuButton); } else { Object instanceObj = FileUtil.getConfigObject(oneChild.getPrimaryFile().getPath(), Object.class); if (instanceObj instanceof Action) { //If it is an Action we have reached an endpoint final Action a = (Action) instanceObj; String name = (String) a.getValue(Action.NAME); String cutAmpersand = Actions.cutAmpersand(name); MenuItem menuItem = new MenuItem(cutAmpersand); MenuEventHandler meh = new MenuEventHandler(a); menuItem.setOnAction(meh); menuButton.getItems().add(menuItem); } } } } private class MenuEventHandler implements EventHandler { public Action theAction; public MenuEventHandler(Action action) { super(); theAction = action; } @Override public void handle(final ActionEvent t) { try { SwingUtilities.invokeAndWait(new Runnable() { @Override public void run() { java.awt.event.ActionEvent event = new java.awt.event.ActionEvent( t.getSource(), t.hashCode(), t.toString()); theAction.actionPerformed(event); } }); } catch ( InterruptedException | InvocationTargetException ex) { Exceptions.printStackTrace(ex); } } } } I took the liberty of placing a few CSS stylings here and there, trying to play with the transparency. Also I found that it looked better if a JavaFX Button was used for any Actions found at the very top level, instead of a MenuButton with a single item. Step 4: Build a Slide in TopComponent for the new AccordionMenu Now that you have a JFXPanel Swing Interop component, your NetBeans Platform TopComponent doesn't need to know about JavaFX. However in this scenario the Platform also is contributing via its wonderful docking framework. Use the Window wizard and select Left Sliding In as a mode. I would also advise making this component not closable, otherwise the user could lose the ability to use the menu. Here are the annotations and constructor code in my TopComponent: @ConvertAsProperties( dtd = "-//polaris.javafxwizard.jfxmenu//SlidingAccordion//EN", autostore = false) @TopComponent.Description( preferredID = "SlidingAccordionTopComponent", iconBase="polaris/javafxwizard/jfxmenu/categories.png", persistenceType = TopComponent.PERSISTENCE_ALWAYS) @TopComponent.Registration(mode = "leftSlidingSide", openAtStartup = true) @ActionID(category = "Window", id = "polaris.javafxwizard.jfxmenu.SlidingAccordionTopComponent") @ActionReference(path = "Menu/JavaFX" /*, position = 333 */) @TopComponent.OpenActionRegistration( displayName = "#CTL_SlidingAccordionAction", preferredID = "SlidingAccordionTopComponent") @Messages({ "CTL_SlidingAccordionAction=SlidingAccordion", "CTL_SlidingAccordionTopComponent=SlidingAccordion Window", "HINT_SlidingAccordionTopComponent=This is a SlidingAccordion window" }) public final class SlidingAccordionTopComponent extends TopComponent { public AccordionMenu accordionMenu; public SlidingAccordionTopComponent() { initComponents(); setName(Bundle.CTL_SlidingAccordionTopComponent()); setToolTipText(Bundle.HINT_SlidingAccordionTopComponent()); putClientProperty(TopComponent.PROP_CLOSING_DISABLED, Boolean.TRUE); putClientProperty(TopComponent.PROP_DRAGGING_DISABLED, Boolean.TRUE); putClientProperty(TopComponent.PROP_MAXIMIZATION_DISABLED, Boolean.TRUE); putClientProperty(TopComponent.PROP_UNDOCKING_DISABLED, Boolean.TRUE); putClientProperty(TopComponent.PROP_KEEP_PREFERRED_SIZE_WHEN_SLIDED_IN, Boolean.TRUE); setLayout(new BorderLayout()); //Standard JFXPanel Swing Interop Pattern accordionMenu = new AccordionMenu(); //transparency Color transparent = new Color(0.0f, 0.0f, 0.0f, 0.0f); accordionMenu.setOpaque(true); accordionMenu.setBackground(transparent); this.add(accordionMenu); this.setOpaque(true); this.setBackground(transparent); } Step 5. See how great it looks We now have a slide out collapsible application menu provided by JavaFX components. These components can be "skinned" using CSS stylings and as such the menu can be crafted differently for different applications. (By the way if anyone reading this has some ideas please contact me because I am not a CSS guy at all) Best of all we have adapted our application to work nicely with a Heads Up Display or Kiosk view that typically run on touchscreen computers. This is because we have saved real estate and implemented an interface that is more condusive to single touches versus mouse drag events. Hey let's see how it might look with an application that needs all the space it can get?

May 17, 2013

by Sean Phillips

· 20,501 Views

Deploy a File Server in the Cloud (WebDav on Windows Azure)

this month, my fellow it pro technical evangelists and i are authoring a new series of articles on 20 key scenarios with windows azure infrastructure services . check out the list of articles here: http://mythoughtsonit.com/2013/05/20-key-scenarios-with-windows-azure-infrastructure-services/ . web-based distributed authoring and versioning, or webdav, is a set of protocols based on http that allows end-users to map a network drive over http and edit content and files stored on the web server. when webdav was first offered on microsoft server i had evaluated it and decided it did not perform well enough for me. the webdav extension to iis was completely rewritten back in the server 2008 timeframe and is worth taking a look at again. in this article i will guide you step by step through the process of setting up webdav on server 2012 in a windows azure iaas environment. this will give you a solid performing file share on the internet over port 80 and the http protocol. first you need an azure account. you can setup a free trail of azure. details can be found here: http://mythoughtsonit.com/2013/04/step-by-step-guide-to-setting-up-a-windows-azure-free-trial/ second provision a server 2012 machine. watch a video of what to do here: third open port 80 to this new server: in the azure portal select your 2012 server and choose the “endpoints” tab on the top. click “add endpoint” at the bottom of the screen enter the endpoint information for port 80 to port 80 done. next we need to install the iis webserver and webdav. installing webdav on iis 8.0 start server manager and go to “add roles and features” under server roles – add the web server (iis) role click through the wizard until you come to the role services section. then find and select “webdav publishing” and “windows authentication” click next and then install when the install is finished you are ready to move on to the next section. configuring iis 8 for webdav after the installation finishes you need to configure the box for access. start the iis manager tool. choose the “default web site” on the left side. then click on “authentication” open the windows authentication option and enable it. open the “webdav authoring rules” create a webdav rule. i choose to allow all users access to all content. a better security practice is to limit what users can use the service. it’s your data so you decide. make sure webdav is enabled and that your access rule is set: that is it… now your ready to access your webdav file share! test and insure you can hit the web server by using your browser: because you opened port 80 and installed iis 8 you should see the default web page when you browse to your servers internet dns name. example: http://yourdomainname.cloudapp.net/ how to map a drive to your webdav server: there are two ways i use to connect to the webdav server how to map a drive to your webdav server from the win 8 gui: from windows explorer, right click on “computer” and select “map a network drive” map your network drive by entering the address to your server example: http://yourdomainname.cloudapp.net/ i selected “connect using different credentials” because my workstation was not joined to the server in anyway and i needed to use an account in the servers local sam database. hit “finish” and enter your credentials. now you will have a connected drive that you can access from windows explorer or any tool via the drive mapping. how to map a drive to your webdav server from a cmd box: 1. hit windows start and type: cmd 2. enter the command: net use [drive letter] [url] example: net use e: http://yourdomainname.cloudapp.net/

May 15, 2013

by Brian Lewis

· 15,969 Views

Synchronising Multithreaded Integration Tests revisited

I recently stumbled upon an article Synchronising Multithreaded Integration Tests on Captain Debug's Blog. That post emphasizes the problem of designing integration tests involving class under test running business logic asynchronously. This contrived example was given (I stripped some comments): public class ThreadWrapper { public void doWork() { Thread thread = new Thread() { @Override public void run() { System.out.println("Start of the thread"); addDataToDB(); System.out.println("End of the thread method"); } private void addDataToDB() { // Dummy Code... try { Thread.sleep(4000); } catch (InterruptedException e) { e.printStackTrace(); } } }; thread.start(); System.out.println("Off and running..."); } } This is only an example of common pattern where business logic is delegated to some asynchronous job pool we have no control over. Roger Hughes (the author) enumerates few techniques of testing such code, including: arbitrary ("long enough") sleep() in test method to make sure background logic finishes refactoring doWork() so that it accepts CountDownLatch and agrees to notify it when job is done making the method above package private and @VisibleForTesting only "The" solution - refactoring doWork() so that it accepts arbitrary Runnable. In test we can wrap this Runnable (decorator pattern) and wait for inner Runnable to complete Last solution is not bad but it changes the responsibilities of ThreadWrapper significantly. Now it's up to the caller to decide what kind of job should be executed asynchronously while previously ThreadWrapper was encapsulating business logic completely. I am not saying it's a bad design, but it's drastically different from original method. Awaitility Can we write a test without such a massive refactoring? First solution involves handy library called Awaitility. This library is not a silver bullet, it simply evaluates given condition periodically and makes sure it's fulfilled within given time. It's the kind of code you probably wrote once or twice - wrapped in a library with well designed API. So here is our initial approach: import static com.jayway.awaitility.Awaitility.await; import static java.util.concurrent.TimeUnit.SECONDS; //... await().atMost(10, SECONDS).until(recordInserted()); //... private Callable recordInserted() { return new Callable() { @Override public Boolean call() throws Exception { return dataExists(); } }; } I think there is nothing to explain here. dataExists() is simply a boolean method that initially returns false but will eventually return true once the background task (addDataToDB()) is done. In other words we assume that background task introduces some side effect and dataExists() can detect that side effect. BTW I happened to have JDK 8 with Lambda support installed and IntelliJ IDEA gives me this nice tooltip: Suddenly I get this Java 8-compatible alternative suggested: private Callable recordInserted() { return () -> dataExists(); } But there's more: Which transforms my code to: private Callable recordInserted() { return this::dataExists; } this:: prefix means that recordInsterted is a method of current object. Just as well we can say someDao::dataExists. Simply put this syntax turns method into a function object we can pass around (this process is called eta expansion in Scala). By now recordInsterted() method is no longer that needed so I can inline it and remove it completely: await().atMost(10, SECONDS).until(this::dataExists); I am not sure what I love more - the new lambda syntax or how IntelliJ IDEA takes pre-Java 8 code and retrofits it for me automatically (well, it's still a bit experimental, just reported IDEA-106670). I can run this intention in IntelliJ project-wide, Lambda-enabling my whole code base in seconds. Sweet! But back to original problem. Awaitility helps a lot by providing decent API and some handy features. I use it extensively in combination with FluentLenium. But periodically polling for state changes feels a bit like a workaround and still introduces minimal latency. But notice that running and synchronizing on asynchronous tasks is quite common and JDK already provides necessary facilities: Future abstraction! java.util.concurrent.Future To limit the scope of refactoring I will leave the original new Thread() approach for now and use SettableFuture from Guava. It is a Future implementation that allows triggering completion or failure at any time, from any thread (see DeferredResult - asynchronous processing in Spring MVC for more advanced usage). As you can see the changes are quite small: public class ThreadWrapper { public ListenableFuture doWork() { final SettableFuture future = SettableFuture.create(); Thread thread = new Thread() { @Override public void run() { addDataToDB() //... //last instruction future.set(null); } private void addDataToDB() { // Dummy Code... // ... } }; thread.start(); return future; } } doWork() now returns ListenableFuture with lifecycle controlled inside asynchronous task. We use Void but in reality you might want to return some asynchronous result instead. future.set(null) invocation in the end is crucial. It signals that future is fulfilled and all threads waiting for that future will be notified. Once again, in practice you would use e.g. Future and then instead of null we would say future.set(someInteger). Here null is just a placeholder for Void type. How does this help us? Test code can now rely on future completion: final ListenableFuture future = wrapper.doWork(); future.get(10, SECONDS); future.get() blocks until future is done (with timeout), i.e. until we call future.set(...). BTW I use ListenableFuture from Guava but Java 8 introduces equivalent and standard CompletableFuture - I will write about it soon. So, we are getting somewhere. Future is a useful abstraction for waiting and signalling completion of background jobs. But there is also one immense advantage of Future which are not taking, ekhm, advantage from - exception handling and propagation. Future.get() will block until future is complete and return asynchronous result or throw an exception initially thrown from our job. This is really useful for asynchronous tests. Currently if Thread.run() throws an exception it may or may not be logged or visible to us and future will never be completed. With Awaitility it's slightly better - it will timeout without any meaningful reason, which have to be tracked down manually in console/logs. But with minor modification our test is much more verbose: public void run() { try { addDataToDB() //... future.set(null); } catch (Exception e) { future.setException(e); } } If some exception occurs in asynchronous job, it will pop-up and be shown as JUnit/TestNG failure reason. (Listening)ExecutorService That's it. If addDataToDB() throws an exception it will not be lost. Instead our future.get() in test will re-throw that exception for us. Our test won't simply timeout leaving us with no clue what went wrong. Great, but do we really have to create this special SettableFuture instance, can't we just use existing libraries that already give us Future with correct underlying implementation? Of course! By this requires further refactoring: import com.google.common.util.concurrent.ListeningExecutorService; import com.google.common.util.concurrent.MoreExecutors; import java.util.concurrent.Executors; import java.util.concurrent.Future; public class ThreadWrapper { private final ListeningExecutorService executorService = MoreExecutors.listeningDecorator( Executors.newSingleThreadExecutor() ); public ListenableFuture doWork() { Runnable job = new Runnable() { @Override public void run() { //... } }; return executorService.submit(job); } } This is what you've all been waiting for. Don't start new Thread all the time, use thread pool! I actually went one step further by using ListeningExecutorService - an extension to ExecutorService that returns ListenableFuture instances (see why you want that). But the solution doesn't require this, I just spread good practices. As you can see Future instance is now created and managed for us. The test is exactly the same but production code is cleaner and more robust. MoreExecutors.sameThreadExecutor() The final trick I want to show you involves dependency injection. First let's externalize the creation of a thread pool from ThreadWrapper class: private final ListeningExecutorService executorService; public ThreadWrapper() { this(Executors.newSingleThreadExecutor()); } public ThreadWrapper(ExecutorService executorService) { this.executorService = MoreExecutors.listeningDecorator(executorService); } We can now optionally supply custom ExecutorService. This is good for various other reasons, but for us it opens brand new testing opportunity: MoreExecutors.sameThreadExecutor(). This time we modify our test slightly: final ThreadWrapper wrapper = new ThreadWrapper(MoreExecutors.sameThreadExecutor()); wrapper.doWork().get(); See how we pass custom ExecutorService? It's a very special implementation that doesn't really maintain thread pool of any kind. Every time you submit() some task to that "pool" it will be executed in the same thread in a blocking manner. This means that we no longer have asynchronous test, even though the production code wasn't changed that much! wrapper.doWork() will block until "background" job finishes. The extra call to get() is still needed to make sure exceptions are propagated, but is guaranteed to never block (because the job is already done). Using the same thread to execute asynchronous task instead of a thread pool might have an unexpected results if you somehow depend on thread-based properties, e.g. transactions, security, ThreadLocal. However if you use standard ThreadPoolExecutor with CallerRunsPolicy, JDK already behaves this way if thread pool is overflowed. So it's not that unusual. Summary Testing asynchronous code is hard, but you have options. Several options. But one conclusion that strikes me is the side effect of our efforts. We refactored original code in order to make it testable. But the final production code is not only testable, but also much better structured and robust. Surprisingly it's even source-code compatible with previous version as we barely changed return type from void to Future. It seems to be a rule - testable code is often better designed and implemented. Unit test is the first client code using our library. It naturally forces us to to think more about consumers, not the implementation.

May 7, 2013

by Tomasz Nurkiewicz

· 8,995 Views · 1 Like

How to Integrate JavaFX into a NetBeans Platform Wizard (Part 1)

When working within the NetBeans Platform, Swing is King. JavaFX is the crown prince. However, some developers avoid developing GUI controls with JavaFX in the NetBeans Platform because Swing is available by default. Well, it is possible to develop your JavaFX forms and simply replace the default NetBeans panels. The following tutorial explains how a developer can take a JavaFX GUI form and FXML developed using Scene Builder and replace a NetBeans Platform Wizard visual panel with minimal effort. Now, why would this concept be useful? Well, consider a development team where new Java applications are being written in JavaFX. Why rewrite the useful Panel classes to Swing just to use them within a NetBeans Platform Wizard? Why force new form development to be in Swing just to be compatible with a NetBeans Platform application? NetBeans Platform applications are perfectly capable of rendering JavaFX interop'd with Swing. Here's how: First you will need to do a little prep work to setup an application for this tutorial. Do the following: Create the JavaFX GUI. Create a new JavaFX FXML GUI using SceneBuilder. Add the controls you want and generate your FXML file and controller class. Update your Controller Class by Extending JFXPanel. This is part of the Swing Interop pattern that we all know and love. You will also need to @Override the getName() method so that the wizard framework can update the current step title. Encapsulate fields/values. Create public methods that will provide Wizard framework with the fields it needs to pass from panel to panel. This is the same thing you would need to do with a standard Swing Wizard JPanel class. The code for your controller class still runs without a problem within your JavaFX application but is now Swing Interop compatible. The code might look like this: package jfxwizpanel.jfxwiz; import java.io.File; import java.net.URL; import java.util.ResourceBundle; import javafx.embed.swing.JFXPanel; import javafx.event.ActionEvent; import javafx.fxml.FXML; import javafx.fxml.Initializable; import javafx.scene.control.Button; import javafx.scene.control.TextField; import javafx.stage.FileChooser; /** * * @author SPhillips (King of Australia) */ public class WizPanelController extends JFXPanel implements Initializable { @FXML // fx:id="browseButton" private Button browseButton; // Value injected by FXMLLoader @FXML // fx:id="pathText" private TextField pathText; //Field that Path is stored in private String filePath = ""; //some value to pass to the next Wizard panel // Handler for Button[fx:id="browseButton"] onAction public void handleButtonAction(ActionEvent event) { FileChooser fileChooser = new FileChooser(); fileChooser.setTitle("Select File"); //Show open file dialog File file = fileChooser.showOpenDialog(null); if(file!=null) { setFilePath(file.getPath()); pathText.setText(filePath); } } @Override // This method is called by the FXMLLoader when initialization is complete public void initialize(URL fxmlFileLocation, ResourceBundle resources) { assert browseButton != null : "fx:id=\"browseButton\" was not injected: check your FXML file 'WizPanel.fxml'."; assert pathText != null : "fx:id=\"pathText\" was not injected: check your FXML file 'WizPanel.fxml'."; // initialize your logic here: all @FXML variables will have been injected } @Override //This method is used by Wizard Framework to generate list of steps public String getName() { return "FXML JFXPanel"; } /** * @return the filePath */ public String getFilePath() { return filePath; } /** * @param filePath the filePath to set */ public void setFilePath(String filePath) { this.filePath = filePath; } } And when you run this Code within the JavaFX FXML application you get something like the following screenshot: Create the NetBeans Platform Application. Create a new NetBeans Platform application and add a new module. Add a Wizard using the "Wizard" Wizard. Include the JavaFX Runtime. Create a NetBeans library wrapper module to include "jfxrt.jar" and set a dependency on it in the module described above. Copy Controller class and FXML file. As of NetBeans 7.3 you cannot refactor copy these files from your JavaFX FXML project to your NetBeans Platform application package. After manually copying these two files you will need to do a manual replace of the package path in both the Controller class and the fx:controller string in the FXML file. Your FXML code might now look something like this: Replace Swing Panel with FXML Controller. At this point you can replace the autogenerated Swing JPanel class that would normally be loaded by the Wizard control class with your JavaFX FXML controller. Remember we extended JFXPanel and it pays off here. All we have to do now is follow our standard Swing Interop technique. However this time we have to use our Platform.runLater() pattern in the getComponent() method of the Wizard controller class. Below is the relevant code after the update. Notice how little we had to change: public class JfxwizWizardPanel1 implements WizardDescriptor.Panel { /** * The visual component that displays this panel. If you need to access the * component from this class, just use getComponent(). */ //private JfxwizVisualPanel1 component; public WizPanelController component; //Replaces original autogenerated JPanel class // Get the visual component for the panel. In this template, the component // is kept separate. This can be more efficient: if the wizard is created // but never displayed, or not all panels are displayed, it is better to // create only those which really need to be visible. @Override public WizPanelController getComponent() { if (component == null) { component = new WizPanelController(); //return new JFXPanel controller Platform.setImplicitExit(false); Platform.runLater(new Runnable() { @Override public void run() { createScene(); //standard Swing Interop Pattern } }); } return component; } private void createScene() { try { URL location = getClass().getResource("WizPanel.fxml"); //same FXML copied from JavaFX app FXMLLoader fxmlLoader = new FXMLLoader(); fxmlLoader.setLocation(location); fxmlLoader.setBuilderFactory(new JavaFXBuilderFactory()); Parent root = (Parent) fxmlLoader.load(location.openStream()); Scene scene = new Scene(root); component.setScene(scene); component = (WizPanelController) fxmlLoader.getController(); } catch (IOException ex) { Exceptions.printStackTrace(ex); } } At this point, you should be able to resolve any import issues, compile and run. You should see your JavaFX GUI nicely loaded within the Wizard Dialog frame like the screenshot below: Wow that's awesome that you can load JavaFX GUIs into your wizards. But you didn't do anything with the information, so you didn't actually leverage the WizardDescriptor framework.

April 26, 2013

by Sean Phillips

· 22,791 Views

Multipart Upload on S3 with jclouds

1. Goal In the previous article, we looked at how we can use the generic Blob APIs from jclouds to upload content to S3. In this article we will use the S3 specific asynchronous API from jclouds to upload content and leverage the multipart upload functionality provided by S3. 2. Preparation 2.1. Set up the custom API The first part of the upload process is creating the jclouds API – this is a custom API for Amazon S3: public AWSS3AsyncClient s3AsyncClient() { String identity = ... String credentials = ... BlobStoreContext context = ContextBuilder.newBuilder("aws-s3"). credentials(identity, credentials).buildView(BlobStoreContext.class); RestContext providerContext = context.unwrap(); return providerContext.getAsyncApi(); } 2.2. Determining the number of parts for the content Amazon S3 has a 5 MB limit for each part to be uploaded. As such, the first thing we need to do is determine the right number of parts that we can split our content into so that we don’t have parts below this 5 MB limit: public static int getMaximumNumberOfParts(byte[] byteArray) { int numberOfParts= byteArray.length / fiveMB; // 5*1024*1024 if (numberOfParts== 0) { return 1; } return numberOfParts; } 2.3. Breaking the content into parts Were going to break the byte array into a set number of parts: public static List breakByteArrayIntoParts(byte[] byteArray, int maxNumberOfParts) { List parts = Lists. newArrayListWithCapacity(maxNumberOfParts); int fullSize = byteArray.length; long dimensionOfPart = fullSize / maxNumberOfParts; for (int i = 0; i < maxNumberOfParts; i++) { int previousSplitPoint = (int) (dimensionOfPart * i); int splitPoint = (int) (dimensionOfPart * (i + 1)); if (i == (maxNumberOfParts - 1)) { splitPoint = fullSize; } byte[] partBytes = Arrays.copyOfRange(byteArray, previousSplitPoint, splitPoint); parts.add(partBytes); } return parts; } We’re going to test the logic of breaking the byte array into parts – we’re going to generate some bytes, split the byte array, recompose it back together using Guava and verify that we get back the original: @Test public void given16MByteArray_whenFileBytesAreSplitInto3_thenTheSplitIsCorrect() { byte[] byteArray = randomByteData(16); int maximumNumberOfParts = S3Util.getMaximumNumberOfParts(byteArray); List fileParts = S3Util.breakByteArrayIntoParts(byteArray, maximumNumberOfParts); assertThat(fileParts.get(0).length + fileParts.get(1).length + fileParts.get(2).length, equalTo(byteArray.length)); byte[] unmultiplexed = Bytes.concat(fileParts.get(0), fileParts.get(1), fileParts.get(2)); assertThat(byteArray, equalTo(unmultiplexed)); } To generate the data, we simply use the support from Random: byte[] randomByteData(int mb) { byte[] randomBytes = new byte[mb * 1024 * 1024]; new Random().nextBytes(randomBytes); return randomBytes; } 2.4. Creating the Payloads Now that we have determined the correct number of parts for our content and we managed to break the content into parts, we need to generate the Payload objects for the jclouds API: public static List createPayloadsOutOfParts(Iterable fileParts) { List payloads = Lists.newArrayList(); for (byte[] filePart : fileParts) { byte[] partMd5Bytes = Hashing.md5().hashBytes(filePart).asBytes(); Payload partPayload = Payloads.newByteArrayPayload(filePart); partPayload.getContentMetadata().setContentLength((long) filePart.length); partPayload.getContentMetadata().setContentMD5(partMd5Bytes); payloads.add(partPayload); } return payloads; } 3. Upload The upload process is a flexible multi-step process – this means: the upload can be started before having all the data – data can be uploaded as it’s coming in data is uploaded in chunks – if one of these operations fails, it can simply be retrieved chunks can be uploaded in parallel – this can greatly increase the upload speed, especially in the case of large files 3.1. Initiating the Upload operation The first step in the Upload operation is to initiate the process. This request to S3 must contain the standard HTTP headers – the Content-MD5 header in particular needs to be computed. Were going to use the Guava hash function support here: Hashing.md5().hashBytes(byteArray).asBytes(); This is the md5 hash of the entire byte array, not of the parts yet. To initiate the upload, and for all further interactions with S3, we’re going to use the AWSS3AsyncClient – the asynchronous API we created earlier: ObjectMetadata metadata = ObjectMetadataBuilder.create().key(key).contentMD5(md5Bytes).build(); String uploadId = s3AsyncApi.initiateMultipartUpload(container, metadata).get(); The key is the handle assigned to the object – this needs to be a unique identifier specified by the client. Also notice that, even though we’re using the async version of the API, we’re blocking for the result of this operation – this is because we will need the result of the initialize to be able to move forward. The result of the operation is an upload id returned by S3 – this will identify the upload throughout it’s lifecycle and will be present in all subsequent upload operations. 3.2. Uploading the Parts The next step is uploading the parts. Our goal here is to send these requests in parallel, as the upload parts operation represent the bulk of the upload process: List> ongoingOperations = Lists.newArrayList(); for (int partNumber = 0; partNumber < filePartsAsByteArrays.size(); partNumber++) { ListenableFuture future = s3AsyncApi.uploadPart( container, key, partNumber + 1, uploadId, payloads.get(partNumber)); ongoingOperations.add(future); } The part numbers need to be continuous but the order in which the requests are send is not relevant. After all of the upload part requests have been submitted, we need to wait for their responses so that we can collect the individual ETag value of each part: Function, String> getEtagFromOp = new Function, String>() { public String apply(ListenableFuture ongoingOperation) { try { return ongoingOperation.get(); } catch (InterruptedException | ExecutionException e) { throw new IllegalStateException(e); } } }; List etagsOfParts = Lists.transform(ongoingOperations, getEtagFromOp); If, for whatever reason, one of the upload part operations fails, the operation can be retried until it succeeds. The logic above does not contain the retry mechanism, but building it in should be straightforward enough. 3.3. Completing the Upload operation The final step of the upload process is completing the multipart operation. The S3 API requires the responses from the previous parts upload as a Map, which we can now easily create from the list of ETags that we obtained above: Map parts = Maps.newHashMap(); for (int i = 0; i < etagsOfParts.size(); i++) { parts.put(i + 1, etagsOfParts.get(i)); } And finally, send the complete request: s3AsyncApi.completeMultipartUpload(container, key, uploadId, parts).get(); This will return final ETag of the finished object and will complete the entire upload process. 4. Conclusion In this article we built a multipart enabled, fully parallel upload operation to S3, using the custom S3 jclouds API. This operation is ready to be used as is, but it can be improved in a few ways. First, retry logic should be added around the upload operations to better deal with failures. Next, for really large files, even though the mechanism is sending all upload multipart requests in parallel, a throttling mechanism should still limit the number of parallel requests being sent. This is both to avoid bandwidth becoming a bottleneck as well as to make sure Amazon itself doesn’t flag the upload process as exceeding an allowed limit of requests per second – the Guava RateLimiter can potentially be very well suited for this. P.S. You might dig following me on Twitter.

April 21, 2013

by Eugen Paraschiv

· 6,643 Views · 1 Like

Upload on S3 with the jclouds Library

There are several good ways to upload content to an S3 bucket in the Java world – in this article we’ll look at what the jclouds library provides for this purpose. To use jclouds – specifically the APIs discussed in this article, this simple Maven dependency should be added to the pom of the project: org.jclouds jclouds-allblobstore 1.5.9 1. Uploading to Amazon S3 The first step, in order to access any of these APIs, is to create a BlobStoreContext: BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(BlobStoreContext.class); This represents the entry-point to a general key-value storage service, such as Amazon S3 – but not limited to it. For the more specific S3 only implementation, the context can be created similarly: BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(S3BlobStoreContext.class); And even more specifically: BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); When the authenticated context is no longer needed, closing it is required to release all resources – threads and connections – associated to it. 2. The four S3 APIs of jclouds The jclouds library provides four different APIs to upload content to S3 bucket, ranging from simple but inflexible to complex and powerful, all obtained via the BlobStoreContext. Let’s start with the simplest. 2.1. Upload via the Map API The easiest way jclouds can be used to interact with an S3 bucket is by representing that bucket as a Map. The API is obtained from the context: InputStreamMap bucket = context.createInputStreamMap("bucketName"); Then, to upload a simple HTML file: bucket.putString("index1.html", "hello world1"); The InputStreamMap API exposes several other types of PUT operations – files, raw bytes – both for single and bulk. A simple integration test can be used as an example: @Test public void whenFileIsUploadedToS3WithMapApi_thenNoExceptions() { BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); InputStreamMap bucket = context.createInputStreamMap("bucketName"); bucket.putString("index1.html", "hello world1"); context.close(); } 2.2. Upload via BlobMap Using the simple Map API is straightforward but ultimately limited – for example, there is no way to pass in metadata about the content being uploaded. When more flexibility and customization is necessary, this simplified approach to uploading data to S3 via a Map is no longer enough. The next API we’ll look at is the Blob Map API – this is obtained from the context: BlobMap bucket = context.createBlobMap("bucketName"); The API allows the client to access more lower level details, such as Content-Length, Content-Type, Content-Encoding, eTag hash and others; to upload new content in the bucket: Blob blob = bucket.blobBuilder().name("index2.html"). payload("hello world2"). contentType("text/html").calculateMD5().build(); The API also allows setting a variety of payloads on the create request. A simple integration test for uploading a basic HTML file to S3 via the Blob Map API: @Test public void whenFileIsUploadedToS3WithBlobMap_thenNoExceptions() throws IOException { BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); BlobMap bucket = context.createBlobMap("bucketName"); Blob blob = bucket.blobBuilder().name("index2.html"). payload("hello world2"). contentType("text/html").calculateMD5().build(); bucket.put(blob.getMetadata().getName(), blob); context.close(); } 2.3. Upload via BlobStore The previous APIs had no way to upload content using multipart upload – this makes them ill suited when working with large files. This limitation is addressed by the next API we’re going to look at – the synchronous BlobStore API. This is obtained from the context: BlobStore blobStore = context.getBlobStore(); To use the multipart support and upload a file to S3: Blob blob = blobStore.blobBuilder("index3.html"). payload("hello world3").contentType("text/html").build(); blobStore.putBlob("bucketName", blob, PutOptions.Builder.multipart()); The payload builder is the same one that was being used by the BlobMap API, so the same flexibility in specifying lower level metadata information about the blob is available here. The difference is the PutOptions supported by the PUT operation of the API – namely the multipart support. The previous integration test now has multipart enabled: @Test public void whenFileIsUploadedToS3WithBlobStore_thenNoExceptions() { BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); BlobStore blobStore = context.getBlobStore(); Blob blob = blobStore.blobBuilder("index3.html"). payload("hello world3").contentType("text/html").build(); blobStore.putBlob("bucketName", blob, PutOptions.Builder.multipart()); context.close(); } 2.4. Upload via AsyncBlobStore While the previous BlobStore API was synchronous, there is also an asynchronous API for BlobStore – AsyncBlobStore. The API is similarly obtained from the context: AsyncBlobStore blobStore = context.getAsyncBlobStore(); The only difference between the two is that the async API is returning ListenableFuture for the PUT asynchronous operation: Blob blob = blobStore.blobBuilder("index4.html"). .payload("hello world4").build(); blobStore.putBlob("bucketName", blob).get(); The integration test displaying this operation is similar to the synchronous one: @Test public void whenFileIsUploadedToS3WithBlobStore_thenNoExceptions() { BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); BlobStore blobStore = context.getBlobStore(); Blob blob = blobStore.blobBuilder("index4.html"). payload("hello world4").contentType("text/html").build(); Future putOp = blobStore.putBlob("bucketName", blob, PutOptions.Builder.multipart()); putOp.get(); context.close(); } 3. Conclusion In this article, we analysed the four APIs that the jclouds library provides to upload content to Amazon S3. These four APIs are generic and they work with other key-value storage services as well – such as Microsoft Azure Storage for example. In the next article we’ll look at the Amazon specific S3 API available in jclouds – the AWSS3Client. We’ll implement the operation of uploading a large file, dynamically calculate the optimal number of parts for any given file, and perform the upload of all parts in parallel. P.S. You might dig following me on Twitter.

April 18, 2013

by Eugen Paraschiv

· 8,916 Views · 1 Like

HotSpot GC Thread CPU footprint on Linux

The following question will test your knowledge on garbage collection and high CPU troubleshooting for Java applications running on Linux OS. This troubleshooting technique is especially crucial when investigating excessive GC and / or CPU utilization. It will assume that you do not have access to advanced monitoring tools such as Compuware dynaTrace or even JVisualVM. Future tutorials using such tools will be presented in the future but please ensure that you first master the base troubleshooting principles. Question: How can you monitor and calculate how much CPU % each of the Oracle HotSpot or JRockit JVM garbage collection (GC) threads is using at runtime on Linux OS? Answer: On the Linux OS, Java threads are implemented as native Threads, which results in each thread being a separate Linux process. This means that you are able to monitor the CPU % of any Java thread created by the HotSpot JVM using the top –H command (Threads toggle view). That said, depending of the GC policy that you are using and your server specifications, the HotSpot & JRockit JVM will create a certain number of GC threads that will be performing young and old space collections. Such threads can be easily identified by generating a JVM thread dump. As you can see below in our example, the Oracle JRockit JVM did create 4 GC threads identified as "(GC Worker Thread X)”. ===== FULL THREAD DUMP =============== Fri Nov 16 19:58:36 2012 BEA JRockit(R) R27.5.0-110-94909-1.5.0_14-20080204-1558-linux-ia32 "Main Thread" id=1 idx=0x4 tid=14911 prio=5 alive, in native, waiting -- Waiting for notification on: weblogic/t3/srvr/T3Srvr@0xfd0a4b0[fat lock] at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method) at java/lang/Object.wait(J)V(Native Method) at java/lang/Object.wait(Object.java:474) at weblogic/t3/srvr/T3Srvr.waitForDeath(T3Srvr.java:730) ^-- Lock released while waiting: weblogic/t3/srvr/T3Srvr@0xfd0a4b0[fat lock] at weblogic/t3/srvr/T3Srvr.run(T3Srvr.java:380) at weblogic/Server.main(Server.java:67) at jrockit/vm/RNI.c2java(IIIII)V(Native Method) -- end of trace "(Signal Handler)" id=2 idx=0x8 tid=14920 prio=5 alive, in native, daemon "(GC Main Thread)" id=3 idx=0xc tid=14921 prio=5 alive, in native, native_waiting, daemon "(GC Worker Thread 1)" id=? idx=0x10 tid=14922 prio=5 alive, in native, daemon "(GC Worker Thread 2)" id=? idx=0x14 tid=14923 prio=5 alive, in native, daemon "(GC Worker Thread 3)" id=? idx=0x18 tid=14924 prio=5 alive, in native, daemon "(GC Worker Thread 4)" id=? idx=0x1c tid=14925 prio=5 alive, in native, daemon ……………………… Now let’s put all of these principles together via a simple example. Step #1 - Monitor the GC thread CPU utilization The first step of the investigation is to monitor and determine: Identify the native Thread ID for each GC worker thread shown via the Linux top –H command. Identify the CPU % for each GC worker thread. Step #2 – Generate and analyze JVM Thread Dumps At the same time of Linux top –H, generate 2 or 3 JVM Thread Dump snapshots via kill -3 . Open the JVM Thread Dump and locate the JVM GC worker threads. Now correlate the "top -H" output data with the JVM Thread Dump data by looking at the native thread id (tid attribute). As you can see in our example, such analysis did allow us to determine that all our GC worker threads were using around 20% CPU each. This was due to major collections happening at that time. Please note that it is also very useful to enable verbose:gc as it will allow you to correlate such CPU spikes with minor and major collections and determine how much your JVM GC process is contributing to the overall server CPU utilization.

April 17, 2013

by Pierre - Hugues Charbonneau

· 14,603 Views

Capture a Signature on iOS

Originally authored by Jason Harwig The Square Engineering Blog has a great article on Smoother Signatures for Android, but I didn't find anything specifically about iOS. So, what is the best way to capture a users signature on an iOS device? Although I didn't find any articles on signature capture, there are good implementations on the App Store. My target user experience was the iPad application Paper by 53, a drawing application with beautiful and responsive brushes. All code is available in the Github repository: SignatureDemo. Connecting the Dots The simplest approach is to capture the touches and connect them with straight lines. In the initializer of a UIView subclass, create the path and gesture recognizer to capture touch events. // Create a path to connect lines path = [UIBezierPath bezierPath]; // Capture touches UIPanGestureRecognizer *pan = [[UIPanGestureRecognizer alloc] initWithTarget:self action:@selector(pan:)]; pan.maximumNumberOfTouches = pan.minimumNumberOfTouches = 1; [self addGestureRecognizer:pan]; Capture the pan events into a bézier path by connecting the points with lines. - (void)pan:(UIPanGestureRecognizer *)pan { CGPoint currentPoint = [pan locationInView:self]; if (pan.state == UIGestureRecognizerStateBegan) { [path moveToPoint:currentPoint]; } else if (pan.state == UIGestureRecognizerStateChanged) [path addLineToPoint:currentPoint]; [self setNeedsDisplay]; } Stroke the path - (void)drawRect:(CGRect)rect { [[UIColor blackColor] setStroke]; [path stroke]; } An example "J" character rendered using this technique reveals some issues. At slow velocities iOS captures enough touch resolution that the lines aren't noticeable, but faster movement shows large gaps between touches that accentuates the lines. The 2012 Apple Developer Conference included a session Building Advanced Gesture Recognizers that addresses this issue using math. Quadratic Bézier Curves Instead of connected lines between the touch points, quadratic bézier curves connect the points using the technique discussed in the aforementioned WWDC session (Seek to 42:15.) Connect the touch points with a quadratic curve using the touch points as the control points and the mid points as start and end. Adding quadratic curves to the previous code requires the storing the previous touch point, so add an instance variable for that. CGPoint previousPoint; Create a function to calculate the midpoint of two points. static CGPoint midpoint(CGPoint p0, CGPoint p1) { return (CGPoint) { (p0.x + p1.x) / 2.0, (p0.y + p1.y) / 2.0 }; } Update the pan gesture handler to add quadratic curves instead of straight lines - (void)pan:(UIPanGestureRecognizer *)pan { CGPoint currentPoint = [pan locationInView:self]; CGPoint midPoint = midpoint(previousPoint, currentPoint); if (pan.state == UIGestureRecognizerStateBegan) { [path moveToPoint:currentPoint]; } else if (pan.state == UIGestureRecognizerStateChanged) { [path addQuadCurveToPoint:midPoint controlPoint:previousPoint]; } previousPoint = currentPoint; [self setNeedsDisplay]; } Not much code and already we see a big difference. The touch points are no longer visible, but it looks a little bland when drawing a signature. Every curve is the same width, which doesn't match the physics of a real pen. Variable Stroke Width The width can be varied based on the touch velocity to create a more natural stroke. The UIPanGestureRecognizer already includes a method called velocityInView: that returns the current touch velocity as a CGPoint. To render a stroke of varying width, I switched to OpenGL ES and a technique called tesselation to convert the stroke into triangles – specifically, triangle strips (OpenGL has support for drawing lines, but iOS doesn't support variable line widths with smoothing.) The quadratic points along a curve also need to be calculated, but is beyond the scope of this article. Check the source on github for details. Given two points, a perpendicular vector is calculated and its magnitude set to half the current thickness. Given the nature of GL_TRIANGLE_STRIP only two points are needed to create the next rectangle segment with two triangles. Here is an example of the final output using quadratic bézier curves, and velocity based stroke thickness creating a visually appealing and natural signature.

April 8, 2013

by Scott Leberknight

· 20,896 Views

Configuring Apache SolrCloud on Amazon VPC

We are going to construct an Apache SolrCloud (4.1) with 12 node EC2 instance(s) inside Amazon VPC in this post. Since the search data stored inside the SolrCloud is critical, we are going to build High availability at Solr Node level as well as AZ level. This setup will be done inside private subnet of Amazon VPC and will leverage 3 Availability Zones of the Amazon EC2 Region. Deployment architecture of the setup is given below: A small brief about setup: 3 Zookeepers will be deployed on 3 Availability Zones. ZK EC2 instances will be deployed on the Private subnet of the Amazon VPC. 3 Solr Shard EC2 instances will be deployed on Private subnet of Availability Zone 1 inside Amazon VPC. 3 Solr Replica EC2 instances will be deployed on Private subnet of Availability Zone 2 inside Amazon VPC. 3 Solr Replica EC2 instances will be deployed on Private subnet of Availability Zone 3 inside Amazon VPC. EBS optimized + PIOPS EC2 instances can be used for Solr EC2 Nodes To know more about SolrCloud Deployment best practices on Amazon VPC, Refer article: http://harish11g.blogspot.in/2013/03/Apache-Solr-cloud-on-Amazon-EC2-AWS-VPC-implementation-deployment.html Step 1: Creating Virtual Private Cloud on AWS Create a VPC with Public and Private Subnets. Assume the Load balancer and Web/App Servers can reside on the public subnet and Apache Solr Cloud will reside on the private subnet of the VPC. Step 2: Assigning the IP for the Subnets Create the subnet with its IP range. Chose the Availability zone for this subnet. Step 3: Multiple Subnets on Multiple AZ’s Create multiple subnets in Multiple AZ for building a Highly available setup for SolCloud Step 4: Install Java for Zookeeper & Solr Amazon Linux is chosen as the EC2 OS variant. Execute the following instructions on the respective EC2 nodes after their launch. EC2 instances should be launched in Multi-AZ in Multiple VPC Private Subnets. Solr uses Zookeeper as the cluster configuration and coordinator. Zookeeper is a distributed file system containing information about all the Solr Nodes. Solrconfig.xml, Schema.xml etc are stored in the repository.We have used Oracle-Sun Java over OpenJDK “sudo -s” “cd /opt” “wget --no-cookies --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2Ftechnetwork%2Fjava%2Fjavase%2Fdownloads%2Fjdk-7u3-download-1501626.html;" http://download.oracle.com/otn-pub/java/jdk/7u13-b20/jdk-7u13-linux-x64.rpm” “mv jdk-7u10-linux-x64.rpm?AuthParam=1357217677_76ec3d8d9a3644f4b9ec1ea79e1fcf33 jdk-7u10-linux-x64.rpm jdk-7u10-linux-x64.rpm” “sudo rpm -ivh jdk-7u10-linux-x64.rpm” “alternatives --install /usr/bin/java java /usr/java/jdk1.7.0_10/jre/bin/java 20000” “alternatives --install /usr/bin/javaws javaws /usr/java/jdk1.7.0_10/jre/bin/javaws 20000” “alternatives --install /usr/bin/javac javac /usr/java/jdk1.7.0_10/bin/javac 20000” “alternatives --install /usr/bin/jar jar /usr/java/jdk1.7.0_10/bin/jar 20000” “alternatives --install /usr/bin/java java /usr/java/jre1.7.0_10/bin/java 20000” “alternatives --install /usr/bin/javaws javaws /usr/java/jre1.7.0_10/bin/javaws 20000” “alternatives --configure java” Add JAVA_HOME in .bash_profile: “vim ~/.bash_profile” export JAVA_HOME="/usr/java/jdk1.7.0_09" export PATH=$PATH:$JAVA_HOME/bin Restart the instance. “init 6” Check the version of Java installed using “java -version” command Step 5: Configure the ZooKeeper (v3.4.5) Ensemble: Since single Zookeeper is not ideal for a large Solr cluster (because of SPOF), it is recommended to configure multiple Zookeepers in concert as an ensemble .In this step we will install and configure 3 ZooKeeper EC2 nodes spanning across 3 different Availability Zones in respective Private Subnets inside a VPC.Zookeeper will be configured on Amazon Linux. “sudo yum update” “sudo -s” “ cd /opt” “wget http://apache.techartifact.com/mirror/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz” “tar -xzvf zookeeper-3.4.5.tar.gz” “rm zookeeper-3.4.5.tar.gz” “cd zookeeper-3.4.5” “cp conf/zoo_sample.cfg conf/zoo.cfg” Add the following lines in zoo.cfg “vim conf/zoo.cfg” dataDir=/data server.1=[zk-server01-ip]:2888:3888 server.2=[zk-server02-ip]:2888:3888 server.3=[zk-server03-ip]:2888:3888 “cd /opt/zookeeper/data” “vim myid” 1 or 2 or 3 respectively on each ZooKeeper EC2 instances in Multi-AZ #Starting ZooKeeper Program. “bin/zkServer.sh start” Follow the above steps in all the ZooKeeper servers. ReferClustered (Multi-Server) SetupandConfiguration Parameters for understandingquorum_port,leader_election_port and the filemyid. Every ZooKeeper node needs to know about every other ZK EC2 node in the ensemble, and a majority of EC2’s (called a Quorum) are needed to provide the service. Make sure the VPC IP of all the Zookeepers are given in every ZK node, like the one in following command. server.1=:: server.2=:: server.3=:: Step 6: Configuring Solr 4.1 EC2 node In this step we will install and configure 3 Apache Solr4.1 Shard EC2 instances in a single Amazon AZ and 2 Solr Replicas in another AZ in their respective Private subnets. Please note that we have to specify all the ZooKeeper (ZK) hosts on every Solr instance as below. Note: Solr gets comes with jetty in default, it is suggested to use tomcat for production nodes. Perform the following after launching EC2 instances in Multi-AZ in Multiple VPC Private Subnets. “sudo -s” “yum update” “cd /opt” “wget http://apache.techartifact.com/mirror/lucene/solr/4.1.0/apache-solr-4.1.0.tgz” “tar -xzvf apache-solr-4.1.0.tgz” “rm -f apache-solr-4.1.0.tgz” On Solr Shard/Replica Instances: “cd /opt/apache-solr-4.0.0/example/” “vim /opt/apache-solr-4.0.0/example/solr/collection1/conf/solrconfig.xml” Change /var/data/solr to /data Starting Solr4.1 Shard/Replica Java Program. “java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=SolrCloud4.1-Conf -DnumShards=3 -DzkHost=[zk-server01-ip]:2181,[zk-server02-ip]:2181,[zk-server03-ip]:2181 -jar start.jar “java -DzkHost= DzkHost=:,:,: -jar start.jar” -DnumShards: the number of shards that will be present. Note that once set, this number cannot be increased or decreased without re-indexing the entire data set. (Dynamically changing the number of shards is part of the Solr roadmap!) -DzkHost: a comma-separated list of ZooKeeper servers. -Dbootstrap_confdir, -Dcollection.configName: these parameters are specified only when starting up the first Solr instance. This will enable the transfer of configuration files to ZooKeeper. Subsequent Solr instances need to just point to the ZooKeeper ensemble. The above command with –DnumShards=3 specifies that it is a 3-shard cluster. The first Solr EC2 node automatically becomes shard1 and the second Solr EC2 node automatically becomes shard2 …. What happens when we launch fourth Solr instance in this cluster? Since it’s a 3-shard cluster, the fourth Solr EC2 node automatically becomes a replica of shard1 and the fifth Solr EC2 node becomes a replica of shard2. Step 7: AWS Security Group TCP Ports to be enabled: Configure the following TCP ports on the AWS security group to allow access between Solr and ZK nodes deployed in Multiple AZ. Solr Shards/Replicas will connect to ZK through TCP Port 2181 Solr Web Interface with Jetty container through TCP Port 8983 Solr Web Interface with Tomcat container through TCP Port 8080 Every instance that is part of the ZooKeeper ensemble should know about every other machine in the ensemble. We can accomplish this with the series of lines of the form server.id=host:port:port For example, server.1=[vpc-ip]:2888:3888 server.2=[vpc-ip]:2888:3888 server.3=[vpc-ip]:2888:3888 TCP Ports 2888, 3888 should be opened for ZK Ensemble.

April 5, 2013

by Harish Ganesan

· 7,830 Views

AWS VPC NAT Instance Failover and High Availability

Amazon Virtual Private Cloud (VPC) is a great way to setup an isolated portion of AWS and control the network topology. It is a great way to extend your data center and use AWS for burst requirements. With the latest VPC for Everyone announcement, what was earlier "Classic" and "VPC" in AWS will soon be only VPC. That is, every deployment in AWS will be on a VPC even though one might not need all the additional features that VPC provides. One might eventually start looking at utilizing VPC features such as multiple Subnets, Network isolation, Network ACLs, etc.. Those who have already worked with VPC's understand the role of NAT Instance in a VPC. When you create a VPC, you create them with multiple Subnets (Public and Private). Instances launched in the Public Subnet have direct internet connectivity to send and receive internet traffic through the internet gateway of the VPC. Typically, internet facing servers such as web servers are kept in the Public Subnet. A Private Subnet can be used to launch Instances that do not require direct access from the internet. Instances in a Private Subnet can access the Internet without exposing their private IP address by routing their traffic through a Network Address Translation (NAT) instance in the Public Subnet. AWS provides an AMI that can be launched as a NAT Instance. Following diagram is the representation of a standard VPC that gets provisioned through the AWS Management Console wizard. Standard Private and Public Subnets in a VPC The above architecture has A Public Subnet that has direct internet connectivity through the Internet Gateway. Web Instances can be placed within the Public Subnet The custom Route Table associated with Public Subnet will have the necessary routing information to route traffic to the Internet Gateway A NAT Instance is also provisioned in the Public Subnet A Private Subnet that has outbound internet connectivity through the NAT Instance in the Public Subnet The Main Route Table is by default associated with the Private Subnet. This will have necessary routing information to route internet traffic to the NAT Instance Instances in the Private Subnet will use the NAT Instance for outbound internet connectivity. For example, DB backups from standby that needs to be stored in S3. Background programs that make external web services calls Of course, the above architecture has limited High Availability since all the Subnets are created within the same Availability Zone. We can avoid this by creating multiple Subnets in multiple Availability Zones. Public and Private Subnets with multiple Availability Zones Additional Subnets (Public and Private) are created in one another Availability Zone Both Private Subnets are attached to the Main Routing Table Both Public Subnets are attached to the same Custom Routing Table Instances in the Private Subnet still continue to use the NAT Instance for outbound internet connectivity Though we increased the High Availability by utilizing multiple Availability Zones, the NAT Instance is still a Single Point of Failure. NAT Instance is just another EC2 Instance that can become unavailable any time. The updated architecture below uses two NAT Instances to provide failover and High Availability for the NAT Instances NAT Instance High Availability Each Subnet is associated with its own Route Table NAT1 is provisioned in Public Subnet 1 NAT2 is provisioned in Public Subnet 2 Private Subnet 1's Route Table (RT) has routing entry to NAT1 for internet traffic Private Subnet 2's Route Table (RT) has routing entry to NAT2 for internet traffic NAT Instance HA Illustration A script can be installed on both the NAT Instances to monitor each other and swap the routing table association if one of them fails. For example, if NAT1 detects that NAT2 is not responding to its ping requests, it can change the Route Table of Private Subnet 2 to NAT1 for internet traffic. Once NAT2 becomes operational again, a reverse swapping can happen. AWS has a pretty good documentation on this and a sample script for the swapping. Apart from HA, the above architecture also provides better overall throughput, since during normal conditions, both NAT Instances can be used to drive the outbound internet requirements of the VPC. If there are workloads that requires a lot of outbound internet connectivity, having more than one NAT Instance would make sense. Of course, you are still limited with one NAT Instance per Subnet.

March 28, 2013

by Raghuraman Balachandran

· 18,824 Views