Databases Resources

The Latest Databases Topics

avro has a little-known gem of a feature which allows you to control which fields in an avro record are used for partitioning , sorting and grouping in mapreduce. the following figure gives a quick refresher as to what these terms mean. oh, and don’t take the placement of the “sorting” literally - sorting actually occurs on both the map and reduce side - but it’s always performed in the context of a specific partition (i.e. for a specific reducer). by default all the fields in an avro map output key are used for partitioning, sorting and grouping in mapreduce. let’s walk through an example and see how this works. you’ll begin with a simple schema github source : {"type": "record", "name": "com.alexholmes.avro.weathernoignore", "doc": "a weather reading.", "fields": [ {"name": "station", "type": "string"}, {"name": "time", "type": "long"}, {"name": "temp", "type": "int"}, {"name": "counter", "type": "int", "default": 0} ] } we’re going to see what happens when we run this code against a small sample data set, which we’ll generate using avro code github source : file input = tmpfolder.newfile("input.txt"); avrofiles.createfile(input, weathernoignore.schema$, arrays.aslist( weathernoignore.newbuilder().setstation("sfo").settime(1).settemp(3).build(), weathernoignore.newbuilder().setstation("iad").settime(1).settemp(1).build(), weathernoignore.newbuilder().setstation("sfo").settime(2).settemp(1).build(), weathernoignore.newbuilder().setstation("sfo").settime(1).settemp(2).build(), weathernoignore.newbuilder().setstation("sfo").settime(1).settemp(1).build() ).toarray()); to understand how avro is partitioning, sorting and grouping the data, we’ll write an identity mapper and reducer, with a small enhancement to the reducer to increment the counter field for each record we see in an individual reducer instance github source : package com.alexholmes.avro.sort.basic; import com.alexholmes.avro.weathernoignore; import org.apache.avro.mapred.avrokey; import org.apache.avro.mapred.avrovalue; import org.apache.avro.mapreduce.avrojob; import org.apache.avro.mapreduce.avrokeyinputformat; import org.apache.avro.mapreduce.avrokeyoutputformat; import org.apache.hadoop.fs.path; import org.apache.hadoop.io.nullwritable; import org.apache.hadoop.mapreduce.job; import org.apache.hadoop.mapreduce.mapper; import org.apache.hadoop.mapreduce.reducer; import org.apache.hadoop.mapreduce.lib.input.fileinputformat; import org.apache.hadoop.mapreduce.lib.output.fileoutputformat; import java.io.ioexception; public class avrosort { private static class sortmapper extends mapper, nullwritable, avrokey, avrovalue> { @override protected void map(avrokey key, nullwritable value, context context) throws ioexception, interruptedexception { context.write(key, new avrovalue(key.datum())); } } private static class sortreducer extends reducer, avrovalue, avrokey, nullwritable> { @override protected void reduce(avrokey key, iterable> values, context context) throws ioexception, interruptedexception { int counter = 1; for (avrovalue weathernoignore : values) { weathernoignore.datum().setcounter(counter++); context.write(new avrokey(weathernoignore.datum()), nullwritable.get()); } } } public boolean runmapreduce(final job job, path inputpath, path outputpath) throws exception { fileinputformat.setinputpaths(job, inputpath); job.setinputformatclass(avrokeyinputformat.class); avrojob.setinputkeyschema(job, weathernoignore.schema$); job.setmapperclass(sortmapper.class); avrojob.setmapoutputkeyschema(job, weathernoignore.schema$); avrojob.setmapoutputvalueschema(job, weathernoignore.schema$); job.setreducerclass(sortreducer.class); avrojob.setoutputkeyschema(job, weathernoignore.schema$); job.setoutputformatclass(avrokeyoutputformat.class); fileoutputformat.setoutputpath(job, outputpath); return job.waitforcompletion(true); } } if you look at the output of the job below, you’ll see that the output is sorted across all the fields, and that the sorting is in field ordinal order. what this means is that when mapreduce is sorting these records, it compares the station field first, then the time field second, and so on according to the ordering of the fields in the avro schema. this is pretty much what you’d expect if you write your own complex writable type, and your comparator compared all the fields in order. {"station": "iad", "time": 1, "temp": 1, "counter": 1} {"station": "sfo", "time": 1, "temp": 1, "counter": 1} {"station": "sfo", "time": 1, "temp": 2, "counter": 1} {"station": "sfo", "time": 1, "temp": 3, "counter": 1} {"station": "sfo", "time": 2, "temp": 1, "counter": 1} oh, and before we move on notice that the value for the counter field is always 1 , meaning that each reducer was only fed a single key/vaue pair, which makes sense since our identity mapper only emitted a single value for each key, the keys are unique, and the mapreduce partitioner, sorter and grouper were using all the fields in the record. excluding fields for sorting avro gives us the ability to indicate that specific fields should be ignored when performing ordering functions. in mapreduce these fields are ignored for sorting/partitioning and grouping in mapreduce, which basically means that we have the ability to perform secondary sorting. let’s examine the following schema github source : {"type": "record", "name": "com.alexholmes.avro.weather", "doc": "a weather reading.", "fields": [ {"name": "station", "type": "string"}, {"name": "time", "type": "long"}, {"name": "temp", "type": "int", "order": "ignore"}, {"name": "counter", "type": "int", "order": "ignore", "default": 0} ] } it’s pretty much identical to the first schema, the only difference being that the last two fields are flagged as being “ignored” for sorting/partitioning/grouping. let’s run the same (other than modified to work with the different schema) mapreduce code github source as above against this new schema and examine the outputs. {"station": "iad", "time": 1, "temp": 1, "counter": 1} {"station": "sfo", "time": 1, "temp": 3, "counter": 1} {"station": "sfo", "time": 1, "temp": 2, "counter": 2} {"station": "sfo", "time": 1, "temp": 1, "counter": 3} {"station": "sfo", "time": 2, "temp": 1, "counter": 1} there are a couple of notable differences between this output, and the output from the previous schema which didn’t have any ignored fields. first, it’s clear that the temp field isn’t being used in the sorting, which makes sense since we specified that it should be ignored in the schema. however, more interestingly, note the value of the counter field. all records that had identical station and time values went to the same reducer invocation, evidenced by the increasing value of counter . this is essentially secondary sort! now, all of this greatness isn’t without some limitations: you can’t support two mapreduce jobs that use the same avro key, but have different sorting/partitioning/grouping requirements. although it’s conceivable that you could create a new instance of the avro schema and set the ignored flags for these fields yourself. the partitioner, sorter and grouping functions in mapreduce all work off of the same fields (i.e. they all ignore fields that set as ignored in the schema). this means that your options for secondary sorting are limited. for example, you wouldn’t be able to partition all stations to the same reducer, and then group by station and time. ordering uses a field’s ordinal position to determine its order within the overall set of fields to be ordered. in other words, in a two-field record, the first field is always compared before the second. there’s no way to change this behavior other than flipping the order of the fields in the record. having said all of that - the “ignoring fields” feature for sorting is pretty awesome, and something that will no doubt come in handy in my future mapreduce work.

May 29, 2013

by Alex Holmes

· 8,148 Views

Amazon S3 Parallel MultiPart File Upload

In this blog post, I will present a simple tutorial on uploading a large file to Amazon S3 as fast as the network supports. Amazon S3 is clustered storage service of Amazon. It is designed to make web-scale computing easier. Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, secure, fast, inexpensive infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize benefits of scale and to pass those benefits on to developers. For using Amazon services, you'll need your AWS access key identifiers, which AWS assigned you when you created your AWS account. The following are the AWS access key identifiers: Access Key ID (a 20-character, alphanumeric sequence) For example: 022QF06E7MXBSH9DHM02 Secret Access Key (a 40-character sequence) For example: kWcrlUX5JEDGM/LtmEENI/aVmYvHNif5zB+d9+ct Caution Your Secret Access Key is a secret, which only you and AWS should know. It is important to keep it confidential to protect your account. Store it securely in a safe place. Never include it in your requests to AWS, and never e-mail it to anyone. Do not share it outside your organization, even if an inquiry appears to come from AWS or Amazon.com. No one who legitimately represents Amazon will ever ask you for your Secret Access Key. The Access Key ID is associated with your AWS account. You include it in AWS service requests to identify yourself as the sender of the request. The Access Key ID is not a secret, and anyone could use your Access Key ID in requests to AWS. To provide proof that you truly are the sender of the request, you also include a digital signature calculated using your Secret Access Key. The sample code handles this for you. Your Access Key ID and Secret Access Key are displayed to you when you create your AWS account. They are not e-mailed to you. If you need to see them again, you can view them at any time from your AWS account. To get your AWS access key identifiers Go to the Amazon Web Services web site at http://aws.amazon.com. Point to Your Account and click Security Credentials. Log in to your AWS account. The Security Credentials page is displayed. Your Access Key ID is displayed in the Access Identifiers section of the page. To display your Secret Access Key, click Show in the Secret Access Key column. You can use your Amazon keys from a properties file in your application. Here is a sample for properties file containing Amazon keys: # Fill in your AWS Access Key ID and Secret Access Key # http://aws.amazon.com/security-credentials accessKey = secretKey = Here is sample AmazonUtil class for getting AWS Credentials from properties file. public class AmazonUtil { private static final Logger logger = LogUtil.getLogger(); private static final String AWS_CREDENTIALS_CONFIG_FILE_PATH = ConfigUtil.CONFIG_DIRECTORY_PATH + File.separator + "aws-credentials.properties"; private static AWSCredentials awsCredentials; static { init(); } private AmazonUtil() { } private static void init() { try { awsCredentials = new PropertiesCredentials(IOUtil.getResourceAsStream(AWS_CREDENTIALS_CONFIG_FILE_PATH)); } catch (IOException e) { logger.error("Unable to initialize AWS Credentials from " + AWS_CREDENTIALS_CONFIG_FILE_PATH); } } public static AWSCredentials getAwsCredentials() { return awsCredentials; } } Amazon S3 has Multipart Upload service which allows faster, more flexible uploads into Amazon S3. Multipart Upload allows you to upload a single object as a set of parts. After all parts of your object are uploaded, Amazon S3 then presents the data as a single object. With this feature you can create parallel uploads, pause and resume an object upload, and begin uploads before you know the total object size. For more information on Multipart Upload, review the Amazon S3 Developer Guide In this tutorial, my sample application uploads each file parts to Amazon S3 with different threads for using network throughput as possible as much. Each file part is associated with a thread and each thread uploads its associated part with Amazon S3 API. Figure 1. Amazon S3 Parallel Multi-Part File Upload Mechanism Amazon S3 API suppots MultiPart File Upload in this way: 1. Send a MultipartUploadRequest to Amazon. 2. Get a response containing a unique id for this upload operation. 3. For i in ${partCount} 3.1. Calculate size and offset of split-i in whole file. 3.2. Build a UploadPartRequest with file offset, size of current split and unique upload id. 3.3. Give this request to a thread and starts upload by running thread. 3.3.1. Send associated UploadPartRequest to Amazon. 3.3.2. Get response after successful upload and save ETag property of response. 4. Wait all threads to terminate 5. Get ETags (ETag is an identifier for successfully completed uploads) of all terminated threads. 6. Send a CompleteMultipartUploadRequest to Amazon with unique upload id and all ETags. So Amazon joins all file parts as target objects. Here is implementation: public class AmazonS3Util { private static final Logger logger = LogUtil.getLogger(); public static final long DEFAULT_FILE_PART_SIZE = 5 * 1024 * 1024; // 5MB public static long FILE_PART_SIZE = DEFAULT_FILE_PART_SIZE; private static AmazonS3 s3Client; private static TransferManager transferManager; static { init(); } private AmazonS3Util() { } private static void init() { // ... s3Client = new AmazonS3Client(AmazonUtil.getAwsCredentials()); transferManager = new TransferManager(AmazonUtil.getAwsCredentials()); } // ... public static void putObjectAsMultiPart(String bucketName, File file) { putObjectAsMultiPart(bucketName, file, FILE_PART_SIZE); } public static void putObjectAsMultiPart(String bucketName, File file, long partSize) { List partETags = new ArrayList(); List uploaders = new ArrayList(); // Step 1: Initialize. InitiateMultipartUploadRequest initRequest = new InitiateMultipartUploadRequest(bucketName, file.getName()); InitiateMultipartUploadResult initResponse = s3Client.initiateMultipartUpload(initRequest); long contentLength = file.length(); try { // Step 2: Upload parts. long filePosition = 0; for (int i = 1; filePosition < contentLength; i++) { // Last part can be less than part size. Adjust part size. partSize = Math.min(partSize, (contentLength - filePosition)); // Create request to upload a part. UploadPartRequest uploadRequest = new UploadPartRequest(). withBucketName(bucketName).withKey(file.getName()). withUploadId(initResponse.getUploadId()).withPartNumber(i). withFileOffset(filePosition). withFile(file). withPartSize(partSize); uploadRequest.setProgressListener(new UploadProgressListener(file, i, partSize)); // Upload part and add response to our list. MultiPartFileUploader uploader = new MultiPartFileUploader(uploadRequest); uploaders.add(uploader); uploader.upload(); filePosition += partSize; } for (MultiPartFileUploader uploader : uploaders) { uploader.join(); partETags.add(uploader.getPartETag()); } // Step 3: complete. CompleteMultipartUploadRequest compRequest = new CompleteMultipartUploadRequest(bucketName, file.getName(), initResponse.getUploadId(), partETags); s3Client.completeMultipartUpload(compRequest); } catch (Throwable t) { logger.error("Unable to put object as multipart to Amazon S3 for file " + file.getName(), t); s3Client.abortMultipartUpload( new AbortMultipartUploadRequest( bucketName, file.getName(), initResponse.getUploadId())); } } // ... private static class UploadProgressListener implements ProgressListener { File file; int partNo; long partLength; UploadProgressListener(File file) { this.file = file; } @SuppressWarnings("unused") UploadProgressListener(File file, int partNo) { this(file, partNo, 0); } UploadProgressListener(File file, int partNo, long partLength) { this.file = file; this.partNo = partNo; this.partLength = partLength; } @Override public void progressChanged(ProgressEvent progressEvent) { switch (progressEvent.getEventCode()) { case ProgressEvent.STARTED_EVENT_CODE: logger.info("Upload started for file " + "\"" + file.getName() + "\""); break; case ProgressEvent.COMPLETED_EVENT_CODE: logger.info("Upload completed for file " + "\"" + file.getName() + "\"" + ", " + file.length() + " bytes data has been transferred"); break; case ProgressEvent.FAILED_EVENT_CODE: logger.info("Upload failed for file " + "\"" + file.getName() + "\"" + ", " + progressEvent.getBytesTransfered() + " bytes data has been transferred"); break; case ProgressEvent.CANCELED_EVENT_CODE: logger.info("Upload cancelled for file " + "\"" + file.getName() + "\"" + ", " + progressEvent.getBytesTransfered() + " bytes data has been transferred"); break; case ProgressEvent.PART_STARTED_EVENT_CODE: logger.info("Upload started at " + partNo + ". part for file " + "\"" + file.getName() + "\""); break; case ProgressEvent.PART_COMPLETED_EVENT_CODE: logger.info("Upload completed at " + partNo + ". part for file " + "\"" + file.getName() + "\"" + ", " + (partLength > 0 ? partLength : progressEvent.getBytesTransfered()) + " bytes data has been transferred"); break; case ProgressEvent.PART_FAILED_EVENT_CODE: logger.info("Upload failed at " + partNo + ". part for file " + "\"" + file.getName() + "\"" + ", " + progressEvent.getBytesTransfered() + " bytes data has been transferred"); break; } } } private static class MultiPartFileUploader extends Thread { private UploadPartRequest uploadRequest; private PartETag partETag; MultiPartFileUploader(UploadPartRequest uploadRequest) { this.s3Client = s3Client; this.uploadRequest = uploadRequest; } @Override public void run() { partETag = s3Client.uploadPart(uploadRequest).getPartETag(); } private PartETag getPartETag() { return partETag; } private void upload() { start(); } } }

May 28, 2013

by Serkan Özal

· 57,422 Views · 3 Likes

Web API in ASP.NET Web Forms Application

With the release of ASP.NET MVC 4 one of the exciting features packed in the release was ASP.NET Web API.

May 24, 2013

by Lohith Nagaraj

· 51,624 Views

Azure Blob Storage - "The specified blob or block content is invalid"

If you’re uploading blobs by splitting blobs into blocks and you get the error – The specified blob or block content is invalid, then this post is for you. Short Version If you’re uploading blobs by splitting blobs into blocks and you get the above mentioned error, ensure that your block ids of your blocks are of same length. If the block ids of your blocks are of different length, you’ll get this error. Long Version Now for the longer version of this post . A few days back I was working with storage client library especially around uploading blobs in chunks and with one particular blob I was constantly getting the error – The specified blob or block content is invalid. I tried numerous combinations even resorting to REST API directly but to no avail. It only happened with just one blob. Furthermore if I uploaded the same blob without splitting it into blocks, all was well. I was at my wits’ end. Tried searching the Internet for this error but could not find a conclusive answer to my problem. After much trial and error, I was able to simulate the same problem on other blobs as well. Here’s how you can recreate it: Start uploading the blob by splitting it into blocks. For block id, let’s do a 7 character long string e.g. intValue.ToString(“d7”). This will ensure that my block ids would be “0000001”, “0000002”, …, ”0000010” ….. After one or two blocks are uploaded, cancel the operation. Now re-upload the blob by splitting it into blocks. However this time for block id, let’s do a 6 character long string e.g. intValue.ToString(“d6”). You’ll get the error as soon as you try to upload the 1st block. Possible Solutions Now that we know the root cause of this problem, let’s look at some of the possible solutions to solve this problem. Wait out One possible solution is to wait out. I know its lame but still a possible solution. We know that Windows Azure Blob Storage Service keeps all uncommitted blocks for a duration of 7 days and if within 7 days those uncommitted blocks are not committed, the storage service purges them. I wish storage service provided some mechanism to purge uncommitted blocks programmatically. Commit uncommitted blocks You could possibly commit the blocks which are in uncommitted state so that at least you get a blob (which would not be the blob we wanted to upload in the first place). You can then delete that blob and re-upload the blob by specifying block ids which are of same length. To fetch the list of uncommitted blocks, if you’re using REST API directly you can perform “Get Block List” operation and pass “blocklisttype=uncommitted” as one of the query string parameters. If you’re using storage client library (assuming you’re using the version 2.x of .Net storage client library), you can do something like the code below: private static List GetUncommittedBlockIds(CloudBlockBlob blob) { var sasUri = blob.GetSharedAccessSignature(new SharedAccessBlobPolicy() { SharedAccessExpiryTime = DateTime.UtcNow.AddMinutes(5), Permissions = SharedAccessBlobPermissions.Read, }); var blobUri = new Uri(string.Format("{0}{1}", blob.Uri, sasUri)); List uncommittedBlockIds = new List(); var request = BlobHttpWebRequestFactory.GetBlockList(blobUri, null, null, BlockListingFilter.Uncommitted, null, null); //request.Headers.Add("Authorization", using (var resp = (HttpWebResponse)request.GetResponse()) { using (var stream = resp.GetResponseStream()) { var getBlockListResponse = new GetBlockListResponse(stream); var blocks = getBlockListResponse.Blocks; foreach (var block in blocks.Where(b => !b.Committed)) { uncommittedBlockIds.Add(Encoding.UTF8.GetString(Convert.FromBase64String(block.Name))); } } } return uncommittedBlockIds; } A few things to keep in mind here: Microsoft.WindowsAzure.Storage.Blob namespace does not have the capability to get the list of uncommitted blocks. You would need to make use ofMicrosoft.WindowsAzure.Storage.Blob.Protocol namespace. Because we’re kind of invoking the REST API by executing an HttpWebRequest, I created a shared access signature on the blob so that I don’t have to create “Authorization” header. Fetch uncommitted blocks to see block id length You could fetch the list of uncommitted blocks just to find out the length of the block id used. You could then use that block id length for your new upload session and do the upload. Please see the code snippet above to find this information. Upload another blob with same name without splitting it into blocks You could also upload another blob with the same name without splitting it into blocks. It could very well be a zero byte blob. That way your uncommitted block list will be wiped clean. Then you could delete that dummy blob and re-upload the actual blob. A Few Words About Blocks Since we’re talking about blocks, I thought it might be useful to mention a few points about them: Blocks and block related operations are only applicable for “Block Blobs”. Duh!! You’ll get an error if you’re trying to do these operations on a “Page Blob”. For uploading large blobs, it is recommended that you split your blob into blocks. In fact if your blob size is more than 64 MB, then you have to split it into blocks. Minimum size of a block is 1 Byte and the maximum size of a block is 4 MB. It is recommended that you choose a block size based on your internet connectivity and number of parallel threads you want use to upload these blocks. A blob can be split into a maximum of 50000 blocks. It’s important to remember this limitation because you are reminded of this limit when you’re trying to upload 50001st block. The length of all the block ids must be same. So if you’re using an integer value to denote block id, you make sure that you pad that integer value with “0” so that you get same length. So you could do something likeint.ToString(“d6”). When passing the block id as a parameter, it must be Base64 encoded. While the order in which the blocks are uploaded is not important, the order is important when you commit the block list because that’s when the blob is constructed by the service. For example, let’s say you’re uploading a blob by splitting it into 5 blocks (with ids “000001”, “000002”, “000003”, “000004”, and “000005”). You could upload these blocks in any order – 000004, 000001, 000003, 000005, 000002 however when you commit the block list, ensure that the block ids are passed in proper order i.e. 000001, 000002, 000003, 000004, 000005. Summary That’s it for this post. I hope you’ve found this information useful. I spent considerable amount of time trying to fix this problem so I hope it will help some folks out. As always, if you find any issues with the post please let me know and I’ll fix it ASAP.

May 20, 2013

by Gaurav Mantri

· 10,937 Views

Postgres Fuzzy Search Using Trigrams (+/- Django)

When building websites, you’ll often want users to be able to search for something by name. On LinerNotes, users can search for bands, albums, genres etc from a search bar that appears on the homepage and in the omnipresent nav bar. And we need a way to match those queries to entities in our Postgres database. At first, this might seem like a simple problem with a simple solution, especially if you’re using the ORM; just jam the user input into an ORM filter and retrieve every matching string. But there’s a problem: if you do Bands.objects.filter(name="beatles") You’ll probably get nothing back, because the name column in your “bands” table probably says “The Beatles” and as far as Postgres is concerned if it’s not exactly the same string, it’s not a match. Users are naturally terrible at spelling, and even if they weren’t they’d be bad at guessing exactly how the name is formatted in your database. Of course you can use the LIKE keyword in SQL (or the equivalent ‘__contains’ suffix in the ORM) to give yourself a little flexibility and make sure that “Beatles” returns “The Beatles”. But 1) the LIKE keyword requires you to evaluate a regex against every row in your table, or hope that you’ve configured your indices to support LIKE (a quick Google doesn’t tell me whether Django does that by default in the ORM) and 2) what if the user types “Beetles”? Well, then you’ve got a bit of a problem. No matter how obvious it is to human you that “beatles” is close to “beetles”[1], to the computer they’re just two non-identical byte sequences. If you want the computer to understand them as similar you’re going to have to give it a metric for similarity and a method to make the comparison. There are a few ways to do that. You can do what I did initially and whip out the power tools, i.e. a dedicated search system like Solr or ElasticSearch. These guys have notions of fuzziness built right in (Solr more automatically than ES). But they’re designed for full-text indexing of documents (e.g. full web pages) and they’re rather complex to set up and administer. ES has been enough of a hassle to keep running smoothly that I took the time to see if I could push the search workload to Postgres, and hence this article. Unless you need to do something real fancy, it’s probably overkill to use them for just matching names. Instead, we’re going to follow Starr Horne’s advice and use a Postgres EXTENSION that lets us build fuzziness into our query in a fast and fairly simple way. Specifically, we’re going to use an extension called pg_trgm (i.e. “Postgres Trigram”) which gives Postgres a “similarity” function that can evaluate how many three-character subsequences (i.e. “trigrams”) two strings share. This is actually a pretty good metric for fuzzy matching short strings like names. To use pg_trgm, you’ll need to install the “Postgres Contrib” package. On ubuntu: sudo apt-get install postgres-contrib **WARNING: THIS WILL TRY TO RESTART YOUR DATABASE** then pop open psql and install pg_trgm (NB: this only works on Postgres 9.1+; Google for the instructions if you’re on a lower version.) psql CREATE EXTENSION pg_trgm; \dx # to check it's installed Now you can do SELECT * FROM types_and_labels_view WHERE label %'Mountain Goats' ORDER BY similarity(label,'Mountain Goats') DESC LIMIT 100; And out will pop the 100 most similar names. This will still take a long time if your table is large, but we can improve that with a special type of index provided by pg_trgm: CREATE INDEX labels_trigram_index ON types_and_labels_table USING gist (label gist_trgm_ops); or CREATE INDEX labels_trigram_index ON types_and_labels_table USING gin (label gin_trgm_ops); (GIN is slower than GIST to build, but answers queries faster. That’ll take a while to build (possibly quite a while), but once it does you should be able to fuzzy search with ease and speed. If you’re using Django, you will have to drop into writing SQL to use this (until someone, maybe you, writes a Django extension to do this in the ORM.) And as a frustrating finishing note, my attempt to implement this on LinerNotes was not ultimately succesful. It seems that that index query performance is at least O(n) and with 50 million entities in my database queries take at least 10 seconds. I’ve read that performance is great up to about 100k records then drops off sharply from there. There are some apparently additional options for improving query performance, but I’ll be sticking with ElasticSearch for now. [1] Sorry, Googlebot! Not sorry, Bingbot.

May 19, 2013

by George London

· 9,625 Views

Lazy sequences implementation for Java 8

I just published the LazySeq library on GitHub - the result of my Java 8 experiments recently. I hope you will enjoy it. Even if you don't find it very useful, it's still a great lesson of functional programming in Java 8 (and in general). Also it's probably the first community library targeting Java 8! Introduction A Lazy sequence is a data structure that is computed only when its elements are actually needed. All operations on lazy sequences, like map() and filter() are lazy as well, postponing invocation up to the moment when it is really necessary. Lazy sequences are always traversed from the beginning using very cheap first/rest decomposition (head() and tail()). An important property of lazy sequences is that they can represent infinite streams of data, e.g. all natural numbers or temperature measurements over time. Lazy sequence remembers already computed values so if you access the Nth element, all elements from 1 to N-1 are computed as well and cached. Despite that LazySeq (being at the core of many functional languages and algorithms) is immutable and thread-safe. Rationale This library is heavily inspired by scala.collection.immutable.Stream and aims to provide immutable, thread-safe and easy to use lazy sequence implementation, possibly infinite. See Lazy sequences in Scala and Clojure for some use cases. Stream class name is already used in Java 8, therefore LazySeq was chosen, similar to lazy-seq in Clojure. Speaking of Stream, at first it looks like a lazy sequence implementation available out-of-the-box. However, quoting Javadoc: Streams are not data structures and: Once an operation has been performed on a stream, it is considered consumed and no longer usable for other operations. In other words java.util.stream.Stream is just a thin wrapper around existing collection, suitable for one time use. More akin to Iterator than to Stream in Scala. This library attempts to fill this niche. Of course implementing lazy sequence data structure was possible prior to Java 8, but lack of lambdas makes working with such data structure tedious and too verbose. Getting started Building and working with lazy sequences in 10 minutes. Infinite sequence of all natural numbers In order to create a lazy sequence you use LazySeq.cons() factory method that accepts first element (head) and a function that might be later used to compute rest (tail). For example in order to produce lazy sequence of natural numbers with given start element you simply say: private LazySeq naturals(int from) { return LazySeq.cons(from, () -> naturals(from + 1)); } There is really no recursion here. If there was, calling naturals() would quickly result in StackOverflowError as it calls itself without stop condition. However () -> naturals(from + 1) expression defines a function returning LazySeq (Supplier to be precise) that this data structure will invoke, but only if needed. Look at the code below, how many times do you think naturals() function was called (except the first line)? final LazySeq ints = naturals(2); final LazySeq strings = ints. map(n -> n + 10). filter(n -> n % 2 == 0). take(10). flatMap(n -> Arrays.asList(0x10000 + n, n)). distinct(). map(Integer::toHexString); First invocation of naturals(2) returns lazy sequence starting from 2 but rest (3, 4, 5, ...) is not computed yet. Later we map() over this sequence, filter() it, take() first 10 elements, remove duplicates, etc. All these operations do not evaluate the sequence and are as lazy as possible. For example take(10) doesn't evaluate first 10 elements eagerly to return them. Instead new lazy sequence is returned which remembers that it should truncate original sequence at 10th element. Same applies to distinct(). It doesn't evaluate the whole sequence to extract all unique values (otherwise code above would explode quickly, traversing infinite amount of natural numbers). Instead it returns a new sequence with only the first element. If you ever ask for the second unique element, it will lazily evaluate tail, but only as much as possible. Check out toString() output: System.out.println(strings); //[1000c, ?] Question mark (?) says: "there might be something more in that collection, but I don't know it yet". Do you understand where did 1000c came from? Look carefully: Start from an infinite stream of natural numbers starting from 2 Add 10 to each element (so the first element becomes 12 or C in hex) filter() out odd numbers (12 is even so it stays) take() first 10 elements from sequence so far Each element is replaced by two elements: that element plus 0x1000 and the element itself (flatMap()). This does not yield a sequence of pairs, but a sequence of integers that is twice as long We ensure only distinct() elements will be returned In the end we turn integers to hex strings. As you can see none of these operations really require evaluating the whole stream. Only head is being transformed and this is what we see in the end. So when this data structure is actually evaluated? When it absolutely must, e.g. during side-effect traversal: strings.force(); //or strings.forEach(System.out::println); //or final List list = strings.toList(); //or for (String s : strings) { System.out.println(s); } All the statements above alone will force evaluation of whole lazy sequence. Not very smart if our sequence was infinite, but strings was limited to first 10 elements so it will not run infinitely. If you want to force only part of the sequence, simply call strings.take(5).force(). BTW have you noticed that we can iterate over LazySeq strings using standard Java 5 for-each syntax? That's because LazySeq implements List interface, thus plays nicely with Java Collections Framework ecosystem: import java.util.AbstractList; public abstract class LazySeq extends AbstractList Please keep in mind that once lazy sequence is evaluated (computed) it will cache (memoize) them for later use. This makes lazy sequences great for representing infinite or very long streams of data that are expensive to compute. iterate() Building an infinite lazy sequence very often boils down to providing an initial element and a function that produces next item based on the previous one. In other words second element is a function of the first one, third element is a function of the second one, and so on. Convenience LazySeq.iterate() function is provided for such circumstances. ints definition can now look like this: final LazySeq ints = LazySeq.iterate(2, n -> n + 1); We start from 2 and each subsequent element is represented as previous element + 1. More examples: Fibonacci sequence and Collatz conjecture No article about lazy data structure can be left without Fibonacci numbers example: private static LazySeq lastTwoFib(int first, int second) { return LazySeq.cons( first, () -> lastTwoFib(second, first + second) ); } Fibonacci sequence is infinite as well but we are free to transform it in multiple ways: System.out.println( fib. drop(5). take(10). toList() ); //[5, 8, 13, 21, 34, 55, 89, 144, 233, 377] final int firstAbove1000 = fib. filter(n -> (n > 1000)). head(); fib.get(45); See how easy and natural it is to work with infinite stream of numbers? drop(5).take(10) skips first 5 elements and displays next 10. At this point first 15 numbers are already computed and will never by computed again. Finding first Fibonacci number above 1000 (happens to be 1597) is very straightforward. head() is always precomputed by filter() , so no further evaluation is needed. Last but not least we can simply just ask for 45th Fibonacci number (0-based) and get 1134903170. If you ever try to access any Fibonacci number up to this one, they are precomputed and fast to retrieve. Finite sequences (Collatz conjecture) Collatz conjecture is also quite interesting problem. For each positive integer n we compute next integer using following algorithm: n/2 if n is even 3n + 1 if n is odd For example starting from 10 series looks as follows: 10, 5, 16, 8, 4, 2, 1. The series ends when it reaches 1. Mathematicians believe that starting from any integer we will eventually reach 1 but it's not yet proven. Let us create a lazy sequence that generates Collatz series for given n, but only as many as needed. As stated above, this time our sequence will be finite: private LazySeq collatz(long from) { if (from > 1) { final long next = from % 2 == 0 ? from / 2 : from * 3 + 1; return LazySeq.cons(from, () -> collatz(next)); } else { return LazySeq.of(1L); } } This implementation is driven directly by the definition. For each number greater than 1 return that number + lazily evaluated (() -> collatz(next)) rest of the stream. As you can see if 1 is given, we return single element lazy sequence using special of() factory method. Let's test it with aforementioned 10: final LazySeq collatz = collatz(10); collatz.filter(n -> (n > 10)).head(); collatz.size(); filter() allows us to find first number in the sequence that is greater than 10. Remember that lazy sequence will have to traverse the contents (evaluate itself), but only to the point where it finds first matching element. Then it stops, ensuring it computes as little as possible. However size(), in order to calculate total number of elements, must traverse the whole sequence. Of course this can only work with finite lazy sequences, calling size() on an infinite sequence will end up poorly. If you play a bit with this sequence you will quickly realize that sequences for different numbers share the same suffix (always end with the same sequence of numbers). This begs for some caching/structural sharing. See CollatzConjectureTest for details. But can it be used to something, you know... useful? Real life? Infinite sequences of numbers are great, but not very practical in real life. Maybe some more down to earth examples? Imagine you have a collection and you need to pick few items from that collection randomly. Instead of collection I will use a function returning random latin characters: private char randomChar() { return (char) ('A' + (int) (Math.random() * ('Z' - 'A' + 1))); } But there is a twist. You need N (N < 26, number of latin characters) unique values. Simply calling randomChar() few times doesn't guarantee uniqueness. There are few approaches to this problem, with LazySeq it's pretty straightforward: LazySeq charStream = LazySeq.continually(this::randomChar); LazySeq uniqueCharStream = charStream.distinct(); continually() simply invokes given function for each element when needed. Thus charStream will be an infinite stream of random characters. Of course they can't be unique. However uniqueCharStream guarantees that its output is unique. It does so by examining next element of underlying charStream and rejecting items that already appeared. We can now say uniqueCharStream.take(4) and be sure that no duplicates will appear. Once again notice that continually(this::randomChar).distinct().take(4) really calls randomChar() only once! As long as you don't consume this sequence, it remains lazy and postpones evaluation as long as possible. Another example involves loading batches (pages) of data from database. Using ResultSet or Iterator is cumbersome but loading whole data set into memory often not feasible. An alternative involves loading first batch of data eagerly and then providing a function to load next batches. Data is loaded only when it's really needed and we don't suffer performance or scalability issues. First let's define abstract API for loading batches of data from database: public List loadPage(int offset, int max) { //load records from offset to offset + max } I abstract from the technology entirely, but you get the point. Imagine that we now define LazySeq that starts from row 0 and loads next pages only when needed: public static final int PAGE_SIZE = 5; private LazySeq records(int from) { return LazySeq.concat( loadPage(from, PAGE_SIZE), () -> records(from + PAGE_SIZE) ); } When creating new LazySeq instance by calling records(0) first page of 5 elements is loaded. This means that first 5 sequence elements are already computed. If you ever try to access 6th or above, sequence will automatically load all missing record and cache them. In other words you never compute the same element twice. More useful tools when working with sequences are grouped() and sliding() methods. First partitions input sequence into groups of equal size. Take this as an example, also proving that these methods are as always lazy: final LazySeq chars = LazySeq.of('A', 'B', 'C', 'D', 'E', 'F', 'G'); chars.grouped(3); //[[A, B, C], ?] chars.grouped(3).force(); //force evaluation //[[A, B, C], [D, E, F], [G]] and similarly for sliding(): chars.sliding(3); //[[A, B, C], ?] chars.sliding(3).force(); //force evaluation //[[A, B, C], [B, C, D], [C, D, E], [D, E, F], [E, F, G]] These two methods are extremely useful. You can look at your data through sliding window (e.g. to compute moving average) or partition it to equal-length buckets. Last interesting utility method you may find useful is scan() that iterates (lazily, of course) the input stream and constructs every element of output by applying a function on previous and current element of input. Code snippet is worth a thousand words: LazySeq list = LazySeq. numbers(1). scan(0, (a, x) -> a + x); list.take(10).force(); //[0, 1, 3, 6, 10, 15, 21, 28, 36, 45] LazySeq.numbers(1) is a sequence of natural numbers (1, 2, 3...). scan() creates a new sequence that starts from 0 and for each element of input (natural numbers) adds it to last element of itself. So we get: [0, 0+1, 0+1+2, 0+1+2+3, 0+1+2+3+4, 0+1+2+3+4+5...]. If you want a sequence of growing strings, just replace few types: LazySeq.continually("*"). scan("", (s, c) -> s + c). map(s -> "|" + s + "\\"). take(10). forEach(System.out::println); And enjoy this beautiful triangle: |\ |*\ |**\ |***\ |****\ |*****\ |******\ |*******\ |********\ |*********\ Alternatively (same output): lazySeq. stream(). map(n -> n + 1). flatMap(n -> asList(0, n - 1).stream()). filter(n -> n != 0). substream(4, 18). limit(10). sorted(). distinct(). collect(Collectors.toList()); Java collections framework interoperability LazySeq implements java.util.List interface, thus can be used in variety of places. Moreover it also implements Java 8 enhancements to collections, namely streams and collectors: lazySeq. stream(). map(n -> n + 1). flatMap(n -> asList(0, n - 1).stream()). filter(n -> n != 0). substream(4, 18). limit(10). sorted(). distinct(). collect(Collectors.toList()); However streams in Java 8 were created to work around feature that is a foundation of LazySeq - lazy evaluation. Example above postpones all intermediate steps until collect() is called. With LazySeq you can safely skip .stream() and work directly on sequence: lazySeq. map(n -> n + 1). flatMap(n -> asList(0, n - 1)). filter(n -> n != 0). slice(4, 18). limit(10). sorted(). distinct(); Moreover LazySeq provides special purpose collector (see: LazySeq.toLazySeq()) that avoids evaluation even when used with collect() - which normally forces full collection computation. Implementation details Each lazy sequence is built around the idea of eagerly computed head and lazily evaluated tail represented as function. This is very similar to classic single-linked list recursive definition: class List { private final T head; private final List tail; //... } However in case of lazy sequence tail is given as a function, not a value. Invocation of that function is postponed as long as possible: class Cons extends LazySeq { private final E head; private LazySeq tailOrNull; private final Supplier> tailFun; @Override public LazySeq tail() { if (tailOrNull == null) { tailOrNull = tailFun.get(); } return tailOrNull; } For full implementation see Cons.java and FixedCons.java used when tail is known at creation time (for example LazySeq.of(1, 2) as opposed to LazySeq.cons(1, () -> someTailFun()). Pitfalls and common dangers Below common issues and misunderstandings are described. Evaluating too much One of the biggest dangers of working with infinite sequences is trying to evaluate them completely, which obviously leads to infinite computation. The idea behind infinite sequence is not to evaluate it in its entirety but to take as much as we need without introducing artificial limits and accidental complexity (see database loading example). However evaluating whole sequence is way too simple to miss. For example calling LazySeq.size()must evaluate whole sequence and will run infinitely, eventually filling up stack or heap (implementation detail). There are other methods that require full traversal in order to function properly. E.g. allMatch() making sure all elements match given predicate. Some methods are even more dangerous, because whether they will finish or not depends on data in the sequence. For example anyMatch() may return immediately if head matches predicate - or never. Sometimes we can easily avoid costly operations by using more deterministic methods. For example: seq.size() <= 10 //BAD may not work or be extremely slow if seq is infinite. However we can achieve the same with (more) predictable: seq.drop(10).isEmpty() Remember that lazy sequences are immutable (so we don't really mutate seq), drop(n) is typically O(n) while isEmpty() is O(1). When in doubt, consult source code or JavaDoc to make sure your operation won't too eagerly evaluate your sequence. Also be very cautious when using LazySeq where java.util.Collection or java.util.List is expected. Holding unnecessary reference to head Lazy sequences be definition remember already computed elements. You have to be aware of that, otherwise your sequence (especially infinite) will quickly fill up available memory. However, because LazySeq is just a fancy linked list, if you no longer keep a reference to head (but only to some element in the middle), it becomes eligible for garbage collection. For example: //LazySeq first = seq.take(10); seq = seq.drop(10); First ten elements are dropped and we assume nothing holds a reference to what previously was hept in seq. This makes first ten elements eligible for garbage collection. However if we uncomment first line and keep reference to old head in first, JVM will not release any memory. Let's put that into perspective. The following piece of code will eventually throw OutOfMemoryError because infinite reference keeps holding the beginning of the sequence, therefore all the elements created so far: LazySeq infinite = LazySeq.continually(Big::new); for (Big arr : infinite) { // } However by inlining call to continually() or extracting it to a method this code works flawlessly (well, still runs forever, but uses almost no memory): private LazySeq getContinually() { return LazySeq.continually(Big::new); } for (Big arr : getContinually()) { // } What's the difference? For-each loop uses iterators underneath. LazySeqIterator underneath doesn't hold a reference to old head() when it advances, so if nothing else references that head, it will be eligible for garbage collection, see true javac output when for-each is used: for (Iterator cur = getContinually().iterator(); cur.hasNext(); ) { final Big arr = cur.next(); //... } TL;DR Your sequence grows while being traversed. If you keep holding one end while the other grows, it will eventually blow up. Just like your first level cache in Hibernate if you load too much in one transaction. Use only as much as needed. Converting to plain Java collections Converting is simple, but dangerous. This is a consequence of points above. You can convert lazy sequence to java.util.List by calling toList(): LazySeq even = LazySeq.numbers(0, 2); even.take(5).toList(); //[0, 2, 4, 6, 8] or using Collector from Java 8 having richer API: even. stream(). limit(5). collect(Collectors.toSet()) //[4, 6, 0, 2, 8] But remember that Java collections are finite from definition so avoid converting lazy sequences to collections explicitly. Note that LazySeq is already List, thus Iterable and Collection. It also has efficient LazySeq.iterator(). If you can, simply pass LazySeq instance directly and may just work. Performance, time and space complexity head() of every sequence (except empty) is always computed eagerly, thus accessing it is fast O(1). Computing tail() may take everything from O(1) (if it was already computed) to infinite time. As an example take this valid stream: import static com.blogspot.nurkiewicz.lazyseq.LazySeq.cons; import static com.blogspot.nurkiewicz.lazyseq.LazySeq.continually; LazySeq oneAndZeros = cons( 1, () -> continually(0) ). filter(x -> (x > 0)); It represents 1 followed by infinite number of 0s. By filtering all positive numbers (x > 0) we get a sequence with same head, but filtering of tail is delayed (lazy). However if we now carelessly call oneAndZeros.tail(), LazySeq will keep computing more and more of this infinite sequence, but since there is no positive element after initial 1, this operation will run forever, eventually throwing StackOverflowError or OutOfMemoryError (this is an implementation detail). However if you ever reach this state, it's probably a programming bug or misusing of the library. Typically tail() will be close to O(1). On the other hand if you have plenty of operations already "stacked", calling tail() will trigger them rapidly one after another, so tail() run time is heavily dependant on your data structure. Most operations on LazySeq are O(1) since they are lazy. Some operations, like get(n) or drop(n) are O(n) (n represents parameter, not sequence length). In general run time will be similar to normal linked list. Because LazySeq remembers all already computed values in a single linked list, memory consumption is always O(n), where nn is the number of already computed elements. Troubleshooting Error invalid target release: 1.8 during maven build If you see this error message during maven build: [INFO] BUILD FAILURE ... [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project lazyseq: Fatal error compiling: invalid target release: 1.8 -> [Help 1] it means you are not compiling using Java 8. Download JDK 8 with lambda support and let maven use it: $ export JAVA_HOME=/path/to/jdk8 I get StackOverflowError or program hangs infinitely When working with LazySeq you sometimes get StackOverflowError or OutOfMemoryError: java.lang.StackOverflowError at sun.misc.Unsafe.allocateInstance(Native Method) at java.lang.invoke.DirectMethodHandle.allocateInstance(DirectMethodHandle.java:426) at com.blogspot.nurkiewicz.lazyseq.LazySeq.iterate(LazySeq.java:118) at com.blogspot.nurkiewicz.lazyseq.LazySeq.lambda$0(LazySeq.java:118) at com.blogspot.nurkiewicz.lazyseq.LazySeq$$Lambda$2.get(Unknown Source) at com.blogspot.nurkiewicz.lazyseq.Cons.tail(Cons.java:32) at com.blogspot.nurkiewicz.lazyseq.LazySeq.size(LazySeq.java:325) at com.blogspot.nurkiewicz.lazyseq.LazySeq.size(LazySeq.java:325) at com.blogspot.nurkiewicz.lazyseq.LazySeq.size(LazySeq.java:325) at com.blogspot.nurkiewicz.lazyseq.LazySeq.size(LazySeq.java:325) at com.blogspot.nurkiewicz.lazyseq.LazySeq.size(LazySeq.java:325) at com.blogspot.nurkiewicz.lazyseq.LazySeq.size(LazySeq.java:325) at com.blogspot.nurkiewicz.lazyseq.LazySeq.size(LazySeq.java:325) When working with possibly infinite data structures, care must be taken. Avoid calling operations that must (size(), allMatch(), minBy(), forEach(), reduce(), ...) or can (filter(), distinct(), ...) traverse the whole sequence in order to give correct results. See Pitfalls for more examples and ways to avoid. Maturity Quality This project was started as an exercise and is not battle-proven. But a healthy 300+ unit-test suite (3:1 test code/production code ratio) guards quality and functional correctness. I also make sure LazySeq is as lazy as possible by mocking tail functions and verifying they are called as rarely as one can get. Contributions and bug reports In the event of finding a bug or missing feature, don't hesitate to open a new ticket or start pull request. I would also love to see more interesting usages of LazySeq in wild. Possible improvements Just like FixedCons is used when tail is known up-front, consider IterableCons that wraps existing Iterable in one node rather than building FixedCons hierarchy. This can be used for all concat methods. Parallel processing support (implementing spliterator?) License This project is released under version 2.0 of the Apache License.

May 15, 2013

by Tomasz Nurkiewicz

· 29,052 Views · 1 Like

Deploy a File Server in the Cloud (WebDav on Windows Azure)

this month, my fellow it pro technical evangelists and i are authoring a new series of articles on 20 key scenarios with windows azure infrastructure services . check out the list of articles here: http://mythoughtsonit.com/2013/05/20-key-scenarios-with-windows-azure-infrastructure-services/ . web-based distributed authoring and versioning, or webdav, is a set of protocols based on http that allows end-users to map a network drive over http and edit content and files stored on the web server. when webdav was first offered on microsoft server i had evaluated it and decided it did not perform well enough for me. the webdav extension to iis was completely rewritten back in the server 2008 timeframe and is worth taking a look at again. in this article i will guide you step by step through the process of setting up webdav on server 2012 in a windows azure iaas environment. this will give you a solid performing file share on the internet over port 80 and the http protocol. first you need an azure account. you can setup a free trail of azure. details can be found here: http://mythoughtsonit.com/2013/04/step-by-step-guide-to-setting-up-a-windows-azure-free-trial/ second provision a server 2012 machine. watch a video of what to do here: third open port 80 to this new server: in the azure portal select your 2012 server and choose the “endpoints” tab on the top. click “add endpoint” at the bottom of the screen enter the endpoint information for port 80 to port 80 done. next we need to install the iis webserver and webdav. installing webdav on iis 8.0 start server manager and go to “add roles and features” under server roles – add the web server (iis) role click through the wizard until you come to the role services section. then find and select “webdav publishing” and “windows authentication” click next and then install when the install is finished you are ready to move on to the next section. configuring iis 8 for webdav after the installation finishes you need to configure the box for access. start the iis manager tool. choose the “default web site” on the left side. then click on “authentication” open the windows authentication option and enable it. open the “webdav authoring rules” create a webdav rule. i choose to allow all users access to all content. a better security practice is to limit what users can use the service. it’s your data so you decide. make sure webdav is enabled and that your access rule is set: that is it… now your ready to access your webdav file share! test and insure you can hit the web server by using your browser: because you opened port 80 and installed iis 8 you should see the default web page when you browse to your servers internet dns name. example: http://yourdomainname.cloudapp.net/ how to map a drive to your webdav server: there are two ways i use to connect to the webdav server how to map a drive to your webdav server from the win 8 gui: from windows explorer, right click on “computer” and select “map a network drive” map your network drive by entering the address to your server example: http://yourdomainname.cloudapp.net/ i selected “connect using different credentials” because my workstation was not joined to the server in anyway and i needed to use an account in the servers local sam database. hit “finish” and enter your credentials. now you will have a connected drive that you can access from windows explorer or any tool via the drive mapping. how to map a drive to your webdav server from a cmd box: 1. hit windows start and type: cmd 2. enter the command: net use [drive letter] [url] example: net use e: http://yourdomainname.cloudapp.net/

May 15, 2013

by Brian Lewis

· 15,969 Views

JPA - Querydsl Projections

In my last post: JPA - Basic Projections - I've mentioned about two basic possibilities of building JPA Projections. This post brings you more examples, this time based on Querydsl framework. Note, that I'm referring Querydslversion 3.1.1 here. Reinvented constructor expressions Take a look at the following code:... import static com.blogspot.vardlokkur.domain.QEmployee.employee; import javax.persistence.EntityManager; import javax.persistence.PersistenceContext; import org.springframework.beans.factory.annotation.Autowired; import com.blogspot.vardlokkur.domain.EmployeeNameProjection; import com.mysema.query.jpa.JPQLTemplates; import com.mysema.query.jpa.impl.JPAQuery; import com.mysema.query.types.ConstructorExpression; ... public class ConstructorExpressionExample { ... @PersistenceContext private EntityManager entityManager; @Autowired private JPQLTemplates jpqlTemplates; public void someMethod() { ... final List projections = new JPAQuery(entityManager, jpqlTemplates) .from(employee) .orderBy(employee.name.asc()) .list(ConstructorExpression.create(EmployeeNameProjection.class, employee.employeeId, employee.name)); ... } ... } The above Querydsl construction means: create new JPQL query [1][2], using employee as the data source, order the data using employee name [3], and return the list of EmployeeNameProjection, built using the 2-arg constructor called with employee ID and name [4]. This is very similar to the constructor expressions example from my previous post (JPA - Basic Projections), and leads to the following SQL query: select EMPLOYEE_ID, EMPLOYEE_NAME from EMPLOYEE order by EMPLOYEE_NAME asc As you see above, the main advantage comparing to the JPA constructor expressions is using Java class, instead of its name hard-coded in JPQL query. Even more reinvented constructor expressions Querydsl documentation [4] describes another way of using constructor expressions, requiring @QueryProjectionannotation and Query Type [1] usage for projection, see example below. Let's start with the projection class modification - note that I added @QueryProjection annotation on the class constructor. package com.blogspot.vardlokkur.domain; import java.io.Serializable; import javax.annotation.concurrent.Immutable; import com.mysema.query.annotations.QueryProjection; @Immutable public class EmployeeNameProjection implements Serializable { private final Long employeeId; private final String name; @QueryProjection public EmployeeNameProjection(Long employeeId, String name) { super(); this.employeeId = employeeId; this.name = name; } public Long getEmployeeId() { return employeeId; } public String getName() { return name; } } Now we may use modified projection class (and corresponding Query Type [1] ) in following way: ... import static com.blogspot.vardlokkur.domain.QEmployee.employee; import javax.persistence.EntityManager; import javax.persistence.PersistenceContext; import org.springframework.beans.factory.annotation.Autowired; import com.blogspot.vardlokkur.domain.EmployeeNameProjection; import com.blogspot.vardlokkur.domain.QEmployeeNameProjection; import com.mysema.query.jpa.JPQLTemplates; import com.mysema.query.jpa.impl.JPAQuery; ... public class ConstructorExpressionExample { ... @PersistenceContext private EntityManager entityManager; @Autowired private JPQLTemplates jpqlTemplates; public void someMethod() { ... final List projections = new JPAQuery(entityManager, jpqlTemplates) .from(employee) .orderBy(employee.name.asc()) .list(new QEmployeeNameProjection(employee.employeeId, employee.name)); ... } ... } Which leads to SQL query: select EMPLOYEE_ID, EMPLOYEE_NAME from EMPLOYEE order by EMPLOYEE_NAME asc In fact, when you take a closer look at the Query Type [1] generated for EmployeeNameProjection(QEmployeeNameProjection), you will see it is some kind of "shortcut" for creating constructor expression the way described in first section of this post. Mapping projection Querydsl provides another way of building projections, using factories based on MappingProjection. package com.blogspot.vardlokkur.domain; import static com.blogspot.vardlokkur.domain.QEmployee.employee; import com.mysema.query.Tuple; import com.mysema.query.types.MappingProjection; public class EmployeeNameProjectionFactory extends MappingProjection { public EmployeeNameProjectionFactory() { super(EmployeeNameProjection.class, employee.employeeId, employee.name); } @Override protected EmployeeNameProjection map(Tuple row) { return new EmployeeNameProjection(row.get(employee.employeeId), row.get(employee.name)); } } The above class is a simple factory creating EmployeeNameProjection instances using employee ID and name. Note that the factory constructor defines which employee properties will be used for building the projection, and mapmethod defines how the instances will be created. Below you may find an example of using the factory: ... import static com.blogspot.vardlokkur.domain.QEmployee.employee; import javax.persistence.EntityManager; import javax.persistence.PersistenceContext; import org.springframework.beans.factory.annotation.Autowired; import com.blogspot.vardlokkur.domain.EmployeeNameProjection; import com.blogspot.vardlokkur.domain.EmployeeNameProjectionFactory import com.mysema.query.jpa.JPQLTemplates; import com.mysema.query.jpa.impl.JPAQuery; ... public class MappingProjectionExample { ... @PersistenceContext private EntityManager entityManager; @Autowired private JPQLTemplates jpqlTemplates; public void someMethod() { ... final List projections = new JPAQuery(entityManager, jpqlTemplates) .from(employee) .orderBy(employee.name.asc()) .list(new EmployeeNameProjectionFactory()); .... } ... } As you see, the one and only difference here, comparing to constructor expression examples, is the list method call. Above example leads again to the very simple SQL query: select EMPLOYEE_ID, EMPLOYEE_NAME from EMPLOYEE order by EMPLOYEE_NAME asc Building projections this way is much more powerful, and doesn't require existence of n-arg projection constructor. QBean based projection (JavaBeans strike again) There is at least one more possibility of creating projection with Querydsl - QBean based - in this case we build the result list using: ... .list(Projections.bean(EmployeeNameProjection.class, employee.employeeId, employee.name)) This way requires EmployeeNameProjection class to follow JavaBean conventions, which is not always desired in application. Use it if you want, but you have been warned ;) Few links for the dessert Using Query Types Querying Ordering Constructor projections

May 15, 2013

by Michal Jastak

· 39,513 Views · 3 Likes

Getting Started with Active Directory Lightweight Directory Services

introduction in preparation for some upcoming posts related to linq (what else?), windows powershell and rx, i had to set up a local ldap-capable directory service. (hint: it will pay off to read till the very end of the post if you’re wondering what i’m up to...) in this post i’ll walk the reader through the installation, configuration and use of active directory lightweight directory services (lds) , formerly known as active directory application mode (adam). having used the technology several years ago, in relation to the linq to active directory project (which as an extension to this blog series will receive an update), it was a warm and welcome reencounter. what’s lightweight directory services anyway? use of hierarchical storage and auxiliary services provided by technologies like active directory often has advantages over alternative designs, e.g. using a relational database. for example, user accounts may be stored in a directory service for an application to make use of. while active directory seems the natural habitat to store (and replicate, secure, etc.) additional user information, it admins will likely point you – the poor developer – at the door when asking to extend the schema. that’s one of the places where lds comes in, offering the ability to take advantage of the programming model of directory services while keeping your hands off “the one and only ad schema”. the lds website quotes other use cases, which i’ll just copy here verbatim: active directory lightweight directory service (ad lds), formerly known as active directory application mode, can be used to provide directory services for directory-enabled applications. instead of using your organization’s ad ds database to store the directory-enabled application data, ad lds can be used to store the data. ad lds can be used in conjunction with ad ds so that you can have a central location for security accounts (ad ds) and another location to support the application configuration and directory data (ad lds). using ad lds, you can reduce the overhead associated with active directory replication, you do not have to extend the active directory schema to support the application, and you can partition the directory structure so that the ad lds service is only deployed to the servers that need to support the directory-enabled application. install from media generation. the ability to create installation media for ad lds by using ntdsutil.exe or dsdbutil.exe. auditing. auditing of changed values within the directory service. database mounting tool. gives you the ability to view data within snapshots of the database files. active directory sites and services support. gives you the ability to use active directory sites and services to manage the replication of the ad lds data changes. dynamic list of ldif files. with this feature, you can associate custom ldif files with the existing default ldif files used for setup of ad lds on a server. recursive linked-attribute queries. ldap queries can follow nested attribute links to determine additional attribute properties, such as group memberships. obviously that last bullet point grabs my attention through i will retain myself from digressing here. getting started if you’re running windows 7, the following explanation is the right one for you. for older versions of the operating system, things are pretty similar though different downloads will have to be used. for windows server 2008, a server role exists for lds. so, assuming you’re on windows 7, start by downloading the installation media over here . after installing this, you should find an entry “active directory lightweight directory services setup wizard” under the “administrative tools” section in “control panel”: lds allows you to install multiple instances of directory services on the same machine, just like sql server allows multiple server instances to co-exist. each instance has a name and listens on certain ports using the ldp protocol. starting this wizard – which lives under %systemroot%\adam\adaminstall.exe, revealing the former product name – brings us here: after clicking next, we need to decide whether we create a new unique instance that hasn’t any ties with existing instances, or whether we want to create a replicate of an existing instance. for our purposes, the first option is what we need: next, we’re asked for an instance name. the instance name will be used for the creation of a windows service, as well as to store some settings. each instance will get its own windows service. in our sample, we’ll create a directory for the northwind employees tables, which we’ll use to create accounts further on. we’re almost there with the baseline configuration. the next question is to specify a port number, both for plain tcp and for ssl-encrypted traffic. the default ports, 389 and 636, are fine for us. later we’ll be able to connect to the instance by connecting to ldp over port 389, e.g. using the system.directoryservices namespace functionality in .net. notice every instance of lds should have its own port number, so only one can be using the default port numbers. now that we have completed the “physical administration”, the wizard moves on to a bit of “logical administration”. more specifically, we’re given the option to create a directory partition for the application. here we choose to create such a partition, though in many concrete deployment scenarios you’ll want the application’s setup to create this at runtime. our partition’s distinguished name will mimic a “northwind.local” domain containing a partition called “employees”: after this bit of logical administration, some more physical configuration has to be carried out, specifying the data files location and the account to run the services under. for both, the default settings are fine. also the administrative account assigned to manage the lds instance can be kept as the currently logged in user, unless you feel the need to change this in your scenario: finally, we’ve arrived at an interesting step where we’re given the option to import ldif files. and ldif file, with extension .ldf, contains the definition of a class that can be added to a directory service’s schema. basically those contain things like attributes and their types. under the %systemroot%\adam folder, a set of out-of-the-box .ldf files can be found: instead of having to run the ldifde.exe tool, the wizard gives us the option to import ldif files directly. those classes are documented in various places, such as rfc2798 for inetorgperson . on technet, information is presented in a more structured manner, e.g revealing that inetorgperson is a subclass of user . custom classes can be defined and imported after setup has completed. in this post, we won’t extend the schema ourselves but we will simply be using the built-in user class so let’s tick that one: after clicking next, we get a last chance to revisit our settings or can confirm the installation. at this point, the wizard will create the instance – setting up the service – and import the ldif files. congratulations! your first lds instance has materialized. if everything went alright, the northwindemployees service should show up: inspecting the directory to inspect the newly created directory instance, a bunch of tools exist. one is adsi edit which you could already see in the administrative tools. to set it up, open the mmc-based tool and go to action, connect to… in the dialog that appears, specify the server name and choose schema as the naming context. for example, if you want to inspect the user class, simply navigate to the schema node in the tree and show the properties of the user entry. to visualize the objects in the application partition, connect using the distinguished name specified during the installation: now it’s possible to create a new object in the directory using the context menu in the content pane: after specifying the class, we get to specify the “cn” name (for common name) of the object. in this case, i’ll use my full name: we can also set additional attributes, as shown below (using the “physicaldeliveryofficename” to specify the office number of the user): after clicking set, closing the attributes dialog and clicking finish to create the object, we see it pop up in the items view of the adsi editor snap-in: programmatic population of the directory obviously we’re much more interested in a programmatic way to program directory services. .net supports the use of directory services and related protocols (ldap in particular) through the system.directoryservices namespace. in a plain new console application, add a reference to the assembly with the same name (don’t both about other assemblies that deal with account management and protocol stuff): for this sample, i’ll also assume the reader got a northwind sql database sitting somewhere and knows how to get data out of its employees table as rich objects. below is how things look when using the linq to sql designer: we’ll just import a few details about the users; it’s left to the reader to map other properties onto attributes using the documentation about the user directory services class . just a few lines of code suffice to accomplish the task (assuming the system.directoryservices namespace is imported): static void main() { var path = "ldap://bartde-hp07/cn=employees,dc=northwind,dc=local"; var root = new directoryentry(path); var ctx = new northwinddatacontext(); foreach (var e in ctx.employees) { var cn = "cn=" + e.firstname + e.lastname; var u = root.children.add(cn, "user"); u.properties["employeeid"].value = e.employeeid; u.properties["sn"].value = e.lastname; u.properties["givenname"].value = e.firstname; u.properties["comment"].value = e.notes; u.properties["homephone"].value = e.homephone; u.properties["photo"].value = e.photo.toarray(); u.commitchanges(); } } after running this code – obviously changing the ldap path to reflect your setup – you should see the following in adsi edit (after hitting refresh): now it’s just plain easy to write an application that visualizes the employees with their data. we’ll leave that to the ui-savvy reader (just to tease that segment of my audience, i’ve also imported the employee’s photo as a byte-array). a small preview of what’s coming up to whet the reader’s appetite about next episodes on this blog, below is a single screenshot illustrating something – imho – rather cool (use of linq to active directory is just an implementation detail below): note: what’s shown here is the result of a very early experiment done as part of my current job on “linq to anything” here in the “cloud data programmability team”. please don’t fantasize about it as being a vnext feature of any product involved whatsoever. the core intent of those experiments is to emphasize the omnipresence of linq (and more widely, monads) in today’s (and tomorrow’s) world. while we’re not ready to reveal the “linq to anything” mission in all its glory (rather think of it as “linq to the unimaginable”), we can drop some hints. stay tuned for more!

May 11, 2013

by Bart De Smet

· 26,414 Views · 1 Like

Hibernate 3 with Spring

1. Overview This article will focus on setting up Hibernate 3 with Spring – we’ll look at how to configure Spring 3 with Hibernate 3 using both Java and XML Configuration. 2. Maven To add the Spring Persistence dependencies to the pom, please see the Spring with Maven article. Continuing with Hibernate 3, the Maven dependencies are simple: org.hibernate hibernate-core 3.6.10.Final Then, to enable Hibernate to use its proxy model, we need javassist as well: org.javassist javassist 3.17.1-GA And since we’re going to use MySQL for this tutorial, we’ll also need: mysql mysql-connector-java 5.1.25 runtime 3. Java Spring Configuration for Hibernate 3 Setting up Hibernate 3 with Spring and Java configuration is straightforward: import java.util.Properties; import javax.sql.DataSource; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.ComponentScan; import org.springframework.context.annotation.Configuration; import org.springframework.context.annotation.PropertySource; import org.springframework.core.env.Environment; import org.springframework.dao.annotation.PersistenceExceptionTranslationPostProcessor; import org.springframework.jdbc.datasource.DriverManagerDataSource; import org.springframework.orm.hibernate3.HibernateTransactionManager; import org.springframework.orm.hibernate3.annotation.AnnotationSessionFactoryBean; import org.springframework.transaction.annotation.EnableTransactionManagement; import com.google.common.base.Preconditions; @Configuration @EnableTransactionManagement @PropertySource({ "classpath:persistence-mysql.properties" }) @ComponentScan({ "org.baeldung.spring.persistence" }) public class PersistenceConfig { @Autowired private Environment env; @Bean public AnnotationSessionFactoryBean sessionFactory() { AnnotationSessionFactoryBean sessionFactory = new AnnotationSessionFactoryBean(); sessionFactory.setDataSource(restDataSource()); sessionFactory.setPackagesToScan(new String[] { "org.baeldung.spring.persistence.model" }); sessionFactory.setHibernateProperties(hibernateProperties()); return sessionFactory; } @Bean public DataSource restDataSource() { DriverManagerDataSource dataSource = new DriverManagerDataSource(); dataSource.setDriverClassName(env.getProperty("jdbc.driverClassName")); dataSource.setUrl(env.getProperty("jdbc.url")); dataSource.setUsername(env.getProperty("jdbc.user")); dataSource.setPassword(env.getProperty("jdbc.pass")); return dataSource; } @Bean public HibernateTransactionManager transactionManager() { HibernateTransactionManager txManager = new HibernateTransactionManager(); txManager.setSessionFactory(sessionFactory().getObject()); return txManager; } @Bean public PersistenceExceptionTranslationPostProcessor exceptionTranslation() { return new PersistenceExceptionTranslationPostProcessor(); } Properties hibernateProperties() { return new Properties() { { setProperty("hibernate.hbm2ddl.auto", env.getProperty("hibernate.hbm2ddl.auto")); setProperty("hibernate.dialect", env.getProperty("hibernate.dialect")); } }; } } Compared to the XML Configuration – described next – there is a small difference in the way one bean in the configuration access another. In XML there is no difference between pointing to a bean or pointing to a bean factory capable of creating that bean. Since the Java configuration is type-safe – pointing directly to the bean factory is no longer an option – we need to retrieve the bean from the bean factory manually: txManager.setSessionFactory(sessionFactory().getObject()); 4. XML Spring Configuration for Hibernate 3 Simillary, Hibernate 3 can be configured using XML Configuration as well: ${hibernate.hbm2ddl.auto} ${hibernate.dialect} Then, this XML file is boostrapped into the Spring context: @Configuration @EnableTransactionManagement @ImportResource({ "classpath:persistenceConfig.xml" }) public class PersistenceXmlConfig { // } For both types of configuration, the JDBC and Hibernate specific properties are stored in a properties file: # jdbc.X jdbc.driverClassName=com.mysql.jdbc.Driver jdbc.url=jdbc:mysql://localhost:3306/spring_hibernate_dev?createDatabaseIfNotExist=true jdbc.user=tutorialuser jdbc.pass=tutorialmy5ql # hibernate.X hibernate.dialect=org.hibernate.dialect.MySQL5Dialect hibernate.show_sql=false hibernate.hbm2ddl.auto=create-drop 5. Spring, Hibernate and MySQL The example above uses MySQL 5 as the underlying database configured with Hibernate – however, Hibernate supports several underlying SQL Databases. 5.1. The Driver The Driver class name is configured via the jdbc.driverClassName property provided to the DataSource. In the example above, it is set to com.mysql.jdbc.Driver from the mysql-connector-java dependency we defined in the pom, at the start of the article. 5.2. The Dialect The Dialect is configured via the hibernate.dialect property provided to the Hibernate SessionFactory. In the example above, this is set to org.hibernate.dialect.MySQL5Dialect as we are using MySQL 5 as the underlying Database. There are several other dialects supporting MySQL: org.hibernate.dialect.MySQL5InnoDBDialect – for MySQL 5.x with the InnoDB storage engine org.hibernate.dialect.MySQLDialect – for MySQL prior to 5.x org.hibernate.dialect.MySQLInnoDBDialect – for MySQL prior to 5.x with the InnoDB storage engine org.hibernate.dialect.MySQLMyISAMDialect – for all MySQL versions with the ISAM storage engine Hibernate supports SQL Dialects for every supported Database. 6. Usage At this point, Hibernate 3 is fully configured with Spring and we can inject the raw HibernateSessionFactory directly whenever we need to: public abstract class FooHibernateDAO{ @Autowired SessionFactory sessionFactory; ... protected Session getCurrentSession(){ return sessionFactory.getCurrentSession(); } } 7. Conclusion In this example, we configured Hiberate 3 with Spring – both with Java and XML configuration. The implementation of this simple project can be found in the github project – this is an Eclipse based project, so it should be easy to import and run as it is.

May 8, 2013

by Eugen Paraschiv

· 13,014 Views · 1 Like

Neo4j/Cypher: Returning a Row with Zero Count When No Relationship Exists

I’ve been trying to see if I can match some of the football stats that OptaJoe posts on twitter and one that I was looking at yesterday was around the number of red cards different teams have received. 1 – Sunderland have picked up their first PL red card of the season. The only team without one now are Man Utd. Angels. To refresh this is the sub graph that we’ll need to look at to work it out: I started off with the following query which traverses out from each match, finds the players who were sent off in the match and then groups the sendings off by the team they were playing for: START game = node:matches('match_id:*') MATCH game<-[:sent_off_in]-player-[:played]->likeThis-[:in]->game, likeThis-[:for]->team RETURN team.name, COUNT(game) AS redCards ORDER BY redCards LIMIT 5 When we run this we get the following results: +------------------------------+ | team.name | redCards | +------------------------------+ | "Sunderland" | 1 | | "West Ham United" | 1 | | "Norwich City" | 1 | | "Reading" | 1 | | "Liverpool" | 2 | +------------------------------+ 5 rows The problem we have here is that it hasn’t returned Manchester United because they haven’t yet received any red cards and therefore none of their players match the ‘sent_off_in’ relationship. I ran into something similar in a post I wrote about a month ago where I was working out which day of the week players scored on. The first step towards getting Manchester United to return with a count of 0 is to make the ‘sent_off_in’ relationship optional. However, that on its own that isn’t enough because it now returns a count of all the player performances for each team: START game = node:matches('match_id:*') MATCH game<-[?:sent_off_in]-player-[:played]->likeThis-[:in]->game, likeThis-[:for]->team RETURN team.name, COUNT(game) AS redCards ORDER BY redCards ASC LIMIT 5 +-----------------------------+ | team.name | redCards | +-----------------------------+ | "Chelsea" | 448 | | "Wigan Athletic" | 459 | | "Fulham" | 460 | | "Liverpool" | 466 | | "Everton" | 467 | +-----------------------------+ 5 rows Instead what we need to do is collect up all the ‘sent_off_in’ relationships and sum them up. We can use the COLLECT function to do that and the neat thing about COLLECT is that it doesn’t bother collecting the empty relationships so we end up with exactly what we need: START game = node:matches('match_id:*') MATCH game<-[r?:sent_off_in]-player-[:played]->likeThis-[:in]->game, likeThis-[:for]->team RETURN team.name, COLLECT(r) AS redCards LIMIT 5 +-----------------------------------------------------------------------------------------------------+ | team.name | redCards | +-----------------------------------------------------------------------------------------------------+ | "Wigan Athletic" | [:sent_off_in[26443] {},:sent_off_in[37785] {}] | | "Everton" | [:sent_off_in[6795] {minute:61},:sent_off_in[21735] {},:sent_off_in[34594] {}] | | "Newcastle United" | [:sent_off_in[434] {minute:75},:sent_off_in[32389] {},:sent_off_in[34915] {}] | | "Southampton" | [:sent_off_in[49393] {minute:70},:sent_off_in[49392] {minute:82}] | | "West Ham United" | [:sent_off_in[21734] {minute:67}] | +-----------------------------------------------------------------------------------------------------+ 5 rows We then just need to call the LENGTH function to work out how many red cards there are in each collection and then we’re done: START game = node:matches('match_id:*') MATCH game<-[r?:sent_off_in]-player-[:played]->likeThis-[:in]->game, likeThis-[:for]->team RETURN team.name, LENGTH(COLLECT(r)) AS redCards ORDER BY redCards LIMIT 5 +--------------------------------+ | team.name | redCards | +--------------------------------+ | "Manchester United" | 0 | | "West Ham United" | 1 | | "Sunderland" | 1 | | "Norwich City" | 1 | | "Reading" | 1 | +--------------------------------+ 5 rows

April 30, 2013

by Mark Needham

· 5,899 Views

Multipart Upload on S3 with jclouds

1. Goal In the previous article, we looked at how we can use the generic Blob APIs from jclouds to upload content to S3. In this article we will use the S3 specific asynchronous API from jclouds to upload content and leverage the multipart upload functionality provided by S3. 2. Preparation 2.1. Set up the custom API The first part of the upload process is creating the jclouds API – this is a custom API for Amazon S3: public AWSS3AsyncClient s3AsyncClient() { String identity = ... String credentials = ... BlobStoreContext context = ContextBuilder.newBuilder("aws-s3"). credentials(identity, credentials).buildView(BlobStoreContext.class); RestContext providerContext = context.unwrap(); return providerContext.getAsyncApi(); } 2.2. Determining the number of parts for the content Amazon S3 has a 5 MB limit for each part to be uploaded. As such, the first thing we need to do is determine the right number of parts that we can split our content into so that we don’t have parts below this 5 MB limit: public static int getMaximumNumberOfParts(byte[] byteArray) { int numberOfParts= byteArray.length / fiveMB; // 5*1024*1024 if (numberOfParts== 0) { return 1; } return numberOfParts; } 2.3. Breaking the content into parts Were going to break the byte array into a set number of parts: public static List breakByteArrayIntoParts(byte[] byteArray, int maxNumberOfParts) { List parts = Lists. newArrayListWithCapacity(maxNumberOfParts); int fullSize = byteArray.length; long dimensionOfPart = fullSize / maxNumberOfParts; for (int i = 0; i < maxNumberOfParts; i++) { int previousSplitPoint = (int) (dimensionOfPart * i); int splitPoint = (int) (dimensionOfPart * (i + 1)); if (i == (maxNumberOfParts - 1)) { splitPoint = fullSize; } byte[] partBytes = Arrays.copyOfRange(byteArray, previousSplitPoint, splitPoint); parts.add(partBytes); } return parts; } We’re going to test the logic of breaking the byte array into parts – we’re going to generate some bytes, split the byte array, recompose it back together using Guava and verify that we get back the original: @Test public void given16MByteArray_whenFileBytesAreSplitInto3_thenTheSplitIsCorrect() { byte[] byteArray = randomByteData(16); int maximumNumberOfParts = S3Util.getMaximumNumberOfParts(byteArray); List fileParts = S3Util.breakByteArrayIntoParts(byteArray, maximumNumberOfParts); assertThat(fileParts.get(0).length + fileParts.get(1).length + fileParts.get(2).length, equalTo(byteArray.length)); byte[] unmultiplexed = Bytes.concat(fileParts.get(0), fileParts.get(1), fileParts.get(2)); assertThat(byteArray, equalTo(unmultiplexed)); } To generate the data, we simply use the support from Random: byte[] randomByteData(int mb) { byte[] randomBytes = new byte[mb * 1024 * 1024]; new Random().nextBytes(randomBytes); return randomBytes; } 2.4. Creating the Payloads Now that we have determined the correct number of parts for our content and we managed to break the content into parts, we need to generate the Payload objects for the jclouds API: public static List createPayloadsOutOfParts(Iterable fileParts) { List payloads = Lists.newArrayList(); for (byte[] filePart : fileParts) { byte[] partMd5Bytes = Hashing.md5().hashBytes(filePart).asBytes(); Payload partPayload = Payloads.newByteArrayPayload(filePart); partPayload.getContentMetadata().setContentLength((long) filePart.length); partPayload.getContentMetadata().setContentMD5(partMd5Bytes); payloads.add(partPayload); } return payloads; } 3. Upload The upload process is a flexible multi-step process – this means: the upload can be started before having all the data – data can be uploaded as it’s coming in data is uploaded in chunks – if one of these operations fails, it can simply be retrieved chunks can be uploaded in parallel – this can greatly increase the upload speed, especially in the case of large files 3.1. Initiating the Upload operation The first step in the Upload operation is to initiate the process. This request to S3 must contain the standard HTTP headers – the Content-MD5 header in particular needs to be computed. Were going to use the Guava hash function support here: Hashing.md5().hashBytes(byteArray).asBytes(); This is the md5 hash of the entire byte array, not of the parts yet. To initiate the upload, and for all further interactions with S3, we’re going to use the AWSS3AsyncClient – the asynchronous API we created earlier: ObjectMetadata metadata = ObjectMetadataBuilder.create().key(key).contentMD5(md5Bytes).build(); String uploadId = s3AsyncApi.initiateMultipartUpload(container, metadata).get(); The key is the handle assigned to the object – this needs to be a unique identifier specified by the client. Also notice that, even though we’re using the async version of the API, we’re blocking for the result of this operation – this is because we will need the result of the initialize to be able to move forward. The result of the operation is an upload id returned by S3 – this will identify the upload throughout it’s lifecycle and will be present in all subsequent upload operations. 3.2. Uploading the Parts The next step is uploading the parts. Our goal here is to send these requests in parallel, as the upload parts operation represent the bulk of the upload process: List> ongoingOperations = Lists.newArrayList(); for (int partNumber = 0; partNumber < filePartsAsByteArrays.size(); partNumber++) { ListenableFuture future = s3AsyncApi.uploadPart( container, key, partNumber + 1, uploadId, payloads.get(partNumber)); ongoingOperations.add(future); } The part numbers need to be continuous but the order in which the requests are send is not relevant. After all of the upload part requests have been submitted, we need to wait for their responses so that we can collect the individual ETag value of each part: Function, String> getEtagFromOp = new Function, String>() { public String apply(ListenableFuture ongoingOperation) { try { return ongoingOperation.get(); } catch (InterruptedException | ExecutionException e) { throw new IllegalStateException(e); } } }; List etagsOfParts = Lists.transform(ongoingOperations, getEtagFromOp); If, for whatever reason, one of the upload part operations fails, the operation can be retried until it succeeds. The logic above does not contain the retry mechanism, but building it in should be straightforward enough. 3.3. Completing the Upload operation The final step of the upload process is completing the multipart operation. The S3 API requires the responses from the previous parts upload as a Map, which we can now easily create from the list of ETags that we obtained above: Map parts = Maps.newHashMap(); for (int i = 0; i < etagsOfParts.size(); i++) { parts.put(i + 1, etagsOfParts.get(i)); } And finally, send the complete request: s3AsyncApi.completeMultipartUpload(container, key, uploadId, parts).get(); This will return final ETag of the finished object and will complete the entire upload process. 4. Conclusion In this article we built a multipart enabled, fully parallel upload operation to S3, using the custom S3 jclouds API. This operation is ready to be used as is, but it can be improved in a few ways. First, retry logic should be added around the upload operations to better deal with failures. Next, for really large files, even though the mechanism is sending all upload multipart requests in parallel, a throttling mechanism should still limit the number of parallel requests being sent. This is both to avoid bandwidth becoming a bottleneck as well as to make sure Amazon itself doesn’t flag the upload process as exceeding an allowed limit of requests per second – the Guava RateLimiter can potentially be very well suited for this. P.S. You might dig following me on Twitter.

April 21, 2013

by Eugen Paraschiv

· 6,642 Views · 1 Like

Upload on S3 with the jclouds Library

There are several good ways to upload content to an S3 bucket in the Java world – in this article we’ll look at what the jclouds library provides for this purpose. To use jclouds – specifically the APIs discussed in this article, this simple Maven dependency should be added to the pom of the project: org.jclouds jclouds-allblobstore 1.5.9 1. Uploading to Amazon S3 The first step, in order to access any of these APIs, is to create a BlobStoreContext: BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(BlobStoreContext.class); This represents the entry-point to a general key-value storage service, such as Amazon S3 – but not limited to it. For the more specific S3 only implementation, the context can be created similarly: BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(S3BlobStoreContext.class); And even more specifically: BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); When the authenticated context is no longer needed, closing it is required to release all resources – threads and connections – associated to it. 2. The four S3 APIs of jclouds The jclouds library provides four different APIs to upload content to S3 bucket, ranging from simple but inflexible to complex and powerful, all obtained via the BlobStoreContext. Let’s start with the simplest. 2.1. Upload via the Map API The easiest way jclouds can be used to interact with an S3 bucket is by representing that bucket as a Map. The API is obtained from the context: InputStreamMap bucket = context.createInputStreamMap("bucketName"); Then, to upload a simple HTML file: bucket.putString("index1.html", "hello world1"); The InputStreamMap API exposes several other types of PUT operations – files, raw bytes – both for single and bulk. A simple integration test can be used as an example: @Test public void whenFileIsUploadedToS3WithMapApi_thenNoExceptions() { BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); InputStreamMap bucket = context.createInputStreamMap("bucketName"); bucket.putString("index1.html", "hello world1"); context.close(); } 2.2. Upload via BlobMap Using the simple Map API is straightforward but ultimately limited – for example, there is no way to pass in metadata about the content being uploaded. When more flexibility and customization is necessary, this simplified approach to uploading data to S3 via a Map is no longer enough. The next API we’ll look at is the Blob Map API – this is obtained from the context: BlobMap bucket = context.createBlobMap("bucketName"); The API allows the client to access more lower level details, such as Content-Length, Content-Type, Content-Encoding, eTag hash and others; to upload new content in the bucket: Blob blob = bucket.blobBuilder().name("index2.html"). payload("hello world2"). contentType("text/html").calculateMD5().build(); The API also allows setting a variety of payloads on the create request. A simple integration test for uploading a basic HTML file to S3 via the Blob Map API: @Test public void whenFileIsUploadedToS3WithBlobMap_thenNoExceptions() throws IOException { BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); BlobMap bucket = context.createBlobMap("bucketName"); Blob blob = bucket.blobBuilder().name("index2.html"). payload("hello world2"). contentType("text/html").calculateMD5().build(); bucket.put(blob.getMetadata().getName(), blob); context.close(); } 2.3. Upload via BlobStore The previous APIs had no way to upload content using multipart upload – this makes them ill suited when working with large files. This limitation is addressed by the next API we’re going to look at – the synchronous BlobStore API. This is obtained from the context: BlobStore blobStore = context.getBlobStore(); To use the multipart support and upload a file to S3: Blob blob = blobStore.blobBuilder("index3.html"). payload("hello world3").contentType("text/html").build(); blobStore.putBlob("bucketName", blob, PutOptions.Builder.multipart()); The payload builder is the same one that was being used by the BlobMap API, so the same flexibility in specifying lower level metadata information about the blob is available here. The difference is the PutOptions supported by the PUT operation of the API – namely the multipart support. The previous integration test now has multipart enabled: @Test public void whenFileIsUploadedToS3WithBlobStore_thenNoExceptions() { BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); BlobStore blobStore = context.getBlobStore(); Blob blob = blobStore.blobBuilder("index3.html"). payload("hello world3").contentType("text/html").build(); blobStore.putBlob("bucketName", blob, PutOptions.Builder.multipart()); context.close(); } 2.4. Upload via AsyncBlobStore While the previous BlobStore API was synchronous, there is also an asynchronous API for BlobStore – AsyncBlobStore. The API is similarly obtained from the context: AsyncBlobStore blobStore = context.getAsyncBlobStore(); The only difference between the two is that the async API is returning ListenableFuture for the PUT asynchronous operation: Blob blob = blobStore.blobBuilder("index4.html"). .payload("hello world4").build(); blobStore.putBlob("bucketName", blob).get(); The integration test displaying this operation is similar to the synchronous one: @Test public void whenFileIsUploadedToS3WithBlobStore_thenNoExceptions() { BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); BlobStore blobStore = context.getBlobStore(); Blob blob = blobStore.blobBuilder("index4.html"). payload("hello world4").contentType("text/html").build(); Future putOp = blobStore.putBlob("bucketName", blob, PutOptions.Builder.multipart()); putOp.get(); context.close(); } 3. Conclusion In this article, we analysed the four APIs that the jclouds library provides to upload content to Amazon S3. These four APIs are generic and they work with other key-value storage services as well – such as Microsoft Azure Storage for example. In the next article we’ll look at the Amazon specific S3 API available in jclouds – the AWSS3Client. We’ll implement the operation of uploading a large file, dynamically calculate the optimal number of parts for any given file, and perform the upload of all parts in parallel. P.S. You might dig following me on Twitter.

April 18, 2013

by Eugen Paraschiv

· 8,916 Views · 1 Like

ActiveMQ and .NET combined!

ActiveMQ is one of the most popular messaging frameworks. For sure the most popular open source framework. Many people think that ActiveMQ works only with Java and this is not true at all. ActiveMQ can work with almost every popular language (including JavaScript!) through numerous protocols which it supports. Today I will show you how to use ActiveMQ in .NET-based solutions. Project setup Using VS 2010's Extension Manger I installed NuGet Package Manager. After installation and VS 2010 restart, I created a project called ActiveMQNMS. I right-clicked it and selected "Manage NuGet packages...". In the search field I typed: "ActiveMQ". There was a package called Apache.NMS.ActiveMQ. I installed it. (Note: ActiveMQ has one dependency - Apache.NMS package. The NMS package provides a unified API for working with different messaging frameworks and providers.) Starting ActiveMQ I already had ActiveMQ installed on my machine. If you don't have one, download it from http://activemq.apache.org. The default instance listens on 61616 port. However, mine is listening on 62626. If you want to run my code, please remember to change the port. To start ActiveMQ I executed: activemq-5.5.0\bin\activemq Depending on configured ports, you can use ActiveMQ web console to manage your queues, topics, subscribers, connections, embedded Apache Camel, etc. I'm using 8282 port, and the console URL is: http://localhost:8282/admin. Test stub In general the .NET API is almost a copy of the Java API. So if you're familiar with JMS and/or ActiveMQ you don't need any documentation. Please note TestIntialize and TestCleanup methods. using System; using Apache.NMS; using Apache.NMS.ActiveMQ; using Microsoft.VisualStudio.TestTools.UnitTesting; namespace ActiveMQNMS { [Serializable] public class Person { public string FirstName { get; set; } public string LastName { get; set; } } [TestClass] public class ActiveMqTest { private IConnection _connection; private ISession _session; private const String QUEUE_DESTINATION = "DotNet.ActiveMQ.Test.Queue"; [TestInitialize] public void TestInitialize() { IConnectionFactory factory = new ConnectionFactory("tcp://localhost:62626"); _connection = factory.CreateConnection(); _connection.Start(); _session = _connection.CreateSession(); } [TestCleanup] public void TestCleanup() { _session.Close(); _connection.Close(); } } } Writing Producer Here is the producer: [TestMethod] public void TestA() { IDestination dest = _session.GetQueue(QUEUE_DESTINATION); using (IMessageProducer producer = _session.CreateProducer(dest)) { var person = new Person { FirstName = "Łukasz", LastName = "Budnik" }; var objectMessage = producer.CreateObjectMessage(person); producer.Send(objectMessage); } } Run the test and refresh "Queues" list in ActiveMQ web console. You should see DotNet.ActiveMQ.Test.Queue queue with 1 enqueued and pending message. Purge the queue by hitting the purge link or you simply delete it. Writing Consumer Now we have to consume the message. Here is the code: [TestMethod] public void TestB() { Person person = null; IDestination dest = _session.GetQueue(QUEUE_DESTINATION); using (IMessageConsumer consumer = _session.CreateConsumer(dest)) { IMessage message; while ((message = consumer.Receive(TimeSpan.FromMilliseconds(2000))) != null) { var objectMessage = message as IObjectMessage; if (objectMessage != null) { person = objectMessage.Body as Person; if (person != null) { Assert.AreEqual("Łukasz", person.FirstName); Assert.AreEqual("Budnik", person.LastName); } } else { Assert.Fail("Object Message is null"); } } } if (person == null) { Assert.Fail("Person object is null"); } } Run tests. Refresh "Queues" tab in ActiveMQ web console. You should see 1 message enqueued and 1 message dequeued. As expected. Summary That's all. Simple, isn't it? ActiveMQ works very, very nicely with .NET. I have to find some performance comparison for ActiveMQ and MS or pure .C#/NET messaging frameworks. Or maybe you have it? Please share. cheers, Łukasz

April 15, 2013

by Łukasz Budnik

· 29,468 Views

Introduction to SmartSVN

SmartSVN is a powerful and easy-to-use graphical client for Apache Subversion. There are several clients for Subversion, but here are just a few reasons you should try SmartSVN: It’s cross-platform – SmartSVN runs on Windows, Linux and Mac OS X, so you can continue using the operating system (OS) that works the best for you. It can also be integrated into your OS, via Mac’s Finder Integration or Windows Shell. Everything you need, out of the box – SmartSVN comes complete with all the tools you need to manage your Subversion projects: Conflict solver – this feature combines the freedom of a general, three-way-merge with the ability to detect and resolve any conflicts that occur during the development lifecycle. File compare – this allows you to make inner-line comparisons and directly edit the compared files. Built-in SSH client – allows users to access servers using the SSH protocol. This security-conscious protocol encrypts every piece of communication between the client and the server, for additional protection. A complete view of your project at a glance – the most important files (such as conflicted, modified or missing files) are placed at the top of the file list. SmartSVN also highlights which directories contain local modifications, which directories have been changed in the repository, and whether individual files have been modified locally or in the central repo. This makes it easy to get a quick overview of the state of your project. Fully customizable – maximize productivity by fine-tuning your SmartSVN installation to suit your particular needs: Change keyboard shortcuts, write your own plugin with the SmartSVN API, group revisions to personalize your display, create Change Sets, and alter the context menus and toolbars to suit you. You can learn more about customizing SmartSVN at our ‘5 Ways to Customize SmartSVN’ blog post. Comprehensive bug tracker support – Trac and JIRA are both fully supported. Multitude of support options – SmartSVN users have access to a range of free support, from refcards to blogsand documentation, the SmartSVN forum and a Twitter account maintained by our open source experts. If you need extra support with your SmartSVN installation, expert email support is included with SmartSVN Professional licenses. Want to learn more about SmartSVN? On April 18th, WANdisco will be be holding a free ‘Introduction to SmartSVN’ webinar covering everything you need to get off to a great start with this popular client: Repository basics Checkouts, working folders, editing files and commits Reporting on changes Simple branching Simple merging This webinar is free so register now.

April 13, 2013

by Jessica Thornsby

· 6,826 Views

Application Services Governance Components

Application Services Governance is a necessary step towards building a responsive IT organization and achieving business agility. By guiding teams through a streamlined application services development process, Application Services Governance Platforms optimize IT effectiveness, raise software quality, and reduce delivery timeframes. Governance relies on policy, people, process and technology to guide business activity and consistently deliver positive outcomes. Effective governance channels business activity towards the ‘right’ path; by making the right actions the path of least resistance. To efficiently guide teams and demonstrate policy compliance benefits, Application Services Governance Platforms provide policy management, developer portals, repositories, service integration and composition, and business value dashboards. Effective governance encompasses the entire IT solution spanning APIs, services, business processes, data, and application delivery. While most governance solutions focus on web services, leading Application Services Governance Platforms bridge API governance, SOA governance, Cloud deployment governance, data governance, and application delivery governance. Additionally, the governance experience must be tailored for the participant’s project role. Portals may be personalized to present notifications, tasks, actions, and reports suitable for application service creators, publishers, subscribers, consumers, or business managers. Application delivery governance segments participants into developers, quality assurance testers, operations, project managers, and application users. End-user Application Services Governance priorities are evolving toward bridging service governance with API governance, extending application lifecycle management to embrace cloud deployment environments, and focusing on visualizing asset business value. Key governance challenges include meeting mobile application demands, implementing efficient self-service provisioning, right-sizing governance practices (not too heavy or light), and defining appropriate policy tiers. Governance Components To efficiently guide teams and demonstrate policy compliance benefits, Application Services Governance Platforms provide policy management, developer portals, repositories, service integration and composition, and business value dashboards. Figure 1 Application Services Governance Components Policy Management Policy management is used to specify the correct behavior, detail exception thresholds, and define corrective actions or notifications. Leading application services governance platforms deliver advanced policy management by conforming to a flexible architecture, addressing relevant policy categories, and spanning all lifecycle phases. A comprehensive Application Services Governance Platform manages: Design-time Policy Run-time Policy Security Policy Developer access Policy Service and API Lifecycle Management Policy Application Lifecycle Management Policy Within these six broad categories, application services governance commonly encompasses service level policies, usage policies, version policies, subscription policies, and access control policies. Registries serve as policy stores for many types of runtime policies including security policies, lifecycle management workflow policies, API policies, service description, service contracts, service consumption, service usage, service lifecycle management, service level agreements (SLAs) and XACML authorization policies. Leading platforms have built-in support for a number of policy standards including WS-Policy, XACML 3.0, and SCXML. Cloud foundation and cloud middleware components deliver sophisticated run-time policy enforcement for tenant partitioning, service level management, application provisioning, tenant access, and resource management. All run-time infrastructure products should serve as well-integrated policy enforcement points that may delegate policy decisions to external decision points or internally cache and process policy assertions. Identity Management infrastructure components serve as a policy decision point and a policy manager for sophisticated security policies encoded in XACML. The Application Service Governance Platforms use workflow engines to execute governance workflow, present task lists, and manage approvals. Complex Event Processor components can be configured as policy decision points, which use time-based policy pattern matching to evaluate run-time service, message, REST resource, and event traffic. For more information on policy management, read the detailed policy management blog post. Developer Portal and Repository Portals serve as the viewport into policy management, service integration and composition, and business value dashboards. The Application Service Governance portals should deliver an application service governance experience tuned for self-service, on-demand access, and safe API usage. Developer portals are often contextually personalized to fit the project and user’s role. For example, a developer portal may fit the needs of API creators and API publishers who are defining, documenting, and publishing APIs. The portal’s user experience may enable API creators and publishers to monitor, manage, and analyze API usage. A developer portal may also be personalized to deliver a user experience tailored for API consumers. API developers who are consuming APIs can find, explore, subscribe and evaluate APIs. Developer portals are often tuned to facilitate service meta-data and lifecycle management for service creators. Service and integration developers who are consuming services can find and explore services. A developer portal should guide teams toward effective and efficient governance when building service implementation and service consumption code. Advanced developer portals capabilities include overlaying build management governance, test governance (i.e. unit, integration, performance), implementation lifecycle governance, and deployment governance. An Application Services Governance Platform should enable flexible organization, classification & documentation of services, APIs, and any IT asset. Key repository capabilities include governing and managing: Any type of metadata in any structure Service, API, or artifact associations and relationships Schema definitions and namespaces Users and Roles User subscriptions Service level agreements Developer documentation Social taxonomies (e.g. ratings, comments, tags) Implementation artifacts (i.e. code, test cases) Service Integration and Composition Service integration and composition for APIs, web services, or business process are often implemented using tools provided by the run-time infrastructure vendor. Application Services Governance components must integrate into diverse run-time infrastructure containers and development tooling. Synchronizing policy, development artifacts, and deployment packages requires tight integration between design-time tools, development tools, run-time management consoles, and application services governance portals and repositories. Business Value Dashboards To gauge governance effectiveness and enhanced business value, analytic dashboards assess policy compliance, quality of service, service usage, architecture coherence, and team performance. The Application Services Governance platform should capture service tier subscription information, collects usage statistics, and integrate with billing and payment systems that deliver show-back or charge-back reports. Subscription and usage reports help teams understand asset adoption (by version, by service) and usage (by version, by service). By understanding adoption and usage, business owners and architects can intelligently invest future development resources, properly plan infrastructure scale, and rationalize the portfolio. Dashboards also present a service overview, number of services, service lifecycle stage, schema re-use, service dependencies, upgrade impacts, development team productivity, and project progress. Governance Lifecycle Phases API management portals and SOA Governance Registries must work together to keep API lifecycle stages synchronized with backend service implementation stages. An API Governance experience may provide a straightforward set of lifecycle stages (e.g., created, published, deprecated, retired, blocked) that may be customized by the development team. SOA Governance Registries facilitates service metadata management and governance across design, implementation, test, and run-time operations. Figure 2 below depicts the intersection of the two governance views. Figure 2: API and Service Lifecycle Views Application delivery governance usually relies on ad hoc tools and processes, knitted together by end-user delivery managers. Application Services Governance Platforms should span project inception, development, quality assurance, production deployment, production management, maintenance, and retirement. Figure 3 illustrates service implementation activities governed by an application delivery governance product. Figure 3: Implementation activities governed by application services delivery governance Application Services Governance Drivers The IT focus on API, DevOps, and Cloud scale is driving resurgent interest in Application Services Governance. As development teams support mobile applications by fielding web APIs, they are creating a new ‘demand layer’ in front of existing service implementations. Both API and SOA success requires creating loosely coupled consumer-provider connections, enforcing a separation of concerns between consumer and provider, and exposing a set of re-usable, shared services, and gaining service consumer adoption. With traditional SOA Governance, many development teams publish services, yet struggle to create a service architecture that is widely shared, re-used, and adopted across internal development teams. In today’s connected business world, API and SOA are the business. An effective governance approach must address human collaboration stumbling blocks. By publishing managed APIs, establishing API manager and publisher roles, extending the governance registry, facilitating API management practices (e.g self-service key management, self-service provisioning, service tier management, and usage visualization),and offering APIs through developer portal, organizations can overcome collaboration, trust, and adoption hurdles while enhancing SOA success. By publishing managed APIs, establishing API manager and publisher roles, extending the governance registry, and offering APIs through an API Store, team have a new opportunity to increase service re-use and enhance IT business value. For more information on how teams can complement SOA Governance with API Governance, read the promoting services with API Management white paper. Because services are often imbedded in application solutions, leading Application Services Governance platforms wrap services governance inside application delivery governance. When operation team members use traditional point tools (i.e. Puppet, Chef, Jenkins,Selenium) to achieve DevOps benefits, the teams spend a considerable amount of time and effort creating agile workflow, effective governance, seamless activity transitions, and on-demand self-service access. A configurable DevOps PaaS can implement governance best practices and be readily adopted by teams without extensive implementation effort. Effective application delivery governance presents a simplified and unified user experience to complex development tools, processes, and team hand-offs. By integrating software promotion best practices, test automation, continuous integration, and issue tracking, application delivery governance raises software quality while reducing delivery timeframes. For more information, read about how to accelerate agility and maintain governance with DevOps PaaS. Recommended Reading Policy Management for Application Services Governance Application Services Governance Requires More Than a SOA Registry API and SOA Convergence Promoting services with API Management white paper Accelerate agility and maintain governance with DevOps PaaS Governance Registry Brings Integrity to SaaS Platform Gartner’s analysis of WSO2 SOA Governance

April 13, 2013

by Chris Haddad

· 5,970 Views · 2 Likes

Complex Event Processing Made Easy (using Esper)

The following is a very simple example of event stream processing (using the ESPER engine). Note - a full working example is available over on GitHub: https://github.com/corsoft/esper-demo-nuclear What is Complex Event processing (CEP)? Complex Event Processing (CEP), or Event Stream Stream Processing (ESP) are technologies commonly used in Event-Driven systems. These type of systems consume, and react to a stream of event data in real time. Typically these will be things like financial trading, fraud identification and process monitoring systems – where you need to identify, make sense of, and react quickly to emerging patterns in a stream of data events. Key Components of a CEP system A CEP system is like your typical database model turned upside down. Whereas a typical database stores data, and runs queries against the data, a CEP data stores queries, and runs data through the queries. To do this it basically needs: Data – in the form of ‘Events’ Queries – using EPL (‘Event Processing Language’) Listeners – code that ‘does something’ if the queries return results A Simple Example - A Nuclear Power Plant Take the example of a Nuclear Power Station.. Now, this is just an example – so please try and suspend your disbelief if you know something about Nuclear Cores, Critical Temperatures, and the like. It’s just an example. I could have picked equally unbelievable financial transaction data. But ... Monitoring the Core Temperature Now I don’t know what the core is, or if it even exists in reality – but for this example lets assume our power station has one, and if it gets too hot – well, very bad things happen.. Lets also assume that we have temperature gauges (thermometers?) in place which take a reading of the core temperature every second – and send the data to a central monitoring system. What are the requirements? We need to be warned when 3 types of events are detected: MONITOR just tell us the average temperature every 10 seconds - for information purposes WARNING WARN us if we have 2 consecutive temperatures above a certain threshold CRITICAL ALERT us if we have 4 consecutive events, with the first one above a certain threshold, and each subsequent one greater than the last – and the last one being 1.5 times greater than the first. This is trying to alert us that we have a sudden, rising escalating temperature spike – a bit like the diagram below. And let’s assume this is a very bad thing. Using Esper There are a number of ways you could approach building a system to handle these requirements. For the purpose of this post though - we will look at using Esper to tackle this problem How we approach this with Esper is: Using Esper – we can create 3 queries (using EPL - Esper Query Language) to model each of these event patterns. We then attach a listener to each query - this will be triggered when the EPL detects a matching pattern of events) We create an Esper service, and register these queries (and their listeners) We can then just throw Temperature data through the service – and let Esper tell alert the listeners when we get matches. (A working example of this simple solution is available on Githib - see link above) Our Simple ESPER Solution At the core of the system are the 3 queries for detecting the events. Query 1 – MONITOR (Just monitor the average temperature) select avg(value) as avg_val from TemperatureEvent.win:time_batch(10 sec) Query 2 – WARN (Tell us if we have 2 consecutive events which breach a threshold) select * from TemperatureEvent " match_recognize ( measures A as temp1, B as temp2 pattern (A B) define A as A.temperature > 400, B as B.temperature > 400) Query 3 – CRITICAL - 4 consecutive rising values above all above 100 with the fourth value being 1.5x greater than the first select * from TemperatureEvent match_recognize ( measures A as temp1, B as temp2, C as temp3, D as temp4 pattern (A B C D) define A as A.temperature > 100, B as (A.temperature < B.value), C as (B.temperature < C.value), D as (C.temperature < D.value) and D.value > (A.value * 1.5)) Some Code Snippets TemperatureEvent We assume our incoming data arrives in the form of a TemperatureEvent POJO If it doesn't - we can convert it to one, e.g. if it comes in via a JMS queue, our queue listener can convert it to a POJO. We don't have to do this, but doing so decouples us from the incoming data structure, and gives us more flexibility if we start to do more processing in our Java code outside the core Esper queries. An example of our POJO is below package com.cor.cep.event; package com.cor.cep.event; import java.util.Date; /** * Immutable Temperature Event class. * The process control system creates these events. * The TemperatureEventHandler picks these up * and processes them. */ public class TemperatureEvent { /** Temperature in Celcius. */ private int temperature; /** Time temerature reading was taken. */ private Date timeOfReading; /** * Single value constructor. * @param value Temperature in Celsius. */ /** * Temerature constructor. * @param temperature Temperature in Celsius * @param timeOfReading Time of Reading */ public TemperatureEvent(int temperature, Date timeOfReading) { this.temperature = temperature; this.timeOfReading = timeOfReading; } /** * Get the Temperature. * @return Temperature in Celsius */ public int getTemperature() { return temperature; } /** * Get time Temperature reading was taken. * @return Time of Reading */ public Date getTimeOfReading() { return timeOfReading; } @Override public String toString() { return "TemperatureEvent [" + temperature + "C]"; } } Handling this Event In our main handler class - TemperatureEventHandler.java, we initialise the Esper service. We register the package containing our TemperatureEvent so the EPL can use it. We also create our 3 statements and add a listener to each statement /** * Auto initialise our service after Spring bean wiring is complete. */ @Override public void afterPropertiesSet() throws Exception { initService(); } /** * Configure Esper Statement(s). */ public void initService() { Configuration config = new Configuration(); // Recognise domain objects in this package in Esper. config.addEventTypeAutoName("com.cor.cep.event"); epService = EPServiceProviderManager.getDefaultProvider(config); createCriticalTemperatureCheckExpression(); createWarningTemperatureCheckExpression(); createTemperatureMonitorExpression(); } An example of creating the Critical Temperature warning and attaching the listener /** * EPL to check for a sudden critical rise across 4 events, * where the last event is 1.5x greater than the first. * This is checking for a sudden, sustained escalating * rise in the temperature */ private void createCriticalTemperatureCheckExpression() { LOG.debug("create Critical Temperature Check Expression"); EPAdministrator epAdmin = epService.getEPAdministrator(); criticalEventStatement = epAdmin.createEPL(criticalEventSubscriber.getStatement()); criticalEventStatement.setSubscriber(criticalEventSubscriber); } And finally - an example of the listener for the Critical event. This just logs some debug - that's as far as this demo goes. package com.cor.cep.subscriber; import java.util.Map; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.stereotype.Component; import com.cor.cep.event.TemperatureEvent; /** * Wraps Esper Statement and Listener. No dependency on Esper libraries. */ @Component public class CriticalEventSubscriber implements StatementSubscriber { /** Logger */ private static Logger LOG = LoggerFactory.getLogger(CriticalEventSubscriber.class); /** Minimum starting threshold for a critical event. */ private static final String CRITICAL_EVENT_THRESHOLD = "100"; /** * If the last event in a critical sequence is this much greater * than the first - issue a critical alert. */ private static final String CRITICAL_EVENT_MULTIPLIER = "1.5"; /** * {@inheritDoc} */ public String getStatement() { // Example using 'Match Recognise' syntax. String criticalEventExpression = "select * from TemperatureEvent " + "match_recognize ( " + "measures A as temp1, B as temp2, C as temp3, D as temp4 " + "pattern (A B C D) " + "define " + " A as A.temperature > " + CRITICAL_EVENT_THRESHOLD + ", " + " B as (A.temperature < B.temperature), " + " C as (B.temperature < C.temperature), " + " D as (C.temperature < D.temperature) " + "and D.temperature > " + "(A.temperature * " + CRITICAL_EVENT_MULTIPLIER + ")" + ")"; return criticalEventExpression; } /** * Listener method called when Esper has detected a pattern match. */ public void update(Map eventMap) { // 1st Temperature in the Critical Sequence TemperatureEvent temp1 = (TemperatureEvent) eventMap.get("temp1"); // 2nd Temperature in the Critical Sequence TemperatureEvent temp2 = (TemperatureEvent) eventMap.get("temp2"); // 3rd Temperature in the Critical Sequence TemperatureEvent temp3 = (TemperatureEvent) eventMap.get("temp3"); // 4th Temperature in the Critical Sequence TemperatureEvent temp4 = (TemperatureEvent) eventMap.get("temp4"); StringBuilder sb = new StringBuilder(); sb.append("***************************************"); sb.append("\n* [ALERT] : CRITICAL EVENT DETECTED! "); sb.append("\n* " + temp1 + " > " + temp2 + " > " + temp3 + " > " + temp4); sb.append("\n***************************************"); LOG.debug(sb.toString()); } } The Running Demo Full instructions for running the demo can be found here: https://github.com/corsoft/esper-demo-nuclear An example of the running demo is shown below - it generates random Temperature events and sends them through the Esper processor (in the real world this would come in via a JMS queue, http endpoint or socket listener). When any of our 3 queries detect a match - debug is dumped to the console. In a real world solution each of these 3 listeners would handle the events differently - maybe by sending messages to alert queues/endpoints for other parts of the system to pick up the processing. Conclusions Using a system like Esper is a neat way to monitor and spot patterns in data in real time with minimal code. This is obviously (and intentionally) a very bare bones demo, barely touching the surface of the capabilities available. Check out the Esper web site for more info and demos. Esper also has a plugin for Apache Camel Integration engine - which allows you to configure you EPL queries directly in XML Spring camel routes, removing the need for any Java code completely (we will possibly cover this in a later blog post!)

April 11, 2013

by Adrian Milne

· 68,433 Views · 4 Likes

Add Custom Post Meta Data To Post List Table

One of the best thing about WordPress is that you can customise almost anything. In the admin area you can see a list of all the posts you have added in WordPress. Within this table it shows the basic information for each of the posts, the title, the author, the category, tags, comments and the date the post was published. WordPress has a number of different filters and actions that allow you to edit the output of the column so you can add your own data to this list. For example if you have custom post meta data which is useful information you want to display on the list of posts you can add new custom columns to the list. In this article you will learn how to add new columns to the post list, how you can add data to the column and how you can make this column sortable. Add New Columns First we start off by adding the new column to the list, for this we use the WordPress filter manage_edit-post_columns. This will allow you to edit the output of the columns by adding new values to the column array. The callback function on this filter will pass in one parameter which are the current columns on the list, the return of this function will be the new columns on the post table. This means that we can add additional values to the array to add extra columns to the table. The following code will add a new column to the table just after the title column. // Add a column to the edit post list add_filter( 'manage_edit-post_columns', 'add_new_columns'); /** * Add new columns to the post table * * @param Array $columns - Current columns on the list post */ function add_new_columns( $columns ) { $column_meta = array( 'meta' => 'Custom Column' ); $columns = array_slice( $columns, 0, 2, true ) + $column_meta + array_slice( $columns, 2, NULL, true ); return $columns; } Add Columns To Custom Post Types If you have custom post types in your site and want to add additional columns to this list, WordPress comes with built in filters you can apply to add new columns to this table. add_filter( 'manage_${post_type}_posts_columns', 'add_new_columns'); If you have a custom post type of portfolio then you can use the following code to add a column to the list of portfolio post types. function add_portfolio_columns($columns) { return array_merge($columns, array('client' => __('Client'), 'project_date' =>__( 'Project Date'))); } add_filter('manage_portfolio_posts_columns' , 'add_portfolio_columns'); Add Data To Custom Columns Once you have created the new columns for the posts list you can now add data to the new columns by using the WordPress action manage_posts_custom_column. Adding an action to this will be called on each column, from this call we can get data for the post display this on the post list. The following code will check what column we are on and get the custom post meta data for the current post and display this in the column. // Add action to the manage post column to display the data add_action( 'manage_posts_custom_column' , 'custom_columns' ); /** * Display data in new columns * * @param $column Current column * * @return Data for the column */ function custom_columns( $column ) { global $post; switch ( $column ) { case 'meta': $metaData = get_post_meta( $post->ID, 'twitter_url', true ); echo $metaData; break; } } Add Data To Custom Post Type Columns Along with being able to add a filter to custom post types by adding new columns, WordPress has a built in action you can use to add data to custom columns. In the above example we add a new column just for post types of a portfolio to add two new columns to the list, using the below code you can add data to these new columns. function custom_portfolio_column( $column, $post_id ) { switch ( $column ) { case 'project_date': echo get_post_meta( $post_id , 'project_date' , true ); break; case 'client': echo get_post_meta( $post_id , 'client' , true ); break; } } add_action( 'manage_portfolio_posts_custom_column' , 'custom_portfolio_column' ); Make Columns Sortable By default the new custom columns are not sortable so this makes it hard to find data that you need. To sort the custom columns WordPress has another filter manage_edit-post_sortable_columns you can use to assign which columns are sortable. When this action is ran the function will pass in a parameter of all the columns which are currently sortable, by adding your new custom columns to this list will now make these columns sortable. The value you give this will be used in the URL so WordPress understands which column to order by. The following to allow you to sort by the custom column meta. // Register the column as sortable function register_sortable_columns( $columns ) { $columns['meta'] = 'Custom Column'; return $columns; } add_filter( 'manage_edit-post_sortable_columns', 'register_sortable_columns' ); That's all the information you need to change the way posts are listed in your admin area. What useful information do you wish was displayed in the post list?

April 10, 2013

by Paul Underwood

· 13,697 Views

Configuring Apache SolrCloud on Amazon VPC

We are going to construct an Apache SolrCloud (4.1) with 12 node EC2 instance(s) inside Amazon VPC in this post. Since the search data stored inside the SolrCloud is critical, we are going to build High availability at Solr Node level as well as AZ level. This setup will be done inside private subnet of Amazon VPC and will leverage 3 Availability Zones of the Amazon EC2 Region. Deployment architecture of the setup is given below: A small brief about setup: 3 Zookeepers will be deployed on 3 Availability Zones. ZK EC2 instances will be deployed on the Private subnet of the Amazon VPC. 3 Solr Shard EC2 instances will be deployed on Private subnet of Availability Zone 1 inside Amazon VPC. 3 Solr Replica EC2 instances will be deployed on Private subnet of Availability Zone 2 inside Amazon VPC. 3 Solr Replica EC2 instances will be deployed on Private subnet of Availability Zone 3 inside Amazon VPC. EBS optimized + PIOPS EC2 instances can be used for Solr EC2 Nodes To know more about SolrCloud Deployment best practices on Amazon VPC, Refer article: http://harish11g.blogspot.in/2013/03/Apache-Solr-cloud-on-Amazon-EC2-AWS-VPC-implementation-deployment.html Step 1: Creating Virtual Private Cloud on AWS Create a VPC with Public and Private Subnets. Assume the Load balancer and Web/App Servers can reside on the public subnet and Apache Solr Cloud will reside on the private subnet of the VPC. Step 2: Assigning the IP for the Subnets Create the subnet with its IP range. Chose the Availability zone for this subnet. Step 3: Multiple Subnets on Multiple AZ’s Create multiple subnets in Multiple AZ for building a Highly available setup for SolCloud Step 4: Install Java for Zookeeper & Solr Amazon Linux is chosen as the EC2 OS variant. Execute the following instructions on the respective EC2 nodes after their launch. EC2 instances should be launched in Multi-AZ in Multiple VPC Private Subnets. Solr uses Zookeeper as the cluster configuration and coordinator. Zookeeper is a distributed file system containing information about all the Solr Nodes. Solrconfig.xml, Schema.xml etc are stored in the repository.We have used Oracle-Sun Java over OpenJDK “sudo -s” “cd /opt” “wget --no-cookies --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2Ftechnetwork%2Fjava%2Fjavase%2Fdownloads%2Fjdk-7u3-download-1501626.html;" http://download.oracle.com/otn-pub/java/jdk/7u13-b20/jdk-7u13-linux-x64.rpm” “mv jdk-7u10-linux-x64.rpm?AuthParam=1357217677_76ec3d8d9a3644f4b9ec1ea79e1fcf33 jdk-7u10-linux-x64.rpm jdk-7u10-linux-x64.rpm” “sudo rpm -ivh jdk-7u10-linux-x64.rpm” “alternatives --install /usr/bin/java java /usr/java/jdk1.7.0_10/jre/bin/java 20000” “alternatives --install /usr/bin/javaws javaws /usr/java/jdk1.7.0_10/jre/bin/javaws 20000” “alternatives --install /usr/bin/javac javac /usr/java/jdk1.7.0_10/bin/javac 20000” “alternatives --install /usr/bin/jar jar /usr/java/jdk1.7.0_10/bin/jar 20000” “alternatives --install /usr/bin/java java /usr/java/jre1.7.0_10/bin/java 20000” “alternatives --install /usr/bin/javaws javaws /usr/java/jre1.7.0_10/bin/javaws 20000” “alternatives --configure java” Add JAVA_HOME in .bash_profile: “vim ~/.bash_profile” export JAVA_HOME="/usr/java/jdk1.7.0_09" export PATH=$PATH:$JAVA_HOME/bin Restart the instance. “init 6” Check the version of Java installed using “java -version” command Step 5: Configure the ZooKeeper (v3.4.5) Ensemble: Since single Zookeeper is not ideal for a large Solr cluster (because of SPOF), it is recommended to configure multiple Zookeepers in concert as an ensemble .In this step we will install and configure 3 ZooKeeper EC2 nodes spanning across 3 different Availability Zones in respective Private Subnets inside a VPC.Zookeeper will be configured on Amazon Linux. “sudo yum update” “sudo -s” “ cd /opt” “wget http://apache.techartifact.com/mirror/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz” “tar -xzvf zookeeper-3.4.5.tar.gz” “rm zookeeper-3.4.5.tar.gz” “cd zookeeper-3.4.5” “cp conf/zoo_sample.cfg conf/zoo.cfg” Add the following lines in zoo.cfg “vim conf/zoo.cfg” dataDir=/data server.1=[zk-server01-ip]:2888:3888 server.2=[zk-server02-ip]:2888:3888 server.3=[zk-server03-ip]:2888:3888 “cd /opt/zookeeper/data” “vim myid” 1 or 2 or 3 respectively on each ZooKeeper EC2 instances in Multi-AZ #Starting ZooKeeper Program. “bin/zkServer.sh start” Follow the above steps in all the ZooKeeper servers. ReferClustered (Multi-Server) SetupandConfiguration Parameters for understandingquorum_port,leader_election_port and the filemyid. Every ZooKeeper node needs to know about every other ZK EC2 node in the ensemble, and a majority of EC2’s (called a Quorum) are needed to provide the service. Make sure the VPC IP of all the Zookeepers are given in every ZK node, like the one in following command. server.1=:: server.2=:: server.3=:: Step 6: Configuring Solr 4.1 EC2 node In this step we will install and configure 3 Apache Solr4.1 Shard EC2 instances in a single Amazon AZ and 2 Solr Replicas in another AZ in their respective Private subnets. Please note that we have to specify all the ZooKeeper (ZK) hosts on every Solr instance as below. Note: Solr gets comes with jetty in default, it is suggested to use tomcat for production nodes. Perform the following after launching EC2 instances in Multi-AZ in Multiple VPC Private Subnets. “sudo -s” “yum update” “cd /opt” “wget http://apache.techartifact.com/mirror/lucene/solr/4.1.0/apache-solr-4.1.0.tgz” “tar -xzvf apache-solr-4.1.0.tgz” “rm -f apache-solr-4.1.0.tgz” On Solr Shard/Replica Instances: “cd /opt/apache-solr-4.0.0/example/” “vim /opt/apache-solr-4.0.0/example/solr/collection1/conf/solrconfig.xml” Change /var/data/solr to /data Starting Solr4.1 Shard/Replica Java Program. “java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=SolrCloud4.1-Conf -DnumShards=3 -DzkHost=[zk-server01-ip]:2181,[zk-server02-ip]:2181,[zk-server03-ip]:2181 -jar start.jar “java -DzkHost= DzkHost=:,:,: -jar start.jar” -DnumShards: the number of shards that will be present. Note that once set, this number cannot be increased or decreased without re-indexing the entire data set. (Dynamically changing the number of shards is part of the Solr roadmap!) -DzkHost: a comma-separated list of ZooKeeper servers. -Dbootstrap_confdir, -Dcollection.configName: these parameters are specified only when starting up the first Solr instance. This will enable the transfer of configuration files to ZooKeeper. Subsequent Solr instances need to just point to the ZooKeeper ensemble. The above command with –DnumShards=3 specifies that it is a 3-shard cluster. The first Solr EC2 node automatically becomes shard1 and the second Solr EC2 node automatically becomes shard2 …. What happens when we launch fourth Solr instance in this cluster? Since it’s a 3-shard cluster, the fourth Solr EC2 node automatically becomes a replica of shard1 and the fifth Solr EC2 node becomes a replica of shard2. Step 7: AWS Security Group TCP Ports to be enabled: Configure the following TCP ports on the AWS security group to allow access between Solr and ZK nodes deployed in Multiple AZ. Solr Shards/Replicas will connect to ZK through TCP Port 2181 Solr Web Interface with Jetty container through TCP Port 8983 Solr Web Interface with Tomcat container through TCP Port 8080 Every instance that is part of the ZooKeeper ensemble should know about every other machine in the ensemble. We can accomplish this with the series of lines of the form server.id=host:port:port For example, server.1=[vpc-ip]:2888:3888 server.2=[vpc-ip]:2888:3888 server.3=[vpc-ip]:2888:3888 TCP Ports 2888, 3888 should be opened for ZK Ensemble.

April 5, 2013

by Harish Ganesan

· 7,829 Views

Weekend Project: Send sensor data from Arduino to MongoDB

Arduino is an open-source electronics platform that can acknowledge and interact with its environment through a variety of sensor types. It’s great for hardware prototyping and one-off projects. I just got an Arduino Board from our friends at SendGrid, who also gave me a little tutorial in the art of Arduino hacking. Inspired by the tutorial and armed with this new board, I bought a passive infared (PIR) motion sensor from my local Radio Shack. Now I was ready to play; in particular, I wanted to be able to collect that continuous stream of hardware sensor data into a MongoDB database for logging, trend analysis, system event correlation, etc. To this end, I created the demo project “mongodb-motion”, which I’ve made public on Github. In the “mongodb-motion” Github repo, you will find an Arudino project that writes motion sensor data to a cloud MongoDB database at MongoLab and sends alerts via email based on certain criteria. I built this demo using Node.js and the MongoLab REST API. Below, I’ll go through exactly what hardware you need to make your own “mongodb-motion” project a success, and how the code actually works. What You Need The hardware used in this demo includes: an Arduino UNO R3 and a Parallax PIR motion sensor. How the Code Works You can use a variety of motion sensors with the Arduino. In this particular experiment, I used a PIR motion sensor. The PIR motion sensor behaves like a switch, with ‘down’ events emitted on motion detection and ‘up’ events a few seconds after motion ceases to be detected. On the receiving side, I used JohnnyFive, an appropriately named Node.js package that accepts sensor events and sends messages to the Arduino board. With the two ends set, I’ll move on to the project’s configuration file. In this demo, I’ve included a configuration file, config-sample.js, where credentials for the MongoLab REST API and for the email SMTP server can be added. In my case, I used the SendGrid SMTP service. The configuration file also has two callbacks that determine when an email is emitted, one for each type of event – “detect” and “ceased”. I’ve used this feature to automatically send an email alert if an event timestamp is between 7:00pm and 8:00am, ostensibly when my office should be motionless… I’m out there watching you, office! Once you’ve customized this config-sample.js file, be sure to rename it to config.js in order for it to be usable. If you inspect the project code, you’ll notice that the MongoLab REST API is called in the logMsg() function, using an https.request. Building this little demo has given me some new ideas for hardware hacking the cloud. I hope you give it a try too. Thanks to the Arduino, Node.js and Javascript communities, and special thanks to Rick Waldon for Johnny Five, SendGrid for the UNO board, and a big shout out to @swiftalphaone for the Waza tutorial.

April 3, 2013

by Ben Wen

· 17,833 Views