Cloud Architecture Resources

The Latest Cloud Architecture Topics

If you are using Amazon Web Services(AWS), you are probably aware how to access and use resources like SNS, SQS, S3 using key and secret. With the aws-java-sdk that is straight forward: AmazonSNSClient snsClient = new AmazonSNSClient( new BasicAWSCredentials("your key", "your secret")) One of the difficulties with this approach is storing the key/secret securely especially when there are different set of these for different environments. Using java property files, combined with maven or spring profiles might help a little bit to externalize the key/secret out of your source code, but still doesn't solve the issue of securely accessing these resources. Amazon has another service to help you in this occasion. No, no, this is not one more service to pay for in order to use the previous services. It is a free service, actually it is a feature of the amazon account. AWS Identity and Access Management (IAM) lets you securely control access to AWS services and resources for your users, you can manage users and groups and define permissions for AWS resources. One interesting functionality of IAM is the ability to assign roles to EC2 instances. The idea is you create roles with sets of permissions and you launch an EC2 instance by assigning the role to the instance. And when you deploy an application on that instance, the application doesn't need to have access key and secret in order to access other amazon resource. The application will use the role credentials to sign the requests. This has a number of benefits like a centralized place to control all the instances credentials, reduced risk with auto refreshing credentials and so on. Here is a short video demonstrating how to assign roles to an EC2 instance: Once you have role based security enabled for an instance, to access other resources from that instances you have to create and AwsClient using the chained credential provider: AmazonSNSClient snsClient = new AmazonSNSClient( new DefaultAWSCredentialsProviderChain()) The provider will search your system properties, environment properties and finally call instance metadata API to retrieve the role credentials in chain of responsibility fashion. It will also refresh the credentials in the background periodically depending on its expiration period. And finally, if you want to use role based security from Camel applications running on Amazon, all you have to do is create an instance of the client with configured chained credentials object and don't specify any key or secret: from("direct:start") .to("aws-sns://MyTopic?amazonSNSClient=#snsClient");

March 26, 2013

by Bilgin Ibryam

· 14,433 Views

Where is My Datastore in Hyper-V? Server Virtualization - Part 4

The term 'datastore’ is one that many of you who work with VMware are familiar, but which doesn’t really translate to the world of Microsoft’s Hyper-V. “Since Hyper-V does not require a different formatting of the underlying physical disk structure like VMFS(VMware’s proprietary disk format) we are able to browse the ‘datastore’ with File Explorer(In Windows 8/Server 2012…formerly known as Windows Explorer).” “Who said that?” That quote was from my friend Tommy Patterson, who writes about datastores and how they compare to the file system structures used in Hyper-V in Part 4 of our “20+ Days of Server Virtualization” series. In his article he describes the locations of the various components that define and make up a Hyper-V virtual machine, and even provides a script to help you quickly locate filesystem locations for your virtual machine bits. READ HIS ARTICLE HERE

March 9, 2013

by Kevin Remde

· 9,093 Views

Spring, JMS, Listener Adapters, and Containers

In order to receive JMS messages, Spring provides the concept of message listener containers. These are beans that can be tied to receive messages that arrive at certain destinations. This post will examine the different ways in which containers can be configured. A simple example is below where the DefaultMessageListenerContianer has been configured to watch one queue (the property jms.queue.name) and has a reference to a myMessageListener bean which implements the MessageListener interface (ie onMessage): This is all very well but means that the myMessageListener bean will have to handle the JMS Message object and process accordingly depending upon the type of javax.jms.Message and its payload. For example: if (message instanceof MapMessage) { // cast, get object, do something } An alternative is to use a MessageListenerAdapter. This class abstracts away the above processing and leaves your code to deal with just the message's payload. For example: The delegate is a reference to a myMessageReceiverDelegate bean which has one or more methods called processMessage. It does not need to implement the MessageListener interface. This method can be overload to handle different payload types. Spring behind the scenes will determine which gets called. For example: public void processMessage(final HashMap message) { // do something } public void processMessage(final String message) { // do something } For the given approach though, only one queue can be tied to the container. Another approach is to tie many listeners (therefore many queues) to the one container, The below Spring XML, using the jms namespace, shows how two listeners for different queues can be tied to one container: The myMessageReceiverDelegate bean is treated as an adapter delegate, therefore does not need to implement the MessageListener interface. Each listener can have a different delegate but for the above example, all messages arriving at the two queues are processed by the one receiver bean ie myMessageReceiverDelegate. If there is a need to check the message type and extract the payload, then the listener can use a class which implements the MessageListener interface (eg the myMessageListener bean used in the first example). The onMessage method will then be called when messages arrive at the specified destination:

February 28, 2013

by Geraint Jones

· 72,598 Views · 2 Likes

CPU Cache Flushing Fallacy

Even from highly experienced technologists I often hear talk about how certain operations cause a CPU cache to "flush". This seems to be illustrating a very common fallacy about how CPU caches work, and how the cache sub-system interacts with the execution cores. In this article I will attempt to explain the function CPU caches fulfil, and how the cores, which execute our programs of instructions, interact with them. For a concrete example I will dive into one of the latest Intel x86 server CPUs. Other CPUs use similar techniques to achieve the same ends. Most modern systems that execute our programs are shared-memory multi-processor systems in design. A shared-memory system has a single memory resource that is accessed by 2 or more independent CPU cores. Latency to main memory is highly variable from 10s to 100s of nanoseconds. Within 100ns it is possible for a 3.0GHz CPU to process up to 1200 instructions. Each Sandy Bridge core is capable of retiring up to 4 instructions-per-cycle (IPC) in parallel. CPUs employ cache sub-systems to hide this latency and allow them to exercise their huge capacity to process instructions. Some of these caches are small, very fast, and local to each core; others are slower, larger, and shared across cores. Together with registers and main-memory, these caches make up our non-persistent memory hierarchy. Next time you are developing an important algorithm, try pondering that a cache-miss is a lost opportunity to have executed ~500 CPU instructions! This is for a single-socket system, on a multi-socket system you can effectively double the lost opportunity as memory requests cross socket interconnects. Memory Hierarchy Figure 1. For the circa 2012 Sandy Bridge E class servers our memory hierarchy can be decomposed as follows: Registers: Within each core are separate register files containing 160 entries for integers and 144 floating point numbers. These registers are accessible within a single cycle and constitute the fastest memory available to our execution cores. Compilers will allocate our local variables and function arguments to these registers. When hyperthreading is enabled these registers are shared between the co-located hyperthreads. Memory Ordering Buffers (MOB): The MOB is comprised of a 64-entry load and 36-entry store buffer. These buffers are used to track in-flight operations while waiting on the cache sub-system. The store buffer is a fully associative queue that can be searched for existing store operations, which have been queued when waiting on the L1 cache. These buffers enable our fast processors to run asynchronously while data is transferred to and from the cache sub-system. When the processor issues asynchronous reads and writes then the results can come back out-of-order. The MOB is used to disambiguate the ordering for compliance to the published memory model. Level 1 Cache: The L1 is a core-local cache split into separate 32K data and 32K instruction caches. Access time is 3 cycles and can be hidden as instructions are pipelined by the core for data already in the L1 cache. Level 2 Cache: The L2 cache is a core-local cache designed to buffer access between the L1 and the shared L3 cache. The L2 cache is 256K in size and acts as an effective queue of memory accesses between the L1 and L3. L2 contains both data and instructions. L2 access latency is 12 cycles. Level 3 Cache: The L3 cache is shared across all cores within a socket. The L3 is split into 2MB segments each connected to a ring-bus network on the socket. Each core is also connected to this ring-bus. Addresses are hashed to segments for greater throughput. Latency can be up to 38 cycles depending on cache size. Cache size can be up to 20MB depending on the number of segments, with each additional hop around the ring taking an additional cycle. The L3 cache is inclusive of all data in the L1 and L2 for each core on the same socket. This inclusiveness, at the cost of space, allows the L3 cache to intercept requests thus removing the burden from private core-local L1 & L2 caches. Main Memory: DRAM channels are connected to each socket with an average latency of ~65ns for socket local access on a full cache-miss. This is however extremely variable, being much less for subsequent accesses to columns in the same row buffer, through to significantly more when queuing effects and memory refresh cycles conflict. 4 memory channels are aggregated together on each socket for throughput, and to hide latency via pipelining on the independent memory channels. NUMA: In a multi-socket server we have non-uniform memory access. It is non-uniform because the required memory maybe on a remote socket having an additional 40ns hop across the QPI bus. Sandy Bridge is a major step forward for 2-socket systems over Westmere and Nehalem. With Sandy Bridge the QPI limit has been raised from 6.4GT/s to 8.0GT/s, and two lanes can be aggregated thus eliminating the bottleneck of the previous systems. For Nehalem and Westmere the QPI link is only capable of ~40% the bandwidth that could be delivered by the memory controller for an individual socket. This limitation made accessing remote memory a choke point. In addition, the QPI link can now forward pre-fetch requests which previous generations could not. Associativity Levels Caches are effectively hardware based hash tables. The hash function is usually a simple masking of some low-order bits for cache indexing. Hash tables need some means to handle a collision for the same slot. The associativity level is the number of slots, also known as ways or sets, which can be used to hold a hashed version of an address. Having more levels of associativity is a trade off between storing more data vs. power requirements and time to search each of the ways. For Sandy Bridge the L1 and L2 are 8-way and the L3 is 12-way associative. Cache Coherence With some caches being local to cores, we need a means of keeping them coherent so all cores can have a consistent view of memory. The cache sub-system is considered the "source of truth" for mainstream systems. If memory is fetched from the cache it is never stale; the cache is the master copy when data exists in both the cache and main-memory. This style of memory management is known as write-back whereby data in the cache is only written back to main-memory when the cache-line is evicted because a new line is taking its place. An x86 cache works on blocks of data that are 64-bytes in size, known as a cache-line. Other processors can use a different size for the cache-line. A larger cache-line size reduces effective latency at the expense of increased bandwidth requirements. To keep the caches coherent the cache controller tracks the state of each cache-line as being in one of a finite number of states. The protocol Intel employs for this is MESIF, AMD employs a variant know as MOESI. Under the MESIF protocol each cache-line can be in 1 of the 5 following states: Modified: Indicates the cache-line is dirty and must be written back to memory at a later stage. When written back to main-memory the state transitions to Exclusive. Exclusive: Indicates the cache-line is held exclusively and that it matches main-memory. When written to, the state then transitions to Modified. To achieve this state a Request-For-Ownership (RFO) message is sent which involves a read plus an invalidate broadcast to all other copies. Shared: Indicates a clean copy of a cache-line that matches main-memory. Invalid: Indicates an unused cache-line. Forward: Indicates a specialised version of the shared state i.e. this is the designated cache which should respond to other caches in a NUMA system. To transition from one state to another, a series of messages are sent between the caches to effect state changes. Previous to Nehalem for Intel, and Opteron for AMD, this cache coherence traffic between sockets had to share the memory bus which greatly limited scalability. These days the memory controller traffic is on a separate bus. The Intel QPI, and AMD HyperTransport, buses are used for cache coherence between sockets. The cache controller exists as a module within each L3 cache segment that is connected to the on-socket ring-bus network. Each core, L3 cache segment, QPI controller, memory controller, and integrated graphics sub-system are connected to this ring-bus. The ring is made up of 4 independent lanes for: request, snoop, acknowledge, and 32-bytes data per cycle. The L3 cache is inclusive in that any cache-line held in the L1 or L2 caches is also held in the L3. This provides for rapid identification of the core containing a modified line when snooping for changes. The cache controller for the L3 segment keeps track of which core could have a modified version of a cache-line it owns. If a core wants to read some memory, and it does not have it in a Shared, Exclusive, or Modified state; then it must make a read on the ring bus. It will then either be read from main-memory if not in the cache sub-systems, or read from L3 if clean, or snooped from another core if Modified. In any case the read will never return a stale copy from the cache sub-system, it is guaranteed to be coherent. Concurrent Programming If our caches are always coherent then why do we worry about visibility when writing concurrent programs? This is because within our cores, in their quest for ever greater performance, data modifications can appear out-of-order to other threads. There are 2 major reasons for this. Firstly, our compilers can generate programs that store variables in registers for relatively long periods of time for performance reasons, e.g. variables used repeatedly within a loop. If we need these variables to be visible across cores then the updates must not be register allocated. This is achieved in C by qualifying a variable as "volatile". Beware that C/C++ volatile is inadequate for telling the compiler to order other instructions. For this you need fences/barriers. The second major issue with ordering we have to be aware of is a thread could write a variable and then, if it reads it shortly after, could see the value in its store buffer which may be older than the latest value in the cache sub-system. This is never an issue for algorithms following the Single Writer Principle but is an issue for the likes of the Dekker and Peterson lock algorithms. To overcome this issue, and ensure the latest value is observed, the thread must wait for the store buffer to drain on that core. This can be achieved by issuing a fence instruction. The write of a volatile variable in Java, in addition to never being register allocated, is accompanied by a full fence instruction. This fence instruction on x86 has a significant performance impact by preventing progress on the issuing thread until the store buffer is drained. Fences on other processors can have more efficient implementations that simply put a marker in the store buffer for the search boundary, e.g. the Azul Vega does this. If you want to ensure memory ordering across Java threads when following the Single Writer Principle, and avoid the store fence, it is possible by using the j.u.c.Atomic(Int|Long|Reference).lazySet() method, as opposed to setting a volatile variable. The Fallacy Returning to the fallacy of "flushing the cache" as part of a concurrent algorithm. I think we can safely say that we never "flush" the CPU cache within our user space programs. I believe the source of this fallacy is the need to flush, mark or drain to a point, the store buffer for some classes of concurrent algorithms so the latest value can be observed on a subsequent load operation. For this we require a memory ordering fence and not a cache flush. Another possible source of this fallacy is that L1 caches, or the TLB, may need to be flushed based on address indexing policy on a context switch. ARM, previous to ARMv6, did not use address space tags on TLB entries thus requiring the whole L1 cache to be flushed on a context switch. Many processors require the L1 instruction cache to be flushed for similar reasons, in many cases this is simply because instruction caches are not required to be kept coherent. The bottom line is, context switching is expensive and a bit off topic, so in addition to the cache pollution of the L2, a context switch can also cause the TLB and/or L1 caches to require a flush. Intel x86 processors require only a TLB flush on context switch.

February 15, 2013

by Martin Thompson

· 11,555 Views · 3 Likes

Local WebHooks with Mule Cloud Connect and LocalTunnel v2

When using an external API for WebHooks or Callbacks as discussed in Chapters 3 and 5 of Getting Started with Mule Cloud Connect; The API provider running somewhere out there on the web needs to callback your application that is happily running in isolation on your local machine. For an API provider to callback your application, the application must be accessible over the web. Sure, you could upload and test your application on a public facing server, but you may find it quicker and easier to work on your local development machine and these are typically behind firewalls, NAT, or otherwise not able to provide a public URL. You need a way to make your local application available over the web. There are a few good services and tools out there to help with this. Examples include ProxyLocal, and Forward.io. Alternatively, you can set up your own reverse SSH Tunnel if you already have a remote system to forward your requests, but this is cumbersome to say the least. I find Localtunnel to be an excellent fit for this need and localtunnel have just recently released v2 of its service with a host of new features and enhancements. More information can be found here: http://progrium.com/blog/2012/12/25/localtunnel-v2-available-in-beta/ Installing Localtunnel Those familiar with version 1 of the service will know that the v1 Localtunnel client was written in Ruby and required Rubygems to install it. The v2 client is now written in Python and can instead be installed via easy_install or pip. If instead you're interested in using Localtunnel v1, then I have wrote a previous blog post on the subject here: http://blogs.mulesoft.org/connector-callback-testing-local/ To get started, you will first need to check that you have Python installed. Localtunnel requires Python 2.6 or later. Most systems come with Python installed as standard, but if not you can check via the following command: $ python -version More info on installing Python can be found here: http://wiki.python.org/moin/BeginnersGuide/Download Once complete, you will need easy_install to install the Localtunnel client.If you don't have easy_install after you install Python, you can install it with this bootstrap script: $ curl http://peak.telecommunity.com/dist/ez_setup.py | python Once complete, you can install the Localtunnel client using the following command: $ easy_install localtunnel First run with LocalTunnel Once installed, creating a tunnel is as simple as running the following command: $ localtunnel-beta 8082 The parameter after the command: "8000" is the local port we want Localtunnel to forward to. So whatever port your app is running on should replace this value. Each time you run the command you should get output similar to the following: Port 8082 is now accessible from http://fb0322605126.v2.localtunnel.com ... Note: As v2 is still in beta; the command local-tunnel-beta will eventually be installed as just localtunnel. This lets you keep the v1 just in case anything goes wrong with v2 during the beta. Configuring the Connector Now onto Mule! To demonstrate I will use the Twilio Cloud Connector example from Chapter 5. Twilio has an awesome WebHook implementation with great debugging tools. Twilio uses callbacks to tell you about the status of your requests; When you use Twilio to a place a phone call or send an SMS the Twilio API allows you to send a URL where you'll receive information about the phone call once it ends or the status of the outbound SMS message after it's processed. This example uses the Twilio Cloud Connector to send a simple SMS message. The most important thing to note is that the "status-callback-flow-ref" attribute. All connector operations that support callback's will have an optional attribute ending in "-flow-ref". In this case : "status-callback-flow-ref". As the name suggests, this attribute should reference a flow. This value must be a valid flow id from within your configuration. It is this flow that will be used to listen for the callback. Notice that the flow has no inbound endpoint? This is where the magic happens; when Twilio process the SMS message it will send a callback automatically to that flow without you having to define an inbound endpoint. The connector automatically generates an inbound endpoint and sends the auto generated URL to Twilio for you. Customizing the Callback The URL generated for the callback URL is built using 'localhost' as the host, the 'http.port' environment variable or 'localPort' value as the port and the path of the URL is typically just a random generated string or static value. So if I run this locally it would send Twilio my non public address, something like: http://localhost:80/...vv3v3er342fvvn. Each connector that accepts HTTP callbacks will provide you with an optional http-callback-config child element to override these settings. These settings can be set at the connector's config level as follows: Here we have amended the previous example to add the additonal http-callback-config configuration. The configuration takes three additional arguments: domain, localPort and remotePort. These settings will be used to constuct the URL that is passed to the external system. The URL will be the same as the default generated URL of the HTTP inbound-endpoint except that the host is replaced by the 'domain' setting (or its default value) and the port is replaced by the 'remotePort' setting (or its default value). In this case we have used the domain from the URL that Localtunnel generated for us earlier: fb0322605126.v2.localtunnel.com and set the localPort to 8082 as we run the Localtunnel command using port 8082 and the remotePort to 80 as the localtunnel server just runs on port 80. And that's it! If you run this configuration you should start seeing your callback being printed to the console. The same goes for any OAuth connectors too. If your using any OAuth connectors built using the DevKit OAuth modules, you can configure the OAuth callback in a similar fashion. A full Mule/Twilio WebHook project can be found here: https://github.com/ryandcarter/GettingStarted-MuleCloudConnect-OReilly/tree/master/chapter05/twilio-webhooks

February 5, 2013

by Ryan Carter

· 4,977 Views

Testing MapReduce with MRUnit

Testing and debugging multi threaded programs is hard. Now take the same programs and massively distribute them across multiple JVMs deployed on a cluster of machines and the complexity goes off the roof. One way to overcome this complexity is to do testing in isolation and catch as many bugs as possible locally. MRUnit is a testing framework that lets you test and debug Map Reduce jobs in isolation without spinning up a Hadoop cluster. In this blog post we will cover various features of MRUnit by walking through a simple MapReduce job. Lets say we want to take the input below and create an inverted index using MapReduce. Input www.kohls.com,clothes,shoes,beauty,toys www.amazon.com,books,music,toys,ebooks,movies,computers www.ebay.com,auctions,cars,computers,books,antiques www.macys.com,shoes,clothes,toys,jeans,sweaters www.kroger.com,groceries Expected output antiques www.ebay.com auctions www.ebay.com beauty www.kohls.com books www.ebay.com,www.amazon.com cars www.ebay.com clothes www.kohls.com,www.macys.com computers www.amazon.com,www.ebay.com ebooks www.amazon.com jeans www.macys.com movies www.amazon.com music www.amazon.com shoes www.kohls.com,www.macys.com sweaters www.macys.com toys www.macys.com,www.amazon.com,www.kohls.com groceries www.kroger.com below are the Mapper and Reducer that do the transformation public class InvertedIndexMapper extends MapReduceBase implements Mapper { public static final int RETAIlER_INDEX = 0; @Override public void map(LongWritable longWritable, Text text, OutputCollector outputCollector, Reporter reporter) throws IOException { final String[] record = StringUtils.split(text.toString(), ","); final String retailer = record[RETAIlER_INDEX]; for (int i = 1; i < record.length; i++) { final String keyword = record[i]; outputCollector.collect(new Text(keyword), new Text(retailer)); } } } public class InvertedIndexReducer extends MapReduceBase implements Reducer { @Override public void reduce(Text text, Iterator textIterator, OutputCollector outputCollector, Reporter reporter) throws IOException { final String retailers = StringUtils.join(textIterator, ','); outputCollector.collect(text, new Text(retailers)); } } Implementation details are not really important but basically Mapper gets a line at a time, splits the line and emits key value pairs where Key is a category of product and value is the website which is selling the product. For example line retailer,category1,category2 will be emitted as (category1,retailer) and (category2,retailer). Reducer gets a key and a list of values, transforms the list of values to a comma delimited String and emits the key and value out. Now lets use MRUnit to write various tests for this Job. Three key classes in MRUnits are MapDriver for Mapper Testing, ReduceDriver for Reducer Testing and MapReduceDriver for end to end MapReduce Job testing. This is how we will setup the Test Class. public class InvertedIndexJobTest { private MapDriver mapDriver; private ReduceDriver reduceDriver; private MapReduceDriver mapReduceDriver; @Before public void setUp() throws Exception { final InvertedIndexMapper mapper = new InvertedIndexMapper(); final InvertedIndexReducer reducer = new InvertedIndexReducer(); mapDriver = MapDriver.newMapDriver(mapper); reduceDriver = ReduceDriver.newReduceDriver(reducer); mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer); } } MRUnit supports two style of testings. First style is to tell the framework both input and output values and let the framework do the assertions, second is the more traditional approach where you do the assertion yourself. Lets write a test using the first approach. @Test public void testMapperWithSingleKeyAndValue() throws Exception { final LongWritable inputKey = new LongWritable(0); final Text inputValue = new Text("www.kroger.com,groceries"); final Text outputKey = new Text("groceries"); final Text outputValue = new Text("www.kroger.com"); mapDriver.withInput(inputKey, inputValue); mapDriver.withOutput(outputKey, outputValue); mapDriver.runTest(); } In the test above we tell the framework both input and output Key and Value pairs and the framework does the assertion for us. This test can be written in a more traditional way as follow @Test public void testMapperWithSingleKeyAndValueWithAssertion() throws Exception { final LongWritable inputKey = new LongWritable(0); final Text inputValue = new Text("www.kroger.com,groceries"); final Text outputKey = new Text("groceries"); final Text outputValue = new Text("www.kroger.com"); mapDriver.withInput(inputKey, inputValue); final List> result = mapDriver.run(); assertThat(result) .isNotNull() .hasSize(1) .containsExactly(new Pair(outputKey, outputValue)); } Sometimes Mapper emits multiple Key Value pairs for a single input. MRUnit provides a fluent API to support this use case. Here is an example @Test public void testMapperWithSingleInputAndMultipleOutput() throws Exception { final LongWritable key = new LongWritable(0); mapDriver.withInput(key, new Text("www.amazon.com,books,music,toys,ebooks,movies,computers")); final List> result = mapDriver.run(); final Pair books = new Pair(new Text("books"), new Text("www.amazon.com")); final Pair toys = new Pair(new Text("toys"), new Text("www.amazon.com")); assertThat(result) .isNotNull() .hasSize(6) .contains(books, toys); } You write the test for the reduce exactly the same way. @Test public void testReducer() throws Exception { final Text inputKey = new Text("books"); final ImmutableList inputValue = ImmutableList.of(new Text("www.amazon.com"), new Text("www.ebay.com")); reduceDriver.withInput(inputKey,inputValue); final List> result = reduceDriver.run(); final Pair pair2 = new Pair(inputKey, new Text("www.amazon.com,www.ebay.com")); assertThat(result) .isNotNull() .hasSize(1) .containsExactly(pair2); } Finally you can use MapReduceDriver to test your Mapper, Combiner and Reducer together as a single job. You can also pass multiple key value pairs as input to your job. Test below demonstrate MapReduceDriver in action @Test public void testMapReduce() throws Exception { mapReduceDriver.withInput(new LongWritable(0), new Text("www.kohls.com,clothes,shoes,beauty,toys")); mapReduceDriver.withInput(new LongWritable(1), new Text("www.macys.com,shoes,clothes,toys,jeans,sweaters")); final List> result = mapReduceDriver.run(); final Pair clothes = new Pair(new Text("clothes"), new Text("www.kohls.com,www.macys.com")); final Pair jeans = new Pair(new Text("jeans"), new Text("www.macys.com")); assertThat(result) .isNotNull() .hasSize(6) .contains(clothes, jeans); }

February 5, 2013

by Mansur Ashraf

· 13,928 Views · 1 Like

Tutorial: Deploying an API on EC2 from AWS

Curator's Note: This article was co-authored by Andrzej Jarzyna. At 3scale we find Amazon to be a fantastic platform for running APIs due to the complete control you have on the application stack. For people new to AWS the learning curve is quite steep. So we put together our best practices into this short tutorial. Besides Amazon EC2 we will use the Ruby Grape gem to create the API interface and an Nginx proxy to handle access control. Best of all everything in this tutorial is completely FREE! For the purpose of this tutorial you will need a running API based on Ruby and Thin server. If you don’t have one you can simply clone an example repo as described below (in the “Deploying the Application” section). If you are interested in the background of this example (Sentiment API), you can see a couple of previous guides which 3scale has published. Here we use version_1 of the API(‘API up and running in 10 minutes‘) with some extra sentiment analysis functionality (this part is covered in the second tutorial of the Sentiment API tutorial). Now we will start the creation and configuration of the Amazon EC2 instance. If you already have an EC2 instance (micro or not), you can jump to the next step -> Preparing Instance for Deployment. Creating and configuring EC2 Instance Let’s start by signing up for the Amazon Elastic Compute Cloud (Amazon EC2). For our needs the free tier http://aws.amazon.com/free/ is enough, covering all the basic needs. Once the account is created go to the EC2 dashboard under your AWS Management Console and click on the Launch Instance button. That will transfer you to a popup window where you will continue the process: Choose the classic wizard Choose an AMI (Ubuntu Server 12.04.1 LTS 32bit, T1micro instance) leaving all the other settings for Instance Details as default Create a keypair and download it – this will be the key which you will use to make an ssh connection to the server, it’s VERY IMPORTANT! Add inbound rules for the firewall with source always 0.0.0.0/0 (HTTP, HTTPS, ALL ICMP, TCP port 3000 used by the Ruby thin server) Preparing Instance for Deployment Now, as we have the instance created and running, we can directly connect there from our console (Windows users from PuTTY). Right click on your instance, connect and choose Connect with a standalone SSH Client. Follow the steps and change the username to ubuntu (instead of root) in the given example. After executing this step you are connected to your instance. We will have to install new packages. Some of them require root credentials, so you will have to set a new root password: sudo passwd root. Then login as root: su root. Now with root credentials execute: sudo apt-get update and switch back to your normal user with exit command and install all the required packages: install some libraries which will be required by rvm, ruby and git: sudo apt-get install build-essential git zlib1g-dev libssl-dev libreadline-gplv2-dev imagemagick libxml2-dev libxslt1-dev openssl libreadline6 libreadline6-dev zlib1g libyaml-dev libxslt-dev autoconf libc6-dev ncurses-dev automake libtool bison libpq-dev libpq5 libeditline-dev install git (on Linux rather than from Source): http://www.git-scm.com/book/en/Getting-Started-Installing-Git install rvm: https://rvm.io/rvm/install/ install ruby rvm install 1.9.3 rvm use 1.9.3 --default Deploying the Application Our sample Sentiment API is located on Github. Try cloning the repository: git clone [email protected]:jerzyn/api-demo.git you can once again review the code and tutorial on creating and deploying this app here: http://www.3scale.net/2012/06/the-10-minute-api-up-running-3scale-grape-heroku-api-10-minutes/ and here http://www.3scale.net/2012/07/how-to-out-of-the-box-api-analytics/ note the changes (we are using only v1, as authentication will go through the proxy). Now you can deploy the app by issuing: bundle install. Now you can start the thin server: thin start. To access the API directly (i.e. without any security or access control) access: your-public-dns:3000/v1/words/awesome.json (you can find your-public-dns in the AWS EC2 Dashboard->Instances in the details window of your instance) For the Nginx integration you will have to create an elastic IP address. Inside the AWS EC2 dashboard create an elastic IP in the same region as your instance and associate that IP to it (you won’t have to pay anything for the elastic IP as long as it is associated with your instance in the same region). OPTIONAL: If you want to assign a custom domain to your amazon instance you will have to do one thing: add an A record to the DNS record of your domain mapping the domain to the elastic IP address you have previously created. Your domain provider should either give you some way to set the A record (the IPv4 address), or it will give you a way to edit the nameservers of your domain. If they do not allow you to set the A record directly, find a DNS management service, register your domain as a zone there and the service will give you the nameservers to enter in the admin panel of your domain provider. You can then add the A record for the domain. Some possible DNS management services include ZoneEdit (basic, free), Amazon route 53, etc. At this point you API is open to the world. This is good and bad – great that you are sharing, but bad in the sense that without rate limits a few apps could kill the resources of your server, and you have no insight into who is using your API and how it is being used. The solution is to add some management for your API… Enabling API Management with 3scale Rather than reinvent the wheel and implement rate limits, access controls and analytics from scratch we will leverage the handy 3scale API Management service. Get your free 3scale account, activate and log-in to the new instance through the provided links. The first time you log-in you can choose the option for some sample data to be created, so you will have some API keys to use later. Next you would probably like to go through the tour to get a glimpse on the system functionality (optional) and then start with the implementation. To get some instant results we will start with the sandbox proxy which can be used while in development. Then we will also configure an Nginx proxy which can scale up for full production deployments. There is some documentation on the configuration of the API proxy at 3scale: https://support.3scale.net/howtos/api-configuration/nginx-proxy and for more advanced configuration options here: https://support.3scale.net/howtos/api-configuration/nginx-proxy-advanced Once you sign into your 3scale account, Launch your API on the main Dashboard screen or Go to API->Select the service (API)->Integration in the sidebar->Proxy Set the address of of your API backend – this has to be the Elastic IP address unless the custom domain has been set, including http protocol and port 3000. Now you can save and turn on the sandbox proxy to test your API by hitting the sandbox endpoint (after creating some app credentials in 3scale): http://sandbox-endpoint/v1/words/awesome.json?app_id=APP_ID&app_key=APP_KEY where, APP_ID and APP_KEY are id and key of one of the sample applications which you created when you first logged into your 3scale account (if you missed that step just create a developer account and an application within that account). Try it without app credentials, next with incorrect credentials, and then once authenticated within and over any rate limits that you have defined. Only once it is working to your satisfaction do you need to download the config files for Nginx. Note: any time you have errors check whether you can access the API directly: your-public-dns:3000/v1/words/awesome.json. If that is not available, then you need to check if the AWS instance is running and if the Thin Server is running on the instance. Implement an Nginx Proxy for Access Control In order to streamline this step we recommend that you install the fantastic OpenResty web application that is basically a bundle of the standard Nginx core with almost all the necessary 3rd party Nginx modules built-in. Install dependencies: sudo apt-get install libreadline-dev libncurses5-dev libpcre3-dev perl Compile and install Nginx: cd ~ sudo wget http://agentzh.org/misc/nginx/ngx_openresty-1.2.3.8.tar.gz sudo tar -zxvf ngx_openresty-1.2.3.8.tar.gz cd ngx_openresty-1.2.3.8/ ./configure --prefix=/opt/openresty --with-luajit --with-http_iconv_module -j2 make sudo make install In the config file make the following changes: edit the .conf file from nginx download in line 28, which is preceded by info to change your server name put the correct domain (of your Elastic IP or custom domain name) in line 78 change the path to the .lua file, downloaded together with the .conf file. We are almost finished! Our last step is to start the NGINX proxy and put some traffic through it. If it is not running yet (remember, that thin server has to be started first), please go to your EC2 instance terminal (the one you were connecting through ssh before) and start it now: sudo /opt/openresty/nginx/sbin/nginx -p /opt/openresty/nginx/ -c /opt/openresty/nginx/conf/YOUR-CONFIG-FILE.conf The last step will be verifying that the traffic goes through with a proper authorization. To do that, access: http://your-public-dns/v1/words/awesome.json?app_id=APP_ID&app_key=APP_KEY where, APP_ID and APP_KEY are key and id of the application you want to access through the API call. Once everything is confirmed as working correctly, you will want to block public access to the API backend on port 3000, which bypasses any access controls. If encounter some problems with the Nginx configuration or need a more detailed guide, I encourage you to check the 3scale guide on configuring Nginx proxy: https://support.3scale.net/howtos/api-configuration/nginx-proxy. You can go completely wild with customization of your API gateway. If you want to dive more into the 3scale system configuration (like usage and monitoring of your API traffic) feel encouraged to browse our Quickstart guides and HowTo’s.

February 4, 2013

by Steven Willmott

· 17,865 Views

Sorting Text Files with MapReduce

in my last post i wrote about sorting files in linux. decently large files (in the tens of gb’s) can be sorted fairly quickly using that approach. but what if your files are already in hdfs, or ar hundreds of gb’s in size or larger? in this case it makes sense to use mapreduce and leverage your cluster resources to sort your data in parallel. mapreduce should be thought of as a ubiquitous sorting tool, since by design it sorts all the map output records (using the map output keys), so that all the records that reach a single reducer are sorted. the diagram below shows the internals of how the shuffle phase works in mapreduce. given that mapreduce already performs sorting between the map and reduce phases, then sorting files can be accomplished with an identity function (one where the inputs to the map and reduce phases are emitted directly). this is in fact what the sort example that is bundled with hadoop does. you can look at the how the example code works by examining the org.apache.hadoop.examples.sort class. to use this example code to sort text files in hadoop, you would use it as follows: shell$ export hadoop_home=/usr/lib/hadoop shell$ $hadoop_home/bin/hadoop jar $hadoop_home/hadoop-examples.jar sort \ -informat org.apache.hadoop.mapred.keyvaluetextinputformat \ -outformat org.apache.hadoop.mapred.textoutputformat \ -outkey org.apache.hadoop.io.text \ -outvalue org.apache.hadoop.io.text \ /hdfs/path/to/input \ /hdfs/path/to/output this works well, but it doesn’t offer some of the features that i commonly rely upon in linux’s sort, such as sorting on a specific column, and case-insensitive sorts. linux-esque sorting in mapreduce i’ve started a new github repo called hadoop-utils , where i plan to roll useful helper classes and utilities. the first one is a flexible hadoop sort. the same hadoop example sort can be accomplished with the hadoop-utils sort as follows: shell$ $hadoop_home/bin/hadoop jar hadoop-utils--jar-with-dependencies.jar \ com.alexholmes.hadooputils.sort.sort \ /hdfs/path/to/input \ /hdfs/path/to/output to bring sorting in mapreduce closer to the linux sort, the --key and --field-separator options can be used to specify one or more columns that should be used for sorting, as well as a custom separator (whitespace is the default). for example, imagine you had a file in hdfs called /input/300names.txt which contained first and last names: shell$ hadoop fs -cat 300names.txt | head -n 5 roy franklin mario gardner willis romero max wilkerson latoya larson to sort on the last name you would run: shell$ $hadoop_home/bin/hadoop jar hadoop-utils--jar-with-dependencies.jar \ com.alexholmes.hadooputils.sort.sort \ --key 2 \ /input/300names.txt \ /hdfs/path/to/output the syntax of --key is pos1[,pos2] , where the first position (pos1) is required, and the second position (pos2) is optional - if it’s omitted then pos1 through the rest of the line is used for sorting. just like the linux sort, --key is 1-based, so --key 2 in the above example will sort on the second column in the file. lzop integration another trick that this sort utility has is its tight integration with lzop, a useful compression codec that works well with large files in mapreduce (see chapter 5 of hadoop in practice for more details on lzop). it can work with lzop input files that span multiple splits, and can also lzop-compress outputs, and even create lzop index files. you would do this with the codec and lzop-index options: shell$ $hadoop_home/bin/hadoop jar hadoop-utils--jar-with-dependencies.jar \ com.alexholmes.hadooputils.sort.sort \ --key 2 \ --codec com.hadoop.compression.lzo.lzopcodec \ --map-codec com.hadoop.compression.lzo.lzocodec \ --lzop-index \ /hdfs/path/to/input \ /hdfs/path/to/output multiple reducers and total ordering if your sort job runs with multiple reducers (either because mapreduce.job.reduces in mapred-site.xml has been set to a number larger than 1, or because you’ve used the -r option to specify the number of reducers on the command-line), then by default hadoop will use the hashpartitioner to distribute records across the reducers. use of the hashpartitioner means that you can’t concatenate your output files to create a single sorted output file. to do this you’ll need total ordering , which is supported by both the hadoop example sort and the hadoop-utils sort - the hadoop-utils sort enables this with the --total-order option. shell$ $hadoop_home/bin/hadoop jar hadoop-utils--jar-with-dependencies.jar \ com.alexholmes.hadooputils.sort.sort \ --total-order 0.1 10000 10 \ /hdfs/path/to/input \ /hdfs/path/to/output the syntax is for this option is unintuitive so let’s look at what each field means. more details on total ordering can be seen in chapter 4 of hadoop in practice . more details for details on how to download and run the hadoop-utils sort take a look at the cli guide in the github project page .

January 26, 2013

by Alex Holmes

· 15,480 Views

Assign a Fixed IP to an AWS EC2 Instance

as described in my previous post the ip (and dns) of your running ec2 ami will change after a reboot of that instance. of course this makes it very hard to make your applications on that machine available for the outside world, like in this case our wordpress blog. that is where elastic ip comes to the rescue. with this feature you can assign a static ip to your instance. assign one to your application as follows: click on the elastic ips link in the aws console allocate a new address associate the address with a running instance right click to associate the ip with an instance: pick the instance to assign this ip to: note the ip being assigned to your instance if you go to the ip address you were assigned then you see the home page of your server: and the nicest thing is that if you stop and start your instance you will receive a new public dns but your instance is still assigned to the elastic ip address: one important note: as long as an elastic ip address is associated with a running instance, there is no charge for it. however an address that is not associated with a running instance costs $0.01/hour. this prevents users from ‘reserving’ addresses while they are not being used.

January 20, 2013

by Eric Genesky

· 22,981 Views

Reading Hive Tables from MapReduce

This article is by Stephen Mouring Jr, appearing courtesy of Scott Leberknight. This is part two of a two part blog series on how to read/write Apache Hive data from MapReduce jos. Part one (Writing Hive Tables from MapReduce) is here. So just as sometimes you need to write data to Hive with a custom MapReduce job, sometimes you need to read that data back from Hive with a custom MapReduce job. As covered in part one, Hive is a layer that sits on HDFS and imposes a standard convention on the structure of the files so it can interpret them as columns and rows. Reading data out of Hive is just a matter of parsing the files correctly. Recall that files processed by MapReduce (and by extension, Hive) are output as key value pairs. Hive ignores the keys (read as a BytesWritable with a value of null) and reads/writes the values as Text objects. The value of the Text object for each row is the concatenation of all the column values delimited by the delimiter of the table (which Hive defaults to the "char 1" ASCII character). Seems like a simple problem, so my first thought was to just using String.split() in the map() method of the MapReduce job. String SEPARATOR_FIELD = new String(new char[] {1}); String[] rowColumns = new String (rowTextObject.getBytes()).split(SEPARATOR_FIELD); In theory this should have worked perfectly, but unfortunately I have found that String.split() actually consumes repeated delimiters. This is a problem if any of the values in the row are blank, since split() will shift the positions of your columns and you will be unable to match up what values belong with which columns. An alternative would be to create a String from the Text object and iterate through it using indexOf(). This approach however requires extra object creation and depending on the scale of your MapReduce job and the size of your rows, may slow you down needlessly. So an alternative is to use the Text object's find() method. String SEPARATOR_FIELD = new String(new char[] {1}); String[] rowColumns = new String[NUMBER_OF_COLUMNS_IN_YOUR_HIVE_TABLE]; int start = 0; int end = 0; for (int i = 0; i < rowColumns.length; ++i) { end = rowTextObject.find(SEPARATOR_FIELD, start); if (end == -1) { end = rowString.getLength(); } rowColumns[i] = new String(rowTextObject.getBytes(), start, end-start); start = end + 1; } This will parse out each value into the appropriately index of the rowColumns array. Blank values will also be handled correctly and result in blank strings being inserted into the rowColumns array.

January 11, 2013

by Scott Leberknight

· 6,633 Views · 1 Like

Configuring IIS methods for ASP.NET Web API on Windows Azure Websites

That’s a pretty long title, I agree. When working on my implementation of RFC2324, also known as the HyperText Coffee Pot Control Protocol, I’ve been struggling with something that you will struggle with as well in your ASP.NET Web API’s: supporting additional HTTP methods like HEAD, PATCH or PROPFIND. ASP.NET Web API has no issue with those, but when hosting them on IIS you’ll find yourself in Yellow-screen-of-death heaven. The reason why IIS blocks these methods (or fails to route them to ASP.NET) is because it may happen that your IIS installation has some configuration leftovers from another API: WebDAV. WebDAV allows you to work with a virtual filesystem (and others) using a HTTP API. IIS of course supports this (because flagship product “SharePoint” uses it, probably) and gets in the way of your API. Bottom line of the story: if you need those methods or want to provide your own HTTP methods, here’s the bit of configuration to add to your Web.config file: Here’s what each part does: Under modules, the WebDAVModule is being removed. Just to make sure that it’s not going to get in our way ever again. The security/requestFiltering element I’ve added only applies if you want to define your own HTTP methods. So unless you need the XYZ method I’ve defined here, don’t add it to your config. Under handlers, I’m removing the default handlers that route into ASP.NET. Then, I’m adding them again. The important part? The "verb attribute. You can provide a list of comma-separated methods that you want to route into ASP.NET. Again, I’ve added my XYZ methodbut you probably don’t need it. This will work on any IIS server as well as on Windows Azure Websites. It will make your API… happy.

December 11, 2012

by Maarten Balliauw

· 20,562 Views

Hazelcast Distributed Execution with Spring

The ExecutorService feature had come with Java 5 and is under the java.util.concurrent package. It extends the Executor interface and provides a thread pool functionality to execute asynchronous short tasks. Java Executor Service Types is suggested to look over basic ExecutorService implementation. Also ThreadPoolExecutor is a very useful implementation of ExecutorService ınterface. It extends AbstractExecutorService providing default implementations of ExecutorService execution methods. It provides improved performance when executing large numbers of asynchronous tasks and maintains basic statistics, such as the number of completed tasks. How to develop and monitor Thread Pool Services by using Spring is also suggested to investigate how to develop and monitor Thread Pool Services. So far, we have just talked Undistributed Executor Service implementation. Let us also investigate Distributed Executor Service. Hazelcast Distributed Executor Service feature is a distributed implementation of java.util.concurrent.ExecutorService. It allows to execute business logic in cluster. There are four alternative ways to realize it : 1) The logic can be executed on a specific cluster member which is chosen. 2) The logic can be executed on the member owning the key which is chosen. 3) The logic can be executed on the member Hazelcast will pick. 4) The logic can be executed on all or subset of the cluster members. This article shows how to develop Distributed Executor Service via Hazelcast and Spring. Used Technologies : JDK 1.7.0_09 Spring 3.1.3 Hazelcast 2.4 Maven 3.0.4 STEP 1 : CREATE MAVEN PROJECT A maven project is created as below. (It can be created by using Maven or IDE Plug-in). STEP 2 : LIBRARIES Firstly, Spring dependencies are added to Maven’ s pom.xml 3.1.3.RELEASE UTF-8 org.springframework spring-core ${spring.version} org.springframework spring-context ${spring.version} com.hazelcast hazelcast-all 2.4 log4j log4j 1.2.16 maven-compiler-plugin(Maven Plugin) is used to compile the project with JDK 1.7 org.apache.maven.plugins maven-compiler-plugin 3.0 1.7 1.7 maven-shade-plugin(Maven Plugin) can be used to create runnable-jar org.apache.maven.plugins maven-shade-plugin 2.0 package shade com.onlinetechvision.exe.Application META-INF/spring.handlers META-INF/spring.schemas STEP 3 : CREATE Customer BEAN A new Customer bean is created. This bean will be distributed between two node in OTV cluster. In the following sample, all defined properties(id, name and surname)’ types are String and standart java.io.Serializable interface has been implemented for serializing. If custom or third-party object types are used, com.hazelcast.nio.DataSerializable interface can be implemented for better serialization performance. package com.onlinetechvision.customer; import java.io.Serializable; /** * Customer Bean. * * @author onlinetechvision.com * @since 27 Nov 2012 * @version 1.0.0 * */ public class Customer implements Serializable { private static final long serialVersionUID = 1856862670651243395L; private String id; private String name; private String surname; public String getId() { return id; } public void setId(String id) { this.id = id; } public String getName() { return name; } public void setName(String name) { this.name = name; } public String getSurname() { return surname; } public void setSurname(String surname) { this.surname = surname; } @Override public int hashCode() { final int prime = 31; int result = 1; result = prime * result + ((id == null) ? 0 : id.hashCode()); result = prime * result + ((name == null) ? 0 : name.hashCode()); result = prime * result + ((surname == null) ? 0 : surname.hashCode()); return result; } @Override public boolean equals(Object obj) { if (this == obj) return true; if (obj == null) return false; if (getClass() != obj.getClass()) return false; Customer other = (Customer) obj; if (id == null) { if (other.id != null) return false; } else if (!id.equals(other.id)) return false; if (name == null) { if (other.name != null) return false; } else if (!name.equals(other.name)) return false; if (surname == null) { if (other.surname != null) return false; } else if (!surname.equals(other.surname)) return false; return true; } @Override public String toString() { return "Customer [id=" + id + ", name=" + name + ", surname=" + surname + "]"; } } STEP 4 : CREATE ICacheService INTERFACE A new ICacheService Interface is created for service layer to expose cache functionality. package com.onlinetechvision.cache.srv; import com.hazelcast.core.IMap; import com.onlinetechvision.customer.Customer; /** * A new ICacheService Interface is created for service layer to expose cache functionality. * * @author onlinetechvision.com * @since 27 Nov 2012 * @version 1.0.0 * */ public interface ICacheService { /** * Adds Customer entries to cache * * @param String key * @param Customer customer * */ void addToCache(String key, Customer customer); /** * Deletes Customer entries from cache * * @param String key * */ void deleteFromCache(String key); /** * Gets Customer cache * * @return IMap Coherence named cache */ IMap getCache(); } STEP 5 : CREATE CacheService IMPLEMENTATION CacheService is implementation of ICacheService Interface. package com.onlinetechvision.cache.srv; import com.hazelcast.core.IMap; import com.onlinetechvision.customer.Customer; import com.onlinetechvision.test.listener.CustomerEntryListener; /** * CacheService Class is implementation of ICacheService Interface. * * @author onlinetechvision.com * @since 27 Nov 2012 * @version 1.0.0 * */ public class CacheService implements ICacheService { private IMap customerMap; /** * Constructor of CacheService * * @param IMap customerMap * */ @SuppressWarnings("unchecked") public CacheService(IMap customerMap) { setCustomerMap(customerMap); getCustomerMap().addEntryListener(new CustomerEntryListener(), true); } /** * Adds Customer entries to cache * * @param String key * @param Customer customer * */ @Override public void addToCache(String key, Customer customer) { getCustomerMap().put(key, customer); } /** * Deletes Customer entries from cache * * @param String key * */ @Override public void deleteFromCache(String key) { getCustomerMap().remove(key); } /** * Gets Customer cache * * @return IMap Coherence named cache */ @Override public IMap getCache() { return getCustomerMap(); } public IMap getCustomerMap() { return customerMap; } public void setCustomerMap(IMap customerMap) { this.customerMap = customerMap; } } STEP 6 : CREATE IDistributedExecutorService INTERFACE A new IDistributedExecutorService Interface is created for service layer to expose distributed execution functionality. package com.onlinetechvision.executor.srv; import java.util.Collection; import java.util.Set; import java.util.concurrent.Callable; import java.util.concurrent.ExecutionException; import com.hazelcast.core.Member; /** * A new IDistributedExecutorService Interface is created for service layer to expose distributed execution functionality. * * @author onlinetechvision.com * @since 27 Nov 2012 * @version 1.0.0 * */ public interface IDistributedExecutorService { /** * Executes the callable object on stated member * * @param Callable callable * @param Member member * @throws InterruptedException * @throws ExecutionException * */ String executeOnStatedMember(Callable callable, Member member) throws InterruptedException, ExecutionException; /** * Executes the callable object on member owning the key * * @param Callable callable * @param Object key * @throws InterruptedException * @throws ExecutionException * */ String executeOnTheMemberOwningTheKey(Callable callable, Object key) throws InterruptedException, ExecutionException; /** * Executes the callable object on any member * * @param Callable callable * @throws InterruptedException * @throws ExecutionException * */ String executeOnAnyMember(Callable callable) throws InterruptedException, ExecutionException; /** * Executes the callable object on all members * * @param Callable callable * @param Set all members * @throws InterruptedException * @throws ExecutionException * */ Collection executeOnMembers(Callable callable, Set members) throws InterruptedException, ExecutionException; } STEP 7 : CREATE DistributedExecutorService IMPLEMENTATION DistributedExecutorService is implementation of IDistributedExecutorService Interface. package com.onlinetechvision.executor.srv; import java.util.Collection; import java.util.Set; import java.util.concurrent.Callable; import java.util.concurrent.ExecutionException; import java.util.concurrent.ExecutorService; import java.util.concurrent.Future; import java.util.concurrent.FutureTask; import org.apache.log4j.Logger; import com.hazelcast.core.DistributedTask; import com.hazelcast.core.Member; import com.hazelcast.core.MultiTask; /** * DistributedExecutorService Class is implementation of IDistributedExecutorService Interface. * * @author onlinetechvision.com * @since 27 Nov 2012 * @version 1.0.0 * */ public class DistributedExecutorService implements IDistributedExecutorService { private static final Logger logger = Logger.getLogger(DistributedExecutorService.class); private ExecutorService hazelcastDistributedExecutorService; /** * Executes the callable object on stated member * * @param Callable callable * @param Member member * @throws InterruptedException * @throws ExecutionException * */ @SuppressWarnings("unchecked") public String executeOnStatedMember(Callable callable, Member member) throws InterruptedException, ExecutionException { logger.debug("Method executeOnStatedMember is called..."); ExecutorService executorService = getHazelcastDistributedExecutorService(); FutureTask task = (FutureTask) executorService.submit( new DistributedTask(callable, member)); String result = task.get(); logger.debug("Result of method executeOnStatedMember is : " + result); return result; } /** * Executes the callable object on member owning the key * * @param Callable callable * @param Object key * @throws InterruptedException * @throws ExecutionException * */ @SuppressWarnings("unchecked") public String executeOnTheMemberOwningTheKey(Callable callable, Object key) throws InterruptedException, ExecutionException { logger.debug("Method executeOnTheMemberOwningTheKey is called..."); ExecutorService executorService = getHazelcastDistributedExecutorService(); FutureTask task = (FutureTask) executorService.submit(new DistributedTask(callable, key)); String result = task.get(); logger.debug("Result of method executeOnTheMemberOwningTheKey is : " + result); return result; } /** * Executes the callable object on any member * * @param Callable callable * @throws InterruptedException * @throws ExecutionException * */ public String executeOnAnyMember(Callable callable) throws InterruptedException, ExecutionException { logger.debug("Method executeOnAnyMember is called..."); ExecutorService executorService = getHazelcastDistributedExecutorService(); Future task = executorService.submit(callable); String result = task.get(); logger.debug("Result of method executeOnAnyMember is : " + result); return result; } /** * Executes the callable object on all members * * @param Callable callable * @param Set all members * @throws InterruptedException * @throws ExecutionException * */ public Collection executeOnMembers(Callable callable, Set members) throws ExecutionException, InterruptedException { logger.debug("Method executeOnMembers is called..."); MultiTask task = new MultiTask(callable, members); ExecutorService executorService = getHazelcastDistributedExecutorService(); executorService.execute(task); Collection results = task.get(); logger.debug("Result of method executeOnMembers is : " + results.toString()); return results; } public ExecutorService getHazelcastDistributedExecutorService() { return hazelcastDistributedExecutorService; } public void setHazelcastDistributedExecutorService(ExecutorService hazelcastDistributedExecutorService) { this.hazelcastDistributedExecutorService = hazelcastDistributedExecutorService; } } STEP 8 : CREATE TestCallable CLASS TestCallable Class shows business logic to be executed. TestCallable task for first member of the cluster : package com.onlinetechvision.task; import java.io.Serializable; import java.util.concurrent.Callable; /** * TestCallable Class shows business logic to be executed. * * @author onlinetechvision.com * @since 27 Nov 2012 * @version 1.0.0 * */ public class TestCallable implements Callable, Serializable{ private static final long serialVersionUID = -1839169907337151877L; /** * Computes a result, or throws an exception if unable to do so. * * @return String computed result * @throws Exception if unable to compute a result */ public String call() throws Exception { return "First Member' s TestCallable Task is called..."; } } TestCallable task for second member of the cluster : package com.onlinetechvision.task; import java.io.Serializable; import java.util.concurrent.Callable; /** * TestCallable Class shows business logic to be executed. * * @author onlinetechvision.com * @since 27 Nov 2012 * @version 1.0.0 * */ public class TestCallable implements Callable, Serializable{ private static final long serialVersionUID = -1839169907337151877L; /** * Computes a result, or throws an exception if unable to do so. * * @return String computed result * @throws Exception if unable to compute a result */ public String call() throws Exception { return "Second Member' s TestCallable Task is called..."; } } STEP 9 : CREATE AnotherAvailableMemberNotFoundException CLASS AnotherAvailableMemberNotFoundException is thrown when another available member is not found. To avoid this exception, first node should be started before the second node. package com.onlinetechvision.exception; /** * AnotherAvailableMemberNotFoundException is thrown when another available member is not found. * To avoid this exception, first node should be started before the second node. * * @author onlinetechvision.com * @since 27 Nov 2012 * @version 1.0.0 * */ public class AnotherAvailableMemberNotFoundException extends Exception { private static final long serialVersionUID = -3954360266393077645L; /** * Constructor of AnotherAvailableMemberNotFoundException * * @param String Exception message * */ public AnotherAvailableMemberNotFoundException(String message) { super(message); } } STEP 10 : CREATE CustomerEntryListener CLASS CustomerEntryListener Class listens entry changes on named cache object. package com.onlinetechvision.test.listener; import com.hazelcast.core.EntryEvent; import com.hazelcast.core.EntryListener; /** * CustomerEntryListener Class listens entry changes on named cache object. * * @author onlinetechvision.com * @since 27 Nov 2012 * @version 1.0.0 * */ @SuppressWarnings("rawtypes") public class CustomerEntryListener implements EntryListener { /** * Invoked when an entry is added. * * @param EntryEvent * */ public void entryAdded(EntryEvent ee) { System.out.println("EntryAdded... Member : " + ee.getMember() + ", Key : "+ee.getKey()+", OldValue : "+ee.getOldValue()+", NewValue : "+ee.getValue()); } /** * Invoked when an entry is removed. * * @param EntryEvent * */ public void entryRemoved(EntryEvent ee) { System.out.println("EntryRemoved... Member : " + ee.getMember() + ", Key : "+ee.getKey()+", OldValue : "+ee.getOldValue()+", NewValue : "+ee.getValue()); } /** * Invoked when an entry is evicted. * * @param EntryEvent * */ public void entryEvicted(EntryEvent ee) { } /** * Invoked when an entry is updated. * * @param EntryEvent * */ public void entryUpdated(EntryEvent ee) { } } STEP 11 : CREATE Starter CLASS Starter Class loads Customers to cache and executes distributed tasks. Starter Class of first member of the cluster : package com.onlinetechvision.exe; import com.onlinetechvision.cache.srv.ICacheService; import com.onlinetechvision.customer.Customer; /** * Starter Class loads Customers to cache and executes distributed tasks. * * @author onlinetechvision.com * @since 27 Nov 2012 * @version 1.0.0 * */ public class Starter { private ICacheService cacheService; /** * Loads cache and executes the tasks * */ public void start() { loadCacheForFirstMember(); } /** * Loads Customers to cache * */ public void loadCacheForFirstMember() { Customer firstCustomer = new Customer(); firstCustomer.setId("1"); firstCustomer.setName("Jodie"); firstCustomer.setSurname("Foster"); Customer secondCustomer = new Customer(); secondCustomer.setId("2"); secondCustomer.setName("Kate"); secondCustomer.setSurname("Winslet"); getCacheService().addToCache(firstCustomer.getId(), firstCustomer); getCacheService().addToCache(secondCustomer.getId(), secondCustomer); } public ICacheService getCacheService() { return cacheService; } public void setCacheService(ICacheService cacheService) { this.cacheService = cacheService; } } Starter Class of second member of the cluster : package com.onlinetechvision.exe; import java.util.Set; import java.util.concurrent.ExecutionException; import com.hazelcast.core.Hazelcast; import com.hazelcast.core.HazelcastInstance; import com.hazelcast.core.Member; import com.onlinetechvision.cache.srv.ICacheService; import com.onlinetechvision.customer.Customer; import com.onlinetechvision.exception.AnotherAvailableMemberNotFoundException; import com.onlinetechvision.executor.srv.IDistributedExecutorService; import com.onlinetechvision.task.TestCallable; /** * Starter Class loads Customers to cache and executes distributed tasks. * * @author onlinetechvision.com * @since 27 Nov 2012 * @version 1.0.0 * */ public class Starter { private String hazelcastInstanceName; private Hazelcast hazelcast; private IDistributedExecutorService distributedExecutorService; private ICacheService cacheService; /** * Loads cache and executes the tasks * */ public void start() { loadCache(); executeTasks(); } /** * Loads Customers to cache * */ public void loadCache() { Customer firstCustomer = new Customer(); firstCustomer.setId("3"); firstCustomer.setName("Bruce"); firstCustomer.setSurname("Willis"); Customer secondCustomer = new Customer(); secondCustomer.setId("4"); secondCustomer.setName("Colin"); secondCustomer.setSurname("Farrell"); getCacheService().addToCache(firstCustomer.getId(), firstCustomer); getCacheService().addToCache(secondCustomer.getId(), secondCustomer); } /** * Executes Tasks * */ public void executeTasks() { try { getDistributedExecutorService().executeOnStatedMember(new TestCallable(), getAnotherMember()); getDistributedExecutorService().executeOnTheMemberOwningTheKey(new TestCallable(), "3"); getDistributedExecutorService().executeOnAnyMember(new TestCallable()); getDistributedExecutorService().executeOnMembers(new TestCallable(), getAllMembers()); } catch (InterruptedException | ExecutionException | AnotherAvailableMemberNotFoundException e) { e.printStackTrace(); } } /** * Gets cluster members * * @return Set Set of Cluster Members * */ private Set getAllMembers() { Set members = getHazelcastLocalInstance().getCluster().getMembers(); return members; } /** * Gets an another member of cluster * * @return Member Another Member of Cluster * @throws AnotherAvailableMemberNotFoundException An Another Available Member can not found exception */ private Member getAnotherMember() throws AnotherAvailableMemberNotFoundException { Set members = getAllMembers(); for(Member member : members) { if(!member.localMember()) { return member; } } throw new AnotherAvailableMemberNotFoundException("No Other Available Member on the cluster. Please be aware that all members are active on the cluster"); } /** * Gets Hazelcast local instance * * @return HazelcastInstance Hazelcast local instance */ @SuppressWarnings("static-access") private HazelcastInstance getHazelcastLocalInstance() { HazelcastInstance instance = getHazelcast().getHazelcastInstanceByName(getHazelcastInstanceName()); return instance; } public String getHazelcastInstanceName() { return hazelcastInstanceName; } public void setHazelcastInstanceName(String hazelcastInstanceName) { this.hazelcastInstanceName = hazelcastInstanceName; } public Hazelcast getHazelcast() { return hazelcast; } public void setHazelcast(Hazelcast hazelcast) { this.hazelcast = hazelcast; } public IDistributedExecutorService getDistributedExecutorService() { return distributedExecutorService; } public void setDistributedExecutorService(IDistributedExecutorService distributedExecutorService) { this.distributedExecutorService = distributedExecutorService; } public ICacheService getCacheService() { return cacheService; } public void setCacheService(ICacheService cacheService) { this.cacheService = cacheService; } } STEP 12 : CREATE hazelcast-config.properties FILE hazelcast-config.properties file shows the properties of cluster members. First member properties : hz.instance.name = OTVInstance1 hz.group.name = dev hz.group.password = dev hz.management.center.enabled = true hz.management.center.url = http://localhost:8080/mancenter hz.network.port = 5701 hz.network.port.auto.increment = false hz.tcp.ip.enabled = true hz.members = 192.168.1.32 hz.executor.service.core.pool.size = 2 hz.executor.service.max.pool.size = 30 hz.executor.service.keep.alive.seconds = 30 hz.map.backup.count=2 hz.map.max.size=0 hz.map.eviction.percentage=30 hz.map.read.backup.data=true hz.map.cache.value=true hz.map.eviction.policy=NONE hz.map.merge.policy=hz.ADD_NEW_ENTRY Second member properties : hz.instance.name = OTVInstance2 hz.group.name = dev hz.group.password = dev hz.management.center.enabled = true hz.management.center.url = http://localhost:8080/mancenter hz.network.port = 5702 hz.network.port.auto.increment = false hz.tcp.ip.enabled = true hz.members = 192.168.1.32 hz.executor.service.core.pool.size = 2 hz.executor.service.max.pool.size = 30 hz.executor.service.keep.alive.seconds = 30 hz.map.backup.count=2 hz.map.max.size=0 hz.map.eviction.percentage=30 hz.map.read.backup.data=true hz.map.cache.value=true hz.map.eviction.policy=NONE hz.map.merge.policy=hz.ADD_NEW_ENTRY STEP 13 : CREATE applicationContext-hazelcast.xml Spring Hazelcast Configuration file, applicationContext-hazelcast.xml, is created and Hazelcast Distributed Executor Service and Hazelcast Instance are configured. ${hz.instance.name} ${hz.members} STEP 14 : CREATE applicationContext.xml Spring Configuration file, applicationContext.xml, is created. classpath:/hazelcast-config.properties STEP 15 : CREATE Application CLASS Application Class is created to run the application. ackage com.onlinetechvision.exe; import org.springframework.context.ApplicationContext; import org.springframework.context.support.ClassPathXmlApplicationContext; /** * Application class starts the application * * @author onlinetechvision.com * @since 27 Nov 2012 * @version 1.0.0 * */ public class Application { /** * Starts the application * * @param String[] args * */ public static void main(String[] args) { ApplicationContext context = new ClassPathXmlApplicationContext("applicationContext.xml"); Starter starter = (Starter) context.getBean("starter"); starter.start(); } } STEP 16 : BUILD PROJECT After OTV_Spring_Hazelcast_DistributedExecution Project is built, OTV_Spring_Hazelcast_DistributedExecution-0.0.1-SNAPSHOT.jar will be created. Important Note : The Members of the cluster have got different configuration for Coherence so the project should be built separately for each member. STEP 17 : INTEGRATION with HAZELCAST MANAGEMENT CENTER Hazelcast Management Center enables to monitor and manage nodes in the cluster. Entity and backup counts which are owned by customerMap, can be seen via Map Memory Data Table. We have distributed 4 entries via customerMap as shown below : Sample keys and values can be seen via Map Browser : Added First Entry : Added Third Entry : hazelcastDistributedExecutorService details can be seen via Executors tab. We have executed 3 task on first member and 2 tasks on second member as shown below : STEP 18 : RUN PROJECT BY STARTING THE CLUSTER’ s MEMBER After created OTV_Spring_Hazelcast_DistributedExecution-0.0.1-SNAPSHOT.jar file is run at the cluster’ s members, the following console output logs will be shown : First member console output : Kas 25, 2012 4:07:20 PM com.hazelcast.impl.AddressPicker INFO: Interfaces is disabled, trying to pick one address from TCP-IP config addresses: [x.y.z.t] Kas 25, 2012 4:07:20 PM com.hazelcast.impl.AddressPicker INFO: Prefer IPv4 stack is true. Kas 25, 2012 4:07:20 PM com.hazelcast.impl.AddressPicker INFO: Picked Address[x.y.z.t]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true Kas 25, 2012 4:07:21 PM com.hazelcast.system INFO: [x.y.z.t]:5701 [dev] Hazelcast Community Edition 2.4 (20121017) starting at Address[x.y.z.t]:5701 Kas 25, 2012 4:07:21 PM com.hazelcast.system INFO: [x.y.z.t]:5701 [dev] Copyright (C) 2008-2012 Hazelcast.com Kas 25, 2012 4:07:21 PM com.hazelcast.impl.LifecycleServiceImpl INFO: [x.y.z.t]:5701 [dev] Address[x.y.z.t]:5701 is STARTING Kas 25, 2012 4:07:24 PM com.hazelcast.impl.TcpIpJoiner INFO: [x.y.z.t]:5701 [dev] --A new cluster is created and First Member joins the cluster. Members [1] { Member [x.y.z.t]:5701 this } Kas 25, 2012 4:07:24 PM com.hazelcast.impl.MulticastJoiner INFO: [x.y.z.t]:5701 [dev] Members [1] { Member [x.y.z.t]:5701 this } ... -- First member adds two new entries to the cache... EntryAdded... Member : Member [x.y.z.t]:5701 this, Key : 1, OldValue : null, NewValue : Customer [id=1, name=Jodie, surname=Foster] EntryAdded... Member : Member [x.y.z.t]:5701 this, Key : 2, OldValue : null, NewValue : Customer [id=2, name=Kate, surname=Winslet] ... --Second Member joins the cluster. Members [2] { Member [x.y.z.t]:5701 this Member [x.y.z.t]:5702 } ... -- Second member adds two new entries to the cache... EntryAdded... Member : Member [x.y.z.t]:5702, Key : 4, OldValue : null, NewValue : Customer [id=4, name=Colin, surname=Farrell] EntryAdded... Member : Member [x.y.z.t]:5702, Key : 3, OldValue : null, NewValue : Customer [id=3, name=Bruce, surname=Willis] Second member console output : Kas 25, 2012 4:07:48 PM com.hazelcast.impl.AddressPicker INFO: Interfaces is disabled, trying to pick one address from TCP-IP config addresses: [x.y.z.t] Kas 25, 2012 4:07:48 PM com.hazelcast.impl.AddressPicker INFO: Prefer IPv4 stack is true. Kas 25, 2012 4:07:48 PM com.hazelcast.impl.AddressPicker INFO: Picked Address[x.y.z.t]:5702, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5702], bind any local is true Kas 25, 2012 4:07:49 PM com.hazelcast.system INFO: [x.y.z.t]:5702 [dev] Hazelcast Community Edition 2.4 (20121017) starting at Address[x.y.z.t]:5702 Kas 25, 2012 4:07:49 PM com.hazelcast.system INFO: [x.y.z.t]:5702 [dev] Copyright (C) 2008-2012 Hazelcast.com Kas 25, 2012 4:07:49 PM com.hazelcast.impl.LifecycleServiceImpl INFO: [x.y.z.t]:5702 [dev] Address[x.y.z.t]:5702 is STARTING Kas 25, 2012 4:07:49 PM com.hazelcast.impl.Node INFO: [x.y.z.t]:5702 [dev] ** setting master address to Address[x.y.z.t]:5701 Kas 25, 2012 4:07:49 PM com.hazelcast.impl.MulticastJoiner INFO: [x.y.z.t]:5702 [dev] Connecting to master node: Address[x.y.z.t]:5701 Kas 25, 2012 4:07:49 PM com.hazelcast.nio.ConnectionManager INFO: [x.y.z.t]:5702 [dev] 55715 accepted socket connection from /x.y.z.t:5701 Kas 25, 2012 4:07:55 PM com.hazelcast.cluster.ClusterManager INFO: [x.y.z.t]:5702 [dev] --Second Member joins the cluster. Members [2] { Member [x.y.z.t]:5701 Member [x.y.z.t]:5702 this } Kas 25, 2012 4:07:56 PM com.hazelcast.impl.LifecycleServiceImpl INFO: [x.y.z.t]:5702 [dev] Address[x.y.z.t]:5702 is STARTED -- Second member adds two new entries to the cache... EntryAdded... Member : Member [x.y.z.t]:5702 this, Key : 3, OldValue : null, NewValue : Customer [id=3, name=Bruce, surname=Willis] EntryAdded... Member : Member [x.y.z.t]:5702 this, Key : 4, OldValue : null, NewValue : Customer [id=4, name=Colin, surname=Farrell] 25.11.2012 16:07:56 DEBUG (DistributedExecutorService.java:42) - Method executeOnStatedMember is called... 25.11.2012 16:07:56 DEBUG (DistributedExecutorService.java:46) - Result of method executeOnStatedMember is : First Member' s TestCallable Task is called... 25.11.2012 16:07:56 DEBUG (DistributedExecutorService.java:61) - Method executeOnTheMemberOwningTheKey is called... 25.11.2012 16:07:56 DEBUG (DistributedExecutorService.java:65) - Result of method executeOnTheMemberOwningTheKey is : First Member' s TestCallable Task is called... 25.11.2012 16:07:56 DEBUG (DistributedExecutorService.java:78) - Method executeOnAnyMember is called... 25.11.2012 16:07:57 DEBUG (DistributedExecutorService.java:82) - Result of method executeOnAnyMember is : Second Member' s TestCallable Task is called... 25.11.2012 16:07:57 DEBUG (DistributedExecutorService.java:96) - Method executeOnMembers is called... 25.11.2012 16:07:57 DEBUG (DistributedExecutorService.java:101) - Result of method executeOnMembers is : [First Member' s TestCallable Task is called..., Second Member' s TestCallable Task is called...] STEP 19 : DOWNLOAD https://github.com/erenavsarogullari/OTV_Spring_Hazelcast_DistributedExecution REFERENCES : Java ExecutorService Interface Hazelcast Distributed Executor Service

December 11, 2012

by Eren Avsarogullari

· 29,976 Views · 1 Like

Lightweight RPC with ZeroMQ (ØMQ) and Protocol Buffers

A frequent issue I come across writing integration applications with Mule is deciding how to communicate back and forth between my front end application, typically a web or mobile application, and a flow hosted on Mule. I could use web services and do something like annotate a component with JAX-RS and expose this out over HTTP. This is potentially overkill, particularly if I only want to host a few methods, the methods are asynchronous or I don’t want to deal with the overhead of HTTP. It also could be a lot of extra effort if the only consumers of the API, at least initially, are internal facing applications. Another choice is to use “synchronous” JMS with temporary reply queues. While Mule makes this easy to do, particularly with MuleClient, I now have to deal with the overhead of spinning up a JMS infrastructure. I could also be limited to Java only clients, depending on which JMS broker I choose. The latter is particularly signifcant, as Java probably isn’t the technology of choice on the web or mobile layer. ØMQ for RPC ØMQ, or ZeroMQ, is a networking library designed from the ground up to ease integration between distributed applications. In addition to supporting a variety of messaging patterns, which are enumerated in the extremely well written guide, the library is written in platform agnostic C with wrappers for different languages like Java, Python and Ruby. These features make it a good candidate to solve the challenges I introduced above, particularly since a community contributed module for ØMQ was released recently. Let’s consider a simple service that accepts a request for a range of stock quotes and returns the results and see how we can host this service with Mule and expose it out with the ØMQ Module. Data Serialization with Protocol Buffers Data is transported back and forth over ØMQ as byte arrays. We, as such, need to decide on a way to serialize our stock quote request and responses “on the wire.” Before we do that, however, let’s take a look at the Java canonical data model we’re using on the client and server side. The following Gists show the important bits of the StockQuote and StockQuoteResponse classes. public class StockQuote implements Serializable { String symbol; Date date; Double open; Double high; Double low; Double close; Long volume; Double adjustedClose; public class StockQuoteRequest implements Serializable { String symbol; Date startDate; Date endDate; public interface StockDataService { public List getQuote(StockQuoteRequest request); } We could use Java serialization to get the objects into byte arrays. Ignoring the other deficiencies of default Java serialization, the main drawback is that it limits our clients to one’s running on a JVM. XML or JSON provide better alternatives, but for the purposes of this example we’ll assume we want a more compact representation of the data (this isn’t totally unrealistic, stock quote data can be extremely time sensitive and we probably want to minimize serialization and deserialization overhead.) Protocol Buffers provide a good middle ground and also boast a Mule Module to provide the necessary transformers we need to move back and forth from the byte array representations. Let’s define two .proto files to define the wire format and generate the intermediary stubs for serialization. package com.acmesoft.zeromq; option java_package = "com.acmesoft.stock.model.serialization.protobuf"; option optimize_for = SPEED;package com.acmesoft.zeromq; option java_package = "com.acmesoft.stock.model.serialization.protobuf"; option optimize_for = SPEED; option java_multiple_files = true; message StockQuoteResponseBuffer { repeated StockQuoteBuffer result = 1; } message StockQuoteBuffer { required string symbol = 1; required int64 date = 2; required double open = 3; required double high = 4; required double low = 5; required double close = 6; required int64 volume = 7; required double adjustedClose = 8; } option java_multiple_files = true; message StockQuoteRequestBuffer { required string symbol = 1; required int64 start = 2; required int64 end = 3; } You typically would use the “protoc” compiler to generate the Java stubs. This is tedious, however, so we’ll instead modify the pom.xml of our project to compile the protoc files during the compile goals: com.google.protobuf.tools maven-protoc-plugin /usr/local/bin/protoc compile testCompile Since we already have a domain model we’ll add some helper classes to simplify the serialization tasks on the client side. public byte[] toProtocolBufferAsBytes() { return StockQuoteRequestBuffer.newBuilder() .setSymbol(symbol) .setStart(startDate.getTime()) .setEnd(endDate.getTime()).build().toByteArray(); } public static StockQuoteRequest fromProtocolBuffer(StockQuoteRequestBuffer buffer) { StockQuoteRequest request = new StockQuoteRequest(); request.setSymbol(buffer.getSymbol()); request.setStartDate(new Date(buffer.getStart())); request.setEndDate(new Date(buffer.getEnd())); return request; } public static StockQuoteResponseBuffer toProtocolBuffer(List quotes) { StockQuoteResponseBuffer.Builder responseBuilder = StockQuoteResponseBuffer.newBuilder(); for (StockQuote quote : quotes) { responseBuilder.addResult(StockQuoteBuffer.newBuilder() .setAdjustedClose(quote.getAdjustedClose()) .setClose(quote.getClose()) .setDate(quote.getDate().getTime()) .setHigh(quote.getHigh()) .setLow(quote.getLow()) .setOpen(quote.getOpen()) .setSymbol(quote.getSymbol()) .setVolume(quote.getVolume()).build()); } return responseBuilder.build(); } public static List listOfStockQuotesFromBytes(byte[] bytes) { List buffer; try { buffer = StockQuoteResponseBuffer.parseFrom(bytes).getResultList(); } catch (InvalidProtocolBufferException e) { throw new SerializationException(e); } List quotes = new ArrayList(); for (StockQuoteBuffer stockQuoteBuffer : buffer) { StockQuote stockQuote = new StockQuote(); stockQuote.setClose(stockQuoteBuffer.getClose()); stockQuote.setDate(new Date(stockQuoteBuffer.getDate())); stockQuote.setHigh(stockQuoteBuffer.getHigh()); stockQuote.setOpen(stockQuoteBuffer.getOpen()); stockQuote.setSymbol(stockQuoteBuffer.getSymbol()); stockQuote.setVolume(stockQuoteBuffer.getVolume()); stockQuote.setAdjustedClose(stockQuoteBuffer.getAdjustedClose()); stockQuote.setLow(stockQuoteBuffer.getLow()); quotes.add(stockQuote); } return quotes; } Configuring StockDataService Now that we have a canonical data model and a wire format defined we’re ready to wire up a Mule flow to expose the service out. Note that for this to work you need to have jzmq installed locally on your system. The following dependency needs to be added to your pom.xml once its installed: org.zeromq zmq 2.2.0 /usr/local/lib/zmq.jar system Where systemPath is the location of the zmq.jar on your filesystem. Once that’s out of the way we can configure the flow, as illustrated below: The ZeroMQ inbound-endpoint will be bound to TCP port 9090 with a request-response exchange pattern. The deserialize MP in the protobuf module will deserialize the byte array to the generated StockQuoteRequestBuffer class. From there we’ll use MEL to invoke the helper method on StockQuoteRequest to transform the intermediary class to the domain model. The List of StockQuotes returned from StockDataService will be transformed by the MEL expression using the “toProtocolBuffer” helper method on the domain model. The Protocol Buffer Module is then smart enough to implicitly transform the intermediary object to a byte array for the response. Consuming the Service from the Client Side Now that the server is ready we can turn our attention to the client side code to invoke the remote service. Let’s take a look at how this works: StockQuoteRequest stockQuoteRequest = new StockQuoteRequest(); stockQuoteRequest.setSymbol("FB"); stockQuoteRequest.setStartDate(new Date( new Date().getTime() - (86400000 * 7))); stockQuoteRequest.setEndDate(new Date()); ZMQ.Socket zmqSocket = zmqContext.socket(ZMQ.REQ); zmqSocket.setReceiveTimeOut(RECEIVE_TIMEOUT); zmqSocket.connect("tcp://localhost:9090"); zmqSocket.send(stockQuoteRequest.toProtocolBufferAsBytes(), 0); List quotes = StockQuote.listOfStockQuotesFromBytes(zmqSocket.recv(0)); We start off by defining the StockQuoteRequest object to give us all the quotes for Facebook stock from the last week. We can then open up a ZMQ socket, set the timeout, connect to the ZMQ socket on the remote Mule instance and send the byte representation of the StockQuoteRequest to it. zmqSocket.recv is then used to receive the bytes back from Mule. From here we can use the listOfStockQuotesFromBytes helper method we wrote above to convert the Protocol Buffer representation to a List of StockQuotes. Despite the fair bit of plumbing we did above, this is a pretty concise bit of client side code to invoke the remote service. Conclusion This blog post only touched on the features of ØMQ and the ØMQ Mule Module. In addition to request-reply, other exchange-patterns are supported, like one-way, push and pull. This effectively gives you the benefits of a reliable, asynchronous messaging layer without a centralized infrastructure. I hope to cover this in a later post. Protocol buffers also seem like a natural fit as a wire format for ØMQ. protobuffers echo ØMQ’s principals of being lightweight, fast and platform agnostic. These are also, not coincidently, principals Mule shares as an integration framework. The project for this example is available on GitHub.

November 26, 2012

by John D'Emic

· 28,583 Views

Exporting and Importing VM Settings with Azure Command-Line Tools

We've talked previously about the Windows Azure command-line tools, and have used them in a few posts such as Brian's Migrating Drupal to a Windows Azure VM. While the tools are generally useful for tons of stuff, one of the things that's been painful to do with the command-line is export the settings for a VM, and then recreate the VM from those settings. You might be wondering why you'd want to export a VM and then recreate it. For me, cost is the first thing that comes to mind. It costs more to keep a VM running than it does to just keep the disk in storage. So if I had something in a VM that I'm only using a few hours a day, I'd delete the VM when I'm not using it and recreate it when I need it again. Another potential reason is that you want to create a copy of the disk so that you can create a duplicate virtual machine. The export process used to be pretty arcane stuff; using the azure vm show command with a --json parameter and piping the output to file. Then hacking the .json file to fix it up so it could be used with the azure vm create-from command. It was bad. It was so bad, the developers added a new export command to create the .json file for you. Here's the basic process: Create a VM VM creation has been covered multiple ways already; you're either going to use the portal or command line tools, and you're either going to select an image from the library or upload a VHD. In my case, I used the following command: azure vm create larryubuntu CANONICAL__Canonical-Ubuntu-12-04-amd64-server-20120528.1.3-en-us-30GB.vhd larry NotaRe This command creates a new VM in the East US data center, enables SSH on port 22 and then stores a disk image for this VM in a blob. You can see the new disk image in blob storage by running: azure vm disk list The results should return something like: info: Executing command vm disk list + Fetching disk images data: Name OS data: ---------------------------------------- ------- data: larryubuntu-larryubuntu-0-20121019170709 Linux info: vm disk list command OK That's the actual disk image that is mounted by the VM. Export and Delete the VM Alright, I've done my work and it's the weekend. I need to export the VM settings so I can recreate it on Monday, then delete the VM so I won't get charged for the next 48 hours of not working. To export the settings for the VM, I use the following command: azure vm export larryubuntu c:\stuff\vminfo.json This tells Windows Azure to find the VM named larryubuntu and export its settings to c:\stuff\vminfo.json. The .json file will contain something like this: { "RoleName":"larryubuntu", "RoleType":"PersistentVMRole", "ConfigurationSets": [ { "ConfigurationSetType":"NetworkConfiguration", "InputEndpoints": [ { "LocalPort":"22", "Name":"ssh", "Port":"22", "Protocol":"tcp", "Vip":"168.62.177.227" } ], "SubnetNames":[] } ], "DataVirtualHardDisks":[], "OSVirtualHardDisk": { "HostCaching":"ReadWrite", "DiskName":"larryubuntu-larryubuntu-0-20121024155441", "OS":"Linux" }, "RoleSize":"Small" } If you're like me, you'll immediately start thinking "Hrmmm, I wonder if I can mess around with things like RoleSize." And yes, you can. If you wanted to bump this up to medium, you'd just change that parameter to medium. If you want to play around more with the various settings, it looks like the schema is maintained at https://github.com/WindowsAzure/azure-sdk-for-node/blob/master/lib/services/serviceManagement/models/roleschema.json. Once I've got the file, I can safely delete the VM by using the following command. azure vm delete larryubuntu It spins a bit and then no more VM. Recreate the VM Ugh, Monday. Time to go back to work, and I need my VM back up and running. So I run the following command: azure vm create-from larryubuntu c:\stuff\vminfo.json --location "East US" It takes only a minute or two to spin up the VM and it's ready for work. That's it - fast, simple, and far easier than the old process of generating the .json settings file. Note that I haven't played around much with the various settings described in the schema for the json file that I linked above. If you find anything useful or interesting that can be accomplished by hacking around with the .json, leave a comment about it.

October 29, 2012

by Larry Franks

· 6,457 Views

PartitionKey and RowKey in Windows Azure Table Storage

For the past few months, I’ve been coaching a “Microsoft Student Partner” (who has a great blog on Kinect for Windows by the way!) on Windows Azure. One of the questions he recently had was around PartitionKey and RowKey in Windows Azure Table Storage. What are these for? Do I have to specify them manually? Let’s explain… Windows Azure storage partitions All Windows Azure storage abstractions (Blob, Table, Queue) are built upon the same stack (whitepaper here). While there’s much more to tell about it, the reason why it scales is because of its partitioning logic. Whenever you store something on Windows Azure storage, it is located on some partition in the system. Partitions are used for scale out in the system. Imagine that there’s only 3 physical machines that are used for storing data in Windows Azure storage: Based on the size and load of a partition, partitions are fanned out across these machines. Whenever a partition gets a high load or grows in size, the Windows Azure storage management can kick in and move a partition to another machine: By doing this, Windows Azure can ensure a high throughput as well as its storage guarantees. If a partition gets busy, it’s moved to a server which can support the higher load. If it gets large, it’s moved to a location where there’s enough disk space available. Partitions are different for every storage mechanism: In blob storage, each blob is in a separate partition. This means that every blob can get the maximal throughput guaranteed by the system. In queues, every queue is a separate partition. In tables, it’s different: you decide how data is co-located in the system. PartitionKey in Table Storage In Table Storage, you have to decide on the PartitionKey yourself. In essence, you are responsible for the throughput you’ll get on your system. If you put every entity in the same partition (by using the same partition key), you’ll be limited to the size of the storage machines for the amount of storage you can use. Plus, you’ll be constraining the maximal throughput as there’s lots of entities in the same partition. Should you set the PartitionKey to the same value for every entity stored? No. You’ll end up with scaling issues at some point. Should you set the PartitionKey to a unique value for every entity stored? No. You can do this and every entity stored will end up in its own partition, but you’ll find that querying your data becomes more difficult. And that’s where our next concept kicks in… RowKey in Table Storage A RowKey in Table Storage is a very simple thing: it’s your “primary key” within a partition. PartitionKey + RowKey form the composite unique identifier for an entity. Within one PartitionKey, you can only have unique RowKeys. If you use multiple partitions, the same RowKey can be reused in every partition. So in essence, a RowKey is just the identifier of an entity within a partition. PartitionKey and RowKey and performance Before building your code, it’s a good idea to think about both properties. Don’t just assign them a guid or a random string as it does matter for performance. The fastest way of querying? Specifying both PartitionKey and RowKey. By doing this, table storage will immediately know which partition to query and can simply do an ID lookup on RowKey within that partition. Less fast but still fast enough will be querying by specifying PartitionKey: table storage will know which partition to query. Less fast: querying on only RowKey. Doing this will give table storage no pointer on which partition to search in, resulting in a query that possibly spans multiple partitions, possibly multiple storage nodes as well. Wihtin a partition, searching on RowKey is still pretty fast as it’s a unique index. Slow: searching on other properties (again, spans multiple partitions and properties). Note that Windows Azure storage may decide to group partitions in so-called "Range partitions" - see http://msdn.microsoft.com/en-us/library/windowsazure/hh508997.aspx. In order to improve query performance, think about your PartitionKey and RowKey upfront, as they are the fast way into your datasets. Deciding on PartitionKey and RowKey Here’s an exercise: say you want to store customers, orders and orderlines. What will you choose as the PartitionKey (PK) / RowKey (RK)? Let’s use three tables: Customer, Order and Orderline. An ideal setup may be this one, depending on how you want to query everything: Customer (PK: sales region, RK: customer id) – it enables fast searches on region and on customer id Order (PK: customer id, RK; order id) – it allows me to quickly fetch all orders for a specific customer (as they are colocated in one partition), it still allows fast querying on a specific order id as well) Orderline (PK: order id, RK: order line id) – allows fast querying on both order id as well as order line id. Of course, depending on the system you are building, the following may be a better setup: Customer (PK: customer id, RK: display name) – it enables fast searches on customer id and display name Order (PK: customer id, RK; order id) – it allows me to quickly fetch all orders for a specific customer (as they are colocated in one partition), it still allows fast querying on a specific order id as well) Orderline (PK: order id, RK: item id) – allows fast querying on both order id as well as the item bought, of course given that one order can only contain one order line for a specific item (PK + RK should be unique) You see? Choose them wisely, depending on your queries. And maybe an important sidenote: don’t be afraid of denormalizing your data and storing data twice in a different format, supporting more query variations. There’s one additional “index” That’s right! People have been asking Microsoft for a secondary index. And it’s already there… The table name itself! Take our customer – order – orderline sample again… Having a Customer table containing all customers may be interesting to search within that data. But having an Orders table containing every order for every customer may not be the ideal solution. Maybe you want to create an order table per customer? Doing that, you can easily query the order id (it’s the table name) and within the order table, you can have more detail in PK and RK. And there's one more: your account name. Split data over multiple storage accounts and you have yet another "partition". Conclusion In conclusion? Choose PartitionKey and RowKey wisely. The more meaningful to your application or business domain, the faster querying will be and the more efficient table storage will work in the long run.

October 19, 2012

by Maarten Balliauw

· 57,770 Views · 10 Likes

EasyNetQ Cluster Support

EasyNetQ, my super simple .NET API for RabbitMQ, now (from version 0.7.2.34) supports RabbitMQ clusters without any need to deploy a load balancer. Simply list the nodes of the cluster in the connection string ... var bus = RabbitHutch.CreateBus("host=ubuntu:5672,ubuntu:5673"); In this example I have set up a cluster on a single machine, 'ubuntu', with node 1 on port 5672 and node 2 on port 5673. When the CreateBus statement executes, EasyNetQ will attempt to connect to the first host listed (ubuntu:5672). If it fails to connect it will attempt to connect to the second host listed (ubuntu:5673). If neither node is available it will sit in a re-try loop attempting to connect to both servers every five seconds. It logs all this activity to the registered IEasyNetQLogger. You might see something like this if the first node was unavailable: DEBUG: Trying to connect ERROR: Failed to connect to Broker: 'ubuntu', Port: 5672 VHost: '/'. ExceptionMessage: 'None of the specified endpoints were reachable' DEBUG: OnConnected event fired INFO: Connected to RabbitMQ. Broker: 'ubuntu', Port: 5674, VHost: '/' If the node that EasyNetQ is connected to fails, EasyNetQ will attempt to connect to the next listed node. Once connected, it will re-declare all the exchanges and queues and re-start all the consumers. Here's an example log record showing one node failing then EasyNetQ connecting to the other node and recreating the subscribers: INFO: Disconnected from RabbitMQ Broker DEBUG: Trying to connect DEBUG: OnConnected event fired DEBUG: Re-creating subscribers INFO: Connected to RabbitMQ. Broker: 'ubuntu', Port: 5674, VHost: '/' You get automatic fail-over out of the box. That’s pretty cool. If you have multiple services using EasyNetQ to connect to a RabbitMQ cluster, they will all initially connect to the first listed node in their respective connection strings. For this reason the EasyNetQ cluster support is not really suitable for load balancing high throughput systems. I would recommend that you use a dedicated hardware or software load balancer instead, if that’s what you want.

October 14, 2012

by Mike Hadlow

· 6,886 Views

How to Create and Deploy a Website with Windows Azure

Curator's note: This article originally appeared at WindowsAzure.com. To use this feature and other new Windows Azure capabilities, sign up for the free preview. Just as you can quickly create and deploy a web application created from the gallery, you can also deploy a website created on a workstation with traditional developer tools from Microsoft or other companies. Table of Contents Deployment Options How to: Create a Website Using the Management Portal How to: Create a Website from the Gallery How to: Delete a Website Next Steps Deployment Options Windows Azure supports deploying websites from remote computers using WebDeploy, FTP, GIT or TFS. Many development tools provide integrated support for publication using one or more of these methods and may only require that you provide the necessary credentials, site URL and hostname or URL for your chosen deployment method. Credentials and deployment URLs for all enabled deployment methods are stored in the website's publish profile, a file which can be downloaded in the Windows Azure (Preview) Management Portal from the Quick Start page or the quick glance section of the Dashboard page. If you prefer to deploy your website with a separate client application, high quality open source GIT and FTP clients are available for download on the Internet for this purpose. How to: Create a Website Using the Management Portal Follow these steps to create a website in Windows Azure. Login to the Windows Azure (Preview) Management Portal. Click the Create New icon on the bottom left of the Management Portal. Click the Web Site icon, click the Quick Create icon, enter a value for URL and then click the check mark next to create web site on the bottom right corner of the page. When the website has been created you will see the text Creation of Web Site '[SITENAME]' Completed. Click the name of the website displayed in the list of websites to open the website's Quick Start management page. On the Quick Start page you are provided with options to set up TFS or GIT publishing if you would like to deploy your finished website to Windows Azure using these methods. FTP publishing is set up by default for websites and the FTP Host name is displayed under FTP Hostname on the Quick Start and Dashboard pages. Before publishing with FTP or GIT choose the option to Reset deployment credentials on the Dashboard page. Then specify the new credentials (username and password) to authenticate against the FTP Host or the Git Repository when deploying content to the website. The Configure management page exposes several configurable application settings in the following sections: Framework: Set the version of .NET framework or PHP required by your web application. Diagnostics: Set logging options for gathering diagnostic information for your website in this section. App Settings: Specify name/value pairs that will be loaded by your web application on start up. For .NET sites, these settings will be injected into your .NET configuration AppSettings at runtime, overriding existing settings. For PHP and Node sites these settings will be available as environment variables at runtime. Connection Strings: View connection strings for linked resources. For .NET sites, these connection strings will be injected into your .NET configuration connectionStrings settings at runtime, overriding existing entries where the key equals the linked database name. For PHP and Node sites these settings will be available as environment variables at runtime. Default Documents: Add your web application's default document to this list if it is not already in the list. If your web application contains more than one of the files in the list then make sure your website's default document appears at the top of the list. How to: Create a Website from the Gallery The gallery makes available a wide range of popular web applications developed by Microsoft, third party companies, and open source software initiatives. Web applications created from the gallery do not require installation of any software other than the browser used to connect to the Windows Azure Management Portal. In this tutorial, you'll learn: How to create a new site through the gallery. How to deploy the site through the Windows Azure Portal. You'll build a Word press blog that uses a default template. The following illustration shows the completed application: Note To complete this tutorial, you need a Windows Azure account that has the Windows Azure Web Sites feature enabled. You can create a free trial account and enable preview features in just a couple of minutes. For details, see Create a Windows Azure account and enable preview features. Create a web site in the portal Login to the Windows Azure Management Portal. Click the New icon on the bottom left of the dashboard. Click the Web Site icon, and click From Gallery. Locate and click the WordPress icon in list, and then click Next. On the Configure Your App page, enter or select values for all fields: Enter a URL name of your choice Leave Create a new MySQL database selected in the Database field Select the region closest to you Then click Next. On the Create New Database page, you can specify a name for your new MySQL database or use the default name. Select the region closest to you as the hosting location. Select the box at the bottom of the screen to agree to ClearDB's usage terms for your hosted MySQL database. Then click the check to complete the site creation. After you click Complete Windows Azure will initiate build and deploy operations. While the web site is being built and deployed the status of these operations is displayed at the bottom of the Web Sites page. After all operations are performed, A final status message when the site has been successfully deployed. Launch and manage your WordPress site Click on your new site from the Web Sites page to open the dashboard for the site. On the Dashboard management page, scroll down and click the link on the left under Site Url to open the site’s welcome page. Enter appropriate configuration information required by WordPress and click Install WordPress to finalize configuration and open the web site’s login page. Login to the new WordPress web site by entering the username and password that you specified on the Welcome page. You'll have a new WordPress site that looks similar to the site below. How to: Delete a Website Websites are deleted using the Delete icon in the Windows Azure Management Portal. The Delete icon is available in the Windows Azure Portal when you click Web Sites to list all of your websites and at the bottom of each of the website management pages. Next Steps For more information about Websites, see the following: Walkthrough: Troubleshooting a Website on Windows Azure

October 9, 2012

by Eric Gregory

· 85,359 Views

Your First Hadoop MapReduce Job

Hadoop MapReduce is a YARN-based system for parallel processing of large data sets. In this article, learn to quickly start writing the simplest MapReduce job.

September 12, 2012

by Amresh Singh

· 19,693 Views

How to Write Better POJO Services

In Java, you can easily implement some business logic in Plain Old Java Object (POJO) classes, and then able to run them in a fancy server or framework without much hassle. There many server/frameworks, such as JBossAS, Spring or Camel etc, that would allow you to deploy POJO without even hardcoding to their API. Obviously you would get advance features if you willing to couple to their API specifics, but even if you do, you can keep these to minimal by encapsulating your own POJO and their API in a wrapper. By writing and designing your own application as simple POJO as possible, you will have the most flexible ways in choose a framework or server to deploy and run your application. One effective way to write your business logic in these environments is to use Service component. In this article I will share few things I learned in writing Services. What is a Service? The word Service is overly used today, and it could mean many things to different people. When I say Service, my definition is a software component that has minimal of life-cycles such as init, start, stop, and destroy. You may not need all these stages of life-cycles in every service you write, but you can simply ignore ones that don't apply. When writing large application that intended for long running such as a server component, definining these life-cycles and ensure they are excuted in proper order is crucial! I will be walking you through a Java demo project that I have prepared. It's very basic and it should run as stand-alone. The only dependency it has is the SLF4J logger. If you don't know how to use logger, then simply replace them with System.out.println. However I would strongly encourage you to learn how to use logger effectively during application development though. Also if you want to try out the Spring related demos, then obviously you would need their jars as well. Writing basic POJO service You can quickly define a contract of a Service with life-cycles as below in an interface. package servicedemo; public interface Service { void init(); void start(); void stop(); void destroy(); boolean isInited(); boolean isStarted(); } Developers are free to do what they want in their Service implementation, but you might want to give them an adapter class so that they don't have to re-write same basic logic on each Service. I would provide an abstract service like this: package servicedemo; import java.util.concurrent.atomic.*; import org.slf4j.*; public abstract class AbstractService implements Service { protected Logger logger = LoggerFactory.getLogger(getClass()); protected AtomicBoolean started = new AtomicBoolean(false); protected AtomicBoolean inited = new AtomicBoolean(false); public void init() { if (!inited.get()) { initService(); inited.set(true); logger.debug("{} initialized.", this); } } public void start() { // Init service if it has not done so. if (!inited.get()) { init(); } // Start service now. if (!started.get()) { startService(); started.set(true); logger.debug("{} started.", this); } } public void stop() { if (started.get()) { stopService(); started.set(false); logger.debug("{} stopped.", this); } } public void destroy() { // Stop service if it is still running. if (started.get()) { stop(); } // Destroy service now. if (inited.get()) { destroyService(); inited.set(false); logger.debug("{} destroyed.", this); } } public boolean isStarted() { return started.get(); } public boolean isInited() { return inited.get(); } @Override public String toString() { return getClass().getSimpleName() + "[id=" + System.identityHashCode(this) + "]"; } protected void initService() { } protected void startService() { } protected void stopService() { } protected void destroyService() { } } This abstract class provide the basic of most services needs. It has a logger and states to keep track of the life-cycles. It then delegate new sets of life-cycle methods so subclass can choose to override. Notice that the start() method is checking auto calling init() if it hasn't already done so. Same is done in destroy() method to the stop() method. This is important if we're to use it in a container that only have two stages life-cycles invocation. In this case, we can simply invoke start() and destroy() to match to our service's life-cycles. Some frameworks might go even further and create separate interfaces for each stage of the life-cycles, such as InitableService or StartableService etc. But I think that would be too much in a typical app. In most of the cases, you want something simple, so I like it just one interface. User may choose to ignore methods they don't want, or simply use an adaptor class. Before we end this section, I would throw in a silly Hello world service that can be used in our demo later. package servicedemo; public class HelloService extends AbstractService { public void initService() { logger.info(this + " inited."); } public void startService() { logger.info(this + " started."); } public void stopService() { logger.info(this + " stopped."); } public void destroyService() { logger.info(this + " destroyed."); } } Managing multiple POJO Services with a container Now we have the basic of Service definition defined, your development team may start writing business logic code! Before long, you will have a library of your own services to re-use. To be able group and control these services into an effetive way, we want also provide a container to manage them. The idea is that we typically want to control and manage multiple services with a container as a group in a higher level. Here is a simple implementation for you to get started: package servicedemo; import java.util.*; public class ServiceContainer extends AbstractService { private List services = new ArrayList(); public void setServices(List services) { this.services = services; } public void addService(Service service) { this.services.add(service); } public void initService() { logger.debug("Initializing " + this + " with " + services.size() + " services."); for (Service service : services) { logger.debug("Initializing " + service); service.init(); } logger.info(this + " inited."); } public void startService() { logger.debug("Starting " + this + " with " + services.size() + " services."); for (Service service : services) { logger.debug("Starting " + service); service.start(); } logger.info(this + " started."); } public void stopService() { int size = services.size(); logger.debug("Stopping " + this + " with " + size + " services in reverse order."); for (int i = size - 1; i >= 0; i--) { Service service = services.get(i); logger.debug("Stopping " + service); service.stop(); } logger.info(this + " stopped."); } public void destroyService() { int size = services.size(); logger.debug("Destroying " + this + " with " + size + " services in reverse order."); for (int i = size - 1; i >= 0; i--) { Service service = services.get(i); logger.debug("Destroying " + service); service.destroy(); } logger.info(this + " destroyed."); } } From above code, you will notice few important things: We extends the AbstractService, so a container is a service itself. We would invoke all service's life-cycles before moving to next. No services will start unless all others are inited. We should stop and destroy services in reverse order for most general use cases. The above container implementation is simple and run in synchronized fashion. This mean, you start container, then all services will start in order you added them. Stop should be same but in reverse order. I also hope you would able to see that there is plenty of room for you to improve this container as well. For example, you may add thread pool to control the execution of the services in asynchronized fashion. Running POJO Services Running services with a simple runner program. In the simplest form, we can run our POJO services on our own without any fancy server or frameworks. Java programs start its life from a static main method, so we surely can invoke init and start of our services in there. But we also need to address the stop and destroy life-cycles when user shuts down the program (usually by hitting CTRL+C.) For this, the Java has the java.lang.Runtime#addShutdownHook() facility. You can create a simple stand-alone server to bootstrap Service like this: package servicedemo; import org.slf4j.*; public class ServiceRunner { private static Logger logger = LoggerFactory.getLogger(ServiceRunner.class); public static void main(String[] args) { ServiceRunner main = new ServiceRunner(); main.run(args); } public void run(String[] args) { if (args.length < 1) throw new RuntimeException("Missing service class name as argument."); String serviceClassName = args[0]; try { logger.debug("Creating " + serviceClassName); Class serviceClass = Class.forName(serviceClassName); if (!Service.class.isAssignableFrom(serviceClass)) { throw new RuntimeException("Service class " + serviceClassName + " did not implements " + Service.class.getName()); } Object serviceObject = serviceClass.newInstance(); Service service = (Service)serviceObject; registerShutdownHook(service); logger.debug("Starting service " + service); service.init(); service.start(); logger.info(service + " started."); synchronized(this) { this.wait(); } } catch (Exception e) { throw new RuntimeException("Failed to create and run " + serviceClassName, e); } } private void registerShutdownHook(final Service service) { Runtime.getRuntime().addShutdownHook(new Thread() { public void run() { logger.debug("Stopping service " + service); service.stop(); service.destroy(); logger.info(service + " stopped."); } }); } } With abover runner, you should able to run it with this command: $ java demo.ServiceRunner servicedemo.HelloService Look carefully, and you'll see that you have many options to run multiple services with above runner. Let me highlight couple: Improve above runner directly and make all args for each new service class name, instead of just first element. Or write a MultiLoaderService that will load multiple services you want. You may control argument passing using System Properties. Can you think of other ways to improve this runner? Running services with Spring The Spring framework is an IoC container, and it's well known to be easy to work POJO, and Spring lets you wire your application together. This would be a perfect fit to use in our POJO services. However, with all the features Spring brings, it missed a easy to use, out of box main program to bootstrap spring config xml context files. But with what we built so far, this is actually an easy thing to do. Let's write one of our POJO Service to bootstrap a spring context file. package servicedemo; import org.springframework.context.ConfigurableApplicationContext; import org.springframework.context.support.FileSystemXmlApplicationContext; public class SpringService extends AbstractService { private ConfigurableApplicationContext springContext; public void startService() { String springConfig = System.getProperty("springContext", "spring.xml); springContext = new FileSystemXmlApplicationContext(springConfig); logger.info(this + " started."); } public void stopService() { springContext.close(); logger.info(this + " stopped."); } } With that simple SpringService you can run and load any spring xml file. For example try this: $ java -DspringContext=config/service-demo-spring.xml demo.ServiceRunner servicedemo.SpringService Inside the config/service-demo-spring.xml file, you can easily create our container that hosts one or more service in Spring beans. Notice that I only need to setup init-method and destroy-method once on the serviceContainer bean. You can then add one or more other service such as the helloService as much as you want. They will all be started, managed, and then shutdown when you close the Spring context. Note that Spring context container did not explicitly have the same life-cycles as our services. The Spring context will automatically instanciate all your dependency beans, and then invoke all beans who's init-method is set. All that is done inside the constructor of FileSystemXmlApplicationContext. No explicit init method is called from user. However at the end, during stop of the service, Spring provide the springContext#close() to clean things up. Again, they do not differentiate stop from destroy. Because of this, we must merge our init and start into Spring's init state, and then merge stop and destroy into Spring's close state. Recall our AbstractService#destory will auto invoke stop if it hasn't already done so. So this is trick that we need to understand in order to use Spring effectively. Running services with JEE app server In a corporate env, we usually do not have the freedom to run what we want as a stand-alone program. Instead they usually have some infrustructure and stricter standard technology stack in place already, such as using a JEE application server. In these situation, the most portable way to run POJO services is in a war web application. In a Servlet web application, you can write a class that implements javax.servlet.ServletContextListener and this will provide you the life-cycles hook via contextInitialized and contextDestroyed. In there, you can instanciate your ServiceContainer object and call start and destroy methods accordingly. Here is an example that you can explore: package servicedemo; import java.util.*; import javax.servlet.*; public class ServiceContainerListener implements ServletContextListener { private static Logger logger = LoggerFactory.getLogger(ServiceContainerListener.class); private ServiceContainer serviceContainer; public void contextInitialized(ServletContextEvent sce) { serviceContainer = new ServiceContainer(); List services = createServices(); serviceContainer.setServices(services); serviceContainer.start(); logger.info(serviceContainer + " started in web application."); } public void contextDestroyed(ServletContextEvent sce) { serviceContainer.destroy(); logger.info(serviceContainer + " destroyed in web application."); } private List createServices() { List result = new ArrayList(); // populate services here. return result; } } You may configure above in the WEB-INF/web.xml like this: servicedemo.ServiceContainerListener The demo provided a placeholder that you must add your services in code. But you can easily make that configurable using the web.xml for context parameters. If you were to use Spring inside a Servlet container, you may directly use their org.springframework.web.context.ContextLoaderListener class that does pretty much same as above, except they allow you to specify their xml configuration file using the contextConfigLocation context parameter. That's how a typical Spring MVC based application is configure. Once you have this setup, you can experiment our POJO service just as the Spring xml sample given above to test things out. You should see our service in action by your logger output. PS: Actually what we described here are simply related to Servlet web application, and not JEE specific. So you can use Tomcat server just fine as well. The importance of Service's life-cycles and it's real world usage All the information I presented here are not novelty, nor a killer design pattern. In fact they have been used in many popular open source projects. However, in my past experience at work, folks always manage to make these extremely complicated, and worse case is that they completely disregard the importance of life-cycles when writing services. It's true that not everything you going to write needs to be fitted into a service, but if you find the need, please do pay attention to them, and take good care that they do invoked properly. The last thing you want is to exit JVM without clean up in services that you allocated precious resources for. These would become more disastrous if you allow your application to be dynamically reloaded during deployment without exiting JVM, in which will lead to system resources leakage. The above Service practice has been put into use in the TimeMachine project. In fact, if you look at the timemachine.scheduler.service.SchedulerEngine, it would just be a container of many services running together. And that's how user can extend the scheduler functionalities as well, by writing a Service. You can load these services dynamically by a simple properties file.

September 4, 2012

by Zemian Deng

· 39,262 Views

How to Migrate Drupal to Azure Web Sites

DrupalCon Munich is next week, and I am lucky enough to be going. As part of preparing for the conference, I thought it would be worthwhile to see just how easy (or difficult) it would be to migrate an existing Drupal site to Windows Azure Web Sites. So, in this post, I’ll do just that. Fortunately, because Windows Azure Web Sites supports both PHP and MySQL, the migration process is relatively straightforward. And, because Drupal and PHP run on any platform, the process I’ll describe should work for moving Drupal to Windows Azure Web Sites regardless of what platform you are moving from. Of course, Drupal installations can vary widely, so YMMV. I tested the instructions below on relatively small (and simple) Drupal installation running on CentOS 5. (Unfortunately, I won’t be using Drush since it isn’t supported on Windows Azure Websites.) If you are considering moving a large and complex Drupal application, may want to consider moving to Windows Azure Cloud Services (more information about that here: Migrating a Drupal Site from LAMP to Windows Azure). Before getting started, it’s worth noting that Windows Azure Websites lets you run up to 10 Web Sites for free in a multitenant environment. And, you can seamlessly upgrade to private, reserved VM instances as your traffic grows. To sign up, try the Windows Azure 90-day free trial. 1. Create a Windows Azure Web Site and MySQL database There is a step-by-step tutorial on http://www.windowsazure.com that walks you through creating a new website and a MySQL database, so I’ll refer you there to get started: Create a PHP-MySQL Windows Azure web site and deploy using Git. If you intend to use Git to publish your Drupal site, then go ahead and follow the instructions for setting up a Git repository. Make sure to follow the instructions in the Get remote MySQL connection information section as you will need that information later. You can ignore the remainder of the tutorial for the purposes of deploying your Drupal site, but if you are new to Windows Azure Web Sites (and to Git), you might find the additional reading informative. Ok, now you have a new website with a MySQL database, your have your MySQL database connection information, and you have (optionally) created a remote Git repository and made note of the Git deployment instructions. Now you are ready to copy your database to MySQL in Windows Azure Web Sites. 2. Copy database to MySQL in Windows Azure Web Sites I’m sure there is more than one way to copy your Drupal database, but I found the mysqldump tool to be effective and easy to use. To copy from a local machine to Windows Azure Web Sites, here’s the command I used: mysqldump -u local_username --password=local_password drupal | mysql -h remote_host -u remote_username --password=remote_password remote_db_name You will, of course, have to provide the username and password for your existing Drupal database, and you will have to provide the hostname, username, password, and database name for the MySQL database you created in step 1. This information is available in the connection string information that you should have noted in step 1. i.e. You should have a connection string that looks something like this: Database=remote_db_name;Data Source=remote_host;User Id=remote_username;Password=remote_password Depending on the size of your database, the copying process could take several minutes. Now your Drupal database is live in Windows Azure Websites. Before you deploy your Drupal code, you need to modify it so it can connect to the new database. 3. Modify database connection info in settings.php Here, you will again need your new database connection information. Open the /drupal/sites/default/setting.php file in your favorite text editor, and replace the values of ‘database’, ‘username’, ‘password’, and ‘host’ in the $databases array with the correct values for your new database. When you are finished, you should have something similar to this: $databases = array ( 'default' => array ( 'default' => array ( 'database' => 'remote_db_name', 'username' => 'remote_username', 'password' => 'remote_password', 'host' => 'remote_host', 'port' => '', 'driver' => 'mysql', 'prefix' => '', ), ), ); Be sure to save the settings.phpfile, then you are ready to deploy. 4. Deploy Drupal code using Git or FTP The last step is to deploy your code to Windows Azure Web Sites using Git or FTP. If you are using FTP, you can get the FTP hostname and username from you website’s dashboard. Then, use your favorite FTP client to upload your Drupal files to the /site/wwwroot folder of the remote site. If you are using Git, you need to set up a Git repository in Windows Azure Web Sites (steps for this are in the tutorial mentioned earlier). And, you will need Git installed on your local machine. Then, just follow the instructions provided after you created the repository: One note about using Git here: depending on your Git settings, your .gitignore file (a hidden file and a sibling to the .git folder created in your local root directory after you executed git commit), some files in your Drupal application may be ignored. In my case, all the files in the sites directory were ignored. If this happens, you will want to edit the .gitignore file so that these files aren’t ignored and redeploy. After you have deployed Drupal to Windows Azure Web Sites, you can continue to deploy updates via Git or FTP. Related information If you are looking for more information about Windows Azure Web Sites, these posts might be helpful: Windows Azure Websites- A PHP Perspective Windows Azure Websites, Web Roles, and VMs- When to use which- Configuring PHP in Windows Azure Websites with .user.ini Files One last thing you might consider, depending on your site, is using the Windows Azure Integration Module to store and serve your site’s media files.

August 19, 2012

by Brian Swan

· 10,268 Views