
Mule ESB in Docker

In this article I will attempt to run the Mule ESB community edition in Docker in order to see whether it is feasible without major inconvenience. My goal is to be able to use Docker both for testing and in a production environment, in order to gain better control over the environment and to separate different types of environments. I imagine that most of the Docker-related information can be applied to other applications; I have used Mule since it is what I usually work with.

The conclusion I have reached after completing my experiments is that it is possible to run Mule ESB in Docker without any inconvenience. In addition, Docker will indeed allow me to have better control over the different environments and to separate them as I find appropriate.

Finally, I just want to mention that I have used Docker in an Ubuntu environment. I have not attempted any of the exercises in Docker running on Windows or Mac OS X.

Docker Briefly

In short, Docker allows the creation of images that serve as blueprints for containers. A Docker container is an instance of a Docker image in the same way a Java object is an instance of a Java class.

The Dockerfile of the Mule image used later in this article:

    FROM codingtony/java
    MAINTAINER tony(dot)bussieres(at)ticksmith(dot)com
    RUN wget https://repository.mulesoft.org/nexus/content/repositories/releases/org/mule/distributions/mule-standalone/3.5.0/mule-standalone-3.5.0.tar.gz
    RUN cd /opt && tar xvzf ~/mule-standalone-3.5.0.tar.gz
    RUN echo "4a94356f7401ac8be30a992a414ca9b9 /mule-standalone-3.5.0.tar.gz" | md5sum -c
    RUN rm ~/mule-standalone-3.5.0.tar.gz
    RUN ln -s /opt/mule-standalone-3.5.0 /opt/mule
    CMD [ "/opt/mule/bin/mule" ]

The resource isolation features of Linux are used to create Docker containers, which are more lightweight than virtual machines and are separated from the environment in which Docker runs, the host. Using Docker, an image can be created that has a known state every time it is started.
In order to remove any doubts about whether the environment has been altered in any way, the container can be stopped and a new container started. I can even run multiple Docker containers on one and the same computer to simulate a multi-server production environment. Applications can also be run in their own Docker containers, as shown in this figure.

Figure: Three Docker containers, each containing a specific application, running in one host.

A more detailed introduction to Docker is available here. The main entry point to the Docker documentation can be found here.

Motivation

Some of the motivations I have for using Docker in both testing and production environments are:

– The environment in which I test my application should be as similar to the final deployment environment as possible, if not identical.
– Making the deployment environment easy to scale up and down. If it is easy to start a new processing node when the need arises and stop it when it is no longer used, I will be able to adapt to changes rather quickly and thus reduce errors caused by, for instance, load peaks.
– Maintaining an increased number of nodes to which applications can be deployed. Instead of running one instance of some kind of application server, Mule ESB in my case, on a computer, I want multiple instances that are partitioned, for instance, according to importance. High-priority applications run on one separate instance, which has higher priority both as far as resources (CPU, memory, disk etc.) are concerned and as far as support is concerned. Applications which are less critical run on another instance.
– Enabling quick replacement of instances in the deployment environment. Reasons for having to replace instances may be hardware failure etc.
– Better control over the contents of the different environments. The concept of an environment that, at any time, may be disposed of (and restarted) discourages hacks in the environment, which are usually poorly documented and sometimes difficult to trace.
  Using Docker, I need to change the appropriate Docker image if I want to make changes to some application environment. The Docker image file, commonly known as Dockerfile, can be checked into any ordinary revision control system, such as Git, Subversion etc., making changes reversible and traceable.
– Automating the creation of a testing environment. An example could be a nightly job that runs on my build server which creates a test environment, deploys one or more applications to it and then performs tests, such as load-testing.

Prerequisites

To get the best possible experience when running Docker, I run it under Ubuntu. According to the current documentation, Docker is supported under the following versions of Ubuntu:

– 12.04 LTS (64-bit)
– 13.04 (64-bit)
– 13.10 (64-bit)
– 14.04 (64-bit)

Contrary to my usual conservative self, I chose Ubuntu 14.10, which at the time of writing this article is the latest version. While I haven’t run into any issues, I cannot promise anything regarding compatibility with Docker as far as this version of Ubuntu is concerned.

Installing Docker

Those who have installed the Docker version from the Ubuntu repository should remove it before installing a newer version of Docker, since the Ubuntu repository does not contain the most recent version and the package does not have the same name as the Docker package we will install:

    sudo apt-get remove docker.io

The simplest way to install Docker is to use an installation script made available at the Docker website:

    curl -sSL https://get.docker.com/ubuntu/ | sudo sh

If you are not running Ubuntu or if you do not want to use the above way of installing Docker, please refer to this page containing instructions on how to install Docker on various platforms.
To verify the Docker installation, open a terminal window and enter:

    sudo docker version

Output similar to the following should appear:

    Client version: 1.4.1
    Client API version: 1.16
    Go version (client): go1.3.3
    Git commit (client): 5bc2ff8
    OS/Arch (client): linux/amd64
    Server version: 1.4.1
    Server API version: 1.16
    Go version (server): go1.3.3
    Git commit (server): 5bc2ff8

We are now ready to start a Mule instance in Docker.

Running Mule in Docker

One of the advantages of Docker is that there is a large repository of Docker images that are ready to be used, and even extended if one so wishes. This Docker image is the one that I will use in this article. It is well documented, there is a source repository and it contains a recent version of the Mule ESB Community Edition.

Some additional details on the Docker image:

– Ubuntu 14.04.
– Oracle Java SE 1.7.0_65. This version will change as the PPA containing the package is updated.
– Mule ESB CE 3.5.0

Note that the image may change at any time and the specifications above may have changed. If you intend to use Docker in your organization, I would suspect that the best alternative is to create your own Docker images that are totally under your control. The Docker image repository is an excellent source of inspiration and aid even in this case.

Starting a Docker Container

To start a Docker container using this image, open a terminal window and write:

    sudo docker run codingtony/mule

The first time an image is used it needs to be downloaded and created. This usually takes quite some time, so I suggest a short break here, perhaps for a cup of coffee or tea. If you just want to download an image without starting it, replace the Docker command “run” with “pull”.

Once the container is started, you will see some output to the console. If you are familiar with Mule, you will recognize the log output:

    MULE_HOME is set to /opt/mule-standalone-3.5.0
    Running in console (foreground) mode by default, use Ctrl-C to exit...
    MULE_HOME is set to /opt/mule-standalone-3.5.0
    Running Mule...
    --> Wrapper Started as Console
    Launching a JVM...
    Starting the Mule Container...
    Wrapper (Version 3.2.3) http://wrapper.tanukisoftware.org
    Copyright 1999-2006 Tanuki Software, Inc. All Rights Reserved.
    INFO 2015-01-05 04:41:42,302 [WrapperListener_start_runner] org.mule.module.launcher.MuleContainer:
    **********************************************************************
    * Mule ESB and Integration Platform                                  *
    * Version: 3.5.0 Build: ff1df1f3                                     *
    * MuleSoft, Inc.                                                     *
    * For more information go to http://www.mulesoft.org                 *
    *                                                                    *
    * Server started: 1/5/15 4:41 AM                                     *
    * JDK: 1.7.0_65 (mixed mode)                                         *
    * OS: Linux (3.16.0-28-generic, amd64)                               *
    * Host: f95698cfb796 (172.17.0.2)                                    *
    **********************************************************************

Note that in the text-box containing information about the Mule ESB and Integration Platform, there is a row which starts with “Host:”. The hexadecimal number that follows is the Docker container id and the IP-address is the external IP-address of the Docker container in which Mule is running.

Before we do anything with the Mule instance running in Docker, let’s take a look at Docker containers.

Docker Containers

We can verify that there is a Docker container running by opening another terminal window, or a tab in the first terminal window, and running the command:

    sudo docker ps

As a result, you will see output similar to the following (I have edited the output in order for the columns to be aligned with the column titles):

    CONTAINER ID   IMAGE                    COMMAND                CREATED     STATUS     PORTS   NAMES
    f95698cfb796   codingtony/mule:latest   "/opt/mule/bin/mule"   7 min ago   Up 7 min           jolly_hopper

From this output we can see:

– The ID of the container, which is f95698cfb796. This ID can be used when performing operations on the container, such as stopping it, restarting it etc.
– The name of the image used to create the container.
– The command that is currently executing.
  If we look at the Dockerfile for the image, we can see that the last line in this file is:

    CMD [ "/opt/mule/bin/mule" ]

  This is the command that is executed whenever an instance of the Docker image is launched and it matches what we see in the COMMAND column for the Docker container.
– The CREATED column shows how much time has passed since the container was created.
– The STATUS column shows the current status of the container. When you have used Docker for a while, you can view all the containers using:

    sudo docker ps -a

  This will show you containers that are not running, in addition to the running ones. Containers that are not running can be restarted.
– The PORTS column shows any port mappings for the container. More about port mappings later.
– Finally, the NAMES column contains a more human-friendly container name. This container name can be used in the same way as the container id.

Docker containers consume disk-space. If you want to determine how much disk-space each of the containers on your computer uses, issue the following command:

    sudo docker ps -a -s

An additional column, SIZE, will be shown and in this column I see that my Mule container consumes 41.76 kB. Note that this is in addition to the disk-space consumed by the Docker image. This number will grow if you use the container over a longer period of time, as the container retains any files written to disk.

To completely remove a stopped Docker container, find the id or name of the container and use the command:

    sudo docker rm [container id or name here]

Before going further, let’s stop the running container and remove it:

    sudo docker stop [container id or name here]
    sudo docker rm [container id or name here]

Files and Docker Containers

So far we have managed to start a Mule instance running inside a Docker container, but there were no Mule applications deployed to it and the logs that were generated were only visible in the terminal window.
I want to be able to deploy my applications to the Mule instance and examine the logs in a convenient way. In this section I will show how to:

– Share one or more directories in the host file-system with a Docker container.
– Access the files in a Docker container from the host.

As the first step in looking at sharing directories between the host operating system and a Docker container, we are going to look at Mule logs. As part of this exercise we also set up the directories in the host operating system that are going to be shared with the Docker container.

– In your home directory, create a directory named “mule-root”.
– In the “mule-root” directory, create three directories named “apps”, “conf” and “logs”.
– Download the Mule CE 3.5.0 standalone distribution from this link.
– From the Mule CE 3.5.0 distribution, copy the files in the “apps” directory to the “mule-root/apps” directory you just created.
– From the Mule CE 3.5.0 distribution, copy the files in the “conf” directory to the “mule-root/conf” directory you created.
  The resulting file- and directory-structure should look like this (shown using the tree command):

    ~/mule-root/
    ├── apps
    │   └── default
    │       └── mule-config.xml
    ├── conf
    │   ├── log4j.properties
    │   ├── tls-default.conf
    │   ├── tls-fips140-2.conf
    │   ├── wrapper-additional.conf
    │   └── wrapper.conf
    └── logs

– Edit the log4j.properties file in the “mule-root/conf” directory and set the log-level on the last line in the file to “DEBUG”. This modification has nothing to do with sharing directories, but is in order for us to be able to see some more output from Mule when we run it later. The last two lines should now look like this:

    # Mule classes
    log4j.logger.org.mule=DEBUG

Binding Volumes

We are now ready to launch a new Docker container and when we do, we will tell Docker to map three directories in the Docker container to three directories in the host operating system.

Figure: Three directories in a Docker container bound to three directories in the host.
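The directory preparation steps from the previous section could also be scripted. The following is only a sketch: the function name prepare_mule_root and its two arguments (the target directory and the location of an already unpacked Mule distribution) are my own assumptions and not part of the original steps.

```shell
#!/bin/sh
# prepare_mule_root ROOT DIST
# Hypothetical helper: creates the apps, conf and logs directories under ROOT
# and copies the default app and configuration from an unpacked Mule
# distribution located at DIST.
prepare_mule_root() {
    root="$1"
    dist="$2"

    # -p creates missing parent directories and makes the call safe to repeat.
    mkdir -p "$root/apps" "$root/conf" "$root/logs" || return 1

    if [ -d "$dist/apps" ] && [ -d "$dist/conf" ]; then
        # "dir/." copies the contents of dir rather than the directory itself.
        cp -R "$dist/apps/." "$root/apps/"
        cp -R "$dist/conf/." "$root/conf/"
    else
        echo "No distribution found at $dist; copy apps and conf manually." >&2
        return 1
    fi
}

# Example call, assuming the distribution was unpacked in the home directory:
#   prepare_mule_root "$HOME/mule-root" "$HOME/mule-standalone-3.5.0"
```

The log-level change in log4j.properties is left as a manual edit here, since it is only a temporary setting for this exercise.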
– Launch the Docker container with the command below. The -v option tells Docker that we want to make the contents of a directory in the host available at a certain path in the Docker container file-system. The -d option runs the container in the background and the terminal prompt will be available as soon as the id of the newly launched Docker container has been printed.

    sudo docker run -d -v ~/mule-root/apps:/opt/mule/apps -v ~/mule-root/conf:/opt/mule/conf -v ~/mule-root/logs:/opt/mule/logs codingtony/mule

– Examine the “mule-root” directory and its subdirectories in the host, which should now look like below. The files default-anchor.txt, mule-app-default.log, mule-domain-default.log and mule.log have been created by Mule.

    mule-root/
    ├── apps
    │   ├── default
    │   │   └── mule-config.xml
    │   └── default-anchor.txt
    ├── conf
    │   ├── log4j.properties
    │   ├── tls-default.conf
    │   ├── tls-fips140-2.conf
    │   ├── wrapper-additional.conf
    │   └── wrapper.conf
    └── logs
        ├── mule-app-default.log
        ├── mule-domain-default.log
        └── mule.log

– Examine the “mule.log” file using the command “tail -f ~/mule-root/logs/mule.log”. There should be periodic output written to the log file similar to the following:

    DEBUG 2015-01-05 12:05:37,216 [Mule.app.deployer.monitor.1.thread.1] org.mule.module.launcher.DeploymentDirectoryWatcher: Checking for changes...
    DEBUG 2015-01-05 12:05:37,216 [Mule.app.deployer.monitor.1.thread.1] org.mule.module.launcher.DeploymentDirectoryWatcher: Current anchors: default-anchor.txt
    DEBUG 2015-01-05 12:05:37,216 [Mule.app.deployer.monitor.1.thread.1] org.mule.module.launcher.DeploymentDirectoryWatcher: Deleted anchors:

– Stop and remove the container:

    sudo docker stop [container id or name here]
    sudo docker rm [container id or name here]

Direct Access to Docker Container Files

When running Docker under the Ubuntu OS it is also possible to access the file-system of a Docker container from the host file-system. It may be possible to do this under other operating systems too, but I haven’t had the opportunity to test this.
This technique may come in handy during development or testing with Docker containers for which you haven’t bound any volumes.

Note! If given the choice between volume binding, as seen above, and direct access to container files, as we will look at in this section, for something more than temporary file access, I would choose volume binding. Direct access to Docker container files relies on implementation details that I suspect may change in future versions of Docker if the developers find it suitable.

With all that said, let’s get the action started:

– Start a new Docker container:

    sudo docker run -d codingtony/mule

– Find the id of the newly launched Docker container:

    sudo docker ps

– Examine low-level information about the newly launched Docker container:

    sudo docker inspect [container id or name here]

  Output similar to this will be printed to the console (portions removed to conserve space):

    [{
        "AppArmorProfile": "",
        "Args": [],
        "Config": { ... },
        "Created": "2015-01-12T07:58:47.913905369Z",
        "Driver": "aufs",
        "ExecDriver": "native-0.2",
        "HostConfig": { ... },
        "HostnamePath": "/var/lib/docker/containers/68b40def7ad6a7f819bd654d5627ad1c3a0f40c84e0fb0f875760f1bd6790eef/hostname",
        "HostsPath": "/var/lib/docker/containers/68b40def7ad6a7f819bd654d5627ad1c3a0f40c84e0fb0f875760f1bd6790eef/hosts",
        "Id": "68b40def7ad6a7f819bd654d5627ad1c3a0f40c84e0fb0f875760f1bd6790eef",
        "Image": "bcd0f37d48d4501ad64bae941d95446b157a6f15e31251e26918dbac542d731f",
        "MountLabel": "",
        "Name": "/thirsty_darwin",
        "NetworkSettings": { ... },
        "Path": "/opt/mule/bin/mule",
        "ProcessLabel": "",
        "ResolvConfPath": "/var/lib/docker/containers/68b40def7ad6a7f819bd654d5627ad1c3a0f40c84e0fb0f875760f1bd6790eef/resolv.conf",
        "State": { ... },
        "Volumes": {},
        "VolumesRW": {}
    }]

– Locate the “Driver” node in the above output and ensure that its value is “aufs”. If it is not, you may need to modify the directory paths below, replacing “aufs” with the value of this node.
  Personally I have only seen the “aufs” value at this node so anything else is uncharted territory to me.
– Copy the long hexadecimal value that can be found at the “Id” node in the above output. This is the long id of the Docker container.
– In a terminal window, issue the following command, inserting the long id of your container where noted:

    sudo ls -al /var/lib/docker/aufs/mnt/[long container id here]

  You are now looking at the root of the volume used by the Docker container you just launched.
– In the same terminal window, issue the following command:

    sudo ls -al /var/lib/docker/aufs/mnt/[long container id here]/opt

  The output from this command should look like this:

    total 12
    drwxr-xr-x  4 root root 4096 jan 12 15:58 .
    drwxr-xr-x 75 root root 4096 jan 12 15:58 ..
    lrwxrwxrwx  1 root root   26 aug 10 04:19 mule -> /opt/mule-standalone-3.5.0
    drwxr-xr-x 17  409  409 4096 jan 12 15:58 mule-standalone-3.5.0

– Examine this line in the Dockerfile:

    RUN ln -s /opt/mule-standalone-3.5.0 /opt/mule

  We see that a symbolic link is created and that the directory name and the name of the symbolic link match the directory output in the previous step.
– To examine the Mule log file that we looked at when binding volumes earlier, use the following command:

    sudo cat /var/lib/docker/aufs/mnt/[long container id here]/opt/mule-standalone-3.5.0/logs/mule.log

– Next we create a new file in the Docker container using vi:

    sudo vi /var/lib/docker/aufs/mnt/[long container id here]/opt/mule-standalone-3.5.0/test.txt

– Enter some text into the new file by first pressing i and then typing the text. When you are finished entering the text, press the Escape key and type the characters “:wq” without quotes. This writes the new contents of the file to disk and quits the editor.
– Leave the Docker container running after you are finished.
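Copying the long container id by hand is error-prone, so the path used in the steps above could be assembled by a small helper. This is a sketch under assumptions: the function name container_root is hypothetical, and the /var/lib/docker/[driver]/mnt/[id] layout is only the one observed above with the “aufs” driver; other drivers or Docker versions may lay out their files differently.

```shell
#!/bin/sh
# container_root DRIVER LONG_ID
# Hypothetical helper: prints the host path under which the container's
# file system was found in the steps above.
container_root() {
    printf '/var/lib/docker/%s/mnt/%s\n' "$1" "$2"
}

# Example (needs a running Docker daemon, so it is left commented out).
# The --format option of "docker inspect" extracts single nodes from the
# output shown earlier; thirsty_darwin is the container name from that output.
#   id=$(sudo docker inspect --format '{{.Id}}' thirsty_darwin)
#   driver=$(sudo docker inspect --format '{{.Driver}}' thirsty_darwin)
#   sudo ls -al "$(container_root "$driver" "$id")/opt"
```

Keeping the path construction in one place means only this function has to change if the storage layout turns out to be different on another system.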
In the next section, we are going to look at the file we just created from inside the Docker container.

We have seen that we can examine the file system of a Docker container without binding volumes. It is also possible to copy or move files from the host file-system to the container’s file system using the regular commands. Root privileges are required both when examining and writing to the Docker container’s file system.

Entering a Docker Container

In order to verify that the file we just created in the host was indeed written to the Docker container, we are going to start a bash shell in the running Docker container and examine the location where the new file is expected to be located and the contents of the file. In the process we will see how we can execute commands in a Docker container from the host.

– Issue the command below in a terminal window. The exec Docker command is used to run a command, bash in this case, in a running Docker container. The -i flag tells Docker to keep the input stream open while the command is being executed. In this example, it allows us to enter commands into the bash shell running inside the Docker container. The -t flag causes Docker to allocate a text terminal to which the output from the command execution is printed.

    sudo docker exec -i -t [container id or name here] bash

– Note the prompt, which should change to [user]@[Docker container id]. In my case it looks like this:

    root@3ea374a280da:/#

– Go to the Mule installation directory using this command:

    cd /opt/mule-standalone-3.5.0/

– Examine the contents of the directory:

    ls -al

  Among the other files, you should see the “test.txt” file:

    -rw-r--r-- 1 root root 53 Jan 14 03:19 test.txt

– Examine the contents of the “test.txt” file. The contents of the file should match what you entered earlier.
    cat test.txt

– Exit to the host OS:

    exit

– Stop and remove the container:

    sudo docker stop [container id or name here]
    sudo docker rm [container id or name here]

We have seen that we can execute commands in a running Docker container. In this particular example, we used it to execute the bash shell and examine a file. I draw the conclusion that I should be able to set up a Docker image that contains a very controlled environment for some type of test and then create a container from that image and start the test from the host.

Deploying a Mule Application

In this section we will look at deploying a Mule application to an instance of the Mule ESB running in a Docker container. We will use volume binding, which we looked at in the section on files and Docker containers, to share directories in the host with the Docker container in order to make it easy to deploy applications, modify running applications, examine logs etc.

Preparations

Before deploying the application, we need to make some preparations.

First of all, we restore the original log-level that we changed earlier. In this example, there will be log output when the application we will deploy is run and we can limit the log generated by Mule.

– Edit the log4j.properties file in the “mule-root/conf” directory in the host and set the log-level on the last line in the file back to “INFO” and add one line, as in the listing below. The last three lines should now look like this:

    # Mule classes
    log4j.logger.org.mule=INFO
    log4j.logger.org.mule.tck.functional=DEBUG

Next, we create the Mule application which we will deploy to the Mule ESB running in Docker:

– In some directory, create a file named “mule-deploy.properties” with the following contents:

    redeployment.enabled=true
    encoding=UTF-8
    domain=default
    config.resources=HelloWorld.xml

– In the same directory create a file named “HelloWorld.xml”.
  This file contains the Mule configuration for our example application:
– Create a zip-archive named “mule-hello.zip” containing the two files created above:

    zip mule-hello.zip mule-deploy.properties HelloWorld.xml

Deploy the Mule Application

Before you start the Docker container in which the Mule ESB will run, make sure that you have created and prepared the directories in the host as described in the section Files and Docker Containers above.

– Start a new Mule Docker container using the command that we used when binding volumes:

    sudo docker run -d -v ~/mule-root/apps:/opt/mule/apps -v ~/mule-root/conf:/opt/mule/conf -v ~/mule-root/logs:/opt/mule/logs codingtony/mule

  As before, the -v option tells Docker to bind three directories in the host to three locations in the Docker container’s file system.
– Find the IP-address of the Docker container:

    sudo docker inspect [container id or name here] | grep IPAddress

  In my case, I see the following line which reveals the IP-address of the Docker container:

    "IPAddress": "172.17.0.2",

– Open a terminal window or tab and examine the Mule log. Leave this window or tab open during the exercise, in order to be able to verify the output from Mule.

    tail -f ~/mule-root/logs/mule.log

– Copy the zip-archive “mule-hello.zip” created earlier to the host directory ~/mule-root/apps/.
– Verify that the application has been deployed without errors in the Mule log:

    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    + Started app 'mule-hello'                                 +
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

– Leave the Docker container running after you are finished. In the next section we will look at how to access endpoints exposed by applications running in Docker containers.

By binding directories in the host, thus making them available in the Docker container, it becomes very simple to deploy Mule applications to an instance of Mule ESB running in a Docker container.
I am considering this setup for a production environment as well, since it will enable me to perform backups of the directories containing Mule applications and configuration without having to access the Docker container’s file system. It is also in accord with the idea that a Docker container should be able to be quickly and easily restarted, which I feel it would not be if I had to deploy a number of Mule applications to it in order to recreate its previous state.

Accessing Endpoints

We now know that we can run the Mule ESB in a Docker container, and that we can deploy applications and examine the logs quite easily, but one final, very important question remains to be answered: how to access endpoints exposed by applications running in a Docker container.

This section assumes that the Mule application we deployed to Mule in the previous section is still running.

– In the host, open a web-browser and issue a request to the Docker container’s IP-address at port 8181. In my case, the URL is http://172.17.0.2:8181
  Alternatively use the curl command in a terminal window. In my case I would write:

    curl 172.17.0.2:8181

  The result should be a greeting in the following format:

    Hello World! It is now: 2015-01-14T07:39:03.942Z

  In addition, you should be able to see that a message was received in the Mule log.
– Now try the URL http://localhost:8181
  You will get a message saying that the connection was refused, provided that you do not already have a service listening at that port.
– If you have another computer available that is connected to the same network as the host computer running Ubuntu, do the following:
  – Find the IP-address of the Ubuntu host computer using the ifconfig command.
  – In a web-browser on the other computer, try accessing port 8181 at the IP-address of the Ubuntu host computer.
  Again you will get a message saying that the connection was refused.
– Stop and remove the container:

    sudo docker stop [container id or name here]
    sudo docker rm [container id or name here]

Without any particular measures taken, we see that we can access a service exposed in a Docker container from the Docker host, but we did not succeed in accessing the service from another computer. To make a service exposed in a Docker container reachable from outside of the host, we need to tell Docker to publish a port from the Docker container to a port in the host using the -p flag:

– Launch a new Docker container using the following command:

    sudo docker run -d -p 8181:8181 -v ~/mule-root/apps:/opt/mule/apps -v ~/mule-root/conf:/opt/mule/conf -v ~/mule-root/logs:/opt/mule/logs codingtony/mule

  The added flag -p 8181:8181 makes the service exposed at port 8181 in the Docker container available at port 8181 in the host.
– Try accessing the URL http://localhost:8181 from a web-browser on the host computer. The result should be a greeting of the form we have seen earlier.
– Try accessing port 8181 at the IP-address of the Ubuntu host computer from another computer. This should also result in a greeting message.
– Stop and remove the container:

    sudo docker stop [container id or name here]
    sudo docker rm [container id or name here]

Using the -p flag, we have seen that we can expose a service in a Docker container so that it becomes accessible from outside of the host computer. However, we also see that this information needs to be supplied at the time of launching the Docker container.

The conclusions that I draw from this are:

– I can test and develop against a Mule ESB instance running in a Docker container without having to publish any ports, provided that my development computer is the Docker host computer.
– In a production environment, or any other environment that needs to expose services running in a Docker container to “the outside world” and where services will be added over time, I would consider deploying an Apache HTTP Server or NGINX on the Docker host computer and use it to proxy the services that are to be exposed. This way I can avoid re-launching the Docker container each time a new service is added and I can even (temporarily) redirect the proxy to some other computer if I need to perform some maintenance.

Is There More?

Of course! This article should only be considered an introduction and I am just a beginner with Docker. I hope I will have the time and inspiration to write more about Docker as I learn more.

January 20, 2015

by Ivan K


Docker Orchestration... What It Means and Why You Need It

[This article was written by Yaron Parasol.]

Docker containers were created to help enable the fast and reliable deployment of application components or tiers, by creating a container that holds a self-contained, ready-to-deploy part of an application, with the middleware and the app business logic needed to run it successfully. For example, a Spring application within a Tomcat container. By design, a Docker container is purposely an isolated, self-contained part of the application, typically one tier or even one node in a tier.

However, an application is typically multi-tier in its architecture, and that means you have tiers with dependencies between them, where the nature of the dependencies can be anything from network connections and remote API invocations to the exchange of messages between application tiers. Hence an app is a set of different containers with specific configurations. This is why you need a way to glue the pieces of your app together. While Docker has a basic solution for connecting containers using a Docker bridge, this solution is not always the preferred one, especially when deploying containers across different hosts, where you need to take care of real network settings.

So, what role does the orchestrator play? The orchestrator will take care of two things:

– The timing of container creation, as containers need to be created in order of their dependencies.
– Container configuration, in order to allow containers to communicate with one another. For that, the orchestrator needs to pass runtime properties between containers.

As a side note: with Docker you need a special tweak here. As you typically don’t touch config files inside a container and keep the container intact, there is an interesting workaround for cases where this is required.
One method to do this is by using a YAML-based orchestration plan to orchestrate the deployment of apps and post-deployment automation processes, which is the approach Cloudify employs. Based on TOSCA (Topology and Orchestration Specification for Cloud Applications), this orchestration plan describes the components and their lifecycle, and the relationships between components, especially when it comes to complex topologies. This includes what’s connected to what, what’s hosted on what, and other such considerations. TOSCA is able to describe the infrastructure as well as the middleware tier and the app layers on top of these.

Cloudify basically takes this TOSCA orchestration plan (dubbed a blueprint in Cloudify speak) and materializes it using workflows that traverse the graph of components described by the plan and issue commands to agents. These then create the app components and glue them together. The agents use extensions called plugins, which are adaptors between the Cloudify configuration and the APIs of the various infrastructure as a service (IaaS) and automation tools. In our case, we created a plugin to interface with the Docker API.

Introducing the Docker Cloudify Plugin

The Cloudify-Docker plugin is quite straightforward: it installs the Docker API endpoint/server on the machine and then uses the Docker-Py binding to create, configure, and remove containers.

TOSCA lifecycle events are:

– Create: installation of the app components
– Configure: configuration of the component
– Start: startup/running of the component
– There are also stop and delete, for shutdown and removal

We started by using create to create the container and start to run the application; we did not implement configure at the beginning. But then we realized that for containers with dependencies we need to have runtime properties, such as the IP and port of the counterpart container, in order to create the container.
When we create an app server container, we need the port and IP of the database container. So we pushed the creation of the container to the configure event, and used a TOSCA relationship pre-configure hook to get the dependent container’s info at runtime. The way to expose the runtime info to the container with the dependencies is by setting it as environment variables.

    interfaces:
      cloudify.interfaces.lifecycle:
        configure:
          implementation: docker.docker_plugin.tasks.configure
          inputs:
            container_config:
              command: mongod --rest --httpinterface --smallfiles
              image: dockerfile/mongodb
        start:
          implementation: docker.docker_plugin.tasks.run
          inputs:
            container_start:
              port_bindings:
                27017: 27017
                28017: 28017

Nodecellar Example

I’d like to explain how this works by using our Nodecellar app as an example. The Nodecellar app is composed of two hosts that, in this case, Cloudify didn’t create but just SSHed into and then installed agents on. On one we have the MongoD container, with a MongoD process. On the other we have the Nodecellar container with NodeJS and the Nodecellar app within it. The Nodecellar container needs a connection to the MongoD container to run the app queries when the app starts.

Ultimately, an orchestrator should not be limited to software deployment. The whole idea behind Docker is to allow for agility, so we’d also like to use Docker in situations such as auto-scaling, auto-healing, and continuous delivery (CD). In our next post we’ll show exactly that: how Cloudify can be used with Docker for post-deployment scenarios.
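To illustrate the idea of handing a dependency's runtime properties to a dependent container as environment variables, here is a small sketch. It is not the Cloudify plugin's code: the function names, the `MONGO` prefix, and the property keys `ip` and `port` are all assumptions made for illustration.

```python
def dependency_env(prefix, runtime_properties):
    """Turn one dependency's runtime properties into environment variables.

    `prefix` and the property keys are hypothetical; an orchestrator would
    collect the real values once the dependency is up.
    """
    return {
        f"{prefix}_{key.upper()}": str(value)
        for key, value in sorted(runtime_properties.items())
    }


def container_environment(base_env, dependencies):
    """Merge a container's own environment with its dependencies' info."""
    env = dict(base_env)  # copy so the caller's mapping is not modified
    for prefix, properties in dependencies.items():
        env.update(dependency_env(prefix, properties))
    return env


if __name__ == "__main__":
    # Illustrative values only: where the MongoD container ended up.
    mongod_runtime = {"ip": "10.0.0.5", "port": 27017}
    print(container_environment({"NODE_ENV": "production"},
                                {"MONGO": mongod_runtime}))
```

The resulting mapping is what would be passed as the environment when the dependent container is created, so its image and config files stay untouched.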

December 2, 2014

by Sharone Zitzman

· 17,889 Views

From Vaadin to Docker - A Novice's Journey

I’m a huge Vaadin fan and I’ve created a Github workshop I can demo at conferences. A common issue with this kind of workshop is that attendees have to prepare their workstations in advance… and there’s always a significant part of them that comes with not everything ready. At this point, two options are available to the speaker: either wait for each attendee to finish the preparation (too bad for the people who took the time to do that at home), or start anyway and lose the part of the audience that isn’t ready. Given the current buzz around Docker, I thought it could be a very good way to make the workshop preparation quicker (only one step) and hassle-free (no problem regarding the quirks of your operating system). The steps I ask attendees to complete are the following:

Install Git
Install Java, Maven and Tomcat
Clone the git repo
Build the project (to prepare the Maven repository)
Deploy the built webapp
Start Tomcat

These should be directly automated with Docker. As I wasted much time getting this to work, here’s the tale of my journey in achieving this (be warned, it’s quite long). If you’ve got similar use-cases, I hope it will help you get things done faster.

Starting with Docker

The first step was to get to know the basics of Docker. Fortunately, I had the chance to attend a Docker workshop by David Gageot at Duchess Swiss. This included both Docker installation and the basics of the Dockerfile. I assume readers likewise have a basic understanding of Docker. For those who don’t, I guess browsing Docker’s official documentation is a nice idea, in particular the Installation pages and the Dockerfile reference.

Building my first Dockerfile

The Docker image can be built with the following command, run in the directory of the Dockerfile:

    $ docker build -t vaadinworkshop .
One of the first issues one can encounter when playing with Docker for the first time is the following error message:

    Get http:///var/run/docker.sock/v1.14/containers/json: dial unix /var/run/docker.sock: no such file or directory

The reason is that one didn’t export the required environment variables displayed by the boot2docker information message. If you lost the exact data, no worry, just use the shellinit boot2docker parameter:

    $ boot2docker shellinit
    Writing /Users/i303869/.docker/boot2docker-vm/ca.pem:
    Writing /Users/i303869/.docker/boot2docker-vm/cert.pem:
    Writing /Users/i303869/.docker/boot2docker-vm/key.pem:
    export DOCKER_HOST=tcp://192.168.59.103:2376
    export DOCKER_CERT_PATH=/Users/i303869/.docker/boot2docker-vm

Copy-pasting the export lines above will solve the issue. These can also be set in one’s .bashrc script as it seems these values seldom change. Next in line is the following error:

    Get http://192.168.59.103:2376/v1.14/containers/json: malformed HTTP response "\x15\x03\x01\x00\x02\x02"

This error message seems to be caused by a mismatch between the versions of the client and the server. It seems it is because of a bug on Mac OS X when upgrading. For a long-term solution, reinstall Docker from scratch; for a quick fix, use the --tls flag with the docker command. As it is quite cumbersome to type it every time, one can alias it:

    $ alias docker="docker --tls"

My last mistake when building the image came from building the Dockerfile from a non-empty directory. Docker sends every file it finds in the directory of the Dockerfile to the Docker daemon as the build context:

    $ docker --tls build -t vaadinworkshop .
    Sending build context to Docker daemon Too many kB

Fix: do not try this at home, and start from a directory containing only the Dockerfile.

Starting from scratch

Dockerfiles describe images, and images are built as a layered list of instructions. Docker images are designed around single inheritance: an image has to declare a single parent.
An image requiring no parent starts from scratch, but Docker provides 4 official base distributions: busybox, debian, ubuntu and centos (operating systems are generally a good start). Whatever you want to achieve, it is necessary to choose the right parent. Given the requirements I set for myself (Java, Maven, Tomcat and Git), I tried to find the right starting image. Many Dockerfiles are already available online on the Docker hub. The browsing app is quite good, but to be really honest, the search can really be improved.

My intention was to use the image that matched most of my requirements, then fill the gap. I could find no image providing Git, but I thought the dgageot/maven Dockerfile would be a nice starting point. The problem is that the base image is a busybox and provides no installer out-of-the-box (apt-get, yum, whatever). For this reason, David uses a lot of curl to get Java 8 and Maven in his Dockerfiles. I foolishly thought I could use a different flavor of busybox that provides the opkg installer. After a while, I had accumulated many problems, resolving one only leading to another. In the end, I finally decided to use the OS I was most comfortable with and to install everything myself:

    FROM ubuntu:utopic

Scripting Java installation

Installing the git, maven and tomcat packages is very straightforward (if you don’t forget to use the non-interactive options) with RUN and apt-get:

    RUN apt-get update && \
        apt-get install -y --force-yes git maven tomcat8

Java doesn’t fall into this nice pattern, as Oracle wants you to accept the license. Nice people did however publish it to a third-party repo. The steps are the following:

Add the needed package repository
Configure the system to automatically accept the license
Configure the system to add un-certified packages
Update the list of repositories
At last, install the package. Also add a package for Java 8 system configuration.
    RUN echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu precise main" | tee -a /etc/apt/sources.list && \
        echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections && \
        apt-key adv --keyserver keyserver.ubuntu.com --recv-keys EEA14886
    RUN apt-get update && \
        apt-get install -y --force-yes oracle-java8-installer oracle-java8-set-default

Building the sources

Getting the workshop’s sources and building them is quite straightforward with the following instructions:

    RUN git clone https://github.com/nfrankel/vaadin7-workshop.git
    WORKDIR /vaadin7-workshop
    RUN mvn package

The drawback of this approach is that Maven will start from a fresh repository, and thus download the Internet the first time it is launched. At first, I wanted to mount a volume from the host to the container to share the ~/.m2/repository folder to avoid this, but I noticed this could only be done at runtime through the -v option, as the VOLUME instruction cannot point to a host directory.

Starting the image

The simplest command to start the created Docker image is the following:

    $ docker run -p 8080:8080 vaadinworkshop

Do not forget the port forwarding from the container to the host; 8080 is Tomcat’s default HTTP port. Also, note that it’s not necessary to run the container as a daemon (with the -d option). The added value of running in the foreground is that the standard output of the CMD (see below) will be redirected to the host. When running as a daemon and wanting to check the logs, one has to execute bash in the container, which requires a sequence of cumbersome manipulations.
Configuring and launching Tomcat Tomcat can be launched when starting the container by just adding the following instruction to the Dockerfile: CMD ["catalina.sh", "run"] However, trying to start the container at this point will result in the following error: Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.ClassLoaderFactory validateFile WARNING: Problem with directory [/usr/share/tomcat8/common/classes], exists: [false], isDirectory: [false], canRead: [false] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.ClassLoaderFactory validateFile WARNING: Problem with directory [/usr/share/tomcat8/common], exists: [false], isDirectory: [false], canRead: [false] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.ClassLoaderFactory validateFile WARNING: Problem with directory [/usr/share/tomcat8/server/classes], exists: [false], isDirectory: [false], canRead: [false] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.ClassLoaderFactory validateFile WARNING: Problem with directory [/usr/share/tomcat8/server], exists: [false], isDirectory: [false], canRead: [false] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.ClassLoaderFactory validateFile WARNING: Problem with directory [/usr/share/tomcat8/shared/classes], exists: [false], isDirectory: [false], canRead: [false] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.ClassLoaderFactory validateFile WARNING: Problem with directory [/usr/share/tomcat8/shared], exists: [false], isDirectory: [false], canRead: [false] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.Catalina initDirs SEVERE: Cannot find specified temporary folder at /usr/share/tomcat8/temp Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.Catalina load WARNING: Unable to load server configuration from [/usr/share/tomcat8/conf/server.xml] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.Catalina initDirs SEVERE: Cannot find specified temporary folder at /usr/share/tomcat8/temp Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.Catalina load 
WARNING: Unable to load server configuration from [/usr/share/tomcat8/conf/server.xml] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.Catalina start SEVERE: Cannot start server. Server instance is not configured.

I have no idea why, but it seems Tomcat 8 on Ubuntu is not configured in any meaningful way. Everything is available, but we need some symbolic links here and there, and we need to create the temp directory. This translates into the following instruction in the Dockerfile:

    RUN ln -s /var/lib/tomcat8/common $CATALINA_HOME/common && \
        ln -s /var/lib/tomcat8/server $CATALINA_HOME/server && \
        ln -s /var/lib/tomcat8/shared $CATALINA_HOME/shared && \
        ln -s /etc/tomcat8 $CATALINA_HOME/conf && \
        mkdir $CATALINA_HOME/temp

The final trick is to connect the exploded webapp folder created by Maven to Tomcat’s webapps folder, where it looks for deployments:

    RUN mkdir $CATALINA_HOME/webapps && \
        ln -s /vaadin7-workshop/target/workshop-7.2-1.0-SNAPSHOT/ $CATALINA_HOME/webapps/vaadinworkshop

At this point, the Holy Grail is not far away, you just have to browse the URL… if only we knew what the IP was. Since I am running on Mac, there’s an additional VM besides the host and the container that’s involved. To get this IP, type:

    $ boot2docker ip
    The VM's Host only interface IP address is: 192.168.59.103

Now, browsing http://192.168.59.103:8080/vaadinworkshop/ will bring us to the familiar workshop screen.

Developing from there

Everything works fine, but didn’t we just forget about one important thing, like how workshop attendees are supposed to work on the sources? Easy enough, just mount the volume when starting the container:

    docker run -v /Users//vaadin7-workshop:/vaadin7-workshop -p 8080:8080 vaadinworkshop

Note that the host volume must be part of /Users and, if on OS X, it requires boot2docker v. 1.3+. Unfortunately, this seems to be the showstopper, as mounting an empty directory from the host to the container will not make the container’s directory available from the host.
On the contrary, it will empty the container’s directory given that the host’s directory doesn’t exist… It seems there’s an issue in Docker on Mac. The installation of JHipster runs into the same problem, and proposes to use the Samba Docker folder sharing project. I’m afraid I was too lazy to go further at this point. However, this taught me much about Docker, its usages and use-cases (as well as OS X integration limitations). For those who are interested, you’ll find the Dockerfile below. Happy Docker!

    FROM ubuntu:utopic
    MAINTAINER Nicolas Frankel

    # Config to get to install Java 8 w/o interaction
    RUN echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu precise main" | tee -a /etc/apt/sources.list && \
        echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections && \
        apt-key adv --keyserver keyserver.ubuntu.com --recv-keys EEA14886
    RUN apt-get update && \
        apt-get install -y --force-yes git oracle-java8-installer oracle-java8-set-default maven tomcat8

    RUN git clone https://github.com/nfrankel/vaadin7-workshop.git
    WORKDIR /vaadin7-workshop
    RUN git checkout v7.2-1
    RUN mvn package

    ENV JAVA_HOME /usr/lib/jvm/java-8-oracle
    ENV CATALINA_HOME /usr/share/tomcat8
    ENV PATH $PATH:$CATALINA_HOME/bin

    # Configure Tomcat 8 directories
    RUN ln -s /var/lib/tomcat8/common $CATALINA_HOME/common && \
        ln -s /var/lib/tomcat8/server $CATALINA_HOME/server && \
        ln -s /var/lib/tomcat8/shared $CATALINA_HOME/shared && \
        ln -s /etc/tomcat8 $CATALINA_HOME/conf && \
        mkdir $CATALINA_HOME/temp && \
        mkdir $CATALINA_HOME/webapps && \
        ln -s /vaadin7-workshop/target/workshop-7.2-1.0-SNAPSHOT/ $CATALINA_HOME/webapps/vaadinworkshop

    VOLUME ["/vaadin7-workshop"]
    CMD ["catalina.sh", "run"]

    # docker build -t vaadinworkshop .
    # docker run -v ~/vaadin7-workshop training/webapp -p 8080:8080 vaadinworkshop

November 25, 2014

by Nicolas Fränkel

· 13,044 Views

ZooKeeper on Kubernetes

The last couple of weeks I've been playing around with docker and kubernetes. If you are not familiar with kubernetes, let's just say for now that it's an open source container cluster management implementation, which I find really really awesome. One of the first things I wanted to try out was running an Apache ZooKeeper ensemble inside kubernetes, and I thought that it would be nice to share the experience. For my experiments I used Docker v. 1.3.0 and Openshift V3, which I built from source and which includes Kubernetes.

ZooKeeper on Docker

Managing a ZooKeeper ensemble is definitely not a trivial task. You usually need to configure an odd number of servers and all of the servers need to be aware of each other. This is a PITA on its own, but it gets even more painful when you are working with something as static as docker images. The main difficulty could be expressed as: "How can you create multiple containers out of the same image and have them point to each other?"

One approach would be to use docker volumes and provide the configuration externally. This would mean that you create the configuration for each container, store it somewhere on the docker host and then pass the configuration to each container as a volume at creation time. I've never tried that myself, so I can't tell if it's a good or bad practice. I can see some benefits, but I can also see that this is something I am not really excited about. It could look like this:

    docker run -p 2181:2181 -v /path/to/my/conf:/opt/zookeeper/conf my/zookeeper

Another approach would be to pass all the required information as environment variables to the container at creation time, and then create a wrapper script which reads the environment variables, modifies the configuration files accordingly and launches zookeeper. This is definitely easier to use, but it's not flexible enough to perform other types of tuning without rebuilding the image itself.
Last but not least, one could combine the two approaches into one and do something like:

Make it possible to provide the base configuration externally using volumes.
Use env and scripting to just configure the ensemble.

There are plenty of images out there that take one or the other approach. I am more fond of the environment variables approach, and since I needed something that would follow some of the kubernetes conventions in terms of naming, I decided to hack an image of my own using the env variables way.

Creating a custom image for ZooKeeper

I will just focus on the configuration that is required for the ensemble. In order to configure a ZooKeeper ensemble, one has to assign a numeric id to each server and then add to its configuration an entry per zookeeper server that contains the ip of the server, the peer port of the server and the election port. The server id is added in a file called myid under the dataDir. The rest of the configuration looks like:

    server.1=server1.example.com:2888:3888
    server.2=server2.example.com:2888:3888
    server.3=server3.example.com:2888:3888
    ...
    server.current=[bind address]:[peer binding port]:[election binding port]

Note that if the server id is X, the server.X entry needs to contain the bind ip and ports and not the connection ip and ports. So what we actually need to pass to the container as environment variables are the following:

The server id.
For each server in the ensemble: the hostname or ip, the peer port and the election port.

If these are set, then the script that updates the configuration could look like:

    if [ ! -z "$SERVER_ID" ]; then
      echo "$SERVER_ID" > /opt/zookeeper/data/myid
      # Find the servers exposed in env.
      for i in `echo {1..15}`; do
        HOST=`envValue ZK_PEER_${i}_SERVICE_HOST`
        PEER=`envValue ZK_PEER_${i}_SERVICE_PORT`
        ELECTION=`envValue ZK_ELECTION_${i}_SERVICE_PORT`
        if [ "$SERVER_ID" = "$i" ]; then
          echo "server.$i=0.0.0.0:2888:3888" >> conf/zoo.cfg
        elif [ -z "$HOST" ] || [ -z "$PEER" ] || [ -z "$ELECTION" ]; then
          # If a server is not fully defined, stop the loop here.
          break
        else
          echo "server.$i=$HOST:$PEER:$ELECTION" >> conf/zoo.cfg
        fi
      done
    fi

For simplicity, the functions that read the keys and values from env are excluded. The complete image and helper scripts to launch zookeeper ensembles of variable size can be found in the fabric8io repository.

ZooKeeper on Kubernetes

The docker image above can be used directly with docker, provided that you take care of the environment variables. Now I am going to describe how this image can be used with kubernetes. But first a little rambling...

What I really like about using kubernetes with ZooKeeper is that kubernetes will recreate the container if it dies or the health check fails. For ZooKeeper this also means that if a container that hosts an ensemble server dies, it will get replaced by a new one. This helps keep a quorum of ZooKeeper servers available. I also like that you don't need to worry about the connection string that the clients will use if containers come and go. You can use kubernetes services to load balance across all the available servers and you can even expose that outside of kubernetes.

Creating a Kubernetes config for ZooKeeper

I'll try to explain how you can create a 3-server ZooKeeper ensemble in Kubernetes.
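Before moving on to the Kubernetes definitions, here is the ensemble logic of the wrapper script restated as a small, testable function. This is only a sketch of the same rules (own entry bound to 0.0.0.0, stop at the first server that is not fully defined); the function name and the `env` and `max_servers` parameters are assumptions, and a plain environment lookup stands in for the omitted envValue helper.

```python
import os


def ensemble_entries(server_id, env=None, max_servers=15):
    """Build the server.N lines for zoo.cfg from environment variables.

    Mirrors the shell loop: the entry for this server binds to 0.0.0.0,
    other entries come from ZK_PEER_<i>_SERVICE_HOST/PORT and
    ZK_ELECTION_<i>_SERVICE_PORT, and the loop stops at the first server
    that is not fully defined.
    """
    env = os.environ if env is None else env
    if not server_id:
        return []  # no server id: leave the configuration untouched
    entries = []
    for i in range(1, max_servers + 1):
        if str(server_id) == str(i):
            entries.append(f"server.{i}=0.0.0.0:2888:3888")
            continue
        host = env.get(f"ZK_PEER_{i}_SERVICE_HOST", "")
        peer = env.get(f"ZK_PEER_{i}_SERVICE_PORT", "")
        election = env.get(f"ZK_ELECTION_{i}_SERVICE_PORT", "")
        if not (host and peer and election):
            break
        entries.append(f"server.{i}={host}:{peer}:{election}")
    return entries


if __name__ == "__main__":
    # Illustrative environment for server 2 of a 3-server ensemble.
    example_env = {
        "ZK_PEER_1_SERVICE_HOST": "10.0.0.1",
        "ZK_PEER_1_SERVICE_PORT": "2888",
        "ZK_ELECTION_1_SERVICE_PORT": "3888",
        "ZK_PEER_3_SERVICE_HOST": "10.0.0.3",
        "ZK_PEER_3_SERVICE_PORT": "2888",
        "ZK_ELECTION_3_SERVICE_PORT": "3888",
    }
    print("\n".join(ensemble_entries(2, example_env)))
```

Returning the lines instead of appending to conf/zoo.cfg keeps the logic easy to check without a ZooKeeper installation.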
What we need is 3 docker containers, all running ZooKeeper with the right environment variables:

    {
      "image": "fabric8/zookeeper",
      "name": "zookeeper-server-1",
      "env": [
        { "name": "ZK_SERVER_ID", "value": "1" }
      ],
      "ports": [
        { "name": "zookeeper-client-port", "containerPort": 2181, "protocol": "TCP" },
        { "name": "zookeeper-peer-port", "containerPort": 2888, "protocol": "TCP" },
        { "name": "zookeeper-election-port", "containerPort": 3888, "protocol": "TCP" }
      ]
    }

The env needs to specify all the parameters discussed previously. So along with the ZK_SERVER_ID we need to add the following:

    ZK_PEER_1_SERVICE_HOST
    ZK_PEER_1_SERVICE_PORT
    ZK_ELECTION_1_SERVICE_PORT
    ZK_PEER_2_SERVICE_HOST
    ZK_PEER_2_SERVICE_PORT
    ZK_ELECTION_2_SERVICE_PORT
    ZK_PEER_3_SERVICE_HOST
    ZK_PEER_3_SERVICE_PORT
    ZK_ELECTION_3_SERVICE_PORT

An alternative approach, instead of adding all this configuration manually, could be to expose peer and election as kubernetes services. I tend to favor the latter approach as it can make things simpler when working with multiple hosts. It's also a nice exercise for learning kubernetes.

So how do we configure those services? To configure them we need to know:

the name of the port
the kubernetes pod that provides the service

The name of the port is already defined in the previous snippet. So we just need to find out how to select the pod. For this use case, it makes sense to have a different pod for each zookeeper server container. So we just need a label for each pod that designates it as a zookeeper server pod, and also a label that designates the zookeeper server id.

    "labels": {
      "name": "zookeeper-pod",
      "server": 1
    }

Something like the above could work. Now we are ready to define the service. I will just show how we can expose the peer port of the server with id 1 as a service.
The rest can be done in a similar fashion:

    {
      "apiVersion": "v1beta1",
      "creationTimestamp": null,
      "id": "zk-peer-1",
      "kind": "Service",
      "port": 2888,
      "containerPort": "zookeeper-peer-port",
      "selector": {
        "name": "zookeeper-pod",
        "server": 1
      }
    }

The basic idea is that in the service definition you create a selector which can be used to query/filter pods. Then you define the name of the port to expose, and this is pretty much it. Just to clarify, we need a service definition just like the one above per zookeeper server container. And of course we need to do the same for the election port.

Finally, we can define another kind of service, for the client connection port. This time we are not going to specify the server id in the selector, which means that all 3 servers will be selected. In this case kubernetes will load balance across all ZooKeeper servers. Since ZooKeeper provides a single system image (it doesn't matter which server you are connected to), this is pretty handy.

    {
      "apiVersion": "v1beta1",
      "creationTimestamp": null,
      "id": "zk-client",
      "kind": "Service",
      "port": 2181,
      "createExternalLoadBalancer": "true",
      "containerPort": "zookeeper-client-port",
      "selector": {
        "name": "zookeeper-pod"
      }
    }

I hope you found it useful. There is definitely room for improvement, so feel free to leave comments.
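Since one peer service and one election service are needed per server, the definitions can be generated rather than written by hand. The sketch below simply repeats the fields of the zk-peer-1 example for each server; the function names are hypothetical, and the `zk-election-N` id is an assumption chosen to line up with the ZK_ELECTION_N_SERVICE_PORT variables used earlier.

```python
import json


def zk_service(server_id, kind, port):
    """Build one per-server service definition (kind: 'peer' or 'election').

    Field names are copied from the zk-peer-1 example; nothing here is
    validated against a Kubernetes API.
    """
    return {
        "apiVersion": "v1beta1",
        "creationTimestamp": None,  # serialized as JSON null
        "id": f"zk-{kind}-{server_id}",
        "kind": "Service",
        "port": port,
        "containerPort": f"zookeeper-{kind}-port",
        "selector": {"name": "zookeeper-pod", "server": server_id},
    }


def ensemble_services(size=3):
    """Return the peer and election services for every server in the ensemble."""
    services = []
    for server_id in range(1, size + 1):
        services.append(zk_service(server_id, "peer", 2888))
        services.append(zk_service(server_id, "election", 3888))
    return services


if __name__ == "__main__":
    for service in ensemble_services(3):
        print(json.dumps(service, indent=2))
```

For a 3-server ensemble this yields six definitions; the shared zk-client service is still defined once, without the server label in its selector.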

November 3, 2014

by Ioannis Canellos

· 22,284 Views · 3 Likes

Sharding Pitfalls Part III: Chunk Balancing and Collection Limits

In Parts 1 and 2 we have covered a number of common issues people run into when managing a sharded MongoDB cluster. In this final post of the series we will cover a subtle, but important distinction in terms of balancing a sharded cluster as well as an interesting limitation that can be worked around relatively easily, but is nonetheless surprising when it comes up. 6. Chunk balancing != data balancing != traffic balancing The balancer in a sharded cluster cares about just one thing: Are chunks for a given collection evenly balanced across all shards? If they are not, then it will take steps to rectify that imbalance. This all sounds perfectly logical, and even with extra complexity like tagging involved the logic is pretty straight forward. If we assume that all chunks are equal, then we can rest assured that our data is being evenly balanced across all the shards in our cluster and rest easy at night. Although that is sometimes, perhaps even frequently, the case it is not always true - chunks are not always equal. There can be massive “jumbo” chunks that exceed the maximum chunk size (64MiB), completely empty chunks and everything in between. Let’s use an example from our first pitfall, the monotonically increasing shard key. For our example, we have picked just such a key to shard on (date), and up until this point we have had just one shard and had not sharded the collection. We are about to add a second shard to our cluster and so we enable sharding on the collection and do the necessary admin work to add the new shard into the cluster. Once the collection is enabled for sharding, the first shard contains all the newly minted chunks. Let’s represent them in a simplified table of 10 chunks. This is not representative of a real data set, but it will do for illustrative purposes: Table 1 - Initial Chunk Layout Now we add our second shard. The balancer will kick in and attempt to distribute the chunks evenly. 
It will do this by moving the lowest range chunks to the new shard until the counts are identical. Once it is finished balancing, our table now looks like this:

Table 2 - Balanced Chunk Layout

That looks pretty good at the moment, but let’s imagine that more recent chunks are more likely to have more activity (updates, say) than older chunks. Adding the traffic share estimates for each chunk shows that shard1 is taking far more traffic (72%) than shard2 (28%), despite the chunks seeming balanced overall based on the approximate size. Hence, chunk balancing is not equal to traffic balancing.

Using that same example, let’s add another wrinkle: periodic deletion of old data. Every 3 months we run a job to delete any data older than 12 months. Let’s look at the impact of that on our table after we run it for the first time (assuming the first run happens on July 1st 2015).

Table 3 - Post-Delete Chunk Layout

The distribution of data is now completely skewed toward shard1; shard2 is in fact empty! However, the balancer is completely unaware of this imbalance: the chunk count has remained the same the entire time, and as far as it is concerned the system is in a steady state. With no data on shard2, our traffic imbalance as seen above will be even worse, and we have essentially negated the benefit of having a second shard for this collection.

Possible Mitigation Strategies

If data and traffic balance are important, select an appropriate shard key
Move chunks manually to address the imbalances: swap “hot” chunks for “cool” chunks, empty chunks for larger chunks

7. Waiting too long to shard a collection (collection too large)

This is not very common, but when it falls on your shoulders, it can be quite challenging to solve. There is a maximum data size for a collection when it is initially split, which is a function of the chunk size and data size as noted on the limits page. If your collection contains less than 256GiB of data, then there will be no issue.
If the collection size exceeds 256GiB but is less than 400GiB, then MongoDB may be able to do an initial split without any special measures being taken. Otherwise, with larger initial data sizes and the default settings, the initial split will fail. It is worth noting that once split, the collection may grow as needed and without any real limitations, as long as you can continue to add shards as data size grows.

Possible Mitigation Strategies

Since the limit is dictated by the chunk size and the data size, and assuming there is not much to be done about the data size, the remaining variable is the chunk size. This is adjustable (default is 64MiB) and can be raised in order to let a large collection split initially and then reduced once that has been completed. The required chunk size increase will depend on the actual data size. However, this is relatively easy to work out: simply divide your data size by 256GiB and then multiply that figure by 64MiB (rounding up if it is not a nice even number). As an example, let’s consider a 4TiB collection:

4TiB divided by 256GiB = 16
64MiB x 16 = 1024MiB

Hence, set the max chunk size to 1024MiB, then perform the initial sharding of the collection, and then finally reduce the chunk size back to 64MiB using the same procedure.

Thanks for reading through the Sharding Pitfall series! If you want to learn more about managing MongoDB deployments at scale, sign up for my online education course, MongoDB Advanced Deployment and Operations.
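The chunk-size rule of thumb from the mitigation strategy above can be written as a short calculation. This is a sketch of that arithmetic only: the function name and parameters are hypothetical, and rounding the ratio up to a whole multiple of the default chunk size is one interpretation of "round up".

```python
GIB = 1024 ** 3
TIB = 1024 ** 4


def initial_chunk_size_mib(data_size_bytes,
                           default_chunk_mib=64,
                           split_limit_bytes=256 * GIB):
    """Estimate the temporary max chunk size for the initial split.

    Divides the data size by the 256GiB limit, rounds the ratio up to a
    whole number, and multiplies by the default chunk size.
    """
    # Ceiling division on integers avoids floating-point rounding.
    factor = -(-data_size_bytes // split_limit_bytes)
    return default_chunk_mib * max(1, factor)


if __name__ == "__main__":
    print(initial_chunk_size_mib(4 * TIB))    # the 4TiB example from the text
    print(initial_chunk_size_mib(100 * GIB))  # below the limit: default size
```

For the 4TiB example this reproduces the 1024MiB figure; for collections under 256GiB it returns the 64MiB default, meaning no change is needed.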

October 27, 2014

by Francesca Krihely

· 4,307 Views

Sharding Pitfalls Part II: Running a Sharded Cluster

By Adam Comerford, Senior Solutions Engineer In Part I we discussed important considerations when picking a shard key. In this post we will go through some recommendations when running a sharded cluster at scale. Scalability is one of the core benefits of sharding in MongoDB but this can give you a false sense of security; even with that flexibility, you still have to make smart decisions about how and when you deploy resources. In this post, we will cover a couple of common mistakes that people tend to make when it comes to running a sharded cluster. 3. Waiting too long to add a new shard (overloaded) You sharded your database and scaled horizontally for a reason, perhaps it was to add more memory or disk capacity. Whatever the reason, if your application usage grows over time so (generally) does your database utilization. Eventually, your current sharded cluster will pass a certain point, let’s call it 80% utilized (as a nice round estimate), such that it becomes problematic to add another shard. Why? Well, adding a new shard to a cluster is not free, and it is not instantaneous. It consumes resources and (initially) accepts very little traffic. Essentially, at the start of its existence, a newly added shard costs you capacity instead of adding capacity. The length of time it will stay in this state will depend on the balancer and how long it takes for a significant portion of “busy/active” chunks to move onto the new shard. It can often be easier to visualize this process, so let’s make up some hypothetical numbers and set the bar relatively low. Our imaginary existing cluster will be a set of 2 shards, with 2000 chunks (500 considered “active”) and to that we need to add a 3rd shard. This 3rd shard will eventually store one third of the active chunks (and total chunks). The question is, when does this shard stop adding overhead overall and instead become an asset? 
In reality, this will vary from cluster to cluster and depend on many variables; in other words, you need good metrics about your cluster, particularly your load bottleneck. Therefore we will once again use our imaginations and go with a relatively low bar: when 5% of active chunks (that is, those chunks seeing the most traffic) have migrated to the new shard, you should expect a net gain in performance. In our imaginary system we have evaluated our load levels and the expected impact of migrations, and have determined that once that 5% threshold of active chunks has been migrated, the new shard can be considered a net gain for the overall system. Once all chunks have been balanced, the migration overhead disappears, but initially this is an expected trade-off.

[Chart: time taken for a new shard to reach a net positive contribution to the cluster; the dotted line marks net gain.]

In this fabricated example, it takes almost 2 hours for the new shard to attain a viable level of active chunks and be considered a net gain for the overall system. Although these numbers are fictional, they are based on setups we have seen in real systems with moderate load. From there it is relatively easy to imagine this set of migrations taking even longer on an overloaded set of shards, and our newly added shard taking far longer to cross the threshold and become a net gain. As such, it is best to be proactive and add capacity before it becomes a necessity.

Possible Mitigation Strategies

- Manually balance targeted “hot” chunks (chunks that are accessed more than others) to move activity to the new shard more quickly
- Add the shard at a low-traffic time so that there is less competition for resources
- Disable balancing on some collections, prioritising the balancing of busy collections first

4. Under-provisioning Config Servers

Provisioning enough resources without being wasteful is always tricky, and all the more so in a complicated distributed system like a MongoDB sharded cluster. Everyone wants to use their hardware, virtual instances, virtual machines, containers and the like in the most efficient way possible, and get the best bang for their buck. Hence it is only natural to look at the various pieces of a distributed cluster for less utilized components that could be put on less expensive resources.

The most common pitfall here with MongoDB is the config servers, which are often neglected when stress testing a cluster. In testing environments and smaller deployments (unless specific measures are taken to stress them) they are relatively lightly loaded and are usually identified as candidates for lesser instances/hardware.

The problem is that these are critical pieces of infrastructure. They may not be heavily loaded all the time, but when they do see load and struggle to service requests, that can impact all queries (reads, writes, authentication) and add latency to all requests made of the cluster in question.

In particular, the first config server in the list supplied to your mongos processes is vital. This is the config server that all mongos processes will read from by default when fetching or refreshing their view of the data distribution in your cluster. Similarly, this is the server that will be hit when attempting to authenticate a user. If it is under-provisioned and cannot service queries, or if it has problems with networking (packet loss, congestion), then the effects will be significant.
Possible Mitigation Strategies

- Ensure the config servers are load tested and slightly over-provisioned (the first config server in particular)
- If using virtual machines or cloud-based instances, investigate increasing available resources
- Turning off the balancer and disabling chunk splitting will reduce the chances of high read traffic to the config servers (no migrations, no metadata refresh), but this is only a temporary fix unless you have a perfect write distribution, and it may not eliminate issues completely

5. Using the count() command on sharded collections

This pitfall is very common, and it seems to hit somewhat randomly in terms of how long someone has been running a sharded environment. At some point, a question will arise along the lines of: “How are we tracking/verifying/checking how many documents we have in each collection on each shard, how balanced are they and do they agree with ?” Hopefully no one is actually constructing questions this way in your organization, but you get the basic idea.

The most obvious way to do a quick check on this type of thing is to count the documents and see if the numbers make sense and/or agree with counts elsewhere. That thinking naturally leads people to the count command, and they proceed to use it to gather figures for their documents and collections. Unfortunately, on a busy, mature sharded cluster, the results will very rarely be what is expected.

The reason is that the count command as implemented today has several optimizations in place to make it faster to run in general, and those speed optimizations essentially bypass a key piece of the sharding functionality needed to return accurate results in this case. This is a known bug and is being tracked in SERVER-3645, but that does not stop people from consistently hitting this issue.
The nature of the issue means that count will report documents in the results that it should not, for example:

- Documents that are being deleted as part of a chunk migration
- Documents that have been left behind from previous chunk migrations (also known as orphans)
- Documents currently being copied as part of an in-flight chunk migration

A regular query (rather than a count) will have its results filtered by the respective primary and will not suffer from the same problem. Hence, if you were to manually count the results from a query client-side, you would get an accurate result. This quirk of sharded environments will eventually be fixed, but for now it will inevitably crop up from time to time in all active sharded clusters used by a large team.

Possible Mitigation Strategies

- Do counts on the client side, or use targeted, range-based queries (with a primary read preference) to count instead
- Use cleanupOrphaned and disable the balancer (make sure it has finished its current round) when performing counts across the cluster

If you want to learn more about managing MongoDB deployments at scale, sign up for my online education course, MongoDB Advanced Deployment and Operations.

Planning for scale? No problem: MongoDB is here to help. Get a preview of what it’s like to work with MongoDB’s Technical Services Team. Give us some details on your deployment and we can set you up with an expert who can provide detailed guidance on all aspects of scaling with MongoDB, based on our experience with hundreds of deployments.
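The count discrepancy described in pitfall 5 can be illustrated with a toy model. The sketch below is plain Python and does not talk to MongoDB; the shard names, the keys and the ownership rule are all invented for illustration. It only shows why summing what is physically present on each shard over-counts when orphans exist, while filtering by current ownership does not.

```python
def naive_count(shards):
    """Sum every document physically present on each shard."""
    return sum(len(keys) for keys in shards.values())


def filtered_count(shards, owner_of):
    """Count only documents held by the shard that currently owns their key."""
    return sum(
        1
        for shard, keys in shards.items()
        for key in keys
        if owner_of(key) == shard
    )


def owner_of(key):
    """Hypothetical chunk ranges: keys up to 2 belong to shard0, the rest to shard1."""
    return "shard0" if key <= 2 else "shard1"


# Keys 3 and 4 were migrated to shard1 but not yet cleaned up on shard0 (orphans).
shards = {"shard0": [1, 2, 3, 4], "shard1": [3, 4, 5, 6]}

print(naive_count(shards))               # counts the orphans twice
print(filtered_count(shards, owner_of))  # counts each document once
```

In this model the naive total is 8 while only 6 distinct documents exist, which is the kind of gap that surprises people on a busy cluster.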

October 21, 2014

by Francesca Krihely

· 4,757 Views

Getting Started with JHipster on OS X

Last week I was tasked with developing a quick prototype that used AngularJS for its client and Spring MVC for its server. A colleague developed the same application using Backbone.js and Spring MVC. At first, I considered using my boot-ionic project as a starting point. Then I realized I didn't need to develop a native mobile app, but rather a responsive web app. My colleague mentioned he was going to use RESThub as his starting point, so I figured I'd use JHipster as mine. We allocated a day to get our environments set up with the tools we needed, then timeboxed our first feature spike to four hours.

My first experience with JHipster failed the 10-minute test. I spent a lot of time flailing about with various "npm" and "yo" commands, getting permissions issues along the way. After getting things to work with some sudo action, I figured I'd try its Docker development environment. This experience was no better. JHipster seems like a nice project, so I figured I'd try to find the causes of my issues. This article is designed to save you the pain I had. If you'd rather just see the steps to get up and running quickly, skip to the summary.

The "npm" and "yo" issues I had seemed to be caused by a bad node/npm installation. To fix this, I removed node and installed nvm. Here are the commands I needed to remove node and npm:

    sudo rm -rf /usr/local/lib/node_modules
    sudo rm -rf /usr/local/include/node
    sudo rm /usr/local/bin/node
    sudo rm -rf /usr/local/bin/npm
    sudo rm /usr/local/share/man/man1/node.1
    sudo rm -rf /usr/local/lib/dtrace/node.d
    sudo rm -rf ~/.npm

Next, I ran "brew doctor" to make sure Homebrew was still happy. It told me some things were broken:

    $ brew doctor
    Warning: Broken symlinks were found. Remove them with `brew prune`:
        /usr/local/bin/yo
        /usr/local/bin/ionic
        /usr/local/bin/grunt
        /usr/local/bin/bower

I ran brew update && brew prune, followed by brew install nvm.
Next, I added the following to my ~/.profile:

    source $(brew --prefix nvm)/nvm.sh

To install the latest version of node, I ran the commands below and set the latest version as the default:

    nvm ls-remote
    nvm install v0.11.13
    nvm alias default v0.11.13

Once I had a fresh version of Node.js, I was able to run JHipster's local installation instructions.

    npm install -g yo
    npm install -g generator-jhipster

Then I created my project:

    yo jhipster

I was disappointed to find this created all the project files in my current directory, rather than in a subdirectory. I'd recommend you do the following instead:

    mkdir ~/projectname && cd ~/projectname && yo jhipster

Before creating your project, JHipster asks you a number of questions. To see what they are, see its documentation on creating an application. Two things to be aware of:

- Hot reloading Java code doesn't work well (yet) with Java 8
- Its OAuth2 implementation doesn't work with WebSockets

In other words, I'd recommend using Java 7 with either cookie-based authentication and WebSockets, or OAuth2 authentication without WebSockets.

After creating my project, I was able to run it using "mvn spring-boot:run" and view it at http://localhost:8080. To get hot-reloading for the client, I ran "grunt server" and opened my browser to http://localhost:9000.

JHipster + Docker on OS X

I had no luck getting the Docker instructions to work initially. I spent a couple of hours on it, then gave up. A couple of days ago, I decided to give it another good ol' college try. To make sure I figured out everything from scratch, I started by removing Docker.

I re-installed Docker and pulled the JHipster image using the following:

    sudo docker pull jdubois/jhipster-docker

The error I got from this was the following:

    2014/09/05 19:43:38 Post http:///var/run/docker.sock/images/create?fromImage=jdubois%2Fjhipster-docker&tag=: dial unix /var/run/docker.sock: no such file or directory

After doing some research, I learned I needed to run boot2docker init first.
Next I ran boot2docker up to start the Docker daemon. Then I copied/pasted "export DOCKER_HOST=tcp://192.168.59.103:2375" into my console and tried to run docker pull again. It failed with the same error. The solution was simpler than you might think: don't use sudo.

    $ docker pull jdubois/jhipster-docker
    Pulling repository jdubois/jhipster-docker
    01bdc74025db: Pulling dependent layers
    511136ea3c5a: Download complete
    ...

The next command that JHipster's documentation recommends is to run the Docker image, forward ports and share folders. When you run it, the terminal seems to hang, and trying to ssh into it doesn't work. Others have recently reported a similar issue.

I discovered that the hanging is caused by a missing "-d" parameter, and that ssh doesn't work because you need to add a portmap to the VM to expose the port to your host. You can fix this by running the following:

    boot2docker down
    VBoxManage modifyvm "boot2docker-vm" --natpf1 "containerssh,tcp,,4022,,4022"
    VBoxManage modifyvm "boot2docker-vm" --natpf1 "containertomcat,tcp,,8080,,8080"
    VBoxManage modifyvm "boot2docker-vm" --natpf1 "containergruntserver,tcp,,9000,,9000"
    VBoxManage modifyvm "boot2docker-vm" --natpf1 "containergruntreload,tcp,,35729,,35729"
    boot2docker start

After making these changes, I was able to start the image and ssh into it.

    docker run -d -v ~/jhipster:/jhipster -p 8080:8080 -p 9000:9000 -p 35729:35729 -p 4022:22 -t jdubois/jhipster-docker
    ssh -p 4022 jhipster@localhost

I tried creating a new project within the VM (cd /jhipster && yo jhipster), but it failed with the following error:

    /usr/lib/node_modules/generator-jhipster/node_modules/yeoman-generator/node_modules/mkdirp/index.js:89
    throw err0;
    ^
    Error: EACCES, permission denied '/jhipster/src'

The fix was giving the "jhipster" user ownership of the directory.

    sudo chown jhipster /jhipster

After doing this, I was able to generate an app, run it using "mvn spring-boot:run" and access it from my Mac at http://localhost:8080.
I was also able to run "grunt server" and see it at http://localhost:9000.

However, I was puzzled to see that there was nothing in my ~/jhipster directory. After doing some searching, I found that docker run -v /host/path:/container/path doesn't work on OS X. David Gageot's A Better Boot2Docker on OSX led me to svendowideit/samba, which solved this problem. The specifics are documented in boot2docker's folder sharing section.

I shut down my docker container by running "docker ps", grabbing the first two characters of the id and then running:

    docker stop [2chars]

I started the JHipster container without the -v parameter, used "docker ps" to find its name (backstabbing_galileo in this case), then used that to add samba support.

    docker run -d -p 8080:8080 -p 9000:9000 -p 35729:35729 -p 4022:22 -t jdubois/jhipster-docker
    docker run --rm -v /usr/local/bin/docker:/docker -v /var/run/docker.sock:/docker.sock svendowideit/samba backstabbing_galileo

Then I was able to connect using Finder > Go > Connect to Server, using the following for the server address:

    cifs://192.168.59.103/jhipster

To make this volume appear in my regular development area, I created a symlink:

    ln -s /Volumes/jhipster ~/dev/jhipster

After doing this, all the files were marked as read-only. To fix this, I ran "chmod -R 777 ." in the directory on the server. I noticed that this also worked if I ran it from my Mac's terminal, but it took quite a while to traverse all the files. I noticed a similar delay when loading the project into IntelliJ.

Summary

Phew! That's a lot of information that can be condensed down into four JHipster + Docker on OS X tips.

- Make sure your npm installation doesn't require sudo rights. If it does, reinstall using nvm.
- Add portmaps to your VM to expose ports 4022, 8080, 9000 and 35729 to your host.
- Change ownership of /jhipster in the Docker image: sudo chown jhipster /jhipster.
- Use svendowideit/samba to share your VM's directories with OS X.
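The portmap tip is repetitive enough that it can be scripted. The following is a minimal sketch in Python that only builds the VBoxManage command strings shown earlier in this article; the helper name natpf_rule is made up for illustration, and nothing is executed, so you can review the output before pasting it into a terminal (with the VM stopped, as in the steps above).

```python
# Rule names and ports taken from the port-forwarding commands in this article.
PORT_RULES = {
    "containerssh": 4022,
    "containertomcat": 8080,
    "containergruntserver": 9000,
    "containergruntreload": 35729,
}


def natpf_rule(name, port, vm="boot2docker-vm"):
    """Build one VBoxManage NAT port-forwarding command (host port == guest port)."""
    return f'VBoxManage modifyvm "{vm}" --natpf1 "{name},tcp,,{port},,{port}"'


commands = [natpf_rule(name, port) for name, port in PORT_RULES.items()]

for command in commands:
    print(command)
```

Keeping the rules in one dictionary makes it easier to add another forwarded port later without retyping the whole command.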

September 10, 2014

by Matt Raible

· 13,017 Views

Create Your Own Private Docker Registry

This is a post in a series discussing using spring-boot and docker for deployment. Refer to the end of the first post for a table of contents.

Shortly after you start building docker containers, you will realize that you need some place to publish your images. You could push to the central docker registry. However, the central registry is public, which is not a great idea if you are working on a private project. If this is your case, you can simply run a local docker registry. To install and run your private registry, run:

    $ docker run -p 5000:5000 -d registry

Surprise!!! It runs in a docker container. You can now start pushing to your local repository. As an example, I will pull the latest postgres image and push version 9.4 to my local registry.

    $ docker pull postgres
    $ docker tag postgres:9.4 localhost:5000/postgres:9.4
    $ docker push localhost:5000/postgres

Outputs:

    The push refers to a repository [localhost:5000/postgres] (len: 1)
    Sending image list
    Pushing repository localhost:5000/postgres (1 tags)
    511136ea3c5a: Image successfully pushed
    ec3443b7b068: Image successfully pushed
    06af7ad6cff1: Image successfully pushed
    37eae31ff4e9: Image successfully pushed
    83e30bf01299: Image successfully pushed
    499da968a652: Image successfully pushed
    bf09bd07d760: Image successfully pushed
    1eee820e762b: Image successfully pushed
    7bf9287ccfce: Image successfully pushed
    288b8d534217: Image successfully pushed
    f20dbf0acb45: Image successfully pushed
    bd511e81a5ed: Image successfully pushed
    8fe7eb38aea1: Image successfully pushed
    464263a50f65: Image successfully pushed
    1f58a67adecd: Image successfully pushed
    a99fb4ee814d: Image successfully pushed
    6112f975feab: Image successfully pushed
    6dff1b5c2259: Image successfully pushed
    Pushing tag for rev [6dff1b5c2259] on {http://localhost:5000/v1/repositories/postgres/tags/9.4}

Looking at the current images, you will notice that the version tagged with localhost and the official image have the same information.
Notice that I had to retag the image with the location of the repository. I thought the requirement to put the location address in the image name was a little odd. However, after using docker longer, it makes sense. It ensures you know where the image was originally pulled from.

    $ docker images
    postgres                  9.4   6dff1b5c2259   5 days ago   244.4 MB
    localhost:5000/postgres   9.4   6dff1b5c2259   5 days ago   244.4 MB

Docker tags are not permanent, so a newer version of the postgres:9.4 image could be pushed to the public registry at any time. When you self-host images, you are in control of when updates are pushed to any base image that you have extended. Someday I intend to learn how to build an image completely from scratch.

Docker-ize All the Things!
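The retagging convention is easy to get wrong by hand, so here is a small Python sketch that derives the registry-qualified name and the matching commands. The helper names are invented for illustration, and the script only builds strings in the same form as the commands used above; it does not call docker.

```python
def registry_tag(registry, image, tag):
    """Name of an image as it must be tagged to live in a given registry."""
    return f"{registry}/{image}:{tag}"


def push_commands(registry, image, tag):
    """Commands to retag a local image and push it to the registry."""
    target = registry_tag(registry, image, tag)
    return [
        f"docker tag {image}:{tag} {target}",
        f"docker push {registry}/{image}",
    ]


for command in push_commands("localhost:5000", "postgres", "9.4"):
    print(command)
```

The registry address is a parameter, so the same helper would work if the registry later moved off localhost.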

August 11, 2014

by Robert Greathouse

· 18,822 Views · 1 Like

JBoss Data Grid: Installation and Development

In this blog, we will discuss one particular data grid platform from Red Hat, namely JBoss Data Grid (JDG). We will first cover how to access and install this data grid platform, and then we will demonstrate how to develop and deploy a simple remote client/server data grid application which utilises the HotRod protocol. We will be using the latest release, JDG 6.2, from Red Hat in this article.

Installation Overview

To start using JDG, first log on to the Red Hat site https://access.redhat.com/home and download the software from the Downloads section of the site. We wish to download the JDG 6.2 server by clicking on the appropriate links in the Downloads section. For future reference, it is also useful to download the quickstart and maven repository zip files. To install JDG, we simply unzip the JDG server package into an appropriate directory in our environment.

JDG Overview

In this section, we will provide a brief overview of the contents of the JDG installation package and the most notable configuration options available to users. Out of the box, users are provided with two runtime options: running JDG in standalone mode or in clustered mode. We can start JDG in either mode by invoking the standalone or clustered startup scripts in the /bin directory. To configure JDG in either mode, we need to edit the files standalone.xml and clustered.xml respectively. In our case we will be creating a distributed cache which will run on a 3 node JDG cluster, so we will be utilizing the clustered startup script.

In order to set up and add new cache instances to JDG, we modify the infinispan subsystems in the appropriate xml configuration file above. We should also note that the principal difference between the standalone and clustered configuration files is that the clustered configuration file contains a JGroups subsystem element, which allows for communication and messaging between configured cache instances running in a JDG cluster.
Development Environment Setup and Configuration

In this section, we will detail how to develop and configure a simple datagrid application which will be deployed to a 3 node JDG cluster. We will demonstrate how to configure and deploy a distributed cache in JDG and also show how to develop a HotRod Java client application which will be used to insert, update and display entries in the distributed cache. We will first discuss setting up a new distributed cache on a 3 node JDG cluster. In this example, we will run our JDG cluster on a single machine by running each JDG instance on different ports.

First, we will create 3 instances of JDG by creating 3 directories (server1, server2, server3) on our host machine and unzipping the JDG installation into each directory.

We will now configure each node in our cluster by copying and renaming the clustered.xml configuration file in the \server1\jboss-datagrid-6.2.0-server\standalone\configuration directory. We will name the cluster configuration files "clustered1.xml", "clustered2.xml" and "clustered3.xml" for the JDG instances denoted by "server1", "server2" and "server3" respectively.

We will now set up a new distributed cache on our JDG cluster by modifying the infinispan subsystem element in each of these configuration files. We will demonstrate this for the node denoted "server1" by modifying the file "clustered1.xml". The cache configuration shown here will be the same across all 3 nodes.

To set up a new distributed cache named "directory-dist-cache", we configure the following elements in the file named "clustered1.xml":

    .........
    ......
    ..............
    ......
    ......
    </socket-binding-group>

We will discuss the key elements and attributes relating to the configuration above.

In the infinispan endpoint subsystem, we configure HotRod clients to connect to the JDG server instance on socket 11222.
The cache instances are hosted in the cache container named "clusteredcache".

We have configured the infinispan core subsystem to use the default cache container named "clusteredcache" and enabled the collection of JMX statistics for the configured cache entries, i.e. statistics="true".

We have created a new distributed cache named "directory-dist-cache" in which two copies of each cache entry are held on two of the 3 cluster nodes.

We have also set up an eviction policy so that, should there be more than 20 entries in our cache, cache entries will be removed using the LRU algorithm.

We also need to configure nodes "server2" and "server3" to start up with a port offset of 100 and 200 respectively by configuring the socket binding group element appropriately. Please view the socket bindings noted below.

To set the socket binding element with a port offset of 100 on "server2", we configure "clustered2.xml" with the following entry:

    ......
    ......
    </socket-binding-group>

To set the socket binding element with a port offset of 200 on "server3", we configure "clustered3.xml" with the following entry:

    ......
    ......
    </socket-binding-group>

Before discussing the setup and configuration of our HotRod client, which will be used to interact with our JDG clustered HotRod server, we will start up each server instance to ensure our newly configured JDG distributed cache starts up correctly.
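Before starting the servers, it can help to work out which HotRod port each node will listen on once the port offsets are applied. The short Python sketch below does that arithmetic; the helper names are made up for illustration, and it assumes all three nodes run on 127.0.0.1 with the base HotRod port 11222 and the offsets 0, 100 and 200 described above.

```python
BASE_HOTROD_PORT = 11222  # HotRod socket configured in the endpoint subsystem


def hotrod_port(port_offset):
    """HotRod port of a node started with the given port offset."""
    return BASE_HOTROD_PORT + port_offset


def server_list(host, port_offsets):
    """Semicolon-separated host:port list in the form a HotRod client expects."""
    return ";".join(f"{host}:{hotrod_port(offset)}" for offset in port_offsets)


# server1, server2 and server3 all run on the same machine in this example.
print(server_list("127.0.0.1", [0, 100, 200]))
```

The resulting string is the kind of value a HotRod client would be given as its list of servers.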
Open up 3 Windows or Linux consoles and execute the following startup commands:

Console 1:
1) Navigate to \server1\jboss-datagrid-6.2.0-server\bin
2) Execute this command to start the first instance of our JDG cluster denoted "server1":

    clustered -c=clustered1.xml -Djboss.node.name=server1

Console 2:
1) Navigate to \server2\jboss-datagrid-6.2.0-server\bin
2) Execute this command to start the second instance of our JDG cluster denoted "server2":

    clustered -c=clustered2.xml -Djboss.node.name=server2

Console 3:
1) Navigate to \server3\jboss-datagrid-6.2.0-server\bin
2) Execute this command to start the third instance of our JDG cluster denoted "server3":

    clustered -c=clustered3.xml -Djboss.node.name=server3

Provided all 3 JDG instances have started up correctly, you should see output in each console window showing that there are 3 JDG instances in the JGroups view.

HotRod Client Development Setup

Now that the HotRod server is up and running, we need to develop a HotRod Java client which will interact with the clustered server application. The development environment consists of the following tools:

1) JDK Hotspot 1.7.0_45
2) IDE - Eclipse Kepler Build id: 20130919-0819

The HotRod client application is a simple application consisting of two Java classes. The application allows users to retrieve a reference to the distributed cache from the JDG server and then perform these actions:

a) add new cinema objects.
b) add and remove shows for each cinema object.
c) print the list of all cinemas and shows stored in our distributed cache.

The source code can be downloaded from GitHub at https://github.com/davewinters/JDG.

We could use maven here to build and execute our application by configuring the maven settings.xml to point to the maven repository files we downloaded earlier and setting up a maven project file (pom.xml) to build and execute the client application.
In this article we will build our application using the Eclipse IDE and run the client application on the command line. To create a HotRod client application and execute the sample application, complete the following steps:

1) Create a new Java Project in Eclipse.

2) Create a new package named uk.co.c2b2.jdg.hotrod and import the source code that has been downloaded from GitHub as mentioned previously.

3) Configure the build path in Eclipse to contain the appropriate JDG client jar files which are required to compile the application. You should include all the client jar files in the project build path. These jar files are contained in the JDG installation zip file. For example, on my machine these jar files are located in the directory: \server1\jboss-datagrid-6.2.0-server\client\hotrod\java

4) Provided the Eclipse build path has been configured appropriately, the application source should compile without issue.

5) Execute the HotRod application by opening the console window and executing the following command.
Note that the paths specified here will differ depending on where the JDG client jar files and application class files are located in your environment:

    java -classpath ".;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\commons-pool-1.6-redhat-4.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-client-hotrod-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-commons-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-query-dsl-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-remote-query-client-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\jboss-logging-3.1.2.GA-redhat-1.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\jboss-marshalling-1.4.2.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\jboss-marshalling-river-1.4.2.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\protobuf-java-2.5.0.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\protostream-1.0.0.CR1-redhat-1.jar" uk/co/c2b2/jdg/hotrod/CinemaDirectory

6) At runtime, the HotRod client provides the end user with a number of different options to interact with the distributed cache, which are listed in the console window.

Client Application Principal API Details

We will not provide a detailed overview of the HotRod application code; however, we will briefly describe the principal API and code details.
In order to interact with the distributed cache on the JDG cluster using the HotRod protocol, we use the RemoteCacheManager object, which allows us to retrieve a remote reference to the distributed cache. We have initialised a Properties object with the list of JDG instances and the associated HotRod server port on each instance. We can add Cinema objects into the distributed cache using the RemoteCache.put() method.

    private RemoteCacheManager cacheManager;
    private RemoteCache cache;
    .....
    Properties properties = new Properties();
    properties.setProperty(ConfigurationProperties.SERVER_LIST, "127.0.0.1:11222;127.0.0.1:11322;127.0.0.1:11422");
    cacheManager = new RemoteCacheManager(properties);
    cache = cacheManager.getCache("directory-dist-cache");
    .....
    cache.put(cinemaKey, cinemalist);

In the webinar below, I describe in further detail how to set up a JDG cluster and how to develop and run the JDG application discussed above. For further details on JDG please visit: http://www.redhat.com/products/jbossenterprisemiddleware/data-grid/

Webinar: Introduction to JBoss Data Grid -- Installation, Configuration and Development

In this webinar we will look at the basics of setting up JBoss Data Grid, covering installation, configuration and development. We will look at practical examples of storing data, viewing the data in the cache and removing it. We will also take a look at the different clustered modes and what effect these have on the storage of your data.

July 25, 2014

by David Winters

· 16,103 Views

How to Install Mono on a Raspberry Pi

This post exists to help with an MSDN Magazine article that I am authoring. It provides some of the low-level details for the article:

- How to install Mono and root certificates on a Raspberry Pi
- How to create an Azure mobile service
- How to create a custom API inside Azure mobile services that the Raspberry Pi can call into
- How to create an Azure storage account

MONO - HOW TO INSTALL ON A RASPBERRY PI

- Why Mono?
- How to install Mono on a Raspberry Pi
- Installing trusted root certificates on to the Raspberry Pi

http://www.mono-project.com/Main_Page

- An open source, cross-platform implementation of C# and the CLR that is binary compatible with Microsoft .NET
- Mono is a free and open source project led by Xamarin (formerly by Novell) that provides a .NET Framework-compatible set of tools including, among others, a C# compiler and a Common Language Runtime

WHY MONO?

- Because it lets us write .NET code compiled on Windows
- We can simply copy the binary files from Windows to Linux and run them as is
- From a Raspberry Pi device, it is possible to use a .NET application to take a photo and upload it to Windows Azure storage

HOW TO INSTALL ON A RASPBERRY PI RUNNING LINUX

You will issue the following commands:

    pi@raspberrypi ~ $ sudo apt-get update
    pi@raspberrypi ~ $ sudo apt-get install mono-complete

The first command makes sure the local package indexes are up to date with the changes made in the repositories. The second command installs the complete Mono tooling and runtime.

MAKING SURE THAT YOUR MONO APPLICATIONS CAN MAKE HTTPS REST-BASED CALLS

The following command downloads the trusted root certificates from the Mozilla LXR web site into the Mono certificate store. Once complete, the Raspberry Pi will be capable of making HTTPS web requests from within Mono.
    pi@raspberrypi ~ $ mozroots --import --ask-remove --machine

CREATING A NEW AZURE MOBILE SERVICES ACCOUNT

- The mobile services account is needed to host a Node.js application that provides shared access signatures to Raspberry Pi devices
- The shared access signature is needed by the Raspberry Pi so that it can directly and securely upload photos to Azure storage

STEPS TO CREATE AN AZURE MOBILE SERVICE

- The steps below will create an Azure mobile service
- The service will be used to host a Node.js application interacting with Raspberry Pi devices
- We will provision a SQL database, although it will not be used initially

FOLLOW THESE STEPS TO CREATE THE MOBILE SERVICE

- Log in to the Azure Portal.
- Select MOBILE SERVICES from the left menu pane at the Azure Portal.
- In the lower left corner select "+NEW" to create a new Azure Mobile Service.
- Make sure you've selected "COMPUTE / MOBILE SERVICE / CREATE."
- You will now enter a URL. We will call this service raspberrymobileservice.
- For the DATABASE, we will choose "Create a new SQL database instance."
- The REGION we chose is "West US."
- The BACKEND is "JavaScript."
- Click the "->" arrow to proceed to the next screen.
- In this screen you will "Specify database settings."
- The NAME of your database will be based on the URL you entered previously. In this case, the database is called "raspberrymobileservice_db."
- You will need to choose a SERVER. We will choose "New SQL database server" from the drop-down list.
- You will need to provide a SERVER LOGIN NAME and a SERVER LOGIN PASSWORD.
Take note of the login you provided, as it will be needed later.

CREATING A CUSTOM API

- Azure Mobile Services allows you to create a custom API written in JavaScript that can be called from a Raspberry Pi device using REST.
- This custom API is really just a Node.js application running on the server.

CREATING THE API TO RESPOND TO THE DEVICE TRYING TO UPLOAD PHOTOS

Now that the service is established, we will turn our attention to creating an API that the device can call into to upload a photo.

- Log in to the Azure Portal.
- Your mobile service will take a few minutes to complete, and you should see the "Ready" flag as the "Status" for your service. Once it is ready you can drill into your service to customize its behavior.
- Just to the right of the service name, click the right arrow key "->" to drill into the service details.
- The top menu bar will offer many options, but we are interested in the one titled "API." The API allows you to create a series of Node.js API calls that a device can call into using REST-based approaches.
- Click on "API." From there, select "CREATE A CUSTOM API."
- You will be asked to provide an API name. Type in "photos" for the API name.
- Below you will see a series of drop-down combo boxes that relate to permissions. We will keep the default value of "Anybody with the application key." This might not be the best option for all scenarios. You can read more about this here: http://msdn.microsoft.com/en-us/library/azure/jj193161.aspx
- Click the checkmark to complete the process.
- The name of the API you just created, "Photos," should be visible on the portal interface.
- To drill into the photos API click on the right arrow key "->". The right arrow key will be just to the right of the name of the API, "Photos".
- At this point you should see a basic script that has been provided by default. We will overwrite this default script with our own script as described in the MSDN Magazine article.
CREATING A STORAGE ACCOUNT TO STORE THE PHOTOS

- Navigate to the portal and create a storage account.
- Create a container for the photos.
- Obtain the:
  - Storage Account Name (you will provide a name)
  - Storage Account Access key (generated for you)
  - Container Name (you will create)

CREATING A STORAGE ACCOUNT

We will need a storage account so that we can upload photos to it. The steps are well documented here: http://azure.microsoft.com/en-us/documentation/articles/storage-create-storage-account/

In our case we call the storage account raspberrystorage. This means that the URL that the device will use to upload photos is https://raspberrystorage.blob.core.windows.net/. As you complete these steps, make sure that you choose the storage account location to be the same location as was used for your mobile services account. This avoids any unnecessary latency or bandwidth costs between data centers.

Once the storage account is created, we will need to create a container within it. Photos, or any blob for that matter, are always stored within a container. To create a container, drill into your newly created storage account and select CONTAINERS from the top menu. From there, select CREATE A CONTAINER.

The new container dialog box will ask for a name for your container. Take note of the name you provide. We are calling our container "photocontainer." When the Raspberry Pi device uploads photos to the storage account, it will target a specific container, such as the one we just created. You will next be asked to indicate ACCESS rights. To keep things simple we will select access rights of Public Blob.
ENTERING APP SETTINGS

Rather than hard-code storage account information inside your JavaScript/Node.js applications, you should consider using app settings inside of the Azure Mobile Services portal. This post also discusses it well: http://blogs.msdn.com/b/carlosfigueira/archive/2013/12/09/application-settings-in-azure-mobile-services.aspx

"The idea of application settings is a set of key-value pairs which can be set for the mobile service (either via the portal or via the command-line interface), and those values could be then read in the service runtime."

NAVIGATING TO APP SETTINGS

- Navigate to the Azure Mobile Services section of the portal.
- Drill into the specific service by hitting the arrow.
- Select the Configure menu at the top.
- Scroll down to the very bottom to see app settings.

Note that we need to enter the following values, which we get from Azure Storage:

- PhotoContainerName
- AccountName
- AccountKey

We get this information from the Azure Storage section of the portal. Note that you need to have provisioned a Storage Account to have this information.

How to get the AccountKey with Azure Storage Services

Now you can get the access keys.

HOW NODE.JS WILL ACCESS THE APP SETTINGS

- You will create a Node.js application inside of Azure Mobile Services.
- See previous steps.

THE NODE.JS APPLICATION READING APP SETTINGS

- You will start by going back to Azure Mobile Services and drilling down into your newly minted service.
- We called ours raspberrymobileservice.

Once you click API, you should see: Notice the app settings are being read on lines 12 to 14.
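
As a rough illustration of reading the three app settings instead of hard-coding them, here is a minimal sketch. It is written in Python, not the Node.js used by the mobile service, and it assumes the settings reach the process as environment variables named PhotoContainerName, AccountName and AccountKey. The function names and the sample values are hypothetical, and the sketch only builds a blob URL; it does not generate a shared access signature.

```python
import os

# The three settings named in the text; how they are exposed to the
# process (environment variables here) is an assumption of this sketch.
REQUIRED_SETTINGS = ("PhotoContainerName", "AccountName", "AccountKey")


def load_storage_settings(env=None):
    """Return the three storage settings, failing loudly if one is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_SETTINGS if not env.get(name)]
    if missing:
        raise KeyError("missing app settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SETTINGS}


def blob_url(settings, blob_name):
    """Build https://<account>.blob.core.windows.net/<container>/<blob>."""
    return "https://{0}.blob.core.windows.net/{1}/{2}".format(
        settings["AccountName"], settings["PhotoContainerName"], blob_name)


if __name__ == "__main__":
    # Sample values only; the key is a placeholder, not a real secret.
    sample = {
        "PhotoContainerName": "photocontainer",
        "AccountName": "raspberrystorage",
        "AccountKey": "not-a-real-key",
    }
    print(blob_url(load_storage_settings(sample), "photo1.jpg"))
```

Keeping the lookup in one function means a missing setting is reported once, at startup, rather than surfacing later as a failed upload.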

June 19, 2014

by Bruno Terkaly

· 16,798 Views

Exploring Message Brokers: RabbitMQ, Kafka, ActiveMQ, and Kestrel

Explore different message brokers, and discover how these important web technologies impact a customer's backlog of messages, and cluster/data performance.

June 3, 2014

by Yves Trudeau

· 460,755 Views · 86 Likes

Java EE: The Basics

I wanted to go through some of the basic tenets and the technical terminology related to Java EE. For many people, Java EE/J2EE still means Servlets, JSPs or maybe Struts at best. No offence or pun intended! This is not a Java EE 'bible' by any means. I am not capable enough of writing such a thing! So let us line up the 'keywords' related to Java EE and then look at them one by one:

- Java EE
- Java EE APIs (specifications)
- Containers
- Services
- Multitiered applications
- Components

Let's try to elaborate on the above mentioned points.

OK. So what is Java EE?

'EE' stands for Enterprise Edition. That essentially makes Java EE - Java Enterprise Edition. If I had to summarize Java EE in a couple of sentences, it would go something like this: "Java EE is a platform which defines 'standard specifications/APIs' which are then implemented by vendors and used for development of enterprise (distributed, 'multi-tiered', robust) 'applications'. These applications are composed of modules or 'components' which use Java EE 'containers' as their run-time infrastructure."

What is this 'standardized platform' based upon? What does it constitute?

- The platform revolves around 'standard' specifications or APIs.
- Think of these as contracts defined by a standards body, e.g. Enterprise Java Beans (EJB), Java Persistence API (JPA), Java Message Service (JMS) etc.
- These contracts/specifications/APIs are implemented by different vendors, e.g. GlassFish, Oracle WebLogic, Apache TomEE etc.

Alright. What about containers?

- Containers can be visualized as 'virtual/logical partitions'.
- Each container supports a subset of the APIs/specifications defined by the Java EE platform.
- They provide run-time 'services' to the 'applications' which they host.
- The Java EE specification lists 4 types of containers:
  - EJB container
  - Web container
  - Application client container
  - Applet container

(Figure: Java EE containers)

I am not going to delve into the details of these containers in this post.

Services??
Well, 'services' are nothing but a result of the vendor implementations of the standard 'specifications' (mentioned above). Examples of such implementations are Jersey for JAX-RS (RESTful services), Tyrus (WebSocket), EclipseLink (JPA), Weld (CDI) etc. The 'container' is the interface between the deployed application ('service' consumer) and the application server. Here is a list of 'services' which are rendered by the 'container' to the underlying 'components' (this is not an exhaustive list):

- Persistence - offered by the Java Persistence API (JPA), which drives object relational mapping (ORM) and provides an abstraction for database operations.
- Messaging - the Java Message Service (JMS) provides asynchronous messaging between disparate parts of your applications.
- Contexts & Dependency Injection - CDI provides loosely coupled and type safe injection of resources.
- Web services - JAX-RS and JAX-WS provide support for REST and SOAP style services respectively.
- Transaction - provided by the Java Transaction API (JTA) implementation.

What is a typical Java EE 'application'? What does it comprise?

Applications are composed of different 'components', which in turn are supported by their corresponding 'container'. Supported 'component' types are:

- Enterprise applications - make use of specifications like EJB, JMS, JPA etc and are executed within an EJB container.
- Web applications - they leverage the Servlet API, JSP, JSF etc and are supported by a web container.
- Application clients - executed on the client side. They need an application client container, which has a set of supported libraries and executes in a Java SE environment.
- Applets - these are GUI applications which execute in a web browser.

How are Java EE applications structured?
As far as Java EE 'application' architecture is concerned, applications generally tend to follow the n-tier model consisting of a client tier, a server tier and of course the database (back end) tier:

- Client tier - consists of web browsers or GUI (Swing, JavaFX) based clients. Web browsers tend to talk to the 'web components' on the server tier, while the GUI clients interact directly with the 'business' layer within the server tier.
- Server tier - this tier comprises the dynamic web components (JSP, JSF, Servlets) and the business layer driven by the EJB, JMS, JPA and JTA specifications.
- Database tier - contains 'enterprise information systems' backed by databases or even legacy data repositories.

(Figure: generic 3-tier Java EE application architecture)

Java EE - bare bones, basics.... as quickly and briefly as I possibly could. That's all for now! :-) Stay tuned for more Java EE content, specifically around the latest and greatest version of the Java EE platform --> Java EE 7.

Happy reading!

April 29, 2014

by Abhishek Gupta

CORE

· 40,689 Views · 3 Likes

A Docker ‘Hello World' With Mono

Docker is a lightweight virtualization technology for Linux that promises to revolutionize the deployment and management of distributed applications. Rather than requiring a complete operating system, like a traditional virtual machine, Docker is built on top of Linux containers, a feature of the Linux kernel, that allows light-weight Docker containers to share a common kernel while isolating applications and their dependencies. There’s a very good Docker SlideShare presentation here that explains the philosophy behind Docker using the analogy of standardized shipping containers. Interesting that the standard shipping container has done more to create our global economy than all the free-trade treaties and international agreements put together. A Docker image is built from a script, called a ‘Dockerfile’. Each Dockerfile starts by declaring a parent image. This is very cool, because it means that you can build up your infrastructure from a layer of images, starting with general, platform images and then layering successively more application specific images on top. I’m going to demonstrate this by first building an image that provides a Mono development environment, and then creating a simple ‘Hello World’ console application image that runs on top of it. Because the Dockerfiles are simple text files, you can keep them under source control and version your environment and dependencies alongside the actual source code of your software. This is a game changer for the deployment and management of distributed systems. Imagine developing an upgrade to your software that includes new versions of its dependencies, including pieces that we’ve traditionally considered the realm of the environment, and not something that you would normally put in your source repository, like the Mono version that the software runs on for example. 
You can script all these changes in your Dockerfile, test the new container on your local machine, then simply move the image to test and then production. The possibilities for vastly simplified deployment workflows are obvious. Docker brings concerns that were previously the responsibility of an organization’s operations department and makes them a first class part of the software development lifecycle. Now your infrastructure can be maintained as source code, built as part of your CI cycle and continuously deployed, just like the software that runs inside it.

Docker also provides docker index, an online repository of docker images. Anyone can create an image and add it to the index and there are already images for almost any piece of infrastructure you can imagine. Say you want to use RabbitMQ: all you have to do is grab a handy RabbitMQ image such as https://index.docker.io/u/tutum/rabbitmq/ and run it like this:

    docker run -d -p 5672:5672 -p 55672:55672 tutum/rabbitmq

The -p flag maps ports between the container and the host.

Let’s look at an example. I’m going to show you how to create a docker image for the Mono development environment and have it built and hosted on the docker index. Then I’m going to build a local docker image for a simple ‘hello world’ console application that I can run on my Ubuntu box.

First we need to create a Dockerfile for our Mono environment. I’m going to use the Mono debian packages from directhex. These are maintained by the official Debian/Ubuntu Mono team and are the recommended way of installing the latest Mono versions on Ubuntu.
Here’s the Dockerfile:

    #DOCKER-VERSION 0.9.1
    #
    #VERSION 0.1
    #
    # monoxide mono-devel package on Ubuntu 13.10

    FROM ubuntu:13.10
    MAINTAINER Mike Hadlow
    RUN sudo DEBIAN_FRONTEND=noninteractive apt-get install -y -q software-properties-common
    RUN sudo add-apt-repository ppa:directhex/monoxide -y
    RUN sudo apt-get update
    RUN sudo DEBIAN_FRONTEND=noninteractive apt-get install -y -q mono-devel

Notice the first line (after the comments) that reads, ‘FROM ubuntu:13.10’. This specifies the parent image for this Dockerfile. This is the official docker Ubuntu image from the index. When I build this Dockerfile, that image will be automatically downloaded and used as the starting point for my image.

But I don’t want to build this image locally. Docker provides a build server linked to the docker index. All you have to do is create a public GitHub repository containing your Dockerfile, then link the repository to your profile on docker index. You can read the documentation for the details.

The GitHub repository for my Mono image is at https://github.com/mikehadlow/ubuntu-monoxide-mono-devel. Notice how the Dockerfile is in the root of the repository. That’s the default location, but you can have multiple files in sub-directories if you want to support many images from a single repository. Now any time I push a change of my Dockerfile to GitHub, the docker build system will automatically build the image and update the docker index.
You can see the image listed here: https://index.docker.io/u/mikehadlow/ubuntu-monoxide-mono-devel/

I can now grab my image and run it interactively like this:

    $ sudo docker pull mikehadlow/ubuntu-monoxide-mono-devel
    Pulling repository mikehadlow/ubuntu-monoxide-mono-devel
    f259e029fcdd: Download complete
    511136ea3c5a: Download complete
    1c7f181e78b9: Download complete
    9f676bd305a4: Download complete
    ce647670fde1: Download complete
    d6c54574173f: Download complete
    6bcad8583de3: Download complete
    e82d34a742ff: Download complete
    $ sudo docker run -i mikehadlow/ubuntu-monoxide-mono-devel /bin/bash
    mono --version
    Mono JIT compiler version 3.2.8 (Debian 3.2.8+dfsg-1~pre1)
    Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
        TLS:           __thread
        SIGSEGV:       altstack
        Notifications: epoll
        Architecture:  amd64
        Disabled:      none
        Misc:          softdebug
        LLVM:          supported, not enabled.
        GC:            sgen
    exit

Next let’s create a new local Dockerfile that compiles a simple ‘hello world’ program, and then runs it when we run the image. You can follow along with these steps. All you need is an Ubuntu machine with Docker installed.

First here’s our ‘hello world’. Save this code in a file named hello.cs:

    using System;

    namespace Mike.MonoTest
    {
        public class Program
        {
            public static void Main()
            {
                Console.WriteLine("Hello World");
            }
        }
    }

Next we’ll create our Dockerfile. Copy this code into a file called ‘Dockerfile’:

    #DOCKER-VERSION 0.9.1
    FROM mikehadlow/ubuntu-monoxide-mono-devel
    ADD . /src
    RUN mcs /src/hello.cs
    CMD ["mono", "/src/hello.exe"]

Once again, notice the ‘FROM’ line. This time we’re telling Docker to start with our mono image. The next line, ‘ADD . /src’, tells Docker to copy the contents of the current directory (the one containing our Dockerfile) into a root directory named ‘src’ in the container. Now our hello.cs file is at /src/hello.cs in the container, so we can compile it with the mono C# compiler, mcs, which is the line ‘RUN mcs /src/hello.cs’.
Now we will have the executable, hello.exe, in the src directory. The line ‘CMD ["mono", "/src/hello.exe"]’ tells Docker what we want to happen when the container is run: just execute our hello.exe program.

As an aside, this exercise highlights some questions around what best practice should be with Docker. We could have done this in several different ways. Should we build our software independently of the Docker build in some CI environment, or does it make sense to do it this way, with the Docker build as a step in our CI process? Do we want to rebuild our container for every commit to our software, or do we want the running container to pull the latest from our build output? Initially I’m quite attracted to the idea of building the image as part of the CI but I expect that we’ll have to wait a while for best practice to evolve.

Anyway, for now let’s manually build our image:

    $ sudo docker build -t hello .
    Uploading context 1.684 MB
    Uploading context
    Step 0 : FROM mikehadlow/ubuntu-monoxide-mono-devel
     ---> f259e029fcdd
    Step 1 : ADD . /src
     ---> 6075dee41003
    Step 2 : RUN mcs /src/hello.cs
     ---> Running in 60a3582ab6a3
     ---> 0e102c1e4f26
    Step 3 : CMD ["mono", "/src/hello.exe"]
     ---> Running in 3f75e540219a
     ---> 1150949428b2
    Successfully built 1150949428b2
    Removing intermediate container 88d2d28f12ab
    Removing intermediate container 60a3582ab6a3
    Removing intermediate container 3f75e540219a

You can see Docker executing each build step in turn and storing the intermediate result until the final image is created. Because we used the tag (-t) option and named our image ‘hello’, we can see it when we list all the docker images:

    $ sudo docker images
    REPOSITORY                              TAG      IMAGE ID       CREATED          VIRTUAL SIZE
    hello                                   latest   1150949428b2   10 seconds ago   396.4 MB
    mikehadlow/ubuntu-monoxide-mono-devel   latest   f259e029fcdd   24 hours ago     394.7 MB
    ubuntu                                  13.10    9f676bd305a4   8 weeks ago      178 MB
    ubuntu                                  saucy    9f676bd305a4   8 weeks ago      178 MB
    ...

Now let’s run our image.
Each time we do this, Docker will create a new container from the image and run it:

    $ sudo docker run hello
    Hello World

And that’s it. Imagine that instead of our little hello.exe, this image contained our web application, or maybe a service in some distributed software. In order to deploy it, we’d simply ask Docker to run it on any server we like: development, test, production, or on many servers in a web farm. This is an incredibly powerful way of doing consistent, repeatable deployments. To reiterate, I think Docker is a game changer for large server side software. It’s one of the most exciting developments to have emerged this year and definitely worth your time to check out.

April 3, 2014

by Mike Hadlow

· 11,309 Views

Docker: Bulk Remove Images and Containers

I’ve just started looking at Docker. It’s a cool new technology that has the potential to make the management and deployment of distributed applications a great deal easier. I’d very much recommend checking it out. I’m especially interested in using it to deploy Mono applications because it promises to remove the hassle of deploying and maintaining the mono runtime on a multitude of Linux servers.

I’ve been playing around creating new images and containers and debugging my Dockerfile, and I’ve wound up with lots of temporary containers and images. It’s really tedious repeatedly running ‘docker rm’ and ‘docker rmi’, so I’ve knocked up a couple of bash commands to bulk delete images and containers.

Delete all containers:

    sudo docker ps -a -q | xargs -n 1 -I {} sudo docker rm {}

Delete all un-tagged (or intermediate) images:

    sudo docker rmi $( sudo docker images | grep '<none>' | tr -s ' ' | cut -d ' ' -f 3)
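
To make the text-processing step of the second command explicit, here is a small sketch in Python. It does not call Docker: it works on a pasted sample of `docker images` output, keeps the rows that Docker lists as `<none>`, and returns the third column (the image ID). The function name and the sample rows are made up for illustration, and unlike the shell pipeline it only looks for `<none>` in the first two columns.

```python
def untagged_image_ids(listing):
    """Return image IDs for rows whose repository or tag is '<none>'."""
    ids = []
    for line in listing.splitlines()[1:]:  # skip the header row
        columns = line.split()  # collapses runs of spaces, like `tr -s ' '`
        if len(columns) >= 3 and "<none>" in columns[:2]:
            ids.append(columns[2])  # third field, like `cut -d ' ' -f 3`
    return ids


# Illustrative listing only; not captured from a real Docker host.
SAMPLE = """\
REPOSITORY   TAG      IMAGE ID       CREATED          VIRTUAL SIZE
hello        latest   1150949428b2   10 seconds ago   396.4 MB
<none>       <none>   0e102c1e4f26   2 minutes ago    396.4 MB
<none>       <none>   6075dee41003   2 minutes ago    396.4 MB
ubuntu       13.10    9f676bd305a4   8 weeks ago      178 MB
"""

if __name__ == "__main__":
    print(untagged_image_ids(SAMPLE))
```

Running the same function over a listing before calling ‘docker rmi’ is one way to check which images the pattern would select.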

April 2, 2014

by Mike Hadlow

· 14,675 Views

Distributed Counters Feature Design

This is another experiment with longer posts. Previously, I used the time series example as the bed on which to test some ideas regarding feature design, to explain how we work and in general work out the rough patches along the way. I should probably note that these posts are purely fiction at this point. We have no plans to include a time series feature in RavenDB at this time. I am trying to work out some thoughts in the open and get your feedback.

At any rate, yesterday we had a request for Cassandra style counters on the mailing list. And as long as I am doing a feature design series, I thought that I could talk about how I would go about implementing this. Again, consider this fiction, I have no plans of implementing this at this time.

The essence of what we want is to be able to… count stuff. Efficiently, in a distributed manner, with optional support for cross data center replication. Very roughly, the idea is to have “sub counters”, unique for every node in the system. Whenever you increment the value, we log this to our own sub counter, and then replicate it out. Whenever you read it, we just sum all the data we have from all the sub counters.

Let us outline the various parts of the solution in the same order as the one I used for time series.

Storage

A counter is just a named 64-bit signed integer. A counter name can be any string up to 128 printable characters. The external interface of the storage would look like this:

    public struct CounterIncrement
    {
        public string Name;
        public long Change;
    }

    public struct Counter
    {
        public string Name;
        public string Source;
        public long Value;
    }

    public interface ICounterStorage
    {
        void LocalIncrementBatch(CounterIncrement[] batch);

        Counter[] Read(string name);

        void ReplicatedUpdates(Counter[] updates);
    }

As you can see, this gives us a very simple interface for the storage.
We can either change the data locally (which modifies our own storage) or we can get an update from a replica about its changes. There really isn’t much more to it, to be fair. LocalIncrementBatch() increments a local value, and Read() will return all the values for a counter. There is a little bit of trickery involved in how exactly one would store the counter values. For now, I think we’ll store each counter as two step values. We’ll have a tree of multi tree values that will carry each value from each source. That means that a counter will take roughly 4KB or so. This is easy to work with and nicely fits the model Voron uses internally.

Note that we’ll outline additional requirements for storage (searching for counters by prefix, iterating over counters, addresses of other servers, stats, etc) below. I’m not showing them here because they aren’t the major issue yet.

Over the wire

Skipping out on any optimizations that might be required, we will expose the following endpoints:

    GET /counters/read?id=users/1/visits&users/1/posts
        <-- will return a JSON response with all the relevant values (already summed up).
        { "users/1/visits": 43, "users/1/posts": 3 }

    GET /counters/read?id=users/1/visits&users/1/posts&raw=true
        <-- will return a JSON response with all the relevant values, per source.
        { "users/1/visits": { "rvn1": 21, "rvn2": 22 }, "users/1/posts": { "rvn1": 2, "rvn3": 1 } }

    POST /counters/increment
        <-- allows incrementing counters. The request is a JSON array of the counter name and the change.

For a real system, you’ll probably need a lot more stuff: metrics, stats, etc. But this is the high level design, so this would be enough. Note that we are skipping the high performance stream based writes we outlined for time series. We probably won’t need them, so that doesn’t matter, but they are an option if we need them.

System behavior

This is where it is really not interesting; there is very little behavior here, actually.
We only have to read the data from the storage, sum it up, and send it to the user. Hardly what I’d call business logic.

Client API

The client API will probably look something like this:

    counters.Increment("users/1/posts");
    counters.Increment("users/1/visits", 4);

    using (var batch = counters.Batch())
    {
        batch.Increment("users/1/posts");
        batch.Increment("users/1/visits", 5);
        batch.Submit();
    }

Note that we’re offering both a batch and a single API. We’ll likely also want to offer a fire & forget style, which will be able to offer even better performance (because it could do batching across more than a single thread), but that is out of scope for now.

For simplicity’s sake, we are going to have the client be just a container for all of the endpoints that it knows about. The container would be responsible for… updating the client visible topology, selecting the best server to use at any given point, etc.

User interface

There isn’t much to it. Just show a list of counter values in a list. Allow searching by prefix, allow diving into a particular counter and reading its raw values, but that is about it. Oh, and allow deleting a counter.

Deleting data

Honestly, I really hate deletes. They are very expensive to handle properly the moment you have more than a single node. In this case, there is an inherent race condition between a delete going out and another node getting an increment. And then there is the issue of what happens if you had a node down when you did the delete, etc. This just sucks. Deletions are handled normally (with the race condition caveat, obviously), and I’ll discuss how we replicate them in a bit.

High availability / scale out

By definition, we actually don’t want to have storage replication here, either log shipping or consensus based. We actually do want to have different values, because we are going to be modifying things independently on many servers. That means that we need to do replication at the database level.
And that leads to some interesting questions. Again, the hard part here is the deletes. Actually, the really hard part is what we are going to do with the new server problem.

The new server problem dictates how we are going to bring a new server into the cluster. If we could fix the size of the cluster, that would make things a lot easier. However, we are actually interested in being able to dynamically grow the cluster size. Therefore, there are only two real ways to do it:

- Add a new empty node to the cluster, and have it be filled from all the other servers.
- Add a new node by backing up an existing node, and restoring as a new node.

RavenDB, for example, follows the first option. But it means that it needs to track a lot more information. The second option is actually a lot simpler, because we don’t need to care about keeping around old data. However, this means that the process of bringing up a new server would now be:

- Update all nodes in the cluster with the new node address (the node isn’t up yet, replication to it will fail and be queued).
- Back up an existing node and restore at the new node.
- Start the new node.

The order of steps is quite important, and it would be easy to get it wrong. Also, on large systems, backup & restore can take a long time. Operationally speaking, I would much rather just be able to do something like bring a new node into the cluster in “silent” mode. That is, it would get information from all the other nodes, and I can “flip the switch” and make it visible to clients at any point in time. That is how you do it with RavenDB, and it is an incredibly powerful system, when used properly.

That means that for all intents and purposes, we don’t do real deletes. What we’ll actually do is replace the counter value with a delete marker. This turns deletes into a much simpler “just another write”.
It has the sad implication of not freeing disk space on deletes, but deletes tend to be rare, and it is usually fine to add a “purge” admin option that can be run on an as needed basis. But that brings us to an interesting issue: how do we actually handle replication?

The topology map

To simplify things, we are going to go with one way replication from a node to another. That allows complex topologies like master-master, cluster-cluster, replication chain, etc. But in the end, this is all about a single node replicating to another. The first question to ask is, are we going to replicate just our local changes, or are we going to have to replicate external changes as well?

The problem with replicating external changes is that you may have the following topology: Now, server A got a value and sent it to server B. Server B then forwarded it to server C. However, at that point, we also have the value from server A replicated directly to server C. Which value is it supposed to pick? And what about a scenario where you have a more complex topology? In general, because in this type of system we can have any node accept writes, and we actually desire this to be the case, we don’t want this behavior. We want to only replicate local data, not all the data.

Of course, that leads to an annoying question: what happens if we have a 3 node cluster, and one node fails catastrophically? We can bring a new node in, and the other two nodes will be able to fill in their values via replication, but what about the node that is down? The data isn’t gone, it is still right there in the other two nodes, but we need a way to pull it out. Therefore, I think that the best option would be to say that nodes only replicate their local state, except in the case of a new node. A new node will be told the address of an existing node in the cluster, at which point it will:

- Register itself in all the nodes in the cluster (discoverable from the existing node). This assumes a standard two way replication link between all servers; if this isn’t the case, the operators would have the responsibility to set up the actual replication semantics on their own.
- The new node now starts getting updates from all the nodes in the cluster. It keeps them in a log for now, not doing anything yet.
- Ask that node for a complete update of all of its current state.
- When it has the complete state of the existing node, it replays all of the remembered logs that it didn’t have a chance to apply yet.
- Then it announces that it is in a valid state to start accepting client connections.

Note that this process is likely to be very sensitive to high data volumes. That is why you’ll usually want to select a backup node to read from, and that decision is an ops decision. You’ll also want to be able to report extensively on the current status of the node, since this can take a while, and ops will be watching this very closely.

Server name

A node requires a unique name. We can use GUIDs, but those aren’t readable, so we can use machine name + port, but those can change. Ideally, we can require the user to set us up with a unique name. That is important for readability and for being able to later see all the values we have in all the nodes. It is important that names are never repeated, so we’ll probably have a GUID there anyway, just to be on the safe side.

Actual replication semantics

Since we have the new server problem down to an automated process, we can choose the drastically simpler model of just having an internal queue for each replication destination. Whenever we make a change, we also make a note of that in the queue for that destination, then we start an async replication process to that server, sending all of our updates there. It is always safe to overwrite data using replication, because we are overwriting our own data, never anyone else’s. And… that is about it, actually.
there are probably a lot of details that i am missing / would discover if we were to actually implement this. but i think that this is a pretty good idea about what this feature is about.
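The queue-per-destination model described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the author's implementation: the class name `ReplicationQueues`, the `Change` type and all method names are hypothetical, and the actual async sending is left out. It shows only the bookkeeping: every locally accepted write is noted in the queue of every destination, and a sender drains one destination's queue at a time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch: one outgoing queue per replication destination. */
public class ReplicationQueues {

    /** A single local write, tagged with the name of the node that accepted it. */
    public static final class Change {
        public final String originNode;
        public final String key;
        public final String value;

        Change(String originNode, String key, String value) {
            this.originNode = originNode;
            this.key = key;
            this.value = value;
        }
    }

    private final String localNode;
    private final Map<String, Queue<Change>> queues = new LinkedHashMap<>();

    public ReplicationQueues(String localNode) {
        this.localNode = localNode;
    }

    /** Registers a destination; registering the same name twice is a no-op. */
    public synchronized void addDestination(String destination) {
        queues.putIfAbsent(destination, new ArrayDeque<>());
    }

    /**
     * Called only for writes accepted by this node. Writes that arrived via
     * replication never pass through here, so only local data is forwarded.
     */
    public synchronized void recordLocalChange(String key, String value) {
        Change change = new Change(localNode, key, value);
        for (Queue<Change> queue : queues.values()) {
            queue.add(change);
        }
    }

    /** Removes and returns everything pending for one destination, oldest first. */
    public synchronized List<Change> drain(String destination) {
        List<Change> batch = new ArrayList<>();
        Queue<Change> queue = queues.get(destination);
        if (queue != null) {
            while (!queue.isEmpty()) {
                batch.add(queue.poll());
            }
        }
        return batch;
    }
}
```

A background sender would call `drain("node-b")` periodically and ship the batch. A real implementation would presumably remove entries only after the destination acknowledged them, which this sketch does not attempt.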

March 25, 2014

by Oren Eini

· 12,648 Views · 1 Like

Step-by-Step: Live Migrate Multiple (Clustered) VMs in One Line of PowerShell - Revisited

A while back, I wrote an article showing how to Live Migrate Your VMs in One Line of Powershell between non-clustered Windows Server 2012 Hyper-V hosts using Shared Nothing Live Migration. Since then, I’ve been asked a few times how this type of parallel Live Migration would be performed for highly available virtual machines between Hyper-V hosts within a cluster. In this article, we’ll walk through the steps of doing exactly that … via Windows PowerShell on Windows Server 2012 or 2012 R2 or our FREE Hyper-V Server 2012 R2 bare-metal, enterprise-grade hypervisor in a clustered configuration.

Wait! Do I need PowerShell to Live Migrate multiple VMs within a Cluster?

Well, actually … No. You could certainly use the Failover Cluster Manager GUI tool to select multiple highly available virtual machines, right-click and select Move | Live Migration …

Figure: Failover Cluster Manager, performing a multi-VM Live Migration

But you may wish to script this process for other reasons … perhaps to efficiently drain all VMs from a host as part of a maintenance script that will be performing other tasks.

Can I use the same PowerShell cmdlets for Live Migrating within a Cluster?

Well, actually … No again. When VMs are made highly available resources within a cluster, they’re managed as cluster group resources instead of being standalone VM resources. As a result, we have a different set of cluster-aware PowerShell cmdlets that we use when managing these cluster groups. To perform a scripted multi-VM Live Migration, we’ll be leveraging three of these cmdlets: Get-ClusterNode, Get-ClusterGroup and Move-ClusterVirtualMachineRole.

Now, let’s see that one line of PowerShell!

Before getting to the point of actually performing the multi-VM Live Migration in a single PowerShell command line, we first need to set up a few variables to handle the "what" and "where" of moving these VMs. First, let’s specify the name of the cluster with which we’ll be working.
We’ll store it in a $clusterName variable.

    $clusterName = Read-Host -Prompt "Cluster name"

Next, we’ll need to select the cluster node to which we’ll be Live Migrating the VMs. Let’s use the Get-ClusterNode and Out-GridView cmdlets together to prompt for the cluster node and store the value in a $targetClusterNode variable.

    $targetClusterNode = Get-ClusterNode -Cluster $clusterName | Out-GridView -Title "Select Target Cluster Node" `
        -OutputMode Single

And then, we’ll need to create a list of all the VMs currently running in the cluster. We can use the Get-ClusterGroup cmdlet to retrieve this list. Below, we have an example where we are combining this cmdlet with a Where-Object cmdlet to return only the virtual machine cluster groups that are running on any node except the selected target cluster node. After all, it really doesn’t make any sense to Live Migrate a VM to the same node on which it’s currently running!

    $haVMs = Get-ClusterGroup -Cluster $clusterName | Where-Object {($_.GroupType -eq "VirtualMachine") `
        -and ($_.OwnerNode -ne $targetClusterNode.Name)}

We’ve stored the resulting list of VMs in a $haVMs variable.

Ready to Live Migrate!

OK … Now we have all of our variables defined for the cluster, the target cluster node and the list of VMs from which to choose. Here’s our single line of PowerShell to do the magic …

    $haVMs | Out-GridView -Title "Select VMs to Move" -PassThru | Move-ClusterVirtualMachineRole -MigrationType Live `
        -Node $targetClusterNode.Name -Wait 0

Proceed with care: Keep in mind that your target cluster node will need to have sufficient available resources to run the VMs that you select for Live Migration. Of course, it's best to test tasks like this in your lab environment first.

Here’s what is happening in this single PowerShell command line:

- We’re passing the list of VMs stored in the $haVMs variable to the Out-GridView cmdlet.
- Out-GridView prompts for which VMs to Live Migrate and then passes the selected VMs down the PowerShell object pipeline to the Move-ClusterVirtualMachineRole cmdlet.
- This cmdlet initiates the Live Migration for each selected VM, and because it’s using the -Wait 0 parameter, it initiates each Live Migration one after another without waiting for the prior task to finish.

As a result, all of the selected VMs will Live Migrate in parallel, up to the maximum number of concurrent Live Migrations that you’ve configured on these cluster nodes. The VMs selected beyond this maximum will simply queue up and wait their turn. Unlike some competing hypervisors, Hyper-V doesn't impose an artificial hard-coded limit on how many VMs you can Live Migrate concurrently. Instead, it's up to you to set the maximum to a sensible value based on your hardware and network capacity.

Do you have your own PowerShell automation ideas for Hyper-V? Feel free to share your ideas in the Comments section below. See you in the Clouds! - Keith

March 3, 2014

by Keith Mayer

· 10,709 Views

To ServiceMix or Not to ServiceMix

This morning an interesting topic was posted to the Apache ServiceMix user forum, asking the question: To ServiceMix or not ServiceMix. In my mind the short answer is: NO.

Guillaume Nodet, one of the key architects and a long-time committer on Apache ServiceMix, already had his mind set 3 years ago when he wrote this blog post - Thoughts about ServiceMix. What has happened on the ServiceMix project was that the ServiceMix kernel was pulled out of ServiceMix into its own project - Apache Karaf. That happened in spring 2009, which Guillaume also blogged about.

So is all that bad? No, it's IMHO all great. In fact having the kernel as a separate project, and Camel and CXF as the integration and WS/RS frameworks, would have allowed the ServiceMix team to focus on building an ESB that truly had value-add. But that did not happen. ServiceMix did not create a cross-product security model, web console, audit and trace tooling, clustering, governance, service registry, and much more that people were looking for in an ESB (or related to a SOA suite). There were only small pieces of it, but never really baked well into the project.

That said, it's not too late. I think the ServiceMix project is dying, but if a lot of people in the community step up, and contribute and work on these things, then it can bring value to some users. But I seriously doubt this will happen.

PS: 6 years ago I was working as a consultant and looked at the next integration platform for a major Danish organization, and we looked at ServiceMix back then and dismissed it due to its JBI nature, and the new OSGi-based architecture was only just started. And frankly it has taken a long, long time to mature Apache Karaf / Felix / Aries and the other pieces in OSGi to what they are today, to offer a stable and sound platform for users to build their integration applications. That was not the case 4-6 years ago.

Okay No to ServiceMix - what are my options then?

So what should you use instead of ServiceMix?
Well, in my mind you have at least these two options.

1) Use Apache Karaf and add the pieces you need, such as Camel, CXF and ActiveMQ, and build your own ESB. These individual projects have regular releases, and you can upgrade as you need. The ServiceMix project only has the JBI components in addition, which you should NOT use. Only legacy users that got on the old ServiceMix 3.x wagon may need to use these in a graceful upgrade from JBI to Karaf-based containers.

2) Take a look at fabric8. IMHO fabric8 is all that value-add the ServiceMix project did not create, and a lot more. James Strachan just blogged today about some of his thoughts on fabric8, JBoss Fuse, and Karaf. I encourage you to take a read. For example he talks about how fabric becomes poly-container, so you have a much wider choice of which containers/JVMs to run your integration applications on. OSGi is no longer a requirement. (IMHO that is very, very exciting and potentially a game changer.) I encourage you to check out the fabric8 web site, and also read the overview and motivation sections of the documentation. And then check out some of the videos. After the upcoming JBoss Fuse 6.1 release, the Fuse team at Red Hat will have more time and focus to bring the documentation at fabric8 up to date, covering all the functionality we have (there is a lot more), and as well bring out a 1.0 community release using pure community releases. This gives end users a 100% free to use out-of-the-box release. And users looking for a commercial release can then use JBoss Fuse. Best of both worlds.

Summary

Okay, back to the question - to ServiceMix or not? Then NO. Innovation happens outside ServiceMix, and also more and more outside Apache. If you have thoughts then you can share those in comments to this blog, or better yet, get involved in the discussion at the ServiceMix user forum.

PPS: The thoughts on this blog are mine alone, and are not any official words from my employer.

February 12, 2014

by Claus Ibsen

· 16,968 Views

Couchbase .NET SDK 2.0 Development Series: Part 1-1: Server Configuration

This article was originally written by Jeff Morris.

In the introduction to this series, I discussed some of the motivation for rewriting the .NET SDK, the goals, objectives and the major features of the upcoming 2.0 release, and we examined the high-level architecture (10,000 feet view) of a Couchbase Server Client SDK. In this post we will go over the design and development of one of the core configuration components of a Couchbase SDK: Server Configuration.

Introduction

A Couchbase SDK client requires configuration from two sources: the Client Configuration, which defines the IP of the cluster to connect to, the number of connections to use and other important information regarding how the client will interact with the cluster, and the Server Configuration, which defines the current state of the cluster (e.g. number of nodes, buckets that are available, etc.), thus driving the internal state of a client (Cluster Map). This post will only discuss the Server Configuration aspects and will largely revolve around implementing several well-defined interfaces or contracts.

HTTP Streaming Configuration

Currently, most clients use a “bootstrapping” technique via client configuration and a “Streaming Configuration” exposed by the Couchbase REST API. This is supported by Couchbase versions 2.2 and earlier.
The usual approach is as follows:

1) Within the “uris” element of a Client Configuration (semantics vary per client), a URL is defined from which to start the bootstrapping process: http://[SERVER]:8091/pools
2) The response is then parsed and a request is made to get the buckets configuration: http://[SERVER]:8091/pools/default?uuid=[UUID]
3) This response is parsed and another request is made to get the streaming URL from: http://[SERVER]:8091/pools/default/buckets?v=[VERSION]&uuid=[UUID]
4) Finally, the streaming URL connection is made, which is long-lived and raises events in the client with respect to changes in the cluster: http://[SERVER]:8091/pools/default/bucketsStreaming/default?bucket_uuid=[UUID]

The client will then change its internal state to match that of the current server configuration. There are some problems with this approach, among others:

- The “streaming URL” is resource intensive to create and maintain (mainly memory) on the server side.
- During a rebalance or failover situation, the cluster configuration may change many, many times. Each time this happens the client must tear down all of its resources (socket connections, VBucket mappings) and build its state up again and again, which can lead to reduced throughput, increased latency, higher than expected memory and CPU usage, and so on and so forth…
- Operations that are in flight may be terminated and then retried on a new config state; it’s as if the “carpet has been pulled out from underneath them”.
- NOT_MY_VBUCKET responses are handled inefficiently by simply trying the next node in the list; there is no information to help the client decide which node to redirect the operation to.
A New Model for Configuration Management: CCCP

While the streaming HTTP “bootstrapping” approach has worked reasonably well for most clients, the downsides have begun to outweigh the plusses, thus a new model for updating client configuration has been defined and is available starting with the 2.5 version of the Couchbase Server: Client Cluster Configuration Publication or “CCCP”. CCCP introduces a new operation to be used before or after authentication to request configuration, as well as a mechanism for returning configuration information when a NOT_MY_VBUCKET response is returned for a failed operation. In a CCCP-supporting SDK, the client will react by using the configuration to update itself before resending the operation. Note that NOT_MY_VBUCKET is the standard response that is returned by the cluster when the cluster itself has changed (during a rebalance or failover scenario, for example) and the client has not yet “synched” up and is using a stale configuration, resulting in an invalid key mapping. Whereas the “bootstrapping” approach is somewhat of a “pull” type operation, CCCP is either “push” or “pull” depending upon whether the request was initiated by the client (via an explicit CMD_GET_CLUSTER_CONFIG operation) or by the server itself (via a NOT_MY_VBUCKET response to an operation). We will go over CCCP in more detail in a later post.

File Based Configuration

One other semi-supported configuration option exists: file based configuration. File based configuration is primarily useful for testing and development, and we will provide an implementation in the test projects to remove some of the dependencies that are difficult to replicate and/or cause false positives when running the test suite.
Structural Architecture View

Internally the Server Configuration component of the client is a provider-based model, in which multiple implementations of a configuration provider can be configured in the client and then a strategy can be chosen to determine which provider should be used. The default is a simple linear fallback approach, where the first configured provider is used and then, if it fails, the next provider in sequence will take its place. Here is a diagram showing the main actor objects and the relationships with some of the other key objects within the client, which will be discussed in subsequent posts:

A description of each follows:

- ConfigurationProvider: a source which shall yield a new ConfigInfo. It’s the responsibility of the provider to provide the mechanism for fetching the configuration from its source.
- ConfigurationInformation: the configuration info contains a list of possible nodes and the VBucket map informing clients about which servers within said nodes a given key should be forwarded to.
- ConfigurationManager: the bridge between the client and the providers; it holds the strategy used to determine which provider to use and what retry logic to apply.

A more detailed document of this architecture can be found here. Please note that this, like all development, is an evolutionary process, so expect some changes and revisions over time.

Conclusion and Next Steps

This post discussed the history (HTTP Streaming) and the future (CCCP) of Couchbase SDK Server Configuration Management. In the next post we will go into detail on the implementation of the HTTP Streaming configuration provider, which is required for clients targeting pre-2.5 versions of the Couchbase Server.
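The linear fallback strategy described under Structural Architecture View can be illustrated with a small sketch. Note the assumptions: it is written in Java rather than C#, it is not the SDK's actual API, and the names `FallbackConfigManager`, `ConfigProvider`, `fetchConfig` and `loadConfig` are all hypothetical. The configuration itself is reduced to a plain string. The point is only the control flow: try the first configured provider, and move to the next one in sequence when it fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of a manager that falls back through providers in order. */
public class FallbackConfigManager {

    /** Illustrative stand-in for a configuration provider; not the SDK's interface. */
    public interface ConfigProvider {
        String name();

        /** Returns the raw configuration, or throws if the source is unavailable. */
        String fetchConfig() throws Exception;
    }

    private final List<ConfigProvider> providers;

    public FallbackConfigManager(List<ConfigProvider> providers) {
        // Defensive copy so later changes to the caller's list do not affect the order.
        this.providers = new ArrayList<>(providers);
    }

    /** Tries each provider in configured order and returns the first successful result. */
    public Optional<String> loadConfig() {
        for (ConfigProvider provider : providers) {
            try {
                return Optional.of(provider.fetchConfig());
            } catch (Exception e) {
                // This provider failed; report it and fall through to the next one.
                System.err.println(provider.name() + " failed: " + e.getMessage());
            }
        }
        return Optional.empty(); // every provider failed
    }
}
```

A client might be configured with a CCCP-style provider first and an HTTP streaming provider second, so that older servers are still reachable. Retry logic, which the article assigns to the manager as well, is left out of this sketch.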

February 7, 2014

by Don Pinto

· 3,788 Views

Java: Handling a RuntimeException in a Runnable

At the end of last year I was playing around with running scheduled tasks to monitor a Neo4j cluster and one of the problems I ran into was that the monitoring would sometimes exit. I eventually realised that this was because a RuntimeException was being thrown inside the Runnable method and I wasn’t handling it. The following code demonstrates the problem:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.*;

    public class RunnableBlog {
        public static void main(String[] args) throws ExecutionException, InterruptedException {
            ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
            executor.scheduleAtFixedRate(new Runnable() {
                @Override
                public void run() {
                    System.out.println(Thread.currentThread().getName() + " -> " + System.currentTimeMillis());
                    throw new RuntimeException("game over");
                }
            }, 0, 1000, TimeUnit.MILLISECONDS).get();
            System.out.println("exit");
            executor.shutdown();
        }
    }

If we run that code we’ll see the RuntimeException, but the program won’t exit: get() rethrows the failure as an ExecutionException out of main before executor.shutdown() is reached, so the executor’s worker thread stays alive, and the task itself is never run again:

    Exception in thread "main" pool-1-thread-1 -> 1391212558074
    java.util.concurrent.ExecutionException: java.lang.RuntimeException: game over
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
        at java.util.concurrent.FutureTask.get(FutureTask.java:111)
        at RunnableBlog.main(RunnableBlog.java:11)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
    Caused by: java.lang.RuntimeException: game over
        at RunnableBlog$1.run(RunnableBlog.java:16)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

At the time I ended up adding a try/catch block and printing the exception like so:

    import java.util.concurrent.*;

    public class RunnableBlog {
        public static void main(String[] args) throws ExecutionException, InterruptedException {
            ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
            executor.scheduleAtFixedRate(new Runnable() {
                @Override
                public void run() {
                    try {
                        System.out.println(Thread.currentThread().getName() + " -> " + System.currentTimeMillis());
                        throw new RuntimeException("game over");
                    } catch (RuntimeException e) {
                        // Report the failure so the next scheduled run still happens.
                        e.printStackTrace();
                    }
                }
            }, 0, 1000, TimeUnit.MILLISECONDS).get();
            System.out.println("exit");
            executor.shutdown();
        }
    }

This allows the exception to be recognised and as far as I can tell means that the thread executing the Runnable doesn’t die.
    java.lang.RuntimeException: game over
    pool-1-thread-1 -> 1391212651955
        at RunnableBlog$1.run(RunnableBlog.java:16)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
    pool-1-thread-1 -> 1391212652956
    java.lang.RuntimeException: game over
        at RunnableBlog$1.run(RunnableBlog.java:16)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
    pool-1-thread-1 -> 1391212653955
    java.lang.RuntimeException: game over
        at RunnableBlog$1.run(RunnableBlog.java:16)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

This worked well and allowed me to keep monitoring the cluster. However, I recently started reading ‘Java Concurrency in Practice’ (only 6 years after I bought it!) and realised that this might not be the proper way of handling the RuntimeException.

    import java.util.concurrent.*;

    public class RunnableBlog {
        public static void main(String[] args) throws ExecutionException, InterruptedException {
            ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
            executor.scheduleAtFixedRate(new Runnable() {
                @Override
                public void run() {
                    try {
                        System.out.println(Thread.currentThread().getName() + " -> " + System.currentTimeMillis());
                        throw new RuntimeException("game over");
                    } catch (RuntimeException e) {
                        // Hand the exception to whichever handler is configured for this thread.
                        Thread t = Thread.currentThread();
                        t.getUncaughtExceptionHandler().uncaughtException(t, e);
                    }
                }
            }, 0, 1000, TimeUnit.MILLISECONDS).get();
            System.out.println("exit");
            executor.shutdown();
        }
    }

I don’t see much difference between the two approaches so it’d be great if someone could explain to me why this approach is better than my previous one of catching the exception and printing the stack trace.
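For context on why the catch block matters at all: the documentation for scheduleAtFixedRate states that if any execution of the task throws an exception, subsequent executions are suppressed. One practical difference between the two approaches is that the catch block can delegate to a handler chosen elsewhere instead of hard-coding printStackTrace. The sketch below makes that explicit with a reusable wrapper. The names `SafeTasks`, `reporting` and `keepsRunning` are made up for this illustration and do not come from the book or the JDK.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper: keeps a periodic task alive by reporting its failures. */
public class SafeTasks {

    /** Wraps a task so that a RuntimeException is passed to onError instead of escaping. */
    public static Runnable reporting(Runnable task, Consumer<RuntimeException> onError) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                onError.accept(e);
            }
        };
    }

    /**
     * Schedules a task that fails on every run and reports whether it was still
     * executed at least `runs` times within the timeout.
     */
    public static boolean keepsRunning(int runs, long timeoutMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch latch = new CountDownLatch(runs);
        try {
            Runnable failing = () -> {
                latch.countDown();
                throw new RuntimeException("game over");
            };
            executor.scheduleAtFixedRate(
                    reporting(failing, e -> System.err.println("task failed: " + e.getMessage())),
                    0, 10, TimeUnit.MILLISECONDS);
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } finally {
            executor.shutdownNow(); // always release the worker thread
        }
    }
}
```

Calling `SafeTasks.keepsRunning(3, 5000)` schedules the failing task every 10 ms and waits for three runs. Passing the failing task without the `reporting` wrapper would let the first exception cancel the schedule. The `finally` block also avoids the hang seen in the first example, since the executor is shut down whether or not anything throws.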

February 6, 2014

by Mark Needham

· 19,654 Views

How to Set Up a Multi-Node Hadoop Cluster on Amazon EC2, Part 1

Learn how to set up a four-node Hadoop cluster using AWS EC2, PuTTY(gen), and WinSCP.

January 23, 2014

by Hardik Pandya

· 136,005 Views · 3 Likes