Cloud Architecture Resources

The Latest Cloud Architecture Topics

The API Gateway Pattern: Angular JS and Spring Security Part IV

Written by Dave Syer in the Spring blog In this article we continue our discussion of how to use Spring Security with Angular JS in a “single page application”. Here we show how to build an API Gateway to control the authentication and access to the backend resources using Spring Cloud. This is the fourth in a series of articles, and you can catch up on the basic building blocks of the application or build it from scratch by reading the first article, or you can just go straight to the source code in Github. In the last article we built a simple distributed application that used Spring Session to authenticate the backend resources. In this one we make the UI server into a reverse proxy to the backend resource server, fixing the issues with the last implementation (technical complexity introduced by custom token authentication), and giving us a lot of new options for controlling access from the browser client. Reminder: if you are working through this article with the sample application, be sure to clear your browser cache of cookies and HTTP Basic credentials. In Chrome the best way to do that for a single server is to open a new incognito window. Creating an API Gateway An API Gateway is a single point of entry (and control) for front end clients, which could be browser based (like the examples in this article) or mobile. The client only has to know the URL of one server, and the backend can be refactored at will with no change, which is a significant advantage. There are other advantages in terms of centralization and control: rate limiting, authentication, auditing and logging. And implementing a simple reverse proxy is really simple with Spring Cloud. If you were following along in the code, you will know that the application implementation at the end of the last article was a bit complicated, so it’s not a great place to iterate away from. There was, however, a halfway point which we could start from more easily, where the backend resource wasn’t yet secured with Spring Security. The source code for this is a separate project in Github so we are going to start from there. It has a UI server and a resource server and they are talking to each other. The resource server doesn’t have Spring Security yet so we can get the system working first and then add that layer. Declarative Reverse Proxy in One Line To turn it into an API Gateawy, the UI server needs one small tweak. Somewhere in the Spring configuration we need to add an @EnableZuulProxy annotation, e.g. in the main (only)application class: @SpringBootApplication @RestController @EnableZuulProxy public class UiApplication { ... } and in an external configuration file we need to map a local resource in the UI server to a remote one in the external configuration (“application.yml”): security: ... zuul: routes: resource: path: /resource/** url: http://localhost:9000 This says “map paths with the pattern /resource/** in this server to the same paths in the remote server at localhost:9000”. Simple and yet effective (OK so it’s 6 lines including the YAML, but you don’t always need that)! All we need to make this work is the right stuff on the classpath. For that purpose we have a few new lines in our Maven POM: org.springframework.cloud spring-cloud-starter-parent 1.0.0.BUILD-SNAPSHOT pom import org.springframework.cloud spring-cloud-starter-zuul ... Note the use of the “spring-cloud-starter-zuul” - it’s a starter POM just like the Spring Boot ones, but it governs the dependencies we need for this Zuul proxy. We are also using because we want to be able to depend on all the versions of transitive dependencies being correct. Consuming the Proxy in the Client With those changes in place our application still works, but we haven’t actually used the new proxy yet until we modify the client. Fortunately that’s trivial. We just need to go from this implementation of the “home” controller: angular.module('hello', [ 'ngRoute' ]) ... .controller('home', function($scope, $http) { $http.get('http://localhost:9000/').success(function(data) { $scope.greeting = data; }) }); to a local resource: angular.module('hello', [ 'ngRoute' ]) ... .controller('home', function($scope, $http) { $http.get('resource/').success(function(data) { $scope.greeting = data; }) }); Now when we fire up the servers everything is working and the requests are being proxied through the UI (API Gateway) to the resource server. Further Simplifications Even better: we don’t need the CORS filter any more in the resource server. We threw that one together pretty quickly anyway, and it should have been a red light that we had to do anything as technically focused by hand (especially where it concerns security). Fortunately it is now redundant, so we can just throw it away, and go back to sleeping at night! Securing the Resource Server You might remember in the intermediate state that we started from there is no security in place for the resource server. Aside: Lack of software security might not even be a problem if your network architecture mirrors the application architecture (you can just make the resource server physically inaccessible to anyone but the UI server). As a simple demonstration of that we can make the resource server only accessible on localhost. Just add this to application.properties in the resource server: server.address: 127.0.0.1 Wow, that was easy! Do that with a network address that’s only visible in your data center and you have a security solution that works for all resource servers and all user desktops. Suppose that we decide we do need security at the software level (quite likely for a number of reasons). That’s not going to be a problem, because all we need to do is add Spring Security as a dependency (in the resource server POM): org.springframework.boot spring-boot-starter-security That’s enough to get us a secure resource server, but it won’t get us a working application yet, for the same reason that it didn’t in Part III: there is no shared authentication state between the two servers. Sharing Authentication State We can use the same mechanism to share authentication (and CSRF) state as we did in the last, i.e. Spring Session. We add the dependency to both servers as before: org.springframework.session spring-session 1.0.0.RELEASE org.springframework.boot spring-boot-starter-redis but this time the configuration is much simpler because we can just add the same Filterdeclaration to both. First the UI server (adding @EnableRedisHttpSession): @SpringBootApplication @RestController @EnableZuulProxy @EnableRedisHttpSession public class UiApplication { ... } and then the resource server. There are two changes to make: one is adding@EnableRedisHttpSession and a HeaderHttpSessionStrategy bean to theResourceApplication: @SpringBootApplication @RestController @EnableRedisHttpSession class ResourceApplication { ... @Bean HeaderHttpSessionStrategy sessionStrategy() { new HeaderHttpSessionStrategy(); } } and the other is to explicitly ask for a non-stateless session creation policy inapplication.properties: security.sessions: NEVER As long as redis is still running in the background (use the fig.yml if you like to start it) then the system will work. Load the homepage for the UI at http://localhost:8080 and login and you will see the message from the backend rendered on the homepage. How Does it Work? What is going on behind the scenes now? First we can look at the HTTP requests in the UI server (and API Gateway): VERB PATH STATUS RESPONSE GET / 200 index.html GET /css/angular-bootstrap.css 200 Twitter bootstrap CSS GET /js/angular-bootstrap.js 200 Bootstrap and Angular JS GET /js/hello.js 200 Application logic GET /user 302 Redirect to login page GET /login 200 Whitelabel login page (ignored) GET /resource 302 Redirect to login page GET /login 200 Whitelabel login page (ignored) GET /login.html 200 Angular login form partial POST /login 302 Redirect to home page (ignored) GET /user 200 JSON authenticated user GET /resource 200 (Proxied) JSON greeting That’s identical to the sequence at the end of Part II except for the fact that the cookie names are slightly different (“SESSION” instead of “JSESSIONID”) because we are using Spring Session. But the architecture is different and that last request to “/resource” is special because it was proxied to the resource server. We can see the reverse proxy in action by looking at the “/trace” endpoint in the UI server (from Spring Boot Actuator, which we added with the Spring Cloud dependencies). Go tohttp://localhost:8080/trace in a browser and scroll to the end (if you don’t have one already get a JSON plugin for your browser to make it nice and readable). You will need to authenticate with HTTP Basic (browser popup), but the same credentials are valid as for your login form. At or near the end you should see a pair of requests something like this: { "timestamp": 1420558194546, "info": { "method": "GET", "path": "/", "query": "" "remote": true, "proxy": "resource", "headers": { "request": { "accept": "application/json, text/plain, */*", "x-xsrf-token": "542c7005-309c-4f50-8a1d-d6c74afe8260", "cookie": "SESSION=c18846b5-f805-4679-9820-cd13bd83be67; XSRF-TOKEN=542c7005-309c-4f50-8a1d-d6c74afe8260", "x-forwarded-prefix": "/resource", "x-forwarded-host": "localhost:8080" }, "response": { "Content-Type": "application/json;charset=UTF-8", "status": "200" } }, } }, { "timestamp": 1420558200232, "info": { "method": "GET", "path": "/resource/", "headers": { "request": { "host": "localhost:8080", "accept": "application/json, text/plain, */*", "x-xsrf-token": "542c7005-309c-4f50-8a1d-d6c74afe8260", "cookie": "SESSION=c18846b5-f805-4679-9820-cd13bd83be67; XSRF-TOKEN=542c7005-309c-4f50-8a1d-d6c74afe8260" }, "response": { "Content-Type": "application/json;charset=UTF-8", "status": "200" } } } }, The second entry there is the request from the client to the gateway on “/resource” and you can see the cookies (added by the browser) and the CSRF header (added by Angular as discussed inPart II). The first entry has remote: true and that means it’s tracing the call to the resource server. You can see it went out to a uri path “/” and you can see that (crucially) the cookies and CSRF headers have been sent too. Without Spring Session these headers would be meaningless to the resource server, but the way we have set it up it can now use those headers to re-constitute a session with authentication and CSRF token data. So the request is permitted and we are in business! Conclusion We covered quite a lot in this article but we got to a really nice place where there is a minimal amount of boilerplate code in our two servers, they are both nicely secure and the user experience isn’t compromised. That alone would be a reason to use the API Gateway pattern, but really we have only scratched the surface of what that might be used for (Netflix uses it for a lot of things). Read up on Spring Cloud to find out more on how to make it easy to add more features to the gateway. The next article in this series will extend the application architecture a bit by extracting the authentication responsibilities to a separate server (the Single Sign On pattern).

February 9, 2015

by Pieter Humphrey

· 16,349 Views

How-To: Setup Development Environment for Hadoop MapReduce

This post is intended for folks who are looking out for a quick start on developing a basic Hadoop MapReduce application. We will see how to set up a basic MR application for WordCount using Java, Maven and Eclipse and run a basic MR program in local mode , which is easy for debugging at an early stage. Assuming JDK 1.6+ is already installed and Eclipse has a setup for Maven plugin and download from default maven repository is not restriced. Problem Statement : To count the occurrence of each word appearing in an input file using MapReduce. Step 1 : Adding Dependency Create a maven project in eclipse and use following code in your pom.xml. 4.0.0 com.saurzcode.hadoop MapReduce 0.0.1-SNAPSHOT jar org.apache.hadoop hadoop-client 2.2.0 Upon saving it should download all required dependencies for running a basic Hadoop MapReduce program. Step 2 : Mapper Program Map step involves tokenizing the file, traversing the words, and emitting a count of one for each word that is found. Our mapper class should extend Mapper class and override it’s map method. When this method is called the value parameter of the method will contain a chunk of the lines of file to be processed and the output parameter is used to emit word instances. In real world clustered setup, this code will run on multiple nodes which will be consumed by set of reducers to process further. public class WordCountMapper extends Mapper { private final IntWritable ONE = new IntWritable(1); private Text word = new Text(); public void map(Object key, Text value, Context context) throws IOException, InterruptedException { String line = value.toString(); StringTokenizer tokenizer = new StringTokenizer(line); while(tokenizer.hasMoreTokens()) { word.set(tokenizer.nextToken()); context.write(word, ONE); } } } Step 3 : Reducer Program Our reducer extends the Reducer class and implement logic to sum up each occurrence of word token received from mappers.Output from Reducers will go to the output folder as a text file ( default or as configured in Driver program for Output format) named as part-r-00000 along with a _SUCCESS file. public class WordCountReducer extends Reducer { public void reduce(Text text, Iterable values, Context context) throws IOException, InterruptedException { int sum = 0; for (IntWritable value : values) { sum += value.get(); } context.write(text, new IntWritable(sum)); } } Step 4 : Driver Program Our driver program will configure the job by supplying the map and reduce program we just wrote along with various input , output parameters. public class WordCount { public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException { Path inputPath = new Path(args[0]); Path outputDir = new Path(args[1]); // Create configuration Configuration conf = new Configuration(true); // Create job Job job = new Job(conf, "WordCount"); job.setJarByClass(WordCountMapper.class); // Setup MapReduce job.setMapperClass(WordCountMapper.class); job.setReducerClass(WordCountReducer.class); job.setNumReduceTasks(1); // Specify key / value job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); // Input FileInputFormat.addInputPath(job, inputPath); job.setInputFormatClass(TextInputFormat.class); // Output FileOutputFormat.setOutputPath(job, outputDir); job.setOutputFormatClass(TextOutputFormat.class); // Delete output if exists FileSystem hdfs = FileSystem.get(conf); if (hdfs.exists(outputDir)) hdfs.delete(outputDir, true); // Execute job int code = job.waitForCompletion(true) ? 0 : 1; System.exit(code); } } That’s It !! We are all set to execute our first MapReduce Program in eclipse in local mode. Let’s assume there is an input text file called input.txt in folder input which contains following text : foo bar is foo count count foo for saurzcode Expected output : foo 3 bar 1 is 1 count 2 for 1 saurzcode 1 Let’s run this program in eclipse as Java Application :- We need to give path to input and output folder/file to the program as argument.Also, note output folder shouldn’t exist before running this program else program will fail. java com.saurzcode.mapreduce.WordCount input/inputfile.txt output If this program runs successfully emitting set of lines while it is executing mappers and reducers, we should see a output folder and with following files : output/ _SUCCESS part-r-00000

January 30, 2015

by Saurabh Chhajed

· 13,107 Views · 1 Like

Mule ESB in Docker

In this article I will attempt to run the Mule ESB community edition in Docker in order to see whether it is feasible without any greater inconvenience. My goal is to be able to use Docker both when testing as well as in a production environment in order to gain better control over the environment and to separate different types of environments. I imagine that most of the Docker-related information can be applied to other applications – I have used Mule since it is what I usually work with. The conclusion I have made after having completed my experiments is that it is possible to run Mule ESB in Docker without any inconvenience. In addition, Docker will indeed allow me to have better control over the different environments and also allow me to separate them as I find appropriate. Finally, I just want to mention that I have used Docker in an Ubuntu environment. I have not attempted any of the exercises in Docker running on Windows or Mac OS X. Docker Briefly In short, Docker allows for creating of images that serve as blueprints for containers. A Docker container is an instance of a Docker image in the same way a Java object is an instance of a Java class. FROM codingtony/java MAINTAINER tony(dot)bussieres(at)ticksmith(dot)com RUN wget https://repository.mulesoft.org/nexus/content/repositories/releases/org/mule/distributions/mule-standalone/3.5.0/mule-standalone-3.5.0.tar.gz RUN cd /opt && tar xvzf ~/mule-standalone-3.5.0.tar.gz RUN echo "4a94356f7401ac8be30a992a414ca9b9 /mule-standalone-3.5.0.tar.gz" | md5sum -c RUN rm ~/mule-standalone-3.5.0.tar.gz RUN ln -s /opt/mule-standalone-3.5.0 /opt/mule CMD [ "/opt/mule/bin/mule" ] The resource isolation features of Linux are used to create Docker containers, which are more lightweight than virtual machines and are separated from the environment in which Docker runs, the host. Using Docker an image can be created that, every time it is started has a known state. In order to remove any doubts about whether the environment has been altered in any way, the container can be stopped and a new container started. I can even run multiple Docker containers on one and the same computer to simulate a multi-server production environment. Applications can also be run in their own Docker containers, as shown in this figure. Three Docker containers, each containing a specific application, running in one host. A more detailed introduction to Docker is available here. The main entry point to the Docker documentation can be found here. Motivation Some of the motivations I have for using Docker in both testing and production environments are: The environment in which I test my application should be as similar as the final deployment environment as possible, if not identical. Making the deployment environment easy to scale up and down. If it is easy to start a new processing node when need arise and stop it if it is no longer used, I will be able to adapt to changes rather quickly and thus reduce errors caused by, for instance, load peaks. Maintain an increased number of nodes to which applications can be deployed. Instead of running one instance of some kind of application server, Mule ESB in my case, on a computer, I want multiple instances that are partitioned, for instance, according to importance. High-priority applications run on one separate instance, which have higher priority both as far as resources (CPU, memory, disk etc) are concerned but also as far as support is concerned. Applications which are less critical run on another instance. Enable quick replacement of instances in the deployment environment. Reasons for having to replace instances may be hardware failure etc. Better control over the contents of the different environments. The concept of an environment that, at any time, may be disposed (and restarted) discourages hacks in the environment, which are usually poorly documented and sometimes difficult to trace. Using Docker, I need to change the appropriate Docker image if I want to make changes to some application environment. The Docker image file, commonly known as Dockerfile, can be checked into any ordinary revision control system, such as Git, Subversion etc, making changes reversible and traceable. Automate the creation of a testing environment. An example could be a nightly job that runs on my build server which creates a test environment, deploys one or more applications to it and then performs tests, such as load-testing. Prerequisites To get the best possible experience when running Docker, I run it under Ubuntu. According to the current documentation, Docker is supported under the following versions of Ubuntu: 12.04 LTS (64-bit) 13.04 (64-bit) 13.10 (64-bit) 14.04 (64-bit) Against my usual conservative self, I chose Ubuntu 14.10, which at the time of writing this article is the latest version. While I haven’t run into any issues, I cannot promise anything regarding compatibility with Docker as far as this version of Ubuntu is concerned. Installing Docker Before we install anything, those who have the Docker version from the Ubuntu repository should remove this version before installing a newer version of Docker, since the Ubuntu repository does not contain the most recent version and the package does not have the same name as the Docker package we will install: sudo apt-get remove docker.io The simplest way to install Docker is to use an installation script made available at the Docker website: curl -sSL https://get.docker.com/ubuntu/ | sudo sh If you are not running Ubuntu or if you do not want to use the above way of installing Docker, please refer to this page containing instructions on how to install Docker on various platforms. To verify the Docker installation, open a terminal window and enter: sudo docker version Output similar to the following should appear: Client version: 1.4.1 Client API version: 1.16 Go version (client): go1.3.3 Git commit (client): 5bc2ff8 OS/Arch (client): linux/amd64 Server version: 1.4.1 Server API version: 1.16 Go version (server): go1.3.3 Git commit (server): 5bc2ff8 We are now ready to start a Mule instance in Docker. Running Mule in Docker One of the advantages with Docker is that there is a large repository of Docker images that are ready to be used, and even extended if one so wishes. ThisDocker image is the one that I will use in this article. It is well documented, there is a source repository and it contains a recent version of the Mule ESB Community Edition. Some additional details on the Docker image: Ubuntu 14.04. Oracle JavaSE 1.7.0_65. This version will change as the PPA containing the package is updated. Mule ESB CE 3.5.0 Note that the image may change at any time and the specifications above may have changed. If you intend to use Docker in your organization, I would suspect that the best alternative is to create your own Docker images that are totally under your control. The Docker image repository is an excellent source of inspiration and aid even in this case. Starting a Docker Container To start a Docker container using this image, open a terminal window and write: sudo docker run codingtony/mule The first time an image is used it needs to be downloaded and created. This usually takes quite some time, so I suggest a short break here – perhaps for a cup of coffee or tea. If you just want to download an image without starting it, exchange the Docker command “run” with “pull”. Once the container is started, you will see some output to the console. If you are familiar with Mule, you will recognize the log output: MULE_HOME is set to /opt/mule-standalone-3.5.0 Running in console (foreground) mode by default, use Ctrl-C to exit... MULE_HOME is set to /opt/mule-standalone-3.5.0 Running Mule... --> Wrapper Started as Console Launching a JVM... Starting the Mule Container... Wrapper (Version 3.2.3) http://wrapper.tanukisoftware.org Copyright 1999-2006 Tanuki Software, Inc. All Rights Reserved. INFO 2015-01-05 04:41:42,302 [WrapperListener_start_runner] org.mule.module.launcher.MuleContainer: ********************************************************************** * Mule ESB and Integration Platform * * Version: 3.5.0 Build: ff1df1f3 * * MuleSoft, Inc. * * For more information go to http://www.mulesoft.org * * * * Server started: 1/5/15 4:41 AM * * JDK: 1.7.0_65 (mixed mode) * * OS: Linux (3.16.0-28-generic, amd64) * * Host: f95698cfb796 (172.17.0.2) * ********************************************************************** Note that: In the text-box containing information about the Mule ESB and Integration Platform, there is a row which starts with “Host:”. The hexadecimal digit that follows is the Docker container id and the IP-address is the external IP-address of the Docker container in which Mule is running. Before we do anything with the Mule instance running in Docker, let’s take a look at Docker containers. Docker Containers We can verify that there is a Docker container running by opening another terminal window, or a tab in the first terminal window, and running the command: sudo docker ps As a result, you will see output similar to the following (I have edited the output in order for the columns to be aligned with the column titles): CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES f95698cfb796 codingtony/mule:latest "/opt/mule/bin/mule" 7 min ago Up 7 min jolly_hopper From this output we can see that: The ID of the container is f95698cfb796. This ID can be used when performing operations on the container, such as stopping it, restarting it etc. The name of the image used to created the container. The command that is currently executing. If we look at the Dockerfile for the image, we can see that the last line in this file is: CMD [ “/opt/mule/bin/mule” ] This is the command that is executed whenever an instance of the Docker image is launched and it matches what we see in the COMMAND column for the Docker container. The CREATED column shows how much time has passed since the container was created. The STATUS column shows the current status of the image. When you have used Docker for a while, you can view all the containers using: sudo docker ps -a This will show you containers that are not running, in addition to the running ones. Containers that are not running can be restarted. The PORTS column shows any port mappings for the container. More about port mappings later. Finally, the NAMES column contain a more human-friendly container name. This container name can be used in the same way as the container id. Docker containers will consume disk-space and if you want to determine how much disk-space each of the containers on your computer use, issue the following command: sudo docker ps -a -s An additional column, SIZE, will be shown and in this column I see that my Mule container consumes 41,76kB. Note that this is in addition to the disk-space consumed by the Docker image. This number will grow if you use the container under a longer period of time, as the container retains any files written to disk. To completely remove a stopped Docker container, find the id or name of the container and use the command: sudo docker rm [container id or name here] Before going further, let’s stop the running container and remove it: sudo docker stop [container id or name here] sudo docker rm [container id or name here] Files and Docker Containers So far we have managed to start a Mule instance running inside a Docker container, but there were no Mule applications deployed to it and the logs that were generated were only visible in the terminal window. I want to be able to deploy my applications to the Mule instance and examine the logs in a convenient way. In this section I will show how to: Share one or more directories in the host file-system with a Docker container. Access the files in a Docker container from the host. As the first step in looking at sharing directories between the host operating system and a Docker container, we are going to look at Mule logs. As part of this exercise we also set up the directories in the host operating system that are going to be shared with the Docker container. In your home directory, create a directory named “mule-root”. In the “mule-root” directory, create three directories named “apps”, “conf” and “logs”. Download the Mule CE 3.5.0 standalone distribution from this link. From the Mule CE 3.5.0 distribution, copy the files in the “apps” directory to the “mule-root/apps” directory you just created. From the Mule CE 3.5.0 distribution, copy the files in the “conf” directory to the “mule-root/conf” directory you created. The resulting file- and directory-structure should look like this (shown using the tree command): ~/mule-root/ ├── apps │ └── default │ └── mule-config.xml ├── conf │ ├── log4j.properties │ ├── tls-default.conf │ ├── tls-fips140-2.conf │ ├── wrapper-additional.conf │ └── wrapper.conf └── logs Edit the log4j.properties file in the “mule-root/conf” directory and set the log-level on the last line in the file to “DEBUG”. This modification has nothing to do with sharing directories, but is in order for us to be able to see some more output from Mule when we run it later. The last two lines should now look like this: # Mule classes log4j.logger.org.mule=DEBUG Binding Volumes We are now ready to launch a new Docker container and when we do, we will tell Docker to map three directories in the Docker container to three directories in the host operating system. Three directories in a Docker container bound to three directories in the host. Launch the Docker container with the command below. The -v option tells Docker that we want to make the contents of a directory in the host available at a certain path in the Docker container file-system. The -d option runs the container in the background and the terminal prompt will be available as soon as the id of the newly launched Docker container has been printed. sudo docker run -d -v ~/mule-root/apps:/opt/mule/apps -v ~/mule-root/conf:/opt/mule/conf -v ~/mule-root/logs:/opt/mule/logs codingtony/mule Examine the “mule-root” directory and its subdirectories in the host, which should now look like below. The files on the highlighted rows have been created by Mule. mule-root/ ├── apps │ ├── default │ │ └── mule-config.xml │ └── default-anchor.txt ├── conf │ ├── log4j.properties │ ├── tls-default.conf │ ├── tls-fips140-2.conf │ ├── wrapper-additional.conf │ └── wrapper.conf └── logs ├── mule-app-default.log ├── mule-domain-default.log └── mule.log Examine the “mule.log” file using the command “tail -f ~/mule-root/logs/mule.log”. There should be periodic output written to the log file similar to the following: DEBUG 2015-01-05 12:05:37,216 [Mule.app.deployer.monitor.1.thread.1] org.mule.module.launcher.DeploymentDirectoryWatcher: Checking for changes... DEBUG 2015-01-05 12:05:37,216 [Mule.app.deployer.monitor.1.thread.1] org.mule.module.launcher.DeploymentDirectoryWatcher: Current anchors: default-anchor.txt DEBUG 2015-01-05 12:05:37,216 [Mule.app.deployer.monitor.1.thread.1] org.mule.module.launcher.DeploymentDirectoryWatcher: Deleted anchors: Stop and remove the container: sudo docker stop [container id or name here] sudo docker rm [container id or name here] Direct Access to Docker Container Files When running Docker under the Ubuntu OS it is also possible to access the file-system of a Docker container from the host file-system. It may be possible to do this under other operating systems too, but I haven’t had the opportunity to test this. This technique may come in handy during development or testing with Docker containers for which you haven’t bound any volumes. Note! If given the choice to use either volume binding, as seen above, or direct access to container files as we will look at in this section for something more than a temporary file access, I would chose to use volume binding. Direct access to Docker container files relies on implementation details that I suspect may change in future versions of Docker if the developers find it suitable. With all that said, lets get the action started: Start a new Docker container: sudo docker run -d codingtony/mule Find the id of the newly launched Docker container: sudo docker ps Examine low-level information about the newly launched Docker container: sudo docker inspect [container id or name here] Output similar to this will be printed to the console (portions removed to conserve space): [{ "AppArmorProfile": "", "Args": [], "Config": { ... }, "Created": "2015-01-12T07:58:47.913905369Z", "Driver": "aufs", "ExecDriver": "native-0.2", "HostConfig": { ... }, "HostnamePath": "/var/lib/docker/containers/68b40def7ad6a7f819bd654d5627ad1c3a0f40c84e0fb0f875760f1bd6790eef/hostname", "HostsPath": "/var/lib/docker/containers/68b40def7ad6a7f819bd654d5627ad1c3a0f40c84e0fb0f875760f1bd6790eef/hosts", "Id": "68b40def7ad6a7f819bd654d5627ad1c3a0f40c84e0fb0f875760f1bd6790eef", "Image": "bcd0f37d48d4501ad64bae941d95446b157a6f15e31251e26918dbac542d731f", "MountLabel": "", "Name": "/thirsty_darwin", "NetworkSettings": { ... }, "Path": "/opt/mule/bin/mule", "ProcessLabel": "", "ResolvConfPath": "/var/lib/docker/containers/68b40def7ad6a7f819bd654d5627ad1c3a0f40c84e0fb0f875760f1bd6790eef/resolv.conf", "State": { ... }, "Volumes": {}, "VolumesRW": {} }] Locate the “Driver” node (highlighted in the above output) and ensure that its value is “aufs”. If it is not, you may need to modify the directory paths below replacing “aufs” with the value of this node. Personally I have only seen the “aufs” value at this node so anything else is uncharted territory to me. Copy the long hexadecimal value that can be found at the “Id” node (also highlighted in the above output). This is the long id of the Docker container. In a terminal window, issue the following command, inserting the long id of your container where noted: sudo ls -al /var/lib/docker/aufs/mnt/[long container id here] You are now looking at the root of the volume used by the Docker container you just launched. In the same terminal window, issue the following command: sudo ls -al /var/lib/docker/aufs/mnt/[long container id here]/opt The output from this command should look like this: total 12 drwxr-xr-x 4 root root 4096 jan 12 15:58 . drwxr-xr-x 75 root root 4096 jan 12 15:58 .. lrwxrwxrwx 1 root root 26 aug 10 04:19 mule -> /opt/mule-standalone-3.5.0 drwxr-xr-x 17 409 409 4096 jan 12 15:58 mule-standalone-3.5.0 Examine this line in the Dockerfile:RUN ln -s /opt/mule-standalone-3.5.0 /opt/muleWe see that a symbolic link is created and that the directory name and the name of the symbolic link matches the output we saw earlier. This matches the directory output in the previous step. To examine the Mule log file that we looked at when binding volumes earlier, use the following command: sudo cat /var/lib/docker/aufs/mnt/[long container id here]/opt/mule-standalone-3.5.0/logs/mule.log Next we create a new file in the Docker container using vi: sudo vi /var/lib/docker/aufs/mnt/[long container id here]/opt/mule-standalone-3.5.0/test.txt Enter some text into the new file by first pressing i and the type the text. When you are finished entering the text, press the Escape key and write the file to disk by typing the characters “:wq” without quotes. This writes the new contents of the file to disk and quits the editor. Leave the Docker container running after you are finished. In the next section, we are going to look at the file we just created from inside the Docker container. We have seen that we can examine the file system of a Docker container without binding volumes. It is also possible to copy or move files from the host file-system to the container’s file system using the regular commands. Root privileges are required both when examining and writing to the Docker container’s file system. Entering a Docker Container In order to verify that the file we just created in the host was indeed written to the Docker container, we are going to start a bash shell in the running Docker container and examine the location where the new file is expected to be located and the contents of the file. In the process we will see how we can execute commands in a Docker container from the host. Issue the command below in a terminal window. The exec Docker command is used to run a command, bash in this case, in a running Docker container. The -i flags tell Docker to keep the input stream open while the command is being executed. In this example, it allows us to enter commands into the bash shell running inside the Docker container. The -t flag cause Docker to allocate a text terminal to which the output from the command execution is printed. sudo docker exec -i -t [container id or name here] bash Note the prompt, which should change to [user]@[Docker container id]. In my case it looks like this: root@3ea374a280da:/# Go to the Mule installation directory using this command: cd /opt/mule-standalone-3.5.0/ Examine the contents of the directory: ls -al Among the other files, you should see the “test.txt” file: -rw-r--r-- 1 root root 53 Jan 14 03:19 test.txt Examine the contents of the “text.txt” file. The contents of the file should match what you entered earlier. cat text.txt Exit to the host OS: exit Stop and remove the container: sudo docker stop [container id or name here] sudo docker rm [container id or name here] We have seen that we can execute commands in a running Docker container. In this particular example, we used it to execute the bash shell and examine a file. I draw the conclusion that I should be able to set up a Docker image that contains a very controlled environment for some type of test and then create a container from that image and start the test from the host. Deploying a Mule Application In this section we will look at deploying a Mule application to an instance of the Mule ESB running in a Docker container. We will use volume binding, that we looked at in the section on files and Docker containers, to share directories in the host with the Docker container in order to make it easy to deploy applications, modify running applications, examine logs etc. Preparations Before deploying the application, we need to make some preparations: First of all, we restore the original log-level that we changed earlier. In this example, there will be log output when the applications we will deploy is run and we can limit the log generated by Mule. Edit the log4j.properties file in the “mule-root/conf” directory in the host and set the log-level on the last line in the file back to “INFO” and add one line, as in the listing below. The last three lines should now look like this: # Mule classes log4j.logger.org.mule=INFO log4j.logger.org.mule.tck.functional=DEBUG Next, we create the Mule application which we will deploy to the Mule ESB running in Docker: In some directory, create a file named “mule-deploy.properties” with the following contents: redeployment.enabled=true encoding=UTF-8 domain=default config.resources=HelloWorld.xml In the same directory create a file named “HelloWorld.xml”. This file contains the Mule configuration for our example application: Create a zip-archive named “mule-hello.zip” containing the two files created above: zip mule-hello.zip mule-deploy.properties HelloWorld.xml Deploy the Mule Application Before you start the Docker container in which the Mule EBS will run, make sure that you have created and prepared the directories in the host as described in the section Files and Docker Containers above. Start a new Mule Docker container using the command that we used when binding volumes: sudo docker run -d -v ~/mule-root/apps:/opt/mule/apps -v ~/mule-root/conf:/opt/mule/conf -v ~/mule-root/logs:/opt/mule/logs codingtony/mule As before, the -v option tells Docker to bind three directories in the host to three locations in the Docker container’s file system. Find the IP-address of the Docker container: sudo docker inspect [container id or name here] | grep IPAddress In my case, I see the following line which reveals the IP-address of the Docker container: “IPAddress”: “172.0.17.2”, Open a terminal window or tab and examine the Mule log. Leave this window or tab open during the exercise, in order to be able to verify the output from Mule. tail -f ~/mule-root/logs/mule.log Copy the zip-archive “mule-hello.zip” created earlier to the host directory ~/mule-root/apps/. Verify that the application has been deployed without errors in the Mule log: ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + Started app 'mule-hello' + ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Leave the Docker container running after you are finished. In the next section we will look at how to access endpoints exposed by applications running in Docker containers. By binding directories in the host thus making them available in the Docker container, it becomes very simple to deploy Mule applications to an instance of Mule ESB running in a Docker container. I am considering this setup for a production environment as well, since it will enable me to perform backups of the directories containing Mule applications and configuration without having to access the Docker container’s file system. It is also in accord with the idea that a Docker container should be able to be quickly and easily restarted, which I feel it would not be if I had to deploy a number of Mule applications to it in order to recreate its previous state. Accessing Endpoints We now know that we can run the Mule ESB in a Docker container, we can deploy applications and examine the logs quite easily but one final, very important question remains to be answered; how to access endpoints exposed by applications running in a Docker container. This section assumes that the Mule application we deployed to Mule in the previous section is still running. In the host, open a web-browser and issue a request to the Docker container’s IP-address at port 8181. In my case, the URL is http://172.17.0.2:8181 Alternatively use the curl command in a terminal window. In my case I would write: curl 172.17.0.2:8181 The result should be a greeting in the following format: Hello World! It is now: 2015-01-14T07:39:03.942Z In addition, you should be able to see that a message was received in the Mule log. Now try the URL http://localhost:8181 You will get a message saying that the connection was refused, provided that you do not already have a service listening at that port. If you have another computer available that is connected to the same network as the host computer running Ubuntu, do the following: – Find the IP-address of the Ubuntu host computer using the ifconfigcommand. – In a web-browser on the other computer, try accessing port 8181 at the IP-address of the Ubuntu host computer. Again you will get a message saying that the connection was refused. Stop and remove the container: sudo docker stop [container id or name here] sudo docker rm [container id or name here] Without any particular measures taken, we see that we can access a service exposed in a Docker container from the Docker host but we did not succeed in accessing the service from another computer. To make a service exposed in a Docker container reachable from outside of the host, we need to tell Docker to publish a port from the Docker container to a port in the host using the -p flag: Launch a new Docker container using the following command: sudo docker run -d -p 8181:8181 -v ~/mule-root/apps:/opt/mule/apps -v ~/mule-root/conf:/opt/mule/conf -v ~/mule-root/logs:/opt/mule/logs codingtony/mule The added flag -p 8181:8181 makes the service exposed at port 8181 in the Docker container available at port 8181 in the host. Try accessing the URL http://localhost:8181 from a web-browser on the host computer.The result should be a greeting of the form we have seen earlier. Try accessing port 8181 at the IP-address of the Ubuntu host computer from another computer.This should also result in a greeting message. Stop and remove the container: sudo docker stop [container id or name here] sudo docker rm [container id or name here] Using the -p flag, we have seen that we can expose a service in a Docker container so that it becomes accessible from outside of the host computer. However, we also see that this information need to be supplied at the time of launching the Docker container. The conclusions that I draw from this is that: I can test and develop against a Mule ESB instance running in a Docker container without having to publish any ports, provided that my development computer is the Docker host computer. In a production environment or any other environment that need to expose services running in a Docker container to “the outside world” and where services will be added over time, I would consider deploying an Apache HTTP Server or NGINX on the Docker host computer and use it to proxy the services that are to be exposed. This way I can avoid re-launching the Docker container each time a new service is added and I can even (temporarily) redirect the proxy to some other computer if I need to perform some maintenance. Is There More? Of course! This article should only be considered an introduction and I am just a beginner with Docker. I hope I will have the time and inspiration to write more about Docker as I learn more.

January 20, 2015

by Ivan K

· 27,784 Views · 4 Likes

AWS Activate: Pros, Cons, and Everything in Between

First and foremost, it is important to define what AWS Activate is and what it is used for before we can take a deeper look. Exactly one year ago, Amazon created a program specifically designed for a particular group of customers that often times is in need of as much help as they can get (AKA startups). This program supports startups in their initial phase of building their businesses. This includes providing AWS credits, taking part in startup contests, and receiving benefits from third party solutions on the AWS cloud. Activate allows AWS partners that want to create a presence within the Activate community offer perks to member startups. Some of which include discounts and extended free tiers. Some startups that have attained high levels of success with AWS include Spotify, Pinterest, and Dropbox. With the big shots maintaining their places in startup stardom, Amazon has opened its doors to the next generation of innovators. As such, Amazon offers two different Activate packages. The Self-Starter package is comprised of a limited amount of each of the offerings listed above, whereas the Portfolio package includes some added bonuses along the lines of more high-profile and technical support as well as more in-depth training. On his blog AWS’ CTO, Werner Vogel, reiterated the importance of startups, “Startups will forever be a very important customer segment of AWS. They were among our first customers and along the way some amazing businesses have been built by these startups, many of which running for 100% on AWS.” “We’re excited to be a part of this global momentum in the startup ecosystem. The challenge now is to support and assist an increasing number of startups across the world.” The fun doesn’t stop there. In April of this year, AWS expanded the Activate package to offer much more than generalupport. This entailed sponsoring solution architects to take startups through step by step consultations in the fields of security, architecture and performance. Consequently, though Amazon’s professional services teams were established for customers, it was natural to have them take part in Activate. By nurturing new startups and making them rely heavily on the AWS cloud. As we can see today, companies that started with AWS 4 years ago are now worth billions of dollars. Airbnb and Dropbox, for example, now thoroughly enjoy the flexibility Amazon offers, as well as the fact that they no longer have to maintain cumbersome IT operations. Why not from the get-go? So the question is, if Amazon essentially built AWS on startups, why hasn’t Activate been around from the get-go, 6 years ago? AWS owes a great deal of its success to scalable startups that wanted and needed servers to run their businesses, yet didn’t have the initial capital to build their own data centers. No one really knows why Amazon did not provide startups back then with the kind of support they do today. However, as the market matured, it became clear that Amazon realized that an increasing number of startups could use their help. As a result, Amazon discovered that marketing their support services through Venture Capitalists and incubators around the world would include them as partners in this program and aid in marketing the service to startups of all kinds. “AWS Activate requires a special registration that allows startup customers with a valid AWS account to apply for either a self-starter package or a portfolio package. If a startup is a member of one of the accelerators, seed funds, or startup organizations that Amazon already works with, they may apply for the more exclusive AWS Activate Portfolio Package.” Learn More Incubators and Accelerators It was a natural step for Amazon to partner with accelerators all over the world with the Activate package. In addition to supporting startups, as mentioned above, these accelerators act as channels in the startup scene.At the first AWS re:Invent, Bezos jokes to his fellow investors, saying that eventually some of the investments will return to him because of how heavily the startup scene relies on Amazon. Activate and the approximately 150 accelerators across the world, including White Accel, Techstars, Appwest, and Battery Ventures, genuinely support and understand the values of the AWS service. They are happy to be able to use the Activate platform to help their startups flourish within the AWS clouds. 3rd Party Partners Aside from the accelerators, as an Amazon partner, you can enroll special offers to Activate members. For example, members that are part of the Self-Starter package may receive a 3 month free trial for Chef, whereas Portfolio members may receive a 6 month trial. Most of the partners will provide an extended free trial or credits via Activate. For instance, Trend Micro, one of Amazon’s biggest partners in the security domain, provides $2500 credit for Activate members in the Portfolio package. While there are not many partners on the list, the ones that are mentioned are very helpful and provide nice benefits for Activate members. Reviews of the program from both the partners’ and startups’ side showed that Activate is ideal for startups that have resource constraints. While members within the Self-Starter package are able to use the AWS Free Usage Tier, Portfolio members can receive anywhere from $1,000 to $15,000 in AWS Promotional Credit. The credit is maybe the most important value for these startups. Bearing in mind that Google also has their own line of packages and credit for new companies, it makes sense for AWS to start giving more life to these companies, above the free tier. Everyone has access to the free tier, these startups simply get more of it. Seems that there is no downside to participating. There is no obligation and the worst thing that can happen is that you will find that the services are great, and simply continue using them, which may result in you being locked-in to the point where you need to eventually pay. On the other hand, seems that the last announcement in April, which is actually “meet our architects”. Meaning the knowledge that Amazon’s architects share with startups in their consultation sessions help them get a better grasp on the ecosystem, as well as understand that more resource utilization is ultimately the next logical step for growth. All in all, although Amazon didn’t offer with this program 4 years ago, the AWS cloud was still the natural choice for startups. It included all of the benefits a startup can get using and online and on-demand infinite amount of resources. As a result, it is the clear choice for web scale startups. There are many reasons why Amazon only recently decided to offer free benefits to their prized potential customers. While it could have stemmed from competition from Microsoft and Google, or Amazon may want to simply show their support for their potential customers, demonstrating their cloud’s benefits at an early stage. Aside from that, Amazon understands and is built on companies with long term goals and possibilities. Therefore Amazon sees startups as a long term investment, which starts off with little risk.

December 15, 2014

by Ofir Nachmani

· 10,613 Views

Using Azure AD SSO Tokens for Multiple AAD Resources from Native Mobile Apps

This blog post is the third in a series that cover Azure Active Directory Single Sign-On (SSO) authentication in native mobile applications. Authenticating iOS app users with Azure Active Directory How to Best handle AAD access tokens in native mobile apps Using Azure SSO tokens for Multiple AAD Resources From Native Mobile Apps(this post) Sharing Azure SSO access tokens across multiple native mobile apps. Brief Start In an enterprise context, it is highly likely that you would have multiple web services that your native mobile app needs to consume. I had exactly this scenario, where one of my clients had asked if they could maintain the same token in the background in the mobile app to use it for accessing multiple web services. I spent some time digging through the documentation and conducting some experiments to confirm some points. Therefore, this post is to share my findings on accessing multiple Azure AD resources from native mobile apps using ADAL. In the previous two posts, we looked at implementing Azure AD SSO login on native mobile apps, then we looked at how to best maintain these access tokens. This post discusses how to use Azure AD SSO tokens to manage access to multiple AAD resources. Let’s assume that we have 2 web services sitting in Azure (ie WebApi1, and WebApi2), both of which are set to use Azure AD authentication. Then, we have the native mobile app, which needs access to both web services (WebApi1, and WebApi2). Let’s look at what we can and cannot do. Cannot Use the Same Azure AD Access-Token for Multiple Resources The first thing that comes to mind is to use the same access token for multiple Azure AD resources, and that is what the client asked about. However, this is not allowed. Azure AD issues a token for certain resource (which is mapped to an Azure AD app). When we call AcquireToken(), we need to provide a resourceID, only ONE resourceID. The result would have a token that can only be used for the supplied resource (id). There are ways where you could use the same token (as we will see later in this post), but it is not recommended as it complicates operations logging, authentication process tracing, etc. Therefore it is better to look at the other options provided by Azure and the ADAL library. Use the Refresh-Token to Acquire Tokens for Multiple Resources The ADAL library supports acquiring multiple access-Tokens for multiple resources using a refresh token. This means once a user is authenticated, the ADAL’s authentication context, would be able to generate an access-token to multiple resources without authenticating the user again. This was mentioned briefly by the MSDN documentation here. The refresh token issued by Azure AD can be used to access multiple resources. For example, if you have a client application that has permission to call two web APIs, the refresh token can be used to get an access token to the other web API as well. (MSDN documentation) public async Task RefreshTokens() { var tokenEntry = await tokensRepository.GetTokens(); var authorizationParameters = new AuthorizationParameters (_controller); var result = "Refreshed an existing Token"; bool hasARefreshToken = true; if (tokenEntry == null) { var localAuthResult = await _authContext.AcquireTokenAsync ( resourceId1, clientId, new Uri (redirectUrl), authorizationParameters, UserIdentifier.AnyUser, null); tokenEntry = new Tokens { WebApi1AccessToken = localAuthResult.AccessToken, RefreshToken = localAuthResult.RefreshToken, Email = localAuthResult.UserInfo.DisplayableId, ExpiresOn = localAuthResult.ExpiresOn }; hasARefreshToken = false; result = "Acquired a new Token"; } var refreshAuthResult = await _authContext.AcquireTokenByRefreshTokenAsync(tokenEntry.RefreshToken, clientId, resourceId2); tokenEntry.WebApi2AccessToken = refreshAuthResult.AccessToken; tokenEntry.RefreshToken = refreshAuthResult.RefreshToken; tokenEntry.ExpiresOn = refreshAuthResult.ExpiresOn; if (hasARefreshToken) { // this will only be called when we try refreshing the tokens (not when we are acquiring new tokens. refreshAuthResult = await _authContext.AcquireTokenByRefreshTokenAsync (refreshAuthResult.RefreshToken, clientId, resourceId1); tokenEntry.WebApi1AccessToken = refreshAuthResult.AccessToken; tokenEntry.RefreshToken = refreshAuthResult.RefreshToken; tokenEntry.ExpiresOn = refreshAuthResult.ExpiresOn; } await tokensRepository.InsertOrUpdateAsync (tokenEntry); return result; } As you can see from above, we check if we have an access-token from previous runs, and if we do, we refresh the access-tokens for both web services. Notice how the _authContext.AcquireTokenByRefreshTokenAsync() provides an overloading parameter that takes a resourceId. This enables us to get multiple access tokens for multiple resources without having to re-authenticate the user. The rest of the code is similar to what we have seen in the previous two posts. ADAL Library Can Produce New Tokens For Other Resources In the previous two posts, we looked at ADAL library and how it uses TokenCache. Although ADAL does not support persistent caching of tokens yet on mobile apps, it still uses the TokenCache for in-memory caching. This enables ADAL library to generate new access-tokens if the context (AuthenticationContext) still exists from previous authentications. Remember in the previous post we said it is recommended to keep a reference to the authentication-context? Here it comes in handy, as it enables us to generate new access-tokens for accessing multiple Azure AD resources. var localAuthResult = await _authContext.AcquireTokenAsync ( resourceId2, clientId, new Uri (redirectUrl), authorizationParameters, UserIdentifier.AnyUser, null); Calling AcquireToken() (even with no refresh-token) would give us a new access-token to webApi2. This is due to ADAL great goodness where it checks if we have a refresh-token in-memory (managed by ADAL), then it uses that to generate a new access-token for webApi2. An alternative The third alternative option is the simplest, but not necessarily the best. In this option, we could use the same access token to consume multiple Azure AD resources. To do this, we need to use the same Azure AD app ID when setting the web application’s authentication. This requires some understanding of how the Azure AD authentication happens on our web apps. If you refer to Taiseer Joudeh’s tutorial, which we mentioned before, you will see that in our web app, we need to tell the authentication framework what’s our Authority and the Audience (Azure AD app Id). If we set up both of our web apps, to use the same Audience (Azure AD app Id), meaning that we link them both into the same Azure AD application, then we could use the same access-token to use both web services. // linking our web app authentication to an Azure AD application private void ConfigureAuth(IAppBuilder app) { app.UseWindowsAzureActiveDirectoryBearerAuthentication( new WindowsAzureActiveDirectoryBearerAuthenticationOptions { Audience = ConfigurationManager.AppSettings["Audience"], Tenant = ConfigurationManager.AppSettings["Tenant"] }); } As we said before, this is very simple and requires less code, but could cause complications in terms of security logging and maintenance. At the end of the day, it depends on your context and what you are trying to achieve. Therefore, I thought it would be worth mentioning and I will leave the judgement for you on which option you choose. Conclusions We looked at how we could use Azure AD SSO with ADAL to access multiple resources from native mobile apps. As we saw, there are three main options, and the choice could be made based on the context of your app. I hope you find this useful and if you have any questions or you need help with some development that you are doing, then just get in touch. This blog post is the third in a series that cover Azure Active Directory Single Sign-On (SSO) authentication in native mobile applications.

December 12, 2014

by Has Altaiar

· 11,484 Views · 1 Like

High Availability, Disaster Recovery, and Microsoft Azure

both high availability (ha) and disaster recovery (dr) have been essential it topics. fundamentally ha is about fault tolerance relevant to the availability of an examined subject like application, database, vms, etc. while dr roots on the ability to resume operations in the aftermath of a catastrophic event. a fundamental difference of these two is that ha expects no down time and no data loss, while dr does. they are different issues and should be addressed separately. background for many it shops, either ha or dr has been a high risk and high cost item. both are essential to business continuity, while traditionally tough technical problems to solve with very significant and long-term commitments on resources. not only they are technically challenging, but a continual cost-cutting which has become an it standard practice in the past two decades makes purchasing hardware/software and constructing either ha or dr solution on premises further distant from it’s financial and technical realties. sense of urgency too often, the technical challenges and resource commitments overwhelm it and turn ha and dr into academic discussions, or symbolic items on a project checklist. at the same time, information is rapidly exploding as internet, mobility and social-network are becoming integral in our daily lives and businesses. there are progressively more data to process and store. for many businesses, the needs for ha and dr is urgent for better managing risks. and continual availability and on-demand recoverability of it are becoming increasingly critical. this is the reality, now the good news is that the recent introduction of cloud computing has fundamentally changed how an ha or dr solution can be implemented. microsoft azure is a vivid example of ha and dr solutions with significantly reduced the required financial commitment and involved technical complexities. the traditional approach by establishing redundancy and acquiring a physical dr site with long-term resources and financial commitments is now largely replaced with consumable services which can be configured in minutes by mouse-clicking and with a manageable cost structure based on usage. ha and dr have become it solutions which are financially realistic and technically feasible for businesses in all sizes. ha, redundancy, and microsoft azure lrs ha is to eliminate a single point of failure of an examined component, an application for example. it denotes a strategy to employ redundancy such that a target application can and will continue being available without downtime while experiencing a failure of hosting hardware or software. there are various and well-developed ha solutions like a hyper-v host cluster using redundant hardware to eliminate a single point of failure of hosting os or hardware, and an application cluster for eliminating a single point of failure by running the application in multiple vm instances with a synchronous state. although ha implementations may vary, the fundamental principle nevertheless remains the same. ha expects neither downtime nor data loss while experiencing an outage of a target hardware or software. ha has become dramatically simple in microsoft azure. basically, all data written to disk in microsoft azure are kept at least in the so-called lrs, locally redundant storage. lrs replicates a transaction synchronously to three different storage nodes across fault domains and upgrade domains within the same region for durability. in layman’s terms, microsoft azure by default maintains at least three copies of user data to achieve ha. dr, replication, and microsoft azure grs dr is about having a plan and backups in place to resume operations in the aftermath of a catastrophic event. unplanned outage is assumed in a dr scenario, therefore some data loss is also expected. notice that ha and dr are different business problems and addressed differently. while both ha and dr are based on applying redundancy, i.e. a source and replicas, or multiple identical nodes of an examines component like application instance, databases, or vms, there are however differences between the two. a dr solution generally employs replicas or backups, are implemented with asynchronous processes, and expects an outage of a source and with some data loss in transit while the outage occurs. while ha requires a logical representation with a real-time integrity using synchronous processes across all participating nodes, expects neither downtime nor data loss while experiencing an outage of a participating node. for a critical workload, one approach of dr is to establish geo-replication to address an outage of an entire geographic area caused by a natural disaster, for example. the concern is that a catastrophic event may impact an entire geographic area causing a datacenter where a mission critical application is being hosted becomes unavailable for an extended period of time. in microsoft azure, geo redundant storage or grs is the default and an optional setting, as shown above, while configuring a storage account. grs will queue a transaction committed to lrs as an asynchronous replication to a secondary region, a few hundreds miles away from the primary region where a storage account is originated. at the secondary region, data is also stored in lrs, i.e. made durable by replicating it to three storage nodes. specifically, a microsoft azure storage account configured with grs essentially maintains three replicas locally for high availability, and replicates the content and maintains three replicas at a secondary datacenter a few hundreds miles away for dr. so all are six copies, three locally and three remotely. all these are configured by one, yes one mouse click from a dropdown list while creating a storage account. the above is a conceptual model illustrated a data flow of grs. grs replication has little performance impact on an application since application data are committed to lrs in real-time while replication to grs is queued, i.e. asynchronously. a write to lrs is synchronous and in real-time, once committed, the changes are expected within 15 minutes to be asynchronously replicated to the secondary site. for a ra-grs storage account, in addition to one primary endpoint for read/write operations as it is in a grs, there is also one secondary endpoint as read only becomes available as shown below. the cost implications of grs or ra-grs include the additional storage and the transmission costs for egress traffic, as applicable, of the secondary datacenter. ingress traffic is free . and microsoft azure storage sla offers 99.9% availability and a cost calculator is also available. microsoft azure recovery services so far, much is about backing up or replicating data. to successfully restore, a dr plan must be put in place and ensure its availability upon a dr scenario in progress. either placing a dr plan at a primary site where the source is or a secondary site where a replica stays has some issues and concerns. keeping a dr plan at the source site where all the resources are in place and on-the-job trainings seems logical. or does it? dr is assuming a catastrophic event over an extended geographic areas where the source site is experiencing an outage. in such case, keeping a dr plan in the source site defeats the purpose. maintaining a dr plan at the secondary site is the choice then. in a dr scenario, a recovery site is to be brought on line within a expected period of time according to a dr plan, and having the dr plan right there and then at a recovery site makes all the sense. or does it? this decision introduces a number of requirements including the physical readiness, the timeliness, and the financial implications on securing and maintaining a dr plan at a remote physical facility. for a vmm server running on system center 2012 sp1 or later, an idea, reliable and straightforward way is to use azure recovery services to maintain a dr plan as shown below. and for any backup needs, using cloud as a backup site makes backing up and restoring data an anytime anywhere operation. azure site recovery vault this service essentially acts as the director of a dr process. it orchestrates and manages the protection and failover of vms in clouds managed by virtual machine manager 2012 sp1 or later. a noticeable advantage is the ability to test a recovery configuration, exercise a proactive failover and recovery, and automate recovery in the event of a site outage. the sla of site recovery services is 99.9% availability to ensure a configured dr plan is always in place with expected updates. this is a dr solution that it can implement, simulate, verify, bring online and be absolutely confident with the readiness. azure backup vault this is a reliable, scalable and inexpensive data protection solution with zero capital investment and extremely low operational expense. like other secure communication with microsoft azure, you will first upload a public certificate to microsoft azure. then download the backup agent to register a target server with the backup vault. then select what to be backed up. both microsoft azure backup sla (99.9% availability) and cost calculator are available for better assessing the solution. closing thoughts form an application’s view, ha is an on-going event while dr is an anticipation. ha and dr are different business problems and should be addressed differently. nevertheless, microsoft azure provides a single platform to gracefully address ha with lrs, dr with grs, and dr orchestration with recovery services, and all with published sla s and a predictable cost structure . going forward, it pros can now include ha and dr as a reliable, scalable and relatively inexpensive proposition by employing microsoft azure as a solution platform. call to action register at microsoft virtual academy, http://aka.ms/mva1 , and train yourself on microsoft azure by taking the track of courses. go to http://aka.ms/azure200 and acquire a free trial subscription and assess microsoft azure for ha and dr solutions. review my recommended content at http://aka.ms/recommended .

December 9, 2014

by Yung Chou

· 11,574 Views · 2 Likes

Docker Orchestration... What It Means and Why You Need It

[This article was written by Yaron Parasol.] Docker containers were created to help enable the fast, and reliable deployment of application components or tiers, by creating a container that holds a self-contained ready to deploy parts of applications, with the middleware and the app business logic needed to run them successfully. For example, a Spring application within a Tomcat container. By design, Docker is purposely an isolated self-contained part of the application, typically one tier or even one node in a tier. However, an application is typically multi-tier in its architecture and that means you have tiers with dependencies between them, where the nature of the dependencies can be anything from network connections and remote API invocations, to exchange of messages between application tiers. And hence an app is a set of different containers with specific configurations. This is why you need a way to glue the pieces of your app together. While, Docker has a basic solution for connecting containers using a Docker bridge, this solution is not always the preferred one, especially when deploying the container across different hosts and you need to take care of real network settings. Docker orchestration with TOSCA + Cloudify. Check it out. Go So, what role does the orchestrator play? The orchestrator will take care of two things: The timing of container creation - as containers need to be created by order of dependencies and Container configuration in order to allow containers to communicate with one another - and for that the orchestrator needs to pass runtime properties between containers. As a side note here: With Docker you need a special tweak here, as you typically don’t touch config files inside a container, you keep the container intact, so there is an interesting workaround for cases that this is required. One method to do this is by using a YAML-based orchestration plan to orchestrate the deployment of apps and post-deployment automation processes, which is the approach Cloudify employs. Based on TOSCA (topology and orchestration standard of cloud apps), this orchestration plan describes the components and their lifecycle, and the relationships between components, especially when it comes to complex topologies. This includes, what’s connected to what, what’s hosted on what, and other such considerations. TOSCA is able to describe the infrastructure, as well as, the middleware tier, and app layers on top of these. Cloudify basically takes this TOSCA orchestration plan (dubbed blueprints in Cloudify speak) and materializes these using workflows that traverse the graph of components, or this plan of components and issues commands to agents. These then create the app components and glue them together. The agents use extensions called plugins that are adaptors between the Cloudify configuration and the various infrastructure as a service (IaaS) and automation tools’ APIs. In our case, we created a plugin to interface with the Docker API. Introducing the Docker Cloudify Plugin The Cloudify-Docker plugin is quite straightforward, it installs the Docker API endpoint/server on the machine and then uses the Docker-Py binding to create, configure, and remove containers. TOSCA lifecycle events are: Create - installation of the app components Configure - configuration of the component Start - startup/running the component There is also stop & delete - for shutdown and removal We started by using the create - to create the container, we did not implement configure at the beginning, and start to run the application. But then we realized that for containers with dependencies we need to have runtime properties, such as IP import of the counterpart container in order to create the container for example. When we create an app server container, we need the port and IP of the database container. So, we pushed the creation of the container to the configure event, and used a TOSCA relationship pre-configure hook, to get the dependent container’s info at runtime. The way to expose the runtime info to the container with the dependencies is by setting them as environment variables. 01.interfaces: 02. cloudify.interfaces.lifecycle: 03. configure: 04. implementation: docker.docker_plugin.tasks.configure 05. inputs: 06. container_config: 07. command: mongod--rest--httpinterface --smallfiles 08. image: dockerfile/mongodb 09. start: 10. implementation: docker.docker_plugin.tasks.run 11. inputs: 12. container_start: 13. port_bindings: 14. 27017: 27017 15. 28017: 28017 Nodecellar Example I’d like to explain how this works by using our Nodecellar app as an example. The Nodecellar app is composed of two hosts that, in this case, Cloudify didn’t create but just SSHed into and then installed agents on. On one we have the MongoD container, with a MongoD process. On the other we have the Nodecellar container with NodeJS and the Nodecellar app within it. The Nodecellar container needs a connection to the MongoD container to run the app queries when the app starts. Ultimately, an orchestrator should not be limited to software deployment, the whole idea behind Docker Is to allow for agility, so we’d also like to use Docker in situations of auto-scale out and auto-heal, CD. In our next post we’ll show exactly that - how Cloudify can be used with Docker for post-deployment scenarios.

December 2, 2014

by Sharone Zitzman

· 17,893 Views

Cookies set through the Owin API sometimes mysteriously disappear. The problem is that deep within System.Web, there has been a cookie monster sleeping since the dawn of time (well, at least since .NET and System.Web was released). The monster has been sleeping for all this time, but now, with the new times arriving with Owin, the monster is awake. Being starved from the long sleep, it eats cookies set through the Owin API for breakfast. Even if the cookies are properly set, they are eaten by the monster before the Set-Cookie headers are sent out to the client browser. This typically results in heisenbugsaffecting sign in and sign out functionality. TL;DR The problem is that System.Web has its own master source of cookie information and that isn’t the Set-Cookie header. Owin only knows about the Set-Cookie header. A workaround is to make sure that any cookies set by Owin are also set in the HttpContext.Current.Response.Cookies collection. This is exactly what my Kentor.OwinCookieSaver middleware does. It should be added in to the Owin pipeline (typically in Startup.Auth.cs), before any middleware that handles cookies. app.UseKentorOwinCookieSaver(); The cookie saver middleware preserves cookies set by other middleware. Unfortunately it is not reliable for cookies set by the application code (such as in MVC Actions). The reason is that the System.Web cookie handling code might be run after the application code, but before the middleware. For cookies set by the application code, the workaround by storing a dummy value in the sessions is more safe. The Reason The System.Web API has been around since the dawn of .NET. Back then, it was tightly coupled to IIS and the one and only API for web applications. As the one and only, it could assume that it was the master of all information. In HttpResponse.cs there is a check whether the cookie collection is changed (adding cookies doesn’t count as a change) and in that case it wipes the existing Set-Cookie header. if (_cookies.Changed || needToReset) { // delete all set cookie headers headers.Remove("Set-Cookie"); // write all the cookies again for(int c = 0; c < _cookies.Count; c++) { // Write the cookies, code removed for brevity. } } This is what a sleeping cookie monster looks like in the code. It’s sleeping, because there’s still nothing questioning the cooke collection being the master. But that all changes when Owin was introduced. The Owin API knows nothing aboutSystem.Web.HttpContext. In fact, that’s kind of the point with Owin, to break the dependency between .NET web applications and IIS. In Katana, cookies are (in most cases) added by a call toResponse.Cookies.Append() which adds a new Set-Cookie header. Effectively we have a system with conflicting views on where the master information is stored. Owin considers the actual header to be the master while System.Web considers the response cookie collection to be the master. Having conflicting masters is never a good idea. This is a known issue for Katana, classified as “High Impact”. The Workaround Middleware The conflict between the two distance relatives System.Web and Owin is a typical family conflict. The older one is wrong, but won’t change views just because someone young appears with new facts. When mediating such a conflict it’s usually easiest to get the younger generation to work around the older. Changing System.Web is not feasible, so the focus has to be on Owin. The workaround middleware I’ve created checks the Set-Cookie header and syncs its contents back to the cookie collection. By putting it before any cookie handling middleware in the pipeline it can save the cookies from the monster, before System.Web deletes the header. The core function of the workaround middleware is the Invoke method. public async override Task Invoke(IOwinContext context) { await Next.Invoke(context); var setCookie = context.Response.Headers.GetValues("Set-Cookie"); if(setCookie != null) { var cookies = CookieParser.Parse(setCookie); foreach(var c in cookies) { if(!HttpContext.Current.Response.Cookies.AllKeys.Contains(c.Name)) { HttpContext.Current.Response.Cookies.Add(c); } } } } The logic is quite straight forward. Parse each Set-Cookie header into a HttpCookie object and ensure that it is present in the response cookie collection. For the applications we’ve tested it works, but it is a workaround and not a real fix. Please leave a comment below if you find situation where this workaround does not work. That’s very valuable information for others having the same issue. A Permanent Fix I’m also looking into fixing this permanently by contributing to the System.Web host in Katana. The fix there would be to directly intercept any calls to set the Set-Cookie header and add them to the cookie to the collection too. That should be a much more stable solution as it prevents the problem rather than trying to fix it afterwards.

November 27, 2014

by Anders Abel

· 24,210 Views

From Vaadin to Docker - A Novice's Journey

I’m a huge Vaadin fan and I’ve created a Github workshop I can demo at conferences. A common issue with such kind of workshops is that attendees have to prepare their workstations in advance… and there’s always a significant part of them that comes with not everything ready. At this point, two options are available to the speaker: either wait for each of the attendee to finish the preparation – too bad for the people who took the time at home to do that, or start anyway – and lose the not-ready part. Given the current buzz around Docker, I thought that could be a very good way to make the workshop preparation quicker – only one step, and hasslefree – no problem regarding the quirks of your operation system. The required steps I ask the attendees are the following: Install Git Install Java, Maven and Tomcat Clone the git repo Build the project (to prepare the Maven repository) Deploy the built webapp Start Tomcat These should directly be automated into Docker. As I wasted much time getting this to work, here’s the tale of my journey in achieving this (be warned, it’s quite long). If you’ve got similar use-cases, I hope it will be useful in you getting things done faster. Starting with Docker The first step was to get to know the basics about Docker. Fortunately, I had the chance to attend a Docker workshop by David Gageot at Duchess Swiss. This included both Docker installation and basics of Dockerfile. I assume readers have likewise a basic understanding of Docker. For those who don’t, I guess browsing the Docker’s official documentation is a nice idea: Installation Dockerfile reference Building my first Dockerfile The Docker image can be built with the following command ran into the directory of the Dockerfile: $ docker build -t vaadinworkshop . The first issues one can encounter when playing with Docker the first time, is to get the following error message: Get http:///var/run/docker.sock/v1.14/containers/json: dial unix /var/run/docker.sock: no such file or directory The reason is because one didn’t export the required environment variables displayed by the boot2docker information message. If you lost the exact data, no worry, just use the shellinit boot2docker parameter: $ boot2docker shellinit Writing /Users/i303869/.docker/boot2docker-vm/ca.pem: Writing /Users/i303869/.docker/boot2docker-vm/cert.pem: Writing /Users/i303869/.docker/boot2docker-vm/key.pem: export DOCKER_HOST=tcp://192.168.59.103:2376 export DOCKER_CERT_PATH=/Users/i303869/.docker/boot2docker-vm Copy-paste the export lines above will solve the issue. These can also be set in one’s .bashrc script as it seems these values seldom change. Next in line is the following error: Get http://192.168.59.103:2376/v1.14/containers/json: malformed HTTP response "x15x03x01x00x02x02" This error message seems to be because of a mismatch between versions of the client and the server. It seems it is because of a bug on Mac OSX when upgrading. For a long term solution, reinstall Docker from scratch; for a quick fix, use the --tls flag with the docker command. As it is quite cumbersome to type it everything, one can alias it: $ alias docker="docker --tls" My last mistake when building the image comes from building the Dockerfile from a not empty directory. Docker sends every file it finds in the directory of the Dockerfile to the Docker container for build: $ docker --tls build -t vaadinworkshop . Sending build context to Docker daemon Too many kB Fix: do not try this at home and start from a directory container the Dockerfile only. Starting from scratch Dockerfiles describe images – images are built as a layered list of instructions. Docker images are designed around single inheritance: one image has to be set a single parent. An image requiring no parent starts from scratch, but Docker provides 4 base official distributions: busybox, debian, ubuntu and centos (operating systems are generally a good start). Whatever you want to achieve, it is necessary to choose the right parent. Given the requirements I set for myself (Java, Maven, Tomcat and Git), I tried to find the right starting image. Many Dockerfiles are already available online on the Docker hub. The browsing app is quite good, but to be really honest, the search can really be improved. My intention was to use the image that matched the most of my requirements, then fill the gap. I could find no image providing Git, but I thought the dgageot/maven Dockerfile would be a nice starting point. The problem is that the base image is a busybox and provides no installer out-of-the-box (apt-get, yum, whatever). For this reason, David uses a lot of curl to get Java 8 and Maven in his Dockerfiles. I foolishly thought I could use a different flavor of busybox that provides the opkg installer. After a while, I accumulated many problems, resolving one heading to another. In the end, I finally decided to use the OS I was most comfortable with and to install everything myself: FROM ubuntu:utopic Scripting Java installation Installing git, maven and tomcat packages is very straightforward (if you don’t forget to use the non-interactive options) with RUN and apt-get: RUN apt-get update && \ apt-get install -y --force-yes git maven tomcat8 Java doesn’t fall into this nice pattern, as Oracle wants you to accept the license. Nice people did however publish it to a third-party repo. Steps are the following: Add the needed package repository Configure the system to automatically accept the license Configure the system to add un-certified packages Update the list of repositories At last, install the package Also add a package for Java 8 system configuration. RUN echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu precise main" | tee -a /etc/apt/sources.list && \ echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections && \ apt-key adv --keyserver keyserver.ubuntu.com --recv-keys EEA14886 RUN apt-get update && \ apt-get install -y --force-yes oracle-java8-installer oracle-java8-set-default Building the sources Getting the workshop’s sources and building them is quite straightforward with the following instructions: RUN git clone https://github.com/nfrankel/vaadin7-workshop.git WORKDIR /vaadin7-workshop RUN mvn package The drawback of this approach is that Maven will start from a fresh repository, and thus download the Internet the first time it is launched. At first, I wanted to mount a volume from the host to the container to share the ~/.m2/repository folder to avoid this, but I noticed this could only be done at runtime through the -v option as the VOLUME instruction cannot point to a host directory. Starting the image The simplest command to start the created Docker image is the following: $ docker run -p 8080:8080 Do not forget the port forwarding from the container to the host, 8080 for the standard HTTP port. Also, note that it’s not necessary to run the container as a daemon (with the -d option). The added value of that is that the standard output of the CMD (see below) will be redirected to the host. When running as a daemon and wanting to check the logs, one has to execute bash in the container, which requires a sequence of cumbersome manipulations. Configuring and launching Tomcat Tomcat can be launched when starting the container by just adding the following instruction to the Dockerfile: CMD ["catalina.sh", "run"] However, trying to start the container at this point will result in the following error: Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.ClassLoaderFactory validateFile WARNING: Problem with directory [/usr/share/tomcat8/common/classes], exists: [false], isDirectory: [false], canRead: [false] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.ClassLoaderFactory validateFile WARNING: Problem with directory [/usr/share/tomcat8/common], exists: [false], isDirectory: [false], canRead: [false] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.ClassLoaderFactory validateFile WARNING: Problem with directory [/usr/share/tomcat8/server/classes], exists: [false], isDirectory: [false], canRead: [false] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.ClassLoaderFactory validateFile WARNING: Problem with directory [/usr/share/tomcat8/server], exists: [false], isDirectory: [false], canRead: [false] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.ClassLoaderFactory validateFile WARNING: Problem with directory [/usr/share/tomcat8/shared/classes], exists: [false], isDirectory: [false], canRead: [false] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.ClassLoaderFactory validateFile WARNING: Problem with directory [/usr/share/tomcat8/shared], exists: [false], isDirectory: [false], canRead: [false] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.Catalina initDirs SEVERE: Cannot find specified temporary folder at /usr/share/tomcat8/temp Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.Catalina load WARNING: Unable to load server configuration from [/usr/share/tomcat8/conf/server.xml] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.Catalina initDirs SEVERE: Cannot find specified temporary folder at /usr/share/tomcat8/temp Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.Catalina load WARNING: Unable to load server configuration from [/usr/share/tomcat8/conf/server.xml] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.Catalina start SEVERE: Cannot start server. Server instance is not configured. I have no idea why, but it seems Tomcat 8 on Ubuntu is not configured in any meaningful way. Everything is available but we need some symbolic links here and there as well as creating the temp directory. This translates into the following instruction in the Dockerfile: RUN ln -s /var/lib/tomcat8/common $CATALINA_HOME/common && \ ln -s /var/lib/tomcat8/server $CATALINA_HOME/server && \ ln -s /var/lib/tomcat8/shared $CATALINA_HOME/shared && \ ln -s /etc/tomcat8 $CATALINA_HOME/conf && \ mkdir $CATALINA_HOME/temp The final trick is to connect the exploded webapp folder created by Maven to Tomcat’s webapps folder, which it looks for deployments: RUN mkdir $CATALINA_HOME/webapps && \ ln -s /vaadin7-workshop/target/workshop-7.2-1.0-SNAPSHOT/ $CATALINA_HOME/webapps/vaadinworkshop At this point, the Holy Grail is not far away, you just have to browse the URL… if only we knew what the IP was. Since running on Mac, there’s an additional VM beside the host and the container that’s involved. To get this IP, type: $ boot2docker ip The VM's Host only interface IP address is: 192.168.59.103 Now, browsing http://192.168.59.103:8080/vaadinworkshop/ will bring us to the familiar workshop screen: Developing from there Everything works fine but didn’t we just forget about one important thing, like how workshop attendees are supposed to work on the sources? Easy enough, just mount the volume when starting the container: docker run -v /Users//vaadin7-workshop:/vaadin7-workshop -p 8080:8080 vaadinworkshop Note that the host volume must be part of /Users and if on OSX, it must use boot2docker v. 1.3+. Unfortunately, it seems now is the showstopper, as mounting an empty directory from the host to the container will not make the container’s directory available from the host. On the contrary, it will empty the container’s directory given that the host’s directory doesn’t exist… It seems there’s an issue in Docker on Mac. The installation of JHipster runs into the same problem, and proposes to use the Samba Docker folder sharing project. I’m afraid I was too lazy to go further at this point. However, this taught me much about Docker, its usages and use-cases (as well as OSX integration limitations). For those who are interested, you’ll find below the Docker file. Happy Docker! FROM ubuntu:utopic MAINTAINER Nicolas Frankel # Config to get to install Java 8 w/o interaction RUN echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu precise main" | tee -a /etc/apt/sources.list && echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections && apt-key adv --keyserver keyserver.ubuntu.com --recv-keys EEA14886 RUN apt-get update && apt-get install -y --force-yes git oracle-java8-installer oracle-java8-set-default maven tomcat8 RUN git clone https://github.com/nfrankel/vaadin7-workshop.git WORKDIR /vaadin7-workshop RUN git checkout v7.2-1 RUN mvn package ENV JAVA_HOME /usr/lib/jvm/java-8-oracle ENV CATALINA_HOME /usr/share/tomcat8 ENV PATH $PATH:$CATALINA_HOME/bin # Configure Tomcat 8 directories RUN ln -s /var/lib/tomcat8/common $CATALINA_HOME/common && ln -s /var/lib/tomcat8/server $CATALINA_HOME/server && ln -s /var/lib/tomcat8/shared $CATALINA_HOME/shared && ln -s /etc/tomcat8 $CATALINA_HOME/conf && mkdir $CATALINA_HOME/temp && mkdir $CATALINA_HOME/webapps && ln -s /vaadin7-workshop/target/workshop-7.2-1.0-SNAPSHOT/ $CATALINA_HOME/webapps/vaadinworkshop VOLUME ["/vaadin7-workshop"] CMD ["catalina.sh", "run"] # docker build -t vaadinworkshop . # docker run -v ~/vaadin7-workshop training/webapp -p 8080:8080 vaadinworkshop

November 25, 2014

by Nicolas Fränkel

· 13,044 Views

Configuring an OpenStack VM with Multiple Network Cards

[This article was written by Barak Merimovich.] We have discussed OpenStack networking extensively in previous posts. In this post, I’d like to dive into a more advanced OpenStack networking scenario. Many cloud images are not configured to automatically bring up all network cards that are available. They will usually only have a single network card configured. To correctly set up a host in the cloud with multiple network cards, log on to the machine and bring up the additional interfaces. echo $'auto eth1\niface eth1 inet dhcp' | sudo tee /etc/network/interfaces.d/eth1.cfg > /dev/null sudo ifup eth1 Networks in the cloud A complex network architecture is a mainstay of modern IaaS clouds. Understanding how to configure your cloud-based networks, and hosts, is critical to getting your application working in the cloud. This is especially true with Cloudify, the open source cloud orchestration platform I work on. The cloud, like the world, used to be flat It was not that long a time ago that most IaaS providers only supported flat networks – all of your hosts were in one large network. Separation between services running in the cloud was enforced in software or with firewalls/security-groups. But technically, all of the hosts were connected to the same network and visible to each other. The flat network model is simple, and therefore easy to reason and understand. It was a good choice for the early days of the IaaS cloud and no doubt helped with getting applications into the cloud in the first place. It was one of the things that made EC2 so easy to use for anyone just starting out with the ‘cloud’. This model is in fact still available on Amazon Web Services under the title ‘EC2-Classic’. And for many applications, a flat network is good enough. But as cloud adoption increases, more complex applications are moving into the clouds, and issues like network separation, security, SLA and broadcast domains make more complex networks models a must. Software Defined Networks (SDN) fill that gap. They are now a staple of most major IaaS clouds. AWS has AWS-VPC, OpenStack has the Neutron project and there are many other implementations. Working with SDN requires knowing a bit more about how information moves around between your cloud resources. In this post I am going to discuss how to set up a host in the cloud so it will play nice with complex networks. I’ll be using OpenStack, but the concepts are similar for other cloud infrastructures. Openstack configuration I am going to start with an empty tenant, only the public network is available. First, lets set up out networks and router: neutron router-create demo-router neutron net-create demo-network-1 neutron net-create demo-network-2 neutron subnet-create --name demo-subnet-1 demo-network-1 10.0.0.0/24 neutron subnet-create --name demo-subnet-2 demo-network-2 10.0.1.0/24 neutron router-interface-add demo-router demo-subnet-1 neutron router-interface-add demo-router demo-subnet-2 neutron router-gateway-set demo-router public Note the network IDs: neutron net-list | id | name | subnets | | 2c33efe2-6204-4125-9716-3bc525630016 | demo-network-1 | 928dafa0-83ef-459c-b20d-71d8ea596fa2 10.0.0.0/24 | | aa30627e-c181-4a4b-89bf-5dd7c26c244e | demo-network-2 | 26d573f7-7953-4a54-825b-ed7bbc0661c7 10.0.1.0/24 | | e502de8d-929a-4ee0-bd18-efa297875cf6 | public | d40dab51-a729-452c-9ee6-b9ad08d10808 | We’ll start with a standard Ubuntu cloud image: glance image-create --name "Ubuntu 12.04 Standard" --location "http://uec-images.ubuntu.com/precise/current/precise-server-cloudimg-amd64-disk1.img" --disk-format qcow2 --container-format bare Create the keypair and security group: nova keypair-add demo-keypair > demo-keypair.pem chmod 400 demo-keypair.pem nova secgroup-create demo-security-group "Security group for demo" nova secgroup-add-rule demo-security-group tcp 22 22 0.0.0.0/0 Let’s spin up an instance connected to both our networks: nova boot -flavor m1.small --image "Ubuntu 12.04 Standard" --nic net-id=2c33efe2-6204-4125-9716-3bc525630016 --nic net-id=aa30627e-c181-4a4b-89bf-5dd7c26c244e --security-groups demo-security-group --key-name demo-keypair demo-vm And set up floating IPs for the first network: nova list | ID | Name | Status | Task State | Power State | Networks | 2b17588b-8980-4489-9a04-6539a159dc3c | demo-vm | ACTIVE | None | Running | demo-network-1=10.0.0.2; demo-network-2=10.0.1.2 | neutron floatingip-create public neutron floatingip-list | id | fixed_ip_address | floating_ip_address | port_id | | 49c8b05e-bb8f-4b07-80ed-3155ab6ffc09 | | 192.168.15.42 | | neutron port-list | id | name | mac_address | fixed_ips | | 1ccfd334-7328-4b22-b93e-24a0888276ab | | fa:16:3e:14:39:39 | {"subnet_id": "94598487-c1fc-4f55-ac1f-ef2545d5cfeb", "ip_address": "10.0.1.3"} | | a482c4f6-fa74-476e-b1ce-cd8dd0c70815 | | fa:16:3e:18:92:79 | {"subnet_id": "94598487-c1fc-4f55-ac1f-ef2545d5cfeb", "ip_address": "10.0.1.2"} | | b23d7836-30c5-4bff-b873-15c87ba051f6 | | fa:16:3e:3a:28:40 | {"subnet_id": "dec6ec74-cfa9-4a08-8792-54900631b98e", "ip_address": "10.0.0.3"} | | d421b447-2adf-406f-876b-142238683344 | | fa:16:3e:9d:fc:7f | {"subnet_id": "dec6ec74-cfa9-4a08-8792-54900631b98e", "ip_address": "10.0.0.2"} | | dcf8696b-cc80-4b48-b09c-61c0f8ab02ac | | fa:16:3e:5b:39:fb | {"subnet_id": "94598487-c1fc-4f55-ac1f-ef2545d5cfeb", "ip_address": "10.0.1.1"} | | f6a1666e-495a-4d3f-afa3-754b3cb3cfc0 | | fa:16:3e:8a:1b:fb | {"subnet_id": "dec6ec74-cfa9-4a08-8792-54900631b98e", "ip_address": "10.0.0.1"} | neutron floatingip-associate 49c8b05e-bb8f-4b07-80ed-3155ab6ffc09 d421b447-2adf-406f-876b-142238683344 Note how we matched the VM’s IP to its port, and associated the floating IP to the port. I wish there was an easier way to do this from the CLI… If everything worked correctly, you should have the following setup: Let’s make sure ssh works correctly: ssh -i demo-keypair.pem [email protected] hostname demo-vm Cool, ssh works. Now, we should have two network cards, right? ssh -i demo-keypair.pem [email protected] hostname demo-vm Cool, ssh works. Now, we should have two network cards, right? ssh -i demo-keypair.pem [email protected] ifconfig eth0 Link encap:Ethernet HWaddr fa:16:3e:5f:a2:5f inet addr:10.0.0.4 Bcast:10.0.0.255 Mask:255.255.255.0 inet6 addr: fe80::f816:3eff:fe5f:a25f/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:230 errors:0 dropped:0 overruns:0 frame:0 TX packets:224 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:46297 (46.2 KB) TX bytes:31130 (31.1 KB) lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) Huh?! The VM only has one working network interface! Where is my second NIC? Was there a configuration problem with the OpenStack network setup? The answer is here: ssh -i demo-keypair.pem [email protected] ifconfig -a eth0 Link encap:Ethernet HWaddr fa:16:3e:5f:a2:5f inet addr:10.0.0.4 Bcast:10.0.0.255 Mask:255.255.255.0 inet6 addr: fe80::f816:3eff:fe5f:a25f/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:324 errors:0 dropped:0 overruns:0 frame:0 TX packets:332 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:69973 (69.9 KB) TX bytes:47218 (47.2 KB) eth1 Link encap:Ethernet HWaddr fa:16:3e:29:6d:22 BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) The second NIC exists, but is not running. The issue is not with the OpenStack network configuration – it’s with the image. The image itself should be configured to work correctly with multiple NICs. All we have to do is bring up the NIC. So we ssh into the instance: ssh -i demo-keypair.pem [email protected] And run the following commands: echo $'auto eth1\niface eth1 inet dhcp' | sudo tee /etc/network/interfaces.d/eth1.cfg > /dev/null sudo ifup eth1 The second NIC should now be running: ifconfig eth1 eth1 Link encap:Ethernet HWaddr fa:16:3e:18:92:79 inet addr:10.0.1.2 Bcast:10.0.1.255 Mask:255.255.255.0 inet6 addr: fe80::f816:3eff:fe18:9279/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:81 errors:0 dropped:0 overruns:0 frame:0 TX packets:45 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:15376 (15.3 KB) TX bytes:3960 (3.9 KB) And there you go – your VM can access both networks. This issue can make life complicated when setting up a complex, or even a not very complex, application. When will this issue hurt you? Well, imagine a scenario where you have a web server and a database server. The web server is connected to both Network1 and Network2, and the database server is only connected to Network2. Network1 is connected to the external world over a router, and Network 2 is completely internal, adding another layer of security to the critical database server. So what happens if the web server only has one network card? If only the NIC for Network1 is up, the web server can’t access the database. If only the NIC for Network2 is up, the web server can’t be reached from the external world. Even worse, if this web server is accessed via a floating IP, this IP will also not work, so you won’t be able to access the web server and fix the issue. Tricky. In conclusion The above commands will bring up your additional network card. You will of-course need to repeat this process for each additional network card, and for each VM. You can use a start-up script (a.k.a. user-data script) or system service to run these commands, but there are better ways. I’ll discuss how to automate the network setup in a follow-up post. This was originally posted at Barak's blog Head in the Clouds, find it here.

November 4, 2014

by Sharone Zitzman

· 14,823 Views

ZooKeeper on Kubernetes

The last couple of weeks I've been playing around with docker and kubernetes. If you are not familiar with kubernetes let's just say for now that its an open source container cluster management implementation, which I find really really awesome. One of the first things I wanted to try out was running an Apache ZooKeeper ensemble inside kubernetes and I thought that it would be nice to share the experience. For my experiments I used Docker v. 1.3.0 and Openshift V3, which I built from source and includes Kubernetes. ZooKeeper on Docker Managing a ZooKeeper ensemble is definitely not a trivial task. You usually need to configure an odd number of servers and all of the servers need to be aware of each other. This is a PITA on its own, but it gets even more painful when you are working with something as static as docker images. The main difficulty could be expressed as: "How can you create multiple containers out of the same image and have them point to each other?" One approach would be to use docker volumes and provide the configuration externally. This would mean that you have created the configuration for each container, stored it somewhere in the docker host and then pass the configuration to each container as a volume at creation time. I've never tried that myself, I can't tell if its a good or bad practice, I can see some benefits, but I can also see that this is something I am not really excited about. It could look like this: docker run -p 2181:2181 -v /path/to/my/conf:/opt/zookeeper/conf my/zookeeper An other approach would be to pass all the required information as environment variables to the container at creation time and then create a wrapper script which will read the environment variables, modify the configuration files accordingly, launch zookeeper. This is definitely easier to use, but its not that flexible to perform other types of tuning without rebuilding the image itself. Last but not least one could combine the two approaches into one and do something like: Make it possible to provide the base configuration externally using volumes. Use env and scripting to just configure the ensemble. There are plenty of images out there that take one or the other approach. I am more fond of the environment variables approach and since I needed something that would follow some of the kubernetes conventions in terms of naming, I decided to hack an image of my own using the env variables way. Creating a custom image for ZooKeeper I will just focus on the configuration that is required for the ensemble. In order to configure a ZooKeeper ensemble, for each server one has to assign a numeric id and then add in its configuration an entry per zookeeper server, that contains the ip of the server, the peer port of the server and the election port. The server id is added in a file called myid under the dataDir. The rest of the configuration looks like: server.1=server1.example.com:2888:3888 server.2=server2.example.com:2888:3888 server.3=server3.example.com:2888:3888 ... server.current=[bind address]:[peer binding port]:[election biding port]Note that if the server id is X the server.X entry needs to contain the bind ip and ports and not the connection ip and ports. So what we actually need to pass to the container as environment variables are the following: The server id. For each server in the ensemble: The hostname or ip The peer port The election port If these are set, then the script that updates the configuration could look like: if [ ! -z "$SERVER_ID" ]; then echo "$SERVER_ID" > /opt/zookeeper/data/myid #Find the servers exposed in env. for i in `echo {1..15}`;do HOST=`envValue ZK_PEER_${i}_SERVICE_HOST` PEER=`envValue ZK_PEER_${i}_SERVICE_PORT` ELECTION=`envValue ZK_ELECTION_${i}_SERVICE_PORT` if [ "$SERVER_ID" = "$i" ];then echo "server.$i=0.0.0.0:2888:3888" >> conf/zoo.cfg elif [ -z "$HOST" ] || [ -z "$PEER" ] || [ -z "$ELECTION" ] ; then #if a server is not fully defined stop the loop here. break else echo "server.$i=$HOST:$PEER:$ELECTION" >> conf/zoo.cfg fi done fi For simplicity the function that read the keys and values from env are excluded. The complete image and helping scripts to launch zookeeper ensembles of variables size can be found in the fabric8io repository. ZooKeeper on Kubernetes The docker image above, can be used directly with docker, provided that you take care of the environment variables. Now I am going to describe how this image can be used with kubernetes. But first a little rambling... What I really like about using kubernetes with ZooKeeper, is that kubernetes will recreate the container, if it dies or the health check fails. For ZooKeeper this also means that if a container that hosts an ensemble server dies, it will get replaced by a new one. This guarantees that there will be constantly a quorum of ZooKeeper servers. I also like that you don't need to worry about the connection string that the clients will use, if containers come and go. You can use kubernetes services to load balance across all the available servers and you can even expose that outside of kubernetes. Creating a Kubernetes confing for ZooKeeper I'll try to explain how you can create 3 ZooKeeper Server Ensemble in Kubernetes. What we need is 3 docker containers all running ZooKeeper with the right environment variables: { "image": "fabric8/zookeeper", "name": "zookeeper-server-1", "env": [ { "name": "ZK_SERVER_ID", "value": "1" } ], "ports": [ { "name": "zookeeper-client-port", "containerPort": 2181, "protocol": "TCP" }, { "name": "zookeeper-peer-port", "containerPort": 2888, "protocol": "TCP" }, { "name": "zookeeper-election-port", "containerPort": 3888, "protocol": "TCP" } ] } The env needs to specify all the parameters discussed previously. So we need to add along with the ZK_SERVER_ID, the following: ZK_PEER_1_SERVICE_HOST ZK_PEER_1_SERVICE_PORT ZK_ELECTION_1_SERVICE_PORT ZK_PEER_2_SERVICE_HOST ZK_PEER_2_SERVICE_PORT ZK_ELECTION_2_SERVICE_PORT ZK_PEER_3_SERVICE_HOST ZK_PEER_3_SERVICE_PORT ZK_ELECTION_3_SERVICE_PORT An alternative approach could be instead of adding all these manual configuration, to expose peer and election as kubernetes services. I tend to favor the later approach as it can make things simpler when working with multiple hosts. It's also a nice exercise for learning kubernetes. So how do we configure those services? To configure them we need to know: the name of the port the kubernetes pod the provide the service The name of the port is already defined in the previous snippet. So we just need to find out how to select the pod. For this use case, it make sense to have a different pod for each zookeeper server container. So we just need to have a label for each pod, the designates that its a zookeeper server pod and also a label that designates the zookeeper server id. "labels": { "name": "zookeeper-pod", "server": 1 } Something like the above could work. Now we are ready to define the service. I will just show how we can expose the peer port of server with id 1, as a service. The rest can be done in a similar fashion: { "apiVersion": "v1beta1", "creationTimestamp": null, "id": "zk-peer-1", "kind": "Service", "port": 2888, "containerPort": "zookeeper-peer-port", "selector": { "name": "zookeeper-pod", "server": 1 } } The basic idea is that in the service definition, you create a selector which can be used to query/filter pods. Then you define the name of the port to expose and this is pretty much it. Just to clarify, we need a service definition just like the one above per zookeeper server container. And of course we need to do the same for the election port. Finally, we can define an other kind of service, for the client connection port. This time we are not going to specify the sever id, in the selector, which means that all 3 servers will be selected. In this case kubernetes will load balance across all ZooKeeper servers. Since ZooKeeper provides a single system image (it doesn't matter on which server you are connected) then this is pretty handy. { "apiVersion": "v1beta1", "creationTimestamp": null, "id": "zk-client", "kind": "Service", "port": 2181, "createExternalLoadBalancer": "true", "containerPort": "zookeeper-client-port", "selector": { "name": "zookeeper-pod" } } The basic idea is that in the service definition, you create a selector which can be used to query/filter pods. Then you define the name of the port to expose and this is pretty much it. Just to clarify, we need a service definition just like the one above per zookeeper server container. And of course we need to do the same for the election port. Finally, we can define an other kind of service, for the client connection port. This time we are not going to specify the sever id, in the selector, which means that all 3 servers will be selected. In this case kubernetes will load balance across all ZooKeeper servers. Since ZooKeeper provides a single system image (it doesn't matter on which server you are connected) then this is pretty handy. { "apiVersion": "v1beta1", "creationTimestamp": null, "id": "zk-client", "kind": "Service", "port": 2181, "createExternalLoadBalancer": "true", "containerPort": "zookeeper-client-port", "selector": { "name": "zookeeper-pod" } } I hope you found it useful. There is definitely room for improvement so feel free to leave comments.

November 3, 2014

by Ioannis Canellos

· 22,284 Views · 3 Likes

Sharding Pitfalls Part III: Chunk Balancing and Collection Limits

In Parts 1 and 2 we have covered a number of common issues people run into when managing a sharded MongoDB cluster. In this final post of the series we will cover a subtle, but important distinction in terms of balancing a sharded cluster as well as an interesting limitation that can be worked around relatively easily, but is nonetheless surprising when it comes up. 6. Chunk balancing != data balancing != traffic balancing The balancer in a sharded cluster cares about just one thing: Are chunks for a given collection evenly balanced across all shards? If they are not, then it will take steps to rectify that imbalance. This all sounds perfectly logical, and even with extra complexity like tagging involved the logic is pretty straight forward. If we assume that all chunks are equal, then we can rest assured that our data is being evenly balanced across all the shards in our cluster and rest easy at night. Although that is sometimes, perhaps even frequently, the case it is not always true - chunks are not always equal. There can be massive “jumbo” chunks that exceed the maximum chunk size (64MiB), completely empty chunks and everything in between. Let’s use an example from our first pitfall, the monotonically increasing shard key. For our example, we have picked just such a key to shard on (date), and up until this point we have had just one shard and had not sharded the collection. We are about to add a second shard to our cluster and so we enable sharding on the collection and do the necessary admin work to add the new shard into the cluster. Once the collection is enabled for sharding, the first shard contains all the newly minted chunks. Let’s represent them in a simplified table of 10 chunks. This is not representative of a real data set, but it will do for illustrative purposes: Table 1 - Initial Chunk Layout Now we add our second shard. The balancer will kick in and attempt to distribute the chunks evenly. It will do this by moving the lowest range chunks to the new shard until the counts are identical. Once it is finished balancing, our table now looks like this: Table 2 - Balanced Chunk Layout That looks pretty good at the moment, but lets imagine that more recent chunks are more likely to have more activity (updates say) than older chunks. Adding the traffic share estimates for each chunk shows that shard1 is taking far more traffic (72%) than shard2 (28%) despite the chunks seeming balanced overall based on the approximate size. Hence, chunk balancing is not equal to traffic balancing. Using that same example, let’s add another wrinkle - periodic deletion of old data. Every 3 months we run a job to delete any data older than 12 months. Let’s look at the impact of that on our table after we run it for the first time (assuming the first run happens on July 1st 2015). Table 3 - Post-Delete Chunk Layout The distribution of data is now completely skewed toward shard1 - shard2 is in fact empty! However, the balancer is completely unaware of this imbalance - the chunk count has remained the same the entire time, and as far as it is concerned the system is in a steady state. With no data on shard2, our traffic imbalance as seen above will be even worse, and we have essentially negated the benefit of having a second shard for this collection. Possible Mitigation Strategies If data and traffic balance are important, select an appropriate shard key Move chunks manually to address the imbalances - swap “hot” chunks for “cool” chunks, empty chunks for larger chunks 7. Waiting too long to shard a collection (collection too large) This is not very common, but when it falls on your shoulders, it can be quite challenging to solve. There is a maximum data size for a collection when when it is initially split which is a function of the chunk size and data size as noted on the limits page. If your collection contains less than 256GiB of data, then there will be no issue. If the collection size exceeds 256GiB but is less than 400GiB, then MongoDB may be able to do an initial split without any special measures being taken. Otherwise, with larger initial data sizes and the default settings, the initial split will fail. It is worth noting that once split the collection may grow as needed and without any real limitations as long as you can continue to add shards as data size grows. Possible Mitigation Strategies Since the limit is dictated by the chunk size and the data size, and assuming there is not much to be done about the data size, then the remaining variable is the chunk size. This is adjustable (default is 64MiB) and can be raised in order to let a large collection split initially and then reduced once that has been completed. The required chunk size increase will depend on the actual data size. However, this is relatively easy to work out - simply divide your data size by 256GB and then multiply that figure by 64MiB (and round up if it is not a nice even number). As an example, let’s consider a 4TiB collection: 4TiB divided by 256GiB = 16 64MiB x 16 = 1024MiB Hence, set the max chunk size to 1024MiB, then perform the initial sharding of the collection, and then finally reduce the chunk size back to 64MiB using the same procedure. . Thanks for reading through the Sharding Pitfall series! If you want to learn more about managing MongoDB deployments at scale, sign up for my online education course, MongoDB Advanced Deployment and Operations. Planning for scale? No problem: MongoDB is here to help. Get a preview of what it’s like to work with MongoDB’s Technical Services Team. Give us some details on your deployment and we can set you up with an expert who can provide detailed guidance on all aspects of scaling with MongoDB, based on our experience with hundreds of deployments.

October 27, 2014

by Francesca Krihely

· 4,307 Views

Sharding Pitfalls Part II: Running a Sharded Cluster

By Adam Comerford, Senior Solutions Engineer In Part I we discussed important considerations when picking a shard key. In this post we will go through some recommendations when running a sharded cluster at scale. Scalability is one of the core benefits of sharding in MongoDB but this can give you a false sense of security; even with that flexibility, you still have to make smart decisions about how and when you deploy resources. In this post, we will cover a couple of common mistakes that people tend to make when it comes to running a sharded cluster. 3. Waiting too long to add a new shard (overloaded) You sharded your database and scaled horizontally for a reason, perhaps it was to add more memory or disk capacity. Whatever the reason, if your application usage grows over time so (generally) does your database utilization. Eventually, your current sharded cluster will pass a certain point, let’s call it 80% utilized (as a nice round estimate), such that it becomes problematic to add another shard. Why? Well, adding a new shard to a cluster is not free, and it is not instantaneous. It consumes resources and (initially) accepts very little traffic. Essentially, at the start of its existence, a newly added shard costs you capacity instead of adding capacity. The length of time it will stay in this state will depend on the balancer and how long it takes for a significant portion of “busy/active” chunks to move onto the new shard. It can often be easier to visualize this process, so let’s make up some hypothetical numbers and set the bar relatively low. Our imaginary existing cluster will be a set of 2 shards, with 2000 chunks (500 considered “active”) and to that we need to add a 3rd shard. This 3rd shard will eventually store one third of the active chunks (and total chunks). The question is, when does this shard stop adding overhead overall and instead become an asset? In reality, this will vary from cluster to cluster and have a lot of dependencies and variables - in other words you need to have good metrics about your cluster, particularly your load bottleneck. Therefore we will once again use our imaginations and go with a relatively low bar: when 5% of active chunks—that is, those chunks seeing most traffic—have migrated to the new shard, you should expect a net gain in performance. In our imaginary system we have evaluated our load levels, the expected impact of migrations and have determine that once that 5% threshold of active chunks has been migrated to the new shard it can be considered a net gain for the overall system. Once all chunks have been balanced, then the migration overhead disappears, but initially this will be an expected trade off. This chart shows how long it would take for new shards to reach net positive contribution in your cluster (the dotted line implies net gain): In this fabricated example, it takes almost 2 hours for the new shard to attain a viable level of active chunks and be considered a net gain for the overall system. Although these numbers are fictional, these numbers are based on setups we have seen in real systems with moderate load. From there it is relatively easy to imagine this set of migrations taking even longer on an overloaded set of shards, and taking far longer for our newly added shard to cross the threshold and become a net gain. As such it is best to be proactive and add capacity before it becomes a necessity. Possible Mitigation Strategies Manual balancing of targeted “hot” chunks (chunk that is being accessed more than others) to move activity to the new shard more quickly Add the shard at low traffic time so that there is less competition for resources Disable balancing on some collections, prioritise balancing busy collections first 4. Under-provisioning Config Servers Provisioning enough resources without being wasteful is always tricky, and all the more so in a complicated distributed system like a MongoDB sharded cluster. Everyone wants to use their hardware, virtual instances, virtual machines, containers and the like in the most efficient way possible, and get the best bang for their buck. Hence it is only natural to take a look at the various pieces of a distributed cluster and look for lower utilized pieces that could be put on less expensive resources. The most common pitfall here with MongoDB are the config servers, which are often neglected when stress testing a cluster. In testing environments and smaller deployments (unless specific measures are taken to stress them) they are relatively lightly loaded and usually identified as candidates for lesser instances/hardware. The problem is that these are critical pieces of infrastructure. They may not be heavily loaded all the time, but when they do see load and struggle to service requests, that can impact all queries (reads, writes, authentication) and add latency to all requests made of the cluster in question. In particular, the first config server in the list supplied to your mongos processes is vital. This is the config server that all mongos processes will default to read from when fetching or refreshing their view of the data distribution in your cluster. Similarly, this is the server that will be hit when attempting to authenticate a user. If it is under-provisioned and cannot service queries, or if it has problems with networking (packet loss, congestion), then the effects will be significant. Possible Mitigation Strategies Ensure the config servers are load tested, slightly over-provisioned (the first config server in particular) If using virtual machines or cloud based instances, investigate increasing available resources Turning off the balancer, disabling chunk splitting will reduce the chances of high read traffic to the config servers (no migrations, no meta data refresh) but this is only a temporary fix unless you have a perfect write distribution and may not eliminate issues completely. 5. Using the count() command on sharded collections This pitfall is very common, and it seems to hit somewhat randomly in terms of how long someone has been running a sharded environment. At some point, a question will arise along the lines of: “How are we tracking/verifying/checking how many documents we have in each collection on each shard, how balanced are they and do they agree with ?” Hopefully no one is actually constructing questions this way in your organization, but you get the basic idea. The most obvious way to do a quick check on this type of thing is to count the documents and see if the numbers make sense and/or agree with counts elsewhere. That thinking naturally leads people to the count command and they proceed to use it to gather figures for their documents and collections. Unfortunately, on a busy, mature sharded cluster, the results will very rarely be what is expected. The reason for this is that the count command as implemented today has several optimizations in place to make it faster to run in general and those speed optimizations essentially bypass a key piece of the sharding functionality needed to return accurate results in this case. This is a known bug and is being tracked in SERVER-3645, but does not stop people from consistently hitting this issue. The nature of the issue means that count will report documents in the results that it should not, for example: Documents that are being deleted as part of a chunk migrations Documents that have been left behind from previous chunk migrations (also known as orphans) Documents currently being copied as part of an in-flight chunk migration A regular query (rather than a count) will have its results filtered by the respective primary and not suffer from the same problem. Hence, if you were to manually count the results from a query client-side you would get an accurate result. This quirk of sharded environments will eventually be fixed, but for now it will inevitably crop up from time to time in all active sharded clusters used by a large team. Possible Mitigation Strategies Do counts on the client side, or use targeted, range based queries (with a primary read preference) to count instead Use cleanUpOrphaned and disable the balancer (make sure it has finished current round) when performing counts across the cluster If you want tolearn more about managing MongoDB deployments at scale, sign up for my online education course, MongoDB Advanced Deployment and Operations. Planning for scale? No problem: MongoDB is here to help. Get a preview of what it’s like to work with MongoDB’s Technical Services Team. Give us some details on your deployment and we can set you up with an expert who can provide detailed guidance on all aspects of scaling with MongoDB, based on our experience with hundreds of deployments.

October 21, 2014

by Francesca Krihely

· 4,757 Views

How to Allow Only HTTPS on an S3 Bucket

It is possible to disable HTTP access on S3 bucket, limiting S3 traffic to only HTTPS requests. The documentation is scattered around the Amazon AWS documentation, but the solution is actually straightforward. All you need to do to block HTTP traffic on an S3 bucket is add a Condition in your bucket's policy. AWS supports a global condition for verifying SSL. So you can add a condition like this: "Condition": { "Bool": { "aws:SecureTransport": "true" } } Here's a complete example: { "Version": "2008-10-17", "Id": "some_policy", "Statement": [ { "Sid": "AddPerm", "Effect": "Allow", "Principal": { "AWS": "*" }, "Action": "s3:GetObject", "Resource": "arn:aws:s3:::my_bucket/*", "Condition": { "Bool": { "aws:SecureTransport": "true" } } } ] } Now accessing the contents of my_bucket over HTTP will produce a 403 error, while using HTTPS will work fine.

October 8, 2014

by Matt Butcher

· 17,786 Views

Getting Started with JHipster on OS X

Last week I was tasked with developing a quick prototype that used AngularJS for its client and Spring MVC for its server. A colleague developed the same application using Backbone.js and Spring MVC. At first, I considered using my boot-ionic project as a starting point. Then I realized I didn't need to develop a native mobile app, but rather a responsive web app. My colleague mentioned he was going to use RESThub as his starting point, so I figured I'd use JHipster as mine. We allocated a day to get our environments setup with the tools we needed, then timeboxed our first feature spike to four hours. My first experience with JHipster failed the 10-minute test. I spent a lot of time flailing about with various "npm" and "yo" commands, getting permissions issues along the way. After getting thinks to work with some sudo action, I figured I'd try its Docker development environment. This experience was no better. JHipster seems like a nice project, so I figured I'd try to find the causes of my issues. This article is designed to save you the pain I had. If you'd rather just see the steps to get up and running quickly, skip to the summary. The "npm" and "yo" issues I had seemed to be caused by a bad node/npm installation. To fix this, I removed node and installed nvm. Here's the commands I needed to remove node and npm: sudo rm -rf /usr/local/lib/node_modules sudo rm -rf /usr/local/include/node sudo rm /usr/local/bin/node sudo rm -rf /usr/local/bin/npm sudo rm /usr/local/share/man/man1/node.1 sudo rm -rf /usr/local/lib/dtrace/node.d sudo rm -rf ~/.npm Next, I ran "brew doctor" to make sure Homebrew was still happy. It told me some things were broken: $ brew doctor Warning: Broken symlinks were found. Remove them with `brew prune`: /usr/local/bin/yo /usr/local/bin/ionic /usr/local/bin/grunt /usr/local/bin/bower I ran brew update && brew prune, followed by brew install nvm. Next, I added the following to my ~/.profile: source $(brew --prefix nvm)/nvm.sh To install the latest version of node, I ran the commands below and set the latest version as the default: nvm ls-remote nvm install v0.11.13 nvm alias default v0.11.13 Once I had a fresh version of Node.js, I was able to run JHipster's local installation instructions. npm install -g yo npm install -g generator-jhipster Then I created my project: yo jhipster I was disappointed to find this created all the project files in my current directory, rather than in a subdirectory. I'd recommend you do the following instead: mkdir ~/projectname && cd ~/projectname && yo jhipster Before creating your project, JHipster asks you a number of questions. To see what they are, see its documentation on creating an application. Two things to be aware of: Hot reloading Java code doesn't work well (yet) with Java 8 Its OAuth2 implementation doesn't work with WebSockets In other words, I'd recommend using Java 7 + (cookie-based authentication with websockets) or (oauth2 authentication w/o websockets). After creating my project, I was able to run it using "mvn spring-boot:run" and view it at http://localhost:8080. To get hot-reloading for the client, I ran "grunt server" and opened my browser to http://localhost:9000. JHipster + Docker on OS X I had no luck getting the Docker instructions to work initially. I spent a couple hours on it, then gave up. A couple of days ago, I decided to give it another good ol' college-try. To make sure I figured out everything from scratch, I started by removing Docker. I re-installed Docker and pulled the JHipster image using the following: sudo docker pull jdubois/jhipster-docker The error I got from this was the following: 2014/09/05 19:43:38 Post http:///var/run/docker.sock/images/create?fromImage=jdubois%2Fjhipster-docker&tag=: dial unix /var/run/docker.sock: no such file or directory After doing some research, I learned I needed to run boot2docker init first. Next I ran boot2docker up to start the Docker daemon. Then I copied/pasted "export DOCKER_HOST=tcp://192.168.59.103:2375" into my console and tried to run docker pull again. It failed with the same error. The solution was simpler than you might think: don't use sudo. $ docker pull jdubois/jhipster-docker Pulling repository jdubois/jhipster-docker 01bdc74025db: Pulling dependent layers 511136ea3c5a: Download complete ... The next command that JHipster's documentation recommends is to run the Docker image, forward ports and share folders. When you run it, the terminal seems to hang and trying to ssh into it doesn't work. Others have recently reported a similar issue. I discovered the hanging is caused by a missing "-d" parameter and ssh doesn't work because you need to add a portmap to the VM to expose the port to your host. You can fix this by running the following: boot2docker down VBoxManage modifyvm "boot2docker-vm" --natpf1 "containerssh,tcp,,4022,,4022" VBoxManage modifyvm "boot2docker-vm" --natpf1 "containertomcat,tcp,,8080,,8080" VBoxManage modifyvm "boot2docker-vm" --natpf1 "containergruntserver,tcp,,9000,,9000" VBoxManage modifyvm "boot2docker-vm" --natpf1 "containergruntreload,tcp,,35729,,35729" boot2docker start After making these changes, I was able to start the image and ssh into it. docker run -d -v ~/jhipster:/jhipster -p 8080:8080 -p 9000:9000 -p 35729:35729 -p 4022:22 -t jdubois/jhipster-docker ssh -p 4022 jhipster@localhost I tried creating a new project within the VM (cd /jhipster && yo jhipster), but it failed with the following error: /usr/lib/node_modules/generator-jhipster/node_modules/yeoman-generator/node_modules/mkdirp/index.js:89 throw err0; ^ Error: EACCES, permission denied '/jhipster/src' The fix was giving the "jhipster" user ownership of the directory. sudo chown jhipster /jhipster After doing this, I was able to generate an app and run it using "mvn spring-boot:run" and access it from my Mac at http://localhost:8080. I was also able to run "grunt server" and see it at http://localhost:9000 However, I was puzzled to see that there was nothing in my ~/jhipster directory. After doing some searching, I found that the docker run -v /host/path:/container/path doesn't work on OS X. David Gageot's A Better Boot2Docker on OSX led me to svendowideit/samba, which solved this problem. The specifics are documented in boot2docker's folder sharing section. I shutdown my docker container by running "docker ps", grabbing the first two characters of the id and then running: docker stop [2chars] I started the JHipster container without the -v parameter, used "docker ps" to find its name (backstabbing_galileo in this case), then used that to add samba support. docker run -d -p 8080:8080 -p 9000:9000 -p 35729:35729 -p 4022:22 -t jdubois/jhipster-docker docker run --rm -v /usr/local/bin/docker:/docker -v /var/run/docker.sock:/docker.sock svendowideit/samba backstabbing_galileo Then I was able to connect using Finder > Go > Connect to Server, using the following for the server address: cifs://192.168.59.103/jhipster To make this volume appear in my regular development area, I created a symlink: ln -s /Volumes/jhipster ~/dev/jhipster After doing this, all the files were marked as read-only. To fix, I ran "chmod -R 777 ." in the directory on the server. I noticed that this also worked if I ran it from my Mac's terminal, but it took quite a while to traverse all the files. I noticed a similar delay when loading the project into IntelliJ. Summary Phew! That's a lot of information that can be condensed down into four JHipster + Docker on OS X tips. Make sure your npm installation doesn't require sudo rights. If it does, reinstall using nvm. Add portmaps to your VM to expose ports 4022, 8080, 9000 and 35729 to your host. Change ownership on the /jhipster in the Docker image: sudo chown jhipster /jhipster. Use svendowideit/samba to share your VM's directories with OS X.

September 10, 2014

by Matt Raible

· 13,019 Views

Creating a Custom SQL Server VM Image in Azure

Recently I had the opportunity to work on a project were I needed to create a custom SQL Server image for use with Azure VMs. The process was a little more challenging than I initially anticipated. I think this is mostly because I was not familiar with the process of preparing a SQL Server image. Perhaps this isn’t much of a challenge for an experienced SQL Server DBA or IT Pro. For me, it was a great learning experience. Why a Custom SQL Server Image? The Azure VM image gallery already contains a SQL Server image. It’s very easy to create a new SQL Server VM using this image. However, doing so has a few important trade-offs to consider: Unable to fully customize the base install of SQL Server. This is a template/image after all – you get a VM configured the way the image was configured. Unable to use your own SQL Server license. If your company has an Enterprise Agreement (EA) with Microsoft, it’s likely there is already some SQL Server licenses built into that agreement. Depending on the details, it may be significantly cheaper to use the licenses from the EA instead of paying the SQL Server VM image upcharge from Azure. The Basic Steps There are 6 basic steps to creating a custom SQL Server VM image for use in Azure. Provision a new base Windows Server VM Download the SQL Server installation media Run SQL Server setup to prepare an image Configure Windows to complete the installation of SQL Server Capture the image and add it to the Azure VM image gallery Create a new VM instance using the custom SQL Server image The basic idea here is to create a base VM, customize it with a SQL Server image, capture the VM to create an image, and then provision new VMs using that captured VM image. Let’s dive into each of these in a little more detail. Note: the terminology here can be a little confusing. When referring to the VM used to create the template/image, I’ll use the term “base VM”. When referring to the VM created from the base VM, I’ll use the term “VM instance”. 1. Provision a new base Windows Server VM There are multiple ways to create a Windows Server VM in Azure. Creating a VM via the Azure management portal and PowerShell are probably the two most popular options. Be sure to check out this tutorial to learn how to do so via the portal. For the purposes of this post, I’ll do so via PowerShell. $img = Get-AzureVMImage ` | where { ( $_.PublisherName -ilike "Microsoft*" -and $_.ImageFamily -ilike "Windows Server 2012 Datacenter" ) } ` | Sort-Object -Unique -Descending -Property ImageFamily ` | sort -Descending -Property PublishDate ` | select -First(1) $vmConfig = New-AzureVMConfig -Name "sql-1" -InstanceSize Small -ImageName $img.ImageName | Add-AzureProvisioningConfig -Windows -AdminUsername "[admin-username-here]" -Password "[admin-password-here]" New-AzureVM -ServiceName "SQLServerVMTemplate" -VMs $vmConfig -Location "East US" -WaitForBoot 2. Download the SQL Server installation media With the base Windows Server 2012 VM created, we can now get ready to prepare (sysprep) the SQL Server installation. To do that, we need to get the SQL Server installation media onto the machine. The easiest way I found to do this was to leverage Azure blob storage. Upload the SQL Server ISO file to Azure blob storage Remote Desktop (RDP) into the base VM From the VM, download the SQL Server ISO file to the local disk Mount the SQL Server ISO file to the VM Copy the ISO contents (not the ISO file itself) to the VM’s C:\ drive. For example, use C:\sql The SQL Server installation media files need to be copied to the local C: drive so it can be used later to complete the SQL Server installation (when provisioning the actual SQL Server VM instance). 3. Run SQL Server setup to prepare an image In order to prepare the (sysprep’d) SQL Server VM image (which we can use as a template for future VMs), we need to run the SQL Server installation and instruct it topreparean image – not run the full installation. An easy way to do this is with a SQL Server configuration file, an example of which I’ve included below. ConfigurationFile.ini ;SQL Server 2012 Configuration File [OPTIONS] ; Specifies a Setup workflow, like INSTALL, UNINSTALL, or UPGRADE. This is a required parameter. ACTION="PrepareImage" ; Detailed help for command line argument ENU has not been defined yet. ENU="True" ; Parameter that controls the user interface behavior. Valid values are Normal for the full UI, AutoAdvance for a simplified UI, and EnableUIOnServerCore for bypassing Server Core setup GUI block. ;UIMODE="Normal" ; Specifies setup not display any user interface. ;QUIET="False" ; Specifies setup to display progress only, without any user interaction. QUIETSIMPLE="True" ; Specifies whether SQL Server Setup should discover and include product updates. The valid values are True and False or 1 and 0. By default SQL Server Setup will include updates that are found. UpdateEnabled="True" ; Specifies features to install, uninstall, or upgrade. The list of top-level features include SQL, AS, RS, IS, MDS, and Tools. The SQL feature will install the Database Engine, Replication, Full-Text, and Data Quality Services (DQS) server. The Tools feature will install Management Tools, Books online components, SQL Server Data Tools, and other shared components. FEATURES=SQLENGINE ; Specifies the location where SQL Server Setup will obtain product updates. The valid values are "MU" to search Microsoft Update, a valid folder path, a relative path such as .\MyUpdates or a UNC share. By default SQL Server Setup will search Microsoft Update or a Windows Update service through the Window Server Update Services. UpdateSource="MU" ; Displays the command line parameters usage HELP="False" ; Specifies that the detailed Setup log should be piped to the console. INDICATEPROGRESS="False" ; Specifies that Setup should install into WOW64. This command line argument is not supported on an IA64 or a 32-bit system. X86="False" ; Specifies the root installation directory for shared components. This directory remains unchanged after shared components are already installed. INSTALLSHAREDDIR="C:\Program Files\Microsoft SQL Server" ; Specifies the root installation directory for the WOW64 shared components. This directory remains unchanged after WOW64 shared components are already installed. INSTALLSHAREDWOWDIR="C:\Program Files (x86)\Microsoft SQL Server" ; Specifies the Instance ID for the SQL Server features you have specified. SQL Server directory structure, registry structure, and service names will incorporate the instance ID of the SQL Server instance. INSTANCEID="MSSQLSERVER" ; Specifies the installation directory. INSTANCEDIR="C:\Program Files\Microsoft SQL Server" There are two steps in this process: Copy the ConfigurationFile.ini file (from your local PC) to the same location as the SQL Server installation media (i.e.c:\sql) on the base VM. Run SQL Server setup to prepare an image. From a command prompt (on the base VM), navigate to theC:\sqlfolder and then execute the following command: Setup.exe /ConfigurationFile=ConfigurationFile.ini /IAcceptSQLServerLicenseTerms=true 4. Configure Windows to complete the installation of SQL Server At this point the base VM should have an “installation” of SQL Server that is not fully completed. The SQL Server bits are in place, but they’re not configured for a full server install . . . at least not yet. The final configuration of SQL Server will take place when the VM instance (of which this template/image is the base) is provisioned and boots up for the first time. This is accomplished by using a CMD file with the following content: @ECHO OFF && SETLOCAL && SETLOCAL ENABLEDELAYEDEXPANSION && SETLOCAL ENABLEEXTENSIONS REM All commands will be executed during first Virtual Machine boot "C:\Program Files\Microsoft SQL Server\110\Setup Bootstrap\SQLServer2012\setup.exe" /QS /ACTION=CompleteImage /INSTANCEID=MSSQLSERVER /INSTANCENAME=MSSQLSERVER /IACCEPTSQLSERVERLICENSETERMS=1 /SQLSYSADMINACCOUNTS=%COMPUTERNAME%\Administrators /BROWSERSVCSTARTUPTYPE=AUTOMATIC /INDICATEPROGRESS /TCPENABLED=1 /PID="[YOUR-SQL-SERVER-PRODUCT-ID-HERE]" On your local PC, save the file as SetupComplete2.cmd RDP / log into the base VM Copy the SetupComplete2.cmd from your local PC file to the c:\Windows\OEM folder on the base VM Change the value for the SQLSYSADMINACCOUNTS value to be that of the administrative account created on the VM (or better yet – the local Administrators group account) If needed, supply the SQL Server product ID (PID) value. When Windows starts on the new VM instance for the first time, the SetupComplete2.cmd file should automatically run. It is invoked by the SetupComplete.cmd file already on the machine. 5. Capture the image and add it to the Azure VM image gallery At this point a base SQL Server VM has been created and the groundwork laid to complete the install. Now it is time to create the VM image from the base VM, and do to that you sysprep and capture the base VM. Please follow the guide on How to Capture a Windows Virtual Machine to Use as a Template. 6. Create a new VM using the custom SQL Server image With a new custom VM image template available in the VM image gallery, you can provision a new VM instance using that custom template. Upon first boot, the newly provisioned VM should complete the full SQL Server installation as laid out in your SetupComplete2.cmd file. Please follow the guide on How to Create a Custom Virtual Machine for more information on creating the VM from the template. Closing Thoughts One of the quirks I noticed when preparing the base SQL Server image is that it was not possible to prepare the image with SQL Server Management Studio (SSMS). I would have to do the install after the newly provisioned VM instance is created. Not hard, but time consuming (an annoying if doing this on multiple VM instances). I later learned that SQL Server 2012 Cumulative Update 1 does allow for preparing a SQL Server image with SSMS installed. I’ve included a link below that describes the process for creating a SQL Server image with CU1. In the end, this process really is not all that hard. Time consuming? Yes! The worst part (at least for me) was really just understanding how the SQL Server installation and sysprep process works. Once I wrapped my head around that, the process was a lot smoother. Helpful Resources While I was learning how to create a custom SQL Server VM image, the following resources were very helpful: How to: Create a Windows Azure Virtual Machine Operating System Image for Microsoft Dynamics NAV. This MSDN article provided the jumping off point on learning how to install SQL Server by using a sysprep image. Install SQL Server 2012 from the Command Prompt Install SQL Server 2012 Using a Configuration File Install SQL Server 2012 Using SysPrep How to create a slipstream SQL Server 2012 and Cumulative Update 1 image –http://sqlperformance.com/2012/12/system-configuration/sql-2012-slipstream I would like to thank Scott Klein for his assistance in verifying these steps. His help was extremely valuable to ensure I was doing this the right way.

September 10, 2014

by Michael Collier

· 6,491 Views

Create Your Own Private Docker Registry

This is a post in a series discussing using spring-boot and docker for deployment. Refer to the end of the first post for a table of contents. Shortly after you start building docker containers you will realize that you need some place to publish your images. You could push to the central docker registry. However, the central registry is public. Not a great idea if you are working on a private project. If this is your case, you can simply run a local docker registry. To install and run your private registry run $ docker run -p 5000:5000 -d registry Surprise!!! It is ran in a docker container. You can now start pushing to your local repository. As an example, I will pull the latest postgres image and push version 9.4 to my local registry. $ docker pull postgres $ docker tag postgres:9.4 localhost:5000/postgres:9.4 $ docker push localhost:5000/postgres Outputs: The push refers to a repository [localhost:5000/postgres] (len: 1) Sending image list Pushing repository localhost:5000/postgres (1 tags) 511136ea3c5a: Image successfully pushed ec3443b7b068: Image successfully pushed 06af7ad6cff1: Image successfully pushed 37eae31ff4e9: Image successfully pushed 83e30bf01299: Image successfully pushed 499da968a652: Image successfully pushed bf09bd07d760: Image successfully pushed 1eee820e762b: Image successfully pushed 7bf9287ccfce: Image successfully pushed 288b8d534217: Image successfully pushed f20dbf0acb45: Image successfully pushed bd511e81a5ed: Image successfully pushed 8fe7eb38aea1: Image successfully pushed 464263a50f65: Image successfully pushed 1f58a67adecd: Image successfully pushed a99fb4ee814d: Image successfully pushed 6112f975feab: Image successfully pushed 6dff1b5c2259: Image successfully pushed Pushing tag for rev [6dff1b5c2259] on {http://localhost:5000/v1/repositories/postgres/tags/9.4} Looking at the current images, you will notice that the version tagged with localhost and the official images have the same information. Notice that I had to retag the image with the location of the repository. I thought the requirement to put the location address as part of the image name was a little odd. However, after using docker longer, it makes sense. It ensures you know where the image was originally pulled. $ docker images postgres 9.4 6dff1b5c2259 5 days ago 244.4 MB localhost:5000/postgres 9.4 6dff1b5c2259 5 days ago 244.4 MB Since docker tags are not permanent, and newer version of the postgres:9.4 image could be pushed to the public registry. When you self-host images, you are in control of when updates are pushed to any base image that you have extended. Someday I intend to learn how to build an image completely from scratch. Docker-ize All the Things!

August 11, 2014

by Robert Greathouse

· 18,827 Views · 1 Like

Deploying a Spring Boot Application to Cloud Foundry with Spring-Cloud

I have a small Spring boot based application that uses a Postgres database as a datastore. I wanted to document the steps involved in deploying this sample application to Cloud Foundry. Some of the steps are described in the Spring Boot reference guide, however the guides do not sufficiently explain how to integrate with the datastore provided in a cloud based environment. Spring-cloud provides the glue to connect Spring based applications deployed on a Cloud to discover and connect to bound services, so the first step is to pull in the Spring-cloud libraries into the project with the following pom entries: org.springframework.cloud spring-cloud-spring-service-connector 1.0.0.RELEASE org.springframework.cloud spring-cloud-cloudfoundry-connector 1.0.0.RELEASE Once this dependency is pulled in, connecting to a bound service is easy, just define a configuration along these lines: @Configuration public class PostgresCloudConfig extends AbstractCloudConfig { @Bean public DataSource dataSource() { return connectionFactory().dataSource(); } } Spring-Cloud understands that the application is deployed on a specific Cloud(currently Cloud Foundry and Heroku by looking for certain characteristics of the deployed Cloud platform), discovers the bound services, recognizes that there is a bound service using which a Postgres based datasource can be created and returns the datasource as a Spring bean. This application can now deploy cleanly to a Cloud Foundry based Cloud. The sample application can be tried out in a version of Cloud Foundry deployed with bosh-lite, these are how the steps in my machine looks like once Cloud Foundry is up and running with bosh-lite: The following command creates a user provided service in Cloud Foundry: cf create-user-provided-service psgservice -p '{"uri":"postgres://postgres:[email protected]:5432/hotelsdb"}' Now, push the app, however don't start it up. We can do that once the service above is bound to the app: cf push spring-boot-mvc-test -p target/spring-boot-mvc-test-1.0.0-SNAPSHOT.war --no-start Bind the service to the app and restart the app: cf bind-service spring-boot-mvc-test psgservice cf restart spring-boot-mvc-test That is essentially it, Spring Cloud should ideally take over at the point and cleanly parse the credentials from the bound service which within Cloud Foundry translates to an environment variable called VCAP_SERVICES, and create the datasource from it. There is however an issue with this approach - once the datasource bean is created using spring-cloud approach, it does not work in a local environment anymore. The potential fix for this is to use Spring profiles, assume that there is a different "cloud" Spring profile available in Cloud environment where the Spring-cloud based datasource gets returned: @Profile("cloud") @Configuration public class PostgresCloudConfig extends AbstractCloudConfig { @Bean public DataSource dataSource() { return connectionFactory().dataSource(); } } and let Spring-boot auto-configuration create a datasource in the default local environment, this way the configuration works both local as well as in Cloud. Where does this "cloud" profile come from, it can be created using a ApplicationContextInitializer, and looks this way: public class SampleWebApplicationInitializer implementsApplicationContextInitializer { private static final Log logger = LogFactory.getLog(SampleWebApplicationInitializer.class); @Override public void initialize(AnnotationConfigEmbeddedWebApplicationContext applicationContext) { Cloud cloud = getCloud(); ConfigurableEnvironment appEnvironment = applicationContext.getEnvironment(); if (cloud!=null) { appEnvironment.addActiveProfile("cloud"); } logger.info("Cloud profile active"); } private Cloud getCloud() { try { CloudFactory cloudFactory = new CloudFactory(); return cloudFactory.getCloud(); } catch (CloudException ce) { return null; } } } This initializer makes use of the Spring-cloud's scanning capabilities to activate the "cloud" profile. One last thing which I wanted to try was to make my local behave like Cloud atleast in the eyes of Spring-Cloud and this can be done by adding in some environment variables using which Spring-Cloud makes the determination of the type of cloud where the application is deployed, the following is my startup script in local for the app to pretend as if it is deployed in Cloud Foundry: read -r -d '' VCAP_APPLICATION <<'ENDOFVAR' {"application_version":"1","application_name":"spring-boot-mvc-test","application_uris":[""],"version":"1.0","name":"spring-boot-mvc-test","instance_id":"abcd","instance_index":0,"host":"0.0.0.0","port":61008} ENDOFVAR export VCAP_APPLICATION=$VCAP_APPLICATION read -r -d '' VCAP_SERVICES <<'ENDOFVAR' {"postgres":[{"name":"psgservice","label":"postgresql","tags":["postgresql"],"plan":"Standard","credentials":{"uri":"postgres://postgres:[email protected]:5432/hotelsdb"}]} ENDOFVAR export VCAP_SERVICES=$VCAP_SERVICES mvn spring-boot:run This entire sample is available at this github location:https://github.com/bijukunjummen/spring-boot-mvc-test Conclusion Spring Boot along with Spring-Cloud project now provide an excellent toolset to create Spring-powered cloud ready applications, and hopefully these notes are useful in integrating Spring Boot with Spring-Cloud and using these for seamless local and Cloud deployments.

August 5, 2014

by Biju Kunjummen

· 34,017 Views · 2 Likes

JBoss Data Grid: Installation and Development

In this blog, we will discuss one particular data grid platform from Redhat namely JBoss Data Grid (JDG). We will firstly cover how to access and install this data grid platform and then we will demonstrate how to develop and deploy a simple remote client/server data grid application which utilises the HotRod protocol. We will be using the latest release JDG 6.2 from Redhat in this article. Installation Overview To start using JDG, firstly log on to the redhat site https://access.redhat.com/home and download the software from the Downloads section of the site. We wish to download JDG 6.2 server by clicking on the appropriate links in the Downloads section. For future reference, it is also useful to download the quickstart and maven repository zip files. To install JDG, we simply unzip the JDG server package into an appropriate directory in your environment. JDG Overview In this section, we will provide a brief overview of the contents of the JDG installation package and the most notable configuration options available to users. Out of the box, users are provided with two runtime options either to run JDG in standalone or clustered mode. We can start JDG in either mode by invoking the stanadalone or clustered start up scripts in the / bin directory. To configure the JDG in either mode we need to configure the files standalone.xml and clustered.xml. In our case we will creating a distributed cache which will run on 3 node JDG cluster so we will be utilizing the clustered startup script. In order to set up and add new cache instances to JDG, we modify the infinispan subsystems in the appropriate xml configuration file above. We should also note the principal difference between the standalone and clustered configuration file is that in the clustered configuration file there is a JGroups subsystem configured element which allows for communication and messaging between configured cache instances running in a JDG cluster. Development Environment Setup and Configuration In this section, we will detail how to develop and configure a simple datagrid application which will be deployed to a 3 node JDG cluster. We will demonstrate how to configure and deploy a distributed cache in JDG and also show how to develop a HotRod Java client application which will be used to insert, update and display entries in the distributed cache. We will firstly discuss setting a new distributed cache on a 3 node JDG cluster. In this example, we will run our JDG cluster on a single machine by running each JDG instance on different ports. Firstly, we will create 3 instances of JDG by creating 3 directories (server1, server2, server3) on our host machine and unzipping each JDG installation into each directory. We will now configure each node in our cluster by copying and renaming the clustered.xml configuration file in the \server1\jboss-datagrid-6.2.0-server\standalone\configuration directory. We will name each of the cluster configuration files as "clustered1.xml", "clustered2.xml" and "clustered3.xml" for the JDG instances denoted by "server1", "server2" and "server3" respectively. We will now set up a new distributed cache on our JDG cluster by modifying the infinispan subsystem element in each clustered.xml file. We will demonstrate this for the node denoted "server1" here by modifying the file "clustered1.xml". The cache configuration shown here will be the same across all 3 nodes. To setup a new distributed cache named "directory-dist-cache", we configure the following elements in the file named "clustered1.xml" ......... ...... .............. ...... ...... /socket-binding-group> We will discuss the key elements and attributes relating to the configuration above. In the infinispan endpoint subsystem, we will configure hotrod clients to connect to the JDG server instance on socket 11222. The name of the cache container to host each of the cache instances will be held in the container named "clusteredcache". We have configured the infinispan core subsystem to the default cache container named "clusteredcacahe" whereby we will allow for jmx statistics to be collected relating the configured cache entries i.e statistics="true" We have created a new distributed cache named "directory-dist-cache" whereby there will be two copies of each cache entry held on two of the 3 cluster nodes. We have also set up an eviction policy whereby should there be more than 20 entries in our cache then cache entries will be removed using the LRU algorithm We should have configured nodes "server2" and "server3" to start up with a port offset of 100 and 200 respectively by configuring the socketing binding group element appropriately. Please view the socket bindings noted below. To set the socket binding element with a port offset of 100 on "server2", we configure "clustered2.xml" with the following entry: ...... ...... /socket-binding-group> To set the socket binding element with a port offset of 200 on "server3", we configure "clustered3.xml" with the following entry: ...... ...... /socket-binding-group> Before discussing the setup and configuration of our Hotrod client which will be used to interact with our JDG clustered HotRod server, we will start up each server instance to ensure our newly configured JDG distributed cache starts up correctly. Open up 3 Windows or Linux consoles and execute the following start up commands: Console 1: 1) Navigate to \server1\jboss-datagrid-6.2.0-server\bin 2) Execute this command to start the first instance of our JDG cluster denoted "server1": clustered -c=clustered1.xml -Djboss.node.name=server1 Console 2: 1) Navigate to \server2\jboss-datagrid-6.2.0-server\bin 2) Execute this command to start the second instance of our JDG cluster denoted "server2": clustered -c=clustered2.xml -Djboss.node.name=server2 Console 3: 1) Navigate to \server3\jboss-datagrid-6.2.0-server\bin 2) Execute this command to start the third instance of our JDG cluster denoted "server3": clustered -c=clustered3.xml -Djboss.node.name=server3 Providing all 3 JDG instances have started up correctly, you should see output in the console window whereby we can see there are 3 JDG instances in the JGroups view: HotRod Client Development Setup Now that the Hotrod server is up and running, we need to develop a Hotrod Java client which will interact with the clustered server application. The development environment consists of the following tools. 1) JDK Hotspot 1.7.0_45 2) IDE - Eclipse Kepler Build id: 20130919-0819 The HotRod client application is a simple application consisting of two Java classes. The application allows users to retrieve a reference to the distributed cache from the JDG server and then perform these actions: a) add new cinema objects. b) add and remove shows to each cinema object. c) print the list of all cinemas and shows stored in our distributed cache. The source code can be downloaded from github @ https://github.com/davewinters/JDG. We could use maven here to build and execute our application by configuring the maven settings.xml to point to the maven repository files we downloaded earlier and set up a maven project file (pom.xml) to build and execute the client application. In this article we will build our application using the Eclipse IDE and run the client application on the command line. To create a HotRod client application and execute the sample application, one should complete the following steps: 1) Create a new Java Project in Eclipse 2) Create a new package named uk.co.c2b2.jdg.hotrod and import the source code that has been downloaded from Github mentioned previously. 3) Now we need to configure the build path in Eclipse to contain the appropriate JDG client jar files which are required to compile the application. You should include all the client jar files in the project build path. These jar files are contained in the JDG installation zip file. For example on my machine these jar files are located in the directory: \server1\jboss-datagrid-6.2.0-server\client\hotrod\java 4. Providing the Eclipse build path has been configured appropriately, the application source should compile without issue. 5. We will need to execute the Hotrod application by opening the console window and executing the following command. Note the path specified here will differ depending on where the JDG client jar files and application class files are located in your environment: java -classpath ".;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\commons-pool-1.6-redhat-4.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-client-hotrod-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-commons-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-query-dsl-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-remote-query-client-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\jboss-logging-3.1.2.GA-redhat-1.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\jboss-marshalling-1.4.2.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\jboss-marshalling-river-1.4.2.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\protobuf-java-2.5.0.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\protostream-1.0.0.CR1-redhat-1.jar" uk/co/c2b2/jdg/hotrod/CinemaDirectory 6. The Hotrod client at runtime provides the end user with a number of different options to interact with the distributed cache as we can view from the console window below. Client Application Principal API Details We will not provide a detailed overview of the Hotrod application code however we will describe the principal API and code details briefly. In order to interact with the distributed cache on the JDG cluster using the Hotrod protocol, we will use the RemoteCacheManager Object which will allow us to retrieve a remote reference to the distributed cache. We have initialised a Properties object with the list of JDG instances and the associated with HotRod server port on each instance. We can add Cinema objects into the distributed cache using the RemoteCache.put() method. private RemoteCacheManager cacheManager; private RemoteCache cache; ..... Properties properties = new Properties(); properties.setProperty(ConfigurationProperties.SERVER_LIST, "127.0.0.1:11222;127.0.0.1:11322;127.0.0.1:11422"); cacheManager = new RemoteCacheManager(properties); cache = cacheManager.getCache("directory-dist-cache"); ..... cache.put(cinemaKey, cinemalist); In the webinar below, I describe in further detail how to set up a JDG cluster and how to develop and run the JDG application discussed above. For further details on JDG please visit: http://www.redhat.com/products/jbossenterprisemiddleware/data-grid/ Webinar: Introduction to JBoss Data Grid -- Installation, Configuration and Development In this webinar we will look at the basics of setting up JBoss Data Grid covering installation, configuration and development. We will look at practical examples of storing data, viewing the data in the cache and removing it. We will also take a look at the different clustered modes and what effect these have on the storage of your data:

July 25, 2014

by David Winters

· 16,105 Views

Reporting Back from MongoDB World 2014, NYC, Planet JSON

Closely approaching the one year mark of when I first joined MongoLab (and the MongoDB community), I had the pleasure of attending the inaugural MongoDB World conference put together by the incredible MongoDB team. Second only to the excitement around major MongoDB feature announcements was the collective disbelief that this was MongoDB’s first multi-day conference ever. A big congratulations to all those that worked hard to put on such a massive (did you see the Intrepid!?) event. All this planning would have been for naught if MongoDB leaders and engineers failed to deliver announcements and features that would meet and exceed expectations. From major public cloud announcements to the reveal of document-level locking in version 2.8, developers and conference goers had plenty to be excited about. There was a lot to digest from the conference… we’ll cover the major highlights in case you missed them. Big announcements in public cloud Our time at the MongoLab booth yielded many high-quality conversations, predominantly those about offloading previously internal processes and workloads to the public cloud. It was remarkable to see and hear so many enterprise teams with the exact same message: the public cloud is the future, and the future is now. It’s no surprise then that MongoDB, Inc. released not one, but two press releases around MongoDB solutions for the public cloud. Fully-managed MongoDB on the Microsoft Azure Store Nearly one year ago, MongoDB, Inc. chose to partner with the MongoLab team to build a production-ready MongoDB solution for developers on Microsoft Azure. On the first day of World, MongoDB, Inc. announced the product of our collaboration – a fully-managed highly available MongoDB-as-a-Service Add-On offering on the Microsoft Azure Store. This new service runs MongoDB Enterprise and offers replication, monitoring and support from MongoDB, Inc. It’s also backed by MongoDB Management Service (MMS), allowing for point-in-time recovery of MongoDB deployments. Now, teams without the expertise or resources to manage their MongoDB deployment(s) can outsource all the database operations (monitoring and alerting, backups, performance tuning, etc.) to both MongoLab and MongoDB’s expert support teams. You can check out the MongoDB add-on in the Azure Store: https://azure.microsoft.com/en-us/gallery/store/mongodb/mongodb-inc/ MongoDB solutions on Google Cloud Platform MongoDB, Inc. also announced the arrival of new resources to help Google Cloud Platform customers deploy MongoDB on Google Compute Engine. These resources include a “Click to Deploy” feature and a MongoDB on Google Compute Engine Solutions paper covering MongoDB best practices. If you are looking for a fully-managed solution, with automated provisioning, backups, integrated monitoring and alerting, along with expert support, MongoLab recently announced the arrival of production-ready replica sets on Google. Product Roadmap – MongoDB version 2.8 On the second day of MongoDB World, Eliot Horowitz, MongoDB, Inc. CTO & Co-founder, took center stage and announced two huge changes to the MongoDB core project: document-level locking and pluggable storage engines. These features not only reflect improvements to the core project, but also signal to the community that the MongoDB team is listening to its users and is capable of delivering the software needed to power the workloads of tomorrow. Document-level locking The slides above from Eliot’s keynote point to a current obstacle (database-level locking) in MongoDB that limits overall scalability. With database-level locking, any write operation to the database holds the write lock and prevents subsequent writes from executing on the database until the original operation holding the write lock completes. Eliot’s announcement of document-level locking moves the write lock contention from the database level to the document (MongoDB equivalent to SQL “records”) level. This change will allow users to achieve much higher write throughput (we saw a 10x performance improvement in the live demo) across their MongoDB deployments, improving write scalability. If you’d like to try out document-level locking, the MongoDB team has already pushed the feature to the master branch on GitHub. This should only be used for experimentation, not to be run in production. Pluggable storage engine As MongoDB matures, feature releases like document level locking will continue to allow developers to build robust systems on top of MongoDB. But as the number of use cases grows, different tooling tailored to specific use cases may prove to be extremely beneficial. For example, if Company X decides that they want to use MongoDB to warehouse some of their data, they would likely want to optimize their database for slow-moving data and storage efficiency (compression). With the introduction of pluggable storage engines, many new possibilities are open to the community. Teams can now write their own storage engine for a particular use case, configure replica set nodes with different storage engines for specific situations, or collaborate with the open-source community to architect innovative solutions. This feature not only allows for more granular control of the database, but also encourages the MongoDB community to work together. Takeaways: A maturing and thriving ecosystem Roughly a year ago, MongoLab CTO Todd Dampier recapped MongoSF 2013 and spoke to the health of the MongoDB ecosystem. How far we’ve come! After attending the inaugural MongoDB World and chatting with MongoDB Masters, interns, hackathon winners, power users and those new to the community, the enthusiasm is still surging and as positive as ever. This enthusiasm is well placed. Developers and hackers use MongoDB because so much rich data on the web is shared as JSON (think Facebook, Twitter, Google, etc.). As a result, MongoDB is the de-facto database for hackathons and bootstrapped projects. Just learn the API for the site you want to mine, throw the JSON in MongoDB and query your data with the rich query language- it’s that easy. The MongoDB ecosystem is maturing as well. Take a look at the Customer Success Stories and you’ll get a feel for the extent in which enterprises leverage the solution and use it in production. To further drive enterprise adoption, MongoDB, Inc.’s public cloud solutions and product roadmap features aim to help teams run MongoDB in production and give teams the confidence that MongoDB will continue to improve scalability and meet their growing project requirements. Congratulations again to the MongoDB team on their big announcements and for creating such a fantastic forum at which to learn and meet fellow MongoDB users. Our team at MongoLab had a great time making new friends and talking shop; we look forward to meeting more MongoDB users soon (at a MongoDB Days near you)! -Chris@MongoLab

July 2, 2014

by Chris Chang

· 6,502 Views