The Latest DevOps and CI/CD Topics

Spring Integration Tests with MongoDB Rulez
Spring integration tests allow you to test functionality against a running application. This article shows proper database set- and clean-up with MongoDB.
June 10, 2015
by Ralf Stuckert
· 21,463 Views · 2 Likes
Mounting an EBS Volume to Docker on AWS Elastic Beanstalk
Mounting an EBS volume to a Docker instance running on Amazon Elastic Beanstalk (EB) is surprisingly tricky. The good news is that it is possible. I will describe how to automatically create and mount a new EBS volume (optionally based on a snapshot). If you would prefer to mount a specific, existing EBS volume, you should check out leg100’s docker-ebs-attach (which uses the AWS API to mount the volume); you can use it either in a multi-container setup or just include the relevant parts in your own Dockerfile.

The problem with EBS volumes is that, if I am correct, a volume can only be mounted to a single EC2 instance, and thus doesn’t play well with EB’s autoscaling. That is why EB supports only creating and mounting a fresh volume for each instance.

Why would you want to use an auto-created EBS volume? You can already use a docker VOLUME to mount a directory on the host system’s ephemeral storage to make data persistent across docker restarts/redeploys. The only advantage of EBS is that it survives restarts of the EC2 instance, but that is something that, I suppose, happens rarely. I suspect that in most cases EB actually creates a new EC2 instance and then destroys the old one. One possible benefit of an EBS volume is that you can take a snapshot of it and use that to launch future instances. I’m now inclined to believe that a better solution in most cases is to set up automatic backup to and restore from S3, for example using duplicity with its S3 backend (as I do for my NAS).

Anyway, here is how I got EBS volume mounting working.
There are 4 parts to the solution:

  1. Configure EB to create an EBS mount for your instances
  2. Add custom EB commands to format and mount the volume upon first use
  3. Restart the Docker daemon after the volume is mounted so that it will see it (see this discussion)
  4. Configure Docker to mount the (mounted) volume inside the container

Steps 1-3 go into .ebextensions/01-ebs.config:

    # .ebextensions/01-ebs.config
    commands:
      01format-volume:
        command: mkfs -t ext3 /dev/sdh
        test: file -sL /dev/sdh | grep -v 'ext3 filesystem'
        # ^ prints '/dev/sdh: data' if not formatted
      02attach-volume:
        ### Note: The volume may be renamed by the Kernel, e.g. sdh -> xvdh but
        # /dev/ will then contain a symlink from the old to the new name
        command: |
          mkdir /media/ebs_volume
          mount /dev/sdh /media/ebs_volume
          service docker restart # We must restart Docker daemon or it won't see the new mount
        test: sh -c "! grep -qs '/media/ebs_volume' /proc/mounts"
    option_settings:
      # Tell EB to create a 100GB volume and mount it to /dev/sdh
      - namespace: aws:autoscaling:launchconfiguration
        option_name: BlockDeviceMappings
        value: /dev/sdh=:100

Step 4 involves Dockerrun.aws.json and the Dockerfile.

Dockerrun.aws.json: mount the host’s /media/ebs_volume as /var/easydeploy/share inside the container:

    {
      "AWSEBDockerrunVersion": "1",
      "Volumes": [
        {
          "HostDirectory": "/media/ebs_volume",
          "ContainerDirectory": "/var/easydeploy/share"
        }
      ]
    }

Dockerfile: tell Docker to use a directory on the host system as /var/easydeploy/share, either a randomly generated one or the one given via the -v volume option to docker run:

    ...
    VOLUME ["/var/easydeploy/share"]
    ...
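The guard on the 02attach-volume command (run the mount only when /media/ebs_volume is not yet listed in /proc/mounts) can be sketched in Java as follows. This is only an illustration of the check, not part of the author's setup: the class and method names are hypothetical, and unlike the grep in the config, which matches the path anywhere on a line, this version compares the mount-point column exactly.

```java
import java.util.Arrays;

public class MountCheck {

    /**
     * Returns true if mountPoint appears as the mount-point field (the second
     * whitespace-separated column) of any line in /proc/mounts-style text.
     */
    public static boolean isMounted(String procMounts, String mountPoint) {
        return Arrays.stream(procMounts.split("\n"))
                .map(line -> line.trim().split("\\s+"))
                .anyMatch(fields -> fields.length > 1 && fields[1].equals(mountPoint));
    }

    public static void main(String[] args) {
        // Hypothetical sample content; on a real host you would read /proc/mounts.
        String sample = "/dev/xvda1 / ext4 rw,noatime 0 0\n"
                + "/dev/xvdh /media/ebs_volume ext3 rw,relatime 0 0\n";
        System.out.println(isMounted(sample, "/media/ebs_volume")); // true
        System.out.println(isMounted(sample, "/media/other"));      // false
    }
}
```

In the EB config the test succeeds, and the command runs, only when the check is negative, which keeps the mount step from failing on redeploys to an instance where the volume is already attached.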
June 3, 2015
by Jakub Holý
· 14,766 Views
Ecosystem of Hadoop Animal Zoo
Hadoop is best known for MapReduce and its distributed file system (HDFS). More recently, other productivity tools have been developed on top of these, and together they form a complete Hadoop ecosystem. Most of the projects are hosted under the Apache Software Foundation. The Hadoop ecosystem projects are listed below.

Hadoop Common
A set of components and interfaces for distributed file systems and I/O (serialization, Java RPC, persistent data structures). http://hadoop.apache.org/

HDFS
A distributed file system that runs on large clusters of commodity hardware. The Hadoop Distributed File System (HDFS) was renamed from NDFS (the Nutch Distributed File System). It is a scalable data store for structured, semi-structured, and unstructured data. http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/hdfsuserguide.html http://wiki.apache.org/hadoop/hdfs

MapReduce
MapReduce is the distributed, parallel programming model for Hadoop, inspired by Google's MapReduce research paper. Hadoop includes an implementation of the MapReduce programming model. In MapReduce there are two phases, not surprisingly map and reduce. To be precise, between the map and reduce phases there is another phase called sort and shuffle. The JobTracker on the NameNode machine manages the other cluster nodes. MapReduce programs can be written in Java. If you like SQL or other non-Java languages, you are still in luck: you can use a utility called Hadoop Streaming. http://wiki.apache.org/hadoop/hadoopmapreduce

Hadoop Streaming
A utility that enables MapReduce code in many languages such as C, Perl, Python, C++, and Bash. Examples include a Python mapper and an AWK reducer. http://hadoop.apache.org/docs/r1.2.1/streaming.html

Avro
A serialization system for efficient, cross-language RPC and persistent data storage. Avro is a framework for performing remote procedure calls and data serialization. In the context of Hadoop, it can be used to pass data from one program or language to another, e.g. from C to Pig. It is particularly suited for use with scripting languages such as Pig, because data is always stored with its schema in Avro. http://avro.apache.org/

Apache Thrift
Apache Thrift allows you to define data types and service interfaces in a simple definition file. Taking that file as input, the compiler generates code to be used to easily build RPC clients and servers that communicate seamlessly across programming languages. Instead of writing a load of boilerplate code to serialize and transport your objects and invoke remote methods, you can get right down to business. http://thrift.apache.org/

Hive and Hue
If you like SQL, you will be delighted to hear that you can write SQL and Hive converts it to a MapReduce job. But you don't get a full ANSI SQL environment. Hue gives you a browser-based graphical interface to do your Hive work. Hue features a file browser for HDFS, a job browser for MapReduce/YARN, an HBase browser, and query editors for Hive, Pig, Cloudera Impala, and Sqoop2. It also ships with an Oozie application for creating and monitoring workflows, a ZooKeeper browser, and an SDK.

Pig
A high-level data flow programming language and execution environment for MapReduce coding. The Pig language is called Pig Latin. You may find the naming conventions somewhat unconventional, but you get incredible price-performance and high availability. https://pig.apache.org/

Jaql
Jaql is a functional, declarative programming language designed especially for working with large volumes of structured, semi-structured, and unstructured data. As its name implies, a primary use of Jaql is to handle data stored as JSON documents, but Jaql can work on various types of data. For example, it can support XML, comma-separated values (CSV) data, and flat files. A "SQL within Jaql" capability lets programmers work with structured SQL data while employing a JSON data model that's less restrictive than its Structured Query Language counterparts. See also: 1. Jaql in Google Code 2. What is Jaql? by IBM

Sqoop
Sqoop provides bi-directional data transfer between Hadoop (HDFS) and your favorite relational database. For example, you might be storing your app data in a relational store such as Oracle and now want to scale your application with Hadoop; you can migrate the Oracle database data to Hadoop HDFS using Sqoop. http://sqoop.apache.org/

Oozie
Manages Hadoop workflows. It doesn't replace your scheduler or BPM tooling, but it provides if-then-else branching and control for Hadoop jobs. https://oozie.apache.org/

ZooKeeper
A distributed, highly available coordination service. ZooKeeper provides primitives such as distributed locks that can be used for building highly scalable applications. It is used to manage synchronization for the cluster. http://zookeeper.apache.org/

HBase
Based on Google's Bigtable, HBase "is an open-source, distributed, versioned, column-oriented store" that sits on top of HDFS. It is a super-scalable key-value store that works very much like a persistent hash map (Python developers can think of a dictionary). It is not a conventional relational database; it is a distributed, column-oriented database. HBase uses HDFS for its underlying storage. It supports both batch-style computations using MapReduce and point queries for random reads. https://hbase.apache.org/

Cassandra
A column-oriented NoSQL data store that offers scalability and high availability without compromising on performance. It is a perfect platform for commodity hardware and cloud infrastructure. Cassandra's data model offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and powerful built-in caching. http://cassandra.apache.org/

Flume
A real-time loader for streaming your data into Hadoop. It stores data in HDFS and HBase. Flume "channels" data between "sources" and "sinks", and its data harvesting can be either scheduled or event-driven. Possible sources for Flume include Avro, files, and system logs, and possible sinks include HDFS and HBase. http://flume.apache.org/

Mahout
Machine learning for Hadoop, used for predictive analytics and other advanced analysis. There are currently four main groups of algorithms in Mahout:

  • recommendations, a.k.a. collaborative filtering
  • classification, a.k.a. categorization
  • clustering
  • frequent item set mining, a.k.a. parallel frequent pattern mining

Mahout is not simply a collection of pre-existing algorithms. Many machine learning algorithms are intrinsically non-scalable; that is, given the types of operations they perform, they cannot be executed as a set of parallel processes. Algorithms in the Mahout library belong to the subset that can be executed in a distributed fashion. http://en.wikipedia.org/wiki/list_of_machine_learning_algorithms https://www.coursera.org/course/machlearning https://mahout.apache.org/

FUSE
Makes HDFS look like a regular file system so that you can use ls, rm, cd, etc. directly on HDFS data.

Whirr
Apache Whirr is a set of libraries for running cloud services. Whirr provides:

  • A cloud-neutral way to run services. You don't have to worry about the idiosyncrasies of each provider.
  • A common service API. The details of provisioning are particular to the service.
  • Smart defaults for services. You can get a properly configured system running quickly, while still being able to override settings as needed.

You can also use Whirr as a command line tool for deploying clusters. https://whirr.apache.org/

Giraph
An open source graph processing API like Pregel from Google. https://giraph.apache.org/

Chukwa
Chukwa, an incubator project on Apache, is a data collection and analysis system built on top of HDFS and MapReduce. Tailored for collecting logs and other data from distributed monitoring systems, Chukwa provides a workflow that allows for incremental data collection, processing, and storage in Hadoop. It is included in the Apache Hadoop distribution as an independent module. https://chukwa.apache.org/

Drill
Apache Drill, an incubator project on Apache, is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Drill is the open source version of Google's Dremel system, which is available as an IaaS service called Google BigQuery. One explicitly stated design goal is that Drill should be able to scale to 10,000 servers or more and process petabytes of data and trillions of records in seconds. http://incubator.apache.org/drill/

Impala (Cloudera)
Released by Cloudera, Impala is an open-source project which, like Apache Drill, was inspired by Google's paper on Dremel; the purpose of both is to facilitate real-time querying of data in HDFS or HBase. Impala uses an SQL-like language that, though similar to HiveQL, is currently more limited than HiveQL. Because Impala relies on the Hive metastore, Hive must be installed on a cluster in order for Impala to work. The secret behind Impala's speed is that it "circumvents map reduce to directly access the data through a specialized distributed query engine that is very similar to those found in commercial parallel rdbmss." (source: Cloudera) http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/impala.html http://training.cloudera.com/elearning/impala/
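The map, sort-and-shuffle, and reduce phases described in the MapReduce entry above can be illustrated with a word count that runs in a single JVM. This is a sketch of the programming model only; it does not use the Hadoop API, and the class and method names are made up for the example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapReduceSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Sort and shuffle phase: group all values by key; the TreeMap keeps keys sorted.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    public static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three phases in order over all input lines.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        shuffle(pairs).forEach((word, ones) -> counts.put(word, reduce(ones)));
        return counts;
    }

    public static void main(String[] args) {
        // prints {hbase=1, hive=2, pig=3}
        System.out.println(wordCount(Arrays.asList("pig hive pig", "hive hbase pig")));
    }
}
```

In a real cluster the map and reduce calls run on different nodes and the shuffle moves data across the network; here everything happens in memory, which is why the grouping step is a plain sorted map.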
June 3, 2015
by Umashankar Ankuri
· 23,879 Views · 3 Likes
DevOps is Killing Maintenance. Let's Celebrate.
There's a misconception that DevOps is killing developers. It's not; it is killing the idea of server and IT operations maintenance.
May 23, 2015
by Jim Bird
· 10,920 Views
Efficient Cassandra Write Pattern for Micro-Batching
The best way to write to a Cassandra cluster is to use concurrent asynchronous writes. In cases where data exhibits strong temporal locality, speed can be improved.
May 20, 2015
by John Georgiadis
· 35,023 Views · 1 Like
How To Set Up a Tomcat, Apache and mod_jk Cluster
In this article I will go through a common set-up for a small production environment: a single-tier, load-balanced application server cluster.

Overview

A high-level overview of what we will be doing:

  • Downloading and installing Apache HTTP server and mod_jk
  • Downloading Tomcat
  • Downloading Java
  • Configuring two local Tomcat servers
  • Clustering the two Tomcat servers
  • Configuring Apache to use mod_jk to forward requests to Tomcat
  • Deploying an application to the Tomcat servers that tests our set-up

Introduction

What is Apache? Apache is an HTTP server. What is mod_jk? It is an Apache module that allows AJP communication between Apache and a back-end application server like Tomcat. I am running this on Ubuntu 14.04 LTS installed on a dual-boot PC with Windows 7.

Download Apache2

We are going to use Ubuntu's APT package maintenance system to obtain and install Apache2.

    sudo apt-get install apache2

This will install in /etc/apache2.

Download and install mod_jk

The mod_jk module is not included in the Apache2 download, so it must be obtained and installed separately. The installation requires that the mod_jk module is visible to Apache and configured so that Apache knows where to look for it and what to do with the requests you want to proxy.

    sudo apt-get install libapache2-mod-jk

This will install in /etc/libapache2-mod-jk. Two files are also added to the /etc/apache2/mods-available folder.

Downloading and installing Tomcat 8

At the time of writing, Tomcat 8 does not have a package in APT, so you must download the binaries from the Tomcat website: http://tomcat.apache.org/download-80.cgi. Select the appropriate binary distribution and extract it as follows.

    tar xvzf apache-tomcat-8.0.5.tar.gz

We need two copies of the Tomcat server to be load balanced. I created two directories in the /opt/ location, /opt/tomcat-server1/ and /opt/tomcat-server2/, and copied Tomcat into each one.
Download and install Java

Download Java from APT as follows:

    apt-get install openjdk-7-jdk

and set JAVA_HOME in .bashrc:

    vim ~/.bashrc
    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

Configure two local Tomcat servers

We will edit only the server.xml of the server2 installation of Tomcat. We need to change port numbers to avoid conflicts with server1; for example, the AJP connector port becomes 9009 for server2, the value used in workers.properties below. We also comment out the HTTP Connector, as we only want the web application to be accessible through the load balancer. Here is my server2 Tomcat server.xml configuration.

Configure mod_jk

Load balancing is configured in the workers.properties file, located in /etc/libapache2-mod-jk/, where workers represent actual or virtual workers. We will define two actual workers, which map to the Tomcat servers, and two virtual workers.

In the worker.list property I have defined two virtual workers, status and loadbalancer. I will refer to these later in the Apache configuration.

Workers for each server have been defined using values from the server.xml configuration files. I used the port values of the AJP connectors, and I have included an lbfactor that sets the preference the load balancer will show for that server.

Finally we define the virtual workers. The loadbalancer worker is set to type lb, and the workers that represent the Tomcat servers are listed in its balance_workers property. The status worker only needs to be set to type status.

    worker.list=loadbalancer,status

    worker.server1.port=8009
    worker.server1.host=localhost
    worker.server1.type=ajp13

    worker.server2.port=9009
    worker.server2.host=localhost
    worker.server2.type=ajp13

    worker.server1.lbfactor=1
    worker.server2.lbfactor=1

    worker.loadbalancer.type=lb
    worker.loadbalancer.balance_workers=server1,server2

    worker.status.type=status

Ensure that you remove any other worker configurations that are not being used.
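As a quick sanity check of the workers.properties layout above, the following sketch reads the file content with java.util.Properties and lists the host and port of every member of the load balancer worker. It is only an illustration: the class and method names are hypothetical, and it does not apply the defaults mod_jk uses when a property is missing.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class WorkersConfig {

    // Returns worker name -> "host:port" for every member of the given lb worker.
    public static Map<String, String> balancedWorkers(String propertiesText, String lbName) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> result = new LinkedHashMap<>();
        String members = props.getProperty("worker." + lbName + ".balance_workers", "");
        for (String name : members.split(",")) {
            name = name.trim();
            if (name.isEmpty()) {
                continue; // no members configured
            }
            String host = props.getProperty("worker." + name + ".host");
            String port = props.getProperty("worker." + name + ".port");
            result.put(name, host + ":" + port);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "worker.list=loadbalancer,status",
                "worker.server1.port=8009",
                "worker.server1.host=localhost",
                "worker.server1.type=ajp13",
                "worker.server2.port=9009",
                "worker.server2.host=localhost",
                "worker.server2.type=ajp13",
                "worker.loadbalancer.type=lb",
                "worker.loadbalancer.balance_workers=server1,server2",
                "worker.status.type=status");
        // prints {server1=localhost:8009, server2=localhost:9009}
        System.out.println(balancedWorkers(text, "loadbalancer"));
    }
}
```

If a line break goes missing between two properties, as easily happens when copying the file, the affected workers show up here with a wrong or null host or port, which is easier to spot than a silent misrouting in Apache.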
Configure Apache Web Server to forward requests

You will need to add the following to the Apache configuration located in /etc/apache2/sites-enabled/000-default.conf:

    JkMount /status status
    JkMount /* loadbalancer

Verify the installation

To test that everything has been configured correctly we need to deploy an application. A sample application that has been used for years to test such configurations is the ClusterJSP sample application. You can find it by searching for it online or on the JBoss site.

Now deploy the war to the webapps folder on both servers and start each server using its start-up script, for example /opt/tomcat-server1/bin/startup.sh.

Go to http://localhost/clusterjsp/HaJsp.jsp and you should see the page show HttpSession information. Now let's look at the mod_jk status page: http://localhost/status. This page shows information about the load balancer worker and the workers it is balancing. If everything is working, the worker error state will show OK, or OK/IDLE if the workers are not currently balancing load.

Things to try out

Enable sticky sessions: configure jvmRoute in the server.xml configuration.

Further reading

  • Loadbalancing with mod_jk and Apache
  • Working with mod_jk
  • Connecting Apache's Web Server to Multiple Instances of Tomcat
May 19, 2015
by Alex Theedom
· 10,795 Views · 1 Like
The Origins of Trunk Based Development
Learn more about trunk-based development and its influence on source control management systems, DevOps, and software development as a whole.
May 16, 2015
by Paul Hammant
· 6,550 Views
Docker Machine on Windows - How To Setup You Hosts
I've been playing around with Docker a lot lately. There are many reasons for that; one for sure is that I love to play around with the latest technology and even help out to build a demo or two or a lab. The main difference between me and most of my coworkers is that I run my setup on Windows, like most of the middleware developers out there. So, if you followed Arun's blog about "Docker Machine to Setup Docker Host", you might have tried to make this work on Windows already. Here is the ultimate short how-to guide on using Docker Machine to administrate and spin up your Docker hosts.

Docker Machine

Machine lets you create Docker hosts on your computer, on cloud providers, and inside your own data center. It creates servers, installs Docker on them, then configures the Docker client to talk to them. You basically don't have to have anything installed on your machine prior to this, which is a hell of a lot easier than having to manually install boot2docker first. So, let's try this out.

You want to have at least one thing in place before starting with anything Docker or Machine: go and get Git for Windows (aka msysgit). It has all kinds of helpful Unix tools in its belly, which you need anyway.

Prerequisites - The One For All Solution

The first option is to install the Windows boot2docker distribution, which I showed in an earlier blog. It contains the following bits configured and ready for you to use:

  • VirtualBox
  • Docker Windows Client

Prerequisites - The Bits And Pieces

I dislike the boot2docker installer for a variety of reasons, mostly because I want to know what exactly is going on on my machine. So I played around a bit, and here is the bits and pieces installation if you decide against the one-for-all solution. Start with the virtualization solution. We need something like that on Windows, because it just can't run Linux, and this is what Docker is based on. At least for now.
So, get VirtualBox and ensure that version 4.3.18 is correctly installed on your system (VirtualBox-4.3.18-96516-Win.exe, 105 MB). WARNING: There is a strange issue when you run Windows itself in VirtualBox. You might run into an issue with starting the host.

While you're at it, go and get the Docker Windows Client. One way is to grab the final build from the test servers as a direct download (docker-1.6.0.exe, x86_64, 7.5 MB). Rename it to "docker" and put it into a folder of your choice (I assume it will be c:\docker\).

Now you also need to download Docker Machine, which is another single executable (docker-machine_windows-amd64.exe, 11.5 MB). Rename it to "docker-machine" and put it into the same folder. Now add this folder to your PATH:

    set PATH=%PATH%;C:\docker

If you change your standard PATH environment variable instead, this might save you a lot of typing. That's it. Now you're ready to create your first Machine-managed Docker host.

Create Your Docker Host With Machine

All you need is a simple command:

    docker-machine create --driver virtualbox dev

And the output should state:

    INFO[0000] Creating SSH key...
    INFO[0001] Creating VirtualBox VM...
    INFO[0016] Starting VirtualBox VM...
    INFO[0022] Waiting for VM to start...
    INFO[0076] "dev" has been created and is now the active machine.
    INFO[0076] To point your Docker client at it, run this in your shell: eval "$(docker-machine.exe env dev)"

This means you just created a Docker host using the VirtualBox provider and the name “dev”. Now you need to find out on which IP address the host is running.
    docker-machine ip
    192.168.99.102

If you want to configure the environment variables needed by the client more easily, just use the following command:

    docker-machine env dev
    export DOCKER_TLS_VERIFY=1
    export DOCKER_CERT_PATH="C:\\Users\\markus\\.docker\\machine\\machines\\dev"
    export DOCKER_HOST=tcp://192.168.99.102:2376

This outputs the Linux version of the environment variable definitions. All you have to do is change the "export" keyword to "set", remove the double quotes, replace the double backslashes with single ones, and you are ready to go.

    C:\Users\markus\Downloads>set DOCKER_TLS_VERIFY=1
    C:\Users\markus\Downloads>set DOCKER_CERT_PATH=C:\Users\markus\.docker\machine\machines\dev
    C:\Users\markus\Downloads>set DOCKER_HOST=tcp://192.168.99.102:2376

Time to test our Docker Client

And here we go. Now run WildFly on your freshly created host:

    docker run -it -p 8080:8080 jboss/wildfly

Watch the container being downloaded and check that it is running by pointing your browser to http://192.168.99.102:8080/. Congratulations on having set up your very first Docker host with Machine on Windows.
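The manual rewrite of the "docker-machine env" output into Windows "set" commands is mechanical enough to script. The following is a minimal sketch of the three steps described above (swap the keyword, drop the quotes, collapse the doubled backslashes); the class and method names are hypothetical and it only handles the simple line shapes shown in this article.

```java
public class ExportToSet {

    // Converts one "export NAME=value" line into a cmd.exe "set NAME=value" line.
    public static String toWindowsSet(String exportLine) {
        String line = exportLine.trim();
        if (!line.startsWith("export ")) {
            return line; // leave anything else untouched
        }
        return "set " + line.substring("export ".length())
                .replace("\"", "")       // remove the double quotes
                .replace("\\\\", "\\");  // turn each doubled backslash into a single one
    }

    public static void main(String[] args) {
        String[] lines = {
            "export DOCKER_TLS_VERIFY=1",
            "export DOCKER_CERT_PATH=\"C:\\\\Users\\\\markus\\\\.docker\\\\machine\\\\machines\\\\dev\"",
            "export DOCKER_HOST=tcp://192.168.99.102:2376"
        };
        for (String line : lines) {
            System.out.println(toWindowsSet(line));
        }
    }
}
```

Running it prints the same three "set" lines as typed by hand above, which can then be pasted into the command prompt.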
May 12, 2015
by Markus Eisele
· 20,149 Views
8 Questions You Need to Ask About Microservices, Containers & Docker in 2015
In containers and microservices, we’re facing the greatest potential change in how we deliver and run software services since the arrival of virtual machines.
May 9, 2015
by Andrew Phillips
· 15,008 Views · 1 Like
Binding to Data Services with Spring Boot in Cloud Foundry
Written by Dave Syer on the Spring blog In this article we look at how to bind a Spring Boot application to data services (JDBC, NoSQL, messaging etc.) and the various sources of default and automatic behaviour in Cloud Foundry, providing some guidance about which ones to use and which ones will be active under what conditions. Spring Boot provides a lot of autoconfiguration and external binding features, some of which are relevant to Cloud Foundry, and many of which are not. Spring Cloud Connectors is a library that you can use in your application if you want to create your own components programmatically, but it doesn’t do anything “magical” by itself. And finally there is the Cloud Foundry java buildpack which has an “auto-reconfiguration” feature that tries to ease the burden of moving simple applications to the cloud. The key to correctly configuring middleware services, like JDBC or AMQP or Mongo, is to understand what each of these tools provides, how they influence each other at runtime, and and to switch parts of them on and off. The goal should be a smooth transition from local execution of an application on a developer’s desktop to a test environment in Cloud Foundry, and ultimately to production in Cloud Foundry (or otherwise) with no changes in source code or packaging, per the twelve-factor application guidelines. There is some simple source code accompanying this article. To use it you can clone the repository and import it into your favourite IDE. You will need to remove two dependencies from the complete project to get to the same point where we start discussing concrete code samples, namely spring-boot-starter-cloud-connectors and auto-reconfiguration. NOTE: The current co-ordinates for all the libraries being discussed are org.springframework.boot:spring-boot-*:1.2.3.RELEASE,org.springframework.boot:spring-cloud-*-connector:1.1.1.RELEASE,org.cloudfoundry:auto-reconfiguration:1.7.0.RELEASE. 
TIP: The source code in github includes a docker-compose.yml file (docs here). You can use that to create a local MySQL database if you don’t have one running already. You don’t actually need it to run most of the code below, but it might be useful to validate that it will actually work. Punchline for the Impatient If you want to skip the details, and all you need is a recipe for running locally with H2 and in the cloud with MySQL, then start here and read the rest later when you want to understand in more depth. (Similar options exist for other data services, like RabbitMQ, Redis, Mongo etc.) Your first and simplest option is to simply do nothing: do not define a DataSource at all but put H2 on the classpath. Spring Boot will create the H2 embedded DataSource for you when you run locally. The Cloud Foundry buildpack will detect a database service binding and create a DataSource for you when you run in the cloud. If you add Spring Cloud Connectors as well, your app will also work in other cloud platforms, as long as you include a connector. That might be good enough if you just want to get something working. If you want to run a serious application in production you might want to tweak some of the connection pool settings (e.g. the size of the pool, various timeouts, the important test on borrow flag). In that case the buildpack auto-reconfiguration DataSource will not meet your requirements and you need to choose an alternative, and there are a number of more or less sensible choices. The best choice is probably to create a DataSource explicitly using Spring Cloud Connectors, but guarded by the “cloud” profile: @Configuration @Profile("cloud") public class DataSourceConfiguration { @Bean public Cloud cloud() { return new CloudFactory().getCloud(); } @Bean @ConfigurationProperties(DataSourceProperties.PREFIX) public DataSource dataSource() { return cloud().getSingletonServiceConnector(DataSourceclass, null); } } You can use spring.datasource.* properties (e.g. 
in application.properties or a profile-specific version of that) to set the additional properties at runtime. The “cloud” profile is automatically activated for you by the buildpack. Now for the details. We need to build up a picture of what’s going on in your application at runtime, so we can learn from that how to make a sensible choice for configuring data services. Layers of Autoconfiguration Let’s take a a simple app with DataSource (similar considerations apply to RabbitMQ, Mongo, Redis): @SpringBootApplication public class CloudApplication { @Autowired private DataSource dataSource; public static void main(String[] args) { SpringApplication.run(CloudApplication.class, args); } } This is a complete application: the DataSource can be @Autowired because it is created for us by Spring Boot. The details of the DataSource (concrete class, JDBC driver, connection URL, etc.) depend on what is on the classpath. Let’s assume that the application uses Spring JDBC via the spring-boot-starter-jdbc (or spring-boot-starter-data-jpa), so it has aDataSource implementation available from Tomcat (even if it isn’t a web application), and this is what Spring Boot uses. Consider what happens when: Classpath contains H2 (only) in addition to the starters: the DataSource is the Tomcat high-performance pool from DataSourceAutoConfiguration and it connects to an in memory database “testdb”. Classpath contains H2 and MySQL: DataSource is still H2 (same as before) because we didn’t provide any additional configuration for MySQL and Spring Boot can’t guess the credentials for connecting. Add spring-boot-starter-cloud-connectors to the classpath: no change inDataSource because the Spring Cloud Connectors do not detect that they are running in a Cloud platform. The providers that come with the starter all look for specific environment variables, which they won’t find unless you set them, or run the app in Cloud Foundry, Heroku, etc. 
Run the application in “cloud” profile with spring.profiles.active=cloud: no change yet in the DataSource, but this is one of the things that the Java buildpack does when your application runs in Cloud Foundry. Run in “cloud” profile and provide some environment variables to simulate running in Cloud Foundry and binding to a MySQL service: VCAP_APPLICATION={"name":"application","instance_id":"FOO"} VCAP_SERVICES={"mysql":[{"name":"mysql","tags":["mysql"],"credentials":{"uri":"mysql://root:root@localhost/test"}]} (the “tags” provides a hint that we want to create a MySQL DataSource, the “uri” provides the location, and the “name” becomes a bean ID). The DataSource is now using MySQL with the credentials supplied by the VCAP_* environment variables. Spring Boot has some autoconfiguration for the Connectors, so if you looked at the beans in your application you would see a CloudFactory bean, and also the DataSource bean (with ID “mysql”). Theautoconfiguration is equivalent to adding @ServiceScan to your application configuration. It is only active if your application runs in the “cloud” profile, and only if there is no existing @Bean of type Cloud, and the configuration flagspring.cloud.enabled is not “false”. Add the “auto-reconfiguration” JAR from the Java buildpack (Maven co-ordinatesorg.cloudfoundry:auto-reconfiguration:1.7.0.RELEASE). You can add it as a local dependency to simulate running an application in Cloud Foundry, but it wouldn’t be normal to do this with a real application (this is just for experimenting with autoconfiguration). The auto-reconfiguration JAR now has everything it needs to create a DataSource, but it doesn’t (yet) because it detects that you already have a bean of type CloudFactory, one that was added by Spring Boot. Remove the explicit “cloud” profile. The profile will still be active when your app starts because the auto-reconfiguration JAR adds it back again. 
There is still no change to the DataSource because Spring Boot has created it for you via the @ServiceScan.

  • Remove the spring-boot-starter-cloud-connectors dependency, so that Spring Boot backs off creating a CloudFactory. The auto-reconfiguration JAR actually has its own copy of Spring Cloud Connectors (all the classes with different package names) and it now uses them to create a DataSource (in a BeanFactoryPostProcessor). The Spring Boot autoconfigured DataSource is replaced with one that binds to MySQL via the VCAP_SERVICES. There is no control over pool properties, but it does still use the Tomcat pool if available (no support for Hikari or DBCP2).
  • Remove the auto-reconfiguration JAR and the DataSource reverts to H2.

TIP: use web and actuator starters with endpoints.health.sensitive=false to inspect the DataSource quickly through “/health”. You can also use the “/beans”, “/env” and “/autoconfig” endpoints to see what is going on in the autoconfigurations and why.

NOTE: Running in Cloud Foundry and including the auto-reconfiguration JAR on the classpath locally both activate the “cloud” profile (for the same reason). The VCAP_* env vars are the thing that makes Spring Cloud and/or the auto-reconfiguration JAR create beans.

NOTE: The URL in the VCAP_SERVICES does not actually use the “jdbc” scheme, which is mandatory for JDBC connections. This is, however, the format that Cloud Foundry normally presents it in because it works for nearly every language other than Java. Spring Cloud Connectors or the buildpack auto-reconfiguration, if they are creating a DataSource, will translate it into a jdbc:* URL for you.

NOTE: The MySQL URL also contains user credentials and a database name which are valid for the Docker container created by the docker-compose.yml in the sample source code. If you have a local MySQL server with different credentials you could substitute those.
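To make the URL translation mentioned in the notes above more concrete, here is a minimal sketch in plain Java of how a Cloud Foundry style service URI could be split into a jdbc:* URL plus credentials. This is not the Spring Cloud Connectors or buildpack code; the class ServiceUriTranslator and its nested JdbcSettings type are hypothetical names used only for illustration, and the sketch assumes the JDBC sub-protocol has the same name as the URI scheme (true for "mysql", not for every database).

```java
import java.net.URI;

// Hypothetical helper for illustration only; the real translation is done
// inside Spring Cloud Connectors / the buildpack auto-reconfiguration.
final class ServiceUriTranslator {

    /** The three values a JDBC DataSource typically needs. */
    static final class JdbcSettings {
        final String url;
        final String username;
        final String password;

        JdbcSettings(String url, String username, String password) {
            this.url = url;
            this.username = username;
            this.password = password;
        }
    }

    /** Splits e.g. mysql://user:pass@host/db into a jdbc:* URL and credentials. */
    static JdbcSettings toJdbc(String serviceUri) {
        URI uri = URI.create(serviceUri);

        // Credentials travel in the "user:password" part of the URI, if present.
        String username = null;
        String password = null;
        String userInfo = uri.getUserInfo();
        if (userInfo != null) {
            String[] parts = userInfo.split(":", 2);
            username = parts[0];
            password = parts.length > 1 ? parts[1] : null;
        }

        // Assumption: the JDBC sub-protocol equals the URI scheme.
        StringBuilder url = new StringBuilder("jdbc:")
                .append(uri.getScheme())
                .append("://")
                .append(uri.getHost());
        if (uri.getPort() != -1) {
            url.append(':').append(uri.getPort());
        }
        url.append(uri.getPath());

        return new JdbcSettings(url.toString(), username, password);
    }
}
```

Calling ServiceUriTranslator.toJdbc with the URI from the VCAP_SERVICES example gives a URL, a username and a password that could be handed to any connection pool. A real implementation would also need to handle schemes that differ from the JDBC sub-protocol and query parameters.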
TIP: If you use a local MySQL server and want to verify that it is connected, you can use the “/health” endpoint from the Spring Boot Actuator (included in the sample code already). Or you could create a schema-mysql.sql file in the root of the classpath and put a simple keep-alive query in it (e.g. SELECT 1). Spring Boot will run that on startup, so if the app starts successfully you have configured the database correctly.

The auto-reconfiguration JAR is always on the classpath in Cloud Foundry (by default) but it backs off creating any DataSource if it finds an org.springframework.cloud.CloudFactory bean (which is provided by Spring Boot if the CloudAutoConfiguration is active). Thus the net effect of adding it to the classpath, if the Connectors are also present in a Spring Boot application, is only to enable the “cloud” profile. You can see it making the decision to skip auto-reconfiguration in the application logs on startup:

    2015-04-14 15:11:11.765 INFO 12727 --- [ main] urceCloudServiceBeanFactoryPostProcessor : Skipping auto-reconfiguring beans of type javax.sql.DataSource
    2015-04-14 15:11:57.650 INFO 12727 --- [ main] ongoCloudServiceBeanFactoryPostProcessor : Skipping auto-reconfiguring beans of type org.springframework.data.mongodb.MongoDbFactory
    2015-04-14 15:11:57.650 INFO 12727 --- [ main] bbitCloudServiceBeanFactoryPostProcessor : Skipping auto-reconfiguring beans of type org.springframework.amqp.rabbit.connection.ConnectionFactory
    2015-04-14 15:11:57.651 INFO 12727 --- [ main] edisCloudServiceBeanFactoryPostProcessor : Skipping auto-reconfiguring beans of type org.springframework.data.redis.connection.RedisConnectionFactory

... etc.

Create your own DataSource

The last section walked through most of the important autoconfiguration features in the various libraries. If you want to take control yourself, one thing you could start with is to create your own instance of DataSource.
You could do that, for instance, using a DataSourceBuilder, which is a convenience class and comes as part of Spring Boot (it chooses an implementation based on the classpath):

    @SpringBootApplication
    public class CloudApplication {

        @Bean
        public DataSource dataSource() {
            return DataSourceBuilder.create().build();
        }

        ...
    }

The DataSource as we’ve defined it is useless because it doesn’t have a connection URL or any credentials, but that can easily be fixed. Let’s run this application as if it was in Cloud Foundry: with the VCAP_* environment variables and the auto-reconfiguration JAR but not Spring Cloud Connectors on the classpath and no explicit “cloud” profile. The buildpack activates the “cloud” profile, creates a DataSource and binds it to the VCAP_SERVICES. As already described briefly, it removes your DataSource completely and replaces it with a manually registered singleton (which doesn’t show up in the “/beans” endpoint in Spring Boot).

Now add Spring Cloud Connectors back onto the classpath of the application and see what happens when you run it again. It actually fails on startup! What has happened? The @ServiceScan (from Connectors) goes and looks for bound services, and creates bean definitions for them. That’s a bit like the buildpack, but different because it doesn’t attempt to replace any existing bean definitions of the same type. So you get an autowiring error because there are 2 DataSources and no way to choose one to inject into your application in the various places where one is needed.

To fix that we are going to have to take control of the Cloud Connectors (or simply not use them).
Using a CloudFactory to create a DataSource

You can disable the Spring Boot autoconfiguration and the Java buildpack auto-reconfiguration by creating your own Cloud instance as a @Bean:

    @Bean
    public Cloud cloud() {
        return new CloudFactory().getCloud();
    }

    @Bean
    @ConfigurationProperties(DataSourceProperties.PREFIX)
    public DataSource dataSource() {
        return cloud().getSingletonServiceConnector(DataSource.class, null);
    }

Pros: The Connectors autoconfiguration in Spring Boot backed off so there is only one DataSource. It can be tweaked using application.properties via spring.datasource.* properties, per the Spring Boot User Guide.

Cons: It doesn’t work without VCAP_* environment variables (or some other cloud platform). It also relies on the user remembering to create the Cloud as a @Bean in order to disable the autoconfiguration.

Summary: we are still not in a comfortable place (an app that doesn’t run without some intricate wrangling of environment variables is not much use in practice).

Dual Running: Local with H2, in the Cloud with MySQL

There is a local configuration file option in Spring Cloud Connectors, so you don’t have to be in a real cloud platform to use them, but it’s awkward to set up despite being boilerplate, and you also have to somehow switch it off when you are in a real cloud platform. The last point there is really the important one because you end up needing a local file to run locally, but only when running locally, and it can’t be packaged with the rest of the application code (that would, for instance, violate the twelve-factor guidelines).

So to move forward with our explicit @Bean definition it’s probably better to stick to mainstream Spring and Spring Boot features, e.g.
using the “cloud” profile to guard the explicit creation of a DataSource:

    @Configuration
    @Profile("cloud")
    public class DataSourceConfiguration {

        @Bean
        public Cloud cloud() {
            return new CloudFactory().getCloud();
        }

        @Bean
        @ConfigurationProperties(DataSourceProperties.PREFIX)
        public DataSource dataSource() {
            return cloud().getSingletonServiceConnector(DataSource.class, null);
        }
    }

With this in place we have a solution that works smoothly both locally and in Cloud Foundry. Locally Spring Boot will create a DataSource with an H2 embedded database. In Cloud Foundry it will bind to a singleton service of type DataSource and switch off the autoconfigured one from Spring Boot. It also has the benefit of working with any platform supported by Spring Cloud Connectors, so the same code will run on Heroku and Cloud Foundry, for instance. Because of the @ConfigurationProperties you can bind additional configuration to the DataSource to tweak connection pool properties and things like that if you need to in production.

NOTE: We have been using MySQL as an example database server, but actually PostgreSQL is at least as compelling a choice if not more. When paired with H2 locally, for instance, you can put H2 into its “Postgres compatibility” mode and use the same SQL in both environments.

Manually Creating a Local and a Cloud DataSource

If you like creating DataSource beans, and you want to do it both locally and in the cloud, you could use 2 profiles (“cloud” and “local”), for example. But then you would have to find a way to activate the “local” profile by default when not in the cloud. There is already a way to do that built into Spring because there is always a default profile called “default” (by default).
So this should work:

    @Configuration
    @Profile("default") // or "!cloud"
    public class LocalDataSourceConfiguration {

        @Bean
        @ConfigurationProperties(DataSourceProperties.PREFIX)
        public DataSource dataSource() {
            return DataSourceBuilder.create().build();
        }
    }

    @Configuration
    @Profile("cloud")
    public class CloudDataSourceConfiguration {

        @Bean
        public Cloud cloud() {
            return new CloudFactory().getCloud();
        }

        @Bean
        @ConfigurationProperties(DataSourceProperties.PREFIX)
        public DataSource dataSource() {
            return cloud().getSingletonServiceConnector(DataSource.class, null);
        }
    }

The “default” DataSource is actually identical to the autoconfigured one in this simple example, so you wouldn’t do this unless you needed to, e.g. to create a custom concrete DataSource of a type not supported by Spring Boot. You might think it’s all getting a bit complicated, but in fact Spring Boot is not making it any harder; we are just dealing with the consequences of needing to control the DataSource construction in 2 environments.

Using a Non-Embedded Database Locally

If you don’t want to use H2 or any in-memory database locally, then you can’t really avoid having to configure it (Spring Boot can guess a lot from the URL, but it will need that at least). So at a minimum you need to set some spring.datasource.* properties (the URL for instance). That isn’t hard to do, and you can easily set different values in different environments using additional profiles, but as soon as you do that you need to switch off the default values when you go into the cloud. To do that you could define the spring.datasource.* properties in a profile-specific file (or document in YAML) for the “default” profile, e.g. application-default.properties, and these will not be used in the “cloud” profile.
A Purely Declarative Approach

If you prefer not to write Java code, or don’t want to use Spring Cloud Connectors, you might want to try and use Spring Boot autoconfiguration and external properties (or YAML) files for everything. For example Spring Boot creates a DataSource for you if it finds the right stuff on the classpath, and it can be completely controlled through application.properties, including all the granular features on the DataSource that you need in production (like pool sizes and validation queries). So all you need is a way to discover the location and credentials for the service from the environment. The buildpack translates Cloud Foundry VCAP_* environment variables into usable property sources in the Spring Environment. Thus, for instance, a DataSource configuration might look like this:

    spring.datasource.url: ${cloud.services.mysql.connection.jdbcurl:jdbc:h2:mem:testdb}
    spring.datasource.username: ${cloud.services.mysql.connection.username:sa}
    spring.datasource.password: ${cloud.services.mysql.connection.password:}
    spring.datasource.testOnBorrow: true

The “mysql” part of the property names is the service name in Cloud Foundry (so it is set by the user). And of course the same pattern applies to all kinds of services, not just a JDBC DataSource. Generally speaking it is good practice to use external configuration and in particular @ConfigurationProperties since they allow maximum flexibility, for instance to override using System properties or environment variables at runtime.

Note: similar features are provided by Spring Boot, which provides vcap.services.* instead of cloud.services.*, so you actually end up with more than one way to do this. However, the JDBC URLs are not available from the vcap.services.* properties (non-JDBC services work fine with the corresponding vcap.services.*.credentials.url).
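The ${name:default} placeholders in the properties shown earlier in this section are what make the same file work locally and in the cloud: if the cloud.services.* key exists it wins, otherwise the text after the first colon is used. The following is a deliberately simplified sketch of that lookup rule in plain Java. It is not Spring's placeholder resolver (which also handles nesting and multiple placeholders per value), and PlaceholderSketch is a hypothetical name.

```java
import java.util.Map;

// Simplified illustration of "${key:default}" resolution; an assumption-laden
// sketch, not the implementation Spring Boot actually uses.
final class PlaceholderSketch {

    /** Resolves a value that is either a literal or a single ${key:default} placeholder. */
    static String resolve(String value, Map<String, String> environment) {
        if (!value.startsWith("${") || !value.endsWith("}")) {
            return value; // plain literal, nothing to resolve
        }
        String body = value.substring(2, value.length() - 1);

        // Only the FIRST colon separates key from default, so a default such as
        // "jdbc:h2:mem:testdb" may itself contain colons.
        int separator = body.indexOf(':');
        String key = separator < 0 ? body : body.substring(0, separator);
        String fallback = separator < 0 ? null : body.substring(separator + 1);

        String found = environment.get(key);
        if (found != null) {
            return found; // e.g. supplied by the buildpack in the cloud
        }
        if (fallback != null) {
            return fallback; // local run: use the default (may be empty)
        }
        throw new IllegalArgumentException("Could not resolve placeholder '" + key + "'");
    }
}
```

With an empty environment the URL placeholder falls back to the H2 in-memory URL; once the cloud.services.mysql.connection.jdbcurl key is present, its value is used instead, which is the local/cloud switch the properties file relies on.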
One limitation of this approach is that it doesn’t apply if the application needs to configure beans that are not provided by Spring Boot out of the box (e.g. if you need 2 DataSources), in which case you have to write Java code anyway, and may or may not choose to use properties files to parameterize it.

Before you try this yourself, though, beware that it doesn’t actually work unless you also disable the buildpack auto-reconfiguration (and Spring Cloud Connectors if they are on the classpath). If you don’t do that, then they create a new DataSource for you and Spring Boot cannot bind it to your properties file. Thus even for this declarative approach, you end up needing an explicit @Bean definition, and you need this part of your “cloud” profile configuration:

    @Configuration
    @Profile("cloud")
    public class CloudDataSourceConfiguration {

        @Bean
        public Cloud cloud() {
            return new CloudFactory().getCloud();
        }
    }

This is purely to switch off the buildpack auto-reconfiguration (and the Spring Boot autoconfiguration, but that could have been disabled with a properties file entry).

Mixed Declarative and Explicit Bean Definition

You can also mix the two approaches: declare a single @Bean definition so that you control the construction of the object, but bind additional configuration to it using @ConfigurationProperties (and do the same locally and in Cloud Foundry). Example:

    @Configuration
    public class LocalDataSourceConfiguration {

        @Bean
        @ConfigurationProperties(DataSourceProperties.PREFIX)
        public DataSource dataSource() {
            return DataSourceBuilder.create().build();
        }
    }

(where the DataSourceBuilder would be replaced with whatever fancy logic you need for your use case). And the application.properties would be the same as above, with whatever additional properties you need for your production settings.
A Third Way: Discover the Credentials and Bind Manually

Another approach that lends itself to platform and environment independence is to declare explicit bean definitions for the @ConfigurationProperties beans that Spring Boot uses to bind its autoconfigured connectors. For instance, to set the default values for a DataSource you can declare a @Bean of type DataSourceProperties:

    @Bean
    @Primary
    public DataSourceProperties dataSourceProperties() {
        DataSourceProperties properties = new DataSourceProperties();
        properties.setInitialize(false);
        return properties;
    }

This sets a default value for the “initialize” flag, and allows other properties to be bound from application.properties (or other external properties). Combine this with the Spring Cloud Connectors and you can control the binding of the credentials when a cloud service is detected:

    // required takes a boolean, so the Cloud is optional outside the "cloud" profile
    @Autowired(required = false)
    Cloud cloud;

    @Bean
    @Primary
    public DataSourceProperties dataSourceProperties() {
        DataSourceProperties properties = new DataSourceProperties();
        properties.setInitialize(false);
        if (cloud != null) {
            List infos = cloud.getServiceInfos(RelationalServiceInfo.class);
            // Only bind when exactly one relational service is available
            if (infos.size() == 1) {
                RelationalServiceInfo info = (RelationalServiceInfo) infos.get(0);
                properties.setUrl(info.getJdbcUrl());
                properties.setUsername(info.getUserName());
                properties.setPassword(info.getPassword());
            }
        }
        return properties;
    }

and you still need to define the Cloud bean in the “cloud” profile. It ends up being quite a lot of code, and is quite unnecessary in this simple use case, but might be handy if you have more complicated bindings, or need to implement some logic to choose a DataSource at runtime.

Spring Boot has similar *Properties beans for the other middleware you might commonly use (e.g. RabbitProperties, RedisProperties, MongoProperties). An instance of such a bean marked as @Primary is enough to reset the defaults for the autoconfigured connector.
Deploying to Multiple Cloud Platforms

So far, we have concentrated on Cloud Foundry as the only cloud platform in which to deploy the application. One of the nice features of Spring Cloud Connectors is that it supports other platforms, either out of the box or as extension points. The spring-boot-starter-cloud-connectors even includes Heroku support. If you do nothing at all, and rely on the autoconfiguration (the lazy programmer’s approach), then your application will be deployable in all clouds where you have a connector on the classpath (i.e. Cloud Foundry and Heroku if you use the starter). If you take the explicit @Bean approach then you need to ensure that the “cloud” profile is active in the non-Cloud Foundry platforms, e.g. through an environment variable. And if you use the purely declarative approach (or any combination involving properties files) you need to activate the “cloud” profile and probably also another profile specific to your platform, so that the right properties files end up in the Environment at runtime.

Summary of Autoconfiguration and Provided Behaviour

  • Spring Boot provides DataSource (also RabbitMQ or Redis ConnectionFactory, Mongo etc.) if it finds all the right stuff on the classpath. Using the “spring-boot-starter-*” dependencies is sufficient to activate the behaviour.
  • Spring Boot also provides an autowirable CloudFactory if it finds Spring Cloud Connectors on the classpath (but switches off only if it finds a @Bean of type Cloud).
  • The CloudAutoConfiguration in Spring Boot also effectively adds a @CloudScan to your application, which you would want to switch off if you ever needed to create your own DataSource (or similar).
  • The Cloud Foundry Java buildpack detects a Spring Boot application and activates the “cloud” profile, unless it is already active. Adding the buildpack auto-reconfiguration JAR does the same thing if you want to try it locally.
  • Through the auto-reconfiguration JAR, the buildpack also kicks in and creates a DataSource (ditto RabbitMQ, Redis, Mongo etc.) if it does not find a CloudFactory bean or a Cloud bean (amongst others). So including Spring Cloud Connectors in a Spring Boot application switches off this part of the “auto-reconfiguration” behaviour (the bean creation).
  • Switching off the Spring Boot CloudAutoConfiguration is easy, but if you do that, you have to remember to switch off the buildpack auto-reconfiguration as well if you don’t want it. The only way to do that is to define a bean (it can be of type Cloud or CloudFactory, for instance).
  • Spring Boot binds application.properties (and other sources of external properties) to @ConfigurationProperties beans, including but not limited to the ones that it autoconfigures. You can use this feature to tweak pool properties and other settings that need to be different in production environments.

General Advice and Conclusion

We have seen quite a few options and autoconfigurations in this short article, and we’ve only really used three libraries (Spring Boot, Spring Cloud Connectors, and the Cloud Foundry buildpack auto-reconfiguration JAR) and one platform (Cloud Foundry), not counting local deployment. The buildpack features are really only useful for very simple applications because there is no flexibility to tune the connections in production. That said, it is a nice thing to be able to do when prototyping. There are only three main approaches if you want to achieve the goal of deploying the same code locally and in the cloud, yet still being able to make necessary tweaks in production:

  • Use Spring Cloud Connectors to explicitly create DataSource and other middleware connections and protect those @Beans with @Profile("cloud"). The approach always works, but leads to more code than you might need for many applications.
  • Use the Spring Boot default autoconfiguration and declare the cloud bindings using application.properties (or in YAML). To take full advantage you have to explicitly switch off the buildpack auto-reconfiguration as well.
  • Use Spring Cloud Connectors to discover the credentials, and bind them to the Spring Boot @ConfigurationProperties as default values if present.

The three approaches are actually not incompatible, and can be mixed using @ConfigurationProperties to provide profile-specific overrides of default configuration (e.g. for setting up connection pools in a different way in a production environment). If you have a relatively simple Spring Boot application, the only way to choose between the approaches is probably personal taste. If you have a non-Spring Boot application then the explicit @Bean approach will win, and it may also win if you plan to deploy your application in more than one cloud platform (e.g. Heroku and Cloud Foundry).

NOTE: This blog has been a journey of discovery (who knew there was so much to learn?). Thanks go to all those who helped with reviews and comments, in particular Scott Frederick, who spotted most of the mistakes in the drafts and always had time to look at a new revision.
May 6, 2015
by Pieter Humphrey
· 27,059 Views · 2 Likes
Why Run Your Microservices on a PaaS
[This article by Chris Haddad comes to you from the DZone Guide to Cloud Development - 2015 Edition. For more information—including in-depth articles from industry experts, best solutions for PaaS, iPaaS, IaaS, and MBaaS, and more—click the link below to download your free copy of the guide.] Microservices can be understood from two angles. First, the differential: teams that take a microservice design approach divide business solutions into distinct, full-stack business services owned by autonomous teams. Second, the integral: microservice-based applications weave multiple atomic microservices into holistic user experiences. Unfortunately, traditional application delivery models and traditional middleware infrastructure do not address microservice-specific demands for on-demand provisioning, dynamic composition, and service level management. On the other hand, the Platform-as-a-Service (PaaS) model addresses these demands perfectly. Running microservices on a PaaS fabric decreases solution fragility, reduces operational burden, and enhances developer productivity. To understand why, we’ll first review how microservices separate concerns from both business and object-oriented design perspectives. Second, we’ll consider how microservice-based design can complicate deployment as applications scale dynamically. Third, we’ll focus on how a PaaS environment helps to solve many of the problems both addressed and introduced by microservices-based architectures — in other words, why PaaS and microservices are a match made in heaven. Microservices: Separating Concerns By Business Solution A microservice approach decomposes monolithic applications according to the single responsibility pattern. In a microservice solution, each microservice interface delivers discrete business capabilities (e.g. customer profile, product catalogue, inventory, order, billing, fulfillment) within a well-defined, bounded context. 
The atomic microservice interfaces reside on separate and distinct full-stack application platforms that contain separate database storage, integration flows, and web application hosting. By separating concerns onto separate full-stack platforms and not sharing database instances or web application hosts across services, every team is free to choose different runtime languages and frameworks for its own microservice. Also, every team is free to evolve its data schemas, application frameworks, and business logic without impacting other teams. Because microservices are a relatively new design approach, many development teams may have the misconception that creating a microservice-based solution requires simply deploying small web services in containers. But this doesn’t cut quite deep enough. The correct approach is to evolve your monolithic design by applying service-oriented principles (i.e. encapsulation, loose coupling, separation of concerns) in conjunction with domain-driven design techniques and dynamic runtime application composition. For example, in a typical ecommerce scenario, a development team applies the bounded context pattern and single responsibility pattern to refactor a monolithic application into units distinguished by business capability (see Figure 2). By creating a user experience from loosely coupled services instead of tightly coupled native-language business objects, teams have more independence to develop, evolve, and deploy each business capability separately. Obviously, the microservice design approach works best for (a) greenfield projects or (b) modernization efforts where teams focus on refactoring monolithic application assets. The Microservice Execution Trap Although a microservice approach decouples development dependencies and speeds up development iterations, microservices also create a challenging environment for high-performance scaling and reliable runtime execution. 
More complex, loosely coupled, and dynamic environments distribute business capabilities over the entire network. Even a task as simple as responding to a single web application page request may spread out across several microservice instances residing on a distributed network topology. Martin Fowler and Stefan Tilkov (both microservice proponents) warn teams that successfully implementing a microservice approach requires choosing platforms that decrease solution fragility and reduce operational burdens.

What Platform-as-a-Service Offers

Platform-as-a-Service environments reduce microservice operational burdens when infrastructure-as-code and declarative policies are used to eliminate all manual actions and increase runtime quality of service (i.e. reliability, availability, scalability, and performance). The appropriate PaaS environment will automatically deploy, provision, and link full-stack microservices.

In a microservice architecture, teams want to rapidly release new versions and perform A/B testing across versions. When teams define instance dependencies, scaling properties, and security policies as PaaS metadata or code scripts, the runtime fabric can reduce manual effort and increase release confidence. With a DevOps-friendly PaaS, the team can experiment with new service versions and safely roll back to a prior stable release if a problem arises.

Because microservices are full-stack silos that can be composed of multiple server instances (e.g. web server, database, load balancer, integration server), a PaaS can reduce deployment complexity by automatically spinning up and linking all instances. Linking may require discovering instance locations, dynamically initializing network routes, and auto-configuring connection strings based on service version or tenant.

A traditional application will compose business functions and user experience by statically linking class files and shared object libraries.
In contrast, microservice-based applications use service composition to connect available microservice endpoints and realize a fully functional application. While many microservice proponents promote microservice-based interactions by “smart endpoints through dumb pipes,” effective service composition requires smart infrastructure building blocks to bootstrap and maintain connections between services and consumers. The right PaaS solves these problems.

Infrastructure building blocks will register service endpoint locations, associate metadata and policies, connect clients, circuit break around failures, correlate inter-service calls, and load balance traffic. A microservice-friendly PaaS will provide service registries, metadata services, discovery services, and service virtualization gateways. In the pipe, circuit breakers will automatically route traffic on failover or overload.

Smart endpoint code will dynamically connect with microservices based on discovery service responses and negotiated quality of service parameters. Rather than being hard-coded to a specific service hostname and URI, endpoint code will query for microservice location based on security assurances, performance guarantees, traffic load, service version, client tenancy, or business domain. When services are unavailable or underperform, smart endpoints will follow the tolerant reader pattern and gracefully degrade experience or proactively recover. A few recovery options include reading from local caches or circuit tripping to backup service endpoints. In conjunction with smart endpoint actions, a smart PaaS will spin up new microservice endpoints and full-stack instances based on service level management metrics.

By following microservice architecture best practices, teams create anti-fragile applications that not only withstand a shock, but also improve performance and quality of service when stressed or experiencing failures.
To drive this non-intuitive behavior, the underlying platform environment must be ready to scale, repair, and reconnect services. PaaS service level management components will create more resilient and anti-fragile microservices by monitoring performance, elastically provisioning instances, and dynamically re-routing traffic.

Scaling an anti-fragile microservice is more difficult than scaling a web application. The PaaS should distribute microservice instances across multiple availability zones and dynamically adjust traffic to reduce latency and response time. Because transient microservice instances will rapidly start, stop, and change location, the service management layer must be completely automated and integrated with routing services.

A PaaS environment will deliver the service level management, dynamic service composition, circuit breakers, and on-demand provisioning functions required to overcome the complexity inherent within a distributed microservice-based application architecture. Running microservices on a PaaS fabric will decrease solution fragility, reduce operational burden, and enhance developer productivity. If you are pursuing a microservice design approach, make sure you choose a microservice-friendly PaaS.
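To make the idea of circuit breaking and graceful degradation a little more tangible, here is a minimal sketch in plain Java of an endpoint-side circuit breaker that falls back to a secondary source (for example a local cache) once the primary service keeps failing. It is an illustration under stated assumptions, not how any particular PaaS implements the feature: the class name CircuitBreakerSketch, the failure threshold, and the time-based reset are all hypothetical choices.

```java
import java.util.function.Supplier;

// Illustrative only: a tiny circuit breaker with a fallback supplier.
final class CircuitBreakerSketch<T> {

    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long the circuit stays open
    private int consecutiveFailures;
    private long openedAt = -1;         // -1 means the circuit is closed

    CircuitBreakerSketch(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Calls the primary service, or the fallback if it fails or the circuit is open. */
    synchronized T call(Supplier<T> primary, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get(); // do not even try the failing service
        }
        try {
            T result = primary.get();
            consecutiveFailures = 0; // a success resets the failure count
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = System.currentTimeMillis(); // trip the circuit
            }
            return fallback.get(); // degrade gracefully instead of failing
        }
    }

    /** True while the circuit is open; closes again after the configured interval. */
    synchronized boolean isOpen() {
        if (openedAt < 0) {
            return false;
        }
        if (System.currentTimeMillis() - openedAt >= openMillis) {
            openedAt = -1;
            consecutiveFailures = 0;
            return false;
        }
        return true;
    }
}
```

A caller would wrap each remote invocation in call(...), passing the live request as the primary supplier and a cached or default response as the fallback. In the architecture described above, this logic would more likely live in the platform's routing layer or a shared library than in hand-written endpoint code.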
May 5, 2015
by Chris Haddad
· 12,063 Views · 2 Likes
Adding Version Details on MANIFEST.MF via Jenkins
Article Purpose: This article suggests a solution for adding custom information to your web application using the MANIFEST.MF file and exposing that information through an API. The need for this solution came from "blindness" in deployment. In simpler words, I want to know which WAR I deployed, what build version it has, and any other useful information I might need. Since I exposed the information via an API, it can also function as an application "health check", which is a nice little addition to the solution.

Techs:

  • war (your web application)
  • mvn - build tool
  • maven-war-plugin
  • Jenkins job

Step 1 - Adding maven war plugin

First you should add the maven war plugin to your pom.xml. In the manifest entries section of the plugin configuration you should add the custom information you need. If you want to use the default information the plugin gives you out of the box, just set the corresponding flag to true. In the following example you can see I added fields like buildVersion, build time and so on. The build tool, in this case Jenkins, is going to provide this data.

org.apache.maven.plugins maven-war-plugin false ${project.version} ${build.number} ${maven.build.timestamp} ${agent.name} ${user.name} ${[yourWarName].war.finalName}

Step 2 - Define maven variables

The properties you want to add should be provided from outside, so you should add variables in your pom that Jenkins can assign values to. In the example below, you can see the build.number variable I used in the previous step.

SNAPSHOT

Step 3 - Adding maven goals to your Jenkins job definitions

Jenkins has environment variables available that you can pass to the maven goal, such as BUILD_NUMBER, BUILD_ID and so on. Assign the Jenkins variable to the maven variable in the build step. When Jenkins builds the code, it stamps the MANIFEST.MF file with the custom information we wanted.

Step 4 - Exposing the information of war

We did all this hard work for one purpose: to have this information available at the deployment stage.
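You should expose this information through an API. In this example I used a Spring MVC controller and mapped the data to JSON format, but you can take your pick.

package your.package; // placeholder: replace with your own package name

import java.io.IOException;
import java.io.InputStream;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.ApplicationContext;
import org.springframework.core.io.Resource;
import org.springframework.http.MediaType;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;
// JSONObject comes from the JSON library of your choice.

@Controller
@RequestMapping(value = "/version", produces = MediaType.APPLICATION_JSON_VALUE)
public class VersionController {

    @Autowired
    ApplicationContext applicationContext;

    @RequestMapping(method = RequestMethod.GET)
    @ResponseBody
    public JSONObject getVersion() {
        JSONObject result = new JSONObject();
        Resource resource = applicationContext.getResource("/META-INF/MANIFEST.MF");
        // Close the stream once the manifest has been read.
        try (InputStream in = resource.getInputStream()) {
            Manifest manifest = new Manifest(in);
            Attributes mainAttributes = manifest.getMainAttributes();
            // Copy every main attribute into the JSON response.
            for (Object key : mainAttributes.keySet()) {
                result.put(key.toString(), mainAttributes.get(key));
            }
        } catch (IOException e) {
            // The manifest could not be read; return what we have.
            e.printStackTrace();
        }
        return result;
    }
}

Step 5 - Enjoy :)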
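To try the manifest handling outside a servlet container, the following is a minimal sketch that uses only the JDK's java.util.jar.Manifest. The class name ManifestInfo and the sample entry names (Build-Number, Build-Version) are assumptions made for illustration; the entry names used in the article's pom.xml are not visible here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.Manifest;

// Hypothetical helper, not part of the article's code.
final class ManifestInfo {

    // Reads the main attributes of a manifest into a sorted String map.
    static Map<String, String> mainAttributes(InputStream in) {
        try {
            Manifest manifest = new Manifest(in);
            Map<String, String> result = new TreeMap<>();
            for (Map.Entry<Object, Object> e : manifest.getMainAttributes().entrySet()) {
                result.put(e.getKey().toString(), e.getValue().toString());
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience overload for manifest text held in memory.
    static Map<String, String> parse(String manifestText) {
        byte[] bytes = manifestText.getBytes(StandardCharsets.UTF_8);
        return mainAttributes(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) {
        // Sample content with assumed entry names; a blank line ends the main section.
        String sample = "Manifest-Version: 1.0\n"
                + "Build-Number: 42\n"
                + "Build-Version: 1.0-SNAPSHOT\n"
                + "\n";
        parse(sample).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Returning a plain Map keeps the sketch independent of any particular JSON library; a controller like the one above could serialize the map however it prefers.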
May 5, 2015
by Amazia Gur
· 18,041 Views
Gradle Goodness: Use Git Commit Id in Build Script
The nice thing about Gradle is that we can use Java libraries in our build script. This way we can add extra functionality to our build script in an easy way. We must use the classpath dependency configuration for our build script to include the library. For example we can include the library Grgit, which provides an easy way to interact with Git from Java or Groovy code. This library is also the basis for the Gradle Git plugin.

In the next example build file we add the Grgit library to our build script classpath. Then we use the open method of the Grgit class. On the returned object we invoke the head method to get the current commit, whose commit id is available through the id property. With the abbreviatedId property we get the shorter version of the Git commit id. The build file also includes the application plugin. We customize the applicationDistribution CopySpec from the plugin and expand the properties in a VERSION file. This way our distribution always includes a plain text file VERSION with the Git commit id of the code.

buildscript {
    repositories {
        jcenter()
    }
    dependencies {
        // Add dependency for build script,
        // so we can access Git from our
        // build script.
        classpath 'org.ajoberstar:grgit:1.1.0'
    }
}

apply plugin: 'java'
apply plugin: 'application'

ext {
    // Open the Git repository in the current directory.
    git = org.ajoberstar.grgit.Grgit.open(file('.'))

    // Get commit id of HEAD.
    revision = git.head().id
    // Alternative is using abbreviatedId of head() method.
    // revision = git.head().abbreviatedId
}

// Use abbreviatedId commit id in the version.
version = "2.0.1.${git.head().abbreviatedId}"

// application plugin extension properties.
mainClassName = 'sample.Hello'
applicationName = 'sample'

// Customize applicationDistribution
// CopySpec from application plugin extension.
applicationDistribution.with {
    from('src/dist') {
        include 'VERSION'
        expand(
            buildDate: new Date(),
            // Use revision with Git commit id:
            revision : revision,
            version : project.version,
            appName : applicationName)
    }
}

// Contents for src/dist/VERSION:
/*
Version: ${version}
Revision: ${revision}
Build-date: ${buildDate.format('dd-MM-yyyy HH:mm:ss')}
Application-name: ${appName}
*/

assemble.dependsOn installDist

When we run the build task for our project we get the following contents in our VERSION file:

Version: 2.0.1.e2ab261
Revision: e2ab2614011ff4be18c03e4dc1f86ab9ec565e6c
Build-date: 22-04-2015 13:53:31
Application-name: sample

Written with Gradle 2.3.
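The placeholder expansion performed on the VERSION file can be pictured with a small stand-alone sketch. It is written in Java rather than Groovy and is a deliberate simplification: Gradle's expand evaluates full Groovy template expressions (such as the buildDate.format call), while this hypothetical VersionTemplate class only substitutes simple ${name} placeholders. The 7-character abbreviation length is also an assumption, chosen to match the output shown in the article.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not how Gradle implements expand().
final class VersionTemplate {

    // Matches simple placeholders such as ${version}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replaces every ${name} in the template with the matching property value.
    static String expand(String template, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Shortens a full commit id to its first 7 characters (assumed length).
    static String abbreviate(String commitId) {
        return commitId.substring(0, Math.min(7, commitId.length()));
    }

    public static void main(String[] args) {
        String revision = "e2ab2614011ff4be18c03e4dc1f86ab9ec565e6c";
        String version = "2.0.1." + abbreviate(revision);
        System.out.println(expand("Version: ${version}",
                Collections.singletonMap("version", version)));
    }
}
```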
April 28, 2015
by Hubert Klein Ikkink
· 11,396 Views · 1 Like
Diagnosing SST Errors with Percona XtraDB Cluster for MySQL
[This article was written by Stephane Combaudon] State Snapshot Transfer (SST) is used in Percona XtraDB Cluster (PXC) when a new node joins the cluster or to resync a failed node if Incremental State Transfer (IST) is no longer available. SST is triggered automatically but there is no magic: If it is not configured properly, it will not work and new nodes will never be able to join the cluster. Let’s have a look at a few classic issues. Port for SST is not open The donor and the joiner communicate on port 4444, and if the port is closed on one side, SST will always fail. You will see in the error log of the donor that SST is started: [...] 141223 16:08:48 [Note] WSREP: Node 2 (node1) requested state transfer from '*any*'. Selected 0 (node3)(SYNCED) as donor. 141223 16:08:48 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 6) 141223 16:08:48 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 141223 16:08:48 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.101:4444/xtrabackup_sst' --auth 'sstuser:s3cret' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '04c085a1-89ca-11e4-b1b6-6b692803109b:6'' [...] But then nothing happens, and some time later you will see a bunch of errors: [...] 2014/12/23 16:09:52 socat[2965] E connect(3, AF=2 192.168.234.101:4444, 16): Connection timed out WSREP_SST: [ERROR] Error while getting data from donor node: exit codes: 0 1 (20141223 16:09:52.057) WSREP_SST: [ERROR] Cleanup after exit with status:32 (20141223 16:09:52.064) WSREP_SST: [INFO] Cleaning up temporary directories (20141223 16:09:52.068) 141223 16:09:52 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.101:4444/xtrabackup_sst' --auth 'sstuser:s3cret' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '04c085a1-89ca-11e4-b1b6-6b692803109b:6' [...] 
On the joiner side, you will see a similar sequence: SST is started, then hangs and is finally aborted: [...] 141223 16:08:48 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 6) 141223 16:08:48 [Note] WSREP: Requesting state transfer: success, donor: 0 141223 16:08:49 [Note] WSREP: (f9560d0d, 'tcp://0.0.0.0:4567') turning message relay requesting off 141223 16:09:52 [Warning] WSREP: 0 (node3): State transfer to 2 (node1) failed: -32 (Broken pipe) 141223 16:09:52 [ERROR] WSREP: gcs/src/gcs_group.cpp:long int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():717: Will never receive state. Need to abort. The solution is of course to make sure that the ports are open on both sides. SST is not correctly configured Sometimes you will see an error like this on the donor: 141223 21:03:15 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.102:4444/xtrabackup_sst' --auth 'sstuser:s3cretzzz' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid 'e63f38f2-8ae6-11e4-a383-46557c71f368:0'' [...] WSREP_SST: [ERROR] innobackupex finished with error: 1. Check /var/lib/mysql//innobackup.backup.log (20141223 21:03:26.973) And if you look at innobackup.backup.log: 41223 21:03:26 innobackupex: Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock' as 'sstuser' (using password: YES). innobackupex: got a fatal error with the following stacktrace: at /usr//bin/innobackupex line 2995 main::mysql_connect('abort_on_error', 1) called at /usr//bin/innobackupex line 1530 innobackupex: Error: Failed to connect to MySQL server: DBI connect(';mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock','sstuser',...) failed: Access denied for user 'sstuser'@'localhost' (using password: YES) at /usr//bin/innobackupex line 2979 What happened? 
The default SST method is xtrabackup-v2 and for it to work, you need to specify a username/password in the my.cnf file: [mysqld] wsrep_sst_auth=sstuser:s3cret And you also need to create the corresponding MySQL user: mysql> GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost' IDENTIFIED BY 's3cret'; So you should check that the user has been correctly created in MySQL and that wsrep_sst_auth is correctly set. Galera versions do not match Here is another set of errors you may see in the error log of the donor: 141223 21:14:27 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error) at gcomm/src/gcomm/datagram.hpp:unserialize():101 141223 21:14:30 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error) at gcomm/src/gcomm/datagram.hpp:unserialize():101 141223 21:14:33 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error) at gcomm/src/gcomm/datagram.hpp:unserialize():101 Here the issue is that you try to connect a node using Galera 2.x and a node running Galera 3.x. This can happen if you try to use a PXC 5.5 node and a PXC 5.6 node. The right solution is probably to understand why you ended up with such inconsistent versions and make sure all nodes are using the same Percona XtraDB Cluster version and Galera version. But if you know what you are doing, you can also instruct the node using Galera 3.x that it will communicate with Galera 2.x nodes by specifying in the my.cnf file: [mysqld] wsrep_provider_options="socket.checksum=1" Conclusion SST errors can have multiple reasons for occurring, and the best way to diagnose the issue is to have a look at the error log of the donor and the joiner. Galera is in general quite verbose so you can follow the progress of SST on both nodes and see where it fails. Then it is mostly about being able to interpret the error messages.
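For the first failure mode above (the SST port is not open), a quick connectivity probe run from the donor host can help tell a firewall problem from a configuration problem. The sketch below is an assumption-laden illustration, not part of Percona's tooling: the class name SstPortCheck is hypothetical, port 4444 is the one named in the article, and the default host is taken from the sample logs. Keep in mind that a refused connection may simply mean nothing is listening on the joiner at that moment, so it is most useful while a joiner is waiting for SST or against a temporary test listener.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for checking TCP reachability of the SST port.
final class SstPortCheck {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unreachable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults mirror the joiner address and SST port from the sample logs.
        String host = args.length > 0 ? args[0] : "192.168.234.101";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 4444;
        boolean ok = isReachable(host, port, 2000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```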
April 27, 2015
by Peter Zaitsev
· 11,837 Views
On Neo4j Indexes, Match and Merge
Neo4j uses schema indexes, constraints, merges, matches, and a variety of other indexing features to help with your NoSQL needs.
April 20, 2015
by Michael Hunger
· 11,795 Views
Using Apache Kafka for Integration and Data Processing Pipelines with Spring
Written by Josh Long on the Spring blog.

Applications generate more data than ever before, and a huge part of the challenge, before the data can even be analyzed, is accommodating the load in the first place. Apache’s Kafka meets this challenge. It was originally designed by LinkedIn and subsequently open-sourced in 2011. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. The design is heavily influenced by transaction logs. It is a messaging system, similar to traditional messaging systems like RabbitMQ, ActiveMQ, MQSeries, but it’s ideal for log aggregation, persistent messaging, fast (hundreds of megabytes per second!) reads and writes, and can accommodate numerous clients. Naturally, this makes it perfect for cloud-scale architectures!

Kafka powers many large production systems. LinkedIn uses it for activity data and operational metrics to power the LinkedIn news feed, and LinkedIn Today, as well as offline analytics going into Hadoop. Twitter uses it as part of their stream-processing infrastructure. Kafka powers online-to-online and online-to-offline messaging at Foursquare. It is used to integrate Foursquare monitoring and production systems with Hadoop-based offline infrastructures. Square uses Kafka as a bus to move all system events through Square’s various data centers. This includes metrics, logs, custom events, and so on. On the consumer side, it outputs into Splunk, Graphite, or Esper-like real-time alerting. Netflix uses it for 300-600bn messages per day. It’s also used by Airbnb, Mozilla, Goldman Sachs, Tumblr, Yahoo, PayPal, Coursera, Urban Airship, Hotels.com, and a seemingly endless list of other big-web stars. Clearly, it’s earning its keep in some powerful systems!

Installing Apache Kafka
There are many different ways to get Apache Kafka installed. If you’re on OS X, and you’re using Homebrew, it can be as simple as brew install kafka.
You can also download the latest distribution from Apache. I downloaded kafka_2.10-0.8.2.1.tgz and unzipped it. Within it you’ll find a distribution of Apache ZooKeeper as well as Kafka, so nothing else is required. I installed Apache Kafka in my $HOME directory, under another directory, bin, then I created an environment variable, KAFKA_HOME, that points to $HOME/bin/kafka.

Start Apache ZooKeeper first, specifying where the configuration properties file it requires is:

$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties

The Apache Kafka distribution comes with default configuration files for both ZooKeeper and Kafka, which makes getting started easy. In more advanced use cases you will need to customize these files. Then start Apache Kafka. It too requires a configuration file, like this:

$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties

The server.properties file contains, among other things, default values for where to connect to Apache ZooKeeper (zookeeper.connect), how much data should be sent across sockets, how many partitions there are by default, and the broker ID (broker.id, which must be unique across a cluster). There are other scripts in the same directory that can be used to send and receive dummy data, very handy in establishing that everything’s up and running! Now that Apache Kafka is up and running, let’s look at working with Apache Kafka from our application.

Some High Level Concepts
A Kafka broker cluster consists of one or more servers where each may have one or more broker processes running. Apache Kafka is designed to be highly available; there are no master nodes. All nodes are interchangeable. Data is replicated from one node to another to ensure that it is still available in the event of a failure. In Kafka, a topic is a category, similar to a JMS destination or both an AMQP exchange and queue.
Topics are partitioned, and the choice of which of a topic’s partitions a message should be sent to is made by the message producer. Each message in the partition is assigned a unique sequenced ID, its offset. More partitions allow greater parallelism for consumption, but this will also result in more files across the brokers. Producers send messages to Apache Kafka broker topics and specify the partition to use for every message they produce. Message production may be synchronous or asynchronous. Producers also specify what sort of replication guarantees they want. Consumers listen for messages on topics and process the feed of published messages. As you’d expect if you’ve used other messaging systems, this is usually (and usefully!) asynchronous. Like Spring XD and numerous other distributed systems, Apache Kafka uses Apache ZooKeeper to coordinate cluster information. Apache ZooKeeper provides a shared hierarchical namespace (whose nodes are called znodes) that nodes can share to understand cluster topology and availability (yet another reason that Spring Cloud has forthcoming support for it). ZooKeeper is very present in your interactions with Apache Kafka. Apache Kafka has, for example, two different APIs for acting as a consumer. The higher level API is simpler to get started with and it handles all the nuances of handling partitioning and so on. It will need a reference to a ZooKeeper instance to keep the coordination state. Let’s now turn to using Apache Kafka with Spring.

Using Apache Kafka with Spring Integration
The recently released Apache Kafka 1.1 Spring Integration adapter is very powerful, and provides inbound adapters for working with both the lower level Apache Kafka API as well as the higher level API. The adapter, currently, is XML-configuration first, though work is already underway on a Spring Integration Java configuration DSL for the adapter and milestones are available. We’ll look at both here, now.
To make all these examples work, I added the libs-milestone-local Maven repository and used the following dependencies:

org.apache.kafka:kafka_2.10:0.8.1.1
org.springframework.boot:spring-boot-starter-integration:1.2.3.RELEASE
org.springframework.boot:spring-boot-starter:1.2.3.RELEASE
org.springframework.integration:spring-integration-kafka:1.1.1.RELEASE
org.springframework.integration:spring-integration-java-dsl:1.1.0.M1

Using the Spring Integration Apache Kafka Adapter with the Spring Integration XML DSL
First, let’s look at how to use the Spring Integration outbound adapter to send Message instances from a Spring Integration flow to an external Apache Kafka instance. The example is fairly straightforward: a Spring Integration channel named inputToKafka acts as a conduit that forwards messages to the outbound adapter, kafkaOutboundChannelAdapter. The adapter itself can take its configuration from the defaults specified in the kafka:producer-context element or from the adapter-local configuration overrides. There may be one or many configurations in a given kafka:producer-context element. Here’s the Java code from a Spring Boot application to trigger message sends using the outbound adapter by sending messages into the incoming inputToKafka MessageChannel.
package xml;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.DependsOn;
import org.springframework.context.annotation.ImportResource;
import org.springframework.integration.config.EnableIntegration;
import org.springframework.messaging.MessageChannel;
import org.springframework.messaging.support.GenericMessage;

@SpringBootApplication
@EnableIntegration
@ImportResource("/xml/outbound-kafka-integration.xml")
public class DemoApplication {

    private Log log = LogFactory.getLog(getClass());

    // Sends 1000 test messages into the channel that feeds the Kafka adapter.
    @Bean
    @DependsOn("kafkaOutboundChannelAdapter")
    CommandLineRunner kickoff(@Qualifier("inputToKafka") MessageChannel in) {
        return args -> {
            for (int i = 0; i < 1000; i++) {
                in.send(new GenericMessage<>("#" + i));
                log.info("sending message #" + i);
            }
        };
    }

    public static void main(String args[]) {
        SpringApplication.run(DemoApplication.class, args);
    }
}

Using the New Apache Kafka Spring Integration Java Configuration DSL
Shortly after the Spring Integration 1.1 release, Spring Integration rockstar Artem Bilan got to work on adding a Spring Integration Java configuration DSL analog and the result is a thing of beauty! It’s not yet GA (you need to add the libs-milestone repository for now), but I encourage you to try it out and kick the tires. It’s working well for me and the Spring Integration team are always keen on getting early feedback whenever possible! Here’s an example that demonstrates both sending messages and consuming them from two different IntegrationFlows. The producer is similar to the example XML above. New in this example is the polling consumer.
It is batch-centric, and will pull down all the messages it sees at a fixed interval. In our code, the message received will be a map that contains as its keys the topic and as its value another map with the partition ID and the batch (in this case, of 10 records) of records read. There is a MessageListenerContainer-based alternative that processes messages as they come.

package jc;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.DependsOn;
import org.springframework.integration.IntegrationMessageHeaderAccessor;
import org.springframework.integration.config.EnableIntegration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.SourcePollingChannelAdapterSpec;
import org.springframework.integration.dsl.kafka.Kafka;
import org.springframework.integration.dsl.kafka.KafkaHighLevelConsumerMessageSourceSpec;
import org.springframework.integration.dsl.kafka.KafkaProducerMessageHandlerSpec;
import org.springframework.integration.dsl.support.Consumer;
import org.springframework.integration.kafka.support.ZookeeperConnect;
import org.springframework.messaging.MessageChannel;
import org.springframework.messaging.support.GenericMessage;
import org.springframework.stereotype.Component;

import java.util.List;
import java.util.Map;

/**
 * Demonstrates using the Spring Integration Apache Kafka Java Configuration DSL.
 * Thanks to Spring Integration ninja Artem Bilan
 * for getting the Java Configuration DSL working so quickly!
 *
 * @author Josh Long
 */
@EnableIntegration
@SpringBootApplication
public class DemoApplication {

    public static final String TEST_TOPIC_ID = "event-stream";

    @Component
    public static class KafkaConfig {

        @Value("${kafka.topic:" + TEST_TOPIC_ID + "}")
        private String topic;

        @Value("${kafka.address:localhost:9092}")
        private String brokerAddress;

        @Value("${zookeeper.address:localhost:2181}")
        private String zookeeperAddress;

        KafkaConfig() {
        }

        public KafkaConfig(String t, String b, String zk) {
            this.topic = t;
            this.brokerAddress = b;
            this.zookeeperAddress = zk;
        }

        public String getTopic() {
            return topic;
        }

        public String getBrokerAddress() {
            return brokerAddress;
        }

        public String getZookeeperAddress() {
            return zookeeperAddress;
        }
    }

    @Configuration
    public static class ProducerConfiguration {

        @Autowired
        private KafkaConfig kafkaConfig;

        private static final String OUTBOUND_ID = "outbound";

        private Log log = LogFactory.getLog(getClass());

        @Bean
        @DependsOn(OUTBOUND_ID)
        CommandLineRunner kickoff(
                @Qualifier(OUTBOUND_ID + ".input") MessageChannel in) {
            return args -> {
                for (int i = 0; i < 1000; i++) {
                    in.send(new GenericMessage<>("#" + i));
                    log.info("sending message #" + i);
                }
            };
        }

        @Bean(name = OUTBOUND_ID)
        IntegrationFlow producer() {

            log.info("starting producer flow..");

            return flowDefinition -> {

                // Producer settings: async sends, batches of 10, String payloads.
                Consumer<KafkaProducerMessageHandlerSpec.ProducerMetadataSpec> spec =
                        (KafkaProducerMessageHandlerSpec.ProducerMetadataSpec metadata) ->
                                metadata.async(true)
                                        .batchNumMessages(10)
                                        .valueClassType(String.class)
                                        .valueEncoder(String::getBytes);

                KafkaProducerMessageHandlerSpec messageHandlerSpec =
                        Kafka.outboundChannelAdapter(
                                props -> props.put("queue.buffering.max.ms", "15000"))
                                .messageKey(m -> m.getHeaders().get(IntegrationMessageHeaderAccessor.SEQUENCE_NUMBER))
                                .addProducer(this.kafkaConfig.getTopic(),
                                        this.kafkaConfig.getBrokerAddress(), spec);

                flowDefinition
                        .handle(messageHandlerSpec);
            };
        }
    }

    @Configuration
    public static class ConsumerConfiguration {

        @Autowired
        private KafkaConfig kafkaConfig;

        private Log log = LogFactory.getLog(getClass());

        @Bean
        IntegrationFlow consumer() {

            log.info("starting consumer..");

            KafkaHighLevelConsumerMessageSourceSpec messageSourceSpec = Kafka.inboundChannelAdapter(
                    new ZookeeperConnect(this.kafkaConfig.getZookeeperAddress()))
                    .consumerProperties(props ->
                            props.put("auto.offset.reset", "smallest")
                                    .put("auto.commit.interval.ms", "100"))
                    .addConsumer("mygroup", metadata -> metadata.consumerTimeout(100)
                            .topicStreamMap(m -> m.put(this.kafkaConfig.getTopic(), 1))
                            .maxMessages(10)
                            .valueDecoder(String::new));

            // Poll the high-level consumer every 100 ms.
            Consumer<SourcePollingChannelAdapterSpec> endpointConfigurer = e -> e.poller(p -> p.fixedDelay(100));

            return IntegrationFlows
                    .from(messageSourceSpec, endpointConfigurer)
                    .<Map<String, List<String>>>handle((payload, headers) -> {
                        payload.entrySet().forEach(e -> log.info(e.getKey() + '=' + e.getValue()));
                        return null;
                    })
                    .get();
        }
    }

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }
}

The example makes heavy use of Java 8 lambdas. The producer spends a bit of time establishing how many messages will be sent in a single send operation, how keys and values are encoded (Kafka only knows about byte[] arrays, after all) and whether messages should be sent synchronously or asynchronously. In the next line, we configure the outbound adapter itself and then define an IntegrationFlow such that all messages get sent out via the Kafka outbound adapter. The consumer spends a bit of time establishing which ZooKeeper instance to connect to, how many messages to receive (10) in a batch, etc. Once the message batches are received, they’re handed to the handle method where I’ve passed in a lambda that’ll enumerate the payload’s body and print it out. Nothing fancy.

Using Apache Kafka with Spring XD
Apache Kafka is a message bus and it can be very powerful when used as an integration bus.
However, it really comes into its own because it’s fast enough and scalable enough that it can be used to route big data through processing pipelines. And if you’re doing data processing, you really want Spring XD! Spring XD makes it dead simple to use Apache Kafka (as the support is built on the Apache Kafka Spring Integration adapter!) in complex stream-processing pipelines. Apache Kafka is exposed as a Spring XD source, where data comes from, and a sink, where data goes to. Spring XD exposes a super convenient DSL for creating bash-like pipes-and-filter flows. Spring XD is a centralized runtime that manages, scales, and monitors data processing jobs. It builds on top of Spring Integration, Spring Batch, Spring Data and Spring for Hadoop to be a one-stop data-processing shop. Spring XD jobs read data from sources, run them through processing components that may count, filter, enrich or transform the data, and then write them to sinks. Spring Integration and Spring XD ninja Marius Bogoevici, who did a lot of the recent work in the Spring Integration and Spring XD implementation of Apache Kafka, put together a really nice example demonstrating how to get a full Spring XD and Kafka flow working. The README walks you through getting Apache Kafka, Spring XD and the requisite topics all set up. The essence, however, is when you use the Spring XD shell and the shell DSL to compose a stream. Spring XD components are named components that are pre-configured but have lots of parameters that you can override with --.. arguments via the XD shell and DSL. (That DSL, by the way, is written by the amazing Andy Clement of Spring Expression Language fame!) Here’s an example that configures a stream to read data from an Apache Kafka source and then write the message to a component called log, which is a sink. log, in this case, could be syslogd, Splunk, HDFS, etc.
xd> stream create kafka-source-test --definition "kafka --zkconnect=localhost:2181 --topic=event-stream | log" --deploy

And that’s it! Naturally, this is just a taste of Spring XD, but hopefully you’ll agree the possibilities are tantalizing.

Deploying a Kafka Server with Lattice and Docker
It’s easy to get an example Kafka installation all set up using Lattice, a distributed runtime that supports, among other container formats, the very popular Docker image format. There’s a Docker image provided by Spotify that sets up a collocated ZooKeeper and Kafka image. You can easily deploy this to a Lattice cluster, as follows:

ltc create --run-as-root m-kafka spotify/kafka

From there, you can easily scale the Apache Kafka instances and even more easily still consume Apache Kafka from your cloud-based services.

Next Steps
You can find the code for this blog on my GitHub account. We’ve only scratched the surface! If you want to learn more (and why wouldn’t you?), then be sure to check out Marius Bogoevici and Dr. Mark Pollack’s upcoming webinar on reactive data-pipelines using Spring XD and Apache Kafka where they’ll demonstrate how easy it can be to use RxJava, Spring XD and Apache Kafka!
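As a recap of the partitioning concept from the high-level overview (the producer decides which partition each message goes to), here is a minimal sketch of key-based partition selection. It is an illustration under stated assumptions, not Kafka's own partitioner: the class KeyPartitioner is hypothetical and it simply hashes a String key, whereas a real producer may use a different hash function or strategy.

```java
// Hypothetical illustration of producer-side partition selection.
final class KeyPartitioner {

    // Maps a message key to a partition index in the range [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 4;
        // Keys in the style of the messages sent by the demo applications.
        for (int i = 0; i < 5; i++) {
            String key = "#" + i;
            System.out.println(key + " -> partition " + partitionFor(key, partitions));
        }
    }
}
```

Because the mapping depends only on the key and the partition count, messages with the same key always land in the same partition, which is what preserves per-key ordering; the flip side is that changing the number of partitions changes the mapping.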
April 18, 2015
by Pieter Humphrey
· 29,095 Views
A cluster management framework, Apache Helix
What is Helix? Apache Helix is used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. Helix automates reassignment of resources in the face of node failure and recovery, cluster expansion, and reconfiguration. It models a distributed system as a state machine with constraints on states and transitions.

Terminologies
  • Node: A single machine
  • Cluster: Set of nodes
  • Resource: A logical entity (e.g. database, index, task)
  • Partition: Subset of the resource (each subtask is referred to as a partition)
  • Replica: Copy of a partition, held in some state (e.g. Master, Slave). Replicas increase the availability of the system
  • State: Describes the role of a replica (each node in the cluster has its own Current State)
  • State Machine and Transitions: A transition is an action that allows a replica to move from one state to another, thus changing its role (e.g. Slave --> Master)
  • Spectators: The external clients. Helix provides an External View that is an aggregated view of the current state across all nodes
  • Current State: Represents a resource's actual state at a participating node. INSTANCE_NAME is a unique name representing the process; SESSION_ID is an ID that is automatically assigned every time a process joins the cluster
  • Rebalancer: The core component of Helix is the Controller, which runs the rebalance algorithm on every cluster event
  • Dynamic Ideal State: A powerful feature of Helix is that the Ideal State can be changed dynamically, that is, Helix can adjust the ideal state

Whenever a cluster event occurs, Helix can operate in one of three modes:
  • FULL_AUTO
  • SEMI_AUTO
  • CUSTOMIZED

Cluster events can be one of the following:
  • Nodes start and/or stop
  • Nodes experience soft and/or hard failures
  • New nodes are added/removed

[1] http://helix.apache.org/Concepts.html
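The idea of a state machine with constraints on states and transitions can be sketched in a few lines. The example below is a hand-rolled illustration and does not use the Helix API: the class ReplicaStateMachine, the OFFLINE state, the set of allowed transitions, and the "at most one Master per partition" constraint are assumptions loosely modelled on the Master/Slave example mentioned above.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of replica states; not part of Apache Helix.
final class ReplicaStateMachine {

    enum State { OFFLINE, SLAVE, MASTER }

    // Allowed transitions: a replica must pass through SLAVE to become MASTER.
    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.OFFLINE, EnumSet.of(State.SLAVE));
        ALLOWED.put(State.SLAVE, EnumSet.of(State.MASTER, State.OFFLINE));
        ALLOWED.put(State.MASTER, EnumSet.of(State.SLAVE));
    }

    // Transition constraint: is moving a replica from one state to another legal?
    static boolean canTransition(State from, State to) {
        return ALLOWED.get(from).contains(to);
    }

    // State constraint (assumed): a partition may have at most one MASTER replica.
    static boolean atMostOneMaster(List<State> replicaStates) {
        return replicaStates.stream().filter(s -> s == State.MASTER).count() <= 1;
    }

    public static void main(String[] args) {
        System.out.println("SLAVE -> MASTER allowed: "
                + canTransition(State.SLAVE, State.MASTER));
        System.out.println("OFFLINE -> MASTER allowed: "
                + canTransition(State.OFFLINE, State.MASTER));
        System.out.println("[MASTER, SLAVE, SLAVE] valid: "
                + atMostOneMaster(Arrays.asList(State.MASTER, State.SLAVE, State.SLAVE)));
    }
}
```

A controller in this style would compare the current states with the ideal states on every cluster event and issue only transitions that pass both checks.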
April 13, 2015
by Madhuka Udantha
· 7,859 Views
Adopting Microservices at Netflix: Lessons for Team and Process Design
[This article was written by Tony Mauro.]

In a previous blog post, we shared best practices for designing a microservices architecture, based on Adrian Cockcroft’s presentation at nginx.conf2014 about his experience as Director of Web Engineering and then Cloud Architect at Netflix. In this follow-up post, we’ll review his recommendations for retooling your development team and processes for a smooth transition to microservices.

Optimize for Speed, Not Efficiency
The top lesson that Cockcroft learned at Netflix is that speed wins in the marketplace. If you ask developers whether a slower development process is better, no one ever says yes. Nor do management or customers ever complain that your development cycle is too fast for them. The need for speed doesn’t just apply to tech companies, either: as software becomes increasingly ubiquitous on the Internet of Things (in cars, appliances, and sensors as well as mobile devices), companies that didn’t use to do software development at all now find that their success depends on being good at it.

Netflix made an early decision to optimize for speed. This refers specifically to tooling your software development process so that you can react quickly to what your customers want, or even better, can create innovative web experiences that attract customers. Speed means learning about your customers and giving them what they want at a faster pace than your competitors. By the time competitors are ready to challenge you in a specific way, you’ve moved on to the next set of improvements.

This approach turns the usual paradigm of optimizing for efficiency on its head. Efficiency generally means trying to control the overall flow of the development process to eliminate duplication of effort and avoid mistakes, with an eye to keeping costs down. The common result is that you end up focusing on savings instead of looking for opportunities that increase revenue.
in cockcroft’s experience, if you say “i’m doing this because it’s more efficient,” the unintended result is that you’re slowing someone else down. this is not an encouragement to be wasteful, but you should optimize for speed first. efficiency becomes secondary as you satisfy the constraint that you’re not slowing things down. the way you grow the business to be more efficient is to go faster. make sure your assumptions are still true many large companies that have enjoyed success in their market (we can call them incumbents ) are finding themselves overtaken by nimbler, usually smaller, organizations ( disruptors ) that react much more quickly to changing consumer behavior. their large size isn’t necessarily the root of the problem – netflix is no longer a small company, for example. as cockcroft sees it, the main cause of difficulty for industry incumbents is that they’re operating under business assumptions that are no longer true. or, as will rogers put it, it’s not what we don’t know that hurts. it’s what we know that ain’t so.” of course, you have to make assumptions as you formulate a business model, and then it makes sense to optimize your business practices around them. the danger comes from sticking with assumptions after they’re no longer true, which means you’re optimizing on the wrong thing. that’s when you become vulnerable to industry disruptors who are making the right assumptions and optimizations for the current business climate. as examples, consider the following assumptions that hold sway at many incumbents. we’ll examine them further in the indicated sections and describe the approach netflix adopted. computing power is expensive. this was true when increasing your computing capacity required capital expenditure on computer hardware. see put your infrastructure in the cloud . process prevents problems. at many companies, the standard response to something going wrong is to add a preventative step to the relevant procedure. 
See Create a High Freedom, High Responsibility Culture with Less Process. Here are some ways to avoid holding onto assumptions that have passed their expiration date: As obvious as it might seem, you need to make your assumptions explicit, then periodically review them to make sure they still hold true. Keep aware of technological trends. As an example, the cost of solid-state drive (SSD) storage continues to go down. It’s still more expensive than regular disks, but the cost difference is becoming small enough that many companies are deciding the superior performance is worth paying a bit more for. [Ed: In this entertaining video, Fastly founder and CEO Artur Bergman explains why he believes SSDs are always the right choice.] Talk to people who aren’t your customers. This is especially necessary for incumbents, who need to make sure that potential new customers are interested in their product. Otherwise, they never hear that their product isn’t being used. As an example, some vendors in the storage space are building hyper-converged systems even as more and more companies are storing their data in the cloud and using open source storage management software. Netflix, for example, stores data on Amazon Web Services (AWS) servers with SSDs and manages it with Apache Cassandra. A single specialist in Java distributed systems is managing the entire configuration without any commercial storage tools or help from engineers specializing in storage, SAN, or backup. Don’t base your future strategy on current IT spending, but instead on the level of adoption by developers. Suppose that your company accounts for nearly all spending in the market for proprietary virtualization software, but then a competitor starts offering an open source-based product at only 1% of the cost of yours. If people start choosing it instead of your product, then at the point that your share of total spending is still 90%, your market share has declined to only 10%.
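The arithmetic behind that last point can be checked with a few lines of Python. This is only a sketch under simplifying assumptions that are ours, not Cockcroft’s: every customer buys one unit, the incumbent’s price is normalized to 1, and the function name `spending_share` is made up for illustration.

```python
def spending_share(unit_share: float, price_ratio: float) -> float:
    """Return the incumbent's share of total market spending.

    unit_share  -- fraction of customers still buying the incumbent's product
    price_ratio -- competitor's price as a fraction of the incumbent's price
                   (0.01 corresponds to "1% of the cost")
    """
    incumbent_spend = unit_share * 1.0                    # incumbent price normalized to 1
    competitor_spend = (1.0 - unit_share) * price_ratio   # everyone else pays the cheap price
    return incumbent_spend / (incumbent_spend + competitor_spend)


if __name__ == "__main__":
    # Watch how slowly the spending share moves while the customer base erodes.
    for unit_share in (0.9, 0.5, 0.1):
        share = spending_share(unit_share, price_ratio=0.01)
        print(f"customer share {unit_share:.0%} -> spending share {share:.1%}")
```

With a competitor priced at 1% of your product, holding only 10% of customers still leaves you with roughly 92% of total spending, which is why revenue alone is a lagging signal.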
If you’re only attending to your revenue, it seems like you’re still in good shape, but 10% of market share can collapse really quickly. Put Your Infrastructure in the Cloud In Make Sure Your Assumptions Are Still True, we mentioned that in the past it was valid to base your business plan on the assumption that computing power was expensive, because it was: the only way to increase your computing capacity was to buy computer hardware. You could then make money by using this expensive resource in the right way to solve customer problems. The advent of cloud computing has all but invalidated this assumption. It is now possible to buy the amount of capacity you need when you need it, and to pay for only the time you actually use it. The new assumption you need to make is that (virtual) machines are ephemeral. You can create and destroy them at the touch of a button or a call to an API, without any need to negotiate with other departments in your company. One way to think of this change is that the self-service cloud makes formerly impossible things instantaneous. All of Netflix’s engineers are in California, but they manage a worldwide infrastructure. The cloud enables them to experiment and determine whether (for example) adding servers in a particular location improves performance. Suppose they notice problems with video delivery in Brazil. They can easily set up 100 cloud server instances in São Paulo within a couple of hours. If after a week they determine that the difference in delivery speed and reliability isn’t large enough to justify the cost of the additional server instances, they can shut them down just as quickly and easily as they created them. This kind of experiment would be so expensive with a traditional infrastructure that you would never attempt it.
You would have to hire an agent in São Paulo to coordinate the project, find a data center, satisfy Brazilian government regulations, ship machines to Brazil, and so on. It would be six months before you could even run the test and find out that increased local capacity didn’t improve your delivery speed. Create a High Freedom, High Responsibility Culture with Less Process In Make Sure Your Assumptions Are Still True, we observed that many companies create rules and processes to prevent problems. When someone makes a mistake, they add a rule to the HR manual that says “Well, don’t do that again.” If you read some HR manuals from this perspective, you can extract a historical record of everything that went wrong at the company. When something goes wrong in the development process, the corresponding reaction is to add a new step to the procedure. The major problem with creating process to prevent problems is that over time you build up complex “scar tissue” processes that slow you down. Netflix doesn’t have an HR manual. There is a single guideline: “Act in Netflix’s best interest.” The idea is that if an employee can’t figure out how to interpret the guideline in a given situation, he or she doesn’t have enough judgment to work there. If you don’t trust the judgment of the people on your team, you have to ask why you’re employing them. It’s true that you’ll have to fire people occasionally for violating the guideline. Overall, the high level of mutual trust among members of a team, and across the company as a whole, becomes a strong binding force. The following books outline new ways of thinking about process if you’re looking to transform your organization: The Goal: A Process of Ongoing Improvement by Eliyahu M. Goldratt and Jeff Cox. This book has become a standard management text at business schools since its original publication in 1984.
Written as a novel about a manager who has only 90 days to improve performance at his factory or have it closed down, it embodies Goldratt’s Theory of Constraints in the context of process control and automation. The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win by Gene Kim, Kevin Behr, and George Spafford. As the title indicates, it’s also a novel, about an IT manager who has 90 days to save a project that’s late and over budget, or his entire department will be outsourced. He discovers DevOps as the solution to his problem. Replace Silos with Microservice Teams Most software development groups are separated into silos, with no overlap of personnel between them. The standard process for a software development project starts with the product manager meeting with the user experience and development groups to discuss ideas for new features. After the idea is implemented in code, the code is passed to the quality assurance (QA) and database administration teams and discussed in more meetings. Communication with the system, network, and SAN administrators is often via tickets. The whole process tends to be slow and loaded with overhead. (Source: Adrian Cockcroft) Some companies try to speed up by creating small “start-up”-style teams that handle the development process from end to end. Sometimes such teams are the result of acquisitions where the acquired company continues to run independently as a separate division. But if the small teams are still doing monolithic delivery, there are usually still handoffs between individuals or groups with responsibility for different functions. The process suffers from the same problems as monolithic delivery in larger companies – it’s simply not very efficient or agile. (Source: Adrian Cockcroft) Conway’s Law says that the interface structure of a software system will reflect the social structure of the organization that produced it.
So if you want to switch to a microservices architecture, you need to organize your staff into product teams and use DevOps methodology. There are no longer distinct product managers, UX managers, development managers, and so on, managing downward in their silos. There is a manager for each product feature (implemented as a microservice), who supervises a team that handles all aspects of software development for the microservice, from conception through deployment. The platform team provides infrastructure support that the product teams access via APIs. At Netflix, the platform team was mostly AWS in Seattle, with some Netflix-managed infrastructure layers built on top. But it doesn’t matter whether your cloud platform is in-house or public; the important thing is that it’s API-driven, self-service, and automatable. (Source: Adrian Cockcroft) Adopt Continuous Delivery, Guided by the OODA Loop A siloed team organization is usually paired with a monolithic delivery model, in which an integrated, multi-function application is released as a unit (often version-numbered) on a regular schedule. Most software development teams use this model initially because it is relatively simple and works well enough with a small number of developers (say, 50 or fewer). However, as the team grows it becomes a real issue when you discover a bug in one developer’s code during QA or production testing and the work of 99 other developers is blocked from release until the bug is fixed. In 2009 Netflix adopted a continuous delivery model, which meshes perfectly with a microservices architecture. Each microservice represents a single product feature that can be updated independently of the other microservices and on its own schedule. Discovering a bug in a microservice has no effect on the release schedule of any other microservice. Continuous delivery relies on packaging microservices in standard containers.
Netflix initially used AWS machine images (AMIs), and it was possible to deploy an update into a test or production environment in about 10 minutes. With Docker, that time is reduced even further, to mere seconds in some cases. At Netflix, the conceptual framework for continuous development and delivery is an Observe-Orient-Decide-Act (OODA) loop. (Source: Adrian Cockcroft, http://www.slideshare.net/adrianco) Observe refers to examining your current status to look for places where you can innovate. You want your company culture to implicitly authorize anyone who notices an opportunity to start a project to exploit it. For example, you might notice what the diagram calls a “customer pain point”: a lot of people abandoning the registration process on your website when they reach a certain step. You can undertake a project to investigate why and fix the problem. Orient refers to analyzing metrics to understand the reasons for the phenomena you’ve observed at the Observe point. This usually involves analyzing large amounts of unstructured data, such as log files, which is often referred to as big data analysis. The answers you’re looking for are not already in your business intelligence database. You’re examining data that no one has previously looked at and asking questions that haven’t been asked before. Decide refers to developing and executing a project plan. Company culture is a big factor at this point. As previously discussed, in a high-freedom, high-responsibility culture you don’t need to get management approval before starting to make changes. You share your plan, but you don’t have to ask for permission. Act refers to testing your solution and putting it into production. You deploy a microservice that includes your incremental feature to a cloud environment, where it’s automatically put into an A/B test to compare it to the previous solution, side by side, for as long as it takes to collect the data that shows whether your approach is better.
Cooperating microservices aren’t disrupted, and customers don’t see your changes unless they’re selected for the test. If your solution is better, you deploy it into production. It doesn’t have to be a big improvement, either. If the number of clients for your microservice is large enough, then even a fraction of a percent improvement (in response time, say) can be shown to be statistically valid, and the cumulative effect over time of many small changes can be significant. Now you’re back at the Observe point. You don’t always have to perform all the steps or do them in strict order, either. The important characteristic of the process is that it enables you to determine quickly what your customers want and to create it for them. Cockcroft says “it’s hard not to win” if you’re basing your moves on enough data points and your competitors are making guesses that take months to be proven or disproven. The state of the art is to circle the loop every one to two weeks, but every microservice team can do it independently. With microservices you can go much faster because you’re not trying to get the entire company going around the loop in lockstep. How NGINX Plus Can Help At NGINX we believe it’s crucial to your future success that you adopt a 4-tier application architecture in which applications are developed and deployed as sets of microservices. We hope the information we’ve shared in this post and its predecessor, Adopting Microservices at Netflix: Lessons for Architectural Design, is helpful as you plan your transition to today’s state-of-the-art architecture for application development. When it’s time to deliver your apps, NGINX Plus offers an application delivery platform that provides the superior performance, reliability, and scalability your users expect. Fully adopting a microservices-based architecture is easier and more likely to succeed when you move to a single software tool for web serving, load balancing, and content caching.
NGINX Plus combines those functions and more in one package that is easy to deploy and manage. Our approach empowers developers to define and control the flawless delivery of their microservices, while respecting the standards and best practices put into place by a platform team. Video Recordings: Fast Delivery (nginx.conf2014, October 2014); Migrating to Microservices, Part 1 (Silicon Valley Microservices Meetup, August 2014); Migrating to Microservices, Part 2 (Silicon Valley Microservices Meetup, August 2014)
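To make the Act step’s claim about small improvements concrete, here is a rough sketch of a two-proportion z-test using only the Python standard library. The article mentions response time; for simplicity this sketch assumes a success rate (for example, completed registrations) instead, and the function name and all the numbers are hypothetical.

```python
import math


def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int):
    """Compare two success rates; return (z statistic, two-sided p-value)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both variants perform the same.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Same 0.3-point lift (50.0% -> 50.3%), two very different audience sizes.
    for n in (10_000, 1_000_000):
        z, p = two_proportion_z(int(0.500 * n), n, int(0.503 * n), n)
        print(f"n={n:>9,} per variant: z={z:.2f}, p={p:.2g}")
```

The same fraction-of-a-percent lift is indistinguishable from noise with ten thousand users per variant but clearly significant with a million, which is the sense in which a large client base lets small changes be validated.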
April 13, 2015
by Patrick Nommensen
· 9,779 Views
Patterns of API Virtualization
[This article was written by Matthew Heusser.] When Christopher Alexander wrote A Pattern Language in 1977, he was looking for a more powerful way to describe how towns and buildings were laid out. These patterns would allow architects, builders and planners to work together, to use the same words, mean the same thing, and create systems that were beautiful and worked, instead of more urban sprawl. Nearly twenty years later, Gamma, Helm, Johnson and Vlissides took the pattern idea and applied it to object-oriented software, which at the time was struggling to figure out how to create windows-based applications. Today the struggle is figuring out how to break software into small components that can be tested independently, and then having those components interact, typically over internet protocols. Raw SQL commands are giving way to service-oriented systems that interact through APIs, sometimes all within one company, sometimes outside with Microsoft, Google, Amazon, or other APIs such as those of a manufacturing company or supplier. While I do not claim to be Christopher Alexander or the Gang of Four, I am seeing some patterns emerge – a set of solutions to a defined problem – and would like to share a few of those today. What do you mean API? Alistair Cockburn’s Hexagonal Architecture (below) presents a way to think about APIs. The application we want to develop is in the middle and has a set of adapters to the external world. Those adapters might be an API we expose, like a ‘search’ interface to an online catalog, or the APIs we call, including the database, an email gateway, or the ‘permissions’ service, to see what types of search results we should show to this user. Cockburn’s Hexagonal Architecture gives us two ways to think about APIs: Our own, and the services we call. (Source: http://alistair.cockburn.us/Hexagonal+architecture) That’s a lot of APIs. Let’s explore some ways to virtualize these services – and why.
Automated Build and Continuous Integration Say, for example, you are working on a piece of software to analyze trending terms on social media – such as a customer complaint that is being liked and tweeted. You want companies to find these posts when they start to trend up, then reach out to the customer and solve the problem, or, perhaps, reach out to say “thank you” and amplify the praise. Modern build systems, like Jenkins, TFS, and TeamCity can compile, deploy, and even run the system to check for known scenarios. The trouble is those pesky adapters to external systems, like Twitter and Facebook. The software could do its job, but there is no way to know if the application is correct in its guesses about trends and importance. Getting the data from the providers can turn a quick build into a slow process that uses a lot of network traffic. By recording and storing known answers to predictable requests, then simulating the service and playing back known (“canned”) data, API Virtualization allows build systems to do more, with faster, more predictable results. This does not remove the need for end-to-end testing, but it does allow the team to have more confidence with each build. Performance Testing Your Application Like build/deploy systems, performance testing the application (the inside of the hexagon) with live, external services can cause problems. All that extra traffic can strain the company’s own network infrastructure; it could cause bandwidth problems at the ISP. Some 3rd Party APIs charge a micro-fee per transaction, or limit bandwidth. Many of them lack a ‘test’ sandbox to develop in, so performance testing could interact with real, production work. Standing up a virtual server to return pre-planned data means you can performance test your application rather than the third party, avoid bandwidth throttling, keep clear of production data, and avoid paying fees intended for real (production) use when all you are doing is testing your own environment.
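The record-and-play-back idea behind both of these patterns can be sketched in a few lines of Python. This is a toy illustration rather than how any particular virtualization tool works: `VirtualService`, `make_fake_trend_api`, and the request format are all hypothetical, and the “real” service is a local stand-in so the example runs without a network.

```python
import json
from typing import Callable, Dict


class VirtualService:
    """Records responses from a real service once, then plays them back."""

    def __init__(self, real_call: Callable[[str], dict]):
        self._real_call = real_call
        self._recordings: Dict[str, str] = {}
        self.replay_only = False  # set to True in the build to forbid live calls

    def call(self, request: str) -> dict:
        if request in self._recordings:
            # Play back the canned answer; the real service is never touched.
            return json.loads(self._recordings[request])
        if self.replay_only:
            raise KeyError(f"no recording for request: {request!r}")
        response = self._real_call(request)
        self._recordings[request] = json.dumps(response)  # store a stable copy
        return response


def make_fake_trend_api():
    """Stand-in for an external API; counts how often it is really called."""
    calls = {"count": 0}

    def fetch(term: str) -> dict:
        calls["count"] += 1
        return {"term": term, "mentions": len(term) * 100}

    return fetch, calls


if __name__ == "__main__":
    fetch, calls = make_fake_trend_api()
    virt = VirtualService(fetch)
    virt.call("late delivery")      # recorded from the "live" service
    virt.replay_only = True
    print(virt.call("late delivery"), "live calls:", calls["count"])
```

In a build or performance test, the application would be pointed at the virtual service in replay-only mode, so each run sees identical data and generates no external traffic.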
Avoid Integration Environment Inconsistency A few years ago I worked at a large organization that was wrapping old code in proxy services, so they could be consumed by other teams. Login, add-to-cart, search catalog, create custom catalog, permissions: all of it was accessible through API calls, most of it as simple as a web URL that returned some text. The problem was the “System Integration Test” environment, or SIT. Every team tested its services in SIT, which meant about a third of the time, something was broken. After finding a bug in the current build, we would track it back to the catalog service, walk over to that team, bring up the issue, and they would say “thanks, we are testing a new build of catalog.” We expected catalog to work in SIT. Anything else meant a waste of someone’s time. Automated tools reporting false errors were even worse. When teams performance tested their services, everything calling the service got slow, if it worked at all. By virtualizing services we could test our application end-to-end against known data, without the troubles of SIT, or having to build additional expensive test-lab-like copies of production. Best of all, creating the virtual services is a snap – just record the live service with a tool and instruct it to play back similar requests. Flip Integration Tests from Virtual To Real for Final Checking All this API virtualization creates a risk that the team will move from test to production and something will be different between the Virtual API and the live one. If the Virtual API server is just returning the same thing production did when we recorded it and we have automated checks in place, we can change our test server to point to the real service and re-run all the automated checks. As long as the source data hasn’t changed and we are reading, not writing, from production, the checks should all pass. If the production API has changed, we will get failures, and they will be easy enough to fix and retest.
Simulate Slow or Unresponsive Service In The Middle Of A Long Running Transaction Sometimes you want to test how your application behaves when a server is overloaded or down. Calling Facebook and asking them to turn off their servers is unlikely to work; even just coordinating with the team down the hall could create a lot of overhead. You also might want to test this often – every day or every hour – and manually pulling a plug or coordinating with the Login team every hour might not be realistic. The trick is to bring the service down once and record the exact behavior of the system, then use a virtual server to simulate that behavior, over and over again, every day. That means you’ll get the exact behavior, not a guess, and know exactly how the application under test can deal with it. Early Development of System against an Undeployed API Sometimes the API you are testing against does not exist, even in test. It’s still possible to create a Virt (virtual API) which returns some roughly equivalent data, and makes it possible to move forward on the core application without introducing new risks. Avoid Configuration and Copying Hassles Many companies use a test system that is a copy of production, and then refresh the system periodically. Sometimes, you want test scenarios that do not exist in production, so you have to create them … and lose them during a refresh. The same problem happens with 3rd party APIs, when, for example, a part is discontinued, and you are testing ordering that part, or the sample person you check for insurance coverage leaves the company. If the request for the part or the coverage goes through an API, you can record known good results that don’t change, even after a database refresh – then leave the real, end-to-end testing for an exploratory step that will be lighter, quicker, and more accurate, and will inspire more confidence.
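As a rough illustration of the slow-or-unresponsive-service pattern described above, the following Python sketch stands in a virtual service that can be made slow or unavailable on demand, and checks how a caller with a timeout copes. The pattern in the text replays behavior recorded from a real outage; here the delay and the outage are simply configured by hand, and every name (`SlowVirtualService`, `call_with_fallback`) is hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class SlowVirtualService:
    """Returns a canned response after a fixed delay, or fails outright."""

    def __init__(self, canned: dict, delay_s: float = 0.0, down: bool = False):
        self.canned = canned
        self.delay_s = delay_s
        self.down = down

    def call(self) -> dict:
        if self.down:
            raise ConnectionError("virtual service is simulating an outage")
        time.sleep(self.delay_s)  # simulate an overloaded server
        return self.canned


def call_with_fallback(service, timeout_s: float, fallback: dict) -> dict:
    """Application-side behavior under test: give up after timeout_s seconds."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(service.call)
        return future.result(timeout=timeout_s)
    except (FutureTimeout, ConnectionError):
        return fallback
    finally:
        pool.shutdown(wait=False)  # do not block on a still-sleeping call


if __name__ == "__main__":
    cached = {"user": "guest"}
    slow = SlowVirtualService({"user": "alice"}, delay_s=0.5)
    down = SlowVirtualService({"user": "alice"}, down=True)
    print(call_with_fallback(slow, timeout_s=0.05, fallback=cached))
    print(call_with_fallback(down, timeout_s=0.05, fallback=cached))
```

Because the failure is scripted, the same check can run on every build or every hour without coordinating with the team that owns the real service.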
A Fistful of Techniques Today we discussed a half-dozen common patterns for API virtualization, mostly around testing, in isolation, systems that consume data through an API, such as a 3rd party or internal service. These ideas are new, and evolving. What are a few of your favorites?
April 9, 2015
by Denis Goodwin
· 4,122 Views
Adopting Microservices at Netflix: Lessons for Architectural Design
[This article was written by Tony Mauro.] In some recent blog posts, we’ve explained why we believe it’s crucial to adopt a four-tier application architecture in which applications are developed and deployed as sets of microservices. It’s becoming increasingly clear that if you keep using development processes and application architectures that worked just fine ten years ago, you simply can’t move fast enough to capture and hold the interest of mobile users who can choose from an ever-growing number of apps. Switching to a microservices architecture creates exciting opportunities in the marketplace for companies. For system architects and developers, it promises an unprecedented level of control and speed as they deliver innovative new web experiences to customers. But at such a breathless pace, it can feel like there’s not a lot of room for error. In the real world, you can’t stop developing and deploying your apps as you retool the processes for doing so. You know that your future success depends on transitioning to a microservices architecture, but how do you actually do it? Fortunately for us, several early adopters of microservices are now generously sharing their expertise in the spirit of open source, not only in the form of published code but in conference presentations and blog posts. Netflix is a leading example. As the Director of Web Engineering and then Cloud Architect, Adrian Cockcroft oversaw the company’s transition from a traditional development model with 100 engineers producing a monolithic DVD-rental application to a microservices architecture with many small teams responsible for the end-to-end development of hundreds of microservices that work together to stream digital entertainment to millions of Netflix customers every day. Now a Technology Fellow at Battery Ventures, Cockcroft is a prominent evangelist for microservices and cloud-native architectures, and serves on the NGINX Technical Advisory Board. 
In a two-part series of blog posts, we’ll present top takeaways from two talks that Cockcroft delivered last year, at the first annual NGINX conference in October and at a Silicon Valley Microservices Meetup a couple of months earlier. (The complete video recordings are also well worth watching.) This post defines microservices architecture and outlines some best practices for designing one. Adopting Microservices at Netflix: Lessons for Team and Process Design discusses why and how to adopt a new mindset for software development and reorganize your teams around it. What is a Microservices Architecture? Cockcroft defines a microservices architecture as a service-oriented architecture composed of loosely coupled elements that have bounded contexts. Loosely coupled means that you can update the services independently; updating one service doesn’t require changing any other services. If you have a bunch of small, specialized services but still have to update them together, they’re not microservices because they’re not loosely coupled. One kind of coupling that people tend to overlook as they transition to a microservices architecture is database coupling, where all services talk to the same database and updating a service means changing the schema. You need to split the database up and denormalize it. The concept of bounded contexts comes from the book Domain Driven Design by Eric Evans. A microservice with correctly bounded context is self-contained for the purposes of software development. You can understand and update the microservice’s code without knowing anything about the internals of its peers, because the microservice and its peers interact strictly through APIs and so don’t share data structures, database schemata, or other internal representations of objects. If you’ve developed applications for the Internet, you’re already familiar with these concepts, in practice if not by name.
Most mobile apps talk to quite a few back-end services, to enable their users to do things like share on Facebook, get directions from Google Maps, and find restaurants on Foursquare, all within the context of the app. If your mobile app were tightly coupled with those services, then before you could release an update you would have to talk to all of their development teams to make sure that your changes aren’t going to break anything. When working with a microservices architecture, you think of other internal development teams like those Internet back ends: as external services that your microservice interacts with through APIs. The commonly understood “contract” between microservices is that their APIs are stable and forward compatible. Just as it’s unacceptable for the Google Maps API to change without warning and in such a way that it breaks its users, your API can evolve but must remain compatible with previous versions. Best Practices for Designing a Microservices Architecture Cockcroft describes his role as Cloud Architect at Netflix not in terms of controlling the architecture, but as discovering and formalizing the architecture that emerged as the Netflix engineers built it. The Netflix development team established several best practices for designing and implementing a microservices architecture. Create a Separate Data Store for Each Microservice Do not use the same back-end data store across microservices. You want the team for each microservice to choose the database that best suits the service. Moreover, with a single data store it’s too easy for microservices written by different teams to share database structures, perhaps in the name of reducing duplication of work. You end up with the situation where if one team updates a database structure, other services that also use that structure have to be changed too.
Breaking apart the data can make data management more complicated, because the separate storage systems can more easily get out of sync or become inconsistent, and foreign keys can change unexpectedly. You need to add a tool that performs master data management (MDM) by operating in the background to find and fix inconsistencies. For example, it might examine every database that stores subscriber IDs, to verify that the same IDs exist in all of them (there aren’t missing or extra IDs in any one database). You can write your own tool or buy one. Many commercial relational database management systems (RDBMSs) do these kinds of checks, but they usually impose too many requirements for coupling, and so don’t scale. Keep Code at a Similar Level of Maturity Keep all code in a microservice at a similar level of maturity and stability. In other words, if you need to add or rewrite some of the code in a deployed microservice that’s working well, the best approach is usually to create a new microservice for the new or changed code, leaving the existing microservice in place. [Editor’s note: This is sometimes referred to as the immutable infrastructure principle.] This way you can iteratively deploy and test the new code until it is bug free and maximally efficient, without risking failure or performance degradation in the existing microservice. Once the new microservice is as stable as the original, you can merge them back together if they really perform a single function together, or there are other efficiencies from combining them. However, in Cockcroft’s experience it is much more common to realize you should split up a microservice because it’s gotten too big. Do a Separate Build for Each Microservice Do a separate build for each microservice, so that it can pull in component files from the repository at the revision levels appropriate to it. This sometimes leads to the situation where various microservices pull in a similar set of files, but at different revision levels.
That can make it more difficult to clean up your codebase by decommissioning old file versions (because you have to verify more carefully that a revision is no longer being used), but that’s an acceptable trade-off for how easy it is to add new files as you build new microservices. The asymmetry is intentional: you want introducing a new microservice, file, or function to be easy, not dangerous. Deploy in Containers Deploying microservices in containers is important because it means you need just one tool to deploy everything. As long as the microservice is in a container, the tool knows how to deploy it. It doesn’t matter what the container is. That said, Docker seems very quickly to have become the de facto standard for containers. Treat Servers as Stateless Treat servers, particularly those that run customer-facing code, as interchangeable members of a group. They all perform the same functions, so you don’t need to be concerned about them individually. Your only concern is that there are enough of them to produce the amount of work you need, and you can use auto scaling to adjust the numbers up and down. If one stops working, it’s automatically replaced by another one. Avoid “snowflake” systems in which you depend on individual servers to perform specialized functions. Cockcroft’s analogy is that you want to think of servers like cattle, not pets. If you have a machine in production that performs a specialized function, and you know it by name, and everyone gets sad when it goes down, it’s a pet. Instead you should think of your servers like a herd of cows. What you care about is how many gallons of milk you get. If one day you notice you’re getting less milk than usual, you find out which cows aren’t producing well and replace them. Netflix Delivery Architecture is Built on nginx Netflix is a longtime nginx user and became the first customer of NGINX, Inc. after it incorporated in 2011.
Indeed, Netflix chose nginx as the heart of their delivery infrastructure, the Netflix Open Connect Content Delivery Network (CDN), one of the largest CDNs in the world. With the ability to serve thousands, and sometimes millions, of requests per second, nginx is an optimal solution for high-performance HTTP delivery and enables companies like Netflix to offer high-quality digital experiences to millions of customers every day. Video Recordings: Fast Delivery (nginx.conf2014, October 2014); Migrating to Microservices, Part 1 (Silicon Valley Microservices Meetup, August 2014); Migrating to Microservices, Part 2 (Silicon Valley Microservices Meetup, August 2014)
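As a rough illustration of the background consistency check described under Create a Separate Data Store for Each Microservice, the sketch below compares the subscriber IDs held by several stores and reports which ones are missing or extra. It is not a real MDM tool: the stores are plain in-memory sets, and the function name, store names, and report format are all hypothetical.

```python
from typing import Dict, Set


def find_id_inconsistencies(stores: Dict[str, Set[int]]) -> Dict[str, Dict[str, Set[int]]]:
    """Report, per store, IDs it lacks and IDs that no other store knows about."""
    if len(stores) < 2:
        return {}  # nothing to compare against
    all_ids = set().union(*stores.values())
    report = {}
    for name, ids in stores.items():
        others = set().union(*(v for k, v in stores.items() if k != name))
        missing = all_ids - ids     # present elsewhere, absent here
        only_here = ids - others    # present here, absent everywhere else
        if missing or only_here:
            report[name] = {"missing": missing, "only_here": only_here}
    return report


if __name__ == "__main__":
    # In practice each set would be loaded from a different microservice's data store.
    stores = {
        "billing": {1, 2, 3},
        "streaming": {1, 2, 3, 4},
        "recommendations": {1, 3},
    }
    for store, problems in find_id_inconsistencies(stores).items():
        print(store, problems)
```

A real tool would also need to decide which store is authoritative before repairing anything; this sketch only finds the discrepancies.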
April 7, 2015
by Patrick Nommensen
· 33,731 Views · 1 Like