In this article, excerpted from the book Docker in Action, I will show you how to open access to shared memory between containers.

Linux provides a few tools for sharing memory between processes running on the same computer. This form of inter-process communication (IPC) performs at memory speeds. It is often used when the latency associated with network or pipe-based IPC drags software performance below requirements. The best examples of shared memory based IPC usage are in scientific computing and some popular database technologies like PostgreSQL.

Docker creates a unique IPC namespace for each container by default. The Linux IPC namespace partitions shared memory primitives like named shared memory blocks and semaphores, as well as message queues. It is okay if you are not sure what these are. Just know that they are tools used by Linux programs to coordinate processing. The IPC namespace prevents processes in one container from accessing the memory on the host or in other containers.

Sharing IPC Primitives Between Containers

I've created an image named allingeek/ch6_ipc that contains both a producer and a consumer. They communicate using shared memory. Listing 1 will help you understand the problem with running these in separate containers.

Listing 1: Launch a Communicating Pair of Programs

    # start a producer
    docker run -d -u nobody --name ch6_ipc_producer \
        allingeek/ch6_ipc -producer

    # start the consumer
    docker run -d -u nobody --name ch6_ipc_consumer \
        allingeek/ch6_ipc -consumer

Listing 1 starts two containers. The first creates a message queue and starts broadcasting messages on it. The second should pull from the message queue and write the messages to the logs. You can see what each is doing by using the following commands to inspect the logs:

    docker logs ch6_ipc_producer
    docker logs ch6_ipc_consumer

If you executed the commands in Listing 1, you will notice that something is wrong: the consumer never sees any messages on the queue. Each process used the same key to identify the shared memory resource, but they referred to different memory. The reason is that each container has its own shared memory namespace.

If you need to run programs that communicate with shared memory in different containers, then you will need to join their IPC namespaces with the --ipc flag. The --ipc flag has a container mode that will create a new container in the same IPC namespace as another target container.

Listing 2: Joining Shared Memory Namespaces

    # remove the original consumer
    docker rm -v ch6_ipc_consumer

    # start a new consumer with a joined IPC namespace
    docker run -d --name ch6_ipc_consumer \
        --ipc container:ch6_ipc_producer \
        allingeek/ch6_ipc -consumer

Listing 2 rebuilds the consumer container and reuses the IPC namespace of the ch6_ipc_producer container. This time the consumer should be able to access the same memory location where the producer is writing. You can see this working by inspecting the logs again:

    docker logs ch6_ipc_producer
    docker logs ch6_ipc_consumer

Remember to clean up your running containers before moving on:

    # remember: the v option will clean up volumes,
    # the f option will kill the container if it is running,
    # and the rm command takes a list of containers
    docker rm -vf ch6_ipc_producer ch6_ipc_consumer

There are obvious security implications to reusing the shared memory namespaces of containers, but this option is available if you need it. Sharing memory between containers is a safer alternative to sharing memory with the host.
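If it helps to picture what the producer and consumer inside the image are doing, the following Python sketch shows one possible shape of such a pair. It is not the code inside allingeek/ch6_ipc; it assumes the third-party sysv_ipc package and an arbitrary System V queue key of 1234, chosen here purely for illustration.

    # A rough sketch of the kind of producer/consumer pair described above.
    # This is NOT the implementation of allingeek/ch6_ipc; it only illustrates
    # why both processes must share an IPC namespace. It assumes the
    # third-party sysv_ipc package (pip install sysv_ipc); key 1234 is arbitrary.
    import sys
    import time

    import sysv_ipc

    QUEUE_KEY = 1234  # both processes must agree on this System V IPC key


    def produce():
        # Create (or attach to) a System V message queue in this IPC namespace.
        queue = sysv_ipc.MessageQueue(QUEUE_KEY, sysv_ipc.IPC_CREAT)
        counter = 0
        while True:
            queue.send(f"message {counter}".encode())
            counter += 1
            time.sleep(1)


    def consume():
        # Attaches to the queue identified by QUEUE_KEY. In separate IPC
        # namespaces this key resolves to a *different* queue (or none at all),
        # which is why the consumer in Listing 1 sees nothing.
        queue = sysv_ipc.MessageQueue(QUEUE_KEY, sysv_ipc.IPC_CREAT)
        while True:
            message, _mtype = queue.receive()
            print(message.decode(), flush=True)


    if __name__ == "__main__":
        produce() if "-producer" in sys.argv else consume()

Because the System V key is just an integer, two containers can only rendezvous on it if they resolve it inside the same IPC namespace, which is exactly what --ipc container:ch6_ipc_producer provides.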
The loop is the classic way of processing collections, but with the greater adoption of first-class functions in programming languages the collection pipeline is an appealing alternative. In this article I look at refactoring loops to collection pipelines with a series of small examples.

I'm publishing this article in installments. This installment adds an example of refactoring a loop that summarizes flight delay data for each destination airport.

A common task in programming is processing a list of objects. Most programmers naturally do this with a loop, as it's one of the basic control structures we learn with our very first programs. But loops aren't the only way to represent list processing, and in recent years more people are making use of another approach, which I call the collection pipeline. This style is often considered to be part of functional programming, but I used it heavily in Smalltalk. As OO languages gain lambdas and libraries that make first-class functions easier to program with, collection pipelines become an appealing choice.

Refactoring a Simple Loop into a Pipeline

I'll start with a simple example of a loop and show the basic way I refactor one into a collection pipeline. Let's imagine we have a list of authors, each of which has the following data structure.

    class Author...
      public string Name { get; set; }
      public string TwitterHandle { get; set;}
      public string Company { get; set;}

This example uses C#.

Here is the loop.

    class Author...
      static public IEnumerable<string> TwitterHandles(IEnumerable<Author> authors, string company) {
        var result = new List<string>();
        foreach (Author a in authors) {
          if (a.Company == company) {
            var handle = a.TwitterHandle;
            if (handle != null)
              result.Add(handle);
          }
        }
        return result;
      }

My first step in refactoring a loop into a collection pipeline is to apply Extract Variable on the loop collection.

    class Author...
      static public IEnumerable<string> TwitterHandles(IEnumerable<Author> authors, string company) {
        var result = new List<string>();
        var loopStart = authors;
        foreach (Author a in loopStart) {
          if (a.Company == company) {
            var handle = a.TwitterHandle;
            if (handle != null)
              result.Add(handle);
          }
        }
        return result;
      }

This variable gives me a starting point for pipeline operations. I don't have a good name for it right now, so I'll use one that makes sense for the moment, expecting to rename it later.

I then start looking at bits of behavior in the loop. The first thing I see is a conditional check, which I can move to the pipeline with a filter operation.

    class Author...
      static public IEnumerable<string> TwitterHandles(IEnumerable<Author> authors, string company) {
        var result = new List<string>();
        var loopStart = authors
          .Where(a => a.Company == company);
        foreach (Author a in loopStart) {
          var handle = a.TwitterHandle;
          if (handle != null)
            result.Add(handle);
        }
        return result;
      }

I see the next part of the loop operates on the twitter handle, rather than the author, so I can use a map operation.

    class Author...
      static public IEnumerable<string> TwitterHandles(IEnumerable<Author> authors, string company) {
        var result = new List<string>();
        var loopStart = authors
          .Where(a => a.Company == company)
          .Select(a => a.TwitterHandle);
        foreach (string handle in loopStart) {
          if (handle != null)
            result.Add(handle);
        }
        return result;
      }

Next in the loop is another conditional, which again I can move to a filter operation.

    class Author...
      static public IEnumerable<string> TwitterHandles(IEnumerable<Author> authors, string company) {
        var result = new List<string>();
        var loopStart = authors
          .Where(a => a.Company == company)
          .Select(a => a.TwitterHandle)
          .Where(h => h != null);
        foreach (string handle in loopStart) {
          result.Add(handle);
        }
        return result;
      }

All the loop now does is add everything in its loop collection into the result collection, so I can remove the loop and just return the pipeline result.

    class Author...
      static public IEnumerable<string> TwitterHandles(IEnumerable<Author> authors, string company) {
        var result = new List<string>();
        return authors
          .Where(a => a.Company == company)
          .Select(a => a.TwitterHandle)
          .Where(h => h != null);
      }

Here's the final state of the code, once the now-unused result variable is removed.

    class Author...
      static public IEnumerable<string> TwitterHandles(IEnumerable<Author> authors, string company) {
        return authors
          .Where(a => a.Company == company)
          .Select(a => a.TwitterHandle)
          .Where(h => h != null);
      }

What I like about collection pipelines is that I can see the flow of logic as the elements of the list pass through the pipeline. For me it reads very closely to how I'd define the outcome of the loop: "take the authors, choose those who work for the company, and get their twitter handles, removing any null handles".

Furthermore, this style of code is familiar even in different languages that have different syntaxes and different names for the pipeline operations.

Java

    public List<String> twitterHandles(List<Author> authors, String company) {
      return authors.stream()
        .filter(a -> a.getCompany().equals(company))
        .map(a -> a.getTwitterHandle())
        .filter(h -> null != h)
        .collect(toList());
    }

Ruby

    def twitter_handles authors, company
      authors
        .select {|a| company == a.company}
        .map {|a| a.twitter_handle}
        .reject {|h| h.nil?}
    end

While this matches the other examples, I would replace the final reject with compact.

Clojure

    (defn twitter-handles [authors company]
      (->> authors
           (filter #(= company (:company %)))
           (map :twitter-handle)
           (remove nil?)))

F#

    let twitterHandles (authors : seq<Author>, company : string) =
        authors
        |> Seq.filter(fun a -> a.Company = company)
        |> Seq.map(fun a -> a.TwitterHandle)
        |> Seq.choose (fun h -> h)

Again, if I wasn't concerned about matching the structure of the other examples I would combine the map and choose into a single step.

I've found that once I got used to thinking in terms of pipelines I could apply them quickly even in an unfamiliar language. Since the fundamental approach is the same it's relatively easy to translate from even unfamiliar syntax and function names.

Refactoring within the Pipeline, and to a Comprehension

Once you have some behavior expressed as a pipeline, there are potential refactorings you can do by reordering steps in the pipeline. One such move is that if you have a map followed by a filter, you can usually move the filter before the map, like this.

    class Author...
      static public IEnumerable<string> TwitterHandles(IEnumerable<Author> authors, string company) {
        return authors
          .Where(a => a.Company == company)
          .Where(a => a.TwitterHandle != null)
          .Select(a => a.TwitterHandle);
      }

When you have two adjacent filters, you can combine them using a conjunction.

    class Author...
      static public IEnumerable<string> TwitterHandles(IEnumerable<Author> authors, string company) {
        return authors
          .Where(a => a.Company == company && a.TwitterHandle != null)
          .Select(a => a.TwitterHandle);
      }

Once I have a C# collection pipeline in the form of a simple filter and map like this, I can replace it with a Linq expression.

    class Author...
      static public IEnumerable<string> TwitterHandles(IEnumerable<Author> authors, string company) {
        return from a in authors
               where a.Company == company && a.TwitterHandle != null
               select a.TwitterHandle;
      }

I consider Linq expressions to be a form of list comprehension, and similarly you can do something like this with any language that supports list comprehensions. It's a matter of taste whether you prefer the list comprehension form or the pipeline form (I prefer pipelines). In general pipelines are more powerful, in that you can't refactor all pipelines into comprehensions.
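The examples above stop at C#, Java, Ruby, Clojure and F#. As a supplementary sketch that is not part of the original set, here is how the same filter-and-map might read in Python, both as a pipeline of lazy generator steps and in the comprehension form just discussed; the Author dataclass is a minimal stand-in for the C# structure above.

    # A Python rendering of the same function, shown as a pipeline of lazy
    # building blocks and as the comprehension form discussed above.
    # The Author class is a minimal stand-in for the C# data structure.
    from dataclasses import dataclass
    from typing import Iterable, Iterator, Optional


    @dataclass
    class Author:
        name: str
        twitter_handle: Optional[str]
        company: str


    def twitter_handles_pipeline(authors: Iterable[Author], company: str) -> Iterator[str]:
        # filter -> map -> filter, mirroring Where / Select / Where
        by_company = (a for a in authors if a.company == company)
        handles = (a.twitter_handle for a in by_company)
        return (h for h in handles if h is not None)


    def twitter_handles_comprehension(authors: Iterable[Author], company: str) -> list:
        # the comprehension form, with the two filters combined by a conjunction
        return [a.twitter_handle
                for a in authors
                if a.company == company and a.twitter_handle is not None]


    if __name__ == "__main__":
        authors = [
            Author("Ada", "@ada", "Acme"),
            Author("Bob", None, "Acme"),
            Author("Cas", "@cas", "Initech"),
        ]
        print(list(twitter_handles_pipeline(authors, "Acme")))   # ['@ada']
        print(twitter_handles_comprehension(authors, "Acme"))    # ['@ada']

As with the C# version, the generator-based pipeline is lazy, while the comprehension materializes a list up front.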
[This article was written by Sveta Smirnova]

Like any good, and thus lazy, engineer I don't like to start things manually. Creating directories and configuration files, and specifying paths and ports via the command line, is too boring. I have already written about how I survive when I need to start a MySQL server (here). There is also MySQL Sandbox, which can be used for the same purpose.

But what to do if you want to start Percona XtraDB Cluster this way? Fortunately we, at Percona, have engineers who created an automation solution for starting PXC. This solution uses Docker. To explore it you need to:

1. Clone the pxc-docker repository:

    git clone https://github.com/percona/pxc-docker

2. Install Docker Compose as described here.

3. cd pxc-docker/docker-bld

4. Follow the instructions from the README file:

    a) ./docker-gen.sh 5.6 (docker-gen.sh takes a PXC branch as its argument, 5.6 is the default, and it looks for it on github.com/percona/percona-xtradb-cluster)

    b) Optional: docker-compose build (if you see it is not updating with changes).

    c) docker-compose scale bootstrap=1 members=2 for a 3 node cluster

5. Check which ports are assigned to the containers:

    $ docker port dockerbld_bootstrap_1 3306
    0.0.0.0:32768
    $ docker port dockerbld_members_1 4567
    0.0.0.0:32772
    $ docker port dockerbld_members_2 4568
    0.0.0.0:32776

   Now you can connect with the MySQL client as usual (a programmatic check is also sketched after the notes below):

    $ mysql -h 0.0.0.0 -P 32768 -uroot
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 10
    Server version: 5.6.21-70.1 MySQL Community Server (GPL), wsrep_25.8.rXXXX

    Copyright (c) 2009-2015 Percona LLC and/or its affiliates
    Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

    Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
    Other names may be trademarks of their respective owners.

    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

    mysql>

6. To change MySQL options, either pass them as a mount at runtime with something like volume: /tmp/my.cnf:/etc/my.cnf in docker-compose.yml, or connect to the container's bash (docker exec -i -t container_name /bin/bash), then change my.cnf and run docker restart container_name.

Notes:

- If you don't want to build, use the ready-to-use images.
- If you don't want to run Docker Compose as the root user, add yourself to the docker group.
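Following up on step 5: once you know the mapped port, you can also check the cluster programmatically. The sketch below is not part of the original post; it assumes the mysql-connector-python package, the bootstrap port 32768 from the example above, and a passwordless root account, so adjust it for your own setup.

    # A quick programmatic check of the cluster, not part of the original post.
    # Assumes: pip install mysql-connector-python, the bootstrap node mapped to
    # port 32768 as in the example above, and a passwordless root account.
    import mysql.connector

    conn = mysql.connector.connect(host="0.0.0.0", port=32768, user="root", password="")
    try:
        cur = conn.cursor()
        # wsrep_cluster_size should report 3 for the bootstrap + 2 members setup
        cur.execute("SHOW STATUS LIKE 'wsrep_cluster_size'")
        name, value = cur.fetchone()
        print(f"{name} = {value}")
        cur.close()
    finally:
        conn.close()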
TROLEE is an online shopping portal based in India offering fashion products to customers worldwide. TROLEE offers a wide range of products in the categories of designer sarees, salwar kameez, kurtis, exclusive wedding collections, Indian designer collections, Western outfits, jeans, T-shirts, and women's apparel at wholesale prices in India. Metaphorically, TROLEE has come to be known as a shopping paradise, as customers always feel they can shop bigger than ever at each event organized by TROLEE. Shipping is free of charge on every order within India, and delivery is available worldwide. We have been appreciated by our customers for the best festival offers and discounts, along with assured service and quality products. Just visit us at trolee.com.
Written by Tim Brock.

Scatter plots are a wonderful way of showing (apparent) relationships in bivariate data. Patterns and clusters that you wouldn't see in a huge block of data in a table can become instantly visible on a page or screen. With all the hype around big data in recent years it's easy to assume that having more data is always an advantage. But as we add more and more data points to a scatter plot we can start to lose these patterns and clusters. This problem, a result of overplotting, is demonstrated in the animation below.

The data in the animation above is randomly generated from a pair of simple bivariate distributions. The distinction between the two distributions becomes less and less clear as we add more and more data.

So what can we do about overplotting? One simple option is to make the data points smaller. (Note that this is a poor "solution" if many data points share exactly the same values.) We can also make them semi-transparent. And we can combine these two options.

These refinements certainly help when we have ten thousand data points. However, by the time we've reached a million points the two distributions have seemingly merged into one again. Making points smaller and more transparent might help things; nevertheless, at some point we may have to consider a change of visualization. We'll get on to that later. But first let's try to supplement our visualization with some extra information. Specifically, let's visualize the marginal distributions.

We have several options. There's far too much data for a rug plot, but we can bin the data and show histograms. Or we can use a smoother option: a kernel density plot. Finally, we could use the empirical cumulative distribution. This last option avoids any binning or smoothing, but the results are probably less intuitive. I'll go with the kernel density option here, but you might prefer a histogram. The animated GIF below is the same as the GIF above but with the smoothed marginal distributions added. I've left scales off to avoid clutter and because we're only really interested in rough judgements of relative height.

Adding marginal distributions, particularly the distribution of variable 2, helps clarify that two different distributions are present in the bivariate data. The twin-peaked nature of variable 2 is evident whether there are a thousand data points or a million. The relative sizes of the two components are also clear. By contrast, the marginal distribution of variable 1 only has a single peak, despite coming from two distinct distributions. This should make it clear that adding marginal distributions is by no means a universal solution to overplotting in scatter plots.

To reinforce this point, the animation below shows a completely different set of (generated) data points in a scatter plot with marginal distributions. The data again comes from a random sample of two different 2D distributions, but both marginal distributions of the complete dataset fail to highlight this separation. As previously, when the number of data points is large the distinction between the two clusters can't be seen from the scatter plot either.

Returning to point size and opacity, what do we get if we make the data points very small and almost completely transparent? We can now clearly distinguish two clusters in each dataset. It's difficult to make out any fine detail though. Since we've lost that fine detail anyway, it seems apt to question whether we really want to draw a million data points.
It can be tediously slow and impossible in certain contexts. 2D histograms are an alternative. By binning data we can reduce the number of points to plot and, if we pick an appropriate color scale, pick out some of the features that were lost in the clutter of the scatter plot. After some experimenting I picked a color scale that ran from black through green to white at the high end. Note that this is (almost) the reverse of the effect created by overplotting in the scatter plots above.

In both 2D histograms we can clearly see the two different clusters representing the two distributions from which the data is drawn. In the first case we can also see that there are more counts from the upper-left cluster than the bottom-right cluster, a detail that is lost in the scatter plot with a million data points (but more obvious from the marginal distributions). Conversely, in the case of the second dataset we can see that the "heights" of the two clusters are roughly comparable.

3D charts are overused, but here (see below) I think they actually work quite well in terms of providing a broad picture of where the data is and isn't concentrated. Feature occlusion is a problem with 3D charts, so if you're going to go down this route when exploring your own data I highly recommend using software that allows for user interaction through rotation and zooming.

In summary, scatter plots are a simple and often effective way of visualizing bivariate data. If, however, your chart suffers from overplotting, try reducing point size and opacity. Failing that, a 2D histogram or even a 3D surface plot may be helpful. In the latter case be wary of occlusion.
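The ideas above are easy to reproduce with standard tooling. The sketch below is not from the original article; it uses numpy and matplotlib to generate a two-component bivariate mixture and compare a small, semi-transparent scatter plot with a 2D histogram. The mixture parameters, bin count and colormap are choices made here for illustration, not the author's.

    # Not from the original article: a minimal numpy/matplotlib sketch comparing
    # a small, semi-transparent scatter plot with a 2D histogram for the same
    # two-component bivariate mixture. Parameters and colors are illustrative.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    n = 1_000_000

    # Two overlapping bivariate normal clusters, roughly in the spirit of the post.
    a = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=n // 2)
    b = rng.multivariate_normal([1.5, 2.0], [[1.0, -0.4], [-0.4, 1.0]], size=n // 2)
    data = np.vstack([a, b])
    x, y = data[:, 0], data[:, 1]

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

    # Fighting overplotting with tiny, nearly transparent points.
    ax1.scatter(x, y, s=1, alpha=0.01, linewidths=0)
    ax1.set_title("Scatter, small points, low opacity")

    # The 2D histogram alternative: bin counts instead of individual points.
    ax2.hist2d(x, y, bins=100, cmap="viridis")
    ax2.set_title("2D histogram")

    plt.tight_layout()
    plt.show()

With a million points the scatter panel is slow to render and still hides relative densities, while the binned panel makes the two clusters and their relative weights much easier to read, which is the trade-off described above.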
16 Apr 2015

In a year defined by international expansion and investments in new infrastructure to enhance operational efficiency, Gulftainer recorded robust growth across its entire terminal portfolio.

Iain Rawlinson, Group Commercial Director of Gulftainer, said: "The positive growth recorded by Gulftainer across its terminals globally underlines the confidence of our partners in our ability to meet their requirements efficiently. Our extensive network and technological expertise are the strengths that have enabled us to expand our footprint to new locations. We continuously invest in enhancing our infrastructure, thus boosting reliability, operational efficiency and productivity."

He added: "The growth in volume achieved throughout our terminals is strong testament to the expertise and dedication of our employees and the strong productivity levels we are able to achieve on a consistent basis. In the dynamic global trade routes linking Asia and Europe, our terminals today play an increasingly significant role. Even as we expand and grow our business, we also remain committed to the communities we serve in by creating new jobs and supporting the domestic economy."

In global markets, Gulftainer's Saudi terminals recorded impressive growth, with Northern Container Terminal accounting for 1.9 million TEUs, sustaining previous-year trends, while Jubail Container Terminal (JCT) noted a growth of 22 per cent to over 396,000 TEUs. The total volume at the Saudi terminals was over 2.29 million TEUs. Gulftainer's Umm Qasr terminal also accomplished significant growth of 46 per cent in 2014, while the Recife terminal in Brazil marked a growth in volume of 7 per cent.

Gulftainer's UAE terminals recorded a total volume of 3.8 million TEUs, in line with the all-round growth in business. The company marked another significant milestone, with the Sharjah Container Terminal (SCT) surpassing 400,000 TEUs in annual throughput for the very first time. Operations at SCT were energised by the positive growth in global trade and the arrival of new services, such as UASC's Gulf India Service (GIS1), which now connects Sharjah with Sohar in Oman, Mundra in India and Karachi in Pakistan. The addition of this service represented a significant development for Sharjah and boosted the national carrier's volumes through SCT last year.

The only fully fledged operational container terminal in the UAE located outside the Strait of Hormuz, Khorfakkan Container Terminal (KCT) has today emerged as one of the most important transshipment hubs for the Arabian Gulf, the Indian Sub-continent, the Gulf of Oman and the East African markets. Further strengthening the operations at KCT, Gulftainer has received and commissioned new state-of-the-art Ship-to-Shore (STS) and Rubber-Tyred Gantry (RTG) cranes that will further increase overall performance and productivity. This enhanced infrastructure marks an investment of over US$60 million.

Gulftainer has set an ambitious target to triple its volume over the next decade through organic growth across existing businesses, exploring greenfield opportunities and potential M&A activities.
Gulftainer, a privately owned, independent terminal operating and logistics company, marked another significant milestone with the Sharjah Container Terminal (SCT) surpassing 400,000 TEUs (Twenty-Foot Equivalent Units) in annual throughput during 2014. SCT has again recorded double-digit growth compared to last year's volumes. The achievement was reached with an impressive safety record under challenging conditions, including space constraints.

Iain Rawlinson, Group Commercial Director of Gulftainer, said that the professional approach of Gulftainer's management, along with consistently high productivity levels, was a driving force behind the Terminal's success. "SCT has always marketed itself as 'The Flexible Alternative' and the individual attention we extend to our customers offers us an advantage over competitors."

The 400,000th unit was discharged from Mag Container Lines' vessel 'Mag Success', one of the Terminal's regular callers, which considers Sharjah her base port. Speaking on behalf of Mag Container Lines' CEO, BDM Jamal Saleh congratulated the Terminal on its achievement. He said: "The announcement today reflects how Gulftainer and MCL have grown together over the years and, in partnership, managed to reach this target. The continuous support, flexibility and excellent operational performance MCL receives from Gulftainer, both operationally and logistically, has contributed greatly to this achievement."

The milestone was achieved on the shift of Duty Superintendent Mehmood Malik, the longest-serving employee at over 38 years at the Terminal and part of the team when the first TEU crossed the quay. Mehmood has witnessed several records and milestones and recalls handling 2,500 TEUs in 1976: "At that time we could not imagine reaching the levels of throughput we have today, so this is a very special moment for me."

SCT, which is managed and operated by Gulftainer on behalf of the Sharjah Port Authority, has the honour of being the site of the first container terminal in the Gulf, having commenced operations in 1976. SCT is located in the heart of Sharjah and is an ideal gateway for import and export cargo, with direct links throughout the Gulf, Asia, Europe, the Americas and Africa. The strong performance of the Sharjah economy has supported the growth of many of SCT's customers, enabling them to increase their throughput and contribute to a record year for the Terminal.

The relationships built with customers have been strengthened by the joint efforts of Gulftainer's sales and marketing team and the high levels of service and operational efficiency at the terminal. "When looking at the Sharjah market, the dedicated team at SCT listen to and address the many requirements of our diverse and interesting customer base," said Iain Rawlinson.

SCT's figures were further boosted by the arrival of new services throughout the year, including UASC's Gulf India Service (GIS1), which now connects Sharjah with Sohar in Oman, Mundra in India and Karachi in Pakistan, and which lifted the national carrier's volumes through SCT in November and December.

Gulftainer's current portfolio covers UAE operations in Khorfakkan Port and Port Khalid in Sharjah, as well as activities at Umm Qasr in Iraq, Recife in Brazil, Jeddah and Jubail in Saudi Arabia, and Tripoli Port in Lebanon, which will be operational in April 2016. The company also marked another milestone in 2014 with its expansion to the US by signing a long-term agreement to operate the container and multi-cargo terminal at Port Canaveral in Florida.
With a current handling activity of over 6 million TEUs, the company has set an ambitious target to triple the volume over the next decade through organic growth across existing businesses, exploring green field opportunities and potential M&A activities.
In the previous blog post about SolrCloud we talked about what happens when the ZooKeeper connection fails and how Solr handles that situation. However, we only talked about the query-time behavior of SolrCloud, and we said that we would get back to the topic of indexing in the future. That future is finally here: let's see what happens to indexing when the ZooKeeper connection is not available.

Looking back at the old post

In the SolrCloud – what happens when ZooKeeper fails? blog post, we showed that Solr can handle querying without any issues when the connection to ZooKeeper has been lost (which can be caused by different reasons). Of course this is true until we change the cluster topology. Unfortunately, in the case of indexing or cluster change operations, we can't change the cluster state or index documents when the ZooKeeper connection is not working or ZooKeeper failed to read/write the data we want.

Why can we run queries?

The situation is quite simple: querying is not an operation that needs to alter the SolrCloud cluster state. The only thing Solr needs to do is accept the query, run it against known shards/replicas and gather the results. Of course the cluster topology is not retrieved with each query, so when there is no active ZooKeeper connection (or ZooKeeper failed) we don't have a problem with running queries.

There is also one important and not widely known feature of SolrCloud: the ability to return partial results. By adding the shards.tolerant=true parameter to our queries we inform Solr that we can live with partial results and it should ignore shards that are not available. This means that Solr will return results even if some of the shards from our collection are not available. By default, when this parameter is not present or is set to false, Solr will just return an error when running a query against a collection that doesn't have all of its shards available.

Why can't we index data?

So, why can't we index data when the ZooKeeper connection is not available or when ZooKeeper doesn't have a quorum? Because there is potentially not enough information about the cluster state to process the indexing operation. Solr may just not have fresh information about all the shards, replicas, etc. Because of that, an indexing operation may be pointed to an incorrect shard (such as one that is not the current leader), which can lead to data corruption. And because of that, indexing (or cluster change) operations are just not possible. It is generally worth remembering that all operations that can lead to a cluster state update or collections update won't be possible when the ZooKeeper quorum is not visible to Solr (in our test case, that will be a lack of connectivity to a single ZooKeeper server).

Of course, we could leave you with what we wrote above, but let's check if all that is true.

Running ZooKeeper

A very simple step. For the purpose of the test we will only need a single ZooKeeper instance, which is run using the following command from the ZooKeeper installation directory:

    bin/zkServer.sh start

We should see the following information on the console:

    JMX enabled by default
    Using config: /users/gro/solry/zookeeper/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED

And that means that we have a running ZooKeeper server.

Starting two Solr instances

To run the test we used the newest available Solr version: 5.2.1 when this blog post was published.
To run two Solr instances we used the following command:

    bin/solr start -e cloud -z localhost:2181

Solr asked us a few questions when it was starting, and the answers were the following:

    number of instances: 2
    collection name: gettingstarted
    number of shards: 2
    replication count: 1
    configuration name: data_driven_schema_configs

The cluster topology after Solr started was as follows: two nodes, each hosting one shard of the gettingstarted collection.

Let's index a few documents

To see that Solr is really running, we indexed a few documents by running the following command:

    bin/post -c gettingstarted docs/

If everything went well, after running the following command:

    curl -XGET 'localhost:8983/solr/gettingstarted/select?indent=true&q=*:*&rows=0'

we should see Solr respond with an XML response whose header reports status 0 and QTime 38, echoes the query parameters (q=*:*, indent=true, rows=0), and whose result element reports the number of indexed documents. We've indexed our documents, and we have Solr running.

Let's stop ZooKeeper and index data

To stop the ZooKeeper server we will just run the following command in the ZooKeeper installation directory:

    bin/zkServer.sh stop

And now, let's again try to index our data:

    bin/post -c gettingstarted docs/

This time, instead of the data being written into the collection, we will get an error response similar to the following one:

    POSTing file index.html (text/html) to [base]/extract
    SimplePostTool: WARNING: Solr returned an error #503 (Service Unavailable) for url: http://localhost:8983/solr/gettingstarted/update/extract?resource.name=%2fusers%2fgro%2fsolry%2f5.2.1%2fdocs%2findex.html&literal.id=%2fusers%2fgro%2fsolry%2f5.2.1%2fdocs%2findex.html

The response body that follows reports status 503 with the message "Cannot talk to ZooKeeper - Updates are disabled."

As we can see, the lack of ZooKeeper connectivity resulted in Solr not being able to index data. Of course querying still works. Turning ZooKeeper on again and retrying the indexing will be successful, because Solr will automatically reconnect to ZooKeeper and start working again.

Short summary

Of course, this and the previous blog post related to ZooKeeper and SolrCloud only touch the surface of what happens when the ZooKeeper connection is not available. A very good test that shows data consistency related information can be found at http://lucidworks.com/blog/call-maybe-solrcloud-jepsen-flaky-networks/ . I really recommend it if you would like to know what will happen with SolrCloud in various emergency situations.
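As a follow-up to the shards.tolerant parameter mentioned earlier, here is a small sketch that is not from the original post: it queries the gettingstarted collection with and without shards.tolerant=true using the Python requests library, assuming the same localhost:8983 test setup described above.

    # Not from the original post: a small check of the shards.tolerant behaviour
    # described above, using the Python requests library against the test
    # collection started earlier (localhost:8983, collection "gettingstarted").
    import requests

    BASE = "http://localhost:8983/solr/gettingstarted/select"

    def query(tolerant: bool):
        params = {
            "q": "*:*",
            "rows": 0,
            "wt": "json",
            # When a shard is down, tolerant=true asks Solr to return whatever
            # results it can instead of failing the whole request.
            "shards.tolerant": "true" if tolerant else "false",
        }
        resp = requests.get(BASE, params=params)
        body = resp.json()
        print(f"tolerant={tolerant} -> HTTP {resp.status_code}, "
              f"numFound={body.get('response', {}).get('numFound')}")

    if __name__ == "__main__":
        # With all shards up both calls succeed; stop one Solr node and only the
        # tolerant query should keep returning (partial) results.
        query(tolerant=False)
        query(tolerant=True)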