SolrCloud: What Happens When ZooKeeper Fails – Part Two

By Rafał Kuć · Jul. 02, 15


In the previous blog post about SolrCloud, we talked about what happens when the ZooKeeper connection fails and how Solr handles that situation. However, we only covered the query-time behavior of SolrCloud and said that we would get back to the topic of indexing in the future. That future is finally here: let’s see what happens to indexing when the ZooKeeper connection is not available.

Looking back at the old post

In the “SolrCloud – What Happens When ZooKeeper Fails?” blog post, we showed that Solr can handle querying without any issues when the connection to ZooKeeper has been lost (which can happen for various reasons). Of course, this holds only as long as the cluster topology does not change. Indexing and cluster change operations are a different story: we can’t change the cluster state or index documents when the ZooKeeper connection is not working or ZooKeeper fails to read or write the data we want.

Why can we run queries?

The situation is quite simple: querying is not an operation that needs to alter the SolrCloud cluster state. The only thing Solr needs to do is accept the query, run it against the known shards/replicas, and gather the results. The cluster topology is not retrieved with each query, so when there is no active ZooKeeper connection (or ZooKeeper has failed) we don’t have a problem running queries.

There is also one important and not widely known feature of SolrCloud: the ability to return partial results. By adding the shards.tolerant=true parameter to our queries, we inform Solr that we can live with partial results and that it should ignore shards that are not available. This means that Solr will return results even if some of the shards in our collection are not available. By default, when this parameter is not present or is set to false, Solr will just return an error when running a query against a collection that doesn’t have all of its shards available.
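As a small illustration of the parameter described above, here is a minimal Python sketch that builds a query URL with shards.tolerant set. The helper name `select_url` and the default base URL are assumptions made for this example; only the `shards.tolerant` parameter and the `/select` handler come from the text.

```python
from urllib.parse import urlencode


def select_url(collection, query="*:*", tolerant=True,
               base="http://localhost:8983/solr"):
    """Build a /select URL; `base` and this helper are illustrative assumptions."""
    params = {
        "q": query,
        "rows": 0,
        # Ask Solr to skip unavailable shards instead of failing the query.
        "shards.tolerant": "true" if tolerant else "false",
    }
    return f"{base}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(select_url("gettingstarted"))
```

The resulting URL can be passed to curl or any HTTP client in the same way as the plain query used later in this post.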

Why can’t we index data?

So, why can’t we index data when the ZooKeeper connection is not available or when ZooKeeper doesn’t have a quorum? Because there is potentially not enough information about the cluster state to process the indexing operation. Solr may simply not have fresh information about all the shards, replicas, and so on. Because of that, an indexing operation could be sent to the wrong shard (for example, not to the current leader), which can lead to data corruption. That is why indexing (or cluster change) operations are just not possible.

It is generally worth remembering that all operations that can lead to a cluster state update or a collection update won’t be possible when the ZooKeeper quorum is not visible to Solr (in our test case, that means losing connectivity to a single ZooKeeper server).
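The rule above can be summed up in a toy model. This is not Solr’s code, just a sketch of the behavior described in this post: reads are served from the cluster state the node already knows, while anything that writes is refused with a 503 when ZooKeeper is not reachable. The class name, operation names, and status codes for the success case are assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical operation names used only for this illustration.
READ_OPS = {"query"}
WRITE_OPS = {"index", "delete", "create_collection"}


@dataclass
class ToyNode:
    """Toy stand-in for a Solr node; not the real implementation."""
    zk_connected: bool

    def handle(self, operation: str) -> int:
        """Return an HTTP-like status code for the requested operation."""
        if operation in READ_OPS:
            # Queries only need the already known shards and replicas.
            return 200
        if operation in WRITE_OPS:
            # Updates need fresh cluster state, so they are refused without ZooKeeper.
            return 200 if self.zk_connected else 503
        raise ValueError(f"unknown operation: {operation}")


if __name__ == "__main__":
    node = ToyNode(zk_connected=False)
    print(node.handle("query"), node.handle("index"))
```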

Of course, we could leave you with what we wrote above, but let’s check whether all of that is true.

Running ZooKeeper

A very simple step. For the purpose of the test we only need a single ZooKeeper instance, which is started using the following command from the ZooKeeper installation directory:

bin/zkServer.sh start

We should see the following information on the console:

JMX enabled by default
Using config: /users/gro/solry/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

And that means that we have a running ZooKeeper server.
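If you would rather verify this from a script than by reading the console, one option is ZooKeeper’s `ruok` four-letter command, which a healthy server answers with `imok`. The sketch below assumes ZooKeeper listens on localhost:2181 as in this test and that four-letter commands are enabled (newer ZooKeeper releases require whitelisting them); the function name is made up for this example.

```python
import socket


def zookeeper_is_ok(host="localhost", port=2181, timeout=2.0):
    """Send 'ruok' and report whether the server answered 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            # Read until the server closes the connection or 4 bytes arrive.
            while len(reply) < 4:
                chunk = sock.recv(4 - len(reply))
                if not chunk:
                    break
                reply += chunk
    except OSError:
        # Connection refused, timeout, or any other socket-level failure.
        return False
    return reply == b"imok"


if __name__ == "__main__":
    print("ZooKeeper reachable:", zookeeper_is_ok())
```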

Starting two Solr instances

To run the test we used the newest Solr version available when this blog post was published: 5.2.1. To run two Solr instances we used the following command:

bin/solr start -e cloud -z localhost:2181

Solr asked us a few questions when it was starting and the answers were the following:

  • Number of instances: 2
  • Collection name: gettingstarted
  • Number of shards: 2
  • Replication count: 1
  • Configuration name: data_driven_schema_configs

The cluster topology after Solr started was as follows:

[Screenshot of the cluster topology, taken 2015-06-21 at 11.13.31]
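The same topology can also be read programmatically through the Collections API CLUSTERSTATUS action. The following Python sketch fetches the status as JSON and flattens it into one row per replica. The base URL and the helper names are assumptions for this example, and the sample data in the `__main__` block is invented to show the shape of the response; it is not output captured from the test cluster.

```python
import json
from urllib.request import urlopen


def fetch_cluster_status(base="http://localhost:8983/solr"):
    """Call CLUSTERSTATUS on a running Solr node (base URL is an assumption)."""
    url = f"{base}/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


def summarize_cluster(status, collection):
    """Return (shard, replica, state, is_leader) rows for one collection."""
    rows = []
    shards = status["cluster"]["collections"][collection]["shards"]
    for shard_name, shard in sorted(shards.items()):
        for replica_name, replica in sorted(shard["replicas"].items()):
            is_leader = replica.get("leader") == "true"
            rows.append((shard_name, replica_name, replica["state"], is_leader))
    return rows


if __name__ == "__main__":
    # Invented sample with two shards and one replica each.
    sample = {"cluster": {"collections": {"gettingstarted": {"shards": {
        "shard1": {"replicas": {"core_node1": {"state": "active", "leader": "true"}}},
        "shard2": {"replicas": {"core_node2": {"state": "active", "leader": "true"}}},
    }}}}}
    for row in summarize_cluster(sample, "gettingstarted"):
        print(row)
```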

Let’s index a few documents

To see that Solr is really running, we indexed a few documents by running the following command:

bin/post -c gettingstarted docs/

If everything went well, after running the following command:

curl -XGET 'localhost:8983/solr/gettingstarted/select?indent=true&q=*:*&rows=0'

we should see Solr responding with XML similar to this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">38</int>
  <lst name="params">
   <str name="q">*:*</str>
   <str name="indent">true</str>
   <str name="rows">0</str>
  </lst>
 </lst>
 <result name="response" numFound="3577" start="0" maxScore="1.0">
 </result>
</response>
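To check such a response from a script instead of by eye, the status and the document count can be pulled out with Python’s standard XML parser. This is a minimal sketch; the function name is an assumption, and the sample is the response shown above.

```python
import xml.etree.ElementTree as ET


def parse_select_response(xml_bytes):
    """Return (status, numFound) from a Solr XML response; numFound may be None."""
    root = ET.fromstring(xml_bytes)
    status_el = root.find("./lst[@name='responseHeader']/int[@name='status']")
    status = int(status_el.text)
    # Error responses carry no <result> element, so guard against None.
    result = root.find("./result[@name='response']")
    num_found = int(result.get("numFound")) if result is not None else None
    return status, num_found


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">38</int>
 </lst>
 <result name="response" numFound="3577" start="0" maxScore="1.0">
 </result>
</response>"""

if __name__ == "__main__":
    print(parse_select_response(SAMPLE))  # prints (0, 3577)
```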

We’ve indexed our documents and we have Solr running.

Let’s stop ZooKeeper and index data

To stop the ZooKeeper server we just run the following command in the ZooKeeper installation directory:

bin/zkServer.sh stop

And now, let’s try to index our data again:

bin/post -c gettingstarted docs/

This time, instead of the data being written into the collection, we get an error response similar to the following one:

POSTing file index.html (text/html) to [base]/extract
SimplePostTool: WARNING: Solr returned an error #503 (Service Unavailable) for url: http://localhost:8983/solr/gettingstarted/update/extract?resource.name=%2fusers%2fgro%2fsolry%2f5.2.1%2fdocs%2findex.html&literal.id=%2fusers%2fgro%2fsolry%2f5.2.1%2fdocs%2findex.html
SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">503</int><int name="QTime">3</int></lst><lst name="error"><str name="msg">Cannot talk to ZooKeeper - Updates are disabled.</str><int name="code">503</int></lst>
</response>

As we can see, the lack of ZooKeeper connectivity resulted in Solr not being able to index data. Of course, querying still works. After turning ZooKeeper on again, retrying the indexing will succeed, because Solr automatically reconnects to ZooKeeper and starts working again.
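Since the 503 goes away once ZooKeeper is back, an indexing client can simply retry. The sketch below shows one possible retry loop with exponential backoff. It is not part of Solr or its post tool: `send` is a hypothetical callable that performs one update request and returns the HTTP status code, and the attempt count and delays are arbitrary defaults.

```python
import time


def index_with_retry(send, attempts=5, delay=1.0, sleep=time.sleep):
    """Call send() until it returns something other than 503 or attempts run out."""
    for attempt in range(1, attempts + 1):
        status = send()
        if status != 503:
            # Success or a different error: let the caller decide what to do.
            return status
        if attempt < attempts:
            # Updates are disabled for now; wait and double the delay.
            sleep(delay)
            delay *= 2
    return 503


if __name__ == "__main__":
    # Simulated outage: two 503 responses, then ZooKeeper is back.
    responses = iter([503, 503, 200])
    print(index_with_retry(lambda: next(responses), sleep=lambda s: None))
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.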

Short summary

Of course, this and the previous blog post about ZooKeeper and SolrCloud only scratch the surface of what happens when the ZooKeeper connection is not available. A very good test that examines data consistency can be found at http://lucidworks.com/blog/call-maybe-solrcloud-jepsen-flaky-networks/ . I really recommend it if you would like to know what happens with SolrCloud in various emergency situations.


Published at DZone with permission of Rafał Kuć, DZone MVB. See the original article here.
