
SolrCloud HOWTO

by Rafał Kuć · Mar. 15, 13 · Interview

What is the most important change in the 4.x versions of Apache Solr? There are many, but SolrCloud is definitely the one that changed the most in Solr's architecture. Until now, bigger installations suffered from a single point of failure (SPOF): there was only one master server, and when it went down, the whole cluster lost the ability to receive new data. Of course you could go for multiple masters, where each master was responsible for indexing some part of the data, but a SPOF was still present in your deployment. Even when everything worked, the commit interval and the fact that slave instances checked for new data only periodically meant the solution was far from ideal: new data appeared in the cluster minutes after the commit.

SolrCloud changed this behavior. In this article we will set up a new SolrCloud cluster from scratch and see how it works.

Our example cluster

In our example we will use three Solr servers. Every server in the cluster is capable of handling both index and query requests. This is the main difference from the old-fashioned Solr architecture with a single master and multiple slave servers. The new architecture has one additional element: ZooKeeper, which is responsible for holding the configuration of the cluster and for synchronizing its work. It is crucial to understand that Solr relies on the information stored in ZooKeeper: if ZooKeeper fails, the whole cluster is useless. That is why a fault-tolerant ZooKeeper ensemble is so important, and why we use three independent ZooKeeper instances to form it.
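The choice of three instances follows from how a ZooKeeper ensemble stays available: it keeps working only while a majority of its servers are alive. A minimal sketch of that arithmetic (the function names are purely illustrative):

```shell
#!/bin/sh
# Majority quorum for a ZooKeeper ensemble of n servers:
#   quorum    = n / 2 + 1   (integer division)
#   tolerated = n - quorum  (servers that may fail while the ensemble keeps working)
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 5; do
    echo "ensemble=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Two servers tolerate no more failures than one, which is why three is the smallest ensemble that survives the loss of a single server.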

ZooKeeper installation

As we said previously, ZooKeeper is a vital part of a SolrCloud cluster. Although we can use the embedded ZooKeeper, this is only handy for testing. For production you definitely want ZooKeeper installed independently from Solr and running in a separate Java virtual machine process, so that the two do not interrupt each other or influence each other's work.

The installation of Apache ZooKeeper is straightforward and can be described by the following steps:

  1. Download the ZooKeeper archive from: http://www.apache.org/dyn/closer.cgi/zookeeper/
  2. Unpack the downloaded archive and copy conf/zoo_sample.cfg to conf/zoo.cfg
  3. Modify zoo.cfg:
    1. change dataDir to the directory where you want to hold all cluster configuration data
    2. add information about all ZooKeeper servers (see below)

After these changes my zoo.cfg looks like this:

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/zookeeper/data
    clientPort=2181
    server.1=zk1:2888:3888
    server.2=zk2:2888:3888
    server.3=zk3:2888:3888

  4. Copy the unpacked and configured archive to all servers where the ZooKeeper service should run
  5. Create the file /var/zookeeper/data/myid containing the server identifier. The identifier is different for each instance (for example, on zk2 this file should contain the number 2)
  6. Start all instances using “bin/zkServer.sh start-foreground” and verify that the installation works
  7. Add “bin/zkServer.sh start” to the startup scripts and make sure that the operating system monitors that the ZooKeeper service is available.
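The configuration steps above can be sketched as a small script that writes zoo.cfg and myid for one ensemble member. This is only a sketch: ZK_ID, ZK_HOSTS, WORK, ZK_HOME and ZK_DATA are hypothetical variables, and the defaults point at a scratch directory so that running it changes nothing on the system. Note that ZooKeeper's configuration keys are case-sensitive (tickTime, initLimit, syncLimit, dataDir, clientPort).

```shell
#!/bin/sh
# Sketch: generate zoo.cfg and myid for one ZooKeeper ensemble member.
# All variable names below are illustrative assumptions, not ZooKeeper settings.
set -e

ZK_ID="${ZK_ID:-2}"                      # identifier of this server (1, 2 or 3)
ZK_HOSTS="${ZK_HOSTS:-zk1 zk2 zk3}"      # ensemble members, in identifier order
WORK="${WORK:-$(mktemp -d)}"             # scratch directory used by the defaults
ZK_HOME="${ZK_HOME:-$WORK/zookeeper}"    # unpacked ZooKeeper directory
ZK_DATA="${ZK_DATA:-$WORK/data}"         # the article uses /var/zookeeper/data

mkdir -p "$ZK_HOME/conf" "$ZK_DATA"

# Write the same settings as in the article, one server.N line per host
{
    echo "tickTime=2000"
    echo "initLimit=10"
    echo "syncLimit=5"
    echo "dataDir=$ZK_DATA"
    echo "clientPort=2181"
    id=1
    for host in $ZK_HOSTS; do
        echo "server.$id=$host:2888:3888"
        id=$((id + 1))
    done
} > "$ZK_HOME/conf/zoo.cfg"

# myid holds nothing but this server's number
echo "$ZK_ID" > "$ZK_DATA/myid"

cat "$ZK_HOME/conf/zoo.cfg"
```

On a real server you would run it with something like ZK_ID=1 ZK_HOME=/path/to/zookeeper ZK_DATA=/var/zookeeper/data. Once the instances are started, `echo ruok | nc zk1 2181` should answer `imok` on ZooKeeper versions where that four-letter command is enabled.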

Solr installation

The installation of Solr consists of the following steps:

  1. Download the Solr archive from: http://www.apache.org/dyn/closer.cgi/lucene/solr/4.1.0
  2. Unpack the downloaded archive
  3. In this tutorial we will use the ready-made Solr installation from the example directory; all changes are made to this example installation
  4. Copy the archive to all servers that are part of the cluster
  5. Upload to ZooKeeper the configuration data that the Solr cluster will use. To do this, run the first instance with:

    java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=solr1 -DzkHost=zk1:2181 -DnumShards=2 -jar start.jar

This should be run only once. Subsequent runs will use the configuration stored in ZooKeeper, and the local configuration files are no longer needed.

  6. Run all instances using:

    java -DzkHost=zk1:2181 -jar start.jar
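The commands above point Solr at zk1 only. The zkHost property also accepts a comma-separated list of host:port pairs, so Solr can still reach the ensemble when zk1 is down. A minimal sketch that builds such a list and prints the two start commands without running them (ZK_HOSTS and ZK_PORT are illustrative variables, not Solr settings):

```shell
#!/bin/sh
# Sketch: build a zkHost value naming every ensemble member and print the
# Solr start commands. Nothing is started; the commands are only echoed.
ZK_HOSTS="${ZK_HOSTS:-zk1 zk2 zk3}"   # assumed ZooKeeper host names
ZK_PORT="${ZK_PORT:-2181}"            # clientPort from zoo.cfg

# Join the hosts as host:port,host:port,...
zk_connect=""
for host in $ZK_HOSTS; do
    zk_connect="${zk_connect:+$zk_connect,}$host:$ZK_PORT"
done

bootstrap_cmd="java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=solr1 -DzkHost=$zk_connect -DnumShards=2 -jar start.jar"
start_cmd="java -DzkHost=$zk_connect -jar start.jar"

echo "first run, first node only: $bootstrap_cmd"
echo "every later run:            $start_cmd"
```

Note that the system property names are case-sensitive: -DzkHost, -DnumShards and -Dcollection.configName must be spelled exactly like this, with a plain ASCII hyphen in front.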

Verify the installation

Go to the administration panel on any Solr instance. For our deployment the URL should look like http://solr1:8983/solr. When you click the Cloud tab and then Graph, you should see something similar to the following screenshot:

[Screenshot: Cloud graph in the Solr administration panel]
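Besides looking at the graph, a quick functional check is to index a document through one node and read it back through another, since every server handles both indexing and queries. The sketch below assumes a second host named solr2, the example schema's id field, and a hypothetical RUN switch; by default it only prints the curl commands.

```shell
#!/bin/sh
# Sketch: index one document via solr1 and query it via solr2.
# Assumptions: host names solr1/solr2, the example schema's "id" field, and the
# RUN variable (not part of Solr) that decides whether curl is really called.
RUN="${RUN:-0}"
COLLECTION="collection1"

update_url="http://solr1:8983/solr/$COLLECTION/update?commit=true"
select_url="http://solr2:8983/solr/$COLLECTION/select?q=id:howto1&wt=json"
doc='[{"id":"howto1"}]'

if [ "$RUN" = "1" ]; then
    # Send the document as JSON, then search for it on the other node
    curl -s "$update_url" -H 'Content-Type: application/json' -d "$doc"
    curl -s "$select_url"
else
    # Dry run: show what would be executed
    echo "curl -s '$update_url' -H 'Content-Type: application/json' -d '$doc'"
    echo "curl -s '$select_url'"
fi
```

Run it with RUN=1 against a live cluster; if the cluster is healthy, the second response should contain the document that was sent to the first node.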

Collection

Our first collection, collection1, is divided into two shards (shard1 and shard2). Each of those shards is placed on two Solr instances (OK, in the picture you can see that every Solr instance is placed on the same host: I currently have only one physical server available for tests. Any volunteers for a donation? ;) ). You can see that the type of the dot tells us whether it is a primary shard or a replica.
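How instances end up as primaries or replicas can be pictured with a simplified model. It assumes, as the Solr 4.x documentation describes for a cluster bootstrapped with numShards, that the first instances to start each take a new shard and later instances attach to the existing shards as replicas. The node names are hypothetical and this is not Solr's actual code.

```shell
#!/bin/sh
# Simplified model (not Solr code): the first NUM_SHARDS nodes to start each
# take a new shard as its primary; every later node becomes a replica, cycling
# through the shards in order.
NUM_SHARDS=2
NODES="node1 node2 node3 node4"   # hypothetical instance names, in start order

assign_shards() {
    i=0
    for node in $NODES; do
        shard=$(( i % NUM_SHARDS + 1 ))
        if [ "$i" -lt "$NUM_SHARDS" ]; then
            role="primary"
        else
            role="replica"
        fi
        echo "$node shard$shard $role"
        i=$(( i + 1 ))
    done
}

assign_shards
```

With two shards and four instances, the model gives each shard one primary and one replica, which matches the layout described above.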

Summary

I hope this is only the first note about SolrCloud. I know it is very short and skips details about shards, replicas, and the architecture of this solution. Treat it as a simple checklist for a basic (but real) configuration of your cloud.


Published at DZone with permission of Rafał Kuć, DZone MVB. See the original article here.

