DZone

Spring Data + Solr Cloud + Zookeeper + MongoDB + Ubuntu Integration

This article shows how to set up a clustered MongoDB to store data and a Solr Cloud to index that data in real time. It then shows how to use mongo-connector to replicate data from MongoDB to Solr Cloud. Finally, it walks through a simple Spring Data Solr application that queries Solr Cloud to search the MongoDB data.

By Chuanbao Lu · Jul. 11, 16 · Tutorial

I have several years of experience with Solr 4.x, but Solr 5 and Solr 6 differ considerably from the older versions. So I decided to refresh my knowledge by integrating Solr Cloud 6.0.1, MongoDB 3.2.7, and Spring Data Solr (the version managed by Spring Boot 1.4.0.M3).

My goal is to set up a clustered MongoDB to store the data and a Solr Cloud to index that data in real time. I will use mongo-connector to replicate data from MongoDB to Solr Cloud. Then I will develop a simple Spring Data Solr application that talks to Solr Cloud to search the data in MongoDB.

Please feel free to send me feedback or ask me questions. I hope this article is helpful for your next project.

Install JDK 1.8 If You Don't Already Have It on Ubuntu:

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer

Install MongoDB 3.2.7:

  1. Follow these instructions to install MongoDB on Ubuntu: https://docs.mongodb.com/v3.0/tutorial/install-mongodb-on-ubuntu/

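  2. Edit /etc/mongod.conf to set up a replica set. mongo-connector reads changes from the replica set's oplog, so a replica set is required even with a single server. I installed just one MongoDB instance, but you can install multiple instances that share the same replica set.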
# Where and how to store data.
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true
# where to write logging data.
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1

# replica sets

replication:
  replSetName: rs0
  3. Restart MongoDB:
$ sudo service mongod restart
  4. Run mongo to connect to the mongod server, then initiate the replica set and check its status:
$ mongo

rs.initiate()
rs.status()

You will see something like this:

{
    "set" : "rs0",
    "date" : ISODate("2016-06-14T07:41:49.307Z"),
    "myState" : 1,
    "term" : NumberLong(1),
    "heartbeatIntervalMillis" : NumberLong(2000),
    "members" : [
    {
        "_id" : 0,
        "name" : "ubuntu:27017",
        "health" : 1,
        "state" : 1,
        "stateStr" : "PRIMARY",
        "uptime" : 400763,
        "optime" : {
            "ts" : Timestamp(1465716890, 2227),
            "t" : NumberLong(1)
        },
        "optimeDate" : ISODate("2016-06-12T07:34:50Z"),
        "electionTime" : Timestamp(1465694725, 2),
        "electionDate" : ISODate("2016-06-12T01:25:25Z"),
        "configVersion" : 1,
        "self" : true
    }
    ],
    "ok" : 1
}
  5. Download the sample MongoDB JSON data from here:

https://raw.githubusercontent.com/mongodb/docs-assets/primer-dataset/primer-dataset.json

  6. Import the sample data into MongoDB:
$ mongoimport --db test --collection restaurant --drop --file ~/primer-dataset.json
  7. Run mongo, then run the following to verify that the sample data was imported correctly:
$ mongo

use test
db.restaurant.findOne()
  8. The sample data looks like this:
{
    "_id" : ObjectId("575d1095ae7da76b8fb71b1a"),
    "address" : {
        "building" : "1007",
        "coord" : [
        -73.856077,
        40.848447
        ],
        "street" : "Morris Park Ave",
        "zipcode" : "10462"
    },
    "borough" : "Bronx",
    "cuisine" : "Bakery",
    "grades" : [
    {
        "date" : ISODate("2014-03-03T00:00:00Z"),
        "grade" : "A",
        "score" : 2
    },
    {
        "date" : ISODate("2013-09-11T00:00:00Z"),
        "grade" : "A",
        "score" : 6
    },
    {
        "date" : ISODate("2013-01-24T00:00:00Z"),
        "grade" : "A",
        "score" : 10
    },
    {
        "date" : ISODate("2011-11-23T00:00:00Z"),
        "grade" : "A",
        "score" : 9
    },
    {
        "date" : ISODate("2011-03-10T00:00:00Z"),
        "grade" : "B",
        "score" : 14
    }
    ],
    "name" : "Morris Park Bake Shop",
    "restaurant_id" : "30075445"
}

Install Zookeeper 3.4.6:

  1. Follow the "Setting Up a Single ZooKeeper" instructions here (you can install multiple ZooKeeper instances if you want):

https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble

  2. Start ZooKeeper:
$ /opt/zookeeper-3.4.6/bin/zkServer.sh start    # use stop or restart to stop or restart it
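ZooKeeper keeps serving requests only while a strict majority of the ensemble (its quorum) is up. That rule determines what you gain from running multiple ZooKeeper instances. The plain-Java sketch below is not part of the sample project. It just tabulates that arithmetic, and it shows two things: the single instance used in this article tolerates no failures, and an even-sized ensemble tolerates no more failures than the odd size just below it.

```java
// Sketch: majority-quorum arithmetic for a ZooKeeper ensemble.
// Not part of the article's sample code.
public class ZkQuorum {

    /** Smallest number of servers that forms a strict majority of the ensemble. */
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** How many servers can fail while a majority is still available. */
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.printf("ensemble=%d quorum=%d tolerated failures=%d%n",
                    n, quorumSize(n), toleratedFailures(n));
        }
    }
}
```

For example, a three-server ensemble (quorum of two) survives one server failure, and adding a fourth server does not raise that to two.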

Add Solr Configuration Files to Zookeeper

  1. Download my sample code from: https://github.com/lcbdl/resaurant.git
  2. It's a good idea to track changes to your configuration files in a version control system, so that every time you update the files, you commit them to both the version control system and ZooKeeper.
  3. solrconfig.xml must contain the following:
<schemaFactory class="ClassicIndexSchemaFactory"/>

Solr 6 uses a managed schema by default, but to index MongoDB we need a classic schema.xml. The schemaFactory setting above forces Solr to use it.

You also need the Luke request handler:

<requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />

mongo-connector uses this handler to retrieve the schema from Solr.

  4. schema.xml:
   ......
   <!-- 
MongoDB uses _id as the primary key name.
   -->
   <field name="_id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

   <!-- metadata used by mongo-connector -->
   <field name="_ts" type="long" indexed="true" stored="true" required="true" multiValued="false" />
   <field name="ns" type="string" indexed="true" stored="true" required="true" multiValued="false" />

   <!-- fields for the document -->
   <field name="restaurant_id" type="string" indexed="true" stored="true"/>
   <field name="name" type="text_general" indexed="true" stored="true"/>
   <field name="borough" type="string" indexed="true" stored="true"/>
   <field name="cuisine" type="string" indexed="true" stored="true"/>

   <field name="address.building" type="string" indexed="true" stored="true"/>
   <dynamicField name="address.coord.*" type="float" indexed="false" stored="false"/>
   <field name="address.street" type="string" indexed="true" stored="true"/>
   <field name="address.zipcode" type="string" indexed="true" stored="true"/>

   <dynamicField name="*.date" type="date" indexed="false" stored="false" />
   <dynamicField name="*.grade" type="string" indexed="false" stored="false" />
   <dynamicField name="*.score" type="int" indexed="false" stored="false" />

   <field name="grades.dates" type="date" indexed="true" stored="true" multiValued="true" />
   <field name="grades.grades" type="string" indexed="true" stored="true" multiValued="true" />
   <field name="grades.scores" type="int" indexed="true" stored="true" multiValued="true" />

   .......
   <!-- 
      Specify what field is the unique key in solr
   -->
   <uniqueKey>_id</uniqueKey>

   <copyField source="name" dest="text"/>
   <copyField source="borough" dest="text"/>
   <copyField source="cuisine" dest="text"/>
   <copyField source="address.building" dest="text"/>
   <copyField source="address.street" dest="text"/>
   <copyField source="address.zipcode" dest="text"/>
   <copyField source="address.coord.*" dest="text"/>

   <copyField source="*.date" dest="grades.dates" />
   <copyField source="*.grade" dest="grades.grades" />
   <copyField source="*.score" dest="grades.scores" />

   ......
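The dynamicField and copyField rules above can be modelled in a few lines of plain Java. This is a simplified sketch, not Solr's implementation. It assumes mongo-connector hands Solr flattened field names such as grades.0.score, with array elements keyed by their index; that assumption is consistent with the address.coord.* rule and with the grades.*.date remark in this article. The sketch accepts a pattern only if it has a single asterisk at its start or end. It then shows every *.grade and *.score value landing in the multi-valued grades.grades and grades.scores fields. The per-element fields themselves are declared indexed="false" stored="false", so only the copies are searchable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the dynamicField/copyField routing in schema.xml above.
// Illustration only; this is not how Solr implements it internally.
public class CopyFieldRouting {

    /** A pattern may contain a single '*', and only as its first or last character. */
    static boolean isValidPattern(String pattern) {
        int stars = pattern.length() - pattern.replace("*", "").length();
        return stars == 1 && (pattern.startsWith("*") || pattern.endsWith("*"));
    }

    /** "*.date" matches by suffix, "address.coord.*" by prefix, anything else exactly. */
    static boolean matches(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            return fieldName.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(fieldName);
    }

    /** Collects, in document order, the values of every field matching a copyField source. */
    static List<Object> copy(Map<String, Object> doc, String sourcePattern) {
        List<Object> dest = new ArrayList<>();
        for (Map.Entry<String, Object> field : doc.entrySet()) {
            if (matches(sourcePattern, field.getKey())) {
                dest.add(field.getValue());
            }
        }
        return dest;
    }

    public static void main(String[] args) {
        // Hypothetical flattened document: array elements keyed by their index.
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "Morris Park Bake Shop");
        doc.put("grades.0.grade", "A");
        doc.put("grades.0.score", 2);
        doc.put("grades.1.grade", "A");
        doc.put("grades.1.score", 6);

        System.out.println(isValidPattern("*.score"));        // true
        System.out.println(isValidPattern("grades.*.score")); // false
        System.out.println("grades.grades = " + copy(doc, "*.grade")); // grades.grades = [A, A]
        System.out.println("grades.scores = " + copy(doc, "*.score")); // grades.scores = [2, 6]
    }
}
```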

Mapping arrays of embedded JSON documents in Solr is challenging. For the grades array in the example below, I used:

<dynamicField name="*.date" type="date" indexed="false" stored="false" />

and:

<field name="grades.dates" type="date" indexed="true" stored="true" multiValued="true" />

then:

 <copyField source="*.date" dest="grades.dates" />

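Solr dynamic field names support only a leading or a trailing asterisk; a pattern such as grades.*.date is not supported. That is why each array element's date is matched with *.date and then copied into the multi-valued grades.dates field.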

"grades" : [
{
    "date" : ISODate("2014-03-03T00:00:00Z"),
    "grade" : "A",
    "score" : 2
},
{
    "date" : ISODate("2013-09-11T00:00:00Z"),
    "grade" : "A",
    "score" : 6
}]
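To see where names like grades.0.date come from, here is a plain-Java sketch that flattens a nested document. It joins keys with "." and uses the array index as the key for list elements. This is an assumption about mongo-connector's behaviour, inferred from the address.coord.* rule and the grades.*.date remark above. It is not mongo-connector's code, and the sample dates are kept as strings for simplicity.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch (not mongo-connector's code): flatten nested maps and lists into
// dotted field names, keying list elements by their index.
public class DocFlattener {

    static Map<String, Object> flatten(Map<String, ?> doc) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, ?> e : doc.entrySet()) {
            flattenValue(e.getKey(), e.getValue(), out);
        }
        return out;
    }

    private static void flattenValue(String prefix, Object value, Map<String, Object> out) {
        if (value instanceof Map) {
            // Embedded document: recurse with "parent.child" names.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                flattenValue(prefix + "." + e.getKey(), e.getValue(), out);
            }
        } else if (value instanceof List) {
            // Array: recurse with "parent.<index>" names.
            List<?> list = (List<?>) value;
            for (int i = 0; i < list.size(); i++) {
                flattenValue(prefix + "." + i, list.get(i), out);
            }
        } else {
            out.put(prefix, value);  // scalar leaf
        }
    }

    public static void main(String[] args) {
        // Two entries of the "grades" array from the sample document.
        Map<String, Object> g0 = new LinkedHashMap<>();
        g0.put("date", "2014-03-03T00:00:00Z");
        g0.put("grade", "A");
        g0.put("score", 2);
        Map<String, Object> g1 = new LinkedHashMap<>();
        g1.put("date", "2013-09-11T00:00:00Z");
        g1.put("grade", "A");
        g1.put("score", 6);

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "Morris Park Bake Shop");
        doc.put("address", Collections.singletonMap("coord", Arrays.asList(-73.856077, 40.848447)));
        doc.put("grades", Arrays.asList(g0, g1));

        flatten(doc).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Running it lists names such as address.coord.0, grades.0.date and grades.1.score. The index in the middle of each name is why grades.*.date cannot be expressed as a dynamic field, while *.date and address.coord.* can.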

Install Solr Cloud

  1. Download the solr-6.0.1.tgz file from:

http://www.us.apache.org/dist/lucene/solr/6.0.1/solr-6.0.1.tgz

$ wget http://www.us.apache.org/dist/lucene/solr/6.0.1/solr-6.0.1.tgz
  2. Extract the installation script from the tgz file:
$ tar xzf solr-6.0.1.tgz solr-6.0.1/bin/install_solr_service.sh --strip-components=2
  3. Install two Solr instances:
$ sudo ./install_solr_service.sh solr-6.0.1.tgz -i /opt -d /var/solr1 -u <username> -s solr1 -p 8983
$ sudo ./install_solr_service.sh solr-6.0.1.tgz -i /opt -d /var/solr2 -u <username> -s solr2 -p 8984
  4. Update /opt/solr-6.0.1/bin/solr.in.sh:
$ vi /opt/solr-6.0.1/bin/solr.in.sh

Add the following line to it:

ZK_HOST="localhost:2181"

  5. Start the Solr instances:
$ sudo service solr1 start

$ sudo service solr2 start
  6. Check the status of solr1 and solr2, and make sure the output looks like this:
$ sudo service solr1 status


Found 1 Solr nodes:

Solr process 4922 running on port 8983
{
    "solr_home":"/var/solr1/data",
    "version":"6.0.1 c7510a0fdd93329ec04c853c8557f4a3f2309eaf - sarowe - 2016-05-23 19:40:37",
    "startTime":"2016-06-12T03:34:56.375Z",
    "uptime":"0 days, 21 hours, 37 minutes, 50 seconds",
    "memory":"214.7 MB (%43.8) of 490.7 MB",
    "cloud":{
        "ZooKeeper":"localhost:2181",
        "liveNodes":"2",
        "collections":"0"
    }
}
  7. Create the collection:

Open your browser and go to this URL (replace 192.168.1.151 with your server's address):

http://192.168.1.151:8983/solr/admin/collections?action=CREATE&name=restaurant&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=restaurant
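The CREATE parameters have to fit together. Two shards with a replication factor of two make four cores, and those four cores are spread over the two live Solr nodes, so each node hosts two cores. maxShardsPerNode must therefore be at least 2; Solr 6 rejects the request when maxShardsPerNode × live nodes is smaller than numShards × replicationFactor. The plain-Java sketch below is illustrative and not part of the sample project. It computes that minimum with ceiling division and rebuilds the same URL with URL-encoded parameters. The helper names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: build the Collections API CREATE URL used above and compute the
// smallest maxShardsPerNode that lets Solr place every core.
public class CreateCollectionUrl {

    /** Cores per node needed to place numShards * replicationFactor cores on liveNodes nodes. */
    static int minShardsPerNode(int numShards, int replicationFactor, int liveNodes) {
        int totalCores = numShards * replicationFactor;
        return (totalCores + liveNodes - 1) / liveNodes;  // ceiling division
    }

    static String createUrl(String solrBase, String name, int numShards,
                            int replicationFactor, int maxShardsPerNode, String configName) {
        Map<String, String> params = new LinkedHashMap<>();  // keeps parameter order
        params.put("action", "CREATE");
        params.put("name", name);
        params.put("numShards", String.valueOf(numShards));
        params.put("replicationFactor", String.valueOf(replicationFactor));
        params.put("maxShardsPerNode", String.valueOf(maxShardsPerNode));
        params.put("collection.configName", configName);

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(encode(p.getKey()) + "=" + encode(p.getValue()));
        }
        return solrBase + "/admin/collections?" + query;
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);  // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        int perNode = minShardsPerNode(2, 2, 2);  // 4 cores over 2 nodes -> 2
        // Prints the same URL as in step 7 above.
        System.out.println(createUrl("http://192.168.1.151:8983/solr", "restaurant",
                2, 2, perNode, "restaurant"));
    }
}
```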

Install mongo-connector and Connect Solr and MongoDB

  1. Install pip if you don't already have it on Ubuntu:
$ sudo apt-get update

$ sudo apt-get install python-pip
  2. Install mongo-connector:
$ pip install mongo-connector
  3. Run mongo-connector (you can also set it up as an Ubuntu service that starts at system startup).

mongo-connector first copies all existing data from MongoDB to Solr, and then updates Solr whenever the data in MongoDB changes.

$ mongo-connector --auto-commit-interval=0 -m localhost:27017 -t http://localhost:8983/solr/restaurant -d solr_doc_manager
  4. Check that the data is in Solr by opening the following URL. You should see a result like this:

http://192.168.1.151:8983/solr/restaurant/select?indent=on&q=*:*&wt=json

{
    "responseHeader":{
        "zkConnected":true,
        "status":0,
        "QTime":19,
        "params":{
            "q":"*:*",
            "indent":"on",
            "wt":"json"}},
        "response":{"numFound":25359,"start":0,"maxScore":1.0,"docs":[
        {
            "grades.grades":["A",
            "A",
            "A",
            "B"],
            "restaurant_id":"40396126",
            "grades.scores":[7,
            21,
            7,
            2],
            "borough":"Manhattan",
            "address.street":"West   57 Street",
            "cuisine":"American ",
            "grades.dates":["2013-07-12T00:00:00Z",
            "2012-07-17T00:00:00Z",
            "2012-03-07T00:00:00Z",
            "2014-08-01T00:00:00Z"],
            "address.building":"205",
            "_ts":6295206107744831667,
            "address.zipcode":"10019",
            "ns":"test.restaurant",
            "name":"Europa Cafe",
            "_id":"575d1095ae7da76b8fb71f02",
            "_version_":1536925504605519872},
    ......
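Each document above carries a _ts field written by mongo-connector. Assume mongo-connector packs the oplog's BSON Timestamp into a single long, with seconds in the high 32 bits and the increment in the low 32 bits. The plain-Java sketch below decodes the value under that assumption. The numbers in this article bear the assumption out. The _ts 6295206107744831667 decodes to Timestamp(1465716890, 2227), the same optime that rs.status() reported earlier, which is 2016-06-12T07:34:50Z.

```java
import java.time.Instant;

// Sketch: decode mongo-connector's _ts value.
// Assumption: _ts = (oplog seconds << 32) + oplog increment.
public class ConnectorTimestamp {

    static long seconds(long ts) {
        return ts >>> 32;             // high 32 bits: seconds since the Unix epoch
    }

    static long increment(long ts) {
        return ts & 0xFFFFFFFFL;      // low 32 bits: operation counter within that second
    }

    static long pack(long seconds, long increment) {
        return (seconds << 32) + increment;
    }

    public static void main(String[] args) {
        long ts = 6295206107744831667L;  // "_ts" of the Europa Cafe document above
        System.out.println("Timestamp(" + seconds(ts) + ", " + increment(ts) + ")"); // Timestamp(1465716890, 2227)
        System.out.println(Instant.ofEpochSecond(seconds(ts)));                      // 2016-06-12T07:34:50Z
    }
}
```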

Create a Spring Data Solr Application to Run a Query Against Solr Cloud

  1. Download my sample code to your workspace folder:
$ git clone https://github.com/lcbdl/resaurant.git
  2. Import the Maven project into your Java IDE (I used Spring STS).
  3. Right-click RestaurantRepositoryTest.java and choose Run As -> JUnit Test.

If everything is set up correctly, the test passes.

Key Components in the Code

You can find my full sample code here: https://github.com/lcbdl/resaurant

  1. Model class Restaurant.java:
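import org.apache.solr.client.solrj.beans.Field;
import org.springframework.data.annotation.Id;
import org.springframework.data.solr.core.mapping.SolrDocument;

@SolrDocument(solrCoreName="restaurant")  // Solr collection name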
public class Restaurant {

    @Field("_id")                         // Specify field name in solr
    @Id                                   // This is required
    private String id;

    @Field("name")
    private String name;

    @Field("restaurant_id")
    private String restaurantId;

    @Field("borough")
    private String borough;

      ......
  2. Repository interface RestaurantRepository.java:
import java.util.List;

import org.springframework.data.solr.repository.SolrCrudRepository;

public interface RestaurantRepository extends SolrCrudRepository<Restaurant, String> {

    List<Restaurant> findByName(String name);

}
  3. Spring Boot main application class:
@SpringBootApplication
public class RestaurantApplication extends SpringBootServletInitializer {

    @Override
    protected SpringApplicationBuilder configure(SpringApplicationBuilder application) {
        return application.sources(RestaurantApplication.class);
    }

    public static void main(String[] args) {
        SpringApplication.run(RestaurantApplication.class, args);
    }
}
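  4. SpringSolrConfig.java:
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.solr.core.SolrTemplate;
import org.springframework.data.solr.repository.config.EnableSolrRepositories;

@Configuration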
@EnableSolrRepositories(basePackages = { "ca.knc.restaurant.repository.solr" }, multicoreSupport = true)
public class SpringSolrConfig {

    @Value("${spring.data.solr.zk-host}")
    private String zkHost;

    @Bean
    public SolrClient solrClient() {
        return new CloudSolrClient(zkHost);
    }

    @Bean
    public SolrTemplate solrTemplate(SolrClient solrClient) throws Exception {
        return new SolrTemplate(solrClient);
    }

}
  5. application.properties:
# SOLR (SolrProperties)
spring.data.solr.zk-host=192.168.1.151:2181
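  6. RestaurantRepositoryTest.java:
import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertTrue;

import java.util.List;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

@RunWith(SpringJUnit4ClassRunner.class)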
@SpringBootTest(classes = RestaurantApplication.class)
public class RestaurantRepositoryTest {

    @Autowired
    private RestaurantRepository restaurantRepository;

    @Test
    public void findByNameTest() {
        List<Restaurant> restaurants = restaurantRepository.findByName("Morris Park Bake Shop");
        assertNotNull(restaurants);
        assertTrue(restaurants.size() > 0);
        for (Restaurant r : restaurants) {
            System.out.println(r.toString());
        }
    }

}

Opinions expressed by DZone contributors are their own.
