Setting up a Neo4j Cluster on Amazon

There are multiple ways to set up a Neo4j cluster on Amazon Web Services (AWS), and I want to show you one way to do it.


  1. Create a VPC
  2. Launch 1 Instance
  3. Install Neo4j HA
  4. Clone 2 Instances
  5. Configure the Instances
  6. Start the Coordinators
  7. Start the Neo4j Cluster
  8. Create 2 Load Balancers
  9. Next Steps

We’ll start off by logging on to Amazon Web Services and creating a Virtual Private Cloud:

We’ll create a VPC with a Single Public Subnet Only (but you may choose a VPC with Public and Private Subnets if you’d like).

Once our VPC is up and running, we can create instances by heading over to EC2.

You may choose any Amazon Machine Image you’d like, but I’m going to go with Ubuntu 12.04 LTS.

You can pick any Instance Type you want, depending on how large your data is. I’ll just create a small instance type, but make sure to launch it in the VPC.

I’ll choose the private IP for this instance myself. I will create 2 more instances later, which will be 152 and 153.

I will create a new Key Pair, call it “ha_cluster” and download it to my machine.

I’ll create a new security group called “neo4j_ha_cluster” and allow all traffic within the VPC subnet.

Zookeeper and Neo4j will communicate only within this network. After our instance is created, we will allocate a new Elastic IP, making sure to select VPC when prompted for where it will be used.

Associate the Elastic IP with the currently running instance.

Open a terminal window and head to the directory where you saved the ha_cluster.pem file earlier. Then change the permissions:

chmod 400 ha_cluster.pem

We will be moving that Elastic IP around a bit, so we’ll use a funny way to connect to the instance via ssh so that the IP doesn’t get tied to any one host’s key in known_hosts. Add your Elastic IP address after ubuntu@.

ssh -i ha_cluster.pem -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no ubuntu@

Once we’ve logged on to the server, let’s update the package lists and install Java.

sudo apt-get update
sudo apt-get install openjdk-6-jdk

Next we’ll download and install Neo4j Enterprise edition and rename the directory to just neo4j.

wget http://dist.neo4j.org/neo4j-enterprise-1.8-unix.tar.gz
tar -xvzf neo4j-enterprise-1.8-unix.tar.gz
mv neo4j-enterprise-1.8 neo4j

We need to do a little tweaking to the upper limit of the number of files we can have open. We can sudo into root and edit the limits.conf file:

sudo su
vi /etc/security/limits.conf

We’ll add two entries setting the file limits to 40000 for the ubuntu user.

ubuntu   soft    nofile  40000
ubuntu   hard    nofile  40000
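
If you would rather not edit the file by hand, the same two entries can be appended from a script. This is only a sketch: add_nofile_limits is a made-up helper name, and it takes the file path as an argument so it can be tried on a scratch file first.

```shell
# Append soft and hard nofile limits for a user to a limits file,
# skipping any line that is already present.
# add_nofile_limits is a hypothetical helper, not part of Neo4j or Ubuntu.
add_nofile_limits() (
  user="$1"; limit="$2"; file="$3"
  for kind in soft hard; do
    line="$user   $kind    nofile  $limit"
    # -x matches the whole line, -F treats the pattern as a literal string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
  done
)

# Example (run as root on the instance):
#   add_nofile_limits ubuntu 40000 /etc/security/limits.conf
```

Running it twice leaves the file unchanged the second time, which makes it safe to keep in a provisioning script.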

Reboot the instance. While waiting for it to come up, we will build (on the local machine) an unmanaged extension created by Chris Gioran that lets Neo4j work with the Amazon Load Balancers.

git clone git://github.com/maxdemarzi/ha-rest-master-info.git
cd ha-rest-master-info
mvn package
cd ..

This will create a ha-rest-master-info/target/ha-rest-master-info-0.2.jar file which we want to copy to our instance.

scp -i ha_cluster.pem -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no ha-rest-master-info/target/ha-rest-master-info-0.2.jar ubuntu@

Reconnect to the instance and check that the updated open file limit now returns 40000.

ssh -i ha_cluster.pem -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no ubuntu@
ulimit -n

Also make sure the jar file made it:

ls neo4j/plugins
ha-rest-master-info-0.2.jar  README.txt

Great, now let’s configure Neo4j to use it. Edit neo4j-server.properties and make the following changes:

Set the server’s web address (org.neo4j.server.webserver.address) to the current instance’s IP. You will need to change this to 152 and 153 on the other cluster members.

vi neo4j/conf/neo4j-server.properties

We also need to tell Neo4j to run in High Availability mode by uncommenting the line below:


Finally enable the code in the jar file we added:


We’ll also need to edit the neo4j.properties file:

First, set the server id to 1 (or 2 or 3 on the other instances).

vi neo4j/conf/neo4j.properties
ha.server_id = 1

Second, set the addresses of the 3 coordinators:

Finally, change the ha.server setting to this instance’s address (or 152, 153 on the other instances).

ha.server = 

Let’s set up the 3 coordinators, one for each of the EC2 instances we’ll spin up, by replacing the entries in coord.cfg to match the entries below.

vi neo4j/conf/coord.cfg

Finally, we want to make sure all the instances know about each other, so edit your hosts file and add an entry for each of ip-10-0-0-151, ip-10-0-0-152 and ip-10-0-0-153:

sudo vi /etc/hosts
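
The hosts entries can also be generated rather than typed. The sketch below assumes the hostnames follow the ip-A-B-C-D pattern seen in the names above, built from each instance's private address; hosts_entry is a hypothetical helper name.

```shell
# Print one /etc/hosts line for a private IPv4 address,
# pairing it with a hostname of the form ip-A-B-C-D.
# hosts_entry is a hypothetical helper; the naming pattern is an assumption.
hosts_entry() {
  # tr turns the dots of the address into dashes for the hostname part
  printf '%s ip-%s\n' "$1" "$(printf '%s' "$1" | tr '.' '-')"
}

# Example, with the three private addresses held in variables you set yourself:
#   for ip in "$IP1" "$IP2" "$IP3"; do hosts_entry "$ip"; done | sudo tee -a /etc/hosts
```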

At this point, we want to make two more copies. We could do these steps all over again two more times or, better yet, create an image of the instance and clone it.

It will restart the instance you are on (and log you off), but in a minute or three you should be able to go to the Images > AMIs section and launch a clone of your first instance.

Make sure you launch it inside the VPC and not on EC2.


Launch one at a time so you can specify the IP address of each (the ones ending in 152 and 153).

Select the already existing security group, and let the instances come up.

Make sure you modify the neo4j.properties and neo4j-server.properties files as mentioned above:

vi neo4j/conf/neo4j.properties
ha.server_id=2 (and then 3)
ha.server = (and then 153)
vi neo4j/conf/neo4j-server.properties
org.neo4j.server.webserver.address= (and then 153)
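
Since these are the same few edits on every clone, they can be scripted. The following is a minimal sketch of a helper that sets one key in a properties file. set_prop is a hypothetical name, and it assumes the value contains no `|` or `&` characters, which holds for ids and IP addresses.

```shell
# Set key=value in a Java-style properties file: rewrite the line if the key is
# already there (commented out or not), otherwise append it.
# set_prop is a hypothetical helper, not something shipped with Neo4j.
set_prop() (
  key="$1"; value="$2"; file="$3"
  # Escape dots so they match literally in the regular expressions below
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -Eq "^[[:space:]]*#?[[:space:]]*${pattern}[[:space:]]*=" "$file"; then
    tmp=$(mktemp)
    sed -E "s|^[[:space:]]*#?[[:space:]]*${pattern}[[:space:]]*=.*|${key}=${value}|" "$file" > "$tmp" \
      && cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
)

# Example on the second instance:
#   set_prop ha.server_id 2 neo4j/conf/neo4j.properties
```

The key must be followed by optional spaces and an equals sign to match, so setting ha.server does not touch the ha.server_id line.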

You’ll also want to set the Zookeeper coordinator ids on each instance (the path below is relative to the neo4j directory):

On instance 1:
echo '1' > data/coordinator/myid
On instance 2:
echo '2' > data/coordinator/myid
On instance 3:
echo '3' > data/coordinator/myid
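
In this setup the coordinator id on each instance is the same number as its ha.server_id, so one could derive the myid file from neo4j.properties instead of typing it per instance. A sketch under that assumption, with sync_myid as a hypothetical helper name and the data/coordinator directory assumed to live under the neo4j directory:

```shell
# Copy the ha.server_id value from conf/neo4j.properties into data/coordinator/myid.
# sync_myid is a hypothetical helper; $1 is the path to the neo4j directory.
sync_myid() (
  neo4j_dir="$1"
  props="$neo4j_dir/conf/neo4j.properties"
  # Pick the number from the first uncommented ha.server_id line
  id=$(sed -n -E 's/^[[:space:]]*ha\.server_id[[:space:]]*=[[:space:]]*([0-9]+).*$/\1/p' "$props" | head -n 1)
  [ -n "$id" ] || { echo "ha.server_id not set in $props" >&2; exit 1; }
  mkdir -p "$neo4j_dir/data/coordinator"
  printf '%s\n' "$id" > "$neo4j_dir/data/coordinator/myid"
)

# Example, from the ubuntu user's home directory:
#   sync_myid neo4j
```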

Start the coordinators on all 3 instances:

neo4j/bin/neo4j-coordinator start

Finally start Neo4j on all 3 instances (using the no-wait option):

neo4j/bin/neo4j start-no-wait

Starting Neo4j Server...WARNING: not changing user
process [2213]...Started the server in the background, returning...

Give them a minute to start up and then you can test which one is the master with:
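
One way to script that check is to ask each instance's isMaster endpoint (the same path used for the write load balancer's health check later on) and look at the HTTP status. This sketch assumes only the master answers with 200, which is what the health check relies on; master_status and find_master are hypothetical helper names.

```shell
# Print the HTTP status code returned by a host's isMaster endpoint.
master_status() {
  curl -s -o /dev/null -w '%{http_code}' "http://$1:7474/ha-info/masterinfo/isMaster"
}

# Print the first host whose endpoint answers 200; return 1 if none does.
find_master() {
  for host in "$@"; do
    if [ "$(master_status "$host")" = "200" ]; then
      printf '%s\n' "$host"
      return 0
    fi
  done
  return 1
}

# Example, run from any of the three instances:
#   find_master ip-10-0-0-151 ip-10-0-0-152 ip-10-0-0-153
```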


Now we will set up two load balancers: one where we will point our “reads” and another where we will point our “writes”.

Create a new load balancer, and set the ports to 7474:


For our read load balancer we will point the health check to the root:


Add our VPC subnet:


Also create a new security group that allows just port 7474 to go through:


Then we will create a second load balancer that points just to the Master, using /ha-info/masterinfo/isMaster as our health check path:


Add all three instances to both load balancers:


Give them a few minutes to come up:


Then you can check their status. All three instances should report “In Service” for the read load balancer:


Only one instance should report “In Service” for the write load balancer:


We can test that they are both up and running with:

curl internal-Neo4j-Up-1172790076.us-west-1.elb.amazonaws.com:7474/
  "management" : "http://internal-Neo4j-Up-1172790076.us-west-1.elb.amazonaws.com:7474/db/manage/",
  "data" : "http://internal-Neo4j-Up-1172790076.us-west-1.elb.amazonaws.com:7474/db/data/"

…and to make sure the write load balancer is pointing to the master:

curl internal-Neo4j-Master-864793314.us-west-1.elb.amazonaws.com:7474/ha-info/masterinfo/isMaster

Now you have a Neo4j cluster up and running on a VPC on the Amazon cloud. You can point your Application servers to those load balancers and go from there. If this seemed a little complicated… well it is. Look for a simplified clustering solution in Neo4j version 1.9.

Check out the different configuration options in the Neo4j Documentation, particularly the ha.pull_interval and ha.tx_push_factor settings.

If you run into trouble with Zookeeper, you should reset the Coordinator cluster to a clean state by shutting down all instances, removing the data/coordinator/version-2/* data files on each one, and restarting the Coordinators.
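
The file removal step can be wrapped in a small helper so it is done the same way on every instance. This is a sketch only: clear_coordinator_data is a hypothetical name, it assumes data/coordinator sits under the neo4j directory, and it should be run only after Neo4j and the coordinator have been shut down on that instance.

```shell
# Remove everything inside data/coordinator/version-2 but keep the directory
# itself and the myid file next to it.
# clear_coordinator_data is a hypothetical helper; $1 is the neo4j directory.
clear_coordinator_data() (
  dir="$1/data/coordinator/version-2"
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; exit 1; }
  # -mindepth 1 skips the version-2 directory itself
  find "$dir" -mindepth 1 -delete
)

# Example, from the ubuntu user's home directory:
#   clear_coordinator_data neo4j
```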

Also take a look at Harold Spencer Jr.’s blog post on using Eucalyptus and the DIY on Amazon EC2 guide.

Would love some help making this process easier, so if you have mad Amazon EC2 skills and know how to better automate this, please comment below. Thanks!
