Over a million developers have joined DZone.

Titan Server: From a Single Server to a Highly Available Cluster

DZone's Guide to

Titan Server: From a Single Server to a Highly Available Cluster

· Big Data Zone
Free Resource

Access NoSQL and Big Data through SQL using standard drivers (ODBC, JDBC, ADO.NET). Free Download 


Titan Growth Titan is a distributed graph database capable of storing graphs on the order of hundreds of billions of edges while, at the same time, supporting billions of real-time graph traversals a day. For most graph applications, the high-end performance aspects of Titan will never be reached. This does not mean that Titan is unsuitable for graph applications at the smaller scale — in the billions of edges and below. The purpose of this post is to introduce Titan from the perspective of a team of engineers developing a new graph-based application. These engineers will initially develop and test their codebase using a single Titan Server. When the application matures and is ready for production use, a highly-available setup is deployed. Finally, as the application becomes more popular and the data size and transactional load increases, a fully distributed cluster is leveraged. Growing a Titan database from a single server to a cluster is simply a matter of configuration. In this way, Titan gracefully scales to accommodate the changing requirements of a graph application.

Titan Single Server

Titan Single MachineAugustus and Tiberius are two software engineers who have designed an application that represents the Gods and people of Rome within a graph of familial relationships — a genealogy application. The intention is that Roman scholars will use the application to better understand the social fabric of their great Empire. While the intention is single-user, the two engineers decide to leverage Titan as the backend graph database. For one, Titan is completely free for any use (Apache 2 licensed) and two, it supports both single server and distributed deployments. The latter is important to them because the Greek Oracle of Delphi foretold that a genealogy graph would one day be used online by everyone throughout the Roman Empire.

$ wget http://s3.thinkaurelius.com/downloads/titan/titan-cassandra-0.3.0.zip
$ unzip titan-cassandra-0.3.0.zip
$ cd titan-cassandra-0.3.0
$ sudo bin/titan.sh config/titan-server-rexster.xml config/titan-server-cassandra.properties
13/03/27 12:40:32 INFO service.CassandraDaemon: JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_12-ea
13/03/27 12:40:32 INFO service.CassandraDaemon: Heap size: 40566784/477233152
13/03/27 12:40:32 INFO config.DatabaseDescriptor: Loading settings from file:/Users/marko/software/aurelius/titan/config/cassandra.yaml
13/03/27 12:40:32 INFO config.DatabaseDescriptor: Global memtable threshold is enabled at 151MB
13/03/27 12:40:32 INFO service.CacheService: Initializing key cache with capacity of 2 MBs.
13/03/27 12:40:35 INFO server.RexProRexsterServer: RexPro serving on port: [8184]
13/03/27 12:40:35 INFO server.HttpRexsterServer: Rexster Server running on: [http://localhost:8182]
13/03/27 12:40:35 INFO server.ShutdownManager: Bound shutdown socket to / Starting listener thread for shutdown requests.
    Users without wget can use curl -O or download from the Titan download page. 

The above sequence of 4 shell commands downloads and starts up a Titan Server on the localhost. Titan Server embeds both Cassandra and (a lightweight version of) Rexster within the same JVM. Titan Server exposes the following language-agnostic endpoints for developers to communicate with the graph:

  1. A RESTful endpoint available at http://localhost:8182/graphs.
  2. A RexPro binary protocol endpoint available on port 8184.

Titan HTTP/RexPro

Titan Server is configured via two primary files: titan-server-rexster.xml (shown below) and cassandra.yaml (discussed in the next section). These files are located in the config/ directory of the titan-cassandra-x.y.z distribution.


NOTE: Along with the above endpoints, Titan Server also exposes a JVM native serialization interface that can be used by JVM languages. This interface, for example, is the means by which Faunus/Hadoop interacts with Titan Server for global graph analytics. For more information on this endpoint, see Using Cassandra.

Titan Highly Available

The genealogy application was showing promise as a single-user system for studying the genetic history of the Roman people and Gods. Due to the positive response, Augustus and Tiberius decide that a multi-user online genealogy service would be a successful product.

 how many siblings did jupiter have?
// who is caesar's grandmother?
// who are marcus' children's in-laws?

As it currently stands, the genealogy data set is approximately 1 billion edges. Therefore, it can be stored and processed on a single machine. As a single-user application a single Titan Server suffices. However, with multiple users, it is important that the system is robust and can serve numerous concurrent requests. If the application is only backed by a single server, then if that server goes down, the application is unusable. To ensure 1.) no single point of failure and 2.) support for more transactions per second, Augustus and Tiberius deploy 3 machines each with a Titan Server installed.

Titan Highly Available The team updates the default config/cassandra.yaml file of each Titan Server by changing the localhost property value to be the IP address of the machine and adding a seed IP address for discoverability (see Multinode Cluster). Next, they start each Titan Server one after the other (titan.sh). To ensure that the servers properly clustered together, they use Cassandra’s nodetool.

apache-cassandra-1.2.3$ bin/nodetool ring
Datacenter: datacenter1
Replicas: 1
Address         Rack        Status State   Load            Owns                Token
                                                                               57715295010532946864463892271081778854    rack1       Up     Normal  93.06 KB        49.28%              141555886663081320436455748965948652071  rack1       Up     Normal  59.73 KB        33.44%              28311611028231080169766921879398209884    rack1       Up     Normal  9.43 KB         17.28%              57715295010532946864463892271081778854

Finally, on one of the servers, the cassandra-cli tool is used to update the replication factor of the titan-keyspace.

apache-cassandra-1.2.3$ bin/cassandra-cli -h
[default@unknown] update keyspace titan with strategy_options = {replication_factor:3};
[default@unknown] show schema titan;
create keyspace titan
  with placement_strategy = 'SimpleStrategy'
  and strategy_options = {replication_factor : 3}
  and durable_writes = true;

With a replication factor of 3, each of the 3 Titan Servers are the primary host of approximately one-third of the vertices in the graph while, at the same time, each maintains a replica of the primary data of the other two servers. In this way, a highly-available, master-master setup is rendered. With this model, there is no single point of failure. If one of the database machines goes down, the other two are able to serve the primary data of the dead machine. If two of the machines go down, the remaining machine can serve data — albeit not with the same throughput possible when all three machines are available. With full master-master replication, the graph is duplicated and each server can support both reads and writes to the graph.

Titan Clustered

Roman Forum The following summer, the prophecy of the Oracle of Delphi comes true. An announcement is made in the Roman forum about the utility of the online genealogy application. Immediately, the plebeians of Rome join the site. They feverishly add their family histories and traverse the graph to learn more about their genetic past. This spike in usage puts an excessive amount of strain on the servers. With so many concurrent users, the three server machines have their CPU and disk I/O peaked trying to process requests.

Titan Clustered To remedy the situation, 6 more Titan Server machines are added to the cluster for a total of 9 machines. The token ring is rebalanced to ensure that each server maintains a relatively equal amount of the graph. A perfect/fair partition of 2^128 into 9 parts is below (see token ring calculator).


Each machine has its token updated using the following nodetool command. By repartitioning the token ring, the 3 original servers transfer their data to the newly on-boarded servers in order to distributed the data load as specified by their location in the 128-bit token space (each vertex hashes to a particular 128-bit token).

apache-cassandra-1.2.3$ bin/nodetool -h move 0
apache-cassandra-1.2.3$ bin/nodetool -h move 18904575940052135809661593108510408704
apache-cassandra-1.2.3$ bin/nodetool -h move 37809151880104271619323186217020817408

Token Ring Partition

With the replication factor still set to 3, each server does not maintain a full replica of the graph. Instead, each server only replicates a third of the full graph (3/9). At this point, no single server has a full picture of the graph. However, because there are more servers, more transactions can be served and more data can be stored. Augustus and Tiberius have successfully grown their single-user graph application to a distributed system that stores and processes a massive genealogy graph represented across a cluster of Titan Server machines.


Titan Head Titan was developed from the outset to support OLTP distributed graph storage and processing. While it is important that a graph database can scale indefinitely, less than 1% of applications written today will ever leverage near trillion edge graphs. The other 99% of applications will store and process million and billion edge graphs. Titan is able to meet the requirements of both segments of the graph application space. Furthermore, Titan scales gracefully as developers move from a single server prototype, to a highly-available production system, to ultimately, a fully distributed cluster sustaining the size and workload requirements seen by 1% of applications.


Stephen Mallette and Blake Eggleston are the developers of Rexster’s RexPro. Their efforts were a driving force behind the development of Titan Server.

The fastest databases need the fastest drivers - learn how you can leverage CData Drivers for high performance NoSQL & Big Data Access.


Published at DZone with permission of Marko Rodriguez, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}