Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Our Own Multi-Model Database (Part 6)

DZone's Guide to

Our Own Multi-Model Database (Part 6)

As the journey toward a multi-model database continues, we look at indexing data to enhance performance while also examining relationships between nodes.

· Database Zone
Free Resource

Learn how to move from MongoDB to Couchbase Server for consistent high performance in distributed environments at any scale.

shitty6

Back in part two, we ran some JMH tests to see how many empty nodes we could create. (If you want to start even earlier, dive into parts onethreefour, and five.) Let’s try that test one more time, but adding some properties. Our nodes will have a username, an age, and a weight randomly assigned. It’s not a long test, but just enough to give us a ballpark.

@Benchmark
@Warmup(iterations = 10)
@Measurement(iterations = 10)
@Fork(1)
@Threads(4)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public void measureCreateNodeWithProperties() throws IOException {
    HashMap<String, Object> properties = new HashMap<>();
    properties.put("username", "username" + rand.nextInt() );
    properties.put("age", + rand.nextInt(100) );
    properties.put("weight", rand.nextInt(300) );
    db.addNode( String.valueOf(rand.nextInt()), properties);
}


When we run it, we get about 350,000 operations per second. That’s pretty damn nice, but what about reads? Let’s start with a little foreshadowing… In the ChronicleMap readme:

Chronicle Map is not a multimap. Using a ChronicleMap<K, Collection> as multimap is technically possible, but often leads to problems (see a StackOverflow answer for details).

Then on Twitter:

screen-shot-2017-01-22-at-7-47-05-pm

Hum… we already saw in part two that the serializing and deserializing gave me trouble reading. So let's test. We’re going to borrow a stupid test from another multi-model database that is really a fruit but tells everyone they are a vegetable. These guys go around talking about distributing graph databases with billions of edges and run bullshit comparative benchmarks on 1,632,803 nodes and 30M relationships graphs. Anyway, they run an aggregation test where they group count all the nodes by their age property. The equivalent of this:

@Test
public void shouldAggregate() {
    Iterator<Map.Entry<String, HashMap>> iter = db.getAllNodes();
    HashMap<Integer, Integer> ages = new HashMap<>();
    Integer age;
    Long start = System.currentTimeMillis();
    while (iter.hasNext()) {
        Map.Entry<String, HashMap> nodeEntry = iter.next();
        age = (Integer) nodeEntry.getValue().get("age");
        ages.merge(age, 1, Integer::sum);
    }
    Long end = System.currentTimeMillis();
    System.out.println(end-start);
}


So how fast do we complete this test? In about 28 seconds. How fast do they do it in? 1.250 seconds. Damn it, we can’t let some else win. No way. So what do you do when the docs tell you that you are barking up the wrong tree and one of the authors of the library you are using tells you that you done messed up?

toughgoshopping

But what am I shopping for? Let’s go back to our test for a minute, what about if we only wanted to find users between with ages between 35-44? How would we handle that? Iterate and throw away invalid values? No, that doesn’t scale. We need an index. We haven’t even talked about indexing yet. I was going to punt on that until later, but it is later. Alright… so we either need to bake in Lucene or another index engine that was really meant to index documents, not simple properties… or we need to find a Java Collection Library that has indexing built in, doesn’t require serializing and deserializing for access, and is not much slower than ChronicleMap. After praying to the all knowing and all powerful search engine in the cloud for some help, it answered with CQEngine.

CQEngine solves the scalability and latency problems of iteration by making it possible to build indexes on the fields of the objects stored in a collection, and applying algorithms based on the rules of set theory to reduce the time complexity of accessing them.

I can’t believe I never heard of it before. Basically, it’s just what I’m looking for, but we already got burned twice, so how about before we get too far we do some testing first? Good, first thing I gotta do is create a copy of GuancialeDB, called “GuancialeDB2” and start replacing. We’ll need an ObjectLockingIndexedCollection because we want to have unique constraints on the node id and we don’t want two or more threads stepping all over each other.

private static IndexedCollection<PropertyContainer> nodes = new ObjectLockingIndexedCollection<>();
private static IndexedCollection<PropertyContainer> relationships = new ObjectLockingIndexedCollection<>();


We’ll create a class called PropertyContainer that has an id a bunch of properties in a HashMap like before.

public class PropertyContainer {

    public final String id;
    public final HashMap<String, Object> properties;

    public PropertyContainer(String id, HashMap<String, Object> properties) {
        this.id = id;
        this.properties = properties;
    }

    public String getId() {
        return id;
    }

    public static final Attribute<PropertyContainer, String> ID = 
                            attribute("id", PropertyContainer::getId);
...


The last thing we need to do is add the unique index on id to both the nodes and relationships collections in our constructor.
private GuancialeDB2() {
    nodes.addIndex(UniqueIndex.onAttribute(PropertyContainer.ID));
    relationships.addIndex(UniqueIndex.onAttribute(PropertyContainer.ID));
    related = new HashMap<>();
}


There were some minor tweaks to various methods, but it wasn’t very painful. Luckily we have not written much code, and this stuff is contained within just the GuancialeDB2 class, so it was easy to change. Running various benchmarks now gives us:

Benchmark                                                            Score         Error 
GuancialeDBBenchmark.measureCreateEmptyNode                    2727115.220 ±  542998.073  ops/s
GuancialeDBBenchmark2.measureCreateEmptyNode                   3070698.529 ± 1159462.893  ops/s

GuancialeDBBenchmark.measureCreateEmptyNodes                      6535.202 ±    1608.875  ops/s
GuancialeDBBenchmark2.measureCreateEmptyNodes                     5966.563 ±    4186.546  ops/s

GuancialeDBBenchmark.measureCreateNodeWithProperties            353071.807 ±   23993.490  ops/s
GuancialeDBBenchmark2.measureCreateNodeWithProperties           889107.557 ±  393243.609  ops/s

GuancialeDBBenchmark.measureCreateNodesWithProperties              471.845 ±      88.751  ops/s
GuancialeDBBenchmark2.measureCreateNodesWithProperties             277.441 ±     350.483  ops/s

GuancialeDBBenchmark.measureCreateEmptyNodesAndRelationships         3.861 ±      10.257  ops/s
GuancialeDBBenchmark2.measureCreateEmptyNodesAndRelationships        1.602 ±       1.314  ops/s


The variability of some of these tests is HUGE… a good reason not to run performance tests on your laptop. But if we squint at it, it tells us that it’s close enough on the writes. What about reads? What about that aggregation test from earlier on? Would you believe it came all the way down to just 400ms. Boo Yeah! Now we’re winning.

The indexing functionality lets us generate new indexes at runtime which is perfect if we’re going to continue to follow Neo4j into the 2.0 era with Labels and optional Schema. Let’s see how that works by replicating the aggregation test but to only count people with ages 35-44. First, we’ll create our users.

IndexedCollection<PropertyContainer> nodes = new ConcurrentIndexedCollection<>();

for (int person = 0; person < 1632803; person++) {
    HashMap<String, Object> properties = new HashMap<>();
    properties.put("id" + person, "id" + person);
    properties.put("age", rand.nextInt(120));
    nodes.add(new PropertyContainer("id" + person, properties));
}


Now this next part looks complicated, but basically, we are creating a class for the attribute we are going to index. It will pull in the age property of our PropertyContainer which is an Integer. Then we create an instance of that class which we will use in a bit when creating the new index.
Class<? extends SimpleNullableAttribute<PropertyContainer, Integer>> attributeClass =
    generateSimpleNullableAttributeForParameterizedGetter(
        PropertyContainer.class, Integer.class,"getIntegerProperty", "age", "age");

SimpleNullableAttribute<PropertyContainer, Integer> ageAttribute = attributeClass.newInstance();


First, we need to add a “getIntegerProperty” method to our PropertyContainer class…and we will need to add one of these for every type later.
public Integer getIntegerProperty(String key) {
    return (Integer)properties.get(key);
}


With the ageAttribute made, we will add a “NavigableIndex” which is an index backed by a ConcurrentSkipListMap that supports : Equal, LessThan, GreaterThan and Between. This last one is what we will use in our query.
nodes.addIndex( NavigableIndex.onAttribute(ageAttribute));


Next, we prepare the query and the result map, aggregating only nodes with ages between 35 and 44:
HashMap<Integer, Integer> ages = new HashMap<>();
Integer age;
Query<PropertyContainer> query = between(ageAttribute, 35, 44);
Long start = System.currentTimeMillis();
ResultSet<PropertyContainer> results = nodes.retrieve(query);
for (PropertyContainer nodeEntry : results) {
    age = (Integer) nodeEntry.getProperties().get("age");
    ages.merge(age, 1, Integer::sum);
}
Long end = System.currentTimeMillis();
System.out.println(end-start);


Our aggregation of all the ages took around 400ms, this one takes just 75ms. Awesome! Now we have a foundation for property indexes.

There is just one problem… since we’re no longer using ChronicleMap the name GuancialeDB doesn’t make sense anymore. Plus I also want to leave that code up, so people see the folly of my ways. So we need a new name. A good one, one that sticks.

At dinner the other day, Luke Gannon suggested “Disoriented DB.” Named somewhat after another multi-model database that doesn’t know which way to turn, but they know benchmarks and magically win them all *cough* cached results *cough*. They also know feature tables. They have the best feature tables. There are no better feature tables. Full of the “alternate facts” we keep hearing so much about lately.

I also heard that Sean Parker stopped over for dinner at another database vendor that was doing bullshit benchmarks recently and told them to "drop the D," just "graph." I like that, simpler is better. So now it is just "Disoriented." We can write Disoriented tests, start our Disoriented server, etc. When your relational database is giving you troubles, you can tell your manager “let’s get Disoriented”. Once you get Disoriented, you can tell your DevOps team to just “deploy Disoriented” and get some nachos. Love it. New code on GitHub as always.

…and a message to all our graph database vendor frenemies out there.

jordan-stop-it

Stop with the competitive vendor benchmarks. We don’t do that at Neo4j. We spend our time educating the world about graphs. We wrote a book and give it away for free, we write countless blog posts and encourage our community to do the same, we have over a hundred meet-up groups, we write example models and queries, we host two graph conferences at year and would be happy to speak about graphs anywhere invited.

Do the same. Work on your documentation, work on guides and walk-throughs, write sample applications, help your users on Slack or StackOverflow, start topics on your mailing list, host a meet-up, grow your community. Help the world make sense of data by using graphs, and fight the real enemy.

Want to deliver a whole new level of customer experience? Learn how to make your move from MongoDB to Couchbase Server.

Topics:
multi-model databases ,database ,tutorial ,indexing ,graph traversal

Published at DZone with permission of Max De Marzi, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}