Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Using Neo4j to Interpret CSV Files for Sales Compensation in WoW

DZone's Guide to

Using Neo4j to Interpret CSV Files for Sales Compensation in WoW

Compensation planning and calculations in complex sales organizations can be incredibly taxing on traditional databases. Using Neo4j, let's see how we can do it.

· Database Zone
Free Resource

Running out of memory? Learn how Redis Enterprise enables large dataset analysis with the highest throughput and lowest latency while reducing costs over 75%! 

Compensation planning and calculations in complex sales organizations can be incredibly taxing on traditional databases. Sufficiently large organizations often will end up batching these processes overnight or into a weekly job.

For the purposes of this blog post, let’s take a look at Slashco: the hottest multi-level marketing “guild” in the World of Warcraft. Get the full datasets and Cypher queries used in this blog post here.


Slaschco logo World of Warcraft

In this guild, “associates” may join up and sell Slashco’s fine adventuring gear to the many peoples of Azeroth as well as recruit their friends and family to also sell Slashco products. When an associate recruits someone to join the guild, that new member becomes a part of the associate’s “down-stream.” As a royalty for bringing on this new member, associates will receive a small commission of every item sold by members “down-stream” of them.

In this compensation schedule we have sales-mages who will both sell products directly as well as recruit people to also sell products on behalf of Slashco. I created a sample model based on popular multi-level marketing model. It comprises of four income streams:

    • Direct Sales: For each item a team-member sells directly, they earn a commission.
    • Downstream Royalty: A percentage earned of the retail price of items sold by people “within their downstream,” meaning those that they have recruited to work for Slashco and by extension those who their recruits have recruited.
    • Wholesale Profit: A percentage earned on the goods sold to those “within their downstream” (these goods will then be re-sold to consumers.
    • Global Sales Royalty: A percentage of all sales made by those within Slashco (a.k.a. revenue sharing).

Here’s an example of how a typical MLM organization might compensate its associates:

Multi-Level Marketing Sales Compensation Pyramid



We see that, depending on what level you’re at, how you’re compensated both for your direct sales and also for your passive income streams can vary greatly. As such, calculating compensation can be hairy to say the least.

If we were to migrate our data from a legacy sales compensation planner, it would be best to push all of it out into a series of .csv files. In the database world, comma separated value files are often the lowest common denominator—thus for 90% of the projects I work on, I start there.

Let’s pretend we have pushed our old MySQL data into a series of 3 .csv files:

These are idealized dumps where I’ve synthesized what would normally be about a dozen CSV files into three, simply to show an example of how the data looks before and after Neo4j’s ETL process (LOAD CSV).

When inspected by Excel (you could look at them with any other text editor as well), we’ll see a series of headers and rows with values that correspond to those headers. For example in the image below, we can see that a transaction with an ID of 1000000 has a value of 30 under the header of “salesRepID”, 22 under the header of “period”, 7780 under the header of “item1”, etc. etc. We’ll need to tell Neo4j how to interpret this information into a series of nodes and relationships instead of square “row and column” data.

CSV Data for Multi-Level Marketing Sales Compensation

Before we actually load any data into Neo4j, it’s important to know what questions will be important to our business. In this case, we’re modeling a sales compensation tool. Our top queries might be something like:

    1. Commission due to each rep, by period, in accord with compensation rules
    2. Commission due to each rep, annually in accord with compensation rules
    3. Sales Leaderboard
      • Top Sales Rep by Total Sales
      • Top Sales Rep by Largest Deal
    4. Global reporting
      • Sales by period
      • Top selling items
    5. Recommendations
      • What items are most frequently sold together?

When we know these questions, we can then build a model that makes it easy to answer those questions quickly and efficiently.

As a general rule, if a piece of data is important to your use-case “make it physical” meaning, if you’re going to be traversing or filtering based on a specific bit of information, then make it a node or relationship.

For example: Because we know that “deals” are important to our Sales Leaderboard, we’re going to make the idea of a deal (transaction) a specific type of node. We’ll then add information about that transaction, like when it happened, who owns it, and what items the deal contained. However, as a “last step” we’ll be doing some basic math operations on price, which is specific to that item — so we’ll keep that as a property.

Slaschco Sales Compensation Application Data Model

Sales Compensation Data Model



The scripts I’m using to build this example are located in this git repo.

Now that we have a data model, let’s fire up Neo4j and pass in our import script (found here). Essentially what we’re doing is creating a few constraints and indexes, then telling Neo4j how to interpret our CSV files into the above model.

Neo4j CSV Import

Now that we’ve loaded in all of our data, let’s open up our browser and start answering some of our top queries.

Neo4j Browser Graph Visualization

We’ll Work Backwards

5. Recommendation

    • What items are most frequently sold together?
//recommendation engine, what items are most frequently co-sold?
MATCH path = (item:Item)-[:CONTAINS]-(:Transaction)-[:CONTAINS]-(item2:Item)
WHERE id(item) > id(item2)
WITH item, item2, count(distinct path) as instances
ORDER BY instances DESC
LIMIT 3
RETURN item.name, item2.name, instances;


Cypher Query Recommendation Engine

4. Global Reporting

    • Sales by period
    • Top selling items
//total sales volume by period descending
MATCH (p:Period)-[:OCCURED_IN]-(t:Transaction)-[:CONTAINS]-(i:Item)
WITH sum(i.price) as sales, p
ORDER BY sales DESC
LIMIT 10
RETURN sales, p.period;


Cypher Query for Global Sales Reporting

MATCH (t:Transaction)-[:CONTAINS]-(i:Item)
WITH count(distinct(t)) as itemSales, i
ORDER BY itemSales DESC
LIMIT 5
RETURN i.name as name, itemSales as count;

cypher-query-global-sales-reporting-2

3. Sales Leaderboard

    • Top Sales Rep by Total Sales Volume
    • Top Sales Rep by Largest Deal
//Who has sold the most volume?
MATCH (rep)-[:SOLD]-(txn)-[:CONTAINS]-(itm)
WITH rep, round(sum(itm.price)) as volume
ORDER BY volume DESC
LIMIT 5
RETURN rep.name as name, volume;


Cypher Query for a Sales Leaderboard

//Who closed the largest deal?
MATCH (rep)-[:SOLD]-(txn)
WITH rep, txn
MATCH (txn)-[:CONTAINS]-(itm)
WITH rep, txn, round(sum(itm.price)) as dealSize
ORDER BY dealSize DESC
LIMIT 5
RETURN rep.name as name, txn.transactionID as transction, dealSize as `deal size`;


cypher-query-sales-leaderboard-largest-deal


2. Commission Due to Each Rep, Annually in Accord with Compensation Rules

Due to the complexity of the queries, I decided to run them with each level of rep separated out into its own query, however they all follow the basic form of the “what do I do with all this gold” query:


//level 6 comp
MATCH (transaction)-[:CONTAINS]-(item)
WITH sum(item.price*.05) as globalRoyalty
MATCH (big_boss:Person {level:6})<-[r:REPORTS_TO*..]-(downStreamers)-[:SOLD]-(transaction)-[:CONTAINS]-(item)
WITH sum(item.price*.1)+sum(item.wholesalePrice*.5) + globalRoyalty as downStreamGlobal6, big_boss
MATCH (boss)-[:SOLD]-(transaction)-[:CONTAINS]-(item)
WITH sum(item.price*.65) + downStreamGlobal6 as tc6, big_boss.name as n6
RETURN tc6, n6;


Sales Compensation Annually Neo4j Query



1. Commission Due to Each Rep, by Period in Accord with Compensation Rules

This looks frighteningly similar to our last query, except we’ve added a short pattern

(transaction)-[:OCCURRED_IN]-(period {period:35})

which will filter out all transactions that occurred in periods that are not the 35th. We still see that over a reasonable sized dataset (100 employees, 20k items, 10k transactions), neo4j is lightning fast.

//level 6 comp with time period
MATCH (transaction)-[:OCCURRED_IN]-(p:Period {period:35})
WITH transaction, p
MATCH (transaction)-[:CONTAINS]-(item)
WITH sum(item.price*.05) as globalRoyalty, p
MATCH (transaction)-[:OCCURRED_IN]-(p:Period {period:35})
WITH globalRoyalty, p, transaction
MATCH (big_boss:Person {level:6})<-[r:REPORTS_TO*..]-(downStreamers)-[:SOLD]-(transaction)-[:CONTAINS]-(item)
WITH sum(item.price*.1)+sum(item.wholesalePrice*.5) + globalRoyalty as downStreamGlobal6, big_boss, p
MATCH (transaction)-[:OCCURRED_IN]-(p:Period {period:35})
WITH transaction, downStreamGlobal6, big_boss
MATCH (boss)-[:SOLD]-(transaction)-[:CONTAINS]-(item)
WITH sum(item.price*.65) + downStreamGlobal6 as tc6, big_boss.name as n6
RETURN tc6, n6;


Sales Compensation by Period, Neo4j Query

//kvg

[As community content, this post reflects the views and opinions of the particular author and does not necessarily reflect the official stance of Neo4j. This post was originally written by Kevin Van Gundy on Kevin Van Gundy’s blog.]

Running out of memory? Never run out of memory with Redis Enterprise databaseStart your free trial today.

Topics:
neo4j ,database ,nosql ,graph database ,world of warcraft ,game dev

Published at DZone with permission of Andreas Kollegger, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}