Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Neo4j: A First Attempt at Retail Product Substitution

DZone's Guide to

Neo4j: A First Attempt at Retail Product Substitution

· Java Zone
Free Resource

Learn how to troubleshoot and diagnose some of the most common performance issues in Java today. Brought to you in partnership with AppDynamics.

One of the interesting problems in the world of online shopping from the perspective of the retailer is working out whether there is a suitable substitute product if an ordered item isn’t currently in stock.

Since this problem brings together three types of data – order history, stock levels and products – it seems like it should be a nice fit for Neo4j so I ‘graphed up’ a quick example.

I wrote the following cypher to create some products, a person, a few orders and then the availability of those products in an imaginary store.

CREATE (bakedBeans :Category {name: "Baked Beans"} )
CREATE (fruit :Category {name: "Fruit"} )
 
CREATE (hbb :Product {name: "Heinz Baked Beans", type: "brand"} )
CREATE (bbb :Product {name: "Branstone Baked Beans", type: "brand"} )
CREATE (sbb :Product {name: "Sainsbury's Baked Beans", type: "own"} )
CREATE (apple :Product {name: "Bag of Apples"} )
 
CREATE UNIQUE (hbb)-[:HAS_CATEGORY]->(bakedBeans)
CREATE UNIQUE (bbb)-[:HAS_CATEGORY]->(bakedBeans)
CREATE UNIQUE (sbb)-[:HAS_CATEGORY]->(bakedBeans)
 
CREATE (southwark :Store {name: "Southwark"})
 
CREATE UNIQUE (southwark)-[:HAS_IN_STOCK {availability: 0}]->(hbb)
CREATE UNIQUE (southwark)-[:HAS_IN_STOCK {availability: 2}]->(bbb)
CREATE UNIQUE (southwark)-[:HAS_IN_STOCK {availability: 10}]->(sbb)
CREATE UNIQUE (southwark)-[:HAS_IN_STOCK {availability: 10}]->(apple)
 
CREATE (mark :Person {name: "Mark"})
 
CREATE (order1 :Order {id: 1, date: 1380884632})
 
CREATE UNIQUE (order1)-[:CONTAINS {count: 1}]->(hbb)
CREATE UNIQUE (order1)-[:CONTAINS {count: 5}]->(apple)
CREATE UNIQUE (mark)-[:PLACED_ORDER]->(order1)
 
CREATE (order2 :Order {id: 2, date: 1380885051})
 
CREATE UNIQUE (order2)-[:CONTAINS {count: 1}]->bbb
CREATE UNIQUE (mark)-[:PLACED_ORDER]->(order2)

We may then have a new order which we’re trying to fulfil and we want to check whether the products are in stock.

First we’ll create the order:

// Create the order
CREATE (order3:Order {id: 3, date: 1380895051})
 
WITH order3
 
// Assign the order to Mark
MATCH (p:Person)
WHERE p.name = "Mark"
CREATE UNIQUE (p)-[:PLACED_ORDER]->(order3)
 
WITH order3
 
// Populate the order with some products
MATCH (p:Product)
WHERE p.name = "Heinz Baked Beans" 
CREATE UNIQUE (order3)-[:CONTAINS {count: 2}]->(p)
 
WITH order3
 
MATCH (p:Product)
WHERE p.name = "Bag of Apples"
CREATE UNIQUE (order3)-[:CONTAINS {count: 2}]->(p)

Now let’s check on the availability of each item in the order in a particular store:

// Find the products in the order
MATCH (o:Order)-[c:CONTAINS]->(product)
WHERE o.id = 3
WITH product, c.count AS count
 
// Check which items are out of stock in our store
MATCH (s:Store)-[inStock:HAS_IN_STOCK]->(product)
WHERE s.name = "Southwark"
RETURN product, inStock
 
==> +--------------------------------------------------------------------------------------------+
==> | product                                            | inStock                               |
==> +--------------------------------------------------------------------------------------------+
==> | Node[11444]{name:"Heinz Baked Beans",type:"brand"} | :HAS_IN_STOCK[60053]{availability:0}  |
==> | Node[11447]{name:"Bag of Apples"}                  | :HAS_IN_STOCK[60056]{availability:10} |
==> +--------------------------------------------------------------------------------------------+

Now if we change that query to only return items which have less items in stock than we attempted to order we’ll see that Heinz Baked Beans aren’t available:

// Find the products in the order
MATCH (o:Order)-[c:CONTAINS]->(product)
WHERE o.id = 3
WITH product, c.count AS count
 
// Check which items are out of stock in our store
MATCH (s:Store)-[inStock:HAS_IN_STOCK]->(product)
WHERE s.name = "Southwark" AND count > inStock.availability
RETURN product, inStock
 
==> +-------------------------------------------------------------------------------------------+
==> | product                                            | inStock                              |
==> +-------------------------------------------------------------------------------------------+
==> | Node[15281]{name:"Heinz Baked Beans",type:"brand"} | :HAS_IN_STOCK[86079]{availability:0} |
==> +-------------------------------------------------------------------------------------------+
MATCH (p:Person)-[:PLACED_ORDER]->order-[c:CONTAINS]->product-[:HAS_CATEGORY]->category
WHERE p.name = "Mark" AND category.name = "Baked Beans" AND order.id <> 3
RETURN product.name, product.type, order.id
 
==> +---------------------------------------------------+
==> | product.name            | product.type | order.id |
==> +---------------------------------------------------+
==> | "Heinz Baked Beans"     | "brand"      | 1        |
==> | "Branstone Baked Beans" | "brand"      | 2        |
==> +---------------------------------------------------+

Having identified Mark as a connoisseur of branded baked beans, we might then run a query to check if there are any other branded baked beans available in that store:

MATCH (s:Store)-[inStock:HAS_IN_STOCK]->p-[:HAS_CATEGORY]->c
WHERE s.name = "Southwark" 
AND c.name = "Baked Beans" 
AND inStock.availability > 0 
AND p.type = "brand"
RETURN p.name, inStock.availability
 
==> +------------------------------------------------+
==> | p.name                  | inStock.availability |
==> +------------------------------------------------+
==> | "Branstone Baked Beans" | 2                    |
==> +------------------------------------------------+

This is obviously an extremely naive approach so I went to twitter in the hope that I could find a more sophisticated approach:

Nicole White is currently playing around with association rule mining which sounds interesting.

It is described on its Wikipedia page like so:

Based on the concept of strong rules, Rakesh Agrawal et al.[2] introduced association rules for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets.

Although it’s not exactly the same as what I want to do I need to look into it more to see if some of the ideas can be applied.

I also learnt that the terminology for what I’m looking for is a ‘similar items‘ algorithm and I think what I’m looking to spike would be a hybrid recommender system which combines content similarity and user’s previous purchase history.

I’ve been looking around to see if there are any open or anonymised retail data sets to play around with but all I’ve come across is the ‘Frequent Itemset Mining Dataset Repository‘. Unfortunately when I tried to open the files they seem to just contain random numbers, so I must be doing something wrong.

If anyone knows of a retail data set I can play around with please point me in the right direction.

Understand the needs and benefits around implementing the right monitoring solution for a growing containerized market. Brought to you in partnership with AppDynamics.

Topics:

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}