Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Refactoring: Extracting Nodes and Tables

DZone's Guide to

Refactoring: Extracting Nodes and Tables

Refactoring databases is time-consuming and error prone. Read on for an example of how to do it right.

· Database Zone
Free Resource

Learn how to move from MongoDB to Couchbase Server for consistent high performance in distributed environments at any scale.

In my previous blog post, I showed how to add a new property/field to a node with a label/record in a table for a football transfers dataset that I’ve been playing with.

After introducing this ‘nationality’ property, I realized that I now had some duplication in the model:

2016 05 22 10 15 15

players.nationality and clubs.country are referring to the same countries but they’ve both got them stored as strings so we can’t ensure the integrity of our countries and ensure that we’re referring to the same country.

We have the same issue in the graph model as well:

Image title

This time Player.nationality and Club.country refer to the same countries.

We can solve our problem by introducing a countries table in the relational model and a set of nodes with a ‘Country’ label in the graph model. Let’s start with relational.

This is the model we’re driving towards:

Image title

The first thing we need to do is create a countries table and populate it:

CREATE TABLE countries (
    "code" CHARACTER VARYING(3) NOT NULL PRIMARY KEY,
    "name" CHARACTER VARYING(50) NOT NULL
);
INSERT INTO countries VALUES('MNE', 'Montenegro');
INSERT INTO countries VALUES('SWZ', 'Swaziland', 'caf');
...

Next, let’s update the clubs table to reference the countries table:

ALTER TABLE clubs
ADD COLUMN country_id CHARACTER VARYING(3)
REFERENCES countries(code);

And let’s run a query to populate that column:

UPDATE clubs AS cl
SET country_id = c.code
FROM clubs
INNER JOIN countries AS c 
ON c.name = clubs.country
WHERE cl.id = clubs.id;

This query iterates over all the clubs, queries the country table to find the country id for that row and then stores it in the ‘country_id’ field. Finally we can remove the ‘country’ field:

ALTER TABLE clubs
DROP COLUMN country;

Now we do the same drill for the players table:

ALTER TABLE players
ADD COLUMN country_id CHARACTER VARYING(3)
REFERENCES countries(code);
UPDATE players AS p
SET country_id = c.code
FROM players
INNER JOIN countries AS c 
ON c.name = players.nationality
WHERE p.id = players.id;
ALTER TABLE players
DROP COLUMN nationality;

Now it’s time for the graph. This is the model we want to get to:

2016 05 22 10 51 49

First, we’ll create the countries:

CREATE CONSTRAINT ON (c:Country)
ASSERT c.id IS UNIQUE
LOAD CSV WITH HEADERS FROM "file:///countries.csv"
AS row
MERGE (country:Country {id: row.countryCode})
ON CREATE SET country.name = row.country

And now let’s get clubs and players to point at those countries nodes and get rid of their respective nationality/country properties:

MATCH (club:Club)
MATCH (country:Country {name: club.country})
MERGE (club)-[:PART_OF]->(country)
REMOVE club.country
MATCH (player:Player)
MATCH (country:Country {name: player.nationality})
MERGE (player)-[:PLAYS_FOR]->(country)
REMOVE player.nationality

And that’s it, we can now write queries against our new model.

Want to deliver a whole new level of customer experience? Learn how to make your move from MongoDB to Couchbase Server.

Topics:
sql ,nosql ,refactoring ,nodes ,tables ,relational database ,graph database ,neo4j

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}