The purpose of
this post is to give a common footing for those reading to understand
what I mean when I talk about a "graph". The field of graph theory is a
very deep and well-explored one. The real trick here is trying to give
an introduction to graphs and graph theory without getting lost in too
much detail. I will endeavour to keep this post to the utter,
simplified basics of graphs.
(My apologies to
anyone who already knows all about graph theory; this is meant more for
those who might not have been exposed to graphs before, as well as to
establish some common terms and nomenclature I'll be using in this blog.
If some of the definitions seem too simplistic, please bear in mind
I'm trying to just get the basics out there so anyone can understand the
When most people think "graphs", they think of a series of bars or lines
depicting their Q4 growth projections. (Ok, most marketing and sales
people I talk to do, anyway.)
We also think of graphs like the following (which is one of my personal favourites from our good friend Jay-Z):
Alas, these are
not the graphs I'm referring to in this blog. The kind I'm talking
about is a little bit different; though, at its core, perhaps we can
take a graph to mean (at a very oversimplified level), "a graphical
representation of data".
The kind of graph
I'm referring to in this blog has to do with the representation of
objects and their interconnections. Because I'm terrible at drawing
(and lazy), I'll show a simple graph that illustrates what I'm saying
and then try to describe what's going on.
graph-fiends (yes, I just made that horrible pun up), is one example of a
graph. What does this graph consist of? Here's what we have:
- 6 vertices or nodes (the circles), and,
- 7 edges or relationships (the lines between the circles).
At a first
glance, that's all there is to it. In mathematics, we commonly use the
terms "vertices" and "edges" (the vertices being the circles/points, the
edges being the lines between); however, it's likely a bit more
intuitive to most to use the terms "nodes" and "relationships" in place
of "vertices" and "edges", respectively. I'll try to stick to using
"nodes" and "relationships", though I may occasionally use the other
Is a graph still a
graph if there are no edges involved? Absolutely. This is one example
of a "null graph". (For those interested, at the other end of the
spectrum, a "complete graph" is one whose vertices are all
connected/adjacent to each other, for a total of n(n-1)/2 edges, where n
is the number of vertices.)
can be "undirected" (like the example above), or "directed" (like the
example below). Directed edges have a source and a destination (this is
useful for showing relationships between two objects). Undirected
edges can also be used to show a bi-directional relationship (i.e. one
that exists in both directions); for example, Tom likes Dana, and Dana
Still with me?
So, we know that a graph consists of a series of vertices or nodes that can be connected via edges or relationships. These edges can be undirected or directed.
At its core, that is what a graph is.
By now, if you
haven't already, you're seeing how this might be useful to represent
highly connected data. Social networks are a great example (think of
how you could represent your "friendships" on Facebook or MySpace with a
graph; this is actually called a "Likes" graph). Try to think of other information you could represent in graph fashion.
What about a family tree? For sure. As another matter of fact, a tree is actually a type of graph that is directed and acyclical (meaning that there are no loops, i.e. if you start following the directions on a graph, you travel each node exactly once).
Weight is another concept we come across in graph theory and graph databases. Think of weight as being how important or strong a relationship is between two nodes. For example, a higher weight can imply a stronger relationship between two nodes representing, say, two friends.
This is an important concept as we can use it to determine a shortest path or least costly path between two nodes. (See an explanation of paths further below.)
One slightly more complicated idea is that of degree, which represents the number of edges attached to a node. So, a node with a high degree will be one that has a lot of relationships attached
to it. This can be very useful for determining just how popular, for
example, something or someone is. (Cue jokes for this blog.)
One last concept to get out there before wrapping this article up is the concept of a walk or traversal. A traversal is taken to be the resultant path found by starting at a node X and following to some other node on the graph Y and the edges and nodes visited
during the trip from X to Y. This concept becomes important as we
start to look in to graph-based databases as traversals are part of what
we can use to extract useful information from a graph (e.g. "Who are
all of my cousins?") and to even predict and recommend a product for
someone based on their browsing history.
Ok, I think
that's enough math-like stuff for one post. The definitions given in
this post don't even scratch the surface of what's out there in the
world of graph theory. While I'm no serious
graph theorist academically, I'm sure there are some standard texts out
there. That said, as mentioned earlier, Wikipedia is really a great
resource for exploring the world
of graph theory
a bit more.
other graph-related concepts as they become needed (e.g. searching or
path-finding); however, if there are requests for discussing specific
graphing concepts, I'd be happy to address them (provided I can speak
intelligently to them, of course).
Now go weird your friends out with jokes that their family tree isn't really a tree as it has loops!