Graph Analytics From Oracle Labs — PGX 1.2

Parallel Graph Analytics (PGX) is a technology from Oracle Labs—the organization that Sun Microsystems' Sun Labs became after Oracle bought Sun. PGX 1.2 was just released. Read on for the details.

Like Neo4j or GraphX, PGX does graph analytics: you represent data as a graph and run analyses and pattern matching on it. However, in some cases it offers orders of magnitude better performance. It achieves that in several ways: the DSL you use to write analytics algorithms is compiled to highly parallel Java code; the declarative, SQL-like pattern-matching language, PGQL, is similarly parallelized; and the runtime is highly optimized for memory footprint.

Graph analytics is really useful for all sorts of things. The obvious ones are things like your Facebook friends graph—find people you probably know because your friends know them, and that sort of thing. What's less obvious is all the other things you can use it for. Some examples:

  • Let nodes represent people and insurance claims; look for patterns where the same people appear on both sides of several claims, filtered for geographical proximity. You've found an insurance fraud ring.
  • Let nodes represent Java methods that call other Java methods. Run PageRank or another centrality algorithm, and index how central—how important—each method is. When a Git commit modifies an important method, send an email to the team asking for review.
  • A recommendation engine: matrix factorization lets you take a graph of users and the items they recommended and synthesize "features" (latent categories) that let you predict other items a user will recommend highly or, in reverse, find users who will be interested in an item.
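The first pattern in the list can be sketched in a few lines of plain Python, with no PGX involved. The claim records, the function name `suspicious_pairs`, and the `min_claims` threshold are all made up for illustration, and the geographic filter is left out.

```python
from collections import Counter

# Hypothetical claims: (claim id, claimant, counterparty).
claims = [
    ("c1", "alice", "bob"),
    ("c2", "bob", "alice"),
    ("c3", "alice", "bob"),
    ("c4", "carol", "dave"),
    ("c5", "erin", "alice"),
]

def suspicious_pairs(claims, min_claims=2):
    """Return pairs of people who face each other in at least
    min_claims claims and who have both been the claimant."""
    together = Counter()   # unordered pair -> number of shared claims
    claimants = {}         # unordered pair -> set of claimants seen
    for _claim_id, claimant, other in claims:
        if claimant == other:
            continue       # ignore malformed self-claims
        pair = frozenset((claimant, other))
        together[pair] += 1
        claimants.setdefault(pair, set()).add(claimant)
    return sorted(
        tuple(sorted(pair))
        for pair, count in together.items()
        if count >= min_claims and len(claimants[pair]) == 2
    )

print(suspicious_pairs(claims))
```

A graph engine earns its keep when the ring involves more than two people, which turns this pairwise check into a general cycle or subgraph search.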

The point here is that, once you start doing graph analysis and get used to thinking in terms of graphs, you have this epiphany that there are all sorts of things you can learn. What graph analysis does is surface latent information that's encoded in the structure of a set of relationships, and those relationships can be as easily parts, suppliers, and products as they can be Facebook friends or other obvious things. The software industry has barely scratched the surface on the kinds of things graph analytics can be used for.

Nodes and edges in a graph have properties—key/value pairs—that can be used when computing an analysis. In PGX, running an analysis usually results in synthesizing new properties on components of the graph, and those can then be used in pattern-matching queries. So, PGX really provides full-service graph analytics in a single package.
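As a rough illustration of that analyze-then-query flow, here is a plain Python sketch rather than the PGX API. The node data, the `in_degree` property name, and both helper functions are assumptions made for the example.

```python
# Nodes carry key/value properties; edges are (source, target) pairs.
nodes = {
    "a": {"name": "Ann"},
    "b": {"name": "Ben"},
    "c": {"name": "Cy"},
}
edges = [("a", "b"), ("c", "b"), ("b", "a")]

def add_in_degree(nodes, edges, prop="in_degree"):
    """Analysis step: write each node's in-degree into a new property."""
    for props in nodes.values():
        props[prop] = 0
    for _src, dst in edges:
        nodes[dst][prop] += 1

def select(nodes, predicate):
    """Query step: return the names of nodes whose properties match."""
    return sorted(p["name"] for p in nodes.values() if predicate(p))

add_in_degree(nodes, edges)
print(select(nodes, lambda p: p["in_degree"] >= 2))
```

The query only works because the analysis ran first and left its result on the nodes, which is the same division of labor the paragraph above describes.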

Graph analytics doesn't work well in SQL databases: for a lot of typical graph questions, you'd have to JOIN a table with itself n times (and you don't know n in advance, except that it could be as large as the table's row count). SQL databases don't perform well at that.
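To make the "unknown n" point concrete, here is a small Python sketch that answers a reachability question the way a relational engine would have to: by joining the current frontier against the edge table over and over. The function name and the sample chain are hypothetical.

```python
def reachable_by_joins(edge_rows, start):
    """Find everything reachable from start by repeatedly joining
    the frontier against the edge table, one hop per round."""
    reached = {start}
    frontier = {start}
    rounds = 0
    while frontier:
        # One "self-join": match frontier nodes against edge sources.
        next_hop = {dst for src, dst in edge_rows if src in frontier}
        frontier = next_hop - reached
        reached |= frontier
        rounds += 1
    return reached, rounds

# Hypothetical chain a -> b -> c -> d -> e.
chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
reached, rounds = reachable_by_joins(chain, "a")
print(sorted(reached), rounds)
```

The number of rounds is only known once the frontier comes back empty, and each round scans the whole edge table, which is why a long chain of relationships is so painful to express as a fixed number of JOINs.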

So, starting PGX in local mode is pretty simple, and it can handle shockingly large graphs on a laptop. You just download and install it, then run $PGX_HOME/bin/pgx to start the interactive Groovy shell (it also has a Java API and REST API built in):

foo@bar ~/work/lib/pgx $ bin/pgx
PGX Shell 1.2.0-SNAPSHOT
type :help for available commands
02:11:11,961 [main] INFO Ctrl$2 - >>> PGX engine running.
variables instance, session and analyst ready to use
pgx>

Loading a graph in a plaintext format such as edge list or GraphML is simple: you write a small JSON file that describes the schema of the graph and the format, then load it:

pgx> graph = session.readGraphWithProperties('myGraph.json');

And, you can immediately run any of the built-in algorithms on it:

pgx> analyst.countTriangles(graph, true);
==> 23
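For readers who want to see what a triangle count actually computes, here is a minimal Python sketch for an undirected graph. It is not how PGX implements the algorithm, the boolean argument in the shell call above is not modeled, and the function name is an assumption.

```python
from itertools import combinations

def count_triangles(edges):
    """Count triangles in an undirected simple graph given as (u, v) pairs."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops cannot be part of a triangle
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    count = 0
    for u, nbrs in neighbors.items():
        # Only look at triples u < v < w so each triangle is counted once.
        higher = sorted(n for n in nbrs if n > u)
        for v, w in combinations(higher, 2):
            if w in neighbors[v]:
                count += 1
    return count

print(count_triangles([(1, 2), (2, 3), (1, 3), (3, 4)]))
```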

Custom analytics algorithms are written in a language called Green-Marl, which treats graph elements as first-class citizens, provides common operations such as breadth-first search as (parallelizable) language constructs, and offers conveniences like initializing a vector with a default value (or random values) in a single line of code.

For example, here is the classic PageRank algorithm:

procedure pagerank(G: graph, e,d: double, max_iter_count: int;
                   pg_rank: nodeProp<double>) {
    double diff;
    int cnt = 0;
    double N = G.numNodes();
    G.pg_rank = 1 / N;
    do {
        diff = 0.0;
        foreach (t: G.nodes) {
            double val = (1-d) / N + d *
                sum(w: t.inNbrs) {w.pg_rank / w.outDegree()};
            diff += | val - t.pg_rank |;
            t.pg_rank <= val;
        }
        cnt++;
    } while ((diff > e) && (cnt < max_iter_count));
}
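If Green-Marl syntax is unfamiliar, the same loop can be read as ordinary sequential Python. This is a line-by-line sketch that keeps the procedure's parameter names; the dictionary-based graph representation and the default argument values are assumptions, and none of the parallelism is reproduced.

```python
def pagerank(out_edges, e=0.001, d=0.85, max_iter_count=100):
    """out_edges maps every node to the list of nodes it links to.
    Every link target must itself be a key of out_edges."""
    nodes = list(out_edges)
    n = len(nodes)
    # Build the reverse adjacency, i.e. t.inNbrs in the Green-Marl code.
    in_nbrs = {t: [] for t in nodes}
    for w, targets in out_edges.items():
        for t in targets:
            in_nbrs[t].append(w)
    pg_rank = {t: 1 / n for t in nodes}
    cnt = 0
    while True:
        diff = 0.0
        new_rank = {}
        for t in nodes:
            val = (1 - d) / n + d * sum(
                pg_rank[w] / len(out_edges[w]) for w in in_nbrs[t]
            )
            diff += abs(val - pg_rank[t])
            new_rank[t] = val  # deferred write, like `<=` in Green-Marl
        pg_rank = new_rank
        cnt += 1
        if not (diff > e and cnt < max_iter_count):
            break
    return pg_rank

ranks = pagerank({"a": ["b"], "b": ["a"], "c": ["b"]})
print(max(ranks, key=ranks.get))
```

Writing new values into `new_rank` and swapping at the end of each pass mirrors the deferred assignment: every node in an iteration reads the ranks from the previous iteration.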

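To load and compile the Green-Marl procedure from the shell, you would simply do this (the returned program object can then be run against a graph):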

pgx> program = session.compileProgram("pagerank.gm");

Pattern matching, on the other hand, uses a declarative language called PGQL, which matches on node and edge properties and has features similar to those of SQL. For example, say you believe the proverb "the enemy of my enemy is my friend," and you have a graph of who is feuding with whom. This query finds every pair of nodes x and z where z is an enemy of one of x's enemies:

pgx> resultSet = graph.queryPgql("SELECT x.name, z.name WHERE x -[e1 WITH label = 'feuds']-> y, y -[e2 WITH label = 'feuds']-> z");
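What that two-hop pattern matches can be sketched in plain Python. This is not PGQL or the PGX API: the triple-based edge list and the function name are assumptions, and unlike a query result set, the sketch collapses duplicate pairs.

```python
def enemies_of_enemies(edges, label="feuds"):
    """edges: (source name, label, target name) triples.
    Returns sorted (x, z) pairs matching x -[label]-> y -[label]-> z."""
    out = {}
    for src, lbl, dst in edges:
        if lbl == label:
            out.setdefault(src, []).append(dst)
    return sorted({
        (x, z)
        for x, ys in out.items()   # first hop:  x -> y
        for y in ys
        for z in out.get(y, [])    # second hop: y -> z
    })

# Hypothetical feud graph.
feuds = [
    ("A", "feuds", "B"),
    ("B", "feuds", "C"),
    ("A", "likes", "C"),
    ("C", "feuds", "A"),
]
print(enemies_of_enemies(feuds))
```

Note that nothing in the pattern stops x and z from being the same node: two people feuding with each other would each show up as their own "friend".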

Anyway, this is too short an article to describe all of the things you can do with PGX. You can download the PGX technology preview from Oracle Labs here: http://www.oracle.com/technetwork/oracle-labs/parallel-graph-analytics/overview/index.html

PGX is also incorporated into Oracle Big Data Spatial and Graph for commercial use: https://www.oracle.com/database/big-data-spatial-and-graph/index.html
