APOC 1.1.0 Release: Awesome Procedures on Cypher

APOC 1.1.0 is out! Read on for a look at the vast array of new procedures and what they can do for you and your database.

Learn what's new in the 1.1.0 release of the Awesome Procedures on Cypher (a.k.a. "APOC") library.

I'm super thrilled to announce last week's 1.1.0 release of the Awesome Procedures on Cypher (APOC). A lot of new and cool stuff has been added and some issues have been fixed.

Thanks to everyone who contributed to the procedure collection, especially Stefan Armbruster, Kees Vegter, Florent Biville, Sascha Peukert, Craig Taverner, Chris Willemsen, and many more.

And, of course, my thanks go to everyone who tried APOC and gave feedback so that we could improve the library.

The APOC library was first released as version 1.0 in conjunction with the Neo4j 3.0 release at the end of April with around 90 procedures and was mentioned in Emil's Neo4j 3.0 release keynote.

In early May, we had a 1.0.1 release with a number of new procedures especially around free text search, graph algorithms, and geocoding, which was also used by the journalists of the ICIJ for their downloadable Neo4j database of the Panama Papers.

And now, two months later, we've reached 200 procedures that are provided by APOC. These cover a wide range of capabilities, some of which I want to discuss today. In each section of this post, I'll only list a small subset of the new procedures that were added.

If you want to get more detailed information, please check out the documentation with examples.

Notable Changes

As the 100 new procedures represent quite a change, I want to highlight the aspects of APOC that got extended or documented with more practical examples.

Besides the apoc.meta.graph functionality that was there from the start, additional procedures to return and sample graph metadata have been added. Some, like apoc.meta.stats, access the transactional database statistics to quickly return information about label and relationship-type counts.

There are now also procedures to return and check the types of values and properties:

CALL apoc.meta.subGraph({config})

Examines a sample sub-graph to create the meta-graph; the default sampleSize is 100. Config is: {labels:[labels], rels:[rel-types], sample:sample}.

CALL apoc.meta.stats YIELD labelCount, relTypeCount, propertyKeyCount, nodeCount, relCount, labels, relTypes, stats

Returns the information stored in the transactional database statistics.

CALL apoc.meta.type(value)

Type name of a value (INTEGER, FLOAT, STRING, BOOLEAN, RELATIONSHIP, NODE, PATH, NULL, UNKNOWN, MAP, LIST).

CALL apoc.meta.isType(value, type)

Returns a row if the type name matches, none if not.
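
For example, you can inspect value types directly from Cypher. A minimal sketch (the literal values are arbitrary; in this release these are procedures, so they are invoked with CALL):

CALL apoc.meta.type('Neo4j');           // returns "STRING"
CALL apoc.meta.isType(123, 'INTEGER');  // returns a row, since 123 is an INTEGER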


Data Import/Export

The first export procedures output the provided graph data as Cypher statements in the format that neo4j-shell understands and that can also be read with apoc.cypher.runFile.

Indexes and constraints, as well as batched sets of CREATE statements for nodes and relationships, will be written to the provided file path.

apoc.export.cypherAll(file, config)

Exports the whole database, including indexes, as Cypher statements to the provided file.

apoc.export.cypherData(nodes, rels, file, config)

Exports the given nodes and relationships, including indexes, as Cypher statements to the provided file.

apoc.export.cypherQuery(query, file, config)

Exports the nodes and relationships returned by the Cypher statement, including indexes, as Cypher statements to the provided file.
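
For example, dumping the whole database to a file is a single call (the path is just an example; the config map can be left empty for the defaults):

CALL apoc.export.cypherAll('/tmp/backup.cypher', {});

The resulting file can then be replayed with neo4j-shell or apoc.cypher.runFile.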

Data Integration with Cassandra, MongoDB, and RDBMS

Making integration with other databases easier is a big aspiration of APOC.

Being able to directly read and write data from these sources using Cypher statements is very powerful, as Cypher is an expressive data processing language that supports filtering, cleansing, conversion, and preparation of the original data.

APOC integrates with relational (RDBMS) and other tabular databases like Cassandra using JDBC. Each row returned from a table or statement is provided as a map value to Cypher to be processed.

And for ElasticSearch, the same is achieved by using the underlying JSON-HTTP functionality. For MongoDB, we support connecting via their official Java driver.

To avoid listing full database connection strings with usernames and passwords in your procedures, you can configure those in $NEO4J_HOME/conf/neo4j.conf using the apoc.{jdbc,mongodb,es}.<name>.url config parameters, and just pass name as the first parameter in the procedure call.

Here is a part of the Cassandra example from the data integration section of the docs using the Cassandra JDBC Wrapper.

Entry in neo4j.conf:
apoc.jdbc.cassandra_songs.url=jdbc:cassandra://localhost:9042/playlist

CALL apoc.load.jdbc('cassandra_songs', 'track_by_artist') YIELD row
MERGE (a:Artist {name: row.artist}) 
MERGE (g:Genre {name: row.genre}) 
CREATE (t:Track {id: toString(row.track_id), title: row.track, length: row.track_length_in_seconds}) 
CREATE (a)-[:PERFORMED]->(t) 
CREATE (t)-[:GENRE]->(g); // Added 63213 labels, created 63213 nodes, set 182413 properties, created 119200 relationships. 

For each data source that you want to connect to, just provide the relevant driver in the $NEO4J_HOME/plugins directory as well. It will then automatically be picked up by APOC.

Even if you only use this to visualize what kind of graph is hidden in that data, there is already a big benefit in being able to do so without leaving the comfort of Cypher and the Neo4j Browser.

To render virtual nodes, relationships and graphs, you can use the appropriate procedures from the apoc.create.* package.
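
As a hedged sketch of what that looks like (labels, property names, and the exact signatures are illustrative; check the apoc.create.* documentation for your version):

CALL apoc.create.vNode(['Person'], {name:'Alice'}) YIELD node WITH node AS a
CALL apoc.create.vNode(['Person'], {name:'Bob'}) YIELD node WITH a, node AS b
CALL apoc.create.vRelationship(a, 'KNOWS', {since:2016}, b) YIELD rel
RETURN a, rel, b;

Neither the nodes nor the relationship are written to the store; they exist only in the query result, which the Neo4j Browser renders like any other graph.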

Controlled Cypher Execution

While individual Cypher statements can be run easily, more complex executions — like large data updates, background executions, or parallel executions — are not yet possible out of the box.

These kinds of abilities are added by the apoc.periodic.* and the apoc.cypher.* packages. apoc.periodic.iterate and apoc.periodic.commit are especially useful for batched updates.

Procedures like apoc.cypher.runMany allow execution of semicolon-separated statements, and apoc.cypher.mapParallel allows parallel execution of partial or whole Cypher statements, driven by a collection of values.

CALL apoc.cypher.runFile(file or url) yield row, result

Runs each statement in the file, all semicolon separated — currently no schema operations.

CALL apoc.cypher.runMany('cypher;\nstatements;',{params})

Runs each semicolon separated statement and returns summary — currently no schema operations.

CALL apoc.cypher.mapParallel(fragment, params, list-to-parallelize) yield value

Executes the fragment in parallel batches, with the list segments being assigned to _.

CALL apoc.periodic.commit(statement, params)

Repeats a batch update statement until it returns 0; this procedure is blocking.

CALL apoc.periodic.countdown('name', statement, delay-in-seconds)

Submit a repeatedly-called background statement until it returns 0.

CALL apoc.periodic.iterate('statement returning items', 'statement per item', {batchSize:1000, parallel:true}) YIELD batches, total

Run the second statement for each item returned by the first statement. Returns number of batches and total processed rows.
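
Here is a minimal batched-update sketch using apoc.periodic.iterate (the Person label and processed property are hypothetical; depending on the APOC version, the inner statement may need to bind the incoming value explicitly, e.g. with WITH {p} AS p):

CALL apoc.periodic.iterate(
  'MATCH (p:Person) RETURN p',
  'SET p.processed = true',
  {batchSize:1000, parallel:true}
) YIELD batches, total;

Each batch of 1000 items is committed in its own transaction, so large updates don't accumulate one enormous transaction state.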


Schema/Indexing

Aside from the manual index update and query support that was already there in APOC 1.0, more manual index management operations have been added.

CALL apoc.index.list() YIELD type,name,config

Lists all manual indexes.

CALL apoc.index.remove('name') YIELD type,name,config

Removes manual indexes.

CALL apoc.index.forNodes('name',{config}) YIELD type,name,config

Gets or creates manual node index.

CALL apoc.index.forRelationships('name',{config}) YIELD type,name,config

Gets or creates manual relationship index.
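
For example, getting (or creating) a manual node index and then listing what exists looks like this (the index name is arbitrary; the config map can be empty):

CALL apoc.index.forNodes('people', {});
CALL apoc.index.list();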


There is pretty neat support for free text search, which is also detailed with examples in the documentation. With apoc.index.addAllNodes, you can add a number of properties of nodes with certain labels to a free text search index, which is then easily searchable with apoc.index.search.

apoc.index.addAllNodes('index-name',{label1:['prop1',…],…})

Add all nodes to this full text index with the given properties; additionally populates a "search" index.

apoc.index.search('index-name', 'query') YIELD node, weight

Searches the given full text index for the given Lucene query and returns the first 100 matching nodes, ordered by relevance.
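
A small sketch, assuming Person nodes with a name property (note that in the documentation's examples the indexed fields are addressed as Label.property in the Lucene query):

CALL apoc.index.addAllNodes('search', {Person:['name']});
CALL apoc.index.search('search', 'Person.name:Mich*') YIELD node, weight
RETURN node.name, weight;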


Collection and Map Functions

While Cypher already has great support for handling maps and collections, there are always some operations that are not yet possible. That's where APOC's map and collection functions come in. You can now dynamically create, clean, and update maps.

apoc.map.fromPairs([[key,value],[key2,value2],…])

Creates a map from list with key-value pairs.

apoc.map.fromLists([keys],[values])

Creates a map from a keys and a values list.

apoc.map.fromValues([key,value,key1,value1])

Creates a map from alternating keys and values in a list.

apoc.map.setKey(map,key,value)

Returns the map with the value for this key added or replaced.

apoc.map.clean(map,[keys],[values]) yield value

Removes the keys and values (e.g. null-placeholders) contained in those lists, good for data cleaning from CSV/JSON.
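
In this release, these are provided as procedures rather than functions (user-defined functions only arrived in a later Neo4j version), so they are invoked with CALL … YIELD value. A small sketch with made-up data:

CALL apoc.map.fromPairs([['name','Alice'],['age',42]]) YIELD value
RETURN value.name, value.age;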


There are also means to convert and split collections into other shapes, and much more.

apoc.coll.partition(list,batchSize)

Partitions a list into sublists of batchSize.

apoc.coll.zip([list1],[list2])

Zips the two lists into a list of value pairs.

apoc.coll.pairs([list])

Returns [first,second],[second,third], …

apoc.coll.toSet([list])

Returns a unique list backed by a set.

apoc.coll.split(list,value)

Splits the collection on the given value into rows of lists; the value itself will not be part of the resulting lists.

apoc.coll.indexOf(coll, value)

Position of value in the list.
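
For example, partitioning a range into fixed-size batches (again invoked as a procedure, with each sublist returned as its own value row):

CALL apoc.coll.partition(range(1,6), 2) YIELD value
RETURN value; // [1,2], then [3,4], then [5,6]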


You can also compute the union, subtraction, and intersection of collections, and much more.

apoc.coll.union(first, second)

Creates the distinct union of the two lists.

apoc.coll.intersection(first, second)

Returns the unique intersection of the two lists.

apoc.coll.disjunction(first, second)

Returns the disjunct set of the two lists.
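
A quick sketch of these set operations (the YIELD column name follows the value convention used by the other collection procedures):

CALL apoc.coll.union([1,2,3], [2,3,4]) YIELD value
RETURN value; // the distinct union, e.g. [1,2,3,4]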


Graph Representation

There are a number of operations on a graph that return a subgraph of nodes and relationships. With the apoc.graph.* operations you can create such a named graph representation from a number of sources.

apoc.graph.from(data,'name',{properties}) yield graph

Creates a virtual graph object for later processing; it tries its best to extract the graph information from the data you pass in.

apoc.graph.fromPaths([paths],'name',{properties})

Creates a virtual graph object for later processing.

apoc.graph.fromDB('name',{properties})

Creates a virtual graph object for later processing.

apoc.graph.fromCypher('statement',{params},'name',{properties})

Creates a virtual graph object for later processing.


The idea is that, on top of this graph representation, other operations (like export or updates) as well as graph algorithms can be executed. The general structure of this representation is:

{
    name:"Graph name",
    nodes:[node1,node2],
    relationships: [rel1,rel2],
    properties:{key:"value1",key2:42}
}
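
For instance, building such a named graph from a Cypher statement might look like this (the match pattern and graph name reuse the playlist example from above):

CALL apoc.graph.fromCypher(
  'MATCH (a:Artist)-[r:PERFORMED]->(t:Track) RETURN a, r, t',
  {}, 'performances', {}) YIELD graph
RETURN graph;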

Plans for the Future

Of course, it doesn't stop here. As outlined in the readme, there are many ideas for future development of APOC.

One area to be expanded is graph algorithms, along with the quality and performance of their implementations. We also want to extend the import and export capabilities, for instance with GraphML and binary formats.

Something that should be more widely supported by APOC procedures in the future is working with a subgraph representation of a named set of nodes, relationships, and properties.

Conclusion

There is a lot more to explore; just take a moment and have a look at the wide variety of procedures listed in the readme.

Going forward, I want to achieve a more regular release cycle for APOC: a new release every two weeks, so that everyone benefits from bug fixes and new features.

Now, please try it out and let us know what you think!
Cheers,
Michael
