The APOC Spring Release


It's been a busy month for Neo4j's APOC library. The spring release includes new functions, tweaks to the procedure compiler, and better JSON data selection.


With version 3.0, you can extend Neo4j with user-defined procedures, functions, and, going forward, also aggregate functions. About a year ago, during the 3.0 milestone phase, I started working on the first set of graph refactoring procedures. These evolved into the APOC library, which featured about 100 procedures at the release of Neo4j 3.0. With Neo4j 3.1, that grew to about 250 procedures and functions, and as of now we've reached about 300.

April Is APOC Awareness Month

This month we reward articles that demonstrate how to use APOC to do cool stuff with graphs, Neo4j, and Cypher. Read all about it in the announcement blog post.

With the beginning of spring, we gathered the contributions made during the long winter nights and released three new versions for your pleasure.

You can find the releases for the different Neo4j versions here:

If you want to learn more about the existing APOC feature set, please visit the procedures gallery on neo4j.com, read the APOC documentation, run :play http://guides.neo4j.com/apoc in Neo4j Browser, or read the past blog articles on the topic.

New Feature Contributions

But let’s look at some of the new features since the last release in December.

Stefan Armbruster

Stefan Armbruster worked on automating updates to “manual” indexes, which you can enable with apoc.autoUpdate.enabled=true in your neo4j.conf. You also need an autoUpdate:true configuration setting in your manual index definition. He also added support for mixed content to apoc.load.xml and provided the apoc.text.regexGroups function for extracting the capture groups of regular expression matches.
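To make the regexGroups behavior concrete, here is a plain-Python sketch of what we understand it to return: one list per match, holding the whole match followed by its capture groups. The helper name regex_groups is hypothetical, and this is not APOC's Java code.

```python
import re
from typing import List, Optional


def regex_groups(text: str, pattern: str) -> List[List[Optional[str]]]:
    """Hypothetical stand-in for regexGroups.

    For every non-overlapping match of `pattern` in `text`, return
    [whole match, group 1, group 2, ...]. Unmatched optional groups are None.
    """
    return [[m.group(0), *m.groups()] for m in re.finditer(pattern, text)]


if __name__ == "__main__":
    markup = "abc <link xxx1>yyy1</link> def <link xxx2>yyy2</link>"
    for groups in regex_groups(markup, r"<link (\w+)>(\w+)</link>"):
        print(groups)
    # ['<link xxx1>yyy1</link>', 'xxx1', 'yyy1']
    # ['<link xxx2>yyy2</link>', 'xxx2', 'yyy2']
```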

Andrew Bowman

Andrew Bowman only started contributing this month but has already added:

  • apoc.coll functions: shuffle(), randomItem(), randomItems(), containsDuplicates(), duplicates(), duplicatesWithCount(), occurrences(), reverse()
  • apoc.path procedures: subgraphNodes(), subgraphAll(), and spanningTree()
  • apoc.date functions: convert() and add()
  • apoc.algo functions: cosineSimilarity(), euclideanDistance(), euclideanSimilarity()
  • Extended apoc.path.expand with new operators for filtering end nodes, limits, excluding the start node from filters, and more.
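For readers who want to see what a few of these new functions compute, here is a rough Python model of duplicates(), duplicatesWithCount(), occurrences(), containsDuplicates(), and the date convert()/add() pair. All helper names, the unit abbreviations, and the ordering of results are assumptions for this sketch, not APOC's implementation.

```python
from collections import Counter
from typing import Any, Dict, List

# Hypothetical stand-ins for a few of the new apoc.coll and apoc.date
# functions. Items must be hashable; APOC's result ordering may differ.


def contains_duplicates(items: List[Any]) -> bool:
    # True if at least one item occurs more than once
    return len(set(items)) < len(items)


def duplicates(items: List[Any]) -> List[Any]:
    # Each repeated item once, in order of first appearance
    counts = Counter(items)
    result: List[Any] = []
    for item in items:
        if counts[item] > 1 and item not in result:
            result.append(item)
    return result


def duplicates_with_count(items: List[Any]) -> List[Dict[str, Any]]:
    # Repeated items paired with how often they occur
    counts = Counter(items)
    return [{"item": item, "count": counts[item]} for item in duplicates(items)]


def occurrences(items: List[Any], item: Any) -> int:
    # How many times `item` appears in the list
    return items.count(item)


# Milliseconds per unit; these unit names are assumptions for the sketch
MILLIS_PER_UNIT = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def date_convert(value: int, unit: str, to_unit: str) -> int:
    # Whole-unit conversion; any remainder is dropped (floored)
    return value * MILLIS_PER_UNIT[unit] // MILLIS_PER_UNIT[to_unit]


def date_add(time: int, unit: str, add_value: int, add_unit: str) -> int:
    # Express the added amount in the time's own unit, then add it
    return time + date_convert(add_value, add_unit, unit)


if __name__ == "__main__":
    data = ["a", "b", "a", "c", "b", "a"]
    print(contains_duplicates(data))    # True
    print(duplicates(data))             # ['a', 'b']
    print(duplicates_with_count(data))  # [{'item': 'a', 'count': 3}, {'item': 'b', 'count': 2}]
    print(occurrences(data, "a"))       # 3
    print(date_convert(7_200_000, "ms", "h"))  # 2
    print(date_add(0, "s", 2, "h"))     # 7200
```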

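For example, here is how cosineSimilarity can compare each employee's skill proficiencies with those required for a role:

MATCH (p1:Employee)
MATCH (p2:Role {name:'Role 1-Analytics Manager'})
MATCH (sk:Skill)<-[y:REQUIRES_SKILL]-(p2)
// HAS_SKILL is an assumed relationship type linking employees to their skills
OPTIONAL MATCH (p1)-[x:HAS_SKILL]->(sk)
WITH p1, p2,
     collect(coalesce(x.proficiency, 0.0)) as xprof,
     collect(coalesce(y.proficiency, 0.0)) as yprof
RETURN p1.name as name,
       apoc.algo.cosineSimilarity(xprof, yprof) as cosineSim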
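The similarity measures used in the query boil down to a few lines of arithmetic. The Python sketch below is a hypothetical model, not APOC's code: cosine similarity is the dot product divided by the product of the vector norms, and euclideanSimilarity is assumed here to be 1 / (1 + distance), one common way to turn a distance into a score. Because an employee with none of the required skills produces an all-zero vector, the sketch returns 0.0 in that case, which is also an assumption.

```python
import math
from typing import Sequence


def _check(x: Sequence[float], y: Sequence[float]) -> None:
    # Both vectors must describe the same skills in the same order
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Dot product divided by the product of the vector norms."""
    _check(x, y)
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    # Assumption: an all-zero vector yields 0.0 here instead of an error or NaN
    return dot / norms if norms else 0.0


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    _check(x, y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def euclidean_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    # Assumed definition: distance 0 maps to 1.0, large distances approach 0
    return 1.0 / (1.0 + euclidean_distance(x, y))


if __name__ == "__main__":
    employee = [3.0, 0.0, 4.0]  # made-up proficiencies (xprof)
    role = [3.0, 2.0, 4.0]      # made-up required proficiencies (yprof)
    print(round(cosine_similarity(employee, role), 3))  # 0.928
    print(euclidean_distance(employee, role))           # 2.0
    print(euclidean_similarity(employee, role))
```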

Florent Biville

Florent Biville added a feature to the procedure compiler that automatically generates tabular information about procedures and functions for inclusion in the documentation. That includes the searchable overview table at the beginning of the docs.

Tomaz Bratanic

Tomaz Bratanic improved the Gephi streaming capability by adding support for a weight property. He also wrote a really nice blog post about it.

The Larus Team

I’m also very happy to announce that our partner Larus BA from Venice, Italy, will support me going forward in working on APOC in a more focused manner. With the help of their team, we will take care of the open issues and feature requests and also add new cool stuff to APOC. They have already addressed a number of issues that are included in this release, for example honoring Neo4j’s import directory configuration, handling Elasticsearch scroll results, and following redirects when loading from files.

Michael Hunger 

I spent some time bugfixing (GraphML export, TTL, setting array properties, more robust startup). I also worked on improving the documentation; there are now independent versions of the docs published for the different versions.

Something I had wanted for a long time was to improve the performance of apoc.periodic.iterate, which is used for managing large-scale updates or data creation with batched transactions. If you now provide iterateList:true, it executes the inner statement only once per batch, prepending an UNWIND over the batch's rows. Prefixing your inner statement with WITH {foo} AS foo for each return value is also no longer necessary.

For queries that may run into conflicts, you can now set a number of retries, for instance retries:5. See also my blog post about performant updates with Cypher.
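The effect of iterateList and retries is easiest to see by counting how often the inner statement runs. The following is a toy Python model with hypothetical names, not the procedure's implementation, and it does not talk to Neo4j: each batch stands for one transaction, iterate_list=True runs the inner step once per batch on the whole list, and a batch that raises a simulated conflict is retried up to retries times.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

# Toy model of apoc.periodic.iterate's batching; all names are hypothetical.
# Each batch stands for one transaction.


def make_batches(rows: Iterable[Any], batch_size: int) -> Iterator[List[Any]]:
    # Group the outer statement's rows into lists of at most batch_size
    batch: List[Any] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def periodic_iterate(
    rows: Iterable[Any],
    run_inner: Callable[[List[Any]], None],
    batch_size: int = 1000,
    iterate_list: bool = False,
    retries: int = 0,
) -> Dict[str, int]:
    inner_executions = 0
    failed_batches = 0
    for batch in make_batches(rows, batch_size):
        # iterate_list: one inner call on the whole batch (like a prepended UNWIND);
        # otherwise one inner call per row
        units = [batch] if iterate_list else [[row] for row in batch]
        for attempt in range(retries + 1):
            try:
                for unit in units:
                    run_inner(unit)
            except RuntimeError:
                # Simulated conflict: the batch is rolled back, retried if attempts remain
                if attempt == retries:
                    failed_batches += 1
                continue
            inner_executions += len(units)
            break
    return {"inner_executions": inner_executions, "failed_batches": failed_batches}


if __name__ == "__main__":
    def noop(unit: List[Any]) -> None:
        pass

    print(periodic_iterate(range(2500), noop, batch_size=1000))
    # {'inner_executions': 2500, 'failed_batches': 0}
    print(periodic_iterate(range(2500), noop, batch_size=1000, iterate_list=True))
    # {'inner_executions': 3, 'failed_batches': 0}
```

With 2,500 rows and batches of 1,000, the per-row mode runs the inner step 2,500 times while the list mode runs it three times, which is where the speedup of the real procedure is expected to come from.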

For quite a while I had wanted to add json-path support to APOC's load.json procedure and the various JSON functions. This now allows you to reach into a JSON document and pull out only the data you’re interested in:

Counting question authors from Stack Overflow with json-path:

WITH "http://bit.ly/so_neo4j" AS url
CALL apoc.load.json(url,'$.items[*].owner.display_name') YIELD value
UNWIND value.result as name
RETURN name, count(*)
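To illustrate what the json-path argument selects, here is a small Python sketch that supports only the '$', '.key', and '[*]' (or '.*') parts of json-path and then counts the selected names the way RETURN name, count(*) does. The select helper and the sample document are made up for illustration; the real procedure supports much more of the json-path syntax.

```python
import json
from collections import Counter
from typing import Any, List


def select(doc: Any, path: str) -> List[Any]:
    """Hypothetical helper for a tiny json-path subset:
    '$' (root), '.key' (child) and '[*]' or '.*' (all elements/values)."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    # Normalize '[*]' to '.*' and split into path steps
    tokens = [t for t in path[1:].replace("[*]", ".*").split(".") if t]
    current = [doc]
    for token in tokens:
        selected: List[Any] = []
        for node in current:
            if token == "*":
                if isinstance(node, list):
                    selected.extend(node)
                elif isinstance(node, dict):
                    selected.extend(node.values())
            elif isinstance(node, dict) and token in node:
                selected.append(node[token])
        current = selected
    return current


if __name__ == "__main__":
    # Made-up sample shaped like the items/owner/display_name fields in the query
    sample = json.loads("""{"items": [
        {"title": "q1", "owner": {"display_name": "alice"}},
        {"title": "q2", "owner": {"display_name": "bob"}},
        {"title": "q3", "owner": {"display_name": "alice"}}]}""")
    names = select(sample, "$.items[*].owner.display_name")
    # Equivalent of RETURN name, count(*)
    print(Counter(names).most_common())  # [('alice', 2), ('bob', 1)]
```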

Bitwise operations were turned into a function. I added apoc.text.format, apoc.text.lpad, and apoc.text.rpad, as well as new functions for creating virtual nodes and relationships. Some missing procedures for updating and removing labels, properties, and relationships were also added. I also added support for gzipped streams to load CSV and load XML. In the future we want to add more protocols here, e.g. “hdfs://”, and allow URLs to follow redirects, so stay tuned.
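The new padding and formatting functions are easy to picture with a Python sketch. The helpers below are hypothetical stand-ins: lpad and rpad pad with the first character of the delimiter and, in this sketch, leave strings that are already long enough unchanged (an assumption), and text_format uses Python's printf-style % operator on the assumption that apoc.text.format takes printf-style format strings.

```python
from typing import Optional, Sequence

# Hypothetical stand-ins for apoc.text.lpad/rpad/format; they only
# illustrate the intended results and are not APOC's implementation.


def lpad(text: Optional[str], count: int, delim: str = " ") -> Optional[str]:
    """Left-pad text to `count` characters with the first char of delim."""
    if text is None:
        return None
    fill = delim[0] if delim else " "
    # str.rjust leaves strings that are already long enough unchanged
    return text.rjust(count, fill)


def rpad(text: Optional[str], count: int, delim: str = " ") -> Optional[str]:
    """Right-pad text to `count` characters with the first char of delim."""
    if text is None:
        return None
    fill = delim[0] if delim else " "
    return text.ljust(count, fill)


def text_format(template: str, params: Sequence[object]) -> str:
    """printf-style formatting of params into template."""
    return template % tuple(params)


if __name__ == "__main__":
    print(lpad("42", 5, "0"))                           # 00042
    print(rpad("ab", 4, "-"))                           # ab--
    print(text_format("%s has %d items", ["cart", 3]))  # cart has 3 items
```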

If you have any feedback on existing functionality, bug reports, or feature requests, please let us know by opening an issue in the repository.

And if you like APOC, please don’t forget to star it on GitHub.


Published at DZone with permission of Michael Hunger, DZone MVB. See the original article here.

