The Power Behind the Paradise Papers
Neo4j has powered the Paradise Papers release. Read on to see how journalists leveraged a graph database to quickly sift through the mountain of data they uncovered.
Using Neo4j, the International Consortium of Investigative Journalists (ICIJ) has built upon its Pulitzer Prize-winning investigation of 2016, the Panama Papers, and has begun to add politicians featured in early Paradise Papers reports to its Offshore Leaks Database.
The new 1.4 TB of data (13.4 million documents) includes information leaked from trust company Asiaciti and from Appleby, a 100-year-old offshore law firm specializing in tax havens. The files were obtained by German newspaper Süddeutsche Zeitung and shared with the Washington, D.C.-headquartered ICIJ, a network of independent reporting teams around the world.
As in previous investigations, Neo4j plays a key role in revealing the connections between the wealthy, their money, and the taxation-friendly countries in which it resides.
The reason? Graph databases excel at managing highly connected data and complex queries.
Instead of storing data in tables the way a relational database does, graph databases use nodes, properties, and relationships to define and store data. That makes them well suited to analyzing the connections between records, and it lets journalists “follow the money” more easily than ever.
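To make the node, property, and relationship idea concrete, here is a minimal sketch in plain Python. It is not Neo4j and it is not the ICIJ data model: the `PropertyGraph` class, the relationship types, and every name in the sample data are hypothetical and exist only to illustrate how a chain of connections can be followed from a person to a jurisdiction.

```python
from collections import defaultdict, deque


class PropertyGraph:
    """Toy in-memory property graph (illustrative only, not Neo4j)."""

    def __init__(self):
        self.nodes = {}                    # node id -> dict of properties
        self.outgoing = defaultdict(list)  # node id -> [(rel type, target id, properties)]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_relationship(self, source, rel_type, target, **properties):
        self.outgoing[source].append((rel_type, target, properties))

    def find_path(self, start, goal):
        """Breadth-first search: return one shortest chain of hops, or None."""
        queue = deque([(start, [])])
        visited = {start}
        while queue:
            node, path = queue.popleft()
            if node == goal:
                return path
            for rel_type, target, _props in self.outgoing[node]:
                if target not in visited:
                    visited.add(target)
                    queue.append((target, path + [(node, rel_type, target)]))
        return None


# Entirely fictional sample data, invented for this example.
graph = PropertyGraph()
graph.add_node("person_a", label="Officer", name="Person A")
graph.add_node("shell_co", label="Entity", name="Shell Co Ltd")
graph.add_node("trust_x", label="Entity", name="Trust X")
graph.add_node("haven", label="Jurisdiction", name="Example Island")
graph.add_relationship("person_a", "DIRECTOR_OF", "shell_co", since=2010)
graph.add_relationship("shell_co", "SHAREHOLDER_OF", "trust_x")
graph.add_relationship("trust_x", "REGISTERED_IN", "haven")

# "Follow the money": walk the relationships from a person to a jurisdiction.
path = graph.find_path("person_a", "haven")
for source, rel_type, target in path:
    print(f"{graph.nodes[source]['name']} -[{rel_type}]-> {graph.nodes[target]['name']}")
```

In a relational schema the same question would typically need one join per hop. Here each hop is a direct lookup of a node's outgoing relationships, which is the property that graph databases are built around.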
Unprecedented volumes of highly connected data
“Most of the leaks we get are not structured since they are raw documents. With the Paradise Papers, those documents represented 1.4 TB of data and were gathered from different sources. Putting them in a single database was a challenge for us. With Neo4j and [visualization tool] Linkurious, and after a few weeks of research, we were able to propose to our 382 journalists a way to explore the data and also to share visualizations from stories they were working on. It’s surprising how intuitive a graph database can be for non-tech savvy people. Thanks to this approach, we could both investigate and prepare the future releases.”
“It’s a revolutionary discovery tool that’s transformed our investigative journalism process,” she says, “because relationships are all important in telling you where the criminality and secrecy lies, who works with whom, and so on. Understanding relationships at huge scale is where graph techniques excel. At least 11.5m documents, and far larger than any data leaks we have investigated before, we needed a technology that could handle these unprecedented volumes of highly connected data quickly, easily and efficiently.”
“We also needed an easy-to-use and intuitive solution that didn’t require the intervention of any data scientist or developers, so that journalists around the globe would work with the data, regardless of their technical abilities. Linkurious Enterprise was the best platform to explore this data and to share insights in a secure way. Using the Linkurious graph visualization platform with Neo4j is a powerful combination.”
According to Neo4j Co-Founder and CEO, Emil Eifrem:
“Whatever else we can be sure of as the Paradise Papers’ investigation unfolds, it’s only with world-class tools like Neo4j and Linkurious that world-class investigation of vast and complex datasets like this can happen in our Age of Connections. Graph databases are the only option when trying to make sense of the vast terabytes of connected data that we are producing more and more of, and they are an essential tool for international agencies, governments, financial services and security firms trying to uncover the truth.”
Stay Tuned for More Coverage of the Paradise Papers
In the coming days and weeks, the Neo4j team will continue to unveil how graph technology powered the Paradise Papers investigation, including an in-depth look at the ICIJ data model with example queries, graph visualizations, and more.
In the meantime, keep following the ICIJ’s Paradise Papers coverage, which explores the political and economic dimensions of the investigation as they unfold.
Published at DZone with permission of Bryce Merkl Sasaki, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.