November 13th, 2015 - A day to remember
Just over two weeks ago, we remembered the sad anniversary of one of the most atrocious and vile terrorist attacks that our generation has seen. It's easy to forget many things in our daily rat race, but I don't think I will easily forget this video, which was all over the internet in the hours and days after the attack on the Bataclan concert hall in Paris:
All it takes is a drop of empathy and humanity to understand the horror that these victims went through. The sound of the one person shouting "Oscar .... Oscar... Oscar..." just keeps on ringing through my head.
Many things have happened since that attack. Belgium had its own attack last March, and then France had another terrible event last July in Nice - the past 12 months were pretty terrible in terms of terrorist attacks in my part of the world.
But we knew that already - so why this blogpost series? And what does it have to do with the wonderful world of graphs that has been the topic of this blog? Well - here's what: my daily newspaper "De Standaard" in Belgium ran an article on the anniversary of the Paris attack, and a very elaborate online edition that really sparked my interest. You can find it over here:
If you scroll down that page, you will very quickly see a very interesting piece of data visualisation:
Yes! A Network! A graph explaining all of the people, locations and transportation vehicles used by the terrorist cell that planned and executed the Paris attack - and that could be linked very easily to the Brussels attack a few months later. Interesting. So of course I had to try to get this data into Neo4j and learn some more about it with some interactive querying and exploration.
Getting the Paris Terrorist Attack Network into Neo4j
In order to do that, I of course had to find the data in some structured format. So I spent some time looking around, and finally asked myself the question: where does De Standaard get the data from? So I started looking at the HTML source of the web page, and very quickly spotted that the network visualization above was rendered using d3.js. So let's look for d3 in the page source:
Ah, there it is: there's a reference to a piece of JavaScript code that is being loaded from http://www.standaard.be/extra/static/d3/201611/bataclan/js/script.js, which is doing the actual rendering. Let's open that script in our browser:
Eureka! There it is: the reference to the data file that is feeding the d3.js visualization: a .json file that is being loaded from http://cdn2.standaard.be/extra/static/d3/201611/bataclan/data/graph.json, and which looks like this:
That looks very doable: there's a section for the "nodes" of the network visualization, and then at the bottom there's also a list of "links" that represent the relationships between the nodes. It's not an ideal candidate for a Neo4j graph (we'll see that later as well when we refactor parts of the graph), but it's a great starting point.
Loading into Neo4j using APOC
As you may know by now from reading this blog, the Swiss Army knife of Neo4j is called APOC these days. You can find the manual for these procedures over here, and download the .jar files for your 3.0.x server from over here. If you are like me, however, and like to play around with the latest and greatest beta version of Neo4j, then you should use this version for 3.1.x. Why? Well, because there is a very nice procedure in there to load JSON source documents and then manipulate them in Cypher.
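Before loading anything, it can help to peek at the document first. The query below is a minimal sketch, assuming the APOC procedures are installed and the URL above is still reachable. It only reads the JSON and writes nothing to the database; the aliases nodeCount, linkCount, sampleNode and sampleLink are just names I picked for illustration.

```cypher
//read the JSON document and summarise it, without creating anything
WITH "http://cdn2.standaard.be/extra/static/d3/201611/bataclan/data/graph.json" AS url
CALL apoc.load.json(url) YIELD value
RETURN size(value.nodes) AS nodeCount,
       size(value.links) AS linkCount,
       value.nodes[0] AS sampleNode,
       value.links[0] AS sampleLink;
```

Looking at one sample node and one sample link is a quick way to confirm which property holds the identifier that the links refer to.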
Here's how I load the above .json file into Neo4j:
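//set up an index on the ID property, so that the lookups below are fast
CREATE INDEX ON :Node(ID);
//load the graph
WITH "http://cdn2.standaard.be/extra/static/d3/201611/bataclan/data/graph.json" AS url
CALL apoc.load.json(url) YIELD value
//create one node per entry in the "nodes" array, copying all of its properties
UNWIND value.nodes AS nodes
CREATE (n:Node)
SET n = nodes
//collapse back to a single row, otherwise the links get processed once per node
WITH DISTINCT value
UNWIND value.links AS rels
//look up both ends of each link by their ID and connect them
MATCH (n1:Node {ID: rels.source}), (n2:Node {ID: rels.target})
MERGE (n1)-[r:RELATED_TO]->(n2)
RETURN n1,r,n2;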
As you can see, it's really easy:
- load the json from the url
- unwind the "nodes" part of the json as a collection of nodes
- set the properties inside those nodes to the values found in the json
- unwind the "links" part of the json as a collection of relationships
- lookup the starting-node and ending-node of the relationships by their ID-property
- merge the relationship into the pattern of existing nodes.
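To double-check that a load like this did what the steps above describe, a quick count of nodes and relationships can be sketched as follows. This is just an illustrative check, assuming the :Node label and RELATED_TO relationship type used in the load query; the aliases nodeCount and relCount are my own.

```cypher
//count the loaded nodes first
MATCH (n:Node)
WITH count(n) AS nodeCount
//OPTIONAL MATCH still returns a row (with relCount = 0) if no relationships were created
OPTIONAL MATCH (:Node)-[r:RELATED_TO]->(:Node)
RETURN nodeCount, count(r) AS relCount;
```

If relCount comes back as 0 while nodeCount does not, the most likely cause is that the source and target values in the links did not match the ID property on the nodes.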
The result is small, and looks like this:
So that was really easy. Now we can do some cleaning and pruning of the data - which is the topic of our next blog-post, which we will publish soon.
Hope this is useful - check back soon for more!
Rik