Wednesday 30 November 2016

Exploring the Paris Terrorist Attack network - part 2/3

In part 1 of this blogpost series, we got the basic Paris Terrorist Attack Network loaded into Neo4j. It looked like this:
A couple of things annoyed me about this graph:

  1. First, the relationships are all "bidirectional": every connection is stored twice, once in each direction, which really clutters the visualisation. In Neo4j, relationships are always directed but can be traversed in either direction, so storing both directions is awkward and unnecessary.
  2. This graph was originally made by the De Standaard newspaper in Flanders, Belgium, so it was created in Dutch. A couple of the key concepts (type of node, status of the node) can be easily and meaningfully translated, which makes the dataset more fun to work with.
  3. The graph was not "labeled", and therefore lacked some essential structural elements that would allow for fun manipulation in the Neo4j Browser. 
  4. The relationships did not really say anything about the type of relationship. 
Let's tackle these one by one.

From undirected to directed

So let's try to refactor this basic Neo4j graph a bit so that we can work with it more easily. First, let's get rid of the "bidirectional" relationships. Let's remove half of them using this Cypher query:


//remove duplicate rels
MATCH (n1:Node)-[r1:RELATED_TO]->(n2:Node)-[r2:RELATED_TO]->(n1)
WHERE id(n1)>id(n2)
DELETE r2
RETURN n1,r1,n2;
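
A quick sanity check can confirm that the clean-up worked. The query below is a minimal sketch, not part of the original workflow: it counts the node pairs that are still connected in both directions, and should return 0 if every pair was originally stored exactly once in each direction.

```cypher
// Sanity check (illustrative): count pairs that still have a RELATED_TO
// relationship in both directions after the clean-up.
MATCH (n1:Node)-[:RELATED_TO]->(n2:Node)-[:RELATED_TO]->(n1)
WHERE id(n1) > id(n2)
RETURN count(*) AS remainingBidirectionalPairs;
```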


Then our graph visualisation in the Browser immediately looks much nicer and less cluttered:


Next: the translations.

Translating using CASE

In the next two queries, we will be using Cypher's CASE expressions to translate some of the core and simple concepts from Dutch to English:

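//translate type and status from Dutch to English
MATCH (n:Node)
SET n.type =
CASE
WHEN n.type="persoon" THEN "Person"
WHEN n.type="plaats" THEN "Location"
WHEN n.type="transport" THEN "Transportation"
//keep any other value: without ELSE, CASE returns null and the property is removed
ELSE n.type
END
WITH n
//keep an existing name if the query is run again after label has been removed
SET n.name = coalesce(n.label, n.name)
REMOVE n.label
WITH n
SET n.status =
CASE
WHEN n.status="gevangenis" THEN "Jailed"
WHEN n.status="dood" THEN "Deceased"
ELSE n.status
END;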
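
To see what the translation did, it can help to list the combinations of type and status that are now in the graph. This is just an illustrative check, assuming the nodes still carry the Node label from part 1; a null in either column means the property is missing on those nodes.

```cypher
// Illustrative check: which type/status combinations exist after translating?
MATCH (n:Node)
RETURN n.type AS type, n.status AS status, count(*) AS nodes
ORDER BY nodes DESC;
```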

Easy! Next: "colouring" the graph by adding labels to the nodes and more type information to the relationships:

Adding labels to the graph 

As you may have seen above and in the previous blogpost, the original .json file actually holds quite a bit of structural information about the nodes. Every node has a "type" property, which specifies whether the entity is a Person, a Location, or a means of Transportation. So why not convert that type property into a Neo4j Label? We could do something like

//colour the nodes without parameters as labels
MATCH (n:Node {type:"Person"})
SET n:Person;
MATCH (n:Node {type:"Location"})
SET n:Location;
MATCH (n:Node {type:"Transportation"})
SET n:Transportation;
but frankly that would be pretty ugly. What would we do if we had missed one of the values of the type property? That's why I chose to use another APOC procedure to do this in a parametrized way:

//colour the nodes
MATCH (n:Node)
CALL apoc.create.addLabels( id(n), [ n.type ] ) YIELD node
RETURN n;
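
To check which labels ended up on the nodes, a small query like this one can be used. It is a sketch for verification only and relies on nothing beyond the built-in labels() function.

```cypher
// Illustrative check: how many nodes carry each combination of labels?
MATCH (n:Node)
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC;
```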

So then we very quickly get this result:

You will notice that the latest and greatest edition of the Neo4j browser allows you to specify little icons to denote the meaning of the nodes in the graph, which is a great addition.

Adding relationship info to the graph

Last but not least, I thought it would be useful to understand a bit more about the nature of the relationships, but that information was not readily available in the .json file that the newspaper provided. So I did a bit of manual labour with that other Swiss Army knife of data wrangling, Google Sheets. Here's what I did:

  • I imported the JSON into a Google Sheet using the technique described here
  • I added the script to my Google Sheet
  • I called the function in the "D3 - nodes" and "D3 - relationships" worksheets, and loaded the info there
  • I copied the "D3 - relationships" sheet into a "Neo4j - relationships" sheet so that I could do some editing on it.

You can find the entire Google sheet over here.

Once the data is in there, we can use our trusted LOAD CSV process to bring it into the graph. At that point I basically had two options:

1. Adding new relationship types:

Similar to what we did above with labels, we can use an APOC procedure for this. The query below reads the data from the CSV and, based on the csv.RELTYPE column, creates new relationships with that specific type:

//colour the rels with reltypes
LOAD CSV WITH HEADERS FROM "https://docs.google.com/a/neotechnology.com/spreadsheets/d/1nN0Ba-1Eoy_-VfnfFAMi-HhVT0TPOpFz94Gvm5iq3Ss/export?format=csv&id=1nN0Ba-1Eoy_-VfnfFAMi-HhVT0TPOpFz94Gvm5iq3Ss&gid=572100379" AS csv
WITH csv
WHERE not csv.RELTYPE="??"
MATCH (n1:Node {ID: toInt(csv.Source)}), (n2:Node {ID: toInt(csv.Target)})
CALL apoc.create.relationship(n1,csv.RELTYPE,null, n2) YIELD rel
RETURN n1, rel, n2;


However, when I looked at that I found it a bit confusing:
It actually added back some of the clutter that we had removed by removing the bi-directional relationships. So I thought I would take another approach: we can also "colour" the relationships by adding properties to the relationships instead of adding new relationship types.
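
If you ran option 1 and want to get back to a graph with only the RELATED_TO relationships before trying option 2, something like the following should do it. This is a hedged sketch rather than a step from the original workflow, and it assumes that none of the RELTYPE values in the sheet is itself called RELATED_TO.

```cypher
// Illustrative clean-up: remove the typed relationships created in option 1.
// Assumption: every relationship added there has a type other than RELATED_TO.
MATCH (:Node)-[r]->(:Node)
WHERE type(r) <> "RELATED_TO"
DELETE r;
```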

2. Adding properties to the relationships

If we want to store this info as properties on the existing RELATED_TO relationships instead, this is how we do it:

//colour the rels with relproperties
LOAD CSV WITH HEADERS FROM "https://docs.google.com/a/neotechnology.com/spreadsheets/d/1nN0Ba-1Eoy_-VfnfFAMi-HhVT0TPOpFz94Gvm5iq3Ss/export?format=csv&id=1nN0Ba-1Eoy_-VfnfFAMi-HhVT0TPOpFz94Gvm5iq3Ss&gid=572100379" AS csv
WITH csv
WHERE not csv.RELTYPE="??"
MATCH (n1:Node {ID: toInt(csv.Source)})-[rel:RELATED_TO]-(n2:Node {ID: toInt(csv.Target)})
SET rel.type = csv.RELTYPE
RETURN n1, rel, n2;
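
To see how the new property is distributed, you could run a small aggregation like the one below. It is only an illustrative check; a null relType would point at relationships that had no matching row in the sheet, or a row marked "??".

```cypher
// Illustrative check: how many RELATED_TO relationships have each rel.type value?
MATCH (:Node)-[r:RELATED_TO]->(:Node)
RETURN r.type AS relType, count(*) AS rels
ORDER BY rels DESC;
```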

Then we can also visualise that data by using the relationship property as the visualisation caption:
And that's it!

So now we are ready for querying this graph. We'll do that in the 3rd and final blogpost, which I will publish shortly.

Hope this is useful to you.

Cheers

Rik

3 comments:

  1. I have run the scripts on a new Neo4j 3.1 database and my results look nothing like yours.
    Half of the names are missing and no icons are displayed. Is there some detail missing from the article?
    Thanks for the great article BTW.

    Replies
    1. I am trying this now :) ... will report back...

    2. Hi there Stub... I just reran all the queries on a vanilla 3.1.1 server with apoc 3.1.0.3 (https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/tag/3.1.0.3) and it all works fine. Could it be that you have the Neo4j browser visualisation wrongly configured?

      For the basic colours etc you can follow the instructions as on https://neo4j.com/developer/guide-neo4j-browser/#_configuration ... for the Icons, you need to enable the "experimental features" in the browser configuration, and then they will be configurable from the browser UI... just like you can associate colours with a label, you can also associate basic icons...

      hth

      Rik
