Friday 2 December 2016

Exploring the Paris Terrorist Attack network - part 3/3

Previously on this blog, I started writing about how we could get some of the data published by a local Belgian newspaper, De Standaard, on the Paris Terrorist Attack network into Neo4j. In
  • Part 1, we loaded the raw JSON data into Neo4j, and then in
  • Part 2, we cleaned up some of the data to make it easier to query in Neo4j.
So that's where we are. To wrap things up, I want to illustrate some of the results and queries in Neo4j around some of the most interesting figures in this terrorist network. I started my explorations around a widely reported terrorist, the Belgian-born French national Salah Abdeslam.


So let's take a look at Salah in Neo4j.


Querying for Salah

Here's the simplest Cypher statement exploring Salah's node and immediate surroundings.

//finding Salah
MATCH (p:Person)-[r]-()
WHERE p.name CONTAINS "Salah"
RETURN p, r;

The result looks like this in the Neo4j browser:



Pretty simple, and of course the Browser allows me to interactively explore the graph step by step.
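
If you prefer a table over a picture, a variation on the same query can list each neighbour next to the type of relationship that links it to Salah. This is only a sketch: it assumes that the neighbouring nodes also carry a name property, which may not hold for every node in this dataset.

```cypher
//listing Salah's direct neighbours as rows (assumes neighbours have a name property)
MATCH (p:Person)-[r]-(n)
WHERE p.name CONTAINS "Salah"
RETURN p.name AS person, type(r) AS relationship, labels(n) AS neighbourLabels, n.name AS neighbour
ORDER BY relationship, neighbour;
```

Nodes without a name simply show up as null in the last column.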

Pathfinding: links between two locations

Next, we'll do a simple pathfinding query to understand the links between two locations in this tiny little graph. Let's look at the links between Boedapest (sorry, Dutch spelling) and Bobigny. The first is a city that several members of the group travelled through on their way back from Syria to Europe, and the second is the location of a safehouse used by the group. Here's the query:

//Finding the shortest paths between two locations, plus their immediate surroundings
MATCH (l1:Location {name:"Boedapest"}), (l2:Location {name:"Bobigny"}),
p = allShortestPaths((l1)-[*]-(l2))
WITH l1, l2, p
//also fetch the direct neighbours of both locations
MATCH p1=(l1)-[r1]-(c1), p2=(l2)-[r2]-(c2)
RETURN p, p1, p2


It gives us this result:



And we see that our man Salah already plays a pivotal role in this network. That's why I started to play around with a couple of well-known graph algorithms using APOC.
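
The same shortest paths can also be read as rows instead of as a picture, which makes it easier to see exactly which nodes sit between the two locations. A minimal sketch, again assuming that every node on the path has a name property:

```cypher
//listing the shortest paths between the two locations as text
MATCH (l1:Location {name:"Boedapest"}), (l2:Location {name:"Bobigny"}),
p = allShortestPaths((l1)-[*]-(l2))
RETURN length(p) AS hops,
       [n IN nodes(p) | n.name] AS via,
       [r IN relationships(p) | type(r)] AS relationshipTypes;
```

Because allShortestPaths only returns paths of minimal length, every row has the same value for hops.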

Betweenness and PageRank using APOC

The Awesome Procedures on Cypher (APOC) library has some really nice hooks into very well-known graph algorithms that can give us a good feel for the importance and significance of individual parts of the graph. Take a look at them over here, but in this example we will specifically focus on the betweenness and PageRank scoring techniques to evaluate the structural characteristics of the network.

Betweenness scoring

Here's how we go about the betweenness scoring using APOC:

//calculate the betweenness using APOC
MATCH (node:Person)
WITH collect(node) AS nodes
CALL apoc.algo.betweenness(['RELATED_TO'],nodes,'BOTH') YIELD node, score
RETURN node.name, score
ORDER BY score DESC
LIMIT 10

The result is kind of what we expected:

Our man Salah is VERY much "between" the different subgroups of the Paris Terrorist Network.
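
Betweenness looks at how many shortest paths between other nodes run through a given node, so a high score points to a bridge between subgroups rather than just a well-connected person. One way to see the difference is to compare it with a plain degree count, which needs no procedures at all. A rough sketch, assuming the same RELATED_TO relationship type as in the query above:

```cypher
//count direct RELATED_TO links per person (degree), for comparison with betweenness
MATCH (p:Person)-[r:RELATED_TO]-()
RETURN p.name AS name, count(r) AS degree
ORDER BY degree DESC
LIMIT 10
```

Someone who ranks high on betweenness but lower on degree connects groups that would otherwise be far apart.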

PageRank scoring

Then we look at the famous PageRank score, also used by Google and many others to evaluate the relevance or significance of network elements. Here's the APOC-based query:

//calculate pagerank using APOC
MATCH (node:Person)
WITH collect(node) AS nodes
// compute over relationships of all types
CALL apoc.algo.pageRank(nodes) YIELD node, score
RETURN node.name, score
ORDER BY score DESC

Then we get this result:



Again, highlighting the importance of our "spider in the web", Salah.

I hope this gave you some ideas as to how to further interact with this dataset. It's tiny in terms of size, but really interesting in terms of content and trying to understand the way these groups work.

I hope this was interesting. This concludes the third and last part of this blog post series. All the data and queries for this series are available on GitHub, as always.

Cheers

Rik
