Wednesday, 20 July 2016

Graphing the Tour de France - part 3/3

In the past two blog posts I created and imported some nice Tour de France 2016 data. It's a small dataset, for sure, and this is by no means a realistic graph application - but perhaps we can still have some fun exploring the data with some Cypher queries. That's what we'll try now. I have put all of the example queries together in this gist, so please feel free to play around with it :) ... let me take you through it.

Is the model really there?

First and foremost, let's verify the model that we wanted to put in place, with yet another AAPOC (Awesome APOC). We thought we were going to get this model:

And yes indeed - we got the same thing (rotated 90 degrees). If we call the following APOC procedure
CALL apoc.meta.graphSample(1000)
we immediately get a view of the model of the dataset in our database. This procedure works by sampling the dataset, which of course is not really needed for a dataset this small. We could also run
CALL apoc.meta.graph
which does not sample (it takes the whole graph into account) and gives the same result.
I have been tempted to add one more thing to the model (a JerseyType) - but for all intents and purposes everything we wanted to be there is there.
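To make the idea of a "meta graph" a bit more concrete, here is a minimal Python sketch of what such a summary boils down to: counting the distinct (start label, relationship type, end label) patterns. It is not how APOC implements it. The triples below are invented for illustration (RIDES_FOR in particular is a hypothetical relationship type and not necessarily part of the actual model), and taking the first N rows is a simplification of real sampling.

```python
from collections import Counter


def meta_graph(relationships, sample_size=None):
    """Summarise (start_label, rel_type, end_label) triples into a label-level schema."""
    rels = list(relationships)
    if sample_size is not None:
        # Simplification: "sampling" here just means looking at the first N rows.
        rels = rels[:sample_size]
    return Counter(rels)


# Hypothetical toy data, NOT the real Tour de France model.
toy_relationships = [
    ("Rider", "RIDES_FOR", "Team"),
    ("Rider", "RIDES_FOR", "Team"),
    ("Rider", "ON_PODIUM", "Stage"),
    ("Rider", "ON_PODIUM", "Stage"),
    ("Rider", "ON_PODIUM", "Stage"),
]

full_schema = meta_graph(toy_relationships)
sampled_schema = meta_graph(toy_relationships, sample_size=2)

print(dict(full_schema))
print(dict(sampled_schema))
```

In this toy version the sampled schema only sees the first pattern, which hints at why a full scan and a sampled scan can differ on larger, less uniform datasets.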

Some data exploration: subgraphs

One of the nicest things about the graph, I think, is that it's pretty easy and attractive to just explore the data based on some simple initial queries. I like to start by looking at some initial labels, and explore from there, like this for example:

//look at the stages subgraph
MATCH (n:Stage)-[r]-() RETURN n,r LIMIT 25
which returns
Or, for example, a two-hop exploration of the rider subgraph:
//look at the rider subgraph
MATCH (n:Rider)-[r*..2]-() RETURN n,r LIMIT 25
which returns

More data exploration: paths

Of course, one of the cooler types of queries that we can do in Neo4j is the path-oriented query, which allows us to explore unknown connections between different entities. So let's start by looking at some links between two riders, Greg Van Avermaet (of Belgium, of course) and Chris Froome (super-favorite for this year's victory).
match (r1:Rider), (r2:Rider),
p = allshortestpaths ((r1)-[*]-(r2))
where r1.fullname contains "Avermaet"
and r2.fullname contains "Froome"
return p
limit 10;
Immediately we can see the links: both of them had the Yellow jersey for a couple of days!
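For the curious, here is a rough Python sketch of what an "all shortest paths" search does conceptually: a breadth-first search that remembers every predecessor lying on a shortest route, and then walks back from the target. This is an illustration of the technique, not Neo4j's implementation, and the node names in the toy graph are made up.

```python
from collections import deque


def all_shortest_paths(adj, source, target):
    """Return every shortest path between source and target in an unweighted graph."""
    dist = {source: 0}
    parents = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                # First time we reach nxt: this is its shortest distance.
                dist[nxt] = dist[node] + 1
                parents[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                # Another equally short way to reach nxt.
                parents[nxt].append(node)
    if target not in dist:
        return []

    paths = []

    def walk(node, suffix):
        # Walk back from the target to the source along recorded parents.
        if node == source:
            paths.append([source] + suffix)
            return
        for parent in parents[node]:
            walk(parent, [node] + suffix)

    walk(target, [])
    return paths


# Hypothetical toy graph: two riders linked via two jerseys and one longer detour.
edges = [
    ("rider_a", "yellow"), ("rider_b", "yellow"),
    ("rider_a", "green"), ("rider_b", "green"),
    ("rider_a", "stage_x"), ("stage_x", "team_y"), ("team_y", "rider_b"),
]
adj = {}
for left, right in edges:
    adj.setdefault(left, []).append(right)
    adj.setdefault(right, []).append(left)  # undirected, like -[*]- in the query

paths = all_shortest_paths(adj, "rider_a", "rider_b")
for path in paths:
    print(" - ".join(path))
```

The longer detour via stage_x and team_y is ignored, because only the paths of minimal length are kept.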

Then we can of course look at links between different entities, like for example the link between two teams:
//note: the property name was lost from this listing; "name" is an assumption
match (t1:Team), (t2:Team),
p = allshortestpaths ((t1)-[*]-(t2))
where t1.name contains "ORICA"
and t2.name contains "Quick"
return p
limit 10;
Looking at this query result, we can immediately see that both teams have had some success with stage wins and "White" youth jerseys.

Finally, let's look at the links between a rider and a team:
//note: the team property name was lost from this listing; "name" is an assumption
match (r1:Rider), (t2:Team),
p = allshortestpaths ((r1)-[*]-(t2))
where r1.fullname contains "Froome"
and t2.name contains "Quick"
return p
limit 10;
And we can see there's a clear and unique link:

So now let's look at some other, more advanced queries.

More advanced, APOC queries

The new AAPOCs (Awesome APOCs, just as a reminder :) ) give us easy access to some other great graphy query patterns that can really help us. First of all, let's do some PageRanking using one of the graph algorithm APOCs.
//note: the returned expression was lost from this listing; returning the whole node here
match (t:Team)
with collect(t) as teams
call apoc.algo.pageRank(teams) YIELD node, score
return node, score
order by score desc
limit 10
This gives us a feel for the "rank" of the teams in the graph in terms of their connected importance as defined by PageRank:
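If you are wondering what a PageRank score actually measures, here is a minimal power-iteration sketch in Python. It is only meant to illustrate the idea (a node is important if important nodes point to it). It is not the APOC implementation, whose iteration count and score scaling may well differ, and the three-node graph is a made-up example.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank; every target must also be a key of out_links."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets a small base score (the "random jump").
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in out_links.items():
            if targets:
                # Split this node's rank evenly over the nodes it points to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: a and b both point to c, and c points back to a.
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(toy_links)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Node c ends up on top because two nodes point to it, a comes second because it is pointed to by the highly ranked c, and b (nobody points to it) only keeps the base score.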
Or, in the same category of graph algorithms, let's look at the Betweenness centrality. There's another AAPOC for that, so let's use it:
//note: the left-hand side of the modulo filter was lost from this listing; id(r) is an assumption
match (r:Rider)
WHERE id(r) % 2 = 0
with collect(r) as riders
call apoc.algo.betweenness(['ON_PODIUM','HAS_JERSEY'],riders,'BOTH') YIELD node, score
with node.fullname as name, score
where score > 0
return name, score
order by score desc
limit 10
As you can see we are filtering out the 0-scoring nodes, and get the following result:
Rockstar Rider Peter Sagan, FTW!
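Betweenness centrality counts how often a node sits on the shortest paths between other nodes. As a hedged illustration (this is a sketch of Brandes' algorithm for unweighted, undirected graphs, not what APOC runs internally, and its scores may be scaled differently), here is a small Python version on a made-up star-shaped graph, including the same "drop the zero scores" filter as in the query above.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph given as adjacency lists."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s to v
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the nodes farthest from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph every pair is counted twice, so halve the totals.
    return {v: total / 2 for v, total in score.items()}


# Hypothetical toy graph: b in the middle, with a, c and d attached to it.
toy_adj = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b"]}
centrality = betweenness(toy_adj)
non_zero = {v: s for v, s in centrality.items() if s > 0}
print(non_zero)
```

Only b survives the filter: it lies on the shortest path between each of the three pairs of outer nodes, while the outer nodes are never "in between" anything.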

Of course there are so many other things that we could query for - and this blog post is just a start. If nothing else, I hope I have shown you that even tiny little graph datasets can be FUN to explore with Neo4j!

Hope this was a fun and useful exercise - now, back to the bike!


