The Tube: A Great Graph
As you can easily imagine, or just plainly see from looking at any of the maps of the tube, the Underground really is a very sophisticated system, and can only be described as a very sophisticated graph. We always refer to it - in our Neo4j presentations - as the perfect example of how one-page graphs can easily represent and provide *insight* into complex systems ... without having to have a PhD in maths. Literally: almost everyone can use the tube - almost everyone can use a graph.
Finding a nice "tubular" dataset
Since we talk about this example all the time, and since I am indeed an avid, non-native tube-user, I thought it would be interesting to look at how I could fit the Tube system into a neo4j database. It took me a while, but of course the data is out there: this page links to this spreadsheet that has a very nice starting point. It contains the Line, the Direction, the Stations, the Distance between stations, and then 3 different time measurements between the stations.
Importing this into a neo4j database is really, really easy.
Creating a neo4j Tube database
First things first: the spreadsheet above is best converted into a .csv file. Easy peasy in Excel: the result is over here. Once we have that, we can use the ever so awesome neo4j-shell-tools (the 2.0 version is over here, in case you can't find it!) to import the data into a nice little graph model:
Kudos to Alistair Jones for making Arrows - it's actually very usable these days :)) ...
In other words: Stations have to be unique, are connected by one or more "Lines" in two directions, and the "Lines" have a "Direction" property (east, west, north, south...), a "Time" between stations property (which can be different in opposite directions!), and a "Distance" between stations property.
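To make that model a bit more tangible, here is a minimal sketch in plain Python (not Cypher, and not the actual import). The class names `Connection` and `TubeGraph` are hypothetical, the stations "A" and "B" are placeholders, and the numbers are made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """One directed hop between two stations on a given line."""
    line: str        # plays the role of the relationship type, e.g. "Circle"
    start: str       # station name
    end: str         # station name
    direction: str   # e.g. "east" or "west"
    distance: float  # distance between the two stations
    time: float      # travel time, which may differ per direction


class TubeGraph:
    """Stations are kept in a set, so each name can only exist once."""

    def __init__(self):
        self.stations = set()
        self.connections = []

    def connect(self, line, start, end, direction, distance, time):
        self.stations.update((start, end))
        self.connections.append(
            Connection(line, start, end, direction, distance, time)
        )


graph = TubeGraph()
# Made-up values: same pair of stations, two lines, two directions.
graph.connect("Circle", "A", "B", "east", 1.0, 2.0)
graph.connect("Circle", "B", "A", "west", 1.0, 2.5)
graph.connect("District", "A", "B", "east", 1.0, 2.0)
```

Two stations end up with three connections between them, which is exactly the "one or more Lines in two directions" idea.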
The import script for the .csv file is quite simple, as it completely leverages the new neo4j 2.0RC1 way of working:
- it uses a schema constraint to ensure that the stations are unique
- it uses the new Match-syntax (with property-matching in the pattern instead of in a where clause)
All in all it is very simple and effective. The resulting graph.db directory is over here.
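As a rough illustration of what such an import does (again in plain Python rather than the real neo4j-shell-tools script), the sketch below reads a .csv and only creates a station when it does not exist yet. The column names `Line`, `Direction`, `From`, `To`, `Distance` and `Time` are assumptions, and the rows are made up.

```python
import csv
import io

# Hypothetical column names and made-up rows; the real .csv may look different.
SAMPLE_CSV = """Line,Direction,From,To,Distance,Time
Circle,east,A,B,1.0,2.0
Circle,west,B,A,1.0,2.5
District,east,B,C,0.8,1.5
"""


def load_tube(csv_text):
    stations = {}       # name -> station record; dict keys keep names unique
    relationships = []  # (line, from_station, to_station, properties)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # "Get or create": an existing station is reused, never duplicated.
        for name in (row["From"], row["To"]):
            stations.setdefault(name, {"name": name})
        properties = {
            "direction": row["Direction"],
            "distance": float(row["Distance"]),
            "time": float(row["Time"]),
        }
        relationships.append((row["Line"], row["From"], row["To"], properties))
    return stations, relationships


stations, relationships = load_tube(SAMPLE_CSV)
print(sorted(stations))
print(len(relationships))
```

In the database itself the uniqueness is guaranteed by the schema constraint; here the dictionary keys play that role.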
Exploring the tube in the neo4j browser
Ever since its introduction at GraphConnect San Francisco, the neo4j browser has become my favourite place to play around with neo4j and cypher. One of its coolest features is the ability to apply stylesheets to your graph visualisations. So I wanted to apply this to my new tube-graph, and use the "official" tube-line colours in the browser.
Very quickly, I found the colours conveniently online:
LINE | TRUE HEXADECIMAL | WEB SAFE HEXADECIMAL |
---|---|---|
Bakerloo | #B36305 | #996633 |
Central | #E32017 | #CC3333 |
Circle | #FFD300 | #FFCC00 |
District | #00782A | #006633 |
Hammersmith and City | #F3A9BB | #CC9999 |
Jubilee | #A0A5A9 | #868F98 |
Metropolitan | #9B0056 | #660066 |
Northern | #000000 | #000000 |
Piccadilly | #003688 | #000099 |
Victoria | #0098D4 | #0099CC |
Waterloo and City | #95CDBA | #66CCCC |
So then all I had to do was to download the .grass file from the browser, and start editing the "relationship" sections. In the example below, the .Circle and .Central are the names of the relationship types "Circle" and "Central". Logical.
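If you would rather generate those sections than type them, something like the sketch below could do it. It assumes that a rule looks like `relationship.<Type> { color: ...; }`; I am taking that shape and the `color` property name as assumptions, so check them against the .grass file your own browser exports. The helper name `grass_rules` is made up, and the two colours are the "true" hex values from the table above.

```python
# Two of the "true" hexadecimal colours from the table above.
LINE_COLOURS = {
    "Central": "#E32017",
    "Circle": "#FFD300",
}


def grass_rules(colours):
    """Build one style rule per relationship type (assumed .grass syntax)."""
    blocks = []
    for line, colour in sorted(colours.items()):
        blocks.append("relationship.%s {\n  color: %s;\n}" % (line, colour))
    return "\n\n".join(blocks)


print(grass_rules(LINE_COLOURS))
```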
You can download the full .grass file that I created from over here.
Nice: if I start exploring the surrounding tube network for "London Bridge" station, I quickly get a feel for the network:
But of course the real fun begins with the queries.
Exploring the Tube with Cypher
Obviously I don't have the technical skills - at all - to develop anything like a route planner for the London Underground. But: using the dataset that we just created, it's quite easy to see how it would be very doable to create something like that. Let's look at some of the queries that I created:
Show the different underground lines:
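The idea boils down to collecting the distinct relationship types. As a plain-Python illustration (not the Cypher query itself) over a handful of made-up connections between placeholder stations:

```python
# Made-up connections as (line, from_station, to_station) tuples.
connections = [
    ("Circle", "A", "B"),
    ("Circle", "B", "A"),
    ("District", "A", "B"),
    ("Central", "B", "C"),
]

# Every line appears once, however many connections it has.
lines = sorted({line for line, _, _ in connections})
print(lines)  # ['Central', 'Circle', 'District']
```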
Show the most densely connected underground station
With "densely connected" meaning the most different underground lines passing through it.
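Put differently: for every station, count the distinct lines on its connections and take the station with the highest count. A small Python sketch of that counting, with made-up connections and a hypothetical helper called `lines_per_station`:

```python
from collections import defaultdict

# Made-up connections as (line, from_station, to_station) tuples.
connections = [
    ("Circle", "A", "B"),
    ("Circle", "B", "A"),
    ("District", "A", "B"),
    ("Central", "B", "C"),
]


def lines_per_station(connections):
    """Map each station to the set of lines that touch it."""
    lines_at = defaultdict(set)
    for line, start, end in connections:
        lines_at[start].add(line)
        lines_at[end].add(line)
    return lines_at


lines_at = lines_per_station(connections)
# The "most densely connected" station has the most distinct lines.
busiest = max(lines_at, key=lambda station: len(lines_at[station]))
print(busiest, sorted(lines_at[busiest]))
```

Using a set per station matters: the same line running in both directions should only count once.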
And then you can drill into this really easily and explore some more:
And finally: pathfinding
Of course we can do some rudimentary pathfinding in Cypher. But it's rudimentary - and just included for fun. Let's say that I would want to go from Tower Hill to Southwark (one of the most tedious tube connections that I would take sometimes to get to our London office).
Anyone a bit familiar with London knows that this is "b*ll*cks", and no one would ever do that. The right thing (I think) to do is to take the District/Circle lines from Tower Hill to Blackfriars - and then just walk across the bridge to the office. Easy.
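One plausible reason a simple shortest-path search comes up with a route like that is that it counts hops, not minutes. I am assuming that is what happens here. The Python sketch below shows the difference on a made-up network with placeholder stations and invented travel times: a breadth-first search for the fewest hops versus a Dijkstra-style search for the least time.

```python
import heapq
from collections import deque

# Made-up network: station -> list of (neighbour, minutes).
network = {
    "A": [("X", 10.0), ("B", 2.0)],
    "X": [("D", 10.0)],
    "B": [("C", 2.0)],
    "C": [("D", 2.0)],
    "D": [],
}


def fewest_hops(network, start, goal):
    """Breadth-first search: ignores travel time, minimises hop count."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour, _ in network[path[-1]]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


def least_time(network, start, goal):
    """Dijkstra-style search: minimises the summed travel time."""
    heap = [(0.0, [start])]
    done = set()
    while heap:
        minutes, path = heapq.heappop(heap)
        station = path[-1]
        if station == goal:
            return minutes, path
        if station in done:
            continue
        done.add(station)
        for neighbour, cost in network[station]:
            if neighbour not in done:
                heapq.heappush(heap, (minutes + cost, path + [neighbour]))
    return None


print(fewest_hops(network, "A", "D"))  # ['A', 'X', 'D']
print(least_time(network, "A", "D"))   # (6.0, ['A', 'B', 'C', 'D'])
```

The two-hop route takes 20 made-up minutes, the three-hop route only 6. Neither search knows about walking across a bridge, of course.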
I have included some other pathfinding queries in the gist - but I am pretty sure that they would need work :) ...
That's about it for now. I think I have demonstrated how easy it is to make the Great Tube Graph even greater by putting it into a graph database like neo4j - and how you could easily use something like the neo4j browser to find your way around one of the world's most complicated networks.
Hope you enjoy!
Rik