Friday 22 November 2013

Meet this "Tubular" graph!

Many of us know London. Those of us who have visited London will know "the Tube", "the Underground" - simply the fastest and most efficient way to get around (although I must admit that Hailo has been quite a contender lately...). Beautiful city, lovely place to work, and since I started working for Neo, it feels a bit like my home away from home.

The Tube: A Great Graph

As you can easily imagine, or simply see from any map of the Tube, the Underground is a very sophisticated system, and one that can only be described as a graph. In our Neo4j presentations we always refer to it as the perfect example of how a one-page graph can represent, and provide *insight* into, a complex system ... without anyone needing a PhD in maths. Literally: almost everyone can use the tube - almost everyone can use a graph.

Finding a nice "tubular" dataset

Since we talk about this example all the time, and since I am indeed an avid, non-native tube user, I thought it would be interesting to see how I could fit the Tube system into a neo4j database. It took me a while, but of course the data is out there: this page links to this spreadsheet, which is a very nice starting point. It contains the Line, the Direction, the Stations, the Distance between stations, and three different time measurements between the stations.

Importing this into a neo4j database is really, really easy.

Creating a neo4j Tube database

First things first: we are probably best off converting the above spreadsheet into a .csv file. Easy peasy in Excel: the result is over here. Once we have that, we can use the ever so awesome neo4j-shell-tools (the 2.0 version is over here, in case you can't find it!) to import the data into a nice little graph model:

Kudos to Alistair Jones for making Arrows - it's actually very usable these days :)) ...

In other words: Stations have to be unique and are connected by one or more "Lines" in both directions (one relationship each way, whose type is the name of the line, e.g. "Circle"). The "Lines" have a "Direction" property (east, west, north, south...), a "Time" between stations property (which can be different in opposite directions!), and a "Distance" between stations property.

The import script for the .csv file is quite simple, as it fully leverages the new neo4j 2.0 RC1 way of working:
  • it uses a schema constraint to ensure that the stations are unique
  • it uses the new MATCH syntax (with property matching in the pattern instead of in a WHERE clause)
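To make the model concrete, here is a minimal Python sketch of the same kind of import done outside Neo4j. It assumes a hypothetical CSV header (`Line`, `Direction`, `From Station`, `To Station`, `Distance (km)`, `Time (mins)`) and uses placeholder numbers, not the real spreadsheet values. A dictionary keyed by station name stands in for the uniqueness constraint, and each row becomes one directed connection, so the two directions between a pair of stations can carry different times.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Connection:
    """One directed relationship between two stations."""
    start: str
    end: str
    line: str        # the relationship type, e.g. "Circle"
    direction: str   # e.g. "Eastbound"
    time: float      # running time in minutes
    distance: float  # distance in km


def import_tube(csv_text):
    """Return (stations, connections) built from the CSV rows.

    `stations` maps each station name to its outgoing connections. Keying the
    dict by name keeps stations unique, the job the schema constraint does in
    the Neo4j import.
    """
    stations = {}
    connections = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Get-or-create both end stations; an existing name is never duplicated.
        outgoing = stations.setdefault(row["From Station"], [])
        stations.setdefault(row["To Station"], [])
        conn = Connection(
            start=row["From Station"],
            end=row["To Station"],
            line=row["Line"],
            direction=row["Direction"],
            time=float(row["Time (mins)"]),
            distance=float(row["Distance (km)"]),
        )
        outgoing.append(conn)
        connections.append(conn)
    return stations, connections


# Hypothetical header and placeholder numbers, not the real spreadsheet data.
SAMPLE = """Line,Direction,From Station,To Station,Distance (km),Time (mins)
Circle,Eastbound,Monument,Tower Hill,0.5,1.5
Circle,Westbound,Tower Hill,Monument,0.5,2.0
District,Eastbound,Monument,Tower Hill,0.5,1.5
"""

stations, connections = import_tube(SAMPLE)
print(len(stations), len(connections))  # 2 3
```

Two stations, three relationships: Monument and Tower Hill appear several times in the rows but are created only once.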

All in all it is very simple and effective. The resulting graph.db directory is over here.

Exploring the tube in the neo4j browser

Ever since its introduction at GraphConnect San Francisco, the neo4j browser has become my favourite place to play around with neo4j and Cypher. One of its coolest features is the ability to apply stylesheets to your graph visualisations, so I wanted to apply this to my new tube graph and use the "official" tube-line colours in the browser.

So then all I had to do was download the .grass file from the browser and start editing its "relationship" sections. In my edited file, the .Circle and .Central selectors refer to the relationship types "Circle" and "Central". Logical.
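Writing those sections by hand for every line gets repetitive, so they could also be generated. The Python sketch below prints one `relationship.<Type>` section per line. The `color` and `shaft-width` property names are assumed from the format of a browser-exported .grass file, the 3px width is arbitrary, and the hex values are the commonly quoted TfL line colours, treated here as approximations.

```python
# Approximate hex values commonly quoted for TfL line colours (an assumption).
LINE_COLOURS = {
    "Circle": "#FFD300",
    "Central": "#E32017",
    "District": "#00782A",
    "Jubilee": "#A0A5A9",
    "Northern": "#000000",
}


def grass_sections(colours, shaft_width="3px"):
    """Build one 'relationship.<Type>' section per relationship type, sorted by name."""
    sections = []
    for rel_type, colour in sorted(colours.items()):
        sections.append(
            f"relationship.{rel_type} {{\n"
            f"  color: {colour};\n"
            f"  shaft-width: {shaft_width};\n"
            f"}}"
        )
    return "\n".join(sections)


print(grass_sections(LINE_COLOURS))
```

Relationship type names containing spaces (such as a "Hammersmith and City" type, if it were named that way) would not work as selectors like this, which is one more argument for simple, single-word type names.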

You can download the full .grass file that I created from over here.

Nice: if I start exploring the tube network around "London Bridge" station, I quickly get a feel for the network.
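Exploring around a station like this is a neighbourhood query, roughly what a variable-length Cypher pattern such as `-[*1..2]-` expresses. As a minimal sketch of the idea, assuming a small hand-picked slice of the real network around London Bridge (not the full dataset), a breadth-first search with a hop limit finds every station within two stops:

```python
from collections import deque

# A small, hand-picked slice of the network around London Bridge
# (station pairs and line names only; not the full dataset).
EDGES = [
    ("Moorgate", "Bank", "Northern"),
    ("Bank", "London Bridge", "Northern"),
    ("London Bridge", "Borough", "Northern"),
    ("Borough", "Elephant & Castle", "Northern"),
    ("Liverpool Street", "Bank", "Central"),
    ("Waterloo", "Southwark", "Jubilee"),
    ("Southwark", "London Bridge", "Jubilee"),
    ("London Bridge", "Bermondsey", "Jubilee"),
    ("Bermondsey", "Canada Water", "Jubilee"),
]


def build_adjacency(edges):
    """Undirected adjacency: trains run both ways between neighbouring stations."""
    adj = {}
    for a, b, _line in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def neighbourhood(adj, start, max_hops):
    """Breadth-first search: map every station within max_hops to its hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        station = queue.popleft()
        if hops[station] == max_hops:
            continue  # do not expand beyond the requested depth
        for nxt in adj.get(station, ()):
            if nxt not in hops:
                hops[nxt] = hops[station] + 1
                queue.append(nxt)
    return hops


adj = build_adjacency(EDGES)
ranked = sorted(neighbourhood(adj, "London Bridge", 2).items(), key=lambda kv: (kv[1], kv[0]))
for station, n in ranked:
    print(n, station)
```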

But of course the real fun begins with the queries.

Exploring the Tube with Cypher

Obviously I don't have the technical skills - at all - to develop anything like a route planner for the London Underground. But using the dataset that we just created, it's easy to see how building something like that would be very doable. Let's look at some of the queries that I created:

Show the different underground lines:
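Because each tube line is stored as a relationship type, listing the lines amounts to collecting the distinct relationship types (in Cypher, something along the lines of `RETURN DISTINCT type(r)`). A minimal Python sketch over a few hand-typed connections, not the full dataset:

```python
# A few hand-typed connections: (from station, to station, relationship type).
RELATIONSHIPS = [
    ("Westminster", "Embankment", "Circle"),
    ("Westminster", "Embankment", "District"),
    ("Westminster", "Waterloo", "Jubilee"),
    ("Embankment", "Westminster", "Circle"),
    ("Embankment", "Westminster", "District"),
]


def distinct_lines(relationships):
    """Each tube line is a relationship type, so the lines are the distinct types."""
    return sorted({rel_type for _start, _end, rel_type in relationships})


print(distinct_lines(RELATIONSHIPS))  # ['Circle', 'District', 'Jubilee']
```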

Show the most densely connected underground station

Here, "densely connected" means having the largest number of different underground lines passing through it.
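Counting distinct lines per station is a small aggregation (in Cypher, something like `count(DISTINCT type(r))` per station). The sketch below assumes a hand-picked handful of real connections rather than the full dataset, so its ranking says nothing about the answer on the real graph:

```python
from collections import defaultdict

# A hand-picked handful of real connections; not the full dataset.
RELATIONSHIPS = [
    ("Baker Street", "Great Portland Street", "Circle"),
    ("Baker Street", "Great Portland Street", "Hammersmith & City"),
    ("Baker Street", "Great Portland Street", "Metropolitan"),
    ("Baker Street", "Bond Street", "Jubilee"),
    ("Baker Street", "Regent's Park", "Bakerloo"),
    ("Westminster", "Embankment", "Circle"),
    ("Westminster", "Embankment", "District"),
    ("Westminster", "Waterloo", "Jubilee"),
]


def lines_per_station(relationships):
    """Map each station to the set of distinct lines touching it."""
    lines = defaultdict(set)
    for start, end, line in relationships:
        lines[start].add(line)
        lines[end].add(line)
    return lines


def most_connected(relationships, top=3):
    """Stations ranked by number of distinct lines, ties broken by name."""
    counts = {s: len(ls) for s, ls in lines_per_station(relationships).items()}
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


print(most_connected(RELATIONSHIPS))
# [('Baker Street', 5), ('Great Portland Street', 3), ('Westminster', 3)]
```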

And then you can drill into this really easily and explore some more:

And finally: pathfinding

Of course we can do some pathfinding in Cypher, but it's rudimentary and included just for fun. Let's say that I want to go from Tower Hill to Southwark (one of the most tedious tube connections, which I sometimes take to get to our London office).

Anyone a bit familiar with London knows that the route the query comes up with is "b*ll*cks", and no one would ever take it. The right thing to do (I think) is to take the District/Circle lines from Tower Hill to Blackfriars, and then just walk across the bridge to the office. Easy.
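Naive pathfinding optimises the wrong thing: fewest stops is not the same as the best route. Here is a minimal sketch of a fewest-hops search (similar in spirit to Cypher's `shortestPath`, which counts relationships), assuming a hand-picked slice of the network between Tower Hill and Southwark, with connectivity only and no travel times:

```python
from collections import deque

# Hand-picked slice of the network between Tower Hill and Southwark
# (connectivity only, no travel times; not the full dataset).
EDGES = [
    ("Tower Hill", "Monument", "Circle"),
    ("Monument", "Cannon Street", "Circle"),
    ("Cannon Street", "Mansion House", "Circle"),
    ("Mansion House", "Blackfriars", "Circle"),
    ("Blackfriars", "Temple", "Circle"),
    ("Temple", "Embankment", "Circle"),
    ("Embankment", "Westminster", "Circle"),
    ("Tower Hill", "Aldgate", "Circle"),
    ("Aldgate", "Liverpool Street", "Circle"),
    ("Liverpool Street", "Bank", "Central"),
    ("Bank", "London Bridge", "Northern"),
    ("Westminster", "Waterloo", "Jubilee"),
    ("Waterloo", "Southwark", "Jubilee"),
    ("Southwark", "London Bridge", "Jubilee"),
]


def shortest_path(edges, source, target):
    """Fewest-hops path via breadth-first search; None if target is unreachable."""
    adj = {}
    for a, b, _line in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    previous = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        station = queue.popleft()
        if station == target:
            # Walk the predecessor links back to the source.
            path = []
            while station is not None:
                path.append(station)
                station = previous[station]
            return path[::-1]
        for nxt in adj.get(station, []):
            if nxt not in previous:
                previous[nxt] = station
                queue.append(nxt)
    return None


print(shortest_path(EDGES, "Tower Hill", "Southwark"))
# ['Tower Hill', 'Aldgate', 'Liverpool Street', 'Bank', 'London Bridge', 'Southwark']
```

On this slice, the fewest-hops answer changes lines three times (Circle, Central, Northern, Jubilee). Hop count alone ignores travel time, the cost of changing lines, and the option to walk across a bridge.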

I have included some other pathfinding queries in the gist, but I am pretty sure that they would need some work :) ...

That's about it for now. I think I have demonstrated how easy it is to make the Great Tube Graph even greater by putting it into a graph database like neo4j, and how you could easily use something like the neo4j browser to find your way around one of the world's most complicated networks.

Hope you enjoy!



  1. Looks really great, but....

    According to it's strongly advised *not* to use relationship type to store data. Imho the tube-name is data and thus should be stored in a relationship property.
    For a small example like this it's fine and nice to play with, but you shouldn't use this method in a real-life growing database.
    However, it would be nice if relationships could have labels as well.... :-)

  2. I would agree. With a simple Cypher statement you could replace the relationship type with a more generic one, and set a property holding the Line name.

    Thanks for the feedback.
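The refactoring suggested in this reply can be sketched in Python on an in-memory list of relationships. `CONNECTED` is a hypothetical name for the generic type, and the direction values are illustrative. In Neo4j itself a relationship's type cannot be changed in place, so the Cypher version would create a new relationship, copy the properties across with the old `type(r)` stored as the line property, and delete the old relationship.

```python
# Relationships as plain dicts; here the tube line *is* the type.
# Direction values are illustrative only.
TYPED = [
    {"start": "Westminster", "end": "Embankment", "type": "Circle", "direction": "Eastbound"},
    {"start": "Westminster", "end": "Embankment", "type": "District", "direction": "Eastbound"},
]


def generalise(relationships, generic_type="CONNECTED"):
    """Return copies with one generic type and the line name moved into 'line'."""
    result = []
    for rel in relationships:
        new = dict(rel)             # shallow copy keeps start, end and other properties
        new["line"] = rel["type"]   # the old relationship type becomes data
        new["type"] = generic_type  # every line now shares one generic type
        result.append(new)
    return result


for rel in generalise(TYPED):
    print(rel)
```

The input list is left untouched, so old and new shapes can be compared side by side before switching over.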