Thursday, 25 August 2016

Orienteering with Neo4j - moving from Cypher to the Dijkstra-APOC - part 1/2

So last July, my dear colleagues at Neo4j decided that they would tweet about a blogpost that I wrote 3 years ago.

The post was first published on my own blog over here, and then re-blogged over at the neo4j.com/blog. I also wrote a graphgist about it at the time, which I have revisited on the graphgist portal just now.

Some context

This entire thing started that summer with a blogpost by my friend and (at the time) colleague, Ian Robinson, about using a clever cypher query to calculate the weighted shortest path over a (small) graph. I decided to use that mechanism and apply it to my lovely hobby/sport: Orienteering. Pathfinding through forests, parks, cities - it's what we do in that sport, all the time. And efficient pathfinding in this environment, requires you to read the map, understand what the fastest route is, and run that as fast as you can. Effectively, when you want to "understand the fastest route", you will be weighing different alternative route choices against one another, and - as quickly as you can - choose that one for your run. It is, in effect, a total graph problem, a total "weighted shortest path problem" on a detailed map of your surroundings. So I used Ian's approach, and applied it to a small graph of an orienteering excercise in an Antwerp park.

The Antwerp Park-Orienteering exercise

Here's what I did at the time. I took a small excercise that we did in a park, featuring a small tour that goes
START=>1st Control=>2nd Control=>Finish
For each of these three "legs" of the course, I laid out the different path options, including their distances and runnability scores. Then I imported that very small dataset into Neo4j using a very quick import script (we're only talking a handful of nodes and relationships here), all of which you can find below or in this github gist. The model is pretty simple:



And importing is simple too: first we set up the schema:
create index on :Control(name);
create index on :Waypoint(name);
Which is very straight forward:
And then we add the data:
create (one:Control {name:'1'}),
(two:Control {name:'2'}),
(three:Control {name:'Finish'}),
(oneone:Waypoint {name:'011'}),
(onetwo:Waypoint {name:'012'}),
(onethree:Waypoint {name:'013'}),
(onefour:Waypoint {name:'014'}),
(onefive:Waypoint {name:'015'}),
(twoone:Waypoint {name:'021'}),
(twotwo:Waypoint {name:'022'}),
(oneoneone:Waypoint {name:'111'}),
(oneonetwo:Waypoint {name:'112'}),
(oneonethree:Waypoint {name:'113'}),
(onetwoone:Waypoint {name:'121'}),
(twooneone:Waypoint {name:'211'}),
(twotwoone:Waypoint {name:'221'}),
(twotwotwo:Waypoint {name:'222'}),
(twothreeone:Waypoint {name:'231'}),
(twothreetwo:Waypoint {name:'232'}),
(zero)-[:COURSE_TO {distance:110, time:0, runnability:0}]->(one),
(one)-[:COURSE_TO {distance:100, time:0, runnability:0}]->(two),
(two)-[:COURSE_TO {distance:90, time:0, runnability:0}]->(three),
(zero)-[:NAVIGATE_TO {distance:5, time:0, runnability:1}]->(oneone),
(oneone)-[:NAVIGATE_TO {distance:15, time:0, runnability:1}]->(onetwo),
(onetwo)-[:NAVIGATE_TO {distance:5, time:0, runnability:1}]->(onethree),
(onethree)-[:NAVIGATE_TO {distance:100, time:0, runnability:1}]->(onefour),
(onefour)-[:NAVIGATE_TO {distance:80, time:0, runnability:0.9}]->(onefive),
(onefive)-[:NAVIGATE_TO {distance:5, time:0, runnability:0.8}]->(one),
(zero)-[:NAVIGATE_TO {distance:100, time:0, runnability:0.9}]->(twoone),
(twoone)-[:NAVIGATE_TO {distance:10, time:0, runnability:0.8}]->(twotwo),
(twotwo)-[:NAVIGATE_TO {distance:10, time:0, runnability:0.8}]->(one),
(one)-[:NAVIGATE_TO {distance:5, time:0, runnability:0.8}]->(oneoneone),
(oneoneone)-[:NAVIGATE_TO {distance:10, time:0, runnability:0.8}]->(oneonetwo),
(oneonetwo)-[:NAVIGATE_TO {distance:30, time:0, runnability:0.8}]->(oneonethree),
(oneonethree)-[:NAVIGATE_TO {distance:60, time:0, runnability:0.8}]->(two),
(one)-[:NAVIGATE_TO {distance:5, time:0, runnability:0.8}]->(onefive),
(onefive)-[:NAVIGATE_TO {distance:105, time:0, runnability:0.9}]->(onetwoone),
(onetwoone)-[:NAVIGATE_TO {distance:5, time:0, runnability:0.8}]->(two),
(two)-[:NAVIGATE_TO {distance:15, time:0, runnability:0.8}]->(twooneone),
(twooneone)-[:NAVIGATE_TO {distance:75, time:0, runnability:0.9}]->(three),
(two)-[:NAVIGATE_TO {distance:20, time:0, runnability:0.7}]->(twotwoone),
(twotwoone)-[:NAVIGATE_TO {distance:15, time:0, runnability:0.9}]->(twotwotwo),
(twotwotwo)-[:NAVIGATE_TO {distance:75, time:0, runnability:1}]->(three),
(two)-[:NAVIGATE_TO {distance:5, time:0, runnability:0.9}]->(onetwoone),
(onetwoone)-[:NAVIGATE_TO {distance:40, time:0, runnability:0.9}]->(twothreeone),
(twothreeone)-[:NAVIGATE_TO {distance:40, time:0, runnability:0.9}]->(twothreetwo),
(twothreetwo)-[:NAVIGATE_TO {distance:70, time:0, runnability:1}]->(three);
Which add the mini-graph.
Now, we can actually run the shortest path query over it, as per Ian's instructions. The query is on this github gist too, and goes like this:
//Original weighted shortestPath query rewritten for schema indexes
MATCH  (startNode:Control {name:"Start"}),
       (endNode:Control {name:"Finish"}),
       p=(startNode)-[:NAVIGATE_TO*]->(endNode)
RETURN p AS shortestPath,
       reduce(distance=0, r in relationships(p) | distance+r.distance) AS totalDistance
       ORDER BY totalDistance ASC
       LIMIT 1;
And the result is, as expected, totally correct and justified:

We can even make this query a bit more sophisticated, by looking a path weight that would not just be read from a property, but which would be calculated by a combination of properties (in this case: distance AND runnability). We can do that with the following query, which is also on github:
//Original query with RUNNABILITY rewritten for schema indexes
MATCH  (startNode:Control {name:"Start"}),
      (endNode:Control {name:"Finish"}),
      p=(startNode)-[:NAVIGATE_TO*]->(endNode)
RETURN p AS shortestPath,
      reduce(EstimatedTime=0, r in relationships(p) | EstimatedTime+(r.distance/r.runnability)) AS TotalEstimatedTime
      ORDER BY TotalEstimatedTime ASC
      LIMIT 1;
Because the weights are different, this query gives us a different "optimal" path for our orienteering excercise: because of the "runnability" factor, we have a different "route choice" from Control 1 to Control 2, and the query has a different solution for this part of the problem:



Yey! It all works. As you can read from the Cypher statements above, we can actually understand pretty easily what is going on here: Cypher first computes ALL the possible paths from the Start Control to the Finish Control, and then uses the "reduce" function to figure out which one of the paths has the shortest weighted path from start to finish. This works really quite well on a small dataset like this: we only have 20 nodes in our graph. But as we will see later on, the fact that we first need ALL the possible paths may become a problem for us later on... on larger graphs, that will just lead to a combinatory explosion.

Using Dijkstra to solve the same orienteering problem

I - and Ian of course as well - always knew that this approach would only work on smaller graphs. Which is why I was super excited to see that the new Awesome APOC procedures gave us access to some of the graph algorithms in Neo4j from Cypher. You see, Neo4j has always had a package full of graph algorithms available and shipped with it - it was just something that was only accessible from the JAVA API. Developers that we using Neo4j from a different language (using Cypher) could never get access to that - and that's exactly what has changed now with the Awesome APOCs. You can make an algo accessible from Cypher - and that's what has happened.

Finding a weighted shortest path can be done in different ways, but one of the most common is by using Dijkstra's Algorithm. It's best explained over here, but for our queries above, it really radically simplifies the operation: just download the latest Awesome APOC library over here, drop the .jar file into the ./plugins directory of your Neo4j installation, and then just run:
MATCH  (startNode:Control {name:"Start"}), (endNode:Control {name:"Finish"})
call apoc.algo.dijkstra(startNode, endNode, 'NAVIGATE_TO', 'distance') YIELD path, weight
return path;
and you get the result back instantaneously.


It's exactly the same result as above - but as we will see later on, it is done completely differently, and does not require the calculation of ALL paths upfront.

We can also use Dijkstra to calculate the shortest weighted path using the calculated distance/runnability property, but not without adding that calculated property explicitly to the graph. We need to do:
//add the calculated property for dijkstra to work
match (a)-[r:NAVIGATE_TO]->(b)
set r.calculated=r.distance/r.runnability;
and add these properties:

and then we can run the new route finding algo:
MATCH  (startNode:Control {name:"Start"}), (endNode:Control {name:"Finish"})
call apoc.algo.dijkstra(startNode, endNode, 'NAVIGATE_TO', 'calculated') YIELD path, weight
return path;
and get the same modified result for the second leg of the orienteering challenge.
So that works! Like a charm!

In the second part of this blogpost, I will explain WHY the fact that we now have Dijkstra at our APOC-fingertips, is in fact a very, very big deal. Look for that post in the next couple of days.

Hope this was useful for you!

Cheers

Rik

No comments:

Post a Comment