In part 1 of this blogpost, I explained how you can use the Awesome Neo4j APOCs to calculate a weighted shortest path on a graph with a more optimized and more efficient algorithm, based on Dijkstra's work. In this second and last blogpost on this topic, I would love to explain a bit why I think this is pretty much a very big deal. APOCs give you access, from Cypher, to a whole slew of graph algorithms, many of them very useful for all kinds of different graph operations.
Orienteering - a bit more complicated in the real world
One reason why I wanted to write this second post, is of course because my lovely sport - Orienteering - is of course a bit more complicated in the real world than what you have seen in that little park run that I talked about in the previous two posts. To give you a feel for it:
- Here's an excerpt of my run in the World Masters Orienteering Champs a few weeks back in Estonia. More details over here - but I can tell you that for each and every one of these legs there's at least half a dozen different route options - and of course my course had 20+ control points too. So a bit more of a bigger graph anyway!
- And here's another example: actually being run today (August 25th) is the actual elite's World Orienteering Champs on the long distance. Just. Look. At. This. Map.
It's just amazing, and if you are an orienteer like me, pretty darn SCARY. Crazy race, physically and mentally.
Point being: even in orienteering, the "shortest weighted pathfinding" is WAY more complicated than the previous example, and in real Enterprise applications, it will be even more complicated. That's why I wanted to try the old way (using the reduce-based Cypher query) and the new way (using the Dijkstra-based APOC) on a larger graph.
So here's what I did:
Point being: even in orienteering, the "shortest weighted pathfinding" is WAY more complicated than the previous example, and in real Enterprise applications, it will be even more complicated. That's why I wanted to try the old way (using the reduce-based Cypher query) and the new way (using the Dijkstra-based APOC) on a larger graph.
So here's what I did:
- First: I applied the same methods that I used on the previous blogpost, to a "more realistic" orienteering graph that would feature a 10-control race. For each "leg" of the race, I would have 3 different route choices, and for each route, I would have 5 different waypoints.
- Second: I created a (hypothetical, of course) 1000-control orienteering graph with the same characteristics: 3 routes for every leg, and 5 waypoints for every route.
And then I could look at the difference between the two query strategies. Let's start with the first.
A 10 control orienteering race: very different performance
Here's the query to create that 10-control orienteering race, also available on github, of course:
create index on :Control(seq);
unwind range(0,10) as range
merge (c2:Control {seq: range});
match (n1:Control {seq:0}),(n2:Control {seq:10})
set n1.name = "Start"
set n2.name = "Finish";
match (c1:Control), (c2:Control)
where c2.seq=c1.seq+1
merge (c1)-[:COURSE_TO]->(c2);
match (c1:Control)-->(c2:Control)
where c2.seq=c1.seq+1
with distinct c1,c2
create (c1)-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w1:Route1Waypoint {name: "Waypoint 1.1"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w2:Route1Waypoint {name: "Waypoint 1.2"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w3:Route1Waypoint {name: "Waypoint 1.3"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w4:Route1Waypoint {name: "Waypoint 1.4"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w5:Route1Waypoint {name: "Waypoint 1.5"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(c2);
match (c1:Control)-->(c2:Control)
where c2.seq=c1.seq+1
with distinct c1,c2
create (c1)-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w1:Route2Waypoint {name: "Waypoint 2.1"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w2:Route2Waypoint {name: "Waypoint 2.2"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w3:Route2Waypoint {name: "Waypoint 2.3"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w4:Route2Waypoint {name: "Waypoint 2.4"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w5:Route2Waypoint {name: "Waypoint 2.5"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(c2);
match (c1:Control)-->(c2:Control)
where c2.seq=c1.seq+1
with distinct c1,c2
create (c1)-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w1:Route3Waypoint {name: "Waypoint 3.1"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w2:Route3Waypoint {name: "Waypoint 3.2"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w3:Route3Waypoint {name: "Waypoint 3.3"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w4:Route3Waypoint {name: "Waypoint 3.4"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w5:Route3Waypoint {name: "Waypoint 3.5"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(c2);
Here's a sample of what the result looks like:
Then, I can very quickly "call a friend" and get the solution to my 10-control Orienteering race by doing one of two things:
1. Run the Cypher Shortest path query:
MATCH (startNode:Control {name:"Start"}),
(endNode:Control {name:"Finish"}),
p=(startNode)-[:NAVIGATE_TO*]->(endNode)
RETURN p AS shortestPath,
reduce(EstimatedTime=0, r in relationships(p) | EstimatedTime+(r.distance/r.runnability)) AS TotalEstimatedTime
ORDER BY TotalEstimatedTime ASC
LIMIT 1;
The result just about fits into my Neo4j Brower:
But what you probably did not see (unless you tried it) was that the:
Man! That's SLOW! It's becoming very clear now, that the combinatory explosion of paths that this query is using (intentionally), is just not a feasible option on larger graphs. If the result is already that bad on this small of a graph, then we'd better quickly turn to
2. Run the Dijkstra-APOC-based query:
2. Run the Dijkstra-APOC-based query:
MATCH (startNode:Control {name:"Start"}), (endNode:Control {name:"Finish"})
call apoc.algo.dijkstra(startNode, endNode, 'NAVIGATE_TO', 'calculated') YIELD path, weight
return path;
Which gives me exactly the same result: just look at the colours of the waypoints to see that the route choices are identical for the different legs of the course:
And: BONUS! Thanks to the the AWESOME APOC's power, the
Now, I guess it would become pretty clear that doing this on an even larger graph would be ... interesting, right? Let's try.
The 1000-control Orienteering Race
Edsger W. Dijkstra |
However, if you give these 1000 controls to our dear friend Mr. Dijkstra, and within a very short wait you get:
It of course does not correctly visualise the entire solution to the problem in the Browser (what would be the point!), but the important thing is that
Which is super great. Again - this is why APOCs are Awesome - you can run these complex algorithms, straight on the database, straight from Cypher, without any additional complications.
Try THAT in your relational, document, key-value, or column-family store!
Hoping this exercise was useful for you, and that you had as much fun with this as I did.
Cheers
Rik
No comments:
Post a Comment