In part 1 of this blogpost, I explained how you can use the Awesome Neo4j APOCs to calculate a weighted shortest path on a graph with a more optimized and more efficient algorithm, based on Dijkstra's work. In this second and last blogpost on this topic, I would love to explain a bit why I think this is pretty much a very big deal. APOCs give you access, from Cypher, to a whole slew of graph algorithms, many of them very useful for all kinds of different graph operations.
Orienteering - a bit more complicated in the real world
One reason why I wanted to write this second post, is of course because my lovely sport - Orienteering - is of course a bit more complicated in the real world than what you have seen in that little park run that I talked about in the previous two posts. To give you a feel for it:
- Here's an excerpt of my run in the World Masters Orienteering Champs a few weeks back in Estonia. More details over here - but I can tell you that for each and every one of these legs there's at least half a dozen different route options - and of course my course had 20+ control points too. So a bit more of a bigger graph anyway!
- And here's another example: actually being run today (August 25th) is the actual elite's World Orienteering Champs on the long distance. Just. Look. At. This. Map.
data:image/s3,"s3://crabby-images/654b3/654b38afd4786813dfb502ff57627c97abb7b1ca" alt=""
It's just amazing, and if you are an orienteer like me, pretty darn SCARY. Crazy race, physically and mentally.
Point being: even in orienteering, the "shortest weighted pathfinding" is WAY more complicated than the previous example, and in real Enterprise applications, it will be even more complicated. That's why I wanted to try the old way (using the reduce-based Cypher query) and the new way (using the Dijkstra-based APOC) on a larger graph.
So here's what I did:
Point being: even in orienteering, the "shortest weighted pathfinding" is WAY more complicated than the previous example, and in real Enterprise applications, it will be even more complicated. That's why I wanted to try the old way (using the reduce-based Cypher query) and the new way (using the Dijkstra-based APOC) on a larger graph.
So here's what I did:
- First: I applied the same methods that I used on the previous blogpost, to a "more realistic" orienteering graph that would feature a 10-control race. For each "leg" of the race, I would have 3 different route choices, and for each route, I would have 5 different waypoints.
- Second: I created a (hypothetical, of course) 1000-control orienteering graph with the same characteristics: 3 routes for every leg, and 5 waypoints for every route.
And then I could look at the difference between the two query strategies. Let's start with the first.
A 10 control orienteering race: very different performance
Here's the query to create that 10-control orienteering race, also available on github, of course:
create index on :Control(seq);
unwind range(0,10) as range
merge (c2:Control {seq: range});
match (n1:Control {seq:0}),(n2:Control {seq:10})
set n1.name = "Start"
set n2.name = "Finish";
match (c1:Control), (c2:Control)
where c2.seq=c1.seq+1
merge (c1)-[:COURSE_TO]->(c2);
match (c1:Control)-->(c2:Control)
where c2.seq=c1.seq+1
with distinct c1,c2
create (c1)-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w1:Route1Waypoint {name: "Waypoint 1.1"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w2:Route1Waypoint {name: "Waypoint 1.2"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w3:Route1Waypoint {name: "Waypoint 1.3"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w4:Route1Waypoint {name: "Waypoint 1.4"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w5:Route1Waypoint {name: "Waypoint 1.5"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(c2);
match (c1:Control)-->(c2:Control)
where c2.seq=c1.seq+1
with distinct c1,c2
create (c1)-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w1:Route2Waypoint {name: "Waypoint 2.1"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w2:Route2Waypoint {name: "Waypoint 2.2"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w3:Route2Waypoint {name: "Waypoint 2.3"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w4:Route2Waypoint {name: "Waypoint 2.4"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w5:Route2Waypoint {name: "Waypoint 2.5"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(c2);
match (c1:Control)-->(c2:Control)
where c2.seq=c1.seq+1
with distinct c1,c2
create (c1)-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w1:Route3Waypoint {name: "Waypoint 3.1"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w2:Route3Waypoint {name: "Waypoint 3.2"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w3:Route3Waypoint {name: "Waypoint 3.3"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w4:Route3Waypoint {name: "Waypoint 3.4"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(w5:Route3Waypoint {name: "Waypoint 3.5"})-[:NAVIGATE_TO {distance: toInt(rand()*1000), runnability: ceil(rand()*10)/10}]->(c2);
Here's a sample of what the result looks like:
Then, I can very quickly "call a friend" and get the solution to my 10-control Orienteering race by doing one of two things:
1. Run the Cypher Shortest path query:
MATCH (startNode:Control {name:"Start"}),
(endNode:Control {name:"Finish"}),
p=(startNode)-[:NAVIGATE_TO*]->(endNode)
RETURN p AS shortestPath,
reduce(EstimatedTime=0, r in relationships(p) | EstimatedTime+(r.distance/r.runnability)) AS TotalEstimatedTime
ORDER BY TotalEstimatedTime ASC
LIMIT 1;
The result just about fits into my Neo4j Brower:
But what you probably did not see (unless you tried it) was that the:
Man! That's SLOW! It's becoming very clear now, that the combinatory explosion of paths that this query is using (intentionally), is just not a feasible option on larger graphs. If the result is already that bad on this small of a graph, then we'd better quickly turn to
2. Run the Dijkstra-APOC-based query:
2. Run the Dijkstra-APOC-based query:
MATCH (startNode:Control {name:"Start"}), (endNode:Control {name:"Finish"})
call apoc.algo.dijkstra(startNode, endNode, 'NAVIGATE_TO', 'calculated') YIELD path, weight
return path;
Which gives me exactly the same result: just look at the colours of the waypoints to see that the route choices are identical for the different legs of the course:
And: BONUS! Thanks to the the AWESOME APOC's power, the
Now, I guess it would become pretty clear that doing this on an even larger graph would be ... interesting, right? Let's try.
The 1000-control Orienteering Race
![]() |
Edsger W. Dijkstra |
However, if you give these 1000 controls to our dear friend Mr. Dijkstra, and within a very short wait you get:
It of course does not correctly visualise the entire solution to the problem in the Browser (what would be the point!), but the important thing is that
Which is super great. Again - this is why APOCs are Awesome - you can run these complex algorithms, straight on the database, straight from Cypher, without any additional complications.
Try THAT in your relational, document, key-value, or column-family store!
Hoping this exercise was useful for you, and that you had as much fun with this as I did.
Cheers
Rik
No comments:
Post a Comment