As you may know by now, I like beer. A lot - why else would I keep writing and talking about it? But there’s more to life than sweet beverages, and one of the things that I have been doing for as long as I can remember is Orienteering. I have been practicing the sport in Belgium since 1984 - I was 11 years old. My dad used to take me to races all across the continent - we truly had a blast. And we still do: I still orienteer almost every week, and so does my dad. Now I take my 8 and 10-year old kids with me to the races, and their granddad cheers them on every step of the way. It’s a fantastic family sport.

One of the reasons why it is so fantastic is that - Orienteering is a “thinking sport”. You have to concentrate to navigate. You have to run to have the best time (it’s a race), but if you run too fast, you are sure to make navigation mistakes. You have to find the balance between physical and mental fitness - which is hard but completely awesome when you succeed. And: it’s outdoors - in the woods and fields. What’s not to like?

So what does that have to do with neo4j? Well, orienteering is all about “

**”: the *fastest* route from start to finish. Fast can be short. Fast can also mean that it is better to take a detour: if it is easier to run the longer route, than to walk the shorter route, you are better off choosing the longer route. In essence, every orienteering race is … a graph problem waiting to be solved in the middle of nature.**__finding the shortest path__## Orienteering = a green graph problem

In case you don’t know: orienteering races are a bit like an obstacle race. Every participant gets assigned a course, out there in the green forests and fields, and along that course are sequences of beacons that one needs to get to in order. Such a sequence is … a path on a graph - you have to choose how to navigate from obstacle to obstacle, from node to node.
Essentially, the orienteer has to navigate and choose the

**fastest**route. Finding the fastest route effectively this__boils down to a “weighted shortest path” calculation__. You calculate the shortest path using- distance: shorter = better
- runnability: higher = better. Runnability can be affected by the type of terrain (running on a road? through a field? through a forest? through a forest with soil covered with plants? over a hill? through a valley? …)

as your parameters. For every “leg” of the race, you estimate the presumable “best route” based on the assumption that distance / runnability will be the best indicator for your likely speed.

## Example: a 2 control race in Antwerp, Belgium

If we then look at every leg separately, you can see that for every leg, there are a number of route options.

So 3 controls, and different routes with different characteristics. As you can see in the schematic representations, every route has different “waypoints” - specific points of interest that I can identify on the map, and recognize in the “field”. These waypoints are extremely important for the navigation exercise that we are doing - they allow us to break the problem up in smaller pieces and evaluate our options.

Intuitively, all of us will have a “feeling” about what would be the best route choice, but now let’s use graph algorithms to do this for us!

## Graph database model to navigate

In order to apply a graph algorithm, we first need to create a graph. These are my nodes:

- Control nodes: the race beacons that I have to pass by
- The alternative route choices, decomposed in waypoints.

Then, let’s create the relationships between these nodes. We will have “COURSE_TO” relationships between controls, and “NAVIGATE_TO” relationships between waypoints. Effectively, these will become “paths” on my graph, hopping from node to node along the relationships.

- From the start to control 1: I have
**0->0.11->0.12->0.13->0.14->0.15**as one route and**0->0.21->0.22->1**as another route - From control 1 to control 2 I have
**1->1.11->1.12->1.13->2**as one route option, and**1->0.15->1.21->2**as another option. - From control 2 to the finish I have 3 options:
**2->2.11->3**,**2->2.21->2.22->3**and**2->1.21->2.31->2.32->3**.

As you can see, I have immediately added “distance” (in meters) and “runnability” (in %) properties to my relationships.

When I then generate a neo4j database using the spreadsheet method, I get a nice little database - ready to be queried and ready for my algorithms.

## Graphs algorithms to win the race!

In order to calculate the best route to win the race, I need to calculate the shortest path across the graph - which is standard functionality of neo4j. But because there’s more to it than running the course in straight lines between controls, I need to incorporate**weights**(distance, runnability) to get a realistic estimate of what would be the best route choice. To do so I am using a technique so well demonstrated by Ian Robinson on his blog last June.

Let’s go through two versions of this calculation:

- find the shortest path by distance only:

START startNode=node:node_auto_index(name="Start"),

endNode=node:node_auto_index(name="Finish")

MATCH p=(startNode)-[:NAVIGATE_TO*]->(endNode)

RETURN p AS shortestPath,

reduce(distance=0, r in relationships(p) : distance+r.distance) AS totalDistance

ORDER BY totalDistance ASC

LIMIT 1;

- find the best estimate of the fastest path, as a function of distance/runnability. In a real race this would probably be the route that I would choose - as it would give me the best chance of winning the race.

To do this, we will be using ‘reduce’ to sum the distance divided by the runnability: longer distance with superior runnability, is possibly faster than shorter distance with lower runnability:

START startNode=node:node_auto_index(name="Start"),

endNode=node:node_auto_index(name="Finish")

MATCH p=(startNode)-[:NAVIGATE_TO*]->(endNode)

RETURN p AS shortestPath,

reduce(EstimatedTime=0, r in relationships(p) : EstimatedTime+(r.distance/r.runnability)) AS TotalEstimatedTime

ORDER BY TotalEstimatedTime ASC

LIMIT 1;

As you can see, the first and second queries take the same (shortest) path for start to control 1 and from control 2 to finish, but recommends a clearly different path from control 1 to control 2 (following the forest road instead of cutting through the forest).

Many applications for weighted shortest paths!

Obviously Orienteering is not a business application, but in logistics, planning, impact analysis and many other applications, weighted shortest path algorithms will have a great potential. Whether it is to find out how things are related to each other, determining the most efficient way to get something from point A to point B, or finding out who would be affected by a particular type of capacity outage - the approach that I used for my orienteering problem would work just as nicely!

Hi,

ReplyDeletei tried this query on my graph :

START startNode = node(269604), endNode = node(269605)

MATCH path=(startNode)-[:CAR_MAIN_NODES_RELATION*]-(endNode)

RETURN path AS shortestPath, reduce(cost=0, rel in relationships(path) | cost + rel.edgeLength) AS totalCost

ORDER BY totalCost ASC

LIMIT 3

but, after 10 minutes, I have not had any results yet...

My graph had 157689 nodes and 1063621 edges.

Antonio, I am sorry for not replying earlier. I am actually publishing a post tomorrow that will hopefully allow you to do this MUCH more efficiently using a Neo4j APOC procedure based on Dijkstra's algorithm.

Delete