Showing posts with label shortestpath. Show all posts

Monday, 29 August 2016

Orienteering with Neo4j - solving the 1000-control race with the Dijkstra-APOC - part 2/2

In part 1 of this blogpost, I explained how you can use the Awesome Neo4j APOCs to calculate a weighted shortest path on a graph with a more optimized and more efficient algorithm, based on Dijkstra's work. In this second and last blogpost on this topic, I would love to explain a bit why I think this is pretty much a very big deal. APOCs give you access, from Cypher, to a whole slew of graph algorithms, many of them very useful for all kinds of different graph operations.

Orienteering - a bit more complicated in the real world

One reason why I wanted to write this second post, is of course because my lovely sport - Orienteering - is of course a bit more complicated in the real world than what you have seen in that little park run that I talked about in the previous two posts. To give you a feel for it:

Here's an excerpt of my run in the World Masters Orienteering Champs a few weeks back in Estonia. More details over here - but I can tell you that for each and every one of these legs there's at least half a dozen different route options - and of course my course had 20+ control points too. So a bit more of a bigger graph anyway!
And here's another example: actually being run today (August 25th) is the actual elite's World Orienteering Champs on the long distance. Just. Look. At. This. Map.

Orienteering with Neo4j - moving from Cypher to the Dijkstra-APOC - part 1/2

So last July, my dear colleagues at Neo4j decided that they would tweet about a blogpost that I wrote 3 years ago.

#tbt: Finding the Shortest Path through the Park (with Graphs!)https://t.co/CLlK5Izuxk
via @rvanbruggen pic.twitter.com/SgbLYBUrRC
— Neo4j (@neo4j) July 28, 2016

The post was first published on my own blog over here, and then re-blogged over at the neo4j.com/blog. I also wrote a graphgist about it at the time, which I have revisited on the graphgist portal just now.

Some context

This entire thing started that summer with a blogpost by my friend and (at the time) colleague, Ian Robinson, about using a clever cypher query to calculate the weighted shortest path over a (small) graph. I decided to use that mechanism and apply it to my lovely hobby/sport: Orienteering. Pathfinding through forests, parks, cities - it's what we do in that sport, all the time. And efficient pathfinding in this environment, requires you to read the map, understand what the fastest route is, and run that as fast as you can. Effectively, when you want to "understand the fastest route", you will be weighing different alternative route choices against one another, and - as quickly as you can - choose that one for your run. It is, in effect, a total graph problem, a total "weighted shortest path problem" on a detailed map of your surroundings. So I used Ian's approach, and applied it to a small graph of an orienteering excercise in an Antwerp park.

The GraphBlogGraph: 3rd blogpost out of 3

Querying the GraphBlogGraph

After having created the GraphBlogGraph in a Google Spreadsheet in part 1, and having imported it into Neo4j in part 2, we can now start having some fun and analysing and querying that dataset. There are obviously a lot of things we could do here, but in this final blog post I am just going to explore some initial things that I am sure you could then elaborate and extend upon.

Let’s start with a simple query

// Which pages have the most links
match (b:Blog)--(p:Page)-[r:LINKS_TO]->(p2:Page)

return b.name, p.title, count(r)

order by count(r) desc

Run this in the Neo4j browser and we get:

or just return the graphical result with a slightly different query:

match (b:Blog)--(p:Page)-[r:LINKS_TO]->(p2:Page)

with b,p,r,p2, count(r) as count

order by count DESC

limit 50

return b,p,r,p2

And then you start to see that Max De Marzi is actually the “king of linking”: he links his pages to other web pages a lot (which is actually very good for search-engine-optimization) .

A quick visit to one of Max’ pages does actually confirm that: there’s a lot of cool, bizarre, but always interesting links on Max’ blogposts:

So let’s do another query. Let’s look at the different links that exist between blogposts of our blog-authors. Are they actually quoting/referring to one another or not? Let’s do

//links between blogposts

MATCH p=((n1:Blog)--(p1:Page)-[:LINKS_TO]-(p2:Page)--(b2:Blog))

RETURN p;

and then we actually find that there are some links - but not that many.

Same thing if we look at this a different way: let’s do some pathfinding and check out the paths between different blogs, for example my blog and Michael’s

match (b1:Blog {name:"Bruggen"}),(b3:Blog {name:"JEXP Blog"}),

p2 = allshortestpaths((b1)-[*]-(b3))

return p2 as paths

Then we actually see a bit more interesting connections: we don’t refer to one another directly very often, but we both refer to the same pages - and those pages become the links between our blogs. At depth 4 we see these kinds of patterns:

Interesting, right? I think so, at least!

Then let’s do some more playing around, looking at the most linked to pages:

//Which pages are being linked to most

match ()-[r:LINKS_TO]->(p:Page)

return p.url, count(r)

order by count(r) DESC

limit 10;

That quickly uncovers the true “spider in the web”, my friend, colleague and graphista-extraordinaire: Michael Hunger:

Last but not least, I wanted to revisit an old and interesting way of running PageRank on Neo4j using Cypher (not using the Graphaware NodeRank module, therefore). I blogged about some time ago, and it’s actually really interesting and easy to do. Here’s the query:

UNWIND range(1,50) AS round
MATCH (n:Page)
WHERE rand() < 0.1
MATCH (n:Page)-[:LINKS_TO*..10]->(m:Page)
SET m.rank = coalesce(m.rank,0) + 1

This does 50 iterations of PageRank, using a 0,1 damping factor and a maximum depth of 10. Running it is surprisingly quick:

If you do that a couple of times, and even do a few hundred iterations at once, you will quickly see the results emerge with the following simple query:

match (n:Page)
where n.rank is not null
return n.url, n.rank
order by n.rank desc
limit 10;

Confirming the “spider in the web” theory that I mentioned above. Michael rules the links!

All of these queries are of course on Github for you to play around with. Would love to hear your thoughts on these three blogposts, and hope that they were as fun for you to read as they were for me to write.

All the best.

Rik

Bruggen Blog

Pages

Monday, 29 August 2016

Orienteering with Neo4j - solving the 1000-control race with the Dijkstra-APOC - part 2/2

Orienteering - a bit more complicated in the real world

Thursday, 25 August 2016

Orienteering with Neo4j - moving from Cypher to the Dijkstra-APOC - part 1/2

Some context

Wednesday, 13 January 2016

The GraphBlogGraph: 3rd blogpost out of 3

Querying the GraphBlogGraph

Labels

Blogarchive

Metricool