Friday 17 April 2015

Querying the SNAP Beeradvocate dataset in Neo4j - part 3

In the previous part of this three-part blog post series, I imported the SNAP Beeradvocate dataset into Neo4j. All well and good, and we now have the following meta-graph:

So now we can start querying the dataset. It's a bit different from the "Belgian Beer" dataset that I worked on previously - this one is a lot bigger, and also a bit more US-focused. But still - we can do some nice queries on it. Let's start with something nice and simple:

//where is Duvel 
match (b:Beer {name:"Duvel"}) return b 
//where is Duvel and surroundings 
match (b:Beer {name:"Duvel"})-[r]-() return b,r limit 50
The result is interesting:

Then we try the same for Orval.
match (b:Beer {name:"Orval"}) return b 
does not return anything. So let's see if we can find it some other way:

The following query:
match (b:Beer)
where left(b.name,5) = "Orval"
return b  
looks for beers whose name starts with the word Orval. And yes indeed, we find it immediately.

And I am very happy to report that this query actually taught me something new about Orval. Even though I am a big fan and have been to the brewery multiple times - I had never drunk the "Petite Orval", a lighter Trappist beer that is only brewed for the monks. See Wikipedia for more details.
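As an aside, the prefix match on Orval above could probably be written a bit more directly. This is only a sketch, and it assumes a Neo4j version (2.3 or later) where the STARTS WITH operator is available - on older versions the left() approach shown earlier is the way to go:

```cypher
// Assumption: Neo4j 2.3 or later, where STARTS WITH exists.
// Finds beers whose name begins with "Orval" (case-sensitive),
// which is the same check as left(b.name,5) = "Orval".
match (b:Beer)
where b.name starts with "Orval"
return b
```

The nice thing about this form is that you don't have to count the characters of the prefix yourself.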

So let's take a look at some interesting paths. Here's a path between Duvel and Orval:
match (d:Beer {name:"Duvel"}), (o:Beer {name: "Orval Trappist Ale"}),  path = shortestpath((d)-[*]-(o))  return path  
which gives one interesting path:
I really need to find that "Duvel Single" beer and taste it.

Or we can look at some additional paths by running this query:
match (d:Beer {name:"Duvel"}), (o:Beer {name: "Orval Trappist Ale"}),
path = allshortestpaths((d)-[*]-(o))
return path
limit 10  
I added the LIMIT so that the browser does not blow up. The result is interesting:

There seem to be quite a few REVIEWERS that are reviewing both beers. That's interesting of course, but let's say that I would not want the reviewers/reviews to be part of the path? Well, it turns out that "excluding" nodes from a shortestpath function is not that easy in Cypher - you are better off listing the relationship types that you do want to have included. Like this:

//link between Duvel and Orval discarding the reviewers and reviews 
match (d:Beer {name:"Duvel"}), (o:Beer {name: "Orval Trappist Ale"}), path = allshortestpaths((d)-[:BREWS | HAS_STYLE*]-(o))  return path  
This query gives yet another interesting result:
Turns out the brewery seems to be brewing a number of beers similar to Orval! Not sure how true this is - but worth an investigation!

Last but not least is the search for my favourite beer style - the Trappist beers. Now, this is kind of tricky, as the dataset that we are working with here is kind of American-focused - and not all the beer brands or styles are as I would expect them to be. On top of that, we currently don't have "fulltext" search capabilities in the wonderful new "Schema indexes" (we do have them in the legacy indexes, but I am not using those here), so we have to work around that with a regular expression in the query. Let me show you:
//Find the Trappist beers, their breweries and styles 
match (br:Brewery)--(b:Beer)--(s:Style)
where b.name =~ ".*\\bTrappist\\b.*"
OR s.name =~ ".*\\bTrappist\\b.*"
return b,br,s;  
gives us all the beers, breweries and styles that have the word "Trappist" in their beer or style names. It gives us a really interesting subgraph to take a look at:

Seems like the good news is that I have quite a bit of beer exploration to do!
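One thing to keep in mind with the regular expression above: the =~ operator has to match the whole string (hence the leading and trailing .*), and it is case-sensitive by default. If some names in the dataset were spelled "trappist" in lowercase, a variant along these lines might catch them too. It is only a sketch, and it relies on the (?i) flag of the Java-style regular expressions that Cypher uses:

```cypher
// Sketch: same pattern as before, but case-insensitive via (?i).
// Whether the dataset actually contains lowercase spellings is an assumption.
match (br:Brewery)--(b:Beer)--(s:Style)
where b.name =~ "(?i).*\\btrappist\\b.*"
OR s.name =~ "(?i).*\\btrappist\\b.*"
return b,br,s;
```

Like the original query, this still scans every Beer and Style name, so it is fine for exploration but not something I would expect to be fast on a much larger graph.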

That concludes this third part of the Beeradvocate network dataset exploration in Neo4j. There's a lot of other stuff that we could do with this dataset - but I hope you already found it as interesting as I did - and as always, please send me your feedback!

Cheers

Rik

PS: Here are the links to
