Thursday 14 June 2018

Exploring new datatypes in Neo4j 3.4 and the Open Beer Database - part 1/2

Recently, I gave a talk at the Amsterdam, Brussels and London Neo4j meetups about some of the new and exciting features in Neo4j 3.4. While preparing for it, I was looking for material and I found some very cool stuff that powerfully explains the new features. The best resource is probably this post by Ryan Boyd, and the video that goes with it:

Ryan does a great job at explaining the new features, and goes into some detail on the new temporal and spatial data types that you can now use in Neo4j 3.4. You can explore these new features yourself by accessing the Neo4j Sandbox developed specifically for this purpose. Or you can just do what I did, and use the Neo4j Desktop to spin up a Neo4j instance, and access the "guide". You do that by typing
into the Neo4j browser, and then you can access the entire guide, add some data to your dataset, and play around.

However, that did not really do it for me, so I went looking for a dataset to try some stuff out myself. I remembered working with a dataset before that had geospatial coordinates in it: the Open Beer Database. Or what did you expect?

That dataset is publicly available as a set of .csv files, so it is well suited to a quick import into Neo4j. Even better, I found Kevin Heraud's GitHub account, and his set of ready-to-import scripts that I could start from. Easy.

I tweaked Kevin's script a little bit: you can find
Essentially, what I did is simple, and I will take you through it step by step.

Loading the OpenBeerDB into Neo4j

First I started a new Neo4j instance in the desktop:

On the running instance, I opened a terminal within Neo4j Desktop, and then fired up cypher-shell:

Once that's there, it's very straightforward to run the load script:

It loads a graph from the .csv files on Kevin's GitHub account, and you end up with just under 8800 nodes in your graph. Easy. You can take a look in the Neo4j browser and explore what's there:

The special thing here is the RED nodes in the screenshot above, the GEOCODE nodes. These guys represent the locations where the GREEN nodes (the BREWERIES) are located, and they have latitude and longitude properties on them. That's great. Now I can apply the new datatypes to them by simply running the following query:
//3.4 locations added from Geocode
MATCH (g:Geocode)
SET g.location = point({latitude: g.latitude, longitude: g.longitude});
and then adding a (spatial) index on the new location properties of the Geocode nodes:
CREATE INDEX ON :Geocode(location);
This is what that looks like:
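
As a quick sanity check after the two statements above, something like the following could be run in cypher-shell or the Neo4j browser. This is only a minimal sketch, assuming Neo4j 3.4 syntax and the Geocode label and location property used above; the aliases (geocodesWithLocation, lat, lon) are names I made up for illustration.

```cypher
// Count the Geocode nodes that actually received a location property
// (assumes the SET statement above has already been run)
MATCH (g:Geocode)
WHERE exists(g.location)
RETURN count(g) AS geocodesWithLocation;

// Peek at a few of the new points; for geographic points the
// components can be read back as .latitude and .longitude
MATCH (g:Geocode)
WHERE exists(g.location)
RETURN g.location AS location,
       g.location.latitude AS lat,
       g.location.longitude AS lon
LIMIT 5;
```

If the count is lower than the number of Geocode nodes, some of them were probably missing a latitude or longitude value in the source .csv files.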

Now that we have that database set up, we can start doing some Beer-related (!) geospatial queries in Neo4j. Let's do that in part 2/2 of this blogpost.

Hope that's useful - comments always welcome.


