Thursday 3 July 2014

Using LoadCSV to Import data from Google Spreadsheet

My colleague Rickard recently did a great graphgist on Elite:Dangerous trading. When I read his post, the first thing that struck me was that he had found a great clever way to import data into Neo4j. And since I have been into data imports for a while, I decided to take it for a spin, and write it up for you.

The mechanism uses two fundamental capabilities:

  • Google Spreadsheets have a great capability to export the data into a CSV format. That export capability generates a unique URI that you can download the CSV of a specific sheet in the spreadsheet from. 
  • Neo4j's Load CSV capability can leverage data that is located at **any** URI. The data does not have to be local - it can be anywhere on the network.
So let's give this a try. 

Preparing the Google Spreadsheet

What we want to do is to put some sample data into a spreadsheet first.
You can find that actual sheet over here. Clearly it's not a very big sheet - we are just trying to do a little test.

Now, in order to export this file to CSV, and to make that export accessible over the internet, we need two things: 
  1. the spreadsheet needs to be publicly accessible over the internet. Otherwise Google will ask you to authenticate first, and the Neo4j Load CSV process will not know how to do that.
  2. you need to generate the download URI of the CSV export. That is very simple too. First you do the export:

    And then you take a look at your browser download history to figure out what the URI of the download was. Easy:

    You can copy that URI from there (in this case it is this ugly thing!)- and then we move on to the next stage: importing into Neo4j.

Importing the data into Neo4j

That Import process is very simple now, with Load CSV. Here's the query that I wrote, which uses the URI of the CSV version of the spreadsheet (in green):

It's very simple. First I add the colours using a MERGE, then I connect the persons to their respective colours using a create. Running that import gives me results in a matter of milliseconds, and with no intermediate steps:
And although this graph is very haphazard and not very interesting, I can query it straight away.

That's all there is to it really. Importing data was already way easier with LoadCSV, but thanks to Rickard, we can now do it straight from a Google Spreadsheet. Thanks Rickard!

Hope this was useful.