Wednesday 6 March 2013

Importing data into Neo4j - the spreadsheet way

I am sure that many of you are very technical people, very knowledgeable about all things Java, Dr. Who and many other things - but I in case you have ever met me, you would probably have noticed that I am not. And I don’t want to be. I love technology, but have never had the talent, inclination or education to program - so I don’t. But I still want to get data into Neo4j - so how do I do that?

There are many technical tools out there (definitely look here, here and here, but I needed something simple. So my friend and colleague Michael Hunger came to the rescue, and offered some help to create a spreadsheet to import into Neo4j.

You will find the spreadsheet here, and you will find two components:

  1. an instruction sheet. I will get to that later.
  2. a data import sheet. Let’s look at that first.

The Data Import Sheet

This sheet is composed of two parts:
  • columns A, B and C: these contain the data for the Nodes of our graph, using an “id”, a “name”, and a “type
  • columns F, G and H: these contain the data for the Relationships of our graph, having a “from-id” (where the relationship starts), a “to-id” (where the relationship ends), and a “relationship type”. Columns F and G reference the nodes and their id’s in column A.

And then comes the seccret sauce: how to create Cypher statements from these nodes and relationships. For this we use very simple statements that leverage the columns mentioned above, the cypher syntax and string concatenation. Look at the columns D and I:
  • cypher statements to create the nodes:

="create n={id:'"&A2&"', name:'"&B2&"', type:'"&C2&"'};"

output for row 2:

create n={id:'1', name:'Amada Emory', type:'Female'};

As you can see, it takes that id, name and type properties from columns A, B and C, and puts these into a “create” cypher statement.

  • cypher statements to create the relationships:

="start n1=node:node_auto_index(id='"&F2&"'), n2=node:node_auto_index(id='"&G2&"')  create n1-[:"&H2&"]->n2;"

output for row 2:

start n1=node:node_auto_index(id='1'), n2=node:node_auto_index(id='11') create n1-[:MOTHER_OF]->n2;

This one is a little bit more complicated, as it will be using Neo4j’s auto-index: in order to create the relationship, we first have to look up start node and end node from the auto-index using the ID property. And then the create-statement creates the relationship based on the relationship-type in column H.

So with this, we end up with two columns containing a bunch of cypher statements. So then what?

The Instructions Sheet

In the first sheet of the spreadsheet, you will find a bunch of instructions. Basically, you need to go through the following steps:
  • download and unzip Neo4j server.
  • copy/paste the cypher statements from the Import Sheet into a text file.
  • wrap these with a neo4j transaction (begin, commit) - so that all of the statements get persisted to disk in the same transaction (or not in case of an error). Not important for smaller datasets, more important for larger datasets.
  • some instructions on how to enable auto-indexing on Neo4j. This is important, because as you insert data into the database, it needs to get indexed for setting up the relationships properly (see above), and future use.
  • and some instructions on how you can pipe the text file into the neo4j shell - if necessary. For small datasets (and therefore, a limited number of cypher statements) you can do with copy/pasting the textfile into the Web-UI console - but that might not always work.
  • starting the server and browsing the Web-UI

And there we go: the dataset gets created, and Neo4j is ready for use. I hope this little overview was useful for you - it sure was useful for me when getting my hands dirty for the first time :) …