Friday 25 February 2022

Importing (BEER) data into Neo4j - WITHOUT CODING!

Importing data into a graph structure stored in a graph database can be a real pain. Always has been, probably always will be to some degree. But we can make the pain a lot more tolerable - and today's blogpost is about just that. The reason is pretty great: Neo4j has just launched a new online tool that made the whole process a really easy and straightforward experience for me - take a look at it at http://data-importer.graphapp.io.

So let me try to explain how it works in the next few paragraphs.

First: find a dataset

Obviously the internet is flooded with data these days - but for this exercise I used https://datasetsearch.research.google.com/ for the first time. Amazing tool, as usual from Google. And I quickly found an interesting one that I could download from Kaggle.

This dataset contains information about different types of beers and various aspects of them, such as beer style, absolute beer volume, beer name, brewer name, beer appearance, beer taste, aroma, overall ratings, reviews, etc. - and it does so in a single .csv file with about 500k rows. Cool.

So I was ready to take that to the importer.


Preparing the importer: modeling and mapping

Once unzipped, the .csv is a good 414MB. Not huge - but not that small either. So when I loaded it into the importer, I was pleasantly surprised that the structure of the file appeared almost immediately on the left-hand tab of the interface. Now all I needed to do was draw a little graph model and map the fields onto the model.

For nodes this is pretty straightforward: I just select the columns of the .csv file that belong together and map them onto the node in the model that I drew out. Important here is that every node needs an "identifier" property - so that the importer knows what to create/merge at insert time.

On the left-hand side of the screen you can actually see your "progress" by looking at the green markers on the fields of the .csv file. A green marker means that you have already mapped that field - a missing marker means that you still have some work to do to complete the mapping.
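To make the role of that "identifier" property a bit more concrete, here is a minimal Python sketch of the idea: in a flat .csv the same beer or brewer shows up on many rows, and keying on the identifier is what collapses those rows into a single node. The column names and sample rows are made up for illustration and are not taken from the Kaggle file, and this is not a description of how the importer itself works internally.

```python
import csv
import io

# Hypothetical sample rows; the real file's column names may differ.
SAMPLE_CSV = """beer_id,beer_name,beer_style,brewer_id,brewer_name
1,Duvel,Belgian Strong Ale,10,Duvel Moortgat
2,Vedett,Pilsner,10,Duvel Moortgat
1,Duvel,Belgian Strong Ale,10,Duvel Moortgat
"""


def merge_nodes(rows, id_column, property_columns):
    """Keep one node per identifier value, similar in spirit to a merge."""
    nodes = {}
    for row in rows:
        key = row[id_column]
        # setdefault creates the node on first sight and reuses it afterwards
        node = nodes.setdefault(key, {})
        for column in property_columns:
            node[column] = row[column]
    return nodes


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
beers = merge_nodes(rows, "beer_id", ["beer_name", "beer_style"])
brewers = merge_nodes(rows, "brewer_id", ["brewer_name"])
print(len(beers), len(brewers))  # 2 1
```

Three rows go in, but only two beers and one brewer come out, which is exactly why a node without an identifier would be a problem.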

After a few more steps - we are talking minutes, not hours here - we have a complete model with a complete mapping of all the columns in the .csv file. 


It's also super cool to see how the relationships are easily mapped out: you select the identifiers for the FROM and the TO nodes, and name the TYPE of the relationship accordingly.
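The same FROM / TO / TYPE idea can be sketched in a few lines of Python. Again, this is only an illustration under assumptions: the column names and rows are hypothetical, and the only thing borrowed from the actual model is the BREWS relationship between Brewer and Beer.

```python
import csv
import io

# Hypothetical sample rows; the real file's column names may differ.
SAMPLE_CSV = """beer_id,beer_name,brewer_id,brewer_name
1,Duvel,10,Duvel Moortgat
2,Vedett,10,Duvel Moortgat
1,Duvel,10,Duvel Moortgat
"""


def map_relationships(rows, from_column, to_column, rel_type):
    """Build (from_id, type, to_id) triples, keeping each pair only once."""
    seen = set()
    relationships = []
    for row in rows:
        triple = (row[from_column], rel_type, row[to_column])
        if triple not in seen:
            seen.add(triple)
            relationships.append(triple)
    return relationships


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
brews = map_relationships(rows, "brewer_id", "beer_id", "BREWS")
print(brews)  # [('10', 'BREWS', '1'), ('10', 'BREWS', '2')]
```

The duplicate third row does not produce a second relationship, because both end points are looked up by their identifiers.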

Once this is all done, we are ready to actually run the import. Note that I have not typed a SINGLE LINE OF CODE yet. No Cypher. No APOC. No batching of transactions. No special configurations on the database. Nothing at all.
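For contrast, this is roughly the kind of plumbing a hand-rolled import would need for the batching part alone. It is a sketch under assumptions, not what the importer does: the Cypher statement, the Beer label, the property names and the batch size of 1000 are all made up for illustration, and the statement is only defined here, never sent to a database.

```python
from itertools import islice

# Illustrative Cypher only; label and property names are assumptions.
MERGE_BEERS = """
UNWIND $rows AS row
MERGE (b:Beer {id: row.beer_id})
SET b.name = row.beer_name
"""


def batched(iterable, size):
    """Yield lists of at most `size` items so each one fits a transaction."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical rows standing in for the parsed .csv file.
rows = ({"beer_id": i, "beer_name": f"Beer {i}"} for i in range(2500))
batch_sizes = [len(batch) for batch in batched(rows, 1000)]
print(batch_sizes)  # [1000, 1000, 500]
```

In a real script every batch would be passed as the `$rows` parameter of one transaction, and you would repeat the whole exercise for every node label and relationship type in the model.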


Running the import

Then we are ready to "push the blue button". Knowing how finicky these types of import procedures were in the past - I took a deep breath and hoped for the best.

But: I am VERY happy to report that it all went super smoothly. A good 5 minutes later, the data importer happily reported the completion of the import:

And it even allowed me to take a look at the internals and how it performed the different steps of the task. Here's what happened when it was importing the Beer nodes:
And here you can see what happened when it imported the (Brewer)-[:BREWS]->(Beer) relationships:

So wow, that was really surprisingly easy. Let's see if we can confirm the results in the actual database.


Viewing the results in the Neo4j Browser and Neo4j Bloom

Looking at the data in the Neo4j Browser immediately seemed to confirm the successful import. Here are some node stats:

And of course also some relationship stats:

So we are definitely looking at the right type of volumes in terms of imported data.

The schema also looked correct:


So then we are ready to start exploring. Maybe, just maybe, I can find some more tasty tidbits to savour over the weekend. Browser and Bloom seem to be making plenty of nice suggestions.


My conclusion? It's hard for me to say how impressed I was with this beta version of the data importer tool. It's really easy to use, and it "just works". Considering that I know for a fact that many first-time Neo4j users struggle with this, I am sure that this tool will have a massive impact on our user community. Truly: data import will never be the same again after this tool hits the road! Exciting!

Hope this was useful!

All the best

Rik

PS: I have put the archive of the combined dataset and import .json file on this link. Just download it, go to https://data-importer.neo4j.io/ and "open model with data" and you should be golden!

