Friday 24 July 2015

Loading the Belgian Corporate Registry into Neo4j - part 4

In this fourth and final blog post (parts 1, 2 and 3 were published earlier), I would like to summarize my experience of loading the Belgian Corporate Registry into Neo4j. Here are a few points that I hope will benefit everyone.

1. Size matters: doing an import at scale is totally different from doing it for a few hundred or a few thousand nodes and relationships. More memory is good. Tweaking the settings is good. In a real production environment it would probably have been a better idea to do this import offline. Read some of the documentation and my previous article for tips.

2. Complexity matters: the more connected the graph is, the more you will need to think about the import process in detail. Bulk loading stuff is easy, but connecting it up can be hard and needs to be thought through. The magic happens in the query plan. So take a look at a small import first to understand what is happening in every step of the plan - and make sure you avoid "expensive" steps that take a lot of resources. Often that will mean splitting up operations into smaller parts, for example creating the nodes first and then adding the relationships - instead of writing the whole pattern in one go.
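To make the "nodes first, relationships second" idea a bit more concrete, here is a minimal sketch in Python that only builds the Cypher statements as text; it does not talk to a database. The file names, labels, property names, CSV columns and the relationship type are all made up for illustration and are not the ones used in this series. Prefixing each statement with EXPLAIN is one way to inspect the plan of every pass before running it for real.

```python
# Sketch only: every file name, label, property, column and relationship
# type below is a hypothetical placeholder, not taken from the actual import.

def node_pass(csv_url, label, key, column):
    """Build a statement that ONLY creates nodes of one label."""
    return (
        f"LOAD CSV WITH HEADERS FROM '{csv_url}' AS row\n"
        f"MERGE (:{label} {{{key}: row.{column}}})"
    )


def relationship_pass(csv_url, start, end, rel_type):
    """Build a statement that ONLY connects nodes that already exist.

    start and end are (label, key, column) tuples.
    """
    s_label, s_key, s_col = start
    e_label, e_key, e_col = end
    return (
        f"LOAD CSV WITH HEADERS FROM '{csv_url}' AS row\n"
        f"MATCH (a:{s_label} {{{s_key}: row.{s_col}}})\n"
        f"MATCH (b:{e_label} {{{e_key}: row.{e_col}}})\n"
        f"MERGE (a)-[:{rel_type}]->(b)"
    )


def explain(statement):
    """Prefix a statement with EXPLAIN so its plan can be checked first."""
    return "EXPLAIN " + statement


# Three small passes instead of one big pattern.
statements = [
    node_pass("file:///companies.csv", "Company", "id", "company_id"),
    node_pass("file:///addresses.csv", "Address", "id", "address_id"),
    relationship_pass(
        "file:///company_addresses.csv",
        ("Company", "id", "company_id"),
        ("Address", "id", "address_id"),
        "LOCATED_AT",
    ),
]

for statement in statements:
    print(explain(statement), end="\n\n")
```

Each printed statement can be pasted into the Neo4j browser or shell on a small sample file. Whether a given pass is actually cheap still depends on the plan you get back, so treat the split as a starting point and not as a guarantee.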

3. A fool with a tool: there is a range of different import tools at your disposal - but if you don't understand what they do, you may still fail. In part 2 I was super convinced that my funky bash+python wizardry was going to do the trick - but it didn't. I should have looked at the query plan in more detail, and thought about how to work around the problem. In hindsight, it would probably have been a good idea to look at offline import in more detail.

4. Dungeons and Dragons: down in the bowels of Neo4j there are still some nasty dungeons and dragons, like the Eager Pipe that we tackled in part 3. Our engineers are fighting these day and night, and know how to beat them. So the number one thing to do if you are struggling is to reach out and talk to us. Otherwise it's all too easy to get lost.

That's about it, for now. Please don't forget to look at the following links next time you want to do real import magic:

Hope this was useful. Feedback always welcome.


