As you probably know, importing data into Neo4j can be a bit tricky, in spite of some of the wonderful tools that we have these days. I blogged about this last year, and if you are looking for some guidance then please go there.
It turns out that, to get the most out of your import efforts, there are a few settings that you should be aware of and tweak, depending on your specific environment. Your machine's memory is of paramount importance, and your dataset also determines some of the optimization characteristics discussed below.
Essentially there are three parameters to tweak:
- the Java heap size
- the memory mapping of the Neo4j store files
- the Neo4j cache configuration
Settings for the Batch Importer
Heap size
Add parameters to the batch importer's command-line start statement:
-Xms<size> : sets the initial Java heap size
-Xmx<size> : sets the maximum Java heap size
Memory mapping settings
Important note: for the Batch Importer, the memory mapping settings are PART OF the heap settings above, because memory-mapped files consume part of the heap. That's why you should give the batch importer as much heap as possible, leaving 1-4GB to the operating system.
Try to memory map all of the node store, and as much of the relationship store files as possible.
Edit: /path/to/importer/batch.properties
use_memory_mapped_buffers=true
# 14 bytes per node
neostore.nodestore.db.mapped_memory=200M
# 33 bytes per relationship
neostore.relationshipstore.db.mapped_memory=3G
# 38 bytes per property
neostore.propertystore.db.mapped_memory=500M
# 60 bytes per long-string block
neostore.propertystore.db.strings.mapped_memory=500M
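To get a feel for how large these values should be for your own dataset, the per-record figures in the comments above can be turned into a rough estimate. The following is a minimal Python sketch and not part of the batch importer: the function names, the example record counts, and the choice to treat "M" as 1024 * 1024 bytes are all my own assumptions for illustration.

```python
# Bytes per record, taken from the comments in batch.properties.
BYTES_PER_RECORD = {
    "neostore.nodestore.db.mapped_memory": 14,              # per node
    "neostore.relationshipstore.db.mapped_memory": 33,      # per relationship
    "neostore.propertystore.db.mapped_memory": 38,          # per property
    "neostore.propertystore.db.strings.mapped_memory": 60,  # per long-string block
}

MB = 1024 * 1024  # assumption: "M" in the properties file means 1024 * 1024 bytes


def estimate_mapped_memory(nodes, relationships, properties, string_blocks):
    """Return {setting: whole MB} needed to memory map each store completely."""
    counts = {
        "neostore.nodestore.db.mapped_memory": nodes,
        "neostore.relationshipstore.db.mapped_memory": relationships,
        "neostore.propertystore.db.mapped_memory": properties,
        "neostore.propertystore.db.strings.mapped_memory": string_blocks,
    }
    sizes = {}
    for setting, bytes_per_record in BYTES_PER_RECORD.items():
        total_bytes = counts[setting] * bytes_per_record
        # Round up to a whole megabyte, with a floor of 1M.
        sizes[setting] = max(1, (total_bytes + MB - 1) // MB)
    return sizes


def to_properties(sizes):
    """Format the estimate as lines for batch.properties."""
    return "\n".join(f"{setting}={mb}M" for setting, mb in sizes.items())


if __name__ == "__main__":
    # Hypothetical dataset: 10M nodes, 100M relationships,
    # 50M properties, 5M long-string blocks.
    sizes = estimate_mapped_memory(10_000_000, 100_000_000, 50_000_000, 5_000_000)
    print(to_properties(sizes))
```

If the total does not fit into the heap you can give the importer, follow the advice above: map all of the node store first, then as much of the relationship store as possible.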
Cache settings
For bulk update/import operations with the Batch Importer, the cache should be disabled: you only write data, and no node or relationship objects are loaded.
Edit: /path/to/importer/batch.properties
cache_type=none
Next, let's look at the same settings for importing into a running Neo4j server, for example using neo4j-shell-tools:
Settings for the Neo4j Server / neo4j-shell-tools
Heap size
Edit: /path/to/neo4j/conf/neo4j-wrapper.conf
# Initial Java Heap Size (in MB)
wrapper.java.initmemory=4096
# Maximum Java Heap Size (in MB)
wrapper.java.maxmemory=4096
Memory mapping settings
Important note: for the Neo4j server (and for neo4j-shell-tools, which runs against a server), the memory mapping settings are SEPARATE from the heap settings: the heap allocation comes on top of the memory mapping allocation. Usually you use between 4 and 8GB as heap, and the remainder of your RAM is used for memory mapping.
Note that for users who run Neo4j on Windows there is a significant difference: there, the memory mapping is part of the heap, and the principle explained in the Batch Importer section should be followed.
Try to memory map all of the node store, and as much of the relationship store files as possible.
Edit: /path/to/neo4j/conf/neo4j.properties
The settings to be added to this file are identical to the ones mentioned for the Batch Importer:
use_memory_mapped_buffers=true
# 14 bytes per node
neostore.nodestore.db.mapped_memory=200M
# 33 bytes per relationship
neostore.relationshipstore.db.mapped_memory=3G
# 38 bytes per property
neostore.propertystore.db.mapped_memory=500M
# 60 bytes per long-string block
neostore.propertystore.db.strings.mapped_memory=500M
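The difference between the two memory models is easy to get wrong, so here is a small Python sketch of the arithmetic. It is only an illustration under stated assumptions: the function names are hypothetical, and the operating system reserve in the server case is my own addition (the 1-4GB reserve is only mentioned above for the batch importer).

```python
def server_budget(total_ram_gb, heap_gb=4, os_reserve_gb=1):
    """Neo4j server (not on Windows): memory mapping is allocated outside the heap.

    os_reserve_gb is an assumption; the heap is usually between 4 and 8GB.
    """
    mapped_gb = total_ram_gb - heap_gb - os_reserve_gb
    if mapped_gb <= 0:
        raise ValueError("no RAM left for memory mapping")
    return {"heap_gb": heap_gb, "mapped_gb": mapped_gb}


def in_heap_budget(total_ram_gb, os_reserve_gb=2):
    """Batch importer, or a server on Windows: memory mapping is part of the heap.

    The mapped_memory settings must then fit inside the returned heap size.
    """
    heap_gb = total_ram_gb - os_reserve_gb
    if heap_gb <= 0:
        raise ValueError("no RAM left for the heap")
    return {"heap_gb": heap_gb}


if __name__ == "__main__":
    # Hypothetical 16GB machine running a Neo4j server.
    server = server_budget(total_ram_gb=16, heap_gb=4)
    print(f"wrapper.java.initmemory={server['heap_gb'] * 1024}")
    print(f"wrapper.java.maxmemory={server['heap_gb'] * 1024}")
    print(f"# up to {server['mapped_gb']}G left for the mapped_memory settings")

    # The same machine running the batch importer instead.
    importer = in_heap_budget(total_ram_gb=16)
    print(f"-Xms{importer['heap_gb']}G -Xmx{importer['heap_gb']}G")
```

On the server the heap stays small and the mapped memory takes the rest of the RAM. With the batch importer the heap takes almost everything, and the mapped memory has to fit inside it.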
Cache settings
As you create relationships by looking up and updating nodes, the cache should be kept active on a running Neo4j server that you are loading data into. Here the Community and Enterprise editions of Neo4j differ: the Enterprise edition has a better cache, the “High Performance Cache”, that is not present in Community. Therefore, for bulk update/import operations, set the cache type that matches your edition in the neo4j.properties file in the conf directory of your Neo4j installation:
Edit: /path/to/neo4j/conf/neo4j.properties
# Setting for Community Edition:
cache_type=weak
# Setting for Enterprise Edition:
cache_type=hpc
I hope this was a good overview of the different settings that you should keep in mind and tweak, and of where you should tweak them, for your specific environment.
Hope this was useful
Rik