The first thing I needed to do was to create a graph model out of my CSV files. Here's what I picked:
So then I would need to create a series of LOAD CSV commands to import these. And this is where it got interesting. I wrote the Cypher queries myself, and found that they worked fine, except for one part: adding the reviews to the graph. This was my query:
using periodic commit
load csv with headers
from "file:/Users/rvanbruggen/Dropbox/Neo Technology/Demo/BEER/BeerAdvocate/real/ba7.csv" as csv
fieldterminator ';'
with csv
where csv.review_profileName is not null
match (b:Beer {name: csv.beer_name}), (p:Profile {name: csv.review_profileName})
create (p)-[:CREATES_REVIEW]->(r:Review {taste: toFloat(csv.review_taste), appearance: toFloat(csv.review_appearance), text: csv.review_text, time: toInt(csv.review_time), aroma: toFloat(csv.review_aroma), palate: toFloat(csv.review_palate), overall: toFloat(csv.review_overall)})-[:REVIEW_COVERS]->(b);
The first thing Michael asked for was my query plan (using the EXPLAIN command), and this turned out to be particularly interesting. Michael saw that the plan contained a step called "Eager". Mark has blogged about this elsewhere already, and it was clear that we had to get rid of it: an Eager step pulls in all the rows before passing them on to the next step, which effectively defeats USING PERIODIC COMMIT.
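To look at the plan yourself, you can put EXPLAIN in front of the query, which returns the plan without actually running it. Here's a minimal sketch based on a trimmed-down version of my review query above (only two of the review properties kept, and the USING PERIODIC COMMIT line left out to keep it short). The file path is a placeholder, not my real one:

```cypher
// EXPLAIN compiles the query and returns the execution plan without executing it,
// so no Review nodes or relationships are created by this statement.
EXPLAIN
LOAD CSV WITH HEADERS FROM "file:///path/to/ba7.csv" AS csv FIELDTERMINATOR ';'
WITH csv
WHERE csv.review_profileName IS NOT NULL
MATCH (b:Beer {name: csv.beer_name}), (p:Profile {name: csv.review_profileName})
CREATE (p)-[:CREATES_REVIEW]->(r:Review {taste: toFloat(csv.review_taste), overall: toFloat(csv.review_overall)})-[:REVIEW_COVERS]->(b);
```

If the plan that comes back contains an Eager operator, that's the step you want to get rid of.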
Here's the query that Michael suggested:
//the query below is NO LONGER PROBLEMATIC
using periodic commit
load csv with headers from "file:/Users/rvanbruggen/Dropbox/Neo Technology/Demo/BEER/BeerAdvocate/real/ba15.csv" as csv fieldterminator ';'
with csv where csv.review_profileName is not null
create (r:Review {taste: toFloat(csv.review_taste), appearance: toFloat(csv.review_appearance), text: csv.review_text, time: toInt(csv.review_time), aroma: toFloat(csv.review_aroma), palate: toFloat(csv.review_palate), overall: toFloat(csv.review_overall)})
with r,csv
match (b:Beer {name: csv.beer_name})
match (p:Profile {name: csv.review_profileName})
create (p)-[:CREATES_REVIEW]->(r)
create (r)-[:REVIEW_COVERS]->(b);
// takes 13s
You can find the two import scripts on GitHub:
- this is my old version (which DID NOT WORK, at least not always)
UPDATE: in the original version of this blogpost, I was working with version 2.2.0 of Neo4j. Recently, 2.2.1 was released, and guess what: the queries run just fine. Apparently the team made some changes to how Neo4j handles composite merge updates, and it now just flies through all queries, even with my old, sub-optimal version of the queries. Kudos!
- this is Michael's version (which, of course, WORKS)
UPDATE: I would still recommend using this version of the queries :)
Let's explore some of the differences.
- Michael's version included the same indexes as mine - but also included a UNIQUENESS CONSTRAINT. This seems to be a good idea because it makes the MERGE-ing of the data unnecessary - you can just CREATE instead.
- Michael's version does "one MERGE at a time". Rather than merging in entire patterns like
merge (b)-[:HAS_STYLE]-(s:Style {name: csv.beer_style})
you instead do
merge (s:Style {name: csv.beer_style})
and then
merge (b)-[:HAS_STYLE]->(s)
- Some parts of the query are moved earlier in the sequence. I noticed that he did the CREATE of the review first, then passed that result on to the next part of the query with WITH, and then did two MATCHes (for Beers and Profiles) to connect the Review to the appropriate Beer and Profile. To be honest, this seems to have been a bit of a trial-and-error search, but after talking to the awesome dev team we found out that this should no longer be necessary as of the forthcoming version 2.2.1 of Neo4j.
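To make the first two points a bit more concrete, here's a sketch of what constraints plus "one MERGE at a time" could look like for the Beer/Style part of the import. This is not copied from Michael's script: the labels I put constraints on and the file path are assumptions for illustration, and the constraint syntax is the Neo4j 2.x one.

```cypher
// Assumption: beer names and style names are unique in the data set.
// Neo4j 2.x syntax for uniqueness constraints (each one is backed by an index).
CREATE CONSTRAINT ON (b:Beer) ASSERT b.name IS UNIQUE;
CREATE CONSTRAINT ON (s:Style) ASSERT s.name IS UNIQUE;

// One MERGE at a time: first each node on its own, then the relationship between them.
// MERGE refuses null property values, hence the WHERE filter.
// The file path below is a placeholder.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///path/to/ba1.csv" AS csv FIELDTERMINATOR ';'
WITH csv
WHERE csv.beer_name IS NOT NULL AND csv.beer_style IS NOT NULL
MERGE (b:Beer {name: csv.beer_name})
MERGE (s:Style {name: csv.beer_style})
MERGE (b)-[:HAS_STYLE]->(s);
```

Whether a particular query still ends up with an Eager step is something to check with EXPLAIN rather than assume.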
The result is pretty awesome. After running the entire import (which means running the same import 15 times; see over here for the complete script), I got a shiny new database to play around with:
In the last part of this blog-series, I will be doing some fancy queries. Really looking forward to it :))
Hope you found this useful.
Cheers
Rik
PS: Here are the links to