A couple of weeks ago, when I woke up early (!) on a Sunday morning to get "pistolets" and croissants for my family from our local bakery, I immediately took notice of a graph behind the bakery counter. It was a "foodpairing" graph, sponsored by the people of Puratos - a wholesale provider of bakery products, grains, etc. So I got home and started googling, and before I knew it I had found some terribly interesting research by Yong-Yeol (YY) Ahn, featured in a Wired article, in Scientific American, and in Nature. This researcher had done some fascinating work in understanding all 57k recipes from Epicurious, Allrecipes and Menupan: their composing ingredients and ingredient categories, their origin and - perhaps most fascinating of all - their chemical compounds.
And best of all: he made his datasets (this one and this one) available, so that I could spend some time trying to get them into neo4j and take them for a spin.
The dataset: some graph cleanup required
The dataset was there, but clearly wasn't ready for import yet. I would have to do some work. And like always, that work starts with a model. Time to use Arrows again, and start drawing. I ended up with this:
It turned out to be a bit of manual work, but in the end I found it very easy to create the sheet that I needed. It was even less than 500k rows long in the end - so Excel didn't really blink. You can find the final Excel file that I created over here.
Then it was really just a matter of exporting the Excel sheets to CSV files, and getting them ready for import into neo4j with neo4j-shell-tools. Again: easy enough - I had gone through this a couple of times before. You can find the zip file with all the CSV files over here, and the neo4j-shell instructions are in this gist.
As you can see from the screenshot below, the dataset was well imported, without any issues, in a matter of minutes.
Query fun on the foodnetwork
I have put all of the queries that I wrote on this gist over here - but I am sure you can come up with some more interesting ones.
Let's see if we can find out how many recipe-categories there would be in the different areas of the dataset. That would mean looking for the following pattern:
and that would yield the following result:
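The actual query and its output are in the gist mentioned above. As a rough plain-Python sketch of what a count-per-area aggregation like this does, here is a minimal example. The rows, the names in them and the `recipes_per_area` helper are all made up for illustration and are not taken from the dataset or from the graph model:

```python
from collections import Counter

# Hypothetical sample rows: (recipe id, cuisine, area). Invented for illustration only.
RECIPES = [
    ("r1", "Belgium", "WesternEuropean"),
    ("r2", "Netherlands", "WesternEuropean"),
    ("r3", "Japan", "EastAsian"),
    ("r4", "Korea", "EastAsian"),
    ("r5", "Belgium", "WesternEuropean"),
]


def recipes_per_area(rows):
    """Count recipes per area, largest count first."""
    return Counter(area for _recipe, _cuisine, area in rows).most_common()


counts = recipes_per_area(RECIPES)
for area, number_of_recipes in counts:
    print(area, number_of_recipes)
```

In the graph itself the same grouping would come from matching the recipe-to-area pattern and counting per area, rather than from looping over rows.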
Or another interesting example, zooming into the specific Cuisines: what are the most popular ingredient categories in Belgium and the Netherlands, two neighbouring countries with a lot in common. The cypher query would look something like:
and the results would look like this:
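To make the idea of "most popular ingredient categories per cuisine" a bit more concrete, here is a small plain-Python sketch of that kind of grouped ranking. It assumes hypothetical (cuisine, ingredient category) pairs, one per ingredient used in a recipe; the sample values and the `top_categories` helper are invented and say nothing about the real numbers for Belgium or the Netherlands:

```python
from collections import Counter, defaultdict

# Hypothetical (cuisine, ingredient category) pairs. Invented for illustration only.
USAGE = [
    ("Belgium", "vegetable"), ("Belgium", "dairy"), ("Belgium", "vegetable"),
    ("Netherlands", "dairy"), ("Netherlands", "dairy"), ("Netherlands", "meat"),
    ("Japan", "fish"),
]


def top_categories(usage, cuisines, n=1):
    """Return the n most used ingredient categories for each requested cuisine."""
    per_cuisine = defaultdict(Counter)
    for cuisine, category in usage:
        if cuisine in cuisines:  # keep only the cuisines we want to compare
            per_cuisine[cuisine][category] += 1
    return {c: counter.most_common(n) for c, counter in per_cuisine.items()}


top = top_categories(USAGE, {"Belgium", "Netherlands"})
print(top)
```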
And then last but not least, let's look at some specific recipes based on actual ingredients that we like. For example, I am a big fan of a "salade Liègeoise", which is a lukewarm dish with bacon, green beans, potatoes and, in some cases, hard boiled eggs. Let's see if we can find any other recipes in our database that use these ingredients. Chances are that we would like them, no? So here goes. The cypher query would go like this:
Note the use of the "collect" function to get all the ingredients of a recipe into one resultset column. And the result is actually quite interesting:
And also visually this gives us a pretty interesting picture:
Turns out there are quite a few similar dishes that I could choose from. Gotta do that some day :) ...
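For readers who have not used "collect" before: it gathers the values of a group into one list, so that every recipe ends up on a single row with all of its ingredients. A minimal plain-Python sketch of that idea, combined with the "must contain all of my favourite ingredients" filter, could look like this. The recipe names, ingredient names and the `recipes_with` helper are hypothetical and do not come from the dataset:

```python
from collections import defaultdict

# Hypothetical (recipe, ingredient) pairs. Invented for illustration only.
PAIRS = [
    ("salad_a", "bacon"), ("salad_a", "green_bean"), ("salad_a", "potato"), ("salad_a", "egg"),
    ("stew_b", "bacon"), ("stew_b", "potato"), ("stew_b", "onion"),
    ("soup_c", "green_bean"), ("soup_c", "potato"), ("soup_c", "bacon"),
]


def recipes_with(pairs, wanted):
    """Collect ingredients per recipe, then keep recipes that use every wanted ingredient."""
    collected = defaultdict(list)
    for recipe, ingredient in pairs:
        collected[recipe].append(ingredient)  # the "collect" step: one list per recipe
    wanted_set = set(wanted)
    return {r: ings for r, ings in collected.items() if wanted_set <= set(ings)}


matches = recipes_with(PAIRS, ["bacon", "green_bean", "potato"])
for recipe, ingredients in matches.items():
    print(recipe, ingredients)
```

Here `stew_b` drops out because it has no green beans, while the other two recipes keep their full ingredient lists in one column.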
And now it's your turn
If you want to play around with this dataset yourself, there are multiple options:
- start with the zipped import files and the import script as described above
- download the zipped graph.db directory from over here.
- pay a visit to our friends at Graphenedb.com, who have an extremely nice sandbox environment that you can play around with. Handle with care, of course!
If you do, you may also want to apply this grass-file so that you don't have to mess around with the default settings.
I hope you thought this was as interesting as I found it - and as always, would love to get your feedback! In any case, I wish you and your families
a Merry Christmas, and a Happy New Year!