Wednesday 27 October 2021

The Quest for Graphalue - episode 3 of our Podcast Series on Graph Value

Please take a look at the 3rd episode of our podcast series about Graph Value on www.graphalue.com. My partner in crime Stefan Wendin and I have just published the third episode in the series on finding, defining, documenting, presenting and achieving Graph Value. You should check it out!

Episode 3 is over here! Enjoy! 


Wednesday 20 October 2021

The Quest for Graphalue - episode 2 of our Podcast Series on Graph Value

Please take a look at the 2nd episode of our podcast series about Graph Value on www.graphalue.com. My partner in crime Stefan Wendin and I have just published the second episode in the series on finding, defining, documenting, presenting and achieving Graph Value. You should check it out!

Episode 2 is over here! Enjoy! 


Wednesday 13 October 2021

The Quest for Graphalue - episode 1 of a new Podcast Series on Graph Value

Please take a look at our new podcast series on www.graphalue.com. My partner in crime Stefan Wendin and I have just started an exciting new podcast series on finding, defining, documenting, presenting and achieving Graph Value. You should check it out!

Episode 1 is over here! Enjoy!


Tuesday 5 October 2021

ReBeerGraph: importing the Belgian BeerGraph straight from a Wikipedia HTML page

I have written about beer a few times, also on this blog. Over the years I have figured out a variety of ways to import the Wikipedia page with all the Belgian beers into Neo4j. I started with a spreadsheet-based approach, then imported it from a Google Sheet (with its handy automated .csv export facilities), and then built on that functionality to automatically pull the Wikipedia page into a spreadsheet using some funky IMPORTHTML() functions in Google Sheets.

But: all of the above have started to crumble recently. The Wikipedia page is actually kind of a difficult thing to parse automatically, as it splits the dataset up into many different HTML tables (which forces me to import multiple datasets, really). On top of that, Wikipedia seems to have added an additional column to its data (the "Timeframe" or "Period" in which a beer was brewed), which has lots of missing values and therefore empty cells. All of that messes up the IMPORTHTML() Google Sheet that I had been using to automatically import the page.

So: I had been on the lookout for a different way of doing the automated import. And recently, while working on another side project (of course), I actually bumped into one. That's what I want to cover and demonstrate here: importing the data directly from the page, without intermediate steps, automatically, using the apoc.load.html functionality.
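A minimal sketch of what such a call looks like, assuming the APOC library is installed in Neo4j. Note that both the Wikipedia URL and the CSS selector here are assumptions for illustration - the real selector depends on how the page marks up its tables:

```cypher
// apoc.load.html takes a URL and a map of {name: CSS-selector} pairs,
// and yields a `value` map with the matching DOM elements per name.
CALL apoc.load.html(
  "https://en.wikipedia.org/wiki/List_of_Belgian_beers",  // assumed URL
  {rows: "table.wikitable tbody tr"}                      // assumed selector
) YIELD value
// Each matched element is a map with keys like `text`, `tagName`, `attributes`.
UNWIND value.rows AS row
RETURN row.text
LIMIT 10
```

Because the selector matches rows across all of the page's tables at once, this sidesteps the multi-table problem that broke the spreadsheet approach.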