Wednesday, 20 October 2021

The Quest for Graphalue - episode 2 of a our Podcast Series on Graph Value

Please take a look at the 2nd episode in our new podcast series about Graph Value on www.graphalue.com. Me and my partner in crime Stefan Wendin have just a published a second episode in the new podcast series on finding, defining, documenting, presenting and achieving Graph Value. You should check it out!

Episode 2 is over here! Enjoy! 


Wednesday, 13 October 2021

The Quest for Graphalue - episode 1 of a new Podcast Series on Graph Value

Please take a look at our new podcast series on www.graphalue.com. Me and my partner in crime Stefan Wendin have just started an exciting new podcast series on finding, defining, documenting, presenting and achieving Graph Value. You should check it out!

Episode 1 is over here! Enjoy!


Tuesday, 5 October 2021

ReBeerGraph: importing the Belgian BeerGraph straight from a Wikipedia HTML page

I have written about beer a few times, also on this blog. I have figured out a variety of ways to import The Wikipedia Page with all the belgian beers into Neo4j over the years. Starting with a spreadsheet based approach, then importing it from a Google Sheet (with it's great automated .csv export facilities), and then building on that functionality to automatically import the Wikipedia page into a spreadsheet using some funky IMPORTHTML() functions in Google sheets.

But: all of the above have started to crumble recently. The Wikipedia page, is actually kind of a difficult thing to parse automatically, as it splits up the dataset into many different HTML tables (which makes me need to import multiple datasets, really), and then it also seems like Wikipedia has added an additional column to it's data (the "Timeframe" or "Period" in which a beer had been brewn), which had lots of missing, and therefore empty cells. All of that messes up the IMPORTHTML() Google sheet that I had been using to automatically import the page into a gsheet.

So: I had been on the lookout for a different way of doing the automated import. And recently, while I was working on another side project (of course), I actually bumped into it. That's what I want to cover and demonstrate here: importing the data, directly from the page, without intermediate steps, automatically, using the apoc.load.html functionality.

Monday, 2 August 2021

Summer fun with Musicbrainz: the "real" Six Degrees of Kanye West (part 3/3)

So: now that we have had some fun setting up our local Musicbrainz database (part 1), and importing the data into our Neo4j database (part 2), we can now start having some fun. That means: checking if that actual 6 degrees of Kanye West, and the actual "Kanye Number", is findable and reproducible in our Neo4j database, in an efficient way. Let's take a look at that.

Note: part of this effort was actually motivated by the fact that I have noticed that the python code that powers the above website, actually caches the results (see the github repo for more info) rather than calculate the Kanye Number in real time like we will do here. I guess that speaks to the power of graph databases, right?

But let's take a look at some queries.

Find other artists that worked together with Kanye

Let's start with some simple

match (kanye:Artist {name: "Kanye West"})--(r:Recording)--(a2:Artist)
return kanye,r,a2
limit 100

That gives you a bit of a peek already: Kanye's co-recorders

Summer fun with Musicbrainz: the "real" Six Degrees of Kanye West (part 2/3)

In the first article of this series we talked about our mission to recreate the Six Degrees of Kanye West website in Neo4j - and how we are going to use the (Musicbrainz database)[www.musicbrainz.org] to do that. We have a running postgres database, and now we can start the import of part of the dataset into Neo4j to understand what the infamous Kanye Number of artists would be.

Loading data into Neo4j

There's lots of different approaches to loadhing the data, but when I started looking at the model in a bit more detail:

  the model

Summer fun with Musicbrainz: the "real" Six Degrees of Kanye West (part 1/3)

Last year, I had a lot of fun working with a fantastic little tool that a colleague of mine created, to analyse and enhance Spotify playlists using Neo4j. While I was working on that blogpost, and I was experimenting with what little I know of Python code, I came across another example project that Spotify actually highlighted on their website: it's called Six Degrees of Kanye West and it's simply amazing.

The idea behind this site seems to be similar to the "Six Degrees of Kevin Bacon": if you have ever worked with Kevin directly, your Bacon Number is 1. If you have worked with someone that has worked with Kevin, then your Bacon Number is 2. Etc etc - and then this is applied to the idea of musicians working together on songs.

For example, if you go there and you look up some unknown artist (like then inimitable Belgian schlager singer, Helmut Lotti), like this:

 

Friday, 16 July 2021

Graphistania 2.0, Episode 15: The Summer Session with Emil

Well this makes me very happy: just before many of us are taking some summer vacations, and ON THE DAY OF MY 2ND VACCINATION SHOT, I am able to publish another Graphistania podcast episode - interviewing my friend and boss (how awesome is it to be able to say that!) Emil Eifrem. We talk about the world, the graph database market, Neo4j the company, and of course, the products. It was a ton of fun, and I even got Emil to agree to publishing the video recording too :) ... Hope you enjoy it as much as I did. Here goes:

And of course the video:


Here's the transcript of our conversation:

RVB 00:00:01.709 So have I got your consent to record, please?

EE 00:00:07.850 Fine. [laughter] I want to put it on the record that you have my consent to record this and release the audio.