Monday, 2 August 2021

Summer fun with Musicbrainz: the "real" Six Degrees of Kanye West (part 3/3)

So: now that we have had some fun setting up our local Musicbrainz database (part 1), and importing the data into our Neo4j database (part 2), we can now start having some fun. That means: checking if that actual 6 degrees of Kanye West, and the actual "Kanye Number", is findable and reproducible in our Neo4j database, in an efficient way. Let's take a look at that.

Note: part of this effort was actually motivated by the fact that I have noticed that the python code that powers the above website, actually caches the results (see the github repo for more info) rather than calculate the Kanye Number in real time like we will do here. I guess that speaks to the power of graph databases, right?

But let's take a look at some queries.

Find other artists that worked together with Kanye

Let's start with some simple

match (kanye:Artist {name: "Kanye West"})--(r:Recording)--(a2:Artist)
return kanye,r,a2
limit 100

That gives you a bit of a peek already: Kanye's co-recorders

Summer fun with Musicbrainz: the "real" Six Degrees of Kanye West (part 2/3)

In the first article of this series we talked about our mission to recreate the Six Degrees of Kanye West website in Neo4j - and how we are going to use the (Musicbrainz database)[] to do that. We have a running postgres database, and now we can start the import of part of the dataset into Neo4j to understand what the infamous Kanye Number of artists would be.

Loading data into Neo4j

There's lots of different approaches to loadhing the data, but when I started looking at the model in a bit more detail:

  the model

Summer fun with Musicbrainz: the "real" Six Degrees of Kanye West (part 1/3)

Last year, I had a lot of fun working with a fantastic little tool that a colleague of mine created, to analyse and enhance Spotify playlists using Neo4j. While I was working on that blogpost, and I was experimenting with what little I know of Python code, I came across another example project that Spotify actually highlighted on their website: it's called Six Degrees of Kanye West and it's simply amazing.

The idea behind this site seems to be similar to the "Six Degrees of Kevin Bacon": if you have ever worked with Kevin directly, your Bacon Number is 1. If you have worked with someone that has worked with Kevin, then your Bacon Number is 2. Etc etc - and then this is applied to the idea of musicians working together on songs.

For example, if you go there and you look up some unknown artist (like then inimitable Belgian schlager singer, Helmut Lotti), like this:


Friday, 16 July 2021

Graphistania 2.0, Episode 15: The Summer Session with Emil

Well this makes me very happy: just before many of us are taking some summer vacations, and ON THE DAY OF MY 2ND VACCINATION SHOT, I am able to publish another Graphistania podcast episode - interviewing my friend and boss (how awesome is it to be able to say that!) Emil Eifrem. We talk about the world, the graph database market, Neo4j the company, and of course, the products. It was a ton of fun, and I even got Emil to agree to publishing the video recording too :) ... Hope you enjoy it as much as I did. Here goes:

And of course the video:

Here's the transcript of our conversation:

RVB 00:00:01.709 So have I got your consent to record, please?

EE 00:00:07.850 Fine. [laughter] I want to put it on the record that you have my consent to record this and release the audio.

Monday, 21 June 2021

Revisiting Covid-19 contact tracing with Neo4j 4.3's relationship indexes

Last week was a great week at the "office". One that I don't think I will easily forget. Not only did we host our Nodes 2021 conference, but we also launched our new website, published a MASSIVE trillion-relationship graph, and announced a crazy $325M series F investment round that will fuel our growth in years to come. 

In all that good news, the new release of Neo4j 4.3 kinda disappeared into the background - which is why I thought it would be fun to write a short blogpost about one of the key features that are part of this new release: relationship property indexes.

This is a really interesting feature for a number of different reasons. But let's draw your attention to two main points of attention:

  1. Relationship indexes will lead to performance improvements: all of a sudden the Neo4j Cypher query planner is going to be able to use a lot more information, provided by these relationship indexes. The planner is becoming smarter - and therefore queries will become faster. We will explore this below.
  2. Relationship indexes will actually have interesting modelling implications: the introduction of these indexes could have far-reaching implications with regards to how we model certain things. Here's what we mean with that

You can see that both alternative models could have good use, but that the second model is simpler and potentially more elegant. It will depend on the use case to decide between the two - but in the past we would most often use the first model for performance reasons - and we will see below that that will no longer be a main reason with the addition of relationship indexes. Let's investigate.

Thursday, 10 June 2021

Network Analysis of Shakespeare's plays

What do you do when a new colleague starts to talk to you about how they would love to experiment with getting a dataset about Romeo & Juliet into a graph? Yes, that's right, you get your graph boots on, and you start looking out for a great dataset that you could play around with. And as usual, one things leads to another (it's all connected, remember!), and you end up with this incredible experiment that twists, turns and meanders into something fascinating. That's what happened here too.  

William Shakespeare

Finding a Data source

That was so easy. I very quickly located a Dataset on Kaggle that I thought would be really interesting. It's a comma-separated file, about 110k lines long and 10MB in size, that holds all the lines that Shakespeare wrote for his plays. It's just an amazing dataset - not too complicated, but terribly interesting.

The structure of the file has the following File headers:


Of course you can find the dataset on Kaggle yourself, but I actually quickly imported it into a google sheet version that you can access as well. This gsheet is shared and made public on the internet, and can then be downloaded as a csv at any time from this URL. This URL is what we will use for importing this data into Neo4j.

Tuesday, 11 May 2021

Graphistania 2.0 - The one with all the GraphStuff

Yes! Here's another great Neo4j podcast episode for you. I hope you will enjoy it -  just as much as I enjoyed recording it with Stefan.

Note that I have put all the interesting links together at the very bottom of the post. They all come from the Twin4j newsletter - to which you should all subscribe, obviously!

Here's the transcript of our conversation:

RVB: 00:00:44.353 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo4j, and yes, it's that time again. We are recording another Graphistania Neo4j podcast. And on the other side of this Zoom call is my dear partner in crime, Stefan, Stefan Wendin. How are you, man?
SW: 00:01:05.215 Always good. Always good meeting up, doing this with you, Rik. It's one of the favourites of the month. And I don't know, what can be better, talking about graphs with your best friend Rik in a sunny southern part of Sweden? Amazing. So good to go.
RVB: 00:01:22.239 Good to go. Fantastic. Great to have you here. And actually, we need to specify one thing, right, before we move on to the real topic of our podcast recording.