About two months ago, my colleague Niels published an amazing blogpost. He showed us how to solve a problem that I really recognized: how to make sense of your age-old Spotify playlists that are getting seriously out of hand. I have this problem in the real world: I keep adding songs to my "favorites" playlist, or to some collaborative playlists that I have with my kids/friends - but I end up with these huge gathering pots of songs that... really don't make a lot of sense anymore, and are not of much use anymore.
He used the spotipy wrapper of the Spotify Web API, and of course our favourite database, Neo4j, and some of its graphy tools (Graph Data Science to the rescue) to make a really fancy new set of Spotify playlists that were much more usable. Take a look at Niels' script over here. So I wanted to have a play with Niels' work in my own environment - and do some more exploration in Neo4j. Here's what happened.
Preparing to get the data from Spotify
So the first thing I did was to create a playlist in Spotify that I could have some fun with. I actually created two - a big one and a small one. The small one (<100 songs) is the one that I used for testing, and the larger one (>2000 songs) for actually doing some work. The important thing to look for here is the Spotify Playlist URI, which you can find by right-clicking the three dots:
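A playlist URI has the form spotify:playlist:<id>, and a shared playlist link (open.spotify.com/playlist/<id>) carries the same ID. As a minimal sketch, assuming you want to accept either form, this is how the ID could be pulled out in Python. The helper name `playlist_id` is hypothetical and not part of Niels' script, and the ID in the usage line is a placeholder.

```python
import re
from urllib.parse import urlparse


def playlist_id(ref: str) -> str:
    """Return the playlist ID from a Spotify URI or an open.spotify.com link."""
    ref = ref.strip()
    # URI form: spotify:playlist:<id>
    match = re.fullmatch(r"spotify:playlist:([A-Za-z0-9]+)", ref)
    if match:
        return match.group(1)
    # Link form: https://open.spotify.com/playlist/<id>?si=...
    parsed = urlparse(ref)
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.netloc == "open.spotify.com" and len(parts) >= 2 and parts[-2] == "playlist":
        return parts[-1]
    raise ValueError(f"not a Spotify playlist reference: {ref!r}")


# Placeholder ID, not a real playlist.
print(playlist_id("spotify:playlist:0123456789abcdefABCDEF"))
```

Anything that is neither form raises a ValueError, so a typo in the URI shows up before the script starts calling the API.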
The next thing that I needed to do was to log in to the Spotify developer portal at https://developer.spotify.com/ and add another application. This basically creates a little sandbox environment for our future application to talk to the Spotify API. It's super easy to do: just go to https://developer.spotify.com/dashboard/applications and create a new application:
Once we have that, we need to edit a few settings, specifically the callback settings. The last step is then to add the credentials (your Spotify user ID, the client ID, and the client secret) to our python script. The heading of the script will look like this:
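As a rough sketch of such a header (the names below and the choice to read from environment variables are assumptions, not Niels' actual code): SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET and SPOTIPY_REDIRECT_URI are the variable names that spotipy itself looks for, while SPOTIFY_USER_ID and the `SpotifyCredentials` / `load_credentials` names are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SpotifyCredentials:
    user_id: str
    client_id: str
    client_secret: str
    redirect_uri: str


def load_credentials(env=os.environ) -> SpotifyCredentials:
    """Read the credentials from environment variables instead of hard-coding them."""
    try:
        return SpotifyCredentials(
            user_id=env["SPOTIFY_USER_ID"],  # hypothetical variable name
            client_id=env["SPOTIPY_CLIENT_ID"],
            client_secret=env["SPOTIPY_CLIENT_SECRET"],
            # Has to match the callback (redirect) URI set in the developer dashboard.
            redirect_uri=env["SPOTIPY_REDIRECT_URI"],
        )
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable: {missing}") from None
```

Keeping the client secret out of the script itself also means you can put the script on github without leaking it.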
Importing the data from Spotify using python
As you have probably guessed, I have customised the script from Niels. You can find it over here on github. Obviously you will need to install spotipy and the python neo4j driver for this to work, but once you have that done, you can just spin up a neo4j server in your Neo4j Desktop and connect to that.
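To give a feel for the Spotify side of such a script, here is a minimal sketch (not Niels' actual code) of reading a whole playlist. The Web API hands back playlist tracks in pages of at most 100, which is why the >2000 song playlist needs a loop. `fetch_all_tracks` is a hypothetical helper, and `sp` is assumed to be an authenticated spotipy.Spotify client; only its `playlist_items` and `next` methods are used.

```python
def fetch_all_tracks(sp, playlist_id: str) -> list:
    """Collect every track of a playlist, following the API's pagination."""
    tracks = []
    page = sp.playlist_items(playlist_id, limit=100)
    while page:
        for item in page["items"]:
            # Entries can come back without track data (e.g. unavailable tracks).
            if item.get("track"):
                tracks.append(item["track"])
        # sp.next() fetches the following page; stop when there is none.
        page = sp.next(page) if page.get("next") else None
    return tracks
```

Each returned track is the plain dictionary that the API sends back, so artist and album details can be read from it before writing anything to Neo4j.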
Side note: if you want to take a look at the different methods that get called in the script, you can use something like pycallgraph. I thought it was very useful to illustrate how the different parts of the script interact and refer to one another. Obviously, we would want to actually look at that in a proper Neo4j graph some day - but that's a different blogpost, methinks!
So when I run the python script now, I get this:
So now the data is nice and comfy in our Neo4j database, and we can take a stab at exploring this!
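A quick way to confirm from Python that the import really landed is to count the nodes through the neo4j driver. This is only a sketch: `connect` and `count_nodes` are hypothetical helper names, and the URI, user and password are placeholders for whatever your Neo4j Desktop database uses.

```python
def count_nodes(driver) -> int:
    """Return the total number of nodes visible through the given driver."""
    with driver.session() as session:
        record = session.run("MATCH (n) RETURN count(n) AS nodes").single()
        return record["nodes"]


def connect(uri="bolt://localhost:7687", user="neo4j", password="changeme"):
    """Open a driver to a local Neo4j database (placeholder credentials)."""
    # Imported here so that count_nodes can be tried without the driver installed.
    from neo4j import GraphDatabase

    return GraphDatabase.driver(uri, auth=(user, password))
```

With the database running, `count_nodes(connect(password=...))` should return a number greater than zero once the import script has finished.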
Exploring the Spotify playlist in Neo4j
Just to get acquainted with our new best friend, the Neo4j database, here's what the model looks like:
And so we can start running some queries. This one gives me a feel for the data quantities:
match (n)
return "Node" as Type,labels(n) as Name,count(n) as Count
union
match ()-[r]->()
return "Relationship" as Type,type(r) as Name, count(r) as Count
and gives me this result:
So let's take a look at some links between artists - typically something that is very difficult to do on other data platforms, and very straightforward in Neo4j. Let's look for the links between "BRUCE" and "TOM":
match (a1:Artist), (a2:Artist), path = allshortestpaths ((a1)-[*]-(a2))
where toUpper(a1.name) contains "BRUCE"
and toUpper(a2.name) contains "TOM"
return path
This gives me an interesting subgraph:
We can also start using some of the Graph Data Science scores that I had added in the script:
match (a:Artist)
return a.name as ArtistName, a.`pagerank-spotify` as SpotifyPagerank, a.`pagerank-workedwith` as WorkedWithPageRank, a.`pagerank-similarity` as PageRankSimilarity
order by a.`pagerank-spotify` desc
limit 10;
Again some interesting stats:
And it's very easy to explore the immediate surroundings of some of these artists:
match path = ((a:Artist)-[*..2]-(conn))
return path
order by a.`pagerank-spotify` desc
Which gives me this subgraph:
I have also added some stats about the number of tracks per artist:
match (a:Artist)<--(t:Track)
return a.name as Artist, count(t) as NumberOfTracks
order by NumberOfTracks desc
And about the number of tracks per artist and album:
match (ar:Artist)<--(t:Track)-->(al:Album)-->(a)
return ar.name, al.name, count(t)
order by count(t) desc
limit 10
I am sure you can come up with lots of other interesting queries. I have added the ones in this blogpost over here on github, as usual.
Now, while I was hacking away with Niels' initial script and making it my own, Niels was working on some other really cool stuff. I want to include that in this post, as I think it could be fantastically useful to many people. It's called "NeoDash", and provides you with a super simple and powerful way to interact with Neo4j databases in a "dashboard style" way. Let me show it to you.
Using NeoDash to create a Spotify Dashboard
The way NeoDash works is super simple. All you need to do is add these rectangular parts to a grid, and each of these rectangles will
- behave in a certain way based on the type that you give it. You can make
  - line charts,
  - bar charts, and
  - markdown text sections
- have a few parameters that you need to specify. More specifically, you will want to "instruct" each rectangle section to query your Neo4j instance based on some cypher query that you specify.
The result of this is a really nice-looking frontend dashboard that you can literally create in minutes without doing ANY coding. Take a look at this:
You can find NeoDash at https://nielsdejong.nl/neodash/. The example .json that I created is also on github - but I figure you would want to create your own very quickly.
So that's about it for now. All the scripts and config are on the github page over here. Hope you have fun with this like I did - and if you have any comments or questions, then please reach out!
All the best