Hours after publishing my previous blogpost about the WorldCup Graph, I found a better and more up-to-date dataset containing the actual squads that are going to play in the World Cup in Qatar. I found it on this Wikipedia page, which lists the squads in tables, along with some player details, coaches, etc., as they were announced on the 10th/11th of November.
So I figured it would be nice to revisit the WorldCup Graph and show a simpler and faster way to achieve the results of the previous exercise. I put this data in this spreadsheet, and then downloaded a .csv version:
- one .csv file containing the squads
- one .csv file containing the tournament schedule (just like in the previous example)
These two files are nice and simple, so we can use the Neo4j Data Importer toolset to import them really easily.
Using the importer to import the dataset
As you can see from the screenshot below, I built a very similar, but slightly different, data model with the Data Importer tool. If you go to the link above and download the .zip file that contains the dataset as well as the model definition, you can load the data in a heartbeat:
The results appear within seconds:
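If you want to double-check what the importer actually created, a quick count per label and per relationship type is a handy sanity check. This is just a sketch that makes no assumptions about the model: it works on whatever labels and relationship types your import produced.

```cypher
// Count nodes per label combination to see what the importer created
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodeCount
ORDER BY nodeCount DESC;

// Count relationships per type
MATCH ()-[r]->()
RETURN type(r) AS relationshipType, count(*) AS relCount
ORDER BY relCount DESC;
```

Run the two statements one after the other in the Neo4j Browser. The labels you see should line up with the model in the importer.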
Once we have that, we can start doing some of the queries that we had shared in the previous blogpost.
Let's get the query juices going again
Let's build this up nice and slowly.
Simple query: Let's find "Belgium"
All we do is look up the NationalTeam node:
MATCH (t:NationalTeam {name: "Belgium"})
RETURN t;
Next: Things connected to Belgium
Then we look at a path that explores direct and indirect connections to the "Belgium" node:
Direct connections we find like this:
MATCH path = (t:NationalTeam {name: "Belgium"})--(conn) RETURN path;
Indirect connections (up to two hops away) we find like this:
MATCH path = (t:NationalTeam {name: "Belgium"})-[*..2]-(conn) RETURN path;
The result looks like this subgraph:
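If the picture gets too busy, a tabular summary of the same neighbourhood can help. The sketch below reuses the direct-connection pattern from above and only groups the connected nodes by their labels, so it does not assume anything about the relationship types in the model.

```cypher
// Summarise what is directly connected to the Belgium node, grouped by label
MATCH (t:NationalTeam {name: "Belgium"})--(conn)
RETURN labels(conn) AS connectedLabels, count(*) AS connections
ORDER BY connections DESC;
```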
Next, let's look at some Player-centric queries.
Paths between players
First, let's find the shortest paths between Lionel Messi and Kevin De Bruyne:
MATCH path = allShortestPaths((p1:Player {name: "Lionel Messi"})-[*..5]-(p2:Player {name: "Kevin De Bruyne"}))
RETURN path
LIMIT 100;
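If you only care about how far apart the two players are, rather than seeing every shortest path, a single path and its length will do. This is a sketch that assumes both players can be looked up by a `name` property; if your import model uses a different property for players, adjust the lookups accordingly.

```cypher
// One shortest path between the two players, summarised as hop count and stops.
// Assumption: Player nodes are identified by a `name` property.
MATCH path = shortestPath(
  (p1:Player {name: "Lionel Messi"})-[*..5]-(p2:Player {name: "Kevin De Bruyne"})
)
RETURN length(path) AS hops,
       // fall back to the node's first label when it has no `name` property
       [n IN nodes(path) | coalesce(n.name, labels(n)[0])] AS stops;
```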
Find matches with players from same club on two sides
The basic idea behind this query is simple: are there any World Cup matches in which players on the two opposing national teams are actually teammates at the same club during the rest of the season? That could be interesting in many different ways, right?
So here's a simple query specifically for Belgium, essentially looking for a cyclic pattern (starting and ending with the Club node):
MATCH path = (c)--(p1:Player)--(t1:NationalTeam {name: "Belgium"})--(m:Match)--(t2:NationalTeam)--(p2:Player)--(c:Club)
RETURN path;
We can also make this more generic, but then we have to limit the results if we want to stay within the realms of what the Neo4j Browser can display:
MATCH path = (c)--(p1:Player)--(t1:NationalTeam)--(m:Match)--(t2:NationalTeam)--(p2:Player)--(c:Club)
RETURN path
LIMIT 100;
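Another way to keep the generic version manageable is to return a table instead of paths. The sketch below uses the same cyclic pattern and aggregates per club and pair of teams. It assumes that Club nodes carry a `name` property (the queries above only confirm `name` on NationalTeam and Player), and the `t1.name < t2.name` filter is only there to drop the mirrored duplicate of every team pair.

```cypher
// Clubs with players on both sides of a match, as a table.
// Assumption: Club nodes have a `name` property.
MATCH (c:Club)--(p1:Player)--(t1:NationalTeam)--(m:Match)--(t2:NationalTeam)--(p2:Player)--(c)
WHERE t1.name < t2.name   // keep each pair of teams once
RETURN c.name AS club,
       t1.name AS team1,
       t2.name AS team2,
       count(DISTINCT m) AS matches,
       collect(DISTINCT p1.name) AS team1Players,
       collect(DISTINCT p2.name) AS team2Players
ORDER BY size(team1Players) + size(team2Players) DESC
LIMIT 25;
```

The clubs with the most players facing each other end up at the top of the list.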
Last but not least: load a database from a backup via http
Again, just a matter of hours after the previous post, I got a tweet from Christophe, with another amazing suggestion.
Would be nice to publish a backup on Github as well so we can use CREATE DATABASE FROM https://.... in Neo4j 5 ;-)
— Christophe Willemsen (@ikwattro) November 15, 2022
I had not even heard of this, but it's SO AMAZING, that I really do have to mention it.
Instead of going through the above import procedure, why not just restore a backup to your server, over http???
Here's how you do that:
- in your neo4j.conf, you have to enable the loading of a database backup from a URI. Use this entry to allow that:
dbms.databases.seed_from_uri_providers=URLConnectionSeedProvider
- then, all you do is run this statement:
CREATE DATABASE worldcup22 OPTIONS { existingData: "use", seedUri: "https://downloads.graphaware.com/neo4j-db-seeds/world-cup-2022-neo4j.backup" }
And then you wait a few seconds for the database to come online, and you are done!
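To check whether the seeded database has actually come online, you can ask the server for its status. This is a small sketch that uses the database name from the statement above. The Neo4j Browser and cypher-shell normally route this kind of administration command to the system database for you.

```cypher
// Check the status of the freshly seeded database
SHOW DATABASE worldcup22 YIELD name, currentStatus, statusMessage;
```

Once `currentStatus` reports `online`, you can switch to the database (`:use worldcup22` in the Browser) and run the queries from earlier in this post.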
That's all I have for now. I hope this was an interesting update for you, and that you can easily experiment with this yourself as well.
Have fun!
Cheers
Rik
- My Blog: https://blog.bruggen.com/
- My Tweets: https://twitter.com/rvanbruggen
- My LinkedIn: https://www.linkedin.com/in/rikvanbruggen/
- My Gists: https://gist.github.com/rvanbruggen