Tuesday 15 November 2022

A 2nd, better way to WorldCupGraph

Hours after publishing my previous blogpost about the WorldCup Graph, I actually found a better, and more up to date dataset that contained all the data of the actual squads that are going to play in the actual World Cup in Qatar. I found it on this wikipedia page, which lists all the tables with the actual squads, some player details, coaches etc. as they were announced on 10th/11th of November.

So: I figured it would be nice to revisit the WorldcupGraph, and show a simpler and faster way to achieve the results of the previous exercise. So: I have actually put this data in this spreadsheet, and then downloaded a .csv version:

These two files are super nice and simple, and therefore we can actually use the Neo4j Data Importer toolset to import these really easily.

Using the importer to import the dataset

As you can see from the screenshot below, I built a very similar, but slightly different, datamodel with the data importer tool. If you go to the link above, and then download the .zip file that contains the dataset as well as the model definition, you can load the data in a heartbeat:

The results appear within seconds:

Once we have that, we can start doing some of the queries that we had shared in the previous blogpost.

Let's get the query juices going again

Let's build this up nice and slowly.

Simple query: Let's find "Belgium"

All we do is lookup the NationalTeam node:

MATCH (t:NationalTeam {name: "Belgium"}) 
RETURN t;

Next: Things connected to Belgium

Then we look at a path that explores direct and indirect connections to the "Belgium" node:

  • Direct connections we find like this:

      MATCH path = (t:NationalTeam {name: "Belgium"})--(conn) 
      RETURN path;
    

  • indirect connections we find like this:

      MATCH path = (t:NationalTeam {name: "Belgium"})-[*..2]-(conn) 
      RETURN path;
    

    The result looks like this subgraph:

Next, let's look at some Player-centric queries.

Paths between players

First, let's find the shortest paths between Lionel Messi and Kevin De Bruyne:

MATCH path = ALLSHORTESTPATHS ((p1:Player {name: "Lionel Messi"})-[*..5]-(p2:Player {short_name: "Kevin De Bruyne"}))
RETURN path
LIMIT 100;

Find matches with players from same club on two sides

The basic idea behind this query is simple: would there be any WorldCup matches, where there would be players playing for National teams, that are actually on the same team during the rest of the season. That could be interesting in many different ways, right?

So here's a simple query specifically for Belgium, essentially looking for a cyclic pattern (starting and ending with the Club node):

MATCH path = (c)--(p1:Player)--(t1:NationalTeam {name: "Belgium"})--(m:Match)--(t2:NationalTeam)--(p2:Player)--(c:Club)
RETURN path;

We can also make this more generic, but then we have to limit the results if we want to keep the results within the realms of what the Neo4j Browser can display:

MATCH path = (c)--(p1:Player)--(t1:NationalTeam)--(m:Match)--(t2:NationalTeam)--(p2:Player)--(c:Club)
RETURN path

Last but not least: load a database from a backup via http

Again, just a matter of hours after the previous post, I got a tweet from Christophe, with another amazing suggestion.

I had not even heard of this, but it's SO AMAZING, that I really do have to mention it.

Instead of going through the above import procedure, why not just restore a backup to your server, over http???

Here's how you do that:

  • in your neo4j.conf, you have to enable the loading of a database backup from a URI: use this entry to allow that dbms.databases.seed_from_uri_providers=URLConnectionSeedProvider

  • then, all you do is run this statement:

      CREATE DATABASE worldcup22 
      OPTIONS { 
      existingData: "use", 
      seedUri: "https://downloads.graphaware.com/neo4j-db-seeds/world-cup-2022-neo4j.backup"
      }
    

And then you wait a few seconds for the database to come online, and you are done!

That's all I have for now. I hope this was an interesting update for you, and that you can easily experiment with this yourself as well.

Have fun!

Cheers

Rik

No comments:

Post a Comment