Monday, 21 June 2021

Revisiting Covid-19 contact tracing with Neo4j 4.3's relationship indexes

Last week was a great week at the "office". One that I don't think I will easily forget. Not only did we host our Nodes 2021 conference, but we also launched our new website, published a MASSIVE trillion-relationship graph, and announced a crazy $325M series F investment round that will fuel our growth in years to come. 

In all that good news, the new release of Neo4j 4.3 kinda disappeared into the background - which is why I thought it would be fun to write a short blogpost about one of the key features that are part of this new release: relationship property indexes.

This is a really interesting feature for a number of different reasons. But let's draw your attention to two main points of attention:

  1. Relationship indexes will lead to performance improvements: all of a sudden the Neo4j Cypher query planner is going to be able to use a lot more information, provided by these relationship indexes. The planner is becoming smarter - and therefore queries will become faster. We will explore this below.
  2. Relationship indexes will actually have interesting modelling implications: the introduction of these indexes could have far-reaching implications with regards to how we model certain things. Here's what we mean with that

You can see that both alternative models could have good use, but that the second model is simpler and potentially more elegant. It will depend on the use case to decide between the two - but in the past we would most often use the first model for performance reasons - and we will see below that that will no longer be a main reason with the addition of relationship indexes. Let's investigate.

Thursday, 10 June 2021

Network Analysis of Shakespeare's plays

What do you do when a new colleague starts to talk to you about how they would love to experiment with getting a dataset about Romeo & Juliet into a graph? Yes, that's right, you get your graph boots on, and you start looking out for a great dataset that you could play around with. And as usual, one things leads to another (it's all connected, remember!), and you end up with this incredible experiment that twists, turns and meanders into something fascinating. That's what happened here too.  

William Shakespeare

Finding a Data source

That was so easy. I very quickly located a Dataset on Kaggle that I thought would be really interesting. It's a comma-separated file, about 110k lines long and 10MB in size, that holds all the lines that Shakespeare wrote for his plays. It's just an amazing dataset - not too complicated, but terribly interesting.

The structure of the file has the following File headers:

DatalinePlayPlayerLinenumberActSceneLinePlayerPlayerLine
abcdefghijklmnopqr

Of course you can find the dataset on Kaggle yourself, but I actually quickly imported it into a google sheet version that you can access as well. This gsheet is shared and made public on the internet, and can then be downloaded as a csv at any time from this URL. This URL is what we will use for importing this data into Neo4j.