Wednesday, 23 September 2020

Exponential growth in Neo4j

With the current surges of the Covid-19 pandemic globally, there is a huge amount of debate raging in our societies - everywhere. It’s almost as if the duality between left and right that has been dividing many political spectra in the past few years is now also translating itself into a duality that is all about more freedom for the individual (and potentially a higher spread of the SARS-CoV-2 virus) versus more restrictions for the individual. It’s such a difficult debate - with no clear definitive outcome that I know of. There are just too many uncertainties and variations in the pandemic - I personally don’t see how you can make generic statements about it very easily.

One thing I do know, though, is that very smart and loveable people, in my own social and professional circle and beyond, seem to be confused by some of the data. Very often, they make seemingly rational arguments about the numbers that they are seeing - but ignore the fact that we are looking at an Exponential Growth problem. In this post, I want to talk about that a little bit, and illustrate it with an example from the Neo4j world.

What is Exponential Growth exactly?

Let’s take a look at the definition from good old Wikipedia:
Exponential growth is a specific way that a quantity may increase over time. It occurs when the instantaneous rate of change (that is, the derivative) of a quantity with respect to time is proportional to the quantity itself. Described as a function, a quantity undergoing exponential growth is an exponential function of time, that is, the variable representing time is the exponent (in contrast to other types of growth, such as quadratic growth).
The basic functions that are being entertained here are very simple in terms of the maths:
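As a rough illustration of the Wikipedia definition quoted above (this is not code from the original post), here is a minimal Python sketch. It assumes a constant growth rate, and the names `exponential_growth` and `doubling_time` are made up for this example. If the rate of change is proportional to the quantity itself, the quantity follows x(t) = x0 * e^(rate * t), and it doubles every ln(2) / rate units of time.

```python
import math


def exponential_growth(x0: float, rate: float, t: float) -> float:
    """Value at time t of a quantity that starts at x0 and whose
    rate of change is proportional to itself: dx/dt = rate * x."""
    return x0 * math.exp(rate * t)


def doubling_time(rate: float) -> float:
    """Time needed for the quantity to double at a constant positive rate."""
    return math.log(2) / rate


if __name__ == "__main__":
    # Hypothetical numbers, purely for illustration: 100 cases, 10% growth per day.
    for day in (0, 7, 14, 21, 28):
        print(day, round(exponential_growth(100, 0.10, day)))
    print("doubling time in days:", doubling_time(0.10))
```

The point of the sketch is that the doubling time depends only on the rate, not on the current size of the quantity, which is exactly what makes the raw numbers so easy to misread.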

Friday, 18 September 2020

OpenTrials in Neo4j - with a simple ETL job

I have been meaning to write about this for such a long time. Ever since the lockdown happened, I have been wanting to take a look at a particular biomedical dataset that looks extremely interesting to me: the OpenTrials dataset. If you are not familiar with this yet, this is what they say:

OpenTrials is a collaboration between Open Knowledge International and Dr Ben Goldacre from the University of Oxford DataLab. It aims to locate, match, and share all publicly accessible data and documents, on all trials conducted, on all medicines and other treatments, globally. 

It's a super interesting initiative, and it really flows from the idea that in much of the very intensive, expensive biomedical research, we should be looking at how to better use and re-use the knowledge that we are building up. Kind of like what people in the initiative (remember the interview I did with Daniel - so great!) and others are doing.

Downloading and restoring the dataset

It's a bit hidden, but you can actually download a (slightly older, but still) dump of the OpenTrials dataset from their website. The dump is actually a Postgres dump file: I got the latest one from

Monday, 7 September 2020

Graphistania 2.0 - Episode 8 - The one after the Covid-summer

Not sure if we should be happy or sad - but hey - the Covid-19 summer of 2020 is almost behind us. Like most people, I found it quite a strange and unusual summer, with very few foreign adventures (although I did manage to squeeze in a cycling/camping trip to the French Alps in July), lots of cycling, some great family time... and of course lots of time with graphs :) ...

So that means that we are also kicking the Graphistania podcast back into gear - here's the next episode for you:

Here's the transcript of our conversation:

RVB: 00:00:15.863 [music] Hey, Stefan, I do need to ask you for consent, I think, right?

SW: 00:00:19.847 Hi. Yeah, I consent. [laughter] This is always the weird moment.

RVB: 00:00:24.727 Exactly. I thought, "Start with that one again."

SW: 00:00:28.436 Exactly, just to create a little bit of tension in the air.

Wednesday, 8 July 2020

Graphistania 2.0 - Episode 7 - The one after the Covid-19 lockdown

Yes! We were able to record and publish another episode of our Graphistania podcast. It's been an amazing and turbulent couple of months - but before the summer holiday season really takes off we wanted to get this to you.

Wishing you a fantastic and relaxing time - and in the meantime enjoy this episode!

Here's the transcript of our conversation:

Monday, 29 June 2020

Executives of Belgian Public Companies - revisited!

Tuesday, 16 June 2020

What VAT Fraud Detection and Contact Tracing have in common

In the previous blogpost we already illustrated in some detail that the contact tracing graph that we built has a lot of similarities with a product recommendation system graph. We focused on the Person-Visit-Place triangle that we had built in our Contact Tracing Graph data model, and converted the red and yellow bits into a Person-Purchase-Product triangle.
There is of course another part to the contact tracing graph that is also very interesting: the Person-Meets-Person subgraph. We derived that graph from the original contact tracing graph, by assuming that if a Person had visited a Place at the same time as another person, they would have been likely to have had a meeting there. This Person-Meets-Person subgraph was the basis for most of our graph analytics.
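To make that derivation step a bit more concrete, here is a minimal sketch in plain Python. It is not the query used in the original posts. The `Visit` record, its `start`/`end` fields and the `derive_meetings` function are all hypothetical names, and "at the same time" is assumed to mean that two visits to the same Place have overlapping time intervals.

```python
from itertools import combinations
from typing import Iterable, NamedTuple, Set, Tuple


class Visit(NamedTuple):
    """One Person-Visit-Place record (hypothetical shape)."""
    person: str
    place: str
    start: int  # e.g. minutes since some reference time
    end: int


def derive_meetings(visits: Iterable[Visit]) -> Set[Tuple[str, str, str]]:
    """Infer (person_a, person_b, place) meetings from overlapping visits."""
    # Group visits per place, so we only compare visits to the same place.
    by_place = {}
    for visit in visits:
        by_place.setdefault(visit.place, []).append(visit)

    meetings = set()
    for place, place_visits in by_place.items():
        for a, b in combinations(place_visits, 2):
            # Two intervals overlap when each starts before the other ends.
            overlaps = a.start < b.end and b.start < a.end
            if a.person != b.person and overlaps:
                first, second = sorted((a.person, b.person))
                meetings.add((first, second, place))
    return meetings
```

Each resulting tuple corresponds to one inferred MEETS relationship between two people. Sorting the pair of names means a meeting is stored once, regardless of the order in which the visits are compared.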

Friday, 12 June 2020

What Recommender Systems and Contact Tracing have in common

With the Covid-19 pandemic raging in the past few months, I have had a lot of interesting conversations about the use of graph technology and how it could help the world be a better, safer, healthier place. At Neo4j, we even put in place a specific Graphs4Good program, helping out where we can. There's splendid research going on at, companies like Elsevier chipping in (and using Neo4j) as well, and I have tried to write up my humble thoughts on how Contact Tracing could really benefit from using graphs as well. See some of my recent posts published on this blog.

Looking at that work, however, I always had the feeling that I was looking at an excellent example of something else: an excellent example of a great "graph problem". The contact tracing example is a great fit for a tool like Neo4j, and the reason why that is the case is basically because the problem that we are trying to solve with contact tracing (understanding the pandemic spread in our societies, predicting potential evolutions of the pandemic based on contacts between healthy and sick individuals, protecting the healthcare systems by managing the rate of spreading this way) is very much suited for analysis with graph technology. It is a domain where the links between people, the links between people and places, their visits, their meetings are the main data entities that we need to look at. It's the connections that matter. It's the connections that are becoming the "equal citizens" in the dataset - and therefore we need to spend time and resources analysing them.

But of course I know one thing for sure: there are plenty of other cases that are like that, that are true "graph problems" and that could really benefit from a graph approach to solving them. We know that from all the Neo4j projects that we have been running for years. So how do I demonstrate that? How do I show that Contact Tracing is essentially the same thing as a recommendation engine, or another graph application that we have come to know and love? Let's explore that.