Tuesday, 28 April 2020

Contact tracing guide for the Neo4j Browser

Based on the past two blogposts on (Covid-19) contact tracing (see here for the posts, here for the movies), I thought it would be a good idea to pick up an old skill - to create a Browser Guide for Neo4j for people to look at this dataset example more easily. I did this a long time ago for my beergraph as well, so why not do it for the contacttracinggraph :) ...

About the Neo4j Browser and Browser guides

Here's what this is: with Neo4j, the native graph database, we always ship a default user interface called the "Neo4j Browser". It's an interactive application that communicates with the database, and that essentially allows you to fire off Cypher queries and look at / manipulate the contents of your database. Read up about it over here. Once you have done that you will realise that the Browser is actually more than that: it's also a great way for people to learn more about Neo4j, and has a built-in mechanism to share "guides" to various topics. If you experiment a bit with the following commands:

A guided tour of Neo4j Browser
:play intro
Graph database basics
:play concepts
Neo4j’s graph query language introduction
:play cypher
The Movie Graph
A mini graph model of connections between actors and movies
:play movie graph
The Northwind Database
A classic use case of RDBMS to graph with import instructions and queries
:play northwind graph
you will get to see a number of topics that allow you to familiarise yourself with it really easily. Most of these guides are either built in or available for serving from a webserver. But: you can also develop these guides yourself. There's a really nice worked example over here, but the process really is dead simple:

Friday, 24 April 2020

(Covid-19) Contact tracing follow-up - demo movies

In my previous post I outlined the 4 different blogposts that I wrote about using the Neo4j Graph Database for Contact Tracing. Each of these posts is actually interesting in its own right, and actually makes for a really nice demo of the capabilities in Neo4j. So I created those demos today - and put them on a YouTube playlist for you:

Tuesday, 21 April 2020

(Covid-19) Contact tracing - an amazing graph problem & rabbit hole

In the past couple of days, I have been working with several of my colleagues on a number of projects, all around the world, that are preparing our societies for a post-lockdown strategy that will allow us to keep the Covid-19 pandemic under control, and still regain some of our freedoms. This will be tricky, for sure, but as in so many problems, technology can probably assist.

That's why I started experimenting with how a graph database like Neo4j could help with this. Some of the tracing problems that we will face are uniquely well suited for a graph database approach: it allows us to see and understand the indirect contacts that healthy and sick people may have had with one another, and the effects that this could cause in our environments. It also allows for some unique predictive analytics: the structure of our contacts, the network/graph that it constructs, actually says a lot about the importance that parts of the network may play in the evolution of the pandemic. Graph Data Science can give us pointers as to where this should direct our policies.

This has ended up being quite an extensive piece of work. In order to keep it readable, I have cut it up into 4 blogposts, which I will put up all at the same time:
There's so much potential in this dataset, and in this problem domain in general. I feel like I have gone into the rabbit hole and have just resurfaced for some air. But who knows, maybe I will dive back in and do some more digging - after all, this is interesting stuff, and I love working on interesting topics.

Hope this is as interesting for you as it was for me.

All the best


Note that these demos will require the following environment: 
  • Neo4j Desktop 1.2.7, Neo4j Enterprise 3.5.17, apoc, gds 1.1.0, or
  • Neo4j Desktop 1.2.7, Neo4j Enterprise 4.0.3, apoc (NOT later! a bug in apoc.coll.max/apoc.coll.min needs to be resolved)

(Covid-19) Contact Tracing Blogpost - part 4/4

Part 4/4: Some loose ends for the Contact Tracing graph

In this last part of this blogpost series, I wanted to quickly articulate some interesting points that I found useful during these experiments.

Using the geospatial data for some additional insights

You may remember that back in part 1, I imported some geospatial properties into our graph - assigning coordinates to all of the Place nodes that we have in the graph. Clearly this also opens up further possibilities for additional analysis, which I have not explored yet in the previous posts. Suffice it to say that this data is super easy to work with in Neo4j. Just run a query like this:
match (pl:Place) return pl.id, pl.name, pl.type, pl.location limit 10;
And you can see that the pl.location property has a real geospatial data type that I can use:
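One thing such a point value can be fed into is the built-in distance function. The following is only a minimal sketch, assuming that pl.location holds WGS-84 latitude/longitude points (so the result is in metres) and that you are on one of the Neo4j 3.5 / 4.0 versions listed above, where the function is called distance(); it is not a query taken from the earlier posts.

```cypher
// Sketch: the ten closest pairs of places, by straight-line distance.
// Assumes pl.location is a WGS-84 point, so distance() returns metres.
match (pl1:Place), (pl2:Place)
where id(pl1) < id(pl2)   // consider each pair only once
return pl1.name as Place1, pl2.name as Place2,
       round(distance(pl1.location, pl2.location)) as Metres
order by Metres asc
limit 10;
```

This compares every place with every other place, which is fine for a small synthetic dataset but would need a smarter filter on a large one.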

(Covid-19) Contact Tracing Blogpost - part 3/4

Part 3/4: Graph Analytics on the contact tracing graph

Note that these queries require the following environment: Neo4j Desktop 1.2.7, Neo4j Enterprise 3.5.17, apoc and GDS 1.1. At the time of writing, Neo4j 4.0.3 is not yet supported by GDS 1.1.

One of the fantastic qualities of the graph data model, I have always found, is that it can give you interesting insights - without even looking at the data. The structure of the network can give you some really interesting new revelations, that you would not even have considered before. That is why Neo4j has invested a ton of effort in providing our industry with a completely new set of capabilities that allow us to discover these structural insights more easily - in the form of a new Graph Data Science Library. We have recently released the product, and you should read up on it in detail, and I think it would be a great and interesting idea to explore it on this Contact Tracing dataset that we have built in part 1 and queried in part 2.

Some data prep for analytics: inferring a new relationship

In order to do that, there's actually something that's missing: a new relationship between two Persons, which captures the inferred fact that two people have MET. We can do that based on the overlap time of their visits to the same place - therefore leveraging a query from part 2. This is what we are going to do: create a MEETS relationship between 2 Person nodes, based on the overlap - and we do that like this:

match (p1:Person)-[v1:VISITS]->(pl:Place)<-[v2:VISITS]-(p2:Person)
where id(p1)<id(p2)
with p1, p2, apoc.coll.max([v1.starttime.epochMillis, v2.starttime.epochMillis]) as maxStart,
apoc.coll.min([v1.endtime.epochMillis, v2.endtime.epochMillis]) as minEnd
where maxStart <= minEnd
with p1, p2, sum(minEnd-maxStart) as meetTime
create (p1)-[:MEETS {meettime: duration({seconds: meetTime/1000})}]->(p2);

As you can see, we are storing the length of the inferred meeting as a duration property on the relationship. The result appears very quickly:
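With the MEETS relationships in place, one possible next step is to let the Graph Data Science library score the Person nodes. What follows is a hedged sketch rather than necessarily the exact query used in this series: it assumes GDS 1.1 (as listed in the environment note), uses the anonymous-projection style of configuration that the GDS 1.x releases accept, and treats MEETS as undirected because the direction created above only reflects the order of the node ids.

```cypher
// Sketch: PageRank over the inferred MEETS network (GDS 1.x syntax).
call gds.pageRank.stream({
  nodeProjection: 'Person',
  relationshipProjection: {
    MEETS: {type: 'MEETS', orientation: 'UNDIRECTED'}
  }
})
yield nodeId, score
return gds.util.asNode(nodeId).name as Person, score
order by score desc
limit 10;
```

Streaming the scores like this leaves the stored graph untouched, which makes it a low-risk way to experiment before writing anything back.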

(Covid-19) Contact Tracing Blogpost - part 2/4

Part 2/4: Querying the contact tracing graph

Note that these queries require the following environment: Neo4j Desktop 1.2.7, Neo4j Enterprise 3.5.17, apoc or Neo4j Enterprise 4.0.3, apoc (NOT later! a bug in apoc.coll.max/apoc.coll.min needs to be resolved)

In Part 1 we created and imported a contact tracing graph. Now, we are ready to experiment with some interesting graphy queries.

The most interesting part about many of these queries, I find, is that they all rely on the fundamental principle of "hypothesis-free querying". What I mean by this is that graph queries, in my experience and opinion, have this wonderful quality about them that you can actually interact with the data in a way that does not require you to hypothesize too much about the structure of the dataset. This is important, because very often I just won't know what I don't know, and making meaningful hypotheses is actually really hard and complicated. The fact that we don't have to do that is a great win.

As always, you will find all the queries on GitHub, so that you can have a play with them yourself as well. So let's dive right into it.

Who has a sick person potentially infected

To answer that, I will "grab" a sick person from the dataset, and then just walk the dataset from the person to the other persons that are currently healthy. The query goes like this:

match (p:Person {healthstatus:"Sick"})
with p
limit 1
match (p)--(v1:Visit)--(pl:Place)--(v2:Visit)--(p2:Person {healthstatus:"Healthy"})
return p.name as Spreader, v1.starttime as SpreaderStarttime, v1.endtime as SpreaderEndtime, pl.name as PlaceVisited, p2.name as Target, v2.starttime as TargetStarttime, v2.endtime as TargetEndtime;
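Note that the query above returns every healthy person who visited the same place, whether or not they were there at the same time. A possible refinement, sketched here under the assumption that starttime and endtime on the Visit nodes are datetime values, keeps only the visits that actually overlap. It uses plain case expressions instead of apoc.coll.max/apoc.coll.min, so it does not depend on the APOC version.

```cypher
// Sketch: same traversal as above, restricted to visits that overlap in time.
// Assumes starttime/endtime on the Visit nodes are datetime values.
match (p:Person {healthstatus:"Sick"})
with p
limit 1
match (p)--(v1:Visit)--(pl:Place)--(v2:Visit)--(p2:Person {healthstatus:"Healthy"})
with p, pl, p2,
     // the overlap starts at the later start and ends at the earlier end
     case when v1.starttime > v2.starttime then v1.starttime else v2.starttime end as overlapStart,
     case when v1.endtime < v2.endtime then v1.endtime else v2.endtime end as overlapEnd
where overlapStart <= overlapEnd
return p.name as Spreader, pl.name as PlaceVisited, p2.name as Target,
       overlapStart as OverlapStart, overlapEnd as OverlapEnd,
       duration.between(overlapStart, overlapEnd) as Overlap;
```

The later of the two start times and the earlier of the two end times bound the shared interval; if the first falls after the second, the two people were never in the place together.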

(Covid-19) Contact Tracing Blogpost - part 1/4

Part 1/4: creating and importing a synthetic contact tracing graph

As we are living in these very interesting times, and many countries are still going through a massive operation to slow down the devastating effects of the SARS-CoV-2 virus and its CoViD-19 effects, there is of course also a lot of discussion already going on about what we will do after the initial surge of the virus has passed, and when the various countries and regions will start opening up their economies.

A tactic many countries seem to be taking is the implementation of some kind of Contact Tracing. Using the technology on our phones and our pervasive internet connectivity, we could imagine a way to implement "distancing" and isolation of people that are either already victims of, or vulnerable to, CoViD-19. This seems like a logical and useful tactic that could help us to open up our economies for business, while still maintaining the basic attitude of wanting to "flatten the curve". Of course there are still many, many issues with this approach, not least with regard to patient privacy and political freedoms, but it seems like an interesting track to explore, at least. Many government organisations have therefore started to explore this, and are working with some of the industry giants like Google and Apple to make this a reality.

This evolution started a whole range of discussions inside Neo4j, especially with regards to the usefulness of a graph database to make sense of some of these contact traceability databases. I remember reading Christakis and Fowler's Connected book, and understanding that virus outbreaks are one of those cases where our direct contacts don't necessarily matter - or at least not matter alone. Indirect contacts, between our friends' friends' friends, can be just as important. So lots of interesting, graph-oriented questions arise: How could we maximise the effect of our distancing measures, and of any contact tracing applications that we put in place? How could we use the excellent and predictive power of the graph to find out which of a person's connections could be most risky? How can we use graph analytics to better understand the structural power and weakness of our social networks? And many more.

So, being locked down myself (although Belgium clearly has a much softer stance than for example France or Italy), I thought I would spend some time exploring this. That's what this blogpost series is going to be about - so let's get right to it.