Monday, 8 March 2021

Contact tracing in Neo4j: using triggers to automate actions

Last year I was able to spend some time thinking and writing about Contact Tracing, especially since it should probably be a very important part of any kind of system that allows us to deal with pandemics like the Covid-19 pandemic. You kan find some of these articles on this blog if you are interested. This is still as relevant as ever, but seems to have been a more difficult thing to implement practically in a free and privacy-sensitive society like ours (see this research and this article for some thoughts on this). Nevertheless it seems relevant still, as the Canton of Geneva has demonstrated.

In this article I wanted to share a very short little piece of work that I did together with my colleagues on the part that ACTIONS contact tracing activities. What do we ACTUALLY DO when we find a person that is at risk? What alarms should go off? Who should get notified? Where should we be looking for our next potential patient?

This, I think, is something that we could solve with a "database trigger". This is a systematic way of always reacting to events in the database in a consistent and predictable way. It has been demonstrated a number of times before - I really liked David's way of explaining it in his article on data loading/streaming with Neo4j's triggers. David describes it well:
What’s a Trigger?
If you haven’t worked with triggers before, a trigger is a database method of running some action whenever an event happens. You can use them to make the database react to events, rather than passively accept data, which makes them a good fit for streaming data, which is a set of events coming in.
Triggers need two pieces:
  • A trigger condition (which event should the trigger fire on?)
  • A trigger action (what to do when the trigger fires?)
Triggers are available in Neo4j through Awesome Procedures on Cypher (APOC), and you can find the documentation on them here
So what I want to take you through here is just to take the contact tracing dataset, and show you how easy even a salesperson-lost-in-cypherspace like myself could make it work. So here goes.

Installing and preparing

So the first thing we need to do is to create new database in Neo4j Desktop. Once installed, we should install the APOC plugin, which should be directly available in Neo4j Desktop's "plugins" section. The trigger functionality that we will be using us actually part of the APOC library, and you can find documentation on them over here.

In order to generate the dataset, I will use the Faker plugin again. See the previous blogpost for more on that, but installing that is actually really easy. First we need to download the latest release from github, unzip that file into plugins directory of your freshly baked server, and then manually do a small piece of restructuring in the file structure:

  • put  neo4jFaker-0.9.1.jar in plugins directory
  • and put ddgres directory in plugins directory. 
  • delete all other directories
Once that done, we need to edit the neo4j.conf file so that the faker library will actually be allowed to run inside your Neo4j server. You do that by adding* 

to neo4j.conf. Easy. The file structure should look like this:

And the neo4j.conf should have a line like this.

Once that's done the last change is to allow the apoc triggers to run by adding 


to neo4j.conf as well.

Generating the dataset

First thing after starting the database is to fire up the Neo4j Browser, and check if faker library functions are active. That's easy:

call dbms.functions() yield name
with name
where name starts with "fkr"
return *

If it returns with a list of functions, then we are good to go.

Next we just need to run two queries to create the dataset: 5000 persons with 15000 MEETS relationships.

foreach (i in range(1,5000) |
    create (p:Person { id : i })
    set p += fkr.person('1940-01-01','2020-05-15')
    set p.healthstatus = fkr.stringElement("Sick,Healthy")
    set p.confirmedtime = datetime()-duration("P"+toInteger(round(rand()*100))+"DT"+toInteger(round(rand()*10))+"H")
    set p.birthDate = datetime(p.birthDate)
    set p.addresslocation = point({x: toFloat(51.210197+rand()/100), y: toFloat(4.402771+rand()/100)})
    set = p.fullName
    remove p.fullName

and then

match (p:Person)
with collect(p) as persons
call fkr.createRelations(persons, "MEETS" , persons, "1-n") yield relationships as meetsRelations1
call fkr.createRelations(persons, "MEETS" , persons, "1-n") yield relationships as meetsRelations2
call fkr.createRelations(persons, "MEETS" , persons, "1-n") yield relationships as meetsRelations3
with meetsRelations1+meetsRelations2+meetsRelations3 as meetsRelations
unwind meetsRelations as meetsRelation
set meetsRelation.starttime = datetime()-duration("P"+toInteger(round(rand()*100))+"DT"+toInteger(round(rand()*10))+"H")
set meetsRelation.endtime = meetsRelation.starttime + duration("PT"+toInteger(round(rand()*10))+"H"+toInteger(round(rand()*60))+"M")
set meetsRelation.meettime = duration.between(meetsRelation.starttime,meetsRelation.endtime)
set meetsRelation.meettimeinseconds=meetsRelation.meettime.seconds;

That finishes in seconds.

And then we end up with a very simple graph data model:

Now we can start asking ourselves how we can take these automatically triggered actions based on database triggers. So let's explore that.

Working with triggers

As mentioned above, you can find the documentation for the trigger procedures and functions over here. Once the configuration flag is set (apoc.trigger.enabled=true) in neo4j.conf, we can add the triggers to the database, and start testing them. so let's add them first.

Adding two triggers

You will find that there are different types of triggers that you can add. I will explore two types in this post:

  1. a trigger that will fire as soon as a LABEL is added to a node in the database.
  2. a trigger that will fire as soon as a property is set in the database. 
The structure and process of adding the triggers is very similar and simple. Let's walk through it.

First we will be  adding the CHANGE LABEL trigger to the database - we call this trigger "highriskpersonadded":

CALL apoc.trigger.add(
'UNWIND apoc.trigger.nodesByLabel($assignedLabels,"HighRiskPerson") AS n MATCH (n)-[:MEETS]-(p:Person) set p:ElevatedRiskPerson'

Next we are adding a CHANGE PROPERTY trigger to the database - we call this trigger "healthstatuspropertychangedtosick":

CALL apoc.trigger.add(
'UNWIND apoc.trigger.propertiesByKey($assignedNodeProperties,"healthstatus") AS prop
with prop.node as n
MATCH (n)-[:MEETS]-(p:Person) where n.healthstatus = "Sick"
set p:MuchElevatedRiskPerson'

Once these have been added to the database, we can start working with them, and test them out.

Managing the triggers

There's a number of procedures that allow us to see which triggers have been added, and which are active:
  •  Listing the trigger:

    call apoc.trigger.list()
  • There's another procedure for removing the trigger:

    call apoc.trigger.remove('<name of trigger>')

  • And one for pausing the trigger:

    call apoc.trigger.pause('<name of trigger>')

  • Or resuming the trigger:

    call apoc.trigger.resume('<name of trigger>')
But now, of course, we want to see what happens to our database when the triggers start firing. Let's explore that.

Triggering the triggers

The first thing that we will do is we will write a cypher query that will trigger the CHANGE LABEL trigger. We do that by selecting one person and assigning the "HighRiskPerson" label to that node:

match (p:Person {healthstatus:"Healthy"})
with p
limit 1
set p:HighRiskPerson;

This quasi immediately completes:

And then we immediately see that our Neo4j browser reacts by highlighting the fact that some new labels have been added to the database.

And then we run a quick query to see that indeed, the HighRiskPerson node has been connected to other nodes that have the ElevatedRiskPerson label:

So our first trigger is clearly working. Let's try the other one.

We will now triggering the CHANGE PROPERTY trigger by at the same time set a new label (VeryHighRiskPerson) and setting the healthstatus property to "Sick".

match (p:Person {healthstatus:"Healthy"})
with p
limit 1
set p:VeryHighRiskPerson
set p.healthstatus = "Sick";

That finishes immediately, of course.

And next thing we know, we also see that the VeryHighRiskPerson and MuchElevatedRiskPerson labels have been added:

So that all seems to have worked. This really allows for much more automated actions on the database, which could be extremely useful in a sensitive use case like Contact Tracing - but I could equally see how this would be super useful for use cases like Fraud Detection or something similar.

Hope this useful for everyone. All the code in this example has been published on github, of course. If you have any feedback, please reach out.

All the best


No comments:

Post a comment