Tuesday 21 April 2020

(Covid-19) Contact Tracing Blogpost - part 4/4

Part 4/4: Some loose ends for the Contact Tracing graph

In this last part of the blogpost series, I want to quickly cover a few loose ends that I found useful during these experiments.

Using the geospatial data for some additional insights

You may remember that back in part 1, I imported some geospatial properties into our graph - assigning coordinates to all of the Place nodes that we have in the graph. This opens up possibilities for additional analysis that I have not explored in the previous posts. Suffice to say that this data is super easy to work with in Neo4j. Just run a query like this:
match (pl:Place) return pl.id, pl.name, pl.type, pl.location limit 10;
And you can see that the pl.location property has a real geospatial data type that I can use:

We can use these point properties to calculate distances, for example - something we have not explored at all yet. We can also plug other tools on top of Neo4j, like stellasia's neomap Desktop application, to visualize nodes with geographic attributes on a map. It's a very rudimentary but super easy to use tool to look at where the Places are actually located. All I need to do is use this custom query in the configuration:

MATCH (n:Place)
RETURN n.location.x as latitude, n.location.y as longitude
LIMIT 10000
And before I know it I am looking at the Places and their heatmap, neatly plotted on an actual map of Antwerp.


I have added the neomap config file to the github gist for your convenience. There are lots of additional possibilities here, especially if we were to add address / geospatial information to the Person nodes as well. Let's leave that for a future exploration!
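
Coming back to the distance calculation mentioned earlier: here is a minimal sketch of what that could look like. It assumes the distance() function of Neo4j 3.5 / 4.0 (Neo4j 5 renames it to point.distance()) and the id, name and location properties from the query above. The unit of the result depends on how the points were created in part 1: meters for WGS-84 points, plain coordinate units for cartesian points.

```cypher
// Sketch: the ten pairs of Places that lie closest to each other.
// Assumes every Place has a location point and a comparable id.
MATCH (a:Place), (b:Place)
WHERE a.id < b.id   // keep each pair once and skip self-pairs
RETURN a.name, b.name, distance(a.location, b.location) AS dist
ORDER BY dist ASC
LIMIT 10;
```

Note that this compares every Place with every other Place, so it is only meant for a modest number of Place nodes.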

Advanced visualisation using Bloom

Of course, this kind of tracing data would need to be used by non-technical people - people who are actually in charge of making sure that a pandemic does not get out of hand, or who want to reduce the impact of the epidemic on their organisation. Therefore, we need non-technical tools to interact with this data. Enter something like Neo4j Bloom, to help us with this.

I have done a few experiments with the latest version of Bloom for this blogpost series, and have found it to be super nice to customize and interact with. Here's a couple of things that I tried.

First: I created a custom search phrase to look for a particular community. I could do many more of these - and the principle is always the same: you wrap a Cypher query in a parameterized, near-natural language search phrase that domain experts can understand and use. Super easy to do.
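
As a rough sketch of what sits behind such a search phrase: a phrase like "Show community $community" would be paired with a Cypher query that uses the same parameter. The community property name below is hypothetical, so substitute whatever property holds the community identifier in your graph.

```cypher
// Cypher behind a Bloom search phrase such as: Show community $community
// NOTE: p.community is a hypothetical property name, used for illustration only.
MATCH (p:Person)
WHERE p.community = $community
RETURN p;
```
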
The result of such a query could look like this.
Now, you will notice that some of the nodes look different and better than others. That is because of the custom styling that I added. Some of the sizes and colours of the nodes are actually determined by the properties: the bigger and red nodes are the "Sick" people in the graph - as we probably want to pay more attention to those.

I have included the Bloom perspective that I created in the github gist, but I am sure you can make this even nicer :) …

Security and privacy in the contact tracing graph

Note that these queries require Neo4j Enterprise 4.0.3 and apoc 4.0.0.6.

Last but not least: what about security and privacy protection? Having worked in the security industry for a very long time, I know full well that this could be a major issue. Contact tracing is sensitive information, and we do not want this to fall into the wrong hands. There's lots of debate ongoing on this, but I thought I would apply to this graph some of the principles that I have highlighted earlier, based on the interesting fine-grained security features that Neo4j 4.0 brings to the table.

Read Securing my Beergraph with Neo4j 4.0 or Securing a sample fraud graph with Neo4j 4.0 for another example. In this contact tracing graph, I applied exactly the same principles, by just assuming that the tracing information should probably be fully accessible to governmental analysts or medical staff, but that individual patients would only get partial access to the graph. To illustrate this, I have added a couple of examples to the github gist that make this available on Neo4j 4.0.


Note that at the time of writing, this is a bit of an awkward situation: neither Neo4j Bloom nor the Graph Data Science Library is available on Neo4j 4.0 yet. Today you would have to arrange for some kind of manual transfer of information between the operational database on 3.5 and the secure database on 4.0. This should no longer be a problem a few weeks from the time of writing, as the Bloom and Graph Data Science versions will support 4.0 very soon.


Here's how we go about this:

//create the patient user
:use system;
CREATE USER patient SET PASSWORD "changeme" CHANGE NOT REQUIRED;

//create the patientrole based on the reader role
:use system;
CREATE ROLE patientrole AS COPY OF reader;

//show the roles
:use system;
SHOW ROLES;

//put the patient user into the patientrole
:use system;
GRANT ROLE patientrole TO patient;

//Add read restriction on person names for patients
:use system;
DENY READ {name} ON GRAPH `neo4j` NODES Person TO patientrole;
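
To double-check what the patientrole ends up with, Neo4j 4.0 can list the privileges of a role or a user. A small sketch, reusing the role and user names created above; the output should include a DENIED read entry for the name property on Person nodes:

```cypher
//list the privileges that apply to the patientrole
:use system;
SHOW ROLE patientrole PRIVILEGES;

//list the privileges that the patient user ends up with
:use system;
SHOW USER patient PRIVILEGES;
```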


If we then run a very simple query like the following, first as the default Neo4j administrator user, and then as the patient user:

:use neo4j;
MATCH (p1:Person {healthstatus:"Healthy"}), (p2:Person {healthstatus:"Sick"}),
path = allshortestpaths ((p1)-[*]-(p2))
RETURN path
limit 10;


Then the results for the patient user are immediately more secure: the name property of the Person nodes is no longer returned.
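
A quick way to see the property-level restriction in isolation is to run a smaller query as both users. This is only a sketch, and it assumes nothing beyond the name and healthstatus properties used above. A denied property behaves as if it does not exist, so p.name should come back as null for the patient user, while the administrator still sees the names.

```cypher
//run this once as the administrator and once as the patient user
:use neo4j;
MATCH (p:Person)
RETURN p.name, p.healthstatus
LIMIT 5;
```
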
On top of this property level security that is super easy to implement, we can also add further restrictions on how the patient user could access / traverse the graph. Let's say that we would limit the traversal of the MEETS relationship by patients:

//Additional restriction on security: traversal over MEETS relationship
:use system;
DENY TRAVERSE ON GRAPH `neo4j` RELATIONSHIPS MEETS TO patientrole;


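Then the query above would yield a different result set for the patient user: MEETS relationships can no longer be followed, so any path that relied on them disappears from the results.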
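
To see the effect of the traversal restriction on its own, one could count the MEETS relationships as each user. A minimal sketch: because relationships that a role may not traverse are invisible to it, the count should be 0 for the patient user.

```cypher
//run as the administrator and as the patient user and compare the counts
:use neo4j;
MATCH ()-[m:MEETS]->()
RETURN count(m) AS visibleMeets;
```
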
Again: many more possibilities here, but that will be for others to explore at a later date.

Hope this has been as interesting for you as it was for me. I hope to have shown with this blogpost series that Neo4j is uniquely well suited for these types of massively important and interesting applications that our governments are going to develop in the next few months. I look forward to working with them to make them real, safe, secure and reliable.

Thanks a lot for your time - as always: comments welcome.

All the best

Rik
