Friday 12 June 2020

What Recommender Systems and Contact Tracing have in common

With the Covid-19 pandemic raging in the past few months, I have had a lot of interesting conversations about the use of graph technology and how it could help the world be a better, safer, healthier place. At Neo4j, we even put in place a specific Graphs4Good program, helping out where we can. There's splendid research going on at, companies like Elsevier chipping in (and using Neo4j) as well, and I have tried to write up my humble thoughts on how Contact Tracing could really benefit from using graphs as well. See some of my recent posts published on this blog.

Looking at that work, however, I always had a the feeling that I was looking at an excellent example of something else: an excellent example of a great "graph problem". The contact tracing example is a great fit for a tool like Neo4j, and the reason why that is the case is basically because the problem that we are trying to solve with contact tracing (understanding the pandemic spread in our societies, predicting potential evolutions of the pandemic based on contacts between healthy and sick individuals, protecting the healthcare systems by managing the rate of spreading this way) is very much suited for analysis with graph technology. It is a domain where the links between people, the links between people and places, their visits, their meetings are the main important data entities that we need to look at. It's the connections that matter. It's the connections that are becoming the "equal citizens" in the dataset - and therefore we need to spend time and resources analysing it.

But of course I know one thing for sure: there are plenty of other cases that are like that, that are true "graph problems" and that could really benefit from a graph approach to solving it. We know that from all the Neo4j project that we have been running for years. So how do I demonstrate that? How do I show that Contact Tracing is essentially the same thing like a recommendation engine? Or another graph application that we have come to know and love. Let's explore that.

Starting with the Contact Tracing graph

In this post and the next, I am actually going to try and show why these cases are so similar. Let's start with a look at the model that we initially created with the Contact Tracing testbed and the Faker libraries that I described in a previous post. In that post we worked with the following data model:
And we generated it with this script, which you can also find on github:

//Create persons using faker
foreach (i in range(1,5000) |
    create (p:Person { id : i })
    set p += fkr.person('1940-01-01','2020-05-15')
    set p.healthstatus = fkr.stringElement("Sick,Healthy")
    set p.confirmedtime = datetime()-duration("P"+toInteger(round(rand()*100))+"DT"+toInteger(round(rand()*10))+"H")
    set p.birthDate = datetime(p.birthDate)
    set p.addresslocation = point({x: toFloat(51.210197+rand()/100), y: toFloat(4.402771+rand()/100)})
    set = p.fullName
    remove p.fullName

//Create places using faker
foreach (i in range(1,100) |
    create (p:Place { id: i, name: "Place nr "+i})
    set p.type = fkr.stringElement("Grocery shop,Theater,Restaurant,School,Hospital,Mall,Bar,Park")
    set p.location = point({x: toFloat(51.210197+rand()/100), y: toFloat(4.402771+rand()/100)})

create index on :Place(id);
create index on :Place(location);
create index on :Place(name);
create index on :Person(id);
create index on :Person(name);
create index on :Person(healthstatus);
create index on :Person(confirmedtime);

//create the VISITS using cypher
with range(1,5000) as range
unwind range as iteration
match (p:Person {id: toInteger(rand()*500)+1}), (pl:Place {id:toInteger(rand()*100)+1 })
    create (p)-[:PERFORMS_VISIT]->(v:Visit { id: iteration})-[:LOCATED_AT]->(pl)
    create (p)-[virel:VISITS]->(pl)
    set v.starttime = datetime()-duration("P"+toInteger(round(rand()*100))+"DT"+toInteger(round(rand()*10))+"H")
    set virel.starttime = v.starttime
    set v.endtime = v.starttime + duration("PT"+toInteger(round(rand()*10))+"H"+toInteger(round(rand()*60))+"M")
    set virel.endtime = v.endtime;

We have this model in the database. 

We could add the meetings as well if we wanted to - but for now, I just want to focus on this core triangular datamodel, and see how we could transform that into a meaningful Recommender system graph.

Transforming the contact tracing graph into a recommender system graph

These kind of triangular shapes are quite common in many graph problems, and they are definitely very common in recommender systems. We can transform the above data model into this one quite easily:

Visits become purchases. Places become products. And all of the corresponding relationships would of course follow. So, based on the graph that we created above, it would be quite simple to run a few cypher statements to transform one graph into another one. Let's look at that:

Let's run this:

//Make purchases out of visits
match (vi:Visit)
set vi:Purchase
remove vi:Visit
remove vi.duration
remove vi.endtime
set vi.purchasetime = vi.starttime
remove vi.starttime;

//Make products out of Places
match (pl:Place)
set pl:Product
remove pl:Place
set = replace(, 'Place', 'Product')
remove pl.location
remove pl.type;

//remove the MEETS relationship
match (p1:Person)-[m:MEETS]-(p2:Person)
delete m;

//replace the VISITS relationship with a PURCHASES relationship
match (n)-[v:VISITS]->(m)
create (n)-[p:PURCHASES]->(m)
set p.purchasetime = v.starttime
delete v;

//replace the LOCATED_AT relationship with a OF_PRODUCT relationship
match (n)-[la:LOCATED_AT]->(m)
create (n)-[:OF_PRODUCT]->(m)
delete la;

//replace the PERFORMS_VISIT relationship with a PERFORMS_PURCHASE relationship
match (n)-[pv:PERFORMS_VISIT]->(m)
create (n)-[:PERFORMS_PURCHASE]->(m)
delete pv;

//replace the indexes
drop index on :Place(id);
drop index on :Place(name);
drop index on :Place(location);
create index on :Product(id);
create index on :Product(name);

And we get this model in the database:

This is clearly a simple transformation. The data may have changed - but the structure of the network has remained identical - NOTHING has changed. So we can apply all of the queries and patterns that we had discovered for the contact tracing use cases to the recommender system use case. Let's just explore that a little bit further with some easy recommender system queries.

Some easy recommender system queries

Let's just look at some very easy examples of a collaborative filtering pattern in a graph. 
You run the following query to 
  • find a person p1
  • who has bought a product pr1
  • which has been also been bought by another person p2
  • and p2 has bought another product pr2
  • which person p1 has NOT yet bought. 
We then obviously want to recommend to p1 that they should purchase pr2! The pattern matching query would look like this:

match path = (p1:Person)-[:PURCHASES]->(pr1:Product)<-[:PURCHASES]-(p2:Person)-[:PURCHASES]->(pr2:Product)
where not exists( (p1)-[:PURCHASES]->(pr2) )
return path
limit 10

and get the following result:

We can make it a little more sophisticated by requiring that p1 and p2 should have had more than 1 product in common which they have bought, and then do the same thing:
You run:

match path1 = (p1:Person)-[:PURCHASES]->(pr1:Product)<-[:PURCHASES]-(p2:Person)-[:PURCHASES]->(pr2:Product)<-[:PURCHASES]-(p1),
path2 = (p2)-[:PURCHASES]->(pr3:Product)
where not exists ((p1)-[:PURCHASES]->(pr3))
return path1, path2
limit 10;

and get an even stronger recommendation:
Clearly there are lots of other possibilities - but the concepts are extremely similar. We can apply the same principles and mechanisms that we can use for our contact tracing use case to a recommender system - and treat both problems for the graph problems that they really are. 

All of the scripts mentioned in this post are of course available on Github. Please take it for a spin an let me know what you think.

I hope this was a useful example - and look forward to hearing your feedback.

All the best


No comments:

Post a Comment