Wednesday, 8 July 2020

Graphistania 2.0 - Episode 7 - The one after the Covid-19 lockdown

Yes! We were able to record and publish another episode of our Graphistania podcast. It's been an amazing and turbulent couple of months - but before the summer holiday season really takes off we wanted to get this to you.

Wishing you a fantastic and relaxing time - and in the meantime enjoy this episode!

Here's the transcript of our conversation:

Monday, 29 June 2020

Executives of Belgian Public Companies - revisited!

Tuesday, 16 June 2020

What VAT Fraud Detection and Contact Tracing have in common

In the previous blogpost we already illustrated in some detail that the contact tracing graph that we built has a lot of similarities with a product recommendation system graph. We focused on the Person-Visit-Place triangle that we had built in our Contact Tracing Graph data model, and converted the red and yellow bits into a Person-Purchase-Product triangle.
There is of course another part to the contact tracing graph that is also very interesting: the Person-Meets-Person subgraph. We derived that graph from the original contact tracing graph, by assuming that if a Person had visited a Place at the same time as another person, they would have been likely to have had a meeting there. This Person-Meets-Person subgraph was the basis for most of our graph analytics.
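As a rough illustration of what that subgraph makes possible, here is a minimal sketch that simply counts the inferred meetings per person. It assumes that MEETS relationships between Person nodes already exist and that Person nodes carry a name property; it is a plain Cypher degree count, not one of the Graph Data Science algorithms.

```cypher
// Sketch: rank people by the number of inferred meetings (a simple degree count).
// Assumes (:Person)-[:MEETS]-(:Person) relationships have already been created.
match (p:Person)-[m:MEETS]-(:Person)
return p.name as Person, count(m) as Meetings
order by Meetings desc
limit 10;
```

The pattern is undirected on purpose, so a meeting counts for both people regardless of the direction in which the relationship was stored.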

Friday, 12 June 2020

What Recommender Systems and Contact Tracing have in common

With the Covid-19 pandemic raging in the past few months, I have had a lot of interesting conversations about the use of graph technology and how it could help the world be a better, safer, healthier place. At Neo4j, we even put in place a specific Graphs4Good program, helping out where we can. There's splendid research going on at Covidgraph.org, companies like Elsevier chipping in (and using Neo4j) as well, and I have tried to write up my humble thoughts on how Contact Tracing could really benefit from using graphs as well. See some of my recent posts published on this blog.

Looking at that work, however, I always had the feeling that I was looking at an excellent example of something else: an excellent example of a great "graph problem". The contact tracing example is a great fit for a tool like Neo4j, and the reason why that is the case is basically because the problem that we are trying to solve with contact tracing (understanding the pandemic spread in our societies, predicting potential evolutions of the pandemic based on contacts between healthy and sick individuals, protecting the healthcare systems by managing the rate of spreading this way) is very much suited for analysis with graph technology. It is a domain where the links between people, the links between people and places, their visits, their meetings are the main important data entities that we need to look at. It's the connections that matter. It's the connections that are becoming the "equal citizens" in the dataset - and therefore we need to spend time and resources analysing them.

But of course I know one thing for sure: there are plenty of other cases that are like that, that are true "graph problems" and that could really benefit from a graph approach to solving them. We know that from all the Neo4j projects that we have been running for years. So how do I demonstrate that? How do I show that Contact Tracing is essentially the same thing as a recommendation engine? Or another graph application that we have come to know and love? Let's explore that.

Tuesday, 9 June 2020

Creating a Contact Tracing Testbed with Neo4j and Faker

Over the past few weeks and months, I have been living through the Covid-19 pandemic like many others. It's not been easy - but at the same time I feel very fortunate to have been able to stay healthy, active, working, and connected. There's a lot of people out there that are a lot less fortunate, and my heart goes out to them.

On this blog, I have been writing about using graphs for Contact Tracing quite a bit. See 
Fortunately, these articles were very well received by the community - we have had a ton of discussions with a variety of different individuals, companies and governments about how to use this technology so that the next lockdown would not again require immobilising so many healthy people. If the pandemic's second wave hits, we all want people at risk / sick people to be separated from the healthy population, and manage the spread of the disease in this way. But all of that requires contact tracing to be effective and operational - which is not a trivial thing to do.

This is why I have been looking at creating a very easy to use testbed for Contact Tracing in Neo4j. I wanted to make it super easy for people to create synthetic contact tracing datasets, and then work with them to gain experience - valuable experience for the "real deal" when we have to manage that. That's what this post is about.

Tuesday, 28 April 2020

Contact tracing guide for the Neo4j Browser

Based on the past two blogposts on (Covid-19) contact tracing (see here for the posts, here for the movies), I thought it would be a good idea to pick up an old skill - to create a Browser Guide for Neo4j for people to look at this dataset example more easily. I did this a long time ago for my beergraph as well, so why not do it for the contacttracinggraph :) ...

About the Neo4j Browser and Browser guides

Here's what this is: with Neo4j, the native graph database, we always ship a default user interface called the "Neo4j Browser". It's an interactive application that communicates with the database, and that essentially allows you to fire off Cypher queries and look at / manipulate the contents of your database. Read up about it over here. Once you have done that you will realise that the Browser is actually more than that: it's also a great way for people to learn more about Neo4j, and has a built in mechanism to share "guides" to various topics. If you experiment a bit with the following commands:

Title | Description | Command
Intro | A guided tour of Neo4j Browser | :play intro
Concepts | Graph database basics | :play concepts
Cypher | Neo4j’s graph query language introduction | :play cypher
The Movie Graph | A mini graph model of connections between actors and movies | :play movie graph
The Northwind Database | A classic use case of RDBMS to graph with import instructions and queries | :play northwind graph
you will get to see a number of topics that allow you to familiarise yourself with it really easily. Most of these guides are either built in or available for serving from a webserver. But: you can also develop these guides yourself. There's a really nice worked example over here, but the process really is dead simple:

Friday, 24 April 2020

(Covid-19) Contact tracing follow-up - demo movies

In my previous post I outlined the 4 different blogposts that I wrote about using the Neo4j Graph Database for Contact Tracing. Each of these posts is actually interesting in and of itself, and actually makes for a really nice demo of the capabilities in Neo4j. So I created those demos today - and put them on a Youtube playlist for you:



Tuesday, 21 April 2020

(Covid-19) Contact tracing - an amazing graph problem & rabbit hole

In the past couple of days, I have been working with several of my colleagues on a number of projects, all around the world, that are preparing our societies for a post-lockdown strategy that will allow us to keep the Covid-19 pandemic under control, and still regain some of our freedoms. This will be tricky, for sure, but as in so many problems, technology can probably assist.

That's why I started experimenting with how a graph database like Neo4j could help with this. Some of the tracing problems that we will face are uniquely well suited for a graph database approach: it allows us to see and understand the indirect contacts that healthy and sick people may have had with one another, and the effects that this could cause in our environments. It also allows for some unique predictive analytics: the structure of our contacts, the network/graph that it constructs, actually says a lot about the importance that parts of the network may play in the evolution of the pandemic. Graph Data Science can give us pointers as to where this should direct our policies.

This has ended up being quite an extensive piece of work. In order to keep it readable, I have cut it up into 4 blogposts, which I will put up all at the same time:
There's so much potential in this dataset, and in this problem domain in general. I feel like I have gone into the rabbit hole and have just resurfaced for some air. But who knows, maybe I will dive back in and do some more digging - after all, this is interesting stuff, and I love working on interesting topics.

Hope this is as interesting for you as it was for me.

All the best

Rik

Note that these demos will require the following environment: 
  • Neo4j Desktop 1.2.7, Neo4j Enterprise 3.5.17, apoc 3.5.0.9, gds 1.1.0, or
  • Neo4j Desktop 1.2.7, Neo4j Enterprise 4.0.3, apoc 4.0.0.6 (NOT later! a bug in apoc.coll.max/apoc.coll.min needs to be resolved)

(Covid-19) Contact Tracing Blogpost - part 4/4

Part 4/4: Some loose ends for the Contact Tracing graph

In this last part of this blogpost series, I wanted to quickly articulate some interesting points that I found useful during these experiments.

Using the geospatial data for some additional insights

You may remember that back in part 1, I imported some geospatial properties into our graph - assigning coordinates to all of the Places nodes that we have in the graph. Clearly this also opens up further possibilities for additional analysis, which I have not explored yet in the previous posts. Suffice to say that this data is super easy to work with in Neo4j. Just run a query like this:
match (pl:Place) return pl.id, pl.name, pl.type, pl.location limit 10;
And you can see that the pl.location property has a real geospatial data type that I can use:
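To give an idea of the kind of analysis this enables, here is a minimal sketch that looks for Places close to one arbitrarily chosen Place. It assumes that pl.location holds point values with geographic (WGS-84) coordinates, in which case distance() returns metres; the 500 metre threshold is an arbitrary choice for illustration. In the Neo4j 3.5 and 4.0 versions used here the function is called distance(); later versions rename it to point.distance().

```cypher
// Sketch: find Places within 500 metres of one (arbitrarily chosen) Place.
// Assumes location is a geographic point property on every Place node.
match (pl1:Place)
with pl1
limit 1
match (pl2:Place)
where pl1 <> pl2 and distance(pl1.location, pl2.location) < 500
return pl1.name as Origin, pl2.name as Nearby, round(distance(pl1.location, pl2.location)) as Metres
order by Metres
limit 10;
```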

(Covid-19) Contact Tracing Blogpost - part 3/4

Part 3/4: Graph Analytics on the contact tracing graph

Note that these queries require the following environment: Neo4j Desktop 1.2.7, Neo4j Enterprise 3.5.17, apoc 3.5.0.9 and GDS 1.1. At the time of writing, Neo4j 4.0.3 is not yet supported by GDS 1.1.

One of the fantastic qualities of the graph data model, I have always found, is that it can give you interesting insights - without even looking at the data. The structure of the network can give you some really interesting new revelations that you would not even have considered before. That is why Neo4j has invested a ton of effort in providing our industry with a completely new set of capabilities that allow us to discover these structural insights more easily - in the form of a new Graph Data Science Library. We have recently released the product, and you should read up on it in detail. I think it would be a great and interesting idea to explore it on the Contact Tracing dataset that we built in part 1 and queried in part 2.

Some data prep for analytics: inferring a new relationship

In order to do that, there's actually something that's missing: a new relationship between two Persons, which captures the inferred fact that two people have MET. We can do that based on the overlap time of their visits to the same place - therefore leveraging a query from part 2. This is what we are going to do: create a MEETS relationship between 2 Person nodes, based on the overlap - and we do that like this:

match (p1:Person)-[v1:VISITS]->(pl:Place)<-[v2:VISITS]-(p2:Person)
where id(p1)<id(p2)
with p1, p2, apoc.coll.max([v1.starttime.epochMillis, v2.starttime.epochMillis]) as maxStart,
apoc.coll.min([v1.endtime.epochMillis, v2.endtime.epochMillis]) as minEnd
where maxStart <= minEnd
with p1, p2, sum(minEnd-maxStart) as meetTime
create (p1)-[:MEETS {meettime: duration({seconds: meetTime/1000})}]->(p2);


As you can see, we are storing the length of the inferred meeting as a duration property on the relationship. The result appears very quickly:
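To check what was created, a query along the following lines could be used. This is only a sketch: it assumes Person nodes have a name property, and it relies on the meettime duration having been built from seconds only (as in the query above), so that its seconds component reflects the full meeting time.

```cypher
// Sketch: list the ten longest inferred meetings.
// meettime was created from seconds only, so .seconds gives the full length.
match (p1:Person)-[m:MEETS]->(p2:Person)
return p1.name as Person1, p2.name as Person2, m.meettime.seconds as MeetSeconds
order by MeetSeconds desc
limit 10;
```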


(Covid-19) Contact Tracing Blogpost - part 2/4

Part 2/4: Querying the contact tracing graph

Note that these queries require the following environment: Neo4j Desktop 1.2.7, Neo4j Enterprise 3.5.17, apoc 3.5.0.9 or Neo4j Enterprise 4.0.3, apoc 4.0.0.6 (NOT later! a bug in apoc.coll.max/apoc.coll.min needs to be resolved)

In Part 1 we created and imported a contact tracing graph. Now, we are ready to experiment with some interesting graphy queries.

The most interesting part about many of these queries, I find, is that they all rely on the fundamental principle of "hypothesis-free querying". What I mean by this is that graph queries, in my experience and opinion, have this wonderful quality about them that you can actually interact with the data in a way that does not require you to hypothesize too much about the structure of the dataset. This is important, because very often I just won't know what I don't know, and making meaningful hypotheses is actually really hard and complicated. The fact that we don't have to do that is a great win.

As always, you will find all the queries on github, so that you can have a play with them yourself as well. So let's dive right into it.

Who has a sick person potentially infected

To answer that, I will "grab" a sick person from the dataset, and then just walk the dataset from the person to the other persons that are currently healthy. The query goes like this:

match (p:Person {healthstatus:"Sick"})
with p
limit 1
match (p)--(v1:Visit)--(pl:Place)--(v2:Visit)--(p2:Person {healthstatus:"Healthy"})
return p.name as Spreader, v1.starttime as SpreaderStarttime, v1.endtime as SpreaderEndtime, pl.name as PlaceVisited, p2.name as Target, v2.starttime as TargetStarttime, v2.endtime as TargetEndtime;
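Note that the query above returns everyone who visited the same place, whether or not they were there at the same time. A possible refinement, sketched here under the assumption that starttime and endtime are stored as comparable temporal values on the Visit nodes, is to keep only the visits that actually overlap:

```cypher
// Sketch: same walk as above, restricted to visits that overlap in time.
match (p:Person {healthstatus:"Sick"})
with p
limit 1
match (p)--(v1:Visit)--(pl:Place)--(v2:Visit)--(p2:Person {healthstatus:"Healthy"})
// two intervals overlap when each one starts before the other one ends
where v1.starttime <= v2.endtime and v2.starttime <= v1.endtime
return p.name as Spreader, pl.name as PlaceVisited, p2.name as Target, v1.starttime as SpreaderStarttime, v1.endtime as SpreaderEndtime, v2.starttime as TargetStarttime, v2.endtime as TargetEndtime;
```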

(Covid-19) Contact Tracing Blogpost - part 1/4

Part 1/4: creating and importing a synthetic contact tracing graph

As we are living in these very interesting times, and many countries are still going through a massive operation to slow down the devastating effects of the SARS-CoV-2 virus and its CoViD-19 effects, there is of course also a lot of discussion already going on about what we will do after the initial surge of the virus has passed, and when the various countries and regions will start opening up their economies.

A tactic many countries seem to be taking is the implementation of some kind of Contact Tracing. Using the technology on our phones and our pervasive internet connectivity, we could imagine a way to implement "distancing" and isolation of people that are either already victim of, or vulnerable to, CoViD-19. This seems like a logical and useful tactic that could help us to open up our economies for business, while still maintaining the basic attitude of wanting to "flatten the curve". Of course there are still many, many issues with this approach, not least with regards to patient privacy and political freedoms, but it seems like an interesting track to explore, at least. Many government organisations have therefore started to explore this, and are working with some of the industry giants like Google and Apple to make this a reality.

This evolution started a whole range of discussions inside Neo4j, especially with regards to the usefulness of a graph database to make sense of some of these contact traceability databases. I remember reading Christakis and Fowler's Connected book, and understanding that virus outbreaks are one of those cases where our direct contacts don't necessarily matter - or at least not matter alone. Indirect contacts, between our friends' friends' friends, can be just as important. So lots of interesting, graph-oriented questions arise: How could we maximise the effect of our distancing measures, and of any contact tracing applications that we put in place? How could we use the excellent and predictive power of the graph to find out which of a person's connections could be most risky? How can we use graph analytics to better understand the structural power and weakness of our social networks? And many more.

So, being locked down myself (although Belgium clearly has a much softer stance than for example France or Italy), I thought I would spend some time exploring this. That's what this blogpost series is going to be about - so let's get right to it.

Monday, 6 April 2020

Graphistania 2.0 - Episode 6 - The One with the CovidGraph

So, when I started working with Graphs in 2012, one of the first community use cases that I encountered was all about biotech. I met a few people from the University of Ghent, who were working on some amazing protein interaction networks - and it was fascinating. Over the years, we have done quite a few activities on this, and we have kind of built a nice life sciences and healthcare community around Neo4j. Some amazing work is being done there.

One of the most amazing cases out there, has been the use case of the German Center for Diabetes Research, who have been scouring the scientific universe for ways of finding cures against diabetes. Look at this brief video or read this article to know more about it:

Why am I telling you this? Well, with the global Covid-19 pandemic sweeping around the globe, and many of us being affected in small or big ways, our Neo4j Graph Community has been doing the most interesting things to try and apply the "power of the graph" to this complex and intricate problem. Take a look at covidgraph.org for their work. When I learned about it, I immediately thought about talking to some of the "chief instigators" and inviting them for a podcast interview - which we made happen at record speed :) ...

So here it is: a chat about Covid-19, and about how graphs will help us make sense of the data. Let's hope it proves to be useful.

Friday, 27 March 2020

Supply Chain Management with graphs: part 3/3 - some SCM analytics

I've been looking forward to writing this: this is the last of 3 blogposts that I have been planning to write for weeks about my experiments with a realistic Supply Chain Management Dataset. There's two posts before this one:
  • In the first post I found and wrangled a dataset into my favourite graph database, Neo4j
  • In the second post I got acquainted with the dataset in a bit more detail, and I was able to do some initial querying on it to figure out what patterns I might be able to expose.
In this third and last post I would like to get a bit more analytical with the dataset, and do some more detailed investigation in order to better understand some typical SCM questions. Note that I am far from a Supply Chain specialist - I barely understand the domain, and therefore I will probably be asking some silly questions initially. But bear with me - and let's explore and learn, right?

Wednesday, 25 March 2020

Supply Chain Management with graphs: part 2/3 - some querying

So in the previous post, we got introduced to a dataset that I have been wanting to get into Neo4j for a long time: a Supply Chain Management dataset. Read up about it over here, but the long and short of it is that we got ourselves into the situation where we have an up and running Neo4j database with 38 different multi-echelon supply chains. Result!

As a quick reminder, here's what the data model looked like after the import:

Or visually:


Data validation and profiling

The first thing to do when you have a new shiny dataset like that, is of course to get a bit of a feel for the data. In this case, it really helps to understand the nature of the different SupplyChains - as we know from the original Excel file that they are quite different between the 38 of them. So let's do some profiling:

match (n) return distinct labels(n), count(*)
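The same kind of profiling can be done for the relationships. As a minimal sketch, in the same style as the node query above:

```cypher
// Sketch: count the relationships in the database per relationship type.
match ()-[r]->() return distinct type(r), count(*)
```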

Saturday, 21 March 2020

Supply Chain Management with graphs: part 1/3 - data wrangling and import

Alright, I have been putting the writing of this blogpost off for too long. Finally, on this sunny Saturday afternoon where we are locked inside our homes because of the Covid-19 pandemic, I think I'll try to make a dent in it - I have a lot of stuff to share already.

The basic idea for this (series of) blogpost(s) is pretty simple: graph problems are often characterised by lots of connections between entities, and by queries that touch many (or an unknown quantity) of these entities. One of the prime examples is pathfinding: trying to understand how different entities are connected to one another, understanding the cost or duration of these connections, etc. So pretty quickly, you understand that logistics and supply chain management are great problems to tackle with graphs, if you think about it. Supply Chains are graphs. So why not store and retrieve these chains with a graph database? Seems obvious.

We've also had lots of examples of people trying to solve supply chain management problems  in the past. Take a look at some of these examples:
And of course some of these presentations from different events that we organised:
So I had long thought that it would be great to have some kind of a demo dataset for this use case. Of course it's not that difficult to create something hypothetical yourself - but it's always more interesting to work with real data - so I started to look around.

Monday, 16 March 2020

Graphistania 2.0 - Episode 5 - This Month in Neo4j

Friends.

These are interesting times. These are difficult times, but we can deal with it together, as a community, as a graph. So that's why we were super happy that, just as Belgium was going into lockdown last week, we were able to record another Graphistania podcast episode for you, talking about the world in general, but also covering some of the amazing graph use cases that drifted over our screens in the past month, in the This Week in Neo4j (TWIN4J) newsletter.

There were actually many things to talk about, in terms of fascinating graph use cases, and I will highlight only the most striking ones here.
  • Our friends at Kineviz did some really interesting and timely work on COVID-19 temporal and spatial data visualization. This stuff is really important to understand, as pandemic spreads clearly follow graph patterns. Read Connected if you are not convinced.
  • Worth highlighting: Bloodhound: Windows network penetration testing with Neo4j, had a new release that you might want to take a look at. If you are not familiar with Bloodhound yet, you may also want to check out my interview with the Bloodhound crew on this podcast a while back.
  • We published this fun little thing called a Neo4j Treasure Map - check it out!
  • Finally - we also have a Winegraph! It's a great example of importing data from the web using Norconex.
  • Some interesting stuff on using Neo4j for Gene ID mapping: take a look!
  • Another example of enriching graphs with Wikidata, from the one and only Mark Needham: look at Mark's blog over here!
  • Don't forget: we introduced the Neo4j Graph Data Science plugin with examples from the "Graph Algorithms" book.
  • A really interesting tweet about a visualisation of the US Supreme court as a graph db... Would love to see more like that.
  • And for some fun: Pokégraph: Gotta Graph 'Em All!
  • Some important stuff: we did a great 4.0 webinar that is giving you a lot of info on what to expect in the new version of Neo4j.
  • There was a great update to NeoMap: Visualizing shortest paths with neomap ≥ 0.4.0 and the Neo4j Graph Data Science plugin.
Those were the most important ones. So let's talk about these now - I am sure there's a lot of cool stuff here for everyone!

Tuesday, 18 February 2020

Graphistania 2.0 - Episode 4 - This Month in Neo4j

Yey! My friend StefanW and I got round to recording another Graphistania episode, episode 4 already - time flies when you are having fun! This month, again, we have so much great content popping up in the This Week in Neo4j (Twin4j) newsletter, that we could probably fill a few hours talking about it. So in the podcast, we will only talk about a handful - covering things like

Wednesday, 12 February 2020

Experimenting with Conflicting access privileges in Neo4j 4.0

In the past couple of weeks, I have been playing around with the shiny new security features of Neo4j 4.0. They are truly interesting - both for childproofing beergraphs and for ensuring that your sensitive fraud databases are properly secured. Take a look at the previous post, and I think you will understand why.

In this post, I wanted to talk about something that I have seen so many times in my previous lives in the security industry, and that also became evident in my 4.0 research. It's got to do with conflicting security privileges. In a nutshell, this is to do with the case where

  • a specific user / role would receive a particular set of privileges from one policy
  • the same user / role would receive a different, and contradictory privilege from another policy. 
In that case, we need clear rules to understand what would happen. In the case of Neo4j 4.0, this is reasonably well explained as part of the documentation - see the documentation site on this topic - but in this post I will try to give you a realistic, but simple example.


Creating Conflict

We'll start working on this with the same database as the previous post, the fraud dataset. If you don't have it yet, just download it from this link. Once we have the database up and running as a separate user database, we can switch to the system database and create a separate user for these tests.

//create a separate user for engineering the conflicting privileges
CREATE USER conflicted_user SET PASSWORD "changeme" CHANGE NOT REQUIRED;
CREATE ROLE conflicted_role AS COPY OF reader;
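From there, one way to engineer a conflict could look like the sketch below. The property name used in the DENY statement is purely hypothetical (the actual properties of the fraud dataset are not shown here); the idea is that the role copied from reader already has a read privilege, and the explicit DENY contradicts it.

```cypher
//assign the new role to the new user
GRANT ROLE conflicted_role TO conflicted_user;
//hypothetical example: deny reading a "name" property, which contradicts the read privilege copied from reader
DENY READ {name} ON GRAPH * NODES * TO conflicted_role;
//list the privileges that the role now holds
SHOW ROLE conflicted_role PRIVILEGES;
```

Listing the privileges afterwards should show both the granted and the denied entry side by side, which is exactly the kind of situation the conflict rules have to resolve.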


Friday, 7 February 2020

Securing a sample fraud graph with Neo4j 4.0

This week, we at Neo4j formally released our brightest and shiniest new version of the Neo4j Graph Database to the world. It's been an amazing journey to this point, and others have reported on this magnificent piece of engineering in more depth. Take a look at Jim's blogpost, or if you are in a hurry, check out the graphcast below:
Last week, I started playing around with it myself - by digging up my good old faithful beergraph, and illustrating some of the new features in a childproofing exercise for beers. Take a look at that post as well for some giggles. Now in this post, I wanted to essentially do the same thing as I did on the beergraph, but using a Fraud dataset.

Let's see how that would work.

Wednesday, 29 January 2020

Securing my Beergraph with Neo4j 4.0

Not sure if you have realised, but Neo4j has actually recently made the 4.0 version of the most fantastically awesome graph database on the planet available. You can get it ahead of the big launch event (on February 4th, 2020 - in case you were wondering!) from the Download Center and take it for a spin.

In this unbelievable release, there are so many new features, it's kind of hard to keep track of everything. But the ones that I can most easily get my head around are clearly
  1. multi-database support - finally, Neo4j actually has this concept of running multiple databases on one database server. A multi-tenancy solution, that has been requested and anticipated by many of our users and customers. 
  2. a VERY advanced schema-based security module, that allows people to extend the existing role-based security model of Neo4j even further - and make it crazy powerful. We'll spend a lot of time on that in this blogpost.
Readers of this blog probably know that I am a big fan of getting my feet down and dirty with our products, so this evening - with a couple of hours to spare, so to speak - I decided to try out the shiny new release. I spun up my Neo4j Desktop, and started reading some manual pages where stuff was explained. Specifically, I loved

Soon after flipping through this, I was on my way.

Tuesday, 14 January 2020

Graphistania 2.0 - Episode 3 - This Month in Neo4j

Happy new year everyone - although it actually seems like the holidays are already very far behind us! But great times were had, at least in my family, and so I feel super energised to make 2020 another great start to a decade of graphs :) ... Here's to that!

It also means that we are continuing to see all these awesome community stories pop up left right and center in the Neo4j "This week in Neo4j" developer newsletter. And so on our Graphistania podcast, we are going to continue talking about these on a monthly basis. So that's what we're doing - and I have again invited my friend and colleague Stefan Wendin to join me.

From the newsletter, we always select a few stories that we think will be more interesting and/or meaningful to discuss. This month, we found a number of them, and the interesting thing was that the graph-stories seemed to play at very different scales... The Personal, Corporate, and Society levels. Here are some of the ones we liked:

At the Personal scale
At the Corporate scale
At the Society scale, we saw some amazing posts:
So I think you agree that we had plenty of stuff to talk about. Let's get into that!