Tuesday, 11 May 2021

Graphistania 2.0 - The one with all the GraphStuff

Yes! Here's another great Neo4j podcast episode for you. I hope you will enjoy it -  just as much as I enjoyed recording it with Stefan.

Note that I have put all the interesting links together at the very bottom of the post. They all come from the Twin4j newsletter - to which you should all subscribe, obviously!


Here's the transcript of our conversation:

RVB: 00:00:44.353 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo4j, and yes, it's that time again. We are recording another Graphistania Neo4j podcast. And on the other side of this Zoom call is my dear partner in crime, Stefan, Stefan Wendin. How are you, man?
SW: 00:01:05.215 Always good. Always good meeting up, doing this with you, Rik. It's one of the favourites of the month. And I don't know, what can be better, talking about graphs with your best friend Rik in a sunny southern part of Sweden? Amazing. So good to go.
RVB: 00:01:22.239 Good to go. Fantastic. Great to have you here. And actually, we need to specify one thing, right, before we move on to the real topic of our podcast recording.

Saturday, 24 April 2021

Making sense of the news with Neo4j, APOC and Google Cloud NLP

Recently I was talking to one of our clients who was looking to put in place a knowledge graph for their organisation. They were specifically interested in better monitoring and making sense of the industry news for their organisation. There's a ton of solutions to this problem, and some of them seem like a really simple and out of the box toolset that you could just implement by giving them your credit card details - off the shelf stuff. No doubt, that could be an interesting approach, but I wanted to demonstrate to them that it could be really much more interesting to build something - on top of Neo4j. I figured that it really could not be too hard to create something meaningful and interesting - and whipped out my cypher skills and started cracking to see what I could do. Let me take you through that.

The idea and setup

I wanted to find an easy way to aggregate data from a particular company or topic, and import that into Neo4j. Sounds easy enough, and there are actually a ton of commercial providers out there that can help with that. I ended up looking at Eventregistry.org, a very simple tool - that includes some out of the box graphyness, actually - that allows me to search for news articles and events on a particular topic.

So I went ahead and created a search phrase for specific article topics (in this case "Database", "NoSQL", and "Neo4j") on the Eventregistry site, and got a huge number of articles (46k!) back. 

Monday, 29 March 2021

Part 1/3: Wikipedia Clickstream analysis with Neo4j - the data import

Alright, here's a project that has been a long time in the making. As you may know from reading this blog, I have had an interest, a fascination even, with all the wonderful use cases that the graph ecosystem holds. To me, it continues to be such a fantastic thing to be able to work in - graphs are everywhere, and more and more people are waking up to the fact that they really should look at their data as a network, and leverage the important relationships that are often hidden from plain sight.

One of these use cases that has been intriguing me for years, literally, is clickstream analysis. In fact, I wrote about this already back in 2013 - amazing when you think about it. Other people, like our friends at Snowplow Analytics, have been writing about this as well, but somehow the use case has been snowed under a little maybe. With this blogpost, I want to illustrate why I think that this particular use case - which is really a typical pathfinding application when you think about it, is such a great fit for Neo4j.

A real dataset: Wikipedia clickstream data

This crazy journey obviously started with finding a good dataset. There's quite a few of them around, but I wanted to find something realistic, representative and useful. So after some digging around I found the fantastic site of Wikimedia, where they actually structurally make all aggregated clickstream data of Wikipedia's pages available. You can just download them from this their website, and grab the latest zipped up files. In this blogpost, I worked with the February 2021 data, which you can find over here.

When you dowload that fine, you will find a tab-separated text file that includes the following 4 fields
  • prev: the previous page that the navigation came from
  • curr: the current page that the navigation came into
  • type: the description of the type of navigation that was occuring. There's different possible values here
    • link: a regular link between pages
    • external: a link from an external page to the current page
    • other: a different type - which can occur if people try to hide their navigation patterns
  • n: the number of occurrences of the (prev, curr) pair - so the number of times this navigation took place.
So this is the dataset that we want to import into Neo4j. But - we need to do one tiny little fix: we need to escape the “ characters that are in the dataset. To do that, I just opened the file in a text editor (eg. TextEdit on OSX) and did a simple Find/Replace of " with "". This take care of it.

Part 2/3: Wikipedia Clickstream analysis with Neo4j - queries and exploration

In the previous blogpost, I showed you how easy it was to import data into Neo4j from the official Wikipedia clickstream data. I am sure you would agree that it was surprisingly easy to import a reasonably sized dataset like that, within a very reasonable timeframe. So now we can have some fun with that data, and start applying some graph queries to it. All of these queries are also on github, of course, and you can play around with them there as well.

So let's take a look at some of these queries. 

Some data profiling and exploration

Here's a very simple query to give you a feel for the dataset:

match (n)-[r:LINKS_TO]->(m)
return distinct r.type, count(r);
match (n) return count(n);

The results are telling:

And so now we can start taking a look at some specific links between pages. One place to investigate would be the Neo4j wikipedia page. Here's a query that looks at the source pages that are generating traffic into the Neo4j wikipedia page:

Part 3/3 - Wikipedia Clickstream analysis with Neo4j - some Graph Data Science & Graph Exploration

In the previous blogposts, I have tried to show
  • How easy it is to import the Wikipedia Clickstream data into Neo4j. You can find that post over here.
  • How you can start doing some interesting querying on that data, with some very simple but powerful Cypher querying. You can find that post over here.
In this final blogpost I want to try to add two more things to the mix. First, I want to see if I can do some useful "Graph Data Science" on this dataset. I will be using the Neo4j Graph Data Science Library for this, as well as the Neuler Graph App that plugs into the Neo4j Desktop. Next, I will be exposing some of the results of these Graph Data Science calculations in Neo4j's interactive graph exploration tool, Neo4j Bloom. So let's do that. 

Installing/Running the Graph Data Science Library

Thanks to Neo4j's plugin architecture, and the Neo4j Desktop tool around that, it is now super easy to install and run the Graph Data Science Libary - it installs in a few clicks and that you are off to the races:

Monday, 15 March 2021

Graphistania 2.0: The Lockdown Anniversary session

This weekend marked the 1 year anniversary of the "Covid era" - the time when many of us have been hunkered down at home, or close to home at least, to deal with the raging pandemic. It has been a strange couple of months, and while I personally have been able to deal with it quite comfortably, I must say that my heart has been going out to all the friends and families that have had a far worse time with this. I personally got to experience it first hand in the first couple of months of 2021, loosing a very dear friend to the virus and its awful disease, and will not easily forget this period of our lives.

But we do want to keep the fantastic drive and atmosphere of our Neo4j community up - it would be a terrible shame if we lost that, too, to the virus. So that's why me and Stefan are going to continue making these podcast episodes, at least for the foreseeable future. Not in the least because they are a ton of fun to make :) ... 

So here's our latest chat. We have found a true treasure trove of great use cases and such in the Twin4j newsletter, which we will try to highlight:

Hope you will find our chat interesting! Here it goes:


RVB: 00:00:00.768 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo4j, and yep, it's that time again. Yippee! Yeah. We have another podcast recording day, and for that, I have my dear friend Stefan on the other side of this Zoom call. Hey, Stefan.

SW: 00:00:18.130 Hello, Rik. Nice to be back here with you.

RVB: 00:00:20.830 Hey, there.

SW: 00:00:21.733 Coming back from a week of vacation, so very nice to be back, and what better way to get your graph mind up and running than to hang out with you here in this lovely podcast?

RVB: 00:00:33.018 I hope so too. Thanks for being here, and I hope you had a good holiday. And it's been two months, Stefan, so we really need to get our act together. It's been a very busy--

SW: 00:00:43.185 Holy crap.

RVB: 00:00:44.057 --couple of months, but. We try to do these things once a month, but that didn't work in February, and so we've got a lot to talk about, actually. And maybe I'll just kind of frame it for you. I went through all of the Twin4j this week, the Neo4j newsletters over the past couple of weeks, and I found some really interesting themes. And maybe we can talk about those for a bit. There's three of them. Is that okay?

SW: 00:01:16.179 Yeah, yeah. Let's go. I think that should be super fun.

RVB: 00:01:20.885 Yeah. So the first one I wanted to talk about is, basically, use cases. I mean, we see this all the time in our community, really, right? That there's these unbelievable interesting use cases popping up. There's a couple of them that popped up in the newsletters. The first one is actually dear to my heart. It's something that I started working on back in 2012 when I first joined Neo4j. It's protein interaction networks. Have you taken a look at that one?

SW: 00:01:50.023 Yeah. That was an amazing one. As always with those kind of things, when there is-- this was written by Tomaz, right? So I started--

RVB: 00:02:01.558 Tomaz, [crosstalk].

SW: 00:02:02.277 --reading the article, but then halfway through the article, I was like, "Oh, but I better just try it out." That's how I learn. So I ended up running this, and it is so neat that this is, in some sort of way, so accessible. I think for me, that is super cool in that. So I find it extremely interesting and also because of the simplicity of it and how complex it is, even if it's just such a simple thing as a protein that interacts with another protein, basically, at the foundation, and it's still amazing. I think it's kind of mind boggling in that sense, so.

RVB: 00:02:44.678 A little story from my side: when I first started working with Neo4j, we did some work with the University of Ghent here in Belgium, and they were working on a topic called metaproteomics, which is exactly this, interactions of proteins. And they struck a nerve with me because one of their most important research customers was, of course - drumroll - a brewery.

SW: 00:03:13.135 Of course. I was wondering, "Where is this going to go?"

RVB: 00:03:17.022 Yes.

SW: 00:03:17.797 There it was. Brewery time again.

RVB: 00:03:18.825 There it is. Yeah.

SW: 00:03:20.664 There it is.

RVB: 00:03:21.182 It was a beer brewery, and they were basically saying yeasts that are being added to brewing systems, they create these protein interactions, and if you better manage those protein interactions, you can actually influence the brewing and the taste of the brew by doing so. So that was an interesting one. I had a good time exploring that. There was another one, Stefan, around asset management. This is a very well-known one as well, right? Things like configuration management databases, building information management. It's all about managing assets, isn't it? It's a very networked problem.

SW: 00:04:06.005 Yeah, yeah. And also, this is one of the use cases that seems to pop up more and more often in customer or prospect interactions, I think, so I think it's going to be very helpful for people. So go check it out if you are interested in that. I think that would be very neat to do that.

RVB: 00:04:29.195 But I know you want to skip to the third one that we had [crosstalk].

SW: 00:04:32.095 Yeah, the big one. This is what I'm kind of waiting for, like, "How can we get to this point faster?" Ha ha!

RVB: 00:04:37.534 Yeah. "How can we get to that third point faster?" which is, of course, the use case of getting to Mars and NASA. It was all over the news in the past couple of weeks, obviously, but yeah, that story of how David Meza from NASA-- he's the - how do you call it? - chief knowledge management architect or something like that with NASA, and he worked on that Lessons Learned database in Neo4j. Super cool, right? It's so good.

SW: 00:05:12.489 Yeah. It's super good. I think the use case is good. The interview is also great. David is also very relaxed, I think, also. Ashley - or what is the name of the girl interviewing? - is also doing a great job. There's very good chemistry in there, really enjoyed it. And again, of course, anyone that was dreaming about going to space as a kid, imagine working with such a thing. I mean, it can't get better. This is the moment when you go to work, and you go, "Holy crap. I'm so proud now." But I think it's an interesting thing on how much this actually speed up time for them, right? How much is saved not only time, in that sense, but also taxpayers' money, right?

RVB: 00:05:59.053 Of course.

SW: 00:05:59.211 I think that's what I keep coming back to in talking about use cases. So you can do a lot of things with a lot of technologies, right? So very often, people ask me, "How can I use Neo?" Right? But then I say, "You can do it for this, but you can, of course, do this with your old technology, in theory. However, if that theory takes you two years, maybe you can't really do it in practice." So I keep coming back to think about that, and I think this is such a good showcaser on that. It's a great YouTube clip here, so for those that have a hard time reading or just want to listen while-- I was saying commuting to work. Apparently, commuting to your working room should be better now, in these times. But I really enjoyed it. Happy to see it, so yeah, hope you like it as well.

RVB: 00:06:49.604 Cool. Yeah. It was super nice. And there's another, I mean, theme to the newsletters in the past couple of months. I've seen so many-- it's kind of like a use case, but it's also a technology foundation of people that are using graphs in combination with natural language processing, right? I saw a number of posts from Jesús, our colleague, who was talking about RDF-related work, WordNet, those types of things, but there's also people that are doing really interesting work on extracting new knowledge from existing documents, right? Were you able to make any sense of that?

SW: 00:07:39.354 Yeah. And I think it's like this is also one of those kind of super untapped-- and I think it's also a perfect bridge from NASA, right? Because literally, that was what they were doing. They had the answers; they just couldn't see it, right? So it's a classical, "You can't see the forest because of all the trees," right? So I think that is really interesting. And I think also, there is a great post about-- I think it was called From Text to Knowledge: The Information Extraction Pipeline or something, basically where Tomaz then explained why he see a combination of NLP and graphs as one path to explainable AI, right? And I think this is also one of those topics that are super important from a lot of things to kind of understand but also compliance and a lot of things, right? So I think this is also one of those kind of areas, use cases, or whatever you want to call it that literally are exploding on all different kind of verticals, you may almost use as a word there. But yeah, I think that, again, our--

RVB: 00:08:44.103 Yeah. Well, it's been very popular in domains where there's a lot of documentation, right? So academics, pharmaceuticals, patterns, legal texts, all those things have been really a great showcase for this type of work, I think.

SW: 00:09:04.354 Yeah. No, but as you're saying, I can see anything from academia, patterns-- I mean, I don't know how many of these kind of works that we have done with prospects and really kind of tapping into this kind of super kind of deep knowledge, but you can't see the new perspective because it is just a lot of deep silos, right? So in that sense, it's super graphy, and I think when people start to see it, this is also where they get so excited so were almost screaming, "Take my money," and I was like, "Calm down. Behave good. What is it that you're thinking of answering, or what is the thing that would help you," right? It doesn't have to be the money query, but just don't throw technology at the problem. So be mindful of what you want. I mean, we can see it a lot. I think one of the interesting parts is also looking upon the entire web, scraping information and making sense of it and treating that as a kind of a knowledge grab itself. It's also one of a neat couple of projects that I'm working on personally. Yeah. So there's a lot of funny things to do with the NLP and knowledge graphing combination, I think.

RVB: 00:10:19.330 Yep, totally. Well, what strikes me there-- and this is a perfect segue to our third theme. What strikes me is that it's becoming so much easier to do this, right? So NLP a couple of years ago, that was just so exotic and difficult to use. You basically had to have some kind of a computer science degree or a PhD to be able to use it, but these days, the tools that we have to implement some of these techniques are super accessible. I mean, even a lost sales guy like me can use it. Do you know what I mean? It's pretty usable, even in its basic form. So I wanted to talk about some of the really interesting tools that we see emerging in our community. The one that I was so happy about, to finally see it fully released, is the Arrows app. I think you've used it for a long time already when it was still in an alpha or beta stage, and now it's--

SW: 00:11:26.893 Exactly.

RVB: 00:11:27.278 --actually been released. Alistair Jones' pet project is now finally out there in the wild. Great, though, right? I mean, really great.

SW: 00:11:37.420 Yeah. No, but I think it's so useful in so many ways, of course, for graph modelling, where it was intended, right? And I have a couple of memories working with a lot of C-level people trying to help them understand the power of connected data and so on. So they're not going to write any code, but what we tend to do is do some modelling practices and basic cipher, and for that, I use Arrows. And one of the constant feedback that I get because of the simplicity of the tool and how it naturally kind of lends your thinking to this kind of graph thinking idea is that, every single one of these sessions, these CEOs, COOs are coming back to me like, "Oh, this is a really good way of thinking of the business, the different domain, and how it's connected. It has given me tons of new ideas." So in that sense, that kind of doubled down as a ideation kind of tool almost, if it makes sense, but I think that's such a cool kind of thing, right? If you're going to build an app or if you have a business logic or anything, that really helps you to kind of map it out in that sense. So it's also kind of neat to see that and to see those people, also, stepping into the graph arena. But yeah.

RVB: 00:12:53.804 Yep. That frame also. Yeah. So Arrows is one of those really amazing tools, but there's other things that are coming up, right? We've all known about the GRANDstack, the development framework that's been around for quite some time. A new release for that and new features, capabilities there, and some examples, also, for people to use and to abuse, I would say.

SW: 00:13:18.057 Use and abuse. That's a perfect way to do it.

RVB: 00:13:19.407 Use and abuse, yeah.

SW: 00:13:21.935 Just dive in there and try.

RVB: 00:13:22.283 And then another one that I wanted to mention was I've really enjoyed using this tool that Niels created called NeoDash. It's a graph app that plugs into the Neo4j browser and allows you to put together dashboards on top of Neo4j. Really no-code development, that type of thing, super easy to use. I was very impressed by that, how accessible it's made everything.

SW: 00:13:52.996 Yeah. And I think that's such a great part because I think just the ability to do something without that no code. Of course, that is in the topic with the cloud itself, one of those kind of things that you can really see how accessible these are. But seeing people with no previous knowledge just trying and fiddling around there, they kind of stumble upon the solution almost. That simple. I think the NeoDash is such a great kind of application for just exploring or visualising the data that you have in a graph that you normally would not use. So we tend to try it out with a lot of the nontechnical people when we work, and it works like a charm every single time. There is literally almost no studying kind of to get started, so I again encourage you, as with all articles, use and abuse, right? Dive in, try around because actually, you're going to get kind of far by just doing that. And that is, I think, a common theme for all of these. You can really see how this whole paradigm of connected data is changing, which is, of course, super cool.

RVB: 00:15:17.502 Yeah, absolutely. Well, I mean, so many other things to talk about. What we'll do is when we get to the blog post created together with this recording, we'll also put all the links to the amazing Twin4j newsletter items and the blog posts and everything all together, and then people can have a look, have a play, use and abuse, and that should set them on their path for even more graph adoption, right? So that's the [crosstalk] idea.

SW: 00:15:50.313 Even more graph adoption. Yeah. And I'm also going to squeeze in a last one because we also had the GDS 1.5 release, right? And there's a great piece on it from Amy and Alicia, two of my most inspiring colleagues. They have taught me a lot of things and are super nice as well. So it's about the new supervised machine learning workflows in Neo4j, so imagine that being even accessible to just try. I can't even think of this. If I would have guessed this five years ago, I'd be like, "Nah, that's not going to happen."

RVB: 00:16:29.543 It was impossible. Yeah, exactly.

SW: 00:16:31.365 "That's impossible. You can't do that on your computer at home in your sofa." But I think that's so cool. So that's a great article by Amy and Alicia; go check it out as well. I'm going to push that in there, but we're going to, of course, post the links, as I said.

RVB: 00:16:48.591 We will, for sure. Well, Stefan, thank you so much for taking the time running through this with me and making a little bit more sense out of it. It was great talking to you. You know that we want to keep these things shortish, at least, so we're going to wrap it up for now, and we're going to try to have this one published soon and then do another one in April, right? We should [crosstalk].

SW: 00:17:14.412 Yes, of course, 1st of April. I will record it from my new podcast studio in the barn in [inaudible] in southern Sweden.

RVB: 00:17:22.097 Ooh.

SW: 00:17:23.410 Ooh, yes. And then--

RVB: 00:17:25.195 I look forward to that one.

SW: 00:17:26.657 --as soon as we get to travel, this is a standing invitation for you to come join me and also for any of our listeners. Bear in mind--

RVB: 00:17:37.641 Absolutely [crosstalk].

SW: 00:17:37.964 --COVID restrictions has to be better before that, so I am not encouraging any anti-vaccine behaviour here, but as soon as we are allowed to travel, come join us. It's going to be a great talk about graphs.

RVB: 00:17:51.294 Fantastic. Thank you, Stefan. It was great talking to you, and I'll talk to you soon.

SW: 00:17:55.905 Likewise. Bye.

RVB: 00:17:57.591 Bye.

Hope you enjoyed that as much as we did. If you have any comments or questions, just reach out!

All the best

Rik & Stefan

Monday, 8 March 2021

Contact tracing in Neo4j: using triggers to automate actions

Last year I was able to spend some time thinking and writing about Contact Tracing, especially since it should probably be a very important part of any kind of system that allows us to deal with pandemics like the Covid-19 pandemic. You kan find some of these articles on this blog if you are interested. This is still as relevant as ever, but seems to have been a more difficult thing to implement practically in a free and privacy-sensitive society like ours (see this research and this article for some thoughts on this). Nevertheless it seems relevant still, as the Canton of Geneva has demonstrated.

In this article I wanted to share a very short little piece of work that I did together with my colleagues on the part that ACTIONS contact tracing activities. What do we ACTUALLY DO when we find a person that is at risk? What alarms should go off? Who should get notified? Where should we be looking for our next potential patient?

This, I think, is something that we could solve with a "database trigger". This is a systematic way of always reacting to events in the database in a consistent and predictable way. It has been demonstrated a number of times before - I really liked David's way of explaining it in his article on data loading/streaming with Neo4j's triggers. David describes it well:
What’s a Trigger?
If you haven’t worked with triggers before, a trigger is a database method of running some action whenever an event happens. You can use them to make the database react to events, rather than passively accept data, which makes them a good fit for streaming data, which is a set of events coming in.
Triggers need two pieces:
  • A trigger condition (which event should the trigger fire on?)
  • A trigger action (what to do when the trigger fires?)
Triggers are available in Neo4j through Awesome Procedures on Cypher (APOC), and you can find the documentation on them here
So what I want to take you through here is just to take the contact tracing dataset, and show you how easy even a salesperson-lost-in-cypherspace like myself could make it work. So here goes.

Installing and preparing

So the first thing we need to do is to create new database in Neo4j Desktop. Once installed, we should install the APOC plugin, which should be directly available in Neo4j Desktop's "plugins" section. The trigger functionality that we will be using us actually part of the APOC library, and you can find documentation on them over here.

In order to generate the dataset, I will use the Faker plugin again. See the previous blogpost for more on that, but installing that is actually really easy. First we need to download the latest release from github, unzip that file into plugins directory of your freshly baked server, and then manually do a small piece of restructuring in the file structure:

  • put  neo4jFaker-0.9.1.jar in plugins directory
  • and put ddgres directory in plugins directory. 
  • delete all other directories
Once that done, we need to edit the neo4j.conf file so that the faker library will actually be allowed to run inside your Neo4j server. You do that by adding

dbms.security.procedures.unrestricted=fkr.* 

to neo4j.conf. Easy. The file structure should look like this:

And the neo4j.conf should have a line like this.

Once that's done the last change is to allow the apoc triggers to run by adding 

apoc.trigger.enabled=true

to neo4j.conf as well.

Generating the dataset

First thing after starting the database is to fire up the Neo4j Browser, and check if faker library functions are active. That's easy:

call dbms.functions() yield name
with name
where name starts with "fkr"
return *

If it returns with a list of functions, then we are good to go.

Next we just need to run two queries to create the dataset: 5000 persons with 15000 MEETS relationships.

foreach (i in range(1,5000) |
    create (p:Person { id : i })
    set p += fkr.person('1940-01-01','2020-05-15')
    set p.healthstatus = fkr.stringElement("Sick,Healthy")
    set p.confirmedtime = datetime()-duration("P"+toInteger(round(rand()*100))+"DT"+toInteger(round(rand()*10))+"H")
    set p.birthDate = datetime(p.birthDate)
    set p.addresslocation = point({x: toFloat(51.210197+rand()/100), y: toFloat(4.402771+rand()/100)})
    set p.name = p.fullName
    remove p.fullName
);

and then

match (p:Person)
with collect(p) as persons
call fkr.createRelations(persons, "MEETS" , persons, "1-n") yield relationships as meetsRelations1
call fkr.createRelations(persons, "MEETS" , persons, "1-n") yield relationships as meetsRelations2
call fkr.createRelations(persons, "MEETS" , persons, "1-n") yield relationships as meetsRelations3
with meetsRelations1+meetsRelations2+meetsRelations3 as meetsRelations
unwind meetsRelations as meetsRelation
set meetsRelation.starttime = datetime()-duration("P"+toInteger(round(rand()*100))+"DT"+toInteger(round(rand()*10))+"H")
set meetsRelation.endtime = meetsRelation.starttime + duration("PT"+toInteger(round(rand()*10))+"H"+toInteger(round(rand()*60))+"M")
set meetsRelation.meettime = duration.between(meetsRelation.starttime,meetsRelation.endtime)
set meetsRelation.meettimeinseconds=meetsRelation.meettime.seconds;

That finishes in seconds.

And then we end up with a very simple graph data model:


Now we can start asking ourselves how we can take these automatically triggered actions based on database triggers. So let's explore that.

Working with triggers

As mentioned above, you can find the documentation for the trigger procedures and functions over here. Once the configuration flag is set (apoc.trigger.enabled=true) in neo4j.conf, we can add the triggers to the database, and start testing them. so let's add them first.

Adding two triggers

You will find that there are different types of triggers that you can add. I will explore two types in this post:

  1. a trigger that will fire as soon as a LABEL is added to a node in the database.
  2. a trigger that will fire as soon as a property is set in the database. 
The structure and process of adding the triggers is very similar and simple. Let's walk through it.

First we will be  adding the CHANGE LABEL trigger to the database - we call this trigger "highriskpersonadded":

CALL apoc.trigger.add(
'highriskpersonadded',
'UNWIND apoc.trigger.nodesByLabel($assignedLabels,"HighRiskPerson") AS n MATCH (n)-[:MEETS]-(p:Person) set p:ElevatedRiskPerson'
,{phase:'after'}
)


Next we are adding a CHANGE PROPERTY trigger to the database - we call this trigger "healthstatuspropertychangedtosick":

CALL apoc.trigger.add(
'healthstatuspropertychangedtosick',
'UNWIND apoc.trigger.propertiesByKey($assignedNodeProperties,"healthstatus") AS prop
with prop.node as n
MATCH (n)-[:MEETS]-(p:Person) where n.healthstatus = "Sick"
set p:MuchElevatedRiskPerson'
,{phase:'after'}
)


Once these have been added to the database, we can start working with them, and test them out.

Managing the triggers

There's a number of procedures that allow us to see which triggers have been added, and which are active:
  •  Listing the trigger:

    call apoc.trigger.list()
  • There's another procedure for removing the trigger:

    call apoc.trigger.remove('<name of trigger>')

  • And one for pausing the trigger:

    call apoc.trigger.pause('<name of trigger>')

  • Or resuming the trigger:

    call apoc.trigger.resume('<name of trigger>')
But now, of course, we want to see what happens to our database when the triggers start firing. Let's explore that.

Triggering the triggers

The first thing that we will do is we will write a cypher query that will trigger the CHANGE LABEL trigger. We do that by selecting one person and assigning the "HighRiskPerson" label to that node:

match (p:Person {healthstatus:"Healthy"})
with p
limit 1
set p:HighRiskPerson;

This quasi immediately completes:


And then we immediately see that our Neo4j browser reacts by highlighting the fact that some new labels have been added to the database.


And then we run a quick query to see that indeed, the HighRiskPerson node has been connected to other nodes that have the ElevatedRiskPerson label:


So our first trigger is clearly working. Let's try the other one.

We will now triggering the CHANGE PROPERTY trigger by at the same time set a new label (VeryHighRiskPerson) and setting the healthstatus property to "Sick".

match (p:Person {healthstatus:"Healthy"})
with p
limit 1
set p:VeryHighRiskPerson
set p.healthstatus = "Sick";

That finishes immediately, of course.


And next thing we know, we also see that the VeryHighRiskPerson and MuchElevatedRiskPerson labels have been added:


So that all seems to have worked. This really allows for much more automated actions on the database, which could be extremely useful in a sensitive use case like Contact Tracing - but I could equally see how this would be super useful for use cases like Fraud Detection or something similar.

Hope this useful for everyone. All the code in this example has been published on github, of course. If you have any feedback, please reach out.

All the best

Rik