Tuesday, 18 February 2020

Graphistania 2.0 - Episode 4 - This Month in Neo4j

Yay! My friend StefanW and I got round to recording another Graphistania episode, episode 4 already - time flies when you are having fun! This month, again, we have so much great content popping up in the This Week in Neo4j (Twin4j) newsletter that we could probably fill a few hours talking about it. So in the podcast, we will only talk about a handful.
So let's talk about this - in our next episode:



Here's the transcription of our chat:

RVB:  00:00:00.000 Hello, everyone. My name is Rik Van Bruggen from Neo4j, and this is probably the most embarrassing podcast episode in a long, long time. And that's because it's actually a re-recording. We tried to record the Graphistania V2 podcast episode last week, but yours truly made a big mistake with the audio setup. And that's why we are here again, right? And I'm not alone. I'm here with Stefan Wendin.

SW:  00:00:31.355 Yeah. Hi. Nice to be back again and redoing this. This is like a time machine almost, you could argue. So maybe we should get the Nobel Prize for inventing one. I think it's also a good statement. Sometimes you mess up things, right? And then, just go on it again and redo it. It's about the trajectory. This is why they always just continue doing it and improve over time. So it's just role modelling that behaviour, I guess. So let's do this. Let's have fun, and let's have a great job.

RVB:  00:01:04.579 Yeah. Exactly. So apologies again. I made a mistake with the audio setup last time, which made the recording extremely echoey and unusable. So we are going to do this the proper way now. And we're going to be talking about the wonderful use-cases that we've seen in the last month in the This Week in Neo4j newsletter. So this month in Neo4j, so to speak, and we've seen some really good ones, right, Stefan? And let's start with some of the great stuff that we've seen in the community in the last couple of weeks. There's been some really great announcements. For example, the GraphTour was kicked off, right? You were there.

SW:  00:01:47.180 Yeah. It was amazing to see the amount of how the community is basically or literally exploding and growing and seeing people joining. I mean, for me, one of the highlights of GraphTour is that I brought along with me persons that were in the audience, actually, from last year. And they demoed their graph project which is now up and running. Such an amazing kind of-- it's a wonderful thing, right? Being in the audience and then next year being on stage is like-- so a super shoutout to the people at ERIKS Digital for that. I mean, meeting also the so-called legend, or what is it that Jim always says about Michael Hunger? It's so nice to meet him as well. He has a couple of beautiful posts around some things as well so also amazing to see. That's also super energetic. Anything that stands out for your side from the event?

RVB:  00:02:44.117 Yeah. Well, I think the GraphTour was also-- the launch of the GraphTour also marked the launch of Neo4j 4.0, right? And I must say, that's also a fantastic milestone for us. People sometimes say it's yet another product release, right? But it's not. It's actually a lot more because it's the culmination of a lot of work and some serious investments that went into it. And actually, I've been playing around with it myself. I've been blogging about it, and I finally managed to secure, or childproof, my beer graph, right? So I'm very happy about that [laughter].

SW:  00:03:23.379 Very, very important. I mean, to secure the beer graph. That's one of the reasons I went here and remember joining, actually. Not because I love beer that much, but I think it's one of those things like how we can use graphs to explain something and find things which you already know but you don't know that you know. So I remember before joining Neo, I was checking it out, and I was so in love of it. So I showed it, actually, in Adidas in Nuremberg and people were like, "Ah, wow, what is this guy talking about beer?" But it's such an amazing thing, and so happy to hear that it's all secure now. But I think it's also, as you said, it's not just that they release. I had some clients saying just the other day like, "Oh yeah, we were thinking, it's the competitors coming closer to Neo, and then you drop this 4.0, and you just skyrocket away." And I think it's one of the best comments you can ever have because so many amazing things, and really like staking the next paradigms of graphs, almost. So yeah, super excited about that and all of the security features.

RVB:  00:04:30.329 Very cool, right? And I think there's a lot of community stuff coming up. We'll have the second version of the Global Graph Celebration Day coming up, which is also a real fun exercise celebrating the birthday of Euler. But actually, I wanted to kind of segue from the 4.0 and the beer graph story because what also struck me this last month is that there's so many people that are actually using Neo4j and using graphs, not just for professional applications, but for their personal stuff, right? I mean I use it for--

SW:  00:05:14.835 For the beer graph [laughter].

RVB:  00:05:16.268 --for the beer graph, exactly. But if you look at, for example, what Mark Needham has been doing, he writes about the Australian Open, or he talks about allergies and how you can model allergies in a graph and discover, actually, really interesting things there. It's very personal, isn't it? I mean people can actually use this to do things that matter in their personal lives.

SW:  00:05:42.876 Yeah. And I think this idea of the quick graph, this is also one of the things that I'm so in love with. And I do work with a lot of business people, not necessarily the most technical or, for sure, not developers. And in a lot of the workshops and sessions that I do, we just pull it up and then put a graph together really, really quick. And the ability to do that and kind of unfold the graph and see the insides, I think it's also one of those things which I just can't get enough of it. And I think the allergens one is super cool as well. I'm super allergic to some fish and most seafoods, so seeing that and how it's connected with all of the different dishes and stuff, and I was like, "Yeah. Story of my life." And I can see it coming alive here. So it's very, very relatable in a sense.

RVB:  00:06:36.479 Very relatable, right? I mean, yeah, everyone knows someone with allergies. I have a son that has a nut allergy. Mark, who was a great friend as well, right, I mean, he's got lots of allergies. And again, that quick graph can illustrate the personal aspects of it. But it also, and this is another segue I suppose, it kind of illustrates how graphs can matter for lots of different health-related capabilities, right? I mean, we know that the health industry or the healthcare industry and the pharmaceutical industry has a lot of graph use-cases. But that also kind of became clear this last month, right? There was a couple of really cool stories around that.

SW:  00:07:20.635 Yeah. I mean, visualising all the clinical data stuff and seeing how it all connects. And this is also tying a little bit back, I started my own project doing this myself, just trying to map things out. Because a lot of the way that we use medical things are, again, based upon table structures, which is good for some things, but most of the things in an ecosystem is connected. And I guess, this is also one of those. Did you check out that Google kind of largest-ever map of brain connectivity, the fly brain?

RVB:  00:07:57.548 I thought that was crazy. I mean, I'll put the article in the transcription of this conversation, right? But Google, basically, is much more than an advertising company or a search company, right? They do some really, really interesting, clever, fundamental research. And one of those things is all around mapping how the neurons and the synapses of a brain are connected. And they used this very specific example of a brain map. And I was just thinking, "Wow. Imagine if, in a couple of decades, we could do that very quickly on a human brain, and we could do it in real-time, and we can use it for diagnosis and curing." I mean, the endless possibilities that you can imagine from doing that, I think it's just mind-boggling really, isn't it?

SW:  00:08:51.037 Yeah. It's one of those really, as you say, mind-boggling things. It's like when you see it happens, you see how far we are because this is a fly, I guess, but also how close we're getting. For me, this is very much a story about trajectory and seeing things coming to life. Not just posting about AI or Artificial Intelligence, where we don't even understand, I would argue, in most cases, how intelligence works in the first place. So for me, this was one of the ones I loved. I especially liked one of the headlines further down in the article. It was something about do-it-yourself neurologies, the world is our oyster or something. And making this available for people, I think is also one of the really cool things. Because again, availability for that is a graph in itself, and it's a network. And using the power and intelligence of the network, it's, again, even cooler in these kind of cases. So yeah, super psyched about that.

RVB:  00:09:52.016 Great. Well, I mean there's a couple of other stories that we'll put in the transcription, but I think we'll leave it at that for this episode of Graphistania. And we'll share that with our listeners in the next couple of days. Hey, Stefan, thank you so much again for doing another take on this recording [laughter].

SW:  00:10:12.724 Another take, yeah [laughter]. It's great. As always, I think for me it's also a good reminder of how just 15 minutes, or whatever it is, of kind of discussing and talking about the different use-cases, kind of help me staying on top of what's going on. So I'm also curious for the one listening. Do you have any of your favourite graph use-cases that you have seen? Please comment, I guess, in the post as well. I mean, that would be super interesting to see what you like and if there is any one that you picked up.

RVB:  00:10:47.943 Let's leave with that. And again, thanks for doing the call with me, and I look forward to next month already.

SW:  00:10:56.408 Me too. Bye-bye.

RVB:  00:10:57.818 Bye. Cheers. Bye.

Subscribing to the podcast is easy: just add the RSS feed, find us on Spotify, or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Wednesday, 12 February 2020

Experimenting with Conflicting access privileges in Neo4j 4.0

In the past couple of weeks, I have been playing around with the shiny new security features of Neo4j 4.0. They are truly interesting - both for childproofing beergraphs and for ensuring that your sensitive fraud databases are properly secured. Take a look at the previous post, and I think you will understand why.

In this post, I wanted to talk about something that I have seen so many times in my previous lives in the security industry, and that also became evident in my 4.0 research. It's got to do with conflicting security privileges. In a nutshell, this is to do with the case where

  • a specific user / role would receive a particular set of privileges from one policy
  • the same user / role would receive a different, and contradictory, privilege from another policy.
In that case, we need clear rules to understand what would happen. In the case of Neo4j 4.0, this is reasonably well explained as part of the documentation - see the documentation site on this topic - but in this post I will try to give you a realistic, but simple example.
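
To make the idea concrete before we get to the database itself, here is a minimal Python sketch of one such rule. It assumes, as the Neo4j 4.0 documentation describes, that a DENY takes precedence over a GRANT on the same resource. Everything else in it is hypothetical and only there for illustration: the `Privilege` class, the `is_allowed` function, the two policies and the `Person.name` resource are not Neo4j APIs or part of the fraud dataset, and this is a toy model rather than how Neo4j actually evaluates privileges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Privilege:
    kind: str      # "GRANT" or "DENY"
    action: str    # e.g. "read"
    resource: str  # e.g. "Person.name" (hypothetical label.property)


def is_allowed(privileges, action, resource):
    """Deny-overrides rule: any matching DENY wins; otherwise a GRANT is needed."""
    matching = [p for p in privileges
                if p.action == action and p.resource == resource]
    if any(p.kind == "DENY" for p in matching):
        return False
    return any(p.kind == "GRANT" for p in matching)


# Two hypothetical policies that contradict each other for the same role.
policy_a = [Privilege("GRANT", "read", "Person.name")]
policy_b = [Privilege("DENY", "read", "Person.name")]

print(is_allowed(policy_a, "read", "Person.name"))             # True
print(is_allowed(policy_a + policy_b, "read", "Person.name"))  # False
print(is_allowed([], "read", "Person.name"))                   # False
```

Under this rule the order in which the two policies are combined does not matter, and with no matching privilege at all the answer is also no.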


Creating Conflict

We'll start working on this with the same database as the previous post, the fraud dataset. If you don't have it yet, just download it from this link. Once we have the database up and running as a separate user database, we can switch to the system database and create a separate user for these tests.

//create a separate user for engineering the conflicting privileges
CREATE USER conflicted_user SET PASSWORD "changeme" CHANGE NOT REQUIRED;
CREATE ROLE conflicted_role AS COPY OF reader;


Friday, 7 February 2020

Securing a sample fraud graph with Neo4j 4.0

This week, we at Neo4j formally released our brightest and shiniest new version of the Neo4j Graph Database to the world. It's been an amazing journey to this point, and others have reported on this magnificent piece of engineering in more depth. Take a look at Jim's blogpost, or if you are in a hurry, check out the graphcast below:
Last week, I started playing around with it myself - by digging up my good old faithful beergraph, and illustrating some of the new features in a childproofing exercise for beers. Take a look at that post as well for some giggles. Now in this post, I wanted to essentially do the same thing as I did on the beergraph, but using a Fraud dataset.

Let's see how that would work.

Wednesday, 29 January 2020

Securing my Beergraph with Neo4j 4.0

Not sure if you have realised, but Neo4j has actually recently made the 4.0 version of the most fantastically awesome graph database on the planet available. You can get it ahead of the big launch event (on February 4th, 2020 - in case you were wondering!) from the Download Center and take it for a spin.

In this unbelievable release, there are so many new features, it's kind of hard to keep track of everything. But the ones that I can most easily get my head around are clearly
  1. multi-database support - finally, Neo4j actually has this concept of running multiple databases on one database server. A multi-tenancy solution that has been requested and anticipated by many of our users and customers.
  2. a VERY advanced schema-based security module, that allows people to extend the existing role-based security model of Neo4j even further - and make it crazy powerful. We'll spend a lot of time on that in this blogpost.
Readers of this blog probably know that I am a big fan of getting down and dirty with our products, so this evening - with a couple of hours to spare, so to speak - I decided to try out the shiny new release. I spun up my Neo4j Desktop, and started reading some manual pages where stuff was explained.

Soon after flipping through this, I was on my way.

Tuesday, 14 January 2020

Graphistania 2.0 - Episode 3 - This Month in Neo4j

Happy new year everyone - although it actually seems like the holidays are already very far behind us! But great times were had, at least in my family, and so I feel super energised to make 2020 another great start to a decade of graphs :) ... Here's to that!

It also means that we are continuing to see all these awesome community stories pop up left, right and center in the "This Week in Neo4j" developer newsletter. And so on our Graphistania podcast, we are going to continue talking about these on a monthly basis. So that's what we're doing - and I have again invited my friend and colleague Stefan Wendin to join me.

From the newsletter, we always select a few stories that we think will be more interesting and/or meaningful to discuss. This month, we found a number of them, and the interesting thing was that the graph-stories seemed to play at very different scales... The Personal, Corporate, and Society levels. Here are some of the ones we liked:

At the Personal scale
At the Corporate scale
At the Society scale, we saw some amazing posts:
So I think you'll agree that we had plenty of stuff to talk about. Let's get into that!

Monday, 16 December 2019

Part 3/3: Revisiting Hillary Clinton's email corpus with graph algos and NLP

(Note: this is Part 3 of this blogpost.  Part 1 and Part 2 are also published.)

Alright, this is going to be the third and final part of my work on the Hillary Clinton Email Corpus. There are two posts that came before this article:
Now we are going to spend some time with the "heart of the matter", the actual content of the emails. We are going to do that in two steps: first we will do some "full text" querying of some data, using Neo4j's specific full text indexing capabilities. Then we are going to go a step further and try to extract more knowledge from this dataset in an automated way, by running some Natural Language Processing (NLP) algorithms and processes on it.

Let's get right to it.

Fulltext querying of Emails

Those of you that have been following Neo4j for some time may remember that we have always bundled Apache Lucene with Neo4j. For the longest time, Neo4j used Lucene for its indexing capabilities. This turned out to be a great choice for many things, but also one that had its limitations and trade-offs. This is why Neo4j has gradually been switching away from Lucene for its core schema indexing capability, and has adopted a modular, pluggable indexing architecture that allows for different indexing techniques to be used for different data types. This is great news for many reasons, but one of the most important benefits has been a dramatic increase in write performance - as the newer indexes are much more optimized and leaner than the older Lucene-based structures. Read more about indexing in the Neo4j manual.

So as I started to think about some text-oriented queries, I quickly realised that I would need an index on Email text. So I wanted to do

create index on :Email(text)

and query that index afterwards. But the result was pretty obvious:


Part 2/3: Revisiting Hillary Clinton's email corpus with graph algos and NLP

(Note: this is Part 2 of this blogpost.  Part 1 and Part 3 are also published.)

In the previous post around the emails of Hillary Clinton, we were able to import the data from a CSV file, and use some really cool graph refactoring tools to make the database a little easier to work with - bad data is bad data, and the less we have of that the better.

So we ended up in a reasonably stable state, where we could do some querying. In this post, we will do exactly that.

Exploring the graph with graph algos

It's fairly easy to get a good initial view of the structure and size of the graph. I just run a few queries like this:

//what nodes are in the db
match (n) return labels(n), count(n)

and: 

//what rels are in the db
MATCH p=()-[r]->() RETURN type(r), count(r)

and we very quickly see that, while this is clearly not a "big" dataset, it's still big enough to start losing some significant time sifting through data if you want to make some sense of it. This is where our fantastic graph algorithms come in. I installed the plugin into my database, restarted it, and then I also played around a bit with Neuler, a graph algo playground that basically allows you to quickly experiment with different algorithms. You can download Neuler from https://install.graphapp.io/ and install it into your Neo4j Desktop really quickly.