Wednesday, 29 January 2020

Securing my Beergraph with Neo4j 4.0

Not sure if you have realised, but Neo4j has recently made version 4.0 of the most fantastically awesome graph database on the planet available. You can get it ahead of the big launch event (on February 4th, 2020 - in case you were wondering!) from the Download Center and take it for a spin.

In this unbelievable release, there are so many new features that it's kind of hard to keep track of everything. But the ones that I can most easily get my head around are clearly
  1. multi-database support - finally, Neo4j has the concept of running multiple databases on one database server. This is a multi-tenancy solution that has been requested and anticipated by many of our users and customers.
  2. a VERY advanced schema-based security module that allows people to extend the existing role-based security model of Neo4j even further - and make it crazy powerful. We'll spend a lot of time on that in this blogpost.

Readers of this blog probably know that I am a big fan of getting my hands dirty with our products, so this evening - with a couple of hours to spare - I decided to try out the shiny new release. I spun up my Neo4j Desktop and started reading some manual pages where stuff was explained. Specifically, I loved

Soon after flipping through this, I was on my way.

Adding the Belgian BeerGraph as a "tenant"

I decided to revisit my dear old beloved Belgian BeerGraph, fire up the 4.0 server, and create a couple of databases on it - among which the one with all the Belgian beers. This was super easy, following some of the instructions on the developer site:

//create database
create database beergraph;
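
As a quick check from Cypher itself, something along these lines should work. This is only a minimal sketch: it assumes you are in the Neo4j Browser (:use is a Browser command, not Cypher) and that, as in 4.0, database administration commands are run against the system database.

```cypher
//switch to the system database before running admin commands (Browser command)
:use system
//list all databases on this server - beergraph should be in the list
SHOW DATABASES;
//switch to the new database before creating indexes or loading data
:use beergraph
```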

And in the Neo4j browser I could see the new database appear:
I quickly added the indexes to this database:

//create the indexes
create index on :BeerBrand(name);
create index on :BeerType(name);
create index on :Brewery(name);
create index on :AlcoholPercentage(value);

and that made it super easy to get started:
All I needed to do was to run this query to import the beers:

//Import the beergraph
load csv with headers from
"https://docs.google.com/spreadsheets/d/1FwWxlgnOhOtrUELIzLupDFW7euqXfeh8x3BeiEY_sbI/export?format=csv&id=1FwWxlgnOhOtrUELIzLupDFW7euqXfeh8x3BeiEY_sbI&gid=0" as csv
with csv
where csv.BeerType is not null
merge (b:BeerType {name: csv.BeerType})
with csv
where csv.BeerBrand is not null
merge (b:BeerBrand {name: csv.BeerBrand})
with csv
where csv.Brewery is not null
merge (b:Brewery {name: csv.Brewery})
with csv
where csv.AlcoholPercentage is not null
merge (b:AlcoholPercentage {value: tofloat(replace(replace(csv.AlcoholPercentage,'%',''),',','.'))})
with csv
match (ap:AlcoholPercentage {value: tofloat(replace(replace(csv.AlcoholPercentage,'%',''),',','.'))}),
(br:Brewery {name: csv.Brewery}),
(bb:BeerBrand {name: csv.BeerBrand}),
(bt:BeerType {name: csv.BeerType})
merge (bb)-[:HAS_ALCOHOLPERCENTAGE]->(ap)
merge (bb)-[:IS_A]->(bt)
merge (bb)<-[:BREWS]-(br);
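
To check that the import did what it should, a few counts are usually enough. The queries below are a sketch of such a sanity check, run against the beergraph database; the aliases (label, nodes, relType, rels) are just names I picked for illustration.

```cypher
//count the nodes per label
MATCH (n)
RETURN labels(n)[0] AS label, count(*) AS nodes
ORDER BY label;
//count the relationships per type
MATCH ()-[r]->()
RETURN type(r) AS relType, count(*) AS rels
ORDER BY relType;
//list the indexes and their state
CALL db.indexes();
```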

If you have followed the beergraph story on this blog, you know that I also have an "in-graph" alcoholpercentage timeline in there. Not as useful anymore as it used to be, but I still added it:

//create the in-graph index
MATCH (ap:AlcoholPercentage)
WITH ap
ORDER BY ap.value ASC
WITH collect(ap) as sorted_ap
FOREACH(i in RANGE(0, size(sorted_ap)-2) |
  FOREACH(sorted_ap1 in [sorted_ap[i]] |
    FOREACH(sorted_ap2 in [sorted_ap[i+1]] |
      MERGE (sorted_ap1)-[:PRECEDES]->(sorted_ap2))));
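
To see what that timeline looks like, you could walk the first few PRECEDES links. This is just an illustrative query, with aliases of my own choosing:

```cypher
//show the five lowest alcohol percentages and their successors in the chain
MATCH (a:AlcoholPercentage)-[:PRECEDES]->(b:AlcoholPercentage)
RETURN a.value AS percentage, b.value AS next_percentage
ORDER BY percentage ASC
LIMIT 5;
```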

So there we are: the database was imported.

So far, I must say, there's not that much difference here. All you notice is that there is a system database and a user database, very similar to what other DBMSs do.


Now let's get started with the schema-based security parts - which are super interesting.

Securing the Belgian Beergraph tenant

Now, if you flip through the manual and the movie mentioned above, you will find that these new schema-based security features are very advanced. They are built on straightforward definitions of users, roles, and privileges, so it is easy to get started.

For some of these tasks, you can use Halin to configure things graphically, but I have found that it's almost as easy to do this in Cypher. Mind you, in the Neo4j Browser you always have to take care to select which database you are using: the system database, or your "beergraph" user database. Please take care with this - as the privileges all need to be granted in the system database.

Part 1: restricting the reading of properties

Here's the problem that I wanted to explore: in the Belgian Beergraph, we have 4 entities: BeerBrand, BeerType, Brewery and AlcoholPercentage.
So, as everyone knows I care very deeply about anything related to Political Correctness (please watch this clip if you think this is true - as it is NOT), I thought it would be fun to try and hide the AlcoholPercentage of the graph from children. Totally realistic and completely sensible - so here goes.

First, we had to create a user in the database that would represent a child (the "childreader" user), and assign that new user a role (the "childreaderrole" role):

CREATE USER childreader SET PASSWORD "changeme" CHANGE NOT REQUIRED;
CREATE ROLE childreaderrole AS COPY OF reader;
SHOW ROLES;
GRANT ROLE childreaderrole TO childreader;

That was easy:
Then, we would go on to change the permissions/privileges in the graph, and change how different users would see the values of the AlcoholPercentage nodes. This is the privilege that I created:

DENY READ {value} ON GRAPH `beergraph` NODES AlcoholPercentage TO childreaderrole;
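
If you want to verify the privilege without looking at a visualisation, a sketch like the following should do. The first statement is run against the system database; the second is run against beergraph while logged in as childreader. My understanding is that a denied property is treated as if it does not exist, so I would expect null values for childreader, but do check this on your own setup.

```cypher
//run against the system database: list what the role may and may not do
SHOW ROLE childreaderrole PRIVILEGES;
//run against beergraph, logged in as childreader: the denied property
//is treated as if it does not exist, so value should come back as null
MATCH (ap:AlcoholPercentage)
RETURN ap.value AS value
LIMIT 5;
```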

If I then ran the specific query below as admin or as reader:

MATCH (o:BeerBrand {name:"Orval"}), (d:BeerBrand {name:"Duvel"}),
path = allshortestpaths ((o)-[*]-(d))
RETURN path;

I would get the following result:
However, with the new "politically correct" privilege in place, protecting our vulnerable children from knowing the values of the AlcoholPercentage nodes, running the EXACT same query as user "childreader" would yield this result:
The difference is subtle but important: the same graph structure is returned by the query, but the values that the user is not entitled to see are simply omitted / greyed out. Pretty cool.

Part 2: restricting the traversal through the graph

But here's where I think everything gets a bit more fantastic. In 4.0, we can now not only restrict access to properties (or entire subgraphs, if we so desire), we can also restrict the behaviour of traversals based on schema-based security privileges. Look at this privilege:

DENY TRAVERSE ON GRAPH `beergraph` RELATIONSHIPS HAS_ALCOHOLPERCENTAGE to childreaderrole;

As you can tell, this basically allows us to say that "children are not allowed to traverse the HAS_ALCOHOLPERCENTAGE relationship", meaning that if we rerun the above query to look for all the shortest paths between two beers (Orval and Duvel), we are very likely to find a VERY different result. Look at what happened:
As you can see, the traversal all of a sudden becomes a lot deeper. Instead of Duvel and Orval being 4 hops away from one another (see the traversal above), the privilege has made the shortest path twice as long - 8 hops. Really cool.
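
If you prefer a number over a picture, the hop count can be returned directly. This is a small sketch that uses shortestPath instead of allshortestpaths, since one path is enough to measure the length. Going by the results described above, I would expect 4 hops as admin and 8 hops as childreader with the DENY TRAVERSE privilege in place.

```cypher
//compare the number of hops as admin and as childreader
MATCH (o:BeerBrand {name:"Orval"}), (d:BeerBrand {name:"Duvel"}),
path = shortestPath((o)-[*]-(d))
RETURN length(path) AS hops;
```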

Wrap-up and conclusion

Finally, to wrap up this post, let's see how we can remove these privileges again, for example to show the before/after scenarios. Revoking the read/traversal restrictions is just as easy as granting them:

//in case you want to remove the restrictions and start over
REVOKE DENY READ {value} ON GRAPH `beergraph` NODES AlcoholPercentage from childreaderrole;
REVOKE DENY TRAVERSE ON GRAPH `beergraph` RELATIONSHIPS HAS_ALCOHOLPERCENTAGE from childreaderrole;
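
To confirm that the restrictions are really gone, and optionally to clean up the demo user and role altogether, something like this should work when run against the system database. The full cleanup is my own addition and not part of the original walkthrough, so skip it if you want to keep experimenting.

```cypher
//run against the system database
//the DENY entries should no longer be listed for the role
SHOW ROLE childreaderrole PRIVILEGES;
//optional full cleanup: remove the demo user and role altogether
DROP USER childreader;
DROP ROLE childreaderrole;
```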

I personally was extremely impressed with the power and flexibility of both the multi-database and the schema-based security features of 4.0. Take it for a spin yourself. All of the code above is on GitHub as usual, so it should be super easy to test for yourself.

Hope this is useful.

Cheers

Rik

