Friday 7 February 2020

Securing a sample fraud graph with Neo4j 4.0

This week, we at Neo4j formally released our brightest and shiniest new version of the Neo4j Graph Database to the world. It's been an amazing journey to this point, and others have reported on this magnificent piece of engineering in more depth. Take a look at Jim's blog post, or if you are in a hurry, check out the graphcast below:

Last week, I started playing around with it myself by digging up my good old faithful beergraph and illustrating some of the new features in a childproofing exercise for beers. Take a look at that post as well for some giggles. In this post, I want to do essentially the same thing as I did with the beergraph, but on a fraud dataset.

Let's see how that would work.

Applying security to a fraud graph

For this example, I will be working with a synthetic fraud database that we have built up over the years, with lots of fake data but a realistic data model. I have used this dataset in a few blog posts before, and you can download it (in the new 4.0 format) from here. The data model looks like this:
You can actually see this in a bit more detail by running a different query:

match (n1)-[r]->(n2)
return distinct labels(n1) as `1st Label`, type(r) as `RelType`, labels(n2) as `2nd Label`
order by `1st Label`;

This gives you all node labels and the relationship types that connect them:
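Neo4j 4.0 also ships a built-in procedure that draws the schema directly as a graph in the Browser. A minimal alternative, assuming you are connected to the fraud database, is sketched below; the second query is an extra (not from the original post) that gives a rough idea of how many nodes sit behind each label combination. It scans every node, which is fine for a small synthetic dataset like this one.

```cypher
// Built-in procedure: returns labels and relationship types as a small graph
CALL db.schema.visualization();

// Rough size of each part of the model: number of nodes per label combination
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodeCount
ORDER BY nodeCount DESC;
```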

The basic premise of this security exercise is that different types of users will access this dataset, and that they need different privileges when they do: just like in the BeerGraph example, not everyone is allowed to see the same level of detail. I want to differentiate between two types of users:

  1. an administrative user, who is only allowed to see the basic details of a fraud investigation
  2. an investigative user, who would need to see all the details of the fraud investigation
Let's see how we can differentiate these.

The Money Query for fraud investigators

In this fraud database, the most important question we are trying to answer is all about synthetic identities. What's that? According to Investopedia, synthetic identity theft is a type of fraud in which a criminal combines real and fake information to create a new identity. This information is used to open fraudulent accounts and make fraudulent purchases. It's one of those types of fraud that traditional systems easily miss, and that graph databases easily catch.

So let's take a look at this interesting query for fraud investigators:

//find shared identity graphs: account holders that share an ID, address, phone number or SSN
MATCH path = (accountHolder:AccountHolder)-[:HAS_ID|HAS_ADDRESS|HAS_PHONENUMBER|HAS_SSN]->(contactInformation)<-[]-(accountHolder2:AccountHolder)
RETURN path;

If you run the query as the database administrator (the default if you haven't secured the graph), it returns a couple of really useful results.
If you zoom in, you will see that this is super interesting: AccountHolders share addresses, phone numbers, Social Security numbers, and combinations of all of the above, to create these synthetic identities.
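The query above returns raw paths, which is great for visual exploration. For a tabular overview, a possible variation (a sketch, not part of the original post) counts how much contact information each pair of account holders shares. Note that, unlike the original, it restricts the incoming relationship to the same four types, and it uses `id(ah1) < id(ah2)` so that each pair is reported once rather than twice.

```cypher
// Summarise shared contact information per pair of account holders
MATCH (ah1:AccountHolder)-[:HAS_ID|HAS_ADDRESS|HAS_PHONENUMBER|HAS_SSN]->(info)
      <-[:HAS_ID|HAS_ADDRESS|HAS_PHONENUMBER|HAS_SSN]-(ah2:AccountHolder)
WHERE id(ah1) < id(ah2)                               // each pair only once
RETURN id(ah1) AS holder1,
       id(ah2) AS holder2,
       collect(DISTINCT head(labels(info))) AS sharedTypes, // e.g. which labels are shared
       count(DISTINCT info) AS sharedItems
ORDER BY sharedItems DESC
LIMIT 25;
```

Pairs at the top of this list share the most identifying information, which makes them the most likely candidates for a synthetic identity ring.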
Let's explore a case where we want two roles with different privileges.

Separating the administrative from the investigative roles

As mentioned above, we want to separate access to the data into two different roles:
  • an administrative role: can see very little, only the very basic information
  • an investigative role: can see everything, in order to accurately run the fraud investigation in all the necessary details.
So let's create these two roles, for two different users:

//in Neo4j Browser first switch to the system database
:USE system;

Then we can proceed:

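//create the users (the passwords below are placeholders: choose your own)
CREATE USER admin_user SET PASSWORD 'changeme1' CHANGE NOT REQUIRED;
CREATE USER investigative_user SET PASSWORD 'changeme2' CHANGE NOT REQUIRED;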
//create the roles
CREATE ROLE admin_role AS COPY OF reader;
CREATE ROLE investigative_role AS COPY OF reader;

Immediately afterwards, we can list the roles in the system database to check that they have been created:
show roles;

This gives us a clear overview.

Next, we grant the roles to the users and add the read restrictions to the administrative role:

//add the roles to the users
GRANT ROLE admin_role TO admin_user;
GRANT ROLE investigative_role TO investigative_user;

//Add read restriction on sensitive properties of certain labels
DENY READ {firstName,lastName,fullName,birthDate} ON GRAPH `fraudgraph` NODES AccountHolder TO admin_role;
DENY READ {zip,streetAddress,city} ON GRAPH `fraudgraph` NODES Address TO admin_role;
DENY READ {phone,touched} ON GRAPH `fraudgraph` NODES PhoneNumber TO admin_role;
DENY READ {ssn} ON GRAPH `fraudgraph` NODES SSN TO admin_role;
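Before looking at the privileges themselves, it can be worth double-checking that the role assignments took effect. A minimal check, run against the system database, could look like this; both are standard 4.0 administration commands, and the second one only lists roles that actually have users assigned.

```cypher
// Run against the system database: every user together with its granted roles
SHOW USERS;

// Only roles that have at least one user, together with those users
SHOW POPULATED ROLES WITH USERS;
```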

Immediately, we can see the result of the update by listing the privileges of the role.
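The exact query used for this step isn't shown in the post; one way to list a role's privileges in 4.0 is `SHOW ROLE ... PRIVILEGES`. Comparing both roles side by side makes the difference obvious: `admin_role` should list the DENY READ entries on top of what it copied from `reader`, while `investigative_role` should only show the copied `reader` privileges.

```cypher
// Run against the system database: every GRANT and DENY held by each role
SHOW ROLE admin_role PRIVILEGES;
SHOW ROLE investigative_role PRIVILEGES;
```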

You can do the same thing for the users.
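For users, a sketch assuming the `SHOW USER ... PRIVILEGES` form of the command is available in your 4.0 installation: it shows the privileges a user ends up with through all of its roles, which is what actually matters when that user runs a query.

```cypher
// Run against the system database: effective privileges per user, across all roles
SHOW USER admin_user PRIVILEGES;
SHOW USER investigative_user PRIVILEGES;
```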

The administrative user's privileges have been reduced, so now all we need to do is rerun the query as that user: disconnect from the server, and log back in with the admin_user credentials. When we run the same query, we see a structure that is identical to the one above, except that it has much less detail: AccountHolders don't show names, SSNs don't show social security numbers, addresses only show the state, and phone numbers only show the operators.
Note that I had to adjust the visualisation in the browser to see these different properties.
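To see the restriction in tabular form rather than in the visualisation, a small check like the one below can help. It assumes you switched users in Neo4j Browser (for example with `:server disconnect`, logging in as admin_user, then `:use fraudgraph`). Properties covered by a DENY READ are treated as if they don't exist, so for admin_user the name should come back as null and should be missing from the list of keys, while investigative_user should see both.

```cypher
// Run as admin_user against the fraudgraph database.
// fullName is denied to admin_role, so it should be null here
// and absent from the keys() list.
MATCH (ah:AccountHolder)
RETURN ah.fullName AS fullName, keys(ah) AS visibleProperties
LIMIT 5;
```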

However, when I disconnect and reconnect again as the investigative_user, I will see all the details again:
I think this is a good example of how you can use the fine-grained access control capabilities of Neo4j 4.0 in a real-world use case. All of the queries in this blog post are in this GitHub gist, so please take them for a spin yourself, or customize them for your database.
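As one example of such a customisation (a hypothetical policy, not something from the original gist), you could go further than hiding properties and hide the SSN nodes from the administrative role altogether. With `DENY TRAVERSE`, admin_user would no longer see paths that run through SSN nodes at all, and `REVOKE` undoes the change again.

```cypher
// Run against the system database.
// Stricter variant: admin_role cannot traverse SSN nodes at all,
// instead of only having the ssn property hidden
DENY TRAVERSE ON GRAPH `fraudgraph` NODES SSN TO admin_role;

// Undo that change again
REVOKE DENY TRAVERSE ON GRAPH `fraudgraph` NODES SSN FROM admin_role;
```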

Hope this was useful - let me know if you have any feedback.

All the best


No comments:

Post a Comment