Monday, 20 December 2021

Cognitive biases in Neo4j

I am an economist/engineer. I studied "Commercial Engineering" in Belgium in the nineties, and was quite an avid learner of economic theories large and small at the time. I did, however, always find myself uneasy at economists' insistence on the rationality of homo economicus, as I knew, and observed all around me, that people were far from rational. That's why, ever since I learned of its existence, I have been a big fan of the field of behavioral economics - which actually tries to formulate economic theories that are real and, oftentimes, irrational. I fondly remember first reading Dan Ariely's Predictably Irrational, and learning about some of the crazy biases that he observed and described. And Nobel-prize-winning Daniel Kahneman has been a hero for decades. I think about the Framing Effect and Prospect Theory almost on a daily basis.

It all started with a tweet

So you can imagine my excitement when I learned about this tweet:

This tweet included this particular infographic:

I then looked into the source of the graphic, and found that it was featured in more detail on this page. Not much later I was thinking how I could actually do something cool and graphy with this wonderful little piece of data. And again not much later I came up with this blogpost below.

First: get the Cognitive Bias data into a spreadsheet

Call me strange, but I love a good little Google sheet. I put the data in there, and took the trouble of adding the category data as well. That was 10 minutes of manual work that I will never get back.

Once I had the spreadsheet nailed, I could easily download the data as a .csv file. That file is then of course ready for import.

Import that data into a Neo4j graph

Import file from .csv

As usual, it takes only a second to import that data into Neo4j, with a very simple query:

    LOAD CSV WITH HEADERS FROM "https://docs.google.com/spreadsheets/d/e/2PACX-1vQDDO_Fqewk1OSR7qrW-2XR7AHhy1MiWmFGwZgLhptaislLP6JLmXgDkR0F331WClserKQz61UDjG8n/pub?gid=0&single=true&output=csv" AS csv
    CREATE (b:Bias)
    SET b = csv;

The result:

 

Next, I wanted to extract the Category information from the Bias nodes.
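Before doing that, it can be worth checking which spreadsheet columns actually ended up as properties on the Bias nodes. The query below is only a sketch: it assumes the import stored every non-empty cell as a property named after its column header, so a category column should only be counted for the biases where that cell was filled in.

```cypher
// Sketch: list every property the import stored, and how many Bias nodes carry it.
// Columns such as Title and Description should appear on all nodes,
// while the category columns should appear on only some of them.
MATCH (b:Bias)
UNWIND keys(b) AS property
RETURN property, count(*) AS nodesWithProperty
ORDER BY nodesWithProperty DESC;
```

If a category column shows up on every single node, the empty cells were probably imported as empty strings rather than left out, and the category checks would need to test for that instead.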

Creating the HAS_CATEGORY relationship

From the spreadsheet, we know that there are 6 different categories:

  • Memory
  • Social
  • Learning
  • Belief
  • Money
  • Politics

Every Bias has one or more categories, but some are actually pertinent to all of them.

So let's run the following queries to CREATE the Category nodes, and connect the Bias nodes to these using the HAS_CATEGORY relationship:

    MATCH (b:Bias)
    WHERE b.Memory IS NOT NULL
    MERGE (c:Category {name: "Memory"})
    CREATE (b)-[:HAS_CATEGORY]->(c);

    MATCH (b:Bias)
    WHERE b.Social IS NOT NULL
    MERGE (c:Category {name: "Social"})
    CREATE (b)-[:HAS_CATEGORY]->(c);

    MATCH (b:Bias)
    WHERE b.Learning IS NOT NULL
    MERGE (c:Category {name: "Learning"})
    CREATE (b)-[:HAS_CATEGORY]->(c);

    MATCH (b:Bias)
    WHERE b.Belief IS NOT NULL
    MERGE (c:Category {name: "Belief"})
    CREATE (b)-[:HAS_CATEGORY]->(c);

    MATCH (b:Bias)
    WHERE b.Money IS NOT NULL
    MERGE (c:Category {name: "Money"})
    CREATE (b)-[:HAS_CATEGORY]->(c);

    MATCH (b:Bias)
    WHERE b.Politics IS NOT NULL
    MERGE (c:Category {name: "Politics"})
    CREATE (b)-[:HAS_CATEGORY]->(c);
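The six statements above differ only in the category name, so they could also be collapsed into a single statement. What follows is a sketch rather than what I actually ran: it loops over the category names with UNWIND and looks up the matching property dynamically with b[categoryName]. It also deviates from the original on purpose by using MERGE for the relationship, so that running it a second time does not create duplicate HAS_CATEGORY relationships.

```cypher
// Sketch: one statement instead of six, driven by a list of category names.
UNWIND ["Memory", "Social", "Learning", "Belief", "Money", "Politics"] AS categoryName
MATCH (b:Bias)
// Dynamic property lookup: b["Memory"] is the same as b.Memory
WHERE b[categoryName] IS NOT NULL
MERGE (c:Category {name: categoryName})
// MERGE (not CREATE) keeps the statement safe to re-run
MERGE (b)-[:HAS_CATEGORY]->(c);
```

Either variant should leave the graph with the same six Category nodes and the same connections, as long as each is run only once.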

Here's the result of those queries:

The graph now looks like this:

 

And then we can immediately see that some biases are more interesting than others - if only because they impact more "categories":

    MATCH (b:Bias)-->(c:Category)
    RETURN b.Title, count(c) AS numberofcategories
    ORDER BY numberofcategories DESC;

The result of this looks like this:

My summary would be: interesting, but a bit boring... So I thought about how to make this little graph a little bit more interesting.

NLP on the Bias descriptions

I decided to apply a method that I have used a few times before: the Bias nodes all have a Description property, which neatly summarizes the meaning of each of the 50 biases. If we run

    MATCH (b:Bias)
    RETURN b.Title, b.Description
    ORDER BY b.Title ASC;

Then we see that we could actually do some useful Natural Language Processing on these descriptions:

So, after having installed the required NLP .jar file in the Plugin directory of the Neo4j server, we can start analysing the descriptions using the Google Cloud NLP service. Here's how that works:

    :param apiKey => ("some fake key here");

    MATCH (b:Bias)
    CALL apoc.nlp.gcp.entities.graph(b, {
        key: $apiKey,
        nodeProperty: "Description",
        scoreCutoff: 0.01,
        writeRelationshipType: "HAS_ENTITY",
        writeRelationshipProperty: "gcpEntityScore",
        write: true
        })
    YIELD graph AS g
    RETURN "Success!";

This returns after a few seconds:

Now you can see a much richer graph.
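One way to get a feel for what the NLP step added is to look at which extracted entities are shared by the most biases. The sketch below assumes that the entity nodes written by the procedure carry the recognised phrase in a text property; that is my understanding of the APOC output, but it is worth checking against a few nodes in your own graph first.

```cypher
// Sketch: the ten entities that the most biases have in common.
// Assumption: entity nodes store the recognised phrase in a `text` property.
MATCH (b:Bias)-[:HAS_ENTITY]->(e)
RETURN e.text AS entity, count(DISTINCT b) AS numberOfBiases
ORDER BY numberOfBiases DESC
LIMIT 10;
```

Entities that sit at the top of this list are the ones most likely to act as bridges between otherwise unrelated biases.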

And of course we can do some interesting querying on this, like for example exploring the paths between two Biases.

    MATCH path =
        shortestPath((b1:Bias {Title: "Reactance"})-[*]-(b2:Bias {Title: "Automation Bias"}))
    RETURN path;

This gives us:
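To read the same path as text rather than as a picture, a variation like the one below could return the hops as a list of names. It is a sketch under a few assumptions: Bias nodes are identified by Title and Category nodes by name (both as created earlier in this post), while the text property on the entity nodes is my assumption about what the NLP procedure writes.

```cypher
// Sketch: return the shortest path as a readable list instead of a graph.
MATCH path =
    shortestPath((b1:Bias {Title: "Reactance"})-[*]-(b2:Bias {Title: "Automation Bias"}))
// coalesce picks whichever naming property the node has:
// Title for Bias, name for Category, text (assumed) for entity nodes
RETURN [n IN nodes(path) | coalesce(n.Title, n.name, n.text)] AS hops,
       length(path) AS numberOfHops;
```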

 

Wrapping up

So, in conclusion: thanks to Elon's tweet, I had another bit of fun with Neo4j, Bloom, and Google NLP. Hope you liked this example as much as I did - let me know your thoughts regardless!

Cheers

Rik

PS: you can import this little graph in no time, without running the NLP yourself, via this Cypher script.
