# A Graph Database and a Dadjoke walk into a bar...

## How many times does a joke get tweeted?

```
MATCH ()-[r:REFERENCES_DADJOKE]->(dj:Dadjoke)
WITH dj.Text AS Joke, count(r) AS NrOfTimesTweeted
RETURN Joke, NrOfTimesTweeted
ORDER BY NrOfTimesTweeted DESC
LIMIT 10;
```
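To make the aggregation explicit, here is a small Python sketch of what `WITH dj.Text AS Joke, count(r)` does: every `REFERENCES_DADJOKE` relationship contributes one row, and the rows are grouped by joke text and counted. The edge list is hypothetical toy data, not the real dataset.

```python
from collections import Counter

# Hypothetical (tweet_id, joke_text) pairs, one per REFERENCES_DADJOKE relationship.
references = [
    ("t1", "joke A"),
    ("t2", "joke A"),
    ("t3", "joke B"),
    ("t4", "joke A"),
]

def top_jokes(refs, limit=10):
    """Group relationships by joke text and count them, like count(r) in Cypher."""
    counts = Counter(text for _tweet, text in refs)
    # most_common sorts by count, highest first, mirroring ORDER BY ... DESC LIMIT n
    return counts.most_common(limit)

print(top_jokes(references))
```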

## How many times does a joke get favorited?

```
MATCH ()-[:REFERENCES_DADJOKE]->(dj:Dadjoke)
// keep one row per joke, not one row per referencing tweet
WITH DISTINCT dj
RETURN dj.Text AS Joke,
       toInteger(dj.SumOfFavorites) AS NrOfTimesFavorited,
       toInteger(dj.SumOfRetweets) AS NrOfTimesRetweeted
ORDER BY NrOfTimesFavorited DESC
LIMIT 10;
```
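Two details matter for a ranking like this one. A pattern that starts from the `REFERENCES_DADJOKE` relationship yields one row per referencing tweet, so without deduplication a popular joke can occupy several of the ten slots. And if the favorite sums are stored as strings (the `toInteger()` calls further down in this post suggest they may be), they sort lexicographically rather than numerically. The Python sketch below shows both effects on hypothetical rows.

```python
# Hypothetical rows as a relationship MATCH would return them:
# one row per referencing tweet, with favorite sums stored as strings.
rows = [
    ("joke A", "9"),
    ("joke A", "9"),
    ("joke B", "100"),
    ("joke C", "10"),
]

def rank_by_favorites(rows):
    """Deduplicate jokes, then sort numerically by their favorite sum."""
    unique = dict(rows)  # one entry per joke text, like WITH DISTINCT dj
    return sorted(unique.items(), key=lambda kv: int(kv[1]), reverse=True)

# Sorting the raw strings in descending order puts "9" above "100" and "10".
lexicographic = sorted({fav for _, fav in rows}, reverse=True)
print(lexicographic)            # ['9', '100', '10']
print(rank_by_favorites(rows))  # [('joke B', '100'), ('joke C', '10'), ('joke A', '9')]
```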

## Different ways of finding jokes about cars

Let's explore 3 alternative ways to find jokes about cars.

### 1. Matching the text of the `Dadjoke` for the word "car"

```
MATCH (dj:Dadjoke) WHERE dj.Text CONTAINS "car" RETURN dj.Text LIMIT 10;
```
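Note that `CONTAINS` is a plain, case-sensitive substring test: "car" also matches "scary" or "Oscar", and misses "Car". The Python sketch below contrasts that with a case-insensitive whole-word match on a few hypothetical joke texts. Cypher's `=~` operator performs a full-string regular-expression match, so a similar word-boundary pattern could be used there (not tested against this dataset).

```python
import re

# Hypothetical joke texts.
jokes = [
    "My car is a real gas.",
    "That scary movie was a scream.",
    "Oscar the Grouch won an Oscar.",
    "Car jokes never get tired.",
]

def contains_match(texts, word):
    """Case-sensitive substring test, like Cypher's CONTAINS."""
    return [t for t in texts if word in t]

def word_match(texts, word):
    """Case-insensitive whole-word test using regex word boundaries."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [t for t in texts if pattern.search(t)]

print(contains_match(jokes, "car"))  # first three texts: car, scary, Oscar
print(word_match(jokes, "car"))      # "My car ..." and "Car jokes ..."
```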

### 2. Checking if the `Entity` contains the word "car"

```
MATCH (e:Entity)--(dj:Dadjoke) WHERE e.text CONTAINS "car" RETURN dj.Text LIMIT 10;
```

### 3. Checking if the `Entity` equals the word "car"

```
MATCH (e:Entity)--(dj:Dadjoke) WHERE e.text = "car" RETURN dj.Text LIMIT 10;
```
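Methods 2 and 3 differ only in how the entity text is compared. On a handful of hypothetical entity strings, a substring match keeps "cars", "car keys" and even "Oscar", while equality keeps only the exact string "car" and still misses "Car" unless both sides are lower-cased first (Cypher offers `toLower()` for that).

```python
# Hypothetical entity texts as they might come out of entity extraction.
entities = ["car", "Car", "cars", "car keys", "Oscar", "spaghetti"]

def by_contains(texts, word):
    """Method 2: substring match, like e.text CONTAINS "car"."""
    return [t for t in texts if word in t]

def by_equals(texts, word):
    """Method 3: exact, case-sensitive equality, like e.text = "car"."""
    return [t for t in texts if t == word]

def by_equals_ignore_case(texts, word):
    """Equality after lower-casing both sides."""
    return [t for t in texts if t.lower() == word.lower()]

print(by_contains(entities, "car"))            # ['car', 'cars', 'car keys', 'Oscar']
print(by_equals(entities, "car"))              # ['car']
print(by_equals_ignore_case(entities, "car"))  # ['car', 'Car']
```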

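## Finding jokes about cars, bikes and spaghetti

Here is another great example. It returns the full paths from Twitter handles, through their tweets, to jokes that mention spaghetti together with a car or a bike, along with each joke's `JACCARD_SIMILAR` neighbours: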

```
MATCH p=(h:Handle)--(t:Tweet)--(dj:Dadjoke)-[r:JACCARD_SIMILAR]->()
WHERE dj.Text CONTAINS "spaghetti"
AND (dj.Text CONTAINS "bike" OR dj.Text CONTAINS "car")
RETURN p;
```
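This part of the series does not show how the `JACCARD_SIMILAR` relationships were computed. The Jaccard coefficient itself is the size of the intersection of two sets divided by the size of their union. The sketch below applies it to the word sets of two hypothetical jokes; using plain words as the sets is an assumption for illustration, and the actual relationships may well have been computed over other features, such as the extracted entities.

```python
def tokens(text):
    """Lower-cased word set; punctuation is not stripped in this sketch."""
    return set(text.lower().split())

def jaccard(a, b):
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

j1 = "my car runs on spaghetti"
j2 = "my bike runs on spaghetti"
print(jaccard(tokens(j1), tokens(j2)))  # 4 shared words / 6 distinct words
```

Two jokes that only swap "car" for "bike" score high, which is exactly the kind of reuse the query above surfaces.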

It's amazing to see how the same conceptual joke is being reused in different ways!

Of course, we can now also start to look at some of the structural characteristics of this part of the Twitterspace. Just from looking at the subgraphs our queries return, it becomes obvious that:

- lots of jokes are being repeated, time and time again
- different Twitter handles actually borrow each other's jokes, all the time

So let's explore that a little more.

### How many jokes are tweeted identically by different tweeters

```
MATCH path = (h1:Handle)-[*2..2]->(dj:Dadjoke)<-[*2..2]-(h2:Handle)
WHERE id(h1)<id(h2)
RETURN path;
```

This takes a while to load, but you can clearly see a few cliques in this picture.

Let's count how many distinct jokes each pair of handles shares:

```
MATCH path = (h1:Handle)-[*2..2]-(dj:Dadjoke)-[*2..2]-(h2:Handle)
WHERE id(h1) < id(h2)
// count each shared joke once per pair, even if a handle tweeted it several times
WITH h1.name AS FirstHandle, h2.name AS SecondHandle, count(DISTINCT dj) AS NrOfSharedJokes
RETURN FirstHandle, SecondHandle, NrOfSharedJokes
ORDER BY NrOfSharedJokes DESC;
```

The result is quite enlightening: GroanBot and RandomJokesIO are clearly reinforcing one another. My personal guess is that they are truly just bots.
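The `id(h1) < id(h2)` condition keeps each unordered pair of handles exactly once, instead of returning both (A, B) and (B, A). Here is a Python sketch of the same pair counting on hypothetical tweet records; sorting the handles and taking `itertools.combinations` plays the role of the id comparison, and counting distinct jokes means that `handle_a` tweeting "joke 1" twice does not inflate its score with `handle_b`.

```python
from collections import Counter
from itertools import combinations

# Hypothetical (handle, joke) tweet records; a handle may repeat a joke.
tweets = [
    ("handle_a", "joke 1"), ("handle_a", "joke 1"), ("handle_a", "joke 2"),
    ("handle_b", "joke 1"), ("handle_b", "joke 2"),
    ("handle_c", "joke 2"),
]

def shared_joke_counts(records):
    """Count distinct jokes shared by each unordered pair of handles."""
    handles_per_joke = {}
    for handle, joke in records:
        handles_per_joke.setdefault(joke, set()).add(handle)
    pairs = Counter()
    for handles in handles_per_joke.values():
        # sorted() + combinations yields each pair once, first < second,
        # like WHERE id(h1) < id(h2)
        for h1, h2 in combinations(sorted(handles), 2):
            pairs[(h1, h2)] += 1
    return pairs.most_common()

print(shared_joke_counts(tweets))
```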

### What are the most frequent entities

We already have the Favorite/Retweet scores of all the dadjokes summed up, so we can also look at which `Entity` nodes have the highest scores that way:

```
MATCH (e:Entity)--(dj:Dadjoke)
WITH e, sum(toInteger(dj.SumOfFavorites)) AS sumofsumoffavorites, sum(toInteger(dj.SumOfRetweets)) AS sumofsumofretweets
SET e.SumOfSumOfFavorites = sumofsumoffavorites
SET e.SumOfSumOfRetweets = sumofsumofretweets;
```
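Conceptually, this query walks every `Entity`–`Dadjoke` connection, converts each joke's scores to integers and adds them up per entity. A Python sketch of that aggregation, with hypothetical jokes, scores and entity links (stored as strings to mirror the `toInteger()` calls):

```python
from collections import defaultdict

# Hypothetical joke-level scores, stored as strings.
joke_scores = {
    "joke 1": {"favorites": "12", "retweets": "3"},
    "joke 2": {"favorites": "5", "retweets": "1"},
    "joke 3": {"favorites": "7", "retweets": "0"},
}
# Hypothetical Entity-Dadjoke links.
entity_links = [("car", "joke 1"), ("car", "joke 2"), ("spaghetti", "joke 3")]

def entity_scores(scores, links):
    """Sum favorites and retweets of all jokes linked to each entity."""
    totals = defaultdict(lambda: {"favorites": 0, "retweets": 0})
    for entity, joke in links:
        for key in ("favorites", "retweets"):
            totals[entity][key] += int(scores[joke][key])
    return dict(totals)

print(entity_scores(joke_scores, entity_links))
# {'car': {'favorites': 17, 'retweets': 4}, 'spaghetti': {'favorites': 7, 'retweets': 0}}
```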

This operation finishes very quickly, after which it is easy to explore which entities our dadjokers are mostly joking about:

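```
MATCH (e:Entity)
// Cypher sorts nulls first in a descending sort, so skip entities without a score
WHERE e.SumOfSumOfFavorites IS NOT NULL
RETURN e.text AS Entity, e.SumOfSumOfFavorites AS EntityFavoriteScore, e.SumOfSumOfRetweets AS EntityRetweetScore
ORDER BY EntityFavoriteScore DESC
LIMIT 10;
```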

Surprise: it's about wives and bosses. Right!

## Wrapping up

What a crazy ride this has been. I could actually think of many different things that I would want to do with this dataset - but I will leave it at this for now. I do think that this has been one of the best (and most FUN) examples that I have come across recently that combines data import, data wrangling, NLP, text analysis, graph data science and disambiguation in one exercise. I really loved it - and hope it will inspire others to explore this or other datasets in the same graphy way.

Cheers

Rik

Here are the different parts to this blogpost series:
Hope they are as fun for you as they were for me.