Friday, 29 August 2014

Neo4j makes my head spin!

Since last month, my kids have been going to the local CoderDojo in Antwerp. It was a great success - and I must say that I myself truly enjoyed attending it as well. It's such a great sight to see kids do something more or less useful with their laptops, learning logic and some programming along the way.

The tool that they are using is Scratch - a great, visual and stimulating way for them to get acquainted with the principles of programming. One of their "assignments" was to create a Pong-like game, which they were able to complete quite easily, to my surprise. And of course I had to goof around with it too, creating "NeoMakesMyHeadSpin". Silly, I know. But here you have it anyway. Push the green flag to start, the right/left arrows to move the racket, and the space bar to introduce the spinning head.


Hope you like it - somehow.

Rik

Thursday, 28 August 2014

Import - summarized

Last night, I did a talk at the fantastic Neo4j meetup in London, summarising the different options that you have when importing data into Neo4j. The talk went well - I think - so I thought I would share the prezi over here:
The prezi includes a couple of movies that take people through the three "demonstration" scenarios step by step. If you want to jump straight to that, then just take a look at this playlist.
Hope this is useful. Please let me know if you have any questions or comments.

Cheers

Rik

Saturday, 23 August 2014

How to fuck up your Neo4j project - the prezi

As I mentioned in the previous blogpost, this "fucking up" story started with some internal discussions and presentations on how our users and customers work with our lovely Neo4j database. I might as well share the - slightly modified - presentation here as well then.
Hope this is also useful. Feedback welcome, as always.

Cheers

Rik

Thursday, 21 August 2014

How to mess up your Neo4j project

Early August, I celebrated my second anniversary of working for the wonderful folks at Neo Technology, makers of Neo4j. It has been a wonderful, inspiring journey so far, with many beautiful things to come I am sure. But along the way, I have also seen some things that I would have preferred not to have seen – mistakes or failures that I believe could have been avoided, specifically around the setup of Neo4j projects. Some of these mistakes are technical - others aren't - but all of them seem like valuable lessons to me. So I thought I would write that up for you – both for your benefit and for my own …
Note to the reader: this post started out as a presentation that I half jokingly made internally in Neo Technology, talking about all the things that users could do wrong in their project. I took the perspective of “what would you do, if you really wanted to mess things up”. Of course, I could have made this point in different ways, and I could have documented “best practices” that talked about “how to do things”. But I decided not to, essentially because this is way funnier – and I am a big believer in using humor to get a point across. So here goes.

Monday, 4 August 2014

World of WarGraph

Today is August 4th, 2014. For most people, that date probably does not mean a lot - but for many people in Europe it probably does - especially if you are from Germany, Belgium, the UK, France - or any of the countries in Central/Western Europe. And for most people across the globe, it probably should mean more than it does - because it is the 100th anniversary of the start of World War 1, when Germany invaded Belgium, violated its neutrality, and Britain declared war on Germany.

I have never lived through a war. I have lived a very comfortable, safe life in Antwerp, Belgium for the past 40 years. But every now and then I go to the Saint Sixtus abbey in Westvleteren to ... indulge in some beer, of course, and then we almost pass through Flanders Fields.


Yes indeed, home of the poppy remembrance symbol - not that we really have that many, at least not today.

Some of these War remembrance monuments are truly, truly moving symbols of pacifism. I took my kids to visit Tyne Cot, as an example - and it was a day to remember. You cannot unsee 20000 graves.


We also took them to the "trench of death" this spring, and to the war remembrance museum in Diksmuide: the Museum on the Yser. Call me stupid, but I also want to take them to the In Flanders Fields museum this summer. I have been putting it off because I know it will scare the cr@p out of them - but I think it's one of those little things that I need to tell my kids: War is NOT nice. Not.


Especially with the daily pictures that we see of the Ukraine Unrest, or worse still, the atrocious war in Gaza... War seems not-so-far away. Will we ever learn?



So earlier this month, I was idling around on the net, thinking about stuff like this, and I came across the Correlates of War website. This project was founded in 1963 by J. David Singer, a political scientist at the University of Michigan, and has been documenting the different wars (or "disputes", as he calls them) in a structured way. Which of course brings me to the meat of this blogpost: the WarGraph. Wouldn't this be a great dataset to look into in Neo4j?

Working the data

Of course, I started by looking into the publicly available Correlates of War datasets. There's more than one that we need to import: one for Countries, one for Disputes, and then there's some interesting "meta-data" around religions (in the countries) and Material Capabilities (of the countries). To start working with this data, I put everything together into a bigger Google spreadsheet, which really is turning out to be my go-to tool these days.

The Graph of Wars Model

Needless to say, we needed to create a graph model of the data before we could do anything meaningful. Here's what I ended up with.
Let me take you through this to explain:
  • Country nodes have a lot of interesting metadata: the CoW people have assembled a lot of data on countries' economic and military capabilities. They have yearly data since the early 1800s, but for simplicity's sake I only imported the 2007 data: 
    • Iron and Steel Production, 
    • Military Expenditures (GBP or USD)
    • Military Personnel (thousands) 
    • Primary Energy Consumption 
    • Total Population 
    • Urban Population 
    • Composite Index of National Capability score: a computed measure, obtained by summing all observations on each of the 6 capability components above, converting each state's absolute component to a share of the international system, and then averaging across the 6 components. 
  • Some metadata needs to be imported in order to make the model more graphy / normalised: 
    • Different kinds of Outcomes are a subgraph, and labeled as such
    • Different kinds of Settlements are a subgraph, and labeled as such
    • Different fatality levels are a subgraph, and labeled as such
    • Different kinds of "Highest levels of Action" in the dispute (HiAct) are a subgraph, and labeled as such
    • Different kinds of hostility levels - which are related to the "highest levels of action" - are also a subgraph, and labeled as such.
  • In order to work with the Years in the graph, I have also connected them to one another in an "in-graph index", aka a timeline. I did something similar with my beergraph a while ago.
  • Religion data is also imported: again, there's a lot of interesting data since the 1800s, but I only imported recent data from 2010.
If you think you need a more detailed explanation of what the different data elements mean, you can always go back to the codebook for more info.
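
To make the capability index described in the list above a bit more tangible, here is a minimal Python sketch of that calculation: sum each component over all states, convert each state's value to a share of that total, and average the shares. The country names, the component keys and all the numbers are made up for illustration; they are not taken from the CoW dataset, and the codebook remains the authority on the exact method.

```python
# Hypothetical component names standing in for the 6 capability components.
COMPONENTS = [
    "iron_steel", "mil_expenditure", "mil_personnel",
    "energy", "total_pop", "urban_pop",
]

def capability_index(observations):
    """Average each state's share of the system total over all components."""
    # System-wide total per component (sum over all states).
    totals = {c: sum(state[c] for state in observations.values())
              for c in COMPONENTS}
    index = {}
    for name, state in observations.items():
        # Share of the system total; 0.0 if nobody reports that component.
        shares = [state[c] / totals[c] if totals[c] else 0.0
                  for c in COMPONENTS]
        index[name] = sum(shares) / len(COMPONENTS)
    return index

# Invented data for three fictional states.
observations = {
    "Atlantis": {"iron_steel": 60, "mil_expenditure": 500, "mil_personnel": 300,
                 "energy": 80, "total_pop": 1000, "urban_pop": 700},
    "Utopia":   {"iron_steel": 30, "mil_expenditure": 300, "mil_personnel": 500,
                 "energy": 15, "total_pop": 3000, "urban_pop": 1000},
    "Erewhon":  {"iron_steel": 10, "mil_expenditure": 200, "mil_personnel": 200,
                 "energy": 5, "total_pop": 1000, "urban_pop": 300},
}

scores = capability_index(observations)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(name, round(score, 4))
```

Because every component is turned into a share first, the scores of all states add up to 1, which makes them easy to compare across countries.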

Import the WarGraph into Neo4j

Then all I had to do was import the data. I used a combination of two techniques here:
The detailed overview of the import process is in the gist over here. It's not difficult at all.

Querying the WarGraph

Once the data is in Neo4j, we can use the Neo4j browser to start looking at some data with simple Cypher queries:

Let's see if we can find the USA (as a country) in this dataset:

MATCH (n:Country {short:"USA"})-[r]-() 
RETURN n,r
LIMIT 10


Seems correct. Now let's start looking at some "war" related information. Let's find the countries that have been involved in the most disputes:

MATCH (n:Country)-[r:PARTICIPATES_IN]->(d:Dispute) 
RETURN n.name, count(r) 
ORDER BY count(r) desc
LIMIT 10;


Interesting. The US and the UK are up there. But so are Germany, France... and Israel (a country that did not exist until not that long ago).

We can then slice and dice this data a bit more. Let's look at the countries with the most disputes "per capita", i.e. relative to their population size. Here's the query:

MATCH (n:Country)-[r:PARTICIPATES_IN]->(d:Dispute)
WHERE n.totalpop is not null
WITH n, count(r) as NrOfDisputes
RETURN n.name, n.totalpop, NrOfDisputes, 1.0*NrOfDisputes/n.totalpop as DisputesPerCapita
ORDER BY DisputesPerCapita desc
LIMIT 10

There's a little trick here with the "1.0*" in the query. This is to force Cypher into a floating point operation before it gets to the division: dividing one integer by another would otherwise be an integer division. If you do it any later the query will fail. Thanks to Alistair for helping me with that.
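
A quick way to see why the "1.0*" matters is to mimic it outside of Cypher. The Python sketch below assumes that both the dispute count and the population are stored as integers; the numbers are invented. Python's "/" already does floating point division, so "//" is used here to imitate a division between two integers.

```python
# Invented numbers: 25 disputes for a population value of 5000.
nr_of_disputes = 25
total_pop = 5000

# Integer division: the fractional part is thrown away.
integer_result = nr_of_disputes // total_pop

# Multiplying by 1.0 first turns the left-hand side into a float,
# so the division that follows keeps the fraction.
float_result = 1.0 * nr_of_disputes / total_pop

print(integer_result)
print(float_result)
```

With the integer version, every country with fewer disputes than inhabitants ends up at 0, so the ranking becomes meaningless.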


Let's now take a look at the time dimension, by running along my in-graph year-index. Let's look at the disputes in the first half of the 20th century:

MATCH (y1:Year {name:1900})-[:PRECEDES*..51]->(y2),
(d:Dispute)-[:STARTED_IN]->(y2)
RETURN distinct d.name as Dispute, y2.name as StartYear;
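
To get a feel for which years the "*..51" in this pattern can reach, here is a small Python sketch that walks a made-up timeline. It assumes that every Year node PRECEDES the following year, and it relies on the fact that a variable-length pattern without a lower bound starts at one hop. The function name and the end year of the timeline are hypothetical.

```python
def years_within_hops(start_year, max_hops, last_year):
    """Return the years reached from start_year in 1..max_hops PRECEDES hops."""
    # Hypothetical timeline: each year PRECEDES the year after it.
    precedes = {year: year + 1 for year in range(start_year, last_year)}
    reached = []
    current = start_year
    for _ in range(max_hops):
        if current not in precedes:
            break  # ran off the end of the timeline
        current = precedes[current]
        reached.append(current)
    return reached

years = years_within_hops(1900, 51, 2010)
print(years[0], years[-1], len(years))  # prints: 1901 1951 51
```

Under these assumptions, the pattern covers start years from 1901 up to and including 1951; a dispute that started in 1900 itself would not be matched, because the start node is zero hops away.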


Or let's take a look if Religion would have anything to do with warfare. Would it? 
Here's the top 10 of countries with most religious adherents that participated in disputes:

MATCH (n:Religion)<-[r:HAS_ADHERENTS]-(c:Country)
WHERE n.name <> "Non-religious"
WITH distinct c.short as country, r.number as nrofadherents
ORDER BY nrofadherents DESC
LIMIT 10
WITH country
MATCH (c:Country {short: country})-[:PARTICIPATES_IN]->(d:Dispute)
RETURN distinct c.short, c.name, count(d);


And then last but not least, let's see if we can explore some paths. Just as an example, nothing more, we can take a look at some links between two countries, for example the USA and Israel:

MATCH (u:Country {short:"USA"}), (i:Country {short:"ISR"}),
p = allshortestpaths((u)-[r*]-(i))
RETURN p;


No world-shocking data in any of these queries, but still very interesting stuff to play around with. All the queries are in the gist over here.

As usual, I would welcome your feedback. I hope this was useful, and that we will all make graphs, not war.

Rik