In the past couple of weeks and months, I have been having a lot of fun at Neo4j working with different clients. One thing struck me, however (maybe it's a coincidence, but still): we have come across an impressive number of customers with very similar requirements. They were all looking to use Neo4j as the foundation for a next-generation POLE database. A what? A P-O-L-E database.
What is a POLE, exactly?
I guess everyone has their own definition and wants to create yet-another-vague-acronym, but the common case seems to be that it's like a "case management" tool for specific types of government agencies that want to look at the links between Persons, Objects, Locations, and Events. Typical examples are police forces, government (tax / social service) agencies, immigration authorities, etc. They all have that same requirement of being able to analyse and link different entities together, like so (or similar):
The Global Terrorism Database
As mentioned above, one of the key areas where people will try to understand the connections between the POLEs is police/intelligence work. In fact, we have noticed that many of the Neo4j use cases that we have worked on are in this domain. So where do you find interesting data on topics like that?
Like in so many cases I can't exactly reconstruct how I got there, but in the end I found the Global Terrorism Database (GTD). They seem to be very strict about their ownership of the data, so here's some legalese for you:
The data was provided by the National Consortium for the Study of Terrorism and Responses to Terrorism (START). (2015). Global Terrorism Database [Data file]. Retrieved from http://www.start.umd.edu/gtd
And I must say: they did an unbelievable job. The interface below is super interesting to play around with in the first place.
Then, after some playing around, I quickly noticed I could actually download the dataset from this page over here. The download consists of two parts:
- a big, tall and wide Excel file.
- a Codebook that explains the meaning of the different data elements in the Excel file.
Opening up the file takes a bit longer than average, but works fine on my machine. It's about 140,000 lines long, and I-don't-know-how-many columns (a lot) wide. In POLE terms, it contains:
- Events: the 140,000 terrorist attacks from 1970 until 2014
- Objects: the weapons / systems / objects used during these attacks
- Persons: the persons / groups of persons (usually) performing the attacks
- Locations: where the attacks took place (by region, country, province/state, city, GPS coordinates)
And actually a bit more than that: the data goes beyond a "simple" POLE, so I thought it would be an even better fit for a potential Graph Model.
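To make the idea a bit more concrete, here is a minimal Python sketch of how one flat spreadsheet row could be split into POLE-style nodes and relationships. It is only one possible reading of the list above. The column names (`eventid`, `gname`, `weaptype1_txt`, `city`) are assumptions in the style of the Codebook and should be checked against it, and the labels, relationship types and the `row_to_pole` helper are hypothetical names made up for illustration.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, str]          # (label, key), e.g. ("Event", "E1")
Edge = Tuple[Node, str, Node]   # (start node, relationship type, end node)

# Assumed column -> (node label, relationship type, direction relative to the Event).
# "in" means the new node points at the Event, "out" means the Event points at it.
MAPPING = [
    ("gname", "Group", "PERFORMED", "in"),
    ("weaptype1_txt", "Weapon", "USED", "out"),
    ("city", "City", "HAPPENED_IN", "out"),
]


def row_to_pole(row: Dict[str, str]) -> Tuple[List[Node], List[Edge]]:
    """Split one flat row into an Event node plus the nodes linked to it."""
    event: Node = ("Event", row["eventid"])
    nodes: List[Node] = [event]
    edges: List[Edge] = []
    for column, label, rel_type, direction in MAPPING:
        value = (row.get(column) or "").strip()
        if not value:
            continue  # skip empty cells instead of creating blank nodes
        node: Node = (label, value)
        nodes.append(node)
        if direction == "in":
            edges.append((node, rel_type, event))
        else:
            edges.append((event, rel_type, node))
    return nodes, edges


if __name__ == "__main__":
    # Made-up sample row, not real GTD data.
    sample = {"eventid": "E1", "gname": "Group A",
              "weaptype1_txt": "Explosives", "city": "City X"}
    nodes, edges = row_to_pole(sample)
    for start, rel_type, end in edges:
        print(f"{start} -[{rel_type}]-> {end}")
```

Note that this sketch keys a City node on its name alone, which would merge different cities that share a name. A real import would probably key locations on country, province/state and city together.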
Creating a GTD POLE model for Neo4j
So after a bit of examination and experimentation in Excel, I ended up drawing out the following Graph Model for the Global Terrorism Database:
Hope this was interesting so far!