Friday 12 February 2016

Podcast Interview with Iian Neill, The Codex

A couple of months ago someone pointed me to this great Neo4j application called The Codex - which is like a semantic application mapping out an "atlas for history". Its author, Iian Neill, recorded a great video about it, which made me want to learn more. There have been some other Neo4j projects in this domain (for example Historiana, which Paul worked on), and as you will see from the interview below, there's a lot to be said about it. So let's get cracking!

Here's the transcript of our conversation:
RVB: 00:02 Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology, and here we are again recording a long-distance podcast all the way from Australia. This is actually the second week in two weeks-- the second episode in two weeks that I'm recording with [chuckles] someone in Australia. And tonight I have Iian Neill on the Skype here. Iian Neill in Brisbane. Hi Iian. 
IN: 00:25 G'day Rik. 
RVB: 00:26 Hey, thanks for coming online. I know it's early for you and it's late for me, but this is a great time for us to chat, right [chuckles]? 
IN: 00:34 Absolutely. It's certainly my pleasure. 
RVB: 00:36 Very good. Iian, we got to know each other a couple of weeks, months ago. At least I started following your projects a little bit, but it will be good for you to introduce yourself to our podcast listeners because most people probably don't know who you are yet. 
IN: 00:53 Okay. My name is Iian Neill. I'm an ASP.NET developer. I have a bit of a background in computers and arts. I've got a Bachelor of Arts in Art History, but I work in IT. I also work for a non-profit art foundation called the Art Renewal Center. But basically, yeah, I've been passionate about art history and been looking for a way to data mine it and hence Neo4j. 
RVB: 01:23 And then we should also immediately mention one of your coolest projects, I think. This is how I got to know you: the Codex, right [chuckles]? 
IN: 01:31 Yes. That's right. Yes. The Codex is something I've been working on for a few years. It's kind of evolved a bit, but it's basically a way-- it's a project I built out of ASP.NET and Neo4j using the C# Neo4j client, and it's a tool that I'm building to sort of-- I call it an atlas of history. It's sort of trying to map history out, and the connections between people and events and places and things like that. 
RVB: 02:03 Okay. And then tell us a little bit more about that. I saw there's a lot of information about like Italian Renaissance, Leonardo da Vinci, Michelangelo, and stuff like that that you're trying to map out what they're doing or what they did, right? 
IN: 02:18 Absolutely. I kind of think of it as being a bit like a Facebook of the past or in some ways even a little bit like a time machine. There was a TED talk on someone doing a project a little bit like that on Venetian history.
But what I really wanted to do was to be able to put myself back in the past and say, "What was happening on a certain day?" So if I saw a certain painting, and what's the context around this painting? Who were the people? What was going on in Florence when this painting was being made? And from that, I started to build the data structure and say, "What else can we find out about this? Can we use the system to abstract out some information? Can we see connections that we might not see if we were just reading a book in a linear way?" And that's kind of what's attracted me to Neo4j. 
RVB: 03:18 Tell us a little bit more about that. What's the relationship between the Codex and Neo4j? How do you use it? 
IN: 03:24 Oh, it's completely dependent on it. A few years ago I had an idea for breaking down a person's biography or life into a series of events, and you can think of it as being a verb phrase. So X meets Y at Place Z, for example. And that's just a data structure. I mean, you have two people, you have a place, and you have a time. And that data structure can be quite powerful for representing sequence of events and connections. And you could then use Cypher to sort of query that, and say, "If I know that X was at this place, at Florence, who else was there at the same time? If X had these friends, do these friends know the other person's friends?" You know? And you can sort of-- once you start down that road, you can sort of keep expanding that with the graph, basically. 
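The "X meets Y at Place Z" structure Iian describes can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Codex's actual model, which lives in Neo4j and is queried with Cypher. The `Meeting` class, the function names, and the sample records are all invented for this sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Meeting:
    """One atomic event of the form 'X meets Y at place Z on a given day'."""
    person_x: str
    person_y: str
    place: str
    day: date


# Invented placeholder records, not historical data.
MEETINGS = [
    Meeting("A", "B", "Florence", date(1478, 5, 1)),
    Meeting("C", "D", "Florence", date(1478, 5, 1)),
    Meeting("B", "E", "Milan", date(1482, 1, 10)),
]


def present_with(person, place, day, meetings):
    """Everyone else recorded at `place` on `day`, provided `person` was there."""
    there = set()
    for m in meetings:
        if m.place == place and m.day == day:
            there.update((m.person_x, m.person_y))
    if person not in there:
        return set()
    return there - {person}


def acquaintances(person, meetings):
    """People who share at least one recorded meeting with `person`."""
    found = set()
    for m in meetings:
        if m.person_x == person:
            found.add(m.person_y)
        elif m.person_y == person:
            found.add(m.person_x)
    return found


def friends_of_friends(person, meetings):
    """People two steps away: known to an acquaintance, but not directly."""
    direct = acquaintances(person, meetings)
    second = set()
    for friend in direct:
        second |= acquaintances(friend, meetings)
    return second - direct - {person}
```

Calling `present_with("A", "Florence", date(1478, 5, 1), MEETINGS)` answers the "who else was there at the same time?" question, and `friends_of_friends` is the first step of "expanding with the graph". In a graph database each extra hop is one more step in the query pattern rather than another hand-written loop.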
IN: 04:18 So I started in that fashion, but then I found that it was a little bit restrictive and a little bit time consuming to take written text and break it down into that kind of atomic way. So instead, I put a different model on top of that, so I put in the event, you know, somebody's diary event for a day in 1478, let's say. And then I could annotate who was there and what was-- the places, and everything that was mentioned. Those are all nodes in Neo4j. And then I put, if you like, subject tags on top of that. So it's a little-- like you would tag a photo or a Twitter post, a hashtag, you might tag it with a description of what's happening. So if you can sort of forgive the macabre example, a popular pastime in the Renaissance was hanging people. 
IN: 05:08 So for example, you might read somewhere that somebody was taken to the public square and they were hung that day. So I started by saying, "Let's put that in there." So I would create a tag for hanging and associate it with that event on that day at that place. And then I thought, "Why not bring a taxonomy to that tag?" So what I mean by that is putting that tag in a hierarchy. So I'd ask the question, "Well, what is a hanging? Well, that's a kind of public execution, and that's a kind of death," or something like that. And I thought, "Well, that could be an interesting scholarly tool for understanding history." So you've got the text of the event, you know who was there, what they were doing, and then you can use the graph and step out by sort of degrees of separation. 
IN: 06:00 You can say, "I'll start with a specific subject like hanging and then I'll go to all kinds of executions, which could be--" they were very creative back then, so you're bringing back lots of events. And then I have followed this procedure for every tag in the system where I can. And probably the last extension I've done to that is I thought, "When you put a tag in the system, why not record a numerical quantity with that?" So if three people were hung, you could put "hanging three." And then I thought, "That gives you chartable information for three." So you have an event, you have all the people there, you have the subject of the activities, and then if you have numbers, you have information that can be visualised as charts. So it occurred to me to bring all these things together. That's [crosstalk]-- 
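The tag hierarchy with quantities that Iian describes could be sketched roughly as follows in plain Python. The hanging / public execution / death chain comes from the interview; the "beheading" sibling, the years, the counts, and the function names are invented for illustration, and the real Codex keeps all of this as nodes and relationships in Neo4j.

```python
from collections import defaultdict

# Child tag -> broader parent tag. "beheading" is an invented sibling.
PARENT = {
    "hanging": "public execution",
    "beheading": "public execution",
    "public execution": "death",
}

# (year, tag, quantity) annotations on events. Invented placeholder figures.
TAGGED_EVENTS = [
    (1478, "hanging", 3),
    (1478, "beheading", 1),
    (1479, "hanging", 2),
]


def ancestors(tag):
    """Yield the tag itself, then each broader tag above it in the hierarchy."""
    while tag is not None:
        yield tag
        tag = PARENT.get(tag)


def totals_by_year(broad_tag, events):
    """Sum quantities per year for every event tagged at or below `broad_tag`."""
    totals = defaultdict(int)
    for year, tag, quantity in events:
        if broad_tag in ancestors(tag):
            totals[year] += quantity
    return dict(totals)
```

Asking for `totals_by_year("hanging", TAGGED_EVENTS)` stays at the specific subject, while `totals_by_year("public execution", TAGGED_EVENTS)` steps out one level and picks up every kind of execution. The per-year totals are the sort of numbers that can be fed straight into a chart.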
RVB: 06:52 It sounds a little bit-- it sounds a little bit like a semantic application, doesn't it? You know, like-- 
IN: 06:57 Yes. 
RVB: 06:56 --triples and those types of things. Is it related to that in any way? 
IN: 07:01 Yes, absolutely. Many years ago when I did my postgraduate IT degree I did a course called "Ontology and the Semantic Web", and that's kind of where it all came from. It was about ten years ago and we used a language called OWL - I think O-W-L - as a modelling language. And I thought it was amazingly powerful for expressing real relationships. And then I was really disappointed to see that there was no practical database out there that could do that kind of thing. It was just sort of SQL. And I sort of failed to translate the OWL model into SQL in an efficient way, and I kind of put it aside. But then a few years ago I came across Neo4j and that seemed a good time to pick it up again. 
RVB: 07:49 Well, that's a perfect segue for my second question. It's, why Neo4j? Why did you use a graph database for this particular project? And then what's so good about it? Any comments on that? 
IN: 08:06 Well, I mean, originally it just started as a side project. As I said, I started with that sort of data structure, that X meets Y at a place. And originally, I just wrote it as a kind of MapReduce-style thing in JavaScript just using JSON, and just querying it through lambdas and so on. And it was always going to be temporary; it's just an in-memory JavaScript. And I started looking around, thinking, "Is there a database that can do this?" And I heard about NoSQL document databases, and I looked into Mongo and RavenDB. But what I found when I looked into Mongo - I read an interesting post; I will try and dig up the link later - is, it was by somebody who had used Mongo extensively and I think they thought that Mongo would be a relational kind of system for them, that it would have some of the power of-- the relational ability of SQL databases. And they realised that it didn't really have that. And I thought, "That's great. I won't go down that road." And then somebody in the comments recommended Neo4j, so I started looking into Neo4j. And it seemed to me the perfect intersection of the power of representing things in a document style and a graph style and then having the relationships as well that make it incredibly fast to query and update. 
RVB: 09:28 Very cool. So it's kind of like what I've heard many people on the podcast say: it's a combination of good modelling fit and then on the other hand, there's also just query power, right? Query possibilities that match this domain really well. 
IN: 09:45 Absolutely. And just to quickly round that up, but I was saying before, I was lucky enough to sit in on a talk that Jim Webber gave in Brisbane that was related to the YOW Conference in I think 2013. And I already knew about Neo4j at that point but going to the talk really convinced me. Jim gave a great description, gave lots of examples from Doctor Who (dataset is over here), which is wonderful [chuckles]. [crosstalk] you'd think.
RVB: 10:15 [chuckles] Yeah. Yeah.
IN: 10:17 And then he gave me a copy of the book as well, on graph databases, and it really went from there. It was absolutely decided that I was going to do that with Neo4j. 
RVB: 10:28 It's so funny. I mean, two weeks ago I spoke to two fellow Australians from Melbourne, and they as well got inspired by that tour that Jim did in-- 
IN: 10:40 Yes [chuckles]. 
RVB: 10:40 --2013 in Australia [chuckles], so it's been a productive visit, that one [chuckles]. Very [good?] [crosstalk]. 
IN: 10:47 Absolutely. 
RVB: 10:48 So the last question I always ask people, Iian, is what does the future hold? Where do you think this is going? Where is your project going and where do you see graph databases as part of that project going? Any perspectives? 
IN: 11:05 Sure. I've got a few plans with Codex. I want to continue-- I want to add the ability to put in more, what you might call, arbitrary data sets. So rather than just having events - you know, what people were doing - I want to be able to put in things like if somebody gave me a record set of births and deaths, or disease, epidemiology figures, or something like the spread of a plague or something, I think it would be possible to integrate that into the system so you could switch between data sets, you could be looking at somebody's life story but then also looking at more official statistics as well. So that's kind of where I'll be taking it in the next few months. One thing I've discovered working on Codex is that-- one thing I didn't expect from Neo4j was that it's such a good tool for modelling that in a way, you can almost-- in most domains, you have one database for one domain. 
IN: 12:12 You have a shopping cart and you have an art gallery collection or something like that, and you sort of think about them as being two separate databases. But with Neo4j, I've found that you can think about it as being one database. You can have multiple domains that if you define points of where they interface - certain commonalities like time or space or location - you can easily take the domain you started with and add other domains to it, so it becomes kind of what I think of as being, like, an integral or universal database in a way. I don't know if that would be appropriate for every solution, but I think it's something that Neo4j offers that I think would be very difficult to do with another database. 
RVB: 12:58 Very cool, very cool. Well, thank you so much for talking about all of this. I really appreciate it. As you know, I try to keep these podcasts quite short so that they are digestible on everyone's commutes, you know what I mean? So we're going to wrap up here, but I really want to thank you again for coming online. Good luck with the Codex and all of your projects, and hopefully we'll get a chance to meet each other at some point. That would be great. 
IN: 13:28 That would be fantastic. And thank you, Rik. 
RVB: 13:31 Thank you. Bye-bye.
Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik
