Wednesday 24 February 2016

What's a graph without a laugh? Multi-lingual Graph Karaoke is HERE!

Indeed. If you follow this blog and some of my work with Neo4j a little bit, then I am hoping you know that I like to have FUN with Neo4j as well. It's actually quite amazing that you CAN in fact have fun with something as "boring" as a database - but hey, I really think you can.

I have been doing Graph Karaoke for a long time (I still blame Nigel) - and that has been a ton of fun along the way. Over the years, I gathered quite a playlist:
And now with GraphConnect Europe coming up in April, I felt like we would really need some more songs to spice up the conference. But: being in Europe - WHY would we limit ourselves to boring ENGLISH songs all the time? So I figured, there has to be a way to do Graph Karaoke in a multi-lingual way. And that's what I will be doing in the next few months - and today is the first episode in "MultiLingual Graph Karaoke".

Google Sheets to the rescue!

We'll start with a nice little Dutch song that I really like, by the world-famous (!) group "Doe Maar!". The song is "Dansmuziek", and has a lovely vibe to it, I think.

I grabbed the lyrics online, and put them in a Google spreadsheet. It looks like this:
As you can see (access the sheet yourself over here), I have one column in there with the lyrics in the original, Dutch language, but I also have a couple of other columns that are automatically translated into English, German and French using the GOOGLETRANSLATE function of Google Sheets. I simply do:

=GOOGLETRANSLATE(C2,"nl","en")
=GOOGLETRANSLATE(C2,"nl","fr")
=GOOGLETRANSLATE(C2,"nl","de")


and these lyrics get automatically translated into the other languages. Obviously the translations will not be perfect, but hey, at least you get to do multi-lingual graph karaoke then! Yey!

Once I have this, I can download the spreadsheet as a CSV file, and then I can start working with it in Cypher. I have created an Import gist on GitHub - so you can basically run it yourself and import the graph at your own convenience.
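
For anyone who would rather not open the gist: an import along the following lines would produce nodes that the queries further down can find. This is a minimal sketch under stated assumptions, not the actual gist. The file name dansmuziek.csv and the column headers seq, nl, en, fr and de are hypothetical, and so are the English, French and German labels (only the Dutch label and the seq and name properties appear in the queries in this post). It also leaves out the SongSentence nodes that the last query filters away, since the post does not describe how those are built.

```cypher
// Hypothetical file name and column headers - adjust to match your own CSV export.
LOAD CSV WITH HEADERS FROM "file:///dansmuziek.csv" AS row
// LOAD CSV reads every field as a string, so convert the sequence number
// to an integer; otherwise a lookup like {seq:1} would not match.
// (On older releases such as Neo4j 2.3 the function is toInt() instead of toInteger().)
WITH row, toInteger(row.seq) AS seq
// One node per language for each line of the song (labels other than Dutch are assumed).
CREATE (:Dutch   {seq: seq, name: row.nl}),
       (:English {seq: seq, name: row.en}),
       (:French  {seq: seq, name: row.fr}),
       (:German  {seq: seq, name: row.de});
```

Because the nodes for one line of the song share the same seq value, a single lookup on seq can bring back that line in every language at once.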

Creating the Karaoke video

Once we have that, it's child's play to actually create the Karaoke video. A few very simple queries suffice:

match (w:Dutch {seq:1})
return w;




to find a sentence in a specific language. Or

match (w {seq:1})
return w;



to find a sentence in any language. And then finally also a "tabular" representation that would be easy to read:

match (w {seq:4})
where not ("SongSentence" in labels(w))
return w.seq as Sequence, labels(w) as Language, collect(w.name) as Sentence
order by Language[0]


All of that then brings me to the following - slightly stupid, I agree - slightly wonderful result:


That's it! I plan to make a few more of these before GraphConnect, and publish them here as well. If you have any "song requests" in your own language, then PLEASE let me know.

Cheers

Rik

Friday 19 February 2016

Podcast Interview with Caleb Jones, Disney

I have said it before, and I will say it again. The overwhelmingly wonderful reason why I keep making the time to do these podcasts, is that I get to talk "shop" for a bit with some of the smartest, loveliest, most interesting people in the industry. It's so cool to talk to people like my next guest, Caleb Jones. Caleb is one of those community members that does not blog / write / speak very often, but when he does - it simply BLOWS you away. Listen to or read the interview below - it is a gem.
Here's the transcript of our conversation:
RVB: 00:01 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo Technology, and here I am again recording another episode for the Graphistania Podcast. And tonight I am joined by Caleb Jones all the way from Seattle in the US. Welcome Caleb. 
CJ: 00:15 Thanks, Rik. It's great to be here. 
RVB: 00:18 I asked you how to pronounce your name and I made a mistake, I am very sorry-- 
CJ: 00:22 Oh that's all right. 
RVB: 00:22 [laughter]. I apologize. So Caleb, why don't you introduce yourself? You've been very active in the community and your blog and everything, but it will be good for you to introduce yourself to our audience, if you don't mind. 
CJ: 00:36 Sure. And again, thanks for the opportunity to kind of sit down and chat. I've been kind of involved in the graph space for years now. My introduction to it actually did come through Neo4j. That was kind of my first intro into the space. I also focus a lot on doing graph analysis and, as you mentioned, I run a blog called AllThingsGraphed.com. I post some of those analyses. Really a labor of love. I'm not getting paid to do it at all. It's just a way for me to kind of just express what I'm exploring and some of the insights as I play around with the graph space. Like you just mentioned, I'm in the Seattle area. Professionally, I work as a software engineer, now more recently, a software architect for Walt Disney Company.
RVB: 01:29 Excellent. I've been reading your 'All Things Graphed' blog for quite some time. You've done some amazing posts on there that I really enjoyed reading, you know. Do you mind telling us a little bit about some of those experiments? 
CJ: 01:46 Sure. 
RVB: 01:46 I'm a particular fan about the antonym synonym pathways thing. There's a lot of other interesting things. 
CJ: 01:53 Yeah. So, if you don't mind, I can just kind of dive right in to, you know, what kind of led me to do that initially. Really what-- so I mentioned Neo4j kind of turned me on to the graph space, kind of opened the door, and then I ran across this essay called Science and Complexity - the one that Warren Weaver wrote (note: download over here). And he wrote it back in the mid-twentieth century. And it kind of lays out these books of science, what he calls problem of simplicity: we have one element acting on another. Problem of disorganized complexity: well, now you're looking at things at system level, but not in terms of the interactions of pieces in there. But then also, a problem of organized complexity. So this is mid-twentieth century, and he says, "Well, problem of organized complexity is really what we're going to need in order to start addressing things like the complexities of medical, psychological, biological, political, economic sciences," and he saw it as a kind of a blocker towards us starting to really explore those problems of organized complexity. 
CJ: 03:06 The compute resources, so again this is back in mid-twentieth century, 1948, and he's seeing how there's this kind of new form of scientific exploration, and analysis, that once we have computational power that's up to the task, we'll be able to start diving into. And, to me, that just screams graphs. Right? You have graphs that they're really designed around that concept of elements and their relationships or interactions with each other and then as you start building up that graph, and then start doing network-wide or graph-wide analysis, you start to have these insights. So, that turned on a lightbulb for me, and I said, "Wow, you start seeing graphs everywhere." And I said, "Well, I've started writing some tools and some code that allows me to do these kinds of analysis, and so I don't have to write code every single time," and that's really what led me to, "Well, I'll start blogging about this, as I start playing around." That's what kind of led me to how I got here, how I got exposed to graphs and where I'm coming from in my blog. 
RVB: 04:19 Yeah. Some of the experiments, you've done some really interesting experiments. What's your favorite one that you've done so far if you don't mind? [Briefly?] a little bit. 
CJ: 04:28 Yeah, definitely. I try to keep a variety. I don't want to do the same analysis over and over again, so I try to use different sorts of data sets and different topics. So I've gone all the way from kind of a microscopic, where I did one on protein interactions of budding yeast, and then even kind of looked at what some of those molecules look like and write a molecule as a graph, right? You have atoms that have certain kinds of connections to each other, right? And then molecule interactions, that's a graph and so forth. You build up and then all the way up to my favorite one was the interstellar network navigation, using graph analysis and that one was a real challenge, but also a real joy to do because astronomy is a big life passion of mine. And so that was really an intersection of a life-long passion and technical skill set, and the right matched up with the right data set. So that was my favorite one. 
RVB: 05:32 That's the one that you presented at GraphConnect, I think, and there's a video about that and everything.

Yeah, very cool. It's funny that you mentioned this protein interaction. That was actually one of the first projects that got me into more of a practical Neo4j insight as well. There was a research group here at the University of Ghent that was doing metaproteomics - protein-protein interaction analysis - for beer yeasts. That's one of the first interactions that I've had as well, so it's funny that you mention that. 
CJ: 06:04 Yeah. 
RVB: 06:06 So I think you've already touched a little bit on why is it so powerful and so interesting for you as well. It's all about dealing with this complexity I suppose. But are there any particular things that you really, really enjoy about graphs that you don't find in other data structures or...? 
CJ: 06:25 Yeah, so one thing that I really enjoy about graphs in particular, is it starts to address these kind of topological questions. What I mean by that is, when you start analyzing a graph and its features, you really start to get insight into the emergent properties of that data set or that system. And so, for instance, in an economic graph or network, what you would start to see is key brokers of transactions in that network. And there are examples like eBay using that for fraud detection and things like that. And on the medical sciences I've mentioned that post I did on budding yeast proteins and their interactions. You can start to tease out what are some of these key proteins that are involved, and when you look at that it turns out that that's kind of a fundamental building block that you see across different kind of types of life. So that's the sort of insight that when you're only looking at an individual protein in that instance and only its immediate connections, you're not going to get that kind of an insight, versus when you start looking at networks and doing things like PageRank analysis, betweenness centrality scoring and so forth. 
RVB: 07:41 You can look at system-wide effects, right? You can look at the entire interaction rather than just the local interaction I suppose? 
CJ: 07:49 Right, and you start getting emergent properties that in some interesting ways aren't necessarily strictly reducible to any one element in the network, right? It's really an attribute of the network as a whole. 
RVB: 08:01 Super interesting. And so many practical applications as well. As you know, I work for Neo as a commercial guy, as a salesperson. I see so many applications in business and from logistics, to financials, to cancer research, there's so many applications of this stuff. It's really quite amazing [chuckles]. 
CJ: 08:24 Yeah. 
RVB: 08:25 Super. Maybe one last question if you don't mind. Where do you think this is going, or where do you want it to go, or where do you want to take it yourself? What does the future hold, Caleb? 
CJ: 08:37 It's hard for any of us to say, but I think graphs are really poised to start being a tool that can be used to answer or provide some sharpness to our answers of some really big questions. It was one of the key things I talked about at the last GraphConnect presentation in San Francisco, finding what are the big questions we want answers to in these different areas, whether it's astronomy or biology, taxonomies like I might find in WordNet. I've done a few analyses on Wikipedia. You know, what are some of these big questions that we can start answering, and how can we use graphs to sharpen our answers to those questions. That's what I kind of see coming out as we start using graphs more and more. 
CJ: 09:27 For me personally, I know in the last couple months I haven't been posting on my blog. I have a few that are kind of building up. One is, I've actually started scraping political candidates' websites, and starting to look at those. 
RVB: 09:44 This is the year to do it right [laughter]? 
CJ: 09:47 Yeah yeah, definitely. But I want to do a new kind of analysis where I'm starting to scrape the topology of those connections but also the content, then do the analysis, then produce word clouds that are segmented based on that analysis, to really tease out what are these-- what does the language really tell us about these candidates? And so that's one thing that's kind of on the horizon for me. On the astronomy side, I actually got my hands on a data set from a universe simulation from a colleague from the Los Alamos National Laboratory, and basically try to replicate the same sort of analysis I did previously on the stellar network, but do it at a galactic simulation level. So that's another thing that's kind of next on the horizon for me. Yeah. 
RVB: 10:43 Wow. That will be something big, wow. I actually talked to someone from NASA a couple of months ago, that was using Neo4j as well, so [chuckles]. That was super interesting, as well. Anyway, Caleb, I think we're going to wrap up here. I really appreciate you coming online and talking to me about all this wonderful stuff. I'll make sure we have enough links to all your great articles and transcription when we publish it. Thanks a lot. Really appreciate it. I hope to meet you at some future GraphConnect.
CJ: 11:18 Yes, definitely. Thanks for taking the time. 
RVB: 11:20 Cheers, man. Bye. 
CJ: 11:20 Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Friday 12 February 2016

Podcast Interview with Iian Neill, The Codex

A couple of months ago someone pointed me to this great Neo4j application called The Codex - which is like a semantic application mapping out an "atlas for history". Its author, Iian Neill, recorded a great video about it, and triggered me to want to learn more about it. There's been some other Neo4j projects (like for example Historiana, which Paul worked on) in this domain, and as you will see from the interview below - there's a lot to be said about it. So let's get cracking!

Here's the transcript of our conversation:
RVB: 00:02 Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology, and here we are again recording a long-distance podcast all the way from Australia. This is actually the second week in two weeks-- the second episode in two weeks that I'm recording with [chuckles] someone in Australia. And tonight I have Iian Neill on the Skype here. Iian Neill in Brisbane. Hi Iian. 
IN: 00:25 G'day Rik. 
RVB: 00:26 Hey, thanks for coming online. I know it's early for you and it's late for me, but this is a great time for us to chat, right [chuckles]? 
IN: 00:34 Absolutely. It's certainly my pleasure. 
RVB: 00:36 Very good. Iian, we got to know each other a couple of weeks, months ago. At least I started following your projects a little bit, but it will be good for you to introduce yourself to our podcast listeners because most people probably don't know who you are yet. 
IN: 00:53 Okay. My name is Iian Neill. I'm an ASP.NET developer. I have a bit of a background in computers and arts. I've got a Bachelor of Arts in Art History, but I work in IT. I also work for a non-profit art foundation called the Art Renewal Center. But basically, yeah, I've been passionate about art history and been looking for a way to data mine it and hence Neo4j. 
RVB: 01:23 And then we should also immediately mention one of your coolest projects, I think. This is how I got to know you: the Codex, right [chuckles]? 
IN: 01:31 Yes. That's right. Yes. The Codex is something I've been working on for a few years. It's kind of evolved a bit, but it's basically a way-- it's a project I built out of ASP.NET and Neo4j using the C# Neo4j client, and it's a tool that I'm building to sort of-- I call it an atlas of history. It's sort of trying to map history out, and the connections between people and events and places and things like that. 
RVB: 02:03 Okay. And then tell us a little bit more about that. I saw there's a lot of information about like Italian Renaissance, Leonardo da Vinci, Michelangelo, and stuff like that that you're trying to map out what they're doing or what they did, right? 
IN: 02:18 Absolutely. I kind of think of it as being a bit like a Facebook of the past or in some ways even a little bit like a time machine. There was a TED talk on someone doing a project a little bit like that on Venetian history.
But what I really wanted to do was to be able to put myself back in the past and say, "What was happening on a certain day?" So if I saw a certain painting, and what's the context around this painting? Who were the people? What was going on in Florence when this painting was being made? And from that, I started to build the data structure and say, "What else can we find out about this? Can we use the system to abstract out some information? Can we see connections that we might not see if we were just reading a book in a linear way?" And that's kind of what's attracted me to Neo4j. 
RVB: 03:18 Tell us a little bit more about that. What's the relationship between the Codex and Neo4j? How do you use it? 
IN: 03:24 Oh, it's completely dependent on it. A few years ago I had an idea for breaking down a person's biography or life into a series of events, and you can think of it as being a verb phrase. So X meets Y at Place Z, for example. And that's just a data structure. I mean, you have two people, you have a place, and you have a time. And that data structure can be quite powerful for representing sequence of events and connections. And you could then use Cypher to sort of query that, and say, "If I know that X was at this place, at Florence, who else was there at the same time? If X had these friends, do these friends know the other person's friends?" You know? And you can sort of-- once you start down that road, you can sort of keep expanding that with the graph, basically. 
IN: 04:18 So I started in that fashion, but then I found that it was a little bit restrictive and a little bit time consuming to take written text and break it down into that kind of atomic way. So instead, I put a different model on top of that, so I put in the event, you know, somebody's diary event for a day in 1478, let's say. And then I could annotate who was there and what was-- the places, and everything that was mentioned. Those are all nodes in Neo4j. And then I put, if you like, subject tags on top of that. So it's a little-- like you would tag a photo or a Twitter post, a hashtag, you might tag it with a description of what's happening. So if you can sort of forgive the macabre example, a popular pastime in the Renaissance was hanging people. 
IN: 05:08 So for example, you might read somewhere that somebody was taken to the public square and they were hung that day. So I started by saying, "Let's put that in there." So I would create a tag for hanging and associate it with that event on that day at that place. And then I thought, "Why not bring a taxonomy to that tag?" So what I mean by that is putting that tag in a hierarchy. So I'd ask the question, "Well, what is a hanging? Well, that's a kind of public execution, and that's a kind of death," or something like that. And I thought, "Well, that could be an interesting scholarly tool for understanding history." So you've got the text of the event, you know who was there, what they were doing, and then you can use the graph and step out by sort of degrees of separation. 
IN: 06:00 You can say, "I'll start with a specific subject like hanging and then I'll go to all kinds of executions, which could be--" they were very creative back then, so you're bringing back lots of events. And then I have followed this procedure for every tag in the system where I can. And probably the last extension I've done to that is I thought, "When you put a tag in the system, why not record a numerical quantity with that?" So if three people were hung, you could put "hanging three." And then I thought, "That gives you chartable information for free." So you have an event, you have all the people there, you have the subject of the activities, and then if you have numbers, you have information that can be visualised as charts. So it occurred to me to bring all these things together. That's [crosstalk]-- 
RVB: 06:52 It sounds a little bit-- it sounds a little bit like a semantic application, doesn't it? You know, like-- 
IN: 06:57 Yes. 
RVB: 06:56 --triples and those types of things. Is it related to that in any way? 
IN: 07:01 Yes, absolutely. Many years ago when I did my postgraduate IT degree I did a course called "Ontology and the Semantic Web", and that's kind of where it all came from. It was about ten years ago and we used a language called OWL - I think O-W-L - as a modelling language. And I thought it was amazingly powerful for expressing real relationships. And then I was really disappointed to see that there was no practical database out there that could do that kind of thing. It was just sort of SQL. And I sort of failed to translate the OWL model into SQL in an efficient way, and I kind of put it aside. But then a few years ago I came across Neo4j and that seemed a good time to pick it up again. 
RVB: 07:49 Well, that's a perfect segue for my second question. It's, why Neo4j? Why did you use a graph database for this particular project? And then what's so good about it? Any comments on that? 
IN: 08:06 Well, I mean, originally it just started as a side project. As I said, I started with that sort of data structure, that X meets Y at a place. And originally, I just wrote it as a kind of MapReduce-style thing in JavaScript just using JSON, and just querying it through lambdas and so on. And it was always going to be temporary; it's just an in-memory JavaScript. And I started looking around, thinking, "Is there a database that can do this?" And I heard about NoSQL document databases, and I looked into Mongo and RavenDB. But what I found when I looked into Mongo - I read an interesting post; I will try and dig up the link later - is, it was by somebody who had used Mongo extensively and I think they thought that Mongo would be a relational kind of system for them, that it would have some of the power of-- the relational ability of SQL databases. And they realised that it didn't really have that. And I thought, "That's great. I won't go down that road." And then somebody in the comments recommended Neo4j, so I started looking into Neo4j. And it seemed to me the perfect intersection of the power of representing things in a document style and a graph style and then having the relationships as well that make it incredibly fast to query and update. 
RVB: 09:28 Very cool. So it's kind of like what I've heard many people on the podcast say: it's a combination of good modelling fit and then on the other hand, there's also just query power, right? Query possibilities that match this domain really well. 
IN: 09:45 Absolutely. And just to quickly round that up, but I was saying before, I was lucky enough to sit in on a talk that Jim Webber gave in Brisbane that was related to the YOW Conference in I think 2013. And I already knew about Neo4j at that point but going to the talk really convinced me. Jim gave a great description, gave lots of examples from Doctor Who (dataset is over here), which is wonderful [chuckles]. [crosstalk] you'd think.
RVB: 10:15 [chuckles] Yeah. Yeah.
IN: 10:17 And then he gave me a copy of the book as well, on graph databases, and it really went from there. It was absolutely decided that I was going to do that with Neo4j. 
RVB: 10:28 It's so funny. I mean, two weeks ago I spoke to two fellow Australians from Melbourne, and they as well got inspired by that tour that Jim did in-- 
IN: 10:40 Yes [chuckles]. 
RVB: 10:40 --2013 in Australia [chuckles], so it's been a productive visit, that one [chuckles]. Very [good?] [crosstalk]. 
IN: 10:47 Absolutely. 
RVB: 10:48 So the last question I always ask people, Iian, is what does the future hold? Where do you think this is going? Where is your project going and where do you see graph databases as part of that project going? Any perspectives? 
IN: 11:05 Sure. I've got a few plans with Codex. I want to continue-- I want to add the ability to put in more, what you might call, arbitrary data sets. So rather than just having events - you know, what people were doing - I want to be able to put in things like if somebody gave me a record set of births and deaths, or disease, epidemiology figures, or something like the spread of a plague or something, I think it would be possible to integrate that into the system so you could switch between data sets, you could be looking at somebody's life story but then also looking at more official statistics as well. So that's kind of where I'll be taking it in the next few months. One thing I've discovered working on Codex is that-- one thing I didn't expect from Neo4j was that it's such a good tool for modelling that in a way, you can almost-- in most domains, you have one database for one domain. 
IN: 12:12 You have a shopping cart and you have an art gallery collection or something like that, and you sort of think about them as being two separate databases. But with Neo4j, I've found that you can think about it as being one database. You can have multiple domains that if you define points of where they interface - certain commonalities like time or space or location - you can easily take the domain you started with and add other domains to it, so it becomes kind of what I think of as being, like an integral or universal database in a way. I don't know if that would be appropriate for every solution, but I think it's something that Neo4j offers that I think would be very difficult to do with another database. 
RVB: 12:58 Very cool, very cool. Well, thank you so much for talking about all of this. I really appreciate it. As you know, I try to keep these podcasts quite short so that they are digestible on everyone's commutes, you know what I mean? So we're going to wrap up here, but I really want to thank you again for coming online. Good luck with the Codex and all of your projects, and hopefully we'll get a chance to meet each other at some point. That would be great. 
IN: 13:28 That would be fantastic. And thank you, Rik. 
RVB: 13:31 Thank you. Bye-bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Friday 5 February 2016

Podcast Interview with Ben Butler-Cole, Neo Technology

Here's another lovely conversation with a dear colleague of mine working in our London office, Ben Butler-Cole. Ben has been a part of the London team for well over 2 years now, and has been working together with the "Thoughtworks crew" (Jim, Ian, Alistair, Mark - and probably some more :)) for a very long time. So he really is part of the family. He became a hero of mine by summing up the corporate culture at Neo very succinctly a while back: "Neo is such a great place to work because we have no ass-holes over here". Or something of that nature :) ... In any case we had a great chat, and I would love to share it:

Here's the transcript of our conversation:

RVB: 00:02 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo Technology, and here I am recording a podcast episode with one of my dear British colleagues, all the way from across the Channel, Ben, Ben Butler-Cole. Hi, Ben. Thanks for joining me. 
BBC: 00:16 Hi, there. 
RVB: 00:18 Thanks a lot for coming online. I appreciate it. 
BBC: 00:20 Not at all. 
RVB: 00:21 Very good. Ben, I invited you because I know you've been doing some really cool stuff on Neo4j in the engineering department, and I think it's always cool for people to sort of have a little bit of a view on how these things work. But before I get into that, why don't you introduce yourself and people might learn a little bit more about who you are and what you do at Neo? 
BBC: 00:45 I'm Ben. I am one of the developers at Neo. I've been at Neo for two-and-a-half years. In that time, I spent some time working on the core product. I've also worked on improving our internal development infrastructure, build systems and release processes and so on. I've spent some time working on building testing tools to allow us to automate and improve the testing of Neo4j, particularly testing in sort of real-world scenarios, testing end-to-end in the same way that our users use it so we can have the chance to beef the product a bit before we release it to the real world. 
RVB: 01:45 Absolutely. 
BBC: 01:46 Most recently I started a new stream of work to improve the operations surface of Neo4j. So we're looking at logging, configuration, packaging, and things like that, trying to really improve those and trying to make the product as easy and nice to use for systems operators as it is for the end users. 
RVB: 02:19 Yeah. Could you tell us a little bit more about some of these build processes and how they work? Just from a really high level. I know everything we do is open source, right, so people can look at some of the source codes on GitHub and stuff like that, but how does it actually work from a developer's perspective? What are some of the big bricks? 
BBC: 02:40 Yes. We use TeamCity, which is a build server, and we take the source that's in the GitHub repo, which, as you say, people can see. We build that, run the tests on that. Then we have quite a long complex pipeline of builds that follow on from that. Some of the testing we do is in the public source code repository. We also have a number of internal private repositories which have testing tools and so on. For each build we probably run 10 or 15 different kinds of tests that we have that test different aspects of the application. Some very simple ones, which we call tyre kicking, which just start the application - install it, start it, run it, make sure that we can write data and read data. And then varying levels of sophistication beyond that. Low-level tests for components where we want to stress-test or performance-test an individual component, and then larger end-to-end tests. Some of the most useful tests we've got are for our testing of our clustering where we do sort of fuzzy testing, where we stand up a cluster and read and write data to it. And while it's running, we knock over individual instances - deliberately crash them or shut them down cleanly - and make sure that the cluster stays up and is resilient to that. That's been a very useful aspect to the testing that we do. 
RVB: 04:30 Wow. Some of those tests, do they take a long time, or is it like instantaneous, or how does that work? Some of these things must take quite some time, no? 
BBC: 04:41 They do, yes. We have tests that run for several hours for that reason, because we're standing up real clusters of servers, and we want to run them over time and make sure that they're stable over a long period of time. 
RVB: 04:58 Very cool, very cool. And you're now starting some new work on the operability, you said, right? There's a lot of people looking at that, I think. Some of our primary users are the administrators, aren't they? 
BBC: 05:11 Yes, exactly. Historically it's an area of the product that hasn't got quite as much love as it might have done. We've been focussing on end-user features of the product, and stability and the reliability of the data storage. But we've taken a decision to make an investment in trying to improve the operability of the product as well now. So we're improving the configuration and the way that works, the packaging particularly. It's not glamorous stuff, but because of my interests I'm very excited that we're doing it. So we're changing the directory structure of the application's tools too so that it's more sympathetic towards the standard ways of doing things on the platform, particularly on Linux, where the majority of our production installs are running. 
RVB: 06:18 Is that work going to be visible in 3.0 or in the 3.x series, or...? 
BBC: 06:23 Yes. Some of that work, the first steps, has already gone into the milestones and into the code base, and will be in 3.0. 
RVB: 06:32 Super cool. 
BBC: 06:33 We're trying to get as much of it as possible into 3.0 as a major release because we're happy to make some backwards incompatible changes for 3.0 because it's a major release, and then hopefully things will settle down for the minor releases that come after it. 
RVB: 06:52 Totally, yeah. I always ask a couple of questions on this podcast, right? One of the things that I'm always interested in is what attracted you to Neo, and how did you get to Neo, and why do you think it's a cool product to be working on. You can give me the real answers, Ben. I know there's a very boring answer to this one [laughter]. 
BBC: 07:15 I spent nearly ten years before I came to Neo working as a consultant. So I saw a huge range of different systems and applications, and built quite a lot of them. One of the really attractive things that I see is what I think is the superiority of the graph model as a way of modelling the real world and the shape of data in the real world, over SQL or key-value stores. I find that very appealing. I think we're effectively making life easier for all those developers who I've been working with over the years, who are struggling with the impedance mismatch between particularly SQL and the applications they're trying to write. 
RVB: 08:18 Very cool. 
BBC: 08:18 On a more personal level, I knew a bunch of people who are here working at Neo Technology, and they were people who I knew and respected, and I was keen to come and work with them. So that was what really sucked me in. 
RVB: 08:32 Exactly. There's a bunch of people that come from the same background at Neo, right, people like Ian and Jim and Alistair and all of those folks? 
BBC: 08:38 Yeah, exactly. And they're all people who I'd worked with before, who I was keen to work with again. 
RVB: 08:43 Very cool. Maybe one more question, Ben, if you don't mind. Where do you think this is going now from a product perspective, from an industry perspective? Anything that you aspire to, or think we should be doing, or that type of stuff? Look into the crystal ball. 
BBC: 09:09 I think there's a lot of work for me to do, carrying on the work I'm doing at the moment for the product. As you know, there are initiatives and exciting new features being built across the product at the moment, for 3.0 and beyond. I'm keen to kind of stick with the boring stuff. I really want to make Neo4j a very easy product to operate. So beyond just cleaning out what's effectively debt that we're working on at the moment, I have ambitions for monitoring, particularly of live systems: to make the software able to explain to people what's going on inside it, and to integrate it with standard monitoring systems. Once we've reached a level where we're happy that it's good enough, I've then got ambitions to start pushing on and improving the state of the art for monitoring particularly, turning monitoring into a kind of feed of events so that it's really easy to understand and interpret the behaviour of the system. That would let the people operating it more or less leave it to tick away on its own, and then help build up the systems that can interact with it and fix problems as they come up. 
RVB: 10:59 Very cool. Well, thank you so much, Ben, for coming online and sharing that with us. 
BBC: 11:04 Awesome. 
RVB: 11:05 It's always great to get like an inside peek into how things work in Neo's engineering world. Really appreciate it. And I would say: You know what? Let's make Neo4j boringly fantastic, right [laughter]? That would be such a great achievement. Thank you so much, Ben. I appreciate it.
BBC: 11:26 All right. Thanks. 
RVB: 11:28 Cheers, man. Bye. 
BBC: 11:29 Bye-bye.
Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik