I was first introduced to Nat in the fall of 2013, when we asked him to give a talk at our London meetup group about the work he did for Sky to optimize the memory usage of their set-top box. Pretty amazing. And so when I reached out to him about the podcast he was immediately up for it, and that's when we had the following late-night, after-the-kids-went-to-bed conversation.
Here's the transcript of our conversation:
RVB: Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology and here I am again recording another episode for our Neo4j graph database podcast. And tonight, I am joined all the way from the UK by Nat Pryce. Hi Nat, how are you?
NP: Hello. I'm very well, thank you.
RVB: Very good, excellent. Nat, thanks for joining us. It's great to have you on the podcast. For those of our listeners that don't know you, do you mind introducing yourself, and telling us a little bit about yourself and what you're doing right now?
NP: Okay. I am a freelance software developer. I work for companies who basically want some consulting, helping with teams and agile software development and software engineering, and actually just developing products. I usually work for companies for a couple of years at a time, working on a product and working with the team to help transfer skills.
RVB: But you've written some books and stuff as well, right?
NP: Yes. I wrote a book with Steve Freeman called Growing Object-Oriented Software, Guided by Tests, which is a book about test-driven development and how it is applied in the wider software development life-cycle and process.
RVB: I saw that, it was really interesting so very cool stuff. So Nat, how did you get into graphs and graph databases? Tell us a little bit about that. I've read some of your posts and I've seen some of your talks but maybe you can tell us a little bit about the history.
NP: Okay. Well, the first thing that got me interested in graph databases was actually I had a crazy idea for writing my own programming language, which would not have a syntax but would actually be represented as an abstract binding graph, in a graph database, and then projected out into the different views that you could manipulate. So your editing of your program would be done by graph transformation. So I looked around to see if there was anything that could do that for me, so I wouldn't have to write it myself, in terms of just drawing the graph, and found Neo4j. It looked very easy to get started with and so, yeah, that's how I picked it up.
NP: It was very easy to get up and running. This was quite a while ago, before they added Cypher, when it was pretty much an embedded Java library, which was exactly what I wanted for the particular project that I was experimenting with. So it was really just a crazy idea that got me interested in it, and then I realized what a useful tool it is. I mean graph databases, I find them very attractive because they've got a very good formal theory behind them, and I find it very natural to represent my data in terms of property graphs.
RVB: Wow. Did that project ever go anywhere [chuckles] if I can ask?
NP: Yeah. I got a very, very simplified Scheme-like language doing function applications and simple calculations. I didn't really take it any further than that.
RVB: So when did you actually ever use Neo in more like a production context? I read some of your work on the Sky set-top box, was that the first time?
NP: Yeah. So I guess my use cases are maybe a little bit different from a lot of other Neo4j users, in that I pretty much use it for ad hoc data analysis. So in that use case, the fact that it's really easy to install and I can throw data into it very easily and then do exploration with Cypher, and then get some visualizations up, is for me the killer feature of it. At Sky, we were working on their set-top boxes, which are embedded systems so they've got quite a limited amount of CPU power and fixed memory, and there's millions of them in the field. You can't upgrade the memory so we were trying to cram more and more functionality onto the box but into a fixed amount of memory. So the memory constraints were becoming more and more of an issue, and as we were trying to get to release, we were getting some out-of-memory situations that we needed to track down.
NP: The box was running a proprietary clean-room Java virtual machine that was optimized for efficient use of space on embedded systems, rather than performance. It didn't have a lot of tooling to analyse its behavior. So we basically had a fixed release date and some memory problems, and we had to build tooling to help ourselves analyse what was going on in the Java virtual machine, which was a proprietary piece of software. So we couldn't look at the source code, we couldn't really understand a lot of its behavior, but we could get heap dumps out of it. The heap dumps came out in a non-standard format, but one that was very easy to parse, and a heap, with objects referring to other objects, has a natural representation as a graph.
RVB: As a graph.
NP: So I immediately thought, "Oh right. Well, I know Neo4j, I've played around with it." I downloaded the latest version, I parsed the data, just used the batch-insert Java API and blasted the data into a Neo4j database on developer workstations. So when we were working with the boxes, we could dump the data out and then query it with Cypher to understand what was going on inside the memory of these machines as they were running.
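(Editor's sketch: the idea Nat describes, treating heap objects as nodes and references as relationships, can be illustrated in a few lines of plain Python. This is not the tooling built at Sky and it does not use Neo4j; the record layout, class names and sizes below are all hypothetical, invented purely for illustration.)

```python
from collections import Counter, deque

# Hypothetical parsed heap-dump records (not the real Sky format):
# object id -> (class name, size in bytes, ids of referenced objects)
HEAP = {
    1: ("Application", 32, [2, 3]),
    2: ("ChannelList", 48, [4, 5]),
    3: ("ImageCache", 64, [6]),
    4: ("Channel", 24, []),
    5: ("Channel", 24, []),
    6: ("byte[]", 4096, []),
    7: ("byte[]", 1024, []),  # no other object refers to this one
}


def bytes_by_class(heap):
    """Summarize how much memory each class accounts for."""
    totals = Counter()
    for class_name, size, _refs in heap.values():
        totals[class_name] += size
    return totals


def reachable_from(heap, root_id):
    """Follow references breadth-first and return every reachable object id."""
    seen = {root_id}
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for ref in heap[current][2]:
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen


totals = bytes_by_class(HEAP)
live = reachable_from(HEAP, 1)
unreachable = sorted(set(HEAP) - live)
```

In a graph database the same two questions (which classes take up the most space, and what is still reachable from a given root) become queries over nodes and relationships instead of hand-written traversals.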
RVB: Super cool. I've seen the talk and I've read the material that you published on it. I think it's a fantastic use [chuckles] because then actually there's quite a few people that are doing things like software dependency analysis on Neo4j as well. It's an interesting field I think.
NP: Absolutely. I think a lot of aspects of software are naturally modeled as graphs: aspects of programming languages, of dependencies, control flow. Graph theory is a really good fit for many software-- different parts of different aspects of software development and understanding software development, understanding the contents of your version control history and all of this. It naturally falls out into graph analytics. In this project, we were discovering all sorts of unusual things about how our own software behaved that we didn't know, about the JVM that we were using, about the Java compiler and how it was behaving. We were discovering things for which we literally only found one page on the internet explaining what we were seeing in our heap.
NP: We got some really good results out of it. We were able to optimize the memory on these boxes and get the release out and it was a big success. But also we were able to-- because we were working with a tool called ProGuard which is also used for Android, and we were finding that the way that Java 5 and above was being compiled into Java bytecode was quite wasteful of memory. So we got in touch with the guy who writes the ProGuard tool and Sky funded him to write new optimizations into the tool, to optimize the memory for these particular cases we were discovering. That ended up being rolled out and then released as open source, so everyone benefited, so that was a good result.
RVB: Super cool. Yeah, absolutely. I've also read some of the more important work that you've done on word puzzles.
NP: Yeah [laughter].
RVB: I thought that was very funny and interesting as well.
NP: Yeah. Again, that was an experiment with graph modeling, something that doesn't initially look like a graph, but actually if you can work out how to model things as a graph. So the puzzles you're alluding to I think are called Word Ladders.
NP: They were invented by C.S. Lewis I think (note from RVB: it seems they were invented by Lewis Carroll, author of Alice in Wonderland), and you need to go from one four-letter word to another four-letter word in a number of steps where you only change one letter at each step, but you always have to change it into another real word. So you can model that as a graph where each step is a link from one word to another. Then solving puzzles is just a matter of finding a path through the graph.
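(Editor's sketch: the word-ladder model described above can be tried out in plain Python. Words are nodes, two words are linked when they differ in exactly one letter, and a breadth-first search finds a shortest ladder. The tiny word list and the function names are assumptions made for this example; this is not Nat's own Neo4j model.)

```python
from collections import defaultdict, deque


def build_graph(words):
    """Link every pair of equal-length words that differ in exactly one letter."""
    # Group words by wildcard pattern, e.g. "c_ld" holds "cold" (and "cild", if present).
    buckets = defaultdict(list)
    for word in words:
        for i in range(len(word)):
            buckets[word[:i] + "_" + word[i + 1:]].append(word)
    graph = defaultdict(set)
    for group in buckets.values():
        for a in group:
            for b in group:
                if a != b:
                    graph[a].add(b)
    return graph


def shortest_ladder(graph, start, goal):
    """Breadth-first search; returns a shortest list of words, or None."""
    if start == goal:
        return [start]
    previous = {start: None}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for neighbour in graph.get(word, ()):
            if neighbour in previous:
                continue
            previous[neighbour] = word
            if neighbour == goal:
                # Walk back through the predecessors to rebuild the ladder.
                path = [neighbour]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# A tiny illustrative dictionary; a real puzzle would load a full word list.
WORDS = ["cold", "cord", "card", "ward", "warm", "word", "worm", "corm"]
ladder = shortest_ladder(build_graph(WORDS), "cold", "warm")
print(ladder)
```

With this word list there are several equally short ladders from "cold" to "warm", so which one is printed can vary between runs; each has five words, that is, four single-letter changes.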
RVB: Super cool, super cool. So maybe we can talk a little bit more about the future, Nat? What do you think graph databases will look like in the next couple of years, and what will they be used for or that kind of stuff? What are your thoughts on that?
NP: I think, thinking about how I use them, and I can see that there's a push to have larger clusters and a lot more processing, but actually for the kind of things I'm using them for, which is ad hoc analytics, what I'd like to see is something that allows me to more fluidly move between exploring the graph with Cypher queries, and then visualizing it and then visually selecting part of that visualization, and then using that as the starting point for another set of Cypher queries to find more data. So the browser in Neo4j is great but limited because it is just like the basic access to the database.
NP: I'd love to see much more interactivity and moving between querying and visualizing and exploring, for the kind of things that I do. Often I'm looking at graphs and I don't really know what I'm looking for, and so I'm exploring around them to try and discover interesting patterns. That's definitely the way we were working on those heap dumps at Sky: we didn't know what our problems were. We would try queries, discover things, use Cypher to summarize the information, and then dig deeper with some more exploration or visualizations. I'd love to see more of that kind of use case provided by tooling around graph databases.
RVB: Is it mostly the visual aspect do you think, or is it more that interactive capability of intuitively going through the graph, or both maybe [chuckles]?
NP: I think it was a mixture of both. There are some excellent visualization tools, I'm thinking of Tom Sawyer and things like that, which are incredible. But I'm a big fan of the Cypher query language, I find it very powerful and very elegant. So what I was finding was I'd be doing a little bit of Cypher to find some information, then I'd explore visually through the browser, and then I'd find some interesting new nodes that I wanted to use as a starting point for more Cypher queries. The current browser doesn't really make that easy, so I could see there could be some tooling around that mixture between writing and running queries and exploring interactively and visually.
RVB: Yeah, absolutely. Well, there's a lot of activity on that front right now, both in the community and at Neo. There's a lot of work going into making the browser better but also there's some fantastic tools out there, both commercial and open source, that help you do that I think. So thank you for that perspective, I appreciate that. So Nat, I think we're going to wrap up here. We like to keep these podcasts short and I know it's getting late here in Belgium as well. I need my beauty sleep [chuckles] so thank you for coming online. It was a real pleasure talking to you and I hope to see you at one of the Neo events in the future.
NP: Yeah, absolutely. Thanks for inviting me.
RVB: Thank you.
NP: Thank you.