Wednesday 1 April 2015

Podcast interview with Ian Robinson, Neo Technology

It's always great to talk to friends and colleagues that you admire and enjoy talking to. So this next episode of our podcast is particularly special - as I got to spend some time talking to Ian Robinson, of Neo Technology. Ian is one of the many "great minds" - in my opinion at least - at Neo, and a lovely, insightful person to talk to. See below. 

Here's the transcription of our conversation:
RVB: Hello, everyone. My name is Rik - Rik van Bruggen from Neo Technology. Here I am again recording another podcast session for our Neo4j Graph Database podcast. It's a remote session again today over Skype, and I'm joined today by Ian Robinson. Welcome, Ian. 
IR: Hello, Rik. 
RVB: Hey, good to have you on the podcast. Ian, lots of people probably know you from your graph database book and lots of conference talks, but for those of us that don't know you yet, would you mind introducing yourself? 
IR: Sure, yeah. Ian Robinson. You've already mentioned a couple of things that I've done whilst at Neo. I've been here about four years. I'm a developer here, but I'm also one of the authors of O'Reilly's Graph Databases book. And for the last few years, I've also spent quite a bit of time talking about and presenting graph database material at different conferences. Now, I'm working on a lot of internal material at Neo and development work here, currently. 
RVB: Would you mind telling us how you got to Neo? What's the history? 
IR: I'm a long-time user of Neo, so I used it prior to joining the company, came across it probably six years ago or so. Whilst working with Jim Webber elsewhere, we both came across Neo and experimented with it whilst working with other customers, developing little prototypes and just becoming fascinated by both the technology and a lot of the opportunities, a lot of the capabilities that it opened up for us. So when the opportunity emerged about four years ago to join the company, it was a no-brainer. 
RVB: Fantastic. Very cool. I always ask the same questions here on this podcast, and one of those is why graphs? What do you love about graphs, and why do you think it's the best thing since sliced bread? 
IR: What do I love? Firstly, I have to say I do love graphs and love graph databases, really love Neo4j and all the work that I've done with it with the customers of Neo over the last few years. I've been thinking about “why graph” again quite a bit recently, and about how, as a community, we've thought about and discussed and proposed the value of graph. I've been thinking about alternatives to the traditional arguments that we make. If we think of the ways in which we normally talk about the benefits of graphs and graph databases, we talk about things like performance and flexibility, the flexibility of the data model. We talk about the way in which the technology aligns well with agile software methodologies. We talk about things such as return on investment and cost of ownership - lots and lots of very good reasons why we believe graphs and graph databases should be adopted and why we all love them. 
IR: But as I've been thinking about these things, I think there are a couple of issues there. One of them is the fact that all of those proposals, they're all effectively what the economists call “proxy variables”. It's a way of talking indirectly about profit and loss, but each of those things in and of itself doesn't share a common measure. So it's very difficult to balance all of these different forces and to make a conclusive argument as to why adopt graph. Without some kind of common measure, if you base your choice of technology around any one of those arguments that I've just mentioned, we kind of run the risk of optimizing around local maxima. That's one of the problems with the ways in which we've discussed graph in the past and the value of graph. The other thing is that actually they're all rather generic arguments. Actually, we could be describing any technology - performance, flexibility, ROI, cost of ownership, all of that kind of stuff. There's nothing there that really talks to me about graph as graph. So I've been trying to think through, and trying to redirect some of my love for graphs into, the economic benefits of graph as graph. How can we take specific features of the graph data model and of graph database technology, how can we take those features and translate them into a model of life cycle profit and loss? 
RVB: Would that be something like cost savings or additional revenue? Is that the kind of thing that you're thinking about? 
IR: Perhaps. It's a multi-dimensional problem. There's lots and lots of different ways into this, but we're trying to find some economic basis for choosing graph over and above anything else. And my way into it is to fall back on some of the work that I've done previously around the graph data model. Like I said, I think there's lots and lots of different ways into this problem, but the one for me is to think about the graph data model and how we can try and translate that into measures of profit and loss. So what I always do is try and create an economic framework that justifies the graph data model. 
RVB: That's really interesting, but doesn't seem like a very straightforward thing to do. 
IR: It's not, and they're kind of tentative thoughts at the moment, but I think it's possible. I think we can reach out to other pieces of literature in other ways in which people are kind of analyzing similar problems and translating them into these measures, and we can apply some of those techniques when we're thinking about graphs and graph databases.
RVB: What [crosstalk]-- 
IR: Sorry, go. 
RVB: [laughter] We're interrupting each other. I've thought about this a little bit myself as well, and you always think about the cost savings or the additional benefits side, but then there's also this part where I think it's just enabling new things. Completely new kinds of applications that were never possible before are all of a sudden possible. How do you measure that, right? 
IR: Yes, yeah, the kind of untapped opportunities or the way in which it reflexively impacts the way in which we think about our world and what we can do in our world. And perhaps it's more difficult there to attach a specific cost to those opportunities that, currently, we have nothing to compare them against. 
RVB: They can be really disruptive, right? 
IR: Yeah, definitely. 
RVB: Super. That's a really interesting answer and lots of things to follow on to that, I think, but we want to keep these podcasts reasonably short, so I'm going to move on to another question if you don't mind, because obviously, maybe that's part of your answer to the next question. Where is this going? Where do you think graph databases and graph technologies will be in the next couple of years? What are your thoughts on that, Ian? 
IR: I think we could look at that in three different areas, three levels of granularity, and actually they become increasingly more interesting the further up the stack that we go. But the lowest level, I think, there's the evolution of the technology itself - the maturation of the technology. And then we can look at the capabilities offered by the data model and that speaks to the thing that you were just talking about - new opportunities and new applications that we've not dreamt of in the past. Then there are changes in the overall business context. So I think under the technology, what we're going to see over the next few years is-- today we're very much focused on Neo4j. That's obviously our core concern, but we're already aware that this is now part of the larger established branch of technology - graph database technology. 
IR: I think we're going to see that technology mature, and in maturing it's actually going to look-- it's going to come to look more and more like the other technology that we use. Today, each graph database is a kind of snowflake technology. But over time, we're going to see a common set of core features and probably a degree of standardization around the ways in which we access these things - the APIs and the query languages. I think one of the things that we have to consider is the way in which that standardization - whether we like it or not - kind of triggers a revaluation of a lot of proprietary languages and proprietary APIs. I think that's something that, again, as a community, we should begin to consider. I think in the future, the technology itself will be differentiated as much on features as it is on things like quality and the operational affordances. Actually, how easy is this to embed into my software development life cycle? How easy is it to operate in production, and what's its quality? What kind of SLA can I be guaranteed with this technology? 
IR: I think that's the kind of lowest level of the ways in which I think things will tend over the next few years. What becomes more interesting then is the way in which, as we apply the technology, as we learn about how we can use it, it has that reflexive force that helps us mine new opportunities. I think that's going to create a kind of profound shift in the way in which we conceptualize and represent our domain, all the things that we're interested in. In the past, in the applications that we have been familiar with in the last 20 years or so, we've kind of taken it for granted that we know what the things are that we're interested in, what the entities are. We've kind of established boundaries, and then we shape our representations, we shape our data models to conform with this kind of upfront view of the way the world is, the way we think the world is.
IR: But I think in the future - and we're already beginning to see this - what we take to be our object of interest will actually emerge from a connected structure, from a graph, based on the way in which we query or look at that structure. I think this is where something like Cypher in particular becomes enormously important. The techniques that we have allow us to discover entirely new kinds of entity or aggregate. In the past, we already knew what our customers looked like, so we went and sought all of the information that we thought we needed to know about those customers, and then we tried to divine some additional insight. But I think in the future we will take that large variably-structured, densely-connected body of data and by experimenting with things like Cypher queries, with graph patterns, we'll discover new subgraphs, new aggregates within that, which become the objects of interest to us. 
RVB: Is that a little bit similar to what, in the triple store domain, is done with inferencing, Ian? Is that similar to that, you think? 
IR: Inferencing, and it's also about creating what I'd call identity functions. I think a Cypher query is effectively an identity function. You execute that query against the graph, and whatever comes back - particularly if this is a really interesting query with variable length paths and untyped relationship names and so on - whatever comes back suddenly becomes of interest to us. One of the things that triggered this was reading the Christakis and Fowler book, Connected, and them showing the ways in which we look at things have changed. The ways in which we look at things such as smoking and obesity have changed, and instead of defining a person's propensity to smoke as being the attribute of this bounded, autonomous entity - the person - instead, they propose this kind of propensity to smoke function which traverses a network of social relationships. And what you get is a new aggregate, a new weird subgraph which is a network of invisible transitive influences, and that's an entirely new thing that is of interest to us. 
RVB: Super interesting. You're actually the second person on the podcast that references that book [laughter]. So for all of our listeners it's probably a really interesting thing to take a look at. I'll put it on-- 
IR: Yes, it's very good stuff, yeah. 
RVB: It really is great stuff. I'll put it in the blog post with the podcast as well. Cool. Ian, I'm going to wrap up this conversation, if you don't mind. It's been really interesting. We'll have more time at one of the next meetups or conferences - GraphConnect, perhaps, right? Thank you so much for taking the time. 
IR: You're welcome. 
RVB: I really appreciate it and look forward to continuing the conversation online or face to face [chuckles]. 
IR: Wonderful. Thanks, Rik. 
RVB: Thank you. Bye-bye. 
IR: Bye-bye.
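
A small footnote to the part of the conversation about a "propensity to smoke function" that traverses a network of social relationships instead of reading an attribute off a single person. The sketch below is only an illustration of that idea in plain Python, not anything Ian showed and not how Neo4j or Cypher work internally. The names, the toy data, the three-hop limit and the halving of influence per hop are all assumptions made up for the example. The traversal is roughly what a variable-length, untyped Cypher pattern along the lines of `(p)-[*1..3]-(other)` expresses.

```python
from collections import deque

# Hypothetical toy social graph: undirected edges, relationship types ignored.
FRIENDS = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}
SMOKES = {"bob", "eve"}  # made-up attribute values


def neighbourhood(graph, start, max_hops):
    """Return {person: hop distance} for everyone within max_hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(person, []):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    del distance[start]  # the starting person is not part of their own neighbourhood
    return distance


def propensity_to_smoke(graph, smokers, person, max_hops=3, decay=0.5):
    """Sum the influence of nearby smokers, scaled by decay per hop (an assumed weighting)."""
    reached = neighbourhood(graph, person, max_hops)
    return sum(decay ** hops for other, hops in reached.items() if other in smokers)
```

With this toy data, `propensity_to_smoke(FRIENDS, SMOKES, "ann")` reaches bob at one hop and eve at three hops and returns 0.625. The interesting part is less the number than the subgraph returned by `neighbourhood`: it is the kind of "new aggregate" that only exists because of how the graph was queried.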

Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!

All the best

