Friday 10 April 2015

Podcast interview with Philip Rathle, VP of Products at Neo Technology

As you may have heard here and there, Neo Technology released this amazing new version of Neo4j - version 2.2, about 2 weeks ago. Perfect time for me to go to Philip Rathle, the VP of Products of Neo, to talk about where we are and where we are going. Great conversation, as you might expect - and also a bit longer, as you also might expect. We had so much to talk about.

Here's the transcript of what we talked about:

RVB: Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology and here we are again recording another great session for our Neo4j Graph Database podcast. It's a remote session again over Skype and all the way across the Atlantic this time because I've got Philip, Philip Rathle, VP of Products for Neo, on the other side of this Skype call. Hi, Philip.

PPR: Hi Rik and hello to your lovely listeners.

RVB: Yes indeed. There's a little bit of a delay on the line, I think, but we'll manage. Thanks for joining us and do you mind introducing yourself a little bit, Philip, so that our listeners know who you are exactly.

PPR: Sure. So I'm Philip Rathle. I work out of our San Mateo, California headquarters here in Silicon Valley and I've been with Neo for almost three years and essentially I do all things product management.

RVB: Yeah, absolutely. As I recall, you started about the same time as me at Neo, didn't you?

PPR: I think so. I think we were within a couple months of each other.

RVB: Absolutely, yeah. So Philip, this podcast is always nice and short and sweet. There's two big topics that I want to cover with you. The first one really is, what do you love about graphs, and what attracted you to graphs? You've got a long career in databases behind you already. What brought you to the graph, and what do you love about them?

PPR: You're right. I had spent most of my career - I guess close to 20 years at that point - working with data, and working with databases, and of course all of that was relational for the most part. And some of that was doing consulting, some of that was doing DBA and data modeling work and some of that was actually doing product management around tooling for data modeling, and database administration, database development long before it was popular. Thankfully, it's gotten popular these days. I've been fascinated with this idea that there's all this information and that there's an opportunity to better interact with the world, be more effective, use time better, give people what they need, what they want faster by leveraging information in good ways.

PPR: And there's this one distinction we'd always talked about, was the difference between data and information. Data is this raw stuff that's use at your own risk, and then you synthesize it and come to understand it and model it, and use it, and it becomes information and therefore valuable. And I was fascinated with this idea that you could have a data model that reflected where the logical and physical model were the same, meaning that where the way that a business person viewed data and viewed information could actually be much, much closer - if not almost identical - to the technologist's view. And one of the big disconnects that's always happened with projects, and one of the big costs, and a lot of the frustrations come out of the fact that business and IT are misaligned.

PPR: But I don't think that's the root cause. I think that's a symptom of the fact that the view-- the business and the IT were viewing things through-- viewing the same thing through a very different lens, and so it became hard to communicate. That was the thing that really hooked me to start, and as I started digging more and seeing the kinds of other things that people can do with it and the kinds of performance you can get out of a native graph database, and the kind of schema flexibility, and not having to spend months and wait for migration windows, where you would do this huge all or nothing thing, and maybe spend half an evening rolling it forward and half an evening rolling it back, those were pretty nice side benefits you could say.

RVB: Wow. Yeah, absolutely. I mean, the model is something that has been coming over this podcast time and time again. It's something that people really love about the graph. Maybe we can turn a little bit to what about Neo4j specifically, right? We've released this beautiful new version 2.2 this week, congratulations.

PPR: Super happy about that.

RVB: Yeah, absolutely. Everyone is, I think, and the feedback has been super. But what do you love about that? What's so great about 2.2, maybe a couple of minutes on that?

PPR: Well at the time I joined Neo-- and by the way I felt so great about the stuff, I actually joined on my birthday, can you believe that? At the time, and this was in mid 2012, I saw that this had amazing potential, not only potential it was actually being used for really serious stuff by some big companies, by some cool startups. And the observation though is that it's really, really amazing, but it takes a little bit of hacking and wiring and working around things to get it to work. So, it was an amazing technology if you invested some amount of time getting it working.

RVB: Right. That's where I get my gray hair, by the way [laughter].

PPR: Yeah. I believe you [laughter].

RVB: The early versions-- I mean the early versions of Neo were difficult, right? I mean, they were much more difficult than the ones that we have right now.

PPR: Yeah, this is what happens when you have brilliant engineers who are focused on the really, really hard stuff, which is building a database engine that is reliable and fast and scalable, and that's such a gargantuan task that it can be easy over time to forget the easy stuff. And it's not easy actually, user experience and defining the right surface and access methods and tooling isn't easy, but it's a very different mindset. And so we-- around the time I joined, all of the work-- well, it's an ongoing thing, but there had been so much work done to create a database that was solid and fast and could scale, that there hadn't been very much investment in actually taking that technology and making it more broadly accessible and easily usable. And so since the time I've joined, it's been an ongoing journey of - and we can talk about the different release themes and how we shift from release to release - what our focus is on, but it's steadily evolved to become something that's not only approachable, but I think really pleasant in a lot of ways. I mean, geez, Graph Karaoke, how many databases do you see doing that?

RVB: Absolutely, I'm a big fan [laughter]. No, absolutely. So I mean it's been a fantastic journey, I think, both in terms of usability in the 2.X series of Neo4j, but in 2.2 I think it's amazing what we're seeing in terms of performance, right?

PPR: Yeah. So with 2.0 the focus was-- let's see, I joined just before we started working on 1.9. I think 1.8 had just come out and 1.9 was all about improving the infrastructure used to do the clustering so you didn't have to run a ZooKeeper cluster alongside a Neo4j cluster. And then with 2.0 the shift with the major version was we're going to focus on the Cypher query language and-- as opposed to essentially native Java APIs, which if you're not a Java developer or you're not into writing lots of imperative code are nowhere near as approachable and convenient as writing a declarative query, particularly one that has these characteristics of compactness and readability with your nodes enclosed in parentheses, and your relationships with your arrows and so on.

PPR: And to do that, we found we actually needed to change the fundamental model and add this thing called labels. And we also decided-- had this observation that the user interface that we'd had up until then, while we considered it, at least I considered it, really, really limiting because I came from a tooling background, turned out to be something that people really, really loved and appreciated. And that told me anyway, that what they appreciated was the power of being able to actually visualize the graphs. That's a unique aspect of the model. We focused on those three areas, came out with a release that was much more consumable. As you often do with these things, you swivel the chair. You work on features, and then you swivel the chair back and you say, "Okay, I'm going to take the whole thing, but particularly the ensemble including these new features, and I'm going to make it perform even better in every way." And perform means latency, i.e. response time. It means response time under high load, because response time of one query at a time is maybe what you notice when you're trying the technology out. But that's ultimately not working. You use it for production, you're going to have lots of things happen all at the same time.

PPR: And actually, one of the things that wasn't headlined was a huge investment in quality. We have, on any given day, dozens to hundreds of Neo4j instances on cloud hardware, physical hardware, clusters, not clusters, big clusters, small clusters, doing all sorts of tests, long running tests, and stress tests, and load tests, and let's pull the plug tests. That's ultimately what a database needs to be. It needs to be resilient across a whole range of edge cases where any given person is going to be dealing throughout the course of the life of their application with hundreds or thousands of those edge cases. So, there are hundreds of thousands of tests that run internally on a daily basis and just hammering the database. That's always happened, but it's happened even more, like significantly, significantly more with 2.2, so I feel really, really great about that.

RVB: Yeah absolutely, I think the initial feedback has been absolutely fantastic. It's been a really proud event this week. Maybe I can switch gears a little bit and ask you one last question, Philip, if you don't mind. And that is: you're VP of Products, so what does the future hold? Where do you see this going, maybe short term but primarily also long term? Where do you see Neo and graph databases going?

PPR: Let me answer that, maybe start with a different place than you might expect which is to talk about where the market is going because the product needs to reflect where the market wants to go and where the market can go even though it doesn't realize it yet [chuckles] and it doesn't know it. It's a secret old--

RVB: The Steve Jobs approach, right? [chuckles].

PPR: I might see an application and say, "Okay, it'd be really convenient to have a big red button here." But actually, for someone, it may be that the best solution isn't to give me a big red button, it's to address something two or three levels back where that screen or that interaction doesn't even need to happen. And so where the market seems to be headed is there's a wider range of use cases where businesses are finding it valuable to not just use the data and look at data as things in isolation, and not just view joining data as reconstituting something that you needed to break apart because the relational model required it, but to actually understand the causality and the relationships and the effects between related things. That's a world we live in, and we've oversimplified it for a long time I think maybe because we didn't know better, but actually more because of the technology limitations we've had. You can't have a high performing native graph database without very fast, random IO, because you're hopping. You're doing pointer chasing. And in the days where you had very little memory and spinning disk, tape, and punch cards or whatnot, that just wasn't feasible and so--

RVB: That's probably why the old CODASYL databases failed, isn't it [chuckles]?

PPR: Well, I think there are a few reasons for that. I think the other is you didn't have the model flexibility either. You still put things in buckets, and so rather than having individual data items have a relationship with other individual data items, so, "Rik colleague with Philip", you actually were creating structures into which you-- which are generic buckets, and you throw those in, and that's not dissimilar from relational databases. Of course you have an equivalent on the logical side. There's a conceptual meta model with a graph, but then the data itself actually looks just like the meta model: you're relating physically individual things, not buckets of things. And so, the technology has evolved to make more things possible and it's now just a question of how fast, and in what directions, that wave will grow across different industries and different use cases where there's an appreciation and discovery of what are the new things that I can do, or what are the existing things that I can maybe do in real-time instead of batch pre-compute?

RVB: So what might be your personal favorite use case? Do you mind pulling one out?

PPR: Yeah. Geez, there's so many. People talk a lot about Internet of Things these days but what would that be without the connections? I think of it as Internet of connected things. That's certainly one. Identity and access management is one that maybe isn't immediately intuitive to people, but it's-- I have a content hierarchy on one side and a person to group, to group, to group hierarchy on the other side. And of course these things aren't always strict hierarchies, which if you have just one top to bottom or side to side or--

RVB: They're multidimensional right?

PPR: Yeah, then it becomes a graph effectively, and then connecting these two hierarchies adds another dimension. That's actually a really good one. Sometimes we see ones that are really unexpected and fun and that's part of what's made this whole journey really interesting. I love going to the graph gist page-- I think it's or maybe its

RVB:, I believe.

PPR: Where people will just come up with wild and crazy but often times it-- of course that's a really good fit. Weighing an airplane was a surprising one. I didn't expect that. Turns out you can do it much faster with a graph.

RVB: Cool. What I'm hearing is lots more beautiful use cases to come, right? So that's the most important thing you see coming up?

PPR: So that then drives the features, and I think, "Where does it create demand?" It creates demand for more convenience, so there'll be more of that, improvements in the developer experience, the ops experience and so on, as well as continued improvements in scale and performance. Those are really the themes we track and then quality and reliability underlying all of that.

RVB: Cool, Philip. We've already gone 18 minutes, so I'm going to wrap up if you don't mind, because we want to keep these reasonably short.

PPR: Let's do it.

RVB: Thank you so much for coming on the podcast. I really appreciate it. If people want to know more about Neo4j, there's only one place to go. You go or @neo4j on Twitter. If you want to reach out to us, I'll put the email addresses on the blog post with the podcast. Thank you so much Philip. It was great talking to you.

PPR: Bye Rik.

RVB: Thank you, bye.

Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!

All the best

