Wednesday, 17 August 2016

Podcast interview with Stefan Plantikow, Neo Technology

Today's episode in the Graphistania podcast is one that I have really been looking forward to, for many reasons. First of all, our guest is such a lovely guy - feels like I could go out on a VERY long pub crawl with Stefan - seriously. Then, he has been working on some of the most interesting topics in Neo4j - another bonus. Most recently, he has worked on the "swiss army knife" of Neo4j tooling, the Awesome Apocs. Enough reason to have a good podcast chat together - and here that is:


Here's the transcript of our conversation from July 4th, 2016:
RVB: 00:02.518 Hello everyone, my name is Rik, Rik Van Bruggen from Neo, and here we are again, recording another Graphistania podcast, and today I have one of my lovely colleagues from the engineering team with me, Stefan Plantikow from Berlin. Hi Stefan.
SP: 00:18.163 Hi Rik, how are you? 
RVB: 00:20.004 I'm really well. Thank you for joining me. I'm hoping you'll take good care of my missus, my wife, who is visiting Berlin today [laughter]. 
SP: 00:30.208 I'm sure she'll have a splendid time. 
RVB: 00:32.388 I think so too, absolutely. Anyway, Stefan, I invited you because you are one of Neo's engineers, and you've been doing some really cool stuff, but maybe you can introduce yourself to our audience here? 
SP: 00:46.247 Yeah, my name is Stefan Plantikow, I've been working with Neo for almost four years now, I think, and I've been pretty much all over the place. I've been working on the kernel, a bit on the indexing side, I've spent a lot, a lot of time on Cypher, and recently I've been working on the new driver surface. Still I think my focus at Neo is around Cypher the language. I'm currently working on the openCypher TCK, which is a way to help people certify that other implementations of Cypher are conformant with Cypher as we have it today in Neo4j.
RVB: 01:26.769 So a couple of things that you mentioned there, the driver surface, that means the new Bolt protocol, and the way drivers interact with Neo4j, right? 
SP: 01:36.819 Yes. 
RVB: 01:37.406 And the TCK, that's part of the openCypher initiative, right? 
SP: 01:43.445 Yes. That's the future for Cypher, I think.
RVB: 01:47.959 I'm surprised that you haven't mentioned your wonderful work on Apocs. 
SP: 01:52.206 [chuckles] Right [laughter] Yes, I did actually have-- 
RVB: 01:55.684 You know where this is going, right, Stefan? 
SP: 01:58.764 I know where this is going. 
RVB: 02:00.634 Tell us that story, please. 
SP: 02:01.823 I can tell you that story. I think as part of the Drivers initiative-- let's maybe pedal back a little bit. So Drivers and the Bolt protocol is a new way to interface with Neo4j, right? It gives you a new super-efficient binary protocol for talking to the graph database, and perhaps more importantly, it gives you a unified API. So whatever language you use, you have a very standard way of talking to the graph database, a really nice API, and you also have the same capabilities everywhere, so it doesn't matter if you're a Python developer or a Java developer, you can always use Neo4j and get the same things. Now, long-term users of Neo4j are surely aware of a feature we have that's called Unmanaged Extensions, which allows you to plug your own code into the database.
RVB: 02:54.139 Absolutely. 
SP: 02:55.407 And those have been exposed in the REST API, so what happened is of course that people asked the question, now in the world of Bolt, how am I going to do that? And because Bolt is purely Cypher driven, so everything you do there, you do via Cypher, we needed a way of exposing interesting functionality or extensions via Cypher, and the way we came up with eventually was procedures, and that happened as part of the Bolt project. Procedures are the way to call a piece of user code as part of your Cypher Query. You can just match around in your graph, find stuff and then hand it all off to your custom procedure, have it execute some custom business logic, have it talk to some third party system, whatever it is, execute some graph analytics algorithms perhaps even, and then return you a stream of results. Then continue with that, even in Cypher. This has been very, very well received, but it's interesting here, the story, because we did it initially to cover the ground, to be sure that you can do the same things in Bolt that you could do before in the REST API. But it turned into this beautiful thing, where people jumped onto it and my colleague Michael Hunger created a repository called Apoc, off public procedure, that you can just hook into your database. People have been adding and contributing and committing to that at a really, really fast pace.
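Editor's sketch: the flow Stefan describes (match rows, hand each one to a procedure, get a stream of results back, keep going) can be modelled in a few lines of plain Python. This is only a toy model of the idea, not how Neo4j implements it; `match_people`, `friend_slots` and the data are made up for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Procedure = Callable[[Row], Iterable[Row]]


def call_procedure(rows: Iterable[Row], procedure: Procedure) -> Iterator[Row]:
    """Toy model of a procedure call inside a larger query.

    The procedure runs once per incoming row; every row it yields is
    merged with the incoming row and passed on downstream.
    """
    for row in rows:
        for produced in procedure(row):
            yield {**row, **produced}


def match_people() -> Iterator[Row]:
    # Hypothetical stand-in for the MATCH part of a query.
    yield {"name": "Ann", "friends": 3}
    yield {"name": "Bob", "friends": 0}


def friend_slots(row: Row) -> Iterator[Row]:
    # Hypothetical user-written procedure: one output row per friend.
    for slot in range(int(row["friends"])):
        yield {"slot": slot}


def run_query() -> List[Row]:
    # Roughly: MATCH people, CALL the procedure, RETURN name and slot.
    stream = call_procedure(match_people(), friend_slots)
    return [{"name": r["name"], "slot": r["slot"]} for r in stream]
```

In this model an incoming row for which the procedure yields nothing (Bob here) simply produces no output rows.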
RVB: 04:28.963 This has really been like the nugget of gold of the 3.0 release, I think. It's been really, really well received, right? 
SP: 04:37.553 Yeah, and that was really amazing because [chuckles] it felt like the thing that would have been missing from the product if we hadn't done it, but still I think I didn't anticipate it to be so well received. It was like super well received. 
RVB: 04:53.736 Especially also because it was something that started as a weekend project for you. Is that what I understood?
SP: 05:01.367 Yeah, a little bit like that. What was initially planned was to be able to call procedures just like that, and then I wasn't really happy with that, because it wouldn't allow me to combine procedures with other parts of your Cypher Query, so that would not have been part of 3.0 initially. And then I sat down and spent a couple of weekends, and said, "I need to make this happen." It just didn't feel complete to not be able to combine procedure calls with other parts of the Cypher Query. And that's where it gained its versatility. Take for example, you have a way of importing data from other systems now, via Apoc. You can just call into a JDBC driver, and pull data out of some legacy relational store. And that's already nice, but what would you do after? But now you can directly feed that into a Cypher Query, then you can create nodes, create relationships, and suddenly it has a much, much higher utility than if you had been able to just call the procedure.
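Editor's sketch: the JDBC import Stefan mentions might look roughly like the following, written as a Cypher string wrapped in Python. It assumes the APOC library is installed and uses its `apoc.load.jdbc` procedure; the `Person` label, the `id` and `name` columns, and the helper names are hypothetical. The `$param` syntax is the one used by newer Neo4j versions (Neo4j 3.0 wrote parameters as `{param}`), and `session` is expected to be a Neo4j driver session or anything else with a compatible `run(query, parameters)` method.

```python
from typing import Any, Dict, Tuple

# Load rows over JDBC, then keep working with them in the same query.
# The Person label and the id/name columns are hypothetical.
IMPORT_QUERY = """
CALL apoc.load.jdbc($url, $table) YIELD row
MERGE (p:Person {id: row.id})
SET p.name = row.name
RETURN count(p) AS imported
""".strip()


def build_import(jdbc_url: str, table: str) -> Tuple[str, Dict[str, Any]]:
    """Return the Cypher text and its parameters for one import run."""
    return IMPORT_QUERY, {"url": jdbc_url, "table": table}


def run_import(session: Any, jdbc_url: str, table: str) -> Any:
    """Run the import through a driver session (or a compatible object)."""
    query, params = build_import(jdbc_url, table)
    return session.run(query, params)
```

The point is the second and third lines of the query: the rows yielded by the procedure feed straight into ordinary Cypher clauses that create or update nodes.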
RVB: 06:00.157 Absolutely. I mean I think it is one of those examples where I love the way how our ecosystem, whether it's engineers like you, or people around us, that basically provide this amazing innovation, when we haven't always planned for it. It's not like we're planning for these types of features to be built like cathedrals. It's more like a bazaar, that evolves the way it wants to evolve almost. It's super interesting actually, the open source way of doing things. 
SP: 06:34.873 Yeah, that's also part of the fun with working at Neo, that on the one hand of course, there's a stellar team that plans releases and drives things forward and all that, but then there's also the whole organic side where someone can jump in and just they feel very strongly about something, and this can get into the release and everything gets better, better from that. It's really nice to strike a middle ground between these two sides of it. 
RVB: 07:00.544 A question I've been asking everyone on this podcast a little bit is, how did you get into the wonderful world of Neo and graph databases and why do you think it's such a cool place to be? Any perspectives on that, Stefan? 
SP: 07:16.265 My story is a bit-- I was in Berlin academia before I joined Neo, more or less. I was part of a research group on distributed systems, and of course if you're in distributed systems, data storage is the big topic there. It's one of the classic applications of distributed systems, and this was around the time the NoSQL movement came around, and I was aware of all that. There were the guys from CouchDB, some of whom are local in Berlin for example, and this was all interesting but it wasn't really-- I was missing the quality of something new there. It's just the same old storage that we've had, just in a distributed fashion, in a slightly more streamlined fashion in comparison to the old databases. And then I bumped into this guy, Peter Neubauer, who probably has been mentioned on this podcast once or twice--
RVB: 08:16.740 And we interviewed him as well, yeah. 
SP: 08:18.233 [chuckles] And he introduced me to the shiny world of graphs, and I was all amazed, at Berlin Buzzwords, and he influenced my academic research. I looked a little bit into partitioning of distributed graphs background. And it really gelled with me actually. I'm a long time fan of the method called mindmapping, you might know about that? 
RVB: 08:44.802 I use it all the time. 
SP: 08:46.610 You use it all the time? If you don't know it, it's a splendid way of making notes. It's a thinking tool really, it's a thought tool. I've been using that since school basically. And what I've always liked about that is it has such an intuitive appeal. You can look at a mindmap and you can follow the process and it's very human-friendly. That's something I really love about graphs, they're so amazingly human-friendly, as opposed to many, many other ways in which you could store information on the computer, which require you to have a PhD or, I don't know, at least some long education to really get to the bottom of it. And this is something, you can go to a whiteboard, you can talk about it. You can talk about it with completely non-technical people, which I think is very important in today's organisations.
RVB: 09:35.040 I can tell you that from here, as a commercial person, it's so valuable, that human-friendly nature. The fact that I can have a conversation with both the developer and the manager of the developer, and the manager of the manager of the developer, and people will understand. People understand what we're talking about, and they will understand why graphs are so powerful, and that's pretty amazing, I think. 
SP: 09:58.823 Yeah. I agree. 
RVB: 10:02.540 We could talk for hours, but we want to keep this fairly short, so I'm going to ask you one more question that I have been asking everyone as well, which is, what do you see in the future? What do you see in the future for Neo4j, for the industry, for Apoc? What's your--? 
SP: 10:21.693 Yeah, let's go. Let's go from the small to the large then perhaps, and start with Apoc. I think we'll see more use of them for sure. We are also going to see, I think, more flavours of procedures in other contexts, perhaps as functions or something like that. There are more opportunities for us to do things similar to the procedures we just added for 3.0, to reap even more benefits from them and more importantly give them more value and make using Cypher even more versatile for people.
SP: 10:55.028 More importantly, procedures, from what I'm currently doing, because my head is very much in openCypher these days, give us an interesting trajectory for trying out features, without having to add things to Cypher the language. Because even with, of course, all the dynamism in Neo's culture and the great velocity with which we can change things in Cypher, it is still a process to add-- to change the actual language. And the procedure is the much quicker process, so here we can innovate, try out things. Also look at the procedures that are written by customers in the market and then learn from that, make Cypher the language better. I'm looking forward a lot to that.
SP: 11:38.432 And then, looking into the wider time horizon perhaps, also with openCypher as the context, I think what we're going to see is just graphs. Really the problem is graphs are everywhere, which is actually not a problem, it's a truth [chuckles]. But having that truth delivered by graph-structured data being the default mode of data organisation in the industry, I think that's something I want to see come about, and I think having a standard query language like openCypher is going to be tremendously helpful to the whole market to achieve that.
RVB: 12:22.588 I couldn't agree more, and I'm going to wrap up our conversation. If people want to know more about this and your work, I will include some links with the transcription of the podcast. I want to thank you so much for doing this conversation Stefan. It's been great talking to you, as always, and I look forward to doing that again. 
SP: 12:44.803 Thanks, thanks Rik. This has been fun. 
RVB: 12:47.120 Cheers, mate. 
SP: 12:47.327 Have a great time, everybody. 
Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik
