Wednesday 29 June 2016

Podcast Interview with Andrès Taylor, Neo Technology - reloaded

Two months ago, my friend Andrès and I were chatting on our internal Slack channel, musing about how cool our customers are and how we get the best possible feedback from talking to them. So how could we talk more to them? Simple: two weeks ago, Andrès and I went on a little road trip together, hitting Amsterdam, Paris and London in three days, talking about all the things we love talking about - but primarily, of course, Andrès' baby, Cypher. There's been a lot going on in the Cypher world in the past few years - and it felt like a good time for a "reload" of the interview we did last year. So here it is - from a noisy London coffeeshop, to your podcast player:

Here's the transcript of our conversation:
RVB: 00:01 Hello, everyone. My name is Rik, Rik Van Bruggen, from Neo Technology, and here I am in a beautiful London coffee shop recording a podcast with someone that I've been speaking to a lot in the past couple of days because we've been doing a little road trip through Europe about openCypher. That's Andres Taylor. Hi, Andres.
AT: 00:24 Hi, Rik [chuckles]. How are you? 
RVB: 00:25 I'm very good. Yeah. We've been on the road for the past two days, and it's been all about openCypher. But before we dig into the openCypher topic, maybe we can just talk a little bit about Cypher itself. Could you tell us a little bit more about where it came from, how did you get into it? Where did it start, just very briefly?
AT: 00:50 Sure. When I joined Neo Technology we had two different languages that ran on our graph database. We had the Core API and Gremlin, and with my background as a SQL DBA, I felt that we needed something more high level. So we started working on that as a do-what-you-want Friday project for some time until we put it in as part of the product, and it has grown organically since. And today it's the primary API for Neo4j that we use.
RVB: 01:34 Absolutely. As you know, I'm a big fan of Cypher because it allows non-programmers like myself to use a graph database which is super important. And I think that's the power of declarative languages. 
AT: 01:49 Yeah. Exactly. That's one side. Originally SQL was meant to have that power that non-programmers should be able to write queries, but technology has moved on and now we have even better tools, even higher level of abstraction to work with. So yeah. It's very much meant to make the querying easier to understand, not just fast. 
RVB: 02:20 Yeah. And in Neo4j there has been a lot of work on Cypher. Like you said it was the hobby project at first but now it's the primary API with full-on infrastructure, planners, all of those types of things. What are some of the big components there? 
AT: 02:38 When we started we built a heuristics planner, a rule-based planner, that has pretty simple rules saying, for example, if you find an index, use it. If you see this type of pattern, use this type of operator to solve it, which is a great start, but a cost-based planner allows for even better plans to come out from the planner. The cost planner uses statistics that we store about the data in your graph so that when we're building a plan we know, or we estimate, we guess, that a better way of getting your data is starting from this index and then traversing this way instead of the other way around.
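[Illustration added to the transcript, not part of the recording.] The contrast between a rule-based and a cost-based planner described above can be sketched in a few lines of Python. This is a toy under loud assumptions: the candidate names, the `uses_index` flag and the row estimates are all made up for the example, and nothing here reflects how Neo4j's planners are actually implemented.

```python
# Toy illustration only: names and numbers are hypothetical, not Neo4j internals.


def rule_based_plan(candidates):
    """Heuristic: take the first candidate start point that can use an index."""
    for candidate in candidates:
        if candidate["uses_index"]:
            return candidate["name"]
    # No index available: fall back to the first candidate.
    return candidates[0]["name"]


def cost_based_plan(candidates, estimated_rows):
    """Pick the start point with the lowest estimated row count (a stand-in for cost)."""
    return min(candidates, key=lambda c: estimated_rows[c["name"]])["name"]


# Two possible ways to start the same query, both backed by an index.
candidates = [
    {"name": "start_at_city_index", "uses_index": True},
    {"name": "start_at_person_index", "uses_index": True},
]

# Hypothetical statistics about the data in the graph.
estimated_rows = {"start_at_city_index": 50_000, "start_at_person_index": 1}

rule_choice = rule_based_plan(candidates)
cost_choice = cost_based_plan(candidates, estimated_rows)
print(rule_choice, cost_choice)
```

The rule only knows "an index exists", so it takes the first one it sees; the cost-based version uses the statistics to start from the side that is estimated to touch far fewer rows.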
RVB: 03:25 Yeah. That was a big change doing cost-based traversals rather than the rule-based. I've seen that a lot. And there's more work on the way I understand. You're doing work on more infrastructure components to make it better for you, faster, stuff like that? 
AT: 03:42 Yes. Today we have an interpreted run-time which builds an object structure and has data flow between these objects when you run a query. This is the execution plan side of things. What we're working on at the moment is a compiled run-time which takes your query plan, your logical plan, and transforms that into a Java class with an execute method. So when you're running your query you're actually running a Java class.
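[Illustration added to the transcript, not part of the recording.] The idea of interpreting a plan versus compiling it into a class with an execute method can be shown with a small Python analogue. Neo4j generates Java; this sketch generates Python source instead, and the plan format, the operator names (`filter_gt`, `project`) and the class name `CompiledQuery` are all hypothetical.

```python
# Toy illustration: a "logical plan" is a list of (operator, argument) steps
# applied to rows represented as dicts. Not Neo4j's actual plan format.
PLAN = [("filter_gt", ("age", 30)), ("project", "name")]


def interpret(plan, rows):
    """Interpreted run-time: walk the plan step by step for every query run."""
    for op, arg in plan:
        if op == "filter_gt":
            key, limit = arg
            rows = [r for r in rows if r[key] > limit]
        elif op == "project":
            rows = [r[arg] for r in rows]
        else:
            raise ValueError(f"unknown operator: {op}")
    return rows


def compile_plan(plan):
    """Compiled run-time: turn the plan into source for a class with execute()."""
    lines = ["class CompiledQuery:", "    def execute(self, rows):"]
    for op, arg in plan:
        if op == "filter_gt":
            key, limit = arg
            lines.append(f"        rows = [r for r in rows if r[{key!r}] > {limit!r}]")
        elif op == "project":
            lines.append(f"        rows = [r[{arg!r}] for r in rows]")
        else:
            raise ValueError(f"unknown operator: {op}")
    lines.append("        return rows")
    namespace = {}
    # Compile the generated source once; later runs skip the plan walk entirely.
    exec("\n".join(lines), namespace)
    return namespace["CompiledQuery"]()


rows = [{"name": "Ann", "age": 41}, {"name": "Bob", "age": 25}]
interpreted_result = interpret(PLAN, rows)
compiled_result = compile_plan(PLAN).execute(rows)
print(interpreted_result, compiled_result)
```

Both paths give the same answer; the difference is that the compiled object no longer inspects the plan at run time, because the plan's decisions were baked into the generated code.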
RVB: 04:10 I'm looking forward to that. That's in the next couple of versions, right? But one of the big things that's coming up and that we announced last year at GraphConnect in San Francisco is this whole new openCypher initiative. Could you tell us a little bit more about that? Where does it come from? What do we want to do with it? Who's working with it, those types of things?
AT: 04:37 Right. What we felt is that we want to grow the graph database space. We think that that's a critical factor. We don't want to have a bigger piece of that market. Instead we want to grow the market. And to do that we felt that we need a common language across graph databases so that people, when they invest in learning this technology, they can take that investment and use it in other products and not feel that they're stuck using Neo Technology. So at GraphConnect, our big conference in San Francisco last year, we announced openCypher, which is a project that will open up the language to make it useful for database implementors, for tool vendors and for any user that wants to get to the nitty-gritty technical details of-- 
RVB: 05:33 So like rogue vikings that want to build the next graph database, they could actually use it as well? 
AT: 05:38 That's the plan. 
RVB: 05:39 That's the plan. So what are some of the key components of openCypher? What's in there? 
AT: 05:45 Well, we're releasing a grammar, so people can create a parser and know what valid syntax is and isn't. We are building a Technology Compatibility Kit, a TCK, which is an example-driven way of testing the implementation to see that it behaves the way it's supposed to behave. We're actually moving that out from our Neo4j repository to its own, so that we are going to use this TCK to validate Neo4j as a valid Cypher implementation. We're also creating a reference implementation, which is a way of showing how this is supposed to work. The TCK and the grammar are really useful and good starts, but they're not enough. So a reference implementation would show how you can actually build something like this and what the details mean. And lastly, we're working on a language specification, a natural-language, semi-formal way of describing the expected behaviour of the language.
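[Illustration added to the transcript, not part of the recording.] An "example-driven" compatibility kit boils down to a list of queries with their expected results, run against whatever implementation is under test. The sketch below assumes a made-up scenario format (plain dicts) and a toy engine that only understands `RETURN <integer> AS <name>`; the real openCypher TCK's file format and coverage are not shown here.

```python
# Toy illustration: scenario format and engines are hypothetical.

SCENARIOS = [
    {"name": "return a literal", "query": "RETURN 1 AS x", "expected": [{"x": 1}]},
    {"name": "return another literal", "query": "RETURN 42 AS answer",
     "expected": [{"answer": 42}]},
]


def toy_engine(query):
    """Implementation under test: handles only `RETURN <integer> AS <name>`."""
    parts = query.split()
    if len(parts) == 4 and parts[0] == "RETURN" and parts[2] == "AS":
        return [{parts[3]: int(parts[1])}]
    raise ValueError(f"unsupported query: {query}")


def broken_engine(query):
    """A non-compliant implementation that always returns no rows."""
    return []


def run_tck(scenarios, execute):
    """Run every scenario and return the names of those that fail."""
    failures = []
    for scenario in scenarios:
        if execute(scenario["query"]) != scenario["expected"]:
            failures.append(scenario["name"])
    return failures


toy_failures = run_tck(SCENARIOS, toy_engine)
broken_failures = run_tck(SCENARIOS, broken_engine)
print(toy_failures, broken_failures)
```

Because the scenarios are just data, the same list can be pointed at any engine, which is what lets one kit validate several independent implementations.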
RVB: 06:59 That's really cool. Well, when I was at GraphConnect San Francisco, I saw some really big names on stage announcing their support. People like Databricks and Oracle. Is that list expanding?
AT: 07:11 It is. We're working with many different companies at the moment, and there are also already tool vendors that have started using the deliverables from openCypher, such as IntelliJ plugins. We see projects popping up all the time using our stuff now.
RVB: 07:32 Really cool. Well, that should really help for that ambitious goal of creating that new language for graphs. I think if everyone and anyone else wants to read up on it they can go to opencypher.org and absolutely also to the neo4j.com website to find more information. Find us on Twitter or wherever they want to find more information. As you know I want to keep this podcast fairly short. So thank you so much for taking the time to come to a noisy coffee bar with me and talk about this lovely project. Thank you Andres. 
AT: 08:07 Thank you so much for having me Rik. 
RVB: 08:09 Cheers. Bye.
Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik
