Tuesday 7 April 2015

Podcast Interview with Andrés Taylor, Neo Technology

If you have been following our podcast, then you probably know by now that there are some exceptional people in and around the Neo4j ecosystem. It's pretty amazing - it feels like a great honour and privilege to be part of it. Today's podcast episode is going to be another session with an exceptional character, someone that you really don't want to mess about with - for good reasons:


Oh yeah. Andrés Taylor, one of the lead engineers at Neo Technology, is an avid Jujutsu practitioner - and an all-round amazing guy and splendid engineer. He is also known as the "father of Cypher", the declarative graph query language of Neo4j. So let's talk about that a little:


Here's the transcription of our conversation:
RVB: Hello, everyone. My name is Rik van Bruggen from Neo Technology. Here I am again, doing another recording for our podcast on Neo4j and graph databases. It's another remote session over Skype with someone that most of you probably don't know yet, but you should. That's Andrés Taylor from our dev team. Hi, Andrés.
AT: Hi, Rik. 
RVB: Hey. Good to have you on the podcast. For those of you that don't know this yet, Andrés is one of the leading developers on the Neo4j development team. I can probably call you the inventor of Cypher. Right, Andrés?
AT: Okay [chuckles]. You can say that. 
RVB: So, would you mind introducing yourself a little bit, Andrés?
AT: Sure. So, like you said, I'm working in the dev team. I'm working on Cypher, the execution engine. The thing that takes a Cypher query and actually runs it. I'm also the head of the Cypher language group which is working with the language part, the user-facing side. The semantics of the language more than the implementation of the language.
RVB: All right. How long have you been with Neo on this? 
AT: This is my-- four and a half years. 
RVB: Wow, you're a veteran [chuckles]. 
RVB: Excellent. And, you're based in Malmö, Sweden. Could you tell me a little bit about what attracted you to graph databases, and what do you love about it? What do you love about Cypher, as well? That's also a really cool thing for our listeners, I think. 
AT: I'll give it a try. Before I joined Neo, I had two things that I had done a lot, which were either agile consulting or databases. I spent a lot of time working as a DBA, performance tuning people's databases. 
RVB: You mean a SQL DBA then? 
AT: SQL database administrator, especially on Microsoft SQL Server. And so, I would go in and help people with their queries and make them fast. When I started working with Neo4j, I was blown away by the data structures, the access paths that you could take through your data. It opened up ways of looking and working with the data that a SQL database just couldn't give you.
RVB: This was an early version of Neo at that time. Yeah?
AT: I think I joined-- so the first commit that was included in a Neo4j release was 1.2, I think. 
RVB: So the access path, you mean you know the power of the queries, right? Is that what I'm sort of hearing? 
AT: Well, not really. [laughter] That was the problem that-- when I started looking at Neo4j, working with Neo4j was very, very quick. It was super easy to write really performant queries. But the queries needed a lot of hand-holding. You had to do a lot of the work that, for someone coming from a SQL background, the query planner does for you.
RVB: That was an imperative approach to queries. Is that right? 
AT: Exactly. The traversal framework that was the main use of querying databases before Cypher is something that is very imperative in nature. You describe where to start. You describe which path to go through and where you want to end up, and when to do filtering. Stuff like that, you have to make an explicit decision around. So, that's where I started. I thought it was awesome in performance power, but it was kind of difficult to work with. Especially when you came back to the code - the traversal code - after you've written it. You kind of hold it in your head while you were writing it, but then coming back to it was really difficult to understand what you were thinking at the time. 
RVB: So, I feel the birth of a declarative query language coming up here. 
AT: Yes [chuckles]. 
RVB: [chuckles] How did that come about? Tell us about that. 
AT: Cypher was the third attempt I did at a query language. First I started by doing a DSL in Java to try to express your queries, a little bit higher level than what the traversal framework gave you, but that was super difficult then, not pretty at all. Then I did a JavaScript wrapper around the API so you could get a REPL. You could go in and try your queries live without having to build a little program. And I added a little bit of sugar around the graph database API but still, that was not very useful. And then, we started sending a text file around with examples of how you wish you could express your queries. Me and Mattias were working with-- I always had a couple of people in the office. No one really took it seriously, because none of the clients were using it or were interested in it. And it was difficult to get any interest from higher-ups in the organization. It was not something we were selling at the time.
RVB: I seem to remember that there was something with Scala, as well. 
AT: Right. Because it wasn't a super important project from the organization, it was something that we spent-- I spent the 20% time that we got, and weekends and evenings, working on this stuff. If I'm working evenings, I'm not working in Java. So, I looked around. I was looking for something better than Java. I've worked in Clojure, but Clojure was very remote-- far away from Java. So, Scala it was. 
RVB: So you wrote and still write the Cypher part of Neo4j in Scala? Right? 
AT: Yes. All the compiling of a query is done in Scala.
RVB: Super. Pretty cool. So that's very interesting. Can you tell us a little bit more about the future, Andrés? Where is your part of Neo4j and where is Cypher going? Would you mind sharing a little bit of light on that?
AT: Sure. The language changed a lot in the first few versions. Since the 2.0 release, it's stabilized quite a lot. We have not added a lot of constructs. The language, we haven't added many new functions or features to it. And I said, we've been focusing on making what we have run as quickly as possible. It doesn't matter how pretty a language you have. If it runs slowly, no one's going to use it. 
RVB: Exactly, yeah. 
AT: So that's what we spent big parts - most of 2014 - working on. That should be visible in the 2.2 release coming up now. And, in the immediate future we have more of that. There's more performance stuff that we want to do. We want to look at-- hopefully, we want to get to generating code for execution plans and compiling it. That should give a nice performance boost. 
RVB: Is that a little bit like a stored procedure type thing, or am I reading that wrong?
AT: No. The product of the compiling is something that you can run. And when you run code you can either run it in an interpreted mode or a compiled mode. You've heard those terms before? 
RVB: Yes, I have. 
AT: What we have today is an interpreted version of Cypher. We build the tree structure and we execute that tree structure. And we need to interpret it every time we come across it. What we want to do instead is to actually generate Java code, which we dynamically compile and load, and execute. 
RVB: Super interesting. That should give us a big boost in performance. Any other big things that are coming in the future, do you think? 
AT: That's from the implementation perspective, that's something that we've spent quite a bit of time on. We're interested in how to distribute this, and to running more - either servers or threads - on it. For long running queries, not for the short-lived ones that just take a couple of milliseconds. There's not much point in distributing it. 
RVB: No. But things like PageRank and betweenness calculations, those types of things, you're talking about, right? 
AT: Exactly. Analytical queries. So that's something that we talked about, thought of. And then, on the other side of things is we want to add more indexes. We have-- we want to add text searching. We want to do stuff around dates. There's a lot of features that-- I mean we've spent a lot of time making the engine run smoothly. And now, I think it's time to start adding up new bells and whistles to the language as well. 
RVB: Andrés, there's so many things we could talk about. I do want to keep these podcasts a little short and snappy. So, any other final remarks you want to give our listeners? Or should we keep it at this? What do you think?
AT: No, that's-- I don't have anything else to add [chuckles]. 
RVB: Very cool. Well, in any case, I really thank you so much for coming online and doing this little recording with me. This is super nice. 
AT: Thank you for having me, Rik. 
RVB: Yeah. It's fantastic. I'll look forward to all the wonderful things that you guys are working on. It's made a big boost in 2.2. And I'm sure it's going to be even better in the future. Thank you. Thanks a lot. 
AT: Thank you, Rik. 
RVB: Okay. Have a nice day. Bye. 
AT: Bye-bye.
Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik
