And here's the transcription of the chat:
RVB: Hello, everyone. This is Rik from Neo and here we are again recording another episode for our podcast. Tonight I've got a wonderful guest from all the way over in Greece. It's the ever so lovely, Chris Gioran. Hi, Chris.
CG: Hi, Rik.
RVB: Good to have you on the podcast.
CG: It's good to be here.
RVB: Chris, most people won't know you, at least not-- they might know your work but they may not know you as a person, so would you mind introducing yourself?
CG: Sure. I'm Chris Gioran. I come from Athens, Greece. I'm a software engineer and I have been in the boiler room of Neo4j, working in the kernel, very close to the disk, for the past four years - four years and something now. I've been in the kernel team since pretty much the beginning, and I've also moved over occasionally to the HA component. I'm currently one of the two primary authors of the current HA offering, from when we moved HA from ZooKeeper to Paxos, basically.
RVB: Chris, how did you get to Neo4j? Can you tell us a little bit of the history there?
CG: Sure. When I was fresh out of university, one of my primary research interests as an undergraduate was databases, and I worked a bit in the industry - working as a database optimizer, I guess. I was working with relational databases trying to make them work faster, especially because most of the ORM solutions, like Object Relational Mapping solutions that people used to create websites, are not that efficient with the SQL they produce. And being into databases, being really into software engineering and Java, I started looking into the NoSQL solutions that were popping up at that time, and Neo4j drew my attention because it was written in Java and it was - and it still is of course - a true ACID database and I really wanted to get into the actual implementation of the thing instead of just being a user. So I started reading the source code and I wrote the page describing how Neo4j manages its transactional aspects, how it stores information to the disk, how it ensures locking, isolation, all the good things that we've come to expect from databases.
RVB: You were not working for Neo at the time, right?
CG: No, no.
RVB: You were just a community member, right?
CG: I was just a community member - if even that. I was just a guy interested in how it works, basically.
RVB: Very cool.
CG: I wrote those articles and Peter Neubauer picked them up along with the rest of the team - which was very young at the time - and we got to talking about it. I got the opportunity to write some code. Basically, it was around getting external transaction managers to work with Neo4j so that you can have real two-phase commit between, for example, Neo4j and another database like MySQL. And we integrated that into the kernel and as they say, everything is history after that.
RVB: Yes, the rest is history, absolutely. And then you started working for Neo as a software engineer. What were some of the main things that you worked on? You mentioned the HA implementation, right?
CG: Right. Like I said, I started off working in the kernel, basically. So my first big task was moving to the new property store that we use right now, which is more compressed than the original versions, taking up less space, and it's also more efficient because it takes up less memory and you can read more from disk in one go. After that though, I started moving to HA. Initially I tried to optimize the way that we used ZooKeeper to make big cluster offerings work more efficiently, but then we saw the shortcomings of that approach. So Rickard Öberg and I got down and rewrote the way that HA works, and we moved away from ZooKeeper which, great software as it may be, wasn't fit for our purpose. And we wrote, from scratch, a Paxos implementation which does pretty much the same thing but in a much more controlled fashion, in a way that we can debug it and maintain it and finally make it performant.
RVB: And Paxos, Chris? So just for our listeners - Paxos, that's a protocol, right? It's a high availability protocol--?
CG: Basically, it's a distributed consensus mechanism. It's one of the primary protocols used for atomic broadcast, and in simple words it means that it makes sure that all the machines in the cluster know exactly the same things, even in the face of partial failures or complete failures. And that's what we use.
RVB: That's what Neo4j, the current version - 2.2 - uses, right?
CG: Yes, that's exactly right. Since 1.9, basically. 1.9 had both the ZooKeeper-based and the Paxos-based HA as an offering. You could switch between the two, or close anyway. But since 2.0, the current Paxos-based HA offering has been the only thing that we use.
RVB: Maybe I can sort of quickly zoom out a little bit. You mentioned that you were interested in databases already, but was there anything specific about the graph model or the graph database model that attracted you to Neo? What did you like about Neo at the time when you started using it?
CG: My first interest in Neo was mostly the technology behind it. It was a grassroots database that had ACID guarantees, and that's what drew me to it. It wasn't the model, to be honest. But very soon, as I got involved in the whole ecosystem and saw how people used it, both as community members and in the large deployments that we had at the time, I came to realize that even though it was such a small code base, and it really wasn't as mature as most of the relational offerings, it offered very similar guarantees but insanely faster performance. That was the thing that struck me first. So it was the lack of joins, basically.
CG: The other was the lack of impedance mismatch between the object-oriented paradigm of programming and the way that you store stuff in a graph. Because when you use a relational database, you have a round peg that you try to fit in a square hole, basically. But when you have a graph, you can map pretty much one to one all your domain objects onto the disk and you will never know the difference. And a testament to that has been the Spring framework which was essentially the effort of just one person, Michael Hunger, who singlehandedly provided an ORM from Spring onto Neo4j. Whereas, if you see solutions for the corresponding relational problem, they are insanely complicated and have lots of shortcomings.
RVB: Chris, where is it going? Maybe we can zoom in on that one a little bit. I know that you are taking on some new adventures personally. You can talk about that if you want, but where do you think the graph space or the graph database space is going as well? What's your opinion on that?
CG: Well, judging from what we have seen that the market wants - from our customers as well as our long research interests and the way that we want to take things - I can see two trends. One is graph processing, global graph processing. That's something that looks really, really interesting. Whereas most of the new SQL solutions right now are perhaps better suited for online transaction processing, we also want to move into global graph queries and do batch processing of very, very, very big graphs. Provide functionality like a graph compute engine for huge graphs that you can process really fast and do data mining or do interesting calculations.
RVB: Like distribution of queries and all those types of things? That's what you're thinking of?
CG: Exactly. And that actually leads us nicely to the second thing that I'd like to see, which is really, really big graphs. Right now, there is no real offering for having graphs that are practically unlimited in size. This is something that we are looking into for Neo4j. We've been doing so for a long time - semi-publicly - and we really want to move in that direction. I'd really like to see clusters of thousands of machines processing huge amounts of data with the ease that we've come to know from Neo4j when it comes to single instance data. So it's not only performance, it's also the ability that you gain, the kinds of stuff that you can do very easily when you have that technology.
RVB: Chris, you personally, you are going to do some new interesting adventures, right? Do you want to talk about that?
CG: Yeah, sure.
RVB: Or do you want to mention that?
CG: I can talk briefly about it. Apart from my software engineering interests and graphs in particular, I'm also very, very interested in doing some work in journalism. For the past two months, I think, I've been working as a data journalist in a new venture, like an NGO in Athens, Greece, where we try to do that sort of work. Currently I'm the only junior data journalist on staff.
RVB: Does it have a name already? Does the agency have a name already?
CG: Yeah. The name is The Aeneosis. Which sounds like a Greek word, but it really isn't. It's a portmanteau. It's a concatenation of two words from Greek.
RVB: I'll put a link to it on the blog post that goes with the podcast, maybe that's [crosstalk]--
CG: Sure, when we launch, because we don't have a site right now. We're launching mid-May. But that's one of the things that-- data journalism is also a domain that can gain from graphs, by the way, and we already have projects starting that will be using Neo4j, basically for ontology processing to start with.
RVB: Chris, I think we're going to wrap up. We want to keep these podcasts reasonably short. Thank you so much for spending time with me. I really appreciate it. And good luck with your ventures both at Neo and with your agency. Thank you for coming online, Chris. I appreciate it.
CG: Thank you for having me, Rik. Thank you for doing this, and thank you for everything.
RVB: Cheers, bye.
CG: Cheers.
Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!
All the best
Rik