Monday, 24 April 2017

Autocompleting Neo4j - part 3/4 of a Googly Q&A

In the first and second posts in this series, I explained and started to explore some of the more interesting "frequently asked questions" that seem to surround Neo4j on the interwebs.
Today, we'll continue that journey, and talk about Lucene, transaction support, and SOLR. Should be fun!

2. Does Neo4j use Lucene

This one is a lot simpler to answer - luckily - than the scale question that we tackled in the previous post. The answer is: YES, Neo4j does indeed leverage the (full-text) indexing capabilities of Lucene to create "graph indexes" on specific node-label-property combinations.

Graph Indexes are a little different from what you may be used to in the traditional database world, as they are fundamentally used in a different way. In a native graph database like Neo4j, you basically only use indexes
  • to find a starting point in your traversal
  • to gather heuristics about the different ways that you can do the traversal - in other words, to enable the query planner to do its job efficiently
All of these indexes are based on Lucene, but you do have two ways to use it.
  • you can manually add node- or relationship-properties to an Index, a system we call "Manual Indexes" (some people mistakenly call them "legacy" indexes - but so be it) which is described in detail over here. This is the older way of using Neo4j indexes, and basically allows for a very fine-grained, but manual (you have to do it yourself and add stuff to an index from your code) way of indexing graph content. The database does not guarantee that the index is consistent with the database - you may have data in the database that is not indexed (because you did not manually add it to the index), and you could also have data in the index that is no longer physically in the database (because you forgot to remove it from the index, for example).
  • you can use Neo4j's fully automated "schema indexes" - which do guarantee consistency between the index and the database, as you would expect from a database. Under the hood these also use Lucene, but they automate the adding/updating/deletion of data from the index when you do a corresponding operation on the graph. This essentially provides the database with more information about the data that it is storing, and will enable us to do much more intelligent management and querying of the data as a consequence. You can read up on Schema Indexing over here.  
Other useful articles on Lucene and how Neo4j uses the technology, can be found on:
Some of these articles are a bit older - but I think they could still be useful. The summary to our autocomplete question, however, is quite clear: YES, Neo4j does use Lucene!
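
To make the consistency difference between the two kinds of index a bit more tangible, here is a minimal sketch in plain Python. It is a toy model and not Neo4j's API: the `ToyGraph` class and all of its method names are made up for illustration. The "schema index" is updated by the store on every write, while the "manual index" only changes when the caller remembers to update it.

```python
class ToyGraph:
    """In-memory stand-in for a graph store (hypothetical, not Neo4j's API)."""

    def __init__(self):
        self.nodes = {}          # node_id -> {"name": ...}
        self.manual_index = {}   # name -> node_id, maintained by the caller
        self.schema_index = {}   # name -> node_id, maintained by the store

    def create_node(self, node_id, name):
        self.nodes[node_id] = {"name": name}
        self.schema_index[name] = node_id      # automatic index update

    def delete_node(self, node_id):
        name = self.nodes.pop(node_id)["name"]
        self.schema_index.pop(name, None)      # automatic index cleanup

    def manual_index_add(self, name, node_id):
        self.manual_index[name] = node_id      # only runs if the caller asks


graph = ToyGraph()
graph.create_node(1, "Alice")
graph.manual_index_add("Alice", 1)
graph.create_node(2, "Bob")   # caller forgets to add Bob to the manual index
graph.delete_node(1)          # caller forgets to remove Alice from it

# Data that exists in the store but cannot be found through the manual index.
missing_from_manual = [
    node["name"] for node in graph.nodes.values()
    if node["name"] not in graph.manual_index
]
# Index entries that point at data that is no longer in the store.
stale_in_manual = [
    name for name, node_id in graph.manual_index.items()
    if node_id not in graph.nodes
]

print(missing_from_manual)   # ['Bob']
print(stale_in_manual)       # ['Alice']
print(graph.schema_index)    # {'Bob': 2}
```

The manual index ends up with both kinds of drift described above, while the automatically maintained one still matches the store.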

3. Does Neo4j support Transactions

Another fairly easy one to answer, but maybe a bit more tricky to explain.

In short: YES! Neo4j is a fully transactional, ACID, "you-can-trust-me-as-the-source-of-truth" database. It does not lose your data - full stop. There's quite a bit of material written up on that, which I won't regurgitate here. Take a look at
In many ways, I find this autocomplete question a bit perplexing and bizarre. It's kind of weird that, in today's world of NOSQL databases, it has become kind of the default for database management systems (i.e., things that are supposed to manage and safeguard your precious data) to lose or corrupt data. It's bon ton for these systems to give up transaction support - one of the fundamental underpinnings of a safe data management system - and therefore assume that "we may lose some tiny bits of data here or there, but that's ok". Don't get me wrong - I totally get NOSQL, and I actually respect it greatly in many ways - but I do feel that this ACID transactional stuff should not be discarded lightly. If you are going to assume that you are going to be losing data - then you better damn well know what you're doing and what kind of data it is that you may be losing. The consequences may be bigger than you think.

On that topic, I also JUST HAVE TO post this video that I saw a few years ago from James Mickens (then at Microsoft Research, now at Harvard) at Monitorama PDX 2014:

Watch the entire video for lots of fun, but from about 12m30s onwards James really rants about the lack of transactional support in NOSQL databases - it's hugely funny and quite accurate in my book: let your reads and writes choose their own destiny!

The point being: transaction support is still ever so important in modern day applications, and it should be. Especially in the connected domains stored in graph databases, transaction support should be considered crucially important. The reason for this is quite clear and simple: in a graph, data corruption not only affects the entity that is being written to, but could potentially affect the entire structure that is connected to that entity. If I get a node/relationship write operation wrong, I may end up disconnecting / wrongly connecting entire parts of the graph structure - and therefore we have a much more troubling data consistency requirement than most other databases may have. Graph databases really should never be allowed to lose or corrupt data - it's just too important.
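
As a rough illustration of why all-or-nothing writes matter in a connected structure, here is a small sketch in plain Python. It is a toy and makes no claim about how Neo4j implements transactions: `ToyGraphStore`, its methods and the snapshot-based rollback are all assumptions for the sake of the example. A write that would leave a half-connected structure is undone as a whole.

```python
from contextlib import contextmanager


class ToyGraphStore:
    """Hypothetical in-memory graph with snapshot-based rollback."""

    def __init__(self):
        self.nodes = set()
        self.relationships = set()   # (start, type, end) tuples

    def add_node(self, node_id):
        self.nodes.add(node_id)

    def add_relationship(self, start, rel_type, end):
        # A relationship must never point at a node that does not exist.
        if start not in self.nodes or end not in self.nodes:
            raise ValueError(f"cannot connect {start} -> {end}: missing node")
        self.relationships.add((start, rel_type, end))

    @contextmanager
    def transaction(self):
        # Remember the state before the unit of work starts.
        snapshot = (set(self.nodes), set(self.relationships))
        try:
            yield self
        except Exception:
            # Any failure undoes every change made inside the block.
            self.nodes, self.relationships = snapshot
            raise


store = ToyGraphStore()

with store.transaction() as tx:
    tx.add_node("alice")
    tx.add_node("bob")
    tx.add_relationship("alice", "KNOWS", "bob")

try:
    with store.transaction() as tx:
        tx.add_node("carol")
        tx.add_relationship("carol", "KNOWS", "dave")   # "dave" does not exist
except ValueError:
    pass   # the whole unit of work was rolled back, including "carol"

print(sorted(store.nodes))           # ['alice', 'bob']
print(sorted(store.relationships))   # [('alice', 'KNOWS', 'bob')]
```

Without the rollback, "carol" would have stayed behind as an orphaned node, which is the kind of partial write that the paragraph above warns about.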

So: recapping the "does neo4j support transactions" question: yes it does, and that's a very, very good thing.

4. Does Neo4j use Solr

The fourth autocomplete question that we are going to answer, is kind of an easy one - mostly because it is related to the Lucene question above.

I guess the answer to the question basically boils down to a kind of an architecture diagram of what contains what. Let's add three with some comments:

  1. this first picture kind of shows you that SOLR essentially uses Lucene indexes as its core data infrastructure.
  2. it also just simply says so on the SOLR homepage: SOLR is built on Apache Lucene.
  3. last but not least: there seems to be a bit of a discussion / "frenemy" relationship between SOLR and ElasticSearch - as both are essentially based on the same core Lucene infrastructure. I am no expert on any of this by a long shot - but it seems relevant to mention this.

So we can really summarize the answer to this Neo4j Autocomplete question with another Google search: does solr use lucene? Yes it does.

From an end-user's point of view, the importance is of course in what the Lucene integration brings to Neo4j in terms of full-text graph search and graph querying. This is really important and useful in many ways. However, the Neo4j Engineering team is also hard at work on some added (composite) indexing functionality that may change and enrich the indexing infrastructure in Neo4j.

So that's that - answered another couple of high-flying questions that were clearly top-of-mind for some of you. I hope it was useful.

Let me know if you have any comments.


Friday, 21 April 2017

Autocompleting Neo4j - part 2/4 of a Googly Q&A

So in the previous post, I explained my plan of doing a series of blogposts around the most frequently asked Google questions as recorded and suggested by Google's Autocomplete feature.
We'll start this week with the most asked question of all - which I get all the time from users and customers - and it's the inevitable "scale" question. Let's do this.

1. Does Neo4j Scale

Let's start at the beginning, with the first question that lots of people ask: "Does Neo4j scale?" Interesting. Should not surprise anyone in an age of "big data" right? Let's tackle that one.

To me, this is one of the trickiest and most difficult things to answer - for the simple reason that "to scale" can mean many different things to many different people. However, I think there are a couple of distinct things that people mean with the question - at least that's my experience. So let's try to go through those - noting that this is by no means an exhaustive discussion on "scalability" - just my 0,02 Euros.

a. Scale = lots of data volume

Size does matter - right? Or does it? Most graph applications are not really about volume - they are more about complexity and connectedness, and that's probably still the sweet spot for Neo4j. But: big data is here, and it's here to stay - so what do I do if I really do have a LOT of data and I want to use Neo4j to master all of it... Can I do it?

Well: there used to be a time when Neo4j had some (admittedly, very high - but still) hard limits with regards to the maximum numbers of nodes and relationships that the database could ever store - even if you had the biggest box on this blue earth to run it on. There were - and there still are - very good reasons for those limits: Neo needed to find a balance between the size of the addressing space and the overhead that it represented, and put a stake in the ground with regards to that compromise. This meant that there had to be an upper limit: every Neo4j database would have a certain maximum amount of addressable data.

In version 3 of the Enterprise version of Neo4j, however, my colleagues were able to remove those upper limits (or actually: make them ridiculously high so that you could store all the atoms in the universe in Neo4j if you really wanted to) and still preserve all of the good purposes that these limits served. So today, I think this aspect of "scale" is a non-issue for Neo4j users: you can store as much data in a Neo4j machine as your hardware will allow. And there's hardware out there that can fit *a LOT* of data.

b. Scale = lots of reads

If however, what you mean with "scale" is something totally different, specifically the ability to serve lots and lots of clients that need to read data from the graph database, then Neo4j really has a great solution for you. With the new causal clustering architecture that was added in 3.1 and refined in the upcoming 3.2, Neo4j Enterprise has some amazing features to really let you scale out your Neo4j servers, and handle virtually unlimited amounts of read operations. The clustering architecture splits your Neo4j server into two groups of servers:
  • core servers: dedicated to safeguarding your data, making sure that data can never be written unsafely
  • edge servers: dedicated to enabling you to have lots of clients read from the cluster, and to add more replicas as you and your application and its read query volume grow.
This architecture really is a very modern and state-of-the-art way of making sure that the database *cannot lose data*, and at the same time, maintains its operational efficiency and scaling-out characteristics.

c. Scale = lots of writes

Hand on heart, I would still claim that Neo4j has a very sound approach and architecture for scaling out write operations. Sure, we can still improve stuff (e.g. memory management, batching of writes, etc) but really we have a very performant and optimized write channel ... to the graph. Which is kind of a fundamental point, right: we are writing to a connected structure, to a graph - which means that when we add data to the database, we need to not just write the data - but also connect it up to the rest of the graph. Fundamentally, we need to do more work at write time than the average NOSQL store - which we readily accept because we believe that we will win that time back at read time.

But as always, it's still a good idea to really understand the workload: do you really need to connect the data (or is it enough to just store the exceptional connections)? Can you batch the write operations? Are you trying to do too much work in one go for this machine? It does not hurt to be mindful of that - and that will allow us to scale writes just fine on Neo4j. You still have to know what you're doing though - like with any complex system. If you don't, then don't be surprised to hear a big BOOMing noise.

d. Scale = lots of complex real-time queries

Then of course there's another definition of scale, which would be right up Neo4j's alley. If what you mean with scale is that you want your database to be running lots of complex real-time queries, then oh yeah - that's what Neo4j loves to do best.

The key words here in my book are
  • complex queries: Neo4j is a graph database. Graph databases are good for highly connected domains, and running queries over these domains that require lots of different entity types to be evaluated. That's what makes a query complex. In a traditional database that would require your database to do lots of complex join calculations - and guess what: Neo4j does not need to do those. More complexity <> more joins in Neo4j. It's way better at executing these types of workloads.
  • real-time queries: Neo4j is a graph database, which means that it is really geared to answer your requests immediately, as in, between a web-request and a web-response. The query is executed immediately, in real-time, and brings back the results immediately. It's not really geared to analytical workloads - although it can of course be used for some and can actually be quite efficient at them.
The more complexity, and the more real-time the query patterns of your use case are, the better the "scalability" characteristics of Neo4j will compare to other - no doubt solid and dependable - database infrastructures.

e. Scale = sharding

Finally, I would like to address one of the most common - but in many ways also one of the most bizarre - definitions of scale. I cannot not address it in this blogpost, as I get the question every other day from lots of smart people. So let's talk about it.

First, let's try to define sharding. As always, we go to a wikipedia page where a lot of very solid info is posted, the summary being that

So in my own words, the question of sharding basically means: "does Neo4j automagically distribute the data in the graph over an arbitrary number of commodity hardware machines?", and to that, the simple and straightforward answer is of course that it doesn't. Neo4j does not shard data automatically - there, I've said it.

But let's try to move beyond the simplicities of that question, and try to have a real answer. I'll do so in a couple of points:
  1. When you say sharding, you effectively say that you want to put different parts of the connected graph structure on different machines. Of course, you can do that manually today (Neo4j does not mind it at all - manual sharding is a fine and common usage pattern for it), but the question is how would you automatically decide where to "cut" the graph? How do you partition the graph? How do you decide which part of the graph goes to which machine? That, my friends, is a VERY tough - if not impossible and NP-hard - problem to solve in the generic case. Really, what you need is some more intelligence about the domain model that you are trying to store in Neo4j, and then you may be able to make a more informed, semi-automatic, decision on what data to put where. And even if you do that, it still remains a very arbitrary decision to take - and you will likely be taking it at the expense of deep traversal power. Deep traversals - the sweet spot query pattern for Neo4j - will at some point inevitably hit a machine boundary and start slowing down, dramatically. It's just natural. Is that really what you want?
  2. When you say sharding, you effectively mean that you want to have a fully distributable database that can have data on lots of different machines and still behave as one system, right? Well - how do you want to do transactions then? We have discussed before that we really believe that transactions are essentially important to graph database management systems because corruption tends to spread in a graph... so do you really want us to give that up just to get that sharding thing that you talk about?

    Not to beat the dead horse - but the philosophy of Neo4j is all about the fact that transactions are important, and distributed transactions are very hard. Only Google has really been able to implement something vaguely similar to it - with their cloud-based Spanner service - and we all know that they have hardware, networking, datacenter, and other skills that average organisations just don't have access to. 
For these two reasons, Neo4j has essentially always opted to remain "single image", so that all the members of a Neo4j Causal Cluster are guaranteed to have the same data AND remain transactional. It still seems to us that that is the best thing to do in today's computer science environment. One day we may change that - and one day we may evolve the Causal Cluster into a fully distributable database - but that day is not there yet. And for all the use cases that Neo4j excels at today (which all require fast, deep traversals in real time) it remains to be seen if it really would be the best option.
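
To illustrate the "where do you cut the graph?" point from the first item above, here is a small sketch in plain Python. The graph, the node names and both machine assignments are invented for the example. It simply counts how many relationships cross a machine boundary, which is where a deep traversal would have to hop over the network.

```python
def count_cut_edges(edges, machine_of):
    """Count relationships whose two end nodes live on different machines."""
    return sum(1 for a, b in edges if machine_of[a] != machine_of[b])


# Two tightly connected triangles (a-b-c and d-e-f) joined by one bridge (c-d).
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]

# A split that uses knowledge of the structure: one triangle per machine.
domain_aware = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

# A naive split that alternates nodes between the two machines.
naive = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}

print(count_cut_edges(edges, domain_aware))   # 1
print(count_cut_edges(edges, naive))          # 5
```

Even in this tiny example the naive split cuts five of the seven relationships. Finding a good cut automatically for an arbitrary graph is the hard part, and that is the problem described above.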

So, in each of these perspectives, I think we can claim that Neo4j does in fact scale very well. Of course it does not do everything, of course it will suck at some things and excel at others - show me a system that does not have to deal with these trade-offs.

But if I were to summarize the answer to the autocomplete question: Does Neo4j scale? Yes, it definitely does. We have the users and the customers to prove it.

In the next post, we'll continue the Autocomplete questions with more answers - that will hopefully not take such a long-winded answer :)

As always, feedback very welcome.


Thursday, 20 April 2017

Autocompleting Neo4j - part 1/4 of a Googly Q&A

As you can probably tell from this blog, I have been working in the wonderful world of Graphs for quite some time now - Neo4j remains one of the coolest and most inspiring products I have ever seen in my 20 odd years in the IT industry, and it certainly has been a thrill to be part of so many commercial and community projects around the technology in the past 5 years. Not to mention the wonderful friends and colleagues that I have found along the way.

One thing that does keep on amazing me in working with Neo4j, is the never ending
  • stream of use cases, industries and functional domains where graphs and graph databases can be useful
  • stream of new audiences that we continue to educate and inform on the topic. Every time we do a meetup or an event, we seem to tap a new source of people that are just starting their journey into the wonderful world of graphs - and that we get to talk to and work with along the way.
When dealing with these new audiences, it's also pretty clear that we ... keep on having the same types of conversations time and time again. Every new graphista that gets added to the community, is asking the same or similar kinds of questions... and most likely, they are going to google for answers.

This leads me to the topic of this blogpost, which is both fun and serious at the same time: we are going to try and autocomplete neo4j :) ...

Autocompleting? What's that again?

When we talk about autocomplete, we talk about this amazing technology that Google has built into its search functionality, that completes your search query as you type - often times "guessing" what you will be looking for most likely before you even thought about it... it can be pretty interesting, even eerily scary sometimes...

Google made a wonderful practical joke about it on April 1st a couple of years ago - which I thought was really funny:

But nowadays, it's come to be this really serious and important tool for Google, allowing their users to find what they are looking for - and, of course, what Google wants us to be looking for - really quickly. They actually go to some length to explain it on their help pages:

Now, not too long ago I saw this really funny YouTube video series that Wired ran, basically bringing together the world's most searched for topics for a particular person, celebrity, show - and asking these questions to that person/... live and recording it. Really cool. Here's the YouTube playlist if you want to waste some time on a Monday/Friday:

There's quite a few of them that are really funny - like for example the Sesame Street one:

Or if you want to have even more fun with this - just go and play the "Google Feud" game, where you have to guess the most searched for terms:

So you get the drift. And then my mind started drifting: why not use that kind of google autocomplete on Neo4j, and answer the web's most asked questions about Neo4j, once and for all, in a blogpost? Let's give it a try.

Autocompleting Neo4j

So the first thing that I noticed here, is that the autocomplete really does change with the web. I just ran the "does neo4j" entry into my Google search box, and the difference between last week

and this week

is kind of interesting. So I will go through 6 different questions here, and try to answer these as well as I can - without spending too much time on boring details and discussions. These questions are:

  • Does Neo4j scale?
  • Does Neo4j use Lucene?
  • Does Neo4j support transactions?
  • Does Neo4j use SOLR?
  • Does Neo4j support Gremlin?
  • and last but not least: Does Neo4j support SPARQL?
So when I started out writing answers to these questions, I found that there's quite a bit to say about it - so I am going to split it into 4 different posts which I will publish over the next few days - just to make it convenient for everyone to get through the material.

So: look for the first post in the next few days - should be fun!



Friday, 17 March 2017

(Another) Podcast Interview with Alistair Jones, Neo Technology

Just before ending the week, I thought I would publish another great episode on our Graphistania podcast. Ever since the launch of Neo4j 3.1, I had been wanting to do an episode about the new Neo4j clustering architecture. It's so innovative, new and a great piece of engineering - we just had to sing its praise :) ... So who better to invite back to the podcast than Alistair Jones, who was one of the lead engineers at Neo Technology to pursue the effort. Here's our chat:

Here's the transcript of our conversation:
RVB: 00:02.563 Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology. And here I am again recording the second podcast of this year. I know it's only two months into the year, so I've been slacking, but I [laughter]--
AJ: 00:15.346 You've been picking up the pace again.
RVB: 00:16.082 Yeah, picking up the pace again. And for the second episode, I have invited a returning guest to our podcast, and that's my friend and colleague, Alistair Jones, from the Neo Technology engineering team. Hi, Alistair.
AJ: 00:28.380 Hi, Rik.
RVB: 00:29.089 Hey, thank you for making the time. I know you're a busy man these days, so thanks for taking the time. Alistair, the reason why I invited you back is because I know you've been hard at work in the engineering team, on some of the really big, new features in Neo4j. 3.1 was released at GraphConnect San Francisco last year. Or, no, it was actually announced and was released a little bit later, but one of the biggest new features in Neo4j 3.1 was the new clustering architecture, right?
AJ: 01:03.036 Yep.
RVB: 01:03.119 And that was what you and your team were working on?
AJ: 01:05.054 Yeah, it was a really big thing for us, actually. So I've been working on this area for nearly two years, actually, on this new clustering architecture. And as you know, Neo4j is a clustered database designed to run over multiple servers. And we've had clustering in place for six or seven years in Neo. This is the biggest change we've ever made, by miles. It's a huge, huge upgrade of all the technology around the clustering.
RVB: 01:40.760 Wow. I remember like in version 1.8 it was like Zookeeper that was doing some of the work.
AJ: 01:46.629 Yeah, we had a small change in the 1.9 release back in the day.
RVB: 01:54.312 Back in the day, yes.
AJ: 01:55.544 This is a much bigger release in 3.1.
RVB: 02:00.387 So what's it all about?
AJ: 02:01.882 So the first part of it is getting up to date. So the world around us has moved on, and one of the great things about Neo is that we can take research from academia and actually apply it. So reasonably recent stuff that, if you read the academic papers and blog about it, we read all those, and some of those things we can put them fairly quickly into the product. So, for us, this time, it was doing the Raft protocol, which is a consensus algorithm. So what that means is getting agreement between participants, so computers in this case. We--

RVB: 02:52.316 Members of the cluster, right?
AJ: 02:53.396 Yeah. So, in this case, it can be different servers in the cluster getting consensus between those servers, when the servers themselves and the communication between servers is potentially unreliable. So you need to account for the unreliability in the design. Now, we know a little bit about consensus algorithms because previously, back in that 1.9 release, we implemented Paxos. And, at the time, that was the kind-of state-of-the-art thing to do. Raft, you could argue, at some theoretical level, is the same thing, but it's much more clearly structured. Raft is--
RVB: 03:38.960 You mean a consensus protocol, right?
AJ: 03:40.209 Yeah, yeah, exactly. So it's from Diego Ongaro, who's the lead researcher in this area, and it's really impressive how it's described.
NOTE: Diego reacted to this part of the podcast with a super-cool tweet:
It's actually aimed to be simple to understand and to explain. And that makes it really good to implement because you can be very clear about what you've done. You can see the direction that you've gone in. So we've changed from one consensus algorithm to another.
RVB: 04:13.411 Yep. Which is a big change [crosstalk].
AJ: 04:14.871 Which is a big change, but architecturally it's totally different, because previously we were using Paxos to agree on membership of the cluster. So actually a very small amount of data. Not that many servers. They don't go that often. Now what we're doing is we're using Raft, and we're using it for every single transaction in the database. So every single node, relationship, property you create in the database it goes through the Raft protocol. You've got consensus across the cluster. And what that means is that every single change is agreed to by a majority of the cluster, so no matter what happens in terms of loss of connectivity or failure of the minority of the servers, still, the cluster as a whole agrees on what the state is as you move forward, so--
RVB: 05:06.995 Sounds a bit like open heart surgery to me.
AJ: 05:09.230 Yeah, it's quite a major change, but it's actually really nice. Once you've got that super solid foundation, you can build a whole load of things on top of it. So it's extremely solid for-- it's like the most reliable we could make it, and it stores every single transaction in this replicated log across all members of the cluster. And also as the membership changes, that's agreed to with protocol as well, so you know every time who the people were, who the servers were. People were allowed to [inaudible] transactions and to get them committed. So the whole thing's very tightly integrated into the core of the clustering.
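NOTE: The "agreed to by a majority of the cluster" idea that Alistair describes can be sketched as a bit of arithmetic. This is only the majority rule in plain Python, with made-up function names. It is not the Raft protocol itself and not Neo4j's implementation.

```python
def majority(cluster_size):
    """Smallest number of servers that forms a majority."""
    return cluster_size // 2 + 1


def is_committed(acknowledgements, cluster_size):
    """A change counts as agreed once a majority has acknowledged it."""
    return acknowledgements >= majority(cluster_size)


def tolerated_failures(cluster_size):
    """How many servers can be lost while a majority can still be formed."""
    return cluster_size - majority(cluster_size)


for size in (3, 5):
    print(size, "servers: majority =", majority(size),
          "tolerated failures =", tolerated_failures(size))

print(is_committed(2, 3))   # True: 2 out of 3 is a majority
print(is_committed(2, 5))   # False: 2 out of 5 is not
```

With three servers a change needs two acknowledgements and the cluster survives the loss of one server. With five servers it needs three and survives the loss of two.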
RVB: 05:52.520 So I would never claim that I understand everything about it, but what I've read is that it's very different architecturally in terms of-- previously we had masters and slaves, now we talk about cores and edges, right?
AJ: 06:05.636 Yeah. The second part of this is that we were aiming to have much larger clusters than people had previously been running in Neo. Neo's been around for a long time. And, previously, people used to think of having 3, 5, 10 servers being a large database cluster. Now people want to run hundreds of servers, and we have customers and users running 200 servers in a database cluster. We want to be able to get higher than that, and the consensus algorithm that we were using before, the design of it, or perhaps the membership, yeah, it had a sort of limit on the-- or do we say kind of--? It was hard to get to that scale.
AJ: 06:55.530 And the reason is that all of the servers had to be aware of each other and what they were doing at any stage to basically make sure that they hadn't disappeared. So that led to heartbeats going from every server to every other server, and that ultimately gets very expensive when you have a large number of servers. It also gets very difficult when you're committing across the majority of the servers because you have to wait for a large number of them to come back before you can say, "Yes, this is now safely committed."
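NOTE: To get a feel for why heartbeats "from every server to every other server" get expensive, here is the arithmetic as a tiny Python sketch. It assumes exactly one heartbeat per ordered pair of servers per round, which is a simplification chosen for this example.

```python
def heartbeats_per_round(server_count):
    """Directed heartbeats if every server pings every other server once."""
    return server_count * (server_count - 1)


for size in (3, 10, 200):
    print(size, "servers ->", heartbeats_per_round(size), "heartbeats per round")
```

Under that assumption, 3 servers exchange 6 heartbeats per round, 10 servers exchange 90, and 200 servers exchange 39800. The cost grows with the square of the cluster size.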
AJ: 07:30.229 So just having one huge cluster of Raft servers is not a good design for that kind of hundreds of servers or thousands of servers. So we came up with a new architecture. And what we do now is we divide the cluster into two groups. We mark some of the servers as being in what we call a core. Core servers participate in Raft and they are about safety. They're about storing your data durably. Secondly, we have a lot of potentially much larger group of read replicas. And these are servers that are for running your queries on, and--
RVB: 08:14.895 Read queries, not write queries.
AJ: 08:16.298 Yeah, yeah, read queries. So you don't have to worry about safety here, and the idea is these are about-- they're disposable, where you can scale them up and down; when your web traffic is high a certain time of day, have more and more of them.
RVB: 08:30.725 Just have more of them, yep.
AJ: 08:30.721 [inaudible] your cloud instances when it's quieter, and you can adapt to the shape of your traffic with the read replicas. What's interesting is that the name read is that we're doing more service than reading. Why does that make sense in a--? How does that help you in a database, have more read only things? Surely you need them more so to write. Well, that's because of the shape of graph data. It's because, actually, when we look at the-- I'll show you, because it's a nice slide [laughter] with audio only. You're looking at a slide that shows kind of how we see people do stuff with graphs, and what you notice is that the right [inaudible] updates tend to be quite small.
RVB: 09:17.172 Local [crosstalk]?
AJ: 09:17.605 Yeah, very, very local. Like, two or three nodes and relationships, up to maybe 100 things in a transaction, whereas on the read side - the whole point of graphs is to really fast, and people go a long way - they traverse along the graph in a read transaction. So they're doing hundreds of thousands of relationships in one transaction. Now, that's very fast, but it still takes resources. It takes memory bandwidth, it takes CPU to run these queries. And that's what people are really hammering their graph with, thousands of these, each very big, queries. And that's an enormous amount of computational load. We want to spread that across a lot of servers, and this is a way to do it - have loads of read replicas that can handle that traffic for you. So it is really helping you in the kind of [inaudible] applications. It's a very specific architecture to the type of system that we're building.
RVB: 10:13.028 Pretty cool. And so, as I understand it, the core is-- so they're all about the safety, and about writing to the graphs, and the edge servers are all about reading. Is there any downside to this? Is this good news show all the way around, or are there some things that we should take care with?
AJ: 10:34.656 So there's one thing that's just like-- a challenge here for people when they're deploying these type of applications, is that the transaction's being pushed out from the core, out to the read replicas, and there's some delay in that happening. It's very small, but there is some delay. So people call this eventual consistency, and this is something that we're aware of. And lots of modern sort of web systems that you get into this kind of eventual consistency situation. An example of this that could kind of catch you out is, say you're a user, you create an account, or you make a booking, that's a write transaction. It updates the graph. Then when you come to refresh your page, you try another operation and it's a read only operation, maybe you hit a read replica that hasn't quite seen your update, so, as a user, it almost appears like the thing you just did has disappeared, like you've gone back in time. There's a bit of a--
RVB: 11:45.015 It's [crosstalk] read your own writes problem.
AJ: 11:46.385 Yeah, I can't really-- so what we did at the same time as this, is we actually added a whole new feature that became the name of the whole clustering architecture. So this is what I like to call causal clustering, because we added in a feature of causal consistency.
RVB: 12:08.446 Tell me more about that, because I don't know what that means [laughter].
AJ: 12:10.836 Okay, Rik. So causal consistency. So it's actually something that's been-- again, from research, there's some academic and industry research in this area, but it's not very commonly implemented. There are only a handful of other implementations out there, and what it's about is trying to represent what causally has happened in the user's application. So the cause and effects of the changes that you've made.
AJ: 12:45.088 Practically, it's very easy to use. What happens is that when you update the graph or when you touch the graph in any way, the database can give you a bookmark. And this bookmark represents the latest thing that you've changed or the latest thing that you've seen in the database. And then when you make another request to any other server in the cluster, you can supply that same bookmark, and the database will make sure that it has at least as up-to-date a state as the bookmark represents. So the bookmark is just a little string and it comes back through your database driver into your application code. You can store it in your application server, or you can hold onto it temporarily while you make another inquiry, or you can send it all the way back to the client. You can send it back to your web browser or your mobile device, and route it back, ultimately, to the database.
RVB: 13:46.373 So that basically assures that the client of the database always takes into consideration everything that it calls [crosstalk]?
AJ: 13:54.484 Yeah, it prevents you from going back in time--
RVB: 13:56.751 Ah, yeah, that's it.
AJ: 13:56.890 --is what it does. And it supports a totally stateless architecture - everything between the user and the database. The database is storing state. Why should you need to store it anywhere else? So this is [inaudible]. Your sessions, you don't need to worry about sophisticated routing. Just have stateless application servers, pass your bookmark around, and you get causal consistency. That's the idea.
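To make the bookmark idea concrete, here is a minimal sketch in Python. It is a toy model, not the Neo4j driver API: the `Core` and `ReadReplica` classes, their methods, and the use of a plain transaction counter as the bookmark are all assumptions made for illustration. A real read replica would wait until it has received the bookmarked transaction; this toy replica simply pulls it from the core's log.

```python
class Core:
    """Toy stand-in for the core servers: accepts writes and numbers them."""

    def __init__(self):
        self.log = []  # committed transactions, in commit order

    def write(self, data):
        self.log.append(data)
        # Hypothetical bookmark: the position of the last committed transaction.
        return len(self.log)


class ReadReplica:
    """Toy read replica that lags behind the core until asked to catch up."""

    def __init__(self, core):
        self.core = core
        self.applied = 0  # number of transactions applied so far

    def catch_up(self, upto):
        # Apply transactions from the core's log up to the requested position.
        self.applied = max(self.applied, min(upto, len(self.core.log)))

    def read(self, bookmark=0):
        # With a bookmark, make sure the state is at least that up to date.
        if self.applied < bookmark:
            self.catch_up(bookmark)
        return self.core.log[:self.applied]


core = Core()
replica = ReadReplica(core)

bookmark = core.write("booking #1")
stale = replica.read()          # no bookmark: the replica may not have seen the write
fresh = replica.read(bookmark)  # with the bookmark: the write is guaranteed visible
print(stale, fresh)
```

Because the bookmark is just a small value, the application can hold it anywhere (application server, browser, mobile client) and hand it to whichever server handles the next request, which is what keeps the middle tier stateless.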
RVB: 14:28.017 Wow.
AJ: 14:28.699 And we've tried to make this even easier to use by building some of the primitives. The kind of passing backwards and forwards keeping track of things is built into the database drivers. So in 3.0, we introduced--
RVB: 14:42.424 BOLT drivers, right?
AJ: 14:42.490 Yeah, the BOLT drivers. So they initially supported native language drivers in your [crosstalk]--
RVB: 14:48.561 Right. And so the new version of the driver supports this bookmarking--
AJ: 14:51.515 Exactly, yeah.
RVB: 14:52.721 --and that gives us the causal consistency.
AJ: 14:54.338 The causal consistency, yeah. Exactly.
RVB: 14:56.476 So let's talk a little bit about the future. What's coming up? What are you working on now, and what keeps you up at night, and [laughter]---?
AJ: 15:03.404 Yeah. Well, [crosstalk]. I mean, it's kind of following on logically from where we are now, so the next stage of this is to be-- it's that kind of how people actually deploy this stuff. And these days, it's not just a cluster of servers that you're using to run a database. It's also servers across multiple data centres and multiple regions around the world. Around the country, all around the world. So that's what the cloud environment has made very easy to do, to have geographic distribution. And we are taking account of that feature in the product, or that server usage in the product. So what we're going to do is make the clustering aware of data centres and how they're organised, and allow the client to give hints about what might be the best way to serve it. So that means that you can do your reads from a server that's very close to you, with a low latency, and you can support fault tolerance across data centres when one of them goes away, or explicitly recover in a disaster recovery zone. All of these different operational scenarios. So--
RVB: 16:24.512 Is that something that's coming up in the next couple of versions of Neo4j or--?
AJ: 16:27.024 Yeah, yeah. So in the next couple of versions, that's the stuff that's going on. And, again, it's to be seamless all the way through the driver, so you write your application once for Neo4j on your laptop, and then it should move forward [inaudible].
RVB: 16:46.117 That's very cool. I have one more question. Don't you miss the visualisation stuff that you were doing before [laughter]?
AJ: 16:52.896 Yeah. So I always miss the visualisation. I try to devote my spare time to get back into it every now and then, so--
RVB: 17:03.147 Very cool. Well, thank you so much for spending your time, Alistair. I mean, we want to keep these podcasts fairly short, but I'm sure we'll include a bunch of links to the documentation and the blog post that we wrote about this topic. I really appreciate you making the time, and look forward to seeing what's up next.
AJ: 17:21.907 Thanks very much.
RVB: 17:23.060 Thank you. Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best


Monday, 6 March 2017

Podcast Interview with Kristof Van Tomme, Pronovix

Last month I had one of those cool encounters of the graph kind at the Belgian Beerfest that we have been organising a couple of times in the last few years on the occasion of Fosdem - the amazing open source conference that's taking place in Brussels every year. This year, I got talking to a fellow countryman who has been doing some amazing work on integrating the Drupal content management system with Neo4j - something that has a lot of potential in a lot of areas, I think. So - we just HAD TO have a chat :) ...

Here's the transcript of our conversation:
RVB: 00:03.346 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo Technology. And here I am again the third time in two days, this is wonderful, I'm on a roll here, recording another podcast for our Neo4j Graphistania podcast. And today I have a fellow Belgian on the other side of this Skype call, and that's Kristof Van Tomme from Pronovix. Hi, Kristof. 
KVT: 00:27.466 Good morning Rik. How are you? 
RVB: 00:29.593 I'm really well, and I hope the Skype gods bear with us, because we've had some trouble in the past couple of minutes, but I'm sure it will be fine. Hey, Kristof, we met each other at the FOSDEM conference, which was a great experience, and I loved the Beer Fest afterwards [laughter]. But yeah, you told me about some really great stuff that you guys are doing with graph databases. So, first of all, let's start from the beginning, who are you, what do you do and what's your relationship to the wonderful world of graphs? 
KVT: 01:07.202 So I'm a bit of a weird duck because I'm actually a bioengineer who ended up in IT through a biotech startup that did research in schizophrenia. It's a whole other life. But I got involved in the Drupal community a little over 10 years ago when we started making websites for biotech companies. 
RVB: 01:35.332 Okay. Drupal is like a content management system, right? 
KVT: 01:38.557 Yes, Drupal the open source content management system. The other really good Belgian product after beer and chocolates [laughter]. And I got really strongly involved in that community 10 years ago. I helped organise one of the big European conferences, and then we built a consultancy around that. Then, about five years ago, I got really excited about documentation, and reuse of documentation specifically, and how to deliver it and reuse bits and pieces so that you could build deliverables that you can easily reuse between different channels. And that's how I got excited about graph databases, and Neo specifically. 
RVB: 02:32.949 When you say documentation, you mean technical documentation for software, right? 
KVT: 02:35.667 Yes. Yes, I do. The thing that everybody's like, "Ooh, documentation." 
RVB: 02:41.017 Ah, damn it. Yeah, exactly. 
KVT: 02:44.417 So that's how I got involved in-- because we had one of our colleagues, a long time ago, I think six years ago or something, started playing with graph databases, and actually, he built a first connector for Drupal for Neo. And he's like, "Kristof, I did this thing, and I'm really excited about graph databases, and I think it's cool. Can we do something with this?" And I was like, "I have no idea." So that was the first connector for Neo for Drupal, and then that kind of died because there was-- technically it was there, but then there were no further implementations, and I was not sold, and people didn't figure out how to use it. But then because of the documentation thing, I actually started seeing what you would use a graph database for and that's when I got really excited. 
RVB: 03:46.370 Super cool. Because documentation, I don't know if you know, but this is where Neo4j started as well, as an open source project, 15 years ago, Viking hackers in a garage. They were all about content management at the time as well because they were working for a media company that was managing digital assets. So it's funny that there's this convergence or link between the two worlds, right? What is the use case all about? How does it work? 
KVT: 04:20.696 So I've been thinking-- I've got this DITA, which is another of those words. It's a standard that's fairly popular in the technical writing community for writing reusable documentation. It's like an XML standard. Some people scratch their heads when they hear about it, and other people are raving mad about it. So in the DITA community, I've been doing talks about content management systems and open source and things like that. I think two years ago, I started thinking about personalisation and embedding information. What I dream about is this: instead of having a manual, the documentation system knows who you are and serves you the right information when you need it. I did a talk about that at the DITA conference here, I think it was in Europe, and I was thinking, "So how would you do that?" And then I started thinking yeah, actually, probably it wouldn't really work with a relational database because you need to start collecting a whole lot of information and start analysing for patterns. And that's how I started thinking about Neo and graph databases more in general. 
RVB: 05:48.382 So as a personalisation engine for documentation, right? So you wouldn't need to search for documentation as much, but you would have a recommended set of documentations that would be served to you semi-automatically. 
KVT: 06:04.195 Yeah. So it's the idea that, for example, you're in an application, you're in a web app, and you can't find that one damn button that you know is somewhere-- 
RVB: 06:16.996 We've all been there.
KVT: 06:18.043 Yeah, we've all been there. So you're clicking around, and you're going through settings, and I don't know, connections, so you keep going circles and circles and circles because you can't find the damn button. And at that point, the system would say, "This looks a lot like what people do when they're looking for this thing," and then you would get a little pop-up saying, "Are you maybe looking for this?" And similarly, if you're using a certain feature and you're doing something really weird and other people have done that, and then they went through the documentation and found some other feature, then you could shortcut that and skip a few jumps in that graph and immediately serve them the information that they're looking for. So it's kind of like analysing patterns of behaviour that people have inside of a web application and then serving them-- that's patterns of behaviour that they normally do just before going to documentation sites and then serving them that documentation that people normally will find when they go to documentation site after they've done a certain thing, and then serving that information to them. So that's one of the really cool things that I would like to do. 
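The idea of matching a user's click trail against what other people did before opening a documentation page can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the session data, the page names, and the `suggest` function are all hypothetical, and exact matching on the whole trail stands in for the pattern analysis a graph database would do.

```python
from collections import Counter, defaultdict

# Hypothetical sessions: clicks inside the app, then the doc page finally opened.
sessions = [
    (["settings", "connections", "settings"], "docs/export-button"),
    (["settings", "connections", "settings"], "docs/export-button"),
    (["settings", "connections", "settings"], "docs/api-keys"),
    (["dashboard", "reports"], "docs/scheduling"),
]

# Weighted edges of a tiny behaviour graph: click trail -> doc page.
trail_to_doc = defaultdict(Counter)
for clicks, doc in sessions:
    trail_to_doc[tuple(clicks)][doc] += 1


def suggest(recent_clicks):
    """Return the doc page most often opened after this click trail, or None."""
    docs = trail_to_doc.get(tuple(recent_clicks))
    return docs.most_common(1)[0][0] if docs else None


suggestion = suggest(["settings", "connections", "settings"])
print(suggestion)
```

A user going in circles between settings and connections would get the page that most earlier users ended up on, which is the "are you maybe looking for this?" pop-up described above.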
RVB: 07:32.834 Yeah, I understand. So why is that such a good use case for a graph database? Is that because of the pattern recognition, or what's the secret sauce? 
KVT: 07:44.973 So it's the pattern recognition. So I think CMSs are really good at storing data in a-- storing similarly structured information because most CMSs use SQL databases and they're pretty good at that, just building up a content model and then reusing that over and over again. But being able to recognise behaviour-- well, that's not something that we are normally doing in the CMS space. We have some very basic things, like there's some recommendation based on the content and shared keywords and things like that, but behaviour analysis is not one of the things that you normally find in the CMS. So for that, we need different technology because in a SQL database you would have to do so many joins to even figure out what's going on, yeah, that I don't think that it would make sense to do it that way. And ideally, it would be a system that you don't have to program everything but that it can start looking for patterns on its own eventually. And that you build this graph of interactions and content and kind of like a graph that combines those two to do things with that. So yeah. 
RVB: 09:04.602 So where are you guys with this? How far along that path are you? I know you've done some prototyping already, right? 
KVT: 09:11.567 Yeah. So we are very, very early. So our main business right now is developer portals. So two years ago we started working-- well, a year and a half ago we started working with Apigee, that's now part of Google, and they have a developer portal that we are customising for their customers. And we built this whole business around documentation, specifically about APIs, so that's where our core focus is right now. And so the AI and personalised documentation is something that we're doing research on. So the thing we've done currently is we've built a connector for Drupal for Neo - I did a talk about that at FOSDEM - and that was-- 
RVB: 10:02.991 I went to that one, yeah. 
KVT: 10:04.291 Yeah. So that talk was not just about this use case. It was about what could you do if you combine a CMS and a graph database and looking at it from an added-value perspective, rather than a replacement perspective. Because I know that in the DO community people are like, "Just get rid of the stupid SQL databases [laughter]." They're worthless and graph databases can do everything so much better. I think--
RVB: 10:37.056 That's a pipe dream in my opinion. 
KVT: 10:38.368 Probably. You could build a CMS graph database, and I think that could work. But I think that there's so much existing technology already where it's a large amount of extensions and huge communities that it would make more sense to create an add-on instead of a replacement because if you replace it then you have to rewrite everything. 
RVB: 11:05.120 I couldn't agree more. 
KVT: 11:06.124 Yeah. So that's why I think that's the sweet spot for Neo in the CMS community but I think there's two aspects to this. One is the sweet spot for Neo in the CMS community, and that could be recommendation and pattern finding. But then there's also the inverse that you could think about and that's what if you were to put an open source CMS like Drupal in front of a graph database and use it as an interface to manipulate the graph and to add, maybe, some structured objects into your graph? And then use the CMS to build reports about those objects and the graph to find out which ones you're going to put into your reports. So that was what my talk was about. 
RVB: 11:57.922 Well, you've already touched on my last question, which is what does the future hold [laughter]? What could we do in the future? And I know that we'll be doing some meet-ups together and I'm really looking forward to those, but where does this go, Kristof? What's in your crystal ball? 
KVT: 12:21.174 So I love thinking about the future. I really love Kevin Kelly's book, The Inevitable. And in that book, he talked about-- I think this is the basic pattern that got me thinking about this, also. He talks about flowing and it's a very, very interesting concept that we're moving from an Internet where we used to have documents to an Internet where we have pages today, where we'll have flows of information tomorrow. And this idea of going from having an object that's structured and it has a context-- has a manual context, or a book context, or a document's context where you put all the information in context of the rest of the book into a very rigid structure. That's how we used to do things. That's how books and manuals were built, even when printing press-- even before the printing press was invented. And what the Internet has been doing, and what search engines have been doing, is that we've been moving towards pages where you can just dive into any object-- sorry, any document, any book, and just find out one page where a certain concept is explained. So you can just jump in. You don't have to read the whole book to be able to understand something. And that's where we are today. But I think the next step in this process, and it's also what Kevin Kelly talks about, is flows, where you have a flow of information that's much more personalised, and we're just constantly dipping in and out of these information flows around us that are serving us the documentation that we need at a certain time to be able to do what we need to do and that are aware of our contexts so that we don't have to adjust to the context of the documentation, but the documentation adjusts to our own personal context, and I think-- yeah? 
RVB: 14:31.872 So what I'm hearing is you see this graph database integration and everything that you guys are building as a means to that end, to get there somewhere, somehow, to get closer to it. 

KVT: 14:44.560 Yeah. So we have a first customer where I've been talking about this concept, and-- they're an SaaS company. So what I imagine is that we could track users, the administrators, as they're interacting with the software, and then basically serve them the contents this way where you look at their whole experience inside of your tool, and then you serve them the information they need to be able to interact better and get more value out of your system. So it's kind of like the idea-- the way that I describe it is going from the context of the manual to the context of the one, like the one person, one single user and how they are interacting with the system. This is very, very-- there's a lot of work to get here [laughter]. But I think that we can take baby steps, start with a first implementation. Start with building a graph of the behaviour and how people interact with documentation and with the tools that are documented by the documentation and then use that to start recommending content. And yeah, I'm really excited about it. We started a mailing list about it at one of the meet-ups where I was presenting. We actually had one of the people that worked on Clippy years and years ago at Microsoft who was also really excited about the idea. Because I think this is actually what Clippy wanted to do, or wanted to be, but it was not possible. And I think that graph databases could be the piece of technology that enables the dream of Clippy [laughter]. 
RVB: 16:40.452 Well, I think on that bombshell [laughter], I think that's a great time to kind of wrap up this podcast. Thank you so much for coming online, Kristof, and we'll be publishing some more details around your work and also the talks that you've been doing with the transcription of the podcast so people can read up about it. And I look forward to seeing you at one of our meet-ups, right? Because we'll be doing some community work together in the next couple of months as well. So really looking forward to that. 
KVT: 17:12.653 Likewise. 
RVB: 17:13.552 Thank you so much. Have a nice day, Kristof. 
KVT: 17:16.253 Yeah, you too. 
RVB: 17:16.990 Bye. 
KVT: 17:17.383 Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best


Thursday, 23 February 2017

Podcast Interview with Gábor Szárnyas, Budapest University of Technology and Economics

Waw. That was probably the longest stretch that I went without publishing blogposts or podcasts over here. I have no real excuse - the start of 2017 has just been super busy and interesting - with a lot of travel that does not really help with quiet "writing" time. But it's all great fun - I just need to get back into the rhythm - and today is the start of that.

Today's podcast is actually super cool. It started at a beautiful Brussels bar after Fosdem. At this conference, there have been "graph devrooms" hosted for the past couple of years - and this year it was a really nice lineup.  One of the speakers, Gábor, did this really interesting talk about "Graph Incremental Queries with OpenCypher", which is really cool. So after the conference, it turned out we share a passion for cycling too - and we decided to get together for a nice recording. Here it is:

Here's the transcript of our conversation:
RVB: 00:04.202 Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology and I must confess I feel very, very guilty now because this is the first time that I'll be recording a podcast in 2017, so happy new year. In spite of the fact that it's Valentine's Day. But yeah, I was slacking a little bit but I want to bring the podcast back to life and I've lined up a bunch of people to help me with that. And today I've invited someone who I only met like two weeks ago at the FOSDEM Conference in Brussels. And that's Gábor Szárnyas from Budapest. Hi Gábor. 
GS: 00:42.680 Hi Rik. Nice to be here. 
RVB: 00:43.500 Hey. Thank you for joining me. It was a great time meeting you in Brussels over some Brussels beer, but yeah we talked to each other about your work and I thought it would be great to have you on the podcast. So my first question is going to be who are you, and what do you do? What's your relationship to the wonderful world of graphs? 
GS: 01:10.158 Okay. So I'm a researcher at Budapest University of Technology and Economics. And also a visiting researcher at McGill University in Canada. Now I'm working on finalizing my PhD, so hopefully I will finish it within a year or a half. And I worked basically on graph-related topics in my PhD. 
RVB: 01:33.134 Oh, very cool. And don't forget you share another passion with me. 
GS: 01:38.380 Yeah, I'm also a cyclist. 
RVB: 01:40.152 Yes, exactly. 
GS: 01:40.729 So I started road cycling three years ago and it absolutely won me over. I really like cycling-- 
RVB: 01:49.279 Same for me...
GS: 01:50.351 --and that's my main passion. 
RVB: 01:51.948 Same for me. We have a couple of other graphistas that are super passionate about cycling so we'll have to do a ride sometime. But tell us-- 
GS: 01:59.412 I agree. 
RVB: 01:59.558 --a little bit more about your work with graphs. What's it all about, what's your PhD about, and what are you working on? 
GS: 02:07.503 Okay. So my PhD revolves around three topics that are related to graphs. The first one is how to incrementally query graphs. So imagine that you have a complex query and you have a huge graph. Now obviously, it's very difficult to evaluate a query on the graph at a very short amount of time. So basically, as a workaround, we do incremental queries, which means that if your graph changes slightly then we maintain the result sets. And this is useful for a number of scenarios. You can use it for static analysis of code bases, you can use it for runtime modelling, you can use it for fraud detection, and so on. There are many use cases that present this scenario. 
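As a rough illustration of what "maintaining the result set" means, here is a minimal Python sketch. It is not Gábor's engine: the `TwoHopView` class and the choice of query (all two-hop paths a -> b -> c) are assumptions picked to keep the example small. The point is that each edge change only touches the matches that involve that edge, instead of re-running the query over the whole graph.

```python
from collections import defaultdict


class TwoHopView:
    """Maintains all paths a -> b -> c as edges are added or removed."""

    def __init__(self):
        self.out = defaultdict(set)  # node -> successors
        self.inc = defaultdict(set)  # node -> predecessors
        self.results = set()         # materialised (a, b, c) matches

    def _delta(self, u, v):
        # Matches that use the edge u -> v as their first or second hop.
        first = {(u, v, c) for c in self.out[v]}
        second = {(a, u, v) for a in self.inc[u]}
        return first | second

    def add_edge(self, u, v):
        self.out[u].add(v)
        self.inc[v].add(u)
        self.results |= self._delta(u, v)

    def remove_edge(self, u, v):
        # Compute the affected matches while the edge is still present.
        self.results -= self._delta(u, v)
        self.out[u].discard(v)
        self.inc[v].discard(u)


view = TwoHopView()
view.add_edge("a", "b")
view.add_edge("b", "c")
view.add_edge("c", "d")
after_inserts = set(view.results)

view.remove_edge("a", "b")
after_removal = set(view.results)
print(after_inserts, after_removal)
```

The trade-off is memory for latency: the view stores every current match, so reading the result is immediate, and each update costs time proportional to the neighbourhood of the changed edge.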
GS: 02:52.025 The second topic of my PhD is how to benchmark an incremental graph query engine. Because, obviously, once you have an incremental graph query engine, you would like to have some feedback on its performance. And you would like to use that to continuously improve your query engine. So, with my research group, we designed and implemented a framework that allows users to do just that. Compare incremental graph query solutions to each other and to other competitors. 
GS: 03:22.765 And the third one-- yes? 
RVB: 03:22.870 Is that related to the LDBC work, the Linked Data Benchmarking Council, is that related to that? 
GS: 03:30.529 So basically they have similar goals. I was actually at Walldorf last week at the LDBC Technical User Community Meeting. And LDBC has a couple of benchmarks, but currently none of those covers incremental graph queries and complex graph pattern matching. I talked to the LDBC guys and also attended the talks, and it seemed that there will be a new LDBC benchmark, which will have a similar goal to my benchmark. And that will be called the Business Intelligence workload for the Social Network Benchmark. And the problem with that is that it's not yet ready. So I talked to its core developer, Alex Averbuch, and he said that it will be ready within half a year but they are still heavily working on it. 
RVB: 04:29.082 Okay. But you had said that you had three goals, right? You had the incremental queries and then the benchmarking and what was the third one? 
GS: 04:34.976 The third one is closely related to network theory. Network theory is something that came up in the late '90s and the early noughties when people started to analyze graphs. So they took a graph of people where the nodes were the people in a community and the relationships were if they were friends or not. Or they took the graph of the World Wide Web where the nodes were the web pages and the relationships were the links between the web pages. So they took all these graphs and started to analyze them, and they derived very interesting properties, chief among which was the scale-free property of graphs. There are many papers on scale-free networks, and they discovered that this is very common in biology, in sociology, also in physics and other sciences. 
RVB: 05:28.488 What does that mean, scale-free networks? What does that mean?
GS: 05:30.744 So basically scale-free network means that the degree distribution of the nodes follows the so-called power law. So you have very few central hubs. And basically, if you remove these hubs from the network then your network will break down to smaller components. And they discovered that this is how societies are organized, this is how citation networks work, and this is how power grids work as well. 
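The hub effect described here can be illustrated with a small Python sketch. The edge list is a hypothetical toy network, far too small to show an actual power law; it only demonstrates the two measurements involved: a degree histogram, and the number of connected components before and after the highest-degree node is removed.

```python
from collections import Counter, defaultdict

# Hypothetical toy network: one hub (node 0) linking three small clusters.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6),
         (1, 2), (3, 4), (5, 6)]


def adjacency(edge_list):
    """Build an undirected adjacency map from a list of edges."""
    adj = defaultdict(set)
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def components(adj):
    """Count connected components with an iterative depth-first search."""
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adj[node] - seen)
    return count


adj = adjacency(edges)
degree_histogram = Counter(len(nbrs) for nbrs in adj.values())  # degree -> node count
hub = max(adj, key=lambda n: len(adj[n]))

# Remove the hub and see how the network falls apart.
without_hub = adjacency([(u, v) for u, v in edges if hub not in (u, v)])
before, after = components(adj), components(without_hub)
print(degree_histogram, hub, before, after)
```

In a real scale-free network the histogram would have a long tail, with the number of nodes of degree k falling off roughly as a power of k, and removing the few highest-degree nodes has the same fragmenting effect on a much larger scale.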
RVB: 06:00.783 Oh wow. Just like a universal structural characteristic of lots of networks. 
GS: 06:06.958 Yes, lots of networks. Obviously you cannot apply it to all of the networks but it was a very big surprise to the scientists who worked on it that a lot of networks exhibited this property. So how does my PhD research relate to that? Well interestingly, there wasn't much work performed on typed graphs. So if you see Neo4j graphs, you obviously see that you don't only have people and websites and books, but you have all these in a single graph. So you have typed graphs, and they also have different relationships between them. And only in the last five to ten years has there been research about how to characterise these graphs. These have many interesting names. Some people call them the multiplex networks, others call them the multidimensional networks or multilayered networks. Analysing these is very tricky because obviously you have another dimension of complexity by having to deal with all the types of the nodes and the relationships in the graphs, but it's kind of a green area and you can do a lot of interesting work in it. I actually applied it to engineering models, so my research group works in model driven engineering. And there are engineering models for software, hardware, state machines, system design and so on. And basically we took all these models and analyzed them and we looked for some interesting properties. 
RVB: 07:58.123 Wow. 
GS: 07:59.168 We didn't find any huge results so we didn't find that these models are scale-free or they follow some very famous distribution. But we did have some interesting results on how to characterize these models. 
RVB: 08:18.190 Wow, very cool. So could you tell us a little bit more about how you got into the graph business, or the graph science if I may call it that way? How did you get into it, and why did you get into? 
GS: 08:35.661 Okay. Well, that's an interesting question. I think it started in 2011 when I had to pick my first individual research topic at my university, and my roommate suggested that I should give a try to NoSQL databases. I was already very interested in anything that's related to databases, relational or not. So I started to work on NoSQL databases. And then I soon discovered Neo4j and the property graph data model. And I think what really struck me is how intuitive the graph data model is. There is actually a paper by Marko Rodriguez, who was the implementer of the TinkerPop framework, and he said that graphs are very intuitive because they describe the way that people think about the world. So people tend to abstract the world as things that are somehow connected. And you can perfectly describe this with graph nodes and graph relationships. So this is something I really like about graphs. And that's something that you also mentioned in this podcast, I think a couple of times, that you can use a whiteboard and then just start brainstorming, and having ideas, and drawing a graph. And you can use pretty much the same graph in your applications as well. So that's my favourite thing. 
RVB: 10:07.046 Jokingly, I always talk about my own acronym, which is WYDIWYS, what you draw is what you store. 
GS: 10:14.439 Yeah, that's a catchy acronym actually. 
RVB: 10:18.913 It's been repeated so many times on this podcast but it is a very big strength of graphs, right? The model is so intuitive and so descriptive, so rich, really. That makes a whole lot of difference, right? So I'm reading that that's also how you got into it, right? That's also why you think it's very valuable? Is that right? 
GS: 10:43.860 Yes. So basically after I got a bit familiar with the topic, I started my master's at university. And already during my master's I was working on the incremental query engine that I'm still working on today. So it's quite a long project. I've been doing this for five-plus years. And I really liked my experience during the master's so I joined the PhD and I just finished PhD school three weeks ago. So now it's only-- 
RVB: 11:11.500 Congratulations [laughter]. 
GS: 11:13.087 Thank you. So it's only up to me to publish some more papers and polish the dissertation. 
RVB: 11:21.283 So what does the future hold, Gabor? Where is it going for you personally? Where is your research taking you, but also how do you look at this taking ground in the broader industry? What does the future hold if you had a crystal ball? 
GS: 11:36.571 So, I would really like to be an academic. I really enjoy working at university because you have so many positive experiences with students. You can pretty much follow your own dreams and do research in almost whatever interests you the most. Obviously you have to fit within your grant proposals and your funding but this still gives you a lot of room to be creative and I would like to be a university lecturer and researcher in the future. So that's my kind of dream career. And-- yes? 
RVB: 12:17.317 And is it lecturing and teaching about graphs then or is it on a broader topic or is it computer science or what will be the topic then? Or topics? 
GS: 12:26.893 Well, I'm pretty much happy to teach anything related to computer science, so I've taught topics from database theory to automata theory, system modelling, and software engineering topics, and also some laboratories on actual technologies. So our university is a bit of a mix between computer science and computer engineering. So we teach both theoretical and practical stuff and this is something that I also really enjoy. 
RVB: 13:01.647 Super. And what about the wonderful world of graphs and graph databases, is there anything like that in your future you think? 
GS: 13:10.251 Yes. So I really would like to get a version of my graph query engine that can be used by other researchers. I obviously understand that implementing production-grade software is not really possible within the limits of a PhD. But I would like to release a system that can be used at least by other researchers, both in academia and in industry. I talked to a lot of people about this and it seemed that people would actually be interested in trying such a system, or benchmarking such a system, and seeing how it works for their use cases. 
RVB: 13:49.818 Super. So final question, what's your favourite cycling destination? 
GS: 13:54.706 Ooh, that's a tricky question [laughter]. 
RVB: 13:56.737 Curveball for you. 
GS: 13:56.958 But actually, it's not a very common answer. I live next to the Hungarian-Austrian border, so I do go a lot to Austria because Austria has the best roads in Europe, and also most of the country is the Alps. So I live next to the lower Alps section, but even there you have very nice hills, and drivers are really polite, and you have this super flat tarmac all over the country. And that's what I really enjoy and I'm really looking forward to the summer. So I just usually disappear from the university for a couple of weeks and then go home and cycle. 
RVB: 14:38.375 Excellent. So no cobblestones for you? Unlike Flanders Classics or something like that? 
GS: 14:44.387 I actually really like riding the [inaudible], so I live in the inner historical district of Budapest and we still have a lot of cobblestone roads. And when I just started cycling in Budapest just to get to work and commute I usually tended to avoid those sections. But since I'm more into cycling I just go for the most cobblestoney sections [laughter]. This is something that you learn to enjoy or at least you think you enjoy it. 
RVB: 15:16.963 Yeah, yeah. Exactly. Very, very cool. All right. Well, I hope we get to ride one day together, that would be great. I really enjoyed this conversation. Thank you for taking the time. And I look forward to meeting you again someday, at FOSDEM or somewhere else. 
GS: 15:32.360 Thank you for the invitation, and we should definitely go for a ride. 
RVB: 15:36.138 Absolutely. Thank you, Gábor. 
GS: 15:38.717 Thanks. Bye

Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best


Friday, 23 December 2016

Podcast Interview with Emil Eifrem, Neo Technology

In the summer of 2015, 5-6 months after first starting this crazy podcast thing with Michael and Mark at QCon London, I finally got my boss and friend Emil Eifrem, CEO of Neo Technology, to spend some time with me on this podcast. It was a great conversation, and I still smile thinking about the silly drumroll that we used. But just before we wrap up 2016, it felt like the right thing to get Emil back on the podcast, and talk about "stuff". Here's that conversation - a little longer than usual, but totally worth it.

Here's the transcript of our conversation:
RVB: 00:02.909 Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology. And here I am again. And I'm so excited, I can barely restrain myself. It's my “über boss” on the phone again. It's been 18 months since the last interview, and here I have him back on the podcast. Emil Eifrem. Hi, Emil. 
EE: 00:21.803 Hi Rik. Thanks for finally inviting me back.