Friday 16 July 2021

Graphistania 2.0, Episode 15: The Summer Session with Emil

Well this makes me very happy: just before many of us are taking some summer vacations, and ON THE DAY OF MY 2ND VACCINATION SHOT, I am able to publish another Graphistania podcast episode - interviewing my friend and boss (how awesome is it to be able to say that!) Emil Eifrem. We talk about the world, the graph database market, Neo4j the company, and of course, the products. It was a ton of fun, and I even got Emil to agree to publishing the video recording too :) ... Hope you enjoy it as much as I did. Here goes:

And of course the video:

Here's the transcript of our conversation:

RVB 00:00:01.709 So have I got your consent to record, please?

EE 00:00:07.850 Fine. [laughter] I want to put it on the record that you have my consent to record this and release the audio.

RVB 00:00:17.102 [laughter] Yes. Thank you. Thank you.

EE 00:00:19.920 But actually, if you want to release the video too, that's okay as well. I feel like it's a good hair day today. [laughter] It's less of a hair day each day [crosstalk].

RVB 00:00:36.786 Yes, exactly.

EE 00:00:38.073 It's [that?] decade of my life where each day is less hair day than before. Remember when we started working together, Rik? We were so young and good-looking back then.

RVB 00:00:53.102 So long ago. All right. Let's do it.


Hello, Emil. My name is Rik, Rik Van Bruggen from Neo4j, and I am here recording another podcast - or should I call it vodcast - with my dear friend and fearless leader, Emil. Hi, Emil. How are you?

EE 00:01:13.715 Hello, hello, Rik. That was a real #dadjoke. You got that one right.

RVB 00:01:20.193 You know me. I do that all the time. Come on. [laughter] That's why I have kids, for the dad jokes.

EE 00:01:25.672 Yeah, I can hear the painful groans in the background.

RVB 00:01:32.215 So it's great to have you back on this podcast, and we kept our word. We didn't have as long of a break between our sessions.

EE 00:01:43.655 Finally. It's been like half a year, right?

RVB 00:01:48.123 Yeah, half a year, yeah, December last year, so good times. And again, what a time it has been. Quite a journey again, the past six, seven months. So I thought it would be cool to have another chat. Lots of things to talk about. We'll try to keep it at a reasonable time scale, but let's talk a little bit about the world outside first, maybe. How have you been living through the-- should I call it the remainder of the pandemic? No, how has it been going for you?

EE 00:02:24.280 Yeah, maybe let's be hopeful and say this is the tail-end of the pandemic or something like that. Yeah, I guess, when we recorded, I think it was pre-vaccine, right? Pre-invention of the vaccines, right? I think it was probably December. Keep me honest here.

RVB 00:02:38.684 [crosstalk].

EE 00:02:39.554 December, yeah, exactly. So it was before all that had been announced, right? So I guess we didn't see the light at the end of the tunnel as clearly as just a month later. It's been fascinating, right, because it turns out that maybe the death rattle of the pandemic was longer than at least I had thought. Once the vaccine was announced, I'm like, "F yeah. We're going at it. I'm going back home. I'm going to go out to pubs. I'm going to grab coffee with strangers." And that didn't quite unfold that way. But I think, at the highest level, it's been this weird kind of contradictory thing in my head, and probably in many people's heads here at Neo4j because, as tragic as this has been-- and it truly has been tragic in ways, the pandemic, right, that we don't even know yet, with kids who've been homeschooled for a year and all of that stuff, right? So the very visible kind of physical health problems, but then also much more invisible things like mental health and whatnot that we might not see-- the repercussions might not be known for decades, right?

EE 00:03:53.828 So, as tragic as that has been, kind of our business has been going so great at the same time, and I almost feel guilty saying that, right, like we're profiteering off of something. It's obviously true for not all but for a lot of technology companies, right? And so that has been a little bit of kind of this conundrum in [crosstalk]--

RVB 00:04:16.122 A real mind twister, right?

EE 00:04:18.204 It really has been, right? And while, of course, we've struggled like everyone else from a personal and practical perspective, this really has accelerated graph adoption, right, really, across the board, right, for many, many use cases in many, many verticals, right? And because we're such a horizontal business, we have exposure to industries that were really dramatically hit. We had exposure to hospitality, one of the-- actually, the biggest hotel chain in the world, Marriott, is a big customer of ours, right? We have exposure to airlines, for example, and travel, and transportation, cruise lines. Those were really, really badly hit, of course, by the pandemic. But then, that was more than well-compensated for with all the other verticals, where technology adoption, cloud adoption, and just adoption of data-driven applications far outweighed some of the slowdown in the few other verticals. So that, I think, has been maybe the high-order bit for us as a company in the broader context of the world. I don't know if you-- do you agree with that, Rik?

RVB 00:05:38.254 I do agree with that. At a personal level, obviously, so many things have happened. We actually lost a dear colleague during the pandemic, right, in the first half of this year. But also, things like working at home, these four walls, your sauna. [laughter] We have to adapt to that, right? And I'm actually wondering right now, what's it going to be like when we open up again? I know, for example, at our London office, there's 40 seats there, but by now, we have like 90 employees there or whatever the number is. How is that going to work, right?

EE 00:06:24.311 Well, and I'm wondering on much more kind of trivial stuff, like how does one behave when you talk to people in the real world? How do you say hi? I remember people were-- I think they called it shaking hands. Yeah, they reached out their hand and they shook-- so that was kind of some weird thing that I vaguely recall. I remember wearing pants going into the office. I haven't done that for 18 months. And when I'm out now in the real world, in the 3D world, I keep looking for the Zoom box where it says people's names, and it's not there, right? And so I'm just thinking through all these kind of practical, "How do you actually behave?" [crosstalk]--

RVB 00:07:09.290 I think, actually, on the handshake thing, when we did our last physical event, which was, I think, in London, March 2020, there was a big thing about the Wuhan Shake, which was you give your feet instead of your hands.

EE 00:07:24.579 Oh, that's right. [laughter] Oh, man.

RVB 00:07:28.578 That's how it started 18 months ago.

EE 00:07:32.187 That's crazy.

RVB 00:07:32.668 Absolutely crazy.

EE 00:07:34.468 People might not know listening to the podcast. At that time, the London GraphTour, I think first week of March or maybe second week of March of 2020, we were 300 people in the company. And look; if we want to be very strict about it, the pandemic will probably never end. COVID will never kind of go away, right? But by the time we are, in all geos, back in the office - who knows when that's going to be, but - it's probably not fully back until the end of the year in all geographies, right? It might even be longer, right? By the end of the year, we're going to be 600 people. So you just think about that, right? So, going from 300 to 600, so we've doubled in size, and so over half the company-- not only will half the company never have met everyone-- or half the company will never have met anyone face-to-face. And that's a crazy thing. And you know, internally, Rik, and hopefully, people can feel it; we're so relationship-centric at Neo4j, right? And man, I love-- relationships over video call is better than over a phone call, which is better than over email, and that kind of stuff, but the best thing is there's a step-function improvement with face-to-face. There just is, right? And that, I think, it's just kind of a shocking view of just how much the pandemic has changed us in the past 18 months, or changed running the company.

RVB 00:09:20.887 I mean, I see that in very real ways on the sales team, right? Selling over a video call or selling face-to-face, very different. Very, very different. I mean, it's not less successful. I can't say that at all, right? But it is very different, and--

EE 00:09:42.063 At least not in the short term. I wonder, it's a thing like, when it first initially happened, then, on the engineering side, we didn't skip a beat, right, because engineering was already very used to [peer?]-programming across offices and so on and so forth, right? But I wonder about-- there's got to be a long-term effect of not having those real relationships. I'm not going to say it as dramatic as a breakdown of trust, but trust is such a valuable currency, and I think it's at least partially built in face-to-face interaction, or at least it's accelerated by that or something like that, right?

RVB 00:10:24.330 My current hunch is that, for example, in the sales team, it's actually-- the trust is being built much more slowly. If you're building it face-to-face, I can actually build a relationship or have a real conversation with someone in, I want to say, a two-hour meeting face-to-face. On a Zoom call, it's going to take me four or five Zoom calls [crosstalk], and that just makes it difficult. But can we talk a little bit about the effect on the company and our position in the industry as well? I mean, I obviously saw your keynote. [laughter]

EE 00:11:07.162 You have to say that, right, because this is being recorded.

RVB 00:11:10.483 I have to say that. At the NODES conference in June. What an announcement was that! I mean, normally, we would have like five conferences and five different announcements, and now, we have one. I mean, that was crazy. It was the funding round. There was the trillion-relationship graph. There was a new product release. It was just crazy, I mean, all those things together. How did you do that? [laughter]

EE 00:11:40.872 It was a ton of fun to put that together, right? And I guess, a few observations, right? Maybe two of the most common misconceptions or myths or just statements about Neo4j and the graph database space, ever since we got started, so for 10-plus years, has been a few things, right? Let's take them one at a time, right? One is graphs are niche. Graphs are useful, but only for a few small things. And you would hear this from the most thoughtful, credible, deeply technical people, the founders, CTOs of other database companies, right? Just some really, really absolute world-class experts at data, databases, software development, right, they would still come out, look at graph databases, and say-- it used to be-- 10 years ago, it was like, "Yeah, Neo4j, really great technology, good for social networks, and that's it." And five years down the line, they were like, "Yeah, good for social networks and recommendations and fraud detection. Oh, and identity and access management. That's what it's good for." And then, a few years later-- and so they add use cases every time, but they're lagging what we see internally by five years or something like this, right? And fast forward to where we are today, it used to be NoSQL had 30, 40, 50 companies that got real funding from good investors and that kind of stuff, right? Walking into the previous decade, there was this massive divergence, this massive explosion, this experimental phase where we went from 4, 5 relational databases to choose from to over 350 databases on DB-Engines today.

EE 00:13:55.685 But in the past few years, it's converged down to just a handful of companies that have achieved scale, are growing fast, are on that path to IPO. We're talking three, four, five at most. And obviously, Neo4j and graphs is one of them. And the fact that we came out and we raised literally the biggest round in database history, that got a lot of heads to turn. Even really clueful people, again, who were close to the space, close to the broader database space, were, I think, taken aback by the fact that, "Wow, graph databases probably aren't as niche as we thought." And of course, what we know on the inside and what I believe is that they have not seen anything yet because they still-- if we come back to what we recorded six months ago, we talked about the drivers of graph database adoption, and we talked about how performance is a key driver for it, right, because performance addresses an antibiotics-level pain, right? And so, if you want to do a recommendation engine - other people blah-blah-blah, other people who bought this also bought that - or capture fraud rings, or do something that requires many, many hops in a connected dataset, and you want to do that in real-time, in particular, with concurrent updates to that, there's nothing else you can do. The only thing you can do is reinvent some kind of a graph database, do a poor man's version of that, probably in memory, right? That's the only alternative you have, right? And that performance driver is massive, right, and it's huge.
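An illustrative aside, not part of the conversation: the "other people who bought this also bought that" query Emil mentions is a two-hop walk over connected data. The sketch below is a minimal in-memory version of that idea, in the spirit of the "poor man's" alternative he describes. The purchase data and the function name `also_bought` are made up for illustration; this is not how Neo4j implements recommendations.

```python
from collections import Counter

# Hypothetical purchase data: customer -> set of products bought.
purchases = {
    "ann": {"ipa", "stout", "tripel"},
    "bob": {"ipa", "tripel"},
    "cat": {"ipa", "lager"},
    "dan": {"stout"},
}

def also_bought(product, purchases, top_n=3):
    """Two-hop traversal: product -> its buyers -> their other products."""
    counts = Counter()
    for customer, basket in purchases.items():
        if product in basket:                 # hop 1: product to buyer
            for other in basket - {product}:  # hop 2: buyer to other products
                counts[other] += 1
    return counts.most_common(top_n)

recommendations = also_bought("ipa", purchases)
```

Here `recommendations` ranks "tripel" first, since two of the three "ipa" buyers also bought it. Note that this version scans every customer on each call and takes no account of concurrent updates, which is exactly the part that gets hard at scale.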

EE 00:15:34.444 And so what I see people doing now when they look at the graph space is that they approximate it with our performance-based use cases. So that's the recommendations. That's the fraud detection. That's [inaudible]. And because of the growth of connected-- because the world is becoming increasingly connected and datasets are becoming increasingly connected, we see more and more of those use cases every year, right? So, for example, when I first got started with this, supply chain was not a use case, right, because if you took a random company that was shipping stuff, they would have a two, three levels deep supply chain, right, which means you can't-- if you want to digitalise that, you can put it in your relational database and then there you have it, right? Today, if you're any company that is manufacturing, that is shipping physical goods, producing physical goods, you tap into this worldwide, global mesh spanning continent to continent that is not 2, 3; it's 20, 30 levels deep, right? And all of a sudden, the Suez Canal gets blocked for a week, right, by a ship, right? And then, you need to figure out, how does that cascade across my universe, right? How does that affect me and my customers? And all of a sudden, that's 20, 30 levels deep, and you have to use a graph database, right? That's a clear example of a use case that existed before, but you could use a relational database. Today, you have to use a graph database. So popping back up, when people have looked at the graph space, they extrapolated based on several more of these, what's called the performance-based use cases. And that alone is increasing in depth and breadth, right, so much so that you can see this becoming a significant part of the new database landscape.
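An illustrative aside, not part of the conversation: the "how does that cascade across my universe" question is a reachability query over a supply graph. A minimal sketch, assuming a tiny made-up supply chain (all names below are hypothetical), is a breadth-first search that records how many hops away each affected party sits:

```python
from collections import deque

# Hypothetical supply chain: supplier -> parties it ships to directly.
ships_to = {
    "suez_carrier": ["port_rotterdam"],
    "port_rotterdam": ["parts_maker"],
    "parts_maker": ["assembler_a", "assembler_b"],
    "assembler_a": ["retailer"],
    "assembler_b": [],
    "retailer": [],
    "unrelated_farm": ["retailer"],
}

def downstream_impact(start, ships_to):
    """Breadth-first search returning {affected party: hops from start}."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for customer in ships_to.get(node, []):
            if customer not in depth:      # visit each party only once
                depth[customer] = depth[node] + 1
                queue.append(customer)
    del depth[start]                       # the disrupted party is not "downstream" of itself
    return depth

impact = downstream_impact("suez_carrier", ships_to)
```

In this toy data the disruption reaches the retailer four hops out, while `unrelated_farm` is untouched. In a relational schema each extra level typically means another self-join, which is why the depth Emil describes (20 to 30 levels) changes the picture.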

EE 00:17:31.595 But that's only one of the three drivers of graph databases. When it comes to intuitiveness and flexibility and that developer productivity, there's all kinds of really amazing benefits to using a graph database when you have a thousand nodes, when you have a thousand records, as it were, right, just because it's more flexible. This comes back to your beer graph, right? It was never a billion-- sadly, we don't have a billion beers in the world. Yeah. And so it's not like-- but still, it's so valuable to look at that data in a graph view, very valuable, almost as valuable as consuming it, drinking the-- and I think that's a huge part that people underestimate when they try to look at the graph space. So, popping back up then, I think that the funding round, it really signalled to the world that the previous decade was set up. The 2020s is our decade to shine.

RVB 00:18:37.091 Very interesting. Emil, when people ask you about-- actually, I think you had a slide on it in the keynote; what are you going to do with the money? [laughter] What was it? Skyscraper?

EE 00:18:52.409 Yeah.

RVB 00:18:53.287 Buy a yacht or whatever.

EE 00:18:55.837 And sponsor the Formula 1 or something like that. [laughter]

RVB 00:18:59.688 [crosstalk]. Exactly. No, it should be a cycling team, obviously, but--

EE 00:19:03.992 Oh, obviously.

RVB 00:19:06.360 --your next line was product, product, product, right?

EE 00:19:11.079 Product, product, product. That is where I want to invest as much as we just can with this money. At the end of the day - you know this, Rik - we're a product company. We're a product-first company. We're a technology company. And that is where the rubber meets the road. Yes, we need amazing salespeople like yourself. Yes, we need marketing to get the word out. Yes, we even need some overhead sitting in [sauna?]-looking rooms, pontificating. But at the end of the day, the product is what matters, right? And we want to have, by far, just the best product in the space and beyond, right, in the graph data space, and broader. So that is priority number one and two and three.

RVB 00:19:59.880 Yeah, yeah. So that's a lot to do, and there was a great example with the trillion-relationship graph in the keynote, right? People should definitely go back and look at it. So maybe segueing a little bit on that topic, on the product, it's actually fascinating, I find, and it ties it back to the pandemic, that 4.3 release that we have out right now, it's the first release, I think, that was fully developed remotely, right, that we didn't have meeting rooms or stand-up sessions or whatever it was for our processes, whatever. That's interesting, right? Have we seen anything beyond the obvious features, quality-wise, or anything else that is impacted there?

EE 00:20:59.355 It's a good question. Here's kind of some of the challenges with engineering. But I guess this is even more broadly, is that you don't have the A/B test, right? And so you don't know. And in a fast-paced, high-growth environment-- we just talked about how we were 300 people when the pandemic started and, by the end of this year, we're going to be probably over 600 people. Growing so fast, it's very hard to measure output today in the pandemic, fully distributed, kind of that-- versus the release because there's 50 to 70 percent more engineers working on this release, right? And then--

RVB 00:21:37.765 [It's down?] to lines of code, [right?]?

EE 00:21:39.820 Yeah, exactly, exactly. And so it's very hard to do the A/B. My subjective assessment is-- it's back to what I said before. I feel like engineering hasn't slowed a beat, hasn't missed a beat. And the concern I would have would be more along the long-term effects, right, which I haven't seen. And I think we even mentioned it in the podcast. We started seeing some of that, maybe, end of last summer, I felt like, where it's like-- I could feel tensions were higher in the company, right? And this is also around the time where, globally but even more-- I think even more in the US, just the external world was on fire, right, and it was literally on fire in California with the California wildfires. There was also Black Lives Matter and that entire thing, right? And then you walk into the election cycle in the US. So it was just kind of a crazy, crazy time, right? And I could feel it internally in the company too, where I felt like emotions ran high and tensions were-- there was more tensions than before. I feel like that has really mellowed out since then. But my concern would come back to it doesn't feel sustainable, or at least maybe we need to rewire and reprogram completely. But somehow, adjust more for those real human relationships. But to answer your question specifically, for this release, I feel like it came together really, really well.

RVB 00:23:20.350 Yeah. The one area where I feel like there's some attention that we need to spend on it is on the customer side, where you get feedback from customers, they give you directions [crosstalk] in future releases, or those types of things, things that [you probably wouldn't care?] about during [crosstalk], you know what I mean?

EE 00:23:46.829 Yes.

RVB 00:23:47.730 Yeah, how do we get that type of thing?

EE 00:23:51.678 I think that's spot-on, and it's probably some-- so I think one of the things we're kind of missing kind of internally, organisationally, is kind of-- I mean, I guess it's the classic, stereotypical water cooler talk, right? When you bump into someone and you don't transact a business problem with them, but it's like, "How was your weekend? How are the kids? Did you like that-- you told me you were going to go watch a movie. Did you like that?" All that, kind of the social fabric, right? That stuff, right? And there's some equivalent with customers and product feedback, I feel like, right, where it's like we're going to get the high-order bits. If something is broken, if something is crashing, they're going to let us know, right? If we build something that's useless for them, they're not going to use it. So the high-order bit we're going to get, but some of those kind of smaller, "I'm not going to bring it up in a 30-minute Zoom call," but that might be really annoying, or it might be amazing and so valuable. Some of those smaller things, we're losing the fidelity of that conversation, I think, in some of the customer conversations.

RVB 00:25:04.975 There was one more question that I really wanted to ask you that came up on the keynote. The trillion-relationship graph; what was the AWS bill like?

EE 00:25:19.481 [laughter] So it's funny. I haven't dared to look at the final result, yeah? But when they first spun it up, it was-- and so maybe popping back up, before we get to the AWS bill, so we talked before about kind of misconceptions about-- I said there's maybe two broad-based misconceptions about Neo4j and the graph [crosstalk], right? And niche is one, and we talked about the funding and how I think that's-- it doesn't fix it or anything, but it's been kind of a start signal to people that that's not the case, right? And then the other one is graphs don't scale or Neo4j doesn't scale, right, and that's been kind of this other thing that I've heard so many times. And it's interesting, right, because it's partially true, that it is very hard to scale graphs because the standard way that you scale things in databases is you chop it up into pieces, right, and you put those pieces on a lot of servers, right? It's called sharding, right? And then, initially, we did that-- as an industry, we did that manually, right, with MySQL, for example, right? You took parts of the dataset and you put it over here. And you can imagine, if you have tabular data, right, you can either split it up - it's called vertical shard - by tables, or you can take the tables and you can split them up, and then it's kind of maybe by geography, or you have all the orders for North America over there and Europe over there and stuff like that, right? And initially, you did that manually, and then, there's a generation of databases that came out that did this automatically. It used to be called auto-sharding back then. Now, we've just simplified that and just call it sharding, right?

EE 00:27:10.478 And of course, the very nature of graph data is that it's deeply connected and related, right? And so you can't just easily chop it up and spread it out across a bunch of servers, right? If you do that, you're going to get the checkbox of sharding. It's the spray-and-pray approach. You just spread it out across many servers, and that works. Spreading it out works, but then, when you query it, you then have to hop around, maybe, if you're lucky, 1 server; more likely, 2, 3, 5, 10, 20, whatever, a lot of them, right? So your queries don't work, basically, right? The spray-and-pray approach. And there have been a lot of people who have kind of done that to get the checkbox, but even in the graph space, who have done that to get the checkbox so that they can claim that they sharded. Technically, they're right, but in terms of solving real problems for real customers, they're not, right, because it doesn't work. And what we did is that we built this internally. It's called the Neo4j Fabric architecture, and it's what's called graph-native sharding. So it's a way of sharding the graph that actually takes into consideration the shape of the data, right? And you have to do some more-- it's not magical. It's not a unicorn, solve all your problems with a-- you have to do some real work. You have to do some data-modelling, right, much in the way that, when you have a domain model, you need to figure out, "Should this be a relationship, or should this be a property?" Many times - you've seen this; you're hands-on - I could model this as a property, or I could model it as a relationship.
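An illustrative aside, not part of the conversation: the "spray-and-pray" problem Emil describes can be made concrete by counting how many relationships end up spanning two servers under different placements. The sketch below uses a made-up eight-person social graph and two hypothetical placement functions. It only illustrates why placement that ignores the shape of the data forces queries to hop between servers; it says nothing about how Neo4j Fabric is actually implemented.

```python
# Hypothetical social graph: two tight friend groups joined by one relationship.
edges = [
    (0, 1), (0, 2), (1, 2), (2, 3), (1, 3),   # group A: people 0-3
    (4, 5), (4, 6), (5, 6), (6, 7), (5, 7),   # group B: people 4-7
    (3, 4),                                   # the single link between groups
]

def cross_shard_edges(edges, shard_of):
    """Count relationships whose two endpoints live on different shards."""
    return sum(1 for a, b in edges if shard_of(a) != shard_of(b))

def naive_shard(person, num_shards=2):
    # Placement that ignores the graph's shape (here: modulo of the id).
    return person % num_shards

def shape_aware_shard(person):
    # Placement that keeps each friend group together (assumed known up front).
    return 0 if person < 4 else 1

naive_cut = cross_shard_edges(edges, naive_shard)
aware_cut = cross_shard_edges(edges, shape_aware_shard)
```

With two shards, the shape-blind placement splits 7 of the 11 relationships across servers, while the shape-aware one splits only 1. Every split relationship is a potential network hop during a traversal, which is the cost the data-modelling work Emil mentions is meant to avoid.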

EE 00:28:45.443 So you have to do some thinking like that. But what you get is this absolutely exceptional performance and scalability characteristics. So what I showed there in the demo was I-- there was a social graph demo, bigger than Facebook, 3 billion nodes-- 3 billion people, which is more than Facebook has today, and then, initially, 10 billion relationships running across 10 servers, right? And ran some really hard, weird, graphy, connected, real-world queries on that, and then it ran in less than 20 milliseconds. And then, of course, we 10Xed it again to 3 billion people but 100 billion relationships running across 100 servers. So this is 1 graph database, 100 shards, right, running in an Amazon data centre, executing the same queries, still less than 20 milliseconds. And then, of course, the big punchline was 10Xing it again. So a thousand shards, a thousand servers, right, a trillion relationships, right, running those same queries, still less than 20 milliseconds. And then, even the punchline on the punchline was that, then, we ran this graph global query that had to touch all the thousand shards to compute the result, and that one came back in less than a hundred milliseconds, right? It was just spectacular. And kind of the behind-the-scenes there is that, as we spun this up, the 1,000-- this is a real scale. We actually ran out of machines in the Amazon availability [zone?].

RVB 00:30:23.482 That's so [good?]. [laughter]

EE 00:30:25.637 "It didn't work," they told us. It's like, "Oh, no, it didn't work? What went wrong?" No, Amazon ran out of machines. And just to get back to your question then, the initial estimate there-- the initial one cost $4,000 per hour. So that's, obviously, almost $100,000 per day, right, which is kind of-- that tells you just what scale it ran. Then they were able to optimise it, so it became cheaper in the end. But still, it was a massive demo and a quite spectacular scalability involved.

RVB 00:31:01.353 [crosstalk]-- in the sales team, we get the question about scale so often, right, and the only truthful answer, in my opinion, has always been, "What do you mean? What do you mean with scale? What does that mean for you?" But it's so powerful, I think, to be able to say, "Well, we ran out of machines, so it must scale a little bit." It's kind of cool. I know that we're kind of running out of time here, Emil, but so many things to talk about. We should probably do it again sometime.

EE 00:31:40.034 We should do this again. I think our goal-- we should do this like every six months, but the real goal should be to do this with a Chimay or a Duvel or a Gulden Draak. Coffee's fantastic too, but it really should be a high-quality Belgian beer in hand. But that might not be the next six-- but maybe the next one after that. Maybe the summer 2022 podcast [crosstalk]--

RVB 00:32:05.933 You've got a date. You've got a date.

EE 00:32:07.094 --face-to-face with two Belgian beers in our hands.

RVB 00:32:11.496 Thank you again. It was super nice talking to you.

EE 00:32:13.648 Awesome, my friend.

RVB 00:32:14.771 And we will do it again. Have a wonderful summer.

EE 00:32:19.860 Thank you. Thanks, everyone, for paying attention.

RVB 00:32:21.815 Thank you. Bye.

EE 00:32:23.289 Bye.

Subscribing to the podcast is easy: just add the RSS feed, find the show on Spotify, or add us in iTunes! Hope you'll enjoy it!

All the best

