Monday, 3 September 2018

Podcast Interview with Johannes Unterstein, Neo4j


A couple of months ago, we had a great Online Meetup that was all about scaling out Neo4j using containerisation and container orchestration technologies. You can see the recording over here:


That was really cool, and a great excuse to invite my nowadays *colleague* Johannes Unterstein to the podcast. Johannes has a really interesting history and a lot of expertise in these technologies, and could really talk about them for our audience. So here's our chat:



Here's the transcript of our conversation:
RVB: 00:00:00.399 Hello, everyone. My name is Rik Van Bruggen from Neo4j, and here I am again after the holiday period recording another Graphistania podcast. And today I have the pleasure of welcoming one of my dear engineering colleagues on this podcast episode, and that's Johannes Unterstein from Germany. Hi, Johannes.
JU: 00:00:25.374 Hi, Rik. 
RVB: 00:00:26.621 Hey. Thank you for joining me. Really appreciate it. 
JU: 00:00:30.150 Thank you for inviting me. 
RVB: 00:00:31.615 Yeah, absolutely. Johannes, yeah, I know a little bit more about you now, but probably our listeners don't yet. Why don't you tell us who you are, and what do you do, and also don't forget to tell us about a little bit of your history and how you got to know Neo4j, right? 
JU: 00:00:49.304 Yeah. My name is Johannes. I'm working for the cloud team. So we're currently building a product for a managed Neo4j version, so you can go to neo4j.com/cloud and check it out. And, basically, what we're trying to do is bringing the Neo4j causal clusters in the cloud. And this means that the customer can go to website, register, and decide where he wants to run the Neo4j clusters, like in which cloud provider and in which region, and we're making sure that your cluster is always up and running and happy to host your graph. Yeah, so that's what I'm working on currently. Before I joined Neo4j, I worked at Mesosphere. This is the company who's mainly contributing to Apache Mesos. It's also a platform for container operations together with big data and fast data operations, and it's also shipping the product called DC/OS, which is like the bundle of all these technologies. 
RVB: 00:02:02.585 That sounds like a great fit, right? I mean, you're in the cloud offering, you probably have a need for some kind of containerization, and you work with Neo4j, so it sounds like a great combination, isn't it? 
JU: 00:02:20.435 Yeah. So I've known Neo4j for quite some time, so I met Michael Hunger-- most of your podcasts are mentioning Michael Hunger.
RVB: 00:02:31.562 Michael is the spider in the web, Johannes [laughter]... 
JU: 00:02:35.904 Yeah. So I met Michael Hunger, I think it was, 2012. So he was doing a tour through Germany visiting some Java User Groups, and I'm running a Java User Group in my hometown, Kassel. And he was in our user group, and we were so amazed by graphs. So we started a small side project doing a climbing database with graphs, and it was quite funny because Neo4j's great for it because we categorised a lot of climbing routes. And then you could find climbing partners in the same area as you and climbing similar routes, so this was quite funny. 
RVB: 00:03:17.409 You mean rock climbing - right? - or something like that? 
JU: 00:03:18.910 Yeah. Yeah, rock climbing, exactly. So this was super funny. And then I did project business back in the days for big German customers, and I tried to pitch Neo4j a couple of times. I had some success for internal projects, but my big German customers were mostly in the SQL world back in the days and didn't want to change that. But I was always a fan of Neo4j, but my interest for containers was also so big that I decided to go for Mesosphere a couple of years ago. And then I gave a talk at GraphConnect in London last year, and I met Ben, who is the team lead for Neo4j Cloud, after my talk, and we had a short conversation.

And then I met Ben again end of last year, and we talked about the challenges he's going to be facing with Neo4j Cloud. It was so amazing. And then we came to a conclusion that it would be a great fit for me to join the Neo4j Cloud team because it allowed me to work with graphs, that I'm so enjoying, and also work with containers and-- yeah, so it was quite amazing for me. 
RVB: 00:04:42.585 Super. Super. So maybe I can ask you a little bit about why you like graphs so much. What do you think is so attractive about it? And maybe you can also expand that and say why you like graphs in a containerized world so much. 
JU: 00:04:57.232 Yes. So when I started doing project business, I faced some hosting companies hosting the software we're developing for these big German customers, and they claimed that they were multi-database providers. And what they meant was that you could choose between Oracle 7, 8 and 9. And [laughter] this was quite challenging for us because they had also strict regulations on table naming. So you needed to prefix your actual table with like 10 characters for the project and then another 10 for whatever, and then you have 6 characters left for your actual naming scheme. And then everything had to be super normalised and blah, blah, blah. So this was quite painful because in this area, there were a bunch of people checking this bird-view architecture of your application, and they're checking your database schemes because that's one of the few things those people understand very well. And then you were forced to super normalise the database in order not to duplicate data, and then it's not performing because you actually want to duplicate data at some point because you want to have performance benefits. And this was quite annoying because we did really, really big applications for them, and they were sometimes so slow because of these restrictions. And we argued this would be so cool if we would have a technology where our relation would be a first-class citizen in the database. 
RVB: 00:06:45.174 I can feel where you're going with it [laughter]. 
JU: 00:06:47.896 Yeah. And then, when Michael came and just said, "Wow. Look, this is the amazing world of graphs, and you have the relations, and you can forget everything about joins. You can just traverse your graph." This felt so easy and so mature, but this was really amazing. And then we did some community work together with Michael. So my good friend, Sebastian, he's working on this print data integration for the Play Framework, and he maintained this for a couple of years now. So we kept in contact and did some community work. And then I joined Mesosphere, and Michael wrote me a message and said, "Well, look, you have this marketplace in these [inaudible]." So similar to an app store, maybe, for your firm, this [inaudible], like this data centre operating system, which is Mesosphere's offering, has an app store. I think it's called Catalog nowadays. It was called Universe back in the days, but I think it's now called Catalog. And what you can do is you can browse to your favourite application, like a Cassandra, or Kafka, or Neo4j. And you can browse and say, "Hey, I want to instal the Neo4j cluster into my whole data centre." And then you can do some basic configurations like how many nodes you want to start, how about your credentials, and how many CPU and memory you want to assign to your application. And you can double-click, and then it will instal a Neo4j causal cluster to you through your data centre. And then DC/OS will make sure to keep it running until you decide otherwise. And we did this package together, and I think we released it beginning of 2017, I think. 
RVB: 00:08:55.075 Yeah, they're still on that. You can see them there, yeah.
JU: 00:08:58.313 Yeah, yeah. So this was-- 
RVB: 00:09:02.233 Is it left in the community right now, or what's the status of it? 
JU: 00:09:09.664 I think it's slightly outdated currently. I think we need to lift it to the latest Neo4j version. So, currently, our focus is also a little bit shifted-- not much shifted, but we're addressing more and more cloud partners currently. So David Allen is currently doing a great job in promoting or developing Neo4j on Google Cloud. So he released a big package a couple of weeks ago where Neo4j is now available in the Google Kubernetes marketplace. So this is kind of similar, but when you go to Google Kubernetes Engine, you get a managed solution. So you can say, "Hey, I want to have this amount of servers, and please add more servers as we go." So you get the full managed package. You get the managed infrastructure, and on top, you get Neo4j running in the managed Kubernetes, which is really cool. 
RVB: 00:10:18.394 Very neat. So, yeah, we'll put some links to some of these things on the podcast's transcription - right? - so our listeners can find their way around a little bit. But maybe we can talk a little bit about where this is going. Obviously, there's a lot of stuff going on with the cloud product, but where do you see the industry going? What's your take on that? Let's look into your crystal ball. 
JU: 00:10:45.484 My crystal ball. I think managed service is becoming more and more mature. So I talk to a lot of people, and more and more people are moving to the cloud, but mostly, currently, they're struggling in putting stateful applications into containers. Because if you're doing this, you need to take care or be concerned about more things in comparison when you're putting a microservice in a container. And a lot of people are using lambda functions, and serverless technologies, and all that kind of things, which are super helpful because it speeds up development quite rapidly and also make operation less complicated if you outsource the actual operation. And managed databases fit in this concept perfectly. So if you're in one cloud provider and say, "Hey, I want to have my managed database. This should grow as we grow." Then I can connect all my microservices and all my serverless technologies to this managed database. I don't need to worry about data loss or data corruption, all that big and complex operational topics. And I think this is, for smaller companies or for smaller startups as well as for midsize to big companies, pretty awesome that they don't need to be experts in running container orchestrators. They don't need to be experts in running databases. They can focus on their actual business logic, and focus on getting their business running, and don't focus on being experts in ops topics. 
RVB: 00:12:42.013 Do you think it's more of a public cloud thing these days, or is it also large organisations like some of the bigger German companies doing this themselves and organising private clouds? How do you see that? 
JU: 00:12:59.965 I think Germany's a little bit special, so the German market is really concerned about data protection and where data is hosted. So, probably, big German companies would start with the private cloud. But I also saw some kind of hybrid mix, so where their core data centre is somewhere on-premise, in their own data centre, but they burst out workloads. So when they have a big marketing campaign or if it goes to Christmas business, they burst out to public clouds and have this kind of hybrid operational thing where they have the core business on their own [inaudible], and they add additional resources in the public cloud provider. But I think in the long term, a public cloud will be more and more scalable. So I think there could be a movement more towards public clouds. 
RVB: 00:14:08.118 I think a lot of people are looking, yeah, at that intermediate scenario - right? - the virtual private cloud where they get some assurances from the public cloud to infrastructure around privacy, around security, and those type of things. And that's kind of a neat compromise, isn't it? 
JU: 00:14:28.207 Yeah. So if you can combine it, yeah, this is amazing. So if you span a virtual network between your private and your public cloud, and then maybe expose only the endpoints from the public cloud provider to the Internet to be more flexible and [move into graphic?], you're good to go. I saw some big open source users of Marathon. It's the container orchestrator for Mesos. And they build their own data centres straight beside the Amazon data centres. So they could switch and had a lower [latency?] between their private bare metal and the cloud instances. So this was a kind of cool setup to combine both worlds. 
RVB: 00:15:21.269 Yeah. Oh, sorry about that. I was [inaudible] here. Hey, I think this was a great conversation. I think we should wrap it up now. Otherwise, [laughter], we will start boring our listeners. But we'll put some links on the transcription, as I said earlier. And then I think there's going to be so much great movement in this industry. I think we can maybe have you on the podcast again next year or something and talk about the state of the industry at that time, no? 
JU: 00:16:01.709 Yeah, we can talk about cloud all day long if you want to [laughter]. 
RVB: 00:16:06.397 Johannes, thank you so much for taking the time, and I look forward to seeing you at one of our conferences soon. 
JU: 00:16:14.814 Yeah, thank you very much. 
RVB: 00:16:16.341 Cheers, man. Thank you. 
JU: 00:16:17.675 Bye.
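
A short aside on the climbing side project Johannes mentions in the conversation: finding partners "in the same area" who climb "similar routes" is a two-hop traversal, from a climber to their routes and on to the other climbers of those routes. The sketch below is only an illustration in plain Python, not the original project's code. The sample data, the function names, and the choice to treat "similar routes" as "routes both people have climbed" are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical sample data: (climber, home area, route climbed).
CLIMBS = [
    ("anna", "Kassel", "route_a"),
    ("anna", "Kassel", "route_b"),
    ("ben", "Kassel", "route_b"),
    ("ben", "Kassel", "route_c"),
    ("cara", "Munich", "route_b"),
    ("dirk", "Kassel", "route_d"),
]


def build_graph(climbs):
    """Return (area per climber, routes per climber, climbers per route)."""
    area_of = {}
    routes_of = defaultdict(set)
    climbers_of = defaultdict(set)
    for climber, area, route in climbs:
        area_of[climber] = area
        routes_of[climber].add(route)
        climbers_of[route].add(climber)
    return area_of, routes_of, climbers_of


def find_partners(climber, climbs):
    """Traverse climber -> route -> other climber, keeping only the same area."""
    area_of, routes_of, climbers_of = build_graph(climbs)
    shared = defaultdict(int)  # other climber -> number of routes in common
    for route in routes_of[climber]:
        for other in climbers_of[route]:
            if other != climber and area_of[other] == area_of[climber]:
                shared[other] += 1
    # Most shared routes first; ties broken by name for a stable order.
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))


print(find_partners("anna", CLIMBS))
```

In a graph database the two dictionaries would be replaced by stored relationships, so the same question becomes a single traversal rather than a chain of joins.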
Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik
