Tuesday 21 April 2015

Podcast Interview with Peter Neubauer, Mapillary (and co-founder of Neo4j)

Today is a special podcast episode. I got a chance to talk to one of Neo4j's founders, Peter Neubauer, again - which is always fun. I remember one of the first conversations I had with Peter, where he was explaining something to me over a Skype call - and he was at the same time pushing a core Neo4j bugfix to GitHub. Just to say that he is pretty awesome and incredible at multi-tasking :)) ... Peter left Neo last year as an active team-member, to start a new project that "looks just as impossible", Mapillary. Take a look at it for sure - but listen to the podcast first:


And here's the transcription of the chat:
RVB: Good morning everyone. My name is Rik, Rik Van Bruggen, from Neo Technology, and here we are again recording a remote session for our graph database podcast. I'm joined today from Sweden: Peter Neubauer is on the other side of the line. Hi, Peter.
PN: Hi, Rik. Nice to meet you. 
RVB: Yes, good to be on the phone with you again. It's been a while. Peter, if you don't mind - most people will know you - but, would you mind introducing yourself a little bit for people that don't know you yet? 
PN: Yes. Regarding Neo4j, I'm one of the three founders of Neo4j together with Emil and Johan, who are currently working at Neo Technology. Actually, it was us three who came up with the first version back in 2002 and wrote the first version that went into production.
RVB: That's a long time ago, huh?
PN: That's a long time ago, yes - 13 years.
RVB: Absolutely. It's been quite a ride. How did it start Peter? How did you guys get into Neo and where did it come from? 
PN: It did start with us having written or gone into content management systems. And we were at that point managing images, and one of the major problems there was that every photographer, every picture agency, had their own rights management for when the image could be licensed, to whom, in which country, and so on, so there was a lot of business logic about that. Modelling that in what then was Informix, a database server that was one of the most capable database engines at the time - an object-relational database engine--
RVB: Didn't they get acquired by IBM, Informix? They did, right? 
PN: Yes, they did. And this was the last version, 9.14, that came out before they were acquired. And we were trying to model our business logic in that engine and it just didn't scale. We got into like five, six joins, and even with the, at the time, modest data in that database, it was just taking ages, like minutes, to get answers back, and that's not good enough for a backend that needs to serve the web. At that point, we examined our system architecture and found out that the database was the bottleneck. And we also at that point had some of the Datablades, or plugins to Informix, and one of them was dealing with semantic words for translation, namely WordNet, a semantic initiative to structure the English language. And that kind of projected a network model of these connected words, like hyponyms and synonyms and concepts, into the database. And we saw that and thought like, "This is a very interesting approach to model data." It's very close to the UML diagrams and so on, if you translate it to our domain. We tested that plugin for our domain just to see if the model fit, and it did. However, it was still slow, so since it was such a beautiful match, we then went about and actually wrote, at that point, a Java Enterprise JavaBeans 1.0 implementation that modeled that kind of structure, what is now-- and that actually had the first kind of Java version of what is now the Neo4j API, all fleshed out.
RVB: Did you call it a graph API at the time, or do you call [crosstalk]? 
PN: No, no, no. 
RVB: What did you call it at the time? 
PN: We called it a network database or network engine, and that's where Neo partly comes from. Of course, The Matrix was very popular, but it also stands for network engine of course, so we had to make it work [laughter].
RVB: Fantastic, okay. I don't know if I've ever told you this, but one of the first projects that I worked on was a project for DHL which was also using Informix Datablades.
PN: It was a very good database. 
RVB: Super. Where are you now, Peter? What are you doing now? You're working for a new startup, right, Mapillary? 
PN: Yes. I left Neo Technology last year, mostly because I found-- I'm an early startup guy, and Neo4j has a big group of followers now and there's so much activity around it, so my feeling was that I can invest my time in something that is, again, almost impossible. So I joined Mapillary as a co-founder, and the vision there is to do a visual representation of the whole earth, possibly even a 3D model connected to it. So that's what we're doing. People are submitting basically thousands and thousands of images taken by their smartphones or action cameras and so on. And we in the background do a lot of computer vision and analytics on this data, and we connect the images into what could be described as a big, global, giant graph of visual information. So Neo4j is an essential part of the architecture there.
RVB: Oh, is it? What do you use it for? What do you use Neo for? 
PN: We use it for connecting the analyzed images, both in space, for instance, so you have actually this connection between one image and the nearest images in different directions, and then even connect computed visual connections. For instance, one image overlapping another image: if two images look at the same view of the Turning Torso, then we will know it and we will actually create a connection in that image graph-- from this image, you can translate the object, the Turning Torso, into something that can merge into the others, so we know how to project, and we even store the 3D point cloud of these objects in Neo4j [inaudible] references too in Neo4j. The whole navigational logic, if you then want to construct, for instance, a street view from these millions of images, is done in Neo4j, because it's a perfect use case for Neo4j. Basically, fetch all the connected images in the vicinity, say connected images that are 3 or 4 [hops?] away, or 30 if you are going to fast forward, then prefetch these into a local graph in JavaScript, and do that along certain rules while you are traversing the backend graph. For instance, time filtering, or filtering just certain colors, shades, or certain directions or whatnot.
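To make the prefetching idea a bit more concrete, here is a minimal sketch of a hop-limited traversal with a filter rule. It is not Mapillary's code: the in-memory dictionaries, the `prefetch_neighbourhood` function, the `year` property and the sample image ids are all assumptions made up for illustration, and a real system would run the traversal against Neo4j rather than a Python dict.

```python
from collections import deque


def prefetch_neighbourhood(graph, start, max_hops, keep=lambda image: True):
    """Breadth-first search from `start`, at most `max_hops` edges deep.

    Only images accepted by the `keep` rule are visited or expanded.
    Returns a dict mapping image id -> number of hops from `start`.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        hops = seen[current]
        if hops == max_hops:
            continue  # hop budget used up, do not expand further
        for neighbour in graph["edges"].get(current, ()):
            if neighbour in seen:
                continue
            if not keep(graph["images"][neighbour]):
                continue  # rule rejected this image (e.g. too old)
            seen[neighbour] = hops + 1
            queue.append(neighbour)
    return seen


# Hypothetical sample data: five images, connected to their nearest neighbours.
graph = {
    "images": {
        "a": {"year": 2015},
        "b": {"year": 2015},
        "c": {"year": 2013},
        "d": {"year": 2015},
        "e": {"year": 2015},
    },
    "edges": {
        "a": ["b", "c"],
        "b": ["a", "d"],
        "c": ["a", "e"],
        "d": ["b", "e"],
        "e": ["c", "d"],
    },
}

# Everything within two hops of image "a".
nearby = prefetch_neighbourhood(graph, "a", max_hops=2)

# Same traversal with a time filter: skip images taken before 2015.
recent = prefetch_neighbourhood(
    graph, "a", max_hops=2, keep=lambda image: image["year"] >= 2015
)

print(nearby)
print(recent)
```

A small `max_hops` would correspond to normal navigation, and a larger one to the fast-forward case Peter mentions; the `keep` rule stands in for time, color or direction filters.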
RVB: Super. That's really interesting. I think people can get involved with the Mapillary project as well, right? There's like an app that they can download and then you can participate in the project, right? 
PN: Yes. Anyone can submit pictures and anyone can help improve the data. It's like OpenStreetMap or Wikipedia, so you can improve even, for instance, street sign detections and object detections that we do in the images, and feed back to, for instance, the OpenStreetMap project and to Wikimedia.
RVB: Cool. Peter, maybe one more question because we keep these podcasts quite snappy. Where is this going? Where is Mapillary going? Where are graphs going? Any vision on that? Do you mind sharing that? 
PN: I think the concept of connected data is growing a lot, and people are expecting and willing to put in much more effort into making data connected. That is not just on the global linked data initiative level, but even on a pragmatic in-system level. So where I see graphs going is that they approach the enterprise. Enterprise connected data are even in normal installations, and as we see it here, with a lot of developments in virtualizing hardware and so on, you can partly build bigger, monolithic kind of graph blobs. In Mapillary we have now over 1 billion properties in the database within one year, and that is one thing. The hardware is letting you scale up these installations quite a lot, so you can scale up quite easily. And the other thing is that sharding graphs will be at the forefront of data science. That is one of the remaining kind of challenges with graphs. They're very easy to query and so on, but sharding them is not trivial.
RVB: So difficult. Yeah, yeah. 
PN: It's difficult, yeah. 
RVB: I'm sure you've heard of the work that Jim Webber and Co have been doing on that, and we are really in the middle of starting that project again and making some good progress there. 
PN: Yeah, I'm really excited about it. Right now, in Mapillary, we will go for bounding boxes and shard by geography, and if you have a domain that lets you [shard?] it in a kind of interesting way, then you can do this already now, but auto-sharding would be awesome.
RVB: Super. Peter, thank you so much for coming on the podcast. It was a pleasure to talk to you again. I really appreciate it. I wish you so much luck and pleasure and drive at Mapillary, and thanks again. I look forward to speaking to you soon.
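As a rough illustration of what sharding by geography with bounding boxes could look like, here is a small sketch that maps a coordinate to a fixed-size grid cell and uses that cell as the shard key. The function names, the 10-degree cell size and the sample coordinates are assumptions for this example only; the interview does not say how Mapillary actually chooses its boxes.

```python
import math


def shard_key(lat, lon, cell_degrees=10.0):
    """Return the (row, col) of the grid cell that contains the coordinate."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinate out of range")
    # Shift to non-negative values, then count whole cells.
    row = math.floor((lat + 90.0) / cell_degrees)
    col = math.floor((lon + 180.0) / cell_degrees)
    return row, col


def bounding_box(key, cell_degrees=10.0):
    """Return (south, west, north, east) of the cell identified by `key`."""
    row, col = key
    south = row * cell_degrees - 90.0
    west = col * cell_degrees - 180.0
    return south, west, south + cell_degrees, west + cell_degrees


# Approximate, illustrative coordinates.
malmo = shard_key(55.6, 13.0)
stockholm = shard_key(59.33, 18.07)
new_york = shard_key(40.71, -74.0)

print(malmo, bounding_box(malmo))
print(stockholm == malmo, new_york == malmo)
```

Images in the same cell land on the same shard, so most neighbour lookups stay local. Connections that cross a cell border still need to be handled separately, which is part of why sharding a graph is not trivial.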
PN: No problem. Nice to talk to you too, Rik. 
RVB: Cheers.


Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik
