Tuesday 6 October 2015

Podcast Interview with Jesus Barrasa, Neo Technology

Over the past couple of months, we have been doing a lot of work at Neo4j to try to better explain the value of Neo4j to our prospects and customers. This has been a true team effort, with lots of engineers, marketeers and sales folks participating in articulating how complex technology can be used to add true value to business processes. And as we did that, we added some really talented people to our team. One of them is the person that I am interviewing in this particular podcast episode: Jesus Barrasa. Jesus has a lot of experience with graphs and even (a particular kind of) graph databases - so going in I knew it was going to be an interesting chat. And guess what: it was. Listen on:

Here's the transcript of our conversation:
RVB: 00:00 Hello everyone. My name is Rik, Rik Van Bruggen from Neo and here I am again recording a Neo4j graph database podcast and my guest today is all the way in the UK. Jesus, hi Jesus. 
JB: 00:13 Hi Rik. How are you? 
RVB: 00:14 Hey. I'm always scared of pronouncing your name in the wrong way. I'm sorry. 
JB: 00:19 You did great. You did great [laughter]. I've heard much worse than that so that was brilliant. 
RVB: 00:24 Okay [chuckles]. Okay, Jesus, you just joined Neo a couple of months ago as a pre-sales engineer but you have a long-standing history with graphs. You did a PhD on the subject if I'm not mistaken, right? 
JB: 00:37 That's correct, yeah. It all started probably more than ten, nearly 15 years ago, so quite a while yeah. 
RVB: 00:43 Yeah. 
JB: 00:43 And you're right. It all started in the semantic technology space, in the RDF space, so that was the first time I was exposed to modeling data as graphs and yeah. That's been a long story. After the PhD I did work for a company in London called Ontology where we did use graphs to best resolve problems in companies, mostly in the telecommunications sector and yeah. As you say, two months ago I joined the field engineering team in the London office. 
RVB: 01:18 That's super. What was your PhD about exactly then? 
JB: 01:21 Well, at the time… I did model-- I mean, I formalized a way of translating relational schemas into ontologies. Ontologies is the way you represent metadata in the RDF model. I don't know if we will have time to talk about that but yeah, it's actually an automated mapping between relational schemas and ontologies. That's what I-- 
RVB: 01:45 There's a lot of people looking at that I think, still today, I get that question quite often. 
JB: 01:50 Absolutely. 
RVB: 01:51 So another question because you've been working so long in the RDF world, what's the difference between the RDF semantic technology space and the property graph model of Neo4j - what's the key difference for you? 
JB: 02:04 All right. I'd say you can answer this question from two perspectives because obviously RDF is a representation paradigm and there's the different implementations, the triple stores that you can find in the market but as a model I think the main thing they have in common is the fact that they both represent data as a graph. And that makes them very, very close to each other. The difference I would say is RDF is simple as it can be. It's only based on the notion of URIs to identify resources or nodes if you want to establish the parallel with a property graph but there's this element called triple, subject, predicate, object. And that's all the constructs that you have to model your domain. And of course I think that's the biggest difference because in the property graph model, you have nodes with properties, you have relationships with properties and they have this brilliant thing that's this whiteboard friendliness, this excellent thing that's the way you conceive it, the way you model it in your head, on your whiteboard, is exactly the way it's represented and stored physically, whereas in RDF there's still this gap where you have to translate that into triples, things that may sound simple like giving a weight to a relationship so there's a connection between Rik and Jesus because we work together. You want to give weight to this relationship. That's something that's completely natural in the property graph but it's not something you can do directly on a triple, so you have to model this relationship probably as an intermediate resource. There's a bit of a gap and I think that makes it sometimes less intuitive, sometimes a bit less human if you want. So that's the difference.
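To make the point about weighted relationships concrete, here is a minimal sketch in plain Python (no graph database or RDF library involved). The `ex:` identifiers, the `WORKS_WITH` type, the `ex:participant` and `ex:weight` predicates and the value 0.8 are all made up for illustration; they do not come from the interview or from any real vocabulary.

```python
# Property graph style: the relationship itself carries properties.
# (Illustrative dict only, not a Neo4j API.)
property_graph_edge = {
    "start": "Rik",
    "type": "WORKS_WITH",
    "end": "Jesus",
    "properties": {"weight": 0.8},
}

# Triple style: a statement is only (subject, predicate, object), so the
# weight cannot hang off the predicate. One common workaround is an
# intermediate resource (here the hypothetical "ex:collab1") that stands
# in for the relationship and carries the weight.
triples = [
    ("ex:collab1", "ex:participant", "ex:Rik"),
    ("ex:collab1", "ex:participant", "ex:Jesus"),
    ("ex:collab1", "ex:weight", 0.8),
]


def weight_from_triples(triples, a, b):
    """Return the weight of the intermediate resource linking a and b."""
    participants = {}
    weights = {}
    for subject, predicate, obj in triples:
        if predicate == "ex:participant":
            participants.setdefault(subject, set()).add(obj)
        elif predicate == "ex:weight":
            weights[subject] = obj
    for resource, members in participants.items():
        if {a, b} <= members:
            return weights.get(resource)
    return None


# One lookup in the property graph form, an extra hop in the triple form.
edge_weight = property_graph_edge["properties"]["weight"]
triple_weight = weight_from_triples(triples, "ex:Rik", "ex:Jesus")
```

Both forms hold the same information; the difference is that the triple form needs the extra resource and an extra hop to reach the weight.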
RVB: 04:06 Well [crosstalk] actually that was sort of what I was hoping you would say because I've always been told that the difference is really on the predicate, the fact that it's so difficult to qualify a predicate in the RDF model. 
JB: 04:20 That's [crosstalk] exactly. Then there are other interesting things in RDF, the whole idea of being able to use the model itself to represent the metadata, the ontology and that gives you in certain cases some interesting powerful things you can do like querying all the data and the metadata at the same time. But I would say the biggest thing is more what they have in common and it's the fact that they look at data as a graph, a set of connected nodes. 
RVB: 04:49 What attracted you to the graph in the first place? Why did you get into this field in the first place if you don't mind me asking? 
JB: 04:55 Well, yeah, sure. I think there's two things. One was this incredible flexibility. At the time when I started we didn't talk about NoSQL. That's a concept that was coined later. It's the idea of being able to start storing data without having to model it up front. I thought that was brilliant and that gives you an incredible flexibility, this thing of schemaless model or at least implicit schema depends on how you talk about it but this flexibility was one of the things and the other one, I'd say the possibility of inferring new knowledge based on the information on your graph, you identify patterns and you can enrich your graph with new knowledge based on the data elements that you have. So I think probably these two things attracted me to this world and I find them unbelievably useful and powerful.
RVB: 05:51 You know, as your couple months at Neo, what do you think is the most interesting use-cases that you've seen so far? Anything that jumps out? 
JB: 06:01 Sure. Well, I'm amazed-- the thing is the great thing about Neo is how you can use and update your graph in real time, at what speed you can not only read it and query it but also keep it up to date and how it's possible to identify fraud rings for example is one of the cases that I like the most, being able to pick up the status of your accounts, your users, their information, the transactions that they're carrying out, and at the same time be able to pick up, detect the patterns that identify a fraud ring is one of the ones that I enjoy the most.
RVB: 06:43 And do that in real time you mean? 
JB: 06:44 Exactly, the real time is the key thing and that's pretty impressive yeah. 
RVB: 06:48 It's not like with my experience a couple of weeks ago with my Amex card, I get a fraudulent transaction on Friday and I get a call on Monday [laughter]. 
JB: 06:57 Oh yeah, just in time right. 
RVB: 06:59 Just in time [laughter]. Didn't really work. Very cool. So where do you see this going, Jesus, where's the industry taking us do you think? 
JB: 07:09 Well, I think adoption is growing. It's amazing the number of different organizations and companies that we talk to. I can't think of a single vertical, a single sector that would not benefit from scenarios where modeling data as a graph adds incredible value, so I think adoption will definitely grow and that's one of the things, and then the other one that I think is going to be key as well is about integrating the graph with the rest of the data architecture. These days there are so many alternatives to represent data and some of them adequate for certain scenarios. I'm all for peaceful co-existence with all the other approaches. So integration I think is going to be the other important one and I think being able to expose the graph in ways that make it easy to inter-operate with other stores, sometimes not all of the information is going to be in your graph but the extremely valuable information in your graph will need to be combined with some external information. That's one case where you will want to visualize it in different ways using BI tools, well you name it, there's so many elements now in data architectures that integration I think is going to be the other important aspect that we'll see developing in the next few years.
RVB: 08:22 There was one thing that I wanted to ask you and I obviously forgot. I'm so good at this podcasting thing [chuckles] is actually you've done a lot of work on virtualization of data, right and then [crosstalk] integrate and that links to that integration story I suppose. 
JB: 08:36 Exactly. Exactly. I did work in the data integration space with a data virtualization company in the couple of years between Ontology and Neo and yes, I'm particularly interested in that and it's a way of integration data virtualization that's based on this idea of wrapping the sources and make them look as if they were relational even though they're not so they're not copying the data into centralized stores. You leave the data where it is and you define the logic to extract it and combine it and I think that was a powerful paradigm for new ways of representing data like the graph and make it easy to integrate them with other technologies and other types of stores and yes. 
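The wrapping idea described here can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the class names, the `rows()` method, the `customer_id` key and the sample data are all hypothetical, and this is not how Denodo, JBoss or any other virtualization product is actually implemented.

```python
class TableSource:
    """Hypothetical wrapper around a source that is already row-shaped."""

    def __init__(self, rows):
        self._rows = rows

    def rows(self):
        yield from self._rows


class GraphSource:
    """Hypothetical wrapper exposing graph nodes as relational-looking rows."""

    def __init__(self, nodes, key_name):
        self._nodes = nodes          # node id -> dict of node properties
        self._key_name = key_name    # column name under which to expose the id

    def rows(self):
        for node_id, props in self._nodes.items():
            yield {self._key_name: node_id, **props}


def virtual_join(left, right, key):
    """Combine two wrapped sources at query time; nothing is copied into a central store."""
    index = {}
    for row in right.rows():
        index.setdefault(row[key], []).append(row)
    for row in left.rows():
        for match in index.get(row[key], []):
            yield {**row, **match}


# Made-up sample data: orders live in one source, customers in a graph.
orders = TableSource([{"customer_id": "c1", "amount": 10}])
customers = GraphSource({"c1": {"name": "Alice"}}, key_name="customer_id")
combined = list(virtual_join(orders, customers, "customer_id"))
```

The data stays in each source; only the extraction and combination logic is defined on top, which is the property the answer above emphasizes.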
RVB: 09:27 So in a case like that the Neo4j would be one of the sources of virtualization? Is that what I'm--? 
JB: 09:32 That's correct [crosstalk]. One essential one, that's the thing because the importance in the end is what value is there in your source? Neo can be perfectly, for example, in an MDM scenario. It can be the core. It can be your master model. And you can link it with the different rest of it and provided the detailed information about the entities but exactly, it would be one of the sources and the data virtualization will expose it in a way that's easily consumable by say for example BI tools in analytic scenarios or that's one of the-- 
RVB: 10:05 Are there any examples of that yet? You know, like open-source virtualization tools that integrate with Neo [crosstalk]? 
JB: 10:10 Well, there's not much to be honest. There's one quite limited community edition of one of the vendors called Denodo which is the one I used to work for. There's another one from JBoss but I'd say there's not much, Rik, available in there. I mean, JBoss would be the obvious option. I'm actually now trying to work a little bit with it and try to build some integrations with Neo and yeah. That's what you can find. 
RVB: 10:40 I think this is kind of like community call for help, you know [crosstalk]. 
JB: 10:45 Yeah [crosstalk]. I definitely really hope to be publishing something soon, at least in some idea, some small examples that can inspire people to look at these. 
RVB: 10:55 That would be great. Cool. I think we're going to wrap up here. I think we like to keep these podcasts quite short and snappy but thanks a lot for sharing your perspective. I think that was very interesting although because of my limited presentation skills a little bit chaotic [laughter].
JB: 11:13 Right. No, it was great [chuckles]. Great to have this chat with you Rik. 
RVB: 11:15 Thank you, Jesus. And I'll see you soon yeah. 
JB: 11:18 Lovely. Cheers. 
RVB: 11:19 Bye. 
JB: 11:20 Bye now.
Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik
