Thursday, 10 September 2015

Podcast Interview with Petra Selmer, Neo Technology

A couple of years ago, we were organising meetups in London for Neo, and we had a really nice audience of people that showed up at every other meetup. One of these people was a really interesting lady with a bit of a funny accent, who told me that she was doing a PhD on graph databases. Interesting. And now, a couple of years later, we have her on our team: Petra Selmer works on our Cypher team in our Engineering department, advancing the world's greatest declarative query language :)) ... So I decided to invite her to the podcast, and lo and behold, after some interesting "summer planning", we had a great chat. Here it is:

Here's the transcript of our conversation:
RVB: 00:03 Hello everyone. My name is Rik, Rik van Bruggen from Neo Technology, and I'm very excited to be doing another podcast recording today.  My guest on the podcast today is Petra Selmer. Hi Petra.
PS: 00:17 Hi Rik.
RVB: 00:17 Hey, hey, it's good to have you on the podcast. Thank you for making--
PS: 00:20 Thank you.
RVB: 00:20 --the time. So Petra, we've known each other for a couple of years. I think I got to know you first in the London community, right?
PS: 00:30 Yes, that's right. That's right, it's been quite a while, yeah.
RVB: 00:33 It's been quite a while, but maybe you can take some time to introduce yourself to our listeners. That might be useful.
PS: 00:40 Sure. Well, as you said, my name is Petra Selmer, and I'm actually a developer at Neo Technology, specifically working with the Cypher team. That's the team that actually designs, develops and implements Cypher, the query language. I'm also a member of the Cypher language group. This is something that we started about six months ago, to ensure that we kept up the momentum of adding new features, new operators, new keywords and new semantics to Cypher, keeping things rolling forward in that way. So I'm also a member of that group, and we essentially just try and make sure that we move the language forward. I also do a little bit of work in the biotechnology community. That is just trying to make contacts with scientists and other people in the biology, chemistry and physics communities, to try and get them enthused about graphs, and show them that it's a really good tool to help solve their problems, because they've got really, really complex domains, which are very well aligned with graphs. This is all very much in the beginning stages, but it's something that we're hoping to see grow in the future.
RVB: 01:48 Did I get that right? You have an academic background in graph query languages, right? Is that right?
PS: 01:54 That's right. So I'm actually towards the end of my PhD in the flexible querying of graph-structured data. Essentially, I've developed a query language which allows users to pose queries that do not exactly match the structure of the graph, but which nevertheless get answers back to them in a ranked order, depending on how closely their query matches the actual graph. So if you don't know your graph very well, you'll still get answers back, and in this way you actually get to know your data. So it's more of an exploratory thing. If you like, it's a "fuzzification" of queries, and it also does inferencing as well. For example, if you asked for things relating to cats, and somewhere you've stated that cats are related to lions because they're both felines, you also get data related to lions. So in this way, it's quite a powerful mechanism to-- this is actually motivated by biological domains, where they've got incredibly complex data that changes all the time. So there was the situation where scientists were sometimes just stuck, not knowing which paths to query in the graph. So some "fuzzification" and approximation of queries was necessary. That's my PhD.
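To make the cats-and-lions example a little more concrete, here is a toy sketch of ranking answers by how far a query had to be relaxed up a class hierarchy. It is only an illustration under stated assumptions: the node names, the `instance_of` and `subclass_of` tables, the `ranked_instances` function and the "one step up the hierarchy costs 1" scoring are all hypothetical, and this is not the query language or the algorithm from Petra's PhD.

```python
# Hypothetical toy data: which class each node is an instance of.
instance_of = {
    "tom": "cat",
    "felix": "cat",
    "simba": "lion",
    "rex": "dog",
}

# Hypothetical class hierarchy: child class -> parent class.
subclass_of = {
    "cat": "feline",
    "lion": "feline",
    "feline": "mammal",
    "dog": "mammal",
}


def ancestors(cls):
    """Return the class itself plus its superclasses, each paired with
    its distance from the original class (used as the relaxation cost)."""
    result = []
    cost = 0
    while cls is not None:
        result.append((cls, cost))
        cls = subclass_of.get(cls)
        cost += 1
    return result


def descends_from(cls, target):
    """True if cls equals target or target is one of its superclasses."""
    return any(c == target for c, _ in ancestors(cls))


def ranked_instances(query_class, max_cost=1):
    """Answer 'find instances of query_class', relaxing the class one step
    up the hierarchy at a time. Each node keeps the cheapest cost at which
    it was found, so exact matches rank ahead of relaxed ones."""
    answers = {}
    for relaxed, cost in ancestors(query_class):
        if cost > max_cost:
            break
        for node, cls in instance_of.items():
            if descends_from(cls, relaxed) and node not in answers:
                answers[node] = cost
    # Sort by cost first, then by name for a stable order.
    return sorted(answers.items(), key=lambda kv: (kv[1], kv[0]))


print(ranked_instances("cat"))
```

Asking for cats returns the two exact matches at cost 0 and then `simba` at cost 1, because `cat` was relaxed to `feline`. Raising `max_cost` to 2 relaxes further to `mammal` and also brings in `rex`, so the caller controls how "fuzzy" the answers are allowed to get.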
RVB: 03:11 Wow, that sounds extremely interesting. Is it related to visualization technologies in any way? That's where people tend to do those approximations, or find those patterns with visual tools quite often. Or is it not related at all?
PS: 03:28 It initially was meant to be, because when I began, I saw many, many avenues to explore, and in fact, visualization is incredibly important. Everybody, from developers new to the scene through to, as I say, very experienced physicists, finds visualization very powerful. But it turned out that there was so much to explore in this area that I concentrated rather on the theoretical proofs of the constructs required to undertake these optimizations. But I believe there are other PhDs going on using this, and then applying visualization techniques on top of it as well. So yeah, visualization is very important, but alas not something I [crosstalk].
RVB: 04:09 Cool. Petra, how did you get into graphs and why did you get into graphs? Could we go into that a little bit more? I've asked this question to lots of the people on the podcasts, and I wonder if you've got a perspective on that?
PS: 04:21 Sure. So I've actually been an applications developer since 1997, in loads of vertical markets and loads of different companies, from places like IBM through to Internet solutions providers and those sorts of things. And I think it was about five, six years ago that I began my PhD at the same time, and I fell into it quite by accident. I was supposed to undertake a PhD in description logics, which is a mathematical logic area that's quite arcane. After three months, I found it really was just not for me, and I went to speak to the dean of the university, and he said, "Well, actually we've got this other project that we'd like a PhD student for," and it happened to be this flexible querying of graph data. As soon as I read the brief, I just fell in love with it. It was absolutely awesome.
PS: 05:11 So I fell into it via that route, and at that time I was also working for a medical research company, and the type of data we were trying to store and allow users to query was incredibly complex. In fact, the use case was to represent the entire NHS hierarchy of top-range consultants and administrators across all the trusts and networks and strategic health authorities - all these organizations. It was very much a graph. The same people would have different roles in different contexts. So at the time, I was obviously fighting, as many other people were, to do this in a relational database. There was a lot of pain around that. It was at that time that I thought, "Hang on, this is definitely a perfect graph problem that a graph database would be able to solve." But at that time, there was very little around. I think this was around 2007 or so, just before graph databases as such sprang out into industry. So when I came across Neo4j, that was a light bulb moment. I was very, very happy that industry had also seen the light, because of course graph models had been around for decades in academia, but they had just never really taken off in industry. So I'm very glad that's actually changed now.
RVB: 06:31 What was it that actually attracted you most? Was it the modelling side of things, or what was it that attracted you most when you found this matching technology?
PS: 06:42 With me, it was the modelling. Basically, we didn't have that impedance mismatch, and all the 90% of the time spent on writing stupid stored procedures hundreds of lines long, and obviously the increased amount of testing and everything around that, and just ending up with code that was not maintainable. It felt to me - to use an analogy - as if a surgeon was trying to perform surgery using oven gloves - big heavy oven gloves. It was just really awful. I was actually first introduced to SPARQL, the semantic web graph query language, which was miles and miles better. But then, when I came across Cypher, I thought, "My goodness, this is brilliant. This is now like a surgeon using a scalpel to perform surgery," which is how it should be. It's very precise, very expressive, and you can immediately dive in and start doing very complex things. I think it's the only way in which you can write really intelligent systems - really advanced systems. I think there's a point at which you're just so bogged down by relational technology that you reach the limit of what you can do. Whereas a graph just opens it up for you: you start off in a very strong position, and then, as you see what you can do, you just go ever more advanced.
RVB: 08:04 Absolutely. It's great to hear you talk about that. I love that analogy, by the way. That will stick, I think [chuckles]. Specifically on Cypher, is there anything in particular that you think is the main reason why it's going to conquer the world, or stuff like that? What is it specifically that you like so much about Cypher?
PS: 08:30 It's just that it absolutely, completely reflects the graph model without any cognitive overload. So you don't really need to think too much about it. It just fits so beautifully well. In particular, I love the MATCH clause, and the way you can describe your pattern in a very, very natural and easy way. Whereas if you start with something like SQL and try to make it graph-like, it just doesn't have the elegance or the expressivity that Cypher does. So that's something that gripped me immediately: the pattern matching capabilities.
RVB: 09:05 Yes, super, super. Cool, very cool. Maybe one more final question, if you don't mind, Petra? I ask this same question to everybody. It's about what the future holds. Where do you think this is going? How do you see, how do you hope that this will change the industry?
PS: 09:25 I think, really, since I first knew about Neo4j - I think it was about five years ago or something-- it's amazing how many more people, how many more developers and others in industry, now know what a graph database is. So when speaking with people, I don't have to start right from the beginning with "This is a graph," and spend a lot of time talking about that. They immediately already know. So what does the future hold? I think it's actually limitless. I think, as I said before, it will be the only way in which we can actually solve some problems. In particular, I'm thinking of my background, where I've worked before, which was in the sciences and in the medical research arena. The database-- I think we'll be able to do so much more there, that we'll be able to get better applications out much faster to domains like medical healthcare, in order to leverage all these wonderful scientific discoveries that are going on at the same time, and therefore get wonderful research undertaken and performed. So it's hard to say where this will end up, but I think it will be really, really big.
RVB: 10:35 Absolutely, very cool. Maybe one more question and a little bit more personal. Do you still speak Afrikaans, or do you speak any Afrikaans?
PS: 10:42 I've been in the UK now for 16 years, but yes, I do try and speak it with any South Africans - those who know it. So I do try and keep up-to-date and in practice.
RVB: 10:57 Well, you know, Flemish, my mother tongue, and Afrikaans are very much related, right? So next time we can practice together maybe [chuckles]?
PS: 11:03 Indeed, indeed. That'd be good.
RVB: 11:06 Very cool. Thank you so much for coming on the podcast, Petra. I really appreciate it, and I look forward to seeing you soon.
PS: 11:12 Thank you. You're welcome.
RVB: 11:14 Bye.
PS: 11:15 Bye-bye.
Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik
