Thursday, 23 February 2017

Podcast Interview with Gábor Szárnyas, Budapest University of Technology and Economics

Waw. That was probably the longest stretch that I went without publishing blogposts or podcasts over here. I have no real excuse - the start of 2017 has just been super busy and interesting - with a lot of travel that does not really help with quiet "writing" time. But it's all great fun - I just need to get back into the rhythm - and today is the start of that.

Today's podcast is actually super cool. It started at a beautiful Brussels bar after FOSDEM. At this conference, there have been "graph devrooms" hosted for the past couple of years - and this year it was a really nice lineup. One of the speakers, Gábor, did this really interesting talk about "Graph Incremental Queries with OpenCypher", which is really cool. So after the conference, it turned out we share a passion for cycling too - and we decided to get together for a nice recording. Here it is:


Here's the transcript of our conversation:
RVB: 00:04.202 Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology and I must confess I feel very, very guilty now because this is the first time that I'll be recording a podcast in 2017, so happy new year. In spite of the fact that it's Valentine's Day. But yeah, I was slacking a little bit but I want to bring the podcast back to life and I've lined up a bunch of people to help me with that. And today I've invited someone who I only met like two weeks ago at the FOSDEM conference in Brussels. And that's Gábor Szárnyas from Budapest. Hi Gábor.
GS: 00:42.680 Hi Rik. Nice to be here. 
RVB: 00:43.500 Hey. Thank you for joining me. It was a great time meeting you in Brussels over some Brussels beer, but yeah we talked to each other about your work and I thought it would be great to have you on the podcast. So my first question is going to be who are you, and what do you do? What's your relationship to the wonderful world of graphs? 
GS: 01:10.158 Okay. So I'm a researcher at Budapest University of Technology and Economics, and also a visiting researcher at McGill University in Canada. Now I'm working on finalizing my PhD, so hopefully I will finish it within a year or a year and a half. And I worked basically on graph-related topics in my PhD.
RVB: 01:33.134 Oh, very cool. And don't forget you share another passion with me. 
GS: 01:38.380 Yeah, I'm also a cyclist. 
RVB: 01:40.152 Yes, exactly. 
GS: 01:40.729 So I started road cycling three years ago and it absolutely won me over. I really like cycling--
RVB: 01:49.279 Same for me...
GS: 01:50.351 --and that's my main passion. 
RVB: 01:51.948 Same for me. We have a couple of other graphistas that are super passionate about cycling so we'll have to do a ride sometime. But tell us-- 
GS: 01:59.412 I agree. 
RVB: 01:59.558 --a little bit more about your work with graphs. What's it all about, what's your PhD about, and what are you working on? 
GS: 02:07.503 Okay. So my PhD revolves around three topics that are related to graphs. The first one is how to incrementally query graphs. So imagine that you have a complex query and you have a huge graph. Now obviously, it's very difficult to evaluate a query on the graph in a very short amount of time. So basically, as a workaround, we do incremental queries, which means that if your graph changes slightly then we maintain the result sets instead of re-evaluating the whole query. And this is useful for a number of scenarios. You can use it for static analysis of code bases, you can use it for runtime modelling, you can use it for fraud detection, and so on. There are many use cases that present this scenario.
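[Editor's sketch] To make the idea of "maintaining the result set" a bit more concrete, here is a minimal Python illustration. It is not Gábor's engine and makes no claim about how it works internally: the class name TwoHopView, the two-hop pattern (a)->(b)->(c), and the set-based indexes are all assumptions chosen for the example. The point is only that an edge insertion or deletion touches the few matches that involve that edge, rather than triggering a full re-evaluation.

```python
from collections import defaultdict


class TwoHopView:
    """Hypothetical view that keeps all matches of (a)->(b)->(c) up to date."""

    def __init__(self):
        self.succ = defaultdict(set)  # node -> nodes it points to
        self.pred = defaultdict(set)  # node -> nodes pointing to it
        self.matches = set()          # maintained result set of (a, b, c) triples

    def _touching(self, u, v):
        # All matches that use the edge u->v as their first or second hop.
        first_hop = {(u, v, c) for c in self.succ[v]}
        second_hop = {(a, u, v) for a in self.pred[u]}
        return first_hop | second_hop

    def add_edge(self, u, v):
        if v in self.succ[u]:
            return  # edge already present, nothing changes
        self.succ[u].add(v)
        self.pred[v].add(u)
        # Only matches involving the new edge can appear.
        self.matches |= self._touching(u, v)

    def remove_edge(self, u, v):
        if v not in self.succ[u]:
            return  # edge not present, nothing changes
        # Only matches involving the removed edge can disappear.
        self.matches -= self._touching(u, v)
        self.succ[u].discard(v)
        self.pred[v].discard(u)

    def recompute(self):
        # Full re-evaluation, kept only to check the incremental result.
        return {
            (a, b, c)
            for a, bs in list(self.succ.items())
            for b in bs
            for c in self.succ.get(b, ())
        }
```

Calling add_edge("a", "b"), add_edge("b", "c") and then add_edge("c", "a") leaves three matches in view.matches, and each call only looks at the neighbours of the two endpoints. The recompute method is there purely as a cross-check against the maintained set.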
GS: 02:52.025 The second topic of my PhD is how to benchmark an incremental graph query engine. Because, obviously, once you have an incremental graph query engine, you would like to have some feedback on its performance. And you would like to use that to continuously improve your query engine. So, with my research group, we designed and implemented a framework that allows users to do just that. Compare incremental graph query solutions to each other and to other competitors. 
GS: 03:22.765 And the third one-- yes? 
RVB: 03:22.870 Is that related to the LDBC work, the Linked Data Benchmarking Council, is that related to that? 
GS: 03:30.529 So basically they have similar goals. I was actually at Walldorf last week at the LDBC Technical User Community Meeting. And LDBC has a couple of benchmarks, but currently none of those covers incremental graph queries and complex graph pattern matching. I talked to the LDBC guys and also attended the talks, and it seemed that there will be a new LDBC benchmark, which will have a similar goal to my benchmark. And that will be called the Business Intelligence workload for the Social Network Benchmark. And the problem with that is that it's not yet ready. So I talked to its core developer, Alex Averbuch, and he said that it will be ready within half a year but they are still heavily working on it.
RVB: 04:29.082 Okay. But you had said that you had three goals, right? You had the incremental queries and then the benchmarking and what was the third one? 
GS: 04:34.976 The third one is closely related to network theory. Network theory is something that came up in the late '90s and early 2000s, when people started to analyze graphs. So they took a graph of people where the nodes were the people in a community and the relationships were if they were friends or not. Or they took the graph of the World Wide Web where the nodes were the web pages and the relationships were the links between the web pages. So they took all these graphs and started to analyze them, and they derived very interesting properties, chief among which was the scale-free property of graphs. There are many papers on scale-free networks, and they discovered that this is very common in biology, in sociology, also in physics and other sciences.
RVB: 05:28.488 What does that mean, scale-free networks? What does that mean?
GS: 05:30.744 So basically a scale-free network means that the degree distribution of the nodes follows a so-called power law. So you have very few central hubs and a lot of nodes with only a few connections. And basically, if you remove these hubs from the network then your network will break down into smaller components. And they discovered that this is how societies are organized, this is how citation networks work, and this is how power grids work as well.
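[Editor's sketch] A power law here means that the fraction of nodes with degree k falls off roughly as k to the power of minus gamma, for some constant gamma. The small Python sketch below does not test for a power law (that needs a large graph and a proper statistical fit); it only illustrates the two ingredients mentioned above on a made-up toy graph: counting how many nodes have each degree, and seeing the graph fall apart when a hub is removed. The function names and the toy edge list are assumptions for illustration.

```python
from collections import Counter, defaultdict


def degree_histogram(edges):
    """Map each degree to the number of nodes that have it."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


def count_components(nodes, edges):
    """Count connected components with a simple depth-first search."""
    adjacent = defaultdict(set)
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)
    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacent[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components


def without_node(nodes, edges, removed):
    """Return the node and edge lists with one node (and its edges) removed."""
    kept_nodes = [n for n in nodes if n != removed]
    kept_edges = [(u, v) for u, v in edges if removed not in (u, v)]
    return kept_nodes, kept_edges


# Toy graph: two hubs linked to each other, each with its own leaf nodes.
toy_edges = [("h1", "h2")]
toy_edges += [("h1", leaf) for leaf in "abcde"]
toy_edges += [("h2", leaf) for leaf in "fgh"]
toy_nodes = ["h1", "h2"] + list("abcdefgh")
```

On this toy graph, degree_histogram(toy_edges) shows eight nodes of degree 1 and only two nodes of high degree, and removing "h1" with without_node turns one connected component into six.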
RVB: 06:00.783 Oh wow. Just like a universal structural characteristic of lots of networks. 
GS: 06:06.958 Yes, lots of networks. Obviously you cannot apply it to all networks, but it was a very big surprise to the scientists who worked on it that a lot of networks exhibited this property. So how does my PhD research relate to that? Well, interestingly, there wasn't much work performed on typed graphs. So if you see Neo4j graphs, you obviously see that you don't only have people or websites or books, but you have all of these in a single graph. So you have a typed graph, and they also have different relationships between them. And only in the last five to ten years has there been research about how to characterise these graphs. These have many interesting names. Some people call them multiplex networks, others call them multidimensional networks or multilayered networks. Analysing these is very tricky because obviously you have another dimension of complexity by having to deal with all the types of the nodes and the relationships in the graphs, but it's kind of a green area and you can do a lot of interesting work in it. I actually applied it to engineering models, so my research group works in model driven engineering. And there are engineering models for software, hardware, state machines, system design and so on. And basically we took all these models and analyzed them and we looked for some interesting properties.
RVB: 07:58.123 Wow. 
GS: 07:59.168 We didn't find any huge results so we didn't find that these models are scale-free or they follow some very famous distribution. But we did have some interesting results on how to characterize these models. 
RVB: 08:18.190 Wow, very cool. So could you tell us a little bit more about how you got into the graph business, or the graph science if I may call it that way? How did you get into it, and why did you get into it?
GS: 08:35.661 Okay. Well, that's an interesting question. I think it started in 2011 when I had to pick my first individual research topic at my university, and my roommate suggested that I should give NoSQL databases a try. I was already very interested in anything that's related to databases, relational or not. So I started to work on NoSQL databases. And then I soon discovered Neo4j and the property graph data model. And I think what really struck me is how intuitive the graph data model is. There is actually a paper by Marko Rodriguez, who was the implementer of the TinkerPop framework, and he said that graphs are very intuitive because they describe the way people think about the world. So people tend to abstract the world as things that are somehow connected. And you can perfectly describe this with graph nodes and graph relationships. So this is something I really like about graphs. And that's something that you also mentioned in this podcast, I think a couple of times, that you can use a whiteboard and then just start brainstorming, and having ideas, and drawing a graph. And you can use pretty much the same graph in your applications as well. So that's my favourite thing.
RVB: 10:07.046 Jokingly, I always talk about my own acronym, which is WYDIWYS, what you draw is what you store. 
GS: 10:14.439 Yeah, that's a catchy acronym actually. 
RVB: 10:18.913 It's been repeated so many times on this podcast but it is a very big strength of graphs, right? The model is so intuitive and so descriptive, so rich, really. That makes a whole lot of difference, right? So I'm reading that that's also how you got into it, right? That's also why you think it's very valuable? Is that right? 
GS: 10:43.860 Yes. So basically after I got a bit familiar with the topic, I started my master's at university. And already during my master's I was working on the incremental query engine that I'm still working on today. So it's quite a long project. I've been doing this for five-plus years. And I really liked my experience during the master's so I joined the PhD and I just finished PhD school three weeks ago. So now it's only-- 
RVB: 11:11.500 Congratulations [laughter]. 
GS: 11:13.087 Thank you. So it's only up to me to publish some more papers and polish a dissertation. 
RVB: 11:21.283 So what does the future hold, Gábor? Where is it going for you personally? Where is your research taking you, but also how do you look at this taking ground in the broader industry? What's the future hold if you had a crystal ball?
GS: 11:36.571 So, I would really like to be an academic. I really enjoy working at university because you have so many positive experiences with students. You can pretty much follow your own dreams and do research in almost whatever interests you the most. Obviously you have to fit within your grant proposals and your funding but this still gives you a lot of room to be creative and I would like to be a university lecturer and researcher in the future. So that's my kind of dream career. And-- yes?
RVB: 12:17.317 And is it lecturing and teaching about graphs then or is it on a broader topic or is it computer science or what will be the topic then? Or topics? 
GS: 12:26.893 Well, I'm pretty much happy to teach anything related to computer science, so I've taught topics from database theory to automata theory, system modelling, and software engineering topics, and also some laboratories on actual technologies. So our university is a bit of a mix between computer science and computer engineering. So we teach both theoretical and practical stuff and this is something that I also really enjoy.
RVB: 13:01.647 Super. And what about the wonderful world of graphs and graph databases, is there anything like that in your future you think? 
GS: 13:10.251 Yes. So I really would like to get a version of my graph query engine that can be used by other researchers. I obviously understand that implementing production-grade software is not really possible within the limits of a PhD. But I would like to release a system that can be used at least by other researchers, both in academia and in industry. I talked to a lot of people about this and it seemed that people would actually be interested in trying such a system, or benchmarking such a system, and seeing how it works for their use cases.
RVB: 13:49.818 Super. So final question, what's your favourite cycling destination? 
GS: 13:54.706 Ooh, that's a tricky question [laughter]. 
RVB: 13:56.737 Curveball for you. 
GS: 13:56.958 But actually, it's not a very common answer. I live next to the Hungarian-Austrian border, so I do go a lot to Austria because Austria has the best roads in Europe, and also most of the country is the Alps. So I live next to the lower Alps section, but even there you have very nice hills, and drivers are really polite, and you have this super flat tarmac all over the country. And that's what I really enjoy and I'm really looking forward to the summer. So I just usually disappear from the university for a couple of weeks and then go home and cycle.
RVB: 14:38.375 Excellent. So no cobblestones for you? Unlike Flanders Classics or something like that? 
GS: 14:44.387 I actually really like riding the [inaudible], so I live in the inner historical district of Budapest and we still have a lot of cobblestone roads. And when I just started cycling in Budapest just to get to work and commute I usually tended to avoid those sections. But since I'm more into cycling I just go for the most cobblestoney sections [laughter]. This is something that you learn to enjoy or at least you think you enjoy it. 
RVB: 15:16.963 Yeah, yeah. Exactly. Very, very cool. All right. Well, I hope we get to ride one day together, that would be great. I really enjoyed this conversation. Thank you for taking the time. And I look forward to meeting you again someday, at FOSDEM or somewhere else. 
GS: 15:32.360 Thank you for the invitation, and we should definitely go for a ride.
RVB: 15:36.138 Absolutely. Thank you, Gábor. 
GS: 15:38.717 Thanks. Bye

Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Friday, 23 December 2016

Podcast Interview with Emil Eifrem, Neo Technology

In the summer of 2015, 5-6 months after first starting this crazy podcast thing with Michael and Mark at QCon London, I finally got my boss and friend Emil Eifrem, CEO of Neo Technology, to spend some time with me on this podcast. It was a great conversation, and I still smile thinking about the silly drumroll that we used. But just before we wrap up 2016, it felt like it was the right thing to get Emil back on the podcast, and talk about "stuff". Here's that conversation - a little longer than usual, but totally worth it.

Here's the transcript of our conversation:
RVB: 00:02.909 Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology. And here I am again. And I'm so excited, I can barely restrain myself. It's my “über boss” on the phone again. It's been 18 months since the last interview, and here I have him back on the podcast. Emil Eifrem. Hi, Emil.
EE: 00:21.803 Hi Rik. Thanks for finally inviting me back.

Wednesday, 14 December 2016

Podcast Interview with Mouse Reeve, Internet Archive

Here's a lovely conversation with a super interesting Neo4j community member: Mouse Reeve. She has been actively working on a really interesting application of Neo4j (see below) that probably covers some of the most interesting and captivating domains ever: demons, spells, magic, and more. I am sure you will enjoy the following conversation as much as I did :) ...


Here's the transcript of our conversation:
RVB: 00:03.841 Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology and here we are again recording another podcast for our Graphistania podcast series. And tonight I have a lovely guest all the way from California, Mouse Reeve from the Internet Archive. Hi, Mouse.

Friday, 2 December 2016

Exploring the Paris Terrorist Attack network - part 3/3

Previously, on this blog, I had started writing about how we could get some of the data published by a local Belgian newspaper, De Standaard, on the Paris Terrorist Attack Network into Neo4j. In
  • Part 1, we talked about loading the raw JSON data into Neo4j, and then in
  • Part 2, we cleaned up some of the data for easy querying in Neo4j. 
So that's where we are. To wrap things up, I just wanted to illustrate some of the results and queries in Neo4j around some of the most interesting figures in this Terrorist network. I started some of my explorations around a widely reported terrorist, and Belgian national, called Salah Abdeslam.


So let's take a look at Salah in Neo4j.

Wednesday, 30 November 2016

Exploring the Paris Terrorist Attack network - part 2/3

In part 1 of this blogpost series, we got the basic Paris Terrorist Attack Network loaded into Neo4j. It looked like this:
There are a couple of things that annoyed me about this graph:

  1. First, the relationships are all "bidirectional", which really clutters the visualisation. In Neo4j, relationships are always directed, which kind of makes it awkward to store these bi-directional relationships like this. 
  2. Of course, this graph was originally made by De Standaard newspaper in Flanders, Belgium, so it was created in Dutch. A couple of the key concepts though (type of node, status of the node) would need to be meaningfully translated for you to have any fun with the dataset.
  3. The graph was not "labeled", and therefore lacked some essential structural elements that would allow for fun manipulation in the Neo4j Browser. 
  4. The relationships did not really say anything about the type of relationship. 
Let's tackle these one by one.
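As a rough illustration of the first point, here is a small Python sketch of what "de-duplicating" bidirectional relationships could look like before loading: every pair of nodes that is connected in both directions is reduced to a single directed relationship. This is not the approach used in this series, just a sketch under assumptions; the function name collapse_reciprocal and the placeholder node names are made up for the example.

```python
def collapse_reciprocal(edges):
    """Keep one directed edge per connected pair, in first-seen direction."""
    seen_pairs = set()
    kept = []
    for source, target in edges:
        pair = frozenset((source, target))  # direction-agnostic key
        if pair in seen_pairs:
            continue  # the reverse (or a duplicate) was already kept
        seen_pairs.add(pair)
        kept.append((source, target))
    return kept


# Placeholder data: A<->B is stored twice, B->C only once.
raw_edges = [("A", "B"), ("B", "A"), ("B", "C")]
clean_edges = collapse_reciprocal(raw_edges)
```

Since Cypher patterns can be matched without specifying a direction, keeping a single relationship per pair does not stop you from traversing it both ways at query time.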

Monday, 28 November 2016

Exploring the Paris Terrorist Attack Network - part 1/3

November 13th, 2015 - A day to remember

Just over two weeks ago, we remembered the sad anniversary of one of the most atrocious and vile terrorist attacks that our generation has seen. It's easy to forget many things in our daily rat race, but I don't think I will easily forget this video, which was all over the internet hours/days after the attack on the Bataclan concert hall in Paris:

All it takes is a drop of empathy and humanity to understand the horror that these victims went through. The sound of the one person shouting "Oscar .... Oscar... Oscar..." just keeps on ringing through my head.

Friday, 25 November 2016

Podcast Interview with Craig Taverner, Neo Technology

The interview below was long overdue - but very much worth the wait. For the past couple of years, the Neo4j community has been brewing on a really interesting add-on capability to integrate GIS-style, spatial querying capabilities into Neo4j. It's such a great and natural fit - and one of the driving forces behind this in the community has always been this global citizen called Craig Taverner. Craig has been in the ecosystem for years - first as a community member, then as a commercial customer, and now as an employee in Neo's Swedish engineering team. So about time we had a chat:

Here's the transcript of our conversation:
RVB: 00:02.785 Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology, and here we are again, recording another Neo4j Graphistania podcast session. And today I'm joined by one of my colleagues actually, in the Neo4j engineering team, Craig Taverner. Hi Craig.