Wednesday 31 August 2016

Podcast Interview with Dirk Vermeylen, HP Enterprise

Last month I had the pleasure of interviewing one of my fellow countrymen, a Belgian graphista living a short distance from my home in Antwerp, who had submitted a very cool graphgist to our challenge earlier this year. Dirk Vermeylen did a great gist about trying to model sports results as a graph database - very interesting, so I will let him explain it to you himself. Here's the recording of our conversation:
Here's the transcript of our conversation:
RVB: 00:02.538 Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology and I am here again recording a podcast episode with someone in my own country. That's actually very, very-- I don't think that's happened before. Dirk Vermeylen, from HP Enterprise is joining me from a couple of miles down the road on our phone call. Dirk, welcome. 
DV: 00:23.708 Thank you, Rik. Thanks. 
RVB: 00:25.290 Cheers. Thanks for coming online. Dirk, we got to know each other because of some GraphGist that you created recently which we'll talk about more a little bit later, but why don't you introduce yourself to our listeners first of all? 
DV: 00:41.285 I am working for HPE, used to work for EDS. So that was taken over by HP then became HPE and there is some changes in the future as well. I work as a consultant in the service delivery and infrastructure environment. 
RVB: 01:04.137 Okay and then what's your relationship to the wonderful world of graphs? How do you get into that if I may ask? 
DV: 01:13.628 It is more or less by accident. I am-- as part of service delivery and a lot of work is done on configuration databases on CMDBs, where what we need to do is migrate CMDBs and varying CMDBs on for example, what is important is you want to know what service are implemented in specific data centers and what applications are related to it. There are many, many relations in between like server databases that are installed and instances at business services that are configured to it, so you only have all sets of relationship. We need to work with it and I was looking into better ways on how we can manage the data on it. Then Neo4j was mentioned when reading the internet, so at some point in time I thought I may get this a try. I did it and it worked in fact, [?] though. 
RVB: 02:18.972 I think there's a lot of people out there that are using Neo4j for configuration management databases because of the relationships and review and the impact analysis that allows you to do. So, if something happens in your configuration, what's the impact on the rest of the configuration. Is that also your case or was that your background as well? 
DV: 02:41.301 Exactly. Exactly. And the advantage is that when you do queries, actually Neo4j allows to document and very well decipher way of working is the very nice way of documentation. It reads easier than you do with complex SQL queries whereas we join and all of that that's a bit more tricky to understand what you have been doing a couple of days later. 
RVB: 03:05.779 Absolutely. I have so many people that talked to us about that. That's great to hear. But you ended up writing that GraphGist for something completely different, right [chuckles]? What was that all about? It was about your running partners, right? 
DV: 03:21.132 That's right. I was working on CMDBs also a little bit in my free time. At some point you have done and you're ready with it. At that time the GraphGist challenge appeared and it says try to do this problem if you can do it on a wide board, it will work. In my free time I tried to collect information about our running competition which is just like we go running with a couple of friends. And first at the rising in the race gets 50 points, second 45, third 40 points, and so on. And then we just add them all up together and at the end of the year you have a winner for motivation. We keep track of this points in the Excel spreadsheet. I want to automate it because automation is fun. And I tried to make a big draw. The problem-- and it worked very well at the same time as the GraphGist challenge, so I thought probably I should spent a little bit more time and work on this GraphGist challenge. 
RVB: 04:33.723 Super cool. We'll put a link to your actual GraphGist on the blog post with this episode as well. What was so nice about it? What was it that made the GraphGist and Neo4j such a good fit for that particular running assignment that you wanted to solve? 
DV: 04:53.803 GraphGist was easy to use, and you need the AsciiDoctor which is like-- you take notes-- part plain text and then with very little markup or markdown, as it was called, you can specify query. And see the results actually directly in the GraphGist. And Neo4j is very visual so you create-- when one query, you see a little bit of the dots already and you say "Ah, I need a second query." And from the second query you go to the third query. So, it's very visual. It allows you to very quickly progress in your problem fields. 
RVB: 05:36.505 Very, very nice. I bet your running buddies were happy with the result [chuckles]. 
DV: 05:44.441 I didn't show it to them already. Part of the thing is that I used test results which means the races are correct, the people are correct, but the sequence of rival is not correct in the GraphGist. So I'm not sure if they will be so happy with me what I assumed there. 
RVB: 06:05.157 You made yourself win every time [laughter]. 
DV: 06:08.787 No. Not me. Someone else. But it's-- really it's test data. 
RVB: 06:13.784 I understand. Very cool. All right. So what are you going to use this for in the future, Dirk? Do you have any more professional or personal plans with this? Or where do you want to take this in the future? 
DV: 06:26.347 And part of the GraphGist as well was to get more experience with cypher and I used it for CMDB to challenge with configuration management databases that you have like 20,000 objects and 18,000 relations. So if you launch a query, it may or it may not end up with the result that you are expecting because of lack of experience. With my very small example, it's a lot easier to understand what you are doing. And where I want to take it in future is configuration management databases, of course. Also, on open data-- I'm more and more involved in open data. And open data is all about linked data. And with linked data, you are very close to Neo4j and these graphical relations again. 
RVB: 07:19.085 Super. Well, I mean, I think we will meet again there. Because that's exactly the type of stuff that I've been working on in the meetups here in Belgium as well. And I'm sure we'll have a chat about this in the future then. Very cool. Anything else you want to add, Derrick? Is there anything that would should be paying attention to in your work? Or otherwise, I think we'll keep this podcast recording short. 
DV: 07:45.950 That's perfect. I'm happy with what Neo4j was doing already. I was working on Neo4j version two. And py2neo is very important. The py2neo library. I had a little bit of issues that I've seen-- that the new version of Neo4j and the new version of py2neo solve them all. So I'm looking forward to play around with these. 
RVB: 08:10.485 Well, it's summer time. You know, this is perfect time to experiment with those [crosstalk]. 
DV: 08:16.662 Exactly. 
RVB: 08:16.236 All right, Dirk. Thank you so much for sharing your work with us and with the entire community. I really appreciate it. And thank you for coming online to do this recording. And I'm sure we'll meet very soon. 
DV: 08:28.730 Thanks, Rick. Thanks. It was nice doing this. 
RVB: 08:30.685 Thanks a lot. Bye. 
DV: 08:31.662 Thanks. Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best


Monday 29 August 2016

Orienteering with Neo4j - solving the 1000-control race with the Dijkstra-APOC - part 2/2

In part 1 of this blogpost, I explained how you can use the Awesome Neo4j APOCs to calculate a weighted shortest path on a graph with a more optimized and more efficient algorithm, based on Dijkstra's work. In this second and last blogpost on this topic, I would love to explain a bit why I think this is pretty much a very big deal. APOCs give you access, from Cypher, to a whole slew of graph algorithms, many of them very useful for all kinds of different graph operations.

Orienteering - a bit more complicated in the real world

One reason why I wanted to write this second post, is of course because my lovely sport - Orienteering - is of course a bit more complicated in the real world than what you have seen in that little park run that I talked about in the previous two posts. To give you a feel for it:

  • Here's an excerpt of my run in the World Masters Orienteering Champs a few weeks back in Estonia. More details over here - but I can tell you that for each and every one of these legs there's at least half a dozen different route options - and of course my course had 20+ control points too. So a bit more of a bigger graph anyway!
  • And here's another example: actually being run today (August 25th) is the actual elite's World Orienteering Champs on the long distance. Just. Look. At. This. Map.

Thursday 25 August 2016

Orienteering with Neo4j - moving from Cypher to the Dijkstra-APOC - part 1/2

So last July, my dear colleagues at Neo4j decided that they would tweet about a blogpost that I wrote 3 years ago.

The post was first published on my own blog over here, and then re-blogged over at the I also wrote a graphgist about it at the time, which I have revisited on the graphgist portal just now.

Some context

This entire thing started that summer with a blogpost by my friend and (at the time) colleague, Ian Robinson, about using a clever cypher query to calculate the weighted shortest path over a (small) graph. I decided to use that mechanism and apply it to my lovely hobby/sport: Orienteering. Pathfinding through forests, parks, cities - it's what we do in that sport, all the time. And efficient pathfinding in this environment, requires you to read the map, understand what the fastest route is, and run that as fast as you can. Effectively, when you want to "understand the fastest route", you will be weighing different alternative route choices against one another, and - as quickly as you can - choose that one for your run. It is, in effect, a total graph problem, a total "weighted shortest path problem" on a detailed map of your surroundings. So I used Ian's approach, and applied it to a small graph of an orienteering excercise in an Antwerp park.

Tuesday 23 August 2016

Podcast Interview with Daniel Himmelstein, University of Pennsylvania

So today's podcast episode may well be one of the most interesting that I have ever published - and we have had some darn interesting episodes, if you ask me :) ... I got to know our guest, Daniel Himmelstein, by his great graphgist on "Drug repurposing by hetnet relationship prediction". Really interesting stuff - and Daniel actually got his PhD on this topic too. I found this video of his Thesis Seminar if you want more detail:

But for now we will just have a great conversation about his work. More interesting links below in the transcription - as usual.

Wednesday 17 August 2016

Podcast interview with Stefan Plantikow, Neo Technology

Today's episode in the Graphistania podcast is one that I have really been looking forward to, for many reasons. First of all, our guest is such a lovely guy - feels like I could go out on a VERY long pub crawl with Stefan - seriously. Then, he has been working on some of the most interesting topics in Neo4j - another bonus. Most recently, he has worked on the "swiss army knife" of Neo4j tooling, the Awesome Apocs. Enough reason to have a good podcast chat together - and here that is:

Here's the transcript of our conversation from July 4th, 2016:
RVB: 00:02.518 Hello everyone, my name is Rik, Rik Van Bruggen from Neo, and here we are again, recording another Graphistania podcast, and today I have one of my lovely colleagues from the engineering team with me, Stefan Plantikow from Berlin. Hi Stefan.

Friday 12 August 2016

The Great Olympian Graph - part 3/3

In part 1 of this blogpost series, we created and prepared a dataset with all the modern Olympian medallists from 1896 to 2012. In part 2, we loaded all that data into Neo4j: here's that article. Now, we of course want to do some interesting queries on the dataset, and see if the Graph will yield any interesting insights - as it usually does.

Easy querying - number of sports per Olympic game

Let's start with something easy - doing some counting of the numbers of sports in every game since 1896. Here's how we do that:
//number of sports per game 
match (y:Year)<--(e:Event)-->(d:Discipline)-->(s:Sport) 
with distinct as game, as sport 
return game, count(sport)order by game ASC
Then you can see that the number of sports has not really changed that dramatically over the years: In the early days we immediately went from 9 sports at the first Olympic game in 1896, to 19 in 1900.

Wednesday 10 August 2016

The Great Olympian Graph - part 2/3

In the previous blogpost of this Olympic series I explained how I got to the dataset in 4 distinct .csv files that get generated from a 4-worksheet Google Spreadsheet. Here are the links to the 4 sheets:
Now, in order to load that data into Neo4j, I had to come up with a meaningful graph model.

Monday 8 August 2016

The Great Olympian Graph - part 1/3

After my previous experiments with some sports data (most recently, the Tour de France 2016 results) in Neo4j, I recently saw the 2016 Olympic games coming up, and thought: well, there MUST be some interesting datasets to find around that - especially now that one of my favourite bike-riders in the world, Greg Van Avermaet, won the Gold Medal in the Cycling Road Race. Still so excited!!!

I did a bit of research and decided to settle on a combination of two datasets:
Just before the London Olympics in 2012,

Wednesday 3 August 2016

Graphs @ Radiolab

So I go for my morning run the other day, and I put on my 2nd dearest podcast (after Graphistania, of course) - Radiolab. They have the most amazing stories that make me laugh, cry, read and research - and guess what: this episode is about GRAPHS!

Listen to this episode:
telling the amazing story of connectedness between soil, fungi, trees, and animals... aka the Wood Wide Web. The "internet of fungus", as the Beeb calls it.

Check out this TED talk too:
or this article about the "Intelligent Plant" and the connections that exist there.

If ever we needed more proof:


even on your daily podcast!



Tuesday 2 August 2016

Podcast Interview with Dave Bennett, Nulli

So, in an age that feels like the distant past (in other words: before my holidays), I had a really great conversation with a "small world" neighbour of mine, talking about two of my passions - one older and one more recent - the interesting relationship between graphs and identity & access management. I wrote about this some time ago already, and it continues to be one of the more interesting and popular use cases for graph databases.

So that would be a great chat from the get-go - and here it is:

Here's the transcript of our conversation:
RVB: 00:02.575 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo Technology and here we are again recording another podcast episode. Today I'm joined by I think it's the first Canadian that I have on this podcast, Dave Bennett from Nulli. Hi, Dave.