Monday, 10 December 2018

Podcast Interview with JEP, the Graph Database

Alright - here's something special for you. For the past couple of days, I have been listening to - no, I have been DEVOURING all the episodes of the "Everything is Alive" podcast series published by Radiotopia. It is such a great show. Funny, interesting, sad, thoughtful, and ... inspirational. Because - what would happen IF EVERYTHING WAS ALIVE??? I could not not think about that - and specifically, I thought about Neo4j instances... what if THEY were alive? What if I could interview a real, live Neo4j instance - what would that be like???

I decided to find out. Here's my lovely chat with imaginary JEP, the Graph Database. Turned out to be a fine chap, really. Here it goes:


Here's the transcript of our conversation:
RVB: Good morning everyone, my name is Rik, Rik Van Bruggen from Neo4j - and today I am doing a really special episode of the Graphistania podcast series. It's an episode that I got the inspiration for by listening to Everything is Alive, a podcast hosted by Radiotopia. They did some amazing work interviewing some of the most interesting characters ever - and I would love to continue that tradition here today on Graphistania.

Monday, 3 December 2018

Podcast interview with Will Lyon, Neo4j

Been a while since I have been able to publish more podcast episodes - sorry about that. Will try to keep up a regular pace - but no guarantees. However, I must say that the conversation that I had with my colleague Will Lyon made me think that I really should keep it up... the Neo4j ecosystem and company are full of people with lots of interesting things to say - and talking to them is just a blast.

So this conversation was long overdue, because Will has done SO MUCH for the Neo4j Community in the past couple of years - it's pretty crazy. How do you start a conversation like that? Turns out it's really easy. So nice. Listen to it over here:

Here's the transcript of our conversation:
RVB: 00:00:00.615 Hello, everyone. My name is Rik Van Bruggen from Neo4j, and tonight I have a very long overdue guest on this podcast episode. Someone that I've been dying to talk to, actually, for quite some time because he's done such an amazing job in the Neo4j community over the past couple of years. And that's my colleague, Will Lyon. Hi, Will.

Wednesday, 28 November 2018

Working with the ICIJ Medical Devices dataset in Neo4j

Just last weekend our friends at the ICIJ published another really interesting case of investigative journalism - tracking down and publishing the quite absurd and disturbing practices of the medical devices industry. The entire case with all of the developing stories can be found at https://medicaldevices.icij.org/ - take a look as it really is quite fascinating. Of course that meant that I wanted to see what that data looked like in Neo4j, and if I could have a play. I didn't have time for a full detailed exploration yet - but hopefully this will also give others the opportunity to chime in. So let's see.

The Medical Devices dataset as a graph

This turned out to be surprisingly easy. Just download the Zip file from the ICIJ website: https://medicaldevices.icij.org/download/icij-imddb-2018-11-25.zip, unzip this, and then we get 3 comma-separated-values files:
  • one for the Devices that are being reported on
  • one for the Events that are being reported: whenever something happens to a device (e.g. a recall), that is logged and reported
  • one for the Manufacturers of the medical devices.
That's easy enough.
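Before loading anything into Neo4j, it can help to peek at what is actually inside those three files. Here's a minimal sketch in Python that lists the column names of every CSV file in the downloaded archive. Note that the helper name `csv_headers` and the UTF-8 encoding are my assumptions for illustration - the file names and columns inside the zip are not described here, so the code simply reports whatever it finds.

```python
import csv
import io
import zipfile


def csv_headers(zip_source):
    """Return {csv file name: list of column names} for every CSV in the archive.

    `zip_source` can be a path to the zip file or an open binary file object.
    """
    headers = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".csv"):
                continue  # skip anything that is not a CSV file
            with archive.open(name) as raw:
                # Assumption: the files are UTF-8; utf-8-sig also strips a BOM if present.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                # The first row of each file is taken to be the header row.
                headers[name] = next(csv.reader(text), [])
    return headers
```

Calling `csv_headers("icij-imddb-2018-11-25.zip")` on the downloaded file should then give you one entry per CSV file, which is a handy starting point for deciding what becomes a node and what becomes a relationship.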

Wednesday, 31 October 2018

Data Lineage in Neo4j - an elaborate experiment

For the past couple of years, I have had a LOT of conversations with users and customers of Neo4j that have been looking at graph databases for solving Data Lineage problems. Now, at first, that seemed like a really fancy new word used only by hipster technovangelists to try to appear interesting, but once I drilled into it, I found that it’s actually something really interesting and a really cool application of graph databases. Read more on the background of it on Wikipedia (as always), or just live with this really simple definition:
“Data lineage is defined as a data life cycle that includes the data's origins and where it moves over time. It describes what happens to data as it goes through diverse processes. It helps provide visibility into the analytics pipeline and simplifies tracing errors back to their sources.”
That’s easy enough. Fact is that it’s a really big problem for large organisations - specifically financial institutions as they have to comply with regulations like the Basel Committee on Banking Supervision's standard number 239 - which is all about assuring data governance and risk reporting accuracy.
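To make that definition a bit more tangible: lineage is really just a directed graph of "this dataset was derived from that dataset", and tracing an error back to its sources is a traversal of that graph. Here's a minimal sketch in plain Python. The dataset names and the `DERIVED_FROM` structure are completely made up for illustration, and this is not how Neo4j does it internally - it just shows the shape of the problem.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets it is derived from.
DERIVED_FROM = {
    "risk_report": ["risk_aggregates"],
    "risk_aggregates": ["trades_clean", "counterparties"],
    "trades_clean": ["trades_raw"],
    "counterparties": [],
    "trades_raw": [],
}


def upstream(dataset, derived_from):
    """Return every dataset that `dataset` directly or indirectly depends on."""
    seen = set()
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for parent in derived_from.get(current, []):
            if parent not in seen:  # guard against revisiting shared ancestors
                seen.add(parent)
                queue.append(parent)
    return seen


def origins(dataset, derived_from):
    """Return the upstream datasets that have no parents of their own."""
    return {d for d in upstream(dataset, derived_from) if not derived_from.get(d)}
```

With the made-up data above, `origins("risk_report", DERIVED_FROM)` walks back through the aggregates and the cleaned trades to the two original sources. In a graph database, that same question becomes a variable-length path query instead of hand-written traversal code.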

Here’s a couple of really nice articles and videos that should really give you quite a bit of background.
 

Monday, 22 October 2018

Poring over Power Plants: Global Power Emissions Database in Neo4j

In the past couple of weeks, I have been looking into some interesting datasets for the Utility sector, where Networks or Graphs are of course in very, VERY abundant supply. We have Electricity Networks, Gas Networks, Water Networks, Sewage Networks, etc. - that all form these really interesting graphs. Lots of users have specialised, GIS-based tools to manage these networks - but when you think about it, there are so many interesting things that we could do if ... we would only store the network as a network - in Neo4j of course.
So I started looking for some datasets, and maybe I am not familiar with this domain, but I did not really find anything too graphy. But I did find a different dataset that contained a lot of information about Power Plants - and their emissions. Take a look at this website:
and then you can download the Excel workbook from over here. It's not that big - and of course the first thing I did was to convert it into a Google Sheet. You can access that sheet over here:

There are two really interesting tabs in this dataset:
  1. the sheet containing the fuel types: this gives you a set of classifications of the fuel that is used in the different power plants around the globe
  2. the list of 30.5k power plants from around the world that generate different levels of power from different fuel types. While doing so, they also generate different levels of emissions, of course, and that data is also clearly mentioned in this sheet. Note that the dataset does not include any information on Nuclear plants - as they don't really have any "emissions" other than water vapour and... the nuclear waste of course.
So let's get going - let's import this dataset into Neo4j.

Friday, 12 October 2018

Podcast Interview with Michael Simons, Neo4j

For this week's episode of our Graphistania podcast, I had the great pleasure of spending some time on the phone with Michael Simons - one of the talented Neo4j engineers that build our products. Michael only recently joined our team, and we actually got talking on our internal channels about something we both love dearly... Bikes. I did a ride in Belgium recently that Michael found interesting and then he rode it himself as well - and hey, we got talking. One thing led to another, and before you know it we are recording the conversation... Here it is:


Here's the transcript of our conversation:
RVB: 00:00:01.418 Hello, everyone. My name is Rik Van Bruggen from Neo4j, and here I am again recording another episode for our Graphistania podcast. And today, I have one of my dear colleagues on the other side of this Google Hangout again, and that's Michael Simons from Neo4j engineering. Hi, Michael.

MS: 00:00:19.623 Hi, Rik.


Wednesday, 3 October 2018

Podcast Interview with Michael McKenzie

Why spend my evenings/weekends/empty hours creating a podcast? Well that's very simple: I love talking to like-minded people in the graph community. There's something about this community that attracts people that are equally fond of "connections" and building relationships that is just too awesome to explain. I love it. So when Karin told me about this guy in Washington that was doing awesome things with Neo4j and was helping out with community activities (he wrote about it over here), I was all too keen to have a chat with him. Meet Michael McKenzie, from Washington DC - here's our chat:

Note: I recorded this with Michael before our fantastic GraphConnect conference in New York a few weeks ago - but did not have the time to publish it earlier... apologies...


Here's the transcript of our conversation:
RVB: 00:00:00.000 Hello, everyone. My name is Rik. Rik Van Bruggen from Neo4j and here I am again recording another Graphistania Neo4j podcast. And today, I have a wonderful community member on the other side of this Google Hangout and that's Michael McKenzie from Washington, D.C. in the US. Hi, Michael.