Tuesday, 11 September 2018

Podcast Interview with Karin Wolok, Neo4j

Next week is GraphConnect New York City 2018, and that's of course a big highlight for all of us at Neo4j. You should really be there if you can :) ...

One of the reasons why GraphConnect is such a great event is that it allows us to connect all the nodes in the graph and have a great couple of days of real-world conversations about this fascinating topic called graphs. Again, we are going to have a great line-up, not least because of all the great community content that we will be presenting and working on during the event.

On top of that, we have had a LOT going on in the Neo4j Community recently - with the launch of a new community site and more. That's a good enough reason for me to invite Karin Wolok, our Community Manager at Neo4j for a good chat. Here it is:

Here's the transcript of our conversation:
RVB: 00:00:00.819 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo4j. And here I am again recording another episode of our Graphistania Neo4j podcast. And today's a little bit of a special episode I think because it relates to something very dear to my heart and many people at Neo4j's heart, which is our Neo4j community. And for that, I've invited Karin Wolok on the podcast. Karin is our community manager. Actually, you have a very different and more expensive-sounding title, right, Karin? But maybe you can introduce yourself to our listeners.

Monday, 3 September 2018

Podcast Interview with Johannes Unterstein, Neo4j

A couple of months ago, we had a great Online Meetup that was all about scaling out Neo4j using containerisation and container orchestration technologies. You can see the recording over here:

That was really cool, and a great excuse to invite my nowadays *colleague* Johannes Unterstein to the podcast. Johannes has a really interesting history and a lot of expertise in these technologies, and could really talk about them for our audience. So here's our chat:

Here's the transcript of our conversation:
RVB: 00:00:00.399 Hello, everyone. My name is Rik Van Bruggen from Neo4j, and here I am again after the holiday period recording another Graphistania podcast. And today I have the pleasure of welcoming one of my dear engineering colleagues on this podcast episode, and that's Johannes Unterstein from Germany. Hi, Johannes.

Thursday, 23 August 2018

ESCO database in Neo4j: Skills, Competencies, Qualifications and Occupations form a beautiful graph!

Just a few weeks ago, I was talking with Neo4j users who are active in the domain of "labour", or work. They mentioned that there are standards that classify different types of work into different buckets (a taxonomy, if you will), and that two competing standards exist for doing so. There are
  • the ESCO standard: the European Skills, Competences, Qualifications and Occupations, and 
  • the ROME standard: the "Répertoire opérationnel des métiers et des emplois (ROME)"
ESCO seems to be promoted by the European Commission, and the latter seems to be a Belgian/French initiative of some sort. Surely they overlap, but I am not sure by how much. As luck would have it, I started looking at the ESCO material first, but I am sure we could have written this post about ROME as well. It's the principles that matter.

And in principle, I figured that using these standards would be a really cool thing to do in Neo4j. Skills/Competences and Occupations form really interesting graphy structures, and I could see how you could use a taxonomy like that to do some really interesting recommendations and other data workloads. So I wanted to give it a poke around.

Loading ESCO into Neo4j

The entire ESCO dataset can be downloaded from the European Commission's portal site: https://ec.europa.eu/esco/portal.  
It's really easy: you just select the data that you are interested in - the topic, format, and the languages - and put together a download package. 

In terms of format, you can choose between
  • an RDF format, which basically gives you a large (500 MB) Turtle file. Turtle - the Terse RDF Triple Language, see https://www.w3.org/TR/turtle/ - is probably more comprehensive, as it contains everything. But it's also quite a bit more difficult to manipulate and get your head around. I was able to import the Turtle file really easily using Jesus' "neosemantics" plugin, and had it up and running in minutes. But I found it more difficult to use - most likely because I am not an RDF aficionado. Sorry.
  • CSV format. That's easy enough - we know how to import those. So all I needed to do was write a few Cypher scripts and import the data in a few minutes. I will put the scripts below, but you can also see them on GitHub.
In any case, I opted to continue with the CSV files, and spent a little time importing the different files and connecting them together - in different languages. There are basically five sets of files:
  1. the Skills
  2. the Skillsgroups, grouping the above together in groups
  3. the Occupations
  4. the ISCOgroups: this is a standard of the International Labour Organisation (ILO) that provides an International Standard Classification of Occupations. 
  5. and then a few files with relationships between Skills and Occupations, different ISCO groups, and different Skills/Skillsgroups.
I wrote the script pretty quickly - it's really not that hard - and then I ended up with a few Neo4j databases:
  1. one full of RDF triples - complicated!
  2. one with English Skills, Skillsgroups, Occupations and ISCOgroups. 
  3. one with Dutch Skills, Skillsgroups, Occupations and ISCOgroups.
In the Neo4j Desktop that looks a bit like this:
This is where the scripts are on GitHub.
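
To give an idea of what such an import does, here is a minimal sketch in plain Python rather than the actual Cypher scripts. It reads node files for Skills and Occupations plus one relationship file, and connects them by URI. Everything in it is an assumption made for illustration: the column names (`uri`, `label`, `occupation_uri`, `relation_type`, `skill_uri`), the rows, the "skill A" style names and the `OPTIONAL_FOR` relationship type are made up and are not taken from the real ESCO download.

```python
import csv
import io

# Tiny inline stand-ins for the downloaded CSV files.
# Column names and rows are hypothetical, not the real ESCO layout.
SKILLS_CSV = """uri,label
skill:A,skill A
skill:B,skill B
"""

OCCUPATIONS_CSV = """uri,label
occ:1,software developer
occ:2,beer sommelier
"""

RELATIONS_CSV = """occupation_uri,relation_type,skill_uri
occ:1,essential,skill:A
occ:2,essential,skill:B
occ:2,optional,skill:A
occ:9,essential,skill:A
"""


def load_nodes(csv_text, label):
    """Return {uri: {"label": ..., "name": ...}} for one node file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["uri"]: {"label": label, "name": row["label"]} for row in reader}


def load_relationships(csv_text, nodes):
    """Return (skill_uri, REL_TYPE, occupation_uri) triples.

    Rows that point at a URI we have not loaded are skipped.
    """
    relationships = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["skill_uri"] in nodes and row["occupation_uri"] in nodes:
            rel_type = row["relation_type"].upper() + "_FOR"
            relationships.append((row["skill_uri"], rel_type, row["occupation_uri"]))
    return relationships


nodes = {}
nodes.update(load_nodes(SKILLS_CSV, "Skill"))
nodes.update(load_nodes(OCCUPATIONS_CSV, "Occupation"))
relationships = load_relationships(RELATIONS_CSV, nodes)

for skill_uri, rel_type, occupation_uri in relationships:
    print(nodes[skill_uri]["name"], rel_type, nodes[occupation_uri]["name"])
```

The point of the sketch is the order of work: load the node files first, keyed by a unique identifier, and only then load the relationship files and look up both ends. The last row of the relationship file refers to an occupation that was never loaded, so it is dropped.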

Working with the ESCO database in Neo4j

Now that all that is imported, we can take a look at it. Let's start by looking at the model that we have imported. Pretty straightforward:
We can also just start looking at some data by just visually exploring it in the Neo4j Browser:
But it gets a lot more interesting when we put Cypher to it, and start querying the data. For example, let me grab these two nodes here:
And look at the paths between them:
As always, the things that are located on the path tend to be pretty interesting. Even more so when I think a bit more about the data, and start looking for the ESSENTIAL FOR relationships. Let's see what comes back when I look for the links between a "software developer" and a "beer sommelier", when I ONLY traverse the relationships that define really important / ESSENTIAL relationships between Skills and Occupations:
Interesting. I am sure that a domain expert could do lots of other things here, especially if we could give that expert some non-technical tool like Neo4j Bloom.
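
The idea of only traversing the essential relationships can be sketched outside of Neo4j as well. Below is a small breadth-first search in plain Python over a toy graph. It is not the Cypher query used above, and the data is invented: "skill A", "skill B", "skill C", "occupation X" and the `OPTIONAL_FOR` type are placeholders, and `ESSENTIAL_FOR` is my assumed spelling of the relationship type.

```python
from collections import deque

# Made-up edges: (skill, relationship type, occupation).
EDGES = [
    ("skill A", "ESSENTIAL_FOR", "software developer"),
    ("skill A", "ESSENTIAL_FOR", "occupation X"),
    ("skill B", "ESSENTIAL_FOR", "occupation X"),
    ("skill B", "ESSENTIAL_FOR", "beer sommelier"),
    ("skill C", "OPTIONAL_FOR", "software developer"),
    ("skill C", "ESSENTIAL_FOR", "beer sommelier"),
]


def shortest_path(edges, start, goal, allowed_types=None):
    """Breadth-first search that ignores direction.

    If allowed_types is given, only those relationship types are traversed.
    Returns the list of nodes on one shortest path, or None if there is none.
    """
    adjacency = {}
    for skill, rel_type, occupation in edges:
        if allowed_types is not None and rel_type not in allowed_types:
            continue
        adjacency.setdefault(skill, []).append(occupation)
        adjacency.setdefault(occupation, []).append(skill)

    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back from the goal to the start to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


any_path = shortest_path(EDGES, "software developer", "beer sommelier")
essential_path = shortest_path(
    EDGES, "software developer", "beer sommelier", allowed_types={"ESSENTIAL_FOR"}
)
print(any_path)
print(essential_path)
```

In this toy graph the unrestricted path goes through a single shared skill, while the essential-only path is longer and passes through an intermediate occupation. That is the same effect as restricting the traversal in the real dataset: fewer relationships qualify, so the connections that remain say more.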
All in all, this was a really easy and interesting experiment. I am sure there's a lot more to do here - but this was yet another example of a cool application of Neo4j in a surprising domain.

Hope this was useful.



Thursday, 5 July 2018

Podcast Interview with Matt Casters, Neo4j & Kettle

A couple of years ago, I got to know another Belgian data aficionado that was doing quite a bit of work in the open source community, called Bart Maertens. For a while, we actually met at Antwerp Airport when we were both "commuting" to London City Airport for business - and we got a conversation going. Bart was organising a Pentaho Community Meeting in Antwerp, less than 500m from my home, and invited me to come along and talk a bit about my favourite subjects: beer and graphs :) ... 
So one thing led to another, and Bart started to do some interesting work integrating his data integration tools with Neo4j. He wrote the code, and blogged about it in some detail.

Fast forward to early 2018. Neo4j is more and more in the Enterprise market, with very large organisations seeing the value of graph databases and the platform around them. But most of these environments are NOT greenfield environments - they almost always require some kind of data integration work to make the tools work effectively. So it became very natural for us to start looking for architects and experts that could help us... and that's effectively what brought my next Podcast guest to the Graph: Matt Casters has worked together with many other Neo4j people in a previous life, and is now the Chief Solutions Architect in our professional services organisation.

Here's my chat with Matt:

Friday, 22 June 2018

Podcast Interview with Estelle Joubert, Dalhousie University

One of the coolest things about Neo4j is just the sheer breadth and diversity of applications that we see for connected data and graph databases out there. I think I have said it before, but it truly continues to baffle me. Very frequently, I will have a morning conversation with a user about battling financial fraud, a lunch conversation about using graphs in biotech to fight world hunger, and an afternoon conversation about real time recommender systems in retail. And of course finish it off with a beergraph conversation in the evening :) ...

Really - it's just amazing. And the next podcast episode is a true testimony to that. I got to have a chat with a lovely lady all the way over in Canada recently, Estelle Joubert from Dalhousie University. She and her team have been using Neo4j in her amazing field of research, which is all about understanding how music and opera came to be what they are today in a historical perspective. She is best at explaining it herself - so here's our chat:

Here's the transcript of our conversation:
RVB: 00:01:20.209 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo4j, and tonight I am joined by a guest on our podcast all the way from Canada, someone that has been working with, and experimenting with, Neo4j for quite some time in a very interesting domain that I hadn't heard of before. And that's Estelle Joubert from Dalhousie University. Hi, Estelle.

Friday, 15 June 2018

Exploring new datatypes in Neo4j 3.4 and the Open Beer Database - part 2/2

In the previous blogpost I imported the Open Beer Database into Neo4j and added some new fancy spatial data to it. Now in this post I would like to explore that data. As a reminder, you can find the full
Let's take a look.

First we will just look at the basic OpenBeerDB data. The schema is quite straightforward:

Thursday, 14 June 2018

Exploring new datatypes in Neo4j 3.4 and the Open Beer Database - part 1/2

Recently, I gave a talk at the Amsterdam, Brussels and London Neo4j meetups about some of the new and exciting features in Neo4j 3.4. While preparing for it, I was looking for material and I found some very cool stuff that powerfully explains the new features. The best resource is probably this post by Ryan Boyd, and the video that goes with it:

Ryan does a great job at explaining the new features, and goes into some detail on the new temporal and spatial data types that you can now use in Neo4j 3.4. You can explore these new features yourself by accessing the Neo4j Sandbox developed specifically for this purpose. Or you can just do what I did, and use the Neo4j Desktop to spin up a Neo4j instance, and access the "guide". You do that by typing
:play https://guides.neo4j.com/sandbox/3.4/index.html
into the Neo4j Browser, and then you can access the entire guide, add some data to your dataset, and play around.