Just a few weeks ago, I was talking with some Neo4j users who are active in the domain of "labour", or work. During that conversation, they mentioned that there are standards that classify different types of work into different buckets (a taxonomy, if you will), and that two competing standards exist for doing so. There's
- the ESCO standard: the European Skills, Competences, Qualifications and Occupations, and
- the ROME standard: the "Répertoire opérationnel des métiers et des emplois (ROME)"
And in principle, I figured that using these standards would be a really cool thing to do in Neo4j. Skills/Competences and Occupations form really interesting graphy structures, and I could see how you could use a taxonomy like that to do some really interesting recommendations and other data workloads. So I wanted to give it a poke around.
Loading ESCO into Neo4j
The entire ESCO dataset can be downloaded from the European Commission's portal site: https://ec.europa.eu/esco/portal.
It's really easy: you just select the data that you are interested in - the topic, format, and the languages - and put together a download package.
In terms of format, you can choose between
- an RDF format, which basically gives you a large (500MB) Turtle file. Turtle - the Terse RDF Triple Language, see https://www.w3.org/TR/turtle/ - is probably the more comprehensive option, as it contains everything. But it's also quite a bit more difficult to manipulate and get your head around. I was able to import the Turtle file really easily using Jesus' "neosemantics" plugin, and had it up and running in minutes. But I found the result more difficult to use - most likely because I am not an RDF aficionado. Sorry.
- CSV format. That's easy enough - we know how to import those. So all I needed to do was write a few Cypher scripts and import the data in a few minutes. I will put the scripts below, but you can also see them on GitHub.
In any case, I opted to continue with the CSV files, and spent a little time importing the different files and connecting them together - in different languages. There are basically 5 sets of files:
- the Skills
- the Skillsgroups, which group the Skills above together
- the Occupations
- the ISCOgroups: this is a standard of the International Labour Organisation (ILO) that provides an International Standard Classification of Occupations.
- and then a few files with relationships between Skills and Occupations, different ISCO groups, and different Skills/Skillsgroups.
- one full of RDF triples - complicated!
- one with English Skills, Skillsgroups, Occupations and ISCOgroups.
- one with Dutch Skills, Skillsgroups, Occupations and ISCOgroups.
This is where the scripts are on GitHub.
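To make the "connecting them together" step a bit more concrete, here is a minimal Python sketch of the idea, outside of Neo4j: node files (Skills, Occupations) are keyed by a URI, and a relationship file points from one URI to another. This is not the Cypher import itself. The column names (conceptUri, preferredLabel, occupationUri, skillUri, relationType) are my assumption of what the CSV headers look like, and the example.org URIs and labels are made up purely for illustration.

```python
import csv
import io
from collections import defaultdict

# Tiny inline stand-ins for the ESCO CSV files. The column names and all
# values below are assumptions for illustration, not real ESCO data.
SKILLS_CSV = """conceptUri,preferredLabel
http://example.org/skill/1,manage musical staff
http://example.org/skill/2,read musical score
"""

OCCUPATIONS_CSV = """conceptUri,preferredLabel
http://example.org/occupation/1,music director
"""

RELATIONS_CSV = """occupationUri,relationType,skillUri
http://example.org/occupation/1,essential,http://example.org/skill/1
http://example.org/occupation/1,optional,http://example.org/skill/2
"""


def load_nodes(text):
    """Return {uri: label} for a node file (Skills or Occupations)."""
    return {row["conceptUri"]: row["preferredLabel"]
            for row in csv.DictReader(io.StringIO(text))}


def link_skills(occupations, skills, relations_text):
    """Return {occupation label: [(relation type, skill label), ...]}."""
    links = defaultdict(list)
    for row in csv.DictReader(io.StringIO(relations_text)):
        occupation = occupations.get(row["occupationUri"])
        skill = skills.get(row["skillUri"])
        if occupation is None or skill is None:
            continue  # skip relationships pointing at rows we did not load
        links[occupation].append((row["relationType"], skill))
    return dict(links)


skills = load_nodes(SKILLS_CSV)
occupations = load_nodes(OCCUPATIONS_CSV)
graph = link_skills(occupations, skills, RELATIONS_CSV)
for occupation, needed in graph.items():
    print(occupation, needed)
```

In the graph, the same join happens by matching the two nodes on their URI and creating a relationship between them, which is why the URI is the natural key to look nodes up by during the import.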
Working with the ESCO database in Neo4j
Now that all that is imported, we can take a look at it. Let's start by looking at the model that we have imported. Pretty straightforward:
Hope this was useful.