Tuesday, 29 September 2020
Using Apache Zeppelin with Neo4j to analyse the FinCEN Files
So of course I had to take this data for a spin myself - it seems really important to me that more eyeballs are looking at this, and that more people are exposing the sometimes very questionable behaviour of the world's largest financial institutions.
Wednesday, 23 September 2020
Exponential growth in Neo4j
One thing I do know though, is that very smart and loveable people, in my own social and professional circle and beyond, seem to be confused by some of the data. Very often, they make seemingly rational arguments about the numbers that they are seeing - but ignore the fact that we are looking at an Exponential Growth problem. In this post, I want to talk about that a little bit, and illustrate it with an example from the Neo4j world.
What is Exponential Growth exactly?
Let’s take a look at the definition from good old Wikipedia:
Exponential growth is a specific way that a quantity may increase over time. It occurs when the instantaneous rate of change (that is, the derivative) of a quantity with respect to time is proportional to the quantity itself. Described as a function, a quantity undergoing exponential growth is an exponential function of time, that is, the variable representing time is the exponent (in contrast to other types of growth, such as quadratic growth).
The basic functions that are being entertained here are very simple in terms of the maths:
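To make the definition a bit more tangible, here is a minimal Python sketch that compares linear, quadratic and exponential growth over the same time steps. It is only an illustration under my own assumptions: the function names, the starting value of 1 and the growth factor of 2 per step are made up for this example and do not come from the original post.

```python
import math


def linear(t, rate=2.0):
    """Linear growth: a fixed amount is added at every time step."""
    return rate * t


def quadratic(t):
    """Quadratic growth: time is the base, the exponent is fixed at 2."""
    return t ** 2


def exponential(t, x0=1.0, growth=2.0):
    """Exponential growth: time is the exponent, x(t) = x0 * growth**t."""
    return x0 * growth ** t


def doubling_time(growth):
    """Number of time steps needed to double, for a given growth factor."""
    return math.log(2) / math.log(growth)


# Print the three functions side by side (assumed, illustrative parameters).
print(f"{'t':>3} {'linear':>10} {'quadratic':>10} {'exponential':>15}")
for t in (0, 5, 10, 20, 30):
    print(f"{t:>3} {linear(t):>10.0f} {quadratic(t):>10.0f} {exponential(t):>15.0f}")
```

For small values of t the three columns look fairly similar, which is probably why the early numbers are so easy to misread. After a few more steps the exponential column leaves the other two far behind.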
Friday, 18 September 2020
OpenTrials in Neo4j - with a simple ETL job
I have been meaning to write about this for such a long time. Ever since the lockdown happened, I have been wanting to take a look at a particular biomedical dataset that looks extremely interesting to me: the OpenTrials dataset. If you are not familiar with this yet, this is what they say:
OpenTrials is a collaboration between Open Knowledge International and Dr Ben Goldacre from the University of Oxford DataLab. It aims to locate, match, and share all publicly accessible data and documents, on all trials conducted, on all medicines and other treatments, globally.
It's a super interesting initiative, and it really flows from the idea that in much of the very intensive, expensive biomedical research, we should be looking at how to better use and re-use the knowledge that we are building up. Kind of like what people in the CovidGraph.org initiative, het.io (remember the interview I did with Daniel - so great!) and others are doing.
Downloading and restoring the dataset
It's a bit hidden, but you can actually download a (slightly older, but still) copy of the OpenTrials dataset from their website. The dataset is actually a Postgres dump file: I got the latest one from http://datastore.opentrials.net/public/opentrials-api-2018-04-01.dump.
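As a rough sketch of what the download and restore steps could look like, the Python snippet below derives the local file name from the URL above and builds a `pg_restore` command line for it. A few assumptions on my side: the dump is in a format that `pg_restore` can read, the target database is called `opentrials` (a made-up name) and already exists, and the helper functions `dump_filename` and `build_restore_command` are hypothetical names used only for this illustration. The snippet only prints the command; it does not download or restore anything by itself.

```python
import shlex
from pathlib import Path
from urllib.parse import urlparse

# URL of the dump file as mentioned in the text above.
DUMP_URL = "http://datastore.opentrials.net/public/opentrials-api-2018-04-01.dump"


def dump_filename(url):
    """Return the last path segment of the URL, used as the local file name."""
    return Path(urlparse(url).path).name


def build_restore_command(dump_path, dbname="opentrials"):
    """Build a pg_restore argument list (the database name is an assumption)."""
    return ["pg_restore", "--no-owner", "--dbname", dbname, str(dump_path)]


local_file = dump_filename(DUMP_URL)
command = build_restore_command(local_file)

# Show the command that would be run, without executing it.
print(shlex.join(command))

# To actually run it, one could (for example) do:
#   urllib.request.urlretrieve(DUMP_URL, local_file)
#   subprocess.run(command, check=True)
```

The `--no-owner` flag skips restoring the original object ownership, which tends to be convenient when the roles from the source system do not exist on your own Postgres server.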
Monday, 7 September 2020
Graphistania 2.0 - Episode 8 - The one after the Covid-summer
Not sure if we should be happy or sad - but hey - the Covid-19 summer of 2020 is almost behind us. Like most people, I found it quite a strange and unusual summer, with very few foreign adventures (although I did manage to squeeze in a cycling/camping trip to the French Alps in July), lots of cycling, some great family time... and of course lots of time with graphs :) ...
So that means that we are also kicking the Graphistania podcast back into gear - here's the next episode for you:
Here's the transcript of our conversation:
RVB: 00:00:15.863 [music] Hey, Stefan, I do need to ask you for consent, I think, right?
SW: 00:00:19.847 Hi. Yeah, I consent. [laughter] This is always the weird moment.
RVB: 00:00:24.727 Exactly. I thought, "Start with that one again."
SW: 00:00:28.436 Exactly, just to create a little bit of tension in the air.