Monday, 11 January 2021

Graphistania 2.0 - the HAPPY NEW YEAR session!

A VERY HAPPY NEW YEAR, everyone! I hope 2021 will beat all of your professional and personal expectations, and OMG aren't we all hoping that we can see each other in person a bit more this year. Let's make that happen, when it's safe to do so. For now, we will connect with each other remotely, among other things through this page and this... podcast.

Here's a great episode for you. As always, we actually based our conversation on the awesome TWIN4J developer newsletter, which has some fantastic stories in there almost every week - definitely recommend that you subscribe to that one. Our summary of some of the posts is in this document, and here's the recording of our conversation: 

Here's the transcript of our conversation:

RVB: 00:00:20.848 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo4j. And happy new year. Happy new year to everyone, and welcome to another episode of our Graphistania podcast. So happy to be here again after this crazy year it was, 2020. And we are going to continue with the good thing that we started last year, which is I have my dear friend and colleague, Stefan, with me on this recording. Hi, Stefan.
SW: 00:00:56.121 Hello, Rik, and hello every single one of you out there in this completely new year, which is going to be, of course, completely different and not anything like the year before. Let's see how that turns out. We all know what's going to happen or what is already happening. But it can just be as good as we make it. So this is what I like doing, this thing with you, because it's really fun and inspiring, and I hope people feel the same.
RVB: 00:01:23.810 Yeah. Same here. Yeah. Thanks for being there. And, as usual, we have a lot to talk about, and we'll probably need to keep an eye on the clock here a little bit. But yeah, there's been so many great things, again, in the graph community that have been popping up. So many great examples that keep coming out in the This Week in Neo4j newsletter, but also everywhere on the community website. It's kind of amazing. I've got a couple of ideas to talk about. Why don't we run through those? Is that okay for you?
SW: 00:02:03.521 Yeah. That would be lovely. And, again, what better way to get your kind of lazy Christmas holiday brain to start than to dive straight in? So yeah.
RVB: 00:02:13.998 Exactly. Yeah.
SW: 00:02:15.499 Any one of those you're thinking of that stand out?
RVB: 00:02:18.754 Well, when I was going through the This Week in Neo4j newsletters, which is what I typically do in preparing for these podcasts, to just kind of see what's happening, right, what struck me is that there's a number of super, super interesting discussions and cases that are all about knowledge creation, how graphs can help you with not just knowledge management, so to speak, and structuring knowledge but really kind of creating new knowledge. People talk about machine learning and AI these days all the time. But it's amazing how things like this-- there was an article about the brain, which is one of those knowledge management tools that use graphs. Or some of the articles that Jesús wrote around multilingual taxonomies or the ArXiv connections. I mean, all of these use cases, they're all about, leveraging existing data, structuring it as a graph, and then using that to create new knowledge, which is fascinating in my book. What do you think about that?
SW: 00:03:39.777 Or maybe it's even like-- it comes, for me, from this fascination of, as you said, everybody running towards the latest technology. But what they do tend to forget is that there's a lot of [barriers?] just underneath their feet, right? But they can't see it. So all of that knowledge is there, however, they cannot see it because it's not connected, right? And I think that's the beauty of the graph and the way you can work with it to allow you to see the things that you already had answers to or the things that you didn't even know that you wanted to know, as I always say. And I think that is also coming back to a little bit on the way I think about strategy and behaviour prediction. I very often do this kind of the way of thinking from an anthropologist kind of view, right? Very like you can't isolate technology in one sense, but you need to study the full culture, meaning values, beliefs, artefacts, tools, behaviours, and everything at once, and I think that's when you get the fuller picture. And I think if there's anything that we have learned in the past year, it's that a lot of questions do not have a yes or no answer. It's not black or white. Very often, it's a nuanced answer. And I think that is the great part with graphs. It allows us to kind of reason about things in a more networking kind of way, so it's almost like it's enabled also. Not uncovering only the knowledge within your data, but it's helping you, actually, to create a more sustainable mental model in the way you think, right? So I think that is a lot of the cool things because if you can't see it, I mean, then you can't really think it in that sense, right?
RVB: 00:05:26.369 Very true. Yeah. One of the examples that was featured in the newsletters was all about these links between academic papers, right? I mean, if there's one place where there's a lot of knowledge being created and managed, it's, obviously, in academia, right? And now, people are starting to look at these things, like structuring academic papers and the links, the cross references, the citations that people make between different academic papers and creating big networks around that, right? And it made me think of you, Stefan. I think I've shared it with you in one of our private conversations as well. But there was this wonderful article that, I thought, was out there by a lady called Anne-Laure Le Cunff, I think her name is. She created this article around thinking in maps. How, actually, structuring knowledge, structuring ideas, structuring data as maps - and maps is just a type of graph, I would argue - is, actually, this age-old metaphor dating from the Lascaux caves back to the Egyptians, back to the Greek cultures. All this age-old technique of structuring data in maps, in graphs to make sense of them, to make sense of information, of knowledge. And some fascinating stuff there. And, obviously, me as an orienteering geek, I also like my maps.
SW: 00:07:16.465 Yeah. You heard map, and then you start running, right? [laughter]
RVB: 00:07:19.504 Yes, exactly. I'm like, "Map? Map, map, map? Where's the map?" But it was a fantastic article, I thought. I don't know if you had any thoughts on that.
SW: 00:07:31.537 No, I think it's so interesting. And also, since COVID around, there was a lot of labs, innovation labs - what I do work with there at Neo - that we also connected all of this kind of medical data, right? Because most of the time, this is also open data sources, but they are very siloed. And that's the problem, I think, with academia. They kind of drill this kind of deep hole with specialised knowledge, and they kind of forget that a lot of the value is also when you connect them. So I think it's super interesting, again, as you said, because if you can't see it, you can't do anything with it, right? And then you start to forget about it. So I think it's super cool to see it. Yeah.
RVB: 00:08:18.875 By the way, I wanted to mention that in almost all of the examples that I looked at in the newsletter and the past couple of months, I found that there's a lot of graph data science actually being applied, right?
SW: 00:08:35.361 Oh, yes.
RVB: 00:08:35.431 I mean, you know that it's kind of a new thing to the graph community, to really have an enterprise grade tool for applying data science concepts to graphs. But I think almost every single one of these knowledge creation examples that we just talked about has a data science component to it. It's amazing how that's been boosted in the past couple of months. You know what I mean?
SW: 00:09:05.233 Yeah. And I think, from a transformation kind of standpoint, what we see here, as you say, there is literally in every single one, right, because all of a sudden, this is now, because of the release of the GDS library and what we do, there's a possibility for pretty much anyone to just fire away and start going with this. And one of the things which I think is so interesting, there's a couple of really good articles from Kristof there, like one where he kind of compares Neo4j with NetworkX and does, in his own words, a drag race of sorts, which I think is also interesting to see. A lot of these things, you could do in smaller scale before, but you couldn't do it in the same kind of system where you have your transactional thing. And this is a lot of what I see because if you can put all this power in one system, and that system allows you to work faster, that is the game changer. Because if you can, I think the other article, there is a good example of calculating centrality at scale where the example was something about 20 million or something, and it should take approximately five years to calculate. You can do it in theory, but let me know any business that's going to wait five years for the result of that. That's, literally, not happening. But when you can start to get these examples in real time, then you're allowed to try out things. And I think this is just the beginning of seeing this whole wave of new companies behaving instead of just like the team at Google or some of those big giants, right? Now it's democrat times, so it's for everyone, so.
RVB: 00:10:51.795 Yes. Literally, that's the right word, right? It's democratisation. You used to require a super computer to do this stuff, right? I mean, I don't know if you remember, but Cray supercomputers, they used to have-- they used to have a spin-off company called Yarc. And Yarc, all they did was they sold custom Cray computers that did - drum roll - graph processing, right? That's what they did. And now, you run that on your laptop. The democratisation of this stuff is just so impressive. And I thought it was super interesting to see that article about comparing it to NetworkX because I actually like NetworkX. I think it's a really, really cool tool.
SW: 00:11:44.419 Yeah, it's amazing.
RVB: 00:11:46.967 But if you just look at Kristof's test, you can kind of see, to do something really simple, it takes minutes in NetworkX, and it takes seconds on Neo4j. And then you're like, "Hmm. That's not a trivial thing." That has an impact on the rate of innovation, the rate of how easy it is for people to adopt it, yes or no. That's not a trivial thing. To be able to do things at that speed, it's just kind of meaningful. And I'm quite happy about that. So it's very impressive.
SW: 00:12:26.767 Yeah. But I 100% agree to that. And I think this idea, you can look upon this from the time perspective, this is how much I would save, and then like 1 second compared to 10 minutes, it's not that much. I can go and take a coffee. But I think from an innovation and in a cognitive kind of human capacity, what happens when you are allowed to just try and explore is that you're going to try 100 new things during the rest of those seconds in that 10 minutes time, right, which will allow you to more of, again, the word that we said before, things that you didn't know that you wanted to know that you already, in one sense, had the data, but you couldn't see, right? So I think, again, that's so amazing to see this happening. I've just downloaded and tried it out, working with embeddings and stuff during the holidays, and it's mind-blowing. I really encourage every single one of you out there to just do it and go ahead.
RVB: 00:13:27.117 Yeah. And there were some other-- I mean, I think there's some other examples, not just in processing speed but also in how quickly can you get something done. Can you get to an end result? I mean, if I look at the work that Adam did with this new graph app called Charts, I'm like, "Wow. This is so cool." Because I mean, you used to have to develop this entire front-end app to kind of expose this to your colleagues, right, to show it to someone. And now, it's just like click, click, click, click, click, and [inaudible], you're up and rolling, and you can show it to people, not just the speed of processing, but also the speed of development and things like that. GRANDstack, the BI connector, some really cool articles that we saw in the past couple of months that showed that. Really, really quite impressive, I must say.
SW: 00:14:20.687 Yeah. I really second that. And I think this is where we see, again, on a level of transformation within companies, right? Because there's one part to validate your use case from a data and technology standpoint, and then, of course, you need to validate the business part. But one thing, and I worked with Adam in a lot of these labs that we do, right, and this was an idea that we have been talking about for ages, and I'm so happy to see this coming alive because the one times that we tried putting people that literally coming into the room saying, "I hate data because it never works," give them the graph, the power of the graph in a simple interface like this, and all of a sudden, these people stand up and screaming, "I love data. I love graphs." So this is like the graph epiphany moment times like - I don't know - 100 or something. So I'm super happy to see it. And I think, again, it's just amazing to see how much goes so fast, so super cool.
RVB: 00:15:20.954 All right. Well, I think we're going to wrap it up. Just maybe one more question. What was your favourite title of the articles that you read? I know I have one. I have one in mind. [laughter]
SW: 00:15:32.535 What could that be? Let me know.
RVB: 00:15:34.558 I think it's The Pulumi Platypus And The Very GRAND Stack. [laughter]
SW: 00:15:40.368 Yeah. It's hard to beat that one.
RVB: 00:15:42.801 It's really hard to beat that one. It's a super, super-- Pulumi Platypus And The Very GRAND Stack, yeah. A really cool article. [laughter]
SW: 00:15:50.523 It's an amazing one. The one I was thinking-- actually, I don't know why. Maybe this is, again, me being nerdy again. But when you said any good title, the only one I was thinking was this network analysis of the Marvel Universe.
RVB: 00:16:05.990 Oh, yeah. Of course. Yeah.
SW: 00:16:06.410 But I guess that has nothing to do with titles. It has [crosstalk] me and my childish behaviour that will never leave my body. [laughter] Yeah.
RVB: 00:16:15.794 We love you for it. We love you for it. So hey, Stefan, thank you for taking the time to talk us through these different posts. We're, obviously, going to include them in the transcription of the podcast. It's been great talking to you again. And I'm looking forward to a great new series, right, where we're going to keep this up and keep on making these little podcast recordings together. It's been so much fun.
SW: 00:16:41.855 Yes. Super great there. Happy to speak to you again, and waiting for the next one already.
RVB: 00:16:48.182 Me too. Thank you, Stefan. Have a great day.
SW: 00:16:51.806 Great day. Bye.
RVB: 00:16:53.294 Bye.

Subscribing to the podcast is easy: just add the RSS feed, find the show on Spotify, or add us in iTunes! Hope you'll enjoy it!


All the best

Rik

Wednesday, 25 November 2020

Exporting Spotify Playlists into Neo4j - and creating a little dashboard

About two months ago, my colleague Niels published an amazing blogpost. He showed us how to solve a problem that I really recognized: to make sense of your age-old Spotify playlists that are getting seriously out of hand. I have this problem in the real world: I keep adding songs to my "favorites" playlist, or to some collaborative playlists that I have with my kids/friends - but I end up with these huge gathering pots of songs that... really don't make a lot of sense anymore, and really have not much use anymore. 
So Niels' blogpost was really useful: he used Python, the spotipy wrapper of the Spotify Web API, and of course our favourite database, Neo4j, and some of its graphy tools (Graph Data Science to the rescue) to make a really fancy new set of Spotify playlists that were much more usable. Take a look at Niels' script over here. So I wanted to have a play with Niels' work in my own environment - and do some more exploration in Neo4j. Here's what happened.
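
To give a rough idea of the kind of pipeline described above, here is a minimal sketch. It is not Niels' actual script. It flattens playlist items into rows that a parameterised Cypher statement could load. The item shape loosely mirrors what the Spotify Web API returns for playlist tracks; the `Track` and `Artist` labels, the `PERFORMED` relationship type, and the `playlist_rows` helper are all assumptions made for this illustration.

```python
# Hypothetical sketch: flatten Spotify-style playlist items into rows for Neo4j.
# Labels, relationship type and property names are assumptions, not Niels' model.

LOAD_TRACKS = """
UNWIND $rows AS row
MERGE (t:Track {id: row.track_id})
  SET t.name = row.track_name
MERGE (a:Artist {id: row.artist_id})
  SET a.name = row.artist_name
MERGE (a)-[:PERFORMED]->(t)
"""


def playlist_rows(items):
    """Return one row per (track, artist) pair, skipping unusable entries."""
    rows = []
    for item in items:
        track = item.get("track")
        if not track or not track.get("id"):
            # Defensive assumption: some entries may come back without a usable id.
            continue
        for artist in track.get("artists", []):
            rows.append({
                "track_id": track["id"],
                "track_name": track["name"],
                "artist_id": artist["id"],
                "artist_name": artist["name"],
            })
    return rows


# Made-up sample data in the assumed shape.
sample_items = [
    {"track": {"id": "t1", "name": "Song One",
               "artists": [{"id": "a1", "name": "Artist A"},
                           {"id": "a2", "name": "Artist B"}]}},
    {"track": None},
]
rows = playlist_rows(sample_items)
print(len(rows))
```

With the official Neo4j Python driver, the rows would then be passed as the `$rows` parameter, along the lines of `session.run(LOAD_TRACKS, rows=rows)`.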

Tuesday, 24 November 2020

Graphistania 2.0 - Episode 11 - The Emil Update

Yey! I got to do it again. For the 4th time in the history of this weird thing called the Graphistania podcast, I have had the chance to spend some quality time talking to Emil Eifrem, our fearless leader and CEO of Neo4j. As last time, we actually recorded the video, so you will find the Zoom call, and the MP3 version of it, below in the blogpost - along with the habitual transcription.

Hope you will enjoy the chat as much as I did.


Here's the link to the YouTube video of the call:

Thursday, 12 November 2020

Graphistania 2.0 - Episode 10 - This Month in Neo4j

Hi everyone

Hope you are all well, keeping safe, and finding some time to relax and enjoy life in this wonderful rollercoaster that is 2020. Think of it this way - we will never forget this ride, EVAH! 

As you can imagine, things have been evolving at warp speed in the wonderful world of graphs as well. So me and my partner in crime Stefan had another chat about all the things we have seen pop up, mostly through the awesome This Week in Neo4j (Twin4J) newsletter. Here's the chat we recorded:

Here's the transcript of our conversation:

RVB: 00:00:01.448 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo4j, and here I am again recording another episode of our Graphistania Neo4j podcast. Wonderful time of the day to start with this type of conversation because I have my dear friend, Stefan, on the other side of this call. Hi, Stefan. How are you?

Wednesday, 4 November 2020

Making sense of 2020's mad calendar with Neo4j


As we enter November 2020, I - like many people I assume - can't help but feel quite "betwattled" by all of the events taking place this year. I took some time last weekend to look at all the crazy events that happened ... starting with pretty normal January and February, moving slowly to ominous March, and then living the weird, (semi-) locked down lives that we have been living until this very day I write this, which is the day after the bizarre US elections.

In any case, I decided to have some fun while reflecting about all this. And in my world, that means playing with data, using my favourite tools... CSV files, Google Sheets, and of course, Neo4j. Let me take you for a ride.

Starting out with my calendar

The starting point of all this is of course my Google Calendar - which is buried in online calls and meetings these days. 
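
As a small, hedged illustration of this first step, the sketch below counts events per month. It assumes the calendar has already been exported to CSV, and the `title` and `start` column names, like the sample rows, are made up for the example.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Made-up sample export; the real column names will depend on how you export.
sample_csv = """title,start
Team sync,2020-03-16T09:00:00
Customer call,2020-03-17T14:30:00
Podcast recording,2020-11-03T10:00:00
"""


def events_per_month(csv_text):
    """Count calendar events per year-month from CSV text."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        start = datetime.fromisoformat(row["start"])
        counts[start.strftime("%Y-%m")] += 1
    return counts


counts = events_per_month(sample_csv)
print(sorted(counts.items()))
```

A per-month count like this is a quick sanity check on the export before the events are loaded into Neo4j.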

Tuesday, 6 October 2020

Graphistania 2.0 - Episode 9 - The one about the (Graph Databases for) Dummies (book)

Here's a nice new episode of the Graphistania podcast for you: for the first time in 5 years, I was able to get the fantastically awesome Chief Scientist of Neo4j, Dr. Jim Webber, back to the podcast. Jim is a great colleague and friend, and one of the best tech public speakers in the business - especially when you want to talk Graphs and distributed systems. Over the past few months, I had the pleasure of working together with Jim on a more regular basis - as we actually wrote a book together: the Graph Databases for Dummies book. It was announced on the Neo4j blog, and seems to have been doing really well in the past few weeks. Some of you may remember that Jim co-wrote The O'Reilly book on Graph Databases, and I wrote Learning Neo4j by Packt (2nd edition together with Jérôme Baton) - and we have had a bit of friendly banter going back and forth about the quality of both artifacts :) ... it has been a ton of fun.

So here's the chat that we recorded about the new book - hope you enjoy it as much as we did.

Here's the transcript of our conversation:
RVB - 00:00:00.151 Hello, everyone. My name is Rik, Rik Van Bruggen, from Neo4j, and here I am again recording another episode of our Graphistania podcast. And this is a special one. This is a special episode, one that we've been talking about for some time, because I have a very special guest on this show, and that is my dear friend and colleague Jim Webber. Hey, Jim.

Tuesday, 29 September 2020

Using Apache Zeppelin with Neo4j to analyse the FinCEN Files

Last week, we got another great and widely publicised case of Graph Databases' usefulness thrown our way. The ICIJ published their FinCEN Files research, and on top of allowing you to explore the data on their website they also published an anonymised subset of the data as a series of CSV/JSON files. My friends and colleagues Michael Hunger, Will Lyon and the rest of the team helped with the process of making this subset available as a Neo4j database (see this GitHub repo), and there's even a super easy FinCEN Files Neo4j Sandbox that you can spin up in no time for some investigation fun.

So of course I had to take this data for a spin myself - it seems really important to me that more eyeballs are looking at this, and more people exposing the sometimes very questionable behaviour of the world's largest financial institutions.

Introducing Zeppelin

I had heard of some great technology a while ago that would allow people to use their data in a very different way, by looking at these interactive webpages that would interact with a Neo4j database.