Bruggen Blog: July 2016

Wednesday 20 July 2016

Graphing the Tour de France - part 3/3

In the past two blog posts I have been creating and importing some nice Tour de France 2016 data. It's a small dataset, for sure, and this is by no means a realistic graph application - but perhaps we can still have some fun exploiting the data with some Cypher queries. That's what we'll try now. I have put all of the example queries together in this gist, so please feel free to play around with it :) ... let me take you through it.

Is the model really there?

First and foremost, let's verify the model that we wanted to put in place, with yet another AAPOC (Awesome APOC). We thought we were going to get this model:

Graphing the Tour de France - part 2/3

In a previous blog post, I created a couple of Google spreadsheets with some of the results data of the 2016 Tour de France. These spreadsheets can be very easily downloaded as two comma-separated files that hold the data:

the riders.csv file can be downloaded from this URL.
the stages.csv file can be downloaded from this URL.

I will be updating the stages.csv file as the Tour progresses, so we can keep updating the graph as well.

Creating a model

To import these CSV files into Neo4j, I actually went through multiple iterations of the model. Here are two of them that I wanted to share with you - not because one of them would be "right" and the other one would be "wrong", but because it really reflects the fact that your use case - the questions that you want to ask of your data and what you want to be doing with the data - is going to determine the model. Underlined. In Bold. Because it's so important.

Graphing the Tour de France - part 1/3

Alright, it's time to come out of the closet. I have to admit, over the past couple of years, I have turned into a bit of a cycling geek. I love watching the races in Flanders in spring, the legendary "ride through hell" from Paris to Roubaix, and of course, now, in summertime, the big tours of Italy, France and Spain. I have grown quite addicted to it - and have taken to riding my own bike a couple of times a week as well... it's a ton of fun. Last year I did a fun experiment in a series of 5 blog posts about the Professional Cycling twitterverse, but this year, I had something else thrown into my lap. Here's what happened.

Podcast Interview with Florent Biville, Criteo

Today is a good day, because I got to spend some time for the podcast talking to one of our most active and busiest community members in France. Florent Biville has been working with Neo4j for a long time, on various projects like Liquigraph and AssertJ Neo4j, and has been presenting his work at various conferences - like this one:

So now I got to talk to Florent in a bit more detail, and it was a true pleasure:

Here's the transcript of our conversation:

RVB: 00:02 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo Technology, and here I am again recording another podcast episode for our Graphistania podcast. And today I'm joined by someone from the city of lights, Paris, Florent Biville. Thank you for joining us, Florent.

FB: 00:19 Thanks for having me.

RVB: 00:21 Fantastic. Thank you for coming online. And yeah, Florent, maybe the best place to start is for you to introduce yourself to our listeners. I know you've been very active in the Neo4j community, but maybe you can introduce yourself.

FB: 00:36 Sure. So, I'll just say that I'm based in Paris, and I work there in a company called Criteo. So, just in a few words, Criteo is one of the rare non-American companies that can compete with Google and Facebook. But, on the domain of online advertising, and more specifically, in the domain of retargeting. So basically, we're just trying to promote the best quality advertisement for users at scale. So that's basically what we do.

RVB: 01:08 Wow. Is that a French company, Florent? Or is this a--?

FB: 01:11 Yes. Three co-founders are French. So now we have offices all around the world. Engineering is mostly based in Europe, but we have offices in the US, in South America, in Turkey, in England, in France, and so many countries I forget. But yeah, we are in a big expansion, and we're hiring also, by the way [chuckles].

RVB: 01:33 That's a nice plug, very well done [laughter]. Hey Florent, so what's your relationship to the wonderful world of graphs, then? How did you get into graphs? Can you tell us a little bit about that?

FB: 01:45 Sure. So, during my first job, actually, that's the time I first heard about Neo4j. Although we didn't use it directly, we were interested in a graph database for fraud detection because we were selling video game activations. And every time there was a fraud detected, it was usually detected too late, so we had to pay chargebacks to the bank. So it's lots of cash wasted, and also the activation keys could not be reused afterwards. So we were basically burning our activation key stock. So it was really a huge waste, and we were trying to put a lot of effort into improving fraud detection for the video games we were selling. So that was basically my first contact with graph databases.

RVB: 02:29 How long ago is this?

FB: 02:31 So it was, I guess, in 2010 - something like that.

RVB: 02:34 That's a long time ago [chuckles].

FB: 02:37 Yes, yes. And then afterwards, I continued for my personal project. I really wanted to dig into that because-- I don't know, I just saw five minutes of Cypher, and I said, "Wow, that's so powerful and so interesting," even if the language was still young. And so, yeah, I tried Neo4j for my personal project later on, and then I joined a small company. We became a partner of Neo4j. I guess we were one of the first in France, actually, to become a partner. And I continued with some consulting gigs with startups or bigger companies, sometimes with Neo4j employees as well. That's how, basically, I spent the time between 2012 and maybe until one or two years ago.

RVB: 03:26 And you've been developing some open-source software around Neo4j as well, right? Liquigraph? Maybe you can tell us a little bit about that as well?

FB: 03:33 Sure. So Liquigraph is based on the project called Liquibase. Liquibase is a migration tool for relational databases. So the ideas and the concepts are very similar in Liquigraph, and the idea's the same, basically. You design your migrations in Cypher, and it will-- so you organise them in change sets, and then your change sets will be executed incrementally, and that's how you can manage your model migration. Because that's not always easy, because Neo4j is schema-optional. So it's very flexible, but sometimes maybe it's too flexible and you need some structure to be sure your model evolves in a good way.
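
(A small aside from the editor, not part of the conversation: the "change sets executed incrementally" idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions - the `ChangeSet` and `MigrationRunner` names are hypothetical, nothing here is Liquigraph's actual API, and the Cypher strings are never sent to a database. The runner simply remembers which change set ids it has already applied and skips them on the next run.)

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChangeSet:
    """One migration step: an identifier, an author, and the Cypher it would run."""
    id: str
    author: str
    cypher: str


@dataclass
class MigrationRunner:
    """Hypothetical toy runner that remembers which change sets were applied."""
    applied_ids: list = field(default_factory=list)
    executed_queries: list = field(default_factory=list)

    def run(self, changelog):
        """Execute only the change sets not applied before, in changelog order."""
        newly_applied = []
        for change_set in changelog:
            if change_set.id in self.applied_ids:
                continue  # already executed in an earlier run, so skip it
            # A real tool would send the Cypher to the database at this point;
            # this sketch only records it.
            self.executed_queries.append(change_set.cypher)
            self.applied_ids.append(change_set.id)
            newly_applied.append(change_set.id)
        return newly_applied


# Illustrative change sets; the labels and properties are made up.
changelog = [
    ChangeSet("add-rider-constraint", "example-author",
              "CREATE CONSTRAINT ON (r:Rider) ASSERT r.name IS UNIQUE"),
    ChangeSet("rename-team-property", "example-author",
              "MATCH (t:Team) SET t.name = t.title REMOVE t.title"),
]

runner = MigrationRunner()
first_run = runner.run(changelog)

# Later, the model evolves: one more change set is appended to the changelog.
changelog.append(
    ChangeSet("add-stage-index", "example-author", "CREATE INDEX ON :Stage(number)")
)
second_run = runner.run(changelog)

print(first_run)
print(second_run)
```

The second run only picks up the change set that was added after the first run, which is the incremental behaviour described above.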

RVB: 04:16 Yeah [chuckles], I always say, with freedom comes responsibility, right? You have to have something overlooking the schemaless nature, right?

FB: 04:25 Yes, exactly. I couldn't have said it better [chuckles], actually. So yes, we are in active development right now. We've done some releases already. We're working on some new features to get a bit more on par with Liquibase. We're not as complete as Liquibase yet. We really worked on the main priority features. But we are getting there, and hopefully in the following weeks, we should have a new release with even better features, especially one for dealing with large data sets.

RVB: 04:56 Wow. And this is all open-source, right? You can find this online, and you can just take a look at it, and use it if it's suitable for you?

FB: 05:06 Yeah, absolutely. You just go to liquigraph.org, and that's your starting point.

RVB: 05:10 Sweet. So, can I ask you the question that I ask every one of my interviewees? Why [chuckles] did you get into graphs? You mentioned that it was quite some time ago, but what was the main thing that attracted you to get into the world of graphs?

FB: 05:32 Well, when I started-- so, it was really-- how can I say? The developer in me, I just saw the power of Cypher. I mean, I could express complex queries so easily that-- I don't know. Just that. Really, Cypher was the selling point for me. Just to see how easy it was to create a graph, how easy it was to express queries. Even though I didn't have any specific project in mind necessarily, just the power of it and the flexibility of it just got me into it almost immediately. That's really what got me into graphs at first.

RVB: 06:11 And then, did that love evolve? Or, how do you feel about that now? Is it still Cypher mainly, or are there things that you really think are more important?

FB: 06:23 Well, I think that - especially with the release of version three, I mean - Neo4j really, really becomes more and more powerful, especially in terms of the-- because before this, you had unmanaged extensions or server plugins. So even though I was a Java developer and had no problem with that, it still didn't feel as natural as with other databases, I would say. So the latest edition of Neo4j, three, really makes it like a robust, mature product, like what you would expect from a database - like binary protocols, some drivers, and some well-defined way of extending the behavior. That is so--

RVB: 07:12 Have you looked at the procedures yet? I suppose you have.

FB: 07:16 Yeah, absolutely. And that even gave me the idea of another small open-source project I created. Because when I started-- I started playing with the procedures a month ago, or something, and I noticed-- so, the runtime was very nice. Whenever I made a mistake and I deployed the procedure, I really got a detailed message. And then I thought, "Oh, okay. That's nice. So the runtime gave me the error. What if I could get the error earlier?" And that's how I got the idea of writing a kind of compiler, to keep it simple, so that basically whenever you compile your procedure, even before you deploy it to Neo4j, you will get detailed feedback about what you did wrong or not.

RVB: 08:01 That sounds very useful.

FB: 08:03 Yes, and I added that to the repository of Neo4j's awesome procedures, or something. You know, APOC?

RVB: 08:11 Yes, APOC [chuckles]. The name-- we are so good at naming, it's fantastic [chuckles].

FB: 08:17 I'm no better, so I won't even start on that [chuckles]. I wouldn't--

RVB: 08:22 Exactly. So, before I let you go, just to approach another - and maybe final - subject, where do you think this is going? What does the future hold, both for the graph industry but also for things like Liquigraph, or some of your other projects? How do you see the wonderful world of graphs evolving?

FB: 08:44 Well, first, I think it's going to-- at least what started with version three, and I'm sure it's going to continue this way, is you will have more and more integration with external tools. So, especially with the rework of the JDBC driver, for instance. It will definitely help see Neo4j used with some BI tools, or even more-- so as a-- that looks very promising. And hopefully-- I don't know. For Liquigraph, hopefully, we will reach a version 1.0 and see people use it in reports. That's what I hope on my side, and maybe more contributors as well [chuckles]. But when I see, for instance, the Panama Papers, that's a great example of how Neo4j could be used. And it's great to see a big public example of something that is not a social graph. I mean, that's very interesting, and I'm sure we will have more and more examples like this, maybe in journalism, and maybe in other fields.

RVB: 09:48 Absolutely. I hope so, too. For the journalism part, I don't know if you saw that, but we've actually announced a Journalism Accelerator Program, whatever that means. But it's all about helping journalists or publishing organisations get started with Neo4j. I'm hoping that we'll see a lot more of that as well, so that will be great. Very cool. All right, Florent, as I told you before, we like to keep these podcasts fairly short, and so I want to thank you for coming online and spending some time with me. I'm sure you'll get a lot of interest once we publish a proper podcast for things like Liquigraph. And I look forward to meeting you again at one of the community events. At the meetups, maybe.

FB: 10:31 Yes, sure.

RVB: 10:32 Absolutely. Thank you, Florent.

FB: 10:34 Thanks a lot.

RVB: 10:35 Bye.

Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Friday 8 July 2016

BOLTing Podcast Interview with Nigel Small, Neo Technology

A little over a year ago, I had the chance to interview my friend and colleague Nigel Small for the Graphistania podcast. Great conversation, but time has gone by quickly - and Nigel has been very hard at work inside Neo4j's engineering team to create additional and exciting functionality for the world's leading graph database. Specifically, Neo4j's BOLT protocol has been one of the fruits of Nigel's labour - so it was a good time to have another chat and get another coffee-laden conversation in.

Here's the transcript of our conversation:

RVB: 00:02 Hello everyone. My name's Rik, Rik Van Bruggen, from Neo Technology and I'm at the London coffee shop of choice near our office in London doing another interview for the Neo4j podcast and I'm interviewing someone who I have interviewed before [chuckles] a while ago - Nigel Small from our engineering team. Hi, Nigel.

NS: 00:27 Hello.

RVB: 00:28 Hey, it's good to have you here. The reason for inviting you, somewhat unexpectedly, to talk a little bit on this podcast interview is that I know you've been hard at work for the past, I don't know, 18 months or so on the Bolt interface to Neo4j and the new drivers for the different development languages. I wanted to talk a little bit about that. Can you tell us a little bit about what is Bolt and what are the new uniform drivers? Let's start there.

NS: 01:01 So Bolt is the new approach for interfacing with Neo4j, to eventually probably replace the old REST interface, but for now it sits side by side with it. It's a binary protocol, it's developed entirely in-house, and we've built a set of four drivers initially that we've released that interface with Java, Python, JavaScript and .NET, to try to broaden the reach of the software that we're offering in-house and offer something that's supported for those platforms.

RVB: 01:37 Absolutely. So Bolt, from what I understand, it's a binary protocol as opposed to the REST interface which is ASCII-based, I suppose. It's clear text.

NS: 01:49 It's text-based. So the old interface - the HTTP interface - uses JSON to transmit its payloads, and as a data transfer format, JSON has challenges, let's say. There are certain limitations in what you can express. So we've developed a custom serialization format and it's very much in line with the Cypher type system. It's inspired heavily, let's say, by MessagePack, which is a similar set-up, but it was developed from scratch to, as I say, work with the type system and work efficiently in the way we want to transfer data to and fro.
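
(Another aside from the editor: to give a feel for what a MessagePack-style binary format looks like next to JSON, here is a minimal Python sketch. It is emphatically not Bolt's actual serialization format - it only encodes a small subset of values using MessagePack's marker bytes, and the `pack_value` function and the sample row are my own assumptions for illustration. Each value is written as a one-byte type marker, followed by its data where needed.)

```python
import json
import struct


def pack_value(value):
    """Encode a small subset of values MessagePack-style: a marker byte, then data."""
    if isinstance(value, bool):
        # Booleans are a single marker byte each (checked before int on purpose,
        # because bool is a subclass of int in Python).
        return b"\xc3" if value else b"\xc2"
    if isinstance(value, int) and 0 <= value <= 127:
        # Small non-negative integers fit into the marker byte itself.
        return struct.pack("B", value)
    if isinstance(value, int):
        # Everything else is sent as a 64-bit signed big-endian integer.
        return b"\xd3" + struct.pack(">q", value)
    if isinstance(value, float):
        # 64-bit big-endian float.
        return b"\xcb" + struct.pack(">d", value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) > 31:
            raise ValueError("this sketch only handles strings up to 31 bytes")
        # Short strings: the length lives in the low bits of the marker.
        return struct.pack("B", 0xA0 | len(data)) + data
    if isinstance(value, list):
        if len(value) > 15:
            raise ValueError("this sketch only handles lists up to 15 items")
        # Short lists: the item count lives in the low bits of the marker.
        return struct.pack("B", 0x90 | len(value)) + b"".join(
            pack_value(item) for item in value
        )
    raise TypeError("unsupported type in this sketch: %r" % type(value))


# A made-up result row: a string, an integer, a boolean and a float.
row = ["rider", 42, True, 3.5]

as_json = json.dumps(row).encode("utf-8")
as_binary = pack_value(row)

print(len(as_json), "bytes as JSON text")
print(len(as_binary), "bytes in the binary sketch")
```

Note that the float and the integer carry different markers here, so the receiver knows the exact type without guessing - the kind of thing that is harder to express in plain JSON.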

RVB: 02:34 So what was the primary goal of Bolt? What are some of the main advantages of using it with Neo4j 3.x going forward?

NS: 02:46 As I say, the type system is certainly one of them, so you get a much more native type system. You're sending fewer bytes to and fro, and while we haven't focused very heavily on optimization at this stage - we wanted to get the feature set fully fleshed out - there are a lot of optimization ideas we have going forward. Already, generally speaking, you're going to have a much faster experience with Bolt than you have done with HTTP in the past.

RVB: 03:13 As I understand it, it makes the server mode of Neo4j, as opposed to the embedded mode of Neo4j, a lot more feasible for high performance applications, or is that not really the case?

NS: 03:29 No, there are certain advantages. You've got a stateful session that, you know, you use, as opposed to HTTP which uses a stateless set-up. So each time you make a request when using HTTP, you're often sending the same set of metadata across: you're sending your user agent, you're sending your authentication information with each request that you do. With Bolt, you send that information at the start of a session and then that's used throughout, so you don't need to resend the same data. So, yeah, you do get some efficiencies.
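
(Editor's aside: the saving Nigel describes is easy to put into numbers with a back-of-the-envelope Python sketch. Everything in it is an assumption made for illustration - the metadata values, the queries, and the simplistic byte counting - and it does not model the real HTTP or Bolt wire formats. It only contrasts "metadata with every request" with "metadata once per session".)

```python
# Hypothetical metadata that a client would identify and authenticate itself with.
METADATA = {
    "user_agent": "example-client/1.0",
    "authorization": "Basic ZXhhbXBsZTpzZWNyZXQ=",
}

# Made-up queries, standing in for the actual work sent to the server.
QUERIES = [
    "MATCH (r:Rider) RETURN count(r)",
    "MATCH (s:Stage) RETURN count(s)",
    "MATCH (t:Team) RETURN count(t)",
]


def metadata_size(metadata):
    """Rough size of the metadata: total characters in all keys and values."""
    return sum(len(key) + len(value) for key, value in metadata.items())


def stateless_cost(queries, metadata):
    """Stateless style: every single request carries the metadata again."""
    return sum(metadata_size(metadata) + len(query) for query in queries)


def stateful_cost(queries, metadata):
    """Stateful style: metadata is sent once at session start, then only queries."""
    return metadata_size(metadata) + sum(len(query) for query in queries)


stateless = stateless_cost(QUERIES, METADATA)
stateful = stateful_cost(QUERIES, METADATA)

print("stateless:", stateless, "characters sent")
print("stateful: ", stateful, "characters sent")
print("saved:    ", stateless - stateful, "characters")
```

In this toy model the saving grows with the number of requests in a session: for n requests, the metadata is sent n - 1 fewer times.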

RVB: 04:03 Very cool. But as I understand it, some of the side-effects from implementing the Bolt protocol, that has been more on the drivers side? There has been a lot of work that you've been doing also on the uniform drivers that we've been providing with 3.0. Can you talk to us a little bit about that?

NS: 04:23 Yes. The uniformity has been a key part of this. We wanted to provide a clean uniform experience across different languages. We picked these four to start with because they were four of the most important ones to us.

RVB: 04:37 Which four are those?

NS: 04:37 The four are: Java, JavaScript, Python and .NET. And we wanted to make the experience as similar as possible. So we've made sure that we've unified the use of terminology and concepts that the drivers use across the board. So we have a session, we have a transaction, we have a result set. And they're all handled and described in the same way across all the drivers. In fact, the developer manual - the new developer manual - actually has one story it tells of how to use the driver with simply a difference in the sample code that's embedded. You can just switch the tab and it shows you the same code in different languages.

NS: 05:19 So we wanted to get this uniformity in place, but a lot of the difficulty has been around making sure that we get the right balance between uniformity and idiomatic language use. So we didn't want something that was exactly the same in every language, but felt alien to the developer in that language. And actually getting that balance right has been quite a lot of work, it's been a real challenge at times, and we've tried to respond to feedback where developers have come to us. And we had a particular instance of that with the .NET users telling us that the methods we were using for iterating through results didn't feel natural. So we've gone back and we've reassessed how we were doing that - this was still pre-release of 3.0 - and we went back and we reassessed and we redesigned and we actually shifted the balance there much more towards idiomatic and away from uniform. I think that's been something we couldn't do entirely in-house, we needed to talk to the users in those ecosystems and have them tell us how best we could fit that in. I think now we've got something that's pretty solid and should work well in most languages.

RVB: 06:32 In the past, the language drivers around the REST API, they were mostly developed by the community, right? They were mostly developed by people like yourself, with other people from the community contributing. Has that changed as well now with the uniform driver set?

NS: 06:50 We still have some driver authors. This is interesting actually. I've been involved with doing this for about five years or so now in terms of developing drivers, and I've seen some drivers that have been born and had a life and then died off somewhere, other drivers that have carried on. But now that we have some official drivers, we actually have to kind of work out how we want those drivers to sit alongside the community efforts. We don't want to go along and rid ourselves of any community efforts we have because the community's very, very valuable and we don't want to build all these comprehensive idiomatic features in every single language. We want to provide a base, I think, is where we've left this. We want to build a base core driver that handles all the plumbing, so you don't have to worry about the type system and the protocol detail, and that provides a base API on which you can build other layers, build an OGM, build other things that are specific to the language that you're in. You've got LINQ in .NET, the things that are specific to the language. So ultimately, we're hoping that the community drivers will be something that will actually sit alongside the official drivers, perhaps as a set of plug-ins or something that can extend the official driver.

RVB: 08:19 So then the officially supported drivers will be like the infrastructure for more added features, feature-rich implementations by the community?

NS: 08:27 Absolutely. Exactly how we go about that, we haven't entirely decided yet. As I'm still running the Py2neo project, I've got a few ideas of how we can kind of combine the official Python driver with Py2neo, with the extra features that I've added in, in a way that I don't have to duplicate my efforts and actually do the same thing at home that I do in the office. There are a few challenges there in how we fit that in, but we want to make sure there's room for both, I think, ultimately.

RVB: 08:56 Very cool. Well, I think people can find a lot more information on Bolt and on how to write drivers and everything online. We'll include some of those links on the transcription of the podcast. I think it's a great evolution that we're really taking this forward. So thank you very much for taking the time and spending some time with me. Thank you for the coffee is what I wanted to say as well. And in the meanwhile, I think England has scored here. I heard a big roar outside, so that's probably good news.

RVB: 09:31 All right. Thank you, Nigel, and I'll talk to you soon.

Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Wednesday 20 July 2016

Graphing the Tour de France - part 3/3

Is the model really there?

Monday 18 July 2016

Graphing the Tour de France - part 2/3

Creating a model

Thursday 14 July 2016

Graphing the Tour de France - part 1/3

Wednesday 13 July 2016

Podcast Interview with Florent Biville, Criteo

Friday 8 July 2016

BOLTing Podcast Interview with Nigel Small, Neo Technology