Friday, 23 December 2016

Podcast Interview with Emil Eifrem, Neo Technology

In the summer of 2015, 5-6 months after first starting this crazy podcast thing with Michael and Mark at Qcon London, I finally got my boss and friend Emil Eifrem, CEO of Neo Technology, to spend some time with me on this podcast. It was a great conversation, and I still smile thinking about the silly drumroll that we used.  But just before we wrap up 2016, it felt like it was the right thing to get Emil back on the podcast, and talk about "stuff". Here's that conversation - a little longer than usual, but totally worth it.

Here's the transcript of our conversation:
RVB: 00:02.909 Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology. And here I am again. And I'm so excited, I can barely restrain myself. It's my “über boss” on the phone again. It's been 18 months since the last interview, and here I have him back on the podcast. Emil Eifrem. Hi, Emil. 
EE: 00:21.803 Hi Rik. Thanks for finally inviting me back.

Wednesday, 14 December 2016

Podcast Interview with Mouse Reeve, Internet Archive

Here's a lovely conversation with a super interesting Neo4j community member: Mouse Reeve. She has been actively working on a really interesting application of Neo4j (see below) that is probably covering the most interesting and captivating domains ever: demons, spells, magic, and more. I am sure you will enjoy the following conversation as much as I did :) ...


Here's the transcript of our conversation:
RVB: 00:03.841 Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology and here we are again recording another podcast for our Graphistania podcast series. And tonight I have a lovely guest all the way from California, Mouse Reeve from the Internet Archive. Hi, Mouse.

Friday, 2 December 2016

Exploring the Paris Terrorist Attack network - part 3/3

Previously, on this blog, I had started writing about how we could get some of the data published by a local Belgian newspaper, De Standaard, on the Paris Terrorist Attack Network into Neo4j. In
  • Part 1, we talked about loading the raw JSON data into Neo4j, and then in
  • Part 2, we cleaned up some of the data for easy querying in Neo4j. 
So that's where we are. To wrap things up, I just wanted to illustrate some of the results and queries in Neo4j around some of the most interesting figures in this Terrorist network. I started some of my explorations around a widely reported terrorist, and Belgian national, called Salah Abdeslam.


So let's take a look at Salah in Neo4j.

Wednesday, 30 November 2016

Exploring the Paris Terrorist Attack network - part 2/3

In part 1 of this blogpost series, we got the basic Paris Terrorist Attack Network loaded into Neo4j. It looked like this:
There are a couple of things that annoyed me about this graph:

  1. First, the relationships are all "bidirectional", which really clutters the visualisation. In Neo4j, relationships are always directed, which kind of makes it awkward to store these bi-directional relationships like this. 
  2. Of course, this graph was originally made by De Standaard newspaper in Flanders, Belgium, so it was created in Dutch. A couple of the key concepts though (type of node, status of the node) would need to be translated, easily and meaningfully, for you to have any fun with the dataset.
  3. The graph was not "labeled", and therefore lacked some essential structural elements that would allow for fun manipulation in the Neo4j Browser. 
  4. The relationships did not really say anything about the type of relationship. 
Let's tackle these one by one.
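
As a minimal sketch of the first point (this is not the code used in this series), here is one way to collapse mirrored edge pairs into a single directed relationship before loading them, in plain Python. The `collapse_bidirectional` helper and the sample node names are hypothetical and only serve as an illustration.

```python
def collapse_bidirectional(edges):
    """Keep one directed edge per unordered pair of nodes.

    `edges` is an iterable of (source, target) tuples in which a
    "bidirectional" link shows up twice: once as (a, b), once as (b, a).
    """
    seen = set()
    result = []
    for source, target in edges:
        key = frozenset((source, target))  # same key for (a, b) and (b, a)
        if key in seen:
            continue  # the mirrored copy of an edge we already kept
        seen.add(key)
        result.append((source, target))  # the first direction seen wins
    return result


# Hypothetical sample data, not taken from the De Standaard dataset.
raw_edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B"), ("A", "C")]
print(collapse_bidirectional(raw_edges))
```

Storing a single direction loses nothing at query time: a Cypher pattern can ignore direction by leaving the arrowhead off, as in `(a)--(b)`.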

Monday, 28 November 2016

Exploring the Paris Terrorist Attack Network - part 1/3

November 13th, 2015 - A day to remember

Just over two weeks ago, we remembered the sad anniversary of one of the most atrocious and vile terrorist attacks that our generation has seen. It's easy to forget many things in our daily rat race, but I don't think I will easily forget this video, which was all over the internet hours/days after the attack on the Bataclan concert hall in Paris:

All it takes is a drop of empathy and humanity to understand the horror that these victims went through. The sound of the one person shouting "Oscar .... Oscar... Oscar..." just keeps on ringing through my head.

Friday, 25 November 2016

Podcast Interview with Craig Taverner, Neo Technology

The interview below was long overdue - but very much worth the wait. For the past couple of years, the Neo4j community has been brewing on a really interesting add-on capability to integrate GIS-style, spatial querying capabilities into Neo4j. It's such a great and natural fit - and one of the driving forces behind this in the community has always been this global citizen called Craig Taverner. Craig has been in the ecosystem for years - first as a community member, then as a commercial customer, and now as an employee in Neo's Swedish engineering team. So about time we had a chat:

Here's the transcript of our conversation:
RVB: 00:02.785 Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology, and here we are again, recording another Neo4j Graphistania podcast session. And today I'm joined by one of my colleagues actually, in the Neo4j engineering team, Craig Taverner. Hi Craig.

Wednesday, 16 November 2016

Podcast Interview with Evelina Gabasova, University of Cambridge

Oh man - things are heating up in the graph space, and keeping me super busy. After announcing our Series D last week (read more over here) I barely found the chance to publish this interview with Evelina Gabasova about graphs, Star Wars and biotech. Listen or read the full interview below.



Here's the transcription of our conversation:
RVB: 00:03.805 Hello, everyone. My name is Rik - Rik Van Bruggen from Neo Technology, and here we are again. It's been a while. We're recording another episode of our Neo4j Graphistania podcast, and today I have a wonderful lady from the beautiful lands around Cambridge, on the other side of this Skype call. And that's Evelina Gabasova. Hello, Evelina.

Thursday, 20 October 2016

Podcast Interview with Amanda Schaffer, Cisco

Been a while actually since the below conversation happened - but now I have finally found the time to put it up here. Apologies for the delay. Amanda Schaefer has been a really great community member for Neo4j and has been using and advocating the use of graphs in lots of different use cases. Listen to her story - and hopefully it will be an inspiration for other graphistas to come out of the woodwork and start tackling real world use cases with Neo4j.

Here's the transcript of our conversation:
RVB: 00:02.767 Hello everyone. My name is Rik Van Bruggen from Neo Technology and here we are again recording another Graphistania podcast episode and today I'm joined by Amanda Schaefer from Cisco. And you're based in Seattle, right Amanda?
AS: 00:17.171 That's correct.
RVB: 00:20.162 Well our listeners may not know you yet, even though you've participated in the GraphGist Challenge last winter. So why don't you introduce yourself, Amanda?
AS: 00:29.131 Sure, so I am the technical lead for an analytics team in Cisco and my group focuses on maintenance contract renewals and kind of optimizing the quoting workflow and optimizing customer success. So we look at a lot of metrics related to opportunity and bookings and quoting pathways and things like that.
RVB: 00:50.598 Wow, that sounds really interesting and you know that Cisco is already a Neo4j user, maybe someday you'll get to use it professionally there as well.
AS: 01:00.715 I hope so. I'm working on a couple of use cases for that.
RVB: 01:03.514 Really cool. But from what I've read of your work so far, you've been using Neo4j for some of your personal projects, right?
AS: 01:10.929 I have, yeah.
RVB: 01:12.699 Can you tell us a little bit more about that?
AS: 01:14.455 Sure, absolutely. So I started out going to some theater productions around Seattle and noticed that I recognized a lot of the actors from different plays and different theater companies and I got interested in graphing that, because that's such a perfect kind of classic graph problem, mapping people in social networks, and so I was interested in that in the theater space. So for the GraphGist Challenge last winter, I wanted to take a look at that and ended up focusing on the Seattle Shakespeare Company, mostly because their data was the best available of the local theater companies [laughter] so I had to deal with the least data engineering for that, and could focus on the analysis a little bit more. I took a look at their past productions, and matched that up with all of the available Shakespeare plays, and took a look at things like production year and comedies versus tragedies, and their normal seasons versus the things that they take out to the parks. Just had a lot of fun exploring the data with Neo4j.
RVB: 02:15.017 Did you learn anything interesting, things that you didn't know before?
AS: 02:19.544 I definitely learned that the most popular plays and the most successful ones tend to be the things that they take to the parks, which was interesting. I found that there were only eight plays of Shakespeare's that the Seattle Shakespeare Company hasn't produced, so they have done a pretty comprehensive job.
RVB: 02:37.899 When you say, take it to the park what does that mean? I'm not familiar with that.
AS: 02:41.361 So there is a summer kind of "Shakespeare in the park" program, where they go out to different parks around the city, and tour, even around Washington state, and do free productions in parks throughout the summer.
RVB: 02:55.427 Now you are just trying to get me to move to Seattle, right?
AS: 02:58.729 Seattle is a fantastic place to visit in the summer, I highly recommend it.
RVB: 03:03.180 Very good. You told me a little bit about some other projects that you have been working on as well, like the movie festival. Tell us about that maybe.
AS: 03:11.317 Yes. So Seattle has an international film festival that takes place in May and June. And so this year I had a festival pass which means I could see any movie essentially that was running during the festival. But there are about 500 movies to see in about three and a half weeks. So figuring out which movies to see is a big challenge. I watched all of the trailers and rated the movies according to my interests, and then I loaded the schedules, the theaters, the transit time between theaters, and my ratings and the movies all into Neo4j, and using a Python program, I created my optimal schedule for the international film festival. I ran 100 simulations and took a look at the top 10 to 20 schedules, and used that as my basis for deciding which movies to go see.
RVB: 04:03.776 Sweet. That sounds really great. It reminds me a little bit of the use case that we talked about on this podcast a couple episodes back, about the Date Night movies. I don't know if you heard that episode. It's datenightmovies.com. You'll like it if you're a movie buff [laughter].
AS: 04:22.165 Great, I'll have to take a look at that.
RVB: 04:24.285 Yeah. Very good. So why graphs, Amanda? Why did you get into graphs, and what's so cool about them for you?
AS: 04:32.701 I actually got interested in graphs looking at the master data management use case, because as part of our quoting workflows, we have a lot of places where a single kind of parent company will have a bunch of different subsidiaries or a bunch of different field locations, and we want to be able to understand which of these contracts really belong to the same company and things like that. So I took a hands-on workshop with Nicole White from Neo4j actually, at NoSQLNow! in 2015, and that was my official kind of hands-on introduction, when I was exploring that master data management use case. And I just kind of got hooked after that workshop. It was so much fun and so easy and intuitive to play around with the graph model, especially in Neo4j, so from there it just sort of took off. And they say once you get it, everything is a graph, and I think that's really true. I am always kind of thinking about how can I make this into a fun Neo4j project?
RVB: 05:38.791 Absolutely yeah, it's unbelievable. I was actually jogging yesterday and all of a sudden there is my podcast talking about graphs [chuckles]. It was really, really peculiar. All right, the model, that's what you find very interesting, the ease of use. Is there anything particular that you find most appealing in Neo4j?
AS: 06:02.033 I love the ease of use. For me, I'm just kind of always thinking about the intersection of business and technology, or the intersection of modeling real world things and technology. So the modeling of events, like I did for the film festival, is very interesting to me, like the use case I was thinking about at DataDay Seattle, a local conference that I attended a couple of days ago. And thinking about conference management software, and figuring out combining the session scheduling with a recommendation engine, which I think are both things that Neo4j does really well. And it seems like you could build a really powerful conference scheduler application based on that, so attendees could make it social, and recommend sessions for each other, and things like that. So just always thinking about the ways that things are connected, and how to just apply these classic graph problems to a lot of situations in the real world.
RVB: 07:02.117 Hmm totally. Well, at GraphConnect we always have a schedule graph. I don't know if you are familiar with that, but in the GraphConnect conference that we host every year, every six months, upcoming October in San Francisco as well, we'll have a schedule graph as well. So maybe that's a starting point for you.
AS: 07:18.126 That's great, I have actually recently purchased tickets to the event in San Francisco, so I am really looking forward to it.
RVB: 07:24.554 Fantastic, we will see each other there for sure then. Other than GraphConnect, what does the future look like, Amanda? Where do you think this is going for you personally, for the industry? What's in store, do you think?
AS: 07:39.721 For me, the really interesting next step, and the hurdle that I need to overcome to use it professionally, is just really making the self-serve analytics part, and getting graph understanding out to the typical data analysts on my team, and the people that would use this analysis day-to-day in their business, and helping them understand these cases and understand the graph analysis, and things like that, and I think making it really accessible to a lot more people around the organization is one of the biggest challenges that I'm looking at. In Neo4j 3.0 there's the ability to share the graph style sheets that you've set up in the cloud, so that everyone can see exactly what you're seeing on the screen, and it's much easier to share those around the organization. Things like that I'm really excited about, because at least in my organization, I know that this is a very cool thing, and there are a lot of use cases for it, but I need to take that out and empower other people to figure out how to take advantage of it. That's what I'm really looking forward to.
RVB: 08:52.664 Fantastic. I think that's something that we'll see many more people working on in the next couple of years. For us as well, as a company in this industry, it's really important that we make that work. Cool. Amanda, you know that we want to keep these podcasts fairly short, so we'll put some links maybe to your GraphGists and the rest of your work on the transcription page, if that's okay. For now, I'm just going to thank you so much for coming online and having a chat with me, and I look forward to meeting you at GraphConnect.
AS: 09:25.550 Thanks, Rik. I had a lot of fun on the podcast this morning.
RVB: 09:28.510 Cheers. Bye-bye.
AS: 09:29.962 Bye.
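
For readers curious about the festival scheduling that Amanda describes around 03:11, here is a rough sketch of the "run many simulations, keep the best schedules" idea in plain Python. It is not her program: the screenings, theaters, transit times and ratings are hypothetical, and the randomized greedy strategy is only one assumed way to do it.

```python
import random

# Hypothetical data: (movie, theater, start_minute, end_minute, my_rating)
SCREENINGS = [
    ("Film A", "Uptown", 600, 700, 5),
    ("Film B", "Egyptian", 690, 790, 4),
    ("Film C", "Egyptian", 720, 820, 3),
    ("Film A", "Egyptian", 840, 940, 5),
    ("Film D", "Uptown", 850, 950, 2),
]
# Hypothetical transit times, in minutes, between theaters.
TRANSIT = {("Uptown", "Egyptian"): 20, ("Egyptian", "Uptown"): 20}


def compatible(a, b):
    """True if one person can attend both screenings (of different movies)."""
    if a[0] == b[0]:
        return False  # no point in seeing the same movie twice
    first, second = sorted((a, b), key=lambda s: s[2])
    travel = 0 if first[1] == second[1] else TRANSIT[(first[1], second[1])]
    return first[3] + travel <= second[2]


def simulate(rng):
    """One simulation: visit screenings in random order, keep what still fits."""
    order = SCREENINGS[:]
    rng.shuffle(order)
    chosen = []
    for screening in order:
        if all(compatible(screening, other) for other in chosen):
            chosen.append(screening)
    return sorted(chosen, key=lambda s: s[2])


def top_schedules(runs=100, keep=10, seed=42):
    """Run `runs` simulations and return the `keep` best distinct schedules."""
    rng = random.Random(seed)
    unique = {tuple(simulate(rng)) for _ in range(runs)}
    by_rating = sorted(unique, key=lambda sch: sum(s[4] for s in sch), reverse=True)
    return by_rating[:keep]


for schedule in top_schedules(keep=3):
    print(sum(s[4] for s in schedule), [s[0] for s in schedule])
```

Keeping a short list of good schedules, instead of a single winner, leaves room for a human to make the final call, which matches how Amanda says she used her top 10 to 20.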
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Thursday, 6 October 2016

Podcast Interview with Alessio De Angelis, Whitehall Reply

I have said it before and I will say it again, but the joy of this podcast - for me, at least - is the fact that I get to talk to all of these wonderful people out there. Today's conversation is with a really cool and talented young software developer out in Rome, Italy, who has had a passion for graphs for a while - and is now applying it to Neo4j and NOSQL database approaches. Here's my chat with Alessio De Angelis:


Here's the transcript of our conversation:
RVB: 00:02.287 Hello everyone. My name is Rik, Rik van Bruggen from Neo Technology, and here I am recording another Graphistania Neo4j podcast, and today I'm joined by someone from a place with a lot better weather than where I am, that's for sure. Alessio De Angelis from Rome in Italy. Hi Alessio. 
ADA: 00:21.861 Hi Rik. 
RVB: 00:23.151 Hey, it's great to have you on the podcast. Thank you for making the time. 
ADA: 00:26.929 Thank you for inviting me. 
RVB: 00:28.415 Very good. Alessio, our listeners probably don't know you yet, so I read some of your work on the GraphGist Challenge (Santa's shortest weighted path), but you may want to introduce yourself. Who are you, what do you do? 
ADA: 00:44.241 Yes, sure. I'm now working as an IT consultant, in particular involved in big data, data warehouse, and SQL projects, but my love for graphs started a long time ago, probably three years ago, because at my university I was working on a Master's thesis, where I had to do a recommendation engine in the cultural heritage domain, and I was looking for a tool that would help me to profile the user's interests, storing the places and historical monuments he and his friends visited according to their social network accounts, together with the extra information delivered by linking the data in graphs. And Neo4j was the perfect database to achieve all those tasks. So, I really fell in love with this database.
RVB: 01:56.472 That's great to hear. Fantastic. I mean, there's a lot of people that are actually using Neo4j in the cultural heritage space. I've actually interviewed a couple of them on this podcast already, so I'll send you some links and I'll put them on the transcription as well (note: see posts like this one, interviewing Iian Neill about his Codex, or this one, interviewing Lorenzo Speranzoni about his work on Van Gogh's journey). So, that was a really great use-case, and then you also decided that you wanted to do something for Santa Claus as well? 
ADA: 02:21.394 Yeah [laughter]. 
RVB: 02:24.278 What was that about? 
ADA: 02:26.193 I started looking for-- I was on the Neo4j main page-- actually, no, on the Twitter account, and I read about this GraphGist Challenge, and said, "Whoa, I want to participate." And then I was looking at the domains, and I saw Santa Claus. Nice. Let's think about Santa Claus' life. And since I was a little kid, I was wondering how Santa Claus could manage to reach every child in the world, and give the presents before they fall asleep. 
RVB: 03:11.683 Because he has a graph database. Now we know. 
ADA: 03:13.636 Yes [laughter]. 
RVB: 03:18.643 Very funny. 
ADA: 03:18.854 He managed to think about an algorithm for searching the shortest path in the graph of children all over the world. I think for sure that is the reason why Santa Claus managed to do it now in 2016. 
RVB: 03:39.121 There is no doubt, I know he uses Neo4j [chuckles], but maybe he's actually using it with some of the new awesome procedures, the APOC, because I don't know if you are familiar with that, but the weighted shortest path calculations, you can do those with algorithms like Dijkstra and A* [crosstalk]. 
ADA: 03:59.679 Yes. 
RVB: 04:01.044 And in the APOC, in the awesome procedures that we have now in Neo4j 3.X, there's those algorithms and you can call them from Cypher. I think next Christmas you have to go to Santa Claus, and you have to tell him how to improve his database. 
ADA: 04:16.610 Yes. For sure [laughter]. 
RVB: 04:19.417 Very cool. So, why is it so attractive to you? Why did you fall in love you think with the wonderful world of graphs, is it the model, is it the performance, what is it about for you? 
ADA: 04:33.153 Firstly, because as the Neo4j slogan says, graphs are everywhere. Probably we don't think about it, but really graphs are everywhere, and probably you can model every domain in a graph way. And then in particular, I was attracted by Cypher, because it's such a short syntax, but really powerful. I really love the pattern matching, and Santa Claus and his team managed to do all the search of the shortest weighted path with just Cypher and its main constructs like-- 
RVB: 05:22.258 Fantastic. That's a great summary. I mean if Santa Claus can use it then everyone can, right? 
ADA: 05:27.348 Yes. 
RVB: 05:28.332 It's obvious [laughter]. And, is there anything that you think that would really make it even more awesome, that you think we should really be adding to Cypher, or Neo4j, or maybe looking a little bit into the future? What do you think about that? What do you think the future holds for this industry? 
ADA: 05:53.424 I think it's really growing because, even in Italy, many more companies are getting into this database, and if I think back several years, no one knew this database, probably just at university. In some courses, I was listening to people talking about graph databases, but now I think it's really a reality. In the future, it would be a really-- many more projects would involve NoSQL databases. 
RVB: 06:38.832 Is there a particular kind of project where you think that it would really fit best, or what do you think would be the sweet spot for the graphs in the next couple of years? 
ADA: 06:48.486 Yeah. For sure, in our research, operations, course, we did so many , like for searching the best solution, or-- yeah. So, I think each one of these theoretical aggregates can have an improvement if it's applied with a graph database, like Santa Claus did with just a normal shortest weighted path search algorithm, improved by Cypher language. 
RVB: 07:30.011 So, basically what you're saying is that there's a lot of just regular database applications that could benefit from it, and offer new functionalities, new [crosstalk]. 
ADA: 07:40.226 Yes, probably just some-- an algorithm that works on data as a graph, but just not considering the data as a graph itself. So, they try to travel among these data, and try to connect them, but without using a proper graph database. If they use it, I'm sure that their performance will improve. 
RVB: 08:12.035 I agree. I couldn't agree more, actually. Thank you so much for sharing that, Alessio, I really appreciate it. And, again, maybe next Christmas we'll have a new and improved [chuckles] engine for the pathfinding of Santa Claus. In the meanwhile, I think we'll try to keep these podcasts short and snappy, so I'm going to thank you for coming online, and sharing your experience with us. It's been a great time talking to you, and I hope to see you at one of the future Neo4j community or conference events, you know? 
ADA: 08:53.743 Yes. 
RVB: 08:55.047 That'll be great. Thank you [crosstalk]. Have a nice day. 
ADA: 08:58.099 Yeah, you too. Thank you for inviting me. 
RVB: 08:59.814 Bye. 
ADA: 09:00.891 Bye.
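
To make the weighted shortest path discussion (around 03:39) a bit more concrete, here is a minimal sketch of Dijkstra's algorithm in plain Python. It is neither Alessio's GraphGist nor the APOC implementation, and the stops and travel costs are hypothetical. Dijkstra's algorithm assumes that all weights are non-negative.

```python
import heapq


def shortest_weighted_path(graph, start, goal):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative weights.

    Returns (total_cost, path), or (inf, []) if `goal` cannot be reached.
    """
    queue = [(0, start, [start])]  # (cost so far, current node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue  # already settled through a cheaper route
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []


# Hypothetical delivery stops and travel costs.
ROUTES = {
    "North Pole": {"Oslo": 4, "Reykjavik": 2},
    "Reykjavik": {"Oslo": 1, "Rome": 7},
    "Oslo": {"Rome": 3},
    "Rome": {},
}

print(shortest_weighted_path(ROUTES, "North Pole", "Rome"))
# (6, ['North Pole', 'Reykjavik', 'Oslo', 'Rome'])
```

The detour through Reykjavik wins here because its total cost (2 + 1 + 3) beats the more direct route through Oslo (4 + 3), which is exactly the kind of result a hop-count shortest path would miss.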

All the best

Rik

Monday, 26 September 2016

Podcast Interview with Sascha Peukert, TU Dresden

Another week, another podcast episode! Here's a great chat with a graphista who has not even started his professional career yet - but whose career, by the sounds of it, could really be a stellar one. My chat with Sascha Peukert in Germany:

Here's the transcript of our conversation:
RVB: 00:04.386 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo Technology, and here we are recording another podcast session for our Graphistania podcast. And today I have a very interesting guest on the podcast from Germany - from Dresden. And that's Herr Sascha Peukert. Hi, Sascha. 

Friday, 16 September 2016

Podcast Interview with Greg Ricker, {{HIDDEN CO}}

Note: at the request of Greg's employer, we have removed all references to his employer's name in this podcast

What do Spatial Engineering, Epidemiology, and One Direction have in common? Well, of course, it would be our Neo4j graph database. Or: my guest on this week's Graphistania (also spelled as "graph is tinier", according to the nice folks at TranscribeMe), Greg Ricker, of {{HIDDEN CO}}. Greg wrote this really cool graphgist (see below) a while ago, but it turns out he's been doing a ton of fine and interesting computer science projects. Here's the conversation.

Here's the transcript of our conversation:
RVB: 00:02.790 Hello everyone. My name is Rik. Rik Van Bruggen from Neo Technology, and here I am again recording a little podcast episode for our "graphistania" podcast. And tonight I have a guest all the way from the U.S., Greg Ricker, from Greenfield, Maine. Hi, Greg. 
GR: 00:19.692 Hello. How are you? 
RVB: 00:20.611 I'm very, very well. Thanks for coming online, really appreciate it. Greg, you've been active in the Neo4j community for quite some time, but for our listeners, would you mind introducing yourself, and tell us a little bit about you? 
GR: 00:36.068 I am a software engineer. I've been working at this for a little over 25 years, and I've done lots of things from embedded systems to larger enterprise Java-based systems. Spent a lot of time in the RFID world. Did some work in public health, and I'm now working for the exciting world of insurance. 
RVB: 00:58.902 That is an exciting path that you've been on. And how did you get into the world of graphs there, Greg? 
GR: 01:05.009 I started it-- got interested in it when I was involved with the public health in the state of Maine, and I was also involved in a Master's program at University of Maine in spatial engineering. And you could immediately see the relationship between what public health were doing with vaccinations and disease tracking. And it was a perfect fit for graphs, so you'd like to know who's got a vaccination, who's getting reports of diseases, and it's just easily trackable in a graph system. The relationships are just perfect for it. 
RVB: 01:41.844 Perfect. Did you find anything interesting from that research that you did at the time? 
GR: 01:47.923 Yeah, we found-- we were trying to use it with the geolocation, so we wanted to know what vaccinations were occurring at what locations, and how that tracked with disease reporting. And the thing that we found was that you really have to make sure you're looking backwards in time. So, if you have a disease outbreak today, you need to look back a couple of years and see what the vaccination rates were in that location back then. We made the mistake of looking for vaccinations around the same time period, and obviously if you have a disease outbreak you're going to wind up with vaccinations. So, after looking backwards, we do think there was probably-- you could see a lowering of incidences. 
RVB: 02:30.561 Very interesting, and-- but you've done some really-- some very different things with Neo4j as well. I read some of your GraphGists. I think it was around entertaining your lovely daughter [laughter] and her music choice.
GR: 02:45.021 Yes. 
RVB: 02:46.972 I know how that feels, by the way [chuckles]. 
GR: 02:49.147 My daughters have always wondered what I do for a living because they're 19 and 16. I used to spend a lot of time trying to keep up with their musical interests. And so, for this GraphGist this last time I thought, "Well, I could drag my younger daughter into this by looking at song lyrics from her - at that time - favorite band, One Direction." And the real goal was, because I've used Neo4j with Python and R, which are great additions to the whole environment, what I wanted to see was could you determine whether a song was happy or sad? Just sort of a sentiment analysis. Then we could come up with some other interesting things, like how many times do they use certain words, and things like that. And so, I got her involved in it so she could help me find the songs, and parse out the lyrics, and make sense of it. And then she actually went out and did a little research to find out what people thought might be happy songs and sad songs. 
RVB: 03:51.799 I'm actually going to show that to my daughter, Greg, if you don't mind, because I know that that might help me explain graphs to her [laughter]. 
GR: 04:00.803 It really did, so she could see-- so we did things. You'd have one graph, part of the graph was the band name, and then you had the band members, and then you had albums, and if you click on an album and you could see the songs, and then you could click on a song, and you could see all the words. And so, it really exploded for her in that, "Okay, I can really visually see this thing." And it made sense to her, because if I had tried to explain it in a standard database, she would have just wandered off. 
RVB: 04:30.216 Too bored. 
GR: 04:31.683 Yeah, but this was really-- this really made sense to her, and then we did things like look at how many words occur. One of the things you could do with the way we extracted the data was what's the most common first word in the first line? And then the sentiment analysis was a little tougher. Then probably the more interesting thing out of it was raw sentiment analysis might say, "If you see the word love in a phrase four times, it's a positive song." What we found is there's a lot more context to a song than there might be say for an email or something like that. So I Love You might be a positive song, but I Used to Love You might mean a negative song. 
RVB: 05:19.795 Of course. 
GR: 05:20.696 And so, that was something that we hadn't really considered. And so, one of the things we were going to go back and do was to look at the relationship between positive and negative words or tense, some things like that within a line. Or how close could you find the word, a positive and a negative word together, and that sort of thing. 
RVB: 05:42.973 So, Greg, why are graphs so interesting to you? Why do you use Neo4j for something like music analysis or epidemiology? What's so interesting about it? 
GR: 05:55.851 When you look at it, graphs are about relationships. And when you start looking at it from that point of view, it's just a natural. It's easier to understand, it's easier to explain. And to me it's a lot faster when I think about how to process it than it is when I'm looking at tables, and joins, and foreign keys, and all that other stuff. It just seems like a much more natural way of storing data and retrieving data. 
RVB: 06:26.369 So, that's the model really? And yet the model is so attractive for explaining data, is that what I'm hearing? 
GR: 06:32.135 Yeah, but it just makes sense. At {{HIDDEN CO}} where I'm working, they store a lot of documents. And I'm working on a research project right now to use Neo4j as part of their document storage. And it really makes sense, because you've got customers, and you've got claims, and you have policies. And we don't want to store the documents themselves in Neo, we'll store them on some other, like an Amazon cloud. But the metadata, the relationships between the policies, and the claims, and the customers fit really very well in Neo4j. 
RVB: 07:08.959 Is there anything other than the model that you think is so powerful at it? Is it performance related or is it--? 
GR: 07:16.707 Yeah, performance has been very good, but the tools, like I said, I'm really in love with the Python and the R components for it. And obviously, the community support's excellent for all of this stuff. When I first started doing it, I was using Java with Spring, and I was blown away to find out that they already had a Spring component for it, so the support is good. One of the challenges I have at my job right now is to see how well it works with a database that has a billion records in it.
RVB: 07:56.092 There's a-- size does matter. Right [chuckles]? 
GR: 08:01.179 Yeah. 
RVB: 08:01.779 You'll probably want to take a good hard look at that, and obviously, that's one of the things that the company behind Neo4j - my employer - tries to help with. So, you'll let us know when you need us, but let's take a look at the future, Greg. Where is this going for you personally, but also as an industry? Where do you think this is going?
GR: 08:24.702 One of the things that I see is a match-- is a process where they're saying one database doesn't fit all. In the case of {{HIDDEN CO}}, you have documents, but you have metadata. We're not going to store the documents in Neo4j, we'll store them in Mongo or Couch, and then we'll store the metadata in Neo4j. I think what I'm seeing a lot of is a mix and match. Let's use document stores where they're good, and let's use other databases where they're good as well, and not everybody is going to go off to Mongo or Couch or Neo4j. So that's where I think the future is going to be, a lot more mix and match and--
RVB: 09:16.112 Polyglot persistence?
GR: 09:17.514 Yeah, that's what I was thinking, polyglot persistence. Yes. I think that's the future. But there's-- even a company like {{HIDDEN CO}} is now fully on board with the NoSQL model, and that's a broad, vague term to a lot of people. But the fact that they're not married to a single database model anymore is making life easier.
RVB: 09:42.212 Super great to hear. Thank you so much, Greg, for taking the time to talk to us about that. I'll put some links to some of your blog posts and GraphGists on the transcription page when we publish the podcast. But for now I think we will wrap up this recording, and thank you very much for coming online. It really was a joy talking to you.
GR: 10:04.864 Thank you. I appreciate being asked. Thank you. 
RVB: 10:06.756 Fantastic. Talk to you later. Bye.
GR: 10:08.609 Okay.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Thursday, 8 September 2016

Podcast Interview with Dustin Cote, Luther College

I have said it before, but I am going to say it again: thanks to the wonderful Bryce Merkl Sasaki at Neo4j, I have had the pleasure of talking to sooooooo many wonderful guests on the podcasts. People that I would otherwise probably never have talked to, for all kinds of reasons. Another one of these folks is my guest on the podcast today - Dustin Cote. An active graph enthusiast in the heartland of the Midwest - and a great, knowledgeable database engineer at that, who I really enjoyed talking to. Here's our conversation:
Here's the transcript of our conversation:
RVB: 00:02.617 Hello everyone. My name is Rik Van Bruggen from Neo Technology and here I am recording another podcast session for the Neo4j Graphistania podcast. Tonight, I'm joined by someone from the beautiful Midwest in the USA. I've got Dustin Cote from Decorah, Iowa on the Skype call. Hi Dustin. How are you? 
DC: 00:25.316 Hi Rik. I'm doing great. How are you?
RVB: 00:27.268 Very, very good. Thank you for joining me. It's always great to have people from different parts of the world on this podcast. I've read some of your work in the communities and in the GraphGist, Dustin, but most of our listeners probably haven't yet. Do you mind introducing yourself to us and telling us who you are, what you do, and what your relationship is with the wonderful world of graphs?
DC: 00:53.085 All right. My name is Dustin Cote. I currently live in the northeast corner of Iowa in a small rural town by choice. I work for Luther College. It's a small, private college and I just started last year. Prior to that, I was working at the University of Wisconsin-Madison and worked there for seven years as a PeopleSoft programmer analyst with emphasis in database design. Before that, I was a data warehouse administrator, and before that, I was doing stuff with databases and before that, I was learning to program in seventh grade. I've been around for a while, well seasoned as they say. That's what I'm working on, mainly because of my experience with ERP systems. Seems like no one wants to work on old technology, so there's always a niche for that.
RVB: 01:45.121 The world would stop turning without old technology, I think. 
DC: 01:49.133 It would, believe it or not. 
RVB: 01:50.191 It would, I absolutely do. Dustin, what's your relationship with Neo4j and graph databases? How did you get into them? 
DC: 01:59.511 Very recently, there was a competition for GraphGist on Neo4j and I thought, "What a great way to finally finish one of my project ideas." It had a deadline and basically, it was the only thing I needed to finish the project. I put together a conference data model, because I was going to a conference and I wanted to know different things from the booklet they gave you, but because it's in a certain order, you can't find out certain things. I knew that a graph database would be the perfect way to query on different angles and different ways of looking at your data. Before that, I would say back in 2008 when one of our companies moved and I was worried about competing against 25 Java developers, unleashed onto the town for jobs, I went back to school. Even though it turned out not to be an issue being reemployed, I continued to go back to school and get my masters. It was then that I was doing a research paper and one of the papers I was doing was to debunk over-hyped technologies. One of them was the Semantic Web.
DC: 03:03.407 I heard of the Web 3.0 and thought to myself that this has been over-hyped. I've heard about it for years and I haven't seen anything about it. So while researching it, I realized that this is exactly what I had been looking for to solve so many of my own database projects. For example, I had some projects where I was saving just sparse data. So, for instance, when you save music, MP3s or songs, it's usually saved as artist, album and song name. Well, if you talk to classical music enthusiasts, they care about orchestras and maybe the conductor, or the original composer. They look at different things, but it's the same item almost, and you still have to store it. You know, by the time I was done designing all the different kinds of genres, there was just no relational database design that could handle it well. And when I looked into Semantic Web, I realized that this was the solution. And back then, there was a NoSQL movement. I would say Semantic Web - since it saves as tuples or triples - I believe it's based on graph databases, a little more formal perhaps. And so my natural-- so all of the uses I found for Semantic Web I also find useful for Neo4j. And at the time, when I was choosing which graph database to use, Neo4j was the one that I had heard the most about. It had the best reviews at the time. And so I decided to spend the little time I had with raising two kids to learn a new technology. And so that's why I picked Neo4j.
RVB: 04:40.390 Interesting, yeah. It's very much related technology, and that we could talk for more than what we have in this podcast about the differences between Semantic Web databases and Neo4j, because they are quite different. But [chuckles], we'll take that off-line. But they are different but related technologies, I would say. What did you like about graph database then? What problems was it solving for you? Why was it such a good fit for some of the things that you had found problematic in other databases? 
DC: 05:18.521 One of the things I've always liked is how you can make non-obvious connections. You might have two different graph sets that are unconnected at the moment, but some time in the future if you're still collecting data you might [inaudible] two nodes together, and then suddenly your same query would return different results, and, in fact, maybe solve a problem. I think one of the examples I saw long ago was perhaps besides the NASA and CIA kind of examples, would be different language authors, and they might have a different name in one language than another. You might be following him and you suddenly make the connection and realize that they have all this other body of work. So I like those kind of solutions that the graph database is continuously growing and making connections for you.
RVB: 06:08.085 Yes, like inferring new paths between different parts of the graph. Is that what I'm hearing? 
DC: 06:15.632 Well-spoken. Yeah, exactly [laughter]. 
RVB: 06:19.269 Yeah, well. I mean that is a-- that's one of the examples that I always give. It's like the path finding. I've got these two things and how are they connected to each other, and show me those connections. Whether you're talking about a [BR?] data set or a social network or recommendation engine, that's one of the most powerful use case-- those hidden connections as you call it. 
DC: 06:41.392 Right, and with database design, when you're working in a large company, you have to spend so much time, ahead of time, designing things. And then once it's approved, and then once it's implemented, many months or years can go by and once it's done, you're pretty much nail in the coffin, and you really don't want to change it again because you don't know what you might change. With graph databases, sometimes you can add a whole new set of features or properties, and it won't affect the past data you had. It'll actually just enrich the data you have in your new applications that can leverage it. That's very powerful.
RVB: 07:16.441 That's such a powerful point that you're making there, and I think it's also why it's like a perfect storm now for graph databases because of the whole agile development paradigm as well. You know, people don't develop waterfall systems anymore. You know, they try to take a much leaner approach to software development. You don't know what you don't know when you're developing a new system. It's a great fit for that, I think, as well. Dustin, where do you think this is going? Where do you, personally, want to take this? I'll put some of the links to your work on the blog post with this podcast recording, but where do you see it going? What do you want to do with it? Where do you see the industry taking it?
DC: 08:04.811 Where I personally want to take it is, perhaps, more projects that can help categorize interests. I've been an internal description, is my industry. I might be interested in homesteading or something. There are podcasts out there that are in the 2,000 episode range and it would be great to be able to categorize those and find exactly what you're looking for, maybe even the author or the interviewee. Some day, maybe you'll have 2,000 interviews here.
RVB: 08:33.387 Oh my God [chuckles]. 
DC: 08:35.375 You never know how you can tie all those people together. People bring such great resources from their work. You know, [hack the plans?], it's more like categorize the plan. I think graph databases is good for that. As far as the world's future, I think this is a tool that's generic enough and powerful enough that someone out there that doesn't even know anything about this yet is going to come up with an idea, use your project and platform, and come up with something new. It's just that kind of technology that you're creating and it's really going to be anyone's imagination really, something that we haven't even thought of yet. 
RVB: 09:18.040 Like a platform for innovation, basically? Like doing--? 
DC: 09:20.356 Absolutely. 
RVB: 09:22.665 Super relevant. I really like that perspective. I really do. I think, for example, in the past couple of months, we've seen at Neo4j when some of the Panama Papers research was published and stuff like that. That was for us fantastic validation. That would never have happened ten years ago and now it's happening because of the small contribution that we're making, I guess. 
DC: 09:49.082 Absolutely. I like your ramp up time to test out Neo4j with that web interface you have, where you can start importing right away. It's a great tool. 
RVB: 09:57.583 Very cool. Very nice. Thank you, Dustin, for spending your time with me on this recording. As you know, I like to keep these things short and snappy and digestible for everyone. We'll put some of the links to your work with the transcription, but for now, I want to thank you very much for coming online. I hope to meet you in person, face-to-face, at some point [chuckles]. 
DC: 10:21.682 It's my pleasure and keep up the good work. 
RVB: 10:23.838 Thank you so much. Bye. 
DC: 10:25.481 Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Wednesday, 31 August 2016

Podcast Interview with Dirk Vermeylen, HP Enterprise

Last month I had the pleasure of interviewing one of my fellow countrymen, a Belgian graphista living a short distance from my home in Antwerp, who had submitted a very cool graphgist to our challenge earlier this year. Dirk Vermeylen did a great gist about trying to model sports results as a graph database - very interesting, so I will let him explain it to you himself. Here's the recording of our conversation:
Here's the transcript of our conversation:
RVB: 00:02.538 Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology and I am here again recording a podcast episode with someone in my own country. That's actually very, very-- I don't think that's happened before. Dirk Vermeylen, from HP Enterprise is joining me from a couple of miles down the road on our phone call. Dirk, welcome. 
DV: 00:23.708 Thank you, Rik. Thanks. 
RVB: 00:25.290 Cheers. Thanks for coming online. Dirk, we got to know each other because of some GraphGist that you created recently which we'll talk about more a little bit later, but why don't you introduce yourself to our listeners first of all? 
DV: 00:41.285 I am working for HPE, used to work for EDS. So that was taken over by HP then became HPE and there is some changes in the future as well. I work as a consultant in the service delivery and infrastructure environment. 
RVB: 01:04.137 Okay and then what's your relationship to the wonderful world of graphs? How do you get into that if I may ask? 
DV: 01:13.628 It is more or less by accident. I am-- as part of service delivery, a lot of work is done on configuration databases, on CMDBs, where what we need to do is migrate CMDBs and varying CMDBs. For example, what is important is you want to know what servers are implemented in specific data centers and what applications are related to them. There are many, many relations in between, like server databases that are installed and instances and business services that are configured on them, so you have all sorts of relationships. We need to work with it and I was looking into better ways on how we can manage the data on it. Then Neo4j was mentioned when reading the internet, so at some point in time I thought I may give this a try. I did it and it worked in fact, [?] though.
RVB: 02:18.972 I think there's a lot of people out there that are using Neo4j for configuration management databases because of the relationships and review and the impact analysis that allows you to do. So, if something happens in your configuration, what's the impact on the rest of the configuration. Is that also your case or was that your background as well? 
DV: 02:41.301 Exactly. Exactly. And the advantage is that when you do queries, Neo4j actually allows you to document very well: the Cypher way of working is a very nice way of documentation. It reads easier than complex SQL queries with joins and all of that, where it's a bit more tricky to understand what you have been doing a couple of days later.
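The impact analysis mentioned in this exchange can be illustrated without a database at all. The following is only a minimal Python sketch, not how Dirk's CMDB or Neo4j itself does it: the component names and the `depends_on` structure are hypothetical, and a plain breadth-first traversal stands in for what would be a variable-length path query in Cypher.

```python
from collections import deque

def impacted_by(depends_on, failed):
    """Return every item that directly or transitively depends on `failed`."""
    # Invert the "depends on" edges so we can walk from a component
    # to the things that rely on it.
    dependents = {}
    for item, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(item)

    impacted, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for item in dependents.get(current, ()):
            if item not in impacted:
                impacted.add(item)
                queue.append(item)
    return impacted

# Hypothetical configuration items: each key depends on the listed items.
cmdb = {
    "billing_service": ["billing_app"],
    "billing_app": ["db_instance_1"],
    "reporting_app": ["db_instance_1"],
    "db_instance_1": ["server_a"],
    "intranet": ["server_b"],
}

affected = impacted_by(cmdb, "server_a")
print(sorted(affected))
```

In this toy configuration, a failure of `server_a` reaches the database instance, both applications and the billing service, while `intranet` is untouched because it only depends on `server_b`.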
RVB: 03:05.779 Absolutely. I have so many people that talked to us about that. That's great to hear. But you ended up writing that GraphGist for something completely different, right [chuckles]? What was that all about? It was about your running partners, right? 
DV: 03:21.132 That's right. I was working on CMDBs also a little bit in my free time. At some point you are done and you're ready with it. At that time the GraphGist challenge appeared and it said: try to do this problem, if you can do it on a whiteboard, it will work. In my free time I tried to collect information about our running competition, which is just like we go running with a couple of friends. And the first to arrive in the race gets 50 points, second 45, third 40 points, and so on. And then we just add them all up together and at the end of the year you have a winner, for motivation. We keep track of these points in an Excel spreadsheet. I wanted to automate it because automation is fun. And I tried to make a big draw. The problem-- and it worked very well at the same time as the GraphGist challenge, so I thought probably I should spend a little bit more time and work on this GraphGist challenge.
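The scoring rule described here (50 points for first place, 45 for second, 40 for third, "and so on", summed over the year) could be sketched in plain Python as follows. The transcript only gives the first three values, so the constant five-point step, the floor at zero points, and the runner names are assumptions made purely for illustration.

```python
from collections import defaultdict

def points_for_place(place, top=50, step=5):
    """Points for a finishing place: 50, 45, 40, ...

    The constant step and the floor at zero are assumptions; only the
    first three values are stated in the conversation.
    """
    return max(top - step * (place - 1), 0)

def season_totals(races):
    """Sum points per runner; each race is a list of names in finishing order."""
    totals = defaultdict(int)
    for finish_order in races:
        for place, runner in enumerate(finish_order, start=1):
            totals[runner] += points_for_place(place)
    return dict(totals)

# Hypothetical test data, not real race results.
races = [
    ["Ann", "Bob", "Cis"],
    ["Bob", "Ann", "Cis"],
    ["Ann", "Cis", "Bob"],
]

totals = season_totals(races)
winner = max(totals, key=totals.get)
print(totals, winner)
```

In a graph model the same totals would come from summing a points value along the relationships between runners and races; the dictionary here just plays the role of the Excel spreadsheet.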
RVB: 04:33.723 Super cool. We'll put a link to your actual GraphGist on the blog post with this episode as well. What was so nice about it? What was it that made the GraphGist and Neo4j such a good fit for that particular running assignment that you wanted to solve? 
DV: 04:53.803 GraphGist was easy to use, and you need the AsciiDoctor which is like-- you take notes-- part plain text and then with very little markup or markdown, as it was called, you can specify a query. And see the results actually directly in the GraphGist. And Neo4j is very visual so you create-- when one query, you see a little bit of the dots already and you say "Ah, I need a second query." And from the second query you go to the third query. So, it's very visual. It allows you to very quickly progress in your problem fields.
RVB: 05:36.505 Very, very nice. I bet your running buddies were happy with the result [chuckles]. 
DV: 05:44.441 I didn't show it to them yet. Part of the thing is that I used test results which means the races are correct, the people are correct, but the sequence of arrival is not correct in the GraphGist. So I'm not sure if they will be so happy with me and what I assumed there.
RVB: 06:05.157 You made yourself win every time [laughter]. 
DV: 06:08.787 No. Not me. Someone else. But it's-- really it's test data. 
RVB: 06:13.784 I understand. Very cool. All right. So what are you going to use this for in the future, Dirk? Do you have any more professional or personal plans with this? Or where do you want to take this in the future? 
DV: 06:26.347 And part of the GraphGist as well was to get more experience with Cypher, and I used it for CMDB. The challenge with configuration management databases is that you have like 20,000 objects and 18,000 relations. So if you launch a query, it may or it may not end up with the result that you are expecting because of lack of experience. With my very small example, it's a lot easier to understand what you are doing. And where I want to take it in future is configuration management databases, of course. Also, on open data-- I'm more and more involved in open data. And open data is all about linked data. And with linked data, you are very close to Neo4j and these graphical relations again.
RVB: 07:19.085 Super. Well, I mean, I think we will meet again there. Because that's exactly the type of stuff that I've been working on in the meetups here in Belgium as well. And I'm sure we'll have a chat about this in the future then. Very cool. Anything else you want to add, Dirk? Is there anything that we should be paying attention to in your work? Or otherwise, I think we'll keep this podcast recording short.
DV: 07:45.950 That's perfect. I'm happy with what Neo4j was doing already. I was working on Neo4j version two. And py2neo is very important. The py2neo library. I had a little bit of issues that I've seen-- that the new version of Neo4j and the new version of py2neo solve them all. So I'm looking forward to playing around with these.
RVB: 08:10.485 Well, it's summer time. You know, this is perfect time to experiment with those [crosstalk]. 
DV: 08:16.662 Exactly. 
RVB: 08:16.236 All right, Dirk. Thank you so much for sharing your work with us and with the entire community. I really appreciate it. And thank you for coming online to do this recording. And I'm sure we'll meet very soon. 
DV: 08:28.730 Thanks, Rik. Thanks. It was nice doing this.
RVB: 08:30.685 Thanks a lot. Bye. 
DV: 08:31.662 Thanks. Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Monday, 29 August 2016

Orienteering with Neo4j - solving the 1000-control race with the Dijkstra-APOC - part 2/2

In part 1 of this blogpost, I explained how you can use the Awesome Neo4j APOCs to calculate a weighted shortest path on a graph with a more optimized and more efficient algorithm, based on Dijkstra's work. In this second and last blogpost on this topic, I would love to explain a bit why I think this is pretty much a very big deal. APOCs give you access, from Cypher, to a whole slew of graph algorithms, many of them very useful for all kinds of different graph operations.

Orienteering - a bit more complicated in the real world

One reason why I wanted to write this second post is that my lovely sport - Orienteering - is of course a bit more complicated in the real world than what you have seen in that little park run that I talked about in the previous two posts. To give you a feel for it:

  • Here's an excerpt of my run in the World Masters Orienteering Champs a few weeks back in Estonia. More details over here - but I can tell you that for each and every one of these legs there's at least half a dozen different route options - and of course my course had 20+ control points too. So a bit more of a bigger graph anyway!
  • And here's another example: actually being run today (August 25th) is the actual elite's World Orienteering Champs on the long distance. Just. Look. At. This. Map.

Thursday, 25 August 2016

Orienteering with Neo4j - moving from Cypher to the Dijkstra-APOC - part 1/2

So last July, my dear colleagues at Neo4j decided that they would tweet about a blogpost that I wrote 3 years ago.

The post was first published on my own blog over here, and then re-blogged over at the neo4j.com/blog. I also wrote a graphgist about it at the time, which I have revisited on the graphgist portal just now.

Some context

This entire thing started that summer with a blogpost by my friend and (at the time) colleague, Ian Robinson, about using a clever cypher query to calculate the weighted shortest path over a (small) graph. I decided to use that mechanism and apply it to my lovely hobby/sport: Orienteering. Pathfinding through forests, parks, cities - it's what we do in that sport, all the time. And efficient pathfinding in this environment requires you to read the map, understand what the fastest route is, and run that as fast as you can. Effectively, when you want to "understand the fastest route", you will be weighing different alternative route choices against one another, and - as quickly as you can - choose the best one for your run. It is, in effect, a total graph problem, a total "weighted shortest path problem" on a detailed map of your surroundings. So I used Ian's approach, and applied it to a small graph of an orienteering exercise in an Antwerp park.
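To make the "weighted shortest path problem" a bit more tangible, here is a minimal Python sketch of Dijkstra's algorithm on a tiny route-choice graph. It is not Ian's Cypher query and not the APOC procedure: the control names, the costs (think of them as seconds of running time) and the `park` dictionary are all made up for illustration.

```python
import heapq

def dijkstra(graph, start, goal):
    """Return (total_cost, path) for the cheapest route, or (inf, []) if none exists."""
    queue = [(0, start, [start])]  # (cost so far, current node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []

# Hypothetical leg with two route choices: around via the path junction,
# or straight through the bushes.
park = {
    "start": {"path_junction": 40, "bush": 25},
    "path_junction": {"control_1": 30},
    "bush": {"control_1": 60},
    "control_1": {},
}

cost, route = dijkstra(park, "start", "control_1")
print(cost, route)
```

The route through the bushes starts out cheaper but loses overall, which is exactly the kind of trade-off an orienteer weighs on every leg. The edges here are directed for simplicity; a real map graph would list each connection in both directions.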

Tuesday, 23 August 2016

Podcast Interview with Daniel Himmelstein, University of Pennsylvania

So today's podcast episode may well be one of the most interesting that I have ever published - and we have had some darn interesting episodes, if you ask me :) ... I got to know our guest, Daniel Himmelstein, by his great graphgist on "Drug repurposing by hetnet relationship prediction". Really interesting stuff - and Daniel actually got his PhD on this topic too. I found this video of his Thesis Seminar if you want more detail:

But for now we will just have a great conversation about his work. More interesting links below in the transcription - as usual.

Wednesday, 17 August 2016

Podcast interview with Stefan Plantikow, Neo Technology

Today's episode in the Graphistania podcast is one that I have really been looking forward to, for many reasons. First of all, our guest is such a lovely guy - feels like I could go out on a VERY long pub crawl with Stefan - seriously. Then, he has been working on some of the most interesting topics in Neo4j - another bonus. Most recently, he has worked on the "swiss army knife" of Neo4j tooling, the Awesome Apocs. Enough reason to have a good podcast chat together - and here that is:


Here's the transcript of our conversation from July 4th, 2016:
RVB: 00:02.518 Hello everyone, my name is Rik, Rik Van Bruggen from Neo, and here we are again, recording another Graphistania podcast, and today I have one of my lovely colleagues from the engineering team with me, Stefan Plantikow from Berlin. Hi Stefan.

Friday, 12 August 2016

The Great Olympian Graph - part 3/3

In part 1 of this blogpost series, we created and prepared a dataset with all the modern Olympian medallists from 1896 to 2012. In part 2, we loaded all that data into Neo4j: here's that article. Now, we of course want to do some interesting queries on the dataset, and see if the Graph will yield any interesting insights - as it usually does.

Easy querying - number of sports per Olympic game

Let's start with something easy - doing some counting of the numbers of sports in every game since 1896. Here's how we do that:
//number of sports per game 
match (y:Year)<--(e:Event)-->(d:Discipline)-->(s:Sport) 
with distinct y.name as game, s.name as sport 
return game, count(sport) order by game ASC
Then you can see that the number of sports has not really changed that dramatically over the years: In the early days we immediately went from 9 sports at the first Olympic game in 1896, to 19 in 1900.
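The "with distinct" step in the query above matters: a sport is reached once per event, so without it each sport would be counted many times per game. A rough Python sketch of the same distinct-then-count logic, using a handful of made-up (game, sport) rows rather than the real dataset:

```python
from collections import Counter

# Hypothetical rows, one per event: (game, sport). Not the real medal data.
rows = [
    ("1896", "Athletics"),
    ("1896", "Athletics"),
    ("1896", "Cycling"),
    ("1900", "Athletics"),
    ("1900", "Rowing"),
    ("1900", "Rowing"),
    ("1900", "Cycling"),
]

# Equivalent of "with distinct y.name as game, s.name as sport".
distinct_pairs = set(rows)

# Equivalent of "return game, count(sport) order by game ASC".
sports_per_game = Counter(game for game, _sport in distinct_pairs)
result = sorted(sports_per_game.items())
print(result)
```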

Wednesday, 10 August 2016

The Great Olympian Graph - part 2/3

In the previous blogpost of this Olympic series I explained how I got to the dataset in 4 distinct .csv files that get generated from a 4-worksheet Google Spreadsheet. Here are the links to the 4 sheets:
Now, in order to load that data into Neo4j, I had to come up with a meaningful graph model.

Monday, 8 August 2016

The Great Olympian Graph - part 1/3


After my previous experiments with some sports data (most recently, the Tour de France 2016 results) in Neo4j, I recently saw the 2016 Olympic games coming up, and thought: well, there MUST be some interesting datasets to find around that - especially now that one of my favourite bike-riders in the world, Greg Van Avermaet, won the Gold Medal in the Cycling Road Race. Still so excited!!!




I did a bit of research and decided to settle on a combination of two datasets:
Just before the London Olympics in 2012,

Wednesday, 3 August 2016

Graphs @ Radiolab

So I go for my morning run the other day, and I put on my 2nd dearest podcast (after Graphistania, of course) - Radiolab. They have the most amazing stories that make me laugh, cry, read and research - and guess what: this episode is about GRAPHS!

Listen to this episode:
telling the amazing story of connectedness between soil, fungi, trees, and animals... aka the Wood Wide Web. The "internet of fungus", as the Beeb calls it.

Check out this TED talk too:
or this article about the "Intelligent Plant" and the connections that exist there.

If ever we needed more proof:

(GRAPHS)-[:ARE]->(EVERYWHERE),

even on your daily podcast!

Cheers

Rik

Tuesday, 2 August 2016

Podcast Interview with Dave Bennett, Nulli

So, in an age that feels like the distant past (in other words: before my holidays), I had a really great conversation with a "small world" neighbour of mine, talking about two of my passions - one older and one more recent - the interesting relationship between graphs and identity & access management. I wrote about this some time ago already, and it continues to be one of the more interesting and popular use cases for graph databases.

So that would be a great chat from the get-go - and here it is:

Here's the transcript of our conversation:
RVB: 00:02.575 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo Technology and here we are again recording another podcast episode. Today I'm joined by I think it's the first Canadian that I have on this podcast, Dave Bennett from Nulli. Hi, Dave. 

Wednesday, 20 July 2016

Graphing the Tour de France - part 3/3

In the past two blogposts I have been creating and importing some nice Tour de France 2016 data. It's a small dataset, for sure, and this is by no means a realistic graph application - but perhaps we can still have some fun exploiting the data with some cypher queries. That's what we'll try now. I have put all of the example queries together in this gist, so please feel free to play around with it :) ... let's take you through it.

Is the model really there?

First and foremost, let's verify the model that we wanted to put in place, with yet another AAPOC (Awesome APOC). We thought we were going to get this model:

Monday, 18 July 2016

Graphing the Tour de France - part 2/3

In a previous blog post, I created a couple of Google spreadsheets with some of the results data of the 2016 Tour de France. These spreadsheets can be very easily downloaded as two comma-separated files that hold the data:
I will be updating the stages.csv files as the Tour progresses, so we can keep updating the graph as well.

Creating a model

To import these CSV files into Neo4j, I actually went through multiple iterations of the model. Here's two of them that I wanted to share with you - not because of the fact that one of them would be "right" and the other one would be "wrong", but because it really reflects the fact that your use case - the questions that you want to ask of your data and what you want to be doing with the data - is going to determine the model. Underlined. In Bold. Because it's so important.

Thursday, 14 July 2016

Graphing the Tour de France - part 1/3

Alright, it's time to come out of the closet. I have to admit, over the past couple of years, I have turned into a bit of a cycling geek. I love watching the races in Flanders in spring, the legendary "ride through hell" from Paris to Roubaix, and of course, now, in summertime, the big tours of Italy, France and Spain. I have grown quite addicted to it - and have taken to riding my own bike a couple of times a week as well... it's a ton of fun. Last year I did a fun experiment in a series of 5 blog posts about the Professional Cycling twitterverse, but this year, I had something else thrown into my lap. Here's what happened.

Wednesday, 13 July 2016

Podcast Interview with Florent Biville, Criteo

Today is a good day, because I got to spend some time for the podcast talking to one of our most active and busiest community members in France. Florent Biville has been working with Neo4j for a long time, on various projects like Liquigraph and AssertJ Neo4j, and has been presenting his work at various conferences - like this one:


So now I got to talk to Florent in a bit more detail, and it was a true pleasure:

Here's the transcript of our conversation:
RVB: 00:02 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo Technology, and here I am again recording another podcast episode for our Graphistania podcast. And today I'm joined by someone from the city of lights, Paris, Florent Biville. Thank you for joining us, Florent. 
FB: 00:19 Thanks for having me. 
RVB: 00:21 Fantastic. Thank you for coming online. And yeah, Florent, maybe the best place to start is for you to introduce yourself to our listeners. I know you've been very active in the Neo4j community, but maybe you can introduce yourself. 
FB: 00:36 Sure. So, I'll just say that I'm based in Paris, and I work there in a company called Criteo. So, just in a few words, Criteo is one of the rare non-American companies that can compete with Google and Facebook, but in the domain of online advertising, and more specifically, in the domain of retargeting. So basically, we're just trying to promote the best quality advertisement for users at scale. So that's basically what we do.
RVB: 01:08 Wow. Is that a French company, Florent? Or this a--? 
FB: 01:11 Yes. Three co-founders are French. So now we have offices all around the world. Engineering is mostly based in Europe, but we have offices in the US, in South America, in Turkey, in England, in France, and so many countries I forget. But yeah, we are in a big expansion, and we're hiring also, by the way [chuckles]. 
RVB: 01:33 That's a nice plug, very well done [laughter]. Hey Florent, so what's your relationship to the wonderful world of graphs, then? How did you get into graphs? Can you tell us a little bit about that? 
FB: 01:45 Sure. So, during my first job, actually, that's the time I first heard about Neo4j. Although, we didn't use it directly. We were interested in a graph database for fraud detection because we were selling video game activations. And every time there was a fraud detected, it was usually detected too late, so we had to pay chargebacks to the bank. So it's lots of cash wasted, and also the activation keys could not be reused afterwards. So we were basically burning our activation key stock. So it was really a huge waste, and we were trying to put a lot of effort into improving fraud detection for the video games we were selling. So that was basically my first contact with graph databases. 
RVB: 02:29 How long ago is this? 
FB: 02:31 So it was, I guess, in 2010 - something like that. 
RVB: 02:34 That's a long time ago [chuckles]. 
FB: 02:37 Yes, yes. And then afterwards, I continued for my personal project. I really wanted to dig into that because-- I don't know, I just saw five minutes of Cypher, and I said, "Wow, that's so powerful and so interesting," even if the language was still young. And so, yeah, I tried Neo4j for my personal project later on, and then I joined a small company. We became a partner of Neo4j. I guess we were one of the first in France, actually, to become a partner. And I continued with some consulting gigs with startups or bigger companies, sometimes with Neo4j employees as well. That's basically how I spent the time between 2012 and maybe one or two years ago. 
RVB: 03:26 And you've been developing some open-source software around Neo4j as well, right? Liquigraph? Maybe you can tell us a little bit about that as well? 
FB: 03:33 Sure. So Liquigraph is based on the project called Liquibase. Liquibase is a migration tool for relational databases. So the ideas and the concepts are very similar in Liquigraph, and the idea's the same, basically. You design your migrations in Cypher, and it will-- so you organise them in change sets, and then your change sets will be executed incrementally, and that's how you can manage your model migration. Because that's not always easy, because Neo4j's schema-optional. So it's very flexible, but sometimes maybe it's too flexible and you need some structure to be sure your model evolves in a good way. 
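
As an editorial aside, the incremental change-set idea Florent describes can be sketched roughly as follows. This is a minimal illustration in Python, not Liquigraph's actual implementation: the `ChangeSet` class, the `run_migrations` function and the in-memory `applied` set are hypothetical names, and the `execute` callback is an assumed stand-in for whatever would send the Cypher to Neo4j.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class ChangeSet:
    """One migration step: a unique id plus the Cypher to run (hypothetical type)."""
    id: str
    cypher: str


def run_migrations(
    change_sets: Iterable[ChangeSet],
    applied: Set[str],
    execute: Callable[[str], None],
) -> List[str]:
    """Run, in order, only the change sets whose id is not yet in `applied`."""
    newly_applied = []
    for change_set in change_sets:
        if change_set.id in applied:
            continue  # already executed in an earlier run, so skip it
        execute(change_set.cypher)  # assumed callback that talks to the database
        applied.add(change_set.id)  # remember it so the next run skips it
        newly_applied.append(change_set.id)
    return newly_applied


if __name__ == "__main__":
    history: Set[str] = set()  # a real tool would persist this between runs
    migrations = [
        ChangeSet("add-customer-label", "MATCH (p:Person) SET p:Customer"),
        ChangeSet("drop-legacy-flag", "MATCH (p:Person) REMOVE p.legacy"),
    ]
    # First run executes both change sets; the second run finds nothing new.
    print(run_migrations(migrations, history, print))
    print(run_migrations(migrations, history, print))
```

The point of the sketch is only that each change set runs once and in order, so the graph model evolves step by step; a real tool would also need to store the applied history somewhere durable rather than in memory.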
RVB: 04:16 Yeah [chuckles], I always say, with freedom comes responsibility, right? You have to have something overlooking the schemaless nature, right? 
FB: 04:25 Yes, exactly. I couldn't have said it better [chuckles], actually. So yes, we are in active development right now. We've done some releases already. We're working on some new features to get a bit more on par with Liquibase. We're not as complete as Liquibase yet. We really worked on the main priority features. But we are getting there, and hopefully in the following weeks, we should have a new release with even better features, especially one that deals with large data sets. 
RVB: 04:56 Wow. And this is all open-source, right? You can find this online, and you can just take a look at it, and use it if it's suitable for you? 
FB: 05:06 Yeah, absolutely. You just go to liquigraph.org, and that's your starting point.
RVB: 05:10 Sweet. So, can I ask you the question that I ask every one of my interviewees? Why [chuckles] did you get into graphs? You mentioned that it was quite some time ago, but what was the main thing that attracted you to get into the world of graphs? 
FB: 05:32 Well, when I started-- so, it was really-- how can I say? The developer in me, I just saw the power of Cypher. I mean, I could express complex queries so easily that-- I don't know. Just that. Really, Cypher was the selling point for me. Just to see how easy it was to create a graph, how easy it was to express queries. Even though I didn't have any specific project in mind necessarily, just the power of it and the flexibility of it just got me into it almost immediately. That's really what got me into graphs at first. 
RVB: 06:11 And then, did that love evolve? Or, how do you feel about that now? Is it still Cypher mainly, or are there things that you really think are more important? 
FB: 06:23 Well, I think that - especially with the 3.x series of versions - three, I mean-- Neo4j really, really becomes more and more powerful, especially in terms of the-- because before this, you had unmanaged extensions or server plugins. So even though I was a Java developer and had no problem with that, it still didn't feel as natural as with other databases, I would say. So the latest edition of Neo4j, three, really makes it like a robust, mature product, like what you would expect from a database - like binary protocols, some drivers, and some well-defined way of extending the behavior. That is so-- 
RVB: 07:12 Have you looked at the procedures yet? I suppose you have. 
FB: 07:16 Yeah, absolutely. And that even gave me the idea of another small open-source project I created. Because when I started-- I started playing with the procedures a month ago, or something, and I noticed-- so, the runtime was very nice. Whenever I made a mistake and I deployed the procedure, I really got a detailed message. And then I thought, "Oh, okay. That's nice. So the runtime gave me the error. What if I could get the error earlier?" And that's how I got the idea of writing a kind of compiler, to keep it simple, so that basically whenever you compile your procedure, even before you deploy it to Neo4j, you will get detailed feedback about what you did wrong or not. 
RVB: 08:01 That sounds very useful. 
FB: 08:03 Yes, and I added that to the Neo4j repository of "awesome procedures" or something. You know, APOC? 
RVB: 08:11 Yes, APOC [chuckles]. The name-- we are so good at naming, it's fantastic [chuckles]. 
FB: 08:17 I'm no better, so I won't even start on that [chuckles]. I wouldn't-- 
RVB: 08:22 Exactly. So, before I let you go, just to approach another - and maybe final - subject, where do you think this is going? What does the future hold, both for the graph industry but also for things like Liquigraph, or some of your other projects? How do you see the wonderful world of graphs evolving? 
FB: 08:44 Well, first, I think it's going to-- at least what started with version three, and I'm sure it's going to continue this way, is you will have more and more integration with external tools. So, especially with the rework of the JDBC driver, for instance. It will definitely help see Neo4j used with some BI tools, or even more-- so as a-- that looks very promising. And hopefully-- I don't know. For Liquigraph, hopefully, so we will reach a version 1.0 and see people use it in reports. That's what I hope on my side, and maybe more contributors as well [chuckles]. But when I see, for instance, Panama Papers, that's a very great example of how Neo4j could be used. And it's great to see a big public example of something that is not a social graph. I mean, that's very interesting, and I'm sure we will have more and more examples like this, maybe in journalism, and maybe in other fields. 
RVB: 09:48 Absolutely. I hope so, too. For the journalism part, I don't know if you saw that, but we've actually announced a Journalism Accelerator Program, whatever that means. But it's all about helping journalists or publishing organisations get started with Neo4j. I'm hoping that we'll see a lot more of that as well, so that will be great. Very cool. All right, Florent, as I told you before, we like to keep these podcasts fairly short, and so I want to thank you for coming online and spending some time with me. I'm sure you'll get a lot of interest once we publish a proper podcast for things like Liquigraph. And I look forward to meeting you again at one of the community events. At the meetups, maybe. 
FB: 10:31 Yes, sure. 
RVB: 10:32 Absolutely. Thank you, Florent. 
FB: 10:34 Thanks a lot. 
RVB: 10:35 Bye.
Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik