Friday, 2 December 2016

Exploring the Paris Terrorist Attack network - part 3/3

Previously, on this blog, I had started writing about how we could get some of the data published by a local Belgian newspaper, De Standaard, on the Paris Terrorist Attack Network into Neo4j. In
  • Part 1, we talked about loading the raw JSON data into Neo4j, and then in
  • Part 2, we cleaned up some of the data for easy querying in Neo4j. 
So that's where we are. To wrap things up, I just wanted to illustrate some of the results and queries in Neo4j around some of the most interesting figures in this Terrorist network. I started some of my explorations around a widely reported terrorist, and Belgian national, called Salah Abdeslam.

So let's take a look at Salah in Neo4j.

Wednesday, 30 November 2016

Exploring the Paris Terrorist Attack network - part 2/3

In part 1 of this blogpost series, we got the basic Paris Terrorist Attack Network loaded into Neo4j. It looked like this:
There's a couple of things that annoyed me about this graph:

  1. First, the relationships are all "bidirectional", which really clutters the visualisation. In Neo4j, relationships are always directed, which kind of makes it awkward to store these bi-directional relationships like this. 
  2. Of course, this graph was originally made by De Standaard newspaper in Flanders, Belgium, so it was created in Dutch. A couple of the key concepts though (type of node, status of the node) can be easily and meaningfully translated, and need to be for you to have any fun with the dataset.
  3. The graph was not "labeled", and therefore lacked some essential structural elements that would allow for fun manipulation in the Neo4j Browser. 
  4. The relationships did not really say anything about the type of relationship. 
Let's tackle these one by one.
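
To make the first annoyance a bit more concrete: below is a minimal Python sketch (not the Cypher used in this series) of how mirrored pairs of relationships could be collapsed into a single one. The node names and the `dedupe_bidirectional` helper are made up for illustration, and I am assuming each relationship simply appears once per direction; the real dataset may well look different.

```python
def dedupe_bidirectional(edges):
    """Keep one edge per unordered node pair, preserving first-seen order."""
    seen = set()
    kept = []
    for source, target in edges:
        # (a, b) and (b, a) produce the same frozenset, so they share one key
        key = frozenset((source, target))
        if key in seen:
            continue
        seen.add(key)
        kept.append((source, target))
    return kept


# Hypothetical sample data: most relationships are stored twice, once per direction.
edges = [
    ("A", "B"),
    ("B", "A"),
    ("A", "C"),
    ("C", "A"),
    ("B", "C"),
]
deduped = dedupe_bidirectional(edges)
print(deduped)
```

Keeping the first-seen direction is an arbitrary choice here. In Neo4j you can also leave direction out of a query pattern altogether, so which of the two directions survives matters less than having only one of them.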

Monday, 28 November 2016

Exploring the Paris Terrorist Attack Network - part 1/3

November 13th, 2015 - A day to remember

Just over two weeks ago, we remembered the sad anniversary of one of the most atrocious and vile terrorist attacks that our generation has seen. It's easy to forget many things in our daily rat race, but I don't think I will easily forget this video, which was all over the internet hours/days after the attack on the Bataclan concert hall in Paris:

All it takes is a drop of empathy and humanity to understand the horror that these victims went through. The sound of the one person shouting "Oscar .... Oscar... Oscar..." just keeps on ringing through my head.

Friday, 25 November 2016

Podcast Interview with Craig Taverner, Neo Technology

The interview below was long overdue - but very much worth the wait. For the past couple of years, the Neo4j community has been brewing on a really interesting add-on capability to integrate GIS-style, spatial querying capabilities into Neo4j. It's such a great and natural fit - and one of the driving forces behind this in the community has always been this global citizen called Craig Taverner. Craig has been in the ecosystem for years - first as a community member, then as a commercial customer, and now as an employee in Neo's Swedish engineering team. So about time we had a chat:

Here's the transcript of our conversation:
RVB: 00:02.785 Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology, and here we are again, recording another Neo4j Graphistania podcast session. And today I'm joined by one of my colleagues actually, in the Neo4j engineering team, Craig Taverner. Hi Craig. 
CT: 00:20.685 Hi, Rik. 
RVB: 00:21.379 Hey, good to have you on the podcast. Thanks for joining me. 
CT: 00:23.946 Thank you. 
RVB: 00:24.823 Craig, you've had an interesting journey with Neo4j, first as a community member, then as a customer of Neo4j, and now as one of our engineering team leads. So, why don't you tell us a little bit more about you and your relationship to the wonderful world of graphs? 
CT: 00:43.121 Yeah. It's an interesting story. I guess I should say that my background is in Geo-science, but when I had the opportunity to move to Sweden about 17 years ago to work in Telecoms, I got involved in software for engineering telecom networks and optimising telecom networks, and entrepreneurial work starting companies. And I did that for about a decade and a half. My last company, we were building modeling software for doing GIS modeling, mapping the modeling of telecom networks and the data coming from telephones as well. And that's where I got involved in graphs, because it's a very graph-like domain. There's a lot of interconnections, relationships, relationships between phones and the network, between people, between the services, between the signals. There's a lot of complexity there that is really natural for graphs. I had an opportunity to meet Emil back in 2009, I think, and he tried to convince me that Neo4j would be the right database for our product, and I said-- 
RVB: 01:53.857 He does that with everyone [chuckles]. 
CT: 01:55.749 Yeah. And he didn't succeed, but he did [chuckles]. He put the seed in my mind. He proved to me that this was a cool idea, but I told him we already were on MySQL and it was working fine for us. And that was it. But a couple of months later, I was faced with a database refactoring problem, and I needed to enhance the data model. And I was sitting there thinking, "This would've been easier in Neo4j." I remembered what Emil had said about the way things worked. It was the whiteboard's friendliness that really sold me. And I put my most junior developer on the project, and said, "Listen, could you just model this up in Neo4j?" And this guy was a guy that always did Google searches for sample code, and even back in 2009, he managed to find all of the examples that he needed from the community, and got it done really quickly. And-- 
RVB: 02:52.413 No way?! 
CT: 02:52.457 --that taught me that the product was more mature and the community was more mature than I had anticipated. And it was the junior guy who did it. So I was sold, and within a month we ported the whole product over and that was the beginning of the story for me. 
RVB: 03:07.634 Oh, wow. And is this also when you got involved in the Neo4j Spatial add-on to Neo4j? 
CT: 03:15.740 Yes. Once we ported the telecom model, we had a need to visualise it in our map - we had a map user interface - and so I built a simple quadtree as a tree structure in the graph itself, and then presented that at a conference at the end of the year, when I got in contact with Tobias, one of the other engineers in Neo4j. 
RVB: 03:37.894 Who I've also interviewed here [laughter]. 
CT: 03:39.846 Yeah. And he'd done a similar thing, and we started brainstorming, and this led to a collaboration between my company and Neo Technology. So for 2010 and '11, we collaborated to build a Neo4j Spatial data modeling library, which actually is a very, very rich GIS platform for doing quite complicated geographical analysis as data models within the graph. And that was really, really great. Then, of course, I focused on my own company for a few years, and we moved our markets to Asia and worked there for a while, but when the entire company got shifted to Asia I came back to Neo. So I was out of graphs for a while, and in 2014 I switched to actually being an employee and working with the engineering team in Malmö. And that's been really great, because it's been fantastic being a customer, and now to actually get into the insides and really see what's going on deep inside the product, that's fantastic as well. 
RVB: 04:41.772 Super. So what have you been working on most recently within Neo4j engineering up there? 
CT: 04:47.747 Well, I've been with the company for over two years, and most of the time I was working in Cypher as an engineer in Cypher or as a team lead for Cypher, but the last more than half a year, say eight months or so, I've been the team leader for security. We've been building the first fully-featured security model for Neo4j with multiple users and roles, and all the things you would expect of a security model, which we're now releasing in 3.1. 
RVB: 05:16.252 Which is something that I know a lot of our customers are looking forward to, so thank you so much for taking that on. Really great. So you've mentioned a couple of things already, Craig, but what was it that really attracted you and that made you get into graphs most? Is it that whiteboard friendliness or the flexibility? What stands out for you, if you don't mind talking about that a little bit? 
CT: 05:43.820 I think the strongest thing for me has been the whiteboard friendliness, and the fact that you can really understand your data model so much better when you work with a database that is so similar, in a way, to how anyone thinks about the data model. In my case with Telecom, is that it was completely natural, and then also with maps. And the map side I think is a particular passion of mine. I've been involved in GIS in many different ways in the past, but when it comes to graphs, the synergy is enormous, and it reminds me of the fact that so much of graph theory actually came out of mapping analysis of GIS. So it's a passion for me to actually see Neo4j get more involved in maps again. Even though we built that map system back in 2010 and '11, that's very external to the product. Now we're looking at building graph capabilities, spatial capabilities into the product itself. And that's going to be super exciting. I'm hoping we're going to get into that more and more quite soon. We've done a little bit in the last year, but if we're lucky, I think it's going to be something we can see some more of in the future. 
RVB: 07:01.147 It's really cool to see how it got picked up really quickly in the APOC developments. There's a couple of really nice procedures that allow for much easier access, I think, to the spatial libraries, right? 
CT: 07:15.296 What we did is two things. And I collaborated with Michael on this, of course. In APOC we did geocoding only, but outside of APOC, with Michael's support as well, we built a series of procedures on top of the old spatial library, the one that I mentioned before. So that library has been revamped and polished up a little bit now for the 3.X series of Neo4j using procedures, making it far more accessible from Cypher than it was before. So I think that's going to help the market a lot, because up until this point, using the spatial library from Cypher has been difficult and in fact buggy due to the difference in the way Cypher interacts with indexes, and the way the old library was designed to interact with indexes coded back in the Java API days. So I think this is going to help a lot. The library does have some limitations, some performance issues as well. Although I don't think this is going to make it applicable to all markets, I think it's going to open up the markets enough that we will get the feedback we need, which will help us design the built-in version of spatial for the future, which is going to be fast. It won't have any of those performance problems or any other issues. 
RVB: 08:31.149 I can only confirm that it's been one of those domains where there's been a lot of customer use cases as well. If you look at one of our big customers in Europe, like TomTom, they've been doing quite a bit of work on Neo4j already, sort of like confirming that their maps are effectively graphs. There's a lot of interest in it, and I think the APOC work has already made it a lot more usable for non-programmers like myself. I can use the spatial library now, which I couldn't do before. It's pretty simple. 
CT: 09:08.059 Well, that's fantastic to hear. 
RVB: 09:09.356 Yeah. So Craig, where is this going [chuckles]? What does the future hold, both for the Neo4j developers that you are working on now, but also for things like spatial, maybe even for our industry? What do you think is around the bend? 
CT: 09:29.325 Well, around the bend there's so many things you could say. All speculation, of course, when you look into the future. But I could say a few things about spatial in particular, because we have real buy-in I believe, from the company for certain elements of spatial, the location-based search, distance calculations, point data, which is something that almost anyone in any industry is likely to end up needing. So we see a very large demand for that. And that's something that I think we're going to see coming in very soon, with high quality and high-end performance. But I think there is an interesting thing that we should consider, and that is the whole element of graphs in spatial. If you look at industry leaders in this area, like Oracle, and PostGIS, and others, they are doing some very advanced graph analysis on relational back-ends. What they do is they pull data out of relational stores, build complex graphs in memory, do the graph analysis in memory, and then either present the results to the user or save them back into the tables. 
CT: 10:37.086 There is an opportunity there, a sweet spot for a native graph database with the index-free adjacency that Neo4j has, to be able to do that far more efficiently and scale to much, much larger sizes, without having to have the same RAM requirements and CPU requirements that the other databases have. But we're talking about something well beyond the current plans for Neo4j Spatial, but something that I think will be a market changer, a disruptive changer there, because no one else can do it the way that we'll be able to do it. So I'm still looking forward to that kind of a disruptive change in the market further down the line. I don't know if we're talking about one or two years, or five years. I'm not sure, but it's something that could be really massive. This also relates to something that's separate from spatial, but if we talk to the customers that use these advanced spatial features, they are interested in things like time-versioned graphs based on MVCC and other techniques. And I can imagine the product going in that direction anyway, not just because of my interest in spatial, other people's interest in spatial, but we see other markets interested in time-versioned graphs and other aspects like that with really complex indexes, that actually turn it into high-performance data warehouses as well. Then you start to overlap the OLTP and OLAP areas as well, which I think is also an area that the company is interested in. 
RVB: 12:07.815 It's funny that you mentioned that, the time-versioning bit. I'm sure you've read some of the work that Ian Robinson did on that, and some of our customers have presented on it, but it's really, really cool and I couldn't agree more, it's one of those things that lots of people have been showing an interest in. So Craig, thank you so much for coming online. You know that we want to keep these podcasts fairly short, although we could talk about these things for at least another hour, probably more of a beer conversation. But [chuckles] thank you so much for coming online. We'll put the transcription up on the different websites, and I'm sure that if people want to reach out, they will. So thank you so much. 
CT: 12:51.207 Thank you, Rik. It's been a pleasure. 
RVB: 12:52.568 Thank you. Cheers, bye. 
CT: 12:54.436 Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best


Wednesday, 16 November 2016

Podcast Interview with Evelina Gabasova, University of Cambridge

Oh man - things are heating up in the graph space, and keeping me super busy. After announcing our Series D last week (read more over here) I barely found the chance to publish this interview with Evelina Gabasova about graphs, Star Wars and biotech. Listen or read the full interview below.

Here's the transcription of our conversation:
RVB: 00:03.805 Hello, everyone. My name is Rik - Rik Van Bruggen from Neo Technology, and here we are again. It's been a while. We're recording another episode of our Neo4j Graphistania podcast, and today I have a wonderful lady from the beautiful lands around Cambridge, on the other side of this Skype call. And that's Evelina Gabasova. Hello, Evelina. 
EG: 00:26.929 Hello, Rik. 
RVB: 00:27.991 Hey. Good to have you on the call. Thank you for making the time. 
EG: 00:30.946 Thanks for having me. 
RVB: 00:31.940 Yeah, fantastic. So, Evelina, I have learned from our conversations that you are a postdoc researcher at the University of Cambridge, but maybe you might want to introduce yourself a little bit and tell our listeners who you are and what's your relationship to the wonderful world of graphs. 
EG: 00:49.666 Well, I originally started as a programmer and then I got interested more in machine learning, so I went on to do a PhD in machine learning actually, and statistics in mathematics. And now I'm working as a postdoc in biomedical research, so I don't have any biological background at all, but I'm a quantitative person and I help biologists analyse their data. So I work in like statistical genomics and bioinformatics at the moment. And I got interested in graphs because they are very useful in modeling quite a lot of biological phenomena because there are these protein-protein interaction networks, et cetera, and gene interactions. So graphs are a very natural way of modeling these kinds of things. 
RVB: 01:33.283 Wow. You know this is a funny story because my first exposure to graphs was also about protein-protein interactions at University of Ghent here in Belgium. Is that metaproteomics? Is that the kind of field that you're talking about here? 
EG: 01:46.540 I'm not working with proteins most of the time. I'm working on a bit lower level with the individual genes and different DNA variations, et cetera. But, still, they interact with each other, and the thing with biology is that it's a very multi-layered process and the different layers interact with each other, so it's extremely complex. I'm still-- my mind is exploding whenever I think about it, to be honest. 
RVB: 02:12.786 Oh, my god [chuckles].
EG: 02:14.342 Because whenever you look closer it's just much more complex, and some aspects of it are very well modelled by graphs. Some are not, but we are just trying to integrate all the types of information that we have, and graphs are very helpful in that. 
RVB: 02:32.241 Fantastic. And you're a big Star Wars fan, right? Because I read that [chuckles] GraphGist about Star Wars [laughter]. 
EG: 02:40.119 Yeah, I am a big Star Wars fan [laughter]. Yeah, that's one thing that I did. Social network-based analysis is very nice for just playing with things, so last year before Christmas when the new Star Wars movie was coming out, I just decided, "Okay, let's play with it a little," and I extracted social networks from all the scripts, all the movies, and then I just played with it. And it's a wonderful data set because you can understand what's happening there. Because sometimes when I look at biological data sets and see, okay, so this gene interacts with this gene without actually consulting quite a lot of literature and biologists, I have no idea what it means properly. But if I see that these two characters interact with each other, it makes sense because I've seen all the movies [laughter]. 
RVB: 03:33.382 I won't ask you for your opinion about the last movie [laughter]. So what do you think is so nice about using graphs for these different fields? You know, what do you like about it? Why is it so interesting for you? 
EG: 03:50.762 Well, I find graphs a very natural way of structuring information and a very nice way of analysing very complex data where I just know how-- maybe I don't know that much about the data, but I know something about interactions, and graphs are just great for that. [crosstalk] Also, if you are looking at a graph, it doesn't have to be like direct interaction. The interactions in a graph can mean whatever you decide they should mean and that's a very flexible framework for approaching complex problems. 
RVB: 04:26.904 Is that something you encounter in biology a lot, or are you talking more about the social networking stuff or both? 
EG: 04:33.988 No, I was talking more about biology, probably. 
RVB: 04:36.439 Yeah, it's more like a pathfinding to see if there is a path between different genes, for example. Does that sound like an example? 
EG: 04:48.423 Yeah, for example, or it doesn't have to be like directly pathways, it can be like if genes are related. For example, we are also working with some colleagues on a system that does data mining on academic papers. And it can be, for example, if two genes are mentioned in the same paper, which is not a very direct interaction between them, but it tells me that they are probably related in some way. 
RVB: 05:15.520 Absolutely, yeah. Actually, I did a podcast recording a couple of months ago with someone from University of California who was writing about molecular interactions. And he called it HETnets. Really interesting. I'll look it up for you [crosstalk]. 
EG: 05:34.317 Yeah, that sounds very interesting. But the interactions can mean anything. It can be on the bio-- like, chemical level. It can be on physical interaction level. It can be on what we know about those genes, et cetera. And I like to play with social networks because that is a very interpretable way of dealing with networks. So, these are my hobby projects, and at work it's much more complex [laughter]. 
RVB: 06:05.747 Well graphs are everywhere [laughter], right? 
EG: 06:08.607 Yeah, that's true. 
RVB: 06:09.181 So, that's kind of the tagline of Neo4j and it's been-- it's so true, right? Once you get into it, it’s almost impossible not to see things as a graph [laughter]. 
EG: 06:21.159 That's true [laughter]. 
RVB: 06:24.220 Do you have any plans for other use cases right now, Evelina? 
EG: 06:29.661 Not at the moment because, well, the use cases that we are already working on with my colleagues are complex enough [chuckles] to be honest. So, we are playing with some like-- 
RVB: 06:39.938 Enough to keep you busy. 
EG: 06:41.580 Yeah, definitely [laughter]. 
RVB: 06:44.227 I always say, "It keeps me off the streets" [laughter]. Exactly. 
EG: 06:49.666 Yeah, but what I'm working on mostly is how to integrate the information from different layers in biological processes. So, I'm looking at like the very low-level gene level or like the DNA level, and if there are any changes there, and how does it integrate with the RNA changes that are in the cell and how does it integrate with protein changes, et cetera? So, these are many different levels-- 
RVB: 07:14.369 What's RNA? I don't know what RNA is. 
EG: 07:17.103 Sorry? 
RVB: 07:17.552 What's RNA? I have no idea what that is. 
EG: 07:20.178 Oh sorry [chuckles]. It’s like an intermediate product between the DNA and a protein. So, it's how the DNA is transcribed into RNA, and that is then changed into protein. So it's sort of like an intermediate product, and if you are looking at that, you can see what's actually happening in a cell in a specific moment, because it's telling you which genes are being actively changed into proteins. 
RVB: 07:50.302 You know what? 
EG: 07:50.913 Did it make sense? 
RVB: 07:50.917 This is why I like doing these podcasts. I learn something every day, you know, it's [chuckles] very good, thank you [crosstalk]. 
EG: 08:00.830 Sorry, I just wanted to add something, that the DNA is basically a stable structure, the RNA tells you what's actually happening in a cell in a specific moment, and the protein level tells you basically what are those processes that were happening over some time ago, or over some longer time, because the proteins are just in the cell produced and then they are doing their roles. 
RVB: 08:27.269 Interesting. So where do you think this is going, you know, both for you personally, Evelina, in your job or in your play time, but also looking at the IT industry - at the end of the day you're an IT professional - where do you think this is going, what does the future hold for the world of graphs? 
EG: 08:50.072 Well, I think the future for our graphs is bright, because we have a lot of unstructured data and graphs are a great way to represent that. And it allows us to mine quite a lot of very complex data sets that would be impossible to structure in any other way, or maybe we don't have any other good way to structure those data sets. So, I think graphs will continue being quite successful in modeling, in quite a lot of domains, and for me personally, well, I hope I will get to play with graphs even more, because it's quite a lot of fun. 
RVB: 09:27.334 Excellent. 
EG: 09:28.508 At least in my free time analyzing some more movies, et cetera. 
RVB: 09:33.443 Well, I look forward to seeing the results of that, and I wish you all the best for the professional use cases. I want to thank you for coming online, Evelina, it's been great talking to you. And I'm sure our listeners will enjoy listening to and reading about your story as well. 
EG: 09:51.603 Thanks for having me. 
RVB: 09:52.591 Thank you and have a nice day. 
EG: 09:55.145 Bye, thank you. 
RVB: 09:56.020 Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best


Thursday, 20 October 2016

Podcast Interview with Amanda Schaffer, Cisco

Been a while actually since the below conversation happened - but now I have finally found the time to put it up here. Apologies for the delay. Amanda Schaefer has been a really great community member for Neo4j and has been using and advocating the use of graphs in lots of different use cases. Listen to her story - and hopefully it will be an inspiration for other graphistas to come out of the woodwork and start tackling real world use cases with Neo4j.

Here's the transcript of our conversation:
RVB: 00:02.767 Hello everyone. My name is Rik Van Bruggen from Neo Technology and here we are again recording another Graphistania podcast episode and today I'm joined by Amanda Schaefer from Cisco. And you're based in Seattle, right Amanda?
AS: 00:17.171 That's correct.
RVB: 00:20.162 Well our listeners may not know you yet, even though you've participated in the GraphGist Challenge last winter. So why don't you introduce yourself, Amanda?
AS: 00:29.131 Sure, so I am the technical lead for an analytics team in Cisco and my group focuses on maintenance contract renewals and kind of optimizing the quoting workflow and optimizing customer success. So we look at a lot of metrics related to opportunity and bookings and quoting pathways and things like that.
RVB: 00:50.598 Wow, that sounds really interesting and you know that Cisco is already a Neo4j user, maybe someday you'll get to use it professionally there as well.
AS: 01:00.715 I hope so. I'm working on a couple of use cases for that.
RVB: 01:03.514 Really cool. But from what I've read of your work so far, you've been using Neo4j for some of your personal projects, right?
AS: 01:10.929 I have, yeah.
RVB: 01:12.699 Can you tell us a little bit more about that?
AS: 01:14.455 Sure, absolutely. So I started out going to some theater productions around Seattle and noticed that I recognized a lot of the actors from different plays and different theater companies and I got interested in graphing that, because that's such a perfect kind of classic graph problem, mapping people in social networks, and so I was interested in that in the theater space. So for the GraphGist Challenge last winter, I wanted to take a look at that and ended up focusing on the Seattle Shakespeare Company, mostly because their data was the best available of the local theater companies [laughter] so I had to deal with the least data engineering for that, and could focus on the analysis a little bit more. I took a look at their past productions, and matched that up with all of the available Shakespeare plays, and took a look at things like production year and comedies versus tragedies, and their normal seasons versus the things that they take out to the parks. Just had a lot of fun exploring the data with Neo4j.
RVB: 02:15.017 Did you learn anything interesting, things that you didn't know before?
AS: 02:19.544 I definitely learned that the most popular plays and the most successful ones tend to be the things that they take to the parks, which was interesting. I found that there were only eight plays of Shakespeare's that the Seattle Shakespeare Company hasn't produced, so they have done a pretty comprehensive job.
RVB: 02:37.899 When you say, take it to the park what does that mean? I'm not familiar with that.
AS: 02:41.361 So there is a summer kind of "Shakespeare in the park" program, where they go out to different parks around the city, and tour, even around Washington state, and do free productions in parks during the summer.
RVB: 02:55.427 Now you are just trying to get me to move to Seattle, right?
AS: 02:58.729 Seattle is a fantastic place to visit in the summer, I highly recommend it.
RVB: 03:03.180 Very good. You told me a little bit about some other projects that you have been working on as well, like the movie festival. Tell us about that maybe.
AS: 03:11.317 Yes. So Seattle has an international film festival that takes place in May and June. And so this year I had a festival pass which means I could see any movie essentially that was running during the festival. But there are about 500 movies to see in about three and a half weeks. So figuring out which movies to see is a big challenge. I watched all of the trailers and rated the movies according to my interests, and then I loaded the schedules, the theaters, the transit time between theaters, and my ratings and the movies all into Neo4j, and using a Python program, I created my optimal schedule for the international film festival. I ran 100 simulations and took a look at the top 10 to 20 schedules, and used that as my basis for deciding which movies to go see.
RVB: 04:03.776 Sweet. That sounds really great. It reminds me a little bit of the use case that we talked about on this podcast a couple episodes back, about the Date Night movies. I don't know if you heard that episode. You'll like it if you're a movie buff [laughter].
AS: 04:22.165 Great, I'll have to take a look at that.
RVB: 04:24.285 Yeah. Very good. So why graphs, Amanda? Why did you get into graphs, and what's so cool about them for you?
AS: 04:32.701 I actually got interested in graphs looking at the master data management use case, because as part of our quoting workflows, we have a lot of places where a single kind of parent company will have a bunch of different subsidiaries or a bunch of different field locations, and we want to be able to understand which of these contracts really belong to the same company and things like that. So I took a hands-on workshop with Nicole White from Neo4j actually, at NoSQLNow! in 2015 and that was my official kind of hands-on introduction, when I was exploring that master data management use case. And I just kind of got hooked after that workshop. It was so much fun and so easy and intuitive to play around with the graph model especially in Neo4j, so from there it just sort of took off. And they say once you get it, everything is a graph and I think that's really true. I am always kind of thinking about how can I make this into a fun Neo4j project?
RVB: 05:38.791 Absolutely yeah, it's unbelievable. I was actually jogging yesterday and all of a sudden there is my podcast, talking about graphs [chuckles]. It was really, really peculiar. All right, the model, that's what you find very interesting, the ease of use, is there anything particular that you find most appealing in Neo4j?
AS: 06:02.033 I love the ease of use. For me, I'm just kind of always thinking about the intersection of business and technology, or the intersection of modeling real world things and technology. So the modeling of events, like I did for the film festival, is very interesting to me, like the use case I was thinking about, DataDay Seattle, a local conference that I attended a couple of days ago. And thinking about conference management software, and figuring out combining the session scheduling with a recommendation engine, which I think are both things that Neo4j does really well. And it seems like you could build a really powerful conference scheduler application based on that, so attendees could make it social, and recommend sessions for each other, and things like that. So just always thinking about the ways that things are connected, and how to just apply these classic graph problems to a lot of situations in the real world.
RVB: 07:02.117 Hmm totally. Well, at GraphConnect we always have a schedule graph. I don't know if you are familiar with that, but in the GraphConnect conference that we host every year, every six months, upcoming October in San Francisco as well, we'll have a schedule graph as well. So maybe that's a starting point for you.
AS: 07:18.126 That's great, I have actually recently purchased tickets to the event in San Francisco, so I am really looking forward to it.
RVB: 07:24.554 Fantastic, we will see each other there for sure then. Other than GraphConnect, what does the future look like, Amanda? Where do you think this is going for you personally, for the industry? What's in store, do you think?
AS: 07:39.721 For me, the really interesting next step, and the hurdle that I need to overcome to use it professionally, is just really making the self-serve analytics part, and getting graph understanding out to the typical data analysts on my team, and the people that would use this analysis day-to-day in their business, and helping them understand these cases and understand the graph analysis, and things like that, and I think making it really accessible to a lot more people around the organization, is one of the biggest challenges that I'm looking at. In Neo4j 3.0 there's the ability to share the graph style sheets that you've set up in the cloud, so that everyone can see exactly what you're seeing on the screen, and it's much easier to share those around the organization. Things like that I'm really excited about, because at least in my organization, I know that this is a very cool thing, and there are a lot of use cases for it, but I need to take that out and empower other people to figure out how to take advantage of it. That's what I'm really looking forward to.
RVB: 08:52.664 Fantastic. I think that's something that we'll see many more people working on in the next couple of years, for us as well, as a company in this industry, it's really important that we make that work. Cool. Amanda, you know that we want to keep these podcasts fairly short, so we'll put some links maybe to your graph sheets and the rest of your work, on the transcription page, if that's okay. For now, I'm just going to thank you so much for coming online and having a chat with me, and I look forward to meeting you at GraphConnect.
AS: 09:25.550 Thanks, Rik. I had a lot of fun on the podcast this morning.
RVB: 09:28.510 Cheers. Bye-bye.
AS: 09:29.962 Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best


Thursday, 6 October 2016

Podcast Interview with Alessio De Angelis, Whitehall Reply

I have said it before and I will say it again, but the joy of this podcast - for me, at least - is the fact that I get to talk to all of these wonderful people out there. Today's conversation is with a really cool and talented young software developer out in Rome, Italy, who has had a passion for graphs for a while - and is now applying it to Neo4j and NOSQL database approaches. Here's my chat with Alessio De Angelis:

Here's the transcript of our conversation:
RVB: 00:02.287 Hello everyone. My name is Rik, Rik van Bruggen from Neo Technology, and here I am recording another Graphistania Neo4j podcast, and today I'm joined by someone from a place with a lot better weather than where I am, that's for sure. Alessio De Angelis from Rome in Italy. Hi Alessio. 
ADA: 00:21.861 Hi Rik. 
RVB: 00:23.151 Hey, it's great to have you on the podcast. Thank you for making the time. 
ADA: 00:26.929 Thank you for inviting me. 
RVB: 00:28.415 Very good. Alessio, our listeners probably don't know you yet, so I read some of your work on the GraphGist Channel Challenge (Santa's shortest weighted path), but you may want to introduce yourself. Who are you, what do you do? 
ADA: 00:44.241 Yes, sure. I'm now working as an IT consultant, in particular involved in big data, data warehouse, and SQL projects, but my love for graphs started a long time ago, probably three years ago, because at my university I was working on a Master's thesis, where I had to do a recommendation engine, in the cultural heritage domain, and I was looking for a tool that would help me to profile the user's interests, storing the places and historical monuments he and his friends visited according to their social network accounts, together with the extra information delivered by linking the data in graphs. And Neo4j was the perfect database to achieve all those tasks. So, I really fell in love with this database.
RVB: 01:56.472 That's great to hear. Fantastic. I mean, there's a lot of people that are actually using Neo4j in the cultural heritage space. I've actually interviewed a couple of them on this podcast already, so I'll send you some links and I'll put them on the transcription as well (note: see posts like this one, interviewing Iian Neill about his Codex, or this one, interviewing Lorenzo Speranzoni about his work on Van Gogh's journey). So, that was a really great use-case, and then you also decided that you wanted to do something for Santa Claus as well? 
ADA: 02:21.394 Yeah [laughter]. 
RVB: 02:24.278 What was that about? 
ADA: 02:26.193 I started looking for-- I was in the Neo4j main page-- actually, no, in the Twitter account, and I read up about this GraphGist Challenge, and said, "Whoa, I want to participate." And then I was seeing the domains, and I saw Santa Claus. Nice. Let's think about Santa Claus's life. And since I was a little kid, I was wondering how could Santa Claus manage to reach every child in the world, and give the presents before they fall asleep. 
RVB: 03:11.683 Because he has a graph database. Now we know. 
ADA: 03:13.636 Yes [laughter]. 
RVB: 03:18.643 Very funny. 
ADA: 03:18.854 He managed to think about an algorithm for searching the shortest path in the graph of children all over the world. I think for sure that is the reason why Santa Claus managed to do it now in 2016. 
RVB: 03:39.121 There is no doubt, I know he uses Neo4j [chuckles], but maybe he's actually using it with some of the new awesome procedures, the APOC, because I don't know if you are familiar with that but the weighted shortest path calculations, you can do those with algorithms like Dijkstra and A* [crosstalk]. 
ADA: 03:59.679 Yes. 
RVB: 04:01.044 And in the APOC, in the awesome procedures that we have now in Neo4j 3.X, there's those algorithms and you can call them from Cypher. I think next Christmas you have to go to Santa Claus, and you have to tell him how to improve his database. 
ADA: 04:16.610 Yes. For sure [laughter]. 
RVB: 04:19.417 Very cool. So, why is it so attractive to you? Why did you fall in love you think with the wonderful world of graphs, is it the model, is it the performance, what is it about for you? 
ADA: 04:33.153 Firstly, because as the Neo4j slogan says, graphs are everywhere. Probably we don't think about it but really graphs are everywhere, and probably you can model any domain as a graph. And then in particular, I was attracted by Cypher, because its syntax is so short, but really powerful. I really love the pattern matching, and Santa Claus and his team managed to do all the search of the shortest weighted path with just Cypher and its main constructs like-- 
RVB: 05:22.258 Fantastic. That's a great summary. I mean if Santa Claus can use it then everyone can, right? 
ADA: 05:27.348 Yes. 
RVB: 05:28.332 It's obvious [laughter]. And, is there anything that you think that would really make it even more awesome, that you think we should really be adding to Cypher, or Neo4j, or maybe looking a little bit into the future? What do you think about that? What do you think the future holds for this industry? 
ADA: 05:53.424 I think it's really growing because, even in Italy, many more companies are getting to know this database, and if I think back several years, no one knew that database, probably just at university. In some courses, I was listening to people talking about graph databases, but now I think it's really a reality. In the future, it would be a really-- many more projects would involve NoSQL databases. 
RVB: 06:38.832 Is there a particular kind of project where you think that it would really fit best, or what do you think would be the sweet spot for the graphs in the next couple of years? 
ADA: 06:48.486 Yeah. For sure, in our operations research course, we did so many , like for searching the best solution, or-- yeah. So, I think each one of these theoretical aggregates can have an improvement if it's applied with a graph database, like Santa Claus did with just a normal shortest weighted path search algorithm, improved by the Cypher language. 
RVB: 07:30.011 So, basically what you're saying is that there's a lot of just regular database applications that could benefit from it, and offer new functionalities, new [crosstalk]. 
ADA: 07:40.226 Yes, probably just some-- an algorithm that works on data as a graph, but just not considering the data as a graph itself. So, they try to travel among these data, and try to connect them, but without using a proper graph database. If they use it, I'm sure that their performance will improve. 
RVB: 08:12.035 I agree. I couldn't agree more, actually. Thank you so much for sharing that, Alessio, I really appreciate it. And, again, maybe next Christmas we'll have a new and improved [chuckles] engine for the pathfinding of Santa Claus. In the meanwhile, I think we'll try to keep these podcasts short and snappy, so I'm going to thank you for coming online, and sharing your experience with us. It's been a great time talking to you, and I hope to see you at one of the future Neo4j community or conference events, you know? 
ADA: 08:53.743 Yes. 
RVB: 08:55.047 That'll be great. Thank you [crosstalk]. Have a nice day. 
ADA: 08:58.099 Yeah, you too. Thank you for inviting me. 
RVB: 08:59.814 Bye. 
ADA: 09:00.891 Bye.
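
A small footnote to the weighted shortest path discussion in the conversation above: for readers who have not seen Dijkstra's algorithm before, here is a minimal sketch of the idea in plain Python. This is not Alessio's GraphGist and it is not the APOC implementation. The `dijkstra` function, the `routes` dictionary and all the weights in it are made up purely for illustration.

```python
import heapq


def dijkstra(graph, start, goal):
    """Return (total_weight, path) for the cheapest route from start to goal.

    graph is a dict of dicts: graph[a][b] is the non-negative weight of a -> b.
    Returns (inf, []) when the goal cannot be reached.
    """
    # Priority queue of (cost so far, current node, path taken to get here).
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []


# Hypothetical delivery routes with invented weights.
routes = {
    "North Pole": {"Rome": 7, "Antwerp": 3},
    "Antwerp": {"Rome": 5, "Paris": 1},
    "Paris": {"Rome": 2},
    "Rome": {},
}

cost, path = dijkstra(routes, "North Pole", "Rome")
print(cost, " -> ".join(path))
```

In this toy data the direct hop to Rome costs 7, while the detour through Antwerp and Paris adds up to 3 + 1 + 2 = 6, so the longer route in hops is the cheaper one in weight. That is exactly the difference between a plain shortest path and a weighted shortest path.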
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best