Here's the transcript of our conversation:
RVB: 00:00:00.938 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo4j, and here we are again, recording another episode of our Graphistania Graph Database Podcast. And tonight I have a dear colleague of mine on the other side of this Google call, and that's Amy Hodler. Amy, how are you?
AH: 00:00:20.779 I'm doing wonderful. Hi, Rik. A pleasure to be here.
RVB: 00:00:24.250 Fantastic. Amy, you've been leading the charge at Neo4j for the graph analytics and AI program, and I'd love to know a little bit more about that. So tell us a little bit more about who you are, what you do, and how you got into the wonderful world of graphs.
AH: 00:00:43.734 Wonderful. Would love to. So I work as a program manager in the marketing team. So what that means is-- and I'm focused completely on graph analytics and AI and machine learning, those uses for graphs. So what does that mean? Well, it means I wear a lot of hats. I do a lot of content. I create a lot of content. I just finished co-authoring a book on graph algorithms, which I'm sure we'll talk about in a bit. I also do presentations, speaking, generate content, like you might expect a product marketing manager to do. But then I also do evangelism and assist with training and things of that nature.
RVB: 00:01:30.272 Absolutely. And I've read some of the stuff that you've produced and I've used some of the material that you've produced. But there was a little bit about a book. What have you been working on?
AH: 00:01:42.294 Well, the book is called Graph Algorithms. It's an O'Reilly book and it will be published in the spring, so coming up spring 2019. It is focused on graph algorithms with practical examples in Apache Spark and Neo4j. And our focus of the book was really to help people who are new to using graph analytics and graph algorithms, to give them a sense for what the algorithms could do versus regular analytics and focus on the classic graph algorithms. And classic graph algorithms usually fall into three general areas. They are path finding, so how do I find an optimal route from A to B or the most cost-effective route from A to B? Centrality, which is about finding the most significant and influential node or nodes within your network. And then community detection, which is pretty much what it sounds like. How do things group together or potentially might split apart?
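The three families Amy lists can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration: the toy graph and the function names are hypothetical, and none of it is taken from the book, Apache Spark, or Neo4j. Connected components stands in as the simplest possible form of grouping, not as one of the dedicated community detection algorithms.

```python
from collections import deque

# Hypothetical undirected toy graph: node -> set of neighbours.
GRAPH = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
    "E": {"F"},
    "F": {"E"},
}


def shortest_path(graph, start, goal):
    """Pathfinding: breadth-first search for a fewest-hops route."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor chain back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # goal is unreachable from start


def degree_centrality(graph):
    """Centrality: share of the other nodes each node is directly linked to."""
    others = len(graph) - 1
    return {node: len(neighbours) / others for node, neighbours in graph.items()}


def connected_components(graph):
    """Grouping in its simplest form: sets of mutually reachable nodes."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components
```

On this toy graph, shortest_path(GRAPH, "A", "D") returns ["A", "B", "D"], degree_centrality(GRAPH) ranks "B" highest at 0.6, and connected_components(GRAPH) separates the A-B-C-D group from the E-F pair.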
RVB: 00:02:46.672 That's really interesting. And it's kind of a stretch for the traditional Neo4j community, right, because Neo4j traditionally has been more around OLTP-style workloads, where we do transactions and keep data safe and all that kind of wonderful stuff. But this is a little bit different, right? It's more analytical, for lack of a better word.
AH: 00:03:11.820 Yeah. So it is a little different and our entry into this area a few years ago started because we had customers that wanted to do more global analysis. So you can think about pattern-based queries that you might do in Cypher that Neo4j users might be really familiar with as very locally-focused and pattern focused. You know exactly what you're looking for. I want to match this thing that has this type of relationship to this other thing. And so you know the pattern you're trying to look and you're probably looking or doing an analysis that's very much around a particular node or area of your graph. When you think about graph algorithms, a lot of people are starting to call them a computational algorithm, so graph computational algorithms are more global in nature. So you're looking at trying to understand more about the structure of your overall graph. I have a large network I want to know. I want to know where the lumpy bits are. I want to know where things are bunched up together. I want to know where the hubs and spokes are. I want to understand, in general, who or what is a bottleneck in my network. So looking at things more kind of at the larger holistic standpoint.
RVB: 00:04:37.901 And that's traditionally something that you need very specific types of hardware and software for it and algorithms and stuff like that, right? And I guess that's where the new book comes in to explain that new architecture.
AH: 00:04:55.348 Yes, definitely. And the algorithms are very specific, and they are graph algorithms based on traditional graph theory. They are very specific to the analysis based on relationships, which, as we know, relationships are very important when you're studying graphs and networks in general. We do also talk a bit about the underlying platforms and what kind of choices you may want to make depending on what kind of analysis you're trying to do. One of the things that we found working with current Neo4j customers is there's a little bit of a gray area overlap between your traditional transaction process and your analytics process. And the more you can bring that together, those workflows together in a single paradigm or platform, it just makes it easier for your data scientists to iterate quicker using more real-time or fresher information, and then your transactional side of the house doesn't have to stop and pause to provide somebody with updated information. And they can also use those, use the insights from the analytics to then develop more targeted processing.
RVB: 00:06:16.240 I think the technical term is HTAP these days, right, hybrid transactional/analytical processing? I think I read that somewhere.
AH: 00:06:27.185 That's absolutely right. Yeah.
RVB: 00:06:29.698 Fantastic. Well, but before I forget, you wrote this book together with a dear European friend of ours, right?
AH: 00:06:38.209 Yes. Mark Needham was my co-author on the book, which was a fantastic experience to have somebody with such a different skill set than myself and to work back and forth. Which I would highly recommend if anybody is working on, I don't know if I want to say any book, but certainly any technical book, is to not go at it alone and to definitely work with a co-author and kind of enjoy that back and forth and having that other point of view to how things are explained and brought along.
RVB: 00:07:16.391 Yeah, fantastic. And Mark is such a fantastic member of our community and a dear friend and colleague. Which reminds me that I need to get him back onto the podcast [laughter] one of these weeks. But maybe we can switch gears a little bit and can I ask you a little bit how you got into graphs and why you got into graph algorithms and analytics specifically? What makes it so interesting for you?
AH: 00:07:42.413 Yeah, I love to talk about that. Yeah. So it's kind of an interesting story because my background really wasn't in analytics specifically or mathematics, but I've always enjoyed math and started to get a little more into analytics probably about eight years ago or so. And during that time period, I read two books that had a pretty big impact on-- and I would say are actually responsible for me getting into graph. One was The Black Swan by Nicholas--
RVB: 00:08:17.262 Taleb.
AH: 00:08:17.889 Yes, Nassim Nicholas Taleb. And that's about understanding risk and risk analysis and looking at things without knowing. It's the unknown unknowns, if you will, of risk. And the other book at the same time I read was called The Information by James Gleick, which is a beautifully-written history of information theory and the study of information itself. And during that time, reading those two books together really got me down the path of complexity studies and network science, and the ideas that the analysis or the analytics I had already started to get involved with was very reductionist. So we would take a-- and this is still very common, is that you have a large amount of data, maybe it represents a complex system, maybe it represents traffic. I mean, it could represent the human body, whatever it might be. And traditionally, a reductionist approach would say, "Okay, how can I break these things? How can I break this complex thing up into a whole bunch of constituent parts? And if I understand the constituent parts, I must therefore understand the larger picture." And that's just not true [laughter]. So if you think about breaking the human body down to cells, if you understand the cell, you don't know anything or very little about how glucose is metabolized in the system. You have to understand relationships and how things interact. And so network science--
RVB: 00:09:51.867 Very true.
AH: 00:09:53.340 Yeah, yeah. So whether you're talking about traffic or banks and economic--
RVB: 00:09:57.974 System-wide effects and stuff like that, right?
AH: 00:10:00.022 Yeah. And so I got very, very interested in network science and just studying-- how do you study these complex things interact and have weird emergent behavior? And, as I started to get involved in that, I started to take some online courses and, lo and behold, graph theory came up. Because graph is the math that is used-- graph theory and graph mathematics is what is used to study networks or it's one of the ways you can use to study networks, and the graph algorithms are purposely built to study these complex relationships. And so that's how I came to graph and it was really just kind of the love of trying to understand how things work as a holistic system that kind of brought me into graphs and the algorithms.
RVB: 00:10:51.271 Fantastic. Yeah, no, I can totally relate to that. I've been trying to explain to my kids that little video about the-- I think it's Yosemite, how wolves change rivers. I've been trying to explain that to my kids and it kind of summarizes, you know, these system-wide effects sometimes. I love that story. So let's talk a little bit about the future, if you don't mind. Where is this beautiful path taking you and us with you? What does the future hold, Amy, if you had a crystal ball?
AH: 00:11:32.203 Well, I think from a-- I'll start off near-term, just kind of at arm's reach, kind of Neo4j and kind of my path and our path as we work together is I think we're going to see more algorithms added to our libraries. As you would imagine, we're always doing that. But I think the scope of not just more numbers of algorithms, but also the focus of those algorithms will kind of span out. So the book includes the three classic areas of graph algorithms, but there's also not-so-classic graph algorithms, like similarity, that we've started to add. And then I think you're going to see that, plus more machine-learning processing type of algorithms added as well or things that can help you cleanse data and get it ready for machine learning. And then there are other types of algorithms that are even more holistic that look at the topology of a graph as well, so you can kind of understand the shape of graphs and the shape of how data within the graph, how the nodes within the graphs are connected. And I think you're going to-- so I think the expansion of the different types of algorithms you're going to see over the next few years and then I think just giving more examples for people of use cases is something we're going to be working a lot on. Because one of the things that I found is-- and I believe probably you can relate to as well, is that we're in the process right now of providing people these very powerful tools with the graph algorithms and showing them examples of how to use it in the book, but a lot of people need more. They need more examples. They need more workflow.
AH: 00:13:32.540 So what do I do to my data before it gets in the graph? What do I do after it's in a graph before-- if I want to take it over into machine learning, then what? If I want to look at specific types of problems, is it better to use label propagation community detection or Louvain community detection? And it's not really an easy answer. There's no black and white, and I think that's the hard part. If there was, we would have already published something. But what we can say is if your problem is kind of like this and your data is kind of like that and you're trying to-- your end goal in your question is somewhere like this other thing. These are the three or four you might want to start with, and watch out for the supernode if you're doing it. Or these are the three or four you want to start with, and be careful that you're not overly connected and you only have one community. So I think there's-- I think you're going to see-- so I guess to summarize a lot of those thoughts, you're going to see different types of algorithms come into our library and you're going to see more guidance on use cases and how to fit certain use cases to certain sets of tools.
RVB: 00:14:43.025 You very skillfully avoided the word AI during the entire podcast, Amy. I thought you might bring it up looking at the future. Does it relate in any way to this hype buzzword of artificial intelligence, do you think?
AH: 00:15:05.088 Yeah, absolutely. And it's probably because all of us that get deep into the weeds on certain tools like to talk about the details of the tools and what they can do as opposed to kind of the bigger use case. And I do think-- I view AI as a use. It's a goal to have some kind of a probabilistic model to make decisions similar to the way humans do. So to me, it's a goal and it's a use case. It's not a specific type of technology equals AI. This is why, in my mind, I kind of separate out the use case versus the technology. But the technology underpinning that and a lot of what graphs can do to enhance AI is in the machine learning and to improve the machine learning. So graphs have this opportunity to add context to AI solutions. And we've already seen that in things like if you think about a knowledge graph with natural language processing and a chat bot, graphs can add context to improve accuracy. So that's one way that we might think about it. But also if we want to get a little more into the weeds on the machine learning, graph is also being shown, and we included an entire chapter on graph-enhanced machine learning specifically with feature extraction and connected feature extraction in the book. And in that chapter, we actually show that adding graph features to your machine-learning model, extracting those features out actually improve the classification that we do - we actually use a random forest classifier - and improves the model for classification and improves the accuracy and prediction in the recall rates in our link prediction.
AH: 00:17:01.369 So we can show that-- and that actually is really fascinating to me because we used a citation network and tried to predict that people in the future are going to work together, kind of be co-authors, which just now occurred to me that's kind of meta that we co-authored a book with a chapter [laughter] about co-authoring a book.
RVB: 00:17:21.517 Very meta, exactly.
AH: 00:17:23.674 Very. I just realized it. But looking at the studies around that, the graph features alone weren't the most predictive. Non-graph features alone weren't the most predictive. Basic statistical analysis weren't the most predictive. But when you brought these different features together, almost like an ensemble, if you will, they were overall together improve the predictions. And I think that's an interesting-- if we think about future directions of graph and graph algorithms and analytics in general, is I think we're going to see them be added to a lot of other processes. So machine learning is hot right now. AI is really hot right now for valid reasons. But it's not that graph is going to come in and be the end-all, be-all for everything in your fill in the blank, in your machine learning, your AI, but they can make your predictions more accurate. They can help filtering, just so you can be more efficient in your machine learning, filter down your data, don't do it manually, things of that nature. So I think you're going to see that in the future as graphs become a way to enhance different AI and machine learning processes.
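To make "graph features" for link prediction concrete, here is a minimal pure-Python sketch of two commonly used ones, common neighbors and preferential attachment, computed for every pair of authors who have not yet written together. The co-authorship data, the author names, and the function names are all hypothetical, and this is not claimed to be the feature set or the pipeline used in the book. In practice, rows like these would sit alongside non-graph features as input to a classifier such as the random forest Amy mentions.

```python
from itertools import combinations

# Hypothetical co-authorship graph: author -> set of past co-authors.
COAUTHORS = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"bob", "cat"},
    "eve": set(),
}


def common_neighbors(graph, a, b):
    """Number of co-authors that a and b already share."""
    return len(graph[a] & graph[b])


def preferential_attachment(graph, a, b):
    """Product of the two authors' co-author counts."""
    return len(graph[a]) * len(graph[b])


def pair_features(graph):
    """Build one feature row per pair of authors with no existing link."""
    rows = {}
    for a, b in combinations(sorted(graph), 2):
        if b in graph[a]:
            continue  # already co-authors, so nothing to predict
        rows[(a, b)] = {
            "common_neighbors": common_neighbors(graph, a, b),
            "preferential_attachment": preferential_attachment(graph, a, b),
        }
    return rows


features = pair_features(COAUTHORS)
```

Here the pair ("ann", "dan") gets 2 common neighbors and a preferential attachment score of 4, while every pair involving "eve" scores 0 on both, since "eve" has no co-authors yet.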
RVB: 00:18:44.545 That is a fantastic and a great summary to wrap up our recording for tonight, I think. And so I really do want to thank you for coming online and doing this with me. And it's going to be a great book, I'm sure, and a great set of links that we'll provide in the podcast transcription as well. So thank you so much, Amy. It was great to have you on the podcast.
AH: 00:19:11.273 Thank you, Rik. It was a pleasure speaking with you, and I love this topic. I'm looking forward to getting feedback on the book and hearing how people actually put it into practice.
RVB: 00:19:22.156 That's what we'll do. Thank you, Amy. Have a nice evening.
AH: 00:19:25.602 Thank you. Bye.
RVB: 00:19:26.781 Bye.
All the best