Here's the transcript of our conversation:
RVB: 00:01 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo Technology, and here I am again recording another episode for the Graphistania Podcast. And tonight I am joined by Caleb Jones all the way from Seattle in the US. Welcome Caleb.
CJ: 00:15 Thanks, Rik. It's great to be here.
RVB: 00:18 I asked you how to pronounce your name and I made a mistake, I am very sorry--
CJ: 00:22 Oh that's all right.
RVB: 00:22 [laughter]. I apologize. So Caleb, why don't you introduce yourself? You've been very active in the community and your blog and everything, but it will be good for you to introduce yourself to our audience, if you don't mind.
CJ: 00:36 Sure. And again, thanks for the opportunity to kind of sit down and chat. I've been kind of involved in the graph space for years now. My introduction to it actually did come through Neo4j. That was kind of my first intro into the space. I also focus a lot on doing graph analysis and, as you mentioned, I run a blog called AllThingsGraphed.com. I post some of those analyses. Really a labor of love. I'm not getting paid to do it at all. It's just a way for me to kind of just express what I'm exploring and some of the insights as I play around with the graph space. Like you just mentioned, I'm in the Seattle area. Professionally, I work as a software engineer, now more recently, a software architect for Walt Disney Company.
RVB: 01:29 Excellent. I've been reading your 'All Things Graphed' blog for quite some time. You've done some amazing posts on there that I really enjoyed reading, you know. Do you mind telling us a little bit about some of those experiments?
CJ: 01:46 Sure.
RVB: 01:46 I'm a particular fan about the antonym synonym pathways thing. There's a lot of other interesting things.
CJ: 01:53 Yeah. So, if you don't mind, I can just kind of dive right in to, you know, what kind of led me to do that initially. Really what-- so I mentioned Neo4j kind of turned me on to the graph space, kind of opened the door, and then I ran across this essay called Science and Complexity - the one that Weaver wrote. And he wrote it back in the mid-twentieth century. And it kind of lays out these books of science. What he calls problem of simplicity: we have one element acting on another. Problem of disorganized complexity: well, now you're looking at things at system level, but not in terms of the interactions of pieces in there. But then also, a problem of organized complexity. So this is mid-twentieth century, and he says, "Well, problem of organized complexity is really what we're going to need in order to start addressing things like the complexities of medical, psychological, biological, political, economic sciences," and he saw it as a kind of a blocker towards us starting to really explore those problems of organized complexity.
CJ: 03:06 The compute resources, so again this is back in mid-twentieth century, 1948, and he's seeing how there's this kind of new form of scientific exploration, and analysis, that once we have computational power that's up to the task, we'll be able to start diving into. And, to me, that just screams graphs. Right? You have graphs that are really designed around that concept of elements and their relationships or interactions with each other, and then as you start building up that graph, and then start doing network-wide or graph-wide analysis, you start to have these insights. So, that turned on a lightbulb for me, and I said, "Wow, you start seeing graphs everywhere." And I said, "Well, I've started writing some tools and some code that allows me to do these kinds of analyses, and so I don't have to write code every single time," and that's really what led me to, "Well, I'll start blogging about this, as I start playing around." That's what kind of led me to how I got here, how I got exposed to graphs and where I'm coming from in my blog.
RVB: 04:19 Yeah. Some of the experiments, you've done some really interesting experiments. What's your favorite one that you've done so far if you don't mind? [Briefly?] a little bit.
CJ: 04:28 Yeah, definitely. I try to keep a variety. I don't want to do the same analysis over and over again, so I try to use different sorts of data sets and different topics. So I've gone all the way from kind of the microscopic, where I did one on protein interactions of budding yeast, and then even kind of looked at what some of those molecules look like and write a molecule as a graph, right? You have atoms that have certain kinds of connections to each other, right? And then molecule interactions, that's a graph and so forth. You build up and then all the way up to my favorite one, which was the interstellar network navigation, using graph analysis, and that one was a real challenge, but also a real joy to do because astronomy is a big life passion of mine. And so that was really an intersection of a life-long passion and technical skill set, and the right matched up with the right data set. So that was my favorite one.
RVB: 05:32 That's the one that you presented at GraphConnect, I think, and there's a video about that and everything. Yeah, very cool. It's funny that you mentioned this protein interaction. That was actually one of the first projects that got me into more of a practical Neo4j insight as well. There was a research group here at the University of Ghent that was doing metaproteomics - protein-protein interaction analysis for beer yeasts. That's one of the first interactions that I've had as well, so it's funny that you mention that.
CJ: 06:04 Yeah.
RVB: 06:06 So I think you've already touched a little bit on why is it so powerful and so interesting for you as well. It's all about dealing with this complexity I suppose. But are there any particular things that you really, really enjoy about graphs that you don't find in other data structures or...?
CJ: 06:25 Yeah, so one thing that I really enjoy about graphs in particular, is it starts to address these kind of topological questions. What I mean by that is, when you start analyzing a graph and its features, you really start to get insight into the emergent properties of that data set or that system. And so, for instance, in an economic graph or network, what you would start to see is key brokers of transactions in that network. And there are examples like eBay using that for fraud detection and things like that. And on the medical sciences, I've mentioned that post I did on budding yeast proteins and their interactions. You can start to tease out what are some of these key proteins that are involved, and when you look at that it turns out that that's kind of a fundamental building block that you see across different kind of types of life. So that's the sort of insight that when you're only looking at an individual protein in that instance and only its immediate connections, you're not going to get that kind of an insight, versus when you start looking at networks and doing things like PageRank analysis, betweenness centrality scoring and so forth.
RVB: 07:41 You can look at system-wide effects, right? You can look at the entire interaction rather than just the local interaction I suppose?
CJ: 07:49 Right, and you start getting emergent properties that in some interesting ways aren't necessarily strictly reducible to any one element in the network, right? It's really an attribute of the network as a whole.
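Note: for readers who want to see what a network-wide score of the kind mentioned above looks like in practice, here is a minimal sketch of PageRank by power iteration. It is not taken from Caleb's blog or tooling; the toy graph, the node names, the `pagerank` function, and the damping factor of 0.85 are all assumptions made for illustration.

```python
# Illustrative sketch only: a toy PageRank by power iteration.
# The graph, node names and damping value are made up for this example.

def pagerank(graph, damping=0.85, iterations=50):
    """Score each node of `graph`, a dict mapping node -> list of out-neighbours.

    Every neighbour must itself be a key of `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Each node passes an equal share of its rank along its outgoing edges.
        for node in nodes:
            targets = graph[node]
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


# Hypothetical network: most nodes point at "hub".
toy_graph = {
    "hub": ["a"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "d": ["hub", "a"],
}

scores = pagerank(toy_graph)
top_node = max(scores, key=scores.get)
print(top_node, scores)
```

The score of a node depends on the whole structure of incoming paths and not just on its immediate neighbours, which is the sense in which it is a property of the network as a whole.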
RVB: 08:01 Super interesting. And so many practical applications as well. As you know, I work for Neo as a commercial guy, as a salesperson. I see so many applications in business and from logistics, to financials, to cancer research, there's so many applications of this stuff. It's really quite amazing [chuckles].
CJ: 08:24 Yeah.
RVB: 08:25 Super. Maybe one last question if you don't mind. Where do you think this is going, or where do you want it to go, or where do you want to take it yourself? What does the future hold, Caleb?
CJ: 08:37 It's hard for any of us to say, but I think graphs are really poised to start being a tool that can be used to answer or provide some sharpness to our answers of some really big questions. It was one of the key things I talked about at the last GraphConnect presentation in San Francisco, finding what are the big questions we want answers to in these different areas, whether it's astronomy or biology, taxonomies like I might find in WordNet. I've done a few analyses on Wikipedia. You know, what are some of these big questions that we can start answering, and how can we use graphs to sharpen our answers to those questions. That's what I kind of see coming out as we start using graphs more and more.
CJ: 09:27 For me personally, I know in the last couple months I haven't been posting on my blog. I have a few that are kind of building up. One is, I've actually started scraping political candidates' websites, and starting to look at those.
RVB: 09:44 This is the year to do it, right? [laughter]
CJ: 09:47 Yeah yeah, definitely. But I want to do a new kind of analysis where I'm starting to scrape the topology of those connections but also the content, then do the analysis, then produce word clouds that are segmented based on that analysis, to really tease out what are these-- what does the language really tell us about these candidates? And so that's one thing that's kind of on the horizon for me. On the astronomy side, I actually got my hands on a data set from a universe simulation from a colleague from the Los Alamos National Laboratory, and basically try to replicate the same sort of analysis I did previously on the stellar network, but do it at a galactic simulation level. So that's another thing that's kind of next on the horizon for me. Yeah.
RVB: 10:43 Wow. That will be something big, wow. I actually talked to someone from NASA a couple of months ago, that was using Neo4j as well, so [chuckles]. That was super interesting, as well. Anyway, Caleb, I think we're going to wrap up here. I really appreciate you coming online and talking to me about all this wonderful stuff. I'll make sure we have enough links to all your great articles and transcription when we publish it. Thanks a lot. Really appreciate it. I hope to meet you at some future GraphConnect.
CJ: 11:18 Yes, definitely. Thanks for taking the time.
RVB: 11:20 Cheers, man. Bye.
CJ: 11:20 Bye.
Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!
All the best