Friday 19 February 2016

Podcast Interview with Caleb Jones, Disney

I have said it before, and I will say it again. The overwhelmingly wonderful reason why I keep making the time to do these podcasts, is that I get to talk "shop" for a bit with some of the smartest, loveliest, most interesting people in the industry. It's so cool to talk to people like my next guest, Caleb Jones. Caleb is one of those community members that does not blog / write / speak very often, but when he does - it simply BLOWS you away. Listen to or read the interview below - it is a gem.
Here's the transcript of our conversation:
RVB: 00:01 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo Technology, and here I am again recording another episode for the Graphistania Podcast. And tonight I am joined by Caleb Jones all the way from Seattle in the US. Welcome Caleb. 
CJ: 00:15 Thanks, Rik. It's great to be here. 
RVB: 00:18 I asked you how to pronounce your name and I made a mistake, I am very sorry-- 
CJ: 00:22 Oh that's all right. 
RVB: 00:22 [laughter]. I apologize. So Caleb, why don't you introduce yourself? You've been  very active in the community and your blog and everything, but it will be good for you to introduce yourself to our audience, if you don't mind. 
CJ: 00:36 Sure. And again, thanks for the opportunity to kind of sit down and chat. I've been kind of involved in the graph space for years now. My introduction to it actually did come through Neo4j. That was kind of my first intro into the space. I also focus a lot on doing graph analysis and, as you mentioned, I run a blog called AllThingsGraphed.com. I post some of those analyses. Really a labor of love. I'm not getting paid to do it at all. It's just a way for me to kind of just express what I'm exploring and some of the insights as I play around with the graph space. Like you just mentioned, I'm in the Seattle area. Professionally, I work as a software engineer, now more recently, a software architect for Walt Disney Company.
RVB: 01:29 Excellent. I've been reading your 'All Things Graphed' blog for quite some time. You've done some amazing posts on there that I really enjoyed reading, you know. Do you mind telling us a little bit about some of those experiments?
CJ: 01:46 Sure. 
RVB: 01:46 I'm a particular fan about the antonym synonym pathways thing. There's a lot of other interesting things. 
CJ: 01:53 Yeah. So, if you don't mind, I can just kind of dive right in to, you know, what kind of led me to do that initially. Really what-- so I mentioned Neo4j kind of turned me on to the graph space, kind of opened the door, and then I ran across this essay called Science and Complexity - the one that Weaver wrote (note: download over here). And he wrote it back in the mid-twentieth century. And it kind of lays out these books of science, what he calls problem of simplicity: we have one element acting on another. Problem of disorganized complexity: well, now you're looking at things at system level, but not in terms of the interactions of pieces in there. But then also, a problem of organized complexity. So this is mid-twentieth century, and he says, "Well, problem of organized complexity is really what we're going to need in order to start addressing things like the complexities of medical, psychological, biological, political, economic sciences," and he saw it as a kind of a blocker towards us starting to really explore those problems of organized complexity.
CJ: 03:06 The compute resources, so again this is back in mid-twentieth century, 1948, and he's seeing how there's this kind of new form of scientific exploration, and analysis, that once we have computational power that's up to the task, we'll be able to start diving into. And, to me, that just screams graphs. Right? You have graphs that they're really designed around that concept of elements and their relationships or interactions with each other and then as you start building up that graph, and then start doing network-wide or graph-wide analysis, you start to have these insights. So, that turned on a lightbulb for me, and I said, "Wow, you start seeing graphs everywhere." And I said, "Well, I've started writing some tools and some code that allows me to do these kinds of analyses, and so I don't have to write code every single time," and that's really what led me to, "Well I'll start blogging about this, as I start playing around." That's what kind of led me to how I got here, how I got exposed to graphs and where I'm coming from in my blog.
RVB: 04:19 Yeah. Some of the experiments, you've done some really interesting experiments. What's your favorite one that you've done so far if you don't mind? [Briefly?] a little bit. 
CJ: 04:28 Yeah, definitely. I try to keep a variety. I don't want to do the same analysis over and over again, so I try to use different sorts of data sets and different topics. So I've gone all the way from kind of a microscopic, where I did one on protein interactions of budding yeast, and then even kind of looked at what some of those molecules look like and write a molecule as a graph, right? You have atoms that have certain kinds of connections to each other, right? And then molecule interactions, that's a graph and so forth. You build up and then all the way up to my favorite one was the interstellar network navigation, using graph analysis and that one was a real challenge, but also a real joy to do because astronomy is a big life passion of mine. And so that was really an intersection of a life-long passion and technical skill set, and the right matched up with the right data set. So that was my favorite one.
RVB: 05:32 That's the one that you presented at GraphConnect, I think, and there's a video about that and everything.

Yeah, very cool. It's funny that you mentioned this protein interaction. That was actually one of the first projects that got me into more of a practical Neo4j insight as well. There was a research group here at the University of Ghent that was doing a metaproteomics - protein-protein interaction and analysis for beer yeasts. That's one of the first interactions that I've had as well, so it's funny that you mention that.
CJ: 06:04 Yeah. 
RVB: 06:06 So  I think you've already touched a little bit on why is it so powerful and so interesting for you as well. It's all about dealing with this complexity I suppose. But are there any  particular things that you  really, really enjoy about graphs that you don't find in other data structures or...? 
CJ: 06:25 Yeah, so one thing that I really enjoy about graphs in particular, is it starts to address these kind of topological questions. What I mean by that is, when you start analyzing a graph and its features, you really start to get insight into the emergent properties of that data set or that system. And so, for instance, in an economic graph or network, what you would start to see is key brokers of transactions in that network. And there are examples like eBay using that for fraud detection and things like that. And on the medical sciences I've mentioned that post I did on budding yeast proteins and their interactions. You can start to tease out what are some of these key proteins that are involved, and when you look at that it turns out that that's kind of a fundamental building block that you see across different kind of types of life. So that's the sort of insight that when you're only looking at an individual protein in that instance and only its immediate connections, you're not going to get that kind of an insight, versus when you start looking at networks and doing things like PageRank analysis, betweenness centrality scoring and so forth.
RVB: 07:41 You can look at system-wide effects, right? You can look at the entire interaction rather than just the local interaction I suppose? 
CJ: 07:49 Right, and you start getting emergent properties  that in some interesting ways  aren't necessarily strictly reducible to any one element in the network, right? It's really an attribute of the network as a whole. 
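(note: to make the "key brokers" idea a bit more concrete, here is a minimal sketch of betweenness centrality scoring in plain Python. It is not code from Caleb's blog. The toy graph, its node names and the `betweenness` helper are all made up for illustration, and the helper assumes a small, unweighted, undirected graph. It uses Brandes' well-known algorithm: one breadth-first search per node, followed by a backward pass that accumulates how many shortest paths run through each node.)

```python
from collections import deque


def betweenness(graph):
    """Betweenness centrality for an unweighted, undirected graph.

    `graph` maps each node to a list of its neighbours (hypothetical
    input format chosen for this sketch).
    """
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and remembering each node's predecessors on those paths.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:  # first time we reach w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v is on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph every pair is visited from both ends.
    return {v: c / 2 for v, c in score.items()}


# Toy network: two tight triangles that can only reach each other
# through a single "broker" node.
toy = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "broker"],
    "broker": ["c", "x"],
    "x": ["broker", "y", "z"],
    "y": ["x", "z"],
    "z": ["x", "y"],
}

scores = betweenness(toy)
top = max(scores, key=scores.get)
print(top, scores[top])
```

(In this toy network the broker has only two connections, fewer than "c" or "x", yet it comes out on top because every path between the two triangles runs through it. That is a property of the network as a whole, not something you would see by looking at the broker's immediate neighbours alone.)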
RVB: 08:01 Super interesting. And so many practical applications as well. As you know, I work for Neo as a commercial guy, as a  salesperson. I see so many  applications in business and from logistics, to financials, to cancer research, there's so many applications of this stuff. It's really quite amazing [chuckles]. 
CJ: 08:24 Yeah. 
RVB: 08:25 Super.  Maybe one last question if you don't mind. Where do you think this is going, or where do you want it to go, or where do you want to take it yourself? What does the future hold, Caleb? 
CJ: 08:37 It's hard for any of us to say, but I think graphs are really poised to start being a tool that can be used to answer or provide some sharpness to our answers of some really big questions. It was one of the key things I talked about at the last GraphConnect presentation in San Francisco, finding what are the big questions we want answers to in these different areas, whether it's astronomy or biology, taxonomies like I might find in WordNet. I've done a few analyses on Wikipedia. You know, what are some of these big questions that we can start answering, and how can we use graphs to sharpen our answers to those questions. That's what I kind of see coming out as we start using graphs more and more.
CJ: 09:27 For me personally, I know in the last couple months  I haven't been posting  on my blog. I have a few that are kind of building up. One is, I've actually started scraping political candidates' websites, and starting to look at those. 
RVB: 09:44 This is the year to do it right [laughter]? 
CJ: 09:47 Yeah yeah, definitely. But I want to do a new kind of analysis where I'm starting to scrape the topology of those connections but also the content, then do the analysis, then produce word clouds that are segmented based on that analysis, to really tease out what are these-- what does the language really tell us about these candidates? And so that's one thing that's kind of on the horizon for me. On the astronomy side, I actually got my hands on a data set from a universe simulation from a colleague from the Los Alamos National Laboratory, and basically try to replicate the same sort of analysis I did previously on the stellar network, but do it at a galactic simulation level. So that's another thing that's kind of next on the horizon for me. Yeah.
RVB: 10:43 Wow. That will be something big, wow. I actually talked to someone from NASA a couple of months ago, that was using Neo4j as well, so [chuckles]. That was super interesting, as well. Anyway, Caleb, I think we're going to wrap up here. I really appreciate you coming online and talking to me about all this wonderful stuff. I'll make sure we have enough links to all your great articles and transcription when we publish it. Thanks a lot. Really appreciate it. I hope to meet you at some future GraphConnect.
CJ: 11:18 Yes, definitely. Thanks for taking the time. 
RVB: 11:20 Cheers, man. Bye. 
CJ: 11:20 Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik
