Friday, 15 May 2015

Podcast Interview with Nicole White, Neo Technology

Here's another fantastic episode of our Neo4j Graph Database Podcast: I had a super nice late-night (for me) conversation with Nicole White, a colleague of mine in our San Mateo office. She is a Data Scientist at Neo, which means she helps us out with a lot of our internal data questions - and develops some fantastic tools for that. She also frequently speaks at conferences and meetups, and writes stuff over here, here (super cool Flask tutorial btw!) and here.

Here's the episode for you:

Here's the transcription of our conversation:
RVB: Hello everyone. My name is Rik - Rik Van Bruggen - from Neo Technology, and here I am again recording another episode of our Neo4j Graph Database podcast. With me tonight is - all the way from California - Nicole White, from Neo. Hi, Nicole. 
NW: Hi Rik, how are you? 
RVB: I'm very well. And yourself? 
NW: Very good. 
RVB: Very good. Well, it's late at night for me, it's still afternoon for you, but I thought I'd take the opportunity to talk to you a little bit because-- well, maybe you can explain that, yourself? Who are you, and what do you do with Neo? Do you mind explaining that to our listeners? 
NW: Right. Yeah. My name is Nicole White, I'm a data scientist at Neo4j, and we actually use Neo4j internally to hold all of our data that we collect - marketing, sales, product usage. Particularly with Neo4j, I'm using Neo4j to perform common data science tasks, but Neo4j is our data storage solution. All of our data sits in one spot, one nice clean spot, and thus it's very easy to answer some of the complicated questions that we weren't able to answer before. Actually, all of our tools that I've built out internally are built on top of Neo4j, which is probably my favorite part about my job - is that I get to use Neo4j, I don't have to touch SQL ever, I just get to write Cypher all day long which is super, super fun. So with regards to Neo4j, that's who I am. But I just recently graduated from grad school with a degree in statistics, and before that I got an undergraduate degree in economics and math. Just hailed from Austin, Texas, moved here to California, San Mateo, ten months ago, I think, is when I started. I'm coming up on my first year here at Neo. 
RVB: Okay. Well, this sounds like we're eating our own dog food, right? Using Neo for a-- 
NW: Yes, we are. We actually just upgraded to 2.2. All of our systems were just upgraded to 2.2. 
RVB: Fantastic. How did you get into Neo, Nicole? I mean, you must have started using that at grad school or at university, or how did you get into it? 
NW: Yeah. It was actually the GraphGist Challenge. It was the very first one. I saw it on Twitter. Someone who I was following re-tweeted a Neo4j tweet about the GraphGist challenge, and so I looked at the page and I saw a GraphGist. I think the first GraphGist I saw was something about doctors and prescriptions or something, and I saw Cypher and I was like, "This looks really cool." And, of course, there is an opportunity to make money so I was all about it. I looked at Cypher-- 
RVB: Typical student, right [chuckles]? 
NW: I know, right [chuckles]. I was actually, at the time that I came across these GraphGists, I was working on a project with my flights data set in school with all of the-- it was the Bureau of Transportation Statistics, all their data on delayed flights across all domestic-- US domestic airports. I had that all in an Oracle database - a SQL database - and I was doing just like some pretty basic analysis on it for a school project, and then as soon as I saw Neo4j-- as soon as I saw Cypher, I already knew that a lot of my SQL queries would be so much easier in Cypher. I was already seeing that I would prefer to have Neo4j. So I moved it all to Neo4j, and then I also created that GraphGist of the flights, and that's the first data set that I learned Neo4j on and learned Cypher on.
RVB: Yeah, fantastic. So you mentioned that you thought, you know-- 
NW: The GraphGist on. 
RVB: Yeah. You mentioned that you thought that it would be a lot simpler than in SQL. Did that turn out to be true? Is that-- 
NW: Yeah. 
RVB: --one of the things that you like about it, or where does the love for Neo come from? 
NW: That has to be the first thing that hit me, was there are some SQL queries that I really struggled with. There was one - it was so simple to say in English. It was just like, "I want to see airports that, by definition, span multiple states." Because some airports are technically-- some over in the DC area, they technically sit in several states somehow, and writing that query in SQL was strangely hard. I had to use a PARTITION BY and something weird. I remember it was that query specifically, and when I saw Cypher, I was like, "That's going to be super easy," and it was. I took a SQL query that was probably, like, 20 lines and really hard to read, and put it into Cypher. And that's what I love about Neo4j, is that you can take a question that you've posed in English and very easily translate it into Cypher and vice versa. Like, I can take a Cypher query and then translate it back to English very easily, even if it's a data set I've never seen before, a Cypher query I've never seen before. I can easily scan through it and say, "This is what they're doing," in English, whereas when someone sends you a SQL query and that-- particularly with a data set you haven't worked with, translating it back to English is really hard. Just trying to scan everything that's going on in the cycle-- or in the SQL query, I think is very difficult. So I think from a collaboration standpoint, Neo4j is super awesome. Because I got a few of my classmates to work with me on putting all the flights data into Neo4j, and just collaborating across queries was much simpler because we could understand what-- 
RVB: Because of the readability, yeah. 
NW: The readability is just a huge factor for me, and I think that's probably what I love most about Neo4j, is Cypher, I would have to say [chuckles]. 
RVB: Yeah, very cool. You mentioned earlier that you were using it for data science, and I believe you're also doing a lot of talks on integrating Neo with R, right, with the R project. Can you tell us a little bit more about that maybe? 
NW: Yeah, so I wrote the R driver for Neo4j. It's called RNeo4j, and I use that internally here a lot as well, in addition to Python. Python and Neo4j do all the heavy work, and then any reporting, or analysis, or charting, visualization stuff, I'll spin up my R driver and pull Neo4j data into R for fancier statistics stuff which we've been doing recently for some new projects that we've just started here at work. The R driver, essentially, is just a wrapper for the REST API and it will pull Cypher query results into your R environment very easily, and then you can-- and that opens up a lot of doors for analysis purposes. But yeah, I've been doing a lot of talks around that. I do a meetup on the R driver probably like once every couple of months here in the Bay Area-- 
RVB: You should do that in Europe, Nicole. I mean-- 
NW: I should [laughter]. I'll be in Europe soon. I'll be there for GraphConnect London, so I'll probably do something with Mark while I'm over there, because he uses the R driver probably more than I do. If you look at his blog [chuckles]-- 
RVB: Yeah, exactly. So maybe wrapping up, Nicole, where do you think this is going? Where do you hope, where do you want it to go, and where do you think it will go? The evolution of graph databases is so quick these days, yet-- but what do you think [chuckles] is coming at us right now? 
NW: I just think we get to look forward to just a huge improvement in user experience from a user standpoint. I've been a user of Neo4j for a little bit over a year now and it's just crazy how quickly they improved the user experience. Just from 1.9, I think, is when I first saw it, 2.0 was huge, just the Neo4j browser has gotten so much better with the 2.2 release. 
RVB: Absolutely, yeah. 
NW: There's just so many nice, convenient-- like they're subtle, the changes are subtle, but when you're a super-heavy user of Neo4j, they really stand out. They've made some subtle changes to the Neo4j browser that I really like. I think I'm mostly looking forward to the huge improvements in user experience that are most likely continuing to come. Also, I think from my standpoint as well, the whole import process for Neo4j is going to continue getting more awesome because I feel like import was our biggest weakness when I first encountered Neo4j. We didn't have a lot of really easy-to-use tools. And within a year, now we have LOAD CSV, which is really easy, and then we have the 2.2 import tool, which is really easy and super-fast. I feel like that's also what I'm looking forward to, is continued improvements on the import part of Neo4j, because that's the first part you're going to encounter, right? As a new user, the first thing you're going to do is import your data, so I'm really happy that we've been putting so much work into that part. The whole import experience has gotten much better. 
RVB: I could not agree more [chuckles]. As you know, I'm in sales, and I know how important this is. Very good, thank you so much Nicole for coming online and doing this recording with me, I really appreciate it. 
NW: Thanks for having me. 
RVB: It was great having you on the podcast and I really appreciate it. Thank you again, and yeah, I look forward to seeing you at GraphConnect. 
NW: Yeah, I look forward to it as well. Have a good one. 
RVB: See you, bye.
Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik
