Note: at the request of Greg's employer, we have removed all references to his employer's name in this podcast.
What do Spatial Engineering, Epidemiology, and One Direction have in common? Well, of course, it would be our Neo4j graph database. Or rather: my guest on this week's Graphistania (also spelled as "graph is tinier", according to the nice folks at TranscribeMe), Greg Ricker, of {{HIDDEN CO}}. Greg wrote this really cool graphgist (see below) a while ago, but it turns out he's been doing a ton of fine and interesting computer science projects. Here's the conversation.
Here's the transcript of our conversation:
RVB: 00:02.790 Hello everyone. My name is Rik. Rik Van Bruggen from Neo Technology, and here I am again recording a little podcast episode for our "graphistania" podcast. And tonight I have a guest all the way from the U.S., Greg Ricker, from Greenfield, Maine. Hi, Greg.
GR: 00:19.692 Hello. How are you?
RVB: 00:20.611 I'm very, very well. Thanks for coming online, really appreciate it. Greg, you've been active in the Neo4j community for quite some time, but for our listeners, would you mind introducing yourself, and tell us a little bit about you?
GR: 00:36.068 I am a software engineer. I've been working at this for a little over 25 years, and I've done lots of things from embedded systems to larger enterprise Java-based systems. Spent a lot of time in the RFID world. Did some work in public health, and I'm now working for the exciting world of insurance.
RVB: 00:58.902 That is an exciting path that you've been on. And how did you get into the world of graphs there, Greg?
GR: 01:05.009 I started it-- got interested in it when I was involved with the public health in the state of Maine, and I was also involved in a Master's program at University of Maine in spatial engineering. And you could immediately see the relationship between what public health were doing with vaccinations and disease tracking. And it was a perfect fit for graphs, so you'd like to know who's got a vaccination, who's getting reports of diseases, and it's just easily trackable in a graph system. The relationships are just perfect for it.
RVB: 01:41.844 Perfect. Did you find anything interesting from that research that you did at the time?
GR: 01:47.923 Yeah, we found-- we were trying to use it with the geolocation, so we wanted to know what vaccinations were occurring at what locations, and how that tracked with disease reporting. And the thing that we found was that you really have to make sure you're looking backwards in time. So, if you have a disease outbreak today, you need to look back a couple of years and see what the vaccination rates were in that location back then. We made the mistake of looking for vaccinations around the same time period, and obviously if you have a disease outbreak you're going to wind up with vaccinations. So, after looking backwards, we do think there was probably-- you could see a lowering of incidences.
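The look-back idea Greg describes can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout (location, date), the function name `vaccinations_in_lookback`, and the default two-year look-back are hypothetical, not taken from the actual public health project.

```python
from datetime import date, timedelta


def vaccinations_in_lookback(vaccinations, location, outbreak_date,
                             lookback_years=2, window_days=365):
    """Count vaccinations at `location` in a one-window-long period that
    ends `lookback_years` before the outbreak (hypothetical record layout:
    each vaccination is a (location, date) tuple)."""
    window_end = outbreak_date - timedelta(days=365 * lookback_years)
    window_start = window_end - timedelta(days=window_days)
    return sum(
        1
        for loc, day in vaccinations
        if loc == location and window_start <= day < window_end
    )


if __name__ == "__main__":
    # Placeholder records, invented purely for illustration.
    records = [
        ("A", date(2012, 6, 1)),   # two years before the outbreak
        ("A", date(2014, 6, 10)),  # reaction to the outbreak itself
        ("B", date(2012, 6, 1)),
    ]
    outbreak = date(2014, 6, 15)
    print(vaccinations_in_lookback(records, "A", outbreak))
    print(vaccinations_in_lookback(records, "A", outbreak, lookback_years=0))
```

With `lookback_years=0` the count picks up vaccinations given in the run-up to the outbreak itself, which is the mistake Greg mentions; shifting the window back separates earlier coverage from that reaction.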
RVB: 02:30.561 Very interesting, and-- but you've done some really-- some very different things with Neo4j as well. I read one of your graphgists, I think it was around entertaining your lovely daughter [laughter] and her music choice.
GR: 02:45.021 Yes.
RVB: 02:46.972 I know how that feels, by the way [chuckles].
GR: 02:49.147 My daughters have always wondered what I do for a living because they're 19 and 16. I used to spend a lot of time trying to keep up with their musical interests. And so, for this graphgist this last time I thought, "Well, I could drag my younger daughter into this by looking at song lyrics from her - at that time - favorite band One Direction." And the real goal was because I've used Neo4j with Python and R, which are great additions to the whole environment. What I wanted to see was could you determine whether a song was happy or sad? Just sort of a sentiment analysis. Then we could come up with some other interesting things, like how many times do they use certain words, and things like that. And so, I got her involved in it so she could help me find the songs, and parse out the lyrics, and make sense of it. And then she actually went out and did a little research to find out what people thought might be happy songs and sad songs.
RVB: 03:51.799 I'm actually going to show that to my daughter, Greg, if you don't mind, because I know that that might help me explain graphs to her [laughter].
GR: 04:00.803 It really did, so she could see-- so we did things. You'd have one graph, part of the graph was the band name, and then you had the band members, and then you had albums, and if you click on an album and you could see the songs, and then you could click on a song, and you could see all the words. And so, it really exploded for her in that, "Okay, I can really visually see this thing." And it made sense to her, because if I had tried to explain it in a standard database, she would have just wandered off.
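The band, member, album, song, and word structure Greg describes could be sketched as a tiny in-memory graph in Python. Everything named here is an assumption for illustration: the `TinyGraph` class, the relationship types (`HAS_MEMBER`, `RELEASED`, `CONTAINS`, `HAS_WORD`), and the placeholder data are hypothetical and are not taken from Greg's graphgist, which was built in Neo4j.

```python
from collections import defaultdict


class TinyGraph:
    """Minimal directed graph: node -> list of (relationship type, node)."""

    def __init__(self):
        self.out = defaultdict(list)

    def relate(self, start, rel_type, end):
        self.out[start].append((rel_type, end))

    def expand(self, node, rel_type):
        """Follow one relationship type from a node, like clicking on it."""
        return [end for rel, end in self.out[node] if rel == rel_type]


def build_band_graph(band, members, albums):
    """`albums` maps album title -> {song title -> lyrics text}."""
    graph = TinyGraph()
    for member in members:
        graph.relate(("Band", band), "HAS_MEMBER", ("Member", member))
    for album, songs in albums.items():
        graph.relate(("Band", band), "RELEASED", ("Album", album))
        for song, lyrics in songs.items():
            graph.relate(("Album", album), "CONTAINS", ("Song", song))
            for word in lyrics.lower().split():
                graph.relate(("Song", song), "HAS_WORD", ("Word", word))
    return graph


if __name__ == "__main__":
    # Placeholder names and lyrics, invented for illustration only.
    graph = build_band_graph(
        "Example Band",
        ["Alice", "Bob"],
        {"Example Album": {"Example Song": "la la love"}},
    )
    print(graph.expand(("Band", "Example Band"), "RELEASED"))
    print(graph.expand(("Song", "Example Song"), "HAS_WORD"))
```

Each `expand` call mirrors one click in the visual browser: from the band to its albums, from an album to its songs, from a song to its words.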
RVB: 04:30.216 Too bored.
GR: 04:31.683 Yeah, but this was really-- this really made sense to her, and then we did things like look at how many words occur. One of the things you could do with the way we extracted the data was what's the most common first word in the first line? And then the sentiment analysis was a little tougher. Then probably the more interesting thing out of it was raw sentiment analysis might say, "If you see the word love in a phrase four times, it's a positive song." What we found is there's a lot more context to a song than there might be say for an email or something like that. So I Love You might be a positive song, but I Used to Love You might mean a negative song.
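The "most common first word in the first line" question mentioned above could be answered with a short Python sketch. The function name `most_common_first_word` and the input shape (song title mapped to lyrics text) are assumptions for illustration, not the way the data was actually extracted.

```python
import string
from collections import Counter


def most_common_first_word(songs):
    """Return (word, count) for the most frequent opening word across songs,
    or None if no song has any lyrics. `songs` maps title -> lyrics text."""
    first_words = Counter()
    for lyrics in songs.values():
        lines = [line for line in lyrics.splitlines() if line.strip()]
        if not lines:
            continue
        # Lower-case and strip punctuation so "You," and "you" count together.
        word = lines[0].split()[0].strip(string.punctuation).lower()
        if word:
            first_words[word] += 1
    return first_words.most_common(1)[0] if first_words else None


if __name__ == "__main__":
    # Placeholder lyrics, invented for illustration only.
    songs = {"a": "You are\nhere", "b": "you, go", "c": "I stay"}
    print(most_common_first_word(songs))
```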
RVB: 05:19.795 Of course.
GR: 05:20.696 And so, that was something that we hadn't really considered. And so, one of the things we were going to go back and do was to look at the relationship between positive and negative words or tense, some things like that within a line. Or how close could you find the word, a positive and a negative word together, and that sort of thing.
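The difference between raw word counting and the context Greg is describing can be sketched in Python as follows. The word lists, the "used to" rule, and all function names are hypothetical choices made for this illustration; they are not the analysis Greg and his daughter actually ran.

```python
import string

# Hypothetical, deliberately tiny word lists.
POSITIVE = {"love", "happy"}
NEGATIVE = {"cry", "lonely"}


def tokenize(line):
    """Lower-case words with surrounding punctuation removed."""
    words = (w.strip(string.punctuation).lower() for w in line.split())
    return [w for w in words if w]


def naive_score(line):
    """Raw sentiment: positive word count minus negative word count."""
    words = tokenize(line)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def context_score(line):
    """Same as naive_score, except a positive word directly preceded by
    "used to" is counted as negative (an assumed rule for past tense)."""
    words = tokenize(line)
    score = 0
    for i, word in enumerate(words):
        if word in POSITIVE:
            in_past = tuple(words[max(0, i - 2):i]) == ("used", "to")
            score += -1 if in_past else 1
        elif word in NEGATIVE:
            score -= 1
    return score


def closest_mixed_pair(line):
    """Smallest distance in words between a positive and a negative word
    on the same line, or None if the line lacks one of the two kinds."""
    words = tokenize(line)
    positives = [i for i, w in enumerate(words) if w in POSITIVE]
    negatives = [i for i, w in enumerate(words) if w in NEGATIVE]
    if not positives or not negatives:
        return None
    return min(abs(p - n) for p in positives for n in negatives)


if __name__ == "__main__":
    for line in ("I Love You", "I Used to Love You"):
        print(line, naive_score(line), context_score(line))
```

On the two titles from the conversation, the naive score rates both lines as positive, while the context-aware variant flips the second one. A single hand-written rule like this is only a starting point; real lyrics would need far more context than one phrase pattern.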
RVB: 05:42.973 So, Greg, why are graphs so interesting to you? Why do you use Neo4j for something like music analysis or epidemiology? What's so interesting about it?
GR: 05:55.851 When you look at it, graphs are about relationships. And when you start looking at it from that point of view, it's just a natural. It's easier to understand, it's easier to explain. And to me it's a lot faster when I think about how to process it than it is when I'm looking at tables, and joins, and foreign keys, and all that other stuff. It just seems like a much more natural way of storing data and retrieving data.
RVB: 06:26.369 So, that's the model really? And yet the model is so attractive for explaining data, is that what I'm hearing?
GR: 06:32.135 Yeah, but it just makes sense. At {{HIDDEN CO}} where I'm working, they store a lot of documents. And I'm working on a research project right now to use Neo4j as part of their document storage. And it really makes sense, because you've got customers, and you've got claims, and you have policies. And we don't want to store the documents themselves in Neo, we'll store them on some other, like an Amazon cloud. But the metadata, the relationships between the policies, and the claims, and the customers fit really very well in Neo4j.
RVB: 07:08.959 Is there anything other than the model that you think is so powerful at it? Is it performance related or is it--?
GR: 07:16.707 Yeah, performance has been very good, but the tools, like I said, I'm really in love with the Python and the R components for it. And obviously, the community support's excellent for all of this stuff. When I first started doing it, I was using Java with Spring, and I was blown away to find out that they already had a Spring component for it, so the support is good. One of the challenges I have at my job right now is to see how well it works with a database that has a billion records in it.
RVB: 07:56.092 There's a-- size does matter. Right [chuckles]?
GR: 08:01.179 Yeah.
RVB: 08:01.779 You'll probably want to take a good hard look at that, and obviously, that's one of the things that the company behind Neo4j - my employer - tries to help with. So, you'll let us know when you need us, but let's take a look at the future, Greg. Where is this going for you personally, but also as an industry? Where do you think this is going?
GR: 08:24.702 One of the things that I see is a match-- is a process where they're saying one database doesn't fit all. In the case of {{HIDDEN CO}}, you have documents, but you have metadata. We're not going to store the documents in Neo4j, we'll store them in Mongo or Couch, and then we'll store the metadata in Neo4j. I think what I'm seeing a lot of is a mix and match. Let's use document stores where they're good, and let's use other databases where they're good as well, and not everybody is going to go off to Mongo or Couch or Neo4j. So that's where I think the future is going to be, a lot more mix and match and--
RVB: 09:16.112 Polyglot persistence?
GR: 09:17.514 Yeah, that's what I was thinking, polyglot persistence. Yes. I think that's the future. But there's-- even a company like {{HIDDEN CO}} is now fully on board with the NoSQL model, and that's a broad, vague term to a lot of people. But the fact that they're not married to a single database model anymore is making life easier.
RVB: 09:42.212 Super great to hear. Thank you so much, Greg, for taking the time to talk to us about that. I'll put some links to some of your blog posts and graphgists on the transcription page whenever we publish the podcast. But for now I think we will wrap up this recording, and thank you very much for coming online. It really was a joy talking to you.
GR: 10:04.864 Thank you. I appreciate being asked. Thank you.
RVB: 10:06.756 Fantastic. Talk to you later. Bye.
GR: 10:08.609 Okay.
All the best
Rik