Monday 26 September 2016

Podcast Interview with Sascha Peukert, TU Dresden

Another week, another podcast episode! Here's a great chat with a graphista who has not even started his professional career yet, but whose career, by the sounds of it, could really be a stellar one. My chat with Sascha Peukert in Germany:

Here's the transcript of our conversation:
RVB: 00:04.386 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo Technology, and here we are recording another podcast session for our Graphistania podcast. And today I have a very interesting guest on the podcast from Germany - from Dresden. And that's Herr Sascha Peukert. Hi, Sascha. 

Friday 16 September 2016

Podcast Interview with Greg Ricker, {{HIDDEN CO}}

Note: at the request of Greg's employer, we have removed all references to his employer's name in this podcast

What do Spatial Engineering, Epidemiology, and One Direction have in common? Well, of course, it would be our Neo4j graph database. Or: my guest on this week's Graphistania (also spelled as "graph is tinier", according to the nice folks at TranscribeMe), Greg Ricker, of {{HIDDEN CO}}. Greg wrote this really cool GraphGist (see below) a while ago, but it turns out he's been doing a ton of fine and interesting computer science projects. Here's the conversation.

Here's the transcript of our conversation:
RVB: 00:02.790 Hello everyone. My name is Rik. Rik Van Bruggen from Neo Technology, and here I am again recording a little podcast episode for our "graphistania" podcast. And tonight I have a guest all the way from the U.S., Greg Ricker, from Greenfield, Maine. Hi, Greg. 
GR: 00:19.692 Hello. How are you? 
RVB: 00:20.611 I'm very, very well. Thanks for coming online, really appreciate it. Greg, you've been active in the Neo4j community for quite some time, but for our listeners, would you mind introducing yourself, and tell us a little bit about you? 
GR: 00:36.068 I am a software engineer. I've been working at this for a little over 25 years, and I've done lots of things from embedded systems to larger enterprise Java-based systems. Spent a lot of time in the RFID world. Did some work in public health, and I'm now working for the exciting world of insurance. 
RVB: 00:58.902 That is an exciting path that you've been on. And how did you get into the world of graphs there, Greg? 
GR: 01:05.009 I started it-- got interested in it when I was involved with the public health in the state of Maine, and I was also involved in a Master's program at University of Maine in spatial engineering. And you could immediately see the relationship between what public health were doing with vaccinations and disease tracking. And it was a perfect fit for graphs, so you'd like to know who's got a vaccination, who's getting reports of diseases, and it's just easily trackable in a graph system. The relationships are just perfect for it. 
RVB: 01:41.844 Perfect. Did you find anything interesting from that research that you did at the time? 
GR: 01:47.923 Yeah, we found-- we were trying to use it with the geolocation, so we wanted to know what vaccinations were occurring at what locations, and how that tracked with disease reporting. And the thing that we found was that you really have to make sure you're looking backwards in time. So, if you have a disease outbreak today, you need to look back a couple of years and see what the vaccination rates were in that location back then. We made the mistake of looking for vaccinations around the same time period, and obviously if you have a disease outbreak you're going to wind up with vaccinations. So, after looking backwards, we do think there was probably-- you could see a lowering of incidences. 
RVB: 02:30.561 Very interesting, and-- but you've done some really-- some very different things with Neo4j as well. I read some of your GraphGists. I think it was around entertaining your lovely daughter [laughter] and her music choice.
GR: 02:45.021 Yes. 
RVB: 02:46.972 I know how that feels, by the way [chuckles]. 
GR: 02:49.147 My daughters have always wondered what I do for a living because they're 19 and 16. I used to spend a lot of time trying to keep up with their musical interests. And so, for this GraphGist this last time I thought, "Well, I could drag my younger daughter into this by looking at song lyrics from her - at that time - favorite band, One Direction." And the real goal was, because I've used Neo4j with Python and R, which are great additions to the whole environment, what I wanted to see was: could you determine whether a song was happy or sad? Just sort of a sentiment analysis. Then we could come up with some other interesting things, like how many times do they use certain words, and things like that. And so, I got her involvement in it so she could help me find the songs, and parse out the lyrics, and make sense of it. And then she actually went out and did a little research to find out what people thought might be happy songs and sad songs. 
RVB: 03:51.799 I'm actually going to show that to my daughter, Greg, if you don't mind, because I know that that might help me explain graphs to her [laughter]. 
GR: 04:00.803 It really did, so she could see-- so we did things. You'd have one graph, part of the graph was the band name, and then you had the band members, and then you had albums, and if you click on an album and you could see the songs, and then you could click on a song, and you could see all the words. And so, it really exploded for her in that, "Okay, I can really visually see this thing." And it made sense to her, because if I had tried to explain it in a standard database, she would have just wandered off. 
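The drill-down Greg describes here (band, then members and albums, then songs, then words) can be sketched as a small in-memory graph. This is only an illustration in plain Python, not Greg's GraphGist and not Neo4j code; the relationship names (HAS_MEMBER, RELEASED, CONTAINS, USES_WORD) and all sample data are assumptions made up for the example.

```python
from collections import defaultdict

# Adjacency list: node -> list of (relationship, neighbour) pairs.
# Relationship names and sample data are hypothetical.
graph = defaultdict(list)

def relate(start, rel, end):
    """Add a directed, typed relationship from start to end."""
    graph[start].append((rel, end))

relate("Band: Example Band", "HAS_MEMBER", "Member: Alice")
relate("Band: Example Band", "HAS_MEMBER", "Member: Bob")
relate("Band: Example Band", "RELEASED", "Album: First Album")
relate("Album: First Album", "CONTAINS", "Song: Happy Tune")
relate("Song: Happy Tune", "USES_WORD", "Word: love")
relate("Song: Happy Tune", "USES_WORD", "Word: you")

def expand(node, rel):
    """'Click' on a node: return the neighbours reached over one relationship type."""
    return [end for r, end in graph[node] if r == rel]

# Drill down the way the transcript describes: band -> album -> song -> words.
albums = expand("Band: Example Band", "RELEASED")
songs = expand(albums[0], "CONTAINS")
words = expand(songs[0], "USES_WORD")
print(albums, songs, words)
```

Each "click" is just one hop along a typed relationship, which is roughly why the model is easy to explain visually.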
RVB: 04:30.216 Too bored. 
GR: 04:31.683 Yeah, but this was really-- this really made sense to her, and then we did things like look at how many words occur. One of the things you could do with the way we extracted the data was what's the most common first word in the first line? And then the sentiment analysis was a little tougher. Then probably the more interesting thing out of it was raw sentiment analysis might say, "If you see the word love in a phrase four times, it's a positive song." What we found is there's a lot more context to a song than there might be say for an email or something like that. So I Love You might be a positive song, but I Used to Love You might mean a negative song. 
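The two analyses Greg mentions, counting the most common first word and scoring "raw" sentiment by counting words, might look roughly like the sketch below. The word lists, the lyrics, and the function names are all invented for illustration; this is not the code from the GraphGist. It also shows the weakness he points out: a pure word count scores "I used to love you" as positive.

```python
from collections import Counter

# Tiny hypothetical word lists; a real lexicon would be far larger.
POSITIVE = {"love", "happy", "smile"}
NEGATIVE = {"cry", "lonely", "goodbye"}

def tokenize(line):
    """Lowercase a line and split it into words, dropping basic punctuation."""
    return [w.strip(".,!?\"'") for w in line.lower().split()]

def first_word_counts(songs):
    """Count the first word of the first line of each song."""
    return Counter(
        tokenize(lines[0])[0]
        for lines in songs.values()
        if lines and lines[0].strip()
    )

def naive_score(lines):
    """Positive words minus negative words, ignoring all context."""
    words = [w for line in lines for w in tokenize(line)]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Made-up lyrics for illustration only.
songs = {
    "Song A": ["I love you", "You make me smile"],
    "Song B": ["I used to love you", "Now I cry"],
}

print(first_word_counts(songs).most_common(1))
print({title: naive_score(lines) for title, lines in songs.items()})
# The context problem: this line still scores as positive.
print(naive_score(["I used to love you"]))
```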
RVB: 05:19.795 Of course. 
GR: 05:20.696 And so, that was something that we hadn't really considered. And so, one of the things we were going to go back and do was to look at the relationship between positive and negative words or tense, some things like that within a line. Or how close could you find the word, a positive and a negative word together, and that sort of thing. 
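The follow-up idea, checking how close a positive and a negative word sit within one line, could be sketched like this. It is an assumption about what such a check might look like, not something the conversation spells out; the word lists and the function name are hypothetical.

```python
# Hypothetical word lists, for illustration only.
POSITIVE = {"love", "happy", "smile"}
NEGATIVE = {"never", "cry", "goodbye"}

def closest_opposite_pair(line):
    """Return the smallest word distance between a positive and a negative
    word on the same line, or None if the line lacks one of the two."""
    words = [w.strip(".,!?\"'") for w in line.lower().split()]
    pos = [i for i, w in enumerate(words) if w in POSITIVE]
    neg = [i for i, w in enumerate(words) if w in NEGATIVE]
    if not pos or not neg:
        return None
    return min(abs(p - n) for p in pos for n in neg)

for line in ["I will never love again", "Love is gone and now I cry", "I love you"]:
    print(line, "->", closest_opposite_pair(line))
```

A small distance suggests the positive word is being qualified by something nearby, which is one cheap signal that a plain word count may be misleading for that line.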
RVB: 05:42.973 So, Greg, why are graphs so interesting to you? Why do you use Neo4j for something like music analysis or epidemiology? What's so interesting about it? 
GR: 05:55.851 When you look at it, graphs are about relationships. And when you start looking at it from that point of view, it's just a natural. It's easier to understand, it's easier to explain. And to me it's a lot faster when I think about how to process it than it is when I'm looking at tables, and joins, and foreign keys, and all that other stuff. It just seems like a much more natural way of storing data and retrieving data. 
RVB: 06:26.369 So, that's the model really? And yet the model is so attractive for explaining data, is that what I'm hearing? 
GR: 06:32.135 Yeah, but it just makes sense. At {{HIDDEN CO}} where I'm working, they store a lot of documents. And I'm working on a research project right now to use Neo4j as part of their document storage. And it really makes sense, because you've got customers, and you've got claims, and you have policies. And we don't want to store the documents themselves in Neo, we'll store them on some other store, like an Amazon cloud. But the metadata, the relationships between the policies, and the claims, and the customers fit really very well in Neo4j. 
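The split Greg describes, documents in an external store and only metadata and relationships in the graph, can be sketched as follows. Everything here is hypothetical: the node ids, labels, relationship names, and the `objectstore://` URI scheme are placeholders, and a real system would run this as a graph query rather than a Python loop.

```python
# Nodes keyed by id; each holds a label and properties. The Document node
# stores only a pointer (URI) to the file, never the file itself.
nodes = {
    "cust-1": {"label": "Customer", "name": "Example Customer"},
    "pol-1": {"label": "Policy", "number": "P-001"},
    "claim-1": {"label": "Claim", "number": "C-001"},
    "doc-1": {"label": "Document", "uri": "objectstore://bucket/claim-form.pdf"},
}
edges = [
    ("cust-1", "HOLDS", "pol-1"),
    ("claim-1", "FILED_UNDER", "pol-1"),
    ("doc-1", "ATTACHED_TO", "claim-1"),
]

def neighbours(node_id):
    """Yield adjacent nodes, treating relationships as undirected."""
    for start, _, end in edges:
        if start == node_id:
            yield end
        elif end == node_id:
            yield start

def documents_for(customer_id):
    """Walk outward from a customer and collect URIs of reachable documents."""
    seen, stack, uris = {customer_id}, [customer_id], []
    while stack:
        current = stack.pop()
        if nodes[current]["label"] == "Document":
            uris.append(nodes[current]["uri"])
        for nxt in neighbours(current):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return uris

print(documents_for("cust-1"))
```

The traversal returns where the document lives; fetching the bytes is then a separate call to whatever store holds them.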
RVB: 07:08.959 Is there anything other than the model that you think is so powerful at it? Is it performance related or is it--? 
GR: 07:16.707 Yeah, performance has been very good, but the tools, like I said, I'm really in love with the Python and the R components for it. And obviously, the community support's excellent for all of this stuff. When I first started doing it, I was using Java with Spring, and I was blown away to find out that they already had a Spring component for it, so the support is good. One of the challenges I have at my job right now is to see how well it works with a database that has a billion records in it.
RVB: 07:56.092 There's a-- size does matter. Right [chuckles]? 
GR: 08:01.179 Yeah. 
RVB: 08:01.779 You'll probably want to take a good hard look at that, and obviously, that's one of the things that the company behind Neo4j - my employer - tries to help with. So, you'll let us know when you need us, but let's take a look at the future, Greg. Where is this going for you personally, but also as an industry? Where do you think this is going? 
GR: 08:24.702 One of the things that I see is a match-- is a process where they're saying one database doesn't fit all. In the case of {{HIDDEN CO}}, you have documents, but you have metadata. We're not going to store the documents in Neo4j, we'll store it in Mongo or Couch, and then we'll store the metadata in Neo4j. I think what I'm seeing a lot of is a mix and match. Let's use document stores where they're good, and let's use other databases where they're good as well, and not everybody is going to go off to Mongo or Couch or Neo4j. So that's where I think the future is going to be is a lot more mix and match and-- 
RVB: 09:16.112 Polyglot persistence? 
GR: 09:17.514 Yeah, that's what I was thinking, polyglot persistence. Yes. I think that's the future. But there's-- even a company like {{HIDDEN CO}} is now fully on board with the NoSQL model, and that's a broad, vague term to a lot of people. But the fact that they're not married to a single database model anymore is making life easier. 
RVB: 09:42.212 Super great to hear. Thank you so much, Greg, for taking the time to talk to us about that. I'll put some links to some of your blog posts and GraphGists on the transcription page whenever we stay at the podcast. But for now I think we will wrap up this recording, and thank you very much for coming online. It really was a joy talking to you. 
GR: 10:04.864 Thank you. I appreciate being asked. Thank you. 
RVB: 10:06.756 Fantastic. Talk to you later. Bye. 
GR: 10:08.609 Okay.
Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!

All the best


Thursday 8 September 2016

Podcast Interview with Dustin Cote, Luther College

I have said it before, but I am going to say it again: thanks to the wonderful Bryce Merkl Sasaki at Neo4j, I have had the pleasure of talking to sooooooo many wonderful guests on the podcast. People that I would otherwise probably never have talked to, for all kinds of reasons. Another one of these folks is my guest on the podcast today - Dustin Cote. An active graph enthusiast in the heartland of the Midwest - and a great, knowledgeable database engineer at that, who I really enjoyed talking to. Here's our conversation:
Here's the transcript of our conversation:
RVB: 00:02.617 Hello everyone. My name is Rik Van Bruggen from Neo Technology and here I am recording another podcast session for the Neo4j Graphistania podcast. Tonight, I'm joined by someone from the beautiful Midwest in the USA. I've got Dustin Cote from Decorah, Iowa on the Skype call. Hi Dustin. How are you? 
DC: 00:25.316 Hi Rik. I'm doing great. How are you?
RVB: 00:27.268 Very, very good. Thank you for joining me. It's always great to have people from different parts of the world on this podcast. I've read some of your work in the communities and in the GraphGist, Dustin, but most of our listeners probably haven't yet. Do you mind introducing yourself to us and telling us who you are, what you do, and what your relationship is with the wonderful world of graphs? 
DC: 00:53.085 All right. My name is Dustin Cote. I currently live in the northeast corner of Iowa in a small rural town by choice. I work for Luther College. It's a small, private college and I just started last year. Prior to that, I was working at the University of Wisconsin-Madison and worked there for seven years as a PeopleSoft programmer analyst with emphasis in database design. Before that, I was a data warehouse administrator, and before that, I was doing stuff with databases and before that, I was learning to program in seventh grade. I've been around for a while, well seasoned as they say. That's what I'm working on, mainly because of my experience with ERP systems. Seems like no one wants to work on old technology, so there's always a niche for that. 
RVB: 01:45.121 The world would stop turning without old technology, I think. 
DC: 01:49.133 It would, believe it or not. 
RVB: 01:50.191 It would, I absolutely do. Dustin, what's your relationship with Neo4j and graph databases? How did you get into them? 
DC: 01:59.511 Very recently, there was a competition for GraphGist on Neo4j and I thought, "What a great way to finally finish one of my project ideas." It had a deadline and basically, it was the only thing I needed to finish the project. I put together a conference data model, because I was going to a conference and I wanted to know different things from the booklet they gave you, but because it's in a certain order, you can't find out certain things. I knew that a graph database would be the perfect way to query on different angles and different ways of looking at your data. Before that, I would say back in 2008 when one of our companies moved and I was worried about competing against 25 Java developers, unleashed onto the town for jobs, I went back to school. Even though it turned out not to be an issue being reemployed, I continued to go back to school and get my masters. It was then that I was doing a research paper and one of the papers I was doing was to debunk over-hyped technologies. One of them was the Semantic Web.
DC: 03:03.407 I heard of the Web 3.0 and thought to myself that this has been over-hyped. I've heard about it for years and I haven't seen anything about it. So while researching it, I realized that this is exactly what I had been looking for to solve so many of my own database projects. For example, I had some projects where I was saving just sparse data. So, for instance, when you save music, MP3s or songs, it's usually saved as artist, album and song name. Well, if you talk to classical music enthusiasts, they care about orchestras and maybe the conductor, or the original composer. They look at different things, but it's the same item almost, and you still have to store it. You know, by the time I was done designing all the different kinds of genres, there was just no database for relational databases that could handle it well. And when I looked into Semantic Web, I realized that this was the solution. And back then, there was a NoSQL movement. I would say Semantic Web - since it saves as tuples or triples - I believe it's based on graph databases, a little more formal perhaps. And so my natural-- so all of the uses I found for Semantic Web I also find useful for Neo4j. And at the time, when I was choosing which graph database to use, Neo4j was the one that I had heard the most about. It had the best reviews at the time. And so I decided to spend the little time I had with raising two kids to learn a new technology. And so that's why I picked Neo4j. 
RVB: 04:40.390 Interesting, yeah. It's very much related technology, and that we could talk for more than what we have in this podcast about the differences between Semantic Web databases and Neo4j, because they are quite different. But [chuckles], we'll take that off-line. But they are different but related technologies, I would say. What did you like about graph database then? What problems was it solving for you? Why was it such a good fit for some of the things that you had found problematic in other databases? 
DC: 05:18.521 One of the things I've always liked is how you can make non-obvious connections. You might have two different graph sets that are unconnected at the moment, but some time in the future if you're still collecting data you might [inaudible] two nodes together, and then suddenly your same query would return different results, and, in fact, maybe solve a problem. I think one of the examples I saw long ago was perhaps besides the NASA and CIA kind of examples, would be different language authors, and they might have a different name in one language than another. You might be following him and you suddenly make the connection and realize that they have all this other body of work. So I like those kind of solutions that the graph database is continuously growing and making connections for you. 
RVB: 06:08.085 Yes, like inferring new paths between different parts of the graph. Is that what I'm hearing? 
DC: 06:15.632 Well-spoken. Yeah, exactly [laughter]. 
RVB: 06:19.269 Yeah, well. I mean that is a-- that's one of the examples that I always give. It's like the path finding. I've got these two things and how are they connected to each other, and show me those connections. Whether you're talking about a [BR?] data set or a social network or recommendation engine, that's one of the most powerful use case-- those hidden connections as you call it. 
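The "same query, different result" effect Dustin and Rik are discussing can be shown with a plain breadth-first search. This is a minimal sketch under made-up data (two pen names and one book each); it stands in for the kind of shortest-path query a graph database would run and is not taken from either speaker's work.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Breadth-first search over undirected edges; returns a list of nodes or None."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    queue, parents = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in adjacency.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None

# Two unconnected sets: one pen name and its book per language (made-up data).
edges = [("Pen name A", "Book 1"), ("Pen name B", "Book 2")]
before = shortest_path(edges, "Book 1", "Book 2")

# Later we learn both names belong to the same person and add one relationship.
edges.append(("Pen name A", "Pen name B"))
after = shortest_path(edges, "Book 1", "Book 2")

print(before)
print(after)
```

The query itself never changes; only one new relationship is added, and the two bodies of work become reachable from each other.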
DC: 06:41.392 Right, and with database design, when you're working in a large company, you have to spend so much time, ahead of time, designing things. And then once it's approved, and then once it's implemented, many months or years can go by and once it's done, you're pretty much nail in the coffin, and you really don't want to change it again because you don't know what you might change. With graph databases, sometimes you can add a whole new set of features or properties, and it won't affect the past data you had. It'll actually just enrich the data you have in your new applications that can leverage it. That's very powerful. 
RVB: 07:16.441 That's such a powerful point that you're making there, and I think it's also why it's like a perfect storm now for graph databases because of the whole agile development paradigm as well. You know, people don't develop waterfall systems anymore. You know, they try to take a much leaner approach to software development. You don't know what you don't know when you're developing a new system. It's a great fit for that, I think, as well. Dustin, where do you think this is going? Where do you, personally, want to take this? I'll put some of the links to your work on the blog post with this podcast recording, but where do you see it going? What do you want to do with it? Where do you see the industry taking it? 
DC: 08:04.811 Where I personally want to take it is, perhaps, more projects that can help categorize interests. I've been an internal description, is my industry. I might be interested in homesteading or something. There are podcasts out there that are in the 2,000 episode range and it would be great to be able to categorize those and find exactly what you're looking for, maybe even the author or the interviewee. Some day, maybe you'll have 2,000 interviews here. 
RVB: 08:33.387 Oh my God [chuckles]. 
DC: 08:35.375 You never know how you can tie all those people together. People bring such great resources from their work. You know, [hack the plans?], it's more like categorize the plan. I think graph databases are good for that. As far as the world's future, I think this is a tool that's generic enough and powerful enough that someone out there that doesn't even know anything about this yet is going to come up with an idea, use your project and platform, and come up with something new. It's just that kind of technology that you're creating and it's really going to be anyone's imagination really, something that we haven't even thought of yet. 
RVB: 09:18.040 Like a platform for innovation, basically? Like doing--? 
DC: 09:20.356 Absolutely. 
RVB: 09:22.665 Super relevant. I really like that perspective. I really do. I think, for example, in the past couple of months, we've seen at Neo4j when some of the Panama Papers research was published and stuff like that. That was for us fantastic validation. That would never have happened ten years ago and now it's happening because of the small contribution that we're making, I guess. 
DC: 09:49.082 Absolutely. I like your ramp up time to test out Neo4j with that web interface you have, where you can start importing right away. It's a great tool. 
RVB: 09:57.583 Very cool. Very nice. Thank you, Dustin, for spending your time with me on this recording. As you know, I like to keep these things short and snappy and digestible for everyone. We'll put some of the links to your work with the transcription, but for now, I want to thank you very much for coming online. I hope to meet you in person, face-to-face, at some point [chuckles]. 
DC: 10:21.682 It's my pleasure and keep up the good work. 
RVB: 10:23.838 Thank you so much. Bye. 
DC: 10:25.481 Bye.