So today's podcast was a particularly nice and insightful talk with David Meza of NASA. David is using Neo4j at a small scale today to enable true knowledge management on NASA's "Lessons Learned" data. He can tell you all about that himself, but rest assured that it was truly interesting.
Here's the transcript of our conversation:
RVB: 00:02 Hello everyone. My name is Rik, Rik van Bruggen from Neo Technology, and here we are again recording a wonderful session for the Neo4j graph database podcast. I'm joined today all the way from Texas in the U.S.A. by David Meza from NASA. Hello David.
DM: 00:18 Hello Rik, glad to be here.
RVB: 00:19 It's wonderful to have you on the podcast, thanks for making the time.
DM: 00:22 Oh, my pleasure.
RVB: 00:23 Well, it's not every day that we have a space-related guest on the podcast, so it's particularly exciting, especially because you've done some wonderful things with Neo4j. But, David, would you mind introducing yourself a little bit to our audience?
DM: 00:41 Sure. Again, my name is David Meza. I am the Chief Knowledge Architect at NASA, stationed out of Johnson Space Center in Houston. As Chief Knowledge Architect, my primary role is to look at the technology roadmap for the knowledge services of our Chief Knowledge Office. Basically, they're doing this by merging information architecture and knowledge management in such a way that they can provide our users with all the tools, processes, and applications to really extract the golden nuggets of knowledge from our critical data.
RVB: 01:17 Wow, that sounds really exciting. I can imagine that NASA has a lot of knowledge, and you must have a lot of interesting things that you've been working on. I read your blog post about the lessons learned database. David, can you tell us a little bit more about that?
DM: 01:32 Sure. What I noticed with our lessons learned database is that, while most folks find lessons learned very important and want to make sure they can take the lessons from our past projects and programs and implement them in our future programs and projects, most people tend not to really look through them, because they find it very difficult to find information inside the lessons learned. So I needed to find another way of looking at these lessons learned that would give users, one, a better way to search through the lessons, and two, a better way to connect the dots between lessons, so that they find all the lessons relevant to what they're looking for.
RVB: 02:20 How did you get into graph databases? How did that connect with the wonderful world of graphs?
DM: 02:30 Well, fairly recently, about two or three years ago, I took a course on social network analysis, because I was really looking at how to develop expert finders within our organizations and how to make those connections between people. So when an engineer posed this question about our lessons learned, how to make it easier to find information and how to connect lessons together, it just dawned on me that if a graph database works for people, it should also work for documents, because they're also related in many different ways.
RVB: 03:02 So basically, you're looking at a graph of documentation, a knowledge base. Is that what I'm hearing?
DM: 03:10 Correct. We're taking the lessons learned and, after applying various topic modeling algorithms to them, we can build relationships and connections between the documents based on the unstructured data inside them. For example, by using a topic model algorithm, I can create groups of topics for each of the lessons and eventually correlate them together based on self-assigned categories. In doing so, I build nodes and relationships between these documents.
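To make the idea concrete, here is a minimal Python sketch under stated assumptions: each lesson already has a list of topic weights (for example, the per-document output of a topic model), a topic counts for a lesson when its weight reaches an arbitrary 0.2 threshold, and lessons are linked through shared topic nodes. The `Lesson` and `Topic` labels, the `HAS_TOPIC` relationship type, and all function names are hypothetical, not NASA's actual model. The code only builds parameterized Cypher `MERGE` statements; it does not connect to a database.

```python
from itertools import combinations

# Hypothetical Cypher for one lesson-topic link; $-prefixed names are query parameters.
LINK_QUERY = (
    "MERGE (l:Lesson {id: $lesson}) "
    "MERGE (t:Topic {id: $topic}) "
    "MERGE (l)-[r:HAS_TOPIC]->(t) "
    "SET r.weight = $weight"
)


def topic_links(doc_topics, threshold=0.2):
    """Turn per-document topic weights into (lesson, topic, weight) links.

    Only topics whose weight reaches `threshold` are kept, so each lesson
    ends up attached to its few dominant topics.
    """
    links = []
    for lesson, weights in sorted(doc_topics.items()):
        for topic, weight in enumerate(weights):
            if weight >= threshold:
                links.append((lesson, f"topic-{topic}", weight))
    return links


def related_lessons(links):
    """Pairs of lessons connected through at least one shared topic node."""
    by_topic = {}
    for lesson, topic, _ in links:
        by_topic.setdefault(topic, set()).add(lesson)
    pairs = set()
    for lessons in by_topic.values():
        pairs.update(combinations(sorted(lessons), 2))
    return sorted(pairs)


def cypher_statements(links):
    """One parameterized MERGE statement per lesson-topic link."""
    return [
        (LINK_QUERY, {"lesson": lesson, "topic": topic, "weight": weight})
        for lesson, topic, weight in links
    ]


if __name__ == "__main__":
    # Made-up topic weights for three lessons over three topics.
    doc_topics = {
        "lesson-A": [0.7, 0.2, 0.1],
        "lesson-B": [0.1, 0.8, 0.1],
        "lesson-C": [0.5, 0.1, 0.4],
    }
    links = topic_links(doc_topics)
    print(related_lessons(links))
    for query, params in cypher_statements(links):
        print(params)
```

Each `(query, params)` pair could be passed to a driver session, for example `session.run(query, params)` in the official Neo4j Python driver. Linking lessons through topic nodes, rather than creating direct lesson-to-lesson edges, keeps the number of relationships proportional to the number of lessons and lets a user hop from a lesson to its topics and on to related lessons.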
RVB: 03:48 I read your blog post and what struck me was that you actually use some other tools together with Neo4j as well. You use R for the topic modelling, I believe, is that true?
DM: 03:59 That's correct. I'm a statistician at heart, and R has been one of my tools for many years. I use R to run Latent Dirichlet Allocation (LDA), a topic modeling algorithm, against the documents. But that only gets me to the point of understanding my documents as an analyst. I needed something more for my end users going forward.
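For readers who haven't met LDA: David runs it in R, and the conversation doesn't say which package or settings he uses. Purely as an illustration, here is a minimal Python sketch of collapsed Gibbs sampling, one standard way to fit an LDA model. The function name, the hyperparameters `alpha` and `beta`, the iteration count, and the toy tokens are all assumptions, not his setup.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic weights.

    docs is a list of token lists (already cleaned and tokenized).
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    V = len(vocab)
    topics = range(num_topics)

    # Count tables: document-topic, topic-word and per-topic totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in topics]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for word in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v, k = word_id[word], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed document-topic proportions (theta); each row sums to 1.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]


if __name__ == "__main__":
    # Made-up, pre-tokenized "lessons" for illustration only.
    lessons = [
        "valve leak seal pressure test".split(),
        "seal leak valve inspection".split(),
        "schedule budget review risk".split(),
    ]
    for row in lda_gibbs(lessons, num_topics=2, iterations=100):
        print([round(p, 2) for p in row])
```

The rows returned here are the kind of per-document topic weights that can then be thresholded to decide which topics a lesson is attached to in the graph.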
RVB: 04:25 And that's where you also looked at things like visualization tools, I think, like Linkurious and that type of technology?
DM: 04:33 Correct. Having read through the Neo4j books and tried to get as much information as I could on graph databases, I found a section in one of the books that talked about different types of visualization tools. So I did an evaluation of various applications, and I found that Linkurious really showed, at least for the end user, a very easy way to walk through the graph and its relationships as you're trying to find information.
RVB: 05:04 They're from Europe, they're from Paris, and we've been doing a lot of projects with them recently. Actually, it's a very nice tool set.
DM: 05:14 Definitely.
RVB: 05:15 How many users are using this solution right now, David? Is it in production or is this just experimental right now?
DM: 05:23 It started out as an experiment, just to showcase the technology and what it could do, so that I could secure a little bit of budget to expand. But at this point I'm probably up to 150 to 200 different users who are using the tool set to look through the information. As I move forward and start showcasing this to some of our centers, I hope to expand it to several thousand users over the next year or so.
RVB: 05:54 Will you expand it by adding more topic models, more project information, more documents, or how should I see that?
DM: 06:03 Correct. What I'm doing here is showcasing how we can use these types of machine learning algorithms and visualization tools against our critical data to make it easier for users to find that information. More and more groups are coming forward and asking me to help them visualize their data, based on the article I wrote.
RVB: 06:26 That seamlessly sort of brings me to my last question, if you don't mind, and that's, where is this going? What's in store in the future, David, any insights or perspectives on that?
DM: 06:42 My ultimate goal, of course, is to expand how we visualize our data for our end users. I see a fairly decent connection between Neo4j and other NoSQL databases, such as MongoDB or other document databases, to help with two things: one, to capture all the documentation or information in a document database, and two, to have the two work together, with Neo4j communicating with MongoDB so that relationships are automatically created in Neo4j based on how the information is entered into the databases.
RVB: 07:23 So the entire topic modeling phase would be automated, sort of, or is that something different?
DM: 07:32 Correct. It would be automated in the sense that a user would just have to enter or upload their information according to certain criteria. The work then lies in the topic modeling: playing with those algorithms and trying to find the best algorithm for the data, so that we can visualize it correctly.
RVB: 07:54 Super exciting, and I'm assuming that this will then get us to Mars, right [chuckles]?
DM: 07:59 I'm hoping that it at least gets us the information necessary to get to Mars a lot quicker. My goal is to get that information to our engineers and our scientists faster, because finding information is difficult not just for NASA but for many organizations. Various surveys and research analyses have found that it generally takes an engineer anywhere from eight to 13 different searches to find the information they want, and that takes time. I want to shorten that time frame.
RVB: 08:31 That makes so much sense. It's a very popular use case for Neo4j, this graph-based search as we call it. When you're talking about engineering documents or legal documents or published articles, it's a very popular use case and I'm very excited that you guys are using it as well. This is really cool. Thank you for sharing that with us.
DM: 08:54 You're quite welcome.
RVB: 08:55 And I'm hoping that maybe you're attending GraphConnect in the next couple of weeks or?
DM: 09:00 I wish I could. It's on my bucket list of things to do. I may have to wait until next year to get out there. Unfortunately, just too many other things I'm involved in.
RVB: 09:10 I understand. David, we look forward to seeing you at some point and I want to thank you so much for coming on the podcast. It's very cool and thank you also for writing the blog post. It's very much appreciated.
DM: 09:22 You're quite welcome, it was my pleasure, and I look forward to other topics in the future.
RVB: 09:28 Thank you.
Subscribing to the podcast is easy: just add the RSS feed or add us in iTunes! Hope you'll enjoy it!
All the best