Here's the transcript of our conversation:
RVB: 00:00:03.935 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo4j, and tonight I'm here again recording another Graphistania Neo4j podcast episode. And on the other side of this Skype call, I've got a wonderful community member from Ellicott City, Maryland. And that's Laura Drummer from Novetta Technologies. Hi, Laura. How are you?
LD: 00:00:26.915 Hi, I'm great. How are you?
RVB: 00:00:29.203 I'm really good. I'm happy to have found the time to have a chat with you. And I know you've been extremely busy, right?
LD: 00:00:38.683 Yeah.
RVB: 00:00:39.654 And thank you for finding the time. Laura, some people may know you from your GraphConnect talks last October in New York or from your blog posts, but I'm sure many people don't know you yet. So if you wouldn't mind introducing yourself? Who are you? What do you do? And what's your relationship to the wonderful world of graphs?
LD: 00:01:00.668 Sure. I'm Laura Drummer. I'm a data scientist for Novetta. We're a defense contractor based in McLean, Virginia. And I've worked for them for about eight years doing data science and other analytics. My background, initially, was actually as a Chinese linguist. And I take that sort of analysis with me wherever I go into the future, but now I'm doing a lot more with big data, trying to recreate what I--
RVB: 00:01:29.202 I didn't know that, okay. From Chinese linguism to graphs. That's a journey [laughter].
LD: 00:01:38.031 Well, yeah. At the end of the day, we're all just trying to figure out what the stuff we're looking at means, and so you kind of carry that with-- so you'll see that when I talk about the natural language processing and topic modeling and all that I bring to my graphs, so.
RVB: 00:01:56.599 Nowadays, I think you've been keeping yourself busy as well, right? I have to congratulate you with your wonderful daughter. Congratulations again.
LD: 00:02:05.931 Yes. I went to GraphConnect 36 weeks pregnant and I said, "It's fine. I've got plenty of time." And I ended up going into labor six days later. So we cut it a little closer than I thought. But we're all settled in now.
RVB: 00:02:21.444 Fantastic. Congratulations again. So, Laura, how did you get into graphs? What's the story there? And how did you get to know Neo4j and start working with it?
LD: 00:02:32.196 So my work has always touched the law enforcement and intelligence communities, and it's always kind of had some social network analysis aspects to it. So I've always kind of liked graphs in general. When you're drawing a social network you immediately draw a graph.
But more recently, one of the analytics I was working on involved binding social network analysis with the content, the topics, the stuff they're talking about. And it's an analytic called Social Bee that I briefed at GraphConnect. I developed it three years ago, and I realized quickly that I needed a graph database. I'd been reading about them. And so I literally just Googled graph databases and found Neo4j, and I love it. Because until then, all my databases were the traditional SQL-type stuff. And immediately, this was a graph that kind of looked like-- or a way to store data that looked like how I was thinking about it. There was very little post-processing to turn it into the way my brain was thinking about it or the questions I wanted to ask. And I tried a few other ones out. I didn't just stick with Neo4j. But it really was a nice combination of a low barrier to entry. It worked well with Python, which is my language of choice. And then it actually can get very, very powerful. It's not just a-- it can do a lot more than the movie database that comes with it [laughter].
RVB: 00:03:59.361 Absolutely, yeah. So when you say social network analysis, is that like a clique detection? Or what are you looking at in social network analysis?
LD: 00:04:07.785 So they're some of the basics-- it's not basics, but they're some of the more traditional social network analysis that people talk about. But what I also do is I analyze the content of the messages people are sending. So for example, if I was drawing a social network based on Twitter, I don't just look at Laura tweeted at Rik. I capture the words that I said to you, and I store that in graph as well. And so then [those?] relationships-- and if you're familiar with natural language processing and topic modeling, you can actually start finding relationships between messages based on their content.
And so I store that in the graph, too, so I can see there's this sub-community based on who they know or the sub-community based on what they're talking about. Maybe if you drew who's friends with whom there wouldn't be a line between them. But if you said who's talking about the same stuff you would draw a line between them. And all of a sudden-- let's say we're the intelligence community. Well, maybe these people are members of the same terrorist cell. Or if we're Taco Bell and they're both talking about tacos, well, maybe we need to build Taco Bells in these two areas. These two people don't know each other, but they're interested in the same things. So it adds that layer of content to the social, who knows whom.
RVB: 00:05:19.875 That makes a lot of sense. And how do you do topic modeling then? Are there some NLP tools that you use alongside with Neo4j, or how do you do that?
LD: 00:05:28.409 Yeah. Yeah. So I do that part in Python and then I store it in the graph. But I use SKLearn, which is a Python library. There's some other ones, especially if you're just interested in starting out. NLTK is a really good library with some great tutorials online. I think, just Natural Language Toolkit is what it breaks out to. But that's a great one for learning how to do topic modeling and natural language processing. SKLearn is what I use, actually, now because it's fast, and it's powerful, and it uses-- I like it, but. And then there's another one Gensim, G-E-N-S-I-M, which a lot of people like. I haven't used it enough to say whether it's great or not but, yeah.
RVB: 00:06:16.695 Cool. So we'll definitely have to put some links to those tools in the transcription of the podcast when we get to that, but super interesting. So what would you say is the main reason that graphs are interesting for this type of analysis? What makes it attractive to you? Why use it in the first place?
LD: 00:06:39.351 I like it because it's a lot easier to ask the right questions, I guess.
As a developer, I'm not very good at the frontend stuff. But as the backend, when I think about the questions I would ask-- back in the day, when I was a linguist or even later when I was helping with cyber analysis for Novetta, what questions do I ask translates so much easier to a Cypher query than they would to 8,000 join statements in a database. And so a lot of it, for me, is the backend. In fact, if you ran my queries in the Neo4j console and got back the jittering balls, the D3 stuff [laughter]--
RVB: 00:07:21.920 Dancing balls we called them, yeah [laughter].
LD: 00:07:23.465 That's probably not actually going to help you. We would need to do more visualization on the end. I really like graphs for the powerful questions you can ask, if that makes sense.
RVB: 00:07:37.423 Yeah, totally. Yeah, yeah. The visualization piece is also powerful, but at the end of the day, it's the types of questions that you can answer, right?
LD: 00:07:46.787 Right. Because, really, what I want to say is Laura and Rik are talking about Neo4j. And you can say that a lot more with a sentence than you can with some pictures. But when you ask it via Cypher, it just makes a lot more sense than if you ask it via SQL or something like that.
RVB: 00:08:04.297 Any real-world examples from, maybe, some of your intelligence or defense work or whatever that you can talk about, or is that all classified and you'd have to kill me [laughter]?
LD: 00:08:15.632 No. I'm not allowed to kill people. No. I think a lot of what we've been able to do for our customers is with Twitter. And so that's, I think, a big-- some of the examples I've brought up before-- we have these two people, and they both are engaging in similar behavior online. And they don't communicate directly to each other, but we know that they communicate in such a similar fashion, based on the topics they talk about, that they must know each other.
Somehow we're missing their direct communications, and that could be they have a cellphone that we don't know about. They only speak in person, but their web activity is similar. And you can see how that could apply to the intelligence or law enforcement community. And like I said-- or it also helps if you're marketing and you're Taco Bell. Another good example to think about is when we just do traditional social network analysis. I don't, but I may follow some terrorist account on Twitter. And if you're just drawing lines between who knows whom, you might say, "Oh, well Laura's connected to that bad guy so she must also be bad." But you really want to look at my behavior and what I'm saying and talking about. And that's where the bringing in the content helps so much more than just drawing lines between who follows who or who talks to who, you know?
RVB: 00:09:46.674 Yeah, absolutely. Okay, that sounds really cool. So we'll put some links up, and people can read a little bit more about it, right? Maybe we can talk a little bit about future stuff. Where do you see this going? What's in store for you? How do you see the technology evolving within your domain? Or any major things that you would love to see happen in the future?
LD: 00:10:15.221 Well, I think, with graphs, in terms of my domain, it's just going to get more and more-- you're going to see more and more people talking about graph databases, and I already hear more and more talk about Neo4j. In fact, I've started a meetup in Maryland, just specifically because I see--
RVB: 00:10:34.554 [inaudible].
LD: 00:10:35.190 So yes, we've been on-- since the baby came, we haven't had any meetings, but we'll start again in January or February, maybe. But the interest is just getting more and more high, I guess, in the intelligence community. And a lot of that is because there's people like me from 10 years ago who I knew I wanted to ask these questions. And I'm a little bit savvy.
I can code a little bit, but I don't really have the ability to build my own program. But I want to ask questions in the way that I think about it as an analyst and not, like I said, SQL queries. And so that's where graphs are appealing, more and more, to these people with analytical minds and not necessarily the deep, deep technical or coding experience. But then it's also powerful on the backend, and that's where I'm also seeing more and more of just people developing tools that require Cypher-type queries, or openCypher, actually, is popping up a lot more.
RVB: 00:11:34.332 Yeah. Is this more developments in the analytical use cases or is it more interactive, real-time applications? Where do you see most use cases for you?
LD: 00:11:45.938 I work with a lot of folks like me that will build analytical prototypes. They'll spin up something, a graph database, really quick, and build something and then answer a question and tear it down. So I definitely see that. I think we all need to move more towards the interactive, actually building tools that run on graphs. We're not there yet. A lot of it is sort of, "Oh, I've already built something. Let me slap some query ability on it with openCypher or something." And that's helpful, but I think we need to take advantage of actually the native graph stuff, so.
RVB: 00:12:26.199 Couldn't agree with you more [laughter]. We're all addicted to the same Kool-Aid, right?
LD: 00:12:33.006 Yeah.
RVB: 00:12:35.376 Well, I want to thank you so much for coming online. I know you've been very busy. I want to congratulate you again with your lovely family--
LD: 00:12:41.801 Thank you.
RVB: 00:12:42.861 --and with the work that you've been doing in the Neo4j community. And I would love to meet up again at one of the future events, GraphConnect or other, and talk more about this stuff. Thank you so much for coming online.
LD: 00:12:58.658 Definitely. Thank you, too.
RVB: 00:13:00.134 Cheers. Have a nice evening. Bye.
LD: 00:13:02.181 Bye.
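For readers who want to experiment with the idea Laura describes (people who never talk to each other directly but talk about the same things), here is a minimal sketch in plain Python. It is not how Social Bee works: it swaps in a simple bag-of-words cosine similarity as a stand-in for real topic modeling, keeps everything in memory instead of in Neo4j, and the sample messages, names, function names, and the 0.5 threshold are all made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical sample data: (sender, recipient, text). Not real tweets.
MESSAGES = [
    ("laura", "rik", "graph databases make social network analysis easier"),
    ("rik", "laura", "graph databases and cypher queries are fun"),
    ("ann", "bob", "tacos tonight best tacos in town"),
    ("bob", "ann", "sure tacos sound great"),
    ("carl", "dave", "craving tacos best tacos ever"),
]


def word_counts_by_author(messages):
    """Collect one bag-of-words Counter per sender."""
    counts = {}
    for sender, _recipient, text in messages:
        counts.setdefault(sender, Counter()).update(text.lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def social_edges(messages):
    """Undirected 'who talks to whom' edges."""
    return {frozenset((sender, recipient)) for sender, recipient, _ in messages}


def content_only_links(messages, threshold=0.5):
    """Pairs of senders with similar content but no direct communication."""
    counts = word_counts_by_author(messages)
    known = social_edges(messages)
    links = []
    for a, b in combinations(sorted(counts), 2):
        if frozenset((a, b)) in known:
            continue  # already connected in the social graph
        score = cosine(counts[a], counts[b])
        if score >= threshold:
            links.append((a, b, round(score, 2)))
    return links


if __name__ == "__main__":
    for a, b, score in content_only_links(MESSAGES):
        print(f"{a} and {b} talk about similar things (similarity {score})")
```

In this toy data, ann and carl never exchange a message, yet both mostly talk about tacos, so they are the pair that surfaces. In a real pipeline the similarity step would come from a proper topic model (Laura mentions SKLearn, NLTK, and Gensim), and both the social edges and the content edges would be stored in the graph.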
All the best
Rik