Friday, 22 April 2016

Podcast interview with Ben Nussbaum, AtomRain

I recently had the chance to chat with one of the Nussbaum brothers, Ben more specifically. We've met each other a bunch of times at various graph events over the years, and next week they will be at GraphConnect again presenting their latest and greatest services and products. So it was a great time to catch up and discuss their view on Graphistania and the rest of the graphlands. So here's our conversation:


And as always, here's the transcript of our conversation:
RVB: 00:02 Hello everyone. My name is Rik. Rik Van Bruggen from Neo Technology. And here I am again recording a great podcast episode with Ben Nussbaum from AtomRain. Hi Ben. 
BN: 00:15 Hi Rik, thanks for having me. 
RVB: 00:17 It's a long distance call. It's late in the evening for me. It's about lunch time for you I think. You're near Los Angeles, right?
BN: 00:23 That's right. In Santa Monica. 
RVB: 00:24 All right. Very cool. Ben, I know you've been active in the Neo4j community for quite some time but maybe some people don't know you yet, so why don't you introduce yourself.
BN: 00:36 Sure. My name is Ben Nussbaum. I've been building enterprise software for nearly 20 years, really focusing on the architecture of globally distributed systems. And in the last four years, working extensively with Neo4j to basically make that a safe enterprise choice and really establish it in the solutions that we're delivering, and seen-- just a lot of excitement around connected data and the possibility of what it can do. 
RVB: 01:03 Yeah, and you guys have been working for companies on the west coast, or international companies, or-- what kinds of companies have you guys been working for?
BN: 01:14 Some of both. We end up working with a lot of media companies because they're here in Los Angeles, but those have global reach. We worked with Sony Pictures Television out of the UK through their branch here of Sony Pictures Entertainment in Culver City. So it was kind of a dual working relationship across the US and into the UK. 
RVB: 01:39 Yeah, very cool. And I know that you guys have been doing quite a bit of work on the scalability operations, those types of things, right? And that's where GraphGrid came in? Is that what I'm understanding? 
BN: 01:51 Yeah, that's right. As AtomRain, we saw a need through our consulting, especially around Neo4j, for a lot of-- just foundational data tooling, to enable enterprises to easily adopt and integrate Neo4j into their existing architecture. Because when you deal with an enterprise of a scale like a Toyota or a Sony, there's just a tremendous amount of existing systems already there. And so you're not really starting with a blank slate, and bringing in a new technology like Neo4j, which can provide tremendous benefits, also has a lot of integration challenges. I think that's where we saw the need for a platform like GraphGrid to enable these companies to very quickly get up and running with Neo4j, get it connected to their existing systems, bring data into it and start connecting the real time applications to the graph to be able to take advantage of the connected data.
RVB: 02:47 Well, I could tell you-- I've been working for Neo4j projects for about three or four years myself as well, and I can completely relate to that. It's a very common problem for big enterprises so I can totally understand that. So how did you get into Neo4j? Tell me about that? How did you guys get connected? 
BN: 03:08 Yeah. A little over four years ago we were working on a project called Ad Cloud, which is basically responsible for delivering 30 second video advertising spots for play out during television shows. There's a lot of different players involved in bringing one of those commercials together. The number of vendors and who has to sign off on what, and all the different assets and managing all of that workflow to bring that final spot out, takes a tremendous amount of complexity in just the roles, the permissions, who needs to sign off, what, when, where, who's next in the chain, and all of that. And so, we were just running up against a lot of barriers with MySQL, and with Neo4j, you know, identity and access management in this scenario was more of a function of your role in that group, rather than your position in a company at that time. And so there was just-- it gave us the flexibility to represent the real world connections of the people, the vendors, the assets and kind of all of that complex network of highly connected data very effectively, and solved the challenge and kind of after-- yes?
RVB: 04:30 Was it primarily around the identity and access management piece then, or what was it?
BN: 04:34 Yeah. The first one was primarily around the identity and access management, managing the work flow and the permission, and the sign off. 
RVB: 04:43 Okay. There's plenty of other customers that do that. Next week at GraphConnect or in two weeks at GraphConnect, we've got Royal Bank of Scotland talking about it and that's what they use it for as well for example, so cool, very cool that you guys used it for that. And how did you sort of fall in-- fall in love from there? How did it start [chuckles]? It started with one project and it sort of took on a life of its own, huh?
BN: 05:08 Really after that we just started seeing all of the problems that we tried to solve in the past, and the problems that we are being presented with to solve in the future, were really graph problems. You know being-- where the relationships for how things were connected needed to be treated as first class citizens within the database, because Neo4j allows you to represent real world connections and the contexts of those relationships very well. And so I think that's why we've gravitated towards it as our database of choice, and especially the fact that it's ACID compliant and fully transactional, and it can be your source of truth database. That's tremendous for being a real alternative to a relational database.
RVB: 05:57 That's very cool, I mean it's a-- once you've seen the light, you start seeing these graphs everywhere, right? I've seen that so many times as well. It's really cool to see. 
BN: 06:08 Yeah. 
RVB: 06:10 All right. So where do you see this going then? Where do you-- what does the future hold, both for you guys at AtomRain and GraphGrid, and maybe also for the industry? How do you look at that?
BN: 06:25 I'm really excited about the future. I think we're going to continue to push GraphGrid and make it able to serve the enterprise with their connected data solutions. Help them leverage Neo4j effectively at global scale. I think with the industry-- the thing that's always excited us about Neo4j and the reason we've backed Neo4j as our graph database of choice is because you guys have focused on reliability. First and foremost, you guarantee referential integrity. And when you're dealing with big data and trying to get it connected, not having to worry about if your data is consistent and if it's reliable, and knowing that two nodes always agree on the relationship between them just takes a huge load off, because not having that guarantee can just be a pain when you're dealing with hundreds of millions or billions of nodes.
RVB: 07:25 It feels-- especially in the graph model I think, that consistency is super important because if you miss out on the consistency on the graph model, then you're basically corrupting the entire system before you know it, right? It's a really important characteristic especially for the graph model I think. 
BN: 07:46 That's right. The graph model is a paradigm shift. It's not just another relational store, another SQL store where you can just have anybody writing data into it. And so I think the fact that the solution-- as a native graph database Neo4j is sensitive to that, and considers that as the single most important thing. The petascale graph vision that I've heard Jim Webber discuss at GraphConnect keynotes [chuckles] will come. We will get to that. And when that day happens, Neo4j will be by far the best solution out there.
RVB: 08:17 Oh man, [chuckles] that's the best summary ever. On that bombshell. That's so great. No, I think it's super great that companies like yourselves are sort of complementing the product ecosystem that we are establishing, and we need you guys, and you guys need us, and it's a great way to work together and we really appreciate this kind of partnership. So I'm looking forward to lots of other things as well.
BN: 08:50 Absolutely. 
RVB: 08:50 Are you guys coming to GraphConnect by any chance?
BN: 08:53 Yeah, we'll be there. We're one of the sponsors, so we look forward to seeing you and everybody else there. 
RVB: 08:59 I'm looking forward as well. It's ten days-- a little over ten days to go, and we'll have a big crowd there so it's going to be great. All right Ben, I will wrap up the podcast here. Thank you for coming online, I really appreciate it. I also appreciate your friends in the trees [chuckles] to have joined us as well, because that's always very kind as well, and I look forward to seeing you in two weeks. 
BN: 09:24 Sounds great. Thanks Rik, talk to you soon. 
RVB: 09:25 Thank you man, bye-bye. 
BN: 09:26 Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Thursday, 14 April 2016

Podcast Interview with Nicolas Rouyer, Orange

It's been a few crazy busy weeks and therefore a bit slower on the Podcasting front, but I still want to keep up this community effort, and so here we are again. I spoke to a dear community member from Toulouse, France: Nicolas Rouyer. Nicolas has been a driving force in the local community in Toulouse, and has also presented his experience with Neo4j at numerous meetups and GraphConnect conferences:

So I definitely wanted to have a chat with him, and here's the result:

Here's the transcript of our conversation:
RVB: 00:01 Hello everyone. My name is Rik, Rik Van Bruggen from Neo. And here I am again recording another episode of our Graphistania podcast. It's been a while since we've had some European guests on the podcast, so I'm very happy to have Nicolas Rouyer from Orange in France on the podcast. Hi Nicolas. 
NR: 00:23 Hi, Rik. How are you? 
RVB: 00:25 I'm very good, very good, and it's great to have you on the podcast. I know you've been doing some wonderful work with our community in Toulouse. Well, most people won't know you yet, so would you mind introducing yourself and telling us a little bit about your work there?
NR: 00:40 Yeah, so I am a Big Data Architect at Orange. We try to design big data architectures that don't fall down. This is a challenge, and we try to analyze all data, all telecom data. I got interested in graphs two or three years ago, I can't remember, because I was working on a graph traversal problem and I met Cedric Fauve in Paris at a meet-up. And then I couldn't do anything but work on graphs and Neo4j.
RVB: 01:36 It was just too compelling, huh [laughter]? 
NR: 01:39 Yeah. It was. I fell in love with graphs, I guess.
RVB: 01:43 That's great. Many people don't know yet, but Orange is actually a telecom operator, right? You're a telecom operator in France, but also, in the UK, I believe, and across Europe I think. Right? 
NR: 01:55 Not in UK anymore, but in Europe, and also in Africa. 
RVB: 02:01 Yeah, exactly. So what attracted you to graphs? What was the use case that you thought was so fascinating that you thought, "You know, I have to get my hands on this?" 
NR: 02:13 Really, I was thrilled by the connections between data. I am really convinced at the core of my heart that connections between data bring value. This is really a mantra and I really believe in that. And so the ease of making connections between data in a graph really fascinated me, with the use of Cypher, which was very pleasant. I happened to find new use cases with telecom data and then I did it.
RVB: 03:06 Were there any particular use cases that attracted most attention for you? Was it more network management, or what was it exactly? 
NR: 03:14 Yeah, in the beginning it was network management, you're right, and IT management. So we built a use case on IT supervision, and we ingested into Neo4j IT data such as applications, flows, but also incident management, the incidents that occur on applications. So with this very simple data model at the beginning-- so, applications-- there are incidents occurring on those applications and those applications exchange flows together. And we tried to build some use cases on incident prediction with Neo4j inside.
RVB: 04:07 Wow, okay. Since then you've done a lot of work in the local community in Toulouse. That's going really well, isn't it?
NR: 04:19 Yeah, we are over 300 graphistas now, and-- 
RVB: 04:24 That's pretty amazing for-- I mean, Toulouse is a big city but it's not like Paris, right?
NR: 04:32 Yeah, you're right. It's not like Paris. We eat very well in Toulouse, not in Paris. 
RVB: 04:38 [laughter]. And you drink well because I remember the beers that you were serving at the meet-up. That was so good, it was really great. 
NR: 04:46 Yeah, but after twenty-two hours, we cannot relate what happened [laughs]. 
RVB: 04:53 Exactly, so you also participated in the writing of the new French Neo4j book, right? 
NR: 05:04 Yes, this is a book in French written by Sylvain Roussy, who is organizing graph database meet-ups in Lyon. And Sylvain proposed that I write some chapters in this book. I happened to write a chapter on GraphGists, and it took me a lot of time because--
RVB: 05:36 I know [chuckles]. 
NR: 05:37 -- I had not written books before, and it's crazy work. But anyway, I'm pretty happy with the result. We want to write the second edition which will deal more with the part on how to operate it.
RVB: 06:08 Okay, well Nico, I am having some trouble hearing you. I'm hoping the recording is still happening okay. But that book that you guys wrote, it's popular in Belgium as well, I can tell you. I was talking to some graphistas here in Belgium, at the meet-up and they know about it as well. So really top job, thanks for doing that, really appreciated. 
NR: 06:33 So it's on D-BookeR Edition, if I can advertise a little bit. 
RVB: 06:41 Yeah, I'll put some links to it in the transcription on the blog. So don't worry.
NR: 06:50 On the GraphGists chapter, I tried to explain the spirit of it. Which is to share the graph data model you're building on some data. And sharing use cases is the best way to offer a space of creativity for other graphistas. Because if they know that a use case is possible in a given sector of activity, they are able to transpose it to their own sector. I mean for example, there is a very well-known use case in telecom, which is churn reduction. We want to keep our customers, and we can use graphs for that. Because interactions-- communications are interactions, are connections between our users, and we can find influencers in the telecommunication community. As well, we can transpose it to human resources. Why do people churn? Why do people leave the company? And we could try to get insight from their interactions inside the company, and try to find why people leave, and which people could be influencers.
RVB: 08:21 Couldn't agree more. So, Nico,  maybe I can ask you one more question, right? Where do you think this is going, what does the future hold, both for you and the meet-up, and maybe also the industry? What's your view, what's your hope, what's your expectation there? 
NR: 08:40 I'm pretty convinced that more and more people are going to use graphs because, really, they can find new ways to build use cases and business models, to be able to find more things than before with graphs, because there is a deep insight into the data and into their connections. I'm pretty convinced it's going to work for Neo4j and graphs in general. I expect that the openCypher initiative will bring more actors working on graphs, and on graph traversal, with Databricks and Oracle, for example. And I expect that the standardization of drivers will help the adoption of Neo4j by the global community of developers. If developers find an easy way to ingest and query data, then the graph-based use cases will multiply around the world.
RVB: 09:59 I could not agree more. Absolutely. Those are really key initiatives, you know, the drivers and the openCypher. And as you know, we're going to be announcing a lot more of that stuff at GraphConnect, which is only a month away. I'm hoping I will see you there?
NR: 10:18 Yeah, of course. I will be a tourist at this GraphConnect because I don't-- I won't present anything. So I won't be a speaker, but I will listen all the while. 
RVB: 10:34 Exactly. I'm sure it will be a really fun conference. Look forward to seeing you there, I want to thank you for coming online and doing the podcast recording with me, Nico, it was really great. And I look forward to doing a lot more stuff together in Toulouse and elsewhere.
NR: 10:50 Yeah, yeah. Thanks a lot, Rik. 
RVB: 10:53 Talk to you soon, bye. 
NR: 10:55 Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Wednesday, 13 April 2016

GraphConnect Europe 2016 - of course we have a Schedule-graph

It's that time of the year again - GraphConnect Europe 2016 really is around the corner now. We have so much to look forward to - great users, community, customers, developers, engineers, speakers, founders - and everyone with a graphista-bone in their body is flocking to London on April 25th-26th. So of course, on this blog, I had to have some kind of little contribution to make - and traditionally (like I have done for many other conferences in different blogposts), I have to make an improvement on this "ugly" table:

This is: I really want to look at this data as a GRAPH - not a table. Of course, you say...

The GraphConnect Schedule Spreadsheet

In order to do that, we create a nice little spreadsheet version of the schedule first. Simple, in a Google Sheet: here it is. It really is pretty simple:

And of course, like with every Google Sheet, it's trivial to download it as a CSV file. Here it is straight from Google (you have to make it public - which I have done - for this to work, of course), or find it in the Github Gist as well. And then of course, we can start importing the data into Neo4j.

Importing the Schedule into Neo4j

Like with previous conference schedule imports, I have chosen the same model again to import the data into:

So we have 2 days (25th for the trainings at CodeNode, and 26th for the conference at QE2 centre), lots of timeslots in each day, and then Sessions that are part of Tracks, located in Rooms, held by Persons that work at Companies. Pretty straightforward!
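To make the model a bit more tangible, here is a minimal Python sketch (not part of the original import) of how one row of the schedule spreadsheet would turn into nodes and relationships. The column names are the ones the import statement reads from the CSV; the sample row itself is made up for illustration.

```python
import csv
import io

# Hypothetical sample row; only the column names come from the import statement.
SAMPLE_CSV = """day,starttime,endtime,room,track,talk,speaker,title,company
20160426,900,1000,Main Hall,Keynote,Opening Keynote,Jane Doe,CEO,Example Corp
"""


def row_to_graph(row):
    """Return (nodes, relationships) for one schedule row, following the model above."""
    day = ("Day", int(row["day"]))
    start = ("Time", int(row["starttime"]))
    end = ("Time", int(row["endtime"]))
    room = ("Room", row["room"])
    track = ("Track", row["track"])
    person = ("Person", row["speaker"])
    company = ("Company", row["company"])
    session = ("Session", row["talk"])

    nodes = {day, start, end, room, track, person, company, session}
    # Each tuple reads as (from node, relationship type, to node).
    rels = {
        (start, "PART_OF", day),
        (end, "PART_OF", day),
        (person, "WORKS_FOR", company),
        (person, "SPEAKS_IN", session),
        (session, "IN_ROOM", room),
        (session, "STARTS_AT", start),
        (session, "ENDS_AT", end),
        (session, "IN_TRACK", track),
    }
    return nodes, rels


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
nodes, rels = row_to_graph(rows[0])
```

Using sets here loosely mirrors what MERGE does in Cypher: a Room, Track or Company that shows up in many rows still ends up as a single node.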

So then we have an import statement (or a series of different import statements - whatever you prefer) to get the data into this model. I have put both versions (the single statement, and the multi-statement version) in this Github Gist, but I will share the single statement version over here:
// load the schedule straight from the public Google Sheet
load csv with headers from "https://docs.google.com/a/neotechnology.com/spreadsheets/d/10sswmRmY5FjYMLU5c5rJlXr4m9Idxw7tC8OImb7v4Yg/export?format=csv&id=10sswmRmY5FjYMLU5c5rJlXr4m9Idxw7tC8OImb7v4Yg&gid=16326967" as csv
// days, and the link between consecutive days
merge (d:Day {date: toInt(csv.day)})
with csv
match (d:Day), (d2:Day)
where d.date = d2.date-1
merge (d)-[:PRECEDES]-(d2)
with csv
// rooms, tracks, speakers and their companies
merge (r:Room {name: csv.room})
merge (t:Track {name: csv.track})
merge (p:Person {name: csv.speaker, title: csv.title})
merge (c:Company {name: csv.company})
merge (p)-[:WORKS_FOR]->(c)
with csv
// start and end timeslots, attached to their day
match (d:Day {date: toInt(csv.day)})
merge (t1:Time {time: toInt(csv.starttime)})-[:PART_OF]->(d)
merge (t2:Time {time: toInt(csv.endtime)})-[:PART_OF]->(d)
with csv
// the session itself, connected to everything created above
match (t2:Time {time: toInt(csv.endtime)})-[:PART_OF]->(d:Day {date: toInt(csv.day)})<-[:PART_OF]-(t1:Time {time: toInt(csv.starttime)}), (r:Room {name: csv.room}), (t:Track {name: csv.track}), (p:Person {name: csv.speaker, title: csv.title})
merge (s:Session {title: csv.talk})
merge (s)<-[:SPEAKS_IN]-(p)
merge (s)-[:IN_ROOM]->(r)
merge (s)-[:STARTS_AT]->(t1)
merge (s)-[:ENDS_AT]->(t2)
merge (s)-[:IN_TRACK]->(t);
As you can see, it's basically different statements tied together by a bunch of WITH clauses. Try it out on your local machine, and it should work. If it does not, then you can always go back to the multi-statement version - that definitely works, also on older versions of Neo4j.
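One detail in the statement that is easy to miss: the days are stored as integers like 20160425, and the PRECEDES link is created with the condition d.date = d2.date-1. The small Python sketch below (my own illustration, not something from the Gist) shows that this works fine for 25 and 26 April, but that plain integer arithmetic would not link two days across a month boundary. The calendar-aware function is just one possible alternative, assuming the dates are always in yyyymmdd form.

```python
from datetime import datetime, timedelta


def precedes_by_int(d, d2):
    """The condition used in the import statement: d.date = d2.date - 1."""
    return d == d2 - 1


def precedes_by_calendar(d, d2):
    """Calendar-aware check: parse yyyymmdd integers and compare real dates."""
    a = datetime.strptime(str(d), "%Y%m%d")
    b = datetime.strptime(str(d2), "%Y%m%d")
    return b - a == timedelta(days=1)


# The two GraphConnect days are consecutive integers, so both checks agree.
same_month = (precedes_by_int(20160425, 20160426), precedes_by_calendar(20160425, 20160426))

# 30 April -> 1 May is a real "next day", but 20160430 + 1 is not 20160501.
month_boundary = (precedes_by_int(20160430, 20160501), precedes_by_calendar(20160430, 20160501))
```

For a two-day conference inside one month the integer trick is perfectly adequate; it only matters if you reuse the model for a schedule that spans a month end.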

Querying the Schedule Graph

Of course then we like to explore the graph, right? I have created a few queries for you to play around with - they are also in the Github Gist - but let's start by taking a look at day 1 with a simple query:
match (d:Day {date:20160425})<--(t:Time)<--(s:Session)--(connections)
return d,t,s,connections
limit 50
Gives you a very nice and graphy view of the trainings:

Obviously this would be a bit more of a complicated graph - because of the more varied schedule, of course - for day 2:

And that's just a sample! But one part worth looking into are some of the talks that will be given by the MOST FAMOUS user of Neo4j these days, the ICIJ. Since the publication of the Panama Papers, they have been front page news - and clearly this will get a lot of attention at GraphConnect as well:


And then we can start asking some truly "graphy" queries: how would our beloved Dr. Jim Webber be related to the ICIJ? Let's ask the graph:
match (c:Company {name:"ICIJ"}), (p:Person {name:"Jim Webber"}),
path = allshortestpaths( (c)-[*]-(p) )
return path
This gives you this result:
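If you are wondering what "all shortest paths" actually means, here is a minimal, self-contained Python sketch of the idea: a breadth-first search over an undirected graph that keeps every path of minimal length. This is only an illustration of the concept, not how Neo4j implements allshortestpaths, and the node names in the toy graph are placeholders rather than real schedule data.

```python
from collections import deque


def all_shortest_paths(graph, start, goal):
    """Return every shortest path from start to goal in an undirected graph."""
    # Breadth-first search, remembering every parent that reaches a node at minimal depth.
    depth = {start: 0}
    parents = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                parents[nxt] = [node]
                queue.append(nxt)
            elif depth[nxt] == depth[node] + 1:
                parents[nxt].append(node)
    if goal not in depth:
        return []

    # Walk the parent links back from goal to start to rebuild the paths.
    def build(node):
        if node == start:
            return [[start]]
        return [path + [node] for p in parents[node] for path in build(p)]

    return build(goal)


# Toy graph with placeholder names: two equally short routes between the ends.
edges = [
    ("Company A", "Speaker 1"),
    ("Speaker 1", "Session X"),
    ("Session X", "Track T"),
    ("Session X", "Room R"),
    ("Track T", "Session Y"),
    ("Room R", "Session Y"),
    ("Session Y", "Speaker 2"),
]
graph = {}
for a, b in edges:
    graph.setdefault(a, []).append(b)
    graph.setdefault(b, []).append(a)

paths = all_shortest_paths(graph, "Company A", "Speaker 2")
```

In the toy graph the two sessions share both a track and a room, so there are two shortest paths between the company and the second speaker, and both are returned.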



That's about it for now. I hope you will play around with the Schedule Graph yourself as well, and I look forward to seeing you at the conference.

All the best

Rik

Wednesday, 30 March 2016

Podcast Interview with Brock Tibert, Bentley University

Here's another great interview for our Neo4j Graphistania podcast, this time with a very active community member in the US of A, Brock Tibert. Brock has a great blog (called Enrollment Nerdery) about all kinds of interesting tech stuff for the education industry, works at Bentley University in the Boston area, and has been doing some very nice stuff with Neo4j. Reason enough to have a chat:

Here's the transcript of our conversation:
RVB: 00:02 Hello everyone. My name is Rik, Rik Van Bruggen from Neo and here I am recording another episode of the Graphistania podcast with someone who's joining me all the way from the Boston area, in Waltham, Massachusetts, where I used to have my stomping grounds as well. This is Brock Tibert joining us. Hi, Brock. 
BT: 00:21 Hi, Rik. How are you? 
RVB: 00:22 I'm very well. Thanks for joining us, making the time. It's really cool. Hey, Brock, you know how we try to structure these podcasts a little bit, right? So why don't you try and introduce yourself a little bit so that people can know what you do, and how you relate to the wonderful world of graphs? 
BT: 00:39 Sure, well first of all, thanks for having me. It's great to be on this podcast. Currently I'm employed in the higher education space here in the United States. I work at Bentley University as the Executive Director of Enrollment Systems and Analytics. I know it's a really long title, but basically my job is-- I'm responsible for the division that works with admissions and financial aid. We work on the recruitment of students, the marketing, making sure that we land our class from the strategic goals that we have as a university, but also making sure that we stay on budget. I've worked in higher ed for about 11 years but I'm really excited about how I can apply graphs. I've seen a lot of different applications as I work through that in my job, and how I can apply them to various problems that I have here at my job, as well as in the higher ed space.
RVB: 01:31 Oh wow, okay. So Bentley University, it's a private university I suppose, right? 
BT: 01:36 It is. It's a private business-specific school, and working at a school that has a singular focus, or a niche if you will, presents some interesting challenges but I think that's where-- you know, you think about graphs and how people are interested in certain things and their relationships, and it really kind of lends itself to that. So it's kind of an interesting use case.
RVB: 01:57 Yeah, that can be very cool. So, how did you get into the wonderful world of graphs? What's your relationship to Neo4j? How did it all start? 
BT: 02:07 Sure, so a long time ago when the Netflix Prize came out and I know it was solved I think using matrix factorization and things like that, I started to really get into recommendations and thinking about how recommendation engines can solve a lot of problems, or at least try to help with some of the problems that I face in my job, whether it's recruiting students or trying to think about marketing content that's relevant. I started to think through how you could shape recommendation engines in that way. And I actually came across a blog post not too long after that contest, where someone worked through how to do that with a Yelp data set and Neo4j. And just basically from that post on, I've been hooked. Neo's accessible. You've talked a lot on this podcast and on various other blogs about that graph way of thinking, and it just totally helped my learning curve, to really relate to graphs and how things are related and how you can leverage that to help solve problems.
RVB: 03:08 That's really cool and it introduces my second topic a little bit. Why is it so useful for your daily business challenges? How does it help? 
BT: 03:21 Sure. Right now, I think higher ed and the marketing and trying to land your class and enrollments and all that, it's getting really complex. A lot of schools are now employing things that have gone on in other marketing, in other industries for years. We're kind of, in a way, higher ed's kind of catching up. In doing so, we're generating lots and lots of data. As we really start to leverage CRMs, something that higher education is only starting to really do, we're generating like I said lots of information and we can learn a lot about our prospective students or even our current students for that matter. Whether it's things that they're interested in, if you use email engagement as a proxy for interest, or things that they might view on the website, things that they tell you via surveys, visiting campus, things you learn from them then, you can start to really get out and pull out that interest graph, right? Those things that this student is interested in, this major. This student has clicked on content related to financial aid or various groups on campus. You can start to really tease out that interest, and start to say, "Okay, well how can I separate my school Bentley from another school?"-- well you want to provide relevant content, right? So that's kind of what I'm working on a little bit. I'm trying to leverage Neo4j and that interest graph to say, "What can we do to really market differently and stand out from the crowd?"
RVB: 04:47 Wow, that's super cool. It's really almost going towards those sophisticated recommendation engines that people use in retail and stuff like that, right? It's similar to that. 
BT: 04:58 Yeah, that's the idea. I might not be recommending a movie but I might be thinking about, "Okay well in our content, what should we be putting in a newsletter? Should we be putting things about study abroad because the student would be more likely to want to travel overseas during their time here? Or maybe sports? They're more interested in athletics." The idea is you want to provide relevant content. And to me graphs and recommendation engines are a really easy and accessible way to do that.
RVB: 05:25 Are you using any of the standard recommendation techniques there like collaborative filtering or any of those types of things? 
BT: 05:36 So right now what I'm trying to do is, I finally had about a good year, year and a half of that data collection, where I can actually start to think through how I would solve this for our problem moving forward. What I'm thinking about doing is looking at similar click behaviours. So, we record a lot of information like someone visits our campus or they requested information on the web, but by leveraging our CRM system and all that data we have in terms of email engagement and what are they engaging and clicking on, I want to start to look at, "Well, find similar groups of students. What are they interested in?" And then start to think about the marketing content for future prospective pools and recommend, to your point, are you interested in study abroad? And do it through collaborative filtering but also graph clustering and things like that.
RVB: 06:26 Well, that all sounds very much like the Enrollment Nerdery that you promote on your blog, right? 
BT: 06:32 Yes, I'm totally obsessed with higher ed. 
RVB: 06:35 It's really cool and on your blog you also had some great articles. I only read a couple of them on prototyping and linking back to R and those types of things, really cool stuff.
BT: 06:48 Thank you. Nicole's package has been great. The RNeo4j package. I mean it's totally phenomenal and it makes accessing Neo4j via R, something that is my language of choice, really really easy.
RVB: 07:01 Super. Cool, so maybe one more question, Brock? Where do you think it's going? What does the future hold for you and your use of Neo4j at Bentley, but also in general, and the industry. How do you think about that?
BT: 07:17 Yeah I think I kind of talked a little bit earlier, moving forward I think graphs have a natural place in this higher ed recruitment space, this enrollment management. I think it will make things easier for us as we start to collect all this information about students and families, this student is related to this person, or this student is an alumni from this school, or this student is interested in that school, and also majors of interest like I said, and I think graphs really will help us as institutions, and higher ed in general, with a lot of the problems we solve, whether on the recruitment side of things, what students are interested in our school, or even further down that enrollment path, once they're actually enrolled at a school, what courses should they be taking? What courses will they maybe have trouble with so you can intervene to make sure that you retain the students, which is a huge problem right now in higher ed. So I think there's a lot of natural use cases for graphs in higher ed in general, and I'm excited to see how that plays out, and I'm just trying to work through a lot of the problems that I have and try to promote that as much as I can to help people that are in my shoes at other institutions to think that way.
RVB: 08:28 Super nice. It's really cool that you do that. I really appreciate it. Brock, we try to keep these podcasts nice and snappy and digestible, so I think I'll wrap up here, but I wanted to thank you again for coming on the line and doing this episode with me. I wish you tons of luck and success with all your wonderful experiments and business apps.
BT: 08:52 Well thanks Rik, I really appreciate it. 
RVB: 08:54 Thanks man, bye bye. 
BT: 08:55 Bye Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Wednesday, 23 March 2016

An easier, better, tastier (!) BeerGraphGuide in the Neo4j Browser!

I recently wrote two blogposts about a fancy new way to create "Guides" right in the Neo4j Browser - a system that we at Neo have used ourselves to create learning experiences for the Neo4j Graph Database - but that we are now also extending and making available to our community and Enterprise customers. The first post showed you what the result was, and the second post showed you how easy it was to create such a guide. And of course, since this is still all very new and under development by the inimitable Michael Hunger, it is still evolving at warp speed.

So that's why I am adding a third post to this series, based on the latest version of Michael's github repo. In this version, he has added a much more convenient way to serve the HTML page that we use as the basis for the Browser Guide: we essentially don't need a CORS-enabled webserver anymore - we can just use the webserver that is already baked into Neo4j and which the Browser already uses. Michael did this by developing an unmanaged extension that picks up the HTML file from a local directory, and then serves it when you call the :play command in the browser. So how does it work? Let's show you.

Installing the Unmanaged Extension

You will find the extension on the github repo under the "guide-extension" directory. Michael has added some detailed installation instructions if you would be able and willing to compile the software yourself - but he has also added a compiled version link for Neo4j Enterprise 2.3.2. You can download the .jar file, and then just add this to the right directory of your Neo4j Enterprise 2.3.2 installation: the "plugins" subdirectory.

Once you have that, we do still need to do a few manual interventions to activate and enable the extension.

Creating the directory and copying the guide's html file

The extension assumes that you add a new directory in the "data" directory of your Neo4j installation: "guides". Once you have created that, you can copy the HTML file that I had created in the previous post, into this directory - this is where we will serve all future Guides from.
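If you would rather script this step than do it by hand, here is a minimal Python sketch. The function name install_guide and the paths you pass to it are hypothetical placeholders, not part of Michael's extension; the only thing taken from the text above is the "guides" directory inside "data".

```python
import shutil
from pathlib import Path


def install_guide(neo4j_home, guide_html):
    """Copy a guide's HTML file into <neo4j_home>/data/guides, creating the directory if needed."""
    # The data/guides location is the one the extension is described as expecting.
    guides_dir = Path(neo4j_home) / "data" / "guides"
    guides_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy returns the path of the copied file inside guides_dir.
    return Path(shutil.copy(guide_html, guides_dir))
```

Calling install_guide with the root of your Neo4j installation and the path to beer_graph.html should leave a copy of the guide under data/guides, ready to be served.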

Configuring the Neo4j server

Then we need to make a few more easy configs to the neo4j-server.properties file in the ./conf directory of your Neo4j installation.

1. Disable authentication for the Neo4j server

Michael has not added a way to authenticate to the server yet, and therefore you just turn the dbms.security.auth_enabled switch from true to false.
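For those who like to automate their config changes, the switch could be flipped with a small helper like the one below. This is only a sketch: set_property is a hypothetical helper name, and it works on the text of the properties file rather than on the file itself, so reading and writing ./conf/neo4j-server.properties is left to you.

```python
def set_property(text, key, value):
    """Return the properties text with key=value set, replacing an existing entry if present."""
    lines = text.splitlines()
    new_line = f"{key}={value}"
    for i, line in enumerate(lines):
        # Also match a commented-out entry such as "#key=old".
        candidate = line.lstrip("#").strip()
        if candidate.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            break
    else:
        # The key was not found anywhere, so append it at the end.
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

With that in place, set_property(conf_text, "dbms.security.auth_enabled", "false") returns the updated text, which you would then write back to the properties file before restarting the server.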

2. Configure the extension .jar file

Two things we need to do:
  • In order to load the extension from a specific URL (in this case: http://localhost:7474/guides) we add a property to the neo4j-server.properties file.
  • We let the extension know where the guides will be located (the data/guides directory)
You could of course configure these differently if that would be useful.

3. Edit the whitelist for the browser to load Guides from

Again, we edit a particular section of the neo4j-server.properties file, and allow the browser to load data from the localhost server. Two ways to do that:
  • simply allow ANY server with a "*"
  • or specifically allow the localhost on the specific port that your server is running
And that should do it. All we need to do now is bounce the Neo4j server, and we should have our guide in the browser: just try

:play http://localhost:7474/guides/beer_graph.html

in the browser, and tah-daaaaaah:

Yey! That was easy!

Hope you like this as much as I did. Big kudos to Michael, again - and I look forward to seeing your Guides somewhere in our community.

Cheers

Rik

Sunday, 20 March 2016

Podcast Interview with Clare Zutz and Mark Hand, University of Texas


A few weeks ago I was introduced to a couple of community members that had written a super interesting blogpost about visualising the Global Impact Investing Network (GIIN) on the Neo4j Blog. Clare Zutz and Mark Hand are both researchers at the University of Texas, and as you will read below, they have done some remarkable work to use Neo4j for the greater social good - so read on and follow the conversation...


Here's the transcript of our conversation:
RVB: 00:03 Hello everyone. My name is Rik Van Bruggen from Neo, and here I am recording another episode for the Graphistania podcast, and tonight I have invited two lovely people from all the way in Texas, who have been doing some wonderful work with Neo4j. That's Mark Hand and Clare Zutz. Hi guys. 
CZ: 00:24 Hello. 
MH: 00:25 Hi everybody. 
RVB: 00:26 Hey. Good to have you guys on the podcast. Thanks for making the time, I really appreciate it. I've been reading some of your work that you guys have been publishing on our blog and everything, but some people may not have read that yet. So why don't you guys introduce yourselves and make yourselves known? 
CZ: 00:49 Okay, I'll start. So, hi I'm Clare Zutz and I'm currently at UT Austin at the RGK Center for Philanthropy and Community Service, and I lecture an undergraduate course, and then Mark and I also do research around social innovation and network analysis. 
MH: 01:06 My name is Mark Hand and I'm an adjunct assistant professor here at the University of Texas, affiliated with the RGK Center. And I also work for an incubator of social enterprises here in Houston Texas called UnLtd USA.
RVB: 01:17 Very cool and the RGK Center? What does that stand for [chuckles]? 
CZ: 01:22 It's the RGK Center for Philanthropy and Community Service.
RVB: 01:26 Oh okay, all right. Very cool. I saw the website, but I didn't know what the acronym was for-- 
MH: 01:32 It's named after a businessman and his wife, Ronya and George Kozmetsky, who are big donors to the university. 
RVB: 01:37 Interesting, okay, very cool. And you guys have been doing some very interesting work with graphs, networks and Neo4j. You wrote about the GIIN - the Global Impact Investing Network - which you might want to tell our listeners a little bit about. 
CZ: 01:57 Yeah, absolutely. So, last summer Mark and I were talking about the intersection of social innovation and food policy. Mark has been in the social innovation space for a long time, and I had just finished a Masters with focus in food policies. So we were talking about what impact investing in sustainable agriculture looked like, and we weren't really sure what that network or ecosystem looked like. So Mark offered up the idea of using a graph database to visualize that, and to be able to gain some better insights. So what we did, was we began by scraping publicly available data from GIIN, and again that's the Global Impact Investing Network. And so we looked at the asset managers, and looked at their investments, and we ended up with 45 asset managers and over 400 relationships. So that was how we started, and our idea was to try and kind of understand the evolution of co-investments over time, to uncover some of the key actors and the influencers in the space and highlight some of the co-investments. So thinking about it as an opportunity for entrepreneurs to search for different investors, and then investors to also see where other people are co-investing in different opportunities. 
RVB: 03:12 Super interesting. So I've read your blog post, but listeners may not have done that yet. So impact investing, what is that again? 
MH: 03:20 Sure. So one of the things that happened over the course of the last ten to fifteen years is that an increasing number of both foundations and traditional investors and asset managers have begun to ask the question, "What happens if we actually intentionally invest in companies that have some kind of social and environmental mission at their heart?" Included in this world might be micro-finance and fair trade, and to some extent conscious consumption, conscious consumerism. And one of the things that's happened recently is a certain coalescing of these investors around a set of industry bodies. And the Global Impact Investing Network is the largest of those bodies and includes everything from family offices that are deciding to make investments in, let's say a software in Kenya that helps connect mobile money to back office small business software, all the way to alternative energy sources, perhaps coming from certain types of trees that are farmed down in Texas.
RVB: 04:21 Oh, wow. So, and this is global, right? This is not just, you know, North American investors, managers, this--? 
MH: 04:29 One of the things that's been interesting is that this really is a global phenomenon. There's a tremendous amount of activity in India. And then in the United States it's come out of-- in the United States and Europe it sort of evolved from the international aid world into an industry in its own right. And there's sort of - there's still questions about boundaries and definitions, which is part of actually what we were interested in - is can we actually look at who's in and who's out, or what kind of activity is happening, and look at it deductively from who's actually making investments and who's actually making transactions, as opposed to trying to induce definitions on the space? And if the listeners are interested, the database that has, perhaps, the most amount of information - the most number of investors and entrepreneurs - is actually also out of Austin, Texas and they were our partners in this project. It's called Enable Impact. If you go to enableimpact.com, that's where listeners can look up and see who some of these entrepreneurs are, and who some of these investors are as well.
RVB: 05:32 I'll put some of those links on the blog post with the transcription. So that's great that you put that out there. So how did you guys get into graphs for this type of work? Mark, you were like the guy suggesting the idea to start--?
CZ: 05:48 He was indeed [laughter]. 
MH: 05:50 So I took a class on strategy and innovation from a professor at Oxford named Marc Ventresca. And in that class we got exposed to how networks can help us understand the way that entrepreneurs behave, the way that entrepreneurs gather their teams together and manipulate resources around them. And when I came to Austin to start work with UnLtd USA, one of my questions was to what extent can we actually take some of this scholarly work on entrepreneurs and the way that they manipulate networks, and actually turn it into something that can help entrepreneurs who are just getting started? And the more we sort of went through that, the more we began to consider how else might we apply some of these lessons - not just to working with entrepreneurs themselves, but to understanding the ecosystem of players around them. So actually we made a map of the Austin social innovation ecosystem, that was graph based. And from there we began to ask more questions here, at the University of Texas, about: What can we actually learn about this ecosystem if we get a little bit more intentional, and if we try to get a little bit more focused on the questions that we're asking?
RVB: 06:54 That's super cool. And then what were some of the interesting results that you guys found? Any particular highlights that you want to sort of call out? 
CZ: 07:03 Well I think one of the interesting things that we learned was, I think our original goal was to just bring some transparency to the sector, right? I think a lot of people talk about impact investing, but what exactly does that mean and what does that look like, especially for sustainable agriculture? I think when we were going through it, we initially thought, okay that to have this map will lend some clarity to the space, but then we also realized that this could really be used as a tool for entrepreneurs and also to give recommendations for investors. And we've actually had a lot of different organizations reach out to us after the blog post went out, so that was really exciting, to see that there was a need for that, and it was actually gaining deeper insights.
RVB: 07:43 That's super cool. Is there any future work planned on this, Clare or Mark? Are you guys planning to evolve this in any way? 
MH: 07:53 We hope so. So one of the things we found actually, is that we ended up having to get this data from the portfolios from the websites of the investors who were part of the GIIN. And what we did was, we began with their investments and then we actually looked at press-releases and then on CrunchBase and AngelList to see who else had invested in those companies as well, to try to determine what other investors were out there that weren't identifying as impact investors, but were making investments in companies that might be considered social enterprises. And that's been a pretty tremendous learning for us, to see  how it is that this network of people is starting to pull in other more traditional investors into some of these companies. Hampton Creek is a company that's a good example of this. It's a company started by a guy named Josh Tetrick, and they make an egg alternative, or a mayonnaise alternative. And they've actually been able to get some - what might be considered impact investing - but then they've also been able to get a tremendous amount of mainstream investors pulled into this company as well. 
MH: 08:57 And the other thing that was really interesting was to see which entrepreneurs had been successful at raising money from within the GIIN ecosystem, so from within self-identified impact investors, and then which companies, like Hampton Creek, had actually had more success raising funds from people who didn't self-identify as impact investors. Now what happened when we tried to move forward with this, is it turns out that, in fact, that this information simply doesn't exist anywhere yet in this kind of form. And so now we're considering how we might move forward knowing that we would actually have to construct this data set ourselves, as opposed to trying to do some work on an existing data set. So we are beginning to do a couple of things. We're beginning to look at other data sets. There is an organization called D-Lab which runs a certification program for entrepreneurs that consider themselves social entrepreneurs. And they have a beautiful data set of about 80 investment funds and some of their investments, and then also the impact metrics associated with those entrepreneurs. And so that may be one project that we take up. Another is actually to try to build up a more robust map of the Austin social innovation landscape, and perhaps even make it open source so that entrepreneurs and investors can enter their information in. But we've sort of got a long way to go and a bunch of projects that we are playing with. We're also - if listeners are interested - we'll be contactable through a website that we'll be putting up in the next couple weeks. It'll be impactanalytics.io. It'll be dedicated specifically to: How do we use some of these analytical methods - like those that Neo4j helps facilitate - to better understand the social impacts sphere? 
RVB: 10:31 Wow, so many plans! Usually when I ask people about their future they're like, "Oh well, we--" [laughter], but you guys have a lot of plans laid out for you. It's very cool. 
CZ: 10:44 We do. For having spoken them out loud, perhaps somebody will have to hold them to us [laughter]. 
RVB: 10:48 That would be very cool. Well, this has been a wonderful conversation. I really enjoyed it, and I wish you guys lots of success with the future work that you are planning. It's been really cool to see some of that work come out that you share with our community. We all really appreciate that. So thank you so much for coming online and talking to me about that. And I'm hoping that we'll meet each other someday at the GraphConnect Conference or something like that [chuckles]. 
CZ: 11:19 Wonderful, thank you so much. 
RVB: 11:21 Thank you, have a nice day. 
CZ: 11:23 Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Thursday, 17 March 2016

The BeerGraphGuide in the Neo4j Browser: the making of!

So last week I showed off this little tidbit that I created with the help of Michael and Oskar - a BeerGraphGuide right into the already fabulous Neo4j Browser.

So obviously I need to explain HOW I created that - and that's what I will do in this blogpost.

Creating a Browser Guide - easy!

Starting point of the exercise is Michael's github repo with all the source material. Download the zip file with all sources and unzip it on your local machine. The instructions online are kind of limited, but in principle the sequence of what we need to do is EASY:

  1. we need to create an Asciidoc file (identical to what we use for our GraphGists) that describes the Neo4j Browser Guide.
  2. we need to convert that Asciidoc file to an HTML file with a very simple shell script (see below).
  3. we then need to figure out a way to LOAD that HTML file into the Neo4j browser, in a :play pane (similar to the built-in guides that we already have).
So let's get cracking with that.

Creating the Asciidoc file

As I showed you in last week's post, I started from an older graphgist that I had lying around - which loaded my favourite Belgian Beergraph straight from Wikipedia into Neo4j. I basically edited and "dumbed down" that gist and made a much simpler document, which you can now find over here on github as well. The trick seems to be that you really have to create very clear sections in your Browser Guide:

  • using 1 "=" for the main headline of the guide
  • then using 2 "==" markers for each slide that you scroll through sideways, and then 
  • within each pane you create further subdivisions with a "===" marker. 
Pretty straightforward!
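To make the three heading levels concrete, here is a small Python sketch that reads an Asciidoc text and reports the guide title, its slides and the subsections within each slide. The sample text and its headings are made up for illustration and are not the actual BeerGraphGuide source, and the outline function is a hypothetical helper, not something from Michael's repo.

```python
# A made-up, minimal guide skeleton using the three heading levels.
SAMPLE_GUIDE = """\
= Belgian Beer Graph Guide

== Loading the data

=== Create the beer nodes

Some text and a query would go here.

== Querying the data

=== Find a beer
"""


def outline(adoc_text):
    """Return (guide_title, [(slide_title, [subsection_titles])]) for an Asciidoc text."""
    title = None
    slides = []
    for line in adoc_text.splitlines():
        if line.startswith("=== "):
            # A subdivision belongs to the most recent slide.
            if slides:
                slides[-1][1].append(line[4:].strip())
        elif line.startswith("== "):
            # Each "==" heading starts a new slide.
            slides.append((line[3:].strip(), []))
        elif line.startswith("= "):
            # The single "=" heading is the headline of the guide.
            title = line[2:].strip()
    return title, slides
```

Running outline(SAMPLE_GUIDE) gives you the headline plus two slides with one subsection each, which is a quick way to check that a guide is sectioned the way you intended before converting it.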

Here's what it looks like:

Next up: converting that into an HTML document.

Creating the HTML page for the Browser Guide

The structure of the HTML is explained in this markdown document, but it's easy to understand from the document itself. But of course, Michael and team have tried to make it really simple to convert the .adoc into html automatically, with a simple shell script. All you need to do is to run the script:

./run.sh ./adoc/beer_graph.adoc

which will put its output in the ./html/beer_graph.html file. It looks like this:

And that will do the trick. As mentioned above, it generates an HTML file in the ./html directory, which in and of itself is pretty easy to understand.


You can find the generated HTML file on github as well, if you would just like to grab it and go from there. You will see that this file uses a "slide"-layout CSS which is then used by the browser to turn it into a slideshow.

Now, if you would just be opening the html file locally into your browser from your filesystem, you could already get a good sense of what is going on. 
Now all we need to do is figure out a way to get that to load into the Neo4j browser.

Download and install the latest and greatest Neo4j

So: next, you need to download the latest and greatest version of Neo4j from over here (I used the 3.0M05 Enterprise version just because... I am an adventurer, you know!). You should install it, make sure it's running and that you can access it on http://localhost:7474 (or whatever address you are using), and then shut it back down.

The thing is, in order for the Browser Guides to be allowed to fetch OTHER content (not the default content that Neo Technology adds), you need to change a property in the Neo4j configuration file at

./conf/neo4j.conf

Specifically, you want to look for a property called

dbms.browser.remote_content_hostname_whitelist=

and set that to "*" as below. This basically does what it says: it allows the browser to fetch other content from "foreign" URLs.


Here's the full config:

#*****************************************************************
# Neo4j Browser configuration
#*****************************************************************

# Whitelist of hosts for the Neo4j Browser to be allowed to fetch content from.
# Set to '*' to allow all hosts.
dbms.browser.remote_content_hostname_whitelist=*

The issue is however that the Neo4j browser is served from http://localhost:7474 and that it does not have access to files on the local file system.  And the browser does not allow you to mix and match local file system content with content coming from http://localhost:7474. See this error message:


So therefore we need to do two things to serve up the HTML page to our Neo4j browser:
  1. We need to find or run a webserver, that will be able to host the HTML page that we created above and serve that to requests from our Neo4j Browser's guide
  2. and on that webserver, we need to enable the Cross-Origin Resource Sharing feature so that the browser would not complain about mixing different sources of content in one web page. 
Not trivial? Well, no - but if I can do it you can do it.
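To give an idea of what those two things amount to, here is a minimal sketch of a static webserver that adds a CORS header to every response, using only the Python standard library. It is not the http.py script from the github repo; the class name CORSRequestHandler and the function make_server are hypothetical, and the wildcard "*" origin is a permissive assumption that suits a local experiment but not a public server.

```python
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler


class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds a permissive CORS header to every response."""

    def end_headers(self):
        # Let pages served from another origin (e.g. http://localhost:7474) fetch these files.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def make_server(directory, port=8001):
    """Build (but do not start) a server for the files in directory on localhost:port."""
    handler = functools.partial(CORSRequestHandler, directory=directory)
    return HTTPServer(("localhost", port), handler)
```

Calling make_server("./html").serve_forever() would then serve the generated guide from the ./html directory on port 8001 until you stop it with Ctrl-C.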

Note: Oskar told me that very soon, Neo4j will be offering a contrib repo on Github where people can create pull requests with their general guides - and which will then automatically be published on guides.neo4j.com. Also: you can host a website on an Amazon S3 webserver that has configurable server headers - so guides would be playable from there too.

Run a CORS webserver to serve the HTML

In the github repo there are different simple ways to add a CORS webserver: there's
  • ./http.py, a simple python script offering up webpages
  • ./http.rb, a Ruby Sinatra application that does the same. You are best starting it by running ./http.sh , as that will also install the Sinatra gem for you.
I am sure there are plenty of other ways of doing the same, but if I just run ./http.sh things automagically start happening.
As you can see above, the webserver is alive and serving pages on port 8001 of my localhost, which I can test really easily by going to the http://localhost:8001/beer_graph.html page. And yey! It works!
This means we should be ready to test our BeerGraphGuide in the Neo4j browser too.

Finally: testing the BeerGraphGuide within the Neo4j Browser

Last step is stupidly easy. All I now need to do is go to my Neo4j browser, and run

:play http://localhost:8001/beer_graph.html

And a new :play pane will appear that interprets and reformats the HTML page above, and shows it in the specific format that we know and love for the Neo4j browser panes.


That is what I call a RESULT! It totally works, and I can think of a zillion different use cases for this mechanism - and I am sure you can do the same.


Hope you guys like this experiment - and I am looking forward to many more in the future!

Cheers

Rik