Bruggen Blog: graphgist

Wednesday, 26 April 2017

GraphConnect Europe 2017 schedule graph

The countdown has begun! Two weeks from now we'll be bringing together the entire European graph community in London again, for the annual GraphConnect conference. Every year, it's something to really live up to: we rally our customers and users to attend, as we really believe in the "power of relationships" that are formed and strengthened at conferences like this.

So of course, we had to pull out the old trick (started at Øredev 2014 actually - so quite some time ago!) of creating a "Conference Schedule Graph" for everyone to explore.

Podcast Interview with Dustin Cote, Luther College

I have said it before, but I am going to say it again: thanks to the wonderful Bryce Merkl Sasaki at Neo4j, I have had the pleasure of talking to sooooooo many wonderful guests on the podcast. These are people that I would otherwise probably never have talked to, for all kinds of reasons. Another one of these folks is my guest on the podcast today: Dustin Cote, an active graph enthusiast in the heartland of the Midwest and a very knowledgeable database engineer at that, who I really enjoyed talking to. Here's our conversation:
Here's the transcript of our conversation:

RVB: 00:02.617 Hello everyone. My name is Rik Van Bruggen from Neo Technology and here I am recording another podcast session for the Neo4j Graphistania podcast. Tonight, I'm joined by someone from the beautiful Midwest in the USA. I've got Dustin Cote from Decorah, Iowa on the Skype call. Hi Dustin. How are you?

DC: 00:25.316 Hi Rik. I'm doing great. How are you?

RVB: 00:27.268 Very, very good. Thank you for joining me. It's always great to have people from different parts of the world on this podcast. I've read some of your work in the communities and in the GraphGist, Dustin, but most of our listeners probably haven't yet. Do you mind introducing yourself to us and telling us who you are, what you do, and what your relationship is with the wonderful world of graphs?

DC: 00:53.085 All right. My name is Dustin Cote. I currently live in the northeast corner of Iowa in a small rural town by choice. I work for Luther College. It's a small, private college and I just started last year. Prior to that, I was working at the University of Wisconsin-Madison and worked there for seven years as a PeopleSoft programmer analyst with emphasis in database design. Before that, I was a data warehouse administrator, and before that, I was doing stuff with databases and before that, I was learning to program in seventh grade. I've been around for a while, well seasoned as they say. That's what I'm working on, mainly because of my experience with ERP systems. Seems like no one wants to work on old technology, so there's always a niche for that.

RVB: 01:45.121 The world would stop turning without old technology, I think.

DC: 01:49.133 It would, believe it or not.

RVB: 01:50.191 It would, I absolutely do. Dustin, what's your relationship with Neo4j and graph databases? How did you get into them?

DC: 01:59.511 Very recently, there was a competition for GraphGist on Neo4j and I thought, "What a great way to finally finish one of my project ideas." It had a deadline and basically, it was the only thing I needed to finish the project. I put together a conference data model, because I was going to a conference and I wanted to know different things from the booklet they gave you, but because it's in a certain order, you can't find out certain things. I knew that a graph database would be the perfect way to query on different angles and different ways of looking at your data. Before that, I would say back in 2008 when one of our companies moved and I was worried about competing against 25 Java developers, unleashed onto the town for jobs, I went back to school. Even though it turned out not to be an issue being reemployed, I continued to go back to school and get my masters. It was then that I was doing a research paper and one of the papers I was doing was to debunk over-hyped technologies. One of them was the Semantic Web.

DC: 03:03.407 I heard of the Web 3.0 and thought to myself that this has been over-hyped. I've heard about it for years and I haven't seen anything about it. So while researching it, I realized that this is exactly what I had been looking for to solve so many of my own database projects. For example, I had some projects where I was saving just sparse data. So, for instance, when you save music, MP3s or songs, it's usually saved as artist, album and song name. Well, if you talk to classical music enthusiasts, they care about orchestras and maybe the conductor, or the original composer. They look at different things, but it's the same item almost, and you still have to store it. You know, by the time I was done designing all the different kinds of genres, there was just no relational database that could handle it well. And when I looked into Semantic Web, I realized that this was the solution. And back then, there was a NoSQL movement. I would say Semantic Web - since it saves as tuples or triples - I believe it's based on graph databases, a little more formal perhaps. And so my natural-- so all of the uses I found for Semantic Web I also find useful for Neo4j. And at the time, when I was choosing which graph database to use, Neo4j was the one that I had heard the most about. It had the best reviews at the time. And so I decided to spend the little time I had with raising two kids to learn a new technology. And so that's why I picked Neo4j.

RVB: 04:40.390 Interesting, yeah. It's very much related technology, and we could talk for longer than what we have in this podcast about the differences between Semantic Web databases and Neo4j, because they are quite different. But [chuckles], we'll take that off-line. But they are different but related technologies, I would say. What did you like about graph databases then? What problems was it solving for you? Why was it such a good fit for some of the things that you had found problematic in other databases?

DC: 05:18.521 One of the things I've always liked is how you can make non-obvious connections. You might have two different graph sets that are unconnected at the moment, but some time in the future if you're still collecting data you might [inaudible] two nodes together, and then suddenly your same query would return different results, and, in fact, maybe solve a problem. I think one of the examples I saw long ago was perhaps besides the NASA and CIA kind of examples, would be different language authors, and they might have a different name in one language than another. You might be following him and you suddenly make the connection and realize that they have all this other body of work. So I like those kind of solutions that the graph database is continuously growing and making connections for you.

RVB: 06:08.085 Yes, like inferring new paths between different parts of the graph. Is that what I'm hearing?

DC: 06:15.632 Well-spoken. Yeah, exactly [laughter].

RVB: 06:19.269 Yeah, well. I mean that is a-- that's one of the examples that I always give. It's like the path finding. I've got these two things and how are they connected to each other, and show me those connections. Whether you're talking about a [BR?] data set or a social network or recommendation engine, that's one of the most powerful use case-- those hidden connections as you call it.

DC: 06:41.392 Right, and with database design, when you're working in a large company, you have to spend so much time, ahead of time, designing things. And then once it's approved, and then once it's implemented, many months or years can go by and once it's done, you're pretty much nail in the coffin, and you really don't want to change it again because you don't know what you might change. With graph databases, sometimes you can add a whole new set of features or properties, and it won't affect the past data you had. It'll actually just enrich the data you have in your new applications that can leverage it. That's very powerful.

RVB: 07:16.441 That's such a powerful point that you're making there, and I think it's also why it's like a perfect storm now for graph databases because of the whole agile development paradigm as well. You know, people don't develop waterfall systems anymore. You know, they try to take a much leaner approach to software development. You don't know what you don't know when you're developing a new system. It's a great fit for that, I think, as well. Dustin, where do you think this is going? Where do you, personally, want to take this? I'll put some of the links to your work on the blog post with this podcast recording, but where do you see it going? What do you want to do with it? Where do you see the industry taking it?

DC: 08:04.811 Where I personally want to take it is, perhaps, more projects that can help categorize interests. I've been an internal description, is my industry. I might be interested in homesteading or something. There are podcasts out there that are in the 2,000 episode range and it would be great to be able to categorize those and find exactly what you're looking for, maybe even the author or the interviewee. Some day, maybe you'll have 2,000 interviews here.

RVB: 08:33.387 Oh my God [chuckles].

DC: 08:35.375 You never know how you can tie all those people together. People bring such great resources from their work. You know, [hack the plans?], it's more like categorize the plan. I think graph databases is good for that. As far as the world's future, I think this is a tool that's generic enough and powerful enough that someone out there that doesn't even know anything about this yet is going to come up with an idea, use your project and platform, and come up with something new. It's just that kind of technology that you're creating and it's really going to be anyone's imagination really, something that we haven't even thought of yet.

RVB: 09:18.040 Like a platform for innovation, basically? Like doing--?

DC: 09:20.356 Absolutely.

RVB: 09:22.665 Super relevant. I really like that perspective. I really do. I think, for example, in the past couple of months, we've seen at Neo4j when some of the Panama Papers research was published and stuff like that. That was for us fantastic validation. That would never have happened ten years ago and now it's happening because of the small contribution that we're making, I guess.

DC: 09:49.082 Absolutely. I like your ramp up time to test out Neo4j with that web interface you have, where you can start importing right away. It's a great tool.

RVB: 09:57.583 Very cool. Very nice. Thank you, Dustin, for spending your time with me on this recording. As you know, I like to keep these things short and snappy and digestible for everyone. We'll put some of the links to your work with the transcription, but for now, I want to thank you very much for coming online. I hope to meet you in person, face-to-face, at some point [chuckles].

DC: 10:21.682 It's my pleasure and keep up the good work.

RVB: 10:23.838 Thank you so much. Bye.

DC: 10:25.481 Bye.

Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Wednesday, 31 August 2016

Podcast Interview with Dirk Vermeylen, HP Enterprise

Last month I had the pleasure of interviewing one of my fellow countrymen, a Belgian graphista living a short distance from my home in Antwerp, who had submitted a very cool graphgist to our challenge earlier this year. Dirk Vermeylen did a great gist about trying to model sports results as a graph database - very interesting, so I will let him explain it to you himself. Here's the recording of our conversation:
Here's the transcript of our conversation:

RVB: 00:02.538 Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology and I am here again recording a podcast episode with someone in my own country. That's actually very, very-- I don't think that's happened before. Dirk Vermeylen, from HP Enterprise is joining me from a couple of miles down the road on our phone call. Dirk, welcome.

DV: 00:23.708 Thank you, Rik. Thanks.

RVB: 00:25.290 Cheers. Thanks for coming online. Dirk, we got to know each other because of some GraphGist that you created recently which we'll talk about more a little bit later, but why don't you introduce yourself to our listeners first of all?

DV: 00:41.285 I am working for HPE, used to work for EDS. So that was taken over by HP then became HPE and there is some changes in the future as well. I work as a consultant in the service delivery and infrastructure environment.

RVB: 01:04.137 Okay and then what's your relationship to the wonderful world of graphs? How do you get into that if I may ask?

DV: 01:13.628 It is more or less by accident. I am-- as part of service delivery and a lot of work is done on configuration databases on CMDBs, where what we need to do is migrate CMDBs and varying CMDBs on for example, what is important is you want to know what service are implemented in specific data centers and what applications are related to it. There are many, many relations in between like server databases that are installed and instances at business services that are configured to it, so you only have all sets of relationship. We need to work with it and I was looking into better ways on how we can manage the data on it. Then Neo4j was mentioned when reading the internet, so at some point in time I thought I may give this a try. I did it and it worked in fact, [?] though.

RVB: 02:18.972 I think there's a lot of people out there that are using Neo4j for configuration management databases because of the relationships and review and the impact analysis that allows you to do. So, if something happens in your configuration, what's the impact on the rest of the configuration. Is that also your case or was that your background as well?

DV: 02:41.301 Exactly. Exactly. And the advantage is that when you do queries, actually Neo4j allows you to document very well: the Cypher way of working is a very nice way of documentation. It reads easier than complex SQL queries with joins and all of that, where it's a bit more tricky to understand what you have been doing a couple of days later.

RVB: 03:05.779 Absolutely. I have so many people that talked to us about that. That's great to hear. But you ended up writing that GraphGist for something completely different, right [chuckles]? What was that all about? It was about your running partners, right?

DV: 03:21.132 That's right. I was working on CMDBs also a little bit in my free time. At some point you have done and you're ready with it. At that time the GraphGist challenge appeared, and it says: try to do this problem; if you can do it on a whiteboard, it will work. In my free time I tried to collect information about our running competition which is just like we go running with a couple of friends. And the first at the arrival in the race gets 50 points, second 45, third 40 points, and so on. And then we just add them all up together and at the end of the year you have a winner for motivation. We keep track of these points in an Excel spreadsheet. I want to automate it because automation is fun. And I tried to make a big draw. The problem-- and it worked very well at the same time as the GraphGist challenge, so I thought probably I should spend a little bit more time and work on this GraphGist challenge.

RVB: 04:33.723 Super cool. We'll put a link to your actual GraphGist on the blog post with this episode as well. What was so nice about it? What was it that made the GraphGist and Neo4j such a good fit for that particular running assignment that you wanted to solve?

DV: 04:53.803 GraphGist was easy to use, and you need the AsciiDoctor which is like-- you take notes-- part plain text and then with very little markup or markdown, as it was called, you can specify a query. And see the results actually directly in the GraphGist. And Neo4j is very visual so you create-- with one query, you see a little bit of the dots already and you say "Ah, I need a second query." And from the second query you go to the third query. So, it's very visual. It allows you to very quickly progress in your problem fields.

RVB: 05:36.505 Very, very nice. I bet your running buddies were happy with the result [chuckles].

DV: 05:44.441 I didn't show it to them already. Part of the thing is that I used test results which means the races are correct, the people are correct, but the sequence of arrival is not correct in the GraphGist. So I'm not sure if they will be so happy with me what I assumed there.

RVB: 06:05.157 You made yourself win every time [laughter].

DV: 06:08.787 No. Not me. Someone else. But it's-- really it's test data.

RVB: 06:13.784 I understand. Very cool. All right. So what are you going to use this for in the future, Dirk? Do you have any more professional or personal plans with this? Or where do you want to take this in the future?

DV: 06:26.347 And part of the GraphGist as well was to get more experience with Cypher, and I used it for CMDB. The challenge with configuration management databases is that you have like 20,000 objects and 18,000 relations. So if you launch a query, it may or it may not end up with the result that you are expecting because of lack of experience. With my very small example, it's a lot easier to understand what you are doing. And where I want to take it in future is configuration management databases, of course. Also, on open data-- I'm more and more involved in open data. And open data is all about linked data. And with linked data, you are very close to Neo4j and these graphical relations again.

RVB: 07:19.085 Super. Well, I mean, I think we will meet again there. Because that's exactly the type of stuff that I've been working on in the meetups here in Belgium as well. And I'm sure we'll have a chat about this in the future then. Very cool. Anything else you want to add, Dirk? Is there anything that we should be paying attention to in your work? Or otherwise, I think we'll keep this podcast recording short.

DV: 07:45.950 That's perfect. I'm happy with what Neo4j was doing already. I was working on Neo4j version two. And py2neo is very important. The py2neo library. I had a little bit of issues that I've seen-- that the new version of Neo4j and the new version of py2neo solve them all. So I'm looking forward to play around with these.

RVB: 08:10.485 Well, it's summer time. You know, this is perfect time to experiment with those [crosstalk].

DV: 08:16.662 Exactly.

RVB: 08:16.236 All right, Dirk. Thank you so much for sharing your work with us and with the entire community. I really appreciate it. And thank you for coming online to do this recording. And I'm sure we'll meet very soon.

DV: 08:28.730 Thanks, Rik. Thanks. It was nice doing this.

RVB: 08:30.685 Thanks a lot. Bye.

DV: 08:31.662 Thanks. Bye.

Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Friday, 11 March 2016

The Neo4j Browser has a BeerGraphGuide

I have written quite a bit about Beer on this blog - and that's a good thing. I have also done a bunch of graphgists about it, and I am sure I will keep on doing that for a while to come. But recently, I have learnt about some new functionality in the already awesome Neo4j Browser that allows you to extend it with custom GUIDES - the learning tool that is built straight into the platform. And so I had to take it for a spin, and get my Belgian Beergraph to become part and parcel of the Browser - of course.

Here's a little video I made capturing what this looks like.

I am planning to come back to this in the next few weeks, and explain the details of how this works. But for now, I think this opens up a world of possibilities on how people will be sharing and working on graphs with Neo4j and its Browser.

Cheers

Rik

Thursday, 3 March 2016

The Neo4j Knowledge Graph

A couple of days ago, I wrote a graphgist about creating a true Knowledge Graph for the Neo4j ecosystem, based on the fantastic Awesome Neo4j resource created by our friends at Neueda/Neueda4j. You can access it in a separate window over here.

In this post however, I will go into a bit more detail about how I went about creating that graph.

Google Spreadsheet is my friend

I mentioned already that I started from the awesome Awesome Neo4j github resource. And while it's a great idea to manage pages etc collaboratively on Github, I can't help but feel like there should be other and nicer ways of structuring that information. So I spent a couple of hours converting that information into a spreadsheet (which is publicly accessible over here):

This sheet contains

info about the resource (name and comments)
the URL where you can find the resource
info about the author (individual or organisation) that created/manages the resource

So it's a very, very easy graph model:
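As a rough sketch of that model in Cypher: the labels (Author, Resource, Tag) and the properties (name, url, comments) are the ones used in the queries later in this post, while the relationship types AUTHORED and TAGGED and all of the values are placeholders that I am assuming purely for illustration.

```cypher
// Hypothetical sample data: one author, one resource, one tag.
// Labels and property names follow the queries in this post;
// the relationship types AUTHORED and TAGGED are assumptions.
CREATE (a:Author {name: "Example Author"})
CREATE (r:Resource {name: "Example Resource",
                    url: "http://example.com/resource",
                    comments: "Placeholder description"})
CREATE (t:Tag {name: "blog"})
CREATE (a)-[:AUTHORED]->(r)
CREATE (r)-[:TAGGED]->(t)
RETURN a, r, t;
```

Because the queries in this post match on undirected, untyped relationships (`--`), they would work against a model like this whatever the relationship types are actually called.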

So all I needed to do was import that sheet into Neo4j. Easy...

Importing the Google Spreadsheet with Load CSV

As we know by now, it's really easy to download a Google spreadsheet as a CSV file, and then it is pretty darn easy to import that CSV into Neo4j with Load CSV. I have two versions of that load script:
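For readers who have not used Load CSV before, a minimal sketch of such an import could look like the following. It is not one of the two scripts mentioned above: the file name, the column headers (name, url, comments, author, tag) and the relationship types are all assumptions made for illustration.

```cypher
// Assumes a local CSV export named knowledgegraph.csv (hypothetical) with a
// header row containing: name, url, comments, author, tag (hypothetical names).
LOAD CSV WITH HEADERS FROM "file:///knowledgegraph.csv" AS row
// Skip rows with empty key cells, since MERGE cannot match on a null property
WITH row
WHERE row.name IS NOT NULL AND row.author IS NOT NULL AND row.tag IS NOT NULL
MERGE (r:Resource {name: row.name})
  ON CREATE SET r.url = row.url, r.comments = row.comments
MERGE (a:Author {name: row.author})
MERGE (t:Tag {name: row.tag})
// AUTHORED and TAGGED are assumed relationship types
MERGE (a)-[:AUTHORED]->(r)
MERGE (r)-[:TAGGED]->(t);
```

Using MERGE rather than CREATE means the script can be re-run without duplicating authors, resources or tags that appear on more than one row.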

The result is not a very big graph of course:

And now we can do some nice querying on it - just for fun!

Querying the Neo4j KnowledgeGraph

Obviously there are many different queries that we could run on an interesting graph like this. I have put a couple of them on Github as well. Here they are:

//Find some Authors, Resources and Tags
MATCH p = ((a:Author)--(r:Resource)--(t:Tag))
RETURN p
LIMIT 25

Gives you an initial sample of the graph:

Then we can explore a couple of specific graph neighborhoods:

//Find some Authors, Resources and Tags connected to Rik or Max
MATCH (t:Tag)--(r:Resource)--(a:Author)
WHERE a.name CONTAINS "Rik" OR a.name CONTAINS "Max"
RETURN t,r,a

This gets us this one:

And then we can also "recreate" a spreadsheet-like view of the graph:

//find some resources and authors
MATCH (r:Resource)--(a:Author)
WHERE a.name CONTAINS "Rik" OR a.name CONTAINS "Max"
RETURN DISTINCT a.name AS Author, r.name AS Resource, r.url AS URL, r.comments AS Description
ORDER BY Author;

This gets us (pity that the URLs don't get hyperlinked like they do on the graphgist):

And then finally, let's look at some pathfinding - always interesting:

//find some paths between books and blogs
MATCH (t1:Tag {name:"book"}), (t2:Tag {name:"blog"}),
p = allShortestPaths( (t1)-[*]-(t2) )
RETURN p
LIMIT 10

As usual, we end up with Michael Hunger again :))
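One caveat with the pathfinding query: `[*]` puts no upper bound on the path length. On a graph this small that is fine, but on larger graphs it is usually worth capping it. A sketch, assuming a cap of 6 hops (an arbitrary value chosen for illustration):

```cypher
// Same pathfinding query, with the variable-length pattern capped at 6 hops (assumed value)
MATCH (t1:Tag {name:"book"}), (t2:Tag {name:"blog"}),
      p = allShortestPaths( (t1)-[*..6]-(t2) )
RETURN p
LIMIT 10;
```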

So there you go. A first attempt at creating another graph-based knowledge repository for all things Neo4j. Hope you guys enjoyed that. I know I did :))

Cheers

Rik

Friday, 20 November 2015

GraphGist About the Graphistania Podcast

Here's an interactive graphgist that I built about the podcast that we publish on this blog from time to time. Hope you enjoy it!

Cheers

Rik

Tuesday, 21 January 2014

The Open Source Licensing Graph

Selling an open source product like Neo4j, always gets you into the interesting question around Licensing. How do you license your product? And then I get into a very interesting explanation on how the different versions of Neo4j compare in terms of features, license, support capability, and of course pricing.
