Bruggen Blog: March 2015

Monday, 30 March 2015

Podcast interview with Matt Wright, Stitched.io

Last year, one of our lovely Neo4j users, Matt Wright, did a great talk at the London Neo4j Meetup about Private Social Networks. Super interesting talk, so when I saw this

@rvanbruggen If you need another podcast speaker….let me know…I’ll come along and talk Social Networking Theory and neo4j if you like?
— MrMattWright (@MrMattWright) March 16, 2015

I did not have to think twice. Matt and I had a great conversation about stitched.io, and how they use Neo4j in anger. Super cool:

Here's the transcript of our conversation:

RVB: Hello, everyone. My name is Rik, Rik Van Bruggen from Neo Technology, and here we are again recording another podcast session for our Graph Database podcast. I'm joined today by Matt Wright from Stitched.io. Matt, welcome.

MW: Hi.

RVB: Hi. Matt, like everyone on the podcast, I just want to ask you who you are and what's your relationship to Neo Technology. Tell us a little bit about yourself.

MW: Okay. So, I'm Matt Wright. I'm the CTO at Stitched.io. And we're essentially-- we've got a sort of grand mission, I guess, which is to try and help build better teams for every company in the world. I guess that's our little elevator pitch. And at the moment we are sort of focused on recruitment and we've built a product and we use Neo4j to build a CRM system basically. So our CRM system is built entirely on Neo4j, and then we use that for various things.

RVB: Wow, that's great. How long have you been working on that?

MW: I guess last year we were a science project and now we're an actual start-up [chuckles]. We've just sort of done a bunch of seed funding, so we're kind of going to market in the next couple of months and kind of taking it from there.

RVB: Wow, that's cool.

MW: It's been going since probably about the middle of last year in seriousness.

RVB: Cool. Well, everyone on the podcast gets a couple of questions here. And the first one that I really want to ask you is what attracted you to Graph Databases? What do you love about Graph Databases? Why is it the best thing since sliced bread?

MW: This is an easy one to answer for us. Initially we were attracted by-- my co-founder is a guy that owns a recruitment company and we started working on a problem that is kind of common for recruitment companies, which is they have this big body of contacts, very much like LinkedIn but sort of their private world, and they wanted to kind of model that and make use of it. So, we did a bunch of research and we found Neo4j, and it's like, "Hey, we can build a whole social network out of this thing." So, we started there and we've kind of done that. And then we moved on from there. It's like, "Well, how can we use this thing?" We built recommendation engines on Neo4j. All of our stuff is in Java. And so, we built recommendation engines saying, "Hey, who knows who?"- you know, the typical social networking stuff. We've also built an ontology in Neo4j to sort of say, "Hey, when you search for AngularJS, you should also search for Backbone and Ember and React, and these other things." We've tied that all together and built a big search engine.

RVB: Wow.

MW: Our search runs all of that stuff in just under a couple hundred milliseconds and there's absolutely no chance we could do that without Neo4j.

RVB: Wow. Did you look at anything else? Did you look at other types of databases before you started?

MW: Well, we started-- our first attempt was a triple store on PostgreSQL and it was horrifically slow. So, I think we probably started with Neo4j, and kind of evolved our way to our product. Like, we're very sort of graph driven, and our whole thing is about connectedness, and possible connectedness. It's a sort of-- I think, in terms of recommendation engines, you can't really find something that would do what we want, that will be this quick.

RVB: Fantastic, so how important is the real-time aspect of the recommendation? Is that a big part of the solution then?

MW: It's massive. A lot of our users are recruitment guys, and they're all rushed for time, and also they're on the phone constantly with people. So, if you can say, "Oh, you've done this skill. Oh, have you used this skill, and this skill, and this skill?" And you can use a recommendation engine to do that in real time. That's really, really valuable.

RVB: Oh, wow, fantastic. Cool. It sounds like a fantastic project. So, people can take a look at it at Stitched.io, right?

MW: Yep, you can go and look at our Superman guy and yeah, if anybody wants to-- actually we have a blog as well which goes through a lot of technical detail on how we do all the things. We have a sort of very heavy Angular front end so there's a good blog post on that. And there'll be a few more coming out and about the Java stack and some sort of social networking theory as well. That's like Stitched.io/blog and you'll get there.

RVB: Yeah, fantastic, and I think you guys did a talk at the London meet-up last year as well, didn't you?

MW: Yeah, that's right. I'm probably due to give you guys another one because we've definitely got some updated bling to show everybody [chuckles].

RVB: Fantastic, okay, we'll do that. Wrapping up here with the same question that I ask everyone, where do you think this is going? Where is Stitched.io going? Where are graph databases going? What do you want from graph databases in the future? Could you give us your perspective?

MW: Yeah, sure and I have a sort of a related question back actually, have you read that book Connected - the orange one?

RVB: Absolutely, I have, yeah.

MW: It's like the best book ever.

RVB: It's fantastic.

MW: Outside of our stuff, the graph databases can be applied to a billion different things. There's some fantastic stuff in there about the spread of disease and the spread of obesity, the internet of things and controlling population growth and all the stuff is like--

RVB: Lots of great examples, right?

MW: There's a start-up every two pages.

RVB: It's fantastic, yeah [chuckles].

MW: And I think one of the other things is starting to measure sentiments and the way that emotion travels through and all of this stuff we just haven't touched the sides of. And there's some excellent tech stuff coming out from Kenny Bastani that's a very, very good bit of tech - Mazerunner - and we're gonna have a look at that which is sort of the off graph analytics.

RVB: That's the integration with Spark, right?

MW: Yeah, exactly. That's pretty exciting. But in general, I think there's a world of applications out there that graphs can be used for.

RVB: Fantastic. Cool. Matt, we're going to wrap up here. Thanks a lot for taking the time to talk to me and--

MW: No worries.

RVB: wish you guys a lot of luck and success with Stitched.io.

MW: Thanks very much.

RVB: And I'm sure we'll meet each other, for example, at GraphConnect, right?

MW: Yeah, for sure.

RVB: Absolutely. Thank you, Matt.

MW: Okay. See you.

RVB: Bye.

MW: Bye.

Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Friday, 27 March 2015

Podcast Interview with Dirk Mahler, JQAssistant

Had a great conversation with Dirk Mahler. Dirk works for Buschmais, and is one of the leading contributors to jQAssistant, a software quality analysis tool based on Neo4j. Find it on Github, and read up on the documentation over here.

Here's the transcript of our conversation:

RVB: Hello, everyone. My name is Rik, Rik van Bruggen. I work for Neo Technology, and I'm here again to record one of our graph database podcasts. I'm here in cloudy Antwerp on the day of the eclipse, and I'm joined here today by Dirk Mahler who has been a long time graphista. Dirk, would you mind introducing yourself a little bit to us?

DM: Hello. As Rik mentioned, my name is Dirk Mahler. I'm coming from Germany, working for a small company called Buschmais. What our company actually is doing is consultancy mainly for Java Enterprise projects, and I started working with Neo4j and graph databases two years ago.

RVB: Interesting. And how did you get to Neo? How did you find Neo? How did you find out about Neo?

DM: There's a guy called Michael Hunger who is a very old friend of mine and who actually works for Neo Technology.

RVB: Yes, Michael Hunger, yes.

DM: And for weeks he was telling me, "Oh, Dirk, please, Dirk, try out our product. It's a really cool thing." I always resisted, and one fine day, he achieved what he wanted and we went together to an introductory training in Berlin, which he gave. After that training I was so surprised at how easy data modeling works and especially how you can create those data using Cypher that I just started right on the way back on the train implementing my first things with Neo4j.

RVB: Oh, wow, and as I understand it, you've been developing a product or an open source project around it, right?

DM: Yes. On the way to Berlin, Michael showed me a prototype of what he did, and this was a scanner for Java structures, Java classes. He gave me some ideas what one could do with that data. From my work in larger projects with a lot of people involved, I always had the problem that it was hard to get some rules or conventions established, like naming rules for Java classes or packages - or Maven modules, for all the Java guys who are listening right now. And when I went back on the train I had the idea, "Cool, it might be the right thing to put those software structures on to a Java project and work them into a database, create trees on that data, and to enforce rules - like, several classes of several types must be located in dedicated packages, or something like that."

RVB: What's the project called? Can you tell us a little bit more about it?

DM: The project is called jQAssistant - Java Quality Assistant. The term Java might be a bit misleading right now as it is for Neo4j - because it's not only for Java - but it's implemented actually in Java.

RVB: Interesting. Let's talk about something else - related, of course. What do you really like about graph databases or why is it so powerful for you? Can you tell us a little bit more about that?

DM: There are mainly three things. Let's start with performance. Neo4j especially is quite fast, and reading all the structures into this database, it's just a snap and you've got the data in the database, but that's only one aspect. The other one - data modeling is quite intuitive and flexible. Let me explain that. If you read the structures of a software project into a database, you're not only interested in getting all the Java stuff, you might also be interested in getting all the stuff which is around it, like the build system, like the database structures you're working with. And with the schemaless nature of the graph database, it's quite easily possible to get these things in without having to think about how do I change my schema, how can I add things, and this enables, for instance, a plug-in-like architecture, as jQAssistant right now provides. That's the second thing, and the third thing, it's simply Cypher. This query language is so nice to read, to write, quite intuitive and I think, even for developers, quite easy to learn. I like it, really.

RVB: Fantastic. I'm talking to Andres Taylor who invented Cypher in the next couple of podcasts, so I'll make sure he knows. That's great, fantastic. So, Dirk, maybe one more thing. Where do you think this is going? Where do you see your open source product going and where do you see graph databases going? Would you mind sharing your perspective?

DM: For jQAssistant itself, I see two things. The first one is giving people a tool that can read in the structures of their own software projects and doing some explorations - how are things connected to each other? - creating, gathering their own metrics where they don't need special tools which need to be implemented by some vendor party to get their own things. That's the one thing. Beyond jQAssistant, for graph databases themselves, what I learned is that it's quite easy to model your business domain with graphs. A business domain usually is not just focused on one or two tiny little things, but there are always things coming from the outside which might be correlated together with the things you are actually looking at. It's that way of collecting data from different sources together, correlating them, matching them together, and then being able to ask questions. What I see currently right now in the projects I'm working in is that people try to put data in a database which actually does not fit their needs, and what I see is data modeling is quite easy with graph databases and choosing the right database for complex business domain. Most of the business domains out there are complex. It's even easier with a graph database. I hope it will be the first choice for modeling a business domain in some years.

RVB: Fantastic. Thank you so much, Dirk. We are going to wrap up here. If people want to know more about jQAssistant they can go to jqassistant.org, I believe, right?

DM: That's correct.

RVB: If you want to know more about Neo4j, there's only one site. That's neo4j.com and, obviously, you can always reach out to us from the podcast or the website. Thank you so much, Dirk, for doing this with us and look forward to speaking to you again.

DM: Thank you, Rik.

RVB: Bye.

Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Graph Karaoke: Dancing in the Dark

After having done the "Ignite" talk yesterday at the Data Innovation Summit, The Boss kept ringing in my head. What better way to keep it there if not for some Graph Karaoke. Here it goes:

Hope you liked it.

Cheers

Rik

Thursday, 26 March 2015

Data Innovation Survey for Belgium - in Neo4j

Today, I have had loads of fun at the Data Innovation Summit in Brussels, Belgium. Hosted in the beautiful Axa Belgium offices, it was a great opportunity to meet 500 (!!) data-minded professionals. I was also able to do an Ignite Talk there, which was quite an experience. 15 seconds for every slide, and no way for you to change the slides yourself and determine the "rhythm" - very different. Here are the slides:

But that was not the coolest thing. They also did a "Data Innovation Survey", which was super cool. The data is all open (find it in this gist), and I of course took it from Excel,

created a graph MODEL out of it,

and then loaded it into Neo4j using this load script. You will need to tweak the load csv file locations, but after that: just download Neo4j 2.2, fire up the Neo4j-shell, and paste all the commands into it. Should be a matter of half a minute to load the data.
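
For those who have not used LOAD CSV before, here is a minimal sketch of what one statement in such a load script could look like. The file location, the column names and the HAS_DEGREE_LEVEL relationship type are made up for illustration - the actual load script may well look different.

```cypher
// hypothetical file location and column names - adjust to your own export
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///tmp/respondents.csv" AS line
// MERGE avoids creating the same node twice when a value repeats
MERGE (r:Respondent {name: line.respondent})
MERGE (dl:DegreeLevel {name: line.degree_level})
// hypothetical relationship type, not taken from the actual load script
MERGE (r)-[:HAS_DEGREE_LEVEL]->(dl);
```

The file location in the FROM clause is the part you would need to tweak for your own machine.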

Then we have the data in Neo4j, and we can start doing some queries. Now, I must admit that I am not a huge fan of working the data this way - as there are very few intricate relationships that we can use meaningfully. Nevertheless, here are a few queries:

 //respondents and techniques with PhDs  
 MATCH (dl:DegreeLevel {name:"PhD"})--(r:Respondent)--(t:Technique)  
 return dl,r,t

That's easy:

Let's make it a bit more sophisticated:

 //respondents and techniques at level 5 with PhDs and their DegreeFields  
 MATCH (dl:DegreeLevel {name:"PhD"})--(r:Respondent)-[ht:HAS_TECHNIQUE {level:'5'}]-(t:Technique),  
 (r)--(df:DegreeField)  
 return dl,r,t,df  
 limit 10

You can see how that would make the visualisation a bit more complicated.
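
If the picture gets too busy, the same pattern can also be returned as a table. This is only a sketch: it assumes that the technique nodes carry a "name" property, and it leaves out the label on the technique node so that it does not depend on the exact label name.

```cypher
//level 5 techniques among PhDs, as a table instead of a graph
MATCH (dl:DegreeLevel {name:"PhD"})--(r:Respondent)-[ht:HAS_TECHNIQUE {level:'5'}]-(t)
RETURN t.name AS technique, count(DISTINCT r) AS phd_respondents
ORDER BY phd_respondents DESC
LIMIT 10;
```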

And then finally, here is a first attempt at doing something a bit more "graphy". Let's see which "DegreeFields" are the most important in our graph. In other words - the most "Between" the other nodes of the graph. We do that with a query like this:

 //betweenness centrality of the "DegreeFields"  
 MATCH p=allShortestPaths((r1:Respondent)-[*]-(r2:Respondent))  
  WHERE id(r1) < id(r2) and length(p) > 1  
  UNWIND nodes(p)[1..-1] as n  
  WITH n, count(*) as betweenness, labels(n) as labels  
  WHERE "DegreeField" in labels  
  RETURN n.name, betweenness  
  order by betweenness desc;

and then we see this result:

There's a lot of importance to Science/Mathematics, ICT and Engineering. Who would have thought?
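
Calculating all shortest paths between all pairs of respondents can get expensive on bigger graphs. A much cheaper sanity check is to simply count how many respondents are connected to each "DegreeField". A minimal sketch, assuming the same labels and the same "name" property as in the queries above:

```cypher
//number of respondents connected to each "DegreeField"
MATCH (df:DegreeField)--(r:Respondent)
RETURN df.name, count(DISTINCT r) AS respondents
ORDER BY respondents DESC;
```

This measures degree rather than betweenness, so the ranking does not have to match the one above.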

You can of course apply these techniques much more generically to other problems, and that is mostly why I share it here. I hope others find it interesting, and as always...

... Feedback welcome!

Cheers

Rik

Wednesday, 25 March 2015

5 big small things I love about Neo4j 2.2

Today, Neo4j 2.2 was released. It was a good day - and everyone at Neo was really excited. This was a big release. It was not an easy one, as we redid large parts of the plumbing for Neo4j - while still providing customers a smooth transition path. It's a pretty spectacular piece of software engineering, if you ask me - and it feels good to get that out the door. I know that I am very excited about it - and I am sure that our customers and users will be too.

That's why I thought I would list the things that I really love about Neo4j 2.2 - from a user's perspective. Sure the upgraded plumbing is fantastic - but what will users SEE when they get started with the new version? And what parts of that may or may not be as easy to uncover at first glance? Let's take a look.

1. Getting started by :play-ing

A big focus for Neo as a product and solution, has been to make it easy and straightforward to get started with. That's important, because as my friend and colleague Mark pointed out several times already, there is a learning curve that our users go through. Mark very cleverly noted that graphistas go through a phase where the learning curve is very steep - and everything seems really hard then. After that, things quickly get better and easier - and gradually people tend to fall deeply in love with the technology.

That's why it's so important to make that initial phase as nice and easy as possible for users. And that's why I am totally digging the :play command in the browser these days. By :play-ing (totally in line with the "playful learning" that you can apply to your kids), our users will be learning and getting into the technology as gently as possible. Here are a few examples:

:play query template

This command gives you access to some really nice and easy ways to input some data into Neo4j. It's really sweet. Take it for a spin and input some straightforward data - it's fun!

:play northwind graph

Another super-cool command, taking you through an example that most people know - the Microsoft Northwind relational database - and "migrating" that to a graph model, step by step. Love it!

:play sysinfo

Maybe a bit more advanced, but this is the first time that the Neo4j browser - which has traditionally been positioned as the developer's Cypher playground - also gets some admin-style capabilities. No doubt this will be expanded in the future, but already this pane is really useful.

2. Query pane enhancements FTW!

In the new Neo4j browser, the query pane that displays the results of your queries, has been completely redesigned. Some of these features are minor cosmetic changes, but some are actually quite massive:

The pane now allows you to cancel running Cypher queries. This is massive. With the simple click of a button on the X marker, you can now tell the Neo4j server that it should cancel/rollback the transaction that you just launched. Tremendously important for many users - previously you would sometimes just need to kill the server when transactions would spin out of control.
Exporting data from the browser: it was always possible to export the view in the "graph view" to json files. But now you can do so much more. Especially the export to .png and .svg is useful for those of us that want to share query results with their colleagues. I love that.
At the bottom of the Graph pane you now have this little slider: do you want to Auto-complete your graph, yes or no. This can be really, really useful. Auto-complete basically allows you to switch the browser from either showing ALL of the relationships between the subgraph in the resultset, or only showing those that you explicitly included in your "RETURN" statement. Super useful.

3. Additional Browser :commands

There are two Browser commands that you may or may not yet know about, and I think they are really useful:

:style

This gives you access to the "grass" file that is applied to the browser. This GRAph Style Sheet will allow you to configure the visualization in detail. Normally you edit this file by making visualization choices in the query pane - but you can also download the file, edit it in your favourite text editor, and re-load it by dropping it on the button below. Really easy.

:config

This one is interesting, as it gives you fine grained control on how the Browser web application (you should look at the "Browser" as an application that talks to the Neo4j REST API) behaves. You can see some of these parameters below:

And: you can edit some of these parameters too! For example, let's say I want to edit the maximum number of rows returned by the Browser. Then you do:

:config maxRows:1000

and that parameter will be updated. Or let's say you want to edit the maximum number of "neighbour" nodes displayed when you double-click a node in a pane:

:config maxNeighbours:1000

You obviously need to be careful with these parameters, as they can very easily screw with your browser stability - but still very useful.

4. Editing the GRASS file: Composite captions rule!

I already mentioned the GRASS file above, but I wanted to mention one of these things that I think will really enhance the Browser experience: the ability to create composite captions on the visualisations. What are these? Well let's say that I have a subgraph with nodes that are labeled as :Respondent, and that they have a "name" property that holds an identifier. Well, if I want to create a visualisation that displays a Node caption that has the word "Respondent"+the "name" property in it, I can now do that by including

caption: 'Identifier {name}';

instead of

caption: '{name}';

in the GRASS file. See the result below.

Or, let's say that I would have TWO properties on these kinds of nodes: "name" and "age". Well, then with the composite captions I can create a caption that combines both properties in one Node caption:

caption: '{name} aged {age}';

Cool, huh? Really helpful, I think.
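
To show where that line would live: as far as I know, the GRASS file is organised in CSS-like blocks, with a block per node label. A minimal sketch, assuming the :Respondent label from above and leaving out all other style properties:

```
node.Respondent {
  caption: '{name} aged {age}';
}
```

Any other properties that you find in the block of your downloaded file can stay as they are - only the caption line changes.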

5. Query plans VISUALIZED!

As you may have learned, Neo4j now has a superfast new "cost based" query planner for Cypher. That's super interesting, and has yielded massive performance enhancements over previous Neo4j Cypher planners - which were all rule based. What you may not know yet, is that the new Neo4j 2.2 Browser includes a fantastic way for users to understand and tune their queries, by visually showing you what happens to a query as it gets interpreted and executed. There are two keywords that you want to remember to prefix your queries with:

EXPLAIN will show you the query plan, but will not execute the query. Particularly helpful for "heavy" operations.
PROFILE will show you the plan and also execute it - giving you a real world sense of the execution speed on a running database.

The basic rule here is of course that you want to keep the "database hits" as low as possible, and that you particularly want to avoid joins/products of large sets. That's what kills performance on an RDBMS, and it can still kill performance in Neo4j too.
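
Here is a minimal sketch of how the two keywords are used. It assumes a graph with :Respondent nodes that have a "name" property, and the identifier value is just a placeholder. Run the statements one at a time in the Browser.

```cypher
// EXPLAIN: only shows the plan, nothing gets executed.
// Two disconnected patterns like these give you a product of both sets.
EXPLAIN
MATCH (a:Respondent), (b:Respondent)
RETURN count(*);

// PROFILE: executes the query and shows the plan with the database hits.
PROFILE
MATCH (r:Respondent {name: "some-identifier"})
RETURN r;
```

The first statement is the kind of query that you would rather EXPLAIN before you run it on a large dataset.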

So there you go. 5 small big things, or big small things that I think will really matter for our users as they start playing with Neo4j 2.2. I have really enjoyed it so far - I am hoping that you guys will too!

Hope that was useful.

Cheers

Rik

What do Linkurio.us, the ICIJ and Swissleaks have in common?

That would be Neo4j.

Here's a wonderful interview that I did with Jean Villedieu, one of the founding fathers of Linkurio.us. Another great conversation, in which I learned how Linkurio.us helped the International Consortium of Investigative Journalists (ICIJ) uncover the most important secrets of one of the world's largest banking corporations. Super fascinating stuff. Listen here:

Here's the transcript of our conversation:

RVB: Hello everyone. My name is Rik and here we are again, recording another episode of our Neo4j graph database podcast. And today, we're doing another remote session with Jean Villedieu from Linkurious. Jean is in France today. I'm here in cloudy Antwerp. Welcome Jean.

JV: Hello Rik. Thank you very much for inviting me to the podcast.

RVB: Absolutely. Yeah. It's great. So Jean, like always, we have three parts in this podcast and I'd really like you to introduce yourself a little bit to our listeners.

JV: Sure. My name is Jean Villedieu and I am one of the co-founders of Linkurious. At Linkurious, I'm in charge of sales and marketing and Linkurious is a French startup, based in Paris. Founded three years ago and we specialize in graph visualization. What that means is that we have software that runs on top of Neo4j and that provides Neo4j users a nice interface with which they can browse their data, explore it visually and extract information that's within their graph.

RVB: So Linkurious is a product that people can buy, right?

JV: Exactly. You can visit the website at http://linkurio.us where you can find more information about our products, and you can buy and download it.

RVB: But it's also an open source part as I understand it, right? There's a part of it that's open source?

JV: Actually, it's a good question. We recently released Linkurious.js, a graph visualization library, and it has a dual license with an open source license and a commercial one. You can find more information about that on GitHub and on a website which is actually going to be updated in a few days or weeks.

RVB: Super. Well, like any open source tool, it's great that you can explore it and have the community look at the internals but at the same time, there needs to be a business model for it to survive right? So, we're on the same page there.

RVB: Jean, really the next question for me is, what attracted you guys to graphs? Why do you love graphs so much and how did you get into it? Could you give us a little bit more insight there?

JV: Sure. So, I co-founded Linkurious with Sebastien Heymann. I'm going to talk a little bit about him and then I'll talk about myself. Sebastien has a long love story with graphs. I think he started a project called Gephi, about six or seven years ago. It's an open source graph visualization program.

RVB: That's very well-known, Gephi. Everyone uses Gephi.

JV: Yeah, and I think he's been in love with graphs ever since. For me it has been more recent I'd say. During my studies I started using Gephi for competitive analysis reasons. I really loved it because I thought it was powerful, it was a completely new way for me to understand data, and at the same time it was very beautiful. It was very exciting to--

RVB: Beautiful is good …

JV: Exactly, that's-- I'm not ashamed of that, it's one of the things that attracted me to graphs and graph visualization in the first days, and it's still something that I find exciting. I started diving into Gephi and graph visualization. I thought it was very interesting, and then I met Sebastien and had the opportunity to start Linkurio.us. Ever since, what has been very fascinating with the ongoing work at Linkurio.us is that we get to interact with people who are innovators working with data in new ways to solve various problems in the field of financial services, in the field of security in general, in the field of health, and we get to play a very modest part in that journey by helping them understand their data and--

RVB: What's the most exciting case that you've had in the past couple of years? Can you give us one example maybe?

JV: Sure, something that was very interesting for us was working with the ICIJ. It's a conglomerate of data journalists and they used Neo4j and Linkurious to explore the data from HSBC and work on an important, very large-scale tax fraud scheme, or arranged tax fraud scheme.

RVB: Is that the case that was in the news a couple of weeks ago?

JV: Yeah exactly.

RVB: Oh. Wow.

JV: And that has been discovered by a combination of Neo4j and Linkurious.

RVB: Super. I was always thinking that I could use visualizations to find new beers but you can do more than that …

RVB: We'll switch to the next and the last question. Where do you think this is going in the next couple of years? Where is the graph space, graph data space going but also where is Linkurio.us going in the next couple of years, could you help us there a little bit?

JV: Sure, I think graphs will still be a niche, compared to relational databases in general, but it's a very quickly growing niche, and it's going to be very important for companies working with large volumes of data, within that more and more people are going to be interacting with graphs. It's going to be something very common in the business world and we're excited to see new cases, new applications, almost on a daily basis. So there's a bright future for graph technologies. And as a small company Linkurio.us intends to play our part in democratizing graph technologies, by offering the solution that business people - everyday business people - will use to understand their graph data. We want to be the reference for that.

RVB: That's super. Thank you so much. It was a great talk. We're going to wrap up now. We want to keep this podcast short, and sweet. Thank you-- thanks a lot. If people want to know more about either Neo4j or Linkurio.us you can go to Neo4j.com or linkurio.us. Thank you again, Jean, and I look forward to speaking to you soon again.

JV: Thank you very much, Rik.

RVB: Bye.

JV: Have a nice day.

Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Monday, 23 March 2015

Hidden GraphGems revisited: the 2.2 Meta-graph

Last week I shared this great little gem of a Cypher query that would allow you to very quickly take a look at the Model of an existing Neo4j database. This was the query that we ran to generate the Meta-graph:

 // generate the pre-2.2 META-graph  
 MATCH (a)-[r]->(b)  
 WITH labels(a) AS a_labels,type(r) AS rel_type,labels(b) AS b_labels  
 UNWIND a_labels as l  
 UNWIND b_labels as l2  
 MERGE (a:Node:Meta {name:l})  
 MERGE (b:Node:Meta {name:l2})  
 MERGE (a)-[:OUTGOING]->(:Relationship:Meta {name:rel_type})-[:INCOMING]->(b)  
 RETURN distinct l as first_node, rel_type as connected_by, l2 as second_node

That gave us a visualisation in the Neo4j browser that looked like this:

Nice, but not really. It's a little awkward, actually, as it uses nodes to represent Node-labels (makes sense!), but also uses nodes to represent relationship-types (makes a lot less sense). I mean, it was useful and nice, but we want and need better than that.

Turns out that one of the main reasons that Michael originally created this query this way, was that he needed a way to "visualise" the relationship names. In previous versions of Neo4j, the browser did not allow you to choose the relationship property to use on the relationship - and now it does. So ... we could revisit the Meta-graph query, right? Right!

The Meta-graph as from Neo4j 2.2

There have been a LOT of improvements in Neo4j 2.2, most under the hood - but some are very visible in the Neo4j browser. One of the new things is that you can choose the property to put on the relationships in the graph view of the browser. Seems trivial, but if we make a couple of small tweaks to the Meta-graph query, it gets a whole lot better:

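 // generate the 2.2 META-graph
 MATCH (a)-[r]->(b)
 // keep only the labels and the relationship type of every matched pattern
 WITH labels(a) AS a_labels,type(r) AS rel_type,labels(b) AS b_labels
 // one row per (start label, end label) combination
 UNWIND a_labels as l
 UNWIND b_labels as l2
 // one Meta_Node per label
 MERGE (a:Meta_Node {name:l})
 MERGE (b:Meta_Node {name:l2})
 // one META_RELATIONSHIP per relationship type between two labels
 MERGE (a)-[:META_RELATIONSHIP {name:rel_type}]->(b)
 RETURN distinct l as first_node, rel_type as connected_by, l2 as second_node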

Instead of creating a node for every relationship type, it now just ... creates a META_RELATIONSHIP relationship between the two Meta_Node nodes for every relationship type that connects their labels, and stores the name of that type in a "name" property on the relationship. That's the property that we can then select in the new browser to create a visualisation like this one:

How much nicer is that? A lot, if you ask me.

So take it for a spin, and let me know what you think. I for one like it :)
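If you do take it for a spin, keep in mind that the query writes its Meta_Node nodes and META_RELATIONSHIP relationships into the same database that it inspects. Here is a minimal sketch of how you could look at the result and tidy up afterwards. It assumes that nothing else in your database uses the Meta_Node label or the META_RELATIONSHIP type, and both queries are only an illustration, not part of the original gem. Run them one at a time.

```cypher
// look at the meta-graph on its own
MATCH (a:Meta_Node)-[r:META_RELATIONSHIP]->(b:Meta_Node)
RETURN a, r, b
```

```cypher
// remove the meta-graph again; OPTIONAL MATCH also catches
// Meta_Node nodes that have no relationships at all
MATCH (m:Meta_Node)
OPTIONAL MATCH (m)-[r]-()
DELETE r, m
```

The clean-up matters if you want to regenerate the Meta-graph later: the generating query starts with MATCH (a)-[r]->(b), which matches every relationship in the database, so a second run would also pick up the meta nodes and meta relationships themselves.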

Cheers

Rik

Sunday, 22 March 2015

Starting the week with a podcast interview: Dr. Jim Webber

The workweek is almost there, so what better time to publish another interview for our Neo4j Graph Database podcast. And this one is a bit special, since the interview is with someone who is probably as close as you can get to the forefront of graph database technology: the one and only, Dr. Jim Webber.

Here's the transcription of the interview:

RVB: Hello, everyone. My name is Rik and today I'm in sunny Amsterdam recording another podcast for our Graph Database podcast. We have a new guest which is the ever charming Mr. Jim Webber. Hey, Jim.

JW: Hello, Rik, how you doing?

RVB: Doing very well. The sun is shining and it's a beautiful day outside. People might not know you that well so why don't you introduce yourself? What's your link to the wonderful world of graph databases?

JW: My name's Jim Webber and I'm Neo4j's chief scientist. My link to graph databases goes back to about 2008 when I first started to use Neo4j and ultimately contribute to the database before I joined the Neo team about four-and-a-half years ago.

RVB: That's great. So in this podcast, I really don't have a lot of questions. The two really most important questions are what do you love most about Neo or graph databases and where do you see this going, so let's start with the start. Can you tell us a little bit about what you really love about graph databases and why it's the best thing since sliced bread?

JW: It may even be better than sliced bread. There's two answers, really, Rik. At the moment, I work on the inside of Neo4j and I've got to say that it's a joy and a privilege to do some really fascinating computer science research and development work that people then take and build amazing systems on. But that amazing systems aspect is really what got me hooked in the first place as a Neo4j user.

JW: In what feels like an eternity ago now, I was faced with a challenge of building a product catalog in a telecoms company and modeling in this product catalog things that the business users wanted, particularly up-sell - knowing what products you already had, being able to price those products as a bundle, and importantly being able to cross-sell and up-sell to you to increase your value as a customer. Back in the day, we were going to do that with a mixture of off-the-shelf software and customized relational databases. We thought that, give or take, it would take about three years before we had up-sell functionality implemented.

JW: I bumped into Emil Eifrem, one of the founders of Neo4j, quite accidentally in the bleak depths of a sudden Swedish winter and he explained to me that really, my model was a graph. I kind of got that at a conceptual level. I knew that things depended on other things and so on. But then Emil told me about this thing called Neo4j where the graphs were represented as first class citizens in the database. I've got to say that didn't sit comfortably with me. The database for us back then, that was all about the relational database. While I understood in my Java code or whatever I was going to have a bunch of objects connected together in my database, I expected tables.

JW: Anyway, I went back to work and thought, "What was that funny named database that that Swedish guy told me about?" We gave it a go and actually within an afternoon - admittedly a long afternoon - we spiked out what it would mean to do a product catalog for telecoms and implement up-sell, and that really blew me away. The first time I ran a query in Neo - and it wasn't Neo as we know it today, it was an early version of Neo - didn't have the wonderful Cypher query language or any of that stuff, just really did a simple graph traversal. But when it told me given my starting product what I should buy next, blew me away. I honestly thought I'd built Skynet and then you think, "I haven't actually, I've just done a graph traversal." But from that moment on, I was hooked because we took a problem that we thought might take three years to deliver and we delivered it in order of magnitude hours. Being able to conceptualize a problem as graphs makes things that were previously intractable readily tractable and I was hooked from there on in.

RVB: That's super. So in summarizing, it's all about the model? Is that-- it's the model that most attracts you to it?

JW: So the graph model, I think, is the most expressive and pleasant model that I've ever worked with, because it matches the way that I think many of us think as humans about stuff being connected to other stuff - kind of rich, semantically-driven network. What attracts me to Neo is that it was the technology that supported that model that was leagues ahead of everything I've ever seen before, and even today is still leagues ahead because it's by far the most mature graph database available. So if I want to adopt the most expressive and straightforward and pleasurable model, and indeed in many cases the most performant model, that's graphs. If I want the technology that supports that, that's Neo.

RVB: Super. Preaching to the choir, but I couldn't agree more. The follow-up question is where is it going? As the chief scientist, you're perfectly placed to answer that question. Where would you see the technology in five years from now and what's a realistic objective there?

JW: I guess there are two levels to that answer. There's the business impact that the technology's going to have, then there's the technology itself, and I think they're both fascinating. Right now my sense is that graphs are primed for the big time. You look at all the indicators that we have, all of the metrics that the analysts are running, and graphs are by far outstripping all other database categories, even those equal in terms of growth right now. There's something that's built up and pent up and now graphs are a thing. Not just Neo4j, although Neo4j is leading the charge, but graph tech as a whole is really taking off. I think that at some point in the medium term we're going to look at graphs in the same way we look at relational today - it's just going to be the data model. I'm super confident about that.

JW: Where are we going in terms of Neo4j and its implementation technology? Well, where are we not going? There's years of computer science ahead of us, some of which is already written down. The academic community has been doing some very pioneering work there. But to boil it down to a few podcastable sound bites, I think there's a bunch of work that's going to happen around query languages. I think that the Cypher query language is going to go from strength to strength. I think those guys are going to figure out better and better ways of doing query planning and optimization, potentially even things like your parallel queries and distributed queries and so on.

JW: In terms of the database engine itself, it's a whole bunch of fundamental concurrent programming, algorithms, and data structure stuff where we're going to be pushing the limits in terms of performance and robustness. Indeed in terms of the distributed systems stuff, which is in my background, I think we've seen a resurgence of incredibly high performance transaction protocols which all but detonate the reasons for having soft state and eventual consistency because they eliminate so many of the unavailability and un-performant characteristics of traditional 2PC. So I think we're going to see a resurgence of extremely high performant, high concurrency transaction processing and commit protocols. I'm definitely looking forward to living in that world in the next few years.

RVB: That is a super answer and I'm really looking forward to living that together with you. It's going to be an exciting ride. Thank you so much for the time, Jim, and I look forward to speaking to you again.

JW: Pleasure. Thanks, Rik.

Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

Before I leave you, just two more things:

After this interview, Jim and I spent another half hour laughing ourselves silly because the Doctor had misunderstood one of my closing comments (underlined and bold above): he thought - being in Amsterdam and all that - that I said "I look forward to living together with you" - instead of what I really said. Wishful on his part, probably, but it did give us a lot of giggles :) ...
There's a reason the interview has the song below, performed by Wilco and Billy Bragg. Woody Guthrie's commentary on the "big depression" bankers seems ever so relevant today. And I think Jim would like this cover.

All the best - have a great week.

Rik

Friday, 20 March 2015

Podcast Interview with Ron van Weverwijk, GoDataDriven

I couldn't be happier about publishing this next episode in the podcast series. Ron van Weverwijk has been my "partner in crime" in many different customer projects, in driving the Dutch GraphDB meetup, and in reviewing my little book, all while just being awesome at GoDataDriven. He has also been such a nice and interesting guy to come across. You should meet him, starting with this interview:

Here's the transcript of our conversation:

RVB: Hello, everyone. My name is Rik and I'm here again trying to record a new session for our graph database podcast. It's a great pleasure to have another guest here on a Skype call this time - I'm hoping this works out well - and that is Ron van Weverwijk from GoDataDriven in the Netherlands. Ron, I'm in Antwerp, you're over there in the Netherlands, but I'm hoping that this works out well.

RVW: We'll see.

RVB: We'll see, exactly. Would you mind introducing yourself, Ron, and tell us a little bit about yourself?

RVW: Yeah, I'm Ron van Weverwijk. I'm working at GoDataDriven, a company that focuses on big data and data science in the Netherlands. Next to that I'm very interested in Neo4j. I've been working with Neo4j for a couple of years now - I believe it's four or five years now - and enjoying it big time. I'm giving the Neo training courses in the Netherlands, and enjoying it a lot.

RVB: That's great. Thank you, Ron. We've been working a lot together also on the meetups and on the trainings in the Netherlands, and it's been a great experience. But I think this podcast format, what we've been doing so far is asking people really two questions only. First question is why do you love Neo? Why do you love graph databases, and what do you think is so fantastic about it? What's your perspective, Ron?

RVW: Maybe to answer that question, I want to get back in history a bit. My first interaction with Neo4j, we had a problem, which was actually kind of a social networking problem. We'd done a lot of work on an old platform with a relational database to discover other relationships in the data we had in that - a lot of joins, a lot of recursion in there, a lot of detection of looping and those kind of things. We were producing a lot of code, and code that was hard to manage and hard to write. And then when we first discovered Neo4j, we saw that all the pain was going away and all the joins we needed to do on the relational database system, they were gone. It was actually quite easy to do those kind of things in Neo4j.

RVW: I think nowadays a lot of people start looking and interacting with the data and looking for the connections, the data parts in your databases is using, and it is quite interesting that when we start working with tools like LinkedIn and Facebook, and we're constantly focusing on, "Who am I connected with? What are those people doing? In which companies do they work?" On our social life, we do a lot of things about the connections we have around us, and you see that, more and more, people want to do the same kind of things in their businesses as well - want to look at the connection between the data - and those kind of things are quite a pain to do on a relational database and actually very easy to do with Neo4j. The modeling in Neo4j is so easy and so easy to build the database straight on top of the business vision you have that it makes it quite fun to work with because the match is so good.

RVB: What I'm hearing is it's all about the easiness with which the model is reducing complexity, right? Is that a good summary?

RVW: Yeah, yeah, exactly. That's a good summary, and somehow I do a lot of proof of concepts with Neo4j at customers, and I really love that when interacting with the business for the first time, start working at a whiteboard drawing and making a first doodle about the business model they have on a whiteboard. The first time you need to visualize it. Yeah, they immediately see the connection and they immediately see how things are being modeled in the database, and it's fun that the first time you need to take a pencil and draw some lines and draw some circles to represent the nodes and relationships. After working with people for a while, you start seeing that when they want to explain the same things or the new business cases, they immediately draw the same kind of graphs and the same kind of images, so it’s very addictive. People start taking the habits of thinking in graphs.

RVB: I see that all the time as well. When I go and visit clients or community users, oftentimes there will be a whiteboard in the meeting room and you will see there's a graph already up there but they didn't know it was a graph, you know what I mean? Cool, well, keeping it short a little bit, Ron, I think that’s a great summary and great answer to the question. So that leaves me only one question that I want to ask you, and that's basically where do you think this is going? Where do you think? Where do you hope? Where do you want it to be in a couple of years from now? What's your perspective?

RVW: I hope that it will be in the default toolbelt of developers. Now we see that when a developer needs to have a database to store the basic data model, they start working with a relational database because they know it. Hopefully, in the coming couple of years, graph databases will be more known and people will start recognizing, "Oh, maybe the graph database is a good primary database for my primary storage," and from there on when people start developing and start working with graph databases, I hope that we can do a bit more of data analytics on those databases because I see that when you start looking at the connection in your database, you're really starting to see the different perspective of your data you already have. You can do very good data quality analysis on your data, impact analysis on your data, and all those kind of things are quite easy to do with the graph database and graph model. I really hope that in the next couple of years we can really do great data analysis on the Neo4j platform, as well.

RVB: That's how we all become GoDataDriven, right? That's how we all should work.

RVW: Yes, indeed. We all want to be GoDataDriven.

RVB: Absolutely. Cool, Ron. Thank you so much for taking the time to talk to me about all that stuff. It's been great having you in the community and on the courses, so thank you for that as well. I look forward to working with you in the future as well.

RVW: Yes. Thank you. My pleasure.

RVB: Cheers, bye.

Subscribing to the podcast is easy: just add the rss feed or add us in iTunes!

Hope you'll enjoy it!

Cheers

Rik

Tuesday, 17 March 2015

Podcast Interview with Ryan Boyd, Neo Technology

Here's a really great chat with one of our latest wonderful colleagues at Neo Technology. Ryan Boyd joined Neo a couple of months ago from Google to work with people like Michael Hunger to help, serve and grow our awesome Neo4j community.

Here's the transcript of our conversation:

RVB: Hi everyone. My name is Rik and I'm here today to record another version of our podcast series around Neo4j, the world’s leading graph database, and I'm here today with Ryan, Ryan Boyd, to talk about his view on graph database in the industry and where is it going and all that. So, Ryan, could you quickly introduce yourself.

RB: Sure. My name is Ryan Boyd. I'm in the developer relations team here at Neo, trying to help developers use Neo and understand the power of graph databases.

RVB: And how did you come to Neo?

RB: Sure. I was at Google for a number of years working on the cloud platform technologies and I came to Neo after meeting with the exec team a number of times and realizing that graph databases is really where the future is at for a lot of different types of data. It's how you can effectively and quickly query large amounts of connected data - and I really thought that was exciting and wanted to make more people aware of that.

RVB: Awesome. That's fantastic. I mean, there's so many people that are looking at graph databases for lots of different reasons, but what is the one thing that you think is so fascinating about it and what do you like most about graph databases?

RB: The one thing is really just that performance factor. Graph databases can be a lot faster and a lot easier to understand when you have data that's highly connected. So, at first the obvious use case that gets you attracted to graph databases is how people are connected through social networks. But then you can see the power of it in things like fraud detection or things like finding shorter paths and routes for trip-planning and all. I'm actually right now trying to plan a trip to fly out for our conference - GraphConnect in London - in a couple of months, and I wish my airline used graph databases, it would be much easier for me to plan my trip.

RVB: [chuckles] Absolutely. You might be able to optimize your costs, as well. Shortest wait, shortest path kind of calculation. That's great. Maybe, last question for you. Where do you think this is going? Where do you think graph databases will be in three, four, five years from now? What do you think the industry will be like and people will be using this for?

RB: Sure. Although we have a large number of customers here at Neo, we also see a huge amount of interest in graph databases. Nearly every developer I've talked to has said, "Hey, I want to look into graph databases. I have this use case which I think it might be good for, but I haven't had time quite yet." I'm hoping if we travel a couple years down the road, that more developers have looked at it and understand the technology and where it could be useful. And pretty much every company and every application has at least a portion of their application powered by a graph database. Because I've met very few use cases, very few types of applications or types of companies where they can't be powerful.

RVB: That's great. Thank you so much for spending the time, Ryan. I look forward to speaking to you again.

RB: Absolutely. Thank you.

RVB: All right. Bye.

Subscribing to the podcast is easy: just add the rss feed or add us in iTunes!

Hope you'll enjoy it!

Cheers

Rik
