Friday 26 September 2014

Another Graph Karaoke: Tom Waits

End of the quarter for me, and instead of biting my nails or pacing around while waiting for some of our customers' orders to come in, I thought I would have another go at Graph Karaoke. Still waiting for that new discipline to catch on - but as long as I am having fun, right :)))

Here's the scoop: I have been a longtime fan of Tom Waits. Before Spotify came along, he was one of the few artists of whom I basically bought ALL the records. Good and bad. Actually he never made a truly bad record in my opinion, but that's a different topic :) ... and earlier in the week I came across one of my all-time favourite songs:
Such a wonderful, poetic and funny song - I just love it. The lyrics are over here, and I used these Cypher statements to import the song into my favourite graph database. Then all I had to do was create a little movie to share it with you - so here it is:



The queries that I used in the video are also on GitHub. I hope you have half as much fun listening to it/watching it as I had creating it.

Cheers

Rik

Friday 19 September 2014

Graphs for HR Analytics

Yesterday, I had the pleasure of doing a talk at the Brussels Data Science meetup. Some really cool people there, with interesting things to say. My talk was about how graph databases like Neo4j can contribute to HR Analytics. Here are the slides of the talk:

I truly had a lot of fun delivering the talk, but probably even more preparing for it.

The basic points that I wanted to get across were these:
  • The HR function could really benefit from a more real-world understanding of how information flows in its organization. Information flows through the *real* social network of people in your organization - independent of your "official" hierarchical / matrix-shaped org chart. Therefore it follows logically that it would really benefit the HR function to understand and analyse this information flow, through social network analysis.
  • In recruitment, there is a lot to be said for integrating social network information into your recruitment process. This is logical: the social network will tell us something about the social, friendly ties between people - and that will tell us something about how likely they are to form good, performing teams. Several online recruitment platforms are starting to use this to really differentiate themselves - e.g. Glassdoor uses Neo4j to store more than 70% of the Facebook sociogram. They want to suggest and recommend the jobs that people really want.
  • In competence management, large organizations can gain a lot by accurately understanding the different competencies that people have / want to have. When putting together multi-disciplinary, oftentimes global teams, this can be a huge time-saver for the project offices chartered to do this.
For all three of these points, a graph database like Neo4j can really help. So I put together a sample dataset, and some queries on it, that should explain this. Broadly speaking, these queries fall into three categories:
  1. "Deep queries": these are the types of queries that perform complex pattern matches on the graph. As an example, that would be something like: "Find me a friend-of-a-friend of Mike that has the same competencies as Mike, has worked or is working at the same company as Mike, but is currently not working together with Mike." In Neo4j's Cypher, that would be something like this:
 match (p1:Person {first_name:"Mike"})-[:HAS_COMPETENCY]->(c:Competency)<-[:HAS_COMPETENCY]-(p2:Person),  
 (p1)-[:WORKED_FOR|:WORKS_FOR]->(co:Company)<-[:WORKED_FOR]-(p2)  
 where not((p1)-[:WORKS_FOR]->(co)<-[:WORKS_FOR]-(p2))  
 with p1,p2,c,co  
 match (p1)-[:FRIEND_OF*2..2]-(p2)  
 return p1.first_name+' '+p1.last_name as Person1, p2.first_name+' '+p2.last_name as Person2, collect(distinct c.name) as Competencies, collect(distinct co.name) as Company;  

  2. "Pathfinding queries": these allow you to explore the paths from a certain person to other people - and see how they are connected to each other. For example, if I wanted to find paths between two people, I could do:
 match p=AllShortestPaths((n:Person {first_name:"Mike"})-[*]-(m:Person {first_name:"Brandi"}))  
 return p;  

and get this:
Which is a truly interesting and meaningful representation in many cases.
  3. Graph Analysis queries: these are queries that look at some really interesting graph metrics that could help us better understand our HR network. There are some really interesting measures out there, like for example degree centrality, betweenness centrality, PageRank, and triadic closures. Below are some of the queries that implement these (note that I have done some of these also for the Dolphin Social Network). Please be aware that these queries are oftentimes "graph global" queries that can consume quite a bit of time and resources. I would not do this on truly large datasets - but in the HR domain the datasets are often quite limited anyway, and we can consider them as valid examples.
 //Degree centrality  
 match (n:Person)-[r:FRIEND_OF]-(m:Person)  
 return n.first_name, n.last_name, count(r) as DegreeScore  
 order by DegreeScore desc  
 limit 10;  
   
 //Betweenness centrality  
 MATCH p=allShortestPaths((source:Person)-[:FRIEND_OF*]-(target:Person))  
 WHERE id(source) < id(target) and length(p) > 1  
 UNWIND nodes(p)[1..-1] as n  
 RETURN n.first_name, n.last_name, count(*) as betweenness  
 ORDER BY betweenness DESC;  
   
 //Missing triadic closures  
 MATCH path1=(p1:Person)-[:FRIEND_OF*2..2]-(p2:Person)  
 where not((p1)-[:FRIEND_OF]-(p2))  
 return path1  
 limit 50;  
   
 //Approximate the pagerank (a rough, sampling-based score, not an exact PageRank)  
 UNWIND range(1,10) AS round  
 MATCH (n:Person)  
 WHERE rand() < 0.1 // 10% probability  
 MATCH (n:Person)-[:FRIEND_OF*..10]->(m:Person)  
 SET m.rank = coalesce(m.rank,0) + 1;  
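
To try the queries above without the full sample dataset, a tiny stand-in graph is enough. The sketch below is only a toy under stated assumptions: it reuses the labels, relationship types and property keys that appear in the queries in this post, but the last names, the competency and the two companies are made up for illustration.

```cypher
// Hypothetical mini dataset, only meant to exercise the queries in this post.
// Labels, relationship types and property keys follow those queries;
// the last names, the competency and the company names are invented.
CREATE (mike:Person {first_name:"Mike", last_name:"Example"}),
       (brandi:Person {first_name:"Brandi", last_name:"Example"}),
       (anna:Person {first_name:"Anna", last_name:"Example"}),
       (graphs:Competency {name:"Graph databases"}),
       (acme:Company {name:"Acme"}),
       (globex:Company {name:"Globex"}),
       // Mike and Brandi share a competency
       (mike)-[:HAS_COMPETENCY]->(graphs),
       (brandi)-[:HAS_COMPETENCY]->(graphs),
       // Brandi used to work where Mike works now
       (mike)-[:WORKS_FOR]->(acme),
       (brandi)-[:WORKED_FOR]->(acme),
       (brandi)-[:WORKS_FOR]->(globex),
       // Mike and Brandi are friends-of-a-friend through Anna
       (mike)-[:FRIEND_OF]->(anna),
       (anna)-[:FRIEND_OF]->(brandi);
```

With this data loaded, the "deep query" should pair Mike with Brandi: they share a competency, Brandi used to work at Mike's current company, and they are two FRIEND_OF hops apart.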

I am sure you could come up with plenty of other examples. Just to make the point clear, I also made a short movie about it:

The queries for this entire demonstration are on GitHub. Hope you like it, and that everyone understands that Graph Databases can truly add value in an HR Analytics context.

Feedback, as always, much appreciated.

Rik

Saturday 13 September 2014

Friend-or-Foe relationships in the Middle East

This blogpost essentially references a GraphGist that I created. Look at it in a full window over here, or below:

I hope that was interesting - let me know if you have any feedback.

Cheers

Rik

Thursday 11 September 2014

Graph Databases for Enterprise Architects - the webinar

The webinar that we did last Tuesday is now also available as a screencast. Here it is:



Hope you like it!

Cheers

Rik

Tuesday 9 September 2014

Graph Databases for Enterprise Architects

Today, I had the opportunity to do a short webinar where I tried to articulate why I think that a graph database like Neo4j really can contribute to a better Enterprise Architecture. I'd like to take some time to explain that here on the blog as well. I think there are three big reasons why Graphs are Good for the Enterprise Architect, so let's go through them, and I will try to illustrate each one with an example.

Time to market

A modern Enterprise Architecture, these days, should enable quick business decisions. If business managers want to react to competitive threats, or better still, want to seize a new business opportunity, the architect’s response cannot be that “he can deliver that in 12 months from now”. That may have been ok in the past, but it’s not anymore. Agility is key: modern-day functionality like social network integration, real-time recommendations, real-time fraud detection - just to name a few examples - needs to be implemented quickly.

One of the key things that a graph database brings to the table to enable short time to market is the excellent fit with these domains. There is a natural fit between these very interconnected domains and a graph representation. There is no need to translate back and forth from the domain into the relational storage paradigm, and that alone cuts down implementation efforts by an order of magnitude. Complex operations like pathfinding, recursive questions, complex pattern matching or deep joins - all of the above are much simpler in a graph database than in any other model - thereby enabling shorter time to market for the enterprise architect.
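
To make the "recursive question" point a bit more concrete, here is a minimal sketch of an impact analysis on a made-up enterprise architecture graph. Everything in it is an assumption for illustration only: the labels (Process, Application, Server), the relationship types (USES, DEPENDS_ON, RUNS_ON) and all the names are hypothetical. The two statements are meant to be run one after the other.

```cypher
// Hypothetical model: a business process uses an application, which depends
// on another application, which runs on a server. All names are invented.
CREATE (billing:Process {name:"Billing"}),
       (erp:Application {name:"ERP"}),
       (crm:Application {name:"CRM"}),
       (db1:Server {name:"db-server-1"}),
       (billing)-[:USES]->(erp),
       (erp)-[:DEPENDS_ON]->(crm),
       (crm)-[:RUNS_ON]->(db1);

// Impact analysis: which business processes are affected, at any depth,
// if db-server-1 fails? The * makes the pattern variable-length, so no
// fixed number of joins has to be spelled out.
MATCH (s:Server {name:"db-server-1"})<-[:USES|DEPENDS_ON|RUNS_ON*]-(p:Process)
RETURN DISTINCT p.name AS AffectedProcess;
```

On this toy data the query walks three hops back from the server and should return the Billing process. In a relational model the same question would typically need one join per hop, with the depth known up front.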

Here's a short demo illustrating how the Enterprise Architect can achieve shorter Time to Market:

Flexibility

We all know the cliché: “the only thing constant is change”. And we have all lived its reality: in the fast moving business world that we enable with our enterprise architecture, we have to be able to act quickly, react to changing circumstances, and be flexible in the way that we react so that we can learn, adapt and improve our performance. In the enterprise architect’s world, we have adopted things like “Agile” development methodologies for a reason: we understand that in the fast paced business environment, our IT systems have to be equally fast paced and able to adapt. IT systems can no longer feel like a straitjacket - they have to allow our business processes to evolve and adapt, in a flexible way.

Graph databases allow for that flexibility. Developers don’t have to spend valuable time trying to model what they don’t know; they can grow their data models as their understanding of the problem domain grows, and can flexibly work with their data structures in many different situations. This, in itself, is a huge added value to any enterprise architect who is tracking the constant change around them.
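
As a rough sketch of what growing the model can look like in practice, assume a hypothetical application inventory. The labels, relationship type, property keys and names below are all made up for illustration; the statements are meant to be run one after the other.

```cypher
// Start with the part of the domain that is understood today.
// (Hypothetical label and name.)
CREATE (:Application {name:"Payroll"});

// Later, a new concept (a vendor) shows up. There is no table to alter:
// the new label, relationship type and property are simply written.
MATCH (a:Application {name:"Payroll"})
CREATE (v:Vendor {name:"Example Vendor"})-[:SUPPLIES {since: 2014}]->(a);

// Existing nodes can pick up new properties as understanding grows.
MATCH (a:Application {name:"Payroll"})
SET a.criticality = "high"
RETURN a.name, a.criticality;
```

Nodes created before the change keep working with existing queries; only the queries that care about vendors or criticality need to know about the new structure.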
Here's a short demo illustrating this Flexibility:

Operational performance

In the same way that graph databases often “enable” certain capabilities for specific use cases as described above, they just as often offer a very significant performance improvement over existing technologies. Some query patterns - like the deep/recursive join or the pathfinding operation - require an enormous amount of hardware/software horsepower in the traditional relational database world in order to deliver the results in a timeframe that users would accept. And even then, we all know cases where the query performance would become brittle and unpredictable under load.

This is where graph databases can really help. The same queries that were causing constant headaches in the relational world would predictably run like a breeze in a graph world. That’s a fantastic trait for an enterprise architecture that business managers can rely on. The fact that it will probably cost them less in terms of hardware and software is of course a nice bonus that we should not underestimate. Everyone understands the benefits of a sound architecture - but when it actually saves money, business managers will really start paying attention.

Here's a short demo illustrating this operational performance:


I hope that I have made my point clear about this added value of Neo4j for the Enterprise Architect. If you have any feedback - please let me know.

All the best

Rik

Friday 5 September 2014

Why write a book about Neo4j?

Some of you may have noticed some noise recently about the little book that I wrote: Learning Neo4j. I added a link at the top of this page as well, with some more information. I obviously wanted to announce this publication on the blog too, but while thinking about it - I thought it would also be interesting to go back and see why I wrote the book - my objectives. Time will tell if I will have achieved all of them - but anyways....

Here we go - in order of descending importance:



  1. I wanted to help the Neo4j community grow. For the past two years, I have had tremendous joy and excitement working on different Neo4j-related projects with community users AND commercial clients. But it has always struck me what a microcosm it is that these "graphistas" work and live in. It seemed at times like I was part of an obscure cult of math-loving programmers with questionable personal hygiene... :) ... haha... But seriously: it's such a niche. The world of Enterprise IT is out there, and if the Neo4j project is to grow, it will need to look to new audiences. Not the astrophysicist with multiple PhDs who dreams in Java code - but the typical, Visual-Basic-loving Enterprise Developer. You will find that there is not a line of Java code in the book. That is because 1) I don't know how to code, and 2) that was not the intention of the book. Graph Databases need to become easy to learn if they are to grow up.
  2. I thought it would be a cool personal experience. I have always enjoyed writing - it helps me get through the day, basically. Structure my thoughts, reflect on them, and all that. That's why I have a blog :) ... But writing a book is something different. It took me 7.5 months of daily work (sometimes hours at a time, sometimes just a few emails) to get it done - and there is a cool sense of achievement when it "gets done". I liked that a lot. To be honest: I don't think this will be the last book that I write.
  3. I wanted to get some personal benefit from it. Whether it's in the form of recognition by the friendly folks at Neo Technology, or in a royalty payment that will pay for a nice Xmas ski-trip, or - and this is my big hope too - because I would be able to sell millions (!) of Euros worth of Neo4j Enterprise software as a result of someone picking up that book (see 1.).

Those were probably the main reasons. And of course: Michael Hunger encouraging me to do it, and Ian Robinson giving me some pointers and ideas.

Anyway. There you have it. It's out there, and I hope you like it. If you do - tell other people. If you don't - please tell ME!

Cheers

Rik