Thursday 22 January 2015

Innovation Pitch

Some companies are interesting. I mean, I myself have been working in startup environments for (djeez! has it been so long!) decades now, but some large organisations are equally interesting, especially in today's "information age". Every so often I get to meet fascinating people who work in an industry that is literally being turned upside down by the modern swell of connectedness, mobile information, demanding customers and innovative applications. After years, sometimes centuries, of successful business in the "good old days", they find themselves sitting on wonderful assets with real value, but also facing a growing need to reassess how it all fits into this new digital age. They need to innovate.

Innovation is a hard nut to crack. I am not an expert, but when I read "The Innovator's Dilemma" a few years ago, it became blatantly clear to me that innovation does not come naturally to a large organisation. It simply doesn't. There are all kinds of internal and external forces that make it tremendously hard for large organisations to truly innovate.

That's probably why I personally find startup organisations more my cup of tea, but it's also why I am impressed and greatly sympathetic when I see large organisations make a truly concerted effort to innovate.


Yesterday, I was part of such an effort. Wolters Kluwer, a global publishing powerhouse with a long-standing history, headquartered in the Netherlands, organised an Innovation Pitch event for their executive team. Almost all of their board members and execs were there, and I had 10 minutes to "pitch Neo4j". Interesting.

I thought about this a bit, and decided to take the "high road". The pitch was not meant to sell product, or even to position Neo4j, but was really geared towards getting these top-level international execs to think differently and to open up their minds to the wonderful world of graphs. I used the example of "How Wolves Change Rivers" to help illustrate that, as seen over here, or in the GraphGist over here.



I recorded the pitch at home - see below. The actual presentation included some Q&A and took a bit longer in total - but it was pretty much like this:



Slides are over here:



There are probably a ton of other things I could have said, but my main goal was to be remembered and to get a conversation going with Wolters Kluwer. I would love to get your feedback, if you have any.

Cheers

Rik




Monday 12 January 2015

Graph Karaoke using "Natural Language Analytics": Billie Jean

Last week, my friend and colleague Michael wrote a really interesting blog post on natural language analytics using Neo4j. He used the One Ring poem as an example of how you could use Cypher to analyse a text file and load it into a Neo4j database for some advanced analytics. That immediately made me think about my Graph Karaoke Playlist, and how I could use this technique to generate more Graph Karaoke. Wouldn't that be nice? More graph karaoke == good!

So in this post I will show you how easy it is to get this done. A couple of quick steps is all that is needed. Let's run through them.

Loading a song

The first thing to do, as always, was to pick a song. This time, my kids picked it:
Billie Jean, by the King of Pop himself. Not wanting to sound pretentious, but I think it's great for my kids to be big fans of that kind of music; it seems like all of our educational efforts are yielding some results :) ...

Then I took the lyrics of the song from over here and put them into a Google doc. The reason is that I wanted to make one small manipulation to the file so that I could use it for karaoke: I added the Songpart and the Songpartsentence in two additional columns. Plus, the Google sheet can very easily be exported as a CSV file that we can then point the LOAD CSV process to.

Customizing the query

With that CSV file available, I then proceeded to customize Michael's query. Here it is:

 //create the karaoke graph  
 load csv with headers from "https://docs.google.com/a/neotechnology.com/spreadsheets/d/1DLu2bl1ZO7Zm8zU1UXNCDZGxsnBkicAJD4J-FSbVXLE/export?format=csv&id=1DLu2bl1ZO7Zm8zU1UXNCDZGxsnBkicAJD4J-FSbVXLE&gid=0" as csv  
 with csv.Songpart as songpart, csv.Songpartsentence as songpartsentence, csv.Songsentence as row  
 unwind row as text  
 with songpart, songpartsentence, reduce(t=tolower(text), delim in [",",".","!","?",'"',":",";","'","-"] | replace(t,delim,"")) as normalized  
 with songpart, songpartsentence, [w in split(normalized," ") | trim(w)] as words  
 unwind range(0,size(words)-2) as idx  
 MERGE (w1:Word {name:words[idx]})  
 MERGE (w2:Word {name:words[idx+1]})  
 MERGE (w1)-[r:NEXT {songpart:toInt(songpart), songpartsentence:toInt(songpartsentence)}]->(w2)  
  ON CREATE SET r.count = 1 ON MATCH SET r.count = r.count +1  

Let's run through this query to make it easier to digest. We start with the "load csv" statement: we point it to the CSV download link mentioned above, use the first row as headers, and bind each row to an identifier called "csv".
 load csv with headers from "https://docs.google.com/a/neotechnology.com/spreadsheets/d/1DLu2bl1ZO7Zm8zU1UXNCDZGxsnBkicAJD4J-FSbVXLE/export?format=csv&id=1DLu2bl1ZO7Zm8zU1UXNCDZGxsnBkicAJD4J-FSbVXLE&gid=0" as csv  

Then we pull the three columns of each CSV row into separate identifiers that we can address individually:
 with csv.Songpart as songpart, csv.Songpartsentence as songpartsentence, csv.Songsentence as row   
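To make the header mapping concrete, here is a small plain-Python model (not Neo4j code) of what "load csv with headers" followed by the "with ... as" projection does: each data line becomes a map keyed by the header row, and we pick out the three columns by name. Only the column names Songpart, Songpartsentence and Songsentence come from the query; the sample rows and the helper names load_rows and project are hypothetical placeholders, not the contents of the actual spreadsheet.

```python
import csv
import io

# Hypothetical CSV content: the header names match the Cypher query,
# but the rows are placeholders, not the real Billie Jean spreadsheet.
SAMPLE_CSV = """Songpart,Songpartsentence,Songsentence
1,1,"First line of the verse, with punctuation!"
1,2,Second line of the verse
2,1,A line from the chorus
"""


def load_rows(text):
    """Mimic LOAD CSV WITH HEADERS: one dict per data line, keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))


def project(rows):
    """Mimic WITH csv.Songpart AS songpart, ...: pick three columns by name.

    As with LOAD CSV, every value is still a string here; the Cypher query
    only converts songpart and songpartsentence with toInt() later on.
    """
    return [
        (row["Songpart"], row["Songpartsentence"], row["Songsentence"])
        for row in rows
    ]


if __name__ == "__main__":
    for songpart, songpartsentence, line in project(load_rows(SAMPLE_CSV)):
        print(songpart, songpartsentence, line)
```

Note that a quoted field may contain commas, which is one reason exporting straight from the Google sheet is convenient: the export takes care of the quoting.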

Then we use the Cypher "unwind" operator on "row" and call the resulting line of lyrics "text". Since each CSV row holds exactly one line of lyrics, this simply passes that line along under a new name.
 unwind row as text  

Afterwards, we use "reduce" to lower-case the text and strip out the punctuation marks, and then split it into individual words:
 with songpart, songpartsentence, reduce(t=tolower(text), delim in [",",".","!","?",'"',":",";","'","-"] | replace(t,delim,"")) as normalized  
 with songpart, songpartsentence, [w in split(normalized," ") | trim(w)] as words  
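If the reduce/replace combination looks cryptic, this plain-Python sketch mirrors the two lines above: start from the lower-cased text and remove each delimiter in turn, then split on spaces and trim every word. The delimiter list is copied from the query; the function names normalize and split_words are just illustrative.

```python
# Punctuation list copied from the reduce() call in the Cypher query.
DELIMITERS = [",", ".", "!", "?", '"', ":", ";", "'", "-"]


def normalize(text):
    """Mimic reduce(t = tolower(text), delim IN [...] | replace(t, delim, ""))."""
    t = text.lower()
    for delim in DELIMITERS:
        # Each step feeds the result of the previous replace into the next one,
        # just like the accumulator t in the Cypher reduce().
        t = t.replace(delim, "")
    return t


def split_words(normalized):
    """Mimic [w IN split(normalized, " ") | trim(w)]."""
    return [w.strip() for w in normalized.split(" ")]


if __name__ == "__main__":
    print(split_words(normalize("Don't say: it's over!")))
```

Because the apostrophe and the hyphen are removed rather than replaced by a space, "don't" becomes "dont" and a hyphenated word is glued into a single word instead of being split in two.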

Lastly, we want to write these words into the graph. To do that, we use "unwind" over range(0, size(words)-2) to generate an index for every pair of adjacent words, which lets us step through every sentence and generate the word sequences. We do that with "MERGE", first for the words and then for the NEXT relationships between them. We "karaoke-ize" the graph by storing "songpart" and "songpartsentence" as properties on every relationship, and we keep track of how often each relationship occurs in its "count" property.
 UNWIND range(0,size(words)-2) as idx  
 MERGE (w1:Word {name:words[idx]})  
 MERGE (w2:Word {name:words[idx+1]})  
 MERGE (w1)-[r:NEXT {songpart:toInt(songpart), songpartsentence:toInt(songpartsentence)}]->(w2)  
  ON CREATE SET r.count = 1 ON MATCH SET r.count = r.count +1  
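As a rough model of what these MERGE statements end up storing, here is a plain-Python sketch that uses a set for the :Word nodes and a Counter for the :NEXT relationships. It assumes the input has already been normalized and split as above; the function name and the tuple layout are hypothetical. The key point it illustrates: because songpart and songpartsentence are part of the merged relationship pattern, the same word pair in two different lines of the song yields two separate relationships, while a repeat within the same line only bumps the count.

```python
from collections import Counter


def merge_next_relationships(lines):
    """Model the UNWIND/MERGE part of the karaoke query in plain Python.

    `lines` is a list of (songpart, songpartsentence, words) tuples, where
    songpart and songpartsentence are still strings (as read from the CSV).
    Returns (word_nodes, next_rels): a set standing in for the :Word nodes
    and a Counter standing in for the :NEXT relationships and their count.
    """
    word_nodes = set()
    next_rels = Counter()
    for songpart, songpartsentence, words in lines:
        # UNWIND range(0, size(words)-2) AS idx: Cypher's range() is inclusive,
        # so this visits every adjacent pair, i.e. len(words) - 1 indexes.
        for idx in range(len(words) - 1):
            w1, w2 = words[idx], words[idx + 1]
            # MERGE (w1:Word {name: ...}) and MERGE (w2:Word {name: ...})
            word_nodes.update((w1, w2))
            # MERGE (w1)-[r:NEXT {songpart, songpartsentence}]->(w2):
            # the first time sets count = 1, each later match adds 1.
            next_rels[(w1, w2, int(songpart), int(songpartsentence))] += 1
    return word_nodes, next_rels


if __name__ == "__main__":
    words, rels = merge_next_relationships(
        [("1", "1", ["la", "la", "la"]), ("1", "2", ["la", "la"])]
    )
    print(words, dict(rels))
```

A line with a single word produces no pairs at all, so (as in the Cypher version, where range(0, -1) is empty) it creates neither a word node nor a relationship.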

That was easy!

So where is the KARAOKE???

Hah! That's what you came here for, huh? Well, here's the result.

I have put the queries on a gist so that you can take a look at them yourself. If you have any comments, please let me know!

Cheers

Rik