Showing posts with label neo4j browser. Show all posts

Tuesday, 1 March 2016

Podcast Interview with Oskar Hane, Neo Technology

Here's another great conversation with one of the Neo4j engineers working on the coolest product since sliced bread: Oskar and I had a great conversation about his daily work on the Neo4j Browser - one of the most loved components of a very loved product. Here's our chat for you:


Here's the transcript of our conversation:
RVB: 00:02 Hello everybody. My name is Rik, Rik Van Bruggen, from Neo Technology. Here we are recording another podcast session. Today, we have a really interesting guest, all the way from Borås in Sweden, and it's one of our developers on the Neo4j engineering team. It's Oskar Hane. Hi Oskar.
OH: 00:20 Hi Rik.
RVB: 00:21 Hey, good to have you on the podcast. Thanks for making the time.
OH: 00:25 Thanks for having me.
RVB: 00:27 It's great. Oskar, one of the reasons I was really excited to get you on the podcast was because Neo4j has this wonderful front-end tool that I use a lot, everyone uses it a lot, called the Neo4j browser. I think you're one of the guys behind that, aren't you?
OH: 00:46 Yes, since about one and a half years, I've been on that UX team, what we call it internally.
RVB: 00:54 How did you get to Neo, who are you, where did you come from, how did you get to Neo?
OH: 01:01 Actually, it's quite a funny story, I think. I was browsing Twitter two years ago, and I think that one of the guys I was following retweeted something from a guy called Emil Eifrem, send that--

RVB: 01:21 Who is that guy?
OH: 01:22 By then, I didn't know. I didn't know [anything?] at all, and he said that, "We're looking for Java developers." I was a freelance contractor by then, and I was just-- a long contract, so I was looking for Neo4j, so I sent him a tweet and said, "Don't you have any openings for JavaScript because I'm not a Java guy?" And he was like, "I think we do, actually. Talk to Magnus!", another guy inside Neo. They set up interviews. I actually have like seven or eight interviews before I joined in August of that year.
RVB: 02:16 Have you used Neo4j before or did you know about the product before, or how did you know about it?
OH: 02:22 No, I didn't, but of course, when I talked to Magnus the first time-- well, before I talked to Magnus, I downloaded and started using it. I was blown away by the user interface of the client or the Neo4j Browser compared to-- I've been working with MySQL and other databases, and their user interface--
RVB: 02:53 Is a little bit more old-fashioned?
OH: 02:55 Yeah, so I was completely blown away. I'm thinking, "Wow, I'd really like to work on this product," and I managed to get in.
RVB: 03:07 And the rest is history, right?
OH: 03:09 Yeah [chuckles].
RVB: 03:10 Nowadays, it's you and one or two other people that work on that part of Neo4j, right?
OH: 03:16 Yeah, currently, we're two. The other guy, Mark Peace sits in the London office, and in two weeks, another person is joining us, so it'll be great to have the three of us.
RVB: 03:33 What do you think is so cool about Neo4j, in general, but maybe the browser, more specifically, why do you think it's such a cool product to work on, what makes it so nice?
OH: 03:47 As you were talking a little bit that it's modern, in a way, compared to MySQL, and the browser is, you can visualize the data in a way that you're not able to visualize in a relational database, of course, an area that Neo4j or graph databases we have fixed relationships between nodes.
RVB: 04:23 The graph data model, right?
OH: 04:25 Yeah, exactly. So we have this graph visualisation in the Neo4j browser where we can run animations and have a gravity physics engine to display the data in a nice way.
RVB: 04:44 I think one of the cool things about the browser, for me as a user, is that it combines so many nice perspectives. On the one hand, it's a visualisation tool, on the other hand, it's a query troubleshooting tool, on the other hand, I have a learning tool. There's so many nice things coming together, I think. I'm assuming that was intentional.
OH: 05:09 Yeah, of course, it's sort of a platform for-- the target audience was from the beginning, at least, for developers who are writing apps for the Neo4j database, then we can use this Neo4j browser, as you say, they can see the data in many different ways. We can troubleshoot if we have any query [problem?], so we can just run ad hoc queries to get interesting stuff, and you can learn the query language, Cypher, within the browser.
RVB: 05:50 I have off the cuff type of question for you, what's your favorite feature, if you have any? It's probably the most recent one [chuckles], but what is one of your most-- the nicest thing about the browser?
OH: 06:08 Wow, that's really a hard question. Even though it could be limited in how big graph you can do it at once, but I think the visualisation is actually my favorite because I'm basing that element that other database types that don't have, that makes us unique.
RVB: 06:37 My personal, probably, is the fact that you have this rolling history. You have this long page of query histories that you can sort of go through. I don't know if you've ever seen one of my graph karaokes but that's what I use all the time...
OH: 06:54 I have that actually, I have that karaoke [laughter].
RVB: 06:59 I'm a fan of your work, Oskar, I really am.
OH: 07:03 Thanks.
RVB: 07:05 One more thing, if you don't mind, where is this going? What does the future hold, both from the product in general, the browser, more specifically, and maybe the industry that we work in, any comments on that?
OH: 07:23 I can talk for the browser at least that we-- we're talking about having it extensible in a way that somehow that you can write like from the plugins maybe for the browser, and load in an easy way as a user. I very much like to see that, at least, that if you want to visualize your data in a special way, maybe your own [row?] or table format that only you see, you could create a front-end plugin, so to speak, just have it that way.
RVB: 08:10 So like an extension, but then part of the browser, so to speak, right?
OH: 08:14 Yeah, something like that. But we're moving some of the content out of the browser so you can bring it back in. We're heading that way.
RVB: 08:27 You also mentioned to me earlier that for future versions of Neo4j, the browser is not going to be like part of the same package as Neo4j, is that also something that's coming up then?
OH: 08:37 Well, maybe, I'm not 100% sure where we are at the decisions yet, but what we know is that, at least, we have among the files, the repository for the browser out of the Neo4j main database, but mostly, having it as a dependency. So when you download Neo4j, you get the browser inside it, just as you do now, but the code isn't mixed with the database code.
RVB: 09:08 It's easier to pull it out if we want or need to do that, right?
OH: 09:12 Yeah, exactly.
RVB: 09:13 Very cool. What about visualization, in general, is there-- there's so much stuff happening in that domain, graph visualization, it's a very hot topic, I think, how do you look at that?
OH: 09:28 Yeah, it is, and it's super hard to show visualization that are huge which has functions and elements and relationships within it. I'm very impressed by many of the actors in the industry.
RVB: 09:51 It very quickly becomes like hairball, doesn't it [chuckles]?
OH: 09:56 It can hold your web browser as well.
RVB: 10:00 Totally, there's some really cool visualization tools out there. We've been partnering with externally, so hopefully, that's complementary, it should help us.
OH: 10:13 Definitely, we're a small team. I think our visualisation is good for some use cases, but maybe not for some.
RVB: 10:26 Well, very good. We'll wrap up the recording here, if you don't mind. We'd like to keep this podcast short and snappy. Thank you so much for coming online, Oskar, really appreciate it.
OH: 10:40 Thanks.
RVB: 10:41 I'll look forward to seeing you maybe at GraphConnect Europe or some other occasion very soon.
OH: 10:47 I definitely will be at GraphConnect Europe.
RVB: 10:50 Fantastic. I'll look forward to seeing you there. Thank you.
OH: 10:52 Likewise, thank you.
RVB: 10:53 Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Friday, 13 December 2013

Business Continuity Management - a perfect fit for Graphs!

At one of our recent Graph-Cafe meetup events, I had the pleasure of spending some time with a lovely gentleman from a large corporation who worked in a profession that I had never heard of: Business Continuity Management. It’s always interesting to learn new things, but it became even more interesting when this fine gentleman started explaining to me that BCM is actually all about graphs. Google defines it as
"Business Continuity Management is a holistic process that identifies both potential threats and the impacts to an organization of their normal business operations should those threats be realized."
But what does that mean? When you think about it some more, you quickly realise that it’s all about the relationships between different parts of a business, and understanding and managing those relationships in such a way that the business can run as continuously as possible. Seems obvious? Well - it’s not. Because how do you define “a business”? What does “continuous” mean? And what does that have to do with graphs?

Understanding your business - creating a model

This courteous gentleman - I cannot name him for obvious reasons - was having a little trouble getting started with neo4j, and so we decided to work together. I would create a lovely neo4j dataset for him, and he would help us document and present the use case. So we started with the obvious question: how do we plan for Business Continuity? By understanding our business, right! We have to get a grip on how our processes, departments, applications, physical environments, etc interact - and how we can model this as a graph.


Luckily, my “partner in crime” knew what he was doing. He had already thought of the model, and had created a set of MS Excel files that would accurately represent how business processes, processes, business lines/departments, buildings and applications would interact and depend on each other. And: since we are talking about assuring the continuity of the business, he even had a quantitative measure of the importance of business processes and processes - the recovery time objective. You can see from the model how easy it is to represent these intricate relationships as a graph. So how to go about importing this data into neo4j, so that we could ask some interesting questions?

Loading the data: Spreadsheets rule!

As you can probably tell from some of my previous posts, there are many ways to import data into neo4j. But since the source data in this particular case was already in spreadsheet format, I decided to use the good old spreadsheet technique. Just add a column to my Excel sheets, use string concatenation to generate Cypher statements based on cell contents, and then copy/paste the resulting Cypher queries into the neo4j-shell - and we’re done. Easy!
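For readers who would rather script the spreadsheet technique described above, here is a minimal sketch of the same idea in Python: one Cypher statement per row, built by string concatenation. The column names and sample rows are made up for illustration (the real Excel files are not shown here), and the use of MERGE rather than CREATE is my own assumption to avoid duplicate nodes. Only the Process-USES-Application relationship is taken from the queries further down.

```python
import csv
import io

# Hypothetical sample rows; the real spreadsheet columns are not shown in the post.
SAMPLE_CSV = """process,application
Proc_1,App_1
Proc_2,App_1
"""


def cypher_string(value):
    """Quote a value as a Cypher string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def row_to_cypher(row):
    """Build one statement per row, much like a formula column in a spreadsheet."""
    process = cypher_string(row["process"])
    application = cypher_string(row["application"])
    return (
        "MERGE (p:Process {name:" + process + "}) "
        + "MERGE (a:Application {name:" + application + "}) "
        + "MERGE (p)-[:USES]->(a);"
    )


def generate_statements(csv_text):
    """Turn CSV text into a list of Cypher statements, one per data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row_to_cypher(row) for row in reader]


if __name__ == "__main__":
    for statement in generate_statements(SAMPLE_CSV):
        print(statement)
```

The printed statements can then be pasted into the neo4j-shell exactly like the ones generated in the spreadsheet.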




Once we have the data in neo4j, the fun can actually begin - and the neo4j browser is going to be a big part of that.

A first look at the BCM data

Let’s explore the newly created dataset a bit, by running a couple of simple queries. The first one actually is a standard query saved in the neo4j browser:

Show the data model: what is related to what, and how?

MATCH (a)-[r]->(b)
RETURN DISTINCT head(labels(a)) AS This, type(r) AS To, head(labels(b)) AS That
ORDER BY This
LIMIT 100




So this means that the import basically worked well :) …

Impact analysis: the complex what-if question

The real objective of the BCM use case for graph databases, however, is not just about playing around with the data - it’s about understanding Impact. A broad field of business and scientific understanding, and a very active use case for neo4j. Essentially, what we are talking about here are complex, densely connected data structures in which we want to understand the effects of change in that structure. What happens to the rest of the graph, if one element of the graph would change? What happens if it would disappear? What happens if … What if?
This kind of dependency analysis is not new. We have had people discuss it with regard to source code analysis, web services, telecom, railway planning, and many other domains. But to apply it to a business-as-a-whole was very new to me - and fascinating for sure.
Let’s look at a couple of examples.

Which Applications are used in which buildings

What would happen to specific employees located in specific buildings if a particular application would “die”?

MATCH (n:Application)<-[:USES]-(m:Process)-[:USED_BY]->(l:BusinessLine)-[:LOCATED_IN]->(b:Building)
RETURN DISTINCT n,b
LIMIT 10;


Obviously this is quite a broad query, with a lot of different results. But by using LIMIT we can start looking into some specifics, and use a graphical visualisation to make this all less difficult to grasp.



Or another example:

What BusinessProcesses would be affected by a fire at location Loc_100

Let’s use a “shortestpath” calculation to find this:


MATCH p = ShortestPath((b:Building {name:"Loc_100"})-[*..3]-(bp:BusinessProcess))
RETURN p;


and immediately we get a very easy-to-understand answer.

and maybe one more example:


Which applications that are used by a Business Process that has an RTO of 0-2hrs would be affected by a fire at Loc_100


MATCH (rto:RTO {name:"0-2 hrs"})<-[:BUSINESSPROCESS_HAS_RTO]-(bp:BusinessProcess),
p1=ShortestPath((bp)-[*..3]-(b:Building {name:"Loc_100"})),
p2=ShortestPath((bp)-[*..2]-(a:Application))
RETURN p1,p2,rto;


And then for some reasoning - sort of



Like with any domain, understanding the meaning of the concepts expressed there is very important. It will allow us to do “reasoning”, and potentially plug holes in our data structures that do not really make sense and may need corrective action.


In this particular case, I stumbled upon the simple understanding that
  • if business processes have a recovery time objective,
  • and processes have a recovery time objective,
  • and business processes are made up of (atomic) sub-processes
  • then it should follow that the RTO of the business process can never be smaller than, or even equal to, the RTO of its constituent processes.


So let’s look for this using the following query to see if there are any cases in our organisation that violate this simple reasoning:


MATCH triangle=((bp:BusinessProcess)-[r1:BUSINESSPROCESS_HAS_RTO]->(rto:RTO)<-[r2:PROCESS_HAS_RTO]-(p:Process)<-[:CONTAINS]-(bp))
RETURN triangle LIMIT 10;


which returns the following graph:

Conceptually, this is a very valuable query, as it starts to show much more precisely where the risk areas are in our BCM domain. This could really be a life-saving query!
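The reasoning rule above can also be sketched outside the database. The snippet below is only an illustration under assumptions of my own: the RTO is expressed as a single number of hours (the dataset uses buckets such as "0-2 hrs"), and all names and values are invented. Note that the Cypher query above matches the case where a business process and one of its processes share the same RTO node, whereas this sketch also flags the strictly smaller case.

```python
# Hypothetical data: each RTO is written as an upper bound in hours.
BUSINESS_PROCESS_RTO = {"BP_1": 2, "BP_2": 24}
PROCESS_RTO = {"P_1": 2, "P_2": 4, "P_3": 48}
CONTAINS = {"BP_1": ["P_1"], "BP_2": ["P_2", "P_3"]}


def find_rto_violations(bp_rto, p_rto, contains):
    """Return (business process, process) pairs that break the rule:
    a business process RTO that is smaller than, or equal to, the RTO
    of one of the processes it contains."""
    violations = []
    for business_process in sorted(contains):
        for process in contains[business_process]:
            if bp_rto[business_process] <= p_rto[process]:
                violations.append((business_process, process))
    return violations


if __name__ == "__main__":
    for bp, p in find_rto_violations(BUSINESS_PROCESS_RTO, PROCESS_RTO, CONTAINS):
        print(bp, "violates the rule via", p)
```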

Conclusion

I never thought of it this way, but business processes, especially in larger corporations, are very intertwined and networked. So if you want to better understand and manage these processes and better protect yourself from potential disruptions that may affect your entire business’ continuity - then look no further, graphs can help. Some of the queries that I prepared for this use case are quite complex and interesting - and you should definitely check them out and see what they mean for your business.


You can find the dataset and the relevant queries in this gist - make sure you use neo4j 2.0 to run these.


As always, I hope this is useful.


Cheers


Rik

Friday, 22 November 2013

Meet this "Tubular" graph!

Many of us know London. Those of us that have visited London will know "the Tube", "the Underground" - simply the fastest and most efficient way to get around (although I must admit that Hailo has been quite a contender lately...). Beautiful city, lovely place to work, and since I started working for Neo, it feels a bit like my home away from home.

The Tube: A Great Graph

As you can easily imagine, or just plainly see from looking at any of the maps of the tube, the Underground really is a very sophisticated system, and can only be described as a very sophisticated graph. We always refer to it - in our Neo4j presentations - as the perfect example of how one-page-graphs can easily represent and provide *insight* into complex systems ... without having to have a PhD in maths. Literally: almost everyone can use the tube - almost everyone can use a graph.

Finding a nice "tubular" dataset

Since we talk about this example all the time, and since I am indeed an avid, non-native tube-user, I thought it would be interesting to look at how I could fit the Tube system into a neo4j database. It took me a while, but of course the data is out there: this page links to this spreadsheet that has a very nice starting point. It contains the Line, the Direction, the Stations, the Distance between stations, and then 3 different time measurements between the stations.


Importing this into a neo4j database is really, really easy.

Creating a neo4j Tube database

First things first: from the above spreadsheet, we would probably be best off to transform it into a .csv file. Easy peasy in Excel: the result is over here. Once we have that, we can use the ever so awesome neo4j-shell-tools (the 2.0 version is over here, in case you can't find it!) to import the data into a nice little graph model:

Kudos to Alistair Jones for making Arrows - it's actually very useable these days :)) ...

In other words: Stations have to be unique, are connected by one or more "Lines" in two directions, and the "Lines" have a "Direction" property (east, west, north, south...), a "Time" between stations property (which can be different in opposite directions!), and a "Distance" between stations property.

The import script for the .csv file is quite simple, as it completely leverages the new neo4j 2.0RC1 way of working:
  • it uses a schema constraint to ensure that the stations are unique
  • it uses the new Match-syntax (with property-matching in the pattern instead of in a where clause)

All in all it is very simple and effective. The resulting graph.db directory is over here.
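To make the model concrete, here is a small Python sketch of what the import conceptually does: stations are kept unique, and every row becomes a directed connection carrying the line, direction, distance and time. It does not use neo4j-shell-tools; the column names and rows are invented for illustration, and it keeps a single time value where the spreadsheet has three.

```python
import csv
import io

# Made-up rows using the kind of columns the post lists; not real Tube data.
SAMPLE_CSV = """line,direction,from_station,to_station,distance_km,time_min
LineA,Eastbound,Station 1,Station 2,1.2,2.0
LineA,Westbound,Station 2,Station 1,1.2,2.5
LineB,Northbound,Station 2,Station 3,0.8,1.5
"""


def load_tube_graph(csv_text):
    """Return (stations, connections) from CSV text.

    stations is a set, which plays the role of the uniqueness constraint;
    connections is a list of directed edges with their properties.
    """
    stations = set()
    connections = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        stations.add(row["from_station"])
        stations.add(row["to_station"])
        connections.append(
            {
                "line": row["line"],
                "direction": row["direction"],
                "from": row["from_station"],
                "to": row["to_station"],
                "distance": float(row["distance_km"]),
                "time": float(row["time_min"]),
            }
        )
    return stations, connections


if __name__ == "__main__":
    stations, connections = load_tube_graph(SAMPLE_CSV)
    print(len(stations), "stations,", len(connections), "connections")
```

Note how the two LineA rows give different times for the two directions, which is exactly why the time lives on the directed relationship rather than on the station.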

Exploring the tube in the neo4j browser

Ever since its introduction at GraphConnect San Francisco, the neo4j browser has become my favourite place to play around with neo4j and cypher. One of its coolest features is the ability to apply stylesheets to your graph visualisations. So I wanted to apply this to my new tube-graph, and use the "official" tube-line colours in the browser.


LINE                    TRUE HEXADECIMAL    WEB SAFE HEXADECIMAL
Bakerloo                #B36305             #996633
Central                 #E32017             #CC3333
Circle                  #FFD300             #FFCC00
District                #00782A             #006633
Hammersmith and City    #F3A9BB             #CC9999
Jubilee                 #A0A5A9             #868F98
Metropolitan            #9B0056             #660066
Northern                #000000             #000000
Piccadilly              #003688             #000099
Victoria                #0098D4             #0099CC
Waterloo and City       #95CDBA             #66CCCC
So then all I had to do was to download the .grass file from the browser, and start editing the "relationship" sections. In the example below, the .Circle and .Central are the names of the relationship types "Circle" and "Central". Logical.

You can download the full .grass file that I created from over here.
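If you prefer to generate the relationship sections instead of editing them by hand, something along these lines could work. Treat it as a sketch: the selector form (relationship.Circle) follows the description above, but the exact property name (color) is my assumption about the stylesheet syntax, so compare the output against the .grass file you downloaded from the browser before using it. The two colours are the "true hexadecimal" values from the table.

```python
# Colours for the two relationship types named above, taken from the table.
LINE_COLOURS = {
    "Central": "#E32017",
    "Circle": "#FFD300",
}


def grass_rule(relationship_type, colour):
    """Build one stylesheet rule for a relationship type.

    The 'color' property name is an assumption; check it against a
    .grass file exported from the browser.
    """
    return f"relationship.{relationship_type} {{\n  color: {colour};\n}}\n"


def grass_rules(colours):
    """Build the rules for all relationship types, in alphabetical order."""
    return "\n".join(grass_rule(name, colours[name]) for name in sorted(colours))


if __name__ == "__main__":
    print(grass_rules(LINE_COLOURS))
```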

Nice: if I start exploring the surrounding tube network for "London Bridge" station, I quickly get a feel for the network:


But of course the real fun begins with the queries.

Exploring the Tube with Cypher

Obviously I don't have the technical skills - at all - to develop anything like a route planner for the London Underground. But: using the dataset that we just created, it's quite easy to see how it would be very doable to create something like that. Let's look at some of the queries that I created:

Show the different underground lines:


Show the most densely connected underground station

With "densely connected" meaning the most different underground lines passing through it.
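The idea behind that definition can be sketched in a few lines of Python: collect the set of distinct lines touching each station, then pick the station with the largest set. The line and station names below are placeholders, not real Tube data, and this is not the Cypher query itself.

```python
from collections import defaultdict

# Placeholder connections as (line, from_station, to_station); not real Tube data.
CONNECTIONS = [
    ("LineA", "S1", "S2"),
    ("LineA", "S2", "S1"),
    ("LineB", "S2", "S3"),
    ("LineC", "S2", "S4"),
]


def lines_per_station(connections):
    """Map each station to the set of distinct lines passing through it."""
    lines = defaultdict(set)
    for line, from_station, to_station in connections:
        lines[from_station].add(line)
        lines[to_station].add(line)
    return lines


def most_densely_connected(connections):
    """Return the station with the most distinct lines.

    Ties are broken alphabetically because the candidates are sorted first.
    """
    lines = lines_per_station(connections)
    return max(sorted(lines), key=lambda station: len(lines[station]))


if __name__ == "__main__":
    print(most_densely_connected(CONNECTIONS))
```

Counting distinct lines rather than raw connections matters here, because every line appears twice per station pair (once per direction).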

And then you can drill into this really easily and explore some more:

And finally: pathfinding

Of course we can do some rudimentary pathfinding in Cypher. But it's rudimentary - and just included for fun. Let's say that I would want to go from Tower Hill to Southwark (one of the most tedious tube connections that I would take sometimes to get to our London office).


Anyone a bit familiar with London knows that this is "b*ll*cks", and no one would ever do that. The right thing (I think) to do is to take the district/circle lines from Tower Hill to Blackfriars - and then just walk across the bridge to the office. Easy.


I have included some other pathfinding queries in the gist - but I am pretty sure that they would need work :) ...
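For the curious: a route planner would typically weigh connections by travel time rather than just counting hops. The sketch below shows that idea with Dijkstra's algorithm in plain Python. It is not one of the queries in the gist, and the stations and minutes are invented, but it illustrates how the quickest route can differ from the one with the fewest stops.

```python
import heapq

# Invented directed connections as (from_station, to_station, minutes).
EDGES = [
    ("A", "B", 2.0),
    ("B", "C", 2.0),
    ("A", "C", 10.0),
    ("C", "D", 1.0),
]


def build_adjacency(edges):
    """Map each station to its outgoing (neighbour, minutes) pairs."""
    adjacency = {}
    for source, target, minutes in edges:
        adjacency.setdefault(source, []).append((target, minutes))
        adjacency.setdefault(target, [])
    return adjacency


def quickest_route(edges, start, goal):
    """Dijkstra's algorithm: return (total_minutes, path), or None if unreachable."""
    adjacency = build_adjacency(edges)
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        total, station, path = heapq.heappop(queue)
        if station == goal:
            return total, path
        if station in visited:
            continue
        visited.add(station)
        for neighbour, minutes in adjacency.get(station, []):
            if neighbour not in visited:
                heapq.heappush(queue, (total + minutes, neighbour, path + [neighbour]))
    return None


if __name__ == "__main__":
    print(quickest_route(EDGES, "A", "D"))
```

In this toy network the route with the fewest stops is A, C, D, but the quickest one goes through B.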

That's about it for now. I think I have demonstrated how easy it is to make the Great Tube Graph even greater by putting it into a graph database like neo4j - and how you could easily use something like the neo4j browser to find your way around one of the world's most complicated networks.

Hope you enjoy!

Rik