Friday 13 December 2013

Business Continuity Management - a perfect fit for Graphs!

At one of our recent Graph-Cafe meetup events, I had the pleasure of spending some time with a lovely gentleman from a large corporation that was into a profession that I had never heard of: Business Continuity Management. It’s always interesting to learn new things, but even more interesting it became when this fine gentleman started explaining to me that BCM is actually all about graphs. Google defines it as
"Business Continuity Management is a holistic process that identifies both potential threats and the impacts to an organization of their normal business operations should those threats be realized."
But what does that mean? When you think about it some more, you quickly realise that it’s all about the relationships between different parts of a business, and understanding and managing the relationships between these parts in such a way so that the business can run as continuously as possible. Seems obvious? Well - it’s not. Because how do you define “a business”? What does “continuous” mean? And what does that have to do with graphs?

Understanding your business - creating a model

This courteous gentleman - I cannot name him for obvious reasons - was having a little trouble getting started with neo4j, and so we decided to work together. I would create a lovely neo4j dataset for him, and he would help us document and present the use case. So we started with the obvious question: how do we plan for Business Continuity? By understanding our business, right! We have to get a grip on how our processes, departments, applications, physical environments, etc interact - and how we can model this as a graph.


Luckily, my “partner in crime” knew what he was doing. He had already thought of the model, and had created a set of MS Excel files that would accurately represent how business processes, process, business lines/departments, buildings and applications would interact and depend on each other. And: since we are talking about assuring the continuity of the business, he even had a quantitative measure of the importance of business processes and processes - the recovery time objective. You can see from the model how easy it is to represent these intricate relationships, as a graph. So how to go about importing this data into neo4j, so that we could ask some interesting questions?

Loading the data: Spreadsheets rule!

As you can probably tell from some of my previous posts, there are many ways to import data into neo4j. But since the source data in this particular case was already in spreadsheet format, I decided to use the good old spreadsheet technique. Just add a column to my excel sheets, use string concatination to generate Cypher statements based on cell contents, and then copy/paste the resulting Cypher queries into the neo4j-shell - and we’re done. Easy!




Once we have the data in neo4j, the fun can actually begin - and the neo4j browser is going to be a big part of that.

A first look at the BCM data

Let’s explore the newly created dataset a bit, by running a couple of simple queries. The first one actually is a standard query saved in the neo4j browser:

Show the data model: what is related to what, and how?

MATCH (a)-[r]->(b)
RETURN DISTINCT head(labels(a)) AS This, type(r) AS To, head(labels(b)) AS That
ORDER BY This
LIMIT 100




So this means that the import basically worked well :) …

Impact analysis: the complex what-if question

The real objective of the BCM use case for graph databases, however, is not just about playing around with the data - it’s about understanding Impact. A broad field of business and scientific understanding, and a very active use case for neo4j. Essentially, what we are talking about here are complex, densely connected data structures in which we want to understand the effects of change in that structure. What happens to the rest of the graph, if one element of the graph would change? What happens if it would disappear? What happens if … What if?
These kinds of dependency analysis is not new. We have had people discuss it with regards to source code analysis, web services, telecom, railway planning, and many other domains. But to apply it to a business-as-a-whole was very new to me - and fascinating for sure.
Let’s look at a couple of examples.

Which Applications are used in which buildings

What would happen to specific employees located in specific buildings if a particular application would “die”?

MATCH (n:Application)<-[:USES]-(m:Process)-[:USED_BY]->(l:BusinessLine)->[:LOCATED_IN]->(b:Building)
RETURN DISTINCT n,b
limit 10;


Obviously this is a quite a broad query, with a lot of different results. But by using LIMIT we can start looking into some specifics, and use a graphical visualisation to make this all less difficult to grasp.



Or another example:

What BusinessProcesses would be affected by a fire at location Loc_100

Let’s use a “shortestpath” calculation to find this:


MATCH p = ShortestPath((b:Building {name:"Loc_100"})-[*..3]-(bp:BusinessProcess))
RETURN p;


and immediately we get a very easy-to understand answer.

and maybe one more example:


Which applications that are used by a Business Process that has an RTO of 0-2hrs would be affected by a fire at Loc_100


MATCH (rto:RTO {name:"0-2 hrs"})<-[:BUSINESSPROCESS_HAS_RTO]-(bp:BusinessProcess),
p1=ShortestPath(bp-[*..3]-(b:Building {name:"Loc_100"})),
p2=ShortestPath(bp-[*..2]-(a:Application))
RETURN p1,p2,rto;


And then for some reasoning - sortof



Like with any domain, understanding the meaning of the concepts expressed there is very important. It will allow us to do “reasoning”, and potentially plug holes in our data structures that do not really make sense and may need corrective action.


In this particular case, I stumbled upon the simple understanding that
  • if business processes have a recovery time objective,
  • and processes have a recovery time objective,
  • and business processes are made up of (atomic) sub-processes
  • then therefore it should follow that the RTO of the business process can never be smaller than, or even equal to, the RTO of the constituting processes.


So let’s look for this using the following query to see if there are any cases in our organisation that violate this simple reasoning:


MATCH triangle=((bp:BusinessProcess)-[r1:BUSINESSPROCESS_HAS_RTO]->(rto:RTO)<-[r2:PROCESS_HAS_RTO]-(p:Process)<-[:CONTAINS]-(bp))
RETURN triangle LIMIT 10;


which returns the following graph:

Conceptually, this is a very valuable query, as it starts to illustrate much closer where the risk areas are for our BCM domain. This could really be a life-saving query!

Conclusion

I never thought of it this way, but business processes, especially in larger corporations, are very intertwined and networked. So if you want to better understand and manage these processes and better protect yourself from potential disruptions that may affect your entire business’ continuity - then look no further, graphs can help. Some of the queries that I prepared for this use case are quite complex and interesting - and you should definitely check them out and see what they mean for your business.


You can find the dataset and the relevant queries in this gist - make sure you use neo4j 2.0 to run these.


As always, I hope this is useful.


Cheers


Rik

3 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. Hello,
    This post is all about the business continuity.All the descriptions are just super.Nice blog...
    Davey Continuity Website

    ReplyDelete
  3. Thank you Rik. A brillant post, clear and straight forward. Must be read by any BCM practitioner and more important by any Crisis Manager to be able to answer the terrible question "And from this incident, what are the impacts?".
    Christian Houdebine - Bridge Conseil

    ReplyDelete