Friday, 27 March 2020

Supply Chain Management with graphs: part 3/3 - some SCM analytics

I've been looking forward to writing this: this is the last of 3 blogposts that I have been planning to write for weeks about my experiments with a realistic Supply Chain Management Dataset. There's two posts before this one:
• In the first post I found and wrangled a dataset into my favourite graph database, Neo4j
• In the second post I got acquainted with the dataset in a bit more detail, and I was able to do some initial querying on it to figure out what patterns I might be able to expose.
In this this third and last post I would like to get a bit more analytical with the dataset, and do some more detail investigation in order to better understand some typical SCM questions. Note that I am far from a Supply Chain specialist - I barely understand the domain, and therefore I will probably be asking some silly questions initially. But bear with me - and let's explore and learn, right?

Path analysis in SupplyChains: cost and time of paths

One of the obvious things that you would want to analyse in any supply chain, are the costs and time spent on every path through the chain. More than likely, you would want to optimize the paths that are performing worse than others - so it starts with analysing these costs. Here's how we can do that in Neo4j:

//look at cost of paths
match (sc:SupplyChain {id:8})-[:STARTS_WITH]->(startstage:Stage)
with startstage,sc
match path = (startstage)-[:PRECEDES*]->(endstage:Stage)
where (endstage)-[:ENDS]->(sc)
with path, reduce(sum=0, x in nodes(path) | sum+x.stageTime) as totalTime,
reduce(sum=0, y in nodes(path) | sum+y.stageCost) as totalCost
order by totalTime asc
limit 10;

the key of course is in the
reduce(sum=0, x in nodes(path) | sum+x.stageTime)
and
reduce(sum=0, y in nodes(path) | sum+y.stageCost)
functions, which graph the stageTime and stageCost properties of every Stage in the SupplyChain, and sum them up. This allows us to compare these costs/times quite easily:

First we order the results by totalTime ascending:

and then we compare that to ordering the results by totalCost:

In each of these cases I would like to understand the differences and also look at the most expensive / longest taking paths as well. Let's just switch the sort order:

It would be interesting to do some more analysis on this, and figure out which paths have a low cost AND a low throughput time. This would not be very difficult to do - but I will leave that to others to explore and have fun with. Let's move on to some real Graph Analytics.

Analysing the betweenness of supply chain stages

One of the main reasons why I thought bringing graph databases to bear on Supply Chain Management, was to leverage some interesting graph analytical metrics to better understand these supply chains. We all know, based on different mediatized case studies, that supply chains can be vulnerable to disruption: if one of our globablized supply chain components starts failing, very quickly we see the effect of this failure in other parts of the chain. This is what we call "Supply chain risk", and it has been the subject of some academic research. I found this paper for example: Mapping supply chain risk by network analysis of product platforms, which has some really interesting points:

One of the things they do there, is analyse the importance of Supply Chain stages using network analytics:
So based on that inspiration, I decided to run some interesting analytics on our dataset. The main metric that I would want to look at is Betweenness Centrality: it represents the degree to which nodes stand between each other, or how many times a given node would appear on the path between two other nodes. Others have noted that this may be a critical indicator for analysing supplier performance and risk profiles.

So needed to install the algo plugin (now also available as the Graph Data Science Library), and use the Neuler graphapp to run it really easily. Here's how I configured Neuler:

This generated the following queries for me to run in the Neo4j browser:

//Calculate the betweenness scores
:param label => ("Stage");
:param relationshipType => ("PRECEDES");
:param limit => (100);
:param config => ({concurrency: 8, direction: "Outgoing", writeProperty: "betweenness"});

CALL algo.betweenness(\$label, \$relationshipType, \$config);

MATCH (node:Stage)
WHERE not(node[\$config.writeProperty] is null)
RETURN node, node[\$config.writeProperty] AS score
ORDER BY score DESC
LIMIT \$limit;

This all runs really quickly, and then we can start looking at the Stage nodes that have the highest betweenness scores, by examining the betweenness property on the Stage nodes. Note that for clarity, I have added the BETWEENNESSCHAMP label to the node with the highest betweenness score.

//take a look at the nodes with the highest betweenness
MATCH (s:Stage)
WITH s
ORDER BY s.betweenness DESC
LIMIT 1
SET s:BETWEENNESSCHAMP
WITH s
MATCH path = ((sc:SupplyChain)-[:STARTS_WITH|PRECEDES|ENDS*]->(s))
where s in nodes(path)
with sc,s
limit 1
match path = (sc)-[:STARTS_WITH]->(startstage:Stage)-[:PRECEDES*]->(endstage:Stage)-[:ENDS]->(sc)
return path, s
limit 100;

Running that query looks like this:

So if we then apply this to a particular SupplyChain, and look for the specific Stage in that SupplyChain that has the highest betweenness score, we could potentially see some more interesting localized bottlenecks that apply to that specific SupplyChain. Let's see how that would work in a query:

//take a look at the nodes with the highest betweenness in a particular supply chain
MATCH path = ((sc:SupplyChain {id:19})-[:STARTS_WITH|PRECEDES*]->(s))
where s in nodes(path)
with sc,s
match (s)
with s,sc
order by s.betweenness DESC
limit 1
set s:BETWEENNESSCHAMP
with s, sc
match path = (sc)-[:STARTS_WITH]->(startstage:Stage)-[:PRECEDES*]->(endstage:Stage)
where not ((endstage)-[:PRECEDES]->())
return path, s
limit 100;

Betweenness or degree?

Finally, I thought it would be interesting to compare the betweenness score that we calculated above, with a centrality metric that is a lot simpler: the degree of a Stage node. So let's see how that would work. I am going to use the apoc.node.degree function to do that really easily. Here's what the query looks like:

//take a look at the degree of the nodes with the highest betweenness
MATCH path = ((sc:SupplyChain {id:19})-[:STARTS_WITH|PRECEDES|ENDS_WITH*]->(s))
where s in nodes(path)
with sc,s
match (s)
return distinct s.name, apoc.node.degree(s) as degree, s.betweenness as betweenness
order by betweenness DESC
limit 10;

This returns very quickly:

And I can also look at it the other way around, bu ordering by degree:

//take a look at the betweenness of the nodes ordered by degree
MATCH path = ((sc:SupplyChain {id:19})-[:STARTS_WITH|PRECEDES|ENDS_WITH*]->(s))
where s in nodes(path)
with sc,s
match (s)
return distinct s.name, apoc.node.degree(s) as degree, s.betweenness as betweenness
order by degree DESC
limit 10

And then we do see some differences:

So it's pretty clear that different centrality metrics will give you different insights, also in this dataset.

That brings me to an end of this blogpost series. I hope it was useful for you, and that it serves as a starting point for people with actual Supply Chain Management expertise to explore the power of graphs in this domain. The queries and scripts in this blogpost are of course on github for you to play with.

As always - would love to hear from you.

All the best

Rik