Risk Algebra – EA Voices

Link: http://rvsoapbox.blogspot.com/2022/05/risk-algebra.html

From Architecture, Data and Intelligence

In this post, I want to explore some important synergies between architectural thinking and risk management.

The first point is that if we want to have an enterprise-wide understanding of risk, then it helps to have an enterprise-wide view of how the business is configured to deliver against its strategy. Enterprise architecture should provide a unified set of answers to the following questions.

What capabilities delivering what business outcomes?
Delivering what services to what customers?
What information, process and resources to support these?
What organizations, systems, technologies and external partnerships to support these?
Who is accountable for what?
And how is all of this monitored, controlled and governed?

Enterprise architecture should also provide an understanding of the dependencies between these, and which ones are business-critical or time-critical. For example, there may be some components of the business that are important in the long-term, but could easily be unavailable for a few weeks before anyone really noticed. But there are other components (people, systems, processes) where any failure would have an immediate impact on the business and its customers, so major issues have to be fixed urgently to maintain business continuity. For some critical elements of the business, appropriate contingency plans and backup arrangements will need to be in place.

Risk assessment can then look systematically across this landscape, reviewing the risks associated with assets of various kinds, activities of various kinds (processes, projects, etc), as well as other intangibles (motivation, brand image, reputation). Risk assessment can also review how risks are shared between the organization and its business partners, both officially (as embedded in contractual agreements) and in actual practice.

Architects have an important concept, which should also be of great interest for enterprise risk management – the idea of a single point of failure (SPOF). When this exists, it is often the result of a poor design, or an over-zealous attempt to standardize and strip out complexity. But sometimes this is the result of what I call Creeping Business Dependency – in other words, not noticing that we have become increasingly reliant on something outside our control.
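To make the SPOF idea concrete, here is a minimal sketch of how an architecture model might be scanned for single points of failure. The model itself is an assumption for illustration: each capability lists its needs, and each need lists the components able to satisfy it. All capability and component names are hypothetical.

```python
# Hypothetical model: capability -> need -> components able to satisfy it.
# More than one component against a need represents redundancy.
capabilities = {
    "online_orders": {
        "payment": ["payment_gateway_a"],
        "hosting": ["datacentre_1", "datacentre_2"],
    },
    "store_sales": {
        "payment": ["payment_gateway_a", "cash"],
        "chilled_storage": ["fridge_estate"],
    },
}


def single_points_of_failure(capabilities):
    """Return {component: [(capability, need), ...]} for sole providers."""
    spofs = {}
    for capability, needs in capabilities.items():
        for need, providers in needs.items():
            # A need with exactly one distinct provider has no fallback.
            if len(set(providers)) == 1:
                spofs.setdefault(providers[0], []).append((capability, need))
    return spofs


spofs = single_points_of_failure(capabilities)
for component, uses in sorted(spofs.items()):
    print(component, "is the sole provider for", uses)
```

Running a check like this periodically against an up-to-date model is one way to catch Creeping Business Dependency, since a dependency that has quietly lost its alternatives would start to appear in the output.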

There are also important questions of scale and aggregation. Some years ago, I did some risk management consultancy for a large supermarket chain. One of the topics we were looking at was fridge failure. Obviously the supermarket had thousands and thousands of fridges, some in the customer-facing parts of the stores, some at the back, and some in the warehouses.

Fridges fail all the time, and there is a constant process of inspecting, maintaining and replacing fridges. So a single fridge failing is not regarded as a business risk. But if several thousand fridges were all to fail at the same time, presumably for the same reason, that would cause a significant disruption to the business.

So this raised some interesting questions. Could we define a cut-off point? How many fridges would have to fail before we found ourselves outside business-as-usual territory? What kind of management signals or dashboard could be put in place to get early warning of such problems, or to trigger a switch to a “safety” mode of operation?
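One possible way to frame the cut-off question is statistical. The sketch below assumes that, in business-as-usual conditions, fridges fail independently with the same small daily probability, so the daily failure count follows a binomial distribution. The alert threshold is then the smallest count that would be very unlikely under that assumption. The fleet size, failure probability and tolerance `alpha` are invented figures, not data from any real supermarket.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), with 0 < p < 1.

    Computed in log space so that large n does not overflow.
    """
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def alert_threshold(n, p, alpha):
    """Smallest k with P(X >= k) < alpha, or n + 1 if no such k exists."""
    tail = 1.0  # P(X >= 0)
    for k in range(n + 1):
        if tail < alpha:
            return k
        tail -= binomial_pmf(k, n, p)  # now holds P(X >= k + 1)
    return n + 1


# Invented figures for illustration only.
fleet_size = 5000
daily_failure_prob = 0.002
threshold = alert_threshold(fleet_size, daily_failure_prob, alpha=0.001)

print(f"Expected failures per day: {fleet_size * daily_failure_prob:.0f}")
print(f"Alert when failures in a day reach: {threshold}")
```

A count at or above the threshold does not prove a common cause, but it is a signal that the independence assumption behind business-as-usual may have stopped holding, which is exactly when a dashboard should prompt someone to look. The running subtraction loses precision for very small `alpha`, so this is only suitable for moderate tolerances.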

Obviously these questions aren’t only relevant to fridges, but can apply to any category of resource, including people. During the pandemic, some organizations had similar issues in relation to staff absences.

Aggregation is also relevant when we look beyond a single firm to the whole ecosystem. Suppose we have a market or ecosystem with n players, and the risk carried by player i is R(i), for i = 1 to n. Then what is the aggregate risk of the market or ecosystem as a whole?

If we assume complete independence between the risks of each player, then we would expect a significant probability of a few players failing, but a very small probability of a large number of players failing – the so-called Black Swan event. Unfortunately, the assumption of independence can be flawed, as we have seen in financial markets, where there may be a tangled knot of interdependence between players. Regulators often think they can regulate markets by imposing rules on individual players. And while this might sometimes work, it is easy to see why it doesn’t always work. In some cases, the regulator draws a line in the sand (for example defining a minimum capital ratio) and then checks that nobody crosses the line. But then if everyone trades as close as possible to this line, how much capacity does the market as a whole have for absorbing unexpected shocks?
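To show how much the independence assumption matters, here is a small sketch comparing two toy models in which every individual player has exactly the same probability of failing. In the first, failures are independent. In the second, there is an occasional shared shock that raises everyone's failure probability at once. The two-state shock model and all the numbers are assumptions chosen for illustration, not a description of any real market.

```python
import math


def binomial_tail(m, n, p):
    """P(X >= m) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1)
    )


# Invented parameters for illustration only.
n_players = 20
p_base = 0.02    # failure probability of a player in normal conditions
p_shock = 0.60   # failure probability of a player during a shared shock
q_shock = 0.05   # probability that a shared shock occurs
m = 10           # "mass failure" means at least this many players fail

# Each player's overall failure probability is the same in both models.
p_marginal = q_shock * p_shock + (1 - q_shock) * p_base

# Model 1: players fail independently with probability p_marginal.
independent = binomial_tail(m, n_players, p_marginal)

# Model 2: players are independent only once we know whether a shock occurred.
correlated = (
    q_shock * binomial_tail(m, n_players, p_shock)
    + (1 - q_shock) * binomial_tail(m, n_players, p_base)
)

print(f"Per-player failure probability: {p_marginal:.3f}")
print(f"P(at least {m} fail), independent:  {independent:.2e}")
print(f"P(at least {m} fail), shared shock: {correlated:.2e}")
```

With these invented numbers, checking each player individually would show the same risk in both models, yet mass failure is many orders of magnitude more likely when there is a shared shock. This is one way of seeing why rules imposed player by player may say little about the resilience of the market as a whole.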

Both here and in the fridge example, there is a question of standardization versus diversity. On the one hand, it’s a lot simpler for the supermarket if all the fridges are the same type, with a common set of spare parts. But on the other hand, having more than one type of fridge helps to mitigate the risk of them all failing at the same time. It also gives some space for experimentation, thus addressing the longer term risk of getting stuck with an out-of-date fridge estate. The fridge example also highlights the importance of redundancy – in other words, having spare fridges.
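The standardization trade-off can be put in simple numerical terms. The sketch below assumes that a common-mode fault (a bad firmware update or a defective part, say) takes out every fridge of one type and nothing else, and asks what share of the estate is exposed in the worst case. The fridge types and counts are hypothetical.

```python
def worst_case_loss(estate):
    """Largest share of the estate that a single type-specific fault can take out."""
    total = sum(estate.values())
    return max(estate.values()) / total


# Hypothetical estates with the same total number of fridges.
standardized = {"type_a": 5000}
diversified = {"type_a": 2500, "type_b": 1500, "type_c": 1000}

for name, estate in [("standardized", standardized), ("diversified", diversified)]:
    print(f"{name}: worst-case loss {worst_case_loss(estate):.0%}")
```

The diversified estate costs more to maintain, since it needs more than one set of spare parts, but it caps the damage from any one fault. Spare fridges play a complementary role: redundancy only covers the gap if the spare capacity is at least as large as the worst-case loss.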

So there are some important trade-offs here between pure economic optimization and a more balanced approach to enterprise risk.