On chaos in enterprise-architecture

Link: http://weblog.tetradian.com/2012/11/21/on-chaos-in-ea/

What is chaos? What does that word mean, in practice? And how – if at all – can we use chaos in enterprise-architecture?

I’ve been having a great email back-and-forth on this with Cynthia Kurtz, co-originator of Cynefin, and – probably more relevant here – originator of the Confluence sensemaking-framework (CSF).

As Cynthia said in that conversation, probably the greatest problem here is that the word ‘chaos’ is used to mean so many different things that can also be interpreted and used in so many different ways. Worse, some of those views tend to be misused as term-hijacks that block off the view to any other meaning or understanding of ‘chaos’ – which can be a real problem.

As I see it, there are at least four themes here that are relevant to enterprise-architecture:

‘chaos’ in the colloquial sense, as a state within which no sense can be made – and hence something that many people would want to prevent
‘chaos’ as the ‘necessary jolt’ – a brief burst to shake things up when they’ve gotten stuck
‘chaos’ as non-linear dynamics – a subset, or view, of chaos that can be described by ‘chaos-mathematics’
‘chaos’ as infinite-possibility – a source for innovation, improvisation and the like

We also need to clarify the difference between control, complexity and ‘real’ chaos – because there are significantly different tactics that we need for each.

First, though, let’s get colloquial ‘chaos’ out of the way. In essence, that kind of ‘chaos’ is actually a sensemaking breakdown: things are happening too fast and/or with too much unpredictability for conventional order-based sensemaking to cope – hence a kind of collapse into overwhelm or whatever.

In system-designs, overload under rapid change is one common cause of this. Conventional analytic sensemaking / decision-making / action loops take time to execute: so if whatever’s going on is happening faster than the system can keep up, and changes from one loop to another, eventually the decisions and actions are going to be those that should have applied to the previous loop. Which will then be the wrong decisions and actions, needing a further correction, which will also be out of sync, and then further and further out of sync, as the misalignment between sensemaking and action cascades into chaos.

[A simple first-hand example: if tryt to trype as fast tiass i can withoyut6 editing and neot checkign every single letter that iu type, which is whtb i;m doign now ont rhis umfa,iliart key boarfd, i tnd to maker a chaqsotic mess of things. I need to slow it down for the typing to make sense. Possibly…]
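As a rough illustration of that lag effect, here is a minimal Python sketch. It's a toy model only, not anything taken from a real system: the function name `simulate_loop`, the `gain` of 0.8 and the delay values are all assumptions chosen for illustration. On each tick, the 'system' tries to correct its error, but bases the correction on an observation that is `delay` ticks old.

```python
def simulate_loop(delay, gain=0.8, steps=60):
    """Toy sense/decide/act loop (illustrative assumption, not a real model).

    Each tick the system tries to correct its error, but the correction is
    based on what it observed `delay` ticks ago. Returns the error history.
    """
    # Start with an error of 1.0, held constant for as long as the delay lasts.
    errors = [1.0] * (delay + 1)
    for _ in range(steps):
        stale_observation = errors[-1 - delay]   # what the loop 'sees'
        correction = gain * stale_observation    # decision based on stale data
        errors.append(errors[-1] - correction)   # action applied to the present
    return errors


if __name__ == "__main__":
    for delay in (0, 3):
        history = simulate_loop(delay)
        worst_recent = max(abs(e) for e in history[-10:])
        print(f"delay={delay}: worst recent error = {worst_recent:.3g}")
```

With no delay, the error shrinks quickly towards zero. With a three-tick delay, the very same correction-rule overshoots by more and more each time round: each 'fix' is aimed at a situation that no longer exists.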

In a military context, the whole aim of a tactic such as John Boyd’s OODA loop (Observe, Orient, Decide, Act) is to cause this kind of chaotic collapse in the opponent. The opponent is then forced to slow down in order to regain sense – and that slowing-down then makes them a literally predictable target.

There’ll also always be a moment of this kind of chaos before any attempt at sensemaking can take place. The moment we’re dumped into a new situation – or even a known situation before we know that it’s a known situation – there’s the same kind of existential uncertainty: we don’t know what’s going on, so we can’t decide what to do. (In the swamp-metaphor, that’s the state that we’re in before we decide on what kind of tactic to use – when all we know is that the swamp is, well, the swamp…) Sometimes we can kick-start the sensemaking-cycle there by taking some arbitrary form of action: “don’t just stand there, do something!” – or, as some would put it, ‘act -> sense -> respond’ as ‘the method’ to break out of the chaos. Yet as we’ll see later, that’s by no means the only available tactic there – and in many cases it’s not even a good tactic either.

In essence, colloquial-chaos is loss of sense leading to a sense of loss of ‘control’. To many – especially those embedded in a linear-paradigm worldview – the obvious answer is to crank up the speed of the sense/decision/action cycle: hence the popularity of computer-based automation and suchlike. Yet although such automation is an answer that works well in some circumstances, automation alone is not ‘the answer’. All it really does is push back the boundaries of ‘chaos’: no matter how much we might wish that it could do so, it cannot ever expunge that ‘chaos’ in its entirety.

The reality is that concepts of ‘control’ only work well with contexts that are amenable to a concept of ‘control’, which in essence means predictability, which in turn depends on repeatability – over on the left-side of the red line of the ‘Inverse-Einstein boundary’, in the SCAN diagram above. Which doesn’t apply in many real-world contexts: for example, see Sigurd Rinde’s work on ‘Barely Repeatable Processes’. And pretending that we can reduce everything to ‘control’ is a dangerous fantasy – a more extreme version of the notion that complexity can somehow be eliminated from real-world business. To paraphrase a quote from an earlier post here:

To ignore or deny Chaos is a bit like denying the existence of a hole in the road – it simply increases the chance of falling into it.

Colloquial-chaos occurs whenever we try to ignore or deny the reality of chaos: we need to work with that chaos, rather than try to pretend that it doesn’t exist.

A first choice here for many people from the linear-paradigm tradition would be mathematical-‘chaos’. To quote Douglas Hofstadter:

It turns out that an eerie type of chaos can lurk just behind a façade of order – and yet, deep inside the chaos lurks an even eerier type of order.

There are plenty of business-examples of this: for example, a buying-decision in sales will often conform almost exactly to one of the best-known of chaos-math patterns. In practice, though, this is often misused or misunderstood as just another way to put a gloss of ‘control’ over something that actually isn’t controllable at all. To understand how to make use of this in enterprise-architecture, we need to know its very real limitations.

Perhaps most important is that this isn’t the mathematics of ‘chaos’ as a whole: it describes only a specific subset of ‘chaos’, such as the impact of non-linear dynamics. Although it’s different from conventional linear notions of ‘order’, in essence it’s a kind of ‘meta-mathematics’, with a role somewhat similar to the role of metaframeworks and the like: it describes the behaviour of the bounding-conditions (such as some aspects of ‘variety-weather’) for specific phenomena, but cannot describe the exact phenomena themselves – a crucial distinction that’s too often missed. And although it does cover key concerns such as sensitivity to initial conditions, it explicitly doesn’t allow for true randomness or extreme-uniqueness – which is a significant issue in many business-contexts.
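To make ‘sensitivity to initial conditions’ a bit more concrete, here’s a minimal Python sketch using the logistic map, a standard textbook example from chaos-mathematics. It’s not taken from any business-context: the function name `logistic_orbit`, the parameter `r = 3.9` and the starting values are all assumptions chosen purely for illustration. Two runs start one-billionth apart and follow exactly the same fully-deterministic rule.

```python
def logistic_orbit(x0, r=3.9, steps=200):
    """Iterate the logistic map x -> r * x * (1 - x), a standard textbook example."""
    orbit = [x0]
    for _ in range(steps):
        x = orbit[-1]
        orbit.append(r * x * (1.0 - x))
    return orbit


if __name__ == "__main__":
    a = logistic_orbit(0.2)
    b = logistic_orbit(0.2 + 1e-9)   # almost identical starting point
    gaps = [abs(x - y) for x, y in zip(a, b)]
    print(f"gap at start: {gaps[0]:.1e}")
    print(f"largest gap:  {max(gaps):.3f}")
    print(f"all values within [0, 1]: {all(0.0 <= x <= 1.0 for x in a + b)}")
```

The two runs soon bear no resemblance to each other, yet both stay within the same bounds throughout. That’s the distinction in miniature: the maths can describe the bounding-conditions, but not which exact value will turn up at any given step once there’s any uncertainty at all about the starting-point.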

In short, chaos-science and suchlike can be useful: most enterprise-architects would gain a lot of insights from even an introductory text such as James Gleick’s Chaos. But it’s not ‘the answer’, and treating it as ‘the answer’ will bring on a whole load of really-unhelpful term-hijack problems. Useful, but handle with care…

Next we’d turn to what we might describe as ‘jolt-chaos’: in SCAN terms, deliberately stepping over into uncertainty or ‘unorder’ so as to shake things up a bit. In enterprise-architecture practice, I’ve often seen at least two distinct forms of this.

The first is sort-of linked to chaos-science territory: giving something a jolt to break it out of ‘stuckness’. One way to describe this is in terms of ‘attractors’ or ‘strange-attractors’, where a phenomenon that follows non-linear dynamics tends to fall into particular patterns with regions of relative stability. The classic real-world example is to give the (old-style analogue) television a thump in the hope of bringing it back to the desired tuning. I don’t know the details of the maths here, but I do know that old-style thermionic-valves could wander, but tended to settle into particular states: if it was a ‘wrong’ state, a mechanical jolt could often restart the dynamics, with a good chance of getting the valve to resettle into the preferred state. In effect, the jolt briefly becomes a part of the overall dynamic-system, and therefore temporarily changes the dynamics of the system. Plenty of business analogues there – the Hawthorne Effect being perhaps one of the best-known examples.
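For a very rough feel of how a jolt can move a system from one region of stability to another, here’s a toy Python sketch. It is emphatically not a model of a valve or a television: the equation (a simple system with two resting-states, near -1 and near +1), the function names `settle` and `jolt_and_settle`, and the jolt-sizes are all assumptions made up for illustration.

```python
def settle(x, steps=200, dt=0.1):
    """Let a toy two-attractor system relax: dx/dt = x - x**3.

    It has two stable resting states, near -1 and near +1 (an illustrative
    assumption, not a model of a valve or a television).
    """
    for _ in range(steps):
        x += dt * (x - x**3)
    return x


def jolt_and_settle(x, jolt):
    """Apply a one-off kick, then let the system settle again."""
    return settle(x + jolt)


if __name__ == "__main__":
    stuck = settle(-0.5)                           # settles into the 'wrong' state
    print(round(stuck, 3))
    print(round(jolt_and_settle(stuck, 0.5), 3))   # small jolt: falls back again
    print(round(jolt_and_settle(stuck, 1.5), 3))   # bigger jolt: the other state
```

In this toy, a small jolt just falls back to where it started, whilst a larger one resettles into the other state. Real contexts are rarely this tidy: we usually don’t know the shape of the ‘landscape’ beforehand, so we can’t know in advance where a given jolt will land.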

The other type is where we kind of ‘dip in to the chaos’ as a source of ideas or innovation. Common business-examples to invite this type of chaos include brainstorming, gamestorming and structural-serendipity. We need this ‘jolt-chaos’ to keep things moving in business and elsewhere: Nietzsche’s oft-quoted comment that “You must have chaos within you to give birth to a dancing star” would apply to birthing anything new, really. But we do need to respect it and work with it as chaos – and not try to ‘control’ it, or pretend that it’s something else that’s easier to manipulate or understand.

[For example, some people would misdescribe this ‘jolt-chaos’ as part of complexity. Yet doing so is both unfortunate and unwise: it’s true that chaos-events can be used as a feed into complexity, but that’s not the same at all as saying that it is complexity. I’ll need to do a separate post on this, but the two domains are fundamentally different in nature, in particular around the role of predictability, outliers and patterns: as different, in fact, as the individual events of quantum-physics are from the derived probabilities of such events en-masse. Blurring chaos and complexity together is potentially dangerous in enterprise-architecture, because it introduces a delusory form of ‘order’ into a context where, by definition, no order can actually exist: a similar mistake to the way in which some people misinterpret chaos-math as implying that it’s possible to use it to predict chaotic-events, when in reality all it can do is predict the degree of unpredictability – whilst the fact of the unpredictability itself always remains unchanged.]

The crucial point with ‘jolt-chaos’ is that we have no way to know beforehand what the outcome will be. Working with that uncertainty is actually the key here. Any attempt at ‘control’ will likely block access to that needed-uncertainty – hence control-oriented concepts such as ‘success’ or ‘failure’ must be kept well outside of the chaos-space itself. Given that we might well dip into the chaos-space because we need new ideas for an urgent issue elsewhere, this can be a tricky balance to maintain…

Finally, for here, there’s what we might call ‘infinity-chaos’, though perhaps ‘continuous-chaos’ or ‘intentional-chaos’ might well be useful alternate terms. Here we don’t just dip into the uncertainty for a brief moment – as in ‘jolt-chaos’ – but instead remain in it for as long as we can or must.

In the terms of the SCAN framework, it’s about operating in the ‘Not-known’ domain in real-time, where in practice decisions and actions can only be done on faith.

I’ll admit this is not easy to describe or explain, not least because to make sense of what happens here requires careful subjective observation and a mode of inquiry that doesn’t fit well with a conventional ‘truth’-based verbal form of description. Hence this whole space is very often misunderstood, or misdescribed, or just glossed-over in the search for some more-easily-understandable Belief-structure – which won’t do the job that’s required here. It often seems the best we can do in descriptions is kind of point at it, or imply it: if you’re familiar with classics such as the Tao Te Ching or the Sufi teaching-tales of the ‘wise fool’ Mulla Nasruddin, you’ll have a good sense of the more abstract and generic end of what I’m aiming for here. Yet it’s probable that the only really workable ‘description’ is to do it, whilst also observing oneself in the doing of it – which is not an easy thing to do!

For enterprise-architecture and the like, what we’d look for here are strategies and tactics that support real-time action and decision-making: it’s very important not to try to define methods or whatever to do the work itself – and especially not with some form of automation. The whole point here is that in any inherently-chaotic context, we do not and cannot define beforehand the exact details of what will happen: some parts of it at least will always need to be made up on the spot, from whatever material is available to hand. So the key here is to provide an ‘option-rich’ environment and solid support for the human skills and decision-making that must apply in this type of context.

Simple everyday examples (or at least, ‘everyday’ for the people who act in those contexts) would include a customer-support call-centre, the emergency-room in a hospital, or soldiers on front-line patrol in hostile territory. There’s always some aspect that’s ‘the same’, as constrained by the context and the nominal responsibilities in that context; yet there’s also always some aspect that’s different every time. And we can’t know beforehand exactly what it will be: all that we do know is that something half-known or completely-unknown to us could be thrown at us at any moment – and the decisions and actions that we take are down to our literal ‘response-ability’, in real-time, right here, right now.

[A reminder that this isn’t the same as complexity – or more accurately, we here move from complexity (non-real-time) to chaos (real-time), rather than from chaos to complexity, as often occurs with ‘jolt-chaos’. Again, though, this is something I’ll need to expand in more detail in another post – this one’s long enough already!]

Some of the key points here include:

this item matters – and we are responsible in this moment for its outcome
what happens in this one moment affects everything that happens onwards, and may affect (the reinterpretation of) everything that happened before
there is also only this one item, with no connection to anything else
the item may have any degree of uniqueness
what worked in the past with something that looked much the same as this may or may not work – and we have no way to tell
all of the action must take place in real-time, right here, right now
the action is over when it’s over – and we have no way to tell how long it will take
each moment is our responsibility – and determined by our ‘response-ability’

Some of the obvious points that should arise from this:

anything that assumes identicality (e.g. Six Sigma, rule-based automation) is not going to work well here
anything that assumes connection between events (e.g. pattern-matching) may well be misleading
anything that assumes no connections between events (e.g. colloquial ‘chaos’) is likely to be problematic – or at the very least, inefficient and ineffective
anything that requires significant ‘offline’ time to execute (e.g. conventional analysis or experimentation) will cause either analysis-paralysis or catastrophic-collapse

In short, most conventional approaches to business-challenges – whether Taylorist or complexity-based – will not work well for these contexts. For example, one all but guaranteed anti-pattern for catastrophic failure in high-intensity, high-uncertainty real-time action is to attempt to ‘control’ it via micromanagement: the tragedy is that people still try to make it work, solely because it’s the only approach that they know and understand. As enterprise-architects, we need to do better than that…

The tactics and strategies that do work here will often seem somewhat paradoxical, even in their description. Yet in some domains, such as improvisational theatre, there’s much about this that’s both well-understood and well-documented. For example, here’s Michelle James’ list of improv-principles, from her slidedeck ‘Expand Your Story: Improv for reinvention’:

Yes-And
Make everyone else look good
Heighten and explore
Justify [aka ‘Include and expand’]
Serve the good of the whole
Mistakes are invitations to create
Be changed by what happens

Note that many of these are almost the exact opposites of what usually happens in business: ‘No-But’ rather than ‘Yes-And’, for example, or ‘Serve the good of this business-unit’, or ‘Mistakes are invitations to punish’. Which might just illustrate why so many organisations have so much difficulty in coping with inherent-uniqueness or inherent-uncertainty… Again, we do need to do better than this.

Anyway, stop there for now: enough to get started with, I hope? There’ll be more detail to follow in later posts – with an emphasis on practical, usable detail – but for now, over to you for any comments so far?