Where were the architects at RBS?

Link: http://feedproxy.google.com/~r/Soapbox/~3/b_lznwk9JRM/where-were-architects-at-rbs.html

From Richard Veryard on Architecture

#entarch Some interesting architectural implications of the recent embarrassing failure of banking systems at RBS-NatWest Bank, which has caused financial stress and distress for millions of customers.

A banking software expert quoted in the Guardian offered an interesting architectural analogy.

“Banking systems are like a huge game of
Jenga [the tower game played with interlaced blocks of wood]. Two
unrelated transactions might not look related now, but 500,000
transactions from now they might have a huge relation. So
everything needs to be processed in order.”

This analogy suggests that the problem is one of architectural knowledge and governance. This is a problem for any large and complex enterprise, but outsourcing typically amplifies it. From the press reports, it seems that the implementation of the RBS-NatWest application architecture has been delegated to relatively inexperienced staff in India with little knowledge of the RBS-NatWest business.

The finger of blame is being pointed at CA-7, which I understand to be a middleware product responsible for the orchestration of complex batch runs. As recently as February, there were job adverts in India urgently seeking people with CA-7 experience for the RBS contract.

Distributes or centralize job submission,
management and monitoring as you choose and simplify job
management by automating as much as possible and provides a
simple-to-use interface to manage your environment. CA 7® Workload
Automation is a mainframe-hosted, fully-integrated workload
automation engine that coordinates and executes job schedules and
event triggers across the enterprise.

http://www.ca.com/us/products/detail/ca-7-workload-automation.aspx

The Guardian continues

It seems whoever made the update to CA-7
managed to delete or corrupt the files which hold the schedule for
the overnight jobs, so they did not run, or ran incorrectly.

ComputerWorld quotes an RBS spokesman.

The focus right now is on fixing the
problem, which was triggered during a software system upgrade.

and BBC’s Robert Peston adds

the software update that went so badly wrong last Tuesday night was
fairly quickly identified and patched by Royal Bank; it is the absence
of a contingency plan to deal with the knock-ons from the initial
computer failure that many will see as deeply troubling

I presume that CA-7 expertise involves the ability to create and
maintain these control files. But these control files essentially
contain executable metadata that describe how the applications must
be joined up, which must ultimately be based on a rigorous view of
the application architecture – in other words, a model of the application layer.
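
To make the point concrete, here is a minimal sketch in Python of what it might mean to derive a batch schedule from an explicit model of the application layer, and to check a schedule against that model before the overnight run. The job names and the dependency table are invented for illustration. This is not CA-7's file format, and it says nothing about how RBS actually held its schedules. It uses the standard library's graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical application-layer model: each job maps to the set of jobs
# that must complete before it may run. All names are illustrative only.
APPLICATION_MODEL = {
    "load_transactions": set(),
    "post_payments": {"load_transactions"},
    "update_balances": {"post_payments"},
    "calculate_interest": {"update_balances"},
    "produce_statements": {"update_balances", "calculate_interest"},
}


def derive_schedule(model):
    """Return one run order in which every job follows its dependencies."""
    return list(TopologicalSorter(model).static_order())


def check_schedule(schedule, model):
    """Return a list of problems found when comparing a schedule to the model.

    Reports jobs that the model requires but the schedule omits, and jobs
    that are scheduled ahead of something they depend on.
    """
    problems = []
    position = {job: index for index, job in enumerate(schedule)}
    all_jobs = set(model) | {dep for deps in model.values() for dep in deps}

    for job in sorted(all_jobs):
        if job not in position:
            problems.append(f"missing job: {job}")

    for job, deps in sorted(model.items()):
        for dep in sorted(deps):
            if job in position and dep in position and position[dep] > position[job]:
                problems.append(f"{job} is scheduled before its dependency {dep}")

    return problems
```

Calling derive_schedule(APPLICATION_MODEL) gives a run order generated from the model, and check_schedule flags a schedule from which a job has been deleted or in which jobs have been reordered. The design point is only that the schedule becomes a derived artefact of the application model, so a damaged schedule can be detected or regenerated from something owned at the application layer.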

(In my discussion of business capabilities, I have always said that
the most troublesome capabilities (and the ones overlooked by most
business analysts) are the coordination capabilities, and these are
the ones that need the most care when outsourcing. The RBS-NatWest
incident illustrates this point.)

@davidsprott uses the incident to illustrate the need for application modernization. But was the problem in the core application systems, or was it in the platform layer?

To the extent that application coordination is being managed via
CA-7, it looks suspiciously as if the model of the application layer was embedded in the platform layer, and managed as if it were merely technical infrastructure. This suggests a fundamental architectural flaw in RBS systems – a failure to maintain a clean separation of concerns between the application layer and the platform layer.

This is one of the reasons why enterprise architecture is important. With clean separation and robust interfaces between the architectural layers (business, application, platform), we can carry out modernization, innovation and continuous change in each layer separately. This follows the principle of pace layering, based on the notion that each layer has a different characteristic rate of change. Without clean separation between layers, the layers shear apart, resulting in misalignment and system failure. And as @davidsprott points out, service enabling has exactly this (layer separation) outcome.

Conclusions

  • It’s risky outsourcing the core systems unless the architecture is clearly understood and controlled.
  • Good outsourcing requires a good service architecture, which may include business, app and/or platform services.
  • Modernization requires good architecture.
  • In complex systems of systems, coordination is a core business capability. Outsource with extreme caution.

Charles Arthur, How NatWest’s IT meltdown developed (Guardian 25 June 2012)

Anh Nguyen, CA ‘helps’ RBS resolve tech problem that led to massive outage (ComputerWorld 25 June 2012)

Robert Peston, Is outsourcing the cause of RBS debacle? (BBC News 25 June 2012)

David Sprott, RBS Crash – Management Prefer Offshoring to Modernization? (25 June 2012)