Inspector Sands to Platform Nine and Three Quarters

Last week was not a good one for the platform business. Uber continues to receive bad publicity on multiple fronts, as noted in my post on Uber’s Defeat Device and Denial of Service (March 2017). And on Tuesday, a fat-fingered system admin at AWS managed to take out a significant chunk of the largest platform on the planet, seriously degrading online retail in the Northern Virginia (US-EAST-1) Region. According to one estimate, performance at over half of the top internet retailers was hit by 20 percent or more, and some websites were completely down.

What have we learned from this? Yahoo Finance tells us not to worry.

“The good news: Amazon has addressed the issue, and is working to ensure nothing similar happens again. … Let’s just hope … that Amazon doesn’t experience any further issues in the near future.”

Other commentators are not so optimistic. For Computer Weekly, this incident

“highlights the risk of running critical systems in the public cloud. Even the most sophisticated cloud IT infrastructure is not infallible.”

So perhaps one lesson is not to trust platforms. Or at least not to practice wilful blindness when your chosen platform or cloud provider represents a single point of failure.

One of the myths of cloud, according to Aidan Finn,

“is that you get disaster recovery by default from your cloud vendor (such as Microsoft and Amazon). Everything in the cloud is a utility, and every utility has a price. If you want it, you need to pay for it and deploy it, and this includes a scenario in which a data center burns down and you need to recover. If you didn’t design in and deploy a disaster recovery solution, you’re as cooked as the servers in the smoky data center.”

Interestingly, Amazon itself was relatively unaffected by Tuesday’s problem. This may have been because they split their deployment across multiple geographical zones. However, as Brian Guy points out, there are significant costs involved in multi-region deployment, as well as data protection issues. He also notes that this question is not (yet) addressed by Amazon’s architectural guidelines for AWS users, known as the Well-Architected Framework.

Amazon recently added another pillar to the Well-Architected Framework, namely operational excellence. This includes such practices as performing operations with code: in other words, automating operations as much as possible. Did someone say Fat Finger?


Abel Avram, The AWS Well-Architected Framework Adds Operational Excellence (InfoQ, 25 Nov 2016)

Julie Bort, The massive AWS outage hurt 54 of the top 100 internet retailers — but not Amazon (Business Insider, 1 March 2017)

Aidan Finn, How to Avoid an AWS-Style Outage in Azure (Petri, 6 March 2017)

Brian Guy, Analysis: Rethinking cloud architecture after the outage of Amazon Web Services (GeekWire, 5 March 2017)

Daniel Howley, Why you should still trust Amazon Web Services even though it took down the internet (Yahoo Finance, 6 March 2017)

Chris Mellor, Tuesday’s AWS S3-izure exposes Amazon-sized internet bottleneck (The Register, 1 March 2017)

Shaun Nichols, Amazon S3-izure cause: Half the web vanished because an AWS bod fat-fingered a command (The Register, 2 March 2017)

Cliff Saran, AWS outage shows vulnerability of cloud disaster recovery (Computer Weekly, 6 March 2017)

On McKinsey’s Two Speed IT Architecture and Digital Business Models

 In a “A two-speed IT architecture for the digital enterprise”, McKinsey’s (McK) Oliver Bossert, Chris Ip, and Jürgen Laartz stated that “delivering an enriched customer experience requires a new digital architecture running alongsi…

Regulating Software Development

  Another weekend, another too good to pass up Twitter conversation during my “unplugged” time. This weekend, Grady Booch hooked me by retweeting Mike Potts tweet: Mike’s tweet was a reply to Grady’s comment on the latest news out of Uber: It’s an understandable question. It’s a reasonable question. It’s one that came up back […]

The power of giving up

In architecture we sometimes confronted by wicked problems or other problems that are unsolvable by traditional means, such as with the wicked where you can first formulate the answer after you have implemented the solution. Additional creativity often works best if you remove the pressure, often you see this with writers block. Traditional management and … Continue reading The power of giving up

Uber’s Defeat Device and Denial of Service

Perhaps you already know about Distributed Denial of Service (DDOS). In this post, I’m going to talk about something quite different, which we might call Centralized Denial of Service.

This week we learned that Uber had developed a defeat device called Greyball – a fake Uber app whose purpose was to frustrate investigations by regulators and law enforcement, especially designed for those cities where regulators were suspicious of the Uber model.

In 2014, Erich England, a code enforcement inspector in Portland, Oregon, tried to hail an Uber car downtown in a sting operation against the company. However, Uber recognized that Mr England was a regulator, and cancelled his booking. 

It turns out that Uber had developed algorithms to be suspicious of such people. According to the New York Times, grounds for suspicion included trips to and from law enforcement offices, or credit cards associated with selected public agencies. (Presumably there were a number of false positives generated by excessive suspicion or Überverdacht.)

But as Adrienne Lafrance points out, if a digital service provider can deny service to regulators (or people it suspects to be regulators), it can also deny service on other grounds. She talks to Ethan Zuckerman, the director of the Center for Civic Media at MIT, who observes that

“Greyballing police may primarily raise the concern that Uber is obstructing justice, but Greyballing for other reasons—a bias against Muslims, for instance—would be illegal and discriminatory, and it would be very difficult to make the case it was going on.”

One might also imagine Uber trying to discriminate against people with extreme political opinions, and defending this in terms of the safety of their drivers. Or discriminating against people with special needs, such as wheelchair users.

Typically, people who are subject to discrimination have less choice of service providers, and a degraded service overall. But if there is a defacto monopoly, which is of course where Uber wishes to end up in as many cities as possible, then its denial of service is centralized and more extreme. Once you have been banned by Uber, and once Uber has driven all the other forms of public transport out of existence, you have no choice but to walk.


Mike Isaac, How Uber Deceives the Authorities Worldwide (New York Times, 3 March 2017)

Adrienne LaFrance, Uber’s Secret Program Raises Questions About Discrimination (The Atlantic, 3 March 2017)

The Problem with Processes: The Reprise – FiD post

This is a slightly rewritten version of the first public airing of the VPEC-T concept. That was over 10 years ago – it now it has a life of its own, it is, however, the foundation on which Lost In Translationwas written, and apparent in Found In Design. Please take a look and send me your comments. Thanks Nigel.


The Problem with Processes: The Reprise – FiD post

This is a slightly rewritten version of the first public airing of the VPEC-T concept. That was over 10 years ago – it now it has a life of its own, it is, however, the foundation on which Lost In Translationwas written, and apparent in Found In Design. Please take a look and send me your comments. Thanks Nigel.