19 days ago

Inspector Sands to Platform Nine and Three Quarters

Last week was not a good one for the platform business. Uber continues to receive bad publicity on multiple fronts, as noted in my post on Uber’s Defeat Device and Denial of Service (March 2017). And on Tuesday, a fat-fingered system admin at AWS managed to take out a significant chunk of the largest platform on the planet, in the Northern Virginia (US-EAST-1) Region, seriously degrading online retail. According to one estimate, performance at over half of the top internet retailers was hit by 20 percent or more, and some websites were completely down.

What have we learned from this? Yahoo Finance tells us not to worry.

“The good news: Amazon has addressed the issue, and is working to ensure nothing similar happens again. … Let’s just hope … that Amazon doesn’t experience any further issues in the near future.”

Other commentators are not so optimistic. For Computer Weekly, this incident

“highlights the risk of running critical systems in the public cloud. Even the most sophisticated cloud IT infrastructure is not infallible.”

So perhaps one lesson is not to trust platforms. Or at least not to practice wilful blindness when your chosen platform or cloud provider represents a single point of failure.

One of the myths of cloud, according to Aidan Finn,

“is that you get disaster recovery by default from your cloud vendor (such as Microsoft and Amazon). Everything in the cloud is a utility, and every utility has a price. If you want it, you need to pay for it and deploy it, and this includes a scenario in which a data center burns down and you need to recover. If you didn’t design in and deploy a disaster recovery solution, you’re as cooked as the servers in the smoky data center.”

Interestingly, Amazon itself was relatively unaffected by Tuesday’s problem. This may have been because they split their deployment across multiple geographical zones. However, as Brian Guy points out, there are significant costs involved in multi-region deployment, as well as data protection issues. He also notes that this question is not (yet) addressed by Amazon’s architectural guidelines for AWS users, known as the Well-Architected Framework.

Amazon recently added another pillar to the Well-Architected Framework, namely operational excellence. This includes such practices as performing operations with code: in other words, automating operations as much as possible. Did someone say Fat Finger?
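
To make "performing operations with code" a little more concrete, here is a minimal sketch of the kind of guard rail a scripted operation could apply before removing servers from a pool. It is an illustration only, not Amazon's tooling: the names `RemovalPolicy`, `plan_removal` and `UnsafeOperationError`, and the limits of 5 percent per step and 10 servers remaining, are all assumptions made up for this example.

```python
from dataclasses import dataclass


class UnsafeOperationError(Exception):
    """Raised when a requested change exceeds the configured safety limits."""


@dataclass(frozen=True)
class RemovalPolicy:
    # Hypothetical limits; sensible values depend entirely on the service.
    max_fraction_per_step: float = 0.05  # remove at most 5% of the pool at once
    min_remaining: int = 10              # never drop below this many servers


def plan_removal(active, requested, policy):
    """Return the sorted list of servers to remove, or raise if unsafe."""
    targets = set(requested)

    # Reject names that are not in the pool: a mistyped argument should
    # fail loudly instead of silently matching something unintended.
    unknown = targets - set(active)
    if unknown:
        raise UnsafeOperationError(f"unknown servers: {sorted(unknown)}")

    # Cap the size of a single step so one command cannot empty the pool.
    limit = int(len(active) * policy.max_fraction_per_step)
    if len(targets) > limit:
        raise UnsafeOperationError(
            f"requested {len(targets)} removals, limit is {limit} per step"
        )

    # Keep a floor of capacity regardless of how many steps have been run.
    if len(active) - len(targets) < policy.min_remaining:
        raise UnsafeOperationError("removal would leave too little capacity")

    return sorted(targets)


if __name__ == "__main__":
    pool = [f"srv-{i:03d}" for i in range(100)]
    print(plan_removal(pool, ["srv-001", "srv-002"], RemovalPolicy()))
    try:
        plan_removal(pool, pool[:50], RemovalPolicy())
    except UnsafeOperationError as err:
        print("refused:", err)
```

The point of the sketch is that the safety check lives in the code path that every operator uses, so a mistyped argument is refused by the tool instead of being caught (or not) by the person typing it.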


Abel Avram, The AWS Well-Architected Framework Adds Operational Excellence (InfoQ, 25 Nov 2016)

Julie Bort, The massive AWS outage hurt 54 of the top 100 internet retailers — but not Amazon (Business Insider, 1 March 2017)

Aidan Finn, How to Avoid an AWS-Style Outage in Azure (Petri, 6 March 2017)

Brian Guy, Analysis: Rethinking cloud architecture after the outage of Amazon Web Services (GeekWire, 5 March 2017)

Daniel Howley, Why you should still trust Amazon Web Services even though it took down the internet (Yahoo Finance, 6 March 2017)

Chris Mellor, Tuesday’s AWS S3-izure exposes Amazon-sized internet bottleneck (The Register, 1 March 2017)

Shaun Nichols, Amazon S3-izure cause: Half the web vanished because an AWS bod fat-fingered a command (The Register, 2 March 2017)

Cliff Saran, AWS outage shows vulnerability of cloud disaster recovery (Computer Weekly, 6 March 2017)

27 days ago

Fear of Failure, Fear and Failure

Some things seem so logically inconsistent that you just have to check them out. Such was the title of a post on LinkedIn that I saw the other day: “Innovation In Fear-Based Cultures? Or, why hire lions to be dogs?”. In it, Michael Graber noted that “…top-down organizations have the most trouble innovating.”: In particular, […]

6 months, 10 days ago

Leadership Anti-Patterns – The Thinker

My interest in leadership, how it works and how it fails, goes back a long way. Ever since I learned how to read, history, particularly military history, has been a favorite of mine. Captains and kings, their triumphs and their downfalls, fascinated me. The eleven years I served with the Henrico Sheriff’s Office […]

6 months, 14 days ago

Leadership Anti-Patterns – The Great Pretender

My previous leadership type, the Growler, was hard to classify as it had aspects of both pattern and anti-pattern. The Great Pretender, however, is much easier to label. It’s clearly an anti-pattern. Before entering the working world full-time, I worked in the retail grocery business (both of my parents also had considerable industry experience, both […]

6 months, 25 days ago

Leadership Patterns and Anti-Patterns – The Growler

Prior to starting my career in IT (twenty years ago this month…seems like yesterday), I spent a little over eleven years in law enforcement as a Deputy Sheriff. Over those eleven years my assignments ranged from working a shift in the jail (interesting stories), to Assistant Director of the Training Academy, then Personnel Officer (even […]

9 months, 8 days ago

Form Follows Function on SPaMCast 399

This week’s episode of Tom Cagley’s Software Process and Measurement (SPaMCast) podcast, number 399, features Tom’s essay “Storytelling: Developing The Big Picture for Agile Efforts”, Kim Pries on deliberate practice, and a Form Follows Function installment on customer-centricity for IT. Tom and I discuss my post “A Meaningful Manifesto for IT”. It seems obvious that […]

1 year, 1 month ago

A Meaningful Manifesto for IT

“Customer-centricity” is one of the biggest tags in the tag cloud to the right. My first post this year was “Is 2016 the Year for Customer-Focused IT?”. It’s a concept that I find vitally important to IT for the simple reason that to the extent that IT is not fit for purpose, it’s a waste […]

2 years, 10 months ago

Socially Developed Architecture

One of the most challenging aspects of our role as architects is that we often have to influence without direct authority. We often wrestle with this fact as we may not have the managerial clout and there may be a lack of clarity on what precisely we are accountable for. Perhaps simply stated, we have to be THE accountable party for…