29 days ago

Hybrid By Design Vs. Hybrid By Accident

As a veteran of enterprise IT, I can tell you there’s a difference between “hybrid by design” and “hybrid by accident.” Let’s be frank: you are probably doing hybrid by accident; just about everybody is. Hybrid by accident looks like this: integrating public cloud with on-premises tech without standardizing on a common infrastructure-as-code practice; shadow IT cloud “experiments” that suddenly […]

2 months, 15 days ago

New Enterprise “Cloud” Integration Approach in Banking

While all four maturing digital trends (Mobile, Cloud, Delivery Optimization and Process Optimization) are interconnected, Cloud appears to be the one that makes the technology C-suite (CISO, CTO and CDO) most nervous. But the potential upside of Cloud adoption is considerable: it brings tremendous synergies in operating costs and also helps propel innovation.

4 months, 10 days ago

The transition to Shadow IT and the Cloud (ii)

continuing from
The CIO in the Cloud era
The IT issue and the Cloud solution (i)

The revolution started with shadow IT. Business was fed up with IT’s perpetual excuses and delays: “can’t”, “not now”, “we are so busy…” …

4 months, 10 days ago

The IT issue and the Cloud solution (i)

continuing from
The CIO in the Cloud era

Yet business is not too happy with the IT department. “I cannot imagine why,” you may say. Well, IT costs a great deal compared with the rest of the business. Consider the numbing operational expenditure on se…

7 months, 14 days ago

Inspector Sands to Platform Nine and Three Quarters

Last week was not a good one for the platform business. Uber continues to receive bad publicity on multiple fronts, as noted in my post on Uber’s Defeat Device and Denial of Service (March 2017). And on Tuesday, a fat-fingered system admin at AWS managed to take out a significant chunk of the largest platform on the planet, seriously degrading online retail in the Northern Virginia (US-EAST-1) Region. According to one estimate, performance at over half of the top internet retailers was hit by 20 percent or more, and some websites were completely down.

What have we learned from this? Yahoo Finance tells us not to worry.

“The good news: Amazon has addressed the issue, and is working to ensure nothing similar happens again. … Let’s just hope … that Amazon doesn’t experience any further issues in the near future.”

Other commentators are not so optimistic. For Computer Weekly, this incident

“highlights the risk of running critical systems in the public cloud. Even the most sophisticated cloud IT infrastructure is not infallible.”

So perhaps one lesson is not to trust platforms. Or at least not to practice wilful blindness when your chosen platform or cloud provider represents a single point of failure.

One of the myths of cloud, according to Aidan Finn,

“is that you get disaster recovery by default from your cloud vendor (such as Microsoft and Amazon). Everything in the cloud is a utility, and every utility has a price. If you want it, you need to pay for it and deploy it, and this includes a scenario in which a data center burns down and you need to recover. If you didn’t design in and deploy a disaster recovery solution, you’re as cooked as the servers in the smoky data center.”

Interestingly, Amazon itself was relatively unaffected by Tuesday’s problem. This may have been because they split their deployment across multiple geographical zones. However, as Brian Guy points out, there are significant costs involved in multi-region deployment, as well as data protection issues. He also notes that this question is not (yet) addressed by Amazon’s architectural guidelines for AWS users, known as the Well-Architected Framework.
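To make the multi-region idea a little more concrete, here is a minimal client-side failover sketch in Python. It assumes the data has already been replicated to every listed region (the costly part Brian Guy warns about), and the `fetch` callable, the region order and the `fake_fetch` stand-in are hypothetical placeholders for a real storage client, not an AWS API.

```python
from typing import Callable, Dict, Sequence, Tuple


class AllRegionsFailed(Exception):
    """Raised when no configured region could serve the request."""


def read_with_failover(
    key: str,
    regions: Sequence[str],
    fetch: Callable[[str, str], bytes],
) -> Tuple[str, bytes]:
    """Try each region in priority order and return (region, data)
    from the first one that answers.

    Assumes `key` has already been replicated to every region listed.
    """
    errors: Dict[str, Exception] = {}
    for region in regions:
        try:
            return region, fetch(region, key)
        except ConnectionError as exc:  # treat only connectivity failures as "region down"
            errors[region] = exc
    raise AllRegionsFailed(f"{key!r} unavailable in all regions: {errors}")


# Hypothetical stand-in for a real storage client: us-east-1 is "down".
def fake_fetch(region: str, key: str) -> bytes:
    if region == "us-east-1":
        raise ConnectionError("simulated regional outage")
    return f"{key}@{region}".encode()


region, data = read_with_failover("catalog.json", ["us-east-1", "us-west-2"], fake_fetch)
print(region, data)  # us-west-2 b'catalog.json@us-west-2'
```

Failover of this kind only helps if the second region holds a current copy; the replication itself, its cost and its data-protection implications all sit outside this function.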

Amazon recently added another pillar to the Well-Architected Framework, namely operational excellence. This includes such practices as performing operations with code: in other words, automating operations as much as possible. Did someone say Fat Finger?
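One reading of “performing operations with code” is that a script can refuse a mistyped command that a human operator would happily run. The Python sketch below is purely illustrative: the `Fleet` model, the `min_servers` floor and the `max_percent` step limit are hypothetical guardrails, not AWS tooling and not something the Well-Architected Framework prescribes.

```python
from dataclasses import dataclass
from typing import List


class GuardrailViolation(Exception):
    """Raised when a requested operation would breach a safety limit."""


@dataclass
class Fleet:
    name: str
    servers: List[str]
    min_servers: int  # hypothetical floor below which the service cannot cope


def remove_capacity(fleet: Fleet, count: int, max_percent: int = 10) -> List[str]:
    """Remove `count` servers from the fleet, refusing any request that would
    drop below `min_servers` or take out more than `max_percent` of the fleet
    (at least one server) in a single step."""
    if count <= 0:
        raise ValueError("count must be positive")
    remaining = len(fleet.servers) - count
    if remaining < fleet.min_servers:
        raise GuardrailViolation(
            f"{fleet.name}: removing {count} leaves {remaining}, "
            f"below the floor of {fleet.min_servers}"
        )
    # Integer arithmetic avoids floating-point surprises in the step limit.
    max_step = max(1, len(fleet.servers) * max_percent // 100)
    if count > max_step:
        raise GuardrailViolation(
            f"{fleet.name}: {count} exceeds the per-step limit of {max_step}"
        )
    removed, fleet.servers = fleet.servers[:count], fleet.servers[count:]
    return removed


fleet = Fleet("index", [f"srv{i}" for i in range(20)], min_servers=15)
print(remove_capacity(fleet, 2))  # ['srv0', 'srv1']
try:
    remove_capacity(fleet, 10)  # a fat-fingered extra zero
except GuardrailViolation as err:
    print("refused:", err)
```

A real tool would also log, ask for confirmation and pace removals across steps; the point is only that the safety limits live in code rather than in the operator’s care.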


Abel Avram, The AWS Well-Architected Framework Adds Operational Excellence (InfoQ, 25 Nov 2016)

Julie Bort, The massive AWS outage hurt 54 of the top 100 internet retailers — but not Amazon (Business Insider, 1 March 2017)

Aidan Finn, How to Avoid an AWS-Style Outage in Azure (Petri, 6 March 2017)

Brian Guy, Analysis: Rethinking cloud architecture after the outage of Amazon Web Services (GeekWire, 5 March 2017)

Daniel Howley, Why you should still trust Amazon Web Services even though it took down the internet (Yahoo Finance, 6 March 2017)

Chris Mellor, Tuesday’s AWS S3-izure exposes Amazon-sized internet bottleneck (The Register, 1 March 2017)

Shaun Nichols, Amazon S3-izure cause: Half the web vanished because an AWS bod fat-fingered a command (The Register, 2 March 2017)

Cliff Saran, AWS outage shows vulnerability of cloud disaster recovery (Computer Weekly, 6 March 2017)