5 years, 6 months ago

Big Data Analytics and Cheap Suits

Link: https://pragmaticarchitect.wordpress.com/2015/10/31/big-data-analytics-and-cheap-suits/

Why?Sometimes I just want to staple my head to the carpet and wonder how to help others manage the seemingly irresistible urge to cling to what everyone else seems to be doing without thinking carefully about what is needed, not just wanted.  I will be discussing a topic I have been buried in the last couple of years in the Big Data Analytics space which most everyone by now is familiar with.  The technology is sound, evolving quickly, and solves for problems I could not image attacking a decade ago.  On the other hand the breath-taking speed of this platform adoption has left many scratching their heads and wondering why the old familiar rules of thumb and proven practice just don’t seem to work well anymore.   Less current management styles and obsolete thinking have created needless friction between the business and their supporting IT organizations.  This never ends well, but does keep me very busy.

First let’s put this challenge in perspective with a little context.  Over my career there have been a number of times when the need for efficient, cost effective data analysis has forced a change in existing technologies. The move to a relational model occurred when older methods to reliably handle changes to structured data led to the shift toward a data storage paradigm that was modeled on relational algebra. This created a fundamental shift in data handling, introducing a variety of tools and techniques that made all of our lives more rewarding. The current revolution in technology referred to as Big Data has happened because the relational data model can no longer efficiently handle the current needs for analysis of large and unstructured data sets. It is not just that data is bigger than before, or any of the other Vs (Variety, Volume, Velocity, Veracity, and Volatility) others have written about.  All of these data characteristics have been steadily growing for decades. The Big Data revolution is really a fundamental shift in architecture, just as the shift to the relational model was a shift that changed all of us. This shift means building new capabilities, adopting new tools, and thinking clearly about solving the right problems with the right tools the right way.  This means we need to truly understand what critical analytic capability is needed and make a focused investment in time and energy to realize this opportunity.  This should sound familiar to any of you working in this space. Many are already answering some of the obvious questions we should address at a minimum.

– When do we use a big data platform as opposed to the other platforms available?
– What are the platform drivers or key characteristics beyond storage and advanced analytics?
– Is low latency, real time application access required?
– How about availability and consistency requirements (see the CAP theorem for more on this)
– Workload characteristics – consistent flows or spikes?
– What is the shape of the data (e.g. structured, unstructured, and streaming)?
– Is there a need to integrate with existing data warehouse or other analytic platforms?
– How will the data be accessed by the analytic community and supporting applications?

Note that last question carefully; this is where the fun starts.

Why? There are two very real and conflicting views that we need to balance carefully.

The first, driven by the business is concerned with just getting the job done and lends itself to an environment where tools (and even methods) proliferate rapidly. In most cases this results in overlapping and redundant expensive functionality.  Less concerned with solving problems once, the analytic community is characterized by many independent efforts where significant intellectual property (analytic insight) is not captured and likely put a risk.  And not even re-used across the organization by others solving the same question.  There are very good reasons for this, this is completely understandable when the end justifies the means, and getting to the end game is the rewarded behavior. Like a cheap suit the analytic community simply doesn’t believe one size fits all. And I agree.

What-to-look-for-in-a-good-cheap-suit-by-DapperedThe second view, in contrast, is driven by the supporting IT organization charged with managing and delivering supporting services across a technology portfolio that values efficiency and effectiveness.  The ruthless pursuit of eliminating redundancy, leveraging the benefits of standardization, and optimizing investment drive this behavior.  I think it is easy to see where the means becomes the critical behavioral driver and the end is just assumed to resolve itself.   Just as cheap suits are designed to be mass-produced, use standard materials, and provide just enough (and no more) details to get by with the average consumer (if there really is such a thing).  Is there really an average analytic consumer? No; there is not (see the user profile tool in the next post for more). And I do agree with this view as well, there are very sound reasons why this view remains valid.

So this is where the friction is introduced. Until you understand this dynamic get ready for endless meetings, repeated discussions about capability (and what it means), and organizational behavior that seems puzzling and downright silly at times.  Questions like these (yes these are real) seem to never be resolved.

– Why do we need another data visualization tool when we already have five in the portfolio?
– Why can’t we just settle on one NoSQL alternative?
– Is the data lake really a place to worry about data redundancy?
– Should we use the same Data Quality tools and principals in our Big Data environment?

What to Do

So I’m going to share a method to help resolve this challenge and help focus on what is important so you can expend your nervous system solving problems rather than creating them. Armed with a true understanding of the organizational dynamics it is now a good time to revisit a first principal to help resolve what is an important and urgent problem.

First Principal: Form follows function.

The American architect, Louis Sullivan coined the phrase saying “It is the pervading law of all things organic and inorganic, of all things physical and metaphysical, of all things human and all things superhuman, of all true manifestations of the head, of the heart, of the soul, that the life is recognizable in its expression, that form ever follows function. This is the law”. And this has since become known by its’ more familiar phrase “form follows function“.

It is truly interesting that Sullivan developed the shape of the tall steel skyscraper in late 19th Century Chicago at the very moment when technology, taste and economic forces converged and made it necessary to drop the established styles of the past. If the shape of the building was not going to be chosen out of the old pattern book something had to determine form, and according to Sullivan it was going to be the purpose of the building. It was “form follows function”, as opposed to “form follows precedent”. Sullivan’s assistant Frank Lloyd Wright adopted and professed the same principle in slightly different form perhaps because shaking off the old styles gave them more freedom and latitude.

Sound familiar? It should, for any of us actively adopting this technology. This is where the challenge of using tried and true proven practice meets the reality of shaking off the old styles and innovating where and when it is needed in a meaningful, controlled, and measured manner.

So if form follows function, let’s see what makes sense. Thanks to Gartner who published Critical Capabilities for Business Intelligence and Analytics Platforms this summer (12 May 2015 ID:G00270381) we have a reasonably good way to think about form and function.  You may think what you will about Gartner I believe they have done a good job of grouping and characterizing fourteen (14) critical capabilities for analytics across four (4) different operating models (Gartner referred to them as baseline use cases) as follows.

– Centralized Provisioning
– Decentralized Analytics
– Governed Data Discovery
– OEM/Embedded Analytics

In this case capabilities are defined as “the ability to perform or achieve certain actions or outcomes through a set of controllable and measurable faculties, features, functions, processes, or services”.  They grouped the capabilities in questions into fourteen (14) major categories to include:

– Analytic Dashboards and Content
– Platform Administration
– Business User Data Mashup
– Cloud Deployment
– Collaboration and Social Integration
– Customer Services
– Development and Integration
– Ease of Use
– Embedded Analytics
– Free Form Interactive Exploration
– Internal Platform Integration
– IT-Developed Reports and Dashboards
– Metadata Management
– Mobile
– Traditional Styles of Analysis

Note there may be more than one operating model or baseline use case delivery scenario in use at your organization.  I just completed an engagement where three of the four operating models are in use.  This is exactly where the friction and confusion is created between IT Management and the Analytic Community. Every problem does not represent a nail where a hammer is useful. A set of tools and platforms which are ideal for Centralized Provisioning are usually terrible and completely unsuited for use within a Decentralized Analytics operating model.  Critical capability essential to Embedded Analytics is very different from Governed Data Discovery.  Yes there are some essentials that cross operating models (e.g. metadata), and in general this is a truly sound way to determine where your investment in capability should be occurring – and where it is not. In short, form follows function.  This is extremely helpful in using a common vocabulary where all stakeholders can understand the essentials when making analytic portfolio investment or simply selecting the right tool for the right job.

In a follow-up post I will provide an example and some simple tools you can use to help make ToolImage_01these decisions.  And remain committed to delivering value. After all, there is another prinicipal we should always remember. Analysis for analysis sake is just plain ridiculous.  Or has Tom Davenport said “…If we can’t turn that data into better decision making through quantitative analysis, we are both wasting data and probably creating suboptimal performance”.

Stay tuned…

Tagged: Big Data, Enterprise Architecture, Reference Architecture