First there was Hadoop. Then there were data scientists. Then came Agile BI on big data. Drum roll, please . . . bum, bum, bum, bum . . .
Now we have data preparation!
If you are as passionate about data quality and governance as I am, then the 5+-year wait for a scalable capability to take on data trust is amazingly validating. The era of "good enough" when it comes to big data is giving way to an understanding that analysts have gotten away with "good enough" only through a significant amount of manual data wrangling. As an analyst, it must have felt like your parents saying you can't see your friends and play outside until you've cleaned your room (and if it's anything like my kids' rooms, that's a tall order).
There is no denying that analysts are the first to benefit from data preparation tools such as Alteryx, Paxata, and Trifacta. For them, it's a matter of faster time to value for insight. What is still unrecognized in the broader data management and governance strategy is that these early forays are laying the foundation for data citizenry and the cultural shift toward a truly data-driven organization.
Today's data reality is that consumers of data are like any other consumers; they want to shop for what they need. This data consumer journey begins by looking in their own spreadsheets, databases, and warehouses. When they can't find what they want there, data consumers turn to external sources such as partners, third parties, and the Web. Data preparation tools help them determine the value of that data and, ultimately, whether they will procure it and possibly pay for it. The other outcome of this data-shopping experience is that they are taking on the risk and accountability for the value of the data as it is introduced into analysis, decision-making, and automation.