Data by Design

Link: http://feedproxy.google.com/~r/Soapbox/~3/vr15FPEuC1M/data-by-design.html

If your #datastrategy involves collecting and harvesting more data, then it makes sense to check this requirement at an early stage of a new project or other initiative, rather than adding data collection as an afterthought.
For requirements such as security and privacy, the not-as-afterthought heuristic is well established in the practices of security-by-design and privacy-by-design. I have also spent some time thinking and writing about technology ethics, under the heading of responsibility-by-design. In my October 2018 post on Responsibility by Design, I suggested that all of these could be regarded as instances of a general pattern of X-by-design, outlining What, Why, When, For Whom, Who, How and How Much for a given concern X.
In this post, I want to look at three instances of the X-by-design pattern that could support your data strategy:
  • data collection by design
  • data quality by design
  • data governance by design
Data Collection by Design
Here’s a common scenario. Some engineers in your organization have set up a new product or service or system or resource. This is now fully operational, and appears to be working properly. However, the system is not properly instrumented.

“Thought should always be given to the self instrumentation of the prime equipment, i.e. design for test from the outset.” (Kev Judge)

In the past, it was common for a system to be instrumented during the test phase, and for data collection to be switched off for performance reasons once the tests were completed.
“If there is concern that the self instrumentation can add unacceptable processing overheads then why not introduce a system of removing the self instrumentation before delivery?” (Kev Judge)
Yet instrumentation is needed in live operation too, not just for operational testing and monitoring but also for business intelligence. For IBM, this is an essential component of digital advantage:

“Digitally reinvented electronics organizations pursue new approaches to products, processes and ecosystem participation. They design products with attention toward the types of information they need to collect to design the right customer experiences.” (IBM)

The point here is that a new system or service needs to have data collection designed in from the start, rather than tacked on later.
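
To make the idea of designed-in data collection concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory `Collector` and a hypothetical `place_order` function, neither of which comes from the sources quoted above. Instrumentation is attached when the function is written, and the `enabled` flag stands in for the kind of switch that lets you reduce overhead without ripping the instrumentation out.

```python
import time
from dataclasses import dataclass, field
from functools import wraps


@dataclass
class Collector:
    """Hypothetical in-memory sink; a real system would ship events elsewhere."""
    enabled: bool = True
    events: list = field(default_factory=list)

    def record(self, name, duration_s, ok):
        # Recording is skipped, not removed, when collection is switched off.
        if self.enabled:
            self.events.append({"name": name, "duration_s": duration_s, "ok": ok})


collector = Collector()


def instrumented(name):
    """Wrap a function so that every call is reported to the collector."""
    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            ok = False
            try:
                result = func(*args, **kwargs)
                ok = True
                return result
            finally:
                # Runs for both successful and failed calls.
                collector.record(name, time.perf_counter() - start, ok)
        return wrapper
    return decorate


@instrumented("place_order")
def place_order(quantity):
    """Illustrative business operation (an assumption for this sketch)."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return quantity * 2
```

The same events that show whether the operation is healthy (duration, success or failure) can also feed business questions such as how often the operation is used, which is why it pays to decide what to record at design time.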

Data Quality by Design
The next pitfall arises when a new system or service is developed and the data migration or integration is done in a big rush towards the end of the project. Then – surprise, surprise – the data quality isn’t good enough.
This pitfall is particularly relevant when data is being repurposed. During the pandemic, there was a suggestion of using Bluetooth connection strength as a proxy for the distance between two phones, and therefore an indicator of the distance between the owners of the phones. Although this data might have been adequate for statistical analysis, it was not good enough to justify putting a person into quarantine.
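
One way to express the point that quality depends on purpose is a fitness-for-purpose check that is written down before data is reused. The Python sketch below is only an illustration: the `QualityProfile` fields, the purpose names and every threshold are placeholder assumptions, not measured figures from the Bluetooth case or any other source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityProfile:
    """Hypothetical summary of a dataset's measured quality."""
    completeness: float  # fraction of records with the field populated
    error_rate: float    # estimated fraction of readings that are wrong


# Placeholder requirements per purpose; the numbers are assumptions
# chosen only to show that different uses tolerate different quality.
REQUIREMENTS = {
    "aggregate_statistics": {"min_completeness": 0.80, "max_error_rate": 0.20},
    "individual_decision": {"min_completeness": 0.99, "max_error_rate": 0.01},
}


def fit_for(purpose, profile):
    """Return True if the profile meets the stated requirements for the purpose."""
    req = REQUIREMENTS[purpose]
    return (
        profile.completeness >= req["min_completeness"]
        and profile.error_rate <= req["max_error_rate"]
    )


# The same data can pass for one purpose and fail for another.
proximity_data = QualityProfile(completeness=0.90, error_rate=0.10)
fit_for_statistics = fit_for("aggregate_statistics", proximity_data)
fit_for_decisions = fit_for("individual_decision", proximity_data)
```

Stating the requirements per purpose at design time turns "is the data good enough?" into a question that can be asked before the data is repurposed, rather than discovered afterwards.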

Data Governance by Design

Finally, there is the question of the sociotechnical organization and processes needed to manage and support the data – not only data quality but all other aspects of data governance.

The pitfall here is to believe you can sort out the IT plumbing first, leaving the necessary governance and controls to be added in later. 

Scott Burnett, Reza Firouzbakht, Cristene Gonzalez-Wertz and Anthony Marshall, Using Data by Design (IBM Institute for Business Value, 2018)

Kev Judge, Self Instrumentation and S.I. (undated, circa 2007)