Economic Value of Data

How far can general principles of asset management be applied to data? In this post, I’m going to look at some of the challenges of putting monetary or non-monetary value on your data assets.

Why might we want to do this? People take an interest in the value of data for several reasons:

  • Establish internal or external benchmarks
  • Set measurable targets and track progress
  • Identify underutilized assets
  • Prioritize work and allocate resources
  • Model threats and assess risks (especially in relation to confidentiality, privacy, and security)

Non-monetary benchmarks may be good enough if all we want to do is compare values – for example, this parcel of data is worth a lot more than that parcel, this process/practice is more efficient/effective than that one, this initiative/transformation has added significant value, and so on.

But for some purposes it is better to express the value in financial terms, especially for the following:

  • Cost-benefit analysis – e.g. calculate return on investment
  • Asset valuation – estimate the (intangible) value of the data inventory – e.g. relevant for flotation or acquisition
  • Exchange value – calculate pricing and profitability for traded data items

There are (at least) five entirely different ways to put a monetary value on any asset.

  • Historical Cost – the total cost of the labour and other resources required to produce and maintain an item.
  • Replacement Cost – the total cost of the labour and other resources that would be required to replace an item.
  • Liability Cost – the potential damages or penalties if the item is lost or misused. (This may include regulatory action, reputational damage, or commercial advantage to your competitors, and may bear no relation to any other measure of value.)
  • Utility Value – the economic benefits that may be received by an actor from using or consuming the item.
  • Market Value – the exchange price of an item at a given point in time: the amount that must be paid to purchase the item, or the amount that could be obtained by selling it.

But there are some real difficulties in doing any of this for data. None of these difficulties are unique to data, but I can’t think of any other asset class that has all of these difficulties multiplied together to the same extent.

  • Data is an intangible asset. There are established ways of valuing intangible assets, but these are always somewhat more complicated than valuing tangible assets.
  • Data is often produced as a side-effect of some other activity. So the cost of its production may already be accounted for elsewhere, or may be a very small fraction of a much larger cost.
  • Data is a reusable asset. You may be able to get repeated (although possibly diminishing) benefit from the same data.
  • Data is an infinitely reproducible asset. You can sell or share the same data many times, while continuing to use it yourself. 
  • Some data loses its value very quickly. If I’m walking past a restaurant, this information has value to the restaurant. Ten minutes later I’m five blocks away, and the information is useless. And even before this point, suppose there are three restaurants and they all have access to the information that I am hungry and nearby. As soon as one of these restaurants manages to convert this information, its value to the remaining restaurants becomes zero or even negative. 
  • Data combines in a non-linear fashion. Value(X+Y) is not always equal to Value(X) + Value(Y). Even within more tangible asset classes, we can find the concepts of Assemblage and Plottage. For data, one version of this non-linearity is the phenomenon of information energy described by Michael Saylor of MicroStrategy. And for statisticians, there is also Simpson’s Paradox.


The production costs of data can be estimated in various ways. One approach is to divide up the total ICT expenditure, estimating roughly what proportion of the whole to allocate to this or that parcel of data. This generally only works for fairly large parcels – for example, one percentage to customer transactions, another percentage to transport and logistics, and so on. Another approach is to work out the marginal or incremental cost: this is commonly preferred when considering new data systems, or decommissioning old ones. We can also compare the effort consumed in different data domains, or count the number of transformation steps from raw data to actionable intelligence.
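To make the first approach concrete, here is a minimal Python sketch that allocates a total ICT budget across large parcels of data by rough proportions. All the figures, parcel names, and percentages are invented for illustration.

```python
# Toy illustration: allocate total ICT spend across large data parcels
# by estimated proportions. All figures and parcel names are invented.

total_ict_spend = 10_000_000  # annual ICT expenditure

# Rough, subjective allocation proportions (summing to <= 1.0; the
# remainder is ICT cost not attributable to any particular parcel)
allocation = {
    "customer transactions": 0.25,
    "transport and logistics": 0.15,
    "product catalogue": 0.05,
}

for parcel, share in allocation.items():
    print(f"{parcel}: {total_ict_spend * share:,.0f}")

unallocated = total_ict_spend * (1 - sum(allocation.values()))
print(f"not attributed to any parcel: {unallocated:,.0f}")
```

The same structure works for the marginal approach: replace the shares with estimates of the incremental spend on each new or decommissioned system.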

As for the value of the data, there are again many different approaches. Ideally, we should look at the use-value or performance value of the data – what contribution does it make to a specific decision or process, or what aggregate contribution does it make to a given set of decisions and processes. 

  • This can be based on subjective assessments of relevance and usefulness, perhaps weighted by the importance of the decisions or processes where the data are used. See Bill Schmarzo’s blog post for a worked example; a small sketch in this spirit follows this list.
  • Or it may be based on objective comparisons of results with and without the data in question – making a measurable difference to some key performance indicator (KPI). In some cases, the KPI may be directly translated into a financial value. 
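A minimal sketch of the subjective approach: rate each dataset’s relevance to each business decision, weight by the importance of the decision, and sum. The decisions, datasets, weights, and scores here are all invented assumptions, not a reconstruction of Schmarzo’s actual worked example.

```python
# Sketch of a subjective use-value score: rate each dataset's relevance
# to each business decision (0..1), weight by decision importance, and
# aggregate. All decisions, datasets, weights, and scores are invented.

decisions = {  # decision -> business importance weight
    "churn reduction": 0.5,
    "inventory planning": 0.3,
    "pricing review": 0.2,
}

relevance = {  # dataset -> {decision: relevance score 0..1}
    "customer transactions": {"churn reduction": 0.9, "pricing review": 0.6},
    "supplier feeds": {"inventory planning": 0.8},
}

for dataset, scores in relevance.items():
    value = sum(decisions[d] * r for d, r in scores.items())
    print(f"{dataset}: weighted use-value {value:.2f}")
```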

However, comparing performance fairly and objectively may only be possible for organizations that are already at a reasonable level of data management maturity.

In the absence of this kind of metric, we can look instead at the intrinsic value of the data, independently of its potential or actual use. This could be based on a weighted formula involving such quality characteristics as accuracy, alignment, completeness, enrichment, reliability, shelf-life, timeliness, uniqueness, and usability. (Gartner has published a formula that uses a subset of these factors.)
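For illustration only, a weighted formula of this kind might look like the sketch below. The characteristics chosen and the weights attached to them are assumptions for the example, not Gartner’s published formula.

```python
# Illustrative intrinsic-value score: a weighted sum of quality
# characteristics, each scored 0..1. Characteristics and weights are
# invented assumptions, not Gartner's published formula.

WEIGHTS = {
    "accuracy": 0.25,
    "completeness": 0.20,
    "timeliness": 0.20,
    "uniqueness": 0.20,
    "usability": 0.15,
}

def intrinsic_value(scores: dict[str, float]) -> float:
    """Weighted sum of quality scores; a missing characteristic scores 0."""
    return sum(w * scores.get(c, 0.0) for c, w in WEIGHTS.items())

score = intrinsic_value({"accuracy": 0.9, "completeness": 0.7,
                         "timeliness": 0.8, "uniqueness": 0.5,
                         "usability": 0.6})
print(f"intrinsic value: {score:.3f}")  # 0.715
```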

Arguably there should be a depreciation element to this calculation. Last year’s data is not worth as much as this year’s data, and the accuracy of last year’s data may not be so critical, but the data is still worth something.
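One simple way to model this, assuming exponential decay, is to apply a half-life to the score: older data is worth less, but never quite nothing. The half-life here is an invented parameter that would need calibrating per data domain (restaurant footfall decays in minutes; geological surveys barely at all).

```python
# Assumed depreciation model: exponential decay of a value score with
# age. The half-life is an invented parameter, to be calibrated per
# data domain; the score never quite reaches zero.

def depreciated_value(score: float, age_years: float,
                      half_life_years: float = 1.0) -> float:
    return score * 0.5 ** (age_years / half_life_years)

print(depreciated_value(0.715, age_years=0.0))  # this year's data: 0.715
print(depreciated_value(0.715, age_years=1.0))  # last year's data: 0.3575
```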

An intrinsic measure of this kind could be used to evaluate parcels of data at different points in the data-to-information process. For example, showing the increase in enrichment and usability from stage 1 to stage 2 and from stage 2 to stage 3, and therefore giving a measure of the added value produced by the data engineering team that does this for us (see the sketch after the list).

    1. Source systems
    2. Data Lake – cleansed, consolidated, enriched and accessible to people with SQL skills
    3. Data Visualization Tool – accessible to people without SQL skills
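As a rough illustration, the following sketch scores each stage on enrichment and usability and reports the increment between stages. The stage scores are invented; in practice they would come from a weighted quality assessment like the one above.

```python
# Sketch of measuring value added along the data-to-information
# pipeline: score each stage, then report the increment between
# stages. All stage scores are invented for illustration.

stages = [
    ("source systems",          {"enrichment": 0.2, "usability": 0.3}),
    ("data lake",               {"enrichment": 0.6, "usability": 0.5}),
    ("data visualization tool", {"enrichment": 0.7, "usability": 0.9}),
]

def stage_score(scores: dict[str, float]) -> float:
    return sum(scores.values()) / len(scores)  # equal weighting, for simplicity

previous = None
for name, scores in stages:
    s = stage_score(scores)
    delta = "" if previous is None else f"  (added value: {s - previous:+.2f})"
    print(f"{name}: {s:.2f}{delta}")
    previous = s
```

The deltas give a crude measure of the value added by each step, and hence by the team responsible for it.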

If any of my readers know of any useful formulas or methods for valuing data that I haven’t mentioned here, please drop a link in the comments.


Heather Pemberton Levy, Why and How to Value Your Information as an Asset (Gartner, 3 September 2015)

Bill Schmarzo, Determining the Economic Value of Data (Dell, 14 June 2016)

Wikipedia: Simpson’s Paradox, Value of Information

Related posts: Information Algebra (March 2008), Does Big Data Release Information Energy? (April 2014), Assemblage and Plottage (January 2020)