There are not many models that have enjoyed such a long life and wide acceptance as the Knowledge Pyramid. Also known as the DIKW pyramid, it features data as the basis and shows how information is built on it, then knowledge, and finally wisdom. Each layer refers to the lower one, using it but adding something more. It tells the story of how data is transformed into information and information into knowledge. And being a pyramid, it implies that the higher you go the better things get, that there is more value but less quantity. There are variations. In some, it is not actually shown as a pyramid; in others, wisdom is skipped; and in at least one popular version, enlightenment is put on top of wisdom or added in another way.
The model goes together with a set of conventions about the meaning of each concept and their relations. There is quite some variation in these definitions, but the logical sequence is rarely questioned. What I’ve found to be the most popular narrative in business is the following: Data are individual facts that need to be processed to generate information. When data are categorised, or interpreted, or put in context, or better, all of that, they turn into information. There is greater divergence on what knowledge is, but most sources seem to suggest that if what is done to data is done once again to information, you’ll get knowledge. It all sounds like a recipe for a delicious cake. What’s not to like?
Well, just about everything.
There are authors that refer to data as symbols or signals, but these interpretations have much less currency compared to data as facts.
In the context of the model, the definitions of data either explicitly assert or imply that data are devoid of meaning, that they exist out there, and that it’s too early to speak of information at this stage; in fact, that doing so would be a fundamental error. And knowledge, knowledge is a long way up.
Let’s check them one by one.
1. Data exist out there.
If I see a footstep in the sand – whatever that is – data or information, it is what it is for me, and won’t be the same for an insect climbing a “hill” made by the footstep.
2. Data are devoid of meaning.
It’s worth distinguishing between the “data as facts” and “data as symbols” definitions. The first one is especially problematic. A fact is a statement about something. And statements usually have meaning. “Data as facts” often goes along with data being defined as “basic individual items of numeric or other information”. This is another contradiction, as statements need a subject, predicate, and object, and not just one of them.
“Data as symbols” does not imply that data are devoid of meaning, as it does not go along with the belief that “data exist out there”, independent of the observer. The same datum may have meaning for some but not for others. And the meaning will be different for different observers. This view is quite fine, but I would suggest that whatever makes something a symbol for somebody may symbolise different things depending on the interaction, and as such it does not have meaning by itself. The meaning is made by the observer in the process of interaction, which, by the way, does not include only perception. But the important point here is that the same physical thing for the same observer might be seen as two different symbols in two different interactions.
Some go a step further and define data not as facts but as “objective facts”, thus explicitly linking (1) and (2). But isn’t that just another contradiction? Facts are statements and, as such, they are stated by somebody. And even if we indulge in the etymology quest, hopefully not literally seeking the “true sense” of the words but just insights, then fact, coming from the Latin verb facere, can be interpreted not only as “things done”, but also as “things made (up)”.
3. Knowledge comes way after data.
Let’s go back and imagine that what I see is not a footstep but just a dent, and I don’t know what it is exactly. But I can tell there is a dent. I know the difference between a smooth surface and a surface with a dent. Which means there is knowledge already. There is knowledge by virtue of my being able to distinguish whether there is a dent or not.
Information is said to be data in context. Whenever there are structure and context, data is transformed into information.
This notion of information “inherits” the problems of data, thus excluding the informed and their capability of being such. But there is another problem here – accepting the addition of structure and context as sufficient. Once they are added to data, there is information. But is that the case?
If I don’t know your birth date and you tell it to me and I understand what you tell me, this is information. If you tell me the same thing five minutes later, even if it has the same structure and context, as I already know it, it’s no longer information for me. If later on, I forget, and you tell me the same thing, then it will be information for me once again. As Luhmann put it “a piece of information that is repeated is no longer information. It retains its meaning in the repetition but loses its value as information”.
To inform means to let somebody know. There are two things to notice here. The transformation of the verb to inform into the noun information changes the nature of informing as the act of bringing knowledge. First, it demands a new kind of distinction between information and knowledge, and this is how we end up with the DIKW layering. And second, the act of informing changes the state of awareness of the one who is being informed. Looking at information as informing would be a useful reminder that it is about an event. And making the same announcement to the receiver again will not change the already changed state of awareness. A more formal expression of this can be found in the law of calling from Spencer-Brown’s Laws of Form: “The value of a call made again is the value of the call”, as discussed in another blog post.
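The birth-date example can be put into a few lines of Python. This is only a toy sketch: the `Receiver` class, its method names, and the sample date are all made up for illustration, and reducing "knowing" to a dictionary lookup is exactly the kind of simplification this article is wary of. Still, it shows the one point that matters here: whether a message informs depends on the state of the receiver, not on the message.

```python
class Receiver:
    """A hypothetical receiver whose 'state of awareness' is a plain dictionary."""

    def __init__(self):
        self.known = {}

    def tell(self, key, value):
        """Return True only if the message changed what the receiver knows."""
        if self.known.get(key) == value:
            return False  # same message again: meaning kept, no informing event
        self.known[key] = value
        return True

    def forget(self, key):
        """Drop a key, so that the same message can inform again later."""
        self.known.pop(key, None)


me = Receiver()
first = me.tell("birth date", "12 March")   # new to the receiver
second = me.tell("birth date", "12 March")  # repeated five minutes later
me.forget("birth date")
third = me.tell("birth date", "12 March")   # told again after forgetting

print(first, second, third)
```

The message is identical in all three calls, with the same structure and context. Only the first and the third change the receiver.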
The idea of information as bringing new knowledge would probably evoke associations with the popular interpretation of Claude Shannon’s notion of information as the “average amount of surprise”. However, the “amount of surprise” or the “information entropy” refers to a message. If, nevertheless, Shannon’s information theory is considered a good source for a definition of information, one should carefully check its relevance outside the domain of telecommunications.
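To make "average amount of surprise" concrete, here is a minimal sketch of the standard entropy formula, H = Σ p · log2(1/p), in Python. The function names are mine, and estimating the probabilities from symbol frequencies within a single string is a simplifying assumption made only to keep the example short.

```python
import math
from collections import Counter


def surprisal(p: float) -> float:
    """Surprise, in bits, of an outcome that has probability p."""
    return math.log2(1 / p)


def entropy(message: str) -> float:
    """Average surprisal per symbol, with probabilities taken from symbol frequencies."""
    counts = Counter(message)
    total = len(message)
    return sum((n / total) * surprisal(n / total) for n in counts.values())


print(entropy("aaaa"))  # 0.0 bits: one symbol only, no surprise
print(entropy("abab"))  # 1.0 bit: two equally likely symbols
print(entropy("abcd"))  # 2.0 bits: four equally likely symbols
```

Note what the function takes as input: only the message. Nothing in the calculation refers to who receives it or what they already know, which is why the measure should be used with care outside its original domain.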
Seeing information as the act of bringing knowledge is very much in line with one of the original meanings, “the act of communicating knowledge to another person”, the other one being “the action of giving a form to something material”. In fact, the history of the concept of information is very interesting, so if you are curious, check this article by Rafael Capurro.
Usually, when we discuss such issues, sooner or later I’m asked to suggest a better definition since I’m not happy with the popular ones. I don’t value the efforts spent on definitions. And it is not about being right or wrong. But then again some definitions are more useful than others, and some are particularly harmful. And the latter is the main motivation to write this article. I’ll try to explain that later.
Now, for those who won’t accept “no” as an answer, if I have to suggest a useful definition of information, I would either pick the one of Bateson, or – if a longer one is allowed – I’d use the following suggestion of Evan Thompson built on Bateson and Oyama:
“Information, dynamically conceived, is the making of a difference that makes a difference for some-body somewhere” (italics in the original).
Each part of this definition, “dynamically conceived”, “making of”, “difference that makes a difference”, “some-body” and “somewhere”, deserves elaboration, but this would go beyond the objectives of this text.
I know there is little chance of such a definition being accepted by business, and later I’ll explain why. Now let’s quickly review knowledge and move on. I won’t bother discussing wisdom, as long as I’m keeping the text as prose.
The definitions of knowledge vary greatly. This makes it difficult to address their common properties. A prominent and a positive one is that people finally enter into the picture. On the other hand, almost without exception, knowledge is defined in reference to information. The availability of information is understood as the necessary but insufficient condition for knowledge. It’s rare that knowledge is seen dynamically as knowing, or even rarer as sense-making. This would have helped to realise that the place of knowledge in the pyramid is misleading, that knowledge cannot be embedded in documents, and that the popular split of knowledge into “implicit” and “explicit” neither makes sense, as knowledge can never be the one or the other, nor has any utility beyond providing conceptual support to consulting and software products and services. The split implicit/explicit is frequently used in Knowledge Management (KM) narratives. KM itself is a misnomer. Knowledge cannot be managed, although historically there have been attempts by interrogators to manage the retrieval process with varying degrees of success. If Knowledge Management is preaching the split, then it should either keep “implicit” or “management” but not both.
Where then to look for a good definition of knowledge?
One natural choice would be traditional epistemology, according to which there are three kinds: practical knowledge, knowledge by acquaintance, and propositional knowledge (for more details see Bernecker here, and also here). Practical knowledge refers to skills. That’s not too far from what in business is known as know-how. Knowledge by acquaintance is direct recognition of external physical objects, organisms, or phenomena. Propositional knowledge usually comes in the form of knowing-that. Propositional knowledge has entered some organisations through ICT, in cases where reasoning algorithms are applied. However, that would also clash with the DIKW model, as non-inferential knowledge would be indistinguishable from information.
Apart from epistemology, if philosophy is regarded as an eligible source, another option would be phenomenology, or finding something in between, which would be my preference.
Then, of course, there shouldn’t be a better place to look for a definition of knowledge than cognitive science. But this is not likely to be a fruitful quest. First, there is no common understanding in cognitive science of what knowledge is. And second, whatever communication is part of cognitive science is there only by virtue of referring and being referred to within science. If it can provoke some communication in business, such communication would only be part of the business if it refers to and is referred to by other business communication. It will inevitably be a misunderstanding and, only if lucky, a productive one.
In summary, the DIKW pyramid is problematic in three ways: it shows data-information-knowledge as a logical sequence, which is especially misleading when it comes to knowledge; it ignores people at the “level” of data and information; and it defines data as facts, and information as facts in context.
But the DIKW pyramid is just a model. And, as zillions of presentations keep reminding us, all models are wrong but some are useful. I have tried to explain why this model is particularly wrong. Now it’s time to say a few words about why it is not useful and often quite harmful.
One of the problems of many organisations is information management. It is understood as a complex problem. And for dealing with complex problems, every convincing method of categorisation or another way of simplification is more than welcome. Enter the DIKW pyramid, normally as part of a bigger package. Now things look a bit more manageable as there are smaller chunks to deal with. First, after some exercises to get a common understanding of the problem, comes the split of responsibilities. The most extreme version of it applies the DIKW pyramid literally. One department gets the mandate to deal with data, another with information, and a third one with knowledge. The results are disastrous and can live for a long time without appearing as such.
Then each “discipline” works out its own understanding, its own problems and solutions. For data, there are data-related problems and data-related solutions. Very often the solution is to buy some kind of software. If the main data-related problem is with Master Data Management (MDM), then the solution is MDM software. For information-related problems, the solution could be some Business Intelligence package, while most of the knowledge management problems are believed to be solved by implementing good collaboration software.
But the model is not always harmful or without utility.
Some organisations have the DIKW in well-respected reports, but they actually ignore it. It stays as an elegant theory, while the actual decisions are made without using this frame. That’s one case when it is not harmful.
Some interpretations of the data-information-knowledge distinctions and sequence, in some contexts, can even be useful. For example, in the area of Linked Data, there is some utility in seeing URIs and literals as data, and seeing them as information when they are combined in triples. Then, when a SPARQL query gets a good answer, that can be said to bring new knowledge.
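As a rough sketch of that Linked Data reading, the snippet below uses plain Python rather than a real triple store. Everything in it is hypothetical: the `http://example.org/` URIs, the names, and the `match` function, which handles a single triple pattern and is only a stand-in for what a SPARQL engine would do.

```python
# "Data": URIs and literals on their own (all hypothetical).
EX = "http://example.org/"

# "Information": the same terms combined into subject-predicate-object triples.
triples = {
    (EX + "alice", EX + "worksFor", EX + "acme"),
    (EX + "bob", EX + "worksFor", EX + "acme"),
    (EX + "acme", EX + "locatedIn", "Sofia"),
}


def match(pattern, store):
    """Return variable bindings for one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in store:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # the same variable cannot take two different values
                binding[term] = value
            elif term != value:
                break  # a constant in the pattern does not match this triple
        else:
            results.append(binding)
    return results


# The answer to a query: what the text above calls "bringing new knowledge".
answers = match(("?who", EX + "worksFor", EX + "acme"), triples)
who = sorted(b["?who"] for b in answers)
print(who)
```

Even here the layering is only a convenience. Whether the answer brings anything new still depends on who is asking.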
And yet, the small utility in some domains does not justify the much bigger risks of using the model to support business decisions. So, before I let you go, in case your organisation is not worshiping the DIKW pyramid or has started to doubt it, I would suggest:
- If possible, don’t use any layers and keep everything in one “discipline”. How about “Information management”?
- If that’s not possible, then distinguishing data and information can have some utility, but separate data management would always bring worse information management, and the latter is what the business cares about.
- Allow definitions to change depending on the context, as this is how the language works anyway.