14 years, 5 months ago

Updates Harmful

I have written about this before, I suspect. So forgive me if this is another representation of that resource.

Hanging out with Nigel Green and John Schlesinger is dangerous, so be warned. There be sacred cows being slaughtered in this post.

It all started innocently enough a year or so when Nigel and I were discussing the role of databases (record keeping) vs the role of databases (transaction scheduling, message management, etc.). Faulty normalization came into the picture too. Then the harm done by data modeling (where we thought we could model the rules in data). Large scale data modeling efforts requiring significant investment and it becoming very hard to see where the return comes from. Then an aha. If we go back a bit in time, the worst office job used to be “filing clerk.” An imaginary discussion when someone comes home from work, “Well dear, what did you do at work today?” “We opened a new file for Acme enterprises. That meant that our index cards were all messed up because we had run out of space for companies starting with A, so we had to rewrite some of those cards, but while we were there we saw that some cards pointed to files that no longer existed so we removed those – but only after we had added more space for the letter A companies (which we didn’t need right now anyway.)” “That’s nice dear, how about a nice cold beer?”

The point is that filing clerks used to be the lowliest members of the office – and yet in their electronic reincarnation they have acquired really expensive care and feeding. Of course the new clerks are the databases, the expensive care and feeding is manifested by having a group of thugs (DBAs) who hold everyone to ransom with their talk of normalization, SGA, proper keys,.. All things which we did pretty easily with clerks. So what’s going wrong? What is normalization for?

Taking normalization first – it is simply for ensuring that we don’t get update anomalies. That something that is to have the same value regardless of usage actually does have that value. You don’t have to have a normalized database to ensure that the update anomalies aren’t present. Although it is a bit easier.

What is going wrong, is in many ways a harder question. One fundamental thing going wrong is that we use the “filing cabinet” as a scratch pad. So returning to the physical world for a bit. Let’s imagine a filing cabinet in which we store the active accounts (perhaps bank accounts). When someone wishes to open an account, we give them a whole bunch of forms to fill in, they go off and fill them in and hand them back to us. We transcribe those forms and do some checkup on the data contained. Once we are happy with the data, we can now give the stuff to the filing clerk and have the clerk create the new file folder. So where were the forms and the checking? In some kind of “blotter” or case management pile on the clerk’s desk. They weren’t in the active accounts cabinets. And nor should they be.

No we go to a computerized system. We enter the data from the completed forms into the system and “poof” they create an active account. But actually it is more insidious than that. We go through a series of screens putting in different bits of the account – each leading us to a more perfect account, but we aren’t there yet. Eventually they will be in the active accounts database (but probably with an inactive flag) so that they can sometime be transacted. This is nuts. We are using the record keeping database (aka the filing cabinet) to manage work in process. This is not a proper separation of duties.

It gets worse. The company decides to “go online”. Expensive consultants are hired, golf outings are scheduled, expensive dinners eaten and the “new account workflow” is eventually unveiled. It, too is a sequence of steps. However, the poor schmuck filling this in has to complete each page of the form before moving on. That means that s/he cannot stop for a break – store a scratchpad version of this, do it out of sequence because they can’t remember their spouse’s Social Security number or whatever. The people in charge of the design of the system understand that THE SYSTEM needs accurate record keeping, have heard that “it is ALWAYS better to validate the data at the point of capture” and other platitudes, but forget that at the end of the line there is the poor user. For these kinds of data entry systems, (and a whole host of housekeeping systems) we need to store the “process state” separately. Don’t use the state of the key entity as a substitute for that. Store where I am in the account opening in the account opening process, not in the entity that represents the account.

So what got this diatribe really going? The notion that updates are unnatural – and probably harmful. I posit that the reason that we do updates is mostly because the common need for retrieval of something is the most recent version of it. So it makes sense to have access to the most recent version and update in place. But that isn’t always the most expedient behavior. Certainly the most recent value is often the value you need – especially in an operational system. However more and more systems really need the ability to look back. Even something as simple (looking) as you medical record is not something you want to update. Patient History is key. We don’t need to know the current cholesterol level (in isolation), we need its trend. So we don’t just update the “cholesterol value” in the patient record. We add a new item for the cholesterol and keep the history. We keep the record sorted in time sequence so we can see the latest. We don’t just overwrite the value. Our uses of data are so unpredictable, that simply updating database arbitrarily is going to give us data loss. We don’t know in advance how serious that data loss might be. Perhaps it would be better to assume that we will need everything and come up with a scheme that at some backbone level ensures that the current view can be reconstructed by replaying the operational events.