Achieving agility with data virtualization (1/2)

Agility is a key capability for enterprises, particularly in the current tough economic times. There is a lot of pressure, both external and from management, to respond quickly to changing conditions: the pace at which customers demand changes, new laws and regulations, and the ease with which competitors can copy a company's services all add to it.

Data is another important theme for many organizations. Indeed, one could argue that this has been true since the rise of information technology in the 1970s. However, if we look at the literature of the last few decades, there seems to have been a gradual shift away from a focus on (information) systems towards managing data as an asset in its own right.

This is a common scenario: many organizations have grown either through an expanded product portfolio with associated structures, or via a series of mergers and acquisitions, and have ended up with various silos. Numerous attempts have been made to integrate these systems: rip-and-replace (i.e. introducing a major ERP package), building various interconnected interfaces, installing an Enterprise Service Bus (ESB), and so on. While these efforts have met many (local) needs, the perception is that they have been slow and expensive. Requests for new functionality and for reports have been piling up, and a business case for an enterprise data warehouse (EDW) has been in the making for a long time: the idea of having a complete repository of all corporate data is appealing, but building it still seems like a daunting task. The following diagram gives a typical abstraction of the issue:

Application Landscape in ArchiMate

The diagram shows that there are many (logical) data flows between the various systems, which somehow seem to converge at the (planned) BI system. Some of these flows pass through the Master Data Management (MDM) hub, where some integration and standardization takes place. Moving data across the landscape through each of these flows takes time and is a potential source of errors. Indeed, there have been some perceived data quality (DQ) issues: definitions differ slightly between systems, derivation rules and key calculations (e.g. handling tax rates and discount schemes) differ and are hard-coded, text fields are misused, and so on. On top of all that, there is also a widespread idea that semi-structured and unstructured data, such as e-mail and documents in the repository, potentially holds a lot of value.
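To make the point about hard-coded derivation rules concrete, here is a minimal sketch of how two systems can disagree on the same order. Everything in it is hypothetical and chosen for illustration only: the system names, the 21% tax rate, the flat discount of 10.00 and the two function names do not come from any real landscape. One system applies the discount before tax, the other after.

```python
from decimal import Decimal

# Hypothetical values, hard-coded in each system as described above.
TAX_RATE = Decimal("0.21")
DISCOUNT = Decimal("10.00")
CENT = Decimal("0.01")


def billing_total(net: Decimal) -> Decimal:
    """Hypothetical billing system: subtract the discount, then add tax."""
    return ((net - DISCOUNT) * (1 + TAX_RATE)).quantize(CENT)


def crm_total(net: Decimal) -> Decimal:
    """Hypothetical CRM system: add tax, then subtract the discount."""
    return (net * (1 + TAX_RATE) - DISCOUNT).quantize(CENT)


if __name__ == "__main__":
    net = Decimal("100.00")
    # Same order, same inputs, two different "order totals".
    print("billing:", billing_total(net))
    print("crm:    ", crm_total(net))
    print("gap:    ", crm_total(net) - billing_total(net))
```

Neither rule is wrong in isolation, but any report that combines figures from both systems inherits the gap, and because the rules live in code rather than in a shared definition, the difference is hard to spot.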

This is a situation where data virtualization techniques can help. In the following posting we will show how that works.