Achieving agility with data virtualization (2/2)

This posting is the follow up from a previous post where we described the need for agility as well as a setting where we believe that data virtualization techniques can help.

Following the definition of data virtualization of Rick van der Lans, we see data virtualization as a group of technologies that makes a heterogeneous set of databases and files look like one integrated database, which has some commonality with how many people see the concept of a federated database. As we will see shortly, though, data virtualization picks up where “traditional” data federation stops and provides organizations with a rich set of techniques for data integration issues:

Starting at the bottom, we see a series of source systems (or at least: the data part of it). The data structures are replicated and wrapped in the data virtualization server. The idea is that the virtualization software discovers the data structures in the source systems to make them available as virtual table structures. This achieves the notion of federation as mentioned earlier. If desired, the actual content of the source systems may be (partially) cached: this has the advantage that queries can be handled mostly in the virtualization environment to prevent huge workloads for the source systems.

Based on this virtualized ‘foundation’ layer, it is fairly straight forward to build new layers of virtual tables on top. This allows for building data structures that are close to the needs of end-users (i.e. star schema’s). It also allows for easy integration, application of transformation and integration rules and so on. In practice we increasingly see virtualized data warehouses, master data management hubs, etcetera.

One aspect of agility should be obvious from this discussion: development and data integration within the virtualized environment can be considerably more agile than in a traditional setting. Requirements and specification (e.g. meta-data management) could still be used, but rather than a long build and deploy time, we now have results available immediately in a virtual table structure. As a result, it is easy to learn-while-doing in quick and highly interactive cycles with end users: quick sprints will deliver a working prototype and later adjustments can easily be made without having wasted many valuable development hours.

This also demonstrates the fact that such a system itself is also considered to be agile:

It will be fairly easy – and most of all: fast – to adapt to ever changing business needs for information.
Deploying changes to a virtualized data model is easier than changing the data structure of a physical database, which can entail all kinds of data conversion issues.
Dealing with the impact of changes is easier, since no software is adapted, lowering the risk of disruptions and keeping the impact localized.
Integration of the solution is simple, since existing interfaces remain stable.

Built-in features around security, auditing, logging, and monitoring (i.e., when things change in the source systems) provide the organization with the means to stay in control of their data. In short:

Data virtualization decouples access to data from the source systems. This allows further manipulation of data without impacting the original systems.
Virtualized access to structured and unstructured data allows for uniform querying. Caching avoids heavy work-loads on the original transaction systems.
Data access can be optimized for various stakeholders with different needs, concept definitions, permissions etc.
Virtualization allows for rapid, incremental development & delivery of information with minimal impact on source systems.

This mechanism can be considered a key resource for agility that supports key activities in the organization. A virtual data warehouse with rapid / agile development of new data structures will make it easier to accommodate management that increasingly seeks data-based, rationalized decision making to complement with creative strategic skill. Suppose, for example, that there is a feeling that international markets can be conquered with a cross selling strategy: offer one product at a discount to generate interest for high-end services that will generate revenue. Running the numbers based on historic sales in countries where the organization was already active must be swift. Even more, when executing this strategy, the system should be flexible enough to easily monitor actual investments and revenues in near real-time.

The other obvious need for system agility in the field of data lies with compliance and regulations. Many industries are heavily regulated, for example in finance or healthcare, and rules for compliance reporting change all the time. In and by itself this need not be an issue. However, we often see that concept definitions change slightly, derivations and key calculations become more complex, other types of information are required, and so on. Here also, the rapid development cycles and flexibility.

If you have any questions or suggestions: either drop a note or get in touch via E-mail!