More detail on EA metamodel

Link: http://weblog.tetradian.com/2011/09/01/more-detail-on-ea-metamodel/

Moving on to more detail on that EA metamodel.

(By the way, a quick thank-you to Nic Plum and Sally Bean for really helpful peer-reviews on this.)

The legal bit: There’s a heck of a lot of unpaid work that’s gone into this, and also a lot of my own ‘prior art’ on these themes, dating back to at least September 2008, with more detailed specification dating from at least mid-2009. Although it’d be nice if someone actually paid me for some of the work that’s gone into this, it really needs to be something that’s shared in the most open way possible, such as via open-source, so consider this for now to be published under a Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA) license. I don’t really want to see any restrictions on it at all, but unfortunately we do need some kind of protection here: it’s definitely not okay for some commercial organisation to lift it, put a couple of minor tweaks on it, pretend that the whole thing is and always has been their own private ‘intellectual property’, and then demand money from everyone else for the privilege – because we’ve seen way too much of that already, thankyouverymuch. Sigh…

What follows is deliberately broad and abstract. It’s missing quite a few implementation-details, in part because I’ve probably missed a few key items, but even more because some is still a bit hazy and needs proper review by folks who really know what they’re doing with implementable metamodels. As all too usual, it’s also long: my apologies… Oh well!

Core concepts

Right at the root of the metamodel is a single object, with just two attributes:

globally-unique identifier
‘instantiable’ boolean (if true, is ‘type’; if false, is ‘instance’)

We also create a special-purpose variant of this:

collection – a container for objects

And, from that, another special-purpose variant:

tag – an object that contains a collection of attributes

From this, we create our three core specialisations:

entity – a ‘thing’, which may also contain collections or tags
relation – an entity that may link between two tags
model – an entity that incorporates a collection of allowable relation-types and (references to) entity/relation instances, together with any required validation-rules

Fundamentally, everything is based on the same root entity-type. This commonality enables us to have a single standard repository-structure and exchange-format that can be used for every possible notation, re-used across all model-types and toolsets.
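To make the hierarchy above a little more concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not a specification: the class names, the use of a UUID string as the globally-unique identifier, the reading of ‘tag’ as a variant of ‘collection’, and the plain dictionaries used for attributes are all choices made for this example only.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


def new_id() -> str:
    # A UUID4 string stands in for the 'globally-unique identifier' (assumption).
    return str(uuid.uuid4())


@dataclass
class RootObject:
    """The root object: an identifier plus the 'instantiable' flag."""
    instantiable: bool = False  # True means 'type', False means 'instance'
    uid: str = field(default_factory=new_id)


@dataclass
class Collection(RootObject):
    """A container for objects."""
    items: list[RootObject] = field(default_factory=list)


@dataclass
class Tag(Collection):
    """A collection variant that carries attributes (plain name/value pairs here)."""
    name: str = ""
    attributes: dict[str, Any] = field(default_factory=dict)


@dataclass
class Entity(RootObject):
    """A 'thing', which may contain tags and collections."""
    name: str = ""
    tags: dict[str, Tag] = field(default_factory=dict)
    collections: dict[str, Collection] = field(default_factory=dict)


@dataclass
class Relation(Entity):
    """An entity that may link between two tags."""
    source: Optional[Tag] = None
    target: Optional[Tag] = None


@dataclass
class Model(Entity):
    """An entity holding allowable relation-types and references to instances."""
    relation_types: list[Relation] = field(default_factory=list)
    members: list[Entity] = field(default_factory=list)


app = Entity(name="Billing application")
server = Entity(name="Server")
for entity in (app, server):
    entity.tags["default"] = Tag(name="default")  # default link target

link = Relation(name="isAssociatedWith",
                source=app.tags["default"], target=server.tags["default"])
diagram = Model(name="Deployment sketch", members=[app, server, link])
```

Note that the relation points at tags rather than at the entities themselves, and that the relation is itself an entity, so it could carry tags of its own.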

Collections

A collection is just a container. It can contain any types of objects, including tags and other collections. A collection is always embedded within another object, which provides its identity and supports any additional attributes (via tags) that it may need.

Example: every object (certainly every entity) will need a RACI collection to identify object-owners and people or roles responsible for or affected by the real-world item represented by the model-object type or instance.

Tags

A tag is a container for attributes. At the simplest level, an attribute is a name/value pair, but it might be better to implement it with somewhat more content, to include a distinct non-editable unique-identifier, an editable name, a value-type (simple, MIME-type or other), validation-rules and the attribute-content itself.
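As a rough illustration of that richer attribute, the sketch below gives each attribute a fixed identifier, an editable name, a value-type, validation-rules and content. The field names, the UUID identifier and the representation of rules as Python callables are assumptions for the example; a real implementation would probably express rules in a configuration-language instead.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Attribute:
    name: str                    # editable name
    value: Any                   # the attribute-content itself
    value_type: str = "simple"   # "simple", or a MIME-type such as "image/svg+xml"
    rules: list[Callable[[Any], bool]] = field(default_factory=list)
    # Assigned once at creation; nothing in this sketch enforces non-editability.
    uid: str = field(default_factory=lambda: str(uuid.uuid4()), init=False)

    def is_valid(self) -> bool:
        """True when the value passes every validation-rule."""
        return all(rule(self.value) for rule in self.rules)


@dataclass
class Tag:
    name: str
    # Keyed by attribute uid, so renaming an attribute never loses it.
    attributes: dict[str, Attribute] = field(default_factory=dict)

    def add(self, attribute: Attribute) -> Attribute:
        self.attributes[attribute.uid] = attribute
        return attribute


definition = Tag(name="definition")
text = definition.add(
    Attribute("text", "A device that hosts applications", rules=[lambda v: bool(v)])
)
text.name = "summary"  # the name changes, the uid stays the same
```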

A tag is also optionally a target for relations (links between objects). Every object will need to be able to support at least a default ‘isAssociatedWith’ link to arbitrary other objects; the target-point for this would be a default tag embedded in every object.

More explanation later of how the tag-system would work in practice, and how it could be implemented.

Versioning and ‘inheritance’

Every object would need to support in-depth versioning. As I see it, the simplest way to envision this is that every object is in effect its own wiki-page. The surface view of the object represents the current state of all the attributes and suchlike of that object, but it accretes a history of changes over time, and the entire history is always still available if required.

A ‘snapshot’ of ‘current state’ or suchlike consists simply of attaching a tag to one or more objects that signals the option to revert (usually temporarily) to the state that applied at that time.
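The wiki-page idea and the snapshot tag might be sketched roughly as follows. This assumes an append-only history that stores a full copy of the state per change (a real repository would more likely store deltas), and the names `VersionedObject`, `tag_snapshot` and `state_at` are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class VersionedObject:
    # history[0] is the empty starting state; each later entry is a full copy
    # of the attributes after one change (full copies are a simplification).
    history: list[dict[str, Any]] = field(default_factory=lambda: [{}])
    # Snapshot tags: label -> index into history.
    snapshots: dict[str, int] = field(default_factory=dict)

    @property
    def current(self) -> dict[str, Any]:
        """The 'surface view': the latest state of all attributes."""
        return dict(self.history[-1])

    def update(self, **changes: Any) -> None:
        """Record a change by appending a new state; nothing is overwritten."""
        self.history.append({**self.history[-1], **changes})

    def tag_snapshot(self, label: str) -> None:
        """Attach a snapshot tag that points at the current state."""
        self.snapshots[label] = len(self.history) - 1

    def state_at(self, label: str) -> dict[str, Any]:
        """Return the state that applied when the snapshot tag was attached."""
        return dict(self.history[self.snapshots[label]])


server = VersionedObject()
server.update(name="Server", owner="Ops")
server.tag_snapshot("2011-Q3")
server.update(owner="Platform team")
```

Here `server.current` reports the new owner, whilst `server.state_at("2011-Q3")` still returns the state with the original owner, and the full history remains available.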

Since there is only one root-object, with everything else defined by add-on tags, ‘multiple-inheritance’ consists simply of importing into a single ‘child’-object the tags that represent the state (attributes, relation-anchors etc) of the respective ‘parent’-objects. This may be done selectively, to enable partial-multiple-inheritance. De-inheritance can be done in much the same way, by removing or disabling selected tags within a child-object.

(One point not yet resolved is whether a ‘child’ incorporates an entire copy of its ‘parent’(s), or whether it simply points to them and builds its own versioning thereafter. In terms of data-storage, the latter is probably preferable, but does imply certain risks. However, it’s not critical at this preliminary design-stage, and decisions on this can probably be left until initial implementation.)
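A minimal sketch of tag-based ‘inheritance’ and ‘de-inheritance’ could look like this. It takes the copy option from the unresolved point above purely for simplicity, and the names `Obj`, `inherit` and `disinherit`, along with the example tags, are hypothetical. If two parents carried a tag with the same name, the later import would overwrite the earlier one here; a real design would need an explicit rule for that.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Any, Iterable, Optional


@dataclass
class Obj:
    name: str
    tags: dict[str, dict[str, Any]] = field(default_factory=dict)


def inherit(child: Obj, parent: Obj, only: Optional[Iterable[str]] = None) -> None:
    """Copy the parent's tags into the child; 'only' selects a subset."""
    wanted = None if only is None else set(only)
    for tag_name, attributes in parent.tags.items():
        if wanted is None or tag_name in wanted:
            # Deep copy, so later edits to the child never touch the parent.
            child.tags[tag_name] = copy.deepcopy(attributes)


def disinherit(child: Obj, tag_names: Iterable[str]) -> None:
    """Remove selected tags from the child ('de-inheritance')."""
    for tag_name in tag_names:
        child.tags.pop(tag_name, None)


device = Obj("Device", {"hosts-software": {"max": 10}, "has-location": {"site": "HQ"}})
actor = Obj("Actor", {"performs-task": {}, "has-contact": {"email": None}})

kiosk = Obj("Kiosk")
inherit(kiosk, device)                         # everything from Device
inherit(kiosk, actor, only=["performs-task"])  # partial inheritance from Actor
disinherit(kiosk, ["has-location"])            # drop one inherited tag
kiosk.tags["hosts-software"]["max"] = 1        # local change, parent unaffected
```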

Entity

An entity is an object that incorporates (‘contains’) tags that define and maintain its attributes.

Attributes supported by an entity (i.e. via tags) may include any number of media-items of any appropriate type, including audio, video, images, URLs etc.

An entity will support one or more presentation-types. The default presentation-type is a summary of content in text-form (i.e. ‘wiki-page’), plus, for graphic display, a simple rectangle enclosing the entity-name. Other presentation-types may include images or bit-maps, SVG or other scalable/vector images (perhaps via a Visio-style layered-spreadsheet or equivalent). A model-type may specify a default presentation-type via an appropriate tag associated with the model-type (i.e. the presence of the tag on entities would suggest or enforce a specific presentation for that entity), such that the same entity would appear in different visual forms on BPMN or UML diagrams or concept-maps.

Relation

A relation is a specific type of entity that can connect to tags on other entities.

At present, I assume that relations can connect only between two entities (or, where the relation itself has embedded link-tags, between entities and/or other relations). A ‘one-to-many’ relation is actually a set of nominally-identical relations, but may be displayed in ‘branched’ form (as often used in concept-maps, for example) if the respective model-type supports that display-format.

Because relations connect to tags rather than to specific entity-types, the number of relation-types expands almost linearly with the complexity of the underlying metamodel, rather than exponentially or near-factorially as in conventional relation-to-entity metamodels. In general, sub-types ‘inherit’ the tag-set from their parent entity-type(s), hence no new relation-types need be created for each new sub-type.
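The point about sub-types can be shown with a small sketch. It assumes each relation-type names one tag it needs at each end, which is a simplification for the example; the names `RelationType`, `can_connect` and the tags themselves are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class RelationType:
    name: str
    source_tag: str  # tag the source entity must carry
    target_tag: str  # tag the target entity must carry


@dataclass
class Entity:
    name: str
    tags: set[str] = field(default_factory=set)


def can_connect(rel: RelationType, source: Entity, target: Entity) -> bool:
    """A relation-type only looks at tags, never at the entity-type itself."""
    return rel.source_tag in source.tags and rel.target_tag in target.tags


runs_on = RelationType("runsOn", source_tag="deployable", target_tag="host")

app = Entity("Application", {"deployable"})
server = Entity("Server", {"host"})
# A sub-type inherits its parent's tag-set and adds one of its own.
blade = Entity("Blade server", server.tags | {"rack-mounted"})
```

The same `runs_on` relation-type works for `blade` without any change, because the new sub-type still carries the `host` tag. Adding further sub-types adds entities, not relation-types.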

Relation-types are typically associated with a notation, which in turn is typically associated with one or more specific model-types.

As for entities, each relation will support one or more presentation-types. The most common default graphic presentation-type for relations is a one-dimensional line, with optional arrowheads. Line-routing, collision-avoidance and the like are an aspect of user-interface and graphic display rather than a function of the relation itself: the only explicit function of the relation is to indicate that two entities are related in some way.

For some purposes, and in some user-interface contexts, creating a relation that is not anchored at one end will cause a matching null-entity to be created, according to the tag(s) associated with that relation and the applicable rules of the model-type currently in use. (See the Archi ‘Magic Connector’ for an example of this type of user-interface functionality.)

One important optional tag for relations (e.g. ‘isFlow’) indicates that the relation also represents a ‘flow’ – a transaction or other exchange between entities. This is typically used to model simulations. The directionality, content and other aspects of the flow would be indicated by other tags, as mediated by the respective model-type within which the simulation is executed.

Model-type and model

A model (graphic, text, simulation etc) is an instance of a model-type.

A model-type is an entity that maintains a collection of tags and relation-types that define the allowable content. Note that, via this mechanism, a model-type does not need to specify the allowable entity-types – hence enabling extension of metamodels without requiring alterations to the list of relations or to the model-type.

A model-instance maintains a collection of references to entity- and relation-instances, together with any context-specific information required for graphic displays. (In some cases a model-type might also include templates for default entity- and/or relation-instances, such as for page-headers and other displayable model-identifiers.)

By default, any instances created within the model-type would be assigned all of the respective tags for that model-type or notation (i.e. initially constrained to that notation and usage). For example, a BPMN model-type would create only entities that are tag-compatible with the BPMN standard, and that can be displayed in accordance with the BPMN notation. However, since ultimately these are all still just entities with tags, and further tags can be added if required, potentially any of these entities may be re-used in any way in any other compatible notation. For example, we could re-use a BPMN Event in an Archimate model, representing the same Event in a different way for different modelling-purposes; a BPMN Swimlane entity might be re-used in Archimate as a Device, an Application or an Actor, with all of its BPMN relations carried through transparently to the Archimate model.

Full validation of models and entity-relationships is a function of the model-type, not the metamodel. This makes it possible to relax formal rigour during development or for certain types of simulation.
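One way to picture validation living in the model-type, with the option to relax it, is sketched below. The two rules used here (an entity needs at least one tag the model-type knows, and a relation needs an allowed relation-type) are assumptions chosen to keep the example short, as are the `ModelType` fields and the `strict` switch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ModelType:
    name: str
    tags: set[str]            # tags this model-type understands
    relation_types: set[str]  # relation-types it allows

    def problems(self, entities: list[dict], relations: list[dict]) -> list[str]:
        """List every rule breach; an empty list means the model is valid."""
        found = []
        for entity in entities:
            if not self.tags & entity["tags"]:
                found.append(f"{entity['name']}: no tag known to {self.name}")
        for relation in relations:
            if relation["type"] not in self.relation_types:
                found.append(f"{relation['type']}: relation-type not allowed")
        return found

    def accepts(self, entities: list[dict], relations: list[dict],
                strict: bool = True) -> bool:
        """With strict=False any breaches are tolerated (relaxed rigour)."""
        return not strict or not self.problems(entities, relations)


bpmn = ModelType("BPMN", tags={"bpmn:event", "bpmn:task"},
                 relation_types={"sequenceFlow"})
entities = [
    {"name": "Order received", "tags": {"bpmn:event"}},
    {"name": "Sticky note", "tags": {"draft"}},
]
relations = [{"type": "sequenceFlow"}, {"type": "vaguelyRelatedTo"}]
```

In strict mode this model is rejected, with two problems listed; with `strict=False` the same content is accepted, which is the kind of relaxation that would suit early development work.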

Crucially, a toolset and model-type must preserve any entity-tags and/or relations that it does not use or display. This is required to enable ‘round-tripping’ between model-types and toolsets. Any toolset that does not support this will be able to import existing entities and relations, or export new entities and relations, but will not be able to ‘round-trip’ amended entities and relations.
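The preservation rule can be sketched as follows, assuming entities are exchanged as plain dictionaries of tags. The tool, its `KNOWN_TAGS` set and the `edit_in_tool` function are hypothetical; the point is only that tags the tool does not understand pass through unchanged.

```python
import copy

# Tags this hypothetical tool understands; everything else is opaque to it.
KNOWN_TAGS = {"bpmn"}


def edit_in_tool(entity: dict, changes: dict) -> dict:
    """Apply edits to known tags and carry all other tags through untouched."""
    result = copy.deepcopy(entity)  # starts with every tag, known or not
    for tag_name, attributes in changes.items():
        if tag_name not in KNOWN_TAGS:
            raise ValueError(f"this tool cannot edit tag {tag_name!r}")
        result["tags"].setdefault(tag_name, {}).update(attributes)
    return result


imported = {
    "name": "Order received",
    "tags": {
        "bpmn": {"kind": "startEvent"},
        "archimate": {"kind": "BusinessEvent"},  # unknown to this tool
    },
}
exported = edit_in_tool(imported, {"bpmn": {"kind": "intermediateEvent"}})
```

A tool that instead rebuilt the entity from only the tags it knows would silently drop the `archimate` tag, and the amended entity could no longer be merged back into an Archimate model.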

Model-types and models are often described in terms of views or viewpoints. For this purpose, a viewpoint or view is simply another standard entity, which is then associated with the model-type.

Models would typically display their results, in whatever form required, by ‘calling’ the appropriate presentation-type for each entity and relation in its scope (i.e. referenced within its instance-collection).

Non-semantic entities

Many types of models require or would support a variety of ‘non-semantic’ entities – in other words entities which do not in themselves add to the semantics of a model. Typical examples include:

annotation – an arbitrary explanatory note attached to an object or relation
group – a ‘box’ or other container to cluster a group of entities together for ease of reading
model-caption

An annotation would typically be implemented as an instance of a fairly low-level entity (i.e. one with few tags), optionally associated only with the model or model-view, but more usually with a connected relation that may be linked to any other entity or relation. (In other words, similar to an ‘annotation’ entity in Visio or Powerpoint.)

A group may be displayed just as a box, but in fact it has an implicit relation with all of the ‘contained’ entities. (See the Archi ‘Automatic Relationship Management System’ for an example of how this would work in practice.)

A model-caption is a container for information about a model or model-view (such as used in several of the Visio templates, and in most types of controlled-diagrams). In effect, this is an entity that is attached to the model-instance, rather than actual content for the model, but would be displayed in the normal way.

Glossary and thesaurus

The total collection of entities and relations within the repository forms the content ‘holograph’ for the respective scope.

A glossary and thesaurus represent different views into that ‘holograph’, in part to answer the questions ‘Tell me about yourself?’ (glossary) and ‘Tell me what you’re associated with?’ (thesaurus).

The glossary for the context, or for any selected subset of the context, consists of a live report derived from the ‘definition’ tags of all entities in scope.

The thesaurus consists of a report, usually starting from a single entity, of all thesaurus-type relations from that entity. These would typically include standard relations such as synonym, antonym, broader-term, narrower-term, conflicts-with and so on. (In some cases these relations between entities may be automatically generated.)
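Both reports are simple queries over the repository. The sketch below assumes entities are dictionaries with a `definition` tag and relations are (source, type, target) triples; those shapes, and the exact relation-type names, are assumptions for the example.

```python
from __future__ import annotations

# Relation-types treated as thesaurus links (names are illustrative).
THESAURUS_TYPES = {"synonym", "antonym", "broader-term",
                   "narrower-term", "conflicts-with"}


def glossary(entities: list[dict]) -> dict[str, str]:
    """'Tell me about yourself': name -> definition, for entities that have one."""
    return {
        e["name"]: e["tags"]["definition"]
        for e in sorted(entities, key=lambda e: e["name"])
        if "definition" in e["tags"]
    }


def thesaurus(start: str,
              relations: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """'Tell me what you're associated with': thesaurus links leaving 'start'."""
    report: dict[str, list[str]] = {}
    for source, rel_type, target in relations:
        if source == start and rel_type in THESAURUS_TYPES:
            report.setdefault(rel_type, []).append(target)
    return report


entities = [
    {"name": "Server", "tags": {"definition": "A device that hosts applications."}},
    {"name": "Host", "tags": {}},
    {"name": "Device", "tags": {"definition": "A physical piece of equipment."}},
]
relations = [
    ("Server", "synonym", "Host"),
    ("Server", "broader-term", "Device"),
    ("Server", "runsOn", "Rack"),  # not a thesaurus-type relation, so ignored
]
```

Because both functions read the live content each time they are called, the reports stay in step with the repository rather than being maintained as separate documents.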

Frameworks and notations

Taxonomy frameworks such as Zachman, or the layered structures used in TOGAF or Archimate, can be represented very simply via sets of tags, and (in the case of supposed ‘layers’) explicit rules around relations that connect to those tags.

Notations consist of sets of tags that define (or extend) specific entity-types, and matching sets of representation-types.

Governance and change-management

Governance regimes such as used in the TOGAF ADM, PRINCE2, PMP and the like, and models such as Gantt-charts, can be represented by sets of entities, collections, and relations between them and other entities and/or relations within the scope covered by a particular change-project, change-programme or whatever. In short, everything is just an entity, or a relation between entities.

A governance method would in effect be the usage of a specific model-type which uses a governance-set, using the validation-rules defined within that model-type.

As described above, every entity, relation and model should have an associated RACI (responsibility-assignment) collection.

Governance-artefacts such as reference-models may be created by ‘freezing’ an existing model as a new model-type, and then enabling relations from there to other entities that represent the implementation of the respective items described in the reference-model. A waiver or dispensation (i.e. an accepted and documented breach of the validation-rules for the reference-model) in effect consists of a descriptive entity that is linked to the non-compliant item and to the respective section of the reference-model.

Toolset-ecosystem

The full toolset-ecosystem for enterprise modelling and sense-making covers a huge range, including:

enterprise-wide repository, supporting specialist edit/moderation tools and simpler client-server interfaces (e.g. static or annotatable web-pages)
team-based tools – essentially a smaller-scale version of the enterprise-tools, with a repository shared across the team
single-user repository-based tools
non-repository tools (such as Visio or Powerpoint)
tablet-style systems with gesture-based interface, often thin-client only or with limited local repository
handheld tools (smartphone etc) primarily used as thin-client
pencil and paper

Given this scope and range, it is extremely unlikely that there would ever be ‘One Toolset To Rule Them All’. However, this metamodel can link between them all, by providing a means to share information and models between them. Each toolset-type could also use any of the model-types, frameworks and notations supported by the metamodel, and link between all of those as well. In that sense, not ‘One Toolset To Rule Them All’, but possibly ‘One Standard Data-Structure To Link Them All’.

Conceptually, the metamodel described here is best suited to the enterprise-wide repository. Merge and data-cleansing would be important moderator/curator activities at this level.

For team-based tools and single-user repository-based tools, these would support the metamodel directly. They would need to be able to import and export individual models and/or selected sections of the entire repository. Again, merge and data-cleansing are important activities, though this is likely to be the scope within which most actual specialist modelling takes place.

For non-repository tools, distinct import and export functions will be required. These tools do not actually have any real concept of ‘entity-type’: everything is an uncontrolled instance, in essence without any layering or depth. Import into these tools is relatively straightforward, simply by ‘flattening’ the internal structure of existing instances and selecting an appropriate representation for the respective notation; but real care will be needed when importing into a repository-based system an exported model from a non-repository tool, because much of the underlying structure may have been lost, and connection to a parent entity-type will be questionable at best.

Tablet-style tools may or may not be repository-based – the main distinction here is the interface-metaphor (gestural rather than text/visuals or keyboard/mouse) rather than the underlying metamodel. If repository-based, or thin-client only where the repository is maintained only on the source server, the usage is much the same as for a team-based or single-user tool. If essentially a non-repository-tool (equivalent to Visio or Powerpoint), then the respective constraints apply. If solely graphic (i.e. a ‘draw’ application), the image-file should be handled much as for pencil and paper.

Most handheld tools would be used as thin-client, primarily for local consumption and with little to no original information-capture. The thin-client relationship would, however, enable update of uncontrolled sections of information – such as the free-form ‘wiki-page’ section for each entity or relation.

Pencil and paper will still remain one of the most common tools for idea-capture and initial development. Some applications already exist to scan paper-based drawings and create the respective entities and relations; these will always need subsequent moderation and data-check, but it is feasible. The layered tag-based structure for this metamodel should make that process easier than for some other existing metamodels and notations, particularly as it allows information-capture in ‘non-standard’ forms.

Implementing the metamodel

Implementation could be relatively straightforward, because at a fundamental level there is only one entity-type, and only one relation-type – everything else is created via the ‘tag’ concept, which itself should be reasonably simple to implement. The only fundamental difference between entities and relations is that relations can link to things, whilst entities cannot (in their own right, anyway). The ‘wiki-page’ concept is also fairly straightforward.

This could be implemented via a conventional relational-database, although an object-database would probably be a better choice in practice. I’ve implemented something fairly similar to this in PHP, with a MySQL back-end: it’s used as the base for several of my existing websites, such as tomgraves.org, tetradian.com, sempermetrics.com and the OS-EA Tools site. (It’s versatile, as you’ll see; but yes, as you’ll also see, it’s, uh, clunky – which is why I’m definitely not the person to be writing the code for this…)

As I see it at present, the common file-format can be a straightforward XML or JSON text-file. The core, and each notation or framework or the like (each, in effect, little more than a set of tag-definitions) would be specified as the equivalent of an XML DTD (document type definition), with a defined namespace, conceptually similar to namespaces in XML or Java.
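As one possible shape for the JSON option, the sketch below writes entities out with namespaced tag-names and reads them back. The `namespace:tag` naming convention, the field names and the placeholder identifier are all assumptions; nothing here is a proposed standard.

```python
import json


def export_repository(entities: list) -> str:
    """Serialise entities to the body of a JSON text-file."""
    return json.dumps({"entities": entities}, indent=2, sort_keys=True)


def import_repository(text: str) -> list:
    """Read entities back from the JSON text."""
    return json.loads(text)["entities"]


def tags_in_namespace(entity: dict, namespace: str) -> dict:
    """Select the tags that belong to one notation's namespace."""
    prefix = namespace + ":"
    return {name: value for name, value in entity["tags"].items()
            if name.startswith(prefix)}


entities = [{
    "id": "e-001",  # placeholder identifier
    "instantiable": False,
    "tags": {
        "core:name": {"value": "Order received"},
        "bpmn:event": {"kind": "start"},
        "archimate:element": {"kind": "BusinessEvent"},
    },
}]
text = export_repository(entities)
```

A toolset that only knows the `bpmn` namespace can pick out its own tags with `tags_in_namespace(entity, "bpmn")` and leave the others in place when it writes the file back.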

Each notation and associated model-type is therefore a ‘plug-in’, conceptually similar to plug-ins in Eclipse and the like, except that in principle it could be possible (and in practice may indeed be possible) to implement these ‘plug-ins’ via a simple configuration-file, without requiring any embedded plug-in-specific code.
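To suggest how a configuration-only ‘plug-in’ might work, here is a sketch in which a tiny, made-up notation is defined purely as data and interpreted by generic code. The configuration layout, the `BPMN_LITE` content and the function names are hypothetical, and validation-rules and presentation-types are left out entirely.

```python
import json

# A hypothetical notation 'plug-in': pure data, no notation-specific code.
BPMN_LITE = """
{
  "namespace": "bpmn",
  "entity_types": {
    "Event": ["flow-node", "event"],
    "Task":  ["flow-node", "task"]
  },
  "relation_types": {
    "sequenceFlow": {"from": "flow-node", "to": "flow-node"}
  }
}
"""


def load_notation(config_text: str) -> dict:
    """Parse a notation definition and prefix every tag with its namespace."""
    raw = json.loads(config_text)
    ns = raw["namespace"]
    return {
        "namespace": ns,
        "entity_types": {name: {f"{ns}:{tag}" for tag in tags}
                         for name, tags in raw["entity_types"].items()},
        "relation_types": {name: (f"{ns}:{ends['from']}", f"{ns}:{ends['to']}")
                           for name, ends in raw["relation_types"].items()},
    }


def new_entity(notation: dict, type_name: str, name: str) -> dict:
    """Create an instance carrying the tags the notation assigns to that type."""
    return {"name": name, "tags": set(notation["entity_types"][type_name])}


def allowed(notation: dict, relation: str, source: dict, target: dict) -> bool:
    """Check a relation against the tags required at each end."""
    source_tag, target_tag = notation["relation_types"][relation]
    return source_tag in source["tags"] and target_tag in target["tags"]


bpmn = load_notation(BPMN_LITE)
start = new_entity(bpmn, "Event", "Order received")
check = new_entity(bpmn, "Task", "Check stock")
```

None of the three functions mentions BPMN: swapping in a different configuration text would give a different notation with the same code.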

There’s a tricky area around how to specify representation-types for model-types, and how to define validation-rules within model-types, but I don’t see that it’d be a ‘deal-breaker’. It just needs someone who’s more experienced than I am about designing a configuration-language.

The other key point, though, is that because everything is in effect defined via an equivalent of a DTD, this means that any toolset that can interpret that configuration-file would also be able to implement and use any model-type or notation that can be defined via that form of DTD – and as far as I can see at present, that includes just about every notation or framework that I know of in general use in current enterprise-architecture, strategy, sense-making and the like. In other words, not quite ‘One Toolset To Rule Them All’, but in practice not far off it.

Summary

The metamodel described here can, in principle, cover just about all enterprise-architecture needs, and many other related needs, with one very simple structure.

The core of the structure is a single entity-type; a single relation-type; a very simple ‘collections’ structure that can be used within any of these; a concept of ‘tags’ that carry attributes and/or presentation-types, and to which relations may be linked; and a structure for model-types that inverts the usual structure by focussing on relation-types rather than entity-types.

Because the structures are so simple, it should be straightforward to implement in practice, for different toolsets using a range of different user-interface metaphors, across the whole of the toolset-ecosystem.

Okay, that’s it: over to you? Any comments/advice/suggestions, anyone?

(And many thanks for reading this far, of course!)