Concept

The aim of Simple Semantic Data Modeling in XML (SeMoX) is to provide non-technical domain experts with a simple model and additional tooling for capturing the semantics of data in a technology-neutral way.

In short, domain knowledge can be captured in a shareable semantic model that business and legal experts can read and validate against its specification. The technical implementation, including validation artifacts and related tools, is kept independent of the semantic model, which leaves full freedom in how the model is implemented technically.

SeMoX can therefore bridge the gap between individuals with varying degrees of business, legal and technical expertise within a certain knowledge domain through the separation of domain knowledge and technical implementations.

As such, SeMoX is foremost designed for modeling data exchange standards between heterogeneous systems.

(Maybe insert Fig. 1 from ECIS Paper here)

SeMoX is open source under a permissive MIT Licence and invites usage and participation.

The project repository can be found here: https://projekte.kosit.org/semox/semox-model.

Building Blocks

SeMoX is simple because all it needs are five basic building blocks: Terms, Semantic Datatypes, Rules, Structures and Syntax Bindings.

Terms

At the heart of SeMoX is the Term (a.k.a. Business Term (BT)).

A term is a word or phrase that captures the semantics of a particular concept that is used within a given knowledge domain. Whatever the underlying concept of a Term is, domain experts have to name and describe it.

As such, Terms are defined by Term-names and Term-descriptions through consultations with domain experts and encoded in SeMo-XML by technical experts.

A simple example might be:

| Name | Description |
|------|-------------|
| Invoice issue date | The date at which the invoice was created. |

A list of Terms is flat and can be seen as a glossary.
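To make the glossary idea concrete, here is a minimal sketch in Python. It is not the SeMoX format itself: the `Term` class and the `glossary` and `by_name` names are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Hypothetical stand-in for a Business Term: a name plus a description."""
    name: str
    description: str


# A flat list of Terms acts as a glossary: no nesting and no datatypes yet.
glossary = [
    Term("Invoice issue date", "The date at which the invoice was created."),
]

# Look a Term up by its name, as one would look up a word in a glossary.
by_name = {term.name: term for term in glossary}
print(by_name["Invoice issue date"].description)
```

The point of the sketch is only that a Term carries nothing beyond what domain experts agreed on: a name and a description.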

Semantic Datatypes

In order to use Terms in some kind of computable formal structure, it is essential to express Term values as relatively simple data values. The Semantic Datatype is a relatively high-level expression of the expected formal structure of the data content of a Term.

Once again, whatever the semantics or intent of a Datatype may be, domain experts have to name and describe it. In the following example, we expect any data field in any system to treat this Term as a calendar date.

| Name | Description | Example value | Semantic Datatype |
|------|-------------|---------------|-------------------|
| Invoice issue date | The date at which the invoice was created. | 2021/09/11 | Date |

Which and how many Semantic Datatypes are to be defined in a SeMoX model is a design choice.
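As a rough illustration of what a Semantic Datatype implies for an implementation, the following Python sketch interprets the example value as a calendar date. The `SemanticDatatype` enum and the `parse_value` function are hypothetical names, and reading `2021/09/11` as year/month/day is an assumption; a real model would leave the lexical format to the concrete syntax.

```python
from datetime import date, datetime
from enum import Enum


class SemanticDatatype(Enum):
    """Hypothetical set of Semantic Datatypes; a real model defines its own."""
    DATE = "Date"


def parse_value(raw: str, datatype: SemanticDatatype) -> date:
    """Turn a raw example value into a typed value for the given datatype."""
    if datatype is SemanticDatatype.DATE:
        # Assumption: the example value is written as year/month/day.
        return datetime.strptime(raw, "%Y/%m/%d").date()
    raise ValueError(f"Unsupported datatype: {datatype}")


issue_date = parse_value("2021/09/11", SemanticDatatype.DATE)
print(issue_date.isoformat())  # prints 2021-09-11
```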

Atomic Types

Composed Types

Rules

Rules are a means of imposing further restrictions, assertions and constraints on the use of Terms. These can be of any kind. On the one hand, Rules can be defined for human readers only:

| Name | Id | Description |
|------|----|-------------|
| Multilang | R-1 | {{ site.project_name}} should allow multiple languages for definitions and names of elements. |

On the other hand, Rules are most useful when they are developed from a pure business perspective and can be tested with a plain yes-or-no answer.

| Name | Id | Description |
|------|----|-------------|
| past-date | R-2 | An invoice date should be on or before the current date. |

The R-2 rule is testable and can also be implemented by various means. See [Semantic Data Rules in XML]({% link _documentation/en/model-in-xml-rules.md %}).
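One of the many possible ways to implement R-2 is a plain predicate that answers yes or no. The sketch below is in Python and is only an illustration; the function name `rule_r2_past_date` and the optional `today` parameter (added so the check can be tested deterministically) are assumptions, not part of SeMoX.

```python
from datetime import date
from typing import Optional


def rule_r2_past_date(invoice_date: date, today: Optional[date] = None) -> bool:
    """R-2: an invoice date must be on or before the current date."""
    # Fall back to the real current date when no reference date is given.
    reference = today if today is not None else date.today()
    return invoice_date <= reference


# Fixed reference date so the outcome does not depend on when this runs.
reference_day = date(2021, 9, 11)
print(rule_r2_past_date(date(2021, 9, 11), today=reference_day))  # prints True
print(rule_r2_past_date(date(2021, 9, 12), today=reference_day))  # prints False
```

Because the rule is stated from the business perspective, the same yes-or-no check could just as well be written in another technology.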

Rules set the expectations against which the technical implementation must be measured.

Structures

All or some Business Terms can be aggregated into groups to form logical or semantic structures. These structures can be defined according to the intents and needs of the domain as defined by experts. This is independent of concrete implementations and simply defines each Term’s affiliation and cardinality.

For example, the structure of {{ site.project_name}} itself can be expressed as:

  • Semantic Model
    • Business Terms
      • Term 1
      • Term 2
    • Semantic Datatypes
      • datatype A
      • datatype B

This structure follows the logic of the order of explanation on this web page.

In actual modeling, however, the structure more closely resembles the following example:

  • Semantic Model
    • Semantic Datatypes
      • datatype A
      • datatype B
    • Business Terms
      • Term 1
      • Term 2

Both structures are valid and many such structures can be represented in SeMoX. See [Semantic Data Models in XML]({% link _documentation/en/model-in-xml.md %}) for more details on actual implementation.
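A structure of this kind can be sketched as a small tree in which every node records its affiliation (its parent) and its cardinality. The Python example below is an illustration under assumptions: the `Node` class, the `outline` function and all cardinality values are hypothetical and are not taken from the SeMoX specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical structure node: a name, a cardinality and child nodes."""
    name: str
    min_occurs: int = 1
    max_occurs: Optional[int] = 1  # None stands for "unbounded"
    children: List["Node"] = field(default_factory=list)


def outline(node: Node, depth: int = 0) -> List[str]:
    """Render the tree as indented lines of the form 'name [min..max]'."""
    upper = "n" if node.max_occurs is None else str(node.max_occurs)
    lines = [f"{'  ' * depth}{node.name} [{node.min_occurs}..{upper}]"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# The second example structure; the cardinalities are made up for illustration.
model = Node("Semantic Model", children=[
    Node("Semantic Datatypes", children=[
        Node("datatype A"),
        Node("datatype B"),
    ]),
    Node("Business Terms", children=[
        Node("Term 1"),
        Node("Term 2", min_occurs=0, max_occurs=None),
    ]),
])

print("\n".join(outline(model)))
```

Reordering the two children of `Semantic Model` yields the first example structure, which shows that the grouping is a modeling decision rather than a technical one.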

Syntax Bindings

A Syntax Binding connects a Term to its place in a concrete syntax. Loosely, it means: “You can find the representation of a Term value here in an instance, because this part of the instance represents what we call Term X”.

As such, this allows binding every Term to a concrete element or attribute of an XML instance as defined by a standard. This binding information is crucial for data parsing and validation. Moreover, it allows automatically generating several business documents as well as technical artefacts from a SeMoX-model. (see TODO)
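The idea of a binding can be sketched with Python's standard `xml.etree.ElementTree` module. Everything specific in this example is an assumption made for illustration: the XML instance, its element names, the `bindings` mapping and the `term_value` function are hypothetical and do not come from SeMoX or from any particular standard.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical XML instance; the element names are invented for this sketch.
instance = "<Invoice><IssueDate>2021-09-11</IssueDate></Invoice>"

# Hypothetical Syntax Binding: Term name -> path relative to the root element.
bindings = {"Invoice issue date": "./IssueDate"}


def term_value(root: ET.Element, term_name: str) -> Optional[str]:
    """Return the text found at the location the Term is bound to, if any."""
    element = root.find(bindings[term_name])
    return None if element is None else element.text


root = ET.fromstring(instance)
print(term_value(root, "Invoice issue date"))  # prints 2021-09-11
```

With such a mapping in place, a parser or validator can locate the value of a Term without knowing anything else about the instance format.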

Metadata

Projects using SeMoX need additional metadata, such as the date of creation or the development status of the semantic model.