The MappingMaster DSL: A Domain-Specific Language for Converting Thesauri into Ontologies (Part 2)

Date: 2021-09-08 07:35:18

(Continued)

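5. Processing Cell Content

By default, the content of the referenced cell is used directly. This default can be overridden with an optional value specification clause.

The clause is usually written as an '=' character immediately after the encoding specification keyword, followed by a parenthesis-enclosed, comma-separated list of value specifications, which are appended to one another. A value specification can be a cell reference, a quoted value, a regular expression containing capturing groups, or a built-in text processing function.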


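5.1 Basic Cell Content Processing

For example, an expression that extends a reference so that the entity created from cell A5 uses rdfs:label name encoding, with a name made of the string "Sale:" followed by the cell value, can be written as follows: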

   Class:@A5(rdfs:label=("Sale:",@A5))

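Value specification references are not restricted to the referenced cell itself and may point to arbitrary cells. More than one encoding can also be specified for a single reference; for example, separate identifier and label values can be generated for a particular entity from the contents of different cells.

For example, we can extend the example above to take the rdf:ID of the generated class from cell B5, as follows: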

   Class:@A5(rdf:ID=@B5  rdfs:label=("Sale:",@A5))


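The language includes several built-in text processing methods that can be used in value specifications. The methods currently supported are mm:replace, mm:replaceAll, mm:replaceFirst, mm:prepend, mm:append, mm:toLowerCase, mm:toUpperCase, mm:trim, mm:reverse, mm:printf, and mm:decimalFormat. These methods take zero or more arguments and return a value. Supplied arguments may be any combination of quoted strings and references.


An expression that converts the contents of cell A5 to upper case before label assignment can be written: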

Class:@A5(mm:toUpperCase(@A5))


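Value processing functions can also be used outside value specification clauses, but only if no such clause is used in the reference, and only a single function can be used.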


5.2 decimalFormat and printf


decimalFormat and printf support formatting of textual and numerical content. Their behavior follows the standard Java formatting rules.

For example:

  Individual: Fred Facts: hasSalary @A1(mm:decimalFormat("###,###.00", @A1))
 
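  Class: @A1(mm:printf("A_%s", @A1))

5.3 Replacing Characters

The mm:replace and mm:replaceAll functions behave like the corresponding methods of the standard Java String class.

For example, to remove all non-alphanumeric characters from a cell, the mm:replaceAll function can be used as follows: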

Individual:@A5

Facts:hasItems @B5(mm:replaceAll("[^a-zA-Z0-9]",""))


5.4 Prepending and Appending

  Class: @A5(rdfs:label=mm:prepend("Sale:"))
  Individual: @A2(mm:append("_MM"))

5.5 Literals

MappingMaster currently supports the following datatypes:

xsd:string, xsd:boolean, xsd:byte, xsd:short, xsd:int, xsd:long, xsd:float, xsd:double, xsd:integer, xsd:decimal, xsd:dateTime, xsd:date, xsd:time, xsd:Duration, rdf:PlainLiteral, rdf:XMLLiteral


5.6 IRIs

MappingMaster has several directives to customize the IRI creation process:

mm:iri, mm:camelCaseEncode, mm:snakeCaseEncode, mm:uuidEncode, mm:hashEncode


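5.7 Missing Value Handling

To deal with missing cell values, default values can also be specified in references. A default value clause assigns these values. The clause is indicated by the keywords mm:DefaultLocationValue, mm:DefaultLiteral, mm:DefaultLabel, and mm:DefaultID, each followed by an assignment to a string. For example, the following expression uses this clause to indicate that the value "Unknown" should be used as the label of the created class if cell A5 is empty:

Class:@A5(rdfs:label mm:DefaultLabel="Unknown")

Other behaviors are also supported for handling missing cell values. The default behavior is to skip an entire expression if it contains any reference to an empty cell. Four keywords are provided to change this behavior: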

mm:ErrorIfEmptyLocation

mm:SkipIfEmptyLocation

mm:WarningIfEmptyLocation

mm:ProcessIfEmptyLocation

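The last keyword allows processing of spreadsheets that may contain a large number of missing values. It indicates that the language processor should, if possible, conservatively drop only the sub-expression containing the empty reference rather than the entire expression. For example, the following expression declares an individual from cell A5 of a spreadsheet and associates the property hasAge with it using the value in cell A6: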

Individual:@A5

Facts:hasAge @A6(mm:ProcessIfEmptyLocation)


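Here, under the default skip behavior, a missing value in cell A5 causes the entire expression to be skipped. With the process directive, however, an empty cell A6 causes only the sub-expression containing it to be dropped. So if cell A5 contains a value and cell A6 is empty, the resulting expression still declares an individual.


In a similar way, finer-grained empty value handling is supported, so that different behaviors can be specified for mm:Literal, rdf:ID, and rdfs:label values. The label directives are mm:ErrorIfEmptyLabel, mm:SkipIfEmptyLabel, mm:WarningIfEmptyLabel, and mm:ProcessIfEmptyLabel, with equivalent keywords for rdf:ID and mm:Literal handling:

mm:ErrorIfEmptyID, mm:SkipIfEmptyID, mm:WarningIfEmptyID, mm:ProcessIfEmptyID, and mm:ErrorIfEmptyLiteral, mm:SkipIfEmptyLiteral, mm:WarningIfEmptyLiteral, mm:ProcessIfEmptyLiteral.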


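5.8 Location Shifting

One additional option is provided for handling empty cell values. It targets the common case in many spreadsheets where a particular cell is given a value and all empty cells below it are implied to have the same value. In this case, when such an empty cell is processed, its location must be shifted to the cell above it that contains the value. For example, the following expression uses the mm:ShiftUp keyword to indicate that if cell A5 does not contain a value for the name of the declared class, the row number must be shifted upward until a value is found: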

Class:@A5(mm:ShiftUp)

If no value is found, the normal empty value handling is applied. Similar directives are mm:ShiftDown, mm:ShiftLeft, and mm:ShiftRight.


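5.9 Iterating Over a Range of Cells in a Reference

Most mappings will not reference just a single cell but will instead iterate over a range of rows or columns in a spreadsheet. The wildcard character '*' can be used in a reference to refer to the current column or row of the iteration. MappingMaster provides a graphical interface for specifying these ranges.

Example references using this wildcard notation include: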

@A3

@A*

@**

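For example, an expression that iterates over the grid D4 to G6 to create an individual of class Sale for each cell can be written:

Individual:@**

Types:Sale

This expression can be extended to assign property values to these individuals: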

Individual:@**

Types:Sale

Facts:hasAmount @**,

           hasProduct @B*,

           hasState    @*2




Appendix: English original


MappingMaster uses a domain specific language (DSL) to define mappings from spreadsheet content to OWL ontologies. This language is based on the Manchester OWL Syntax, which is itself a DSL for describing OWL ontologies.

An introduction to the Manchester Syntax can be found here. A set of example Manchester Syntax expressions can be found in the Quick Reference section of that document.

The Manchester Syntax supports the declarative specification of OWL axioms.

For example, a Manchester Syntax declaration of an OWL named class Gum that is a subclass of a named class called Product can be written using a class declaration clause as:

  Class: Gum SubClassOf: Product 

The MappingMaster DSL extends the Manchester Syntax to support references to spreadsheet content in these declarations. MappingMaster introduces a new reference clause for referring to spreadsheet content. In this DSL, any clause in a Manchester Syntax expression that indicates an OWL named class, OWL property, OWL individual, data type, or a literal can be substituted with this reference clause. Any declarations containing such references are preprocessed and the relevant spreadsheet content specified by these references is imported. As each declaration is processed, the appropriate spreadsheet content is retrieved for each reference. This content can then be used in four main ways:

  • It can be used to directly name OWL entities that are created on demand.
  • It can be used to annotate OWL entities that are created on demand.
  • The content may reference existing OWL entities, either directly as a URI or through an annotation property.
  • Finally, the content may be used as a literal.

Using one of these approaches, each reference within an expression is thus resolved during preprocessing to a named OWL entity, a data type, or a literal. The resulting expression can then be executed by a standard Manchester Syntax processor.

References

References in the MappingMaster DSL are prefixed by the character @. These are generally followed by an Excel-style cell reference. In the standard Excel cell notation, cells extend from A1 in the top left corner of a sheet within a spreadsheet to successively higher columns and rows, with alpha characters referring to columns and numerical values referring to rows.

Basic References Use

For example, a reference to cell A5 in a spreadsheet is written as follows:

  @A5 

The above cell specification indicates that the reference is relative, meaning that if a formula containing the reference is copied to another cell then the row and column components of the reference are updated appropriately.

Sheets can also be specified by enclosing their name in single quotes and using the "!" character separator between the sheet name and the cell specification:

  @'A sheet'!A3 
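
The reference notation above can be illustrated with a small parser. This is only a sketch in Python, not MappingMaster's own implementation: the regular expression and the names parse_reference and column_index are hypothetical and cover just the two forms shown here (a bare cell and a quoted sheet name followed by "!").

```python
import re

# Hypothetical pattern for references such as @A5 or @'A sheet'!A3.
REFERENCE = re.compile(
    r"@(?:'(?P<sheet>[^']+)'!)?(?P<column>[A-Z]+)(?P<row>[0-9]+)"
)


def parse_reference(text):
    """Split a reference into (sheet, column, row); sheet is None if absent."""
    match = REFERENCE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a cell reference: {text!r}")
    return match.group("sheet"), match.group("column"), int(match.group("row"))


def column_index(column):
    """Convert a column name to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    index = 0
    for letter in column:
        index = index * 26 + (ord(letter) - ord("A") + 1)
    return index


print(parse_reference("@A5"))
print(parse_reference("@'A sheet'!A3"))
```

Treating the column letters as a base-26 number matches the Excel convention described earlier, where alpha characters name columns and numbers name rows.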

For example, in the following spreadsheet rows 4 to 6 of column B contain product categories; columns D to G of row 2 contain state identifiers, and the grid range D4 to G6 contains sales amounts.

[Figure: example spreadsheet with product categories in B4 to B6, state identifiers in D2 to G2, and sales amounts in D4 to G6]

These references can then be used in MappingMaster's DSL to define OWL constructs using spreadsheet content.

For example, a MappingMaster expression to declare that a class FlavouredGum is a subclass of the class named by the contents of cell B4 can be written:

  Class: FlavouredGum SubClassOf: @B4 

When processed, this expression will create an OWL named class using the contents of cell B4 ("Gum") as the class name and declare FlavouredGum to be its subclass. If the class Gum already exists, the subclass relationship will simply be established.

That is, references can be used either to define new OWL entities or to refer to existing entities.

A similar expression to declare that the class SalesItem is equivalent to the class named by the contents of cell B4 can be written:

  Class: SalesItem 
  EquivalentTo: @B4 

The Manchester Syntax also supports an individual declaration clause for declaring individuals; property values can be associated with the declared individuals using a facts subclause, which contains a list of property value declarations.

For example, an expression to specify that an individual created from the contents of cell D2 ("CA") has a value of "California" for a data property value hasStateName can be written:

  Individual: @D2 
  Facts: hasStateName "California" 

Here, an individual CA will be created if necessary and associated with the data property hasStateName, which will be given the string value "California".

Using the standard Manchester Syntax, annotation properties can also be associated with declared entities.

For example, an existing string data type annotation property called hasSource can be used to associate the above declared California individual with the source document as follows:

  Individual: @D2 
  Facts: hasStateName "California" 
  Annotations: hasSource "DMV Spreadsheet 12/12/2010" 

Classes or properties can be annotated in the same way. For example, a class can be annotated with the hasSource annotation property as follows:

  Class: @D2 
  Annotations: hasSource "DMV Spreadsheet 12/12/2010" 

The Manchester Syntax also supports the use of OWL class expressions. In general, a class expression may occur anywhere a named class can occur.

For example, an expression to define a necessary and sufficient condition of a class Sale that uses the contents of cell D4 as the filler of an owl:HasValue axiom with the property hasAmount can be written:

  Class: Sale 
  SubClassOf: (hasAmount value @D4) 

In general, OWL entities named explicitly in a MappingMaster expression (as opposed to resolved through a reference) must already exist in the target ontology. In these examples, the classes Sale, SalesItem, and FlavouredGum, and the properties hasAmount, hasStateName, and hasSource must already exist.

Specifying the Type of a Reference

In the expression

   Class: @A5 
   SubClassOf: Drug 

reference @A5 clearly refers to an OWL class. However, the reference type cannot always be inferred unambiguously.

For example, in the expression

    Class: Sale 
    SubClassOf: (@A3 value @D4) 

the reference @A3 could refer to an object, data, or annotation property, and reference @D4 could be either an OWL individual or a literal.

To deal with this situation, MappingMaster supports explicit entity type specification. Specifically, a reference may be optionally followed by a parenthesis-enclosed entity type specification to explicitly declare the type of referenced entity. This specification can indicate that the entity is a named OWL class, an OWL object, data or annotation property, an OWL named individual, or a data type. The MappingMaster keywords to specify the types are the standard Manchester Syntax keywords Class, ObjectProperty, DataProperty, AnnotationProperty, and Individual, plus any XSD type name (e.g., xsd:int).

Using this specification, the previous drug declaration, for example, can be written:

  Class: @A5(Class) 
  SubClassOf: Drug 

A declaration of an individual from cell B5 with an associated property value from cell C5 that is of type float can be specified as follows:

  Individual: @B5 
  Facts: hasSalary @C5(xsd:float) 

If the hasSalary data property is already declared to be of type xsd:float then the explicit type qualification is not needed. A global default type can also be specified for literals in the case where the type of the associated data property is either unknown or unspecified or if no explicit type is provided in the reference.

References to OWL properties and individuals can be qualified in the same way.

Reference Resolution

References may specify OWL entities (i.e., classes, properties, individuals, or datatypes) or literals. When a reference specifies an OWL entity, the reference value may resolve to an existing OWL entity or may be used to name an OWL entity that is created on demand.

Basic Reference Resolution

A variety of name resolution strategies are supported when creating or referencing OWL entities. The three primary strategies are to:

  • Use rdf:IDs to create or resolve OWL entities.
  • Use rdfs:label annotations to create or resolve OWL entities.
  • Create OWL entities based on the location of a cell, ignoring the resolved reference value.

With rdf:ID encoding, an OWL entity generated from a reference is assigned its rdf:ID directly from the resolved reference value. Obviously, this content must represent a valid identifier (spaces are not allowed in rdf:IDs, for example).

Using rdfs:label encoding, an OWL entity resolved from a reference is given an automatically generated URI and its rdfs:label annotation value is set to the resolved reference value.

With location encoding, an OWL entity generated from a reference is also given an automatically generated URI, but in this case the resolved reference value is unused.

The default naming encoding uses the rdfs:label annotation property. The default may also be changed globally.

A name encoding clause is provided to explicitly specify a desired encoding for a particular reference. As with entity type specifications, this clause is enclosed by parentheses after the cell reference. The keywords to specify the three types of encoding are mm:Location, rdf:ID, and rdfs:label.

Using this clause, a specification of rdf:ID encoding for the previous drug example can be written:

  Class: @B4(rdf:ID) 
  SubClassOf: Drug 

As mentioned, MappingMaster also supports entity creation where cell values are ignored. In this case, the keyword mm:Location can be used in parenthesis following a reference.

For example, an expression to create an individual for cell D4 while ignoring the contents of the cell can be written:

  Individual: @D4(mm:Location) 

By default, OWL entity names are resolved or generated using the namespace of the currently active ontology. The language includes mm:prefix and mm:namespace clauses to override this default behavior.

For example, an expression to indicate that an individual created or resolved from the contents of cell A2 (assuming rdfs:label resolution) should use the namespace identified by the prefix "clinical", can be written:

  Individual: @A2(mm:prefix="clinical") 

Similarly, an expression to indicate that it must use the namespace "http://clinical.stanford.edu/Clinical.owl#" can be written:

  Individual: @A2(mm:namespace="http://clinical.stanford.edu/Clinical.owl#") 

Explicit namespace or prefix qualification in a reference allows disambiguation of duplicate labels in an ontology.

Reference Resolution Using Annotation Values

To support direct references to annotation values in expressions, MappingMaster's DSL adopts the Manchester Syntax mechanism of enclosing these references in single quotes.

For example, if the OWL class Product has an rdfs:label annotation value 'A sellable product', it can be referred to as follows:

  Class: @B4 
  SubClassOf: 'A sellable product' 

A sellable product will be resolved through an annotation value to the class Product when this expression is processed.

Reference Resolution Configuration Options

Document the following options:

  • mm:defaultPrefix
  • mm:defaultNamespace
  • mm:defaultLanguage
  • mm:ResolveIfOWLEntityExists
  • mm:SkipIfOWLEntityExists
  • mm:WarningIfOWLEntityExists
  • mm:ErrorIfOWLEntityExists
  • mm:CreateIfOWLEntityDoesNotExist
  • mm:SkipIfOWLEntityDoesNotExist
  • mm:WarningIfOWLEntityDoesNotExist
  • mm:ErrorIfOWLEntityDoesNotExist
  • mm:ProcessIfEmptyLabel
  • mm:ErrorIfEmptyLabel
  • mm:WarningIfEmptyLabel
  • mm:SkipIfEmptyLabel

Processing Cell Content

The default behavior is to directly use the contents of the referenced cell. However, this default can be overridden using an optional value specification clause.

This clause is usually indicated by the '=' character immediately after the encoding specification keyword and is followed by a parenthesis-enclosed, comma-separated list of value specifications, which are appended to each other. These value specifications can be cell references, quoted values, regular expressions containing capturing groups, or inbuilt text processing functions.

Basic Cell Content Processing

For example, an expression that extends a reference to specify that the entity created from cell A5 is to use rdfs:label name encoding and that the name is to be the value of the cell preceded by the string "Sale:" can be written as follows:

  Class: @A5(rdfs:label=("Sale:", @A5)) 
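
The way a value specification list is appended into one value can be sketched as follows. This is an illustrative Python model only: resolve_value_spec is a hypothetical helper, the cell content "Gum" is assumed, and only quoted strings and simple cell references are modelled.

```python
def resolve_value_spec(parts, cells):
    """Concatenate value specifications in order.

    Parts starting with '@' are looked up in cells; all other parts are
    treated as quoted string values.
    """
    resolved = []
    for part in parts:
        if part.startswith("@"):
            resolved.append(cells[part[1:]])
        else:
            resolved.append(part)
    return "".join(resolved)


cells = {"A5": "Gum"}  # hypothetical content of cell A5
label = resolve_value_spec(["Sale:", "@A5"], cells)
print(label)  # Sale:Gum
```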

Value specification references are not restricted to the referenced cell itself and may indicate arbitrary cells. More than one encoding can also be specified for a particular reference so, for example, separate identifier and label annotation values can be generated for a particular entity using the contents of different cells.

For example, we can extend the example above to assign the rdf:ID of generated classes to cell B5 as follows:

  Class: @A5(rdf:ID=@B5 rdfs:label=("Sale:", @A5)) 

If the assignment list includes only a single value, the opening and closing parentheses can be omitted, as in the rdf:ID assignment here:

  Class: @A5(rdf:ID=@B5 rdfs:label=("Sale:", @A5)) 

The language includes several inbuilt text processing methods that can be used in value specifications. The methods currently supported are mm:replace, mm:replaceAll, mm:replaceFirst, mm:prepend, mm:append, mm:toLowerCase, mm:toUpperCase, mm:trim, mm:reverse, mm:printf, and mm:decimalFormat. These methods take zero or more arguments and return a value. Supplied arguments may be any combination of quoted strings or references.

An expression to convert the contents of cell A5 to upper case before label assignment can be written:

  Class: @A5(mm:toUpperCase(@A5)) 

A method can also have an explicit first argument omitted if the argument refers to the current location value. The previous expression can thus also be written:

  Class: @A5(mm:toUpperCase) 

Value processing functions can also be used outside of value specification clauses, but only if no such clause is used in the reference, and only a single function can be used.

decimalFormat and printf

decimalFormat and printf support formatting of textual and numerical content. Their behavior follows the standard Java specifications for the DecimalFormat class and the String.format method.

mm:decimalFormat can be used as follows:

  Individual: Fred Facts: hasSalary @A1(mm:decimalFormat("###,###.00", @A1))

When the value of cell A1 is "23000.2" this will render:

   Individual: Fred Facts: hasSalary "23,000.20"

Here is an example of mm:printf:

   Class: @A1(mm:printf("A_%s", @A1))

When the value of cell A1 is "Car" this will render:

   Class: A_Car

Any parameter can be replaced with a reference clause. These functions will work with explicit rdf:ID and rdfs:label assignment too.

Note that if only one parameter is supplied the second is assumed to be the enclosing reference location.

So

   Individual: Fred Facts: hasSalary @A1(mm:decimalFormat("###,###.00"))

is equivalent to:

   Individual: Fred Facts: hasSalary @A1(mm:decimalFormat("###,###.00", @A1))

And

   Class: @A1(mm:printf("A_%s"))

is equivalent to:

   Class: @A1(mm:printf("A_%s", @A1))

Which is also equivalent to:

   Class: @A1(rdf:ID=mm:printf("A_%s", @A1))
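
The two renderings above can be reproduced approximately outside Java. The sketch below uses Python's own formatting as a stand-in for DecimalFormat and String.format; the function names are hypothetical, and the Python format specification only approximates the Java pattern "###,###.00" (other values and other patterns may behave differently).

```python
def decimal_format_like(value):
    """Approximate the pattern "###,###.00": grouping commas, two decimals."""
    return format(float(value), ",.2f")


def printf_like(template, *args):
    """Apply a printf-style template to the supplied arguments."""
    return template % args


print(decimal_format_like("23000.2"))  # 23,000.20
print(printf_like("A_%s", "Car"))      # A_Car
```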

Replacing Characters

The mm:replace and mm:replaceAll functions follow the behavior of the associated methods in the standard Java String class.

For example, to remove all non-alphanumeric characters from a cell before assignment, the mm:replaceAll function can be used as follows:

  Individual: @A5 
  Facts: hasItems @B5(mm:replaceAll("[^a-zA-Z0-9]","")) 

Similarly, the mm:replace method can be used to replace commas with periods when processing literals:

  Individual: @A2 
  Facts: hasSalary @A3(xsd:float mm:replace(",", ".")) 
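
The effect of the two functions on a cell value can be sketched in Python. This is an analogy, not the MappingMaster implementation: the helper names and sample cell values are hypothetical, and Python's re.sub stands in for the regex-based Java replaceAll while str.replace stands in for the literal Java replace.

```python
import re


def replace_all(value, pattern, replacement):
    """Replace every regex match, in the manner of Java's String.replaceAll."""
    return re.sub(pattern, replacement, value)


def replace(value, target, replacement):
    """Replace every literal occurrence, in the manner of Java's String.replace."""
    return value.replace(target, replacement)


print(replace_all("Gum, Mint (x2)!", "[^a-zA-Z0-9]", ""))  # GumMintx2
print(float(replace("1234,56", ",", ".")))                 # 1234.56
```

The second call shows why the comma replacement matters for literals: the cleaned string can be read as a float, whereas the original "1234,56" cannot.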

Prepending and Appending

The mm:prepend method can be used as follows to simplify the above example:

  Class: @A5(rdfs:label=mm:prepend("Sale:")) 

The expression can be further simplified by omitting the explicit rdfs:label qualification if it is the default:

  Class: @A5(mm:prepend("Sale:")) 

The append method works similarly.

For example, assuming default rdfs:label encoding, the string "_MM" can be appended to a generated label as follows using the mm:append function:

  Individual: @A2(mm:append("_MM")) 

Extracting Values Using Regular Expressions

A similar approach can be used to selectively extract values from referenced cells. A regular expression groups clause is provided and can be used in any position in a value specification clause. This clause is contained in a quoted string enclosed by square brackets. For example, if cell A5 in a spreadsheet contains the string "Pfizer:Zyvox" but only the text following the ':' character is to be used in the label encoding, an appropriate capture expression could be written as:

  Class: @A5(rdfs:label=[":(\S+)"]) 

Note that parentheses around the sub-expressions in a regular expression clause specify capture groups and indicate that the matched strings are to be extracted. In some cases, more than one group may be matched for a cell value, in which case the matched strings are extracted in the order that they are matched and are appended to each other.

Capturing groups can also be used to generate literals. For example, if cell A2 in a spreadsheet has a person's forename, middle initial, and surname separated by a single space, three capturing expressions can be used to selectively extract each name portion and separately assign them to different properties as follows:

  Individual: @A2 
  Types: Person 
  Facts: hasForename @A2(["(\S+)"]), 
         hasInitial @A2(["\S+\s(\S+)"]), 
         hasSurname @A2(["\S+\s\S+\s(\S+)"]) 

A similar example to separately extract two space-separated integers from a cell can be written as:

  Individual: @A2 
  Types: Person 
  Facts: hasMin @A2(xsd:int ["(\d+)\s+"]), 
         hasMax @A2(xsd:int ["\s+(\d+)"]) 

If the hasMin and hasMax properties are of type xsd:int then the explicit qualification is not required here.
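
The capture expressions above can be tried out with a short sketch. It assumes that the processor finds the first match in the cell value and appends the captured groups in order; that reading, the helper name capture, and the sample values "John Q Smith" and "10 20" are assumptions. Python's re module is used here, which agrees with the Java Pattern class for these simple patterns.

```python
import re


def capture(pattern, value):
    """Find the first match and append its capture groups in order."""
    match = re.search(pattern, value)
    if match is None:
        return ""
    return "".join(group or "" for group in match.groups())


print(capture(r":(\S+)", "Pfizer:Zyvox"))               # Zyvox

name = "John Q Smith"  # hypothetical content of cell A2
print(capture(r"(\S+)", name))                          # John
print(capture(r"\S+\s(\S+)", name))                     # Q
print(capture(r"\S+\s\S+\s(\S+)", name))                # Smith

bounds = "10 20"  # hypothetical cell with two space-separated integers
print(int(capture(r"(\d+)\s+", bounds)), int(capture(r"\s+(\d+)", bounds)))
```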

Capturing expressions can also be invoked via the mm:capturing function:

  Individual: @A2 
  Types: Person 
  Facts: hasForename @A2(mm:capturing("(\S+)"))

The syntax of capturing expressions follows that supported by the Java Pattern class.

Literals

MappingMaster currently supports the following datatypes:

xsd:string, xsd:boolean, xsd:byte, xsd:short, xsd:int, xsd:long, xsd:float, xsd:double, xsd:integer, xsd:decimal, xsd:dateTime, xsd:date, xsd:time, xsd:Duration, rdf:PlainLiteral, rdf:XMLLiteral

IRIs

Mapping Master has several directives to customize the IRI creation process.

Directive Explanation
mm:iri Use the resolved reference value to generate an IRI. An error will be thrown if the generated value does not represent a valid IRI.
mm:camelCaseEncode  
mm:snakeCaseEncode  
mm:uuidEncode  
mm:hashEncode  

Missing Value Handling

To deal with missing cell values, default values can also be specified in references. A default value clause is provided to assign these values. This clause is indicated by the keywords mm:DefaultLocationValue, mm:DefaultLiteral, mm:DefaultLabel, and mm:DefaultID followed by an assignment to a string. For example, the following expression uses this clause to indicate that the value "Unknown" should be used as the created class label if cell A5 is empty:

  Class: @A5(rdfs:label mm:DefaultLabel="Unknown") 

Additional behaviors are also supported to deal with missing cell values. The default behavior is to skip an entire expression if it contains any references with empty cells. Four keywords are supplied to modify this behavior. These keywords indicate that:

  • An error should be thrown if a cell value is missing and the mapping process should be stopped (mm:ErrorIfEmptyLocation)
  • Expressions containing references with empty cells should be skipped (mm:SkipIfEmptyLocation)
  • Expressions containing references with empty cells should generate a warning in addition to being skipped (mm:WarningIfEmptyLocation)
  • Expressions containing such empty cells should be processed (mm:ProcessIfEmptyLocation).

The last option allows processing of spreadsheets that may contain a large number of missing values. The option indicates that the language processor should, if possible, conservatively drop the sub-expression containing the empty reference rather than dropping the entire expression.

Consider, for example, the following expression declaring an individual from cell A5 of a spreadsheet and associating a property hasAge with it using the value in cell A6:

  Individual: @A5 
  Facts: hasAge @A6(mm:ProcessIfEmptyLocation) 

Here, using the default skip behavior action, a missing value in cell A5 will cause the expression to be skipped. However, the process directive for the hasAge property value in cell A6 will instead drop only the sub-expression containing it if that cell is empty. So, if cell A5 contains a value and cell A6 is empty, the resulting expression will still declare an individual.
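
The difference between skipping and processing can be modelled in a few lines. This is a simplified Python sketch of the behavior described above, not the real processor: build_individual and its fact_setting argument are hypothetical, and only the hasAge example is covered.

```python
def build_individual(cells, name_ref, fact_ref, fact_setting="skip"):
    """Return the lines of the rendered declaration, or None if it is skipped.

    fact_setting models the directive on the fact reference only:
    "skip" (the default behavior) or "process".
    """
    name = cells.get(name_ref, "")
    if name == "":
        return None  # the individual reference uses the default skip behavior
    lines = [f"Individual: {name}"]
    age = cells.get(fact_ref, "")
    if age == "":
        if fact_setting == "process":
            return lines  # drop only the Facts sub-expression
        return None  # skip the whole expression
    lines.append(f"Facts: hasAge {age}")
    return lines


print(build_individual({"A5": "Fred", "A6": ""}, "A5", "A6", "process"))
print(build_individual({"A5": "Fred", "A6": ""}, "A5", "A6", "skip"))
```

With the process setting the first call still yields the individual declaration, while the second call, using skip, yields nothing.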

Using a similar approach, more fine-grained empty value handling is also supported to specify different empty value handling behaviors for mm:Literal, rdf:ID, and rdfs:label values. Here, the label directives are mm:ErrorIfEmptyLabel, mm:SkipIfEmptyLabel, mm:WarningIfEmptyLabel, and mm:ProcessIfEmptyLabel, with equivalent keywords for RDF identifier and literal handling. These are mm:ErrorIfEmptyID, mm:SkipIfEmptyID, mm:WarningIfEmptyID, mm:ProcessIfEmptyID and mm:ErrorIfEmptyLiteral, mm:SkipIfEmptyLiteral, mm:WarningIfEmptyLiteral, mm:ProcessIfEmptyLiteral.

Location Shifting

One additional option is provided to deal with empty cell values. This option is targeted to the common case in many spreadsheets where a particular cell is supplied with a value and all empty cells below it are implied to have the same value. In this case, when these empty cells are being processed, their location must be shifted to the location above it containing a value. For example, the following expression uses the mm:ShiftUp keyword to indicate that if cell A5 does not contain a value for the name of the declared class then the row number must be shifted upwards until a value is found:

  Class: @A5(mm:ShiftUp) 

If no value is found, normal empty value handling processing is applied. Similar directives provide for shifting down (mm:ShiftDown), and to allow shifting to the left (mm:ShiftLeft), or to the right (mm:ShiftRight).
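
Upward shifting amounts to a simple search along a column. The following Python sketch assumes a hypothetical cells dictionary keyed by (column, row) and a hypothetical helper shift_up; returning None stands for the point at which the normal empty value handling would take over.

```python
def shift_up(cells, column, row):
    """Walk upwards from (column, row) until a non-empty cell is found.

    Returns the value found, or None if every cell up to row 1 is empty.
    """
    while row >= 1:
        value = cells.get((column, row), "")
        if value != "":
            return value
        row -= 1
    return None


# Hypothetical column A: a value in row 3, empty cells in rows 4 and 5.
cells = {("A", 3): "Gum", ("A", 4): "", ("A", 5): ""}
print(shift_up(cells, "A", 5))  # Gum
```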

Iterating Over a Range of Cells in a Reference

Obviously, most mappings will not just reference individual cells but will instead iterate over a range of columns or rows in a spreadsheet. The wildcard character '*' can then be used in references to refer to the current column and/or row in an iteration. MappingMaster provides a graphical interface to specify these ranges. (They will soon be supported in the DSL.)

Example references using this wildcard notation include:

  • @A3
  • @A*
  • @**

For example, an expression that iterates over the grid D4 to G6 to create an individual of class Sale for each cell can be written:

  Individual: @**
  Types: Sale

This expression can be extended to assign property values to these individuals:

  Individual: @** 
  Types: Sale 
  Facts: hasAmount @**, 
         hasProduct @B*, 
         hasState @*2 
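
How the wildcard references resolve during an iteration over D4 to G6 can be sketched as follows. This is a Python illustration only: resolve and iterate are hypothetical helpers, and the row-by-row visiting order is an assumption, since the range itself is configured in the graphical interface.

```python
import re


def resolve(reference, column, row):
    """Substitute the current column and row for '*' in a reference."""
    match = re.fullmatch(r"@(\*|[A-Z]+)(\*|[0-9]+)", reference)
    if match is None:
        raise ValueError(f"unsupported reference: {reference!r}")
    ref_column, ref_row = match.groups()
    resolved_column = column if ref_column == "*" else ref_column
    resolved_row = str(row) if ref_row == "*" else ref_row
    return resolved_column + resolved_row


def iterate(columns, rows):
    """Yield every (column, row) pair of the range, row by row (assumed order)."""
    for row in rows:
        for column in columns:
            yield column, row


bindings = [
    {ref: resolve(ref, column, row) for ref in ("@**", "@B*", "@*2")}
    for column, row in iterate("DEFG", range(4, 7))
]
print(len(bindings))
print(bindings[0])
```

For the first cell, @** resolves to D4 (the sales amount), @B* to B4 (the product category in the same row), and @*2 to D2 (the state identifier in the same column).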

Manchester Syntax Coverage

The DSL does not support the entire Manchester Syntax. The following clauses are not currently supported:

  • OWL object property declarations
  • OWL data property declarations
  • OWL annotation property declarations
  • OWL datatype declarations
  • OWL literal type qualification
  • OWL disjoint classes
  • OWL equivalent and disjoint properties
  • OWL negative property assertions
  • OWL has key

Configuration Options

A set of global defaults can be specified for reference directives. The language has a number of clauses to specify these defaults.

The following examples illustrate the use of these clauses together with the current defaults.

  • mm:DefaultReferenceType Current default is Class. Other possible values include NamedIndividual, ObjectProperty, DataProperty, AnnotationProperty, and any XSD datatype.
  • mm:DefaultPropertyType Current default is ObjectProperty. Other possible values are DataProperty and AnnotationProperty.
  • mm:DefaultPropertyValueType Current default is xsd:string. If a (data or annotation) property value is expected, xsd:string is used.
  • mm:DefaultDataPropertyValueType Current default is xsd:string. Other possible values include any XSD datatype.
  • mm:DefaultValueEncoding Current default is rdf:ID. Other possible values are rdfs:Label, mm:Literal, and rdfs:Location.
  • mm:DefaultIRIEncoding Current default is mm:CamelCaseEncoding. Other possible values are mm:NoEncode, mm:NoSnakeCaseEncode, mm:UUIDEncode, and mm:HashEncode.
  • mm:DefaultShiftSetting Current default is mm:NoShift. Other possible values are mm:ShiftUp, mm:ShiftDown, mm:ShiftLeft, and mm:ShiftRight.
  • mm:DefaultEmptyLocationSetting Current default is mm:WarningIfEmptyLocation.
  • mm:DefaultEmptyLiteralSetting Current default is mm:WarningIfEmptyLiteral.
  • mm:DefaultEmptyRDFIDSetting Current default is mm:WarningIfEmptyRDFID.
  • mm:DefaultEmptyRDFSLabelSetting Current default is mm:WarningIfEmptyRDFSLabel.
  • mm:DefaultIfOWLEntityExistsSetting Current default is mm:ResolveIfOWLEntityExists.
  • mm:DefaultIfOWLEntityDoesNotExistSetting Current default is mm:CreateIfOWLEntityDoesNotExist.
  • mm:DefaultLocationValue Current default is "".
  • mm:DefaultLiteralValue Current default is "".
  • mm:DefaultRDFID Current default is "".
  • mm:DefaultRDFSLabel Current default is "".
  • mm:DefaultLanguage Current default is "".
  • mm:DefaultPrefix Current default is "".
  • mm:DefaultNamespace Current default is "".

Summary

The MappingMaster DSL allows OWL axioms and entities to be created from spreadsheet content. The use of the Manchester syntax allows these OWL entities to be related to each other in complex ways.

Declaratively specifying mappings in this way has several advantages. The writing of these mappings does not require any programming or scripting expertise. These mappings can be shared easily using the MappingMaster GUI, which can save and load these mappings. The mappings can also easily be executed repeatedly on different spreadsheets with the same structure.