XML, S-expressions, and overlapping scopes... what is it called?

Time: 2021-10-07 09:05:10

I was reading "XML is not S-Expressions". XML scoping is fairly strict, as are S-expressions. And in every programming language I've seen, you can't have the following:

<b>BOLD <i>BOTH </b>ITALIC</i> == BOLD BOTH ITALIC

It's not even expressible with S-Expressions:

(bold "BOLD" (italic "BOTH" ) "ITALIC" ) == :(
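
As a quick check of how strict XML is here, the sketch below feeds the overlapping markup to the parser in Python's standard library. The `<p>` wrapper element and the helper name `is_well_formed` are assumptions made for illustration; they are not part of the question.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the string parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Raised for a mismatched closing tag, among other errors.
        return False


overlapping = "<p><b>BOLD <i>BOTH </b>ITALIC</i></p>"
# The usual workaround: close and reopen <i> so that everything nests.
nested = "<p><b>BOLD <i>BOTH </i></b><i>ITALIC</i></p>"

print(is_well_formed(overlapping))  # False
print(is_well_formed(nested))       # True
```

The workaround keeps the rendering but loses the fact that the two italic fragments were one logical range, which is the information the overlapping form was trying to express.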

Does any programming language support this kind of "overlapping" scoping? Could there be any practical use for it?

2 answers

#1


3  

Overlapping markup structures have many practical uses. Consider, for example, applications of concurrent markup for text analysis in the humanities. The International Workshop on Markup of Overlapping Structures noted that:

Overlapping structures are ubiquitous, appearing in applications of textual markup as varied as aircraft maintenance manuals and ancient scriptural and liturgical works. The “overlap issue” raises its ugly head whenever text encoding looks beyond the snapshot view of a particular hierarchy to represent and process multiple concurrent aspects of a text, including features that reflect the text’s evolution across multiple versions and variants whether typographic or presentational, structural, annotational or referential, taxonomic or topical.

Overlap is a problem in texts as diverse as technical documents and product manuals (versioning), legal codes (effectivity), literary works (prosodic versus dramatic structure, rhetorical structures, annotation), sacred texts (chapter plus verse reference versus sentence structure and commentary), and language corpora (multiple layers of linguistic annotation).

The Text Encoding Initiative (TEI) publishes Guidelines to handle non-nesting information and provides an XML syntax for overlap. They stated in 2004 that:

[N]o solution has yet been suggested which combines all the desirable attributes of formal simplicity, capacity to represent all occurring or imaginable kinds of structures, suitability for formal or mechanical validation, and clear identity with the notations needed for simpler cases (i.e. cases where the textual features do nest properly).

Some options to handle overlapping structures include:

SGML has a CONCUR feature that can be used to support overlapping structures, although Goldfarb (the author of the standard) writes: "I therefore recommend that CONCUR not be used to create multiple logical views of a document".

GODDAG provides a data structure for representing documents with overlapping structures.

XCONCUR is an experimental markup language with the major goal to provide a convenient method to express concurrent hierarchies in an XML-like fashion.
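
To give a feel for what any of these approaches has to represent, here is a minimal sketch that stores each styled range separately from the text, in the spirit of stand-off annotation. It does not implement CONCUR, GODDAG, or XCONCUR; the names `Span` and `runs` are hypothetical and chosen only for this example.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    style: str


def runs(text, spans):
    """Split text into consecutive runs, each with its set of active styles."""
    # Every span boundary is a point where the active style set may change.
    cuts = sorted({0, len(text)} | {s.start for s in spans} | {s.end for s in spans})
    result = []
    for a, b in zip(cuts, cuts[1:]):
        active = frozenset(s.style for s in spans if s.start <= a and b <= s.end)
        result.append((text[a:b], active))
    return result


text = "BOLD BOTH ITALIC"
# The two ranges overlap on "BOTH " and neither contains the other.
spans = [Span(0, 10, "bold"), Span(5, 16, "italic")]

for chunk, styles in runs(text, spans):
    print(repr(chunk), sorted(styles))
```

Because the ranges live outside the text, nothing forces them to nest; the cost is that the structure is no longer a single tree that a standard XML toolchain can validate.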

#2


2  

There probably isn't any programming language that supports overlapping scopes in its formal definition. While technically possible, it would make the implementation more complex than it needs to be. It would also make the language ambiguous, accepting as valid what is very likely a mistake.

The only practical benefit I can think of right now is that it requires less typing and reads more intuitively, just as writing attributes in markup feels more intuitive without unnecessary quotes, as in <foo id=45 /> instead of <foo id="45" />.

I think that enforcing nested structures makes for more efficient processing, too. With strict nesting, the parser can push nodes onto and pop them off a single stack to keep track of the open nodes. With overlapped scopes, you'd need an ordered list of open scopes: you'd append to it whenever you come across a begin-new-scope token, and scan it each time you come across an end-scope token to see which open scope is most likely the one being closed.
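
The difference between the two bookkeeping strategies can be sketched as follows. The token format (a list of `("open" | "close", name)` pairs) and the function name `check` are made up for this illustration and do not come from any real parser.

```python
def check(tokens, allow_overlap=False):
    """Return True if every close token matches an open scope and none is left open.

    tokens is a list of (kind, name) pairs, where kind is "open" or "close".
    """
    open_scopes = []
    for kind, name in tokens:
        if kind == "open":
            open_scopes.append(name)
        elif not allow_overlap:
            # Strict nesting: only the top of the stack may be closed, O(1).
            if not open_scopes or open_scopes[-1] != name:
                return False
            open_scopes.pop()
        else:
            # Overlap allowed: scan from the most recent scope backwards, O(depth).
            for i in range(len(open_scopes) - 1, -1, -1):
                if open_scopes[i] == name:
                    del open_scopes[i]
                    break
            else:
                return False  # nothing with that name is open
    return not open_scopes


overlapping = [("open", "b"), ("open", "i"), ("close", "b"), ("close", "i")]
nested = [("open", "b"), ("open", "i"), ("close", "i"), ("close", "b")]

print(check(nested))                           # True
print(check(overlapping))                      # False
print(check(overlapping, allow_overlap=True))  # True
```

Note that the scanning version has to guess when two open scopes share a name; this sketch picks the most recently opened one, which is only one possible policy.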

Although no programming language appears to support overlapping scopes, some HTML parsers tolerate them as part of their error-recovery algorithms, including the parsers in all major browsers.
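
As a rough illustration of this tolerance, the sketch below runs the mis-nested markup from the question through Python's standard-library `html.parser`, which reports tags as events and does not reject bad nesting. This is not the recovery algorithm browsers use (they rebuild a properly nested tree, following the HTML specification); the class name `StyleTracker` and its bookkeeping are assumptions made for this example.

```python
from html.parser import HTMLParser


class StyleTracker(HTMLParser):
    """Record each text chunk together with the tags open at that point."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # ordered list of open tags, not a strict stack
        self.runs = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recently opened matching tag, wherever it sits.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break

    def handle_data(self, data):
        self.runs.append((data, tuple(self.open_tags)))


tracker = StyleTracker()
tracker.feed("<b>BOLD <i>BOTH </b>ITALIC</i>")
tracker.close()

for chunk, tags in tracker.runs:
    print(repr(chunk), tags)
```

The parser itself raises no error for `</b>` arriving while `<i>` is still open; deciding what that means is left entirely to the handler code.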

Also, the switch statement in C allows for constructs that look something like overlapping scopes, as in Duff's Device:

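/* Duff's Device: copy count shorts from `from` to the output location `to`. */
void send(volatile short *to, const short *from, int count)
{
    int n = (count + 7) / 8;   /* passes through the unrolled loop; assumes count > 0 */
    switch (count % 8)
      {
       case 0:  do{ *to = *from++;   /* `to` is not incremented: a memory-mapped register in Duff's original */
       case 7:      *to = *from++;
       case 6:      *to = *from++;
       case 5:      *to = *from++;
       case 4:      *to = *from++;
       case 3:      *to = *from++;
       case 2:      *to = *from++;
       case 1:      *to = *from++;
                  } while (--n > 0);   /* the do-while body straddles the case labels */
      }
}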

So, in theory, a programming language could give scopes in general similar semantics to allow these kinds of optimization tricks when needed, but readability would be very low.

The goto statement, along with break and continue in some languages, also lets you structure programs to behave like overlapped scopes:

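import java.util.LinkedHashSet;
import java.util.Set;

public class OverlapDemo {
    public static void main(String[] args) {
        Set<String> styles = new LinkedHashSet<>();

        BOLD:
        while (true) {                      // the "bold" scope opens
            styles.add("bold");
            System.out.println("BOLD " + styles);

            while (true) {                  // the "italic" scope opens inside it
                styles.add("italic");
                System.out.println("BOTH " + styles);
                break BOLD;                 // leave the bold scope from inside the italic one
            }
        }

        // italic continued: bold has ended, italic is still in effect
        styles.remove("bold");
        System.out.println("ITALIC " + styles);
    }
}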
