Storing a two-dimensional table (decision table) in XML for efficient querying

Time: 2022-02-23 01:08:25

I need to implement a routing table that is keyed on a number of parameters.

For example, an incoming message carries the five attributes below; the last column is the target:

Customer  Txn Group  Txn Type  Sender  Priority  Target
UTI       CORP       ONEOFF    ABC     LOW       TRG1
UTI       GOV        ONEOFF    ABC     LOW       TRG2

What is the best way to represent this data in XML so that it can be queried efficiently?

I want to store this data in XML. Using Java, I would load it into memory, and when a message comes in I want to identify the target based on its attributes.

I'd appreciate any input.

Thanks, Manglu

5 Answers

#1


If you're loading it into memory, it doesn't really matter what form the XML takes - make it the easiest to read or write by hand, I would suggest. When you load it into memory, then you should transform it into an appropriate data structure. (The exact nature of the data structure would depend on the exact nature of the requirements.)

EDIT: This is to counter the arguments made in comments by Dimitre:

I'm not sure whether you thought I was suggesting that people implement their own hashtable - I certainly wasn't. Just keep a straight hashtable or perhaps a MultiMap for each column which you want to use as a key. Developers know how to use hashtables.

As for the runtime efficiency, which do you think is going to be more efficient:

  • You build some XSLT (and bear in mind this is foreign territory, at least relatively speaking, for most developers)

  • XSLT engine parses it. This step may be avoidable if you're using an XSLT library which lets you just parameterise an existing query. Even so, you've got some extra work to do.

  • XSLT engine hits hashtables (you hope, at least) and returns a node

  • You convert the node into a more useful data structure

Or:

  • You look up appropriate entries in your hashtable based on the keys you've been given, getting straight to a useful data structure

I think I'd trust the second one, personally. Using XSLT here feels like using a screwdriver to bash in a nail...
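For what it's worth, the "load it and index it in a hashtable" route could look roughly like the sketch below. It is only an illustration under assumptions: the `<routes>`/`<route>` layout, the attribute names, and the `RoutingTable` class are all made up for this example, and it uses one composite key over all five attributes rather than a map per column.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RoutingTable {
    // Hypothetical attribute names; they mirror the five columns in the question.
    private static final String[] KEY_ATTRS =
            {"customer", "txn-group", "txn-type", "sender", "priority"};

    // Composite key (the five attribute values, in order) -> target.
    private final Map<List<String>, String> targets = new HashMap<>();

    /** Parses the XML once and indexes every <route> element by its key attributes. */
    static RoutingTable load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            RoutingTable table = new RoutingTable();
            NodeList routes = doc.getElementsByTagName("route");
            for (int i = 0; i < routes.getLength(); i++) {
                Element route = (Element) routes.item(i);
                List<String> key = new ArrayList<>();
                for (String attr : KEY_ATTRS) {
                    key.add(route.getAttribute(attr));
                }
                table.targets.put(key, route.getAttribute("target"));
            }
            return table;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not load routing table", e);
        }
    }

    /** Returns the target for the given attribute values, or null if no route matches. */
    String lookup(String customer, String txnGroup, String txnType,
                  String sender, String priority) {
        return targets.get(Arrays.asList(customer, txnGroup, txnType, sender, priority));
    }

    public static void main(String[] args) {
        String xml = "<routes>"
                + "<route customer='UTI' txn-group='CORP' txn-type='ONEOFF'"
                + " sender='ABC' priority='LOW' target='TRG1'/>"
                + "<route customer='UTI' txn-group='GOV' txn-type='ONEOFF'"
                + " sender='ABC' priority='LOW' target='TRG2'/>"
                + "</routes>";
        RoutingTable table = load(xml);
        System.out.println(table.lookup("UTI", "GOV", "ONEOFF", "ABC", "LOW"));
    }
}
```

The XML is parsed exactly once at load time; each incoming message then costs a single hash lookup. If some columns need to act as keys on their own, a separate map (or multimap) per column would be the natural extension.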

#2


Here is a pure XML representation that can be processed very efficiently as is, without the need to be converted into any other internal data structure:

<table>
 <record Customer="UTI" Txn-Group="CORP" 
      Txn-Type="ONEOFF" Sender="ABC1" 
      Priority="LOW"  Target="TRG1"/>

 <record Customer="UTI" Txn-Group="Gov" 
      Txn-Type="ONEOFF" Sender="ABC2" 
      Priority="LOW"  Target="TRG2"/>


</table>

There is an extremely efficient way to query data in this format using the <xsl:key> instruction and the XSLT key() function:

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes"/>

 <xsl:key name="kRec" match="record"
  use="concat(@Customer,'+',@Sender)"/>

    <xsl:template match="/">
      <xsl:copy-of select="key('kRec', 'UTI+ABC2')"/>
    </xsl:template>
</xsl:stylesheet>

when applied to the above XML document, produces the desired result:

<record Customer="UTI" 
        Txn-Group="Gov" Txn-Type="ONEOFF" 
        Sender="ABC2" Priority="LOW" 
        Target="TRG2"/>

Do note the following:

  1. There can be multiple <xsl:key>s defined that identify a record using different combinations of values to be concatenated together (whatever will be considered "keys" and/or "primary keys").

  2. If an <xsl:key> is defined to use the concatenation of "primary keys" then a unique record (or no record) will be found when the key() function is evaluated.

  3. If an <xsl:key> is defined to use the concatenation of "non-primary keys", then more than one record may be found when the key() function is evaluated.

  4. The <xsl:key> instruction is the equivalent of defining an index in a database. This makes using the key() function extremely efficient.

  5. In many cases there is no need, for reasons of either understandability or efficiency, to convert the above XML form into an intermediate data structure.
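Since the question mentions Java, here is a rough sketch of how a key-based stylesheet like the one above might be driven through the standard `javax.xml.transform` API. The `lookup` stylesheet parameter, the `KeyLookup` class, and its method names are additions for this illustration and are not part of the stylesheet shown earlier; the parameter replaces the hard-coded `'UTI+ABC2'` so that one compiled stylesheet can serve every lookup.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class KeyLookup {
    static final String TABLE =
        "<table>"
      + "<record Customer='UTI' Txn-Group='CORP' Txn-Type='ONEOFF'"
      + " Sender='ABC1' Priority='LOW' Target='TRG1'/>"
      + "<record Customer='UTI' Txn-Group='Gov' Txn-Type='ONEOFF'"
      + " Sender='ABC2' Priority='LOW' Target='TRG2'/>"
      + "</table>";

    // Same key definition as above; the "lookup" parameter is an assumption of this sketch.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output omit-xml-declaration='yes'/>"
      + "<xsl:param name='lookup'/>"
      + "<xsl:key name='kRec' match='record' use=\"concat(@Customer,'+',@Sender)\"/>"
      + "<xsl:template match='/'>"
      + "<xsl:copy-of select=\"key('kRec', $lookup)\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Compiles the stylesheet once; the Templates object can be reused for every message. */
    static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSLT)));
        } catch (Exception e) {
            throw new IllegalStateException("Stylesheet did not compile", e);
        }
    }

    /** Parses the table once into a DOM tree that is kept in memory. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Table is not well-formed XML", e);
        }
    }

    /** Runs the stylesheet for one key value and returns the serialized matching records. */
    static String find(Templates templates, Document table, String key) {
        try {
            Transformer transformer = templates.newTransformer();
            transformer.setParameter("lookup", key);
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(table), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Lookup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(find(compile(), parse(TABLE), "UTI+ABC2"));
    }
}
```

Note that the result comes back as serialized XML, so the caller still has to pull the `Target` attribute out of it. Whether the key index is rebuilt on every `transform` call depends on the XSLT processor, so it is worth measuring before relying on this for per-message routing.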

#3


That depends on what is repeating and what could be empty. XML is not known for its efficient queryability, as it is neither fixed-length nor compact.

#4


I agree with the previous two posters - you should definitely not keep the internal representation of this data in XML when querying as messages come in.

The XML representation can be anything; you could do something like this:

<routes>
  <route customer="UTI" txn-group="CORP" txn-type="ONEOFF" .../>
  ...
</routes>

My internal representation would depend on the format of the message coming in, and the language. A simple representation would be a map, mapping a structure of data (i.e. the key fields from which the routing decision is made) to the info on the target route.

Depending on your performance requirements, you could keep the key/target information as strings, though in any high-performance system you'd probably want to do a straight memory comparison (in C/C++) or some form of integer comparison.
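As a rough illustration of the integer-comparison idea in Java, one option is to give each distinct attribute value a small integer id at load time and pack the five ids into a single `long` key. Everything here is an assumption made for the sketch: the `InternedRoutes` class, the 12-bit width per field, and the fixed count of five fields.

```java
import java.util.HashMap;
import java.util.Map;

final class InternedRoutes {
    private static final int FIELDS = 5;   // customer, txn group, txn type, sender, priority
    private static final int BITS = 12;    // 12 bits per field: up to 4095 distinct values

    private final Map<String, Integer> ids = new HashMap<>();   // attribute value -> small id
    private final Map<Long, String> targets = new HashMap<>();  // packed ids -> target

    /** Returns the id for a value, assigning the next free id (starting at 1) if it is new. */
    private int intern(String value) {
        Integer id = ids.get(value);
        if (id == null) {
            id = ids.size() + 1;
            if (id >= (1 << BITS)) {
                throw new IllegalStateException("Too many distinct attribute values");
            }
            ids.put(value, id);
        }
        return id;
    }

    /** Packs the field ids into one long, BITS bits per field, first field highest. */
    private static long pack(int[] fieldIds) {
        long key = 0;
        for (int id : fieldIds) {
            key = (key << BITS) | id;
        }
        return key;
    }

    /** Registers a route: five attribute values mapped to one target. */
    void add(String target, String... fields) {
        if (fields.length != FIELDS) {
            throw new IllegalArgumentException("Expected " + FIELDS + " fields");
        }
        int[] fieldIds = new int[FIELDS];
        for (int i = 0; i < FIELDS; i++) {
            fieldIds[i] = intern(fields[i]);
        }
        targets.put(pack(fieldIds), target);
    }

    /** Returns the target, or null if any value is unknown or no route matches. */
    String lookup(String... fields) {
        if (fields.length != FIELDS) {
            throw new IllegalArgumentException("Expected " + FIELDS + " fields");
        }
        int[] fieldIds = new int[FIELDS];
        for (int i = 0; i < FIELDS; i++) {
            Integer id = ids.get(fields[i]);
            if (id == null) {
                return null;   // value never seen at load time, so no route can match
            }
            fieldIds[i] = id;
        }
        return targets.get(pack(fieldIds));
    }

    public static void main(String[] args) {
        InternedRoutes routes = new InternedRoutes();
        routes.add("TRG1", "UTI", "CORP", "ONEOFF", "ABC", "LOW");
        routes.add("TRG2", "UTI", "GOV", "ONEOFF", "ABC", "LOW");
        System.out.println(routes.lookup("UTI", "CORP", "ONEOFF", "ABC", "LOW"));
    }
}
```

Translating each string to its id still costs a string hash per field, so this only pays off if messages already arrive with integer codes or if the ids can be cached alongside the parsed message. For modest volumes a plain string-keyed map is probably enough.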

#5


Yeah, your basic problem is that you're using "XML" and "efficient" in the same sentence.

Edit: No, seriously, yer killin' me. The fact that several people in this thread are using "highly efficient" to describe anything to do with operations on a data format that require string parsing just to find out where your fields are shows that several people in this thread do not even know what the word "efficient" means. Downvote me as much as you like for saying it. I can take it, coach.

