What is the most efficient way to store name/value pairs in a MarkLogic database?

My application often needs to decorate values in the documents it serves, using a lookup table to fetch human-readable forms of various codes.

For example, <product_code>PC001</product_code> should be returned as <product_code code='PC001'>Widgets</product_code>. It's not always product_code; there are a few different types of code that need similar behaviour (some of them having just a few dozen values, some of them a few thousand).

What I want to know is what is the most efficient way to store that data in the database? I can think of two possibilities:

1) One document per code type, with many elements:

<product-codes>
  <product-code code = "PC001">Widgets</product-code>
  <product-code code = "PC002">Wodgets</product-code>
  <product-code code = "PC003">Wudgets</product-code>
</product-codes>

2) One document per code, each containing a <product-code> element as above.

(Obviously, both options would include sensible indexes)

Is either of these noticeably faster than the other? Is there another, better option?

My feeling is that it's generally better to keep one 'thing' per document since it's conceptually slightly cleaner and (I understand) better suited to ML's indexing, but in this case that seems like it would lead to a very large number of very small files. Is that something I should worry about?

2 solutions

#1

Anything that needs to be searched independently should be its own document or fragment. However, if you are just doing lookups then an element attribute range index should be very fast at returning values:

cts:element-attribute-range-query(xs:QName('product-code'), xs:QName('code'), '=', 'PC001')
=> 
Widgets

Using a range index the lookups will all occur from the same index regardless of how you chunk the documents. So unless you will need to use cts:search on product-code to retrieve the actual elements, it shouldn't matter how you chunk the documents.
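As a rough sketch of such a lookup, assuming a string range index already exists on the code attribute of product-code (with a collation matching the query's default), something like the following could fetch the label for a single code. The function name local:label is hypothetical, and the search runs over the whole database rather than a specific collection.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: returns the human-readable label for a code,
   or the empty sequence if the code is unknown.
   Assumes a string range index on product-code/@code. :)
declare function local:label($code as xs:string) as xs:string?
{
  cts:search(
    fn:collection()//product-code,
    cts:element-attribute-range-query(
      xs:QName("product-code"), xs:QName("code"), "=", $code)
  )[1]/fn:string()
};

local:label("PC001")
```

Note that this variant does go through cts:search to read the element text, so with one large document per code type the filtering step may have to inspect every product-code element in the matched document.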

#2

Another approach is to store a map that represents the name-value pairs.

let $m := map:map()
let $_ := map:put($m, 'a', 'fubar')
return document { $m }

This returns an XML representation of the hashmap, which can be stored directly in the database using xdmp:document-insert. You can turn an XML map back into a native map using map:map as a constructor function. The native map could also be memoized using xdmp:set-server-field.
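A minimal sketch of the read side, assuming the map document was stored earlier with xdmp:document-insert and that the codes are the map keys. The URI, the server-field name, and the function local:codes are all hypothetical names chosen for illustration.

```xquery
xquery version "1.0-ml";

(: Hypothetical URI and server-field name, for illustration only. :)
declare variable $LOOKUP-URI as xs:string := "/lookups/product-codes.xml";
declare variable $FIELD-NAME as xs:string := "product-code-map";

(: Returns the cached native map, rebuilding it from the stored XML
   representation the first time it is needed on this app server. :)
declare function local:codes() as map:map
{
  let $cached := xdmp:get-server-field($FIELD-NAME)
  return
    if (fn:exists($cached))
    then $cached
    else
      (: Assumes the document was stored earlier with
         xdmp:document-insert($LOOKUP-URI, document { $m }). :)
      xdmp:set-server-field(
        $FIELD-NAME,
        map:map(fn:doc($LOOKUP-URI)/map:map))
};

map:get(local:codes(), "PC001")
```

The server field is not refreshed when the stored document changes, so a real implementation would need some way to reset it after the lookup data is updated.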
