XQuery: How to count the number of times a value occurs in a sequence

Time: 2020-12-02 23:54:52

I know that the function count can be used to count the number of elements in a given sequence, like this:

count(result/actors/actor)

in this XML:

<result>
    <actors>
        <actor id="00000015">Anderson, Jeff</actor>
        <actor id="00000030">Bishop, Kevin</actor>
        <actor id="0000000f">Bonet, Lisa</actor>
        <actor id="916503207">Parillaud, Anne</actor>
        <actor id="916503208">Pitt, Brad</actor>
        <actor id="916503209">Freeman, Morgan</actor>
        <actor id="916503211">Domingo, Placido</actor>
        <actor id="916503210">Sharif, Omar</actor>
        <actor id="1337">Doqumenteriet2011</actor>
    </actors>
</result>

But what if I want to know how many times a value occurs in a given sequence?

For example, what if I want to know how many movies each actor (actorRef) appears in, given the following XML:

<videos>
    <video id="id1235AA0">
        <title>The Fugitive</title>
        <actorRef>00000003</actorRef>
        <actorRef>00000006</actorRef>
    </video>
    <video id="id1244100">
        <title>Enemy of the State</title>
        <actorRef>00000009</actorRef>
        <actorRef>0000000c</actorRef>
        <actorRef>0000000f</actorRef>
        <actorRef>00000012</actorRef>
    </video>
    <video id="id124E230">
        <title>Clerks</title>
        <actorRef>00000015</actorRef>
        <actorRef>00000018</actorRef>
        <actorRef>0000001b</actorRef>
    </video>
</videos>

I can easily produce a list of all the actors that appear, with each one repeated in the resulting sequence as many times as it occurs in the XML:

result/videos//actorRef

but I can't find a way to do what COUNT() and GROUP BY do together in SQL, i.e. get a list of the actors along with how many times each one occurs in the sequence produced by the XQuery line above.

How can I produce this list?

PS: The end goal is to find the actors that appeared in the most movies.

3 solutions

#1


3  

This kind of question isn't a good fit for a document store when you only store the list of actors in each video. I'd suggest also storing, for each actor, the list of videos that actor appears in. Then you'd only have to query for the actor with the most video elements.
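
To illustrate the suggestion, here is a minimal sketch assuming a hypothetical denormalized layout in which each <actor> element also lists the videos it appears in. The <videoRef> element name, the nesting and the data are assumptions for illustration, not part of the original documents. With that layout, finding the busiest actor needs no cross-referencing:

```xquery
(: Hypothetical denormalized data: each <actor> lists the videos it appears in.
   The <videoRef> element name is an assumed, illustrative choice. :)
let $actors :=
  <actors>
    <actor id="00000009">
      <videoRef>id1235AA0</videoRef>
      <videoRef>id1244100</videoRef>
    </actor>
    <actor id="0000000f">
      <videoRef>id1244100</videoRef>
    </actor>
  </actors>
(: highest number of videos listed under any single actor :)
let $max := max($actors/actor/count(videoRef))
(: ids of all actors reaching that maximum (ties are all returned) :)
return $actors/actor[count(videoRef) = $max]/@id/string()
```

On this made-up data the query returns "00000009".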

All that said, you can do it with the data you have; it just isn't going to be very fast. First you need a distinct list of actors. Then, for each actor, filter the videos that contain that actor and count them, and finally order by that count.

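(: $results must be bound to the document node whose root element is <videos> :)
declare variable $results external;

let $actors := fn:distinct-values($results/videos/video/actorRef)

for $actor in $actors
let $count := fn:count($results/videos/video[actorRef = $actor])
order by $count descending
return ($actor, $count)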
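
The query above assumes $results is already bound to the document. As a minimal, self-contained sketch, the variant below binds $results to a document node wrapping a trimmed-down copy of the sample data from answer #2 (titles and some actorRef elements dropped) and returns one element per actor. The <actor ref="..." movies="..."/> output shape is an illustrative choice, not part of the original answer:

```xquery
(: $results is bound to a document node wrapping a trimmed copy of the
   sample data, so the original paths ($results/videos/video/...) work. :)
let $results :=
  document {
    <videos>
      <video id="id1235AA0"><actorRef>00000003</actorRef><actorRef>00000009</actorRef></video>
      <video id="id1244100"><actorRef>00000009</actorRef><actorRef>0000000f</actorRef></video>
      <video id="id124E230"><actorRef>00000015</actorRef></video>
    </videos>
  }
for $actor in distinct-values($results/videos/video/actorRef)
(: number of videos whose actorRef list contains this actor :)
let $count := count($results/videos/video[actorRef = $actor])
(: descending, so the actor(s) with the most movies come first :)
order by $count descending
return <actor ref="{$actor}" movies="{$count}"/>
```

Only the first element, <actor ref="00000009" movies="2"/>, is fixed by the data; the relative order of the actors tied at one movie is implementation-dependent unless a secondary sort key is added.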

#2


3  

Here is a pure XPath 2.0 expression (XPath 2.0 is a proper subset of XQuery) that produces the sequence of actorRef values identifying the actors who appeared in the maximum number of movies:

 for $maxMovies in 
       max(for $actorId in distinct-values(/*/*/actorRef) 
            return
               count(index-of(/*/*/actorRef, $actorId))
           )
    return 
      (/*/*/actorRef)[index-of(/*/*/actorRef, .)[$maxMovies]]/string()

When this expression is evaluated on the following source XML document:

<videos>
    <video id="id1235AA0">
        <title>The Fugitive</title>
        <actorRef>00000003</actorRef>
        <actorRef>00000009</actorRef>
        <actorRef>0000000x</actorRef>
    </video>
    <video id="id1244100">
        <title>Enemy of the State</title>
        <actorRef>00000009</actorRef>
        <actorRef>0000000c</actorRef>
        <actorRef>0000000f</actorRef>
        <actorRef>00000012</actorRef>
    </video>
    <video id="id124E230">
        <title>Clerks</title>
        <actorRef>00000015</actorRef>
        <actorRef>00000018</actorRef>
        <actorRef>0000001b</actorRef>
    </video>
</videos>

The correct, wanted result is produced:

00000009
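
The final filter relies on a numeric predicate: index-of(..., .)[$maxMovies] yields a position only for values that occur at least $maxMovies times, and because that position is a number, the outer predicate keeps an item only when its own position equals it. Each most-frequent value is therefore returned exactly once. A small sketch on a made-up sequence of strings, with 3 standing in for $maxMovies:

```xquery
(: Made-up sequence for illustration; 3 plays the role of $maxMovies. :)
let $seq := ('a', 'b', 'a', 'c', 'a', 'b')
return (
  (: positions of 'a': 1, 3, 5 :)
  index-of($seq, 'a'),
  (: index-of($seq, .)[3] is empty for 'b' and 'c' (fewer than 3
     occurrences); for 'a' it is 5, and as a numeric predicate it keeps
     only the item at position 5, i.e. a single copy of 'a' :)
  $seq[index-of($seq, .)[3]]
)
```

The result is the sequence (1, 3, 5, 'a').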

Using XPath 3.0 (a proper subset of XQuery 3.0), the XPath 2.0 expression above can be written considerably shorter:

let $vSeq := /*/*/actorRef/string()
  return
    for $maxMovies in 
       max(for $actorId in distinct-values($vSeq) 
            return
              index-of($vSeq, $actorId) ! last()
           )
      return 
        $vSeq[index-of($vSeq, .)[$maxMovies]]
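
Here the ! last() step stands in for count(): for every item of the sequence returned by index-of, last() is the size of that sequence, so the occurrence count is produced once per occurrence, and max() then picks the highest. A minimal sketch with made-up values:

```xquery
(: Made-up values; '00000009' occurs twice. :)
let $vSeq := ('00000009', '0000000c', '00000009')
(: index-of returns (1, 3); for each of those two items last() is 2,
   so the whole expression returns (2, 2) :)
return index-of($vSeq, '00000009') ! last()
```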

The XPath 3.0 version with $maxMovies can be shortened even further by using the simple mapping operator (!) to avoid any for-expression:

let $vSeq := /*/*/actorRef/string(),
    $maxOccurs := 
      max(distinct-values($vSeq) ! count(index-of($vSeq, .)) ) 
  return 
    $vSeq[index-of($vSeq, .)[$maxOccurs]]

#3


0  

Tyler's answer is the best solution for what you're ultimately trying to achieve, so I'd go with that. But to answer the specific question of how to count the number of times a value occurs in a sequence: apply a predicate to the sequence to build a new sequence containing only the items equal to the value you care about, and then count that:

let $actors := result/videos//actorRef
for $actor in distinct-values($actors)
return
  ($actor, count($actors[. = $actor]))
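
Since the question asks for something like SQL's COUNT() with GROUP BY: XQuery 3.0 also provides a group by clause in FLWOR expressions, which is the closest equivalent. The sketch below is a different technique from the answers above; it assumes an XQuery 3.0 (or later) processor, inlines a trimmed-down copy of the sample data so it runs without a context document, and assumes an actor is never listed twice in the same video (it counts actorRef elements, not distinct videos):

```xquery
xquery version "3.0";
(: Trimmed sample data, inlined so the query needs no context document. :)
let $videos :=
  <videos>
    <video id="id1235AA0"><actorRef>00000003</actorRef><actorRef>00000009</actorRef></video>
    <video id="id1244100"><actorRef>00000009</actorRef><actorRef>0000000f</actorRef></video>
    <video id="id124E230"><actorRef>00000015</actorRef></video>
  </videos>
for $ref in $videos/video/actorRef
(: one group per distinct actorRef string value :)
group by $actor := string($ref)
(: after grouping, $ref is bound to every actorRef element in the group :)
let $count := count($ref)
(: most movies first; $actor breaks ties so the order is deterministic :)
order by $count descending, $actor
return <actor ref="{$actor}" movies="{$count}"/>
```

The first element returned is <actor ref="00000009" movies="2"/>; the remaining three actors each have movies="1".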
