Find all nodes that don't contain any text node

Time: 2021-07-14 18:04:16

With XPath (.NET), I'm trying to select all nodes that don't contain any text node.

Given this document:

<root>
  <node1>
    <node1a>Node 1A</node1a>
  </node1>
  <node2>Node 2</node2>
  <node3>
    <node3a>Node 3A</node3a>
    <node3b></node3b>
  </node3>
  <node4></node4>
  <node5>
    <node5A></node5A>
  </node5>
</root>

I'm trying to get these nodes:

<node3b></node3b>

<node4></node4>

<node5>
  <node5A></node5A>
</node5>

Note that overlapping subtrees are merged, so node5A should not be returned separately.

I would expect this to do the trick, but for some reason (which is probably obvious once someone points it out) it doesn't:

//*[count(//text()) = 0]

Note: I'm using XPath tester to try things out.

4 answers

#1


1  

Assuming your result example is really what you want (which does not entirely match the statement in the title), the suggestions in the other answers

//*[count(.//text()) = 0]

or the preferred way

//*[not(.//text())]

don't work, because the result is not what you expected:

<node3b />
<node4 />
<node5>
  <node5A />
</node5>
<node5A /> <!-- this node is not present in your example -->

If what you want is all subtrees without any text node that are not already included in another resulting subtree, the solution is this one:

//*[not(.//text())][not(ancestor::*[not(.//text())])]

The second predicate removes from the result every node that has at least one ancestor already included in the result.
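
To make the effect of the two predicates concrete, here is a minimal sketch that writes the same selection out by hand in Python. It is not an XPath evaluation: the standard-library `xml.etree.ElementTree` module does not support these predicates, so the helper names `has_text` and `topmost_text_free` are hypothetical. It also assumes that whitespace-only text (the indentation) does not count as a text node, which is how a parser that discards insignificant whitespace would see the document.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <node1>
    <node1a>Node 1A</node1a>
  </node1>
  <node2>Node 2</node2>
  <node3>
    <node3a>Node 3A</node3a>
    <node3b></node3b>
  </node3>
  <node4></node4>
  <node5>
    <node5A></node5A>
  </node5>
</root>"""


def has_text(element):
    """True if the element or any descendant holds non-whitespace text."""
    # itertext() yields every text and tail string inside the element.
    # Assumption: whitespace-only strings are treated as "no text node".
    return any(chunk.strip() for chunk in element.itertext())


def topmost_text_free(element):
    """Tags of text-free elements that have no text-free ancestor."""
    if not has_text(element):
        # The whole subtree is text-free: report it once and do not
        # descend, which plays the role of the second predicate.
        return [element.tag]
    found = []
    for child in element:
        found.extend(topmost_text_free(child))
    return found


root = ET.fromstring(XML)
result = topmost_text_free(root)
print(result)  # ['node3b', 'node4', 'node5']
```

Stopping the descent at the first text-free element is what keeps node5A from being reported a second time.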

#2


2  

Argh... just as I was posting, the solution cropped up:

//*[count(.//text()) = 0]

Explanation: the condition count(//text()) = 0 counts all text nodes in the whole document, starting from the root, so for this document the count is always greater than zero. To count from the current node, I needed to prefix a dot: count(.//text()) = 0
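
The difference between the two counts can be imitated with a short Python sketch. This is only an analogue written with the standard-library `xml.etree.ElementTree` module, not an XPath evaluation; the helper `count_text` is hypothetical, and whitespace-only text is assumed not to count as a text node.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <node1>
    <node1a>Node 1A</node1a>
  </node1>
  <node2>Node 2</node2>
  <node3>
    <node3a>Node 3A</node3a>
    <node3b></node3b>
  </node3>
  <node4></node4>
  <node5>
    <node5A></node5A>
  </node5>
</root>"""


def count_text(element):
    """Number of non-whitespace text chunks in the element's subtree."""
    return sum(1 for chunk in element.itertext() if chunk.strip())


root = ET.fromstring(XML)

# Analogue of count(//text()): the same document-wide number is used
# for every element, whatever the current node is.
absolute = {el.tag: count_text(root) for el in root.iter()}

# Analogue of count(.//text()): counted below each element in turn.
relative = {el.tag: count_text(el) for el in root.iter()}

matches_absolute = [tag for tag, n in absolute.items() if n == 0]
matches_relative = [tag for tag, n in relative.items() if n == 0]

print(matches_absolute)  # []
print(matches_relative)  # ['node3b', 'node4', 'node5', 'node5A']
```

With the document-wide count no element can ever match, while the relative count matches every text-free element, node5A included.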

Note that @jvverde correctly remarks that nodes can occur multiple times in the result set. So this expression is not an exact match for the conditions I mention, as node5A is in there twice:

<node3b></node3b>

<node4></node4>

<node5>
  <node5A></node5A>
</node5>

<node5A></node5A>

#3


1  

You could also use //*[.=''], since an element that contains no text has an empty string value.
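
In XPath 1.0 the string value of an element is the concatenation of all text nodes below it, so this test depends on whether the indentation survives as whitespace-only text nodes. The sketch below imitates the comparison in Python with the standard-library `xml.etree.ElementTree` module, which keeps that whitespace; `string_value` is a hypothetical helper, and the "stripped" variant is an assumption standing in for a parser that discards insignificant whitespace.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <node1>
    <node1a>Node 1A</node1a>
  </node1>
  <node2>Node 2</node2>
  <node3>
    <node3a>Node 3A</node3a>
    <node3b></node3b>
  </node3>
  <node4></node4>
  <node5>
    <node5A></node5A>
  </node5>
</root>"""


def string_value(element):
    """Concatenation of all text inside the element, like XPath's string(.)."""
    return "".join(element.itertext())


root = ET.fromstring(XML)

# Whitespace kept: node5 holds only indentation, so its value is not ''.
exact = [el.tag for el in root.iter() if string_value(el) == ""]

# Whitespace ignored (assumption): node5 now compares equal to ''.
stripped = [el.tag for el in root.iter() if string_value(el).strip() == ""]

print(exact)     # ['node3b', 'node4', 'node5A']
print(stripped)  # ['node3b', 'node4', 'node5', 'node5A']
```

In both variants node5A is still listed separately, so this test alone does not merge overlapping subtrees.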

#4


0  

You can also use the simpler and more readable

//*[not(.//text())]

or replace not(...) with empty(...) if you prefer (empty() requires XPath 2.0 or later).

Both forms are easy to optimize, so even simple XPath implementations should be able to evaluate them in a "fail-fast" manner (as soon as one text node is found, the predicate evaluates to false).
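
The "fail-fast" idea can be illustrated with a small Python sketch. It says nothing about what any particular XPath engine actually does: the document, the `visited` log and the two helper functions are all hypothetical, and the count-based variant is deliberately naive for contrast.

```python
import xml.etree.ElementTree as ET

XML = "<a><b>first</b><c>second</c><d>third</d></a>"

visited = []  # records every text chunk that gets inspected


def text_chunks(element):
    """Lazily yield the text chunks below element, logging each one."""
    for chunk in element.itertext():
        visited.append(chunk)
        yield chunk


def is_text_free(element):
    # any() stops pulling from the generator at the first non-blank chunk.
    return not any(chunk.strip() for chunk in text_chunks(element))


def is_text_free_by_count(element):
    # A naive count has to walk every chunk before comparing with zero.
    return sum(1 for chunk in text_chunks(element) if chunk.strip()) == 0


root = ET.fromstring(XML)

visited.clear()
fail_fast_result = is_text_free(root)
fail_fast_visits = len(visited)

visited.clear()
count_result = is_text_free_by_count(root)
count_visits = len(visited)

print(fail_fast_result, fail_fast_visits)  # False 1
print(count_result, count_visits)          # False 3
```

Both checks give the same answer; the existence test just stops after the first text chunk it finds.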
