Given HTML of the following form (generated outside of my control), how would I extract the text 'What I wanted' using Kanna?
<div class="entry-meta">
\n\t\t\t<p>
<span class="tags-links">
<a href="http://example.com" rel="tag">This is not</a>
</span>
</p>
What I wanted\t\t
</div>\n
(The \ns and \ts are in the original source, so they are included here only for completeness; I can remove them using .trimmingCharacters(in:).)
Given I have an XMLElement representing that div node (nodes are XMLElements in Kanna, regardless of source data type), I've tried various ways of extracting the text 'What I wanted', but both .text and .content return 'This is not What I wanted'.
I was previously using Hpple, but it's not as Swifty and requires a lot more work to use. Given a reference to the same node, Hpple would yield the expected text via (node.children.last as! TFHppleElement).content, but looking into the source of Kanna, it looks like .content and .text both return the result of libxmlGetNodeContent(nodePtr).
Is there another approach that I'm missing, or is this a shortcoming in Kanna?
Kanna lets you select nodes using XPath expressions, and the node you want is the second text-node child of that div element, so you should be able to get it with this:
divElement.xpath("text()[2]")
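To see why .text and .content pick up 'This is not' while text()[2] does not, here is a dependency-free toy model in Swift. The `Node` enum and the `content(of:)` and `textChildren(of:)` helpers are hypothetical illustrations, not Kanna or libxml2 APIs. Element content concatenates every descendant text node, including the link text nested inside <p>, whereas the XPath `text()` step only considers the div's direct text-node children: the whitespace before <p> (position 1) and the text after </p> (position 2).

```swift
import Foundation

// Toy model of a DOM subtree. These are NOT Kanna's or libxml2's real types.
enum Node {
    case text(String)
    case element(name: String, children: [Node])
}

// Roughly what libxml2's node-content lookup returns for an element:
// the text of every descendant text node, concatenated in document order.
func content(of node: Node) -> String {
    switch node {
    case .text(let s):
        return s
    case .element(_, let children):
        return children.map { content(of: $0) }.joined()
    }
}

// Roughly what the XPath step `text()` selects: only the direct
// text-node children of the context node, not text nested deeper.
func textChildren(of node: Node) -> [String] {
    guard case .element(_, let children) = node else { return [] }
    return children.compactMap { child -> String? in
        if case .text(let s) = child { return s }
        return nil
    }
}

// The div from the question (whitespace-only nodes inside <p> omitted).
let div = Node.element(name: "div", children: [
    .text("\n\t\t\t"),
    .element(name: "p", children: [
        .element(name: "span", children: [
            .element(name: "a", children: [.text("This is not")]),
        ]),
    ]),
    .text("\nWhat I wanted\t\t\n"),
])

// Whole-node content mixes in the link text from inside <p>.
let whole = content(of: div)

// XPath positions are 1-based, so text()[2] corresponds to index 1 here.
let wanted = textChildren(of: div)[1]
    .trimmingCharacters(in: .whitespacesAndNewlines)

print(wanted)  // What I wanted
```

In Kanna itself the selected text node will most likely still carry the surrounding tabs and newlines, so the .trimmingCharacters(in:) call mentioned in the question is probably still needed. If the parser ever drops the leading whitespace-only node, text()[2] would no longer match; text()[last()] may be a more robust XPath in that case.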