How to access the text of an HTML element using Kanna

Time: 2022-10-30 10:42:11

Given HTML of the following form (generated outside of my control), how would I extract the text 'What I wanted' using Kanna?

<div class="entry-meta">
    \n\t\t\t<p>
        <span class="tags-links">
            <a href="http://example.com" rel="tag">This is not</a>
        </span>
    </p>
    What I wanted\t\t
</div>\n

(The \n and \t sequences appear in the original source and are included here only for completeness; I can strip them with .trimmingCharacters(in:).)

Given I have an XMLElement representing that div node (nodes are XMLElements in Kanna, regardless of source data type), I've tried various ways of extracting the text 'What I wanted', but both .text and .content return 'This is not What I wanted'.

I was previously using Hpple, but it's not as Swifty and takes a lot more work to use. Given a reference to the same node, Hpple yields the expected text via (node.children.last as! TFHppleElement).content. Looking at Kanna's source, however, both .content and .text appear to return the result of libxmlGetNodeContent(nodePtr).

Is there another approach that I'm missing, or is this a shortcoming in Kanna?

1 answer

#1

Kanna lets you select nodes using XPath expressions. The node you want is the second text-node child of that div (the first is the whitespace before the <p>), so you should be able to get it with this:

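// text()[2] selects the div's second direct text-node child; .first?.text reads its string
let wanted = divElement.xpath("text()[2]").first?.text?.trimmingCharacters(in: .whitespacesAndNewlines)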
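To illustrate why .text/.content return the combined string while text()[2] isolates the trailing text, here is a self-contained sketch that does not use Kanna. It models the div from the question with a hypothetical Node type (not Kanna's API). Its `content` property imitates how libxml2's xmlNodeGetContent concatenates all descendant text of an element, and `directTextNodes` imitates the XPath text() step, which selects only direct text-node children. Indentation whitespace inside <p> and <span> is omitted for brevity.

```swift
import Foundation

// Hypothetical minimal DOM model for illustration only; this is NOT Kanna's API.
enum Node {
    case text(String)
    case element(name: String, children: [Node])

    // Concatenates every descendant text node in document order,
    // similar to libxml2's xmlNodeGetContent on an element node.
    var content: String {
        switch self {
        case .text(let s):
            return s
        case .element(_, let children):
            return children.map { $0.content }.joined()
        }
    }

    // Returns only the direct text-node children, like the XPath step text().
    var directTextNodes: [String] {
        guard case .element(_, let children) = self else { return [] }
        return children.compactMap { child -> String? in
            if case .text(let s) = child { return s }
            return nil
        }
    }
}

// The div from the question: leading whitespace, a <p> wrapping the link, then the wanted text.
let div = Node.element(name: "div", children: [
    .text("\n\t\t\t"),
    .element(name: "p", children: [
        .element(name: "span", children: [
            .element(name: "a", children: [.text("This is not")])
        ])
    ]),
    .text("What I wanted\t\t"),
])

// content includes the <a> text too, which is the behavior described in the question.
print(div.content.contains("This is not"))  // true

// XPath positions are 1-based, so text()[2] corresponds to index 1 here.
let wanted = div.directTextNodes[1].trimmingCharacters(in: .whitespacesAndNewlines)
print(wanted)  // What I wanted
```

The model only mirrors the tree shape; in Kanna itself the selection is done by the XPath expression above, and libxml2 performs the traversal.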