On this page I would like Selenium for Python to grab the text contents of the "Investment Objective" section, excluding the <h3> header. I want to use XPath.
The nodes look like this:
<div class="carousel-content column fund-objective">
<h3 class="carousel-header">INVESTMENT OBJECTIVE</h3>
The Fund seeks to track the performance of an index composed of 25 of the largest Dutch companies listed on NYSE Euronext Amsterdam.
</div>
To retrieve the text, I'm using:
string = driver.find_element_by_xpath(xpath).text
If I use this XPath for the top node:
xpath = '//div[@class="carousel-content column fund-objective"]'
It works, but it includes the <h3> header INVESTMENT OBJECTIVE, which I want to exclude.
However, if I try to use /text() to address the actual text content, Selenium for Python doesn't seem to let me grab it while using .text to get the text:
xpath = '//div[@class="carousel-content column fund-objective"]/text()'
Note that there seem to be multiple nodes matching this XPath on this particular page, so I'm specifying the correct node like this:
xpath = '(//div[@class="carousel-content column fund-objective"]/text())[2]'
My interpretation of the problem is that .text doesn't allow me to retrieve the text contents of the XPath sub-node text(). My apologies for any incorrect terminology.
3 Answers
#1
4
/text() locates and returns a text node, which is not an element node. Selenium's find_element methods can only return elements, and a text node has no text property.
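To make the element/text-node distinction concrete, here is a minimal sketch that runs outside the browser. It assumes the snippet from the question is well-formed enough to parse with the standard library's xml.etree.ElementTree (not Selenium); there, the loose text after </h3> is not an element of its own but shows up as the "tail" of the header element.

```python
import xml.etree.ElementTree as ET

# The markup from the question, parsed as XML purely for illustration.
SNIPPET = """<div class="carousel-content column fund-objective">
<h3 class="carousel-header">INVESTMENT OBJECTIVE</h3>
The Fund seeks to track the performance of an index composed of 25 of the largest Dutch companies listed on NYSE Euronext Amsterdam.
</div>"""

div = ET.fromstring(SNIPPET)
header = div.find("h3")  # an element node: it has a tag and attributes

# The sentence after </h3> is a text node, not an element, so ElementTree
# exposes it as the tail of the preceding element.
objective = (header.tail or "").strip()

print(header.tag)
print(objective)
```

The same reasoning explains the Selenium behaviour: an XPath ending in /text() selects that loose text, and there is no element object to hand back.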
One solution is to locate both elements and remove the unwanted header text from the combined text:
xpath = '//div[@class="carousel-content column fund-objective"]'
element = driver.find_element_by_xpath(xpath)
all_text = element.text
title_text = element.find_element_by_xpath('./*[@class="carousel-header"]').text
# str.replace returns a new string, so keep the result; strip leftover whitespace
objective = all_text.replace(title_text, '', 1).strip()
#2
2
You can try the code below to get the required output:
div = driver.find_element_by_xpath('(//div[@class="carousel-content column fund-objective"])[2]')
driver.execute_script('return arguments[0].lastChild.textContent;', div).strip()
The output is:
'The Fund seeks to track the performance of an index composed of 25 of the largest Dutch companies listed on NYSE Euronext Amsterdam.'
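If the wanted text is not guaranteed to be the last child, a variation on the same idea is to collect every direct text-node child of the element. This is only a sketch: own_text is a hypothetical helper name (not part of Selenium), and driver and element are assumed to be the WebDriver and WebElement objects from the code above.

```python
# JavaScript run in the browser: keep only the direct text-node children
# of the element and concatenate their contents.
OWN_TEXT_JS = """
return Array.from(arguments[0].childNodes)
    .filter(function (node) { return node.nodeType === Node.TEXT_NODE; })
    .map(function (node) { return node.textContent; })
    .join('');
"""


def own_text(driver, element):
    """Return the text directly inside `element`, ignoring child elements.

    Hypothetical helper: it only wraps driver.execute_script.
    """
    return driver.execute_script(OWN_TEXT_JS, element).strip()
```

It would be called as own_text(driver, div), with div located as in the snippet above.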
#3
1
To retrieve the text "The Fund seeks to track the performance of an index composed of 25 of the largest Dutch companies listed on NYSE Euronext Amsterdam.", you can use the following line of code:
string = driver.find_element_by_xpath("//div[@class='carousel-content column fund-objective' and not (@class='carousel-header')]").text