XPath: get the parent tag by its child tag

Date: 2022-11-27 18:23:15

I am scraping a website that does not organize its information well. Some pages have fields called "Transmission" and "Engine Type" and some don't. The problem is that everything sits inside p tags, each with a span tag holding the title, such as Transmission or Engine Type.

It will be clearer if I show you.

Sometimes certain fields are present and sometimes they aren't.

On one page, Engine Type and Transmission are present in the vehicle information.

On another page, Engine Type and Transmission are not present in the vehicle information.

There isn't any reliable way to map the fields across all pages, because the fields get shuffled. For example, I try to reach the engine type text with this XPath:

'.//div[@id="result"]/div[@class="details"][2]/p[2]/text()'

Most of the time I get a different value, because the fields inside the p tags appear in a different order on each page: sometimes this XPath returns the transmission and sometimes the engine type.

So is there a way to get the desired fields using the span titles right beside them?

The markup looks like this:

<div class="details">
    <p><span class="label">Chassis/VIN #:</span>017S</p>
    <p><span class="label">Displacement:</span>0 </p>
    <p><span class="label">Odometer:</span>79,111</p>
    <p><span class="label">Condition:</span><a href="#condition-rating">2-</a>
    </p>
    <p><span class="label">Body Style:</span>coupe</p>
</div>

Every p tag has a span tag as its title. Is there a way to get the p tag's data by its span title?

For example, can I get the Engine Type value from its p tag by matching the span title text "Engine Type"?

There is a way to select an element by its text in XPath, like this:

"//*[contains(text(), 'The Text Associated With The Element')]/text()"

Is there a way to do something like this here?

1 Answer

#1

You can get a list of title/value pairs with:

//div[@class="details"]/p//text()

Output:

Chassis/VIN #: 
017S
Displacement: 
0 
Odometer: 
79,111
Condition: 
2-  
Body Style: 
coupe
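The same title/value idea can be sketched in Python. This is a minimal sketch, assuming the details markup is well-formed enough for the standard-library `xml.etree.ElementTree` parser; real pages usually need an HTML parser such as lxml, whose `.xpath()` method accepts the XPath strings above unchanged. ElementTree supports only a subset of XPath (no `text()` step), so the sketch walks the tree instead. It turns each `<p>` into a label/value entry, so a missing field is simply absent rather than shifting the positions of the others. `text_after` and `details_to_dict` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question; it happens to be well-formed XML.
HTML = """<div class="details">
    <p><span class="label">Chassis/VIN #:</span>017S</p>
    <p><span class="label">Displacement:</span>0 </p>
    <p><span class="label">Odometer:</span>79,111</p>
    <p><span class="label">Condition:</span><a href="#condition-rating">2-</a>
    </p>
    <p><span class="label">Body Style:</span>coupe</p>
</div>"""


def text_after(p, span):
    """Join the text that follows `span` inside `p`, including text in nested tags such as <a>."""
    parts = [span.tail or ""]
    seen_span = False
    for child in p:  # direct children of <p> only
        if seen_span:
            parts.append("".join(child.itertext()))
            parts.append(child.tail or "")
        elif child is span:
            seen_span = True
    return "".join(parts).strip()


def details_to_dict(details):
    """Map each span label (e.g. "Odometer:") to the value next to it."""
    fields = {}
    for p in details.iter("p"):
        span = p.find("span")
        if span is None:
            continue  # a <p> without a label span is skipped
        fields[(span.text or "").strip()] = text_after(p, span)
    return fields


details = ET.fromstring(HTML)  # the <div class="details"> element itself
print(details_to_dict(details))
```

With the dict in hand, `details_to_dict(details).get("Engine Type:")` returns `None` on pages where that field is missing (assuming the site's label text is exactly "Engine Type:"), instead of silently picking up whichever field landed in that position.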

If you want to get a specific value by its title, e.g. "Odometer:":

//div[@class="details"]/p[span="Odometer:"]/text()

Output:

79,111
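Python's standard-library ElementTree happens to support the `[tag='text']` predicate, which performs the same test as `p[span="Odometer:"]`. One caveat: for "Condition:" the value sits inside an `<a>`, so a plain `/text()` step on the `<p>` would return only the whitespace after the link. The sketch below therefore joins all text after the span. It assumes the same sample markup and one label span per `<p>`; `get_field` is a hypothetical name. A missing field returns `None`, which covers pages without Engine Type or Transmission.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Same sample markup as in the question.
HTML = """<div class="details">
    <p><span class="label">Chassis/VIN #:</span>017S</p>
    <p><span class="label">Displacement:</span>0 </p>
    <p><span class="label">Odometer:</span>79,111</p>
    <p><span class="label">Condition:</span><a href="#condition-rating">2-</a>
    </p>
    <p><span class="label">Body Style:</span>coupe</p>
</div>"""


def get_field(details: ET.Element, label: str) -> Optional[str]:
    """Return the value next to the span whose text is exactly `label`, or None if absent."""
    # [span='...'] keeps a <p> only if a <span> child's full text equals the label,
    # mirroring p[span="Odometer:"] in XPath. A label containing a single quote
    # would need different quoting.
    p = details.find(f".//p[span='{label}']")
    if p is None:
        return None  # field not present on this page
    span = p.find("span")  # assumes one label span per <p>
    children = list(p)
    parts = [span.tail or ""]
    for child in children[children.index(span) + 1:]:
        parts.append("".join(child.itertext()))  # e.g. the "2-" inside <a>
        parts.append(child.tail or "")
    return "".join(parts).strip()


details = ET.fromstring(HTML)
print(get_field(details, "Odometer:"))     # 79,111
print(get_field(details, "Condition:"))    # 2-
print(get_field(details, "Engine Type:"))  # None
```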
