How to return only the first match in XPath?

Time: 2021-12-30 08:34:54

I tried to use XPath substring-after to grab the data after "Property ID:", but the result is not what I want. It shows all the results that matched Property ID, and I want only P-000324. Here is my code:

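<?php
$getURL = file_get_contents('http://realestate.com.kh/residential-for-rent-in-phnom-penh-daun-penh-phsar-chas-2-beds-apartment-1001192296/');
$dom = new DOMDocument();
@$dom->loadHTML($getURL); // @ suppresses parser warnings caused by real-world HTML
$xpath = new DOMXPath($dom);

// "." is the context node (the whole page), so this returns all page text after the first 'Property ID:'
echo $xpath->evaluate("normalize-space(substring-after(., 'Property ID:'))");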

So how can I make it return only the first result?

1 solution

#1

You can change your XPath expression so that it takes the string after 'Property ID:' from only the first p element containing that label, by using a position index ([1]).

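For example, the following XPath expression selects just the first p element whose text directly contains the string 'Property ID:' (strictly, in XPath 1.0 contains(text(), ...) tests only the first text-node child of each p). The parentheses matter: they make [1] apply to the whole node-set in document order, whereas //p[...][1] would select the first matching p inside each parent element: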

(//p[contains(text(),'Property ID:')])[1]
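A minimal sketch comparing the expression with and without the outer parentheses. The HTML below is hypothetical (two listing blocks, each with its own labelled p), invented for illustration rather than taken from the real page:

```php
<?php
// Hypothetical markup: two blocks, each holding one p with a "Property ID:" label.
$html = <<<HTML
<html><body>
<div><p>Property ID: P-000324</p></div>
<div><p>Property ID: P-000999</p></div>
</body></html>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// [1] applied to the parenthesised node-set: only the first match in document order.
$first = $xpath->query("(//p[contains(text(),'Property ID:')])[1]");

// [1] applied per parent: the first matching p child of each div, so both match here.
$perParent = $xpath->query("//p[contains(text(),'Property ID:')][1]");

echo $first->length, "\n";               // 1
echo $perParent->length, "\n";           // 2
echo $first->item(0)->textContent, "\n"; // Property ID: P-000324
```

Only the parenthesised form is guaranteed to return a single node however the page nests its paragraphs.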

Combining this with your requirement to return only the string that follows 'Property ID:' and nothing beyond P-000324:

echo $xpath->evaluate("normalize-space(substring-before(substring-after((//p[contains(text(),'Property ID:')])[1], 'Property ID:'), '–'))");

will echo just P-000324 as requested.
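A self-contained sketch of the same expression run against a hypothetical snippet instead of the live page. Like the expression itself, it assumes the ID is followed by an en dash; the sample writes that dash as the entity &#8211; so it decodes correctly whatever encoding loadHTML() assumes:

```php
<?php
// Hypothetical stand-in for the fetched listing page (not the real markup).
// &#8211; is the en dash (–) that the expression below expects after the ID.
$html = <<<HTML
<html><body>
<p>Property ID: P-000324 &#8211; Daun Penh, Phnom Penh</p>
<p>Property ID: P-000999 &#8211; Another listing</p>
</body></html>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Inside out: first matching p, text after the label, text before the en dash, trimmed.
$id = $xpath->evaluate("normalize-space(substring-before(substring-after((//p[contains(text(),'Property ID:')])[1], 'Property ID:'), '–'))");

echo $id, "\n"; // P-000324
```

If the real page does not declare its charset, loadHTML() may read the raw bytes as ISO-8859-1, in which case a literal – in the page no longer matches the '–' in the expression.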

Update: This solves the problem for the page as it was originally presented, but the comments suggest the goal is broader. A more robust approach is to use only the first expression to get the string value of the first paragraph containing 'Property ID:', and then apply a regular expression right after the label that matches the usual form of the property ID (or the usual delimiters around it). You'll have to use the regex facilities of the host language, because XPath 1.0's string functions are very limited (and PHP's DOMXPath supports only XPath 1.0); XPath 2.0 is much better here and includes regex functions such as matches(), replace() and tokenize().
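A minimal sketch of that approach in PHP: XPath only selects the first labelled paragraph, and preg_match() does the string work. The function name firstPropertyId and the ID pattern P-\d+ are assumptions for illustration; adjust the pattern to the real ID format:

```php
<?php
// Hypothetical helper: returns the first property ID on the page, or null if none.
// Assumes IDs look like "P-" followed by digits.
function firstPropertyId(string $html): ?string
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    // XPath does only the node selection: string value of the first matching p.
    $text = $xpath->evaluate("string((//p[contains(text(),'Property ID:')])[1])");

    // Regex does the string work: label, optional whitespace, then the ID,
    // whatever delimiter (dash, en dash, comma, nothing) follows it.
    if (preg_match('/Property ID:\s*(P-\d+)/u', $text, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Hypothetical inputs with different trailing delimiters.
echo firstPropertyId('<p>Property ID: P-000324 &#8211; Daun Penh</p>'), "\n"; // P-000324
echo firstPropertyId('<p>Property ID:P-000324, 2 beds</p>'), "\n";            // P-000324
var_dump(firstPropertyId('<p>No ID here</p>'));                                // NULL
```

Keeping the pattern in PHP means it can be tightened or loosened without touching the XPath.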
