How do I perform a case-insensitive search with XPath that also supports non-English characters?

Date: 2021-08-23 20:21:03

I am performing a search in an XML file, using the following code:

$result = $xml->xpath("//StopPoint[contains(StopName, '$query')]");

Where $query is the search query, and StopName is the name of a bus stop. The problem is that the search is case-sensitive.

I would also like to be able to search with non-English characters such as ÆØÅæøå, so that Norwegian names are returned.

How can this be done?

4 Answers

#1


12  

In XPath 1.0 (which is, I believe, the best you can get with PHP SimpleXML), you'd have to use the translate() function to produce all-lowercase output from mixed-case input.

For convenience, I would wrap it in a function like this:

function findStopPointByName($xml, $query) {
  $upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ"; // add any characters...
  $lower = "abcdefghijklmnopqrstuvwxyzæøå"; // ...that are missing

  $arg_stopname = "translate(StopName, '$upper', '$lower')";
  $arg_query    = "translate('$query', '$upper', '$lower')";

  return $xml->xpath("//StopPoint[contains($arg_stopname, $arg_query)]");
}

As a sanitizing measure I would either completely forbid or escape single quotes in $query, because they will break your XPath string if they are ignored.

#2


9  

In XPath 2.0 you can use the lower-case() function, which is Unicode-aware, so it handles non-ASCII characters correctly.

contains(lower-case(StopName), lower-case('$query'))

To use XPath 2.0 you need a processor that supports it, for example Saxon (an XSLT 2.0 processor). You can access it from PHP via JavaBridge.
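
If an XPath 2.0 processor is not an option, one possible workaround (not what this answer proposes) is to let XPath 1.0 select every StopPoint and do the case-insensitive comparison in PHP instead. The sketch below assumes the mbstring extension is available and that the data is UTF-8; the function name findStopPointsCaseInsensitive is made up for illustration.

```php
<?php
// Sketch only: select all StopPoint elements with plain XPath 1.0, then
// filter them in PHP with a multibyte-aware, case-insensitive search.
// Assumes the mbstring extension and UTF-8 input.
function findStopPointsCaseInsensitive(SimpleXMLElement $xml, string $query): array
{
    $matches = [];
    // xpath() can return false/null on failure, so fall back to an empty array.
    $stops = $xml->xpath('//StopPoint') ?: [];
    foreach ($stops as $stop) {
        // mb_stripos() ignores case for non-ASCII letters such as Æ, Ø and Å.
        if (mb_stripos((string) $stop->StopName, $query, 0, 'UTF-8') !== false) {
            $matches[] = $stop;
        }
    }
    return $matches;
}
```

Because $query never becomes part of an XPath expression here, quotes in the user input need no escaping. The trade-off is that every StopPoint is visited in PHP, which may be slow for very large files.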

#3


3  

Non-English names should not be a problem. Just add them to your XPath. (XML is defined as using Unicode).

As for case-insensitivity, ...

XPath 1.0 includes the following statement:

Two strings are equal if and only if they consist of the same sequence of UCS characters.

So even using explicit predicates on the local-name will not help.

XPath 2 includes functions to map case. E.g. fn:upper-case

Additional: using XPath's translate function should allow case mapping to be faked in XPath 1, but the input will need to include every cased code point you and your users will ever need:

"TEST" = translate($inputString, "abcdefghijklmnopqrstuvwxyz", "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
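
To illustrate why the mapping tables have to list every cased character, here is a small sketch that runs the same comparison twice with PHP SimpleXML, once with an ASCII-only table and once with æøå added. It folds to lowercase rather than uppercase, and the sample document and the helper name countLowercaseMatches are invented for the demonstration.

```php
<?php
// Made-up sample data: one stop whose name starts with a non-ASCII capital.
$xml = simplexml_load_string(
    '<Stops><StopPoint><StopName>Åsane</StopName></StopPoint></Stops>'
);

// Hypothetical helper: counts StopPoints whose name, lower-cased through the
// given translate() tables, equals the lowercase string 'åsane'.
// $upper and $lower must list the characters in the same order.
function countLowercaseMatches(SimpleXMLElement $xml, string $upper, string $lower): int
{
    $expr = "//StopPoint[translate(StopName, '$upper', '$lower') = 'åsane']";
    return count($xml->xpath($expr) ?: []);
}

// ASCII-only table: 'Å' is not listed, so it passes through unchanged.
$asciiOnly = countLowercaseMatches(
    $xml,
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
    'abcdefghijklmnopqrstuvwxyz'
);

// Table extended with the Norwegian letters.
$withNorwegian = countLowercaseMatches(
    $xml,
    'ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ',
    'abcdefghijklmnopqrstuvwxyzæøå'
);

echo "ASCII-only table: $asciiOnly match(es)\n";
echo "Extended table:   $withNorwegian match(es)\n";
```

With the ASCII-only table the name stays "Åsane" and the comparison fails; with the extended table it becomes "åsane" and matches. Any letter missing from the tables (É, Ü, and so on) would be skipped in the same way.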

#4


0  

In addition:

$xml->xpath("//StopPoint[contains(StopName, '$query')]");

You will need to strip out any apostrophe characters from $query to avoid breaking your expression.

In XPath 2.0 you can double up the quote used as the delimiter to put that quote into a string literal, but in XPath 1.0 a string literal cannot contain its own delimiter character.
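
As an alternative to stripping apostrophes, a common XPath 1.0 workaround is to pick whichever quote character the value does not contain, and to fall back to concat() when it contains both. The sketch below is one way to do that; xpathLiteral and findStopPointByNameSafe are hypothetical names, not part of SimpleXML.

```php
<?php
// Hypothetical helper: turns an arbitrary PHP string into an XPath 1.0
// expression that evaluates to exactly that string.
function xpathLiteral(string $value): string
{
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";   // no apostrophes: single-quote it
    }
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';   // no double quotes: double-quote it
    }
    // Both quote types present: split on apostrophes and rejoin the pieces
    // with concat(), writing each apostrophe as the double-quoted literal "'".
    $parts = explode("'", $value);
    return "concat('" . implode("', \"'\", '", $parts) . "')";
}

// Same search as the original expression, with the query embedded safely.
function findStopPointByNameSafe(SimpleXMLElement $xml, string $query): array
{
    $expr = '//StopPoint[contains(StopName, ' . xpathLiteral($query) . ')]';
    return $xml->xpath($expr) ?: [];
}
```

For example, a query such as O'Hara becomes the literal "O'Hara", and a query containing both quote types becomes a concat(...) call, so the expression stays well-formed whatever the user types.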
