How to detect Hebrew characters in ISO-8859-8 and UTF-8 strings using PHP

Time: 2022-09-26 08:47:07

I want to be able to detect (using regular expressions) whether a string contains Hebrew characters, in both UTF-8 and ISO-8859-8, in PHP. Thanks!

5 solutions

#1


14  

According to the ISO-8859-8 character map, the byte range E0 - FA appears to be reserved for Hebrew. You could check for those bytes with a character class:

[\xE0-\xFA]
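
As a minimal sketch of how that character class might be used, assuming the input is already known to be ISO-8859-8 (the function name `hasHebrewIso88598` is hypothetical, not from the answer):

```php
<?php
// Hypothetical helper: assumes $bytes is ISO-8859-8, i.e. one byte per character.
function hasHebrewIso88598(string $bytes): bool
{
    // No "u" modifier here: the pattern is matched byte by byte,
    // so \xE0-\xFA covers the letters alef (0xE0) through tav (0xFA).
    return preg_match('/[\xE0-\xFA]/', $bytes) === 1;
}

var_dump(hasHebrewIso88598("\xF9\xEC\xE5\xED")); // "shalom" in ISO-8859-8: bool(true)
var_dump(hasHebrewIso88598("hello"));            // bool(false)
```

Note that this check is only meaningful for ISO-8859-8 input. In a UTF-8 string the bytes E0 to F4 are lead bytes of three- and four-byte sequences, so non-Hebrew text would also match.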

For UTF-8, the code points reserved for Hebrew appear to be U+0591 to U+05F4. PCRE, which PHP uses, does not support the \uXXXX escape, so write the range with \x{...} and add the u modifier to the pattern:

[\x{0591}-\x{05F4}]

Here's an example of a regex match in PHP:

echo preg_match('/[\x{0591}-\x{05F4}]/u', $string);

#2


4  

Well, if your PHP file is encoded in UTF-8, as it should be when it contains Hebrew, you should use the following regex:

$string="אבהג";
echo preg_match("/\p{Hebrew}/u", $string);
// output: 1

#3


1  

Here's a small function to check whether the first character in a UTF-8 string is a Hebrew letter:

function IsStringStartsWithHebrew($string)
{
    // Assumes UTF-8: the letters U+05D0..U+05EA encode as 0xD7 followed by 0x90..0xAA.
    return (strlen($string) > 1 && // a Hebrew letter takes two bytes in UTF-8
        ord($string[0]) == 215 && // first byte is 0xD7 (binary 110-10111)
        ord($string[1]) >= 144 && ord($string[1]) <= 170 // Hebrew letter range in the second byte
        );
}

Good luck :)

#4


0  

First, such a string would be completely useless - a mix of two different character sets?

Both the Hebrew characters in ISO-8859-8 and every byte of a multibyte sequence in UTF-8 have a value ord($char) > 127. So what I would do is find all bytes with a value greater than 127, then check whether they make sense as ISO-8859-8 or whether they make more sense as a UTF-8 sequence...
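
The approach described above could be sketched roughly as follows. This is one possible heuristic, not the answerer's code: it assumes that a string which is valid UTF-8 should be treated as UTF-8, and that anything else containing high bytes is ISO-8859-8. The function name `detectHebrew` and its return values are made up for illustration.

```php
<?php
// Hypothetical helper: returns 'utf-8', 'iso-8859-8', or null when no Hebrew is found.
function detectHebrew(string $s): ?string
{
    // Pure ASCII cannot contain Hebrew in either encoding.
    if (preg_match('/[\x80-\xFF]/', $s) !== 1) {
        return null;
    }
    // Assumption: if the bytes form valid UTF-8, treat the string as UTF-8.
    // An empty pattern with the "u" modifier only matches when the subject is valid UTF-8.
    if (preg_match('//u', $s) === 1) {
        return preg_match('/\p{Hebrew}/u', $s) === 1 ? 'utf-8' : null;
    }
    // Otherwise fall back to the ISO-8859-8 letter range E0-FA.
    return preg_match('/[\xE0-\xFA]/', $s) === 1 ? 'iso-8859-8' : null;
}

var_dump(detectHebrew("\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D")); // "shalom" as UTF-8 bytes
var_dump(detectHebrew("\xF9\xEC\xE5\xED"));                 // "shalom" as ISO-8859-8 bytes
```

Checking UTF-8 validity first is a design choice: a run of ISO-8859-8 Hebrew letters is never valid UTF-8, because each letter byte would need continuation bytes in the range 80 to BF. A short ISO-8859-8 string mixing a single letter with certain symbols could still be misclassified, so treat the result as a guess.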

#5


0  

function is_hebrew($string)
{
    return preg_match("/\p{Hebrew}/u", $string);
}
