How to detect non-ASCII characters in a string?

Date: 2021-08-23 20:20:51

If I have a PHP string, how can I efficiently determine whether it contains at least one non-ASCII character? By non-ASCII character, I mean any character that is not part of this table, http://www.asciitable.com/, positions 32-126 inclusive.

So a character not only has to be part of the ASCII table, it also has to be printable. I want to detect a string that contains at least one character that does not meet these specifications (either non-printable ASCII, or a different character altogether, such as a Unicode character that is not part of that table).

9 answers

#1


55  

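I found it more useful to detect whether any character falls outside the allowed range:

// True when $string has a byte outside printable ASCII (32-126); \x7f (DEL) is not printable, so the range ends at \x7e.
$hasNonPrintable = (bool) preg_match('/[^\x20-\x7e]/', $string);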
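
As a minimal sketch of how the check above could be wrapped into a reusable function, assuming the input is treated as a raw byte string and the range is 32-126 as stated in the question (the function name has_non_printable_ascii is made up for this illustration):

```php
<?php
// Hypothetical helper (the name is not from the original answer).
// Returns true when $string contains at least one byte outside
// the printable ASCII range 32-126 (0x20-0x7e).
function has_non_printable_ascii(string $string): bool
{
    // Without the /u modifier PCRE matches byte by byte, so every byte of a
    // multibyte UTF-8 character (all >= 0x80) falls outside the class.
    return preg_match('/[^\x20-\x7e]/', $string) === 1;
}

var_dump(has_non_printable_ascii('Hello, world!')); // bool(false)
var_dump(has_non_printable_ascii("tab\there"));     // bool(true)
var_dump(has_non_printable_ascii('naïve'));         // bool(true)
```

Because preg_match stops at the first match, the scan ends as soon as one offending byte is found.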

#2


28  

You can use mb_detect_encoding and check for ASCII:

mb_detect_encoding($str, 'ASCII', true)

This will return false if $str contains at least one non-ASCII character (byte value > 0x7F). Control characters such as tab or newline are also ASCII, so this does not by itself enforce the printable range 32-126.
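
A small sketch of how this call might be used, assuming the mbstring extension is loaded (the helper name is_seven_bit_ascii is invented for this example):

```php
<?php
// Requires the mbstring extension.

// Hypothetical helper: true when $s is detected as 7-bit ASCII.
function is_seven_bit_ascii(string $s): bool
{
    // In strict mode, mb_detect_encoding() returns false when $s is not
    // valid in any of the listed encodings (here only ASCII).
    return mb_detect_encoding($s, 'ASCII', true) !== false;
}

var_dump(is_seven_bit_ascii('plain text')); // bool(true)
var_dump(is_seven_bit_ascii('café'));       // bool(false)

// A newline is 7-bit ASCII but not printable, so for the 32-126 range
// from the question this check would need to be combined with another one.
var_dump(is_seven_bit_ascii("line one\nline two"));
```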

#3


2  

You could use mb_detect_encoding, but it may not be as precise as you want it to be.

#4


2  

Try mb_detect_encoding.

#5


2  

Try: (Source)

function is_ascii( $string = '' ) {
    return ( bool ) ! preg_match( '/[\\x80-\\xff]+/' , $string );
}

The answers above work in general, but depending on the input they may give wrong results. See the last section of this ASCII validation post.

#6


2  

The function ctype_print returns true if and only if all characters fall into the ASCII range 32-126 (PHP unit test). Note that it returns false for an empty string.
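
For illustration, a short sketch of ctype_print applied to the question, assuming the ctype extension is available and the default C locale is in effect (the wrapper name contains_non_printable is hypothetical):

```php
<?php
// ctype_print() comes from the ctype extension.
var_dump(ctype_print('Hello, world!')); // bool(true)
var_dump(ctype_print("bell\x07"));      // bool(false)
var_dump(ctype_print(''));              // bool(false)

// Hypothetical wrapper: an empty string has no offending character,
// so it is handled separately before negating ctype_print().
function contains_non_printable(string $s): bool
{
    return $s !== '' && !ctype_print($s);
}

var_dump(contains_non_printable(''));          // bool(false)
var_dump(contains_non_printable("bell\x07"));  // bool(true)
```

The explicit empty-string guard is a design choice for this sketch; whether an empty string should count as valid depends on the caller.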

#7


1  

If you do not want to deal with regular expressions in JavaScript, you can do:

// Returns true if s contains at least one code unit above 127 (non-ASCII).
function detectUtf8(s) {
  return s.split('').some(function (c) {
    return c.charCodeAt(0) > 127;
  });
}

#8


0  

I suggest you look into utf8_encode or utf8_decode in the PHP manual:

http://www.php.net/manual/en/function.utf8-encode.php

Look at the examples further down that page; even if they are not exactly what you are looking for, they may point you in the right direction.

#9


-1  

The fastest method is actually mb_check_encoding. Sample usage:

    return mb_check_encoding($str, 'ASCII');
