How to convert ASCII encoding to UTF8 in PHP
6 solutions
#1
45
ASCII is a subset of UTF-8, so if a document is ASCII then it is already UTF-8.
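A quick way to check this yourself, as a minimal sketch assuming the mbstring extension is loaded (the sample string is only an illustration, not from the question):

```php
<?php
// Illustrative sample: a string made of plain ASCII bytes.
$text = "Hello, world!";

// The same bytes validate both as ASCII and as UTF-8.
$isAscii = mb_check_encoding($text, 'ASCII');
$isUtf8  = mb_check_encoding($text, 'UTF-8');

var_dump($isAscii); // bool(true)
var_dump($isUtf8);  // bool(true)
```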
#2
19
If you know for sure that your current encoding is pure ASCII, you don't have to do anything, because ASCII text is already valid UTF-8.
If you still want to convert, just to be sure that the result is UTF-8, you can use iconv:
$string = iconv('ASCII', 'UTF-8//IGNORE', $string);
The //IGNORE suffix discards characters that cannot be converted, in case some bytes are not valid ASCII.
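If you would rather reject non-ASCII input than silently drop bytes, something like the following might work. This is only a sketch: asciiToUtf8() is a hypothetical helper name, and it assumes the iconv extension is enabled.

```php
<?php
// Hypothetical helper (not from the answer): convert only when the input is pure ASCII.
function asciiToUtf8(string $input): ?string
{
    // Bytes 0x00-0x7F cover the whole ASCII range; any other byte is not ASCII.
    if (preg_match('/[^\x00-\x7F]/', $input) === 1) {
        return null; // the caller decides how to handle non-ASCII input
    }
    // For pure ASCII input this yields the same bytes.
    $converted = iconv('ASCII', 'UTF-8', $input);
    return $converted === false ? null : $converted;
}

var_dump(asciiToUtf8('plain text')); // string(10) "plain text"
var_dump(asciiToUtf8("caf\xE9"));    // NULL
```

Returning null instead of a partially stripped string makes the bad input visible to the caller.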
#3
4
Use utf8_encode(), which converts ISO-8859-1 input to UTF-8 (deprecated as of PHP 8.2).
The manual page is at http://php.net/manual/en/function.utf8-encode.php
Also read this article from Joel on Software. It provides an excellent explanation of what Unicode is and how it works: http://www.joelonsoftware.com/articles/Unicode.html
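To see what an ISO-8859-1 to UTF-8 conversion does to the bytes, here is a small sketch. It uses mb_convert_encoding() with an explicit source encoding as a stand-in for utf8_encode(); that substitution and the sample strings are my own assumptions, not part of the answer.

```php
<?php
// Assumption: the input is ISO-8859-1 (Latin-1); "\xE9" is "é" in that encoding.
$latin1 = "caf\xE9";

// Explicit source encoding, used here as a stand-in for utf8_encode().
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8), PHP_EOL; // 636166c3a9

// Pure ASCII input comes out byte-for-byte unchanged.
$ascii = mb_convert_encoding('cafe', 'UTF-8', 'ISO-8859-1');
var_dump($ascii === 'cafe'); // bool(true)
```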
#4
2
"ASCII is a subset of UTF-8, so..." - so UTF-8 is a set? :)
In other words: any string built from code points 0x00 to 0x7F has identical representations (byte sequences) in ASCII and UTF-8. Converting such a string is pointless.
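This can be checked code point by code point. A minimal sketch, assuming the mbstring extension is loaded and PHP 7.2 or later for mb_chr():

```php
<?php
// Check every code point from 0x00 to 0x7F: the single ASCII byte should
// convert to exactly the same single byte in UTF-8.
$allIdentical = true;
for ($cp = 0x00; $cp <= 0x7F; $cp++) {
    $asciiByte = chr($cp);
    if (mb_convert_encoding($asciiByte, 'UTF-8', 'ASCII') !== $asciiByte) {
        $allIdentical = false;
    }
}
var_dump($allIdentical); // bool(true)

// For contrast, U+00E9 is outside that range and takes two bytes in UTF-8.
echo bin2hex(mb_chr(0xE9, 'UTF-8')), PHP_EOL; // c3a9
```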
#5
2
Use mb_convert_encoding to convert an ASCII string to UTF-8.
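// Note: this sample is not pure ASCII; what gets detected depends on how the source file is encoded.
$string = "chárêctërs";
print(mb_detect_encoding($string) . PHP_EOL);
// With no source encoding given, mb_convert_encoding() assumes the internal encoding.
$string = mb_convert_encoding($string, "UTF-8");
print(mb_detect_encoding($string) . PHP_EOL);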
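Detection is easier to reason about with explicit bytes and a strict candidate list. The sketch below is only an illustration: the candidate list and the sample byte strings are assumptions of mine, chosen so the result does not depend on how the script file is saved.

```php
<?php
// Byte escapes are used so the example does not depend on the file's encoding.
$utf8Bytes   = "ch\xC3\xA1r"; // "chár" encoded as UTF-8
$latin1Bytes = "ch\xE1r";     // "chár" encoded as ISO-8859-1

// Strict mode with an explicit candidate list.
$detectedUtf8   = mb_detect_encoding($utf8Bytes, ['ASCII', 'UTF-8'], true);
$detectedLatin1 = mb_detect_encoding($latin1Bytes, ['ASCII', 'UTF-8'], true);

var_dump($detectedUtf8);   // string(5) "UTF-8"
var_dump($detectedLatin1); // bool(false): valid in neither candidate
```

A false result here is a signal that the input is in some other encoding and needs an explicit source encoding when converting.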
#6
-1
Using iconv looks like the best solution, but in my case I get a notice from this function: "Detected an illegal character in input string in" (without //IGNORE). I use two functions to manipulate ASCII strings: one converts a string to an array of ASCII codes and serializes it, and the other reverses that:
// Encode a string as a JSON array of its byte values.
public static function ToAscii($string) {
    $strlen = strlen($string);
    $charCode = array();
    for ($i = 0; $i < $strlen; $i++) {
        $charCode[] = ord(substr($string, $i, 1));
    }
    $result = json_encode($charCode);
    return $result;
}

// Rebuild the original string from the JSON array of byte values.
public static function fromAscii($string) {
    $charCode = json_decode($string);
    $result = '';
    foreach ($charCode as $code) {
        $result .= chr($code);
    }
    return $result;
}
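For reference, a self-contained round trip of the same idea might look like this. The class name ByteCodec and the lower-cased method name toAscii are hypothetical; the answer only shows the two methods without their class.

```php
<?php
// Hypothetical wrapper class; the name ByteCodec is not from the answer.
final class ByteCodec
{
    // Encode a string as a JSON array of its byte values.
    public static function toAscii(string $string): string
    {
        $codes = [];
        for ($i = 0, $len = strlen($string); $i < $len; $i++) {
            $codes[] = ord($string[$i]);
        }
        return json_encode($codes);
    }

    // Rebuild the string from the JSON array of byte values.
    public static function fromAscii(string $json): string
    {
        $result = '';
        foreach (json_decode($json) as $code) {
            $result .= chr($code);
        }
        return $result;
    }
}

$encoded = ByteCodec::toAscii('Hi');
echo $encoded, PHP_EOL;                       // [72,105]
echo ByteCodec::fromAscii($encoded), PHP_EOL; // Hi

// The functions work on bytes, so a UTF-8 "é" becomes two values.
echo ByteCodec::toAscii("\xC3\xA9"), PHP_EOL; // [195,169]
```

Note that this serializes bytes rather than converting between encodings, so the decoded string has whatever encoding the original had.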