json_encode() with non-UTF-8 encoded strings?

Time: 2023-01-05 22:17:27

I have an array of strings pulled from a SQL database, all in the system default ANSI encoding, so each character is a single byte with 256 possible values. Is there a way to get json_encode() to work and display these characters, instead of having to call utf8_encode() on every string and ending up with output like "\u0082"?

Or is that the standard for json?

7 answers

#1


29  

Is there a way I can get json_encode() to work and display these characters instead of having to use utf8_encode() on all of my strings and ending up with stuff like "\u0082"?

If you have an ANSI encoded string, utf8_encode() is the wrong function for the job. You need to properly convert the string from ANSI to UTF-8 first. That will not remove Unicode escape sequences from the JSON output, but they will then stand for the right characters instead of control codes like \u0082. Technically these sequences are valid JSON, so you need not fear them.

Converting ANSI to UTF-8 with PHP

json_encode works with UTF-8 encoded strings only. If you need to create valid json successfully from an ANSI encoded string, you need to re-encode/convert it to UTF-8 first. Then json_encode will just work as documented.

To convert an encoding from ANSI (more correctly I assume you have a Windows-1252 encoded string, which is popular but wrongly referred to as ANSI) to UTF-8 you can make use of the mb_convert_encoding() function:

$str = mb_convert_encoding($str, "UTF-8", "Windows-1252");

Another PHP function that can convert the encoding / charset of a string is iconv(), which is based on libiconv. You can use it as well:

$str = iconv("CP1252", "UTF-8", $str);
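
Putting the two steps together, a minimal sketch of the convert-then-encode flow might look like this. It assumes the mbstring extension is loaded and that the input really is Windows-1252; the sample bytes and variable names are made up for illustration.

```php
<?php
// Hypothetical sample: "café €5" as Windows-1252 bytes
// (0xE9 is é and 0x80 is € in Windows-1252).
$ansi = "caf\xE9 \x805";

// Step 1: convert to UTF-8 (assumes the source is Windows-1252).
$utf8 = mb_convert_encoding($ansi, "UTF-8", "Windows-1252");

// Step 2: encode. By default non-ASCII characters are escaped as \uXXXX.
$escaped = json_encode($utf8);

// With JSON_UNESCAPED_UNICODE (PHP 5.4+) the characters are emitted literally.
$literal = json_encode($utf8, JSON_UNESCAPED_UNICODE);

echo $escaped, "\n", $literal, "\n";
```

Both outputs decode to the same string; the flag only changes how the characters are written in the JSON text.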

Note on utf8_encode()

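utf8_encode() only works for Latin-1 (ISO-8859-1), not for ANSI (Windows-1252). Bytes in the range 0x80 to 0x9F are turned into control characters such as U+0082 instead of the characters they stand for in Windows-1252, so running the string through that function destroys some of its characters.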
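
The difference can be seen on a single byte. The sketch below assumes the mbstring extension and uses a conversion from ISO-8859-1 as a stand-in for what utf8_encode() does, next to a conversion from Windows-1252; the variable names are illustrative only.

```php
<?php
// Byte 0x82 is the low quotation mark U+201A in Windows-1252, but the
// C1 control character U+0082 in ISO-8859-1 (Latin-1).
$byte = "\x82";

// Stand-in for utf8_encode(): treat the input as ISO-8859-1.
$asLatin1 = mb_convert_encoding($byte, "UTF-8", "ISO-8859-1");

// What the data needs, assuming it really is Windows-1252.
$asCp1252 = mb_convert_encoding($byte, "UTF-8", "Windows-1252");

echo json_encode($asLatin1), "\n"; // "\u0082"
echo json_encode($asCp1252), "\n"; // "\u201a"
```

This would explain the "\u0082" in the question: the byte was interpreted as Latin-1 rather than as Windows-1252.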

Related: What is ANSI format?

For more fine-grained control over what json_encode() returns, see the list of predefined constants (PHP version dependent; as of PHP 5.4, some constants remain undocumented and are so far available only in the source code).

Changing the encoding of an array/iteratively (PDO comment)

As you wrote in a comment that you have trouble applying the function to an array, here is a code example. The encoding always has to be changed before calling json_encode. This is just a standard array operation; for the simpler case of PDO::fetch(), a foreach iteration is enough:

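$items = array();
while($row = $q->fetch(PDO::FETCH_ASSOC))
{
  foreach($row as &$value)
  {
    # convert each column from Windows-1252 to UTF-8
    $value = mb_convert_encoding($value, "UTF-8", "Windows-1252");
  }
  unset($value); # safety: remove reference
  # the row is already UTF-8 here; running utf8_encode() on it again
  # would double-encode every non-ASCII character
  $items[] = $row;
}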
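
Result sets often contain NULLs and numeric columns as well. A hedged variant of the loop above, runnable without a database, could convert only the string values. The helper name row_to_utf8 and the sample rows are hypothetical, and the mbstring extension is assumed.

```php
<?php
// Hypothetical helper: convert every string value of one row to UTF-8,
// leaving NULLs and numbers untouched.
function row_to_utf8(array $row, string $from = "Windows-1252"): array
{
    foreach ($row as $key => $value) {
        if (is_string($value)) {
            $row[$key] = mb_convert_encoding($value, "UTF-8", $from);
        }
    }
    return $row;
}

// Stand-in for rows returned by $q->fetch(PDO::FETCH_ASSOC).
$rows = [
    ["id" => 1, "name" => "Ren\xE9e", "note" => null],
    ["id" => 2, "name" => "Zo\xEB",   "note" => "\x93ok\x94"],
];

$items = array_map('row_to_utf8', $rows);
echo json_encode($items, JSON_UNESCAPED_UNICODE), "\n";
```

Skipping non-strings avoids passing null to mb_convert_encoding(), which newer PHP versions complain about.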

#2


7  

The JSON standard ENFORCES Unicode encoding. From RFC4627:

3.  Encoding

   JSON text SHALL be encoded in Unicode.  The default encoding is
   UTF-8.

   Since the first two characters of a JSON text will always be ASCII
   characters [RFC0020], it is possible to determine whether an octet
   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
   at the pattern of nulls in the first four octets.

           00 00 00 xx  UTF-32BE
           00 xx 00 xx  UTF-16BE
           xx 00 00 00  UTF-32LE
           xx 00 xx 00  UTF-16LE
           xx xx xx xx  UTF-8

Therefore, in the strictest sense, ANSI-encoded JSON wouldn't be valid JSON; this is why PHP enforces Unicode encoding when using json_encode().

As for "default ANSI", I'm pretty sure that your strings are encoded in Windows-1252. It is incorrectly referred to as ANSI.
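
The null-pattern table quoted from RFC 4627 can be sketched as a small function. This is only an illustration of the heuristic, not something json_encode() or json_decode() exposes; the function name detect_json_encoding is hypothetical.

```php
<?php
// Sketch of the RFC 4627 section 3 heuristic: classify a JSON octet
// stream by where the null bytes fall in its first four octets.
function detect_json_encoding(string $bytes): string
{
    if (strlen($bytes) < 4) {
        return "UTF-8"; // too short for the pattern test; assume the default
    }
    $pattern = "";
    for ($i = 0; $i < 4; $i++) {
        $pattern .= ($bytes[$i] === "\x00") ? "0" : "x";
    }
    switch ($pattern) {
        case "000x": return "UTF-32BE";
        case "0x0x": return "UTF-16BE";
        case "x000": return "UTF-32LE";
        case "x0x0": return "UTF-16LE";
        default:     return "UTF-8";
    }
}

echo detect_json_encoding('{"a":1}'), "\n";
echo detect_json_encoding(mb_convert_encoding('{"a":1}', "UTF-16LE", "UTF-8")), "\n";
```

Note that a Windows-1252 stream contains no null bytes either, so this test would label it UTF-8. The heuristic only tells the Unicode encodings apart; it cannot spot a single-byte legacy encoding.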

#3


4  

<?php
$array = array('first word' => array('Слово','Кириллица'),'second word' => 'Кириллица','last word' => 'Кириллица');
echo json_encode($array);
/*
return {"first word":["\u0421\u043b\u043e\u0432\u043e","\u041a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430"],"second word":"\u041a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430","last word":"\u041a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430"}
*/
echo json_encode($array, JSON_UNESCAPED_UNICODE); // the constant has the value 256
/*
return {"first word":["Слово","Кириллица"],"second word":"Кириллица","last word":"Кириллица"}
*/
?>

JSON_UNESCAPED_UNICODE (integer) Encode multibyte Unicode characters literally (default is to escape as \uXXXX). Available since PHP 5.4.0.

http://php.net/manual/en/json.constants.php#constant.json-unescaped-unicode

#4


1  

Yes, this is standard behavior for json within PHP

If you read the documentation at http://php.net/manual/en/function.json-encode.php you'll see that it only works with UTF-8 encoded data.

On the other hand, you can use the first comment at http://php.net/manual/en/function.json-encode.php#104278 and create your own encode/decode function that works with ANSI.


#5


-1  

json_encode($str,JSON_HEX_TAG|JSON_HEX_AMP|JSON_HEX_APOS|JSON_HEX_QUOT);

That will convert Windows-based ANSI to UTF-8, and the error will be no more.
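
A quick way to check this claim is to feed json_encode() a Windows-1252 byte with those flags. In the sketch below (sample bytes made up, mbstring assumed), the JSON_HEX_* flags only change how <, >, &, ' and " are escaped. On PHP 7 and later the unconverted string still fails with JSON_ERROR_UTF8, so a conversion step appears to be needed anyway.

```php
<?php
$flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;

// Hypothetical input: "é & <b>" with é as the Windows-1252 byte 0xE9.
$ansi = "\xE9 & <b>";

// Encoding the raw bytes: the input is not valid UTF-8.
$raw = json_encode($ansi, $flags);
$rawError = json_last_error();

// Encoding after an explicit conversion to UTF-8.
$utf8 = mb_convert_encoding($ansi, "UTF-8", "Windows-1252");
$converted = json_encode($utf8, $flags);

var_dump($raw);                          // bool(false)
var_dump($rawError === JSON_ERROR_UTF8); // bool(true)
echo $converted, "\n";                   // valid JSON with <, > and & hex-escaped
```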

#6


-1  

I found the following answer to an analogous problem: a nested array that was not UTF-8 encoded and that I had to JSON-encode:

$inputArray = array(
    'a' => 'First item - à',
    'c' => 'Third item - é'
);
$inputArray['b'] = array(
    'a' => 'First subitem - ù',
    'b' => 'Second subitem - ì'
);
if (!function_exists('recursive_utf8')) {
    function recursive_utf8($data) {
        if (!is_array($data)) {
            return utf8_encode($data);
        }
        $result = array();
        foreach ($data as $index => $item) {
            if (is_array($item)) {
                $result[$index] = array();
                foreach ($item as $key => $value) {
                    $result[$index][$key] = recursive_utf8($value);
                }
            } else if (is_object($item)) {
                $result[$index] = array();
                foreach (get_object_vars($item) as $key => $value) {
                    $result[$index][$key] = recursive_utf8($value);
                }
            } else {
                $result[$index] = recursive_utf8($item);
            }
        }
        return $result;
    }
}
$outputArray = json_encode(array_map('recursive_utf8', $inputArray));
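
For nested arrays of Windows-1252 strings, the same idea can be sketched more compactly with array_walk_recursive() and mb_convert_encoding(). This is an alternative under stated assumptions (mbstring loaded, source encoding Windows-1252), not the code from the answer above; the helper name array_to_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: return a copy of a nested array with every
// string leaf converted to UTF-8. Array keys are left as they are.
function array_to_utf8(array $data, string $from = "Windows-1252"): array
{
    array_walk_recursive($data, function (&$value) use ($from) {
        if (is_string($value)) {
            $value = mb_convert_encoding($value, "UTF-8", $from);
        }
    });
    return $data;
}

// The accented letters are written as Windows-1252 bytes.
$inputArray = [
    "a" => "First item - \xE0",
    "b" => ["a" => "First subitem - \xF9", "b" => "Second subitem - \xEC"],
    "c" => "Third item - \xE9",
];
echo json_encode(array_to_utf8($inputArray), JSON_UNESCAPED_UNICODE), "\n";
```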

#7


-3  

Use this instead:

<?php 
//$return_arr = the array of data to json encode 
//$out = the output of the function 
//don't forget to escape the data before using it!

$out = '["' . implode('","', $return_arr) . '"]'; 
?>

Copied from the comments on json_encode in the PHP manual. Always read the comments. They are useful.
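
The escaping reminder in that snippet matters in practice. The sketch below uses made-up data to show the hand-built string breaking on a double quote, and one possible way to escape quotes and backslashes first. It does not handle control characters or the encoding question, so it is an illustration rather than a replacement for json_encode().

```php
<?php
// Hypothetical data containing a double quote.
$return_arr = ['say "hi"', 'ok'];

// The hand-built string, without any escaping.
$unescaped = '["' . implode('","', $return_arr) . '"]';
var_dump(json_decode($unescaped)); // NULL: not valid JSON

// Escape double quotes and backslashes in each value first.
$safe = array_map(function ($s) {
    return addcslashes($s, "\"\\");
}, $return_arr);
$out = '["' . implode('","', $safe) . '"]';
var_dump(json_decode($out) === $return_arr); // bool(true)
```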
