json_encode(): Invalid UTF-8 sequence in argument

Date: 2023-01-05 22:17:39

I'm calling json_encode() on data that comes from a MySQL database with utf8_general_ci collation. The problem is that some rows contain odd data which I can't clean, for example a single stray symbol. Once such a value reaches json_encode(), it fails with json_encode(): Invalid UTF-8 sequence in argument.

I've tried utf8_encode() and utf8_decode(), and even checking with mb_check_encoding(), but the bad data keeps getting through and causing havoc.

I'm running PHP 5.3.10 on a Mac. So the question is: how can I strip the invalid UTF-8 symbols while keeping the rest of the data, so that json_encode() works?

Update. Here is a way to reproduce it:

echo json_encode(pack("H*" ,'c32e'));
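
A slightly longer version of the reproduction can show what is wrong with these two bytes and how the failure surfaces. This is only a sketch: the variable names are made up for illustration, and it assumes a PHP version where json_encode() returns false on invalid input (on older versions such as 5.3 it emits the warning quoted above instead).

```php
<?php
// Same bytes as the reproduction: 0xC3 opens a two-byte UTF-8 sequence,
// but 0x2E (".") is not a valid continuation byte.
$bad = pack("H*", "c32e");

$valid = mb_check_encoding($bad, "UTF-8"); // false for this input
$json  = @json_encode($bad);               // @ silences the warning older PHP versions emit
$error = json_last_error();                // JSON_ERROR_UTF8 for invalid UTF-8 input

var_dump($valid, $json, $error === JSON_ERROR_UTF8);
```

Checking json_last_error() after every json_encode() call is a cheap way to catch this kind of data before it reaches the client.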

11 answers

#1


31  

I had a similar error: json_encode() returned a null field whenever a string contained a high-ASCII character such as a curly apostrophe, because the query was returning data in the wrong character set.

The solution was to make sure the data arrives as UTF-8 by adding:

mysql_set_charset('utf8');

after the mysql connect statement.

#2


23  

It seems the symbol was Å. The data consists of surnames that shouldn't be public, so only the first letter was shown, and that was done with just $lastname[0]. That is wrong for multibyte strings, because it returns the first byte rather than the first character, and it caused the whole hassle. Changing it to mb_substr($lastname, 0, 1) works like a charm.
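
The difference between the two calls can be sketched as follows. The surname is a made-up example, and appending "." is an assumption chosen only because it reproduces the bytes c3 2e from the question.

```php
<?php
// Hypothetical surname; "\xC3\x85" is the UTF-8 encoding of "Å".
$lastname = "\xC3\x85berg";

$firstByte = $lastname[0];                        // "\xC3": only half of a two-byte character
$firstChar = mb_substr($lastname, 0, 1, "UTF-8"); // "Å": the whole first character

var_dump(mb_check_encoding($firstByte . ".", "UTF-8")); // bool(false)
var_dump(mb_check_encoding($firstChar . ".", "UTF-8")); // bool(true)

echo json_encode($firstChar . "."), "\n";
```

Passing the encoding explicitly to mb_substr() avoids depending on the mbstring internal encoding configured on the server.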

#3


21  

The problem is that this character is UTF-8, but json_encode() does not handle it correctly. Moreover, there is a whole list of other characters (see the Unicode characters list) that will trigger the same error, so stripping out this one (Å) will not fix the issue completely.

What we did instead was convert these characters to HTML entities, like this:

htmlentities( (string) $value, ENT_QUOTES, 'utf-8', FALSE);

#4


12  

Make sure that your connection charset to MySQL is UTF-8. It often defaults to ISO-8859-1, which means that the MySQL driver will convert the text to ISO-8859-1.

You can set the connection charset with mysql_set_charset(), mysqli_set_charset(), or the query SET NAMES 'utf8' (MySQL spells the charset name without a hyphen).

#5


5  

Using this code might help. It solved my problem!

mb_convert_encoding($post["post"],'UTF-8','UTF-8');

or like this:

mb_convert_encoding($string,'UTF-8','UTF-8');
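
To apply the same idea to a whole database row before encoding it, something like the following might work. It is a minimal sketch: scrub_utf8() and the sample row are hypothetical, and the replacement character depends on the mbstring settings (see mb_substitute_character()), so the cleaned value is not asserted here.

```php
<?php
// Hypothetical helper: walks arrays recursively and re-encodes every invalid
// string from UTF-8 to UTF-8, which makes mbstring substitute the bytes that
// do not form valid UTF-8 while leaving everything else untouched.
function scrub_utf8($value)
{
    if (is_array($value)) {
        return array_map('scrub_utf8', $value);
    }
    if (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
        return mb_convert_encoding($value, 'UTF-8', 'UTF-8');
    }
    return $value;
}

// Made-up row: one broken value (the bytes from the question) and one valid one.
$row = array(
    'id'       => 7,
    'lastname' => pack("H*", "c32e"),
    'city'     => "Malm\xC3\xB6",
);

$clean = scrub_utf8($row);
echo json_encode($clean), "\n";
```

This only hides the symptom. If the broken bytes come from a wrong connection charset or from byte-wise string slicing, fixing that source is the better long-term answer.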

#6


2  

The symbol you posted is the placeholder symbol for a broken byte sequence. Basically, it's not a real symbol but an error in your string.

What is the exact byte value of the symbol? Blindly applying utf8_encode() is not a good idea; it's better to first find out where the byte(s) came from and what they mean.
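
One quick way to see the exact bytes is to dump the suspicious value as hex. In this sketch $value is a stand-in for whatever came back from the database; here it simply reuses the bytes from the question.

```php
<?php
// Stand-in for a column value fetched from the database.
$value = pack("H*", "c32e");

// bin2hex() shows the raw bytes regardless of how the terminal renders them.
echo bin2hex($value), "\n"; // prints: c32e
```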

#7


0  

Another thing that triggers this error with PHP's json_encode() is a Unicode escape written with an uppercase \U instead of a lowercase \u.

#8


0  

json_encode() works only with UTF-8 data, so you'll have to ensure that your data is in UTF-8. Alternatively, you can use iconv() to convert your results to UTF-8 before feeding them to json_encode().
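
As a rough illustration of the iconv() route, assuming the data really is ISO-8859-1 (that source charset is an assumption; check what your connection actually returns):

```php
<?php
// Assumption: the bytes are ISO-8859-1 (Latin-1), where "\xC5" is "Å".
$latin1 = "\xC5berg";

// Convert from the assumed source charset to UTF-8 before encoding.
$utf8 = iconv("ISO-8859-1", "UTF-8", $latin1);

echo json_encode($utf8), "\n";
```

Converting with the wrong source charset produces valid but wrong text, so this is only safe once the real encoding of the data is known.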

#9


0  

Update: I solved this issue by specifying the charset in the PDO connection string, as below:

"mysql:host=$host;dbname=$db;charset=utf8"

All data received was then in the correct charset for the rest of the code to use.

#10


0  

I am very late, but if someone building a REST API with Slim gets the same error, it can be solved by adding the line marked below:

<?php

// DbConnect.php file
class DbConnect
{
    // Variable to store the database link
    private $con;

    // Class constructor
    function __construct()
    {

    }

    // This method connects to the database and returns the link
    function connect()
    {
        // Include Constants.php to get the database constants
        include_once dirname(__FILE__) . '/Constants.php';

        // Connect to the MySQL database
        $this->con = new mysqli(DB_HOST, DB_USERNAME, DB_PASSWORD, DB_NAME);

        // Check whether an error occurred while connecting
        if (mysqli_connect_errno()) {
            echo "Failed to connect to MySQL: " . mysqli_connect_error();
        } else {
            // Only set the charset on a working connection
            mysqli_set_charset($this->con, "utf8"); // add this line
        }

        // Finally, return the connection link
        return $this->con;
    }
}

#11


-1  

Using setLocale('fr_FR.UTF8') before json_encode solved the problem.
