How to fix "Incorrect string value" errors?

After noticing an application tended to discard random emails due to incorrect string value errors, I went through and switched many text columns to use the utf8 column charset and the default column collate (utf8_general_ci) so that it would accept them. This fixed most of the errors, and made the application stop getting SQL errors when it hit non-Latin emails, too.

Despite this, some of the emails are still causing the program to hit incorrect string value errors: (Incorrect string value: '\xE4\xC5\xCC\xC9\xD3\xD8...' for column 'contents' at row 1)

The contents column is a MEDIUMTEXT datatype which uses the utf8 column charset and the utf8_general_ci column collate. There are no flags that I can toggle in this column.

Keeping in mind that I don't want to touch or even look at the application source code unless absolutely necessary:

What is causing that error? (yes, I know the emails are full of random garbage, but I thought utf8 would be pretty permissive)
How can I fix it?
What are the likely effects of such a fix?

One thing I considered was switching to a utf8 varchar([some large number]) with the binary flag turned on, but I'm rather unfamiliar with MySQL, and have no idea if such a fix makes sense.

19 Answers

#1

"\xE4\xC5\xCC\xC9\xD3\xD8" isn't valid UTF-8. Tested using Python:

>>> "\xE4\xC5\xCC\xC9\xD3\xD8".decode("utf-8")
...
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data
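
The snippet above is Python 2 syntax. A minimal Python 3 sketch of the same check follows. The helper name `is_valid_utf8` is made up for illustration, and the KOI8-R decoding at the end is only a guess at what the sender's encoding might have been; nothing in the question confirms it.

```python
# The bytes from the error message in the question.
raw = b"\xE4\xC5\xCC\xC9\xD3\xD8"


def is_valid_utf8(data):
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(raw))  # False

# Assumption: the bytes might be Cyrillic text in a legacy 8-bit encoding.
# KOI8-R is one guess. A successful decode does not prove the guess right,
# because an 8-bit codec like this accepts almost any byte sequence.
guess = raw.decode("koi8_r")
print(guess)

# Once the real encoding is known, re-encode as UTF-8 before inserting.
utf8_bytes = guess.encode("utf-8")
print(is_valid_utf8(utf8_bytes))  # True
```

If the application cannot be changed, this kind of check at least shows which messages carry a different encoding than the column expects.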

If you're looking for a way to avoid decoding errors within the database, the cp1252 encoding (aka "Windows-1252" aka "Windows Western European") is about the most permissive encoding there is. MySQL calls its variant latin1, and in that variant every byte value maps to a character.

Of course it's not going to understand genuine UTF-8 any more, nor any other non-cp1252 encoding, but it sounds like you're not too concerned about that?
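
One way to see how permissive an 8-bit encoding is: try decoding every possible byte value. The sketch below uses Python's codecs, which are not MySQL's implementation, so treat it as an illustration only. The function name is hypothetical.

```python
def undecodable_bytes(encoding):
    """List the single byte values that the given codec refuses to decode."""
    bad = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            bad.append(value)
    return bad


# Python's strict cp1252 leaves a handful of byte values undefined.
print([hex(b) for b in undecodable_bytes("cp1252")])

# latin-1 (ISO-8859-1) maps all 256 byte values.
print(undecodable_bytes("latin-1"))  # []

# UTF-8 rejects every lone byte above 0x7F.
print(len(undecodable_bytes("utf-8")))  # 128
```

The trade-off is the one the answer names: a codec that accepts everything also misreads genuine UTF-8 as a string of unrelated Western European characters.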

#2

I would not suggest Richie's answer, because you are screwing up the data inside the database. You would not fix your problem, only try to "hide" it, and you would not be able to perform essential database operations on the corrupted data.

If you encounter this error, either the data you are sending is not UTF-8 encoded, or your connection is not UTF-8. First, verify that the data source (a file, ...) really is UTF-8.

Then check your database connection. You should run this after connecting:

SET NAMES 'utf8';
SET CHARACTER SET utf8;

Next, verify that the tables where the data is stored have the utf8 character set:

SELECT
  `tables`.`TABLE_NAME`,
  `collations`.`character_set_name`
FROM
  `information_schema`.`TABLES` AS `tables`,
  `information_schema`.`COLLATION_CHARACTER_SET_APPLICABILITY` AS `collations`
WHERE
  `tables`.`table_schema` = DATABASE()
  AND `collations`.`collation_name` = `tables`.`table_collation`
;

Last, check your database settings:

mysql> show variables like '%colla%';
mysql> show variables like '%charac%';

If source, transport and destination are UTF-8, your problem is gone;)
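
The first check, whether the data source really is UTF-8, can be scripted. A minimal sketch, assuming the source is a file on disk; the function name and the throwaway sample file are hypothetical.

```python
import os
import tempfile


def first_invalid_utf8_offset(path):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    with open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as error:
        return error.start
    return None


# Demo with a throwaway file: ASCII text containing a Latin-1 encoded "é".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"caf\xe9 menu")
    sample_path = tmp.name

print(first_invalid_utf8_offset(sample_path))  # 3
os.remove(sample_path)
```

A result of None only says the bytes are well-formed UTF-8. The connection and column settings still have to match, as the rest of this answer describes.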

#3

MySQL's utf8 type is not actually proper UTF-8: it only uses up to three bytes per character and supports only the Basic Multilingual Plane (i.e. no emoji, no astral planes, etc.).

If you need to store values from higher Unicode planes, you need the utf8mb4 encodings.
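
To check whether a given string would need utf8mb4, look for code points above U+FFFF, which are the ones that take four bytes in UTF-8. A small sketch; the function name `needs_utf8mb4` is invented for this example.

```python
def needs_utf8mb4(text):
    """True if text has a code point outside the Basic Multilingual Plane."""
    return any(ord(char) > 0xFFFF for char in text)


# Samples: ASCII, an accented Latin letter, CJK, and an emoji.
for sample in ["plain ascii", "caf\u00e9", "\u4e2d\u6587", "\U0001F600"]:
    longest = max(len(char.encode("utf-8")) for char in sample)
    print(repr(sample), "max bytes per char:", longest,
          "needs utf8mb4:", needs_utf8mb4(sample))
```

Only the last sample reports four bytes per character; the others fit in MySQL's three-byte utf8.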

#4

The table and fields have the wrong encoding; however, you can convert them to UTF-8.

ALTER TABLE logtest CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

ALTER TABLE logtest DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

ALTER TABLE logtest CHANGE title title VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_general_ci;

#5

I solved this problem today by altering the column to 'LONGBLOB' type which stores raw bytes instead of UTF-8 characters.

The only disadvantage of doing this is that you have to take care of the encoding yourself. If one client of your application uses UTF-8 encoding and another uses CP1252, you may have your emails sent with incorrect characters. To avoid this, always use the same encoding (e.g. UTF-8) across all your applications.
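
The mismatch described above is easy to reproduce outside the database. In this sketch the byte string stands in for what a BLOB column would hold, and the sample text is made up.

```python
# Client A writes the text to the BLOB column as UTF-8 bytes.
original = "Gr\u00fc\u00dfe, se\u00f1or"
stored = original.encode("utf-8")

# Client B wrongly assumes the bytes are CP1252 and gets mojibake.
misread = stored.decode("cp1252")
print(misread)

# Reading with the same encoding that was used for writing round-trips.
print(stored.decode("utf-8") == original)  # True
```

The database never complains in either case, which is exactly why the responsibility moves to the application.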

Refer to this page http://dev.mysql.com/doc/refman/5.0/en/blob.html for more details of the differences between TEXT/LONGTEXT and BLOB/LONGBLOB. There are also many other arguments on the web discussing these two.

#6

In general, this happens when you insert strings into columns with an incompatible encoding/collation.

I got this error when I had TRIGGERs, which inherit the server's collation for some reason. And MySQL's default is (at least on Ubuntu) latin1 with Swedish collation. Even though I had the database and all tables set to UTF-8, I had yet to set it in my.cnf:

/etc/mysql/my.cnf:

[mysqld]
character-set-server=utf8

# default-character-set is a client option; mysqld 5.5 and later refuses to start with it
[client]
default-character-set=utf8

And this should list all triggers with utf8_*:

select TRIGGER_SCHEMA, TRIGGER_NAME, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, DATABASE_COLLATION from information_schema.TRIGGERS

And some of the variables listed by this should also show utf8 (no latin1 or other encoding):

show variables like 'char%';

#7

First check if your default_character_set_name is utf8.

SELECT default_character_set_name FROM information_schema.SCHEMATA S WHERE schema_name = "DBNAME";

If the result is not utf8, you must convert your database. Save a dump first.

To change the character set encoding to UTF-8 for all of the tables in the specified database, type the following command at the command line. Replace DBNAME with the database name:

mysql --database=DBNAME -B -N -e "SHOW TABLES" | awk '{print "SET foreign_key_checks = 0; ALTER TABLE", $1, "CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci; SET foreign_key_checks = 1; "}' | mysql --database=DBNAME
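
The awk step only generates one ALTER TABLE statement per table name. A rough Python equivalent of that generation step is sketched below, so the statements can be reviewed before they are run. The function name and the sample table names are hypothetical, and unlike the one-liner it toggles foreign_key_checks once rather than per table and quotes names with backticks.

```python
def build_convert_statements(table_names, charset="utf8",
                             collation="utf8_general_ci"):
    """Build one ALTER TABLE ... CONVERT statement per table name."""
    statements = ["SET foreign_key_checks = 0;"]
    for name in table_names:
        # Backticks guard against names with spaces or reserved words.
        quoted = "`" + name.replace("`", "``") + "`"
        statements.append(
            "ALTER TABLE {} CONVERT TO CHARACTER SET {} COLLATE {};".format(
                quoted, charset, collation
            )
        )
    statements.append("SET foreign_key_checks = 1;")
    return statements


# Hypothetical table names, standing in for the output of SHOW TABLES.
for line in build_convert_statements(["emails", "log test"]):
    print(line)
```

The printed statements can be saved to a file and piped into the mysql client the same way as in the command above.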

To change the character set encoding to UTF-8 for the database itself, type the following command at the mysql> prompt. Replace DBNAME with the database name:

ALTER DATABASE DBNAME CHARACTER SET utf8 COLLATE utf8_general_ci;

You can now retry writing utf8 characters into your database. This solution helped me when I tried to upload a 200,000-row CSV file into my database.

#8

That error means that either you have a string with incorrect encoding (e.g. you're trying to enter an ISO-8859-1 encoded string into a UTF-8 encoded column), or the column does not support the data you're trying to enter.

In practice, the latter problem is caused by MySQL's utf8 implementation, which only supports Unicode characters that need 1-3 bytes when represented in UTF-8. See the question "Incorrect string value" when trying to insert UTF-8 into MySQL via JDBC? for details.

#9

Although your collation is set to utf8_general_ci, I suspect that the character encoding of the database, table or even column may be different.

ALTER TABLE table_name MODIFY COLUMN column_name VARCHAR(255)
CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;

#10

I got a similar error (Incorrect string value: '\xD0\xBE\xD0\xB2...' for 'content' at row 1). I tried changing the character set of the column to utf8mb4, and after that the error changed to 'Data too long for column 'content' at row 1'.
It turned out that MySQL was showing me the wrong error. I changed the character set of the column back to utf8 and changed the type of the column to MEDIUMTEXT. After that the error disappeared.
I hope it helps someone.
By the way, MariaDB in the same case (I tested the same INSERT there) just truncated the text without an error.

#11

I have tried all of the above solutions (which all bring valid points), but nothing was working for me.

Until I found that my MySQL table field mappings in C# were using an incorrect type: MySqlDbType.Blob. I changed it to MySqlDbType.Text and now I can write all the UTF8 symbols I want!

p.s. My MySQL table field is of the "LongText" type. However, when I autogenerated the field mappings using MyGeneration software, it automatically set the field type as MySqlDbType.Blob in C#.

Interestingly, I have been using the MySqlDbType.Blob type with UTF8 characters for many months with no trouble, until one day I tried writing a string with some specific characters in it.

Hope this helps someone who is struggling to find a reason for the error.

#12

The solution for me, when running into this "Incorrect string value: '\xF8' for column" error using Scriptcase, was to make sure that my database is set up for utf8_general_ci and so are my field collations. Then, when I import data from a CSV file, I load the CSV into UE Studio and save it formatted as utf8, and voila! It works like a charm: 29,000 records in there with no errors. Previously I was trying to import a CSV created by Excel.
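
Re-saving the file as UTF-8 in an editor can also be scripted. A minimal sketch, assuming the Excel export was written in Windows-1252 (a guess; the answer does not say which encoding the file used) and using made-up file names.

```python
import os
import tempfile


def transcode_to_utf8(src_path, dst_path, src_encoding="cp1252"):
    """Rewrite a text file as UTF-8, assuming it was saved in src_encoding."""
    # newline="" keeps the CSV's own line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demo: the byte 0xF8 from the error message is "ø" in Windows-1252.
workdir = tempfile.mkdtemp()
legacy = os.path.join(workdir, "export.csv")
fixed = os.path.join(workdir, "export_utf8.csv")
with open(legacy, "wb") as handle:
    handle.write(b"id,name\r\n1,S\xf8ren\r\n")

transcode_to_utf8(legacy, fixed)
with open(fixed, "rb") as handle:
    converted = handle.read()
print(converted)  # b'id,name\r\n1,S\xc3\xb8ren\r\n'
```

If the source encoding guess is wrong the script still produces valid UTF-8, just with the wrong characters, so it is worth spot-checking a few rows after conversion.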

#13

I added binary before the column name and that solved the charset error.

insert into tableA values(binary stringcolname1);

#14

Hi, I also got this error when I used my online databases on a GoDaddy server; I think it runs MySQL version 5.1 or later. When I did the same from my localhost server (version 5.7) it was fine. After that I created the table on the local server and copied it to the online server using SQLyog. I think the problem is with the character set.

#15

In my case, I first saw '???' on my website. I checked MySQL's character set, which was latin, so I changed it to utf-8 and restarted my project. Then I got the same error as you. Then I found that I had forgotten to change the database's charset; after changing that to utf-8 too, boom, it worked.

#16

To fix this error I upgraded my MySQL database to utf8mb4, which supports the full Unicode character set, by following this detailed tutorial. I suggest going through it carefully, because there are quite a few gotchas (e.g. the index keys can become too large due to the new encoding, after which you have to modify field types).

#17

There are good answers in here. I'm just adding mine since I ran into the same error but it turned out to be a completely different problem. (Maybe on the surface the same, but a different root cause.)

For me the error happened for the following field:

@Column(nullable = false, columnDefinition = "VARCHAR(255)")
private URI consulUri;

This ends up being stored in the database as a binary serialization of the URI class. This didn't raise any flags with unit testing (using H2) or CI/integration testing (using MariaDB4j); it only blew up in our production-like setup. (Though, once the problem was understood, it was easy enough to see the wrong value in the MariaDB4j instance; it just didn't blow up the test.) The solution was to build a custom type mapper:

package redacted;

import javax.persistence.AttributeConverter;
import java.net.URI;
import java.net.URISyntaxException;

import static java.lang.String.format;

public class UriConverter implements AttributeConverter<URI, String> {
    @Override
    public String convertToDatabaseColumn(URI attribute) {
        return attribute.toString();
    }

    @Override
    public URI convertToEntityAttribute(String field) {
        try {
            return new URI(field);
        }
        catch (URISyntaxException e) {
            // Pass the original exception along so the stack trace keeps the parse failure.
            throw new RuntimeException(format("could not convert database field to URI: %s", field), e);
        }
    }
}

Used as follows:

@Column(nullable = false, columnDefinition = "VARCHAR(255)")
@Convert(converter = UriConverter.class)
private URI consulUri;

As far as Hibernate is involved, it seems it has a bunch of provided type mappers, including for java.net.URL, but not for java.net.URI (which is what we needed here).

#18

What I did was first change the column type to LONGBLOB, insert the data, and then change the column type to VARCHAR(255). As the data was not that sensitive, I took the risk, and the data set was large too (around 40k entries). I suggest trying this only if you don't have any data that you can't afford to distort.

#19

1 - You have to declare the UTF8 encoding property on your connection. http://php.net/manual/en/mysqli.set-charset.php.

2 - If you are using the mysql command line to execute a script, you have to use the --default-character-set flag, like: Cmd: C:\wamp64\bin\mysql\mysql5.7.14\bin\mysql.exe -h localhost -u root -P 3306 --default-character-set=utf8 omega_empresa_parametros_336 < C:\wamp64\www\PontoEletronico\PE10002Corporacao\BancoDeDadosModelo\omega_empresa_parametros.sql
