Special characters don't work in MySQL (UTF-8)

Posted: 2023-01-06 14:47:14

So, I've had some issues while trying to come over from Latin1 encoded databases, tables as well as columns, and now that everything is finally in UTF-8, I can't seem to update a row in a column. I am trying to replace an "e" with an e with acute (é). But it gives me this:

ERROR 1366 (HY000): Incorrect string value: '\x82m ...' for column 'Name' at row 1

when running this:

UPDATE access SET Name='ém' WHERE id="2";

All databases give me this when running the status command (except the current database part, of course):

Connection id:          1  
Current database:       access  
Current user:           root@localhost  
SSL:                    Not in use  
Using delimiter:        ;  
Server version:         5.1.47-community MySQL Community Server (GPL)  
Protocol version:       10  
Connection:             localhost via TCP/IP  
Server characterset:    utf8  
Db     characterset:    utf8  
Client characterset:    utf8  
Conn.  characterset:    utf8  
TCP port:               3306  
Uptime:                 20 min 16 sec  

Threads: 1 Questions: 110 Slow queries: 0 Opens: 18 Flush tables: 1 Open tables: 11  Queries per second avg: 0.90

And running the chcp command in cmd gives me 850. Oh, and at some points I got this:

ERROR 1300 (HY000): Invalid utf8 character string: 'ém' WHERE id="2"

I've looked everywhere for a solution, but I couldn't seem to find anything anywhere, and since I've always had good responses on *, I thought I'd ask here.

Thanks for any help!

4 answers

#1


3  

This thread, although somewhat old, seems to result in the conclusion that cmd.exe and the mysql client don't handle UTF-8 encoding properly (with the blame being more aimed at cmd.exe).

Reading in SQL from a file is recommended, as is using an alternative client - or a flavour of UNIX. :)

#2


4  

The solution is to set the connection variables to whatever codepage your installation of windows uses (not latin1 like what a lot of pages out there recommend - cmd.exe's character encoding isn't latin1).

In my case the codepage is 850:

mysql> SET NAMES cp850;
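
If you want to pick the charset name from what chcp reports instead of hard-coding it, the lookup could be sketched roughly like this. The helper name set_names_for_chcp and the table CODEPAGE_TO_MYSQL are made up for illustration, and the table only covers a few code pages that have a MySQL charset of the same family; treat it as a starting point, not a complete mapping.

```python
import re

# Hypothetical lookup table: Windows console code page -> MySQL charset name.
# Only a handful of common pages are listed; extend it for your own setup.
CODEPAGE_TO_MYSQL = {
    850: "cp850",
    852: "cp852",
    866: "cp866",
    1250: "cp1250",
    1251: "cp1251",
    1252: "latin1",  # MySQL's latin1 is Windows-1252
}


def set_names_for_chcp(chcp_output: str) -> str:
    """Build a SET NAMES statement from the text printed by chcp.

    The wording of chcp's message depends on the Windows language,
    so only the first number in the text is used.
    """
    match = re.search(r"\d+", chcp_output)
    if match is None:
        raise ValueError("no code page number found in chcp output")
    page = int(match.group(0))
    if page not in CODEPAGE_TO_MYSQL:
        raise ValueError(f"no known MySQL charset for code page {page}")
    return f"SET NAMES {CODEPAGE_TO_MYSQL[page]};"
```

With the output from the question's machine, set_names_for_chcp("Active code page: 850") gives the same statement as above.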

Here's an example with the connection set to UTF-8:

mysql> show variables like '%char%';
+--------------------------+---------------------------------+
| Variable_name            | Value                           |
+--------------------------+---------------------------------+
| character_set_client     | utf8                            |
| character_set_connection | utf8                            |
| character_set_database   | utf8                            |
| character_set_filesystem | binary                          |
| character_set_results    | utf8                            |
| character_set_server     | utf8                            |
| character_set_system     | utf8                            |
| character_sets_dir       | C:\xampp\mysql\share\charsets\  |
+--------------------------+---------------------------------+
8 rows in set (0.00 sec)

This is what happens to accented characters:

mysql> select nom from assignatura where nom like '%prob%';
+---------------------------------------+
| nom                                   |
+---------------------------------------+
| Probabilitat i Processos Estoc├ástics |
| Probabilitat i Processos Estoc├ástics |
+---------------------------------------+
2 rows in set (0.03 sec)

Notice the extraneous ├ character just before the á. Also, the accent points the wrong way; it should be à.
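
The garbling can be reproduced outside MySQL. The sketch below is only an illustration in Python (not something the mysql client runs): it takes the UTF-8 bytes of the correct word and reads them back as code page 850, which is what the console does when the connection is set to utf8. The function names are made up for the example.

```python
def as_seen_on_cp850_console(text: str) -> str:
    """Encode text as UTF-8, then misread those bytes as cp850."""
    return text.encode("utf-8").decode("cp850")


def undo_misreading(garbled: str) -> str:
    """Reverse the misreading: back to the raw bytes, then decode as UTF-8."""
    return garbled.encode("cp850").decode("utf-8")


# 'à' is the two bytes C3 A0 in UTF-8.
# In cp850, C3 is the box-drawing character '├' and A0 is 'á'.
garbled = as_seen_on_cp850_console("Estocàstics")
restored = undo_misreading(garbled)
```

Here garbled comes out as the string with ├á in it, and restored is the original word again, which shows that the stored data is fine and only the display is wrong.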

After executing SET NAMES cp850;:

mysql> show variables like '%char%';
+--------------------------+--------------------------------+
| Variable_name            | Value                          |
+--------------------------+--------------------------------+
| character_set_client     | cp850                          |
| character_set_connection | cp850                          |
| character_set_database   | utf8                           |
| character_set_filesystem | binary                         |
| character_set_results    | cp850                          |
| character_set_server     | utf8                           |
| character_set_system     | utf8                           |
| character_sets_dir       | C:\xampp\mysql\share\charsets\ |
+--------------------------+--------------------------------+
8 rows in set (0.00 sec)

We finally get the correct accented character:

mysql> select nom from assignatura where nom like '%prob%';
+--------------------------------------+
| nom                                  |
+--------------------------------------+
| Probabilitat i Processos Estocàstics |
| Probabilitat i Processos Estocàstics |
+--------------------------------------+
2 rows in set (0.00 sec)

#3


0  

When you input stuff on the command line, the strings will be in whatever character set the terminal uses. Why the mysql client doesn't translate that before sending it to the db still puzzles me, but it doesn't. You're probably sending latin1 to the db.

You could save your update SQL in a text file, make sure that text file is UTF-8, and run something like type myfile.txt | mysql db_name
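
One way to be sure the file really is UTF-8 is to write it from a script rather than from an editor with unknown settings. This is a rough sketch, assuming Python is available; the function names and the file name are placeholders, and putting SET NAMES utf8 at the top of the script is my own addition so the server knows how to read the bytes that follow.

```python
def build_script(statement: str) -> str:
    """Prefix the statement with SET NAMES utf8 (an assumption, see above)."""
    return "SET NAMES utf8;\n" + statement + "\n"


def write_sql_utf8(path: str, statement: str) -> bytes:
    """Write the script as UTF-8 without a BOM and return the bytes written."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(build_script(statement))
    with open(path, "rb") as handle:
        return handle.read()
```

For example, write_sql_utf8("myfile.txt", "UPDATE access SET Name='ém' WHERE id=2;") produces a file in which é is stored as the two bytes C3 A9, ready to be piped into mysql as described above.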

#4


0  

Well ... 0x82 is e-acute (é) in code page 850. It would be 0xE9 in ISO-8859-1, which makes it 0xC3 0xA9 in UTF-8. I don't know if there is a good way to get a DOS window to handle UTF-8 input correctly. Here is an alternative if you are using the command line client. You can set the client character set to match whatever your local code page is and let the mysql library take care of the transcoding for you:

c:\> mysql --default-character-set=cp850
mysql> \s
--------------
mysql  Ver 14.14 Distrib 5.1.34, for apple-darwin9.6.0 (i386) using readline 5.2

Connection id:         17
Current database:
Current user:          daveshawley@localhost
SSL:                   Not in use
Current pager:         stdout
Using outfile:         ''
Using delimiter:       ;
Server version:        5.1.34-log Source distribution
Protocol version:      10
Connection:            localhost via TCP/IP
Server characterset:   ucs2
Db     characterset:   ucs2
Client characterset:   cp850
Conn.  characterset:   cp850
TCP port:              3306
Uptime:                19 days 8 hours 37 min 55 sec

Threads: 2  Questions: 248  Slow queries: 0  Opens: 71  Flush tables: 1  Open tables: 64  Queries per second avg: 0.0
--------------

I know that this works for the combination of latin1 in one window and utf8 in another window on my MacBook. I also verified that an ALTER TABLE ... CONVERT TO CHARACTER SET ucs2 did the right thing.
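
To make the byte values above concrete, here is a small Python sketch. It is only a model of what happens, not the mysql library's own code: it shows the bytes for é in each encoding, why the bytes '\x82m' from the error message are rejected as utf8, and what the transcoding amounts to once the client charset is declared as cp850. The function names are invented for the example.

```python
def encodings_of(ch: str) -> dict:
    """Return the bytes for one character in the three encodings discussed."""
    return {name: ch.encode(name) for name in ("cp850", "latin-1", "utf-8")}


def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes can be decoded as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def transcode(raw: bytes, declared: str, target: str = "utf-8") -> bytes:
    """Convert bytes from the declared client charset to the target charset."""
    return raw.decode(declared).encode(target)


e_acute = encodings_of("\u00e9")       # \u00e9 is 'é'
typed_in_cmd = b"\x82m"                # what cmd.exe sends for 'ém' under code page 850
rejected = not is_valid_utf8(typed_in_cmd)
stored = transcode(typed_in_cmd, "cp850")
```

A lone 0x82 is a continuation byte in UTF-8 and cannot start a character, which is why the server complains when the connection claims to be utf8. Declared as cp850, the same two bytes convert cleanly to 0xC3 0xA9 followed by 'm'.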
