Rails 2.3 encoding errors on Ruby 1.9.3

Time: 2023-01-18 09:13:11

I'm in the process of upgrading an old legacy Rails 2.3 app to something more modern and running into an encoding issue. I've read all the existing answers I can find on this issue but I'm still running into problems.

Rails ver: 2.3.17 Ruby ver: 1.9.3p385

My MySQL tables are default charset: utf8, collation: utf8_general_ci. Prior to 1.9 I was using the original mysql gem without incident. After upgrading to 1.9, retrieving anything with utf8 characters in it produced this well-documented error:

ActionView::TemplateError (incompatible character encodings: ASCII-8BIT and UTF-8)
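
For anyone who has not hit this before, here is a minimal sketch of how the error arises in plain Ruby 1.9+, with no Rails or MySQL involved. The helper names `concat_error` and `untagged` are made up for illustration; `untagged` only imitates a driver that hands back UTF-8 bytes labelled as ASCII-8BIT.

```ruby
# encoding: utf-8

# Try to concatenate two strings; return the error Ruby raises, or nil.
def concat_error(left, right)
  left + right
  nil
rescue Encoding::CompatibilityError => e
  e
end

# Imitates a driver that returns UTF-8 bytes tagged as binary (ASCII-8BIT).
def untagged(text)
  text.dup.force_encoding(Encoding::ASCII_8BIT)
end

template = "Name: é"              # UTF-8 template fragment
from_db  = untagged("Repoussé")   # same kind of bytes, wrong label

# Both sides contain non-ASCII bytes, so Ruby refuses to join them.
puts concat_error(template, from_db).class

# Re-labelling (not transcoding) the bytes as UTF-8 removes the conflict.
puts template + " " + from_db.dup.force_encoding(Encoding::UTF_8)
```

Note that the error only shows up when both strings contain non-ASCII bytes, which is why pages with plain ASCII data render fine.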

I switched to the mysql2 gem for its superior handling and I no longer see exceptions, but things are definitely not encoding correctly. For example, what appears in the DB as the string Repoussé is being rendered by Rails as RepoussÃ©, “Boat” appears as â€œBoatâ€, etc.
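
This kind of garbling is typical of UTF-8 bytes being read as latin1 somewhere along the way. A small sketch that reproduces it in Ruby, assuming MySQL's latin1 behaves like Windows-1252 for these characters (`misread_as_latin1` is a made-up helper name):

```ruby
# encoding: utf-8

# Take the UTF-8 bytes of a string, pretend they are Windows-1252 (the
# character set MySQL calls latin1), and transcode the result to UTF-8
# so it can be printed.
def misread_as_latin1(text)
  text.dup.force_encoding("Windows-1252").encode("UTF-8")
end

puts misread_as_latin1("Repoussé")
puts misread_as_latin1("“Boat")
```

If the output matches what Rails renders, the bytes are being reinterpreted once too often, rather than being lost or truncated.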

A few more details:

  • I see the same results when I use the ruby-mysql gem as the driver.
  • I've added encoding: utf8 lines to each entry in my database.yml

I've also added the following to my environment.rb:

Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

It has occurred to me that I may have some mismatch where latin1 was being written by the old version of the app into the utf8 fields of the database or something, but all of the characters appear correctly when viewed in the mysql command line client.

Thanks in advance for any advice, much appreciated!

UPDATE: I now believe the issue is that my utf8 data is being coerced through a binary conversion into latin1 on the way out of the db; I'm just not sure where.

mysql> SELECT CONVERT(CONVERT(name USING BINARY) USING latin1) AS latin1, CONVERT(CONVERT(name USING BINARY) USING utf8) AS utf8 FROM items WHERE id=myid;
+-------------+----------+
| latin1      | utf8     |
+-------------+----------+
| RepoussÃ©   | Repoussé |
+-------------+----------+

I have my encoding set to utf8 in database.yml. Any other ideas where this could be coming from?

2 Answers

#1


6  

I finally figured out what my issue was. While my databases were encoded with utf8, the app with the original mysql gem was injecting latin1 text into the utf8 tables.

What threw me off was that the output from the mysql command line client looked correct. It is important to verify that your terminal, the database fields, and the MySQL client are all running in utf8.

MySQL's client runs in latin1 by default. You can discover what it is running in by issuing this query:

show variables like 'char%';

If set up properly for utf8, you should see:

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
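
If you would rather check this from Ruby than by eye, something like the following works on the variable list once you have it as a Hash. This is only a sketch: how you fetch the rows is up to you, the `sample` values are invented, and `non_utf8_variables` is a made-up helper name.

```ruby
# Connection-related variables that should report utf8 (names taken from
# the SHOW VARIABLES output above).
UTF8_VARIABLES = %w[
  character_set_client
  character_set_connection
  character_set_database
  character_set_results
  character_set_server
]

# vars: Hash of variable name => value, however you fetched it.
# Returns the names whose value does not start with "utf8"
# (so utf8mb4 also passes). Missing variables are reported too.
def non_utf8_variables(vars)
  UTF8_VARIABLES.reject { |name| vars[name].to_s.start_with?("utf8") }
end

# Invented sample resembling a client that still talks latin1.
sample = {
  "character_set_client"     => "latin1",
  "character_set_connection" => "latin1",
  "character_set_database"   => "utf8",
  "character_set_results"    => "latin1",
  "character_set_server"     => "utf8"
}

p non_utf8_variables(sample)
```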

If these don't look correct, make sure the following is set in the [client] section of your my.cnf config file:

default-character-set = utf8

And add the following to the [mysqld] section:

# use utf8 by default
character-set-server=utf8
collation-server=utf8_general_ci

Make sure to restart the mysql daemon before relaunching the client and then verify.

NOTE: This doesn't change the charset or collation of existing databases, just ensures that any new databases created will default into utf8 and that the client will display in utf8.

After I did this I saw characters in the mysql client that matched what I was getting from the mysql2 gem. I was also able to verify that this content was latin1 by switching to "encoding: latin1" temporarily in my database.yml.

One extremely handy query for finding issues compares byte length to character length to find the rows with multi-byte characters:

SELECT id, name FROM items WHERE LENGTH(name) != CHAR_LENGTH(name);
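
The same check can be done on the Ruby side, where `String#bytesize` plays the role of LENGTH and `String#length` the role of CHAR_LENGTH. A quick sketch (`multibyte?` is a made-up helper name, and the sample names are just the ones from the question):

```ruby
# encoding: utf-8

# Ruby analogue of LENGTH(name) != CHAR_LENGTH(name): true when the string
# occupies more bytes than it has characters.
def multibyte?(text)
  text.bytesize != text.length
end

names = ["Boat", "Repoussé"]
names.each { |n| puts "#{n}: #{n.length} chars, #{n.bytesize} bytes" }

# Only the names containing multi-byte characters survive the filter.
p names.select { |n| multibyte?(n) }
```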

There are a lot of scripts out there to convert latin1 contents to utf8, but what worked best for me was dumping all of the databases as latin1 and stuffing the contents back in as utf8:

mysqldump -u root -p --opt --default-character-set=latin1 --skip-set-charset  DBNAME > DBNAME.sql

mysql -u root -p --default-character-set=utf8  DBNAME < DBNAME.sql
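
As far as I understand it, the dump and reload above amounts to writing each stored character out as a single latin1 byte and then reading those bytes back as utf8. Here is a per-string sketch of that idea in Ruby, useful for spot-checking a few values before touching the database. It is not what mysqldump runs internally; it assumes latin1 means Windows-1252, and `repair_double_encoding` is a made-up helper name.

```ruby
# encoding: utf-8

# Undo one round of double encoding: turn each character back into its
# Windows-1252 byte, then read the resulting bytes as UTF-8.
# Returns nil when the text cannot be repaired this way, e.g. because it
# was correct to begin with or contains characters outside Windows-1252.
def repair_double_encoding(text)
  repaired = text.encode("Windows-1252").force_encoding("UTF-8")
  repaired.valid_encoding? ? repaired : nil
rescue Encoding::UndefinedConversionError
  nil
end

puts repair_double_encoding("RepoussÃ©")          # double-encoded input
puts repair_double_encoding("Repoussé").inspect   # already-correct input
```

Plain ASCII passes through unchanged, so a nil result is the signal that a value should be left alone.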

I backed up my primary db first, then dumped into a test database and verified like crazy before rolling over to the corrected DB.

My understanding is that MySQL's translation can leave something to be desired with certain more complex characters, but since most of my multibyte chars are fairly common things (accent marks, quotes, etc.), this worked great for me.

Some resources that proved invaluable in sorting all of this out:

#2


1  

You say it all looks OK in the command line client, but perhaps your Terminal's character encoding isn't set to show UTF8? To check in OS X Terminal, click Terminal > Preferences > Settings > Advanced > Character Encoding. Also, check using a graphical tool like MySQL Query Browser at http://dev.mysql.com/downloads/gui-tools/5.0.html.
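
You can also ask Ruby what it thinks the terminal is using. A rough sketch: `Encoding.find("locale")` returns the encoding Ruby derives from the locale environment, which may or may not match what the Terminal app is actually set to display (`locale_encoding` is a made-up helper name).

```ruby
# Encoding Ruby derives from the terminal's locale settings (LANG, LC_ALL, ...).
def locale_encoding
  Encoding.find("locale")
end

puts "locale encoding: #{locale_encoding}"
puts "LANG=#{ENV['LANG'].inspect}"
puts "looks like UTF-8: #{locale_encoding == Encoding::UTF_8}"
```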
