How do I correctly write smart quotes to the database using PHP and MySQL?

Time: 2021-10-24 20:17:03

I have a PHP website with the CLEditor richtext control on it. When I try to write Euro and British Pound symbols to the database, the characters go through just fine because I have the charset set to UTF-8 in the containing page HTML, in the richtext control IFRAME HTML, and in the MySQL table collation. All is well on that front. However, when I try to write smart quotes, I end up seeing this output in the database:

This is a â€œtestâ€.

(If that doesn't show up properly above in your browser, the test word has something like a Latin a, a Euro symbol, and a small OE ligature (œ) in front of the word, and a Latin a and a Euro symbol after it.)

When I use PHP to read that value back out of the database to display it on the page, it ends up as black diamonds with question marks on them as well as some other Latin characters.

What should I be doing to fix this?

4 Solutions

#1


3  

First, make sure your MySQL table is using UTF-8 as its encoding. If it is, it will look like this:

mysql> SHOW CREATE TABLE Users;
CREATE TABLE `Users` (
...
) ENGINE=InnoDB AUTO_INCREMENT=30 DEFAULT CHARSET=utf8 |

Next, make sure your HTML page is set to display UTF-8:

<html>
    <head>
        <meta http-equiv="content-type" content="text/html;charset=UTF-8" />
    </head>
    ....
</html>

Then it should work.


EDIT: I purposefully did not talk about collation, because I thought it was already considered, but for the benefit of everyone, let me add some more to this answer.

You state,

I have the charset set to UTF-8 … in the MySQL table collation.

Table collation is not the same thing as charset.

Collation is the set of rules MySQL uses to compare and sort the characters of a charset, and it matters only FOR THE PURPOSES OF QUERYING; the charset is what determines how the text is stored as bytes. E.g., if you do something like SELECT * FROM foo WHERE bar LIKE '%—%'; (an em dash, U+2014), the collation decides which stored values count as a match, but it does not change the encoding the data was written in or the encoding your connection uses.

Not so coincidentally... if UTF-8 text is treated as latin1 at some point before it reaches a UTF-8 encoded web page, you will get the following:

This is a â€œtestâ€.

That seems to be the output of your problem, exactly. Here's the HTML to duplicate it:

<?php
$string = "This is a “test”.";
?>
<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html;charset=utf8"/>
    </head>
    <body>
        <p><?php echo $string; ?></p>
    </body>
</html>

Make sure you save this file in latin1...
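
The garbling in the question can also be reproduced at the byte level, which may make it easier to spot where a wrong charset crept in. The following is a minimal sketch and not part of the original answer. It assumes PHP 7 or later with the mbstring extension, and it uses Windows-1252 as a stand-in for MySQL's latin1 (the two agree on the bytes involved here). The function name misread_utf8_as_cp1252 is made up for illustration.

```php
<?php
// Hypothetical helper: take a UTF-8 string and decode its raw bytes as if
// they were Windows-1252, returning the (garbled) result as UTF-8.
function misread_utf8_as_cp1252(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
}

$leftQuote = "\u{201C}";            // left smart quote, U+201C

// In UTF-8 the quote is stored as the three bytes E2 80 9C.
echo bin2hex($leftQuote), "\n";     // prints "e2809c"

// Read one byte at a time as Windows-1252, those bytes become
// U+00E2 (a with circumflex), U+20AC (Euro sign) and U+0153 (oe ligature).
$garbled = misread_utf8_as_cp1252($leftQuote);
echo $garbled, "\n";

// Undoing the wrong step recovers the original character.
$repaired = mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
var_dump($repaired === $leftQuote);
```

The closing quote (U+201D) is E2 80 9D in UTF-8, and byte 9D has no printable character in Windows-1252, which would explain why only a Latin a and a Euro symbol are visible after the word in the question's output.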

To see what charset your table is set to, run this query:

-- Replace 'database' and 'table' with your own schema and table names.
SELECT CCSA.character_set_name, TABLE_COLLATION
FROM information_schema.`TABLES` T
JOIN information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA
  ON CCSA.collation_name = T.table_collation
WHERE T.table_schema = 'database'
  AND T.table_name = 'table';

The only proper result for your use (unless you're using multiple non-English languages) is:

+--------------------+-----------------+
| character_set_name | TABLE_COLLATION |
+--------------------+-----------------+
| utf8               | utf8_general_ci |
+--------------------+-----------------+

Thanks for the upvotes ;-)

#2


0  

Make sure that your PHP file sends the following header at the top, before any content is printed. With it, I can take latin1_swedish_ci data into a UTF-8 encoded website and it encodes correctly.

header("Content-type: text/html;charset=UTF-8");

I also put this after my database connection (not sure if this matters as much):

mysql_query("SET NAMES 'utf8'");
mysql_query("SET CHARACTER SET 'utf8'");
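
Note that the mysql_* functions used above were removed in PHP 7.0. On a current PHP version the same idea (telling MySQL which charset the connection uses) could look like the sketch below. It is only an illustration under assumptions: it uses PDO, the host, database name and credentials are placeholders, the helper names build_mysql_dsn and connect_utf8 are hypothetical, and the default of utf8mb4 is my choice rather than something from this thread (the utf8 charset used elsewhere here also covers smart quotes).

```php
<?php
// Hypothetical helper: build a PDO DSN that pins the connection charset,
// so no separate SET NAMES query is needed after connecting.
function build_mysql_dsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// Hypothetical helper: open the connection with exceptions enabled.
// All arguments are placeholders for your own settings.
function connect_utf8(string $host, string $dbname, string $user, string $pass): PDO
{
    return new PDO(build_mysql_dsn($host, $dbname), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

// Show the DSN that would be used; no server is contacted here.
echo build_mysql_dsn('localhost', 'example_db'), "\n";
```

With mysqli instead of PDO, calling $mysqli->set_charset('utf8mb4') right after connecting serves the same purpose.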

#3


0  

For what it's worth to anyone else coming across this post, I found that adding these mysqld configuration lines (if you have access to the MySQL server and can make changes) solved my problem with the curly quotes.

http://dev.mysql.com/doc/refman/5.6/en/charset-server.html

# Force UTF8 Charset Encoding
skip-character-set-client-handshake
collation_server=utf8_unicode_ci
character_set_server=utf8

I had double-checked the SQL being called from PHP (which appeared fine), and also manually executed an insert/update statement with curly quotes from my GUI (which worked fine), but statements issued from the web server were still inserting the garbled multi-character sequences into the database.

I checked my mysql server variables and noticed latin1 was the default for the server, and the database (even though the table/columns were UTF8). Once I added the lines above and refreshed the page that issued the update statement, the curly quotes did insert correctly. I can only assume this had something to do with our server's default charset being latin1 and the web server mysql library handshake negotiating as such.
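
The check of the server variables described above can be scripted. The sketch below is only an illustration under assumptions: it presumes you already have a working PDO connection, and both function names (fetch_charset_variables and non_utf8_variables) are hypothetical. It lists the charset variables that are not some flavour of utf8.

```php
<?php
// Hypothetical helper: read the charset-related server variables as
// name => value pairs (SHOW VARIABLES returns exactly two columns).
function fetch_charset_variables(PDO $pdo): array
{
    return $pdo->query("SHOW VARIABLES LIKE 'character_set%'")
               ->fetchAll(PDO::FETCH_KEY_PAIR);
}

// Hypothetical helper: return the variables whose value does not start
// with "utf8" (so utf8, utf8mb3 and utf8mb4 all pass).
function non_utf8_variables(array $vars): array
{
    $relevant = [
        'character_set_client',
        'character_set_connection',
        'character_set_results',
        'character_set_database',
        'character_set_server',
    ];
    $suspect = [];
    foreach ($relevant as $name) {
        if (isset($vars[$name]) && strpos($vars[$name], 'utf8') !== 0) {
            $suspect[$name] = $vars[$name];
        }
    }
    return $suspect;
}

// Example with made-up values resembling the situation described above.
$sample = [
    'character_set_client'   => 'utf8',
    'character_set_database' => 'latin1',
    'character_set_server'   => 'latin1',
];
print_r(non_utf8_variables($sample));
```

With a live connection, non_utf8_variables(fetch_charset_variables($pdo)) would report what that particular connection negotiated, which may differ from what a GUI client sees.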

#4


-1  

I found the answer here:

https://*.com/a/1262210/105539

This seems to not disturb my Euro and British Pound characters either.
