Normalizing accented characters in MySQL queries

Date: 2022-03-31 20:19:39

I'd like to be able to do queries that normalize accented characters, so that for example:

é, è, and ê

are all treated as 'e' in queries using '=' and 'like'. I have a row with a username field set to 'rené', and I'd like to be able to match it with both 'rene' and 'rené'.

I'm attempting to do this with the COLLATE clause in MySQL 5.0.8, but I get the following error:

mysql> select * from User where username = 'rené' collate utf8_general_ci;
ERROR 1253 (42000): COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'latin1'

FWIW, my table was created with:

CREATE TABLE `User` (
  `id` bigint(19) NOT NULL auto_increment,
  `username` varchar(32) NOT NULL,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `uniqueUsername` (`username`)
) ENGINE=InnoDB AUTO_INCREMENT=56790 DEFAULT CHARSET=utf8

4 solutions

#1

The reason for the error is not the table but the character set of your input, i.e. the 'rené' literal in your query. The behaviour depends on the character_set_connection variable:

The character set used for literals that do not have a character set introducer and for number-to-string conversion.

In the MySQL client, change it with SET NAMES:

A SET NAMES 'charset_name' statement is equivalent to these three statements:

SET character_set_client = charset_name;
SET character_set_results = charset_name;
SET character_set_connection = charset_name;

(from http://dev.mysql.com/doc/refman/5.5/en/charset-connection.html)

Example output:

mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from User where username = 'rené' collate utf8_general_ci;
ERROR 1253 (42000): COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'latin1'

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from User where username = 'rené' collate utf8_general_ci;
Empty set (0.00 sec)

Alternatively, you can explicitly set the character set of the literal using a 'character set introducer':

mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from User where username = _utf8'rené' collate utf8_general_ci;
Empty set (0.00 sec)

I know this question is pretty old, but since Google led me here for a related question, I thought it still deserved an answer :)

#2

I'd suggest that you save a normalized version of each username in your table alongside the real one. Changing the encoding on the fly can be expensive, and you have to redo the conversion for every row on every search.

If you're using PHP, you can use iconv() to handle the conversion:

$username = 'rené';
$normalized = iconv('UTF-8', 'ASCII//TRANSLIT', $username);

Then you'd just save both versions, using the normalized one for searching and the original username for display. Comparing and selecting will be a lot faster on the normalized column, provided that you also normalize the search string:

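// Assumes $pdo is an existing PDO connection opened with charset=utf8 (mysql_* was removed in PHP 7).
$search = iconv('UTF-8', 'ASCII//TRANSLIT', $_GET['search']);
$stmt = $pdo->prepare("SELECT * FROM User WHERE normalized LIKE ?");
$stmt->execute(['%' . $search . '%']);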

Of course, this method might not be viable if you have several columns that need normalization, but in your specific case it should work all right.
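A minimal sketch of the normalizing step, assuming PHP 7+ and a hypothetical `normalize_username()` helper whose result is stored in the assumed `normalized` column. Because the output of `ASCII//TRANSLIT` can vary with the platform's iconv implementation and the current locale, this sketch swaps in an explicit `strtr()` map instead; it only folds the accented letters it lists, so the map would need extending for your actual data.

```php
<?php
// Hypothetical helper: builds the search key stored in the (assumed)
// `normalized` column; apply the same function to search strings.
// Only the accented letters listed in $map are folded.
function normalize_username(string $name): string
{
    $map = [
        'á' => 'a', 'à' => 'a', 'â' => 'a', 'ä' => 'a',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'í' => 'i', 'ì' => 'i', 'î' => 'i', 'ï' => 'i',
        'ó' => 'o', 'ò' => 'o', 'ô' => 'o', 'ö' => 'o',
        'ú' => 'u', 'ù' => 'u', 'û' => 'u', 'ü' => 'u',
        'É' => 'E', 'È' => 'E', 'Ê' => 'E', 'Ë' => 'E',
    ];
    // strtr() with an array matches each multibyte UTF-8 key as a whole
    // string, so it is safe on UTF-8 input. Letter case is left alone;
    // a _ci collation on the column can take care of that.
    return strtr($name, $map);
}

$username   = 'rené';
$normalized = normalize_username($username);
echo $normalized, "\n"; // rene
```

The explicit map trades completeness for predictability: every stored key is produced by the same table regardless of the server's iconv build.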

#3

I have implemented the equivalent of PHP's strtr function / the Unix tr command in MySQL; you can get the source here.

You can use it like this:

SELECT tr(name, 'áäèëî', 'aaeei') FROM persons

or, to strip some characters:

SELECT tr(name, 'áäèëî', null) FROM persons
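The author's MySQL source isn't included in this post, so the following is only a sketch of the semantics the two queries imply, written in PHP (where `strtr()` is the function the author mentions) rather than as a MySQL stored function. The name `tr` mirrors the examples; treating characters without a counterpart in the replacement list (or all of them, when it is `null`) as deleted is an assumption based on the second query.

```php
<?php
// Hypothetical PHP sketch of the tr() behaviour shown in the queries above;
// this is not the author's MySQL implementation. Each character of $from is
// replaced by the character at the same position in $to; characters with no
// counterpart (all of them when $to is null) are deleted.
function tr(string $s, string $from, ?string $to): string
{
    // Split on UTF-8 character boundaries, not bytes ('á' is two bytes).
    $fromChars = preg_split('//u', $from, -1, PREG_SPLIT_NO_EMPTY);
    $toChars   = $to === null ? [] : preg_split('//u', $to, -1, PREG_SPLIT_NO_EMPTY);

    $map = [];
    foreach ($fromChars as $i => $char) {
        $map[$char] = $toChars[$i] ?? '';
    }

    // With an array argument, strtr() never re-scans text it has already
    // replaced, so each mapping is applied once.
    return strtr($s, $map);
}

echo tr('Noël Chloë', 'áäèëî', 'aaeei'), "\n"; // Noel Chloe
echo tr('Noël', 'áäèëî', null), "\n";          // Nol
```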

#4

$normalized = iconv('UTF-8', 'ASCII//TRANSLIT', $string);

is a perfect PHP solution, but what about MySQL? CONVERT?

In MySQL:

SELECT 'Álvaro José' as accented, (CONVERT ('Álvaro José' USING ascii)) as notaccented

Produces:

Álvaro José     ?lvaro Jos?

The accented characters are not converted to their unaccented equivalents, so this is not equivalent to iconv's TRANSLIT.

REGEXP doesn't help either: in MySQL 5.x it works byte-wise and is not multibyte-safe, so it doesn't handle UTF-8 properly.

So there is no solution.
