How to filter a column for non-accented characters with a SELECT query

Date: 2022-03-23 07:37:44

I have a MySQL table (test) that uses the utf8 character set. It has three rows: two whose names contain only plain characters and one whose name contains accented characters.

CREATE TABLE test (
  id Integer,
  name VARCHAR(50), 
  PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

INSERT INTO `test` (`id`, `name`) VALUES (1, 'aaaa');
INSERT INTO `test` (`id`, `name`) VALUES (2, 'AAAA');
INSERT INTO `test` (`id`, `name`) VALUES (3, 'áááá');

If I run the following SELECT query, it returns all three rows:

Actual result:

select * from test where name like '%aa%';

id  | name
----|----
1   | aaaa
2   | AAAA
3   | áááá

Instead, it should not return the last row (id=3).

I don't want to use 'BINARY' or 'COLLATE utf8_bin', because they make the search case-sensitive.

I need a normal LIKE search that ignores case but still treats accented characters as different, e.g.:

Expected result:

select * from test where name like '%aa%';

id | name
---|-----
1  | aaaa
2  | AAAA
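
To make the gap between the actual and expected results concrete, here is a small Python model built from plain string operations (it is not MySQL). It assumes the default collation behaves roughly like "strip accents, then ignore case", which only approximates utf8_general_ci; the helpers like_to_regex, fold_accents_and_case and ids_matching are hypothetical names used for illustration.

```python
import re
import unicodedata

# The three rows from the question's `test` table ("\u00e1" is "á").
ROWS = [(1, "aaaa"), (2, "AAAA"), (3, "\u00e1" * 4)]


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern ('%' and '_' wildcards, no ESCAPE) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def fold_accents_and_case(s):
    """Rough approximation of an accent- and case-insensitive collation."""
    decomposed = unicodedata.normalize("NFD", s)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.casefold()


def ids_matching(pattern, key):
    """IDs of rows whose name LIKE pattern, after mapping both sides through `key`."""
    regex = like_to_regex(key(pattern))
    return [row_id for row_id, name in ROWS
            if re.fullmatch(regex, key(name), re.DOTALL)]


# Accents and case both ignored: all three rows (the actual result).
print(ids_matching("%aa%", fold_accents_and_case))  # [1, 2, 3]
# Only case ignored, accents kept distinct: rows 1 and 2 (the expected result).
print(ids_matching("%aa%", str.casefold))  # [1, 2]
```

In other words, the question asks for the second key function: case folding without accent folding.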

5 solutions

#1 (6 votes)

The utf8_bin collation is what your requirement needs, because it treats accented characters as distinct.

> I don't want to use 'BINARY' OR 'COLLATE utf8_bin' because it returns only case sensitive search.

Case sensitivity is easier (and cheaper) to work around with utf8_bin than the accent issue is to solve with another collation:

SELECT * FROM test WHERE LOWER(name) LIKE '%aa%' COLLATE utf8_bin;
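
A rough Python model of what this query does, assuming utf8_bin compares strings character by character (code point by code point) with no case or accent folding. Only the column side is lowercased, so an uppercase pattern finds nothing. lower_like_bin and like_to_regex are hypothetical helper names, not MySQL functions.

```python
import re

# The three rows from the question ("\u00e1" is "á").
ROWS = [(1, "aaaa"), (2, "AAAA"), (3, "\u00e1" * 4)]


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern ('%' and '_' wildcards, no ESCAPE) into a regex."""
    return "".join(".*" if c == "%" else "." if c == "_" else re.escape(c)
                   for c in pattern)


def lower_like_bin(value, pattern):
    """Model of `LOWER(value) LIKE pattern COLLATE utf8_bin`:
    the column is lowercased, then characters must match exactly."""
    return re.fullmatch(like_to_regex(pattern), value.lower(), re.DOTALL) is not None


print([i for i, name in ROWS if lower_like_bin(name, "%aa%")])  # [1, 2]
# The pattern itself is not lowercased, so this matches nothing.
print([i for i, name in ROWS if lower_like_bin(name, "%AA%")])  # []
```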

Edit (added after comments):

The query above assumes the search pattern is already in lowercase. If you can't guarantee that, lowercase the pattern as well:

SELECT * FROM test WHERE LOWER(name) LIKE LOWER('%ÚÙ%') COLLATE utf8_bin;
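
Extending the same kind of Python model: lowercasing the pattern too makes the search case-insensitive on both sides, while accented characters still count as different. Rows 4 and 5 below are hypothetical, added only to exercise the '%ÚÙ%' pattern; lower_both_like_bin is a hypothetical helper name.

```python
import re

# Rows 1-3 are from the question; rows 4 and 5 are hypothetical.
ROWS = [(1, "aaaa"), (2, "AAAA"), (3, "\u00e1" * 4), (4, "\u00da\u00f9x"), (5, "uuuu")]


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern ('%' and '_' wildcards, no ESCAPE) into a regex."""
    return "".join(".*" if c == "%" else "." if c == "_" else re.escape(c)
                   for c in pattern)


def lower_both_like_bin(value, pattern):
    """Model of `LOWER(value) LIKE LOWER(pattern) COLLATE utf8_bin`."""
    regex = like_to_regex(pattern.lower())
    return re.fullmatch(regex, value.lower(), re.DOTALL) is not None


print([i for i, name in ROWS if lower_both_like_bin(name, "%AA%")])  # [1, 2]
# "\u00da\u00d9" is "ÚÙ"; it matches row 4 ("Úùx") but not "uuuu".
print([i for i, name in ROWS if lower_both_like_bin(name, "%\u00da\u00d9%")])  # [4]
```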

#2 (4 votes)

utf8_bin is the collation you want in order to distinguish accented characters.

In the query, you can use LOWER() to make the search case-insensitive.

CREATE TABLE `token` (
  `id` int(11) NOT NULL DEFAULT '0',
  `name` varchar(50) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

mysql> select * from token where lower(name) like '%aa%';
+----+------+
| id | name |
+----+------+
|  1 | aaaa |
|  2 | AAAA |
+----+------+
2 rows in set (0.00 sec)

#3 (1 vote)

You can solve your problem using the following query:

select * from test where convert(name using ascii) like '%aa%';

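CONVERT(... USING ...) converts a string between character sets. Accented characters such as á have no ASCII equivalent, so after the conversion 'áááá' no longer matches '%aa%'.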
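
A Python model of this trick, assuming (as we understand MySQL's behavior for characters the target character set cannot represent) that each accented character becomes '?', and that the ASCII result is then compared case-insensitively. convert_using_ascii and like_ci are hypothetical helpers, not MySQL APIs.

```python
import re

# The three rows from the question ("\u00e1" is "á").
ROWS = [(1, "aaaa"), (2, "AAAA"), (3, "\u00e1" * 4)]


def convert_using_ascii(s):
    """Model of CONVERT(s USING ascii): characters outside ASCII become '?'."""
    return s.encode("ascii", errors="replace").decode("ascii")


def like_ci(value, pattern):
    """Case-insensitive LIKE supporting only the '%' wildcard (enough for '%aa%')."""
    regex = ".*".join(re.escape(part) for part in pattern.casefold().split("%"))
    return re.fullmatch(regex, value.casefold(), re.DOTALL) is not None


print(convert_using_ascii("\u00e1" * 4))  # ????
print([i for i, name in ROWS if like_ci(convert_using_ascii(name), "%aa%")])  # [1, 2]
```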

#4 (1 vote)

Using RLIKE (REGEXP) could solve your problem: it returns your expected result by using a more powerful version of LIKE.

From the MySQL documentation:
"A regular expression is a powerful way of specifying a pattern for a complex search. ... REGEXP is not case sensitive, except when used with binary strings."

Just replace

where name like '%aa%'

with

where name rlike 'aa';

to do a case-insensitive search for the pattern 'aa'.
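
For comparison, a Python model of `name RLIKE 'aa'` under a case-insensitive collation, assuming the regex ignores case but still treats á as a different character from a. re.search is used because, like RLIKE, it matches anywhere in the string, which is why the pattern needs no '%' wildcards.

```python
import re

# The three rows from the question ("\u00e1" is "á").
ROWS = [(1, "aaaa"), (2, "AAAA"), (3, "\u00e1" * 4)]

# Unanchored, case-insensitive search; "á" does not case-fold to "a".
matches = [i for i, name in ROWS if re.search("aa", name, re.IGNORECASE)]
print(matches)  # [1, 2]
```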

But: according to the MySQL documentation, this approach can be somewhat unsafe, because REGEXP may produce unexpected results when comparing multi-byte characters.
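
The multi-byte caveat can be illustrated in Python by running the same regular expression over characters and over raw UTF-8 bytes. The assumption is that a byte-wise REGEXP engine (as we understand it, MySQL before 8.0.4 used one) sees each 'á' as two bytes, so patterns that count characters, such as '.', go wrong.

```python
import re

name = "\u00e1" * 4  # "áááá"
utf8_bytes = name.encode("utf-8")

# Character-aware matching: '.' consumes one character, so four dots match.
print(len(name), re.fullmatch("....", name) is not None)  # 4 True
# Byte-wise matching: each "á" is two bytes, so four dots cover only half the value.
print(len(utf8_bytes), re.fullmatch(rb"....", utf8_bytes) is not None)  # 8 False
```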

#5 (1 vote)

You can try:

SELECT * FROM test.test
where convert(name using ascii) like '%aa%';

But be careful: wrapping the column in CONVERT() prevents MySQL from using an index on it, which can hurt performance. More information at http://dev.mysql.com/doc/refman/5.7/en/mysql-indexes.html