Sorting non-Latin characters in a database using "order by"

Time: 2022-09-22 22:47:32

I just ran into some surprising behavior in the database "order by" clause. For string comparison, I expected characters such as '[' and '_' to sort after Latin letters and digits such as 'I' or '2', given their positions in the ASCII table. However, the results of "order by" differ from what I expected. Here's my test:

SQLite version 3.6.23
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table products(name varchar(10));
sqlite> insert into products values('ipod');
sqlite> insert into products values('iphone');
sqlite> insert into products values('[apple]');
sqlite> insert into products values('_ipad');
sqlite> select * from products order by name asc;
[apple]
_ipad
iphone
ipod

select * from products order by name asc;
name
...
[B@
_ref
123
1ab
...

This behavior differs from Java's string comparison, which cost me some time when tracking down this issue. I can reproduce it in both SQLite 3.6.23 and Microsoft SQL Server 2005. I searched the web but could not find any related documentation. Could someone shed some light on it? Is it part of the SQL standard? Where can I find more information about this? Thanks in advance.
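
For comparison, here is a minimal Python sketch of what a plain code-point sort gives for the two sample sets in the question. For these ASCII-only strings this is the same rule Java's String.compareTo applies. The variable names are illustrative only. The first result matches the SQLite output above; the second puts the digit-initial values first, unlike the second listing, which suggests (as an inference, not something the question states) that the second listing came from a non-binary collation.

```python
# Plain code-point ordering of the sample values from the question.
first = ["ipod", "iphone", "[apple]", "_ipad"]
second = ["[B@", "_ref", "123", "1ab"]

first_sorted = sorted(first)
second_sorted = sorted(second)

print(first_sorted)   # ['[apple]', '_ipad', 'iphone', 'ipod']
print(second_sorted)  # ['123', '1ab', '[B@', '_ref']

# Code points of the characters discussed in the question.
codes = {ch: ord(ch) for ch in "2I[_i"}
print(codes)  # {'2': 50, 'I': 73, '[': 91, '_': 95, 'i': 105}
```

So under binary comparison '[' and '_' do sort after 'I' and '2', but before lower-case 'i'.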

3 answers

#1


2  

The concept of comparing and ordering the characters in a database is called collation.

How strings are sorted and compared depends on the collation, which is usually set in the server, client, or session properties.

In MySQL:

SELECT  *
FROM    (
        SELECT  'a' AS str
        UNION ALL
        SELECT  'A' AS str
        UNION ALL
        SELECT  'b' AS str
        UNION ALL
        SELECT  'B' AS str
        ) q
ORDER BY
        str COLLATE UTF8_BIN


--
'A'
'B'
'a'
'b'

and

SELECT  *
FROM    (
        SELECT  'a' AS str
        UNION ALL
        SELECT  'A' AS str
        UNION ALL
        SELECT  'b' AS str
        UNION ALL
        SELECT  'B' AS str
        ) q
ORDER BY
        str COLLATE UTF8_GENERAL_CI


--
'a'
'A'
'b'
'B'

UTF8_BIN sorts characters by their Unicode code points. Uppercase letters have lower code points than lowercase ones and therefore come first.

UTF8_GENERAL_CI sorts characters according to their alphabetical position, disregarding case.

Collation is also important for indexes, since the indexes rely heavily on sorting and comparison rules.
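
The same contrast can be reproduced in SQLite, the engine used in the question. The sketch below assumes Python's built-in sqlite3 module and uses SQLite's built-in BINARY and NOCASE collations as rough counterparts of the two MySQL collations above; they are not identical to them. The table name t is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (str TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("a",), ("A",), ("b",), ("B",)])

# BINARY compares the stored bytes, so upper-case letters come first.
binary_order = [row[0] for row in conn.execute(
    "SELECT str FROM t ORDER BY str COLLATE BINARY")]

# NOCASE folds ASCII upper-case letters to lower case before comparing.
nocase_order = [row[0] for row in conn.execute(
    "SELECT str FROM t ORDER BY str COLLATE NOCASE")]

conn.close()

print(binary_order)  # ['A', 'B', 'a', 'b']
print(nocase_order)  # both a/A rows come before both b/B rows
```

Under NOCASE, 'a' and 'A' compare as equal, so their relative order within the pair is not guaranteed.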

#2


1  

The important keyword in this case is 'collation'. I have no experience with SQLite, but would expect it to be similar to other database engines in that you can define the collation to use for whole databases, single tables, per connection, etc.

Check your DB documentation for the options available to you.
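
For SQLite specifically, one option is to declare a collation on the column and override it per query where needed. The following is a minimal sketch, assuming Python's built-in sqlite3 module; the sample values are adapted from the question, with one name capitalized to make the difference visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The column-level collation becomes the default for ORDER BY on this column.
conn.execute("CREATE TABLE products (name TEXT COLLATE NOCASE)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("ipod",), ("Iphone",), ("[apple]",), ("_ipad",)],
)

# Uses the column's NOCASE collation: 'Iphone' is compared as 'iphone'.
by_column_default = [row[0] for row in conn.execute(
    "SELECT name FROM products ORDER BY name")]

# An explicit COLLATE in the query overrides the column default.
by_binary = [row[0] for row in conn.execute(
    "SELECT name FROM products ORDER BY name COLLATE BINARY")]

conn.close()

print(by_column_default)  # ['[apple]', '_ipad', 'Iphone', 'ipod']
print(by_binary)          # ['Iphone', '[apple]', '_ipad', 'ipod']
```

Other engines expose collation settings at different levels, so the exact syntax here should not be assumed to carry over.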

#3


0  

The ASCII codes for lower-case characters such as 'i' are greater than the ones for '[' and '_':

'i': 105
'[': 91
'_': 95

However, try inserting values that start with an upper-case letter, e.g. "IPOD" or "Iphone"; those will sort before "_" and "[" under the default binary collation.
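
This can be checked with a short script. The sketch below assumes Python's built-in sqlite3 module and recreates the table from the question with the two suggested upper-case values added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name VARCHAR(10))")

names = ["ipod", "iphone", "[apple]", "_ipad", "IPOD", "Iphone"]
conn.executemany("INSERT INTO products VALUES (?)", [(n,) for n in names])

# No COLLATE clause, so SQLite falls back to its default BINARY collation.
ordered = [row[0] for row in conn.execute(
    "SELECT name FROM products ORDER BY name ASC")]
conn.close()

print(ordered)
# ['IPOD', 'Iphone', '[apple]', '_ipad', 'iphone', 'ipod']
```

'I' has code 73, which is below 91 for '[' and 95 for '_', so the upper-case names come first.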
