SQL Server中奇怪的比较结果

时间:2021-01-23 23:39:02

For characters that are do not have an explicit order that depends on the collation (i.e., that are not case-folded and/or do not have accents stripped), the ordering is based on the ordinal value of the code point in the base character set. If one string is a prefix of another, the shorter string collates first.

对于不具有显式顺序的字符,则不具有依赖于排序的显式顺序。如果一个字符串是另一个字符串的前缀,则较短的字符串会首先排序。

The first 128 code points in Unicode are the same as the ASCII character set, which means / comes after -. ---- From cco's answer.

Unicode中的前128个代码点与ASCII字符集相同,这意味着/在-之后。- - - - -从cco的答案。

Then why would the following happen?

那么为什么会发生以下情况呢?

SQL Server中奇怪的比较结果

I expect '2017/8/22 1:33:53' to come after '2017-08-13 23:12:33.411' because / comes after -. Is this because special ordering in Chinese_PRC_CI_AS collation? Where can I find the specification if that's the case?

我期待着“2017/8/22 1:33:53”在“2017-08-13 23:12:33411”之后,因为…这是因为Chinese_PRC_CI_AS排序的特殊顺序吗?如果是这样的话,我在哪里可以找到说明书?

3 个解决方案

#1


3  

I expect '2017/8/22 1:33:53' to come after '2017-08-13 23:12:33.411' because / comes after -. Is this because special ordering in Chinese_PRC_CI_AS collation?

我期待着“2017/8/22 1:33:53”在“2017-08-13 23:12:33411”之后,因为…这是因为Chinese_PRC_CI_AS排序的特殊顺序吗?

Yes it's because of Chinese_PRC_CI_AS collation

是的,因为Chinese_PRC_CI_AS排序

Where can I find the specification if that's the case?

如果是这样的话,我在哪里可以找到说明书?

The rules are complicated enough but you can find them here: UNICODE COLLATION ALGORITHM

规则非常复杂,但是您可以在这里找到它们:UNICODE排序算法

I extract some text from it as a picture to give you an idea: SQL Server中奇怪的比较结果

我从中提取一些文字作为图片,给你一个想法:

In your case you can try to use a binary collation Chinese_PRC_BIN2, it will give you the desirable result in this particular case, but it can be non acceptable for letters sorting, I mean in european languages binary sort will always put Upper case letters before all the lower case letters, but I have no idea how Chinese symbols should be sorted

在你的情况下,您可以尝试使用二进制排序Chinese_PRC_BIN2,它会给你的结果在这个特殊情况下,但它可以不接受字母排序,我的意思是在欧洲语言二进制排序总是把大写字母之前所有的小写字母,但是我不知道应该如何排序的中国符号

Here is the code for sort your dates with your order (you said you use UNICODE columns):

这是您的日期和订单排序的代码(您说过您使用UNICODE列):

declare @t table (s nvarchar(100))
insert into @t values (N'2017/8/22 1:33:53'),  (N'2017-08-13 23:12:33.411')

select *
from @t
order by s collate Chinese_PRC_BIN2;

Bin2 collation will work in "non unicode case" as your example in the picture, too:

Bin2排序将在“非unicode的情况”中工作,就像你在图片中的例子一样:

select case 
      when '2017/8/22 1:33:53' collate Chinese_PRC_BIN2 < 
           '2017-08-13 23:12:33.411' collate Chinese_PRC_BIN2 
      then 'TRUE' 
      else 'FALSE' 
   end;

SQL Server中奇怪的比较结果

#2


1  

It is not only about ASCII codes.

它不仅仅是关于ASCII码。

Yes, ASCII code of / is 0x2F and ASCII code of - is 0x2D, but string comparison rules depend on collation and these rules can be quite complex, taking into account not only values of the character code.

是的,ASCII码的/是0x2F, ASCII码的-是0x2D,但是字符串比较规则依赖于排序,这些规则可能非常复杂,不仅要考虑字符代码的值。

The - symbol is treated in a special way in some collations.

在某些排序中,符号被用一种特殊的方式处理。

Here is a full example:

这里有一个完整的例子:

DECLARE @T1 TABLE (Value varchar(100) COLLATE Chinese_PRC_CI_AS);
DECLARE @T2 TABLE (Value varchar(100) COLLATE Latin1_General_CI_AS);
DECLARE @T3 TABLE (Value varchar(100) COLLATE SQL_Latin1_General_CP1_CI_AS);

INSERT INTO @T1 VALUES
('abc'),
('abc-def'),
('abcdef'),
('abc-');

INSERT INTO @T2 VALUES
('abc'),
('abc-def'),
('abcdef'),
('abc-');

INSERT INTO @T3 VALUES
('abc'),
('abc-def'),
('abcdef'),
('abc-');

SELECT * FROM @T1 ORDER BY Value;
SELECT * FROM @T2 ORDER BY Value;
SELECT * FROM @T3 ORDER BY Value;

Result

结果

T1 (Chinese_PRC_CI_AS)

T1(Chinese_PRC_CI_AS)

+---------+
|  Value  |
+---------+
| abc     |
| abc-    |
| abcdef  |
| abc-def |
+---------+

T2 (Latin1_General_CI_AS)

T2(Latin1_General_CI_AS)

+---------+
|  Value  |
+---------+
| abc     |
| abc-    |
| abcdef  |
| abc-def |
+---------+

T3 (SQL_Latin1_General_CP1_CI_AS)

T3(SQL_Latin1_General_CP1_CI_AS)

+---------+
|  Value  |
+---------+
| abc     |
| abc-    |
| abc-def |
| abcdef  |
+---------+

Note, that SQL collation in the third table produced result in a different order.

注意,第三个表中的SQL排序会产生不同的顺序。


Note also, that if you change the column type to nvarchar, the effect disappears in this specific example. In other words, rules for treating the - symbol depend on collation and on the type of the value.

还要注意,如果将列类型更改为nvarchar,那么在这个特定示例中效果将消失。换句话说,处理符号的规则依赖于排序和值的类型。

#3


0  

For characters that are do not have an explicit order that depends on the collation (i.e., that are not case-folded and/or do not have accents stripped), the ordering is based on the ordinal value of the code point in the base character set. If one string is a prefix of another, the shorter string collates first.
The first 128 code points in Unicode are the same as the ASCII character set, which means / comes after -.

对于不具有显式顺序的字符,则不具有依赖于排序的显式顺序。,没有大小写折叠和/或没有去掉重音),排序基于基字符集中代码点的序数值。Unicode中的前128个代码点与ASCII字符集相同,这意味着/在-之后。

#1


3  

I expect '2017/8/22 1:33:53' to come after '2017-08-13 23:12:33.411' because / comes after -. Is this because special ordering in Chinese_PRC_CI_AS collation?

我期待着“2017/8/22 1:33:53”在“2017-08-13 23:12:33411”之后,因为…这是因为Chinese_PRC_CI_AS排序的特殊顺序吗?

Yes it's because of Chinese_PRC_CI_AS collation

是的,因为Chinese_PRC_CI_AS排序

Where can I find the specification if that's the case?

如果是这样的话,我在哪里可以找到说明书?

The rules are complicated enough but you can find them here: UNICODE COLLATION ALGORITHM

规则非常复杂,但是您可以在这里找到它们:UNICODE排序算法

I extract some text from it as a picture to give you an idea: SQL Server中奇怪的比较结果

我从中提取一些文字作为图片,给你一个想法:

In your case you can try to use a binary collation Chinese_PRC_BIN2, it will give you the desirable result in this particular case, but it can be non acceptable for letters sorting, I mean in european languages binary sort will always put Upper case letters before all the lower case letters, but I have no idea how Chinese symbols should be sorted

在你的情况下,您可以尝试使用二进制排序Chinese_PRC_BIN2,它会给你的结果在这个特殊情况下,但它可以不接受字母排序,我的意思是在欧洲语言二进制排序总是把大写字母之前所有的小写字母,但是我不知道应该如何排序的中国符号

Here is the code for sort your dates with your order (you said you use UNICODE columns):

这是您的日期和订单排序的代码(您说过您使用UNICODE列):

declare @t table (s nvarchar(100))
insert into @t values (N'2017/8/22 1:33:53'),  (N'2017-08-13 23:12:33.411')

select *
from @t
order by s collate Chinese_PRC_BIN2;

Bin2 collation will work in "non unicode case" as your example in the picture, too:

Bin2排序将在“非unicode的情况”中工作,就像你在图片中的例子一样:

select case 
      when '2017/8/22 1:33:53' collate Chinese_PRC_BIN2 < 
           '2017-08-13 23:12:33.411' collate Chinese_PRC_BIN2 
      then 'TRUE' 
      else 'FALSE' 
   end;

SQL Server中奇怪的比较结果

#2


1  

It is not only about ASCII codes.

它不仅仅是关于ASCII码。

Yes, ASCII code of / is 0x2F and ASCII code of - is 0x2D, but string comparison rules depend on collation and these rules can be quite complex, taking into account not only values of the character code.

是的,ASCII码的/是0x2F, ASCII码的-是0x2D,但是字符串比较规则依赖于排序,这些规则可能非常复杂,不仅要考虑字符代码的值。

The - symbol is treated in a special way in some collations.

在某些排序中,符号被用一种特殊的方式处理。

Here is a full example:

这里有一个完整的例子:

DECLARE @T1 TABLE (Value varchar(100) COLLATE Chinese_PRC_CI_AS);
DECLARE @T2 TABLE (Value varchar(100) COLLATE Latin1_General_CI_AS);
DECLARE @T3 TABLE (Value varchar(100) COLLATE SQL_Latin1_General_CP1_CI_AS);

INSERT INTO @T1 VALUES
('abc'),
('abc-def'),
('abcdef'),
('abc-');

INSERT INTO @T2 VALUES
('abc'),
('abc-def'),
('abcdef'),
('abc-');

INSERT INTO @T3 VALUES
('abc'),
('abc-def'),
('abcdef'),
('abc-');

SELECT * FROM @T1 ORDER BY Value;
SELECT * FROM @T2 ORDER BY Value;
SELECT * FROM @T3 ORDER BY Value;

Result

结果

T1 (Chinese_PRC_CI_AS)

T1(Chinese_PRC_CI_AS)

+---------+
|  Value  |
+---------+
| abc     |
| abc-    |
| abcdef  |
| abc-def |
+---------+

T2 (Latin1_General_CI_AS)

T2(Latin1_General_CI_AS)

+---------+
|  Value  |
+---------+
| abc     |
| abc-    |
| abcdef  |
| abc-def |
+---------+

T3 (SQL_Latin1_General_CP1_CI_AS)

T3(SQL_Latin1_General_CP1_CI_AS)

+---------+
|  Value  |
+---------+
| abc     |
| abc-    |
| abc-def |
| abcdef  |
+---------+

Note, that SQL collation in the third table produced result in a different order.

注意,第三个表中的SQL排序会产生不同的顺序。


Note also, that if you change the column type to nvarchar, the effect disappears in this specific example. In other words, rules for treating the - symbol depend on collation and on the type of the value.

还要注意,如果将列类型更改为nvarchar,那么在这个特定示例中效果将消失。换句话说,处理符号的规则依赖于排序和值的类型。

#3


0  

For characters that are do not have an explicit order that depends on the collation (i.e., that are not case-folded and/or do not have accents stripped), the ordering is based on the ordinal value of the code point in the base character set. If one string is a prefix of another, the shorter string collates first.
The first 128 code points in Unicode are the same as the ASCII character set, which means / comes after -.

对于不具有显式顺序的字符,则不具有依赖于排序的显式顺序。,没有大小写折叠和/或没有去掉重音),排序基于基字符集中代码点的序数值。Unicode中的前128个代码点与ASCII字符集相同,这意味着/在-之后。