SQL Server makes no distinction between 'ی' and 'ي' in the Arabic_CI_AS collation

Time: 2021-02-23 20:23:42

I'm using the ASCII function to get the ASCII codes of two characters, but I was surprised to see that it returns the same value for 'ي' and 'ی'. Can anyone help me?

SELECT ASCII('ي'), ASCII('ی')

4 answers

#1


4  

Because your characters are not ASCII characters, you have to use the UNICODE() function (with N-prefixed literals) instead of ASCII().

SELECT ASCII('ي'), ASCII('ی')

returns: 237, 237

but

SELECT UNICODE(N'ي'), UNICODE(N'ی')

returns: 1610, 1740
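
A minimal sketch of the same comparison that does not depend on how the script file or client encodes the literals: it builds the two characters from their code points with NCHAR() and then pushes each one through the Arabic code page by casting to varchar. The variable and column names are made up for illustration, and the inline DECLARE initialization assumes SQL Server 2008 or later.

```sql
-- Hypothetical variable names; the code points are the ones returned by UNICODE() above.
DECLARE @arabic_yeh nchar(1) = NCHAR(1610);  -- 'ي'
DECLARE @farsi_yeh  nchar(1) = NCHAR(1740);  -- 'ی'

SELECT
    UNICODE(@arabic_yeh) AS arabic_yeh_code_point,
    UNICODE(@farsi_yeh)  AS farsi_yeh_code_point,
    -- Casting to varchar converts the value to the single-byte code page
    -- of the collation, which is the form ASCII() looks at.
    ASCII(CAST(@arabic_yeh COLLATE Arabic_CI_AS AS varchar(1))) AS arabic_yeh_single_byte,
    ASCII(CAST(@farsi_yeh  COLLATE Arabic_CI_AS AS varchar(1))) AS farsi_yeh_single_byte;
```

If the two single-byte columns come back equal while the code points differ, the distinction is being lost in the conversion to the code page, not in the Unicode data itself.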

#2


4  

Try this:

SELECT UNICODE(N'ي'), UNICODE(N'ی')

#3


3  

Another option, in case you want to keep using ASCII(), is to specify a suitable collation:

Arabic_CS_AS_KS

The result is ى = 236 and ي = 237.

#4


2  

This is a limitation of the ASCII function. According to the documentation, ASCII:

Returns the ASCII code value of the leftmost character of a character expression.

However, the characters in your question are made up of more than one byte. It appears that ASCII can only read one byte.

When you use these characters as string literals without the N prefix, they are treated as single-byte characters. The following query shows that SQL Server does not treat these characters as equal in the Arabic_CI_AS collation when they are properly marked as multi-byte:

SELECT CASE WHEN 'ي' COLLATE Arabic_CI_AS <> 'ی' COLLATE Arabic_CI_AS
            THEN 1 ELSE 0 END AS are_different_ascii,
       CASE WHEN N'ي' COLLATE Arabic_CI_AS <> N'ی' COLLATE Arabic_CI_AS
            THEN 1 ELSE 0 END AS are_different_unicode

The following query shows the bytes that make up the characters:

SELECT CAST(N'ي' COLLATE Arabic_CI_AS AS varbinary(4)),
       CAST(N'ی' COLLATE Arabic_CI_AS AS varbinary(4)),
       CAST('ي' COLLATE Arabic_CI_AS AS varbinary(4)),
       CAST('ی' COLLATE Arabic_CI_AS AS varbinary(4))
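
As a rough companion to the byte dump, DATALENGTH() reports how many bytes each form occupies. This is only a sketch: the variable and column names are hypothetical, and it assumes the varchar conversion goes through the single-byte Arabic code page tied to Arabic_CI_AS.

```sql
DECLARE @farsi_yeh nchar(1) = NCHAR(1740);  -- 'ی', built from its code point

SELECT
    -- nchar stores the character as UTF-16 code units
    DATALENGTH(@farsi_yeh) AS bytes_as_nchar,
    -- after conversion to varchar, the character is held in the collation's code page
    DATALENGTH(CAST(@farsi_yeh COLLATE Arabic_CI_AS AS varchar(4))) AS bytes_as_varchar,
    CAST(CAST(@farsi_yeh COLLATE Arabic_CI_AS AS varchar(4)) AS varbinary(4)) AS varchar_bytes;
```

Comparing the two lengths shows where the second byte disappears, which is the information ASCII() never gets to see.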

However, even when you mark the characters as Unicode, the ASCII function returns the same value for both, because it can only read one byte:

SELECT ASCII(N'ي' COLLATE Arabic_CI_AS) , ASCII(N'ی' COLLATE Arabic_CI_AS)

EDIT: As TT. points out, these characters have no entry in the ASCII code table.
