SQL Server 2008: Empty String vs. Space

Time: 2021-05-23 10:04:43

I ran into something a little odd this morning and thought I'd submit it for commentary.

Can someone explain why the following SQL query prints 'equal' when run against SQL Server 2008? The database compatibility level is set to 100.

if '' = ' '
    print 'equal'
else
    print 'not equal'

And this returns 0:

select (LEN(' '))

It appears to be auto trimming the space. I have no idea if this was the case in previous versions of SQL Server, and I no longer have any around to even test it.

I ran into this because a production query was returning incorrect results. I cannot find this behavior documented anywhere.

Does anyone have any information on this?

7 Answers

#1


78  

Varchars and equality are thorny in T-SQL. The documentation for the LEN function says:

Returns the number of characters, rather than the number of bytes, of the given string expression, excluding trailing blanks.

You need to use DATALENGTH to get a true byte count of the data in question. Note that for Unicode (nchar/nvarchar) data the byte count will not equal the number of characters, because each character is stored in two or more bytes.

print(DATALENGTH(' ')) --1
print(LEN(' '))        --0
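
To make the difference concrete, here is a small sketch. The variable names @v and @n are made up for illustration, and the byte counts assume a single-byte collation for the varchar value:

```sql
-- LEN ignores trailing blanks; DATALENGTH counts the bytes actually stored.
DECLARE @v varchar(10)  = 'ab  ';   -- two letters plus two trailing spaces
DECLARE @n nvarchar(10) = N'ab  ';  -- same text, stored as UTF-16

SELECT LEN(@v)        AS len_varchar,     -- 2
       DATALENGTH(@v) AS bytes_varchar,   -- 4
       LEN(@n)        AS len_nvarchar,    -- 2
       DATALENGTH(@n) AS bytes_nvarchar;  -- 8
```

For nvarchar values like this one, dividing DATALENGTH by 2 gives the character count including trailing blanks.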

When it comes to equality of expressions, the two strings are compared for equality like this:

  • Take the shorter string
  • Pad it with blanks until its length equals that of the longer string
  • Compare the two

It's the middle step that is causing unexpected results - after that step, you are effectively comparing whitespace against whitespace - hence they are seen to be equal.
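
The padding rule can be checked directly. This is only a minimal sketch; the column aliases are labels chosen for this example:

```sql
-- Trailing blanks are padded away by =, but leading blanks still count.
SELECT CASE WHEN 'abc' = 'abc   ' THEN 'equal' ELSE 'not equal' END AS trailing_blanks,  -- equal
       CASE WHEN 'abc' = '   abc' THEN 'equal' ELSE 'not equal' END AS leading_blanks,   -- not equal
       CASE WHEN ''    = '   '    THEN 'equal' ELSE 'not equal' END AS empty_vs_blanks;  -- equal
```

In the third comparison the empty string is padded to three blanks, so both sides are identical.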

LIKE behaves better than = in the "blanks" situation because it doesn't perform blank-padding on the pattern you were trying to match:

if '' = ' '
print 'eq'
else
print 'ne'

This prints eq, while:

if '' LIKE ' '
print 'eq'
else
print 'ne'

This prints ne.

Be careful with LIKE, though: it is not symmetrical. It treats trailing whitespace as significant in the pattern (RHS) but not in the match expression (LHS). The following example, taken from another source, shows this:

declare @Space nvarchar(10)
declare @Space2 nvarchar(10)

set @Space = ''
set @Space2 = ' '

if @Space like @Space2
print '@Space Like @Space2'
else
print '@Space Not Like @Space2'

if @Space2 like @Space
print '@Space2 Like @Space'
else
print '@Space2 Not Like @Space'

@Space Not Like @Space2
@Space2 Like @Space

#2


15  

The = operator in T-SQL is not so much "equals" as it is "are the same word/phrase, according to the collation of the expression's context," and LEN is "the number of characters in the word/phrase." No collations treat trailing blanks as part of the word/phrase preceding them (though they do treat leading blanks as part of the string they precede).

If you need to distinguish 'this' from 'this ', you shouldn't use the "are the same word or phrase" operator because 'this' and 'this ' are the same word.

Contributing to the way = works is the idea that the string-equality operator should depend on its arguments' contents and on the collation context of the expression, but it shouldn't depend on the types of the arguments, if they are both string types.

The natural language concept of "these are the same word" isn't typically precise enough to be able to be captured by a mathematical operator like =, and there's no concept of string type in natural language. Context (i.e., collation) matters (and exists in natural language) and is part of the story, and additional properties (some that seem quirky) are part of the definition of = in order to make it well-defined in the unnatural world of data.

On the type issue, you wouldn't want words to change when they are stored in different string types. For example, the types VARCHAR(10), CHAR(10), and CHAR(3) can all hold representations of the word 'cat', and ? = 'cat' should let us decide if a value of any of these types holds the word 'cat' (with issues of case and accent determined by the collation).
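
As a rough illustration of that point, the sketch below stores 'cat' in the three types mentioned and compares each to the literal. The variable names are hypothetical, and the byte counts assume a single-byte collation:

```sql
DECLARE @a varchar(10) = 'cat';
DECLARE @b char(10)    = 'cat';  -- padded with blanks to 10 characters
DECLARE @c char(3)     = 'cat';

SELECT CASE WHEN @a = 'cat' THEN 1 ELSE 0 END AS varchar10_is_cat,  -- 1
       CASE WHEN @b = 'cat' THEN 1 ELSE 0 END AS char10_is_cat,     -- 1
       CASE WHEN @c = 'cat' THEN 1 ELSE 0 END AS char3_is_cat,      -- 1
       DATALENGTH(@a) AS bytes_a,   -- 3
       DATALENGTH(@b) AS bytes_b,   -- 10
       DATALENGTH(@c) AS bytes_c;   -- 3
```

All three comparisons succeed even though the stored byte lengths differ, which is the type independence described above.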

Response to JohnFx's comment:

See "Using char and varchar Data" in Books Online. Quoting from that page, emphasis mine:

Each char and varchar data value has a collation. Collations define attributes such as the bit patterns used to represent each character, comparison rules, and sensitivity to case or accenting.

I agree it could be easier to find, but it's documented.

Worth noting, too, is that SQL's semantics, where = has to do with the real-world data and the context of the comparison (as opposed to something about bits stored on the computer), have been part of SQL for a long time. The premise of RDBMSs and SQL is the faithful representation of real-world data, hence its support for collations many years before similar ideas (such as CultureInfo) entered the realm of Algol-like languages. The premise of those languages (at least until very recently) was problem-solving in engineering, not management of business data. (Recently, the use of similar languages in non-engineering applications like search is making some inroads, but Java, C#, and so on are still struggling with their non-businessy roots.)

In my opinion, it's not fair to criticize SQL for being different from "most programming languages." SQL was designed to support a framework for business data modeling that's very different from engineering, so the language is different (and better for its goal).

Heck, when SQL was first specified, some languages didn't have any built-in string type. And in some languages still, the equals operator between strings doesn't compare character data at all, but compares references! It wouldn't surprise me if in another decade or two, the idea that == is culture-dependent becomes the norm.

#3


9  

I found this blog article which describes the behavior and explains why.

The SQL standard requires that string comparisons, effectively, pad the shorter string with space characters. This leads to the surprising result that N'' = N' ' (the empty string equals a string of one or more space characters) and more generally any string equals another string if they differ only by trailing spaces. This can be a problem in some contexts.

More information is also available in Microsoft KB article 316626.

#4


4  

There was a similar question a while ago, where I looked into a similar problem.

Instead of LEN(' '), use DATALENGTH(' ') - that gives you the correct value.

The solutions were to use a LIKE clause as explained in my answer in there, and/or include a 2nd condition in the WHERE clause to check DATALENGTH too.

Have a read of that question and links in there.

#5


3  

To compare a value to a literal space, you may also use this technique as an alternative to the LIKE operator:

IF ASCII('') = 32 PRINT 'equal' ELSE PRINT 'not equal'
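
A short sketch of what ASCII returns for the relevant inputs may help when applying this. Keep in mind that ASCII inspects only the first character, so this check says nothing about the rest of the string:

```sql
SELECT ASCII(' ')  AS one_space,      -- 32
       ASCII('')   AS empty_string,   -- NULL
       ASCII(' x') AS leading_space;  -- 32, only the first character is inspected
```

Because ASCII('') is NULL, the IF statement above takes the ELSE branch and prints 'not equal'.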

#6


0  

Sometimes one has to deal with spaces in data, with or without other characters. Using NULL would be a better idea, but it is not always possible. I ran into the situation described and solved it this way:

... where ('>' + @space + '<') <> ('>' + @space2 + '<')
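
A self-contained sketch of this delimiter trick, assuming two nvarchar variables like those used in the first answer:

```sql
DECLARE @space  nvarchar(10) = N'';
DECLARE @space2 nvarchar(10) = N' ';

SELECT CASE WHEN @space = @space2
            THEN 'equal' ELSE 'not equal' END AS plain_compare,      -- equal
       CASE WHEN ('>' + @space + '<') = ('>' + @space2 + '<')
            THEN 'equal' ELSE 'not equal' END AS delimited_compare;  -- not equal
```

The closing '<' turns the trailing blank into an interior character, so padding no longer hides it. With the default CONCAT_NULL_YIELDS_NULL setting, a NULL operand makes the whole concatenation NULL, so NULL values need separate handling.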

Of course you wouldn't do that for a large amount of data, but it works quickly and easily for a few hundred rows ...

Herbert

#7


0  

How to distinguish records in a SELECT on char/varchar fields in SQL Server. Example:

declare @mayvar as varchar(10)

set @mayvar = 'data '

select mykey, myfield from mytable where myfield = @mayvar

expected:

mykey (int) | myfield (varchar(10))
1 | 'data '

obtained:

mykey | myfield
1 | 'data'
2 | 'data '

Even if I write select mykey, myfield from mytable where myfield = 'data' (without the final blank), I get the same results.

How did I solve it? Like this:

select mykey, myfield
from mytable
where myfield = @mayvar 
and DATALENGTH(isnull(myfield,'')) = DATALENGTH(@mayvar)

If there is an index on myfield, it can still be used in each case.
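
To try this without an existing table, here is a sketch using a table variable. The table contents are invented for illustration, and it assumes ANSI_PADDING is ON (the usual default) so that the trailing blank is stored:

```sql
DECLARE @mytable TABLE (mykey int PRIMARY KEY, myfield varchar(10));
INSERT INTO @mytable (mykey, myfield) VALUES (1, 'data'), (2, 'data ');

DECLARE @mayvar varchar(10) = 'data ';

-- Plain equality ignores the trailing blank and matches both rows.
SELECT mykey, myfield
FROM @mytable
WHERE myfield = @mayvar;

-- Adding the byte-length check keeps only the row stored with the trailing blank (mykey = 2).
SELECT mykey, myfield
FROM @mytable
WHERE myfield = @mayvar
  AND DATALENGTH(myfield) = DATALENGTH(@mayvar);
```

The isnull wrapper is left out of this sketch because a NULL myfield already fails the equality test.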

I hope it will be helpful.
