Why and when should I use sparse columns? (SQL SERVER 2008)

Time: 2021-10-14 01:09:42

After going through some tutorials on SQL Server 2008's new SPARSE column feature, I found that a sparse column takes no space when the value is 0 or NULL, but when there is a value it takes 4 times the space of a regular (non-sparse) column.

If my understanding is correct, why would I choose sparse columns at database design time? And if I do use them, in what situations should I?

Also, out of curiosity, how come no space gets reserved when a column is defined as a sparse column? (That is, what is the internal implementation?)

Thanks in advance.

5 answers

#1


78  

A sparse column doesn't use 4x the amount of space to store a value; it uses a (fixed) 4 extra bytes per non-null value. (As you've already stated, a NULL takes 0 space.)

  • So a non-null value stored in a bit column would be 1 bit + 4 bytes = 4.125 bytes. But if 99% of these are NULL, it is still a net savings.

  • A non-null value stored in a GUID (UniqueIdentifier) column is 16 bytes + 4 bytes = 20 bytes. So even if only 50% of these are NULL, that's still a net savings.
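The arithmetic in the two bullets above can be checked with a short sketch. It assumes a deliberately simplified model built only from the figures in this answer: a regular fixed-width column always costs its full width, while a sparse column costs 0 bytes for NULL and the width plus 4 bytes otherwise. The function names are made up for illustration, and real rows carry other overhead, so the thresholds in the Books Online table may differ from what this model gives.

```python
# Simplified model (an assumption, based on the figures in this answer):
#   regular column: always `width` bytes
#   sparse column:  0 bytes when NULL, `width + 4` bytes when non-NULL
SPARSE_OVERHEAD_BYTES = 4


def avg_sparse_bytes(width: float, null_fraction: float) -> float:
    """Average bytes per row used by a sparse column under the model above."""
    return (1.0 - null_fraction) * (width + SPARSE_OVERHEAD_BYTES)


def break_even_null_fraction(width: float) -> float:
    """NULL fraction above which the sparse column is smaller on average.

    Solves (1 - p) * (width + 4) < width for p.
    """
    return SPARSE_OVERHEAD_BYTES / (width + SPARSE_OVERHEAD_BYTES)


if __name__ == "__main__":
    # Widths in bytes: a bit is 1/8 of a byte, a uniqueidentifier is 16 bytes.
    for name, width in [("bit", 0.125), ("uniqueidentifier", 16)]:
        print(name, break_even_null_fraction(width))
```

Under this model a uniqueidentifier column breaks even at 20% NULLs (4 / 20), so 50% NULLs is comfortably a saving, while a bit column needs roughly 97% NULLs before the sparse version wins.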

So the "expected savings" depend strongly on what kind of column we're talking about and on your estimate of the ratio of NULL to non-NULL values. Variable-width columns (varchars) are probably a little more difficult to predict accurately.

The Books Online page on sparse columns has a table showing what percentage of values of each data type would need to be NULL for you to end up with a benefit.

So when should you use a sparse column? When you expect a significant percentage of the rows to have a NULL value. Some examples that come to mind:

  • An "Order Return Date" column in an order table. You would hope that only a very small percentage of sales result in returned products.
  • A "4th Address" line in an address table. Most mailing addresses, even if you need a department name and a "Care Of" line, probably don't need 4 separate lines.
  • A "Suffix" column in a customer table. A fairly low percentage of people have a "Jr." or "III" or "Esquire" after their name.

#2


22  

  • Storing a NULL in a sparse column takes up no space at all.

  • To any external application the column behaves the same as a regular column.

  • Sparse columns work really well with filtered indexes, because you will only want to create an index over the non-empty values in the column.

  • You can create a column set over the sparse columns that returns an XML snippet of all the non-null data from the columns covered by the set. The column set behaves like a column itself. Note: you can only have one column set per table.

  • Change Data Capture and transactional replication both work with sparse columns, but not with the column set feature.

Downsides

  • If a sparse column has data in it, it takes 4 more bytes than a normal column. For example, even a bit (normally 0.125 bytes) becomes 4.125 bytes, and a uniqueidentifier rises from 16 bytes to 20 bytes.

  • Not all data types can be sparse: text, ntext, image, timestamp, user-defined data types, geometry, geography, and varbinary(max) with the FILESTREAM attribute cannot be sparse. (Changed 17/5/2009; thanks to Alex for spotting the typo.)

  • Computed columns can't be sparse (although sparse columns can take part in the calculation of another computed column).

  • You can't apply rules or have default values.

  • Sparse columns cannot form part of a clustered index. If you need to do that, use a computed column based on the sparse column and create the clustered index on that (which sort of defeats the purpose).

  • Merge replication doesn't work.

  • Data compression doesn't work.

  • Access (read and write) to sparse columns is more expensive, but I haven't been able to find any exact figures on this.


#3


3  

You're reading it wrong: it never takes 4x the space.

Specifically, it says 4* (4 bytes, see footnote), not 4x (multiply by 4). The only case where it's exactly 4x the space is a char(4), which would see savings if NULLs occur more than 64% of the time.

"*The length is equal to the average of the data that is contained in the type, plus 2 or 4 bytes."

#4


0  

| datetime NULL      | datetime SPARSE NULL | datetime SPARSE NULL |
|--------------------|----------------------|----------------------|
| 20171213 (8 bytes) | 20171213 (12 bytes)  | 20171213 (12 bytes)  |
| NULL     (8 bytes) | 20171213 (12 bytes)  | 20171213 (12 bytes)  |
| 20171213 (8 bytes) | NULL      (0 bytes)  | NULL      (0 bytes)  |
| NULL     (8 bytes) | NULL      (0 bytes)  | NULL      (0 bytes)  |

You lose 4 bytes not just once per row, but for every cell in the row that is not NULL.
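The per-cell cost in the table above can be tallied with a small sketch. It takes the byte counts exactly as the table states them (8 bytes for the regular datetime column whether or not it is NULL, 12 bytes for a non-NULL sparse datetime, 0 bytes for a NULL sparse datetime) and ignores any other row overhead. The function name and constants are hypothetical, chosen only for this illustration.

```python
# Byte counts as given in the table above (assumed, not measured):
DATETIME_BYTES = 8          # regular datetime column, NULL or not
SPARSE_OVERHEAD_BYTES = 4   # extra cost of each non-NULL sparse value


def row_bytes(sparse_values, regular_columns=1):
    """Bytes used by one row: regular datetime columns plus sparse datetime cells.

    `sparse_values` holds one entry per sparse column; None stands for NULL.
    """
    total = regular_columns * DATETIME_BYTES
    for value in sparse_values:
        if value is not None:
            # The 4-byte overhead is paid once per non-NULL sparse cell.
            total += DATETIME_BYTES + SPARSE_OVERHEAD_BYTES
    return total


if __name__ == "__main__":
    print(row_bytes(["20171213", "20171213"]))  # both sparse cells filled
    print(row_bytes(["20171213", None]))        # one sparse cell filled
    print(row_bytes([None, None]))              # both sparse cells NULL
```

With both sparse cells filled the row costs 8 + 12 + 12 = 32 bytes, against 24 bytes for three regular datetime columns; with both NULL it costs only the 8 bytes of the regular column.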

#5


-2  

From SQL SERVER – 2008 – Introduction to SPARSE Columns – Part 2 by Pinal Dave:

All SPARSE columns are stored as one XML column in the database. Let us see some of the advantages and disadvantages of SPARSE columns.

Advantages of SPARSE columns are:

  • INSERT, UPDATE, and DELETE statements can reference the sparse columns by name. SPARSE columns can work as one XML column as well.

  • SPARSE columns can take advantage of filtered indexes, where data is filled in the row.

  • SPARSE columns save a lot of database space when there are zero or null values in the database.

Disadvantages of SPARSE columns are:

  • A SPARSE column cannot have the IDENTITY or ROWGUIDCOL property.

  • SPARSE cannot be applied to text, ntext, image, timestamp, geometry, geography, or user-defined data types.

  • A SPARSE column cannot have a default value or a rule, and cannot be a computed column.

  • A clustered index or a unique primary key index cannot be applied to a SPARSE column. A SPARSE column cannot be part of a clustered index key.

  • A table containing SPARSE columns can have a maximum row size of 8018 bytes instead of the regular 8060 bytes. A table operation that involves SPARSE columns takes a performance hit compared with regular columns.
