After going through some tutorials on SQL Server 2008's new SPARSE column feature, I found that a sparse column takes no space when its value is 0 or NULL, but when there is a value it takes 4 times the space a regular (non-sparse) column uses.
If my understanding is correct, why would I choose it when designing a database? And if I do use it, in what situations should I?
Also, out of curiosity, how is it that no space is reserved when a column is defined as sparse? (That is, what is the internal implementation?)
Thanks in advance.
5 answers
#1
78
A sparse column doesn't use 4x the space to store a value; it uses a fixed 4 extra bytes per non-NULL value. (As you've already stated, a NULL takes 0 space.)
- So a non-NULL value stored in a bit column would be 1 bit + 4 bytes = 4.125 bytes. But if 99% of these are NULL, it is still a net savings.
- A non-NULL value stored in a GUID (uniqueidentifier) column is 16 bytes + 4 bytes = 20 bytes. So if only 50% of these are NULL, that's still a net savings.
So the "expected savings" depends strongly on what kind of column we're talking about, and on your estimate of the ratio of NULL to non-NULL values. Variable-width columns (varchars) are probably a little more difficult to predict accurately.
The Books Online page on sparse columns has a table showing what percentage of values of each data type would need to be NULL for you to end up with a benefit.
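The arithmetic above can be sketched in a few lines of Python. This is only a rough model built from the figures in this answer (a regular fixed-width column always costs its full size; a sparse column costs size + 4 bytes per non-NULL value and nothing per NULL). The function names are made up for illustration, and the model ignores row headers and other per-row overhead, so it should not be expected to reproduce the Books Online table.

```python
# Simplified storage model, assumed from the figures quoted in this answer.
SPARSE_OVERHEAD_BYTES = 4  # extra bytes per non-NULL sparse value


def regular_bytes_per_row(size_bytes: float) -> float:
    """A regular fixed-width column costs its full size in every row."""
    return size_bytes


def sparse_bytes_per_row(size_bytes: float, null_fraction: float) -> float:
    """Average cost of a sparse column: NULLs are free, values pay size + 4."""
    return (1.0 - null_fraction) * (size_bytes + SPARSE_OVERHEAD_BYTES)


def break_even_null_fraction(size_bytes: float) -> float:
    """NULL fraction at which sparse and regular storage cost the same.

    Solves (1 - f) * (size + 4) = size for f.
    """
    return SPARSE_OVERHEAD_BYTES / (size_bytes + SPARSE_OVERHEAD_BYTES)


if __name__ == "__main__":
    for name, size in [("bit", 0.125), ("uniqueidentifier", 16.0)]:
        frac = break_even_null_fraction(size)
        print(f"{name}: break-even at {frac:.1%} NULL")
```

Under this model a bit column only pays off when roughly 97% or more of its values are NULL, while a uniqueidentifier column already breaks even at 20% NULL, which is why the two examples above behave so differently.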
So when should you use a sparse column? When you expect a significant percentage of the rows to have a NULL value. Some examples that come to mind:
- An "Order Return Date" column in an order table. You would hope that a very small percentage of sales result in returned products.
- A "4th Address" line in an address table. Most mailing addresses, even if you need a department name and a "Care Of", probably don't need 4 separate lines.
- A "Suffix" column in a customer table. A fairly low percentage of people have a "Jr." or "III" or "Esquire" after their name.
#2
22
- Storing a NULL in a sparse column takes up no space at all.
- To any external application, the column behaves the same as a regular column.
- Sparse columns work really well with filtered indexes, since you will only want to index the non-NULL values in the column.
- You can create a column set over the sparse columns that returns an XML snippet of all the non-NULL data from the columns covered by the set. The column set behaves like a column itself. Note: you can only have one column set per table.
- Change Data Capture and transactional replication both work with sparse columns, but neither supports the column sets feature.
Downsides
- If a sparse column has data in it, it takes 4 more bytes than a normal column; e.g. even a bit (0.125 bytes normally) becomes 4.125 bytes, and a uniqueidentifier rises from 16 bytes to 20 bytes.
- Not all data types can be sparse: text, ntext, image, timestamp, user-defined data types, geometry, geography, and varbinary(max) with the FILESTREAM attribute cannot be sparse. (Changed 17/5/2009; thanks Alex for spotting the typo.)
- Computed columns can't be sparse (although sparse columns can take part in the calculation of another computed column).
- You can't apply rules or have default values.
- Sparse columns cannot form part of a clustered index. If you need that, use a computed column based on the sparse column and create the clustered index on that (which rather defeats the purpose).
- Merge replication doesn't work.
- Data compression doesn't work.
- Access (read and write) to sparse columns is more expensive, but I haven't been able to find any exact figures on this.
#3
3
You're reading it wrong: it never takes 4x the space.
Specifically, it says 4* (4 bytes, see footnote), not 4x (multiply by 4). The only case where it's exactly 4x the space is a char(4), which would see savings if the NULLs exist more than 64% of the time.
"*The length is equal to the average of the data that is contained in the type, plus 2 or 4 bytes."
#4
0
| datetime NULL | datetime SPARSE NULL | datetime SPARSE NULL |
|--------------------|----------------------|----------------------|
| 20171213 (8 bytes) | 20171213 (12 bytes) | 20171213 (12 bytes) |
| NULL (8 bytes) | 20171213 (12 bytes) | 20171213 (12 bytes) |
| 20171213 (8 bytes) | NULL (0 bytes) | NULL (0 bytes) |
| NULL (8 bytes) | NULL (0 bytes) | NULL (0 bytes) |
You lose 4 bytes not just once per row, but for every cell in the row that is not NULL.
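As a rough check of the table, the per-row totals can be added up with a short Python sketch. It assumes the layout the table suggests (one regular datetime column followed by two sparse datetime columns) and counts only the column data shown there, not row headers or other overhead; `row_bytes` is a hypothetical helper, not anything SQL Server exposes.

```python
DATETIME_BYTES = 8          # width of a regular datetime value
SPARSE_OVERHEAD_BYTES = 4   # extra bytes per non-NULL sparse value


def row_bytes(regular_value, sparse_values):
    """Column bytes for one row: one regular datetime plus N sparse datetimes."""
    # The regular fixed-width column takes its full width even when NULL.
    total = DATETIME_BYTES
    for value in sparse_values:
        if value is not None:
            # Every non-NULL sparse cell pays the 4-byte overhead again.
            total += DATETIME_BYTES + SPARSE_OVERHEAD_BYTES
    return total


if __name__ == "__main__":
    rows = [
        ("20171213", ["20171213", "20171213"]),
        (None, ["20171213", "20171213"]),
        ("20171213", [None, None]),
        (None, [None, None]),
    ]
    for regular, sparse in rows:
        print(regular, sparse, row_bytes(regular, sparse))
```

The first two rows come to 32 bytes each (8 + 12 + 12) and the last two to 8 bytes each, so the overhead scales with the number of populated sparse cells rather than with the number of rows.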
#5
-2
From SQL SERVER – 2008 – Introduction to SPARSE Columns – Part 2 by Pinal Dave:
All SPARSE columns are stored as one XML column in database. Let us see some of the advantage and disadvantage of SPARSE column.
Advantages of SPARSE column are:
- INSERT, UPDATE, and DELETE statements can reference the sparse columns by name. SPARSE column can work as one XML column as well.
- SPARSE column can take advantage of filtered Indexes, where data are filled in the row.
- SPARSE column saves lots of database space when there are zero or null values in database.
Disadvantages of SPARSE column are:
- SPARSE column does not have IDENTITY or ROWGUIDCOL property.
- SPARSE column can not be applied on text, ntext, image, timestamp, geometry, geography or user defined datatypes.
- SPARSE column can not have default value or rule or computed column.
- Clustered index or a unique primary key index can not be applied SPARSE column. SPARSE column can not be part of clustered index key.
- Table containing SPARSE column can have maximum size of 8018 bytes instead of regular 8060 bytes. A table operation which involves SPARSE column takes performance hit over regular column.