A Quick Look At Compression In SQL 2008
SQL2008数据压缩功能之一瞥
原文:
http://sqlblog.com/blogs/denis_gobo/archive/2009/04/07/13137.aspx
To use compression you have to understand what your data looks like, if your data is mostly unique then compression might not really help a lot. If your data is repeated on a page then it could be beneficial. In my own test compression would improve performance with some tables and not so much with other tables. I won't explain how compression works, this is all explained in books on line. the goal of this post is for you to have some code that you can run and then see how page and row level compression differs compared to the original data.
要使用数据压缩功能,你必须了解数据。如果你的数据很少重复,那么数据压缩将收效甚微。如果在一个数据页中重复数据较多,使用数据压缩将获益很大。在我对数据压缩的测试中,对于有些表有显著的性能改进,而对有些其它表则改进甚微。我无意讲解数据压缩的工作原理,有关工作原理SQL Server在线手册中有相关说明。本文的目的是通过一些可运行的代码,让你了解页级和行级数据压缩和原始数据之间的差异。
Here is some code that you can run to see how compression works. the data I have used is similar to what we use at my job(but not the same, i used fake tickers so don't try to invest in those ;-0). I have used just the 10 industry groups from the Industry Classification Benchmark, I left out supersectors,sectors and subsectors otherwise we would have a lot more rows per table. I also did not create fake ISIN Code,SEDOL and CUSIP columns since that would have bloated the code in this post. The code will generate 2780650 rows per table, this is a nice size to test with.
Compression is really nice when you get charged $30 per GB for the SAN, so besides performance there is also a cost benefit in storage
通过执行这里的代码你将了解数据压缩如何奏效。我使用的数据和我工作中的数据相类似(但不完全相同, 我使用假的股票行情记录,而且我并非要对这些股票进行投资)。我从行业分类基准选用了10只工业集团股,同时剔除了supersectors, sectors and subsectors表,否则每个表都会有一大笔数据。我也没有为ISINCode,SEDOL 和CUSIP列作假的数据,因为那样将使本文中的代码更为庞大。下列代码将为每个表产生2780650行数据,这可是规模相当不错的测试数据。
当你需要为SAN(存储区域网络)每GB的容量支付30美元时,压缩数据简直太好不过了。因此除了获得性能提升,你还可以从数据存储上的节约成本。
Here is the code
This block of code will create the database, setup the lookup tables and create the tables for our tests
以下是这些测试代码
这些代码将创建数据库,创建查询表和测试用的表。
use master
go
create database CompressionTest2
go
use CompressionTest2
go
--our Industry Classification Benchmark table
Create table IndustryGroup(GroupCode char(4) not null primary key,
GroupDescription varchar(40))
GO
--just industry groups
insert IndustryGroup values('0001','Oil & Gas' )
insert IndustryGroup values('1000','Basic Materials' )
insert IndustryGroup values('2000','Industrials')
insert IndustryGroup values('3000','Consumer Goods')
insert IndustryGroup values('4000','Healthcare')
insert IndustryGroup values('5000','Consumer Services')
insert IndustryGroup values('6000','Telecommunications')
insert IndustryGroup values('7000','Utilities')
insert IndustryGroup values('8000','Financials')
insert IndustryGroup values('9000','Technology')
GO
--currency table
Create table Currency (CurrencyCode char(3) not null primary key, CurrencyDescription varchar(30))
GO
--just handful of currencies (几种货币)
insert currency values('USD','U.S. Dollar')
insert currency values('AUD','Australian Dollar')
insert currency values('CAD','Canadian Dollar')
insert currency values('JPY','Japanese Yen')
insert currency values('MXN','Mexican Peso')
insert currency values('GBP','U.K. Sterling')
insert currency values('EUR','European Euro')
insert currency values('ISK','Iceland Krona')
insert currency values('BGN','Bulgarian Lev')
insert currency values('RON','Romanian Leu')
insert currency values('INR','Indian Rupee')
insert currency values('RUB','Russia Rubles')
insert currency values('BHD','Bahrain Dinar')
insert currency values('EGP','Egypt Pounds')
insert currency values('JOD','Jodan Dinars')
insert currency values('KWD','Kuwait Dinars')
insert currency values('MAD','Morocco Dirham')
insert currency values('OMR','Omam Rial')
insert currency values('QAR','Qatari Rial')
GO
--market cap table
create table MarketCap (MarketCap varchar(20) not null primary key,MarketCapDescription varchar(100) )
GO
--left out a bunch of market caps
insert MarketCap values('ALL','broad')
insert MarketCap values('MID','mid cap')
insert MarketCap values('MCR','micro cap')
insert MarketCap values('SML','small cap')
insert MarketCap values('LRG','large cap')
--calendar table
create table Calendar (CalendarDate date not null primary key)
GO
insert Calendar
select dateadd(d,number,'19920101') from master..spt_values
where type = 'p'
and datepart(dw,dateadd(d,number,'20080101')) not in (1,7)
union
select dateadd(d,number,'19970810') from master..spt_values
where type = 'p'
and datepart(dw,dateadd(d,number,'19970810')) not in (1,7)
--the table that we will test against 测试用表
create table IndexCloseValues (CalendarDate date not null,
IndexSymbol varchar(30) not null,
GroupCode char(4) not null,
CurrencyCode char(3) not null,
MarketCap varchar(20) not null,
CloseValue decimal(30,10),
TotalReturnClose decimal(30,10))
--2780650 rows 2780650行
insert IndexCloseValues
select CalendarDate,
CurrencyCode + left(GroupCode,1) + MarketCap as Symbol, --fake tickers
GroupCode,CurrencyCode,MarketCap,
ROW_NUMBER() OVER( ORDER BY CalendarDate,GroupCode,CurrencyCode,MarketCap ) *0.00100,
ROW_NUMBER() OVER( ORDER BY CalendarDate,GroupCode,CurrencyCode,MarketCap ) *0.247001
from IndustryGroup i cross join currency c
cross join MarketCap
cross join calendar
GO
--create a copy for page compression 创建页级压缩备份
select * into IndexCloseValuesPage
from IndexCloseValues
Go
--create a copy for row compression创建行级压缩备份
select * into IndexCloseValuesRow
from IndexCloseValues
GO
--add unique constraint 增加唯一约束
ALTER TABLE IndexCloseValues WITH NOCHECK
ADD CONSTRAINT UQ_IndexCloseValues UNIQUE Clustered (CalendarDate,IndexSymbol)
GO
--add unique constraint 增加唯一约束
ALTER TABLE IndexCloseValuesPage WITH NOCHECK
ADD CONSTRAINT UQ_IndexCloseValuesPage UNIQUE Clustered (CalendarDate,IndexSymbol)
GO
--page compression 页压缩
ALTER TABLE IndexCloseValuesPage
REBUILD WITH (DATA_COMPRESSION = PAGE);
GO
--add unique constraint增加唯一约束
ALTER TABLE IndexCloseValuesRow WITH NOCHECK
ADD CONSTRAINT UQ_IndexCloseValuesRow UNIQUE Clustered (CalendarDate,IndexSymbol)
GO
--row compression 行压缩
ALTER TABLE IndexCloseValuesRow
REBUILD WITH (DATA_COMPRESSION = ROW);
GO
Now that everything is setup we can look how big the tables are. Run the code below (old school I know)
dbcc showcontig('IndexCloseValues')
dbcc showcontig('IndexCloseValuesPage')
dbcc showcontig('IndexCloseValuesRow')
Table: 'IndexCloseValues' (213575799); index ID: 1, database ID: 16
TABLE level scan performed.
- Pages Scanned................................: 23767
- Extents Scanned..............................: 2972
- Extent Switches..............................: 2971
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 99.97% [2971:2972]
- Logical Scan Fragmentation ..................: 0.01%
- Extent Scan Fragmentation ...................: 0.17%
- Avg. Bytes Free per Page.....................: 23.3
- Avg. Page Density (full).....................: 99.71%
Table: 'IndexCloseValuesPage' (245575913); index ID: 1, database ID: 16
TABLE level scan performed.
- Pages Scanned................................: 9307
- Extents Scanned..............................: 1165
- Extent Switches..............................: 1164
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 99.91% [1164:1165]
- Logical Scan Fragmentation ..................: 0.04%
- Extent Scan Fragmentation ...................: 0.77%
- Avg. Bytes Free per Page.....................: 10.7
- Avg. Page Density (full).....................: 99.87%
Table: 'IndexCloseValuesRow' (261575970); index ID: 1, database ID: 16
TABLE level scan performed.
- Pages Scanned................................: 13255
- Extents Scanned..............................: 1657
- Extent Switches..............................: 1656
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 100.00% [1657:1657]
- Logical Scan Fragmentation ..................: 0.01%
- Extent Scan Fragmentation ...................: 0.36%
- Avg. Bytes Free per Page.....................: 11.4
- Avg. Page Density (full).....................: 99.86%
As you can see compression really reduced the pages for the table, page level compression looks really good here.
显而易见,数据压缩的确减少了表的页数量页级——数据压缩看起来相当不错。
首先,让我们运行以下代码段,看看执行计划 ……
First let's look at the execution plans, running the following block of code....
select * from IndexCloseValues
where CalendarDate = '19920101'
select * from IndexCloseValuesPage
where CalendarDate = '19920101'
select * from IndexCloseValuesRow
where CalendarDate = '19920101'
...will result in this plan
Increasing the query to return a month of data instead of a day
增加查询返回数据从一天到一月。
select * from IndexCloseValues
where CalendarDate between '19920101' and '19920121'
select * from IndexCloseValuesPage
where CalendarDate between '19920101' and '19920121'
select * from IndexCloseValuesRow
where CalendarDate between '19920101' and '19920121'
will result in this plan.
You can draw your own conclusions from those images
从这些图片中你可以得出你的结论。
Let's look at some reads, first turn statistics IO on
打开STATISTICS IO, 让我们看看数据读取信息。
SET STATISTICS IO ON
go
select * from IndexCloseValues
where CalendarDate = '19920101'
select * from IndexCloseValuesPage
where CalendarDate = '19920101'
select * from IndexCloseValuesRow
where CalendarDate = '19920101'
950 row(s) affected)
Table 'IndexCloseValues'. Scan count 1, logical reads 12, physical reads 0......
(950 row(s) affected)
Table 'IndexCloseValuesPage'. Scan count 1, logical reads 7, physical reads 0......
(950 row(s) affected)
Table 'IndexCloseValuesRow'. Scan count 1, logical reads 8, physical reads 0......
Those are some nice numbers for the reads, now we will increase the date range to one month
select * from IndexCloseValues
where CalendarDate between '19920101' and '19920121'
select * from IndexCloseValuesPage
where CalendarDate between '19920101' and '19920121'
select * from IndexCloseValuesRow
where CalendarDate between '19920101' and '19920121'
(14250 row(s) affected)
Table 'IndexCloseValues'. Scan count 1, logical reads 125, physical reads 0......
(14250 row(s) affected)
Table 'IndexCloseValuesPage'. Scan count 1, logical reads 52, physical reads 0......
(14250 row(s) affected)
Table 'IndexCloseValuesRow'. Scan count 1, logical reads 69, physical reads 0......
When selecting more data the numbers look even better.
Turn statistics io off again
当选取更多的数据时,逻辑读取次数的改进效果会更好 。
现在把STATISTICS IO关闭掉。
SET STATISTICS IO OFF
go
So as you can see compression reduces the reads by over half when using page compression.
可以看出,使用页级数据压缩会减少一半以上的逻辑读取次数。