Contents
从InnoDB存储引擎的逻辑存储结构看,所有数据都被逻辑地存放在一个空间中,称之为表空间(tablespace)。表空间又由段(segment)、区(extent)、页(page)组成。页在一些文档中有时也称为块(block),InnoDB存储引擎的逻辑存储结构大致如图:
表空间
表空间可以看做是InnoDB存储引擎逻辑结构的最高层,所有的数据都存放在表空间中。在默认情况下InnoDB存储引擎有一个共享表空间ibdata1,即所有数据都存放在这个表空间内。如果用户启用了参数innodb_file_per_table,则每个表内的数据可以单独放到一个表空间内。
如果启动了innodb_file_per_table参数,需要注意的是每张表的表空间内存放的只是数据、索引和插入缓冲Bitmap页,其他类的数据,如回滚(undo)信息,插入缓冲索引页、系统事务信息,二次写缓冲(Double write buffer)等还是存放在原来的共享表空间内。这同时也说明了另一个问题:即使在启用了参数innodb_file_per_table之后,共享表空间还是会不断地增加其大小。可以来做一个实验,在实验之前已经将innodb_file_per_table设为ON了。现在看看初始共享表空间文件的大小:
mysql> show variables like 'innodb_file_per_table'G
*************************** 1. row ***************************
Variable_name: innodb_file_per_table
Value: ON
1 row in set (0.00 sec)
mysql> system ls -tlhr /home/mysql/mysql/data/ibdata*
-rw-rw----. 1 mysql mysql 204M Mar 21 05:54 /home/mysql/mysql/data/ibdata1
mysql>
可以看到,共享表空间ibdata1的大小为204MB,接着模拟产生undo的操作,使用表orders,并把其存储引擎更改为InnoDB,执行如下操作:
mysql> update orders set device_number=0;
Query OK, 3278492 rows affected (0.03 sec)
Rows matched: 3278492 Changed: 3278492 Warnings: 0
mysql> system ls -tlhr /home/mysql/mysql/data/ibdata*
-rw-rw----. 1 mysql mysql 652M Mar 21 07:38 /home/mysql/mysql/data/ibdata1
mysql>
这里首先将自动提交设为0,即用户需要显式提交事务(注意,在上面操作结束时,并没有对该事务执行commit或rollback)。接着执行会产生大量的undo操作的语句update orders set device_number=0,完成后再观察共享表空间,会发现ibdata1已经增长到了652MB。这个例子虽然简单,但是足以说明共享表空间中还包含有undo信息。
有用户会问,如果对k这个事务执行rollback,ibdata1这个表空间会不会缩减至原来的大小(204MB)?这可以通过继续运行下面的语句得到验证:
mysql> rollback;
Query OK, 0 rows affected (0.01 sec)
mysql> system ls -tlhr /home/mysql/mysql/data/ibdata*
-rw-rw----. 1 mysql mysql 652M Mar 21 07:49 /home/mysql/mysql/data/ibdata1
mysql>
很“可惜”,共享表空间的大小还是204MB,即InnoDB存储引擎不会在执行rollback时去收缩这个表空间。虽然InnoDB不会回收这些空间,但是会自动判断这些undo信息是否还需要,如果不需要,则会将这些空间标记为可用空间,供下次undo使用。
master thread每10秒会执行一次的full purge操作,很有可能的一种情况是:用户再次执行上述的update语句后,会发现ibdata1不会再变大了,那就是这个原因了。
使用py_innodb_page_info小工具查看表空间中各页的类型和信息,用户可以在code.google.com上搜索david-mysql-tools进行查找。使用方法如下:
[[email protected] data]$ python ~/py_innodb_page_info.py ibdata1
Total number of page: 41728:
Insert Buffer Free List: 1035
Insert Buffer Bitmap: 3
System Page: 131
Transaction system Page: 2
Freshly Allocated Page: 5074
Undo Log Page: 33238
File Segment inode: 5
B-tree Node: 2235
File Space Header: 2
扩展描述页: 3
可以看到共有41728个页,其中插入缓冲的空间有1035个页、5074个可用页、33238个undo页、2235个数据页等。用户可以通过添加-v参数来查看更详细的内容。
段(Segment)
上图中显示了表空间是由各个段组成,常见的段有数据段、索引段、回滚段等。InnoDB存储引擎表是索引组织的(index organized),因此数据即索引,索引即数据。那么数据段即为B 树的叶子节点(图中Leaf of node segment),索引段即为B 树的非索引节点(图中Nonleaf node segment)。回滚段较为特殊,将会在后面进行单独的介绍。
在InnoDB存储引擎中,对段的管理都是由引擎自身所完成,DBA不能也没有必要对其进行控制。这和Oracle数据库中的自动段空间管理(ASSM)类似,从一定程度上简化了DBA对于段的管理。
区(Extent)
区是由连续页组成的空间,在任何情况下每个区的大小都为1MB。为了保证区中页的连续性InnoDB存储引擎一次从磁盘申请4~5个区。在默认情况下,InnoDB存储引擎页的大小为16KB,即一个区中一共有64个连续的页。
InnoDB 1.0.x版本开始引入压缩页,即每个页的大小可以通过参数KEY_BLOCK_SIZE设置为2K、4K、8K,因此每个区对应页的数量就应该为512、256、128.
InnoDB 1.2.x版本新增了参数innodb_page_size,通过该参数可以将默认页的大小设置为4K、8K,但是页中的数据不是压缩。这时区中页的数量同样也为256、128。总之,不论页的大小怎么变化,区的大小总是为1M。
但是这里还有这样一个问题:在用户启用了参数innodb_file_per_table后,创建的表默认大小是96KB。区中是64个连续的页,创建的表的大小至少是1MB才对啊?其实这是因为在每个段开始时,先用32个页大小的碎片页(fragent page)来存放数据,在使用完这些页之后才是64个连续页的申请。这样做的目的是,对于一些小表,或者是undo这类的段,可以在开始申请较少的空间,节省磁盘容量的开销。这里可以通过一个很小的示例来显示InnoDB存储引擎对于区的申请方式:
mysql> create table t1(
-> col1 int not null auto_increment,
-> col2 varchar(7000),
-> primary key(col1))engine=innodb;
Query OK, 0 rows affected (0.05 sec)
mysql> system ls -lh /home/mysql/mysql/data/guo/t1.ibd;
-rw-rw----. 1 mysql mysql 96K Mar 21 12:13 /home/mysql/mysql/data/guo/t1.ibd
上述的构架语句创建了t1表,将col2字段设为VARCHAR(7000),这样能保证一个页最多可以存放2条记录。通过ls命令可以发现,初始化并创建t1表后,表空间默认大小为96KB,接着运行如下SQL语句:
mysql> insert t1 select null,repeat('a',7000);
Query OK, 1 row affected (0.03 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert t1 select null,repeat('a',7000);
Query OK, 1 row affected (0.01 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> system ls -lh /home/mysql/mysql/data/guo/t1.ibd;
-rw-rw----. 1 mysql mysql 96K Mar 21 12:18 /home/mysql/mysql/data/guo/t1.ibd
mysql>
插入两条记录,根据之前对表的定义,这两条记录应该位于同一个页中。如果这时通过py_innodb_page_info工具来查看表空间,可以看到:
[[email protected] ~]$ python py_innodb_page_info.py -v mysql/data/guo/t1.ibd
page offset 00000000, page type <File Space Header>
page offset 00000001, page type <Insert Buffer Bitmap>
page offset 00000002, page type <File Segment inode>
page offset 00000003, page type <B-tree Node>, page level <0000>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
Total number of page: 6:
Freshly Allocated Page: 2
Insert Buffer Bitmap: 1
File Space Header: 1
B-tree Node: 1
File Segment inode: 1
[[email protected] ~]$
这次使用-v详细模式来看表空间的内容,注意到了page offset为3的页,这个就是数据页。page level表示所在的索引层,0表示叶子节点。因为当前所有记录都在一个页中,因此没有非叶节点。但是如果这时用户再插入一条记录,就会产生一个非叶节点:
mysql> system ls -lh /home/mysql/mysql/data/guo/t1.ibd;
-rw-rw----. 1 mysql mysql 96K Mar 21 12:18 /home/mysql/mysql/data/guo/t1.ibd
mysql> insert t1 select null,repeat('a',7000);
Query OK, 1 row affected (0.02 sec)
Records: 1 Duplicates: 0 Warnings: 0
[[email protected] ~]$ python py_innodb_page_info.py -v mysql/data/guo/t1.ibd
page offset 00000000, page type <File Space Header>
page offset 00000001, page type <Insert Buffer Bitmap>
page offset 00000002, page type <File Segment inode>
page offset 00000003, page type <B-tree Node>, page level <0001>
page offset 00000004, page type <B-tree Node>, page level <0000>
page offset 00000005, page type <B-tree Node>, page level <0000>
Total number of page: 6:
Insert Buffer Bitmap: 1
File Space Header: 1
B-tree Node: 3
File Segment inode: 1
现在可以看到page offset为3的页的page level由之前的0变为了1,这时虽然新插入的记录导致了B 树的分裂操作,再插入60条记录也就是说当前表t1*有63条记录,32个页。为了导入的方便,在这之前先建立一个导入的存储过程。
mysql> DELIMITER //
mysql> CREATE PROCEDURE load_t1(count int unsigned)
-> BEGIN
-> DECLARE s INT UNSIGnED DEFAULT 1;
-> DECLARE c VARCHAR(7000) DEFAULT REPEAT('a',7000);
-> while s <= count DO
-> insert into t1 select null,c;
-> set s=s 1;
-> end while;
-> end;
-> //
Query OK, 0 rows affected (0.00 sec)
mysql> DELIMITER ;
mysql> call load_t1(60);
Query OK, 1 row affected (0.17 sec)
mysql> select count(*) from t1;
----------
| count(*) |
----------
| 63 |
----------
1 row in set (0.00 sec)
mysql>
mysql> system ls -lh /home/mysql/mysql/data/guo/t1.ibd
-rw-rw----. 1 mysql mysql 576K Apr 14 18:08 /home/mysql/mysql/data/guo/t1.ibd
mysql>
可以看到,在导入了63条数据后,表空间的大小还是小于1MB,即表示数据空间的申请还是通过碎片页,而不是通过64个连续的区。这时如果通过py_innodb_page_info工具再来观察表空间t1.idb文件,可得:
[[email protected] ~]$ python py_innodb_page_info.py -v /home/mysql/mysql/data/guo/t1.ibd
page offset 00000000, page type <File Space Header>
page offset 00000001, page type <Insert Buffer Bitmap>
page offset 00000002, page type <File Segment inode>
page offset 00000003, page type <B-tree Node>, page level <0001>
page offset 00000004, page type <B-tree Node>, page level <0000>
page offset 00000005, page type <B-tree Node>, page level <0000>
page offset 00000006, page type <B-tree Node>, page level <0000>
page offset 00000007, page type <B-tree Node>, page level <0000>
page offset 00000008, page type <B-tree Node>, page level <0000>
page offset 00000009, page type <B-tree Node>, page level <0000>
page offset 0000000a, page type <B-tree Node>, page level <0000>
page offset 0000000b, page type <B-tree Node>, page level <0000>
page offset 0000000c, page type <B-tree Node>, page level <0000>
page offset 0000000d, page type <B-tree Node>, page level <0000>
page offset 0000000e, page type <B-tree Node>, page level <0000>
page offset 0000000f, page type <B-tree Node>, page level <0000>
page offset 00000010, page type <B-tree Node>, page level <0000>
page offset 00000011, page type <B-tree Node>, page level <0000>
page offset 00000012, page type < 大专栏 InnoDB逻辑存储结构;B-tree Node>, page level <0000>
page offset 00000013, page type <B-tree Node>, page level <0000>
page offset 00000014, page type <B-tree Node>, page level <0000>
page offset 00000015, page type <B-tree Node>, page level <0000>
page offset 00000016, page type <B-tree Node>, page level <0000>
page offset 00000017, page type <B-tree Node>, page level <0000>
page offset 00000018, page type <B-tree Node>, page level <0000>
page offset 00000019, page type <B-tree Node>, page level <0000>
page offset 0000001a, page type <B-tree Node>, page level <0000>
page offset 0000001b, page type <B-tree Node>, page level <0000>
page offset 0000001c, page type <B-tree Node>, page level <0000>
page offset 0000001d, page type <B-tree Node>, page level <0000>
page offset 0000001e, page type <B-tree Node>, page level <0000>
page offset 0000001f, page type <B-tree Node>, page level <0000>
page offset 00000020, page type <B-tree Node>, page level <0000>
page offset 00000021, page type <B-tree Node>, page level <0000>
page offset 00000022, page type <B-tree Node>, page level <0000>
page offset 00000023, page type <B-tree Node>, page level <0000>
Total number of page: 36:
Insert Buffer Bitmap: 1
File Space Header: 1
B-tree Node: 33
File Segment inode: 1
可以观察到B-tree Node页一共有33个,除去一个page level为1的非叶节点页,一共有32个page level为0的页,也就是说,对于数据段,已经有32个碎片页了。之后用户再申请空间,则空间按连续64个页的大小开始增长了。好了,接着就这样来操作,插入一条数据,看之后表空间的大小:
mysql> call load_t1(1);
Query OK, 1 row affected (0.01 sec)
mysql> system ls -lh /home/mysql/mysql/data/guo/t1.ibd
-rw-rw----. 1 mysql mysql 2.0M Apr 14 18:18 /home/mysql/mysql/data/guo/t1.ibd
mysql>
已经用完了32个碎片页,新的页会采用区的方式进行空间的申请,如果此时用户再通过py_innodb_page_info工具来看表空间文件t1.idb,应该可以看到很多类型为Freshly Allocated Page的页:
[[email protected] ~]$ python py_innodb_page_info.py -v /home/mysql/mysql/data/guo/t1.ibd
page offset 00000000, page type <File Space Header>
page offset 00000001, page type <Insert Buffer Bitmap>
page offset 00000002, page type <File Segment inode>
page offset 00000003, page type <B-tree Node>, page level <0001>
page offset 00000004, page type <B-tree Node>, page level <0000>
page offset 00000005, page type <B-tree Node>, page level <0000>
page offset 00000006, page type <B-tree Node>, page level <0000>
page offset 00000007, page type <B-tree Node>, page level <0000>
page offset 00000008, page type <B-tree Node>, page level <0000>
page offset 00000009, page type <B-tree Node>, page level <0000>
page offset 0000000a, page type <B-tree Node>, page level <0000>
page offset 0000000b, page type <B-tree Node>, page level <0000>
page offset 0000000c, page type <B-tree Node>, page level <0000>
page offset 0000000d, page type <B-tree Node>, page level <0000>
page offset 0000000e, page type <B-tree Node>, page level <0000>
page offset 0000000f, page type <B-tree Node>, page level <0000>
page offset 00000010, page type <B-tree Node>, page level <0000>
page offset 00000011, page type <B-tree Node>, page level <0000>
page offset 00000012, page type <B-tree Node>, page level <0000>
page offset 00000013, page type <B-tree Node>, page level <0000>
page offset 00000014, page type <B-tree Node>, page level <0000>
page offset 00000015, page type <B-tree Node>, page level <0000>
page offset 00000016, page type <B-tree Node>, page level <0000>
page offset 00000017, page type <B-tree Node>, page level <0000>
page offset 00000018, page type <B-tree Node>, page level <0000>
page offset 00000019, page type <B-tree Node>, page level <0000>
page offset 0000001a, page type <B-tree Node>, page level <0000>
page offset 0000001b, page type <B-tree Node>, page level <0000>
page offset 0000001c, page type <B-tree Node>, page level <0000>
page offset 0000001d, page type <B-tree Node>, page level <0000>
page offset 0000001e, page type <B-tree Node>, page level <0000>
page offset 0000001f, page type <B-tree Node>, page level <0000>
page offset 00000020, page type <B-tree Node>, page level <0000>
page offset 00000021, page type <B-tree Node>, page level <0000>
page offset 00000022, page type <B-tree Node>, page level <0000>
page offset 00000023, page type <B-tree Node>, page level <0000>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000040, page type <B-tree Node>, page level <0000>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
Total number of page: 128:
Freshly Allocated Page: 91
Insert Buffer Bitmap: 1
File Space Header: 1
B-tree Node: 34
File Segment inode: 1
页
同大多数数据库一样,InnoDB有页(Page)的概念(也可以称为块),页是InnoDB磁盘管理的最小单位。在InnoDB存储引擎中,默认每个页的大小为16KB。而从InnoDB 1.2.x版本开始,可以通过参数innodb_page_size将页的大小设置为4K、8K、16K。若设置完成,则所有表中页的大小都为innodb_page_size,不可以对其再进行修改。除非mysqldump导入和导出操作来产生新的库。
在InnoDB存储引擎中,常见的页类型有:
- 数据页(B-tree Node)
- undo页(undo Log Page)
- 系统页(System Page)
- 事务数据页(Transaction system Page)
- 插入缓冲位图页(Insert Buffer Bitmap)
- 插入缓冲空闲列表页(Insert Buffer Free List)
- 未压缩的二进制大对象页(Uncompressed BLOB Page)
- 压缩的二进制大对象页(compressed BLOB Page)
行
InnoDB存储引擎是面向行的(row-oriented),也就是说数据是按行进行存放的。每个页存放的行记录也是有硬性定义的,最多允许存放16KB/2-200行的记录,即7992行记录。这里提到了row-oriented的数据库,也就是说,存在有column-oriened的数据库。MySQL infobright存储引擎就是按列来存放数据的,这对于数据仓库下的分析类SQL语句的执行数据压缩非常有帮助。类似的数据库还有Sybase IQ、Google Big Table。