写在前面:此总结为2013年的某一天的讨论中延伸得到,以下为正文。
根据这几天的讨论和查阅相关资料,我将这个问题总结如下:
源问题为:mysql对含有auto_increment字段的表多次load数据时(中间不使用insert),表中auto_increment的字段中值会出现不连续的,叫做“间隙”或者“断裂”(gaps)。为什么会出现这种情况呢?
以这个问题扩展出来的问题为:
- load data和普通的insert有什么区别?
- Innodb的锁机制,load data和insert过程的锁机制?
- auto_increment的最大值?
**
从mysql的Insert说起
(以下出自MySQL官网):
First, some definitions:
“INSERT-like” statements
All statements that generate new rows in a table, including INSERT, INSERT … SELECT, REPLACE, REPLACE … SELECT, and LOAD DATA.
“插入相关”的语句
以下所有的语句都能在表中产生新的行,有INSERT,INSERT… SELECT,REPLACE,REPLACE … SELECT,和LOAD DATA。“Simple inserts”
Statements for which the number of rows to be inserted can be determined in advance (when the statement is initially processed). This includes single-row and multiple-row INSERT and REPLACE statements that do not have a nested subquery, but not INSERT … ON DUPLICATE KEY UPDATE.
“一般插入”
当语句在初始化执行(initially processed)的时候,就知道了要插入的行的数量,这包括单行和多行INSERT以及REPLACE语句,并且这些语句没有子嵌套语句。但是不包括INSERT… ON DUPLICATE KEY UPDATE“Bulk inserts”
Statements for which the number of rows to be inserted (and the number of required auto-increment values) is not known in advance. This includes INSERT … SELECT, REPLACE … SELECT, and LOAD DATA statements, but not plain INSERT. InnoDB will assign new values for the AUTO_INCREMENT column one at a time as each row is processed.
“批量插入”
语句中要插入的行的数量和需要的auto_increment值的数量不能提前知道。这包括INSERT … SELECT,REPLACE …SELECT和load data语句,但不包括一般的插入语句(plain insert),innodb 会在每行执行的时候给auto_increment列分配新的值。“Mixed-mode inserts”
These are “simple insert” statements that specify the auto-increment value for some (but not all) of the new rows. An example follows, where c1 is an AUTO_INCREMENT column of table t1:
INSERT INTO t1 (c1,c2) VALUES (1,’a’), (NULL,’b’), (5,’c’), (NULL,’d’);
Another type of “mixed-mode insert” is INSERT … ON DUPLICATE KEY UPDATE, which in the worst case is in effect an INSERT followed by a UPDATE, where the allocated value for the AUTO_INCREMENT column may or may not be used during the update phase.
“混合模式插入”
这个模式相当于是“一般插入”语句对一些新行(但不是所有的行)指定了auto_increment的值。下面是一个例子,这里c1是table t1的一个auto_increment 列。
INSERT INTO t1 (c1,c2) VALUES (1,’a’), (NULL,’b’), (5,’c’), (NULL,’d’);
另一种类型的“混合插入”是INSERT … ON DUPLICATE KEY UPDATE,最坏的情况是一个insert语句后面跟着一个update语句,这时候auto_increment列所分配的值可能会或者说不会在更新语句中使用。
本人较懒,下面的关于锁模式的这个参数的介绍就不翻译了。
innodb_autoinc_lock_mode = 0 (“traditional” lock mode)
This lock mode provides the same behavior as before innodb_autoinc_lock_mode existed. For all “INSERT-like”statements, a special table-level AUTO-INC lock is obtained and held to the end of the statement. This assures that the auto-increment values assigned by any given statement are consecutive.innodb_autoinc_lock_mode = 1 (“consecutive” lock mode)
This is the default lock mode. In this mode, “bulk inserts” use the special AUTO-INC table-level lock and hold it until the end of the statement. This applies to all INSERT … SELECT, REPLACE … SELECT, and LOAD DATAstatements. Only one statement holding the AUTO-INC lock can execute at a time.
With this lock mode, “simple inserts” (only) use a new locking model where a light-weight mutex is used during the allocation of auto-increment values, and no table-level AUTO-INC lock is used, unless an AUTO-INC lock is held by another transaction. If another transaction does hold an AUTO-INC lock, a “simple insert” waits for the AUTO-INClock, as if it too were a “bulk insert”.
This lock mode ensures that, in the presence of INSERT statements where the number of rows is not known in advance (and where auto-increment numbers are assigned as the statement progresses), all auto-increment values assigned by any “INSERT-like” statement are consecutive, and operations are safe for statement-based replication.
Simply put, the important impact of this lock mode is significantly better scalability. This mode is safe for use with statement-based replication. Further, as with “traditional” lock mode, auto-increment numbers assigned by any given statement are consecutive. In this mode, there is no change in semantics compared to “traditional” mode for any statement that uses auto-increment, with one important exception.
The exception is for “mixed-mode inserts”, where the user provides explicit values for an AUTO_INCREMENT column for some, but not all, rows in a multiple-row “simple insert”. For such inserts, InnoDB will allocate more auto-increment values than the number of rows to be inserted. However, all values automatically assigned are consecutively generated (and thus higher than) the auto-increment value generated by the most recently executed previous statement. “Excess” numbers are lost.innodb_autoinc_lock_mode = 2 (“interleaved” lock mode)
In this lock mode, no “INSERT-like” statements use the table-level AUTO-INC lock, and multiple statements can execute at the same time. This is the fastest and most scalable lock mode, but it is not safe when using statement-based replication or recovery scenarios when SQL statements are replayed from the binary log.
In this lock mode, auto-increment values are guaranteed to be unique and monotonically increasing across all concurrently executing “INSERT-like” statements. However, because multiple statements can be generating numbers at the same time (that is, allocation of numbers is interleaved across statements), the values generated for the rows inserted by any given statement may not be consecutive.
If the only statements executing are “simple inserts” where the number of rows to be inserted is known ahead of time, there will be no gaps in the numbers generated for a single statement, except for “mixed-mode inserts”. However, when “bulk inserts” are executed, there may be gaps in the auto-increment values assigned by any given statement.-
Gaps in auto-increment values for “bulk inserts”
With innodb_autoinc_lock_mode set to 0 (“traditional”) or 1 (“consecutive”), the auto-increment values generated by any given statement will be consecutive, without gaps, because the table-level AUTO-INC lock is held until the end of the statement, and only one such statement can execute at a time.
With innodb_autoinc_lock_mode set to 2 (“interleaved”), there may be gaps in the auto-increment values generated by “bulk inserts,” but only if there are concurrently executing “INSERT-like” statements.
For lock modes 1 or 2, gaps may occur between successive statements because for bulk inserts the exact number of auto-increment values required by each statement may not be known and overestimation is possible.
源问题之所以会出现,直接原因是因为mysql的一个参数设置(innodb_autoinc_lock_mode)。跟auto_increments相关的参数一共有四个:innodb_autonic_lock_mode,innodb_autoextend_increment,auto_increment_increment,auto_increment_offset。其中innodb_autonic_lock_mode共有三个值,分别是:
innodb_autonic_lock_mode=0
innodb_autonic_lock_mode=1
innodb_autonic_lock_mode=2
详细说明见上方介绍!
其中mysql中一共有三种插入模式,分别是:
- simple insert
- bulk insert
- mixed insert
关于这三种模式的详细说明见上方介绍
load data是怎样执行的?
load data 一般比insert 语句快20倍左右,为什么会那么快呢,我了解的也不是很清楚,只是用bin-log日志做了一个小小的测试(bin-log默认调为–verbose详细信息输出模式):下面是执行一次load(innodb引擎)的bin-log日志内容
BEGIN
/*!*/;
# at 175
# at 221
#121122 21:36:30 server id 1 end_log_pos 221 Table_map: `test`.`t` mapped to number 33
#121122 21:36:30 server id 1 end_log_pos 295 Write_rows: table id 33 flags: STMT_END_F
BINLOG '
XiquUBMBAAAALgAAAN0AAAAAACEAAAAAAAEABHRlc3QAAXQAAwP+/gT+BP4UBg==
XiquUBcBAAAASgAAACcBAAAAACEAAAAAAAEAA//4AQAAAAFhAzIzDfgCAAAAAWQDamsN+AMAAAAC
NDUDZHMN+AQAAAABZANqaw0=
'/*!*/;
### INSERT INTO test.t
### SET
### @1=1
### @2='a'
### @3='23\x0d'
### INSERT INTO test.t
### SET
### @1=2
### @2='d'
### @3='jk\x0d'
### INSERT INTO test.t
### SET
### @1=3
### @2='45'
### @3='ds\x0d'
### INSERT INTO test.t
### SET
### @1=4
### @2='d'
### @3='jk\x0d'
# at 295
#121122 21:36:30 server id 1 end_log_pos 322 Xid = 1
COMMIT/*!*/;
从上面的日志内容中很容看到事务型的标志BEGIN……COMMIT,证明load语句是事务型的,一个load就是一个事务。并且还可以看到load执行的时候实际上是一个一个的”insert“,当然这种说法不正确,单个insert的执行顺序及所占时间可以分解为:
- 连接:(3)
- 发送查询给服务器:(2)
- 分析查询:(2)
- 插入记录:(1x记录大小)
- 插入索引:(1x索引)
- 关闭:(1)
所以load里面的每个“insert”只相当于上面的插入记录(1x记录大小)部分,然后在load之后统一建立索引,这里面还有各种的日志写入等。都是要耗费时间的,所以综上load语句比insert语句快那么多也就可以理解了。实际上我根据这个测试插入1000条语句分别用load和insert语句,对应的时间计算为:
load:3+2+2+1*1000+1*N+1约=1100。(这个N是多少不清楚,但应该不会太大)
insert:1000*(3+2+2+1+1)约=10000。
所以load应该比insert快将近10倍的样子,但仍比20倍(官方)是有差距的。
当上面的知识都了解之后,相信这个问题也就不是什么问题了。load data 应该是bulk insert(批量插入),而bulk insert在innodb_autonic_lock_mode=0或1时 ,由于mysql插入的时候auto_increment列将获取一个特殊的锁auto-inc lock(表级别的锁),这个锁将一直维持到语句的结束(不是事务的结束),所以在这个事务里,auto_increment一定是连续的。但是,当innodb_autonic_lock_mode=1时(默认),由于bulk insert无法在插入之前就知道这个事务里一共有多少条语句要插入,所以就会预留分配(这里有一个分配策略),所以两次load,就是两个事务之间就会产生auto_inc的不连续(间隙)。
当innodb_autonic_lock_mode=2时,这个是交叉分配,这个在同步复制的时候非常危险(详细的原因不做详解)。
上面所说的一个分配策略是指:首先会分配一条语句的空间(就是给auto_increment的增长空间只有1),当不够的时候就变成1+1*2个语句的空间(此时auto_increment的最大值为3,下一个值为4),当再不够的时候就变成1+1*2+1*2*2个,再不够的时候就变成了1+1*2+1*2*2+1*2*2*2个,以此类推,得到auto_increment的最大值。