批量操作的核心就是一次传入多个数据然后进行相关操作,增删改查中掌握其中一个,其它的就可以举一反三,触类旁通。它之所以执行效率高,是因为合并后日志量(MySQL的binlog和InnoDB的事务日志)减少了,降低日志刷盘的数据量和频率,从而提高效率;同时也能减少SQL语句解析的次数,减少网络传输的IO。但是,以下几点需要注意:
-
SQL语句有长度限制,在进行数据合并在同一SQL中务必不能超过SQL长度限制,通过max_allowed_packet配置可以修改,默认是1M。
-
事务需要控制大小,事务太大可能会影响执行的效率。MySQL有innodb_log_buffer_size配置项,超过这个值会把InnoDB的数据刷到磁盘中,这时,效率会有所下降。所以比较好的做法是,在数据达到这个值前进行事务提交。
在《Java 使用线程池分批插入或者更新数据》中介绍了如何在Java端批量切分数据,然后,使用线程池将被切分的数据传入相应DAO层的方法后,即可完整实现批量操作。在《Mybatis批量insert 返回主键值和foreach标签详解》中已经介绍了批量插入操作,而且,详细描述了foreach标签,这里简要概述批量删除、更新和查找。
首先,mysql需要数据库连接配置&allowMultiQueries=true
jdbc:mysql://127.0.0.1:3306/mybank?useUnicode=true&characterEncoding=utf8&allowMultiQueries=true
批量删除
<delete id= "batchDeleteByIds" parameterType= "list">
delete from instance where instance_id in
<foreach collection="list" item= "item" index ="index"
open= "(" close =")" separator=",">
#{item}
</foreach >
</delete >
|
批量更新
<update id= "updateUpdateTimeByIds" parameterType= "map">
update instance
set update_time = #{ updateTime } where instance_id in
<foreach collection="idlist" item= "uid" index ="index"
open= "(" close =")" separator=",">
#{ uid}
</foreach >
</update >
|
用法和之前的基本相同,但是这里传入的参数是map类型,idlist和updateTime是map的key。
批量查询
<select id="selectByIds" resultType="list" parameterType="map">
SELECT infos, create_time, update_time FROM instance WHERE instance_id in
<foreach collection="ids" item="id" index="index" open="(" close=")" separator=",">
#{id}
</foreach>
</select>
|
这里提供一下DAO层:
List<Instance> selectByIds (Map<String, Object> map);
void batchDeleteByIds (List<Long> list);
void updateUpdateTimeByIds(Map<String, Object> map);
|
乍看上去这个foreach没有问题,但是经过项目实践发现,当表的列数较多(20+),以及一次性插入的行数较多(5000+)时,整个插入的耗时十分漫长,达到了14分钟,这是不能忍的。在资料中也提到了一句话:
Of course don't combine ALL of them, if the amount is HUGE. Say you have 1000 rows you need to insert, then don't do it one at a time. You shouldn't equally try to have all 1000 rows in a single query. Instead break it into smaller sizes.
|
它强调,当插入数量很多时,不能把所有的鸡蛋放在同一个篮子里,即一次性全放在一条语句里。可是为什么不能放在同一条语句里呢?这条语句为什么会耗时这么久呢?我查阅了资料发现:
Insert inside MyBatis foreach is not batch, this is a single (could become giant) SQL statement and that brings drawbacks:
Iteration over the collection must not be done in the mybatis XML. Just execute a simple Insertstatement in a Java Foreach loop. The most important thing is the session Executor type.
SqlSession session = sessionFactory.openSession(ExecutorType.BATCH); for (Model model : list) { session.insert("insertStatement", model); }
Unlike default ExecutorType.SIMPLE, the statement will be prepared once and executed for each record to insert.
|
虽然MyBatis官网推荐使用ExecutorType.BATCH 的插入方式,因为,其性能更好;但是,其SQL写在了Java里,如果SQL比较复杂,则不易于维护。因此,本文只详细介绍了常见的使用foreach标签的方式。下面从MyBatis官网借用一个Batch Insert
示例。
A batch insert is a collection of statements that can be used to execute a JDBC batch. A batch is the preferred method of doing bulk inserts with JDBC. The basic idea is that you configure the connection for a batch insert, then execute the same statement multiple times, with different values for each inserted record. MyBatis has a nice abstraction of JDBC batches that works well with statements generated from this library. A batch insert looks like this:
... SqlSession session = sqlSessionFactory.openSession(ExecutorType.BATCH); try { SimpleTableMapper mapper = session.getMapper(SimpleTableMapper.class); List<SimpleTableRecord> records = getRecordsToInsert(); // not shown
BatchInsert<SimpleTableRecord> batchInsert = insert(records) .into(simpleTable) .map(id).toProperty("id") .map(firstName).toProperty("firstName") .map(lastName).toProperty("lastName") .map(birthDate).toProperty("birthDate") .map(employed).toProperty("employed") .map(occupation).toProperty("occupation") .build() .render(RenderingStrategy.MYBATIS3); batchInsert.insertStatements().stream().forEach(mapper::insert); session.commit(); } finally { session.close(); } ...
Reference