I want to delete a large number of records (say 1,000,000) from a database. Doing this in a single statement takes so long that the transaction times out and fails. So I delete them in batches of, say, 25,000 records per transaction, using the LIMIT clause on MySQL or ROWNUM on Oracle. Great, this works.
I want to do this in a database-independent way, and from an existing Java code base that uses JPA/Hibernate.
Out of luck: JPA's Query.setMaxResults and setFirstResult have no effect on write 'queries' (e.g. delete). Selecting many entities into memory in order to delete them one by one is very slow, and dumb I'd say.
So I use a native query and manage the 'limit' clause in application code. It would be nice to encapsulate this clause in orm.xml, but: "Hibernate Annotations 3.2 does not support bulk update/deletes using native queries." - http://opensource.atlassian.com/projects/hibernate/browse/ANN-469
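For illustration, managing the 'limit' clause in application code might look roughly like the sketch below. The class, enum, and method names are made up for this example and are not part of JPA or Hibernate. The table name and WHERE condition are assumed to be trusted constants, because they are concatenated straight into the SQL. The loop just repeats one batch per transaction until a batch deletes nothing.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper; none of these names come from JPA or Hibernate. */
final class BatchedDelete {

    /** Only the two dialects mentioned in the question are covered. */
    enum Dialect { MYSQL, ORACLE }

    /**
     * Builds a row-limited DELETE statement for the given dialect.
     * table and where must be trusted constants (they are concatenated as-is).
     */
    static String limitedDeleteSql(Dialect dialect, String table, String where, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        switch (dialect) {
            case MYSQL:
                // MySQL accepts LIMIT on a single-table DELETE.
                return "DELETE FROM " + table + " WHERE (" + where + ") LIMIT " + batchSize;
            case ORACLE:
                // Oracle restricts the affected rows through the ROWNUM pseudo-column.
                return "DELETE FROM " + table + " WHERE (" + where + ") AND ROWNUM <= " + batchSize;
            default:
                throw new IllegalArgumentException("Unsupported dialect: " + dialect);
        }
    }

    /**
     * Calls deleteOneBatch repeatedly until it reports zero deleted rows.
     * Each call is assumed to run and commit its own transaction.
     * Returns the total number of rows reported as deleted.
     */
    static long deleteAll(IntSupplier deleteOneBatch) {
        long total = 0;
        int deleted;
        while ((deleted = deleteOneBatch.getAsInt()) > 0) {
            total += deleted;
        }
        return total;
    }
}
```

With JPA, deleteOneBatch would presumably wrap something like em.createNativeQuery(sql).executeUpdate() in its own transaction; that wiring is left out here so the sketch stays independent of any particular setup.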
I'd imagine this is a common problem. Has anybody got a better database-independent solution?
4 Answers
#1
I hate to give a non-constructive answer, but an ORM isn't really meant for doing bulk operations on the database. So it looks like your native query is probably the best bet for these operations.
You should also make sure that your ORM is updated to reflect the new state of the database afterwards, otherwise you may see some weird behavior from stale cached entities.
ORMs are great tools for mapping objects to databases, but they are generally not general-purpose database interfaces.
#2
Limiting the rows a query affects is a database-specific feature, and there is no SQL standard for it (I agree there should be).
A solution that works with most databases is to use a view that groups several tables into one. Each table contains a subset of the data (say, one day), which lets you drop a whole subset at once. That said, many databases have issues with running UPDATE and INSERT on such a view.
You can usually work around this by creating a separate view or alias for INSERT/UPDATE (pointing to a single table, the "current" one) and a grouping view for searching.
Some databases also offer partitions, which are basically the same thing, except that you can define a column that determines which underlying table a row goes into (on INSERT). When you need to delete a subset, you can drop or truncate one of the underlying tables.
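As a rough sketch of the one-table-per-day idea, the helper below derives a table name from a date and lists the DROP statements for every day before a cutoff. The naming scheme (prefix plus yyyyMMdd suffix) and all identifiers are assumptions made for this example; the grouping view would also have to be redefined after tables are dropped, which is not shown.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical naming helper for a "one table per day" layout. */
final class DailyTables {

    // Formats a LocalDate as yyyyMMdd, e.g. 20090131.
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.BASIC_ISO_DATE;

    /** Name of the table holding the rows for the given day (assumed scheme). */
    static String tableFor(String prefix, LocalDate day) {
        return prefix + "_" + day.format(SUFFIX);
    }

    /**
     * DROP statements for every daily table from oldest (inclusive)
     * up to cutoff (exclusive), in chronological order.
     */
    static List<String> dropStatements(String prefix, LocalDate oldest, LocalDate cutoff) {
        List<String> statements = new ArrayList<>();
        for (LocalDate day = oldest; day.isBefore(cutoff); day = day.plusDays(1)) {
            statements.add("DROP TABLE " + tableFor(prefix, day));
        }
        return statements;
    }
}
```

Dropping a whole table this way avoids the long-running DELETE entirely, at the cost of the extra view management described above.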
#3
I believe you can use HQL (JPA QL) direct DML operations which will bypass the persistence context and cache, and execute the (resulting SQL) statements directly:
Query q = session.createQuery("delete YourEntity ye where ye.something like :param");
q.setParameter("param", "anything");
int deletedEntities = q.executeUpdate();
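One way this might be combined with batching, offered only as a sketch and not as something proposed in this thread: since setMaxResults is honored for select queries, fetch a limited list of IDs first and then bulk-delete by ID. To keep the example self-contained, the two database steps are passed in as functions; the JPA calls in the comments assume an entity YourEntity with a Long id field, which is a guess.

```java
import java.util.List;
import java.util.function.IntFunction;
import java.util.function.ToIntFunction;

/** Hypothetical batching loop; the database access is supplied by the caller. */
final class IdBatchDeleter {

    /**
     * Repeatedly fetches up to batchSize IDs and deletes them, until no IDs
     * remain or a delete call reports zero rows (guard against looping forever).
     *
     * fetchIds could be, under the assumptions above:
     *   n -> em.createQuery("select ye.id from YourEntity ye where ...", Long.class)
     *          .setMaxResults(n).getResultList()
     * deleteByIds could be:
     *   ids -> em.createQuery("delete from YourEntity ye where ye.id in (:ids)")
     *            .setParameter("ids", ids).executeUpdate()
     * Each deleteByIds call is assumed to run in its own transaction.
     */
    static long deleteInBatches(IntFunction<List<Long>> fetchIds,
                                ToIntFunction<List<Long>> deleteByIds,
                                int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long total = 0;
        while (true) {
            List<Long> ids = fetchIds.apply(batchSize);
            if (ids.isEmpty()) {
                return total;
            }
            int deleted = deleteByIds.applyAsInt(ids);
            if (deleted == 0) {
                return total;
            }
            total += deleted;
        }
    }
}
```

Note that Oracle rejects IN lists with more than 1000 expressions, so the batch size would have to stay well below the 25,000 used in the question, and this approach costs an extra select per batch.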
#4
q.setMaxResults(int)
...sony