I need to delete millions of rows from a table from within an EJB Timer. The problem is that the timer has a transaction timeout of 90 seconds, so I should divide the work into bite-size chunks.
Since I don't know how many rows can be deleted in 90 seconds the algorithm should loop and delete a few at a time until the time is almost up.
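To make the idea concrete, here is a minimal sketch of such a time-boxed loop in plain Java. It is only an illustration: `TimeBoxedCleanup`, `run`, and the `deleteChunk` callback are made-up names, and the callback stands in for whatever JPA call actually deletes one chunk and reports the row count.

```java
import java.time.Duration;
import java.util.function.IntSupplier;
import java.util.function.LongSupplier;

class TimeBoxedCleanup {

    /**
     * Calls deleteChunk until it reports 0 deleted rows or the time budget is used up.
     * Returns true if the loop stopped because of the budget, i.e. rows may remain.
     * The clock is passed in (nanoseconds) so the loop can be tested with a fake clock.
     */
    static boolean run(IntSupplier deleteChunk, Duration budget, LongSupplier nanoClock) {
        long deadline = nanoClock.getAsLong() + budget.toNanos();
        // Compare via subtraction so that nanoTime wrap-around does not matter.
        while (deadline - nanoClock.getAsLong() > 0) {
            if (deleteChunk.getAsInt() == 0) {
                return false; // nothing left to delete
            }
        }
        return true; // out of time; the caller should schedule another run
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the real delete: 2500 rows, at most 1000 per chunk.
        int[] remaining = {2500};
        IntSupplier fakeDelete = () -> {
            int n = Math.min(1000, remaining[0]);
            remaining[0] -= n;
            return n;
        };
        // Budget chosen below the 90-second transaction timeout to leave some slack.
        boolean more = run(fakeDelete, Duration.ofSeconds(80), System::nanoTime);
        System.out.println("more work pending: " + more + ", rows left: " + remaining[0]);
    }
}
```

The budget only bounds when a new chunk may start, so a single chunk still has to be small enough to finish within the slack that is left.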
The problem is: How can the number of rows to delete be limited elegantly in JPA? The delete is made on all rows having a timestamp earlier than a certain date.
I guess it is possible to find the 1000th-oldest row and run DELETE WHERE timestamp <= {1000th-oldest-row.timestamp}.
This, however, is not very elegant, and I would have to fetch the last of those 1000 rows just to get its timestamp.
Secondly, the timer should trigger immediately again if the table is not clean after the 90 seconds. This can be easily solved but, again, is not very elegant.
4 Answers
#1
2
You will still face transaction expiration issues with the solution you have.
The trick is to execute each chunk in a separate transaction, as shown below in pseudocode.
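// Pseudocode. <table key> stands for the primary-key column(s) of MyTable.
// ROWNUM is Oracle-specific and is not JPQL, so the statement is declared as a
// native query; native queries only portably support positional parameters.
@Entity
@NamedNativeQuery(
    name = "pagedDeleteExpiredItems",
    query = "DELETE FROM MyTable"
          + " WHERE (<table key>) IN ("
          + "   SELECT <table key> FROM ("
          + "     SELECT ROWNUM AS row_num, <table key> FROM MyTable"
          + "     WHERE timestamp <= ?1"
          + "   )"
          + "   WHERE row_num <= ?2"
          + " )"
)
public class MyEntity {
    // entity fields omitted
}

// The transactional method has to live in a session bean (an @Entity is not an EJB)
// and be called through the container for REQUIRES_NEW to take effect.
// MyEntityCleaner is a made-up name for that bean.
@Stateless
public class MyEntityCleaner {
    @PersistenceContext
    EntityManager em;

    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public int doPagedDeleteExpiredItems(Date currentTime, int pageSize) {
        Query query = em.createNamedQuery("pagedDeleteExpiredItems");
        query.setParameter(1, currentTime);
        query.setParameter(2, pageSize);
        return query.executeUpdate(); // number of rows deleted in this chunk
    }
}

@Stateless
public class DeleteExpiredItemsTimer {
    @EJB
    MyEntityCleaner cleaner;

    @Timeout
    @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
    public void handleTimeout(Timer timer) {
        Date currentTime = getCurrentTime();
        int pageSize = 100;
        int deleteCount;
        do {
            // each call runs and commits in its own transaction
            deleteCount = cleaner.doPagedDeleteExpiredItems(currentTime, pageSize);
        } while (deleteCount > 0);
    }
}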
#2
1
We had a similar requirement and here is how we solved it. I was using EJB 3.0.
- The timer is started from a ServletContextListener when the app server starts (or the module is deployed).
- When the timer fires, it processes up to 100 pending rows. To do that, you need to order the query result and limit the number of rows.
- If there were 100 rows, the timer schedules the next timeout in 0ms. That is, the transaction is committed and the timer fires again in a new transaction.
- If there were fewer than 100 rows, the timer schedules the next timeout in 90sec.
If there are, say, 250 rows, the timer fires three times in a row. There is a minor problem when there are exactly 100 rows to process: the timer fires twice in a row, but the second firing processes nothing. All in all, though, it worked OK.
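The rescheduling rule can be sketched in a few lines of plain Java. This is only a simulation of the decision logic, not the real timer code: `RescheduleRule`, `nextDelayMillis`, and `firingsInSequence` are hypothetical names, and the actual version would create the next timer through the EJB TimerService instead of looping.

```java
class RescheduleRule {
    static final int BATCH_SIZE = 100;        // rows processed per firing
    static final long IDLE_DELAY_MS = 90_000; // delay once the backlog is gone

    /** Delay before the next timeout, given how many rows this firing processed. */
    static long nextDelayMillis(int processed) {
        // A full batch suggests more rows are waiting, so fire again right away.
        return processed >= BATCH_SIZE ? 0 : IDLE_DELAY_MS;
    }

    /** Counts the back-to-back firings a backlog causes before the timer goes idle again. */
    static int firingsInSequence(int pendingRows) {
        int firings = 0;
        long delay;
        do {
            int processed = Math.min(BATCH_SIZE, pendingRows);
            pendingRows -= processed;
            firings++;
            delay = nextDelayMillis(processed);
        } while (delay == 0);
        return firings;
    }

    public static void main(String[] args) {
        System.out.println(firingsInSequence(250)); // batches of 100, 100, 50
        System.out.println(firingsInSequence(100)); // batch of 100, then an empty firing
    }
}
```

The second call shows the edge case mentioned above: with exactly 100 rows the first firing cannot tell that the table is now empty, so one extra firing is needed to find that out.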
#3
1
One trick I've used within SQL is to DELETE TOP 1000 (or 100 or 10000, depending on the average number of rows in a page), like so:
DELETE TOP (1000) FROM <table> WHERE timestamp <= @ExpirationDate
Call this repeatedly until no rows are deleted (check with @@rowcount) or you run out of time. Can this technique be implemented in JPA?
#4
0
I solved the problem by querying a sorted list of the rows eligible for cleanup and passing the same value to setFirstResult(int) as to setMaxResults(int). This way I get an item that lies approximately maxCount steps from the oldest, and its end time becomes the cutoff for the delete.
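// "getExpiredElements" is expected to return the expired items ordered oldest first.
Query expired = em.createNamedQuery("getExpiredElements");
expired.setParameter("currentTime", getCurrentTime());
expired.setMaxResults(maxCount);
expired.setFirstResult(maxCount); // skip the maxCount oldest items
@SuppressWarnings("unchecked")
List<Item> expiredChunk = (List<Item>) expired.getResultList();
Query query = em.createNamedQuery("deleteExpiredItems");
if (expiredChunk.isEmpty()) {
    // maxCount or fewer expired items are left: delete all of them
    query.setParameter("currentTime", getCurrentTime());
} else {
    // delete everything up to the item found maxCount steps from the oldest
    long lastChunkEndTime = expiredChunk.get(0).getEndTime();
    query.setParameter("currentTime", lastChunkEndTime);
}
int result = query.executeUpdate();
return result >= maxCount;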
The function returns true whenever it should be executed again (it may also return true once more when nothing is left to delete).
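To see how the cutoff behaves over several passes, here is a small in-memory simulation in plain Java. It is a sketch under assumptions: a `List<Long>` of end times stands in for the table, the two named queries are replaced by list operations, and `CutoffChunkDelete` and `deleteChunk` are made-up names. When `maxCount` or fewer expired rows remain, it simply deletes all of them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CutoffChunkDelete {

    /** Removes one chunk of expired end times; returns true if another pass is worthwhile. */
    static boolean deleteChunk(List<Long> endTimes, long currentTime, int maxCount) {
        // Stand-in for "getExpiredElements": expired rows, oldest first.
        List<Long> expired = new ArrayList<>();
        for (long t : endTimes) {
            if (t <= currentTime) {
                expired.add(t);
            }
        }
        Collections.sort(expired);

        // Stand-in for setFirstResult(maxCount): skip maxCount rows, take the next one.
        // If there is no such row, everything that has expired fits into this chunk.
        long cutoff = expired.size() > maxCount ? expired.get(maxCount) : currentTime;

        // Stand-in for "deleteExpiredItems": delete up to and including the cutoff.
        int before = endTimes.size();
        endTimes.removeIf(t -> t <= cutoff);
        int deleted = before - endTimes.size();
        return deleted >= maxCount;
    }

    public static void main(String[] args) {
        List<Long> table = new ArrayList<>();
        for (long t = 1; t <= 10; t++) {
            table.add(t);
        }
        // Rows with end time <= 7 are expired; each pass aims at roughly 3 rows.
        int passes = 0;
        boolean again;
        do {
            again = deleteChunk(table, 7, 3);
            passes++;
        } while (again);
        System.out.println("passes: " + passes + ", rows left: " + table);
    }
}
```

Note that the cutoff is a timestamp, so rows sharing the cutoff's end time are all deleted in the same pass and a chunk can be larger than `maxCount`.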