Deleting a large number of rows from an EJB Timer

Date: 2021-08-07 09:20:42

I need to delete millions of rows from a table from within an EJB Timer. The problem is that the timer has a transaction timeout of 90 seconds, so I should divide the work into bite-size chunks.

Since I don't know how many rows can be deleted in 90 seconds, the algorithm should loop and delete a few rows at a time until the time is almost up.


The problem is: How can the number of rows to delete be limited elegantly in JPA? The delete is made on all rows having a timestamp earlier than a certain date.


I guess it is possible to find the 1000th-oldest row and run DELETE WHERE timestamp <= {1000th-oldest-row.timestamp}. This, however, is not very elegant, and I would have to read through to the last of those 1000 rows just to get its timestamp.

Secondly, the timer should trigger again immediately if the table is not clean after the 90 seconds. This can be solved easily but, again, not very elegantly.


4 answers



You will still face transaction expiration issues with the solution you have.


The trick is to execute each chunk in a separate transaction, as shown below in pseudocode.



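// Pseudocode: <table key> is a placeholder, and ROWNUM is Oracle-specific,
// so real code would most likely need a native query here.
@NamedQueries ( value = {
    @NamedQuery (
        name = "pagedDeleteExpiredItems",
        query = "DELETE FROM MyTable
            WHERE (<table key>) IN (
                SELECT <table key> FROM (
                    SELECT ROWNUM AS row_num, <table key> FROM MyTable
                    WHERE timestamp <= :currentTime)
                WHERE row_num <= :pageSize)"
    )
})

// Session bean that deletes one page of expired rows per call.
@Stateless
public class MyEntity {
    // Each call runs (and commits) in its own transaction.
    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    int doPagedDeleteExpiredItems(Date currentTime, int pageSize) {
        Query query = em.createNamedQuery("pagedDeleteExpiredItems");
        query.setParameter("currentTime", currentTime);
        query.setParameter("pageSize", pageSize);
        int deleteCount = query.executeUpdate();
        return deleteCount;
    }
}

public class DeleteExpiredItemsTimer {

    @EJB(beanName = "MyEntity")
    MyEntity myEntity;

    void handleTimeout(Timer timer) {
        Date currentTime = getCurrentTime();
        int pageSize = 100;
        int deleteCount;
        // Keep deleting pages until a call removes nothing.
        do {
            deleteCount = myEntity.doPagedDeleteExpiredItems(currentTime, pageSize);
        } while (deleteCount > 0);
    }
}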



We had a similar requirement, and here is how we solved it. I was using EJB 3.0.

  1. The timer is started from a ServletContextListener when the app server starts (or the module is deployed).
  2. When the timer fires, it processes up to 100 pending rows. For that, you need to order the result of the query and limit the number of rows.
  3. If there were 100 rows, the timer schedules the next timeout with 0 ms. That is, the transaction is committed and the timer fires again in a new transaction.
  4. If there were fewer than 100 rows, the timer schedules the next timeout in 90 seconds.

If there are, say, 250 rows, the timer fires three times in a row. There is only a minor problem when there are exactly 100 rows to process: the timer then fires twice in a row, but the second run processes nothing. All in all, it worked OK.
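The rescheduling rule from the list above can be sketched in plain Java. This is only an illustration, not the original poster's code: the class and method names are made up, the backlog is simulated with a counter instead of a database, and in a real EJB 3.0 bean the computed delay would be passed to something like `TimerService.createTimer(delay, info)`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: pick the delay before the next timer run from the last batch size. */
final class RescheduleSketch {
    static final int BATCH_SIZE = 100;          // rows handled per timer run (from the answer)
    static final long IDLE_DELAY_MS = 90_000L;  // wait 90 s once the backlog is drained

    /** A full batch means more rows may be pending, so fire again right away. */
    static long nextDelayMillis(int processed) {
        return processed >= BATCH_SIZE ? 0L : IDLE_DELAY_MS;
    }

    /** Simulates back-to-back timer runs on a backlog and returns the size of each batch. */
    static List<Integer> simulate(int pendingRows) {
        List<Integer> batches = new ArrayList<>();
        long delay = 0L;
        while (delay == 0L) {
            int processed = Math.min(BATCH_SIZE, pendingRows);
            pendingRows -= processed;
            batches.add(processed);
            delay = nextDelayMillis(processed);
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(simulate(250)); // [100, 100, 50]
        System.out.println(simulate(100)); // [100, 0]  (the empty second run mentioned above)
    }
}
```

The second line of output shows the edge case described above: with exactly 100 rows, the rule cannot tell a drained backlog from a full one, so one extra empty run happens.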




One trick I've used within SQL is to DELETE TOP 1000 (or 100 or 10000, depending on the average number of rows in a page), like so:

DELETE TOP (1000) FROM <table> WHERE timestamp <= @ExpirationDate

Call this repeatedly until no rows are deleted (check with @@rowcount) or you run out of time. Can this technique be implemented in JPA?
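As a rough illustration of the "repeat until nothing is deleted or time runs out" loop, here is a minimal Java sketch. The names are hypothetical, and the chunk delete is passed in as an `IntSupplier` so the loop can be shown without a database. In JPA that supplier could, for instance, wrap `executeUpdate()` on a native query, since a JPQL bulk DELETE has no row-limit clause of its own.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch: repeat a chunked delete within a fixed time budget. */
final class TimeBoxedDelete {

    /**
     * Calls deleteChunk until it reports 0 deleted rows or the budget is used up.
     * The budget is only checked between chunks, so it should leave room for one chunk.
     * Returns the total number of rows deleted.
     */
    static long deleteUntilDoneOrTimeout(IntSupplier deleteChunk, long budgetMillis) {
        long deadline = System.nanoTime() + budgetMillis * 1_000_000L;
        long total = 0;
        while (System.nanoTime() - deadline < 0) {
            int deleted = deleteChunk.getAsInt();
            if (deleted == 0) {
                break; // nothing left to delete
            }
            total += deleted;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] remaining = {2500}; // stand-in for the expired rows in the table
        IntSupplier fakeDelete = () -> {
            int n = Math.min(1000, remaining[0]);
            remaining[0] -= n;
            return n;
        };
        // Budget kept below the 90 s transaction timeout from the question.
        System.out.println(deleteUntilDoneOrTimeout(fakeDelete, 80_000)); // 2500
    }
}
```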



I solved the problem by getting a sorted list of the rows eligible for cleanup and setting setFirstResult(int) to the same value as setMaxResults(int). This way I get the end time of an item approximately maxCount steps from the oldest.


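Query expired = em.createNamedQuery("getExpiredElements");
expired.setParameter("currentTime", getCurrentTime());
// Skip the maxCount oldest rows and read from there, as described above.
expired.setFirstResult(maxCount);
expired.setMaxResults(maxCount);
List<Item> expiredChunk = (List<Item>) expired.getResultList();
// Note: the list is empty when no more than maxCount rows are expired; get(0) then throws.
long lastChunkEndTime = expiredChunk.get(0).getEndTime();
// Delete everything up to and including that row's end time.
Query query = em.createNamedQuery("deleteExpiredItems");
query.setParameter("currentTime", lastChunkEndTime);
int result = query.executeUpdate();
// true means another run may be needed
return result >= maxCount;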

The function returns true (at least) if it should be executed again.
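To make the cutoff idea concrete, here is a small in-memory sketch. It is not the poster's JPA code: a sorted `List<Long>` of end times stands in for the table, the names are invented, and unlike the snippet above it falls back to the current time as the cutoff when there are no more than maxCount expired rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: delete one chunk by cutting off at the end time of a nearby row. */
final class CutoffChunkSketch {

    /**
     * endTimes must be sorted ascending; it stands in for the table.
     * Returns true if the caller should run another chunk.
     */
    static boolean deleteChunk(List<Long> endTimes, long now, int maxCount) {
        int expired = 0;
        for (long t : endTimes) {
            if (t <= now) {
                expired++;
            }
        }
        // Cutoff is the end time of the row maxCount steps from the oldest,
        // or 'now' if fewer expired rows than that remain.
        long cutoff = expired > maxCount ? endTimes.get(maxCount) : now;
        int before = endTimes.size();
        endTimes.removeIf(t -> t <= cutoff); // the bulk DELETE ... WHERE endTime <= :cutoff
        int deleted = before - endTimes.size();
        return deleted >= maxCount;
    }

    public static void main(String[] args) {
        List<Long> table = new ArrayList<>(Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L));
        // Rows with end time <= 7 are expired; delete them in chunks of about 3.
        while (deleteChunk(table, 7L, 3)) {
            System.out.println("after chunk: " + table);
        }
        System.out.println("final: " + table); // final: [8, 9, 10]
    }
}
```

Because the delete uses `<=` on the cutoff row's end time, a chunk removes at least maxCount + 1 rows when that many are expired, and possibly more if several rows share the same end time.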




