Deleting a large number of rows from an EJB Timer

Time: 2021-08-07 09:20:42

I need to delete millions of rows from a table from within an EJB Timer. The problem is that the timer has a transaction timeout of 90 seconds, so I should divide the work into bite-size chunks.

Since I don't know how many rows can be deleted in 90 seconds, the algorithm should loop and delete a few rows at a time until the time is almost up.
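
A minimal sketch of that loop, in plain Java with no JPA involved. `TimeBoxedCleaner`, `clean`, and the `IntSupplier` standing in for the real chunked delete are hypothetical names, and the 5-second safety margin in `main` is an arbitrary assumption:

```java
import java.util.function.IntSupplier;

// Hypothetical helper: runs small delete chunks until the table is clean
// or the time budget is nearly used up.
final class TimeBoxedCleaner {

    /**
     * @param deleteChunk  deletes one chunk and returns the number of rows removed
     * @param budgetMillis total time available (e.g. the 90 s transaction timeout)
     * @param safetyMillis margin to keep free before the budget expires
     * @return true if time ran out, so rows may remain and another run is needed
     */
    static boolean clean(IntSupplier deleteChunk, long budgetMillis, long safetyMillis) {
        long budgetNanos = (budgetMillis - safetyMillis) * 1_000_000L;
        long start = System.nanoTime();
        while (System.nanoTime() - start < budgetNanos) {
            if (deleteChunk.getAsInt() == 0) {
                return false; // nothing left to delete
            }
        }
        return true; // out of time; more rows may remain
    }

    public static void main(String[] args) {
        int[] remaining = {2_500}; // stand-in for the expired rows in the table
        IntSupplier chunk = () -> {
            int n = Math.min(1_000, remaining[0]);
            remaining[0] -= n;
            return n;
        };
        boolean runAgain = clean(chunk, 90_000, 5_000);
        System.out.println("run again: " + runAgain + ", rows left: " + remaining[0]);
    }
}
```

The return value is what the timer would use to decide whether to trigger again immediately.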

The problem is: How can the number of rows to delete be limited elegantly in JPA? The delete is made on all rows having a timestamp earlier than a certain date.

I guess it is possible to find the 1000th-oldest row and run DELETE WHERE timestamp <= {1000th-oldest-row.timestamp}. This, however, is not very elegant, and I would have to read through to the last of the 1000 rows to get the timestamp.

Secondly, the timer should trigger immediately again if the table is not clean after the 90 seconds. This can be easily solved but, again, is not very elegant.

4 solutions

#1


2  

You will still face transaction expiration issues with the solution you have.

The trick is to execute each chunk in a separate transaction, as shown below in pseudocode.

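// Imports from java.util, javax.ejb and javax.persistence omitted.

// Entity mapped to the table; here it only hosts the named query.
// ROWNUM is Oracle-specific, so this has to be a native query, not JPQL.
// <table key> is a placeholder for the table's key column(s).
// Note: named parameters in native queries are provider-dependent;
// switch to positional parameters (?1, ?2) if your provider rejects them.
@Entity
@NamedNativeQuery(
    name = "pagedDeleteExpiredItems",
    query = "DELETE FROM MyTable"
          + " WHERE (<table key>) IN ("
          + "     SELECT <table key> FROM ("
          + "         SELECT ROWNUM AS row_num, <table key> FROM MyTable"
          + "         WHERE timestamp <= :currentTime"
          + "     )"
          + "     WHERE row_num <= :pageSize"
          + " )"
)
public class MyTable {
    // @Id and column mappings omitted
}

// Session bean: every call runs and commits in its own transaction.
@Stateless
public class MyEntity {

    @PersistenceContext
    EntityManager em;

    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public int doPagedDeleteExpiredItems(Date currentTime, int pageSize) {
        Query query = em.createNamedQuery("pagedDeleteExpiredItems");
        query.setParameter("currentTime", currentTime, TemporalType.TIMESTAMP);
        query.setParameter("pageSize", pageSize);
        return query.executeUpdate();
    }
}

// Timer bean: the callback itself runs outside any transaction,
// so only the individual chunks are subject to the transaction timeout.
@Stateless
public class DeleteExpiredItemsTimer {

    @EJB(beanName = "MyEntity")
    MyEntity myEntity;

    @Timeout
    @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
    void handleTimeout(Timer timer) {
        Date currentTime = getCurrentTime();
        int pageSize = 100;
        int deleteCount;
        do {
            // Keep the return value; the loop ends once a chunk deletes nothing.
            deleteCount = myEntity.doPagedDeleteExpiredItems(currentTime, pageSize);
        } while (deleteCount > 0);
    }
}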

#2


1  

We had a similar requirement and here is how we solved it. I was using EJB 3.0.

  1. The timer is started from a ServletContextListener when the app server starts (or the module is deployed).
  2. When the timer fires, it processes up to 100 pending rows. For that, you need to order the query result and limit the number of rows.
  3. If there were 100 rows, the timer schedules the next timeout in 0 ms. That is, the transaction is committed and the timer fires again in a new transaction.
  4. If there were fewer than 100 rows, the timer schedules the next timeout in 90 seconds.

If there are, say, 250 rows, the timer fires three times in a row. There is only a minor problem if there are exactly 100 rows to process: the timer then fires twice in a row, but the second firing processes nothing. All in all, it worked fine.
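
The rescheduling rule can be checked with a small simulation. This is only a sketch in plain Java that uses no EJB API; `RescheduleSketch`, `nextDelayMillis`, and `firesInARow` are hypothetical names, and the batch size and idle delay are the values from the steps above:

```java
// Hypothetical simulation of the rescheduling rule; no timer service is involved.
final class RescheduleSketch {
    static final int BATCH_SIZE = 100;
    static final long IDLE_DELAY_MS = 90_000;

    /** Delay until the next timeout, given how many rows the last run processed. */
    static long nextDelayMillis(int processed) {
        // A full batch suggests more rows are pending, so fire again immediately.
        return processed == BATCH_SIZE ? 0 : IDLE_DELAY_MS;
    }

    /** Counts how many times the timer fires back to back for a given backlog. */
    static int firesInARow(int pendingRows) {
        int fires = 0;
        long delay;
        do {
            int processed = Math.min(BATCH_SIZE, pendingRows);
            pendingRows -= processed;
            fires++;
            delay = nextDelayMillis(processed);
        } while (delay == 0);
        return fires;
    }

    public static void main(String[] args) {
        System.out.println(firesInARow(250)); // prints 3
        System.out.println(firesInARow(100)); // prints 2 (the second firing finds nothing)
    }
}
```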

#3


1  

One trick I've used within SQL is to DELETE TOP 1000 (or 100 or 10000, depending on the average number of rows in a page), like so:

DELETE TOP (1000) FROM <table> WHERE timestamp <= @ExpirationDate

Call this repeatedly until no rows are deleted (check with @@rowcount) or you run out of time. Can this technique be implemented in JPA?

#4


0  

Solved the problem by getting a sorted list of the rows eligible for cleaning and setting setFirstResult(int) to the same value as setMaxResults(int). This gives me an item roughly maxCount positions from the oldest, and its end time serves as the cutoff for the delete.

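// Assumes "getExpiredElements" returns the expired items ordered oldest first.
Query expired = em.createNamedQuery("getExpiredElements");
expired.setParameter("currentTime", getCurrentTime());
expired.setMaxResults(maxCount);
expired.setFirstResult(maxCount); // skip the maxCount oldest items
@SuppressWarnings("unchecked")
List<Item> expiredChunk = (List<Item>) expired.getResultList();
Query query = em.createNamedQuery("deleteExpiredItems");
if (expiredChunk.isEmpty()) {
    // At most maxCount expired items remain, so delete all of them.
    query.setParameter("currentTime", getCurrentTime());
} else {
    // Use the end time of the item maxCount positions from the oldest as the cutoff.
    query.setParameter("currentTime", expiredChunk.get(0).getEndTime());
}
int result = query.executeUpdate();
return result >= maxCount;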

The function returns true (at least) if it should be executed again.
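
To see how the cutoff behaves, here is a sketch that replays the idea on an in-memory list of end times instead of a database. `CutoffDeleteSketch` and `deleteChunk` are hypothetical names, and the sketch assumes both named queries compare with `<=`, which the snippet above does not show:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory stand-in for the two named queries.
final class CutoffDeleteSketch {

    /**
     * Removes roughly maxCount of the oldest expired end times from the list.
     * @return true if another pass may be needed
     */
    static boolean deleteChunk(List<Long> endTimes, long now, int maxCount) {
        // Equivalent of "getExpiredElements": expired items, oldest first.
        List<Long> expired = new ArrayList<>();
        for (long t : endTimes) {
            if (t <= now) {
                expired.add(t);
            }
        }
        Collections.sort(expired);

        // Cutoff is the item maxCount positions from the oldest;
        // if no such item exists, everything expired can go in one pass.
        long cutoff = expired.size() > maxCount ? expired.get(maxCount) : now;

        // Equivalent of "deleteExpiredItems" with the cutoff as parameter.
        int before = endTimes.size();
        endTimes.removeIf(t -> t <= cutoff);
        return before - endTimes.size() >= maxCount;
    }
}
```

With seven expired items and maxCount = 3, the first pass removes four items, the second removes the remaining three and still returns true, and only the third pass, which deletes nothing, returns false. That extra pass is the cost of the "at least" in the sentence above.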
