How do I quickly check whether a large number of records exist in the database?

Time: 2023-01-01 16:27:39

At some point in my Rails application, I retrieve a large number of ActiveRecord objects from the cache. However, it's possible that some of these records have been deleted from the database since they were stored in the cache, so I loop over the records and check each one to see whether it still exists. This takes quite a lot of time. Is there a more efficient way to do this?

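The per-record check is roughly of this shape (cached_records is just a stand-in name, not the actual code); it issues one query per record, which is where the time goes:

# One round trip to the database per cached record: N records, N queries.
still_present = cached_records.select do |record|
  record.class.exists?(record.id)
end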

4 solutions

#1 (2 votes)

Is there a reason why you are not deleting the records from the cache when they are deleted from the database?

If you are going to store these records in the cache and need them in sync with the db, then when you remove them from the db, make sure to remove them from the cache as well, saving yourself the expensive check for stale data later.

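A minimal sketch of that write-through invalidation, assuming the collection is cached under a single key via Rails.cache (the model and key name are only illustrative):

class User < ApplicationRecord
  # Illustrative key; use whatever key the collection is actually cached under.
  USERS_CACHE_KEY = "users"

  # Expire the cached collection whenever a record is deleted, so the cache
  # never holds rows that no longer exist in the database.
  after_destroy { Rails.cache.delete(USERS_CACHE_KEY) }
end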

#2 (1 vote)

This could also be considered a db design problem and not really a rails issue. Taking that point of view, can you add an AUTO INCREMENT field with a unique index to your table?

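If the existence check has to go through a column other than the standard auto-incrementing id, a migration along these lines (table and column names are purely illustrative) gives the database an indexed column to validate against:

class AddLookupIndexToUsers < ActiveRecord::Migration[7.0]
  def change
    # Rails primary keys are already auto-incremented and indexed, so this
    # only matters when lookups go through some other identifying column.
    add_index :users, :external_ref, unique: true
  end
end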

The ActiveRecord query interface ultimately has to rely on the database for lookups, even for a simple existence check. So no matter how good the interface is, if the db has to do a lot of work it will take time, and that is not a rails "fault". Make it as fast as possible for the db to validate the records you want.

If you are familiar with oracle, this is the same idea as storing an oracle rowid in a query to be able to validate an existing record later on.

As Danny seems to indicate, maybe caching loads of records and using them much later is a bad idea for your app. Can you read, and then immediately process, your records?

Neither of these suggestions is a quick fix.

#3 (0 votes)

If the number of records you are checking is truly massive, then you may be able to amortize the cost of shipping them one-by-one by doing a bulk transfer: create a temporary table, do a big insert into it of the rows you pulled out of your cache, and then join the temporary table against the original table. Your DBMS will then do the looping for you.

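A rough sketch of that bulk-transfer idea in ActiveRecord terms (variable, table, and column names are assumptions, and the exact temporary-table syntax varies by DBMS):

conn = ActiveRecord::Base.connection

# Ship all cached ids to the database in one statement instead of one per row.
conn.execute("CREATE TEMPORARY TABLE cached_ids (id bigint PRIMARY KEY)")
values = cached_records.map { |r| "(#{conn.quote(r.id)})" }.join(", ")
conn.execute("INSERT INTO cached_ids (id) VALUES #{values}") unless values.empty?

# The join returns only the ids that still exist; the DBMS does the looping.
existing_ids = conn.select_values(
  "SELECT u.id FROM users u INNER JOIN cached_ids c ON c.id = u.id"
)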

#4 (0 votes)

If the results from cache include the primary keys of the records you're interested in, you can pretty easily filter the results by selecting just those keys from the database and seeing what comes back. Then just kick out stale records and you're good to go.

# Assumes the cached value has already been deserialized back into user
# objects (e.g. via Marshal.load or JSON parsing), not left as a raw string.
results_from_cache = $redis.get("users")

cached_user_ids = results_from_cache.map(&:id)
actual_user_ids = User.where(id: cached_user_ids).pluck(:id)

# Keep only the cached users whose ids still exist in the database.
results_minus_stale = results_from_cache.select do |user|
  actual_user_ids.include?(user.id)
end
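One small refinement in case the cached collection is very large: Array#include? is a linear scan, so the filtering pass above is quadratic overall. Converting the ids to a Set keeps it linear:

require "set"

actual_user_id_set = actual_user_ids.to_set

# Set membership checks are O(1), so this pass stays linear in the number
# of cached records.
results_minus_stale = results_from_cache.select do |user|
  actual_user_id_set.include?(user.id)
end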
