I would like to store key-value pairs in Cassandra and have entries automatically deleted in LRU fashion when a fixed storage size is reached.
Is it possible to do this with Cassandra, and if so, what would be the best way to do it? If not, is there another distributed storage system that supports this use case without having to keep all data in memory?
2 Answers
#1
2
The short answer is, no, Cassandra does not support LRU out of the box.
You could, if you really wanted to, build an LRU layer in your app on top of Cassandra to achieve the same effect. This could be done several ways, but generally you'd want to maintain a separate index of the cached objects along with stats/timestamps and have your app purge objects as appropriate. Even then, disk space wouldn't be a good upper limit because of how Cassandra stores its data and manages updates, deletes, etc. Cassandra doesn't free storage immediately on a delete; rather, it writes a tombstone and the old data is removed later, during compaction (see "About Deletes").
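A minimal sketch of such an app-side LRU layer, in Python. Everything here is an assumption for illustration: the class name `LruIndexedStore` is made up, a plain dict stands in for the Cassandra table, and the size limit counts logical value bytes rather than disk usage (which, as noted, Cassandra won't let you bound precisely).

```python
from collections import OrderedDict


class LruIndexedStore:
    """Hypothetical app-side LRU layer over a key-value backend.

    `backend` is a plain dict standing in for a Cassandra table; in a
    real deployment the reads, writes and deletes would be CQL statements.
    """

    def __init__(self, backend, max_bytes):
        self.backend = backend
        self.max_bytes = max_bytes
        self.index = OrderedDict()  # key -> value size, least recently used first
        self.used_bytes = 0

    def put(self, key, value):
        if key in self.index:
            # Overwrite: forget the old size before recording the new one.
            self.used_bytes -= self.index.pop(key)
        self.backend[key] = value
        self.index[key] = len(value)
        self.used_bytes += len(value)
        self._evict()

    def get(self, key):
        if key not in self.index:
            return None
        self.index.move_to_end(key)  # mark as most recently used
        return self.backend[key]

    def _evict(self):
        # Purge least recently used entries until back under the logical
        # limit, but never evict the entry that was just written.
        while self.used_bytes > self.max_bytes and len(self.index) > 1:
            old_key, size = self.index.popitem(last=False)
            del self.backend[old_key]
            self.used_bytes -= size
```

In a multi-node app the index itself would have to live somewhere shared (or in Cassandra), which is where most of the real complexity comes from.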
One advantage of building a custom caching layer (e.g. on top of Cassandra) is that you could move beyond simple LRU eviction and weight more expensive objects, and/or objects that are used frequently (but not recently), more heavily so that they remain in the cache longer even when plain LRU would purge them. Whether this is useful depends entirely on your specific use case. But again, Cassandra can get bloated by heavy data churn, so you would need to make sure the cluster is properly tuned and gets its routine maintenance.
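To make the weighting idea concrete, here is one possible scoring scheme, sketched in Python. The formula, the `half_life` parameter and the field names are all assumptions chosen for illustration, not something Cassandra or any particular cache provides.

```python
def eviction_score(last_access, hits, cost, now, half_life=3600.0):
    """Return a retention score; the entry with the LOWEST score is evicted.

    Recency decays exponentially with age, then is scaled by how often
    the entry was used and how expensive it is.
    """
    age = max(0.0, now - last_access)
    recency = 0.5 ** (age / half_life)  # 1.0 when just used, halves every half_life
    return recency * (1 + hits) * cost


def pick_victim(entries, now):
    """entries maps key -> dict with 'last_access', 'hits' and 'cost'."""
    return min(
        entries,
        key=lambda k: eviction_score(
            entries[k]["last_access"], entries[k]["hits"], entries[k]["cost"], now
        ),
    )
```

With this scoring, a cheap entry touched once ten seconds ago can be evicted before an expensive entry with fifty hits that was last read two hours ago, which is the opposite of what plain LRU would do.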
In reality, most people would deploy Memcached (or similar) to support this use case.
#2
0
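Cassandra can be used as an LRU-style store: you just need to use TTL or manage the removal yourself. (TTL expires entries by age, so it only approximates LRU unless the entry is rewritten, and its TTL thereby restarted, on access.)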
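A rough sketch of the TTL approach in Python. The keyspace and table `cache.entries`, its `key`/`value` columns and the helper names are hypothetical; `session` is assumed to be an object with an `execute(query, parameters)` method that returns rows exposing the selected columns as attributes, as a DataStax Python driver session does. `USING TTL` is standard CQL.

```python
# Hypothetical table: cache.entries (key text PRIMARY KEY, value text)
PUT_CQL = "INSERT INTO cache.entries (key, value) VALUES (%s, %s) USING TTL %s"
GET_CQL = "SELECT value FROM cache.entries WHERE key = %s"


def put(session, key, value, ttl_seconds=3600):
    # The row expires on its own ttl_seconds after this write.
    session.execute(PUT_CQL, (key, value, ttl_seconds))


def get(session, key, ttl_seconds=3600):
    rows = list(session.execute(GET_CQL, (key,)))
    if not rows:
        return None
    value = rows[0].value
    # Rewrite the entry so its TTL clock restarts on every read; this is
    # what makes expiry track "least recently used" rather than "oldest".
    put(session, key, value, ttl_seconds)
    return value
```

Note that this bounds entry age, not storage size, and that refreshing on every read adds write churn that compaction has to clean up later.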
New data is always appended. Removed data is only marked as deleted and is physically removed during compaction, so you might need to tune the compaction settings.
The advantages of Cassandra are that data is persisted the moment it comes in, you don't have to fit all data into memory except in extreme use cases, you can use replication to avoid losing data, and you can access it from multiple languages. Beware that new data may not be immediately available.
A more lightweight approach is Redis.