This question is for experienced Cassandra developers. I need to count how many times, and on which days, each user accessed a given resource. My data structure looks like this (CQL):
CREATE TABLE IF NOT EXISTS access_counter_table (
access_number counter,
resource_id varchar,
user_id varchar,
dateutc varchar,
PRIMARY KEY (user_id, dateutc, resource_id)
);
I need to find out how many times a user has accessed resources over the last N days. For example, to get the last 7 days I run a query like this:
SELECT * FROM access_counter_table
WHERE
user_id = 'user_1'
AND dateutc > '2015-04-03'
AND dateutc <= '2015-04-10' ;
And I get something like this:
user_1 : 2015-04-10 : [resource1:1, resource2:4]
user_1 : 2015-04-09 : [resource1:3]
user_1 : 2015-04-08 : [resource1:1, resource3:2]
...
My problem is that old data must be deleted after some time, but Cassandra does not allow setting an expiration TTL on counter tables.
I have millions of access events per hour (possibly billions), and after 7 days those records are useless.
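For reference, a sketch of how rows in `access_counter_table` are normally written, next to the TTL variant that Cassandra refuses. The user, date, and resource values are illustrative only.

```cql
-- Normal counter write: one increment per access event.
UPDATE access_counter_table
SET access_number = access_number + 1
WHERE user_id = 'user_1' AND dateutc = '2015-04-10' AND resource_id = 'resource1';

-- What is wanted: the same increment, expiring after 7 days (604800 s).
-- Cassandra does not accept a TTL on counter updates, so this statement is rejected.
UPDATE access_counter_table USING TTL 604800
SET access_number = access_number + 1
WHERE user_id = 'user_1' AND dateutc = '2015-04-10' AND resource_id = 'resource1';
```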
- How can I clear them? Or should I build something like a garbage collector for Cassandra? Is that a good approach?
- Do I need a different data model for this? What would it look like?
Thanks.
1 Answer
#1
As you've found, Cassandra does not support TTLs on counter columns. In fact, deletes on counters in Cassandra are problematic in general (once you delete a counter, you essentially cannot reuse it for a while).
If you need automatic expiration, you can model the count as a plain int column and increment it safely with a TTL using one of: external locking (such as ZooKeeper), request routing (only allow one writer to access a particular partition), or lightweight transactions.
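A minimal sketch of the lightweight-transaction variant, assuming a new table with the hypothetical name `access_count_by_day` that stores the count as a plain `int`, and a 7-day TTL of 604800 seconds. The client reads the current value and retries the conditional write until it is applied; the last-7-days query from the question works unchanged because the primary key is the same.

```cql
-- Hypothetical replacement for access_counter_table: plain int instead of counter.
CREATE TABLE IF NOT EXISTS access_count_by_day (
    user_id       varchar,
    dateutc       varchar,
    resource_id   varchar,
    access_number int,
    PRIMARY KEY (user_id, dateutc, resource_id)
);

-- First access of the day: create the row with count 1 and a 7-day TTL.
-- If the row already exists, [applied] is false and the result
-- contains the current access_number.
INSERT INTO access_count_by_day (user_id, dateutc, resource_id, access_number)
VALUES ('user_1', '2015-04-10', 'resource1', 1)
IF NOT EXISTS
USING TTL 604800;

-- Later accesses: compare-and-set from the value just read (here 3 -> 4).
-- Rewriting the cell with USING TTL makes it expire 7 days after this write.
-- If [applied] is false, another client won the race: take the returned
-- current value and retry with it as the new expected value.
UPDATE access_count_by_day USING TTL 604800
SET access_number = 4
WHERE user_id = 'user_1' AND dateutc = '2015-04-10' AND resource_id = 'resource1'
IF access_number = 3;
```

Each conditional write runs a Paxos round, which costs several round trips, so at millions of events per hour this is worth measuring before adopting. The locking and request-routing options avoid LWTs by guaranteeing a single writer per partition, which then makes a plain read-then-write with `USING TTL` safe.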
Alternatively, a scheduled task can page through the counter table and remove "old" counters with DELETE. This is less elegant and doesn't scale as well, but may work in some cases.
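A rough sketch of that scheduled cleanup against the original `access_counter_table`, assuming a 7-day window ending on '2015-04-10', so everything on or before '2015-04-03' is old. The driver's paging walks each result set, and the job issues one DELETE per old row using the full primary key.

```cql
-- 1. Enumerate partitions (one per user); page through the results.
SELECT DISTINCT user_id FROM access_counter_table;

-- 2. For each user, find rows older than the retention window.
--    ISO 'YYYY-MM-DD' strings sort chronologically, so a string
--    comparison on dateutc behaves like a date comparison.
SELECT dateutc, resource_id FROM access_counter_table
WHERE user_id = 'user_1' AND dateutc <= '2015-04-03';

-- 3. Delete each returned row by its full primary key.
DELETE FROM access_counter_table
WHERE user_id = 'user_1' AND dateutc = '2015-04-03' AND resource_id = 'resource1';
```

Because `dateutc` only moves forward, a counter deleted for a past day should not be incremented again, which sidesteps the counter-reuse problem. The `SELECT DISTINCT` step still touches every partition in the table, which is a large part of why this approach scales less well.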