Data modeling with counters and expiring columns in Cassandra

Time: 2021-07-20 16:53:26

This question is directed at experienced Cassandra developers. I need to count how many times, and when, each user accessed some resource. I have a data structure like this (CQL):

CREATE TABLE IF NOT EXISTS access_counter_table (
  access_number counter,
  resource_id varchar,
  user_id varchar,
  dateutc varchar,
  PRIMARY KEY (user_id, dateutc, resource_id)
);

I need to find out how many times a user has accessed resources over the last N days. So, to get the last 7 days, I make a request like this:

SELECT * FROM access_counter_table
  WHERE
    user_id = 'user_1'
    AND dateutc > '2015-04-03'
    AND dateutc <= '2015-04-10' ;

And I get something like this:

user_1 : 2015-04-10 : [resource1:1, resource2:4]
user_1 : 2015-04-09 : [resource1:3]
user_1 : 2015-04-08 : [resource1:1, resource3:2]
...

So, my problem is: old data must be deleted after some time, but Cassandra does not allow setting a TTL on counter tables.

I have millions of access events per hour (and it could be billions). After 7 days those records are useless.

  • How can I clear them? Or build something like a garbage collector in Cassandra? Is this a good approach?
  • Maybe I need to use another data model for this? What could it be?

Thanks.

1 answer

#1


2  

As you've found, Cassandra does not support TTLs on Counter columns. In fact, deletes on counters in Cassandra are problematic in general (once you delete a counter, you essentially cannot reuse it for a while).

If you need automatic expiration, you can model the count as an int field and use external locking (such as ZooKeeper), request routing (allow only one writer to access a particular partition), or lightweight transactions to safely increment that integer field with a TTL.
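
As a rough illustration of the lightweight-transaction option, here is a minimal Python sketch of a compare-and-set increment loop. Everything named here is an assumption for illustration: the table `access_count_ttl` (same primary key as `access_counter_table`, but with `access_number int` instead of a counter), the `%s` placeholder style, and the `execute` callable, which is hypothetical and is expected to return `(applied, current_value)` for a conditional statement. Note that each successful write resets the TTL of `access_number`, and that lightweight transactions need several round trips, so this may be too slow at millions of events per hour.

```python
from typing import Callable, Optional, Tuple

TTL_SECONDS = 7 * 24 * 3600  # 7-day retention, as in the question

# Hypothetical table with a plain int column, so a TTL is allowed.
INSERT_CQL = (
    "INSERT INTO access_count_ttl (user_id, dateutc, resource_id, access_number) "
    "VALUES (%s, %s, %s, 1) IF NOT EXISTS USING TTL %s"
)
UPDATE_CQL = (
    "UPDATE access_count_ttl USING TTL %s SET access_number = %s "
    "WHERE user_id = %s AND dateutc = %s AND resource_id = %s "
    "IF access_number = %s"
)

# Hypothetical executor: runs one conditional statement and returns
# (applied, value currently stored, or None if the row does not exist).
Execute = Callable[[str, tuple], Tuple[bool, Optional[int]]]


def increment_with_ttl(execute: Execute, user_id: str, dateutc: str,
                       resource_id: str, max_retries: int = 10) -> int:
    """Increment the count by one using compare-and-set; return the new value."""
    current: Optional[int] = None
    for _ in range(max_retries):
        if current is None:
            # No known value yet: try to create the row with a count of 1.
            applied, current = execute(
                INSERT_CQL, (user_id, dateutc, resource_id, TTL_SECONDS))
            if applied:
                return 1
        else:
            # Row exists: write current + 1 only if nobody changed it meanwhile.
            new_value = current + 1
            applied, latest = execute(
                UPDATE_CQL,
                (TTL_SECONDS, new_value, user_id, dateutc, resource_id, current))
            if applied:
                return new_value
            current = latest  # lost the race (or row expired): retry
    raise RuntimeError("too much contention, increment not applied")
```

With a real driver, `execute` would wrap the session call and read back whether the conditional statement was applied together with the existing value.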

Alternatively, you can page through the table of counters and remove "old" counters manually with DELETE on a scheduled task. This is less elegant, and doesn't scale as well, but may work in some cases.
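
A scheduled cleanup along these lines could be sketched as follows. This is only an outline under stated assumptions: it relies on `dateutc` holding ISO `YYYY-MM-DD` strings (so string order matches date order), on Cassandra 3.0 or later for a range condition on a clustering column in `DELETE`, and on the caller paging through the user ids (for example with the `SELECT DISTINCT` statement shown) and executing the generated statements. The function names and the `%s` placeholder style are illustrative, not part of any library.

```python
from datetime import date, timedelta
from typing import Iterable, Iterator, Tuple

# Statement the job could page through to find all partitions (users).
LIST_USERS_CQL = "SELECT DISTINCT user_id FROM access_counter_table"

# Range delete on the clustering column; assumes Cassandra 3.0 or later.
DELETE_CQL = (
    "DELETE FROM access_counter_table WHERE user_id = %s AND dateutc <= %s"
)


def cutoff_date(today: date, retention_days: int = 7) -> str:
    """Newest dateutc value that already falls outside the retention window."""
    return (today - timedelta(days=retention_days)).isoformat()


def cleanup_statements(user_ids: Iterable[str], today: date,
                       retention_days: int = 7) -> Iterator[Tuple[str, tuple]]:
    """Yield one (cql, params) DELETE per user for rows older than the window."""
    cutoff = cutoff_date(today, retention_days)
    for user_id in user_ids:
        yield DELETE_CQL, (user_id, cutoff)
```

For `today = 2015-04-10` and a 7-day window, the cutoff is `2015-04-03`, which complements the `dateutc > '2015-04-03'` bound used in the question's query.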
