Cassandra: designing a table of items a user has liked

Date: 2021-04-29 12:58:05

I have a list of items, and I'd like to know whether the current user has liked any of them. This is my first table in C*, so I'm wondering how to design it and whether I'm heading in the right direction:

I was thinking about having userID as the partition key and the liked item as a clustering column.

The problem I see is that if a user likes too many things, the partition won't fit on a node (so do I lose the data?). I have no idea how many items a user would have to like for that to happen, but my guess is that it isn't even achievable for a human. Still, the possibility is there and it bothers me. Also, if a node already holds a lot of data, does that mean a user needs to like fewer items before the partition becomes too big for the node (since there is less available memory)?

1 answer

#1


2  

You are correct that all the data for a partition lives on a single node (plus its replicas), and that if there is insufficient space on that node the write will fail. If you are worried about this, you could add something like a "timestamp" or "bucket" column to the partition key part of your primary key in order to reduce the size of each partition.
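As a rough illustration of the "bucket" idea, here is a minimal Python sketch that derives a bucket number from the item ID, so that one user's likes are spread over several smaller partitions. The table layout in the comment, the names `user_likes`, `bucket_for` and `partition_key`, and the value of `NUM_BUCKETS` are all assumptions made for this example, not something from the question.

```python
import zlib
from typing import Tuple

# Hypothetical schema (CQL), shown for context only:
#   CREATE TABLE user_likes (
#       user_id uuid,
#       bucket  int,
#       item_id uuid,
#       PRIMARY KEY ((user_id, bucket), item_id)
#   );

NUM_BUCKETS = 16  # assumed value; pick it from your own partition size estimate


def bucket_for(item_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map an item ID to a stable bucket number in [0, num_buckets)."""
    # crc32 is deterministic across processes, unlike Python's built-in hash().
    return zlib.crc32(item_id.encode("utf-8")) % num_buckets


def partition_key(user_id: str, item_id: str) -> Tuple[str, int]:
    """Return the (user_id, bucket) pair that identifies the partition."""
    return (user_id, bucket_for(item_id))
```

Because the bucket is computed from the item ID alone, checking whether a user liked a given item still touches a single partition: compute `partition_key(user_id, item_id)` and query by that plus `item_id`. A time-based bucket would not have this property unless you also knew when the like happened.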

Cassandra has a hard limit of 2 billion cells per partition, but in practical terms I believe the advice is to keep a partition under 100 MB in Cassandra 2.0 and earlier, and under 200-300 MB in Cassandra 2.1 and later. If I were you, I would do a quick calculation to see how many items a person would need to like to get near these limits, and then decide whether this is a limitation you are willing to accept. You can get a good description of how to do that here.
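To give a feel for that calculation, here is a back-of-the-envelope sketch in Python. The 50 bytes per like is purely an assumed figure (roughly a 16-byte item UUID plus some per-row overhead); the real on-disk size depends on your column types and Cassandra version, so treat the result as an order of magnitude only.

```python
ASSUMED_BYTES_PER_LIKE = 50  # assumption: 16-byte item UUID plus per-row overhead


def likes_until_limit(bytes_per_like: int, limit_mb: int) -> int:
    """Rough number of likes that fit in a partition of limit_mb megabytes."""
    return (limit_mb * 1024 * 1024) // bytes_per_like


# Print the estimate for each of the partition sizes mentioned above.
for limit_mb in (100, 200, 300):
    print(limit_mb, "MB ->", likes_until_limit(ASSUMED_BYTES_PER_LIKE, limit_mb), "likes")
```

Under that assumption, a single user would need on the order of two million likes to reach 100 MB, which supports the guess that a human is unlikely to get there.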
