Create an index, or add to the primary key and create a new table?

In Cassandra, I have a table with columns (a, b, c). I need to run either SELECT * FROM {table} WHERE a = ? AND b = ? or SELECT * FROM {table} WHERE a = ? AND c = ?.

In this case, what should I make the primary key? Could I make two tables with PRIMARY KEY (a, b) and PRIMARY KEY (a, c), since Cassandra requires the full partition key, followed by the clustering columns in the order they are declared? Or could I use something like PRIMARY KEY (a) and create an INDEX on b and c?

Basically, should the primary key contain only the minimum number of columns required for uniqueness (with an appropriate partition key chosen from those columns)? Will performance improve if I add other columns to the primary key because I need to query on them?

1 solution

#1

As noted above, a well-grounded answer requires more information about the cardinality of columns a, b and c. Also make sure you understand the difference between the partition key and the clustering key: both are part of the primary key, and both have a huge impact on your design.

If column a has enough distinct values, you can make it the partition key and choose one of the following two approaches:

1) a separate table for each query:

CREATE TABLE table1_by_ab (
  a int, b int, c int, 
  PRIMARY KEY (a, b));

CREATE TABLE table1_by_ac (
  a int, b int, c int, 
  PRIMARY KEY (a, c));

2) one table for the more frequent query, and an index on the other column:

CREATE TABLE table2 (
  a int, b int, c int, 
  PRIMARY KEY (a, b));

CREATE INDEX ON table2 (c);

In both cases you can execute your queries on (a, b) and (a, c). It is usually recommended to avoid secondary indexes, but in case 2) your query on (a, c) also restricts the partition key (column a), so the index lookup is confined to the replicas that own that partition instead of fanning out across the cluster, and its performance will not be bad.
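
As a rough sketch of how the two approaches above might be used (the literal values 1, 2, 3 are made up for illustration, and the logged batch is only one possible way to keep the two tables of approach 1 in step):

```cql
-- Approach 1: every write goes to both tables.
BEGIN BATCH
  INSERT INTO table1_by_ab (a, b, c) VALUES (1, 2, 3);
  INSERT INTO table1_by_ac (a, b, c) VALUES (1, 2, 3);
APPLY BATCH;

-- Each query reads from the table whose clustering column matches it.
SELECT * FROM table1_by_ab WHERE a = 1 AND b = 2;
SELECT * FROM table1_by_ac WHERE a = 1 AND c = 3;

-- Approach 2: a single write; the second query is served by the index on c.
INSERT INTO table2 (a, b, c) VALUES (1, 2, 3);

SELECT * FROM table2 WHERE a = 1 AND b = 2;
SELECT * FROM table2 WHERE a = 1 AND c = 3;
```

One thing to keep in mind with approach 1: the primary key also defines row identity, so in table1_by_ac two rows with the same (a, c) but different b would overwrite each other. If that can happen in your data, a key such as PRIMARY KEY (a, c, b) may be closer to what you want.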

If column a does not have enough distinct values, you cannot use it alone as the partition key. You will need to duplicate your tables, each with a composite partition key:

CREATE TABLE table3_by_ab (
  a int, b int, c int, 
  PRIMARY KEY ((a, b)));

CREATE TABLE table3_by_ac (
  a int, b int, c int, 
  PRIMARY KEY ((a, c)));
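
A minimal sketch of how these two tables might be written and queried (the values are again made up). With a composite partition key, both key columns have to be supplied with equality for the lookup to hit a single partition:

```cql
-- Write the same row to both tables.
INSERT INTO table3_by_ab (a, b, c) VALUES (1, 2, 3);
INSERT INTO table3_by_ac (a, b, c) VALUES (1, 2, 3);

-- Each query supplies the full partition key of its table.
SELECT * FROM table3_by_ab WHERE a = 1 AND b = 2;
SELECT * FROM table3_by_ac WHERE a = 1 AND c = 3;
```

Note that with PRIMARY KEY ((a, b)) there are no clustering columns, so each (a, b) pair identifies exactly one row, and likewise each (a, c) pair in the second table.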

Hope this helps
