In Cassandra, I have a table with columns (a, b, c). I need to run two queries: SELECT * FROM {table} WHERE a = ? AND b = ? and SELECT * FROM {table} WHERE a = ? AND c = ?.
In this case, what should I make the primary key? Could I make two tables with PRIMARY KEY (a, b) and PRIMARY KEY (a, c), because Cassandra needs the entirety of the partition key and/or non-partition keys in the order they are listed? Or could I do something like PRIMARY KEY (a) and create an INDEX on b and c?
Basically, should the primary key only contain the minimum number of values required for uniqueness (and choosing an appropriate partition key from these values)? Will performance improve if I add other columns to the primary key because I need to query them?
1 Solution
#1
As noted above, a well-grounded answer is only possible if you provide more information about the cardinality of columns a, b and c. Also make sure you understand the difference between the partition key and the clustering key: both are part of the primary key, and both have a huge impact on your design.
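As a rough illustration of that distinction, the sketch below marks which part of a primary key is the partition key and which part is a clustering column. The table name example_keys is hypothetical and is not part of the original question.

```cql
-- Hypothetical table, for illustration only.
CREATE TABLE example_keys (
    a int,
    b int,
    c int,
    -- The inner parentheses hold the partition key (a), which decides
    -- which nodes store the row. Anything after them is a clustering
    -- column (b), which orders rows inside one partition.
    PRIMARY KEY ((a), b)
);
```

PRIMARY KEY (a, b) without the inner parentheses means the same thing: the first column is the partition key and the rest are clustering columns.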
If column a has enough distinct values, you can make it the partition key and choose one of the following two approaches:
1) separate table for each query
CREATE TABLE table1_by_ab (
a int, b int, c int,
PRIMARY KEY (a, b));
CREATE TABLE table1_by_ac (
a int, b int, c int,
PRIMARY KEY (a, c));
2) one table for the more frequent query, and an index on the other column:
CREATE TABLE table2 (
a int, b int, c int,
PRIMARY KEY (a, b));
CREATE INDEX ON table2 (c);
In both cases you can execute your queries on (a, b) and (a, c). Secondary indexes are usually best avoided, but in case 2) your query on (a, c) also specifies the partition key (column a), so the index lookup is limited to the replicas that own that partition instead of being sent to every node, and its performance should be acceptable.
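To make the two approaches concrete, here is a minimal sketch of the reads and writes against the tables defined above. The literal values 1, 2 and 3 are placeholders, and the logged batch is only one possible way to keep the two tables of approach 1 in step; it is an assumption, not something the original answer prescribes. Note also that each table keeps a single row per primary key value, so these layouts assume that (a, b) and (a, c) each identify a row.

```cql
-- Approach 1: each query reads the table whose key matches it.
SELECT * FROM table1_by_ab WHERE a = 1 AND b = 2;
SELECT * FROM table1_by_ac WHERE a = 1 AND c = 3;

-- Approach 1: every write has to go to both tables.
BEGIN BATCH
    INSERT INTO table1_by_ab (a, b, c) VALUES (1, 2, 3);
    INSERT INTO table1_by_ac (a, b, c) VALUES (1, 2, 3);
APPLY BATCH;

-- Approach 2: one table, written once.
INSERT INTO table2 (a, b, c) VALUES (1, 2, 3);

-- Served directly by the primary key.
SELECT * FROM table2 WHERE a = 1 AND b = 2;

-- Served by the index on c, restricted to the partition a = 1.
SELECT * FROM table2 WHERE a = 1 AND c = 3;
```

The trade-off is duplicated storage and double writes in approach 1, against an index lookup on the less frequent query in approach 2.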
If column a does not have enough distinct values, you cannot use it alone as the partition key. In that case you will need two copies of the table, each with a composite partition key:
CREATE TABLE table3_by_ab (
a int, b int, c int,
PRIMARY KEY ((a, b)));
CREATE TABLE table3_by_ac (
a int, b int, c int,
PRIMARY KEY ((a, c)));
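As a sketch of how these tables would be queried (the literal values are placeholders): with a composite partition key, both columns of the key have to be given with equality for Cassandra to locate the partition.

```cql
-- Both parts of the partition key are supplied, so each query
-- reads exactly one partition.
SELECT * FROM table3_by_ab WHERE a = 1 AND b = 2;
SELECT * FROM table3_by_ac WHERE a = 1 AND c = 3;

-- Restricting only a does not name a complete partition key, so
-- Cassandra cannot serve this as a normal partition lookup:
-- SELECT * FROM table3_by_ab WHERE a = 1;
```

Because the whole primary key here is (a, b) or (a, c), each table stores one row per such pair, and a later insert with the same pair overwrites the earlier row.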
Hope this helps