How to get the row count of a Cassandra table

Date: 2022-09-10 13:02:06

This is a super basic question but it's actually been bugging me for days. Is there a good way to obtain the equivalent of a COUNT(*) of a given table in Cassandra?

I will be moving several hundred million rows into C* for some load testing, and I'd like to at least get a row count on some sample ETL jobs before I move massive amounts of data over the network.

The best idea I have is to loop over each row with Python and increment a counter. Is there a better way to determine (or even estimate) the row count of a C* table? I've also poked around DataStax OpsCenter to see if I can find the row count there. If it's possible, I don't see how.

Has anyone else needed to get a count(*) of a table in C*? If so, how did you go about it?

6 answers

#1


36  

Yes, you can use COUNT(*). Here's the documentation.

A SELECT expression using COUNT(*) returns the number of rows that matched the query. Alternatively, you can use COUNT(1) to get the same result.

Count the number of rows in the users table:

SELECT COUNT(*) FROM users;
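
On a table with hundreds of millions of rows, a single COUNT(*) has to scan every partition and may time out. One common workaround is to count one token range at a time and sum the results. Here is a minimal Python sketch of that idea, assuming the default Murmur3Partitioner; the partition key column name `id` and the helper names are hypothetical, not something from the documentation.

```python
# Token bounds for Murmur3Partitioner (assumed here; other partitioners differ).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def token_ranges(splits):
    """Split the full token ring into `splits` contiguous (start, end) pairs."""
    if splits < 1:
        raise ValueError("splits must be >= 1")
    total = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (total * i) // splits for i in range(splits)]
    bounds.append(MAX_TOKEN)
    return list(zip(bounds[:-1], bounds[1:]))


def count_queries(table, partition_key, splits):
    """Build one COUNT(*) statement per token range.

    Each range is (start, end], except the first, which also includes its
    lower bound so that the whole ring is covered exactly once.
    """
    queries = []
    for i, (start, end) in enumerate(token_ranges(splits)):
        lower = ">=" if i == 0 else ">"
        queries.append(
            f"SELECT COUNT(*) FROM {table} "
            f"WHERE token({partition_key}) {lower} {start} "
            f"AND token({partition_key}) <= {end};"
        )
    return queries


if __name__ == "__main__":
    # `users` is the table from the example above; `id` is a hypothetical key.
    for query in count_queries("users", "id", 4):
        print(query)
```

Each statement can then be run on its own (for example with the DataStax Python driver, `session.execute(query).one()[0]`) and the partial counts added up. More splits mean smaller, faster queries at the cost of more round trips.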

#2


9  

You can also get some estimates from nodetool cfhistograms if you don't need an exact count (these values are estimates).

You can also use Spark if you're running DSE.

#3


3  

nodetool tablestats can be pretty handy for quickly getting row estimates (and other table stats).

Use nodetool tablestats <keyspace.table> for a specific table.
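
If you want to pull the estimate out programmatically, something like the following Python sketch may help. It assumes the output contains a `Table: <name>` line followed by a `Number of partitions (estimate): <n>` line (older releases print `Number of keys (estimate)` instead); the function names are my own. Note that this is an estimate of partitions on the node you run it against, not an exact row count, so a table with clustering columns will have more rows than this number.

```python
import re
import subprocess

TABLE_RE = re.compile(r"Table: (\S+)")
ESTIMATE_RE = re.compile(r"Number of (?:partitions|keys) \(estimate\): (\d+)")


def partition_estimates(tablestats_output):
    """Map each table name in the output to its partition-count estimate."""
    estimates = {}
    table = None
    for raw_line in tablestats_output.splitlines():
        line = raw_line.strip()
        table_match = TABLE_RE.match(line)
        if table_match:
            table = table_match.group(1)
            continue
        estimate_match = ESTIMATE_RE.match(line)
        if estimate_match and table is not None:
            estimates[table] = int(estimate_match.group(1))
    return estimates


def run_tablestats(target):
    """Run `nodetool tablestats <target>` and return its stdout as text."""
    result = subprocess.run(
        ["nodetool", "tablestats", target],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Usage (requires nodetool on the PATH and a running node):
#   print(partition_estimates(run_tablestats("keyspace.table")))
```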

#4


0  

You can use COPY to avoid the Cassandra timeout that usually happens on count(*):

cqlsh -e "copy keyspace.table_name (first_partition_key_name) to '/dev/null'" | sed -n 5p | sed 's/ .*//'
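
Picking line 5 with sed depends on the exact layout of the cqlsh output, which can vary between versions. A slightly less fragile option might be to search for the summary line instead. This Python sketch assumes cqlsh prints a line of the form `<N> rows exported to ...` at the end of a COPY TO; that format and the function name are assumptions on my part, so check them against your own cqlsh output.

```python
import re

# Assumed summary format: "<N> rows exported to <M> files in <time>."
SUMMARY_RE = re.compile(r"^\s*(\d+) rows exported\b", re.MULTILINE)


def exported_row_count(copy_output):
    """Return N from the COPY TO summary line, or None if it is not found."""
    match = SUMMARY_RE.search(copy_output)
    return int(match.group(1)) if match else None


# Usage: capture the output of the cqlsh command above (for example with
# subprocess.run(..., capture_output=True, text=True)) and pass it in.
```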

#5


0  

For those using the C# Linq Component Adapter you can use:

var t = new Table<T>(session);
var count = t.Count().Execute();

#6


-3  

nodetool cfstats | grep -A 1000 KEYSPACE

Replace KEYSPACE with your keyspace name to get details of all tables in that keyspace.
