I'm looking for an HBase shell command that will count the number of records in a specified column family. I know I can run:
echo "scan 'table_name'" | hbase shell | grep column_family_name | wc -l
However, this runs much slower than the standard counting command, which benefits from CACHE => 50000:
count 'table_name', CACHE => 50000
Worse, it doesn't return the real number of records but something like the total number of cells in the specified column family (if I'm not mistaken). I need something like:
count 'table_name' , CACHE => 50000 , {COLUMNS => 'column_family_name'}
Thanks in advance,
Michael
1 Answer
Here is some Ruby code I wrote when I needed the same thing, with comments included. It gives you a count_table command in the HBase shell. The first parameter is the table name and the second is a hash of options, the same as for the scan shell command.
The direct answer to your question is:
count_table 'your.table', { COLUMNS => 'your.family' }
I also recommend adding scanner caching, as you would for scan:
count_table 'your.table', { COLUMNS => 'your.family', CACHE => 10000 }
And here is the source:
# Arguments are the same as for the scan command.
# Examples:
#
# count_table 'test.table', { COLUMNS => 'f:c1' }
# --- Counts rows that have an f:c1 cell in 'test.table'.
#
# count_table 'other.table', { COLUMNS => 'f' }
# --- Counts rows that have at least one cell in family 'f' in 'other.table'.
#
# count_table 'test.table', { CACHE => 1000 }
# --- Counts rows with scanner caching.
#
def count_table(tablename, args = {})
  table = @shell.hbase_table(tablename)
  # Run the scanner
  scanner = table._get_scanner(args)
  count = 0
  iter = scanner.iterator
  # Iterate over the results; each result is one row
  while iter.hasNext
    iter.next
    count += 1
  end
  # Return the counter
  count
end
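The reason this counts records rather than cells is that the scanner hands back one result per row, while the printed output of scan has one line per cell. Here is a minimal sketch of that difference that runs in plain Ruby without HBase. The SAMPLE_ROWS data and the scan_family, count_rows and count_cells helpers are hypothetical and only model the shape of a scan; they are not part of the HBase API.

```ruby
# Hypothetical in-memory stand-in for a table: row key => { "family:qualifier" => value }.
# None of these names come from HBase; they only model the shape of the data.
SAMPLE_ROWS = {
  'row1' => { 'f:c1' => 'a', 'f:c2' => 'b', 'g:c1' => 'x' },
  'row2' => { 'f:c1' => 'c' },
  'row3' => { 'g:c1' => 'y' }
}.freeze

# Keep only the cells of one family and drop rows left with no cells,
# which is roughly what a scan restricted with COLUMNS returns.
def scan_family(rows, family)
  prefix = "#{family}:"
  rows.each_with_object({}) do |(key, cells), out|
    kept = cells.select { |column, _| column.start_with?(prefix) }
    out[key] = kept unless kept.empty?
  end
end

# One result per row: this is what the count_table loop counts.
def count_rows(rows, family)
  scan_family(rows, family).size
end

# One line per cell: this is what piping scan output through grep and wc -l counts.
def count_cells(rows, family)
  scan_family(rows, family).values.sum(&:size)
end

puts count_rows(SAMPLE_ROWS, 'f')   # rows with at least one cell in family f
puts count_cells(SAMPLE_ROWS, 'f')  # total cells in family f
```

In this sample, row1 holds two cells in family f, so the cell count comes out higher than the row count, which is the mismatch described in the question.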