Using a Scan in HBase with a start row, an end row and a filter

Time: 2022-03-11 08:25:08

I need to use a Scan in HBase to retrieve all rows that meet certain criteria, which is why I will use a filter (in practice a compound filter list containing two SingleColumnValueFilter instances). My row keys are structured like this:

a.b.x|1|1252525  
a.b.x|1|2373273  
a.b.x|1|2999238  
...  
a.b.x|2|3000320  
a.b.x|2|4000023  
...  
a.b.y|1|1202002  
a.b.y|1|1778949  
a.b.y|1|2738273  

As an additional requirement, I need to iterate over only those rows whose rowKey starts with "a.b.x|1".

Now, the questions:

  1. If I add a PrefixFilter to my filter list, does the scanner still scan every row (applying the filter to each of them)?
  2. If I instantiate the Scan with a startRow (the prefix) and the filter list (without the PrefixFilter), I understand that the scan starts from the given row prefix. So, assuming I use "a.b.x." as startRow, will the scan also cover the a.b.y rows?
  3. What is the behaviour if I use new Scan(startRow, endRow) and then setFilter? In other words: what about the missing constructor Scan(byte[] start, byte[] end, Filter filter)?

Thanks in advance
Andrea

1 solution

#1


2  

Row keys are sorted lexicographically in HBase, so all the "a.b.x|1" rows come before the "a.b.x|2" rows, and so on. Because row keys are stored as byte arrays and compared byte by byte, be careful with variable-length row keys and with keys that mix different character classes. For your requirement, something along these lines should work:

// Create a Scan with a start row key (inclusive) and a stop row key (exclusive)
Scan scan = new Scan(Bytes.toBytes("a.b.x|1"), Bytes.toBytes("a.b.x|2"));

// Attach your column filters to the Scan
scan.setFilter(colFilter);

// Then get a scanner and iterate through the results
ResultScanner scanner = table.getScanner(scan);
for (Result result = scanner.next(); result != null; result = scanner.next())
{
    // Use the result object
}
scanner.close(); // release the scanner's resources when done

update: ToBytes should be toBytes
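
To illustrate the warning about variable-length row keys, here is a minimal plain-Java sketch that does not use HBase at all. It only imitates unsigned byte-wise ordering with Arrays.compareUnsigned (Java 9+). The class PrefixRange, its two helper methods, and the extra key "a.b.x|10|5" are hypothetical and not part of the original question or of the HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class: simulates a [start, stop) row-key range in memory.
class PrefixRange {

    // Smallest key that sorts after every key starting with the prefix:
    // drop trailing 0xFF bytes, then increment the last remaining byte.
    public static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0]; // no upper bound exists for this prefix
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    // Keeps the keys with start <= key < stop under unsigned byte-wise ordering.
    // An empty stop is treated here as "no upper bound".
    public static List<String> keysInRange(List<String> keys, String start, String stop) {
        byte[] startBytes = start.getBytes(StandardCharsets.UTF_8);
        byte[] stopBytes = stop.getBytes(StandardCharsets.UTF_8);
        List<String> hits = new ArrayList<>();
        for (String key : keys) {
            byte[] k = key.getBytes(StandardCharsets.UTF_8);
            boolean afterStart = Arrays.compareUnsigned(k, startBytes) >= 0;
            boolean beforeStop = stopBytes.length == 0
                    || Arrays.compareUnsigned(k, stopBytes) < 0;
            if (afterStart && beforeStop) {
                hits.add(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "a.b.x|1|1252525", "a.b.x|10|5", "a.b.x|2|3000320", "a.b.y|1|1202002");

        // Range from the answer: also matches the hypothetical "a.b.x|10|5" key.
        System.out.println(keysInRange(keys, "a.b.x|1", "a.b.x|2"));

        // Including the trailing delimiter in the prefix narrows the range.
        String prefix = "a.b.x|1|";
        String stop = new String(
                stopRowForPrefix(prefix.getBytes(StandardCharsets.UTF_8)),
                StandardCharsets.UTF_8);
        System.out.println(keysInRange(keys, prefix, stop));
    }
}
```

In this sketch the range "a.b.x|1" to "a.b.x|2" also picks up "a.b.x|10|5", because that key shares the same leading bytes. Using "a.b.x|1|" as the start and the computed "a.b.x|1}" as the stop keeps only the rows whose second field is exactly 1.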
