I would greatly appreciate your help on this.
I want to insert the output of my map-reduce job into an HBase table using the HBase bulk loading API: LoadIncrementalHFiles.doBulkLoad(new Path(), hTable);
I am emitting the KeyValue data type from my mapper and then using HFileOutputFormat with its default reducer to prepare my HFiles.
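For reference, my mapper is roughly along the lines of the sketch below (illustrative only - the input layout, column family "CF", and qualifier "count" are placeholders, not my exact code):

// imports needed at the top of MapReduce.java for a mapper like this
import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// nested inside the MapReduce class, matching MapReduce.MyMap in the driver below
public static class MyMap extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // assumed tab-separated input: word <TAB> count
        String[] fields = value.toString().split("\t");
        byte[] rowKey = Bytes.toBytes(fields[0]);
        // family/qualifier are placeholders; the family must exist in the target table
        KeyValue kv = new KeyValue(rowKey, Bytes.toBytes("CF"), Bytes.toBytes("count"),
                Bytes.toBytes(fields[1]));
        context.write(new ImmutableBytesWritable(rowKey), kv);
    }
}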
When I run my map-reduce job, it completes without any errors and creates the output file; however, the final step - inserting the HFiles into HBase - is not happening. I get the error below after my map-reduce completes:
13/09/08 03:39:51 WARN mapreduce.LoadIncrementalHFiles: Skipping non-directory hdfs://localhost:54310/user/xx.xx/output/_SUCCESS
13/09/08 03:39:51 WARN mapreduce.LoadIncrementalHFiles: Bulk load operation did not find any files to load in directory output/. Does it contain files in subdirectories that correspond to column family names?
But I can see the output directory containing:
1. _SUCCESS
2. _logs
3. _0/2aa96255f7f5446a8ea7f82aa2bd299e file (which contains my data)
I have no clue as to why my bulk loader is not picking up the files from the output directory.
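Since the warning asks whether the output directory contains subdirectories corresponding to column family names, I also dumped the directory listing programmatically. This is just a quick diagnostic sketch, reusing the conf and outFile variables (and the Hadoop FileSystem/Path/FileStatus classes) from the driver shown below:

// print each entry the bulk loader would inspect under the output path
FileSystem fs = FileSystem.get(conf);
for (FileStatus status : fs.listStatus(new Path(outFile))) {
    System.out.println(status.getPath().getName() + (status.isDir() ? "  (directory)" : "  (file)"));
}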
Below is the code of my Map-Reduce driver class:
public static void main(String[] args) throws Exception {
    String inputFile = args[0];
    String tableName = args[1];
    String outFile = args[2];
    Path inputPath = new Path(inputFile);
    Path outPath = new Path(outFile);
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // set the configurations
    conf.set("mapred.job.tracker", "localhost:54311");
    // Input data to HTable using Map Reduce
    Job job = new Job(conf, "MapReduce - Word Frequency Count");
    job.setJarByClass(MapReduce.class);
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, inputPath);
    fs.delete(outPath);
    FileOutputFormat.setOutputPath(job, outPath);
    job.setMapperClass(MapReduce.MyMap.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);
    HTable hTable = new HTable(conf, tableName.toUpperCase());
    // Auto configure partitioner and reducer
    HFileOutputFormat.configureIncrementalLoad(job, hTable);
    job.waitForCompletion(true);
    // Load generated HFiles into table
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    loader.doBulkLoad(new Path(outFile), hTable);
}
I would appreciate it if anybody could help me figure out what is going wrong here that is preventing my data from being inserted into HBase.
Thanks in advance !
1 Answer
#1
Finally, I figured out why my HFiles were not getting dumped into HBase. Below are the details:
My CREATE statement DDL did not specify any column family name, so my guess is that Phoenix created the default column family "_0". I was able to see this column family in my HDFS /hbase dir.
However, when I used HBase's LoadIncrementalHFiles API to fetch the files from my output directory, it was not picking up the dir named after the column family ("_0") in my case. I debugged the LoadIncrementalHFiles API code and found that it skips all directories in the output path that start with "_" (e.g. "_logs").
I retried the same thing, but this time specifying a column family explicitly, and everything worked perfectly fine. I am able to query the data using Phoenix SQL.
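For illustration, the fix boils down to creating the Phoenix table with an explicit column family. The sketch below is not my exact DDL - the table name WORD_COUNT, the column names, and the JDBC URL are placeholders, and it assumes the Phoenix client jar is on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateTableWithExplicitFamily {
    public static void main(String[] args) throws Exception {
        // placeholder names; the key point is the explicit "CF." family prefix,
        // which avoids Phoenix's default "_0" column family
        Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
        Statement stmt = conn.createStatement();
        stmt.execute("CREATE TABLE WORD_COUNT ("
                + " WORD VARCHAR NOT NULL PRIMARY KEY,"
                + " CF.FREQUENCY BIGINT)");
        stmt.close();
        conn.close();
    }
}

With an explicit column family, HFileOutputFormat writes the HFiles into a subdirectory named after that family (e.g. CF/), which the loader no longer skips.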