
Time: 2023-01-14 08:26:22

        // HBase write test (legacy HBase client API); getAllFilePath() and getData() are helpers not shown here
        String parentPath = "F:/pic/2003-zhujiajian";
        File[] files = getAllFilePath(parentPath);
        HBaseConfiguration config = new HBaseConfiguration();
        HTable table = new HTable(config, new Text("offer"));
        long start = System.currentTimeMillis();
        for (File file : files) {
            if (file.isFile()) {
                byte[] data = getData(file);
                // ... write data into the "offer" table (the put call is elided in the original)
            }
        }
        long end = System.currentTimeMillis();
        System.out.println("time cost=" + (end - start));
 Writing 303 files (108037206 bytes) from a local Windows machine to the remote HBase took 23328 ms in one run and 21001 ms in another.
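The snippet above calls getAllFilePath() and getData() without showing them. A minimal sketch of what they might look like, assuming plain java.io/java.nio file access; both implementations are hypothetical, and the HBase write is replaced by a byte counter so the sketch runs without a cluster.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical versions of the helpers the HBase test calls but does not show.
public class ImageLoader {

    // Lists the entries of a directory; File.listFiles() returns null for a non-directory.
    static File[] getAllFilePath(String parentPath) {
        File[] files = new File(parentPath).listFiles();
        return files == null ? new File[0] : files;
    }

    // Reads one whole file into memory (fine for images of a few MB).
    static byte[] getData(File file) throws IOException {
        return Files.readAllBytes(file.toPath());
    }

    // Same loop shape as the HBase test; the table write is replaced by a byte count.
    static long readAll(String parentPath) throws IOException {
        long total = 0;
        for (File file : getAllFilePath(parentPath)) {
            if (file.isFile()) {
                byte[] data = getData(file);
                total += data.length; // the real test would write `data` to HBase here
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        String dir = args.length > 0 ? args[0] : ".";
        long start = System.currentTimeMillis();
        long bytes = readAll(dir);
        long end = System.currentTimeMillis();
        System.out.println(bytes + " bytes, time cost=" + (end - start));
    }
}
```

Subdirectories are skipped by the isFile() check, just as in the original loop.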
        // HDFS write test: copy the whole local directory to HDFS in one call
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path src = new Path("F:/pic/2003-zhujiajian");   // local source directory
        Path dst = new Path("/user/zxf/image");          // destination directory on HDFS
        long start = System.currentTimeMillis();
        fs.copyFromLocalFile(src, dst);
        long end = System.currentTimeMillis();
        System.out.println("time cost=" + (end - start));
 Writing the same 303 files (108037206 bytes) from the local Windows machine to the remote HDFS took 26531 ms in one run and 32407 ms in another.
 Reading the 303 files (108037206 bytes) from HDFS to local took 479350 ms.
 Reading the 303 files (108037206 bytes) from HDFS to local took 14188 ms.
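Turning the totals above into throughput makes the runs easier to compare. A small calculation, assuming decimal megabytes (1 MB = 1,000,000 bytes); the class name is only illustrative.

```java
// Throughput implied by the totals above, in decimal MB/s (1 MB = 1,000,000 bytes).
public class Throughput {
    static final long TOTAL_BYTES = 108_037_206L; // all 303 files

    // bytes / ms = thousands of bytes per second, so divide by 1000 once more for MB/s
    static double mbPerSecond(long bytes, long millis) {
        return bytes / (double) millis / 1000.0;
    }

    public static void main(String[] args) {
        String[] labels = {"HBase write", "HBase write", "HDFS write", "HDFS write", "read", "read"};
        long[] millis = {23328, 21001, 26531, 32407, 479350, 14188};
        for (int i = 0; i < millis.length; i++) {
            System.out.printf("%-12s %7d ms  %5.2f MB/s%n",
                    labels[i], millis[i], mbPerSecond(TOTAL_BYTES, millis[i]));
        }
    }
}
```

This works out to about 4.6 to 5.1 MB/s for the HBase writes, 3.3 to 4.1 MB/s for the HDFS writes, about 0.23 MB/s for the 479350 ms read and about 7.6 MB/s for the 14188 ms read.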

 File size (bytes)   HDFS read time (ms)   HBase read time (ms)
 12341140            1313                  14688
 708474              63                    4359
 82535               15                    3907
 55296               16                    125
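A quick way to read the table above is the ratio of HBase time to HDFS time for each file size. A small calculation over the four rows; the class name is illustrative.

```java
// Ratio of HBase time to HDFS time for the four file sizes in the table above.
public class ReadComparison {
    // {file size in bytes, HDFS time in ms, HBase time in ms}
    static final long[][] ROWS = {
        {12_341_140L, 1313, 14688},
        {708_474L, 63, 4359},
        {82_535L, 15, 3907},
        {55_296L, 16, 125},
    };

    static double slowdown(long hdfsMs, long hbaseMs) {
        return hbaseMs / (double) hdfsMs;
    }

    public static void main(String[] args) {
        for (long[] row : ROWS) {
            System.out.printf("%,12d bytes: HBase/HDFS = %.1fx%n", row[0], slowdown(row[1], row[2]));
        }
    }
}
```

For these four files the ratio is about 11x, 69x, 260x and 7.8x.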

6 Reflections
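  During the test I hit a "region offline" error. Restarting the services did not help, so I reformatted the namenode, deleted the data on the datanodes and restarted again, only to find that one datanode still had not come up. When I logged in over SSH, its Java process had died.
  That cost me more than an hour. Thinking it over: an HTable is split into sub-tables (regions) spread across the HRegionServers, so when one datanode goes down, requests for the regions it hosts cannot connect, which produces the region offline error.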
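The explanation above can be illustrated with a toy model: a table split into regions by start row key, each region served by one node. This is not the HBase client API; the class, node names and row keys are hypothetical, and in real HBase the master eventually reassigns a dead server's regions, which the toy leaves out.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model: regions keyed by start row key, each hosted by one server.
public class RegionLookup {
    private final TreeMap<String, String> regionStartToServer = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regionStartToServer.put(startKey, server);
    }

    // The region holding a row is the one with the greatest start key <= the row key.
    String serverFor(String rowKey, Set<String> liveServers) {
        Map.Entry<String, String> region = regionStartToServer.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalArgumentException("no region covers " + rowKey);
        }
        if (!liveServers.contains(region.getValue())) {
            throw new IllegalStateException(
                    "region offline: start key '" + region.getKey() + "' on " + region.getValue());
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        RegionLookup offer = new RegionLookup();
        offer.addRegion("", "node1");  // the first region starts at the empty key
        offer.addRegion("m", "node2");
        Set<String> live = Set.of("node1"); // node2 has died
        System.out.println(offer.serverFor("apple.jpg", live));
        try {
            offer.serverFor("zhujiajian.jpg", live);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Rows that fall in regions on live nodes are still served; only requests routed to the dead node fail.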
 Why is HBase read performance so poor? Reading a single 11 MB file takes me about 14000 ms, while reading the entire directory from HDFS takes only 14188 ms. An article on the subject says:
 Finally, another thing you shouldn’t do with HBase (or an RDBMS, for that matter), is store large amounts of binary data. When I say large amounts, I mean tens to hundreds of megabytes. Certainly both RDBMSs and HBase have the capabilities to store large amounts of binary data. However, again, we have an impedance mismatch. RDBMSs are built to be fast metadata stores; HBase is designed to have lots of rows and cells, but functions best when the rows are (relatively) small. HBase splits the virtual table space into regions that can be spread out across many servers. The default size of individual files in a region is 256MB. The closer to the region limit you make each row, the more overhead you are paying to host those rows. If you have to store a lot of big files, then you’re best off storing in the local filesystem, or if you have LOTS of data, HDFS. You can still keep the metadata in an RDBMS or HBase - but do us all a favor and just keep the path in the metadata.
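The advice at the end of the quote (big files in a file store, only the path in the metadata) can be sketched as follows. A local directory stands in for HDFS and an in-memory map stands in for the HBase or RDBMS metadata table; the class and column names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Big binary data goes to a file store; the metadata row keeps only the path and size.
// Assumptions: a local directory stands in for HDFS, a HashMap for the metadata table.
public class PathInMetadata {
    private final Path blobRoot;
    private final Map<String, Map<String, String>> metadata = new HashMap<>();

    PathInMetadata(Path blobRoot) {
        this.blobRoot = blobRoot;
    }

    // Writes the bytes as a file and records only its location in the metadata row.
    void store(String rowKey, byte[] image) throws IOException {
        Path file = blobRoot.resolve(rowKey); // assumes rowKey is a safe file name
        Files.write(file, image);
        Map<String, String> row = new HashMap<>();
        row.put("path", file.toString());
        row.put("size", Integer.toString(image.length));
        metadata.put(rowKey, row);
    }

    // Looks up the path in the metadata, then reads the bytes from the file store.
    byte[] load(String rowKey) throws IOException {
        Map<String, String> row = metadata.get(rowKey);
        if (row == null) {
            throw new IOException("no metadata row for " + rowKey);
        }
        return Files.readAllBytes(Paths.get(row.get("path")));
    }

    String pathOf(String rowKey) {
        return metadata.get(rowKey).get("path");
    }

    String sizeOf(String rowKey) {
        return metadata.get(rowKey).get("size");
    }

    public static void main(String[] args) throws IOException {
        PathInMetadata store = new PathInMetadata(Files.createTempDirectory("images"));
        store.store("offer-0001.jpg", new byte[] {1, 2, 3});
        System.out.println(store.pathOf("offer-0001.jpg") + " (" + store.sizeOf("offer-0001.jpg") + " bytes)");
        System.out.println(store.load("offer-0001.jpg").length);
    }
}
```

The metadata rows stay small no matter how large the images are, which is the property the quote says HBase is built for.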

  Next I marked the image_big column as IN_MEMORY:

    alter table offer change image_big IN_MEMORY;

  and measured the HBase reads three more times:
  File size (bytes)   Run 1 (ms)   Run 2 (ms)   Run 3 (ms)
  12341140            11750        11109        11718
  708474              625          610          672
  82535               78           78           78
  55296               47           62           47
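  A likely reason is that when a row's column data is created, the system tags it with a cache flag; the cache is bound to the column, and when HBase starts up it reads the column data carrying the cache flag into memory.
 So none of the data created before I ran alter table offer change image_big IN_MEMORY carries the cache flag. This cache also differs from other caches, which load nothing at startup and cache data only after it is accessed. As a result, the more data is cached, the slower startup inevitably becomes, and I did notice this. On the other hand, it is better for the user experience, since the first access is not especially slow.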
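The two cache policies contrasted above, loading flagged data at startup versus caching on first access, can be shown with a toy class. It only illustrates the trade-off the author describes (more work at startup versus a slow first read); it is not HBase's actual cache implementation, and all names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy contrast of eager (load at startup) and lazy (load on first access) caching.
public class CachePolicies {
    private final Map<String, byte[]> cache = new HashMap<>();
    private int storeReads = 0; // number of slow reads from the backing store

    // Stand-in for a slow read from disk or a remote server.
    private byte[] readFromStore(String key) {
        storeReads++;
        return key.getBytes(StandardCharsets.UTF_8);
    }

    // Eager: pay for every flagged key at startup, so the first access is fast.
    void warmUp(Set<String> flaggedKeys) {
        for (String key : flaggedKeys) {
            cache.put(key, readFromStore(key));
        }
    }

    // Lazy: a miss goes to the store once, later accesses hit the cache.
    byte[] get(String key) {
        return cache.computeIfAbsent(key, this::readFromStore);
    }

    int storeReads() {
        return storeReads;
    }

    public static void main(String[] args) {
        CachePolicies eager = new CachePolicies();
        eager.warmUp(Set.of("img1", "img2"));
        eager.get("img1");
        System.out.println("eager store reads: " + eager.storeReads());

        CachePolicies lazy = new CachePolicies();
        lazy.get("img1");
        lazy.get("img1");
        System.out.println("lazy store reads: " + lazy.storeReads());
    }
}
```

The eager cache does all its store reads during warmUp, so its cost grows with the amount of flagged data, matching the slower startup the author observed.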

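  The debug log suggests this is indeed the case: the larger the file, the more times the RegionServer responds to the client. Pinning down the details will require a careful read of the source code.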
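The IN_MEMORY timings can be checked against the observation that larger files need more round trips: if each response carries a bounded amount of data, read time should grow roughly linearly with file size, so bytes per millisecond should stay roughly constant. A small calculation over the averaged runs; this reading is an assumption, not something the debug log alone proves.

```java
// Average of the three IN_MEMORY runs per file size, and the implied bytes per millisecond.
public class InMemoryScaling {
    // {file size in bytes, run 1 ms, run 2 ms, run 3 ms}
    static final long[][] RUNS = {
        {12_341_140L, 11750, 11109, 11718},
        {708_474L, 625, 610, 672},
        {82_535L, 78, 78, 78},
        {55_296L, 47, 62, 47},
    };

    static double meanMillis(long[] row) {
        return (row[1] + row[2] + row[3]) / 3.0;
    }

    // A roughly constant value across sizes means time grows linearly with size.
    static double bytesPerMs(long[] row) {
        return row[0] / meanMillis(row);
    }

    public static void main(String[] args) {
        for (long[] row : RUNS) {
            System.out.printf("%,12d bytes: mean %.1f ms, %.0f bytes/ms%n",
                    row[0], meanMillis(row), bytesPerMs(row));
        }
    }
}
```

All four sizes come out between about 1,050 and 1,120 bytes per ms (roughly 1.1 MB/s), so the time grows almost in proportion to the file size.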