Configuring HBase on Ubuntu 18.04

Date: 2022-11-05 06:32:24

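HBase provides a highly reliable, column-oriented, scalable database system on top of HDFS. HBase can retrieve data only by primary key (row key) or by a range of row keys, and is mainly used to store loosely organized unstructured and semi-structured data. Like Hadoop, HBase scales horizontally: compute and storage capacity grow by adding inexpensive commodity servers. Tables that suit HBase have these characteristics: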

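  • Very large: a single table can hold hundreds of millions of rows and millions of columns
  • Column-oriented: storage and access control are organized by column, and column families are retrieved independently
  • Sparse: empty (null) fields take up no storage, so HBase suits very sparse tables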

Row Key
The row key is the primary key used to retrieve records. There are only three ways to access rows in a table:

  • By a single row key
  • By a range of row keys
  • By a full table scan
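
The three access paths above can be pictured with a small in-memory model. This is only a sketch: the `rows` dictionary, the sample keys and the helper names are made up for illustration and are not HBase APIs. It assumes the usual scan convention of an inclusive start key and an exclusive stop key.

```python
from bisect import bisect_left

# Hypothetical stand-in for a table: row key -> {column: value}, all bytes.
rows = {
    b"user001": {b"cf:name": b"alice"},
    b"user002": {b"cf:name": b"bob"},
    b"user010": {b"cf:name": b"carol"},
    b"user100": {b"cf:name": b"dave"},
}
sorted_keys = sorted(rows)  # bytes compare in lexicographic (byte) order


def get(row_key):
    """Access by a single row key."""
    return rows.get(row_key)


def scan_range(start, stop):
    """Access by a row key range: start inclusive, stop exclusive."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return [(key, rows[key]) for key in sorted_keys[lo:hi]]


def full_scan():
    """Access by walking every row in key order."""
    return [(key, rows[key]) for key in sorted_keys]


print(get(b"user002"))
print([key for key, _ in scan_range(b"user002", b"user100")])
print(len(full_scan()))
```

Because the keys are kept sorted, the range lookup only needs two binary searches, which is why key design matters so much for read patterns.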

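A row key can be any string up to 64 KB long; in practice it is usually 10 to 100 bytes.
Internally, HBase stores the row key as a byte array, and data is stored sorted by row key in lexicographic (byte) order. Design keys to take advantage of this sorted storage by placing rows that are often read together next to each other. Note: lexicographic ordering sorts integers as 1, 10, 100, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21... To make row keys sort by integer value, left-pad them with zeros.
Reading or writing a row is atomic, no matter how many columns are involved. This design decision makes it easy to reason about a program's behavior under concurrent updates to the same row.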
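
The effect of byte-order sorting on numeric keys, and the zero-padding fix, can be checked with a few lines of Python. The ids and the padding width of 6 are arbitrary choices for this sketch.

```python
ids = [1, 2, 10, 11, 20, 100]

# Unpadded keys sort byte by byte, so "10" lands before "2".
plain = sorted(str(i).encode() for i in ids)

# Left-padding with zeros to a fixed width restores numeric order.
padded = sorted(str(i).zfill(6).encode() for i in ids)

print(plain)
print(padded)
```

The width has to be chosen up front and be large enough for the biggest id expected, since changing it later means rewriting existing keys.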

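Column Family (CF)
Every column in an HBase table belongs to a CF. The CF is part of the table schema (columns are not) and must be defined before the table is used. Column names are prefixed with the CF; for example, cf:username and cf:code both belong to the column family cf. Access control and disk and memory usage accounting are all done at the column-family level. In practice, column-family permissions help manage different kinds of applications: some may add new base data, some may read base data and create derived column families, and some may only view data (and, for privacy reasons, possibly not all of it).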

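Timestamp
The storage unit determined by a row key and a column is called a cell. Each cell holds multiple versions of the same data, indexed by timestamp. Timestamps are 64-bit integers. HBase can assign the timestamp automatically when data is written, in which case it is the current system time in milliseconds; the client can also assign it explicitly. An application that needs to avoid version conflicts must generate unique timestamps itself. Within a cell, versions are sorted by timestamp in descending order, so the newest data comes first.
To avoid the management burden (storage and indexing) of keeping too many versions, HBase offers two ways to reclaim old versions: keep the last n versions, or keep the versions from a recent period (for example, the last ten days). Both can be set per column family.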
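
The two retention policies can be illustrated with a toy model of a single cell. This is a sketch, not how HBase implements versioning: the `Cell` class, its parameter names and the small integer timestamps are all invented for the example.

```python
import time


class Cell:
    """Toy model of one cell holding several timestamped versions."""

    def __init__(self, max_versions=None, ttl_ms=None):
        self.max_versions = max_versions  # keep only the newest n versions
        self.ttl_ms = ttl_ms              # keep only versions newer than this age
        self.versions = {}                # timestamp in ms -> value bytes

    def put(self, value, ts=None):
        if ts is None:
            ts = int(time.time() * 1000)  # default: current time in ms
        self.versions[ts] = value         # in this toy, a repeated timestamp overwrites

    def read(self, now_ms):
        # Newest version first, as described in the text.
        items = sorted(self.versions.items(), reverse=True)
        if self.ttl_ms is not None:
            items = [(ts, v) for ts, v in items if now_ms - ts <= self.ttl_ms]
        if self.max_versions is not None:
            items = items[: self.max_versions]
        return items


by_count = Cell(max_versions=2)
by_age = Cell(ttl_ms=50)
for ts, value in [(100, b"a"), (200, b"b"), (300, b"c")]:
    by_count.put(value, ts)
    by_age.put(value, ts)

print(by_count.read(now_ms=300))
print(by_age.read(now_ms=300))
```

The overwrite on a repeated timestamp is the reason an application that cares about every write needs unique timestamps.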

Cell
A cell is the unique unit determined by {row key, column (= <family> + <label>), version}. Data in a cell is untyped and is stored entirely as raw bytes.
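
Since cell values carry no type, the writer and the reader have to agree on an encoding. The sketch below contrasts storing the number 42 as text with storing it as an 8-byte big-endian integer; the big-endian layout is an assumption chosen for the illustration, and the helper names are hypothetical.

```python
import struct


def encode_long(n):
    """Encode an integer as 8 big-endian bytes (assumed convention)."""
    return struct.pack(">q", n)


def decode_long(raw):
    """Decode 8 big-endian bytes back into an integer."""
    return struct.unpack(">q", raw)[0]


as_text = b"42"            # what a shell put of '42' would carry: two ASCII bytes
as_long = encode_long(42)  # eight bytes, unreadable as text

print(as_text, as_long)
print(decode_long(as_long))
```

Reading a value with the wrong decoder silently yields garbage, so the chosen encoding is effectively part of the schema even though HBase does not record it.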

System settings

Install ntp

This keeps the clocks of the servers in sync.

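Set ulimit

Reference: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/cdh_ig_hbase_config.html

For the users that run hdfs and hbase, the limits on open files and on running processes can be viewed and set with ulimit -n and ulimit -u. To apply them automatically at startup, add them to that user's .bashrc.

Another way is through PAM (Pluggable Authentication Modules).
Edit /etc/security/limits.conf and add two lines for each user to adjust. The limit value is the last field of each line (the values are omitted in the lines below), for example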

hdfs  -       nofile
hdfs - nproc
hbase - nofile
hbase - nproc

For the settings to take effect, edit /etc/pam.d/common-session and add this line

session required  pam_limits.so
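
To confirm which limits a process actually received, the values can be read back from inside the process. The sketch below uses Python's standard `resource` module on Linux; run it as the hdfs or hbase user to see that user's limits. The `limits` helper is a name made up for this example.

```python
import resource


def limits():
    """Return (soft, hard) limits for open files and processes."""
    return {
        "nofile": resource.getrlimit(resource.RLIMIT_NOFILE),  # same as ulimit -n
        "nproc": resource.getrlimit(resource.RLIMIT_NPROC),    # same as ulimit -u
    }


for name, (soft, hard) in limits().items():
    # resource.RLIM_INFINITY means "unlimited".
    print(name, "soft =", soft, "hard =", hard)
```

A new login session is needed before changes to limits.conf show up here, because PAM applies them when the session is opened.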

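ZooKeeper configuration

Number and sizing of ZooKeeper ensemble nodes
On count: a single node works, but production environments usually run 3 to 7 nodes (an odd number); the more nodes there are, the more node failures the ensemble can tolerate. An odd number is used because an even number raises the quorum needed for an election without tolerating any more failures: 4 nodes and 5 nodes both need a quorum of 3. On sizing: give each node 1 GB of memory and, if possible, its own dedicated disk. For heavily loaded clusters, run the ZooKeeper nodes on dedicated machines, separate from the RegionServers (DataNodes and TaskTrackers).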
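
The quorum arithmetic behind the odd-number advice is easy to tabulate. This sketch assumes a simple majority quorum, which matches the figures quoted above (3 out of 4 and 3 out of 5); the function names are made up.

```python
def quorum(nodes):
    """Smallest majority of an ensemble of the given size."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """How many nodes can fail while a majority is still available."""
    return nodes - quorum(nodes)


for n in range(1, 8):
    print(n, "nodes: quorum", quorum(n), "tolerates", tolerated_failures(n))
```

Going from 3 to 4 nodes raises the quorum from 2 to 3 but still tolerates only one failure, so the fourth node adds cost without adding resilience.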

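HBase settings

Master node

Unpack the archive, then edit conf/regionservers: remove localhost and add the hostnames of the worker nodes. These hosts are started and stopped together with the master node.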

vm149
vm150

If a backup master is needed, add a file named backup-masters under conf/ and list the corresponding hostname in it.

Edit conf/hbase-env.sh

export JAVA_HOME=/opt/jdk/latest
export HBASE_MANAGES_ZK=false
export HBASE_LOG_DIR=/home/tomcat/run/hbase/logs

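HBASE_MANAGES_ZK=false means that an external ZooKeeper is used.
HBASE_LOG_DIR: if logs are not kept in the logs directory under the installation directory, set the log path here; otherwise HBase may be unable to write its logs at startup.

Edit conf/hbase-site.xml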

<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://vm148:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2222</value>
<description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>vm151,vm152,vm153</value>
<description>For a fully-distributed setup, this should be set to a full list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh this is the list of servers which we will start/stop ZooKeeper on.</description>
</property>
</configuration>

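The default client port is 2181; a non-standard port must be set in the configuration.
The hbase.zookeeper.quorum parameter is required and lists all nodes of the ZooKeeper ensemble. Because the ensemble is managed independently, no other ZooKeeper parameters are needed.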
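
How the quorum list and the client port combine into the address a client connects to can be sketched with Python's standard XML parser. The embedded XML repeats the two ZooKeeper properties from the configuration above; `load_properties` and `zk_connect_string` are hypothetical helper names, and the fallback to 2181 reflects the default port mentioned in the text.

```python
import xml.etree.ElementTree as ET

# The two ZooKeeper-related properties from hbase-site.xml.
HBASE_SITE = """<configuration>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2222</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>vm151,vm152,vm153</value>
  </property>
</configuration>"""


def load_properties(xml_text):
    """Collect <name>/<value> pairs from a Hadoop-style configuration."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}


def zk_connect_string(props):
    """Build host:port pairs; use 2181 when no client port is configured."""
    port = props.get("hbase.zookeeper.property.clientPort", "2181")
    hosts = props["hbase.zookeeper.quorum"].split(",")
    return ",".join(f"{host.strip()}:{port}" for host in hosts)


print(zk_connect_string(load_properties(HBASE_SITE)))
```

The resulting string has the same host:port,host:port shape that appears as connectString in ZooKeeper client logs.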

After startup, run ./zkCli.sh and execute ls /hbase inside it to check that HBase has connected correctly.

Worker nodes

Copy the configured directory from the master node directly to each worker node.

Startup

Startup order

start-dfs.sh (master node)
start-yarn.sh (master node)
zkServer.sh start (each ZooKeeper node)
start-hbase.sh (master node)

After startup, open port 16010 on the master node, http://vm148:16010/, to see the HBase web UI.

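Other settings

Set dfs.datanode.max.transfer.threads

dfs.datanode.max.transfer.threads is an HDFS parameter that replaces the deprecated dfs.datanode.max.xcievers. It sets the upper limit on the number of files an HDFS DataNode serves at the same time. Edit the configuration file etc/hadoop/conf/hdfs-site.xml and add the following entry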

<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>4096</value>
</property>

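Configure the HBase BlockCache

By default, HBase uses a single on-heap cache. If BucketCache is configured, the on-heap cache is used only for Bloom filters and indexes, and the off-heap BucketCache is used to cache data. This arrangement is called the combined BlockCache configuration. It allows a larger in-memory cache and reduces the impact of JVM garbage collection.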

Command-line reference

Enter the shell

./bin/hbase shell

List all tables: list (adding 'table name' after list checks whether that table exists)

hbase(main):001:0> list
TABLE
users
1 row(s)
Took 0.5786 seconds
=> ["users"]

Show table details: describe 'table name'

hbase(main):003:0> describe 'users'
Table users is ENABLED
users
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false',
KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE',
TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW',
CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false',
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
1 row(s)
Took 0.2581 seconds

Disable and enable a table: disable / enable 'table name'

hbase(main):004:0> disable 'users'
Took 0.5861 seconds
hbase(main):005:0> enable 'users'
Took 0.7949 seconds

Create a table: create 'table name', 'cf field' (more than one cf may be given)

hbase(main):006:0> create 'test','cf'
Created table test
Took 1.3285 seconds
=> Hbase::Table - test
hbase(main):008:0> create 'test2','cf1','cf2','cf3'
Created table test2
Took 1.2728 seconds
=> Hbase::Table - test2

Drop a table: drop 'table name'. The table must be disabled before it is dropped.
Note: disk space is not released immediately after a drop; by default it is released after 5 minutes.

hbase(main):011:0> disable 'test2'
Took 0.4568 seconds
hbase(main):012:0> drop 'test2'
Took 0.5034 seconds

List a table's records: scan 'table name'

hbase(main):013:0> scan 'test'
ROW COLUMN+CELL
0 row(s)
Took 0.1512 seconds

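Add a record: put 'table name', 'row id', 'cf field', 'value'
Each put sets only one field value for a row id. When values are put to different fields of the same row id, scan displays them on multiple lines, one per field, although they belong to the same row.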

hbase(main):026:0> put 'test','row001','cf:a','001'
Took 0.0884 seconds
hbase(main):027:0> put 'test','row002','cf:a','002'
Took 0.0076 seconds
hbase(main):028:0> put 'test','row003','cf:b','001'
Took 0.0086 seconds
hbase(main):029:0> scan 'test'
ROW COLUMN+CELL
row001 column=cf:a, timestamp=1548510719243, value=001
row002 column=cf:a, timestamp=1548510724943, value=002
row003 column=cf:b, timestamp=1548510733680, value=001
3 row(s)
Took 0.0477 seconds

Read all field records of a row id: get 'table name', 'row id'

hbase(main):032:0> get 'test', 'row001'
COLUMN CELL
cf:a timestamp=1548510719243, value=001
cf:b timestamp=1548510892749, value=003
1 row(s)
Took 0.0491 seconds

Delete a row id's record in a given field: delete 'table name', 'row id', 'cf field'

hbase(main):033:0> delete 'test', 'row001', 'cf:b'
Took 0.0298 seconds
hbase(main):034:0> get 'test', 'row001'
COLUMN CELL
cf:a timestamp=1548510719243, value=001
1 row(s)
Took 0.0323 seconds

To delete an entire row id, use deleteall:

hbase(main):045:0> deleteall 'test', 'row004'
Took 0.0081 seconds
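
The difference between delete and deleteall can be pictured with a plain dictionary. This is only a mental model: it ignores versions and the way HBase records deletions internally, and the table contents and function names are invented for the example.

```python
# Hypothetical table contents: row id -> {field: value}
table = {
    "row001": {"cf:a": "001", "cf:b": "003"},
    "row004": {"cf:a": "004", "cf:b": "004"},
}


def delete(row, column):
    """Remove one field from a row; the row vanishes once it has no fields."""
    cells = table.get(row, {})
    cells.pop(column, None)
    if not cells:
        table.pop(row, None)


def deleteall(row):
    """Remove every field of a row at once."""
    table.pop(row, None)


delete("row001", "cf:b")
deleteall("row004")
print(table)
```

After both calls only row001 with its cf:a field remains, which mirrors the get output shown after the delete above.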

Count row ids: count 'table name'

hbase(main):039:0> scan 'test'
ROW COLUMN+CELL
row001 column=cf:a, timestamp=1548510719243, value=001
row001 column=cf:b, timestamp=1548511393583, value=003
row002 column=cf:a, timestamp=1548510724943, value=002
row002 column=cf:b, timestamp=1548511400007, value=002
row003 column=cf:b, timestamp=1548510733680, value=001
3 row(s)
Took 0.0409 seconds
hbase(main):040:0> count 'test'
3 row(s)
Took 0.0178 seconds
=> 3
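
The scan above prints five lines while count reports 3, because scan prints one line per field and count counts distinct row ids. The sketch below reproduces that tally from the scan text; `summarize` is a hypothetical helper and the parsing assumes the line layout shown above.

```python
# The five data lines from the scan output above.
SCAN_OUTPUT = """\
row001 column=cf:a, timestamp=1548510719243, value=001
row001 column=cf:b, timestamp=1548511393583, value=003
row002 column=cf:a, timestamp=1548510724943, value=002
row002 column=cf:b, timestamp=1548511400007, value=002
row003 column=cf:b, timestamp=1548510733680, value=001
"""


def summarize(scan_text):
    """Return (distinct row ids, printed cells) for scan-style text."""
    cells = [
        line.split(None, 1)  # first token is the row id
        for line in scan_text.splitlines()
        if " column=" in line
    ]
    row_ids = {row for row, _ in cells}
    return len(row_ids), len(cells)


rows, cells = summarize(SCAN_OUTPUT)
print(rows, "row(s),", cells, "cell(s)")
```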

Empty a table: truncate 'table name'
This command actually performs three steps: disable, drop, and recreate.

hbase(main):047:0> truncate 'test'
Truncating 'test' table (it may take a while):
Disabling table...
Truncating table...
Took 2.1415 seconds

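Importing a CSV file into HBase

Assume the CSV file is in the current directory of the local file system (not HDFS), is comma-separated, and is to be imported into the target table test: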

$ /opt/hbase/latest/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,cf:interest,cf:future_interest,cf:quota_amount,cf:quota_count,cf:quota_extra_interest test output.csv

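Because the tool expects TSV by default, the separator must be set explicitly to ',' for CSV.
The target fields are given by the importtsv.columns parameter, which maps the columns of the CSV file, in order, to the cf fields of the HBase table.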
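
The positional mapping from CSV columns to table fields can be sketched as follows. The column list is the one passed to importtsv.columns above; the sample line and the `map_line` helper are made up, and the plain split on the separator is a simplification for illustration rather than a description of the tool's parser.

```python
# Column spec copied from the -Dimporttsv.columns argument.
COLUMNS = (
    "HBASE_ROW_KEY,cf:interest,cf:future_interest,"
    "cf:quota_amount,cf:quota_count,cf:quota_extra_interest"
).split(",")


def map_line(line, columns=COLUMNS, separator=","):
    """Split one input line and pair each field with its target column."""
    fields = line.rstrip("\n").split(separator)
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
    key_index = columns.index("HBASE_ROW_KEY")
    cells = {
        column: value
        for index, (column, value) in enumerate(zip(columns, fields))
        if index != key_index
    }
    return fields[key_index], cells


# Hypothetical CSV line: row key first, then five values.
row_key, cells = map_line("u001,0.05,0.07,1000,3,0.01\n")
print(row_key, cells)
```

A line whose field count does not match the column spec is rejected in this sketch.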

The complete output of the import is

$ /opt/hbase/latest/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,cf:interest,cf:future_interest,cf:quota_amount,cf:quota_count,cf:quota_extra_interest test output.csv
2019-01-26 14:35:52,566 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:host.name=vm148
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_192
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.home=/opt/jdk/jdk1.8.0_192/jre
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: opt/hbase/latest/bin/../lib/protobuf-java-2.5.0.jar:/opt/hbase/latest/bin/../lib/snappy-java-1.0.5.jar:/opt/hbase/latest/bin/../lib/spymemcached-2.12.2.jar:/opt/hbase/latest/bin/../lib/validation-api-1.1.0.Final.jar:/opt/hbase/latest/bin/../lib/xmlenc-0.52.jar:/opt/hbase/latest/bin/../lib/xz-1.0.jar:/opt/hbase/latest/bin/../lib/zookeeper-3.4.10.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/audience-annotations-0.5.0.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/commons-logging-1.2.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/findbugs-annotations-1.3.9-1.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/log4j-1.2.17.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/slf4j-api-1.7.25.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/htrace-core-3.1.0-incubating.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:os.name=Linux
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:os.version=4.15.0-43-generic
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:user.name=tomcat
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:user.home=/home/tomcat
2019-01-26 14:35:52,950 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:user.dir=/home/tomcat
2019-01-26 14:35:52,951 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Initiating client connection, connectString=vm148:2181,vm149:2181,vm150:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$12/835683477@2b267d81
2019-01-26 14:35:52,969 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18-SendThread(vm150:2181)] zookeeper.ClientCnxn: Opening socket connection to server vm150/192.168.31.150:2181. Will not attempt to authenticate using SASL (unknown error)
2019-01-26 14:35:52,974 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18-SendThread(vm150:2181)] zookeeper.ClientCnxn: Socket connection established to vm150/192.168.31.150:2181, initiating session
2019-01-26 14:35:52,986 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18-SendThread(vm150:2181)] zookeeper.ClientCnxn: Session establishment complete on server vm150/192.168.31.150:2181, sessionid = 0x3002261518a0002, negotiated timeout = 40000
2019-01-26 14:35:54,071 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Session: 0x3002261518a0002 closed
2019-01-26 14:35:54,074 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x3002261518a0002
2019-01-26 14:35:54,095 INFO [main] Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2019-01-26 14:35:54,096 INFO [main] jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
2019-01-26 14:35:54,126 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e] zookeeper.ZooKeeper: Initiating client connection, connectString=vm148:2181,vm149:2181,vm150:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$12/835683477@2b267d81
2019-01-26 14:35:54,130 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e-SendThread(vm150:2181)] zookeeper.ClientCnxn: Opening socket connection to server vm150/192.168.31.150:2181. Will not attempt to authenticate using SASL (unknown error)
2019-01-26 14:35:54,134 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e-SendThread(vm150:2181)] zookeeper.ClientCnxn: Socket connection established to vm150/192.168.31.150:2181, initiating session
2019-01-26 14:35:54,138 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e-SendThread(vm150:2181)] zookeeper.ClientCnxn: Session establishment complete on server vm150/192.168.31.150:2181, sessionid = 0x3002261518a0003, negotiated timeout = 40000
2019-01-26 14:35:54,416 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e] zookeeper.ZooKeeper: Session: 0x3002261518a0003 closed
2019-01-26 14:35:54,416 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x3002261518a0003
2019-01-26 14:35:54,579 INFO [main] input.FileInputFormat: Total input paths to process : 1
2019-01-26 14:35:54,615 INFO [main] mapreduce.JobSubmitter: number of splits:1
2019-01-26 14:35:54,752 INFO [main] mapreduce.JobSubmitter: Submitting tokens for job: job_local98574210_0001
2019-01-26 14:35:55,026 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354849/hbase-hadoop2-compat-2.1.2.jar <- /home/tomcat/hbase-hadoop2-compat-2.1.2.jar
2019-01-26 14:35:55,084 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-hadoop2-compat-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354849/hbase-hadoop2-compat-2.1.2.jar
2019-01-26 14:35:55,686 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354850/jackson-core-2.9.2.jar <- /home/tomcat/jackson-core-2.9.2.jar
2019-01-26 14:35:55,693 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/jackson-core-2.9.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354850/jackson-core-2.9.2.jar
2019-01-26 14:35:55,713 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354851/hbase-metrics-2.1.2.jar <- /home/tomcat/hbase-metrics-2.1.2.jar
2019-01-26 14:35:55,722 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-metrics-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354851/hbase-metrics-2.1.2.jar
2019-01-26 14:35:55,744 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354852/hadoop-common-2.7.7.jar <- /home/tomcat/hadoop-common-2.7.7.jar
2019-01-26 14:35:55,746 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hadoop-common-2.7.7.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354852/hadoop-common-2.7.7.jar
2019-01-26 14:35:55,746 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354853/zookeeper-3.4.10.jar <- /home/tomcat/zookeeper-3.4.10.jar
2019-01-26 14:35:55,754 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/zookeeper-3.4.10.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354853/zookeeper-3.4.10.jar
2019-01-26 14:35:55,755 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354854/hbase-protocol-shaded-2.1.2.jar <- /home/tomcat/hbase-protocol-shaded-2.1.2.jar
2019-01-26 14:35:55,758 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-protocol-shaded-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354854/hbase-protocol-shaded-2.1.2.jar
2019-01-26 14:35:55,758 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354855/hbase-client-2.1.2.jar <- /home/tomcat/hbase-client-2.1.2.jar
2019-01-26 14:35:55,760 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-client-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354855/hbase-client-2.1.2.jar
2019-01-26 14:35:55,760 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354856/hadoop-mapreduce-client-core-2.7.7.jar <- /home/tomcat/hadoop-mapreduce-client-core-2.7.7.jar
2019-01-26 14:35:55,762 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hadoop-mapreduce-client-core-2.7.7.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354856/hadoop-mapreduce-client-core-2.7.7.jar
2019-01-26 14:35:55,762 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354857/hbase-shaded-netty-2.1.0.jar <- /home/tomcat/hbase-shaded-netty-2.1.0.jar
2019-01-26 14:35:55,763 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-shaded-netty-2.1.0.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354857/hbase-shaded-netty-2.1.0.jar
2019-01-26 14:35:55,763 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354858/commons-lang3-3.6.jar <- /home/tomcat/commons-lang3-3.6.jar
2019-01-26 14:35:55,766 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/commons-lang3-3.6.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354858/commons-lang3-3.6.jar
2019-01-26 14:35:55,766 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354859/hbase-mapreduce-2.1.2.jar <- /home/tomcat/hbase-mapreduce-2.1.2.jar
2019-01-26 14:35:55,768 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-mapreduce-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354859/hbase-mapreduce-2.1.2.jar
2019-01-26 14:35:55,768 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354860/metrics-core-3.2.1.jar <- /home/tomcat/metrics-core-3.2.1.jar
2019-01-26 14:35:55,770 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/metrics-core-3.2.1.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354860/metrics-core-3.2.1.jar
2019-01-26 14:35:55,770 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354861/hbase-common-2.1.2.jar <- /home/tomcat/hbase-common-2.1.2.jar
2019-01-26 14:35:55,771 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-common-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354861/hbase-common-2.1.2.jar
2019-01-26 14:35:55,771 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354862/htrace-core4-4.2.0-incubating.jar <- /home/tomcat/htrace-core4-4.2.0-incubating.jar
2019-01-26 14:35:55,775 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354862/htrace-core4-4.2.0-incubating.jar
2019-01-26 14:35:55,775 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354863/hbase-hadoop-compat-2.1.2.jar <- /home/tomcat/hbase-hadoop-compat-2.1.2.jar
2019-01-26 14:35:55,777 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-hadoop-compat-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354863/hbase-hadoop-compat-2.1.2.jar
2019-01-26 14:35:55,777 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354864/hbase-zookeeper-2.1.2.jar <- /home/tomcat/hbase-zookeeper-2.1.2.jar
2019-01-26 14:35:55,778 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-zookeeper-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354864/hbase-zookeeper-2.1.2.jar
2019-01-26 14:35:55,779 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354865/hbase-shaded-miscellaneous-2.1.0.jar <- /home/tomcat/hbase-shaded-miscellaneous-2.1.0.jar
2019-01-26 14:35:55,780 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-shaded-miscellaneous-2.1.0.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354865/hbase-shaded-miscellaneous-2.1.0.jar
2019-01-26 14:35:55,781 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354866/protobuf-java-2.5.0.jar <- /home/tomcat/protobuf-java-2.5.0.jar
2019-01-26 14:35:55,782 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/protobuf-java-2.5.0.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354866/protobuf-java-2.5.0.jar
2019-01-26 14:35:55,782 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354867/jackson-annotations-2.9.2.jar <- /home/tomcat/jackson-annotations-2.9.2.jar
2019-01-26 14:35:55,784 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/jackson-annotations-2.9.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354867/jackson-annotations-2.9.2.jar
2019-01-26 14:35:55,784 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354868/hbase-server-2.1.2.jar <- /home/tomcat/hbase-server-2.1.2.jar
2019-01-26 14:35:55,786 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-server-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354868/hbase-server-2.1.2.jar
2019-01-26 14:35:55,786 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354869/hbase-metrics-api-2.1.2.jar <- /home/tomcat/hbase-metrics-api-2.1.2.jar
2019-01-26 14:35:55,787 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-metrics-api-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354869/hbase-metrics-api-2.1.2.jar
2019-01-26 14:35:55,788 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354870/jackson-databind-2.9.2.jar <- /home/tomcat/jackson-databind-2.9.2.jar
2019-01-26 14:35:55,789 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/jackson-databind-2.9.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354870/jackson-databind-2.9.2.jar
2019-01-26 14:35:55,790 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354871/hbase-protocol-2.1.2.jar <- /home/tomcat/hbase-protocol-2.1.2.jar
2019-01-26 14:35:55,791 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-protocol-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354871/hbase-protocol-2.1.2.jar
2019-01-26 14:35:55,791 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354872/hbase-shaded-protobuf-2.1.0.jar <- /home/tomcat/hbase-shaded-protobuf-2.1.0.jar
2019-01-26 14:35:55,799 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-shaded-protobuf-2.1.0.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354872/hbase-shaded-protobuf-2.1.0.jar
2019-01-26 14:35:55,852 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354849/hbase-hadoop2-compat-2.1.2.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354850/jackson-core-2.9.2.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354851/hbase-metrics-2.1.2.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354852/hadoop-common-2.7.7.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354853/zookeeper-3.4.10.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354854/hbase-protocol-shaded-2.1.2.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354855/hbase-client-2.1.2.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354856/hadoop-mapreduce-client-core-2.7.7.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354857/hbase-shaded-netty-2.1.0.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354858/commons-lang3-3.6.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354859/hbase-mapreduce-2.1.2.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354860/metrics-core-3.2.1.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354861/hbase-common-2.1.2.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354862/htrace-core4-4.2.0-incubating.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354863/hbase-hadoop-compat-2.1.2.jar
2019-01-26 14:35:55,854 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354864/hbase-zookeeper-2.1.2.jar
2019-01-26 14:35:55,854 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354865/hbase-shaded-miscellaneous-2.1.0.jar
2019-01-26 14:35:55,854 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354866/protobuf-java-2.5.0.jar
2019-01-26 14:35:55,854 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354867/jackson-annotations-2.9.2.jar
2019-01-26 14:35:55,854 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354868/hbase-server-2.1.2.jar
2019-01-26 14:35:55,854 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354869/hbase-metrics-api-2.1.2.jar
2019-01-26 14:35:55,854 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354870/jackson-databind-2.9.2.jar
2019-01-26 14:35:55,854 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354871/hbase-protocol-2.1.2.jar
2019-01-26 14:35:55,854 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354872/hbase-shaded-protobuf-2.1.0.jar
2019-01-26 14:35:55,858 INFO [main] mapreduce.Job: The url to track the job: http://localhost:8080/
2019-01-26 14:35:55,858 INFO [main] mapreduce.Job: Running job: job_local98574210_0001
2019-01-26 14:35:55,861 INFO [Thread-55] mapred.LocalJobRunner: OutputCommitter set in config null
2019-01-26 14:35:55,892 INFO [Thread-55] mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.hbase.mapreduce.TableOutputCommitter
2019-01-26 14:35:55,936 INFO [Thread-55] mapred.LocalJobRunner: Waiting for map tasks
2019-01-26 14:35:55,938 INFO [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner: Starting task: attempt_local98574210_0001_m_000000_0
2019-01-26 14:35:55,995 INFO [LocalJobRunner Map Task Executor #0] mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2019-01-26 14:35:56,000 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask: Processing split: file:/home/tomcat/output.csv:0+1703
2019-01-26 14:35:56,008 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3] zookeeper.ZooKeeper: Initiating client connection, connectString=vm148:2181,vm149:2181,vm150:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$12/835683477@2b267d81
2019-01-26 14:35:56,009 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3-SendThread(vm149:2181)] zookeeper.ClientCnxn: Opening socket connection to server vm149/192.168.31.149:2181. Will not attempt to authenticate using SASL (unknown error)
2019-01-26 14:35:56,009 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3-SendThread(vm149:2181)] zookeeper.ClientCnxn: Socket connection established to vm149/192.168.31.149:2181, initiating session
2019-01-26 14:35:56,016 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3-SendThread(vm149:2181)] zookeeper.ClientCnxn: Session establishment complete on server vm149/192.168.31.149:2181, sessionid = 0x200226284420008, negotiated timeout = 40000
2019-01-26 14:35:56,021 INFO [LocalJobRunner Map Task Executor #0] mapreduce.TableOutputFormat: Created table instance for test
2019-01-26 14:35:56,047 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95] zookeeper.ZooKeeper: Initiating client connection, connectString=vm148:2181,vm149:2181,vm150:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$12/835683477@2b267d81
2019-01-26 14:35:56,048 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95-SendThread(vm149:2181)] zookeeper.ClientCnxn: Opening socket connection to server vm149/192.168.31.149:2181. Will not attempt to authenticate using SASL (unknown error)
2019-01-26 14:35:56,049 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95-SendThread(vm149:2181)] zookeeper.ClientCnxn: Socket connection established to vm149/192.168.31.149:2181, initiating session
2019-01-26 14:35:56,052 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95-SendThread(vm149:2181)] zookeeper.ClientCnxn: Session establishment complete on server vm149/192.168.31.149:2181, sessionid = 0x200226284420009, negotiated timeout = 40000
2019-01-26 14:35:56,116 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95] zookeeper.ZooKeeper: Session: 0x200226284420009 closed
2019-01-26 14:35:56,116 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x200226284420009
2019-01-26 14:35:56,138 INFO [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner:
2019-01-26 14:35:56,280 INFO [LocalJobRunner Map Task Executor #0] mapred.Task: Task:attempt_local98574210_0001_m_000000_0 is done. And is in the process of committing
2019-01-26 14:35:56,289 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3] zookeeper.ZooKeeper: Session: 0x200226284420008 closed
2019-01-26 14:35:56,289 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x200226284420008
2019-01-26 14:35:56,296 INFO [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner: map
2019-01-26 14:35:56,296 INFO [LocalJobRunner Map Task Executor #0] mapred.Task: Task 'attempt_local98574210_0001_m_000000_0' done.
2019-01-26 14:35:56,303 INFO [LocalJobRunner Map Task Executor #0] mapred.Task: Final Counters for attempt_local98574210_0001_m_000000_0: Counters: 16
File System Counters
FILE: Number of bytes read=37574934
FILE: Number of bytes written=38237355
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=30
Map output records=30
Input split bytes=93
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=8
Total committed heap usage (bytes)=62849024
ImportTsv
Bad Lines=0
File Input Format Counters
Bytes Read=1703
File Output Format Counters
Bytes Written=0
2019-01-26 14:35:56,304 INFO [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner: Finishing task: attempt_local98574210_0001_m_000000_0
2019-01-26 14:35:56,304 INFO [Thread-55] mapred.LocalJobRunner: map task executor complete.
2019-01-26 14:35:56,860 INFO [main] mapreduce.Job: Job job_local98574210_0001 running in uber mode : false
2019-01-26 14:35:56,862 INFO [main] mapreduce.Job: map 100% reduce 0%
2019-01-26 14:35:56,866 INFO [main] mapreduce.Job: Job job_local98574210_0001 completed successfully
2019-01-26 14:35:56,899 INFO [main] mapreduce.Job: Counters: 16
File System Counters
FILE: Number of bytes read=37574934
FILE: Number of bytes written=38237355
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=30
Map output records=30
Input split bytes=93
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=8
Total committed heap usage (bytes)=62849024
ImportTsv
Bad Lines=0
File Input Format Counters
Bytes Read=1703
File Output Format Counters
Bytes Written=0

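Importing the TSV into HBase. Here the file was imported directly rather than from HDFS: 2.6 GB, 50 million records, and it took a full 29 minutes. Would it be faster to put the file into HDFS first and import from there?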

/opt/hbase/latest/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:key1,cf:key2,cf:key3,cf:key4,cf:key5 worktable posts.txt
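
To make the -Dimporttsv.columns mapping concrete, here is a minimal Python sketch of how one TSV line lines up with that specification. It only illustrates the positional mapping and is not ImportTsv's own code; the function name map_line and the sample values are made up for illustration, and the strict field-count check is a choice made for this sketch.

```python
# Illustration only: mimics the positional mapping described by
# -Dimporttsv.columns. This is not how ImportTsv is implemented.
COLUMNS = "HBASE_ROW_KEY,cf:key1,cf:key2,cf:key3,cf:key4,cf:key5"


def map_line(line, columns=COLUMNS, sep="\t"):
    """Return (row_key, {"family:qualifier": value}) for one TSV line."""
    names = columns.split(",")
    fields = line.rstrip("\n").split(sep)
    # Sketch-level simplification: require exactly one field per column name.
    if len(fields) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(fields)}")
    row_key = None
    cells = {}
    for name, value in zip(names, fields):
        if name == "HBASE_ROW_KEY":
            row_key = value          # this position becomes the row key
        else:
            cells[name] = value      # every other position becomes a cell
    return row_key, cells


if __name__ == "__main__":
    # Hypothetical sample line with six tab-separated fields.
    print(map_line("1001\ta\tb\tc\td\te"))
```

The position of HBASE_ROW_KEY in the list decides which field becomes the row key; every other position is stored under the named column family and qualifier.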

Update 2019-01-28: on double quotes in the TSV file and tab characters inside field values:

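If the mysql -e export used OPTIONALLY ENCLOSED BY '\"', every string-typed field in the exported TSV file is wrapped in double quotes. After importing into HBase with the command above, the double quotes show up inside the field values. So OPTIONALLY ENCLOSED BY '\"' is not recommended when exporting with mysql -e.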

If a field value in a MySQL record contains a tab, it is escaped automatically on export, as shown below:

40	2	,\	[bot]	1528869876
41 2 [bot], 1528869876
42 2 t\ [bot]" 1528869876
43 2 t\ [bot]' 1528869876
44 2 't\ [bot]' 1528869876
45 2 "t\ [bot]" 1528869876
46 2 t\ [bot] 1528869876
47 2 tab\ \ [bot] 1528869876

This depends on whether OPTIONALLY ENCLOSED BY '\"' is used. The output above was produced without the option; the output below was produced with it:

40	2	",	[bot]"	1528869876
41 2 "[bot]," 1528869876
42 2 "t [bot]\"" 1528869876
43 2 "t [bot]'" 1528869876
44 2 "'t [bot]'" 1528869876
45 2 "\"t [bot]\"" 1528869876
46 2 "t [bot]" 1528869876
47 2 "tab [bot]" 1528869876

As you can see, with this option tabs are no longer escaped; double quotes are escaped instead.

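For ImportTsv, however, the rows containing tabs cannot be imported correctly from either of the two TSV files above; they are treated as Bad Lines. This is because ImportTsv handles the separator by simply scanning the line one byte at a time, and does not recognize an escaped tab. For details, see the public ParsedLine parse(byte[] lineBytes, int length) method in its source code: https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java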
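
The effect can be reproduced with a minimal Python sketch that splits a line on every tab byte and ignores backslash escapes. It mirrors the behaviour described above rather than the actual Java implementation; count_fields is a name chosen for illustration, and the two rows are rows 41 and 40 from the sample, rebuilt here with explicit tabs on the assumption that each row has four columns.

```python
def count_fields(line: bytes, sep: bytes = b"\t") -> int:
    """Count fields by splitting on every separator byte; escapes are ignored."""
    return line.count(sep) + 1


EXPECTED_FIELDS = 4  # assumption: the sample rows have four columns

# Row 41 has no tab inside its third field; row 40 has a backslash-escaped one.
row_41 = b"41\t2\t[bot],\t1528869876"
row_40 = b"40\t2\t,\\\t[bot]\t1528869876"

if __name__ == "__main__":
    for row in (row_41, row_40):
        n = count_fields(row)
        status = "ok" if n == EXPECTED_FIELDS else "bad line"
        print(row, n, status)
```

Because the backslash is just another byte to such a parser, the escaped tab still ends a field, and the row ends up with one column more than expected.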

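So if field values contain tabs, either switch to a separator that does not conflict, or replace the tabs with something else (for example spaces) when generating the TSV.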
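
The second option can be sketched in Python as a small clean-up step applied while writing the TSV. It assumes the rows are already available as sequences of field values (for example from a database cursor); sanitize_field and to_tsv_line are hypothetical helper names, and replacing each tab or line break with a single space is just one possible choice.

```python
def sanitize_field(value: str) -> str:
    """Replace characters that would break a one-record-per-line TSV."""
    for ch in ("\t", "\r", "\n"):
        value = value.replace(ch, " ")
    return value


def to_tsv_line(fields) -> str:
    """Sanitize each field, then join the fields with tabs."""
    return "\t".join(sanitize_field(str(f)) for f in fields)


if __name__ == "__main__":
    # Row 47 from the sample: its third field contains two tabs.
    print(to_tsv_line([47, 2, "tab\t\t[bot]", 1528869876]))
```

After this step the only tabs left in a line are the separators, so the number of fields always matches the number of columns.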

==

In the hbase shell, count 'worktable' is very slow; it took about an hour to finish:

Current count: 49458000, row: 9999791
49458230 row(s)
Took 3684.2802 seconds
=> 49458230
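
For a sense of scale, the figures reported above can be turned into rough rates. This is plain arithmetic on the numbers in this post, written in Python; it assumes "2.6 GB" means 2.6 × 10^9 bytes and treats "50 million records" and "29 minutes" as exact.

```python
# Figures taken from this post (import) and the count output above.
import_rows = 50_000_000
import_seconds = 29 * 60
import_bytes = 2.6e9          # assumption: decimal gigabytes
count_rows = 49_458_230
count_seconds = 3684.2802

import_rate = import_rows / import_seconds            # rows per second
import_mb_s = import_bytes / import_seconds / 1e6     # megabytes per second
count_rate = count_rows / count_seconds               # rows per second

print(f"import: {import_rate:,.0f} rows/s, {import_mb_s:.2f} MB/s")
print(f"count:  {count_rate:,.0f} rows/s")
```

Both rates come out in the tens of thousands of rows per second, with the count running at roughly half the row rate of the import.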

get, on the other hand, is fast:

hbase(main):056:0> get 'smth','1995'
COLUMN CELL
cf:post_time timestamp=1548515983185, value=876546980
cf:user_id timestamp=1548515983185, value=554
cf:username timestamp=1548515983185, value="aaa"
1 row(s)
Took 0.0882 seconds
hbase(main):057:0> get 'smth','49471229'
COLUMN CELL
cf:post_time timestamp=1548515983185, value=1546941261
cf:user_id timestamp=1548515983185, value=161838
cf:username timestamp=1548515983185, value="bbb"
1 row(s)
Took 0.0873 seconds
