HBase Thrift: How do I connect to a remote HBase master/cluster?

Time: 2021-10-29 00:55:07

Thanks to the Cloudera distribution, I have an HBase master/datanode plus a Thrift server running on my local machine, and I can write, test and use HBase client programs against it without any problems.

However, I now need to use Thrift in production, and I'm not able to find documentation on how to get Thrift running with a production HBase cluster.

From what I understand, I will need to run the hbase-thrift program on the client node since the Thrift program is just another intermediate client to HBase.

So I'm guessing I somehow have to tell HBase-Thrift the hostname/IP of the master node? How would I do this?

Also, any suggestions on how to scale this up in production? Do I only need a setup like this:

Client <-> Thrift client <-> HBase Master <-> Multiple HBase workers

1 answer

Get it running

You don't have to run a Thrift server on your local machine; it can run anywhere, but the RegionServers are usually a good place*. In your code, you then connect to that server.

A Python example:

from thrift.transport import TSocket
transport = TSocket.TSocket("random-regionserver", 9090)  # 9090 is the default HBase Thrift port

Replace random-regionserver with the host name of one of the servers on which you run the Thrift server.
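
Before you wire up the generated HBase Thrift bindings, it can help to confirm from the client machine that the chosen Thrift server is actually listening. The sketch below uses only the standard library. The function name `thrift_server_reachable` is hypothetical and is not part of Thrift or HBase. It only checks that a TCP connection to host:port succeeds. It does not check that the listener speaks the HBase Thrift protocol.

```python
import socket


def thrift_server_reachable(host: str, port: int = 9090, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it only proves that something is listening on the
    port (9090 is the HBase Thrift server's default), not that it is an
    HBase Thrift server.
    """
    try:
        # create_connection resolves the host name and tries each address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections and timeouts.
        return False
```

For example, `thrift_server_reachable("random-regionserver")` should return True before you create the TSocket transport for that host.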

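The Thrift server gets its configuration from the usual places. It does not need the master's address. Like any HBase client, it finds the master and the RegionServers through ZooKeeper, so what you configure is the ZooKeeper quorum. If you're using CDH, you'll find the configuration in /etc/hbase/conf/hbase-site.xml, where you need to add the hbase.zookeeper.quorum property: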

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>comma-separated list of your ZooKeeper servers</value>
</property>
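
Adding the property by hand is usually enough. If you manage several Thrift hosts, a small script can keep the value consistent across them. Treat the following as an illustrative sketch. The helper name `set_zookeeper_quorum` and the example host names are hypothetical. The script uses Python's standard `xml.etree.ElementTree`, which drops XML comments and the original formatting when it rewrites the file, so keep a backup of the original.

```python
import xml.etree.ElementTree as ET

QUORUM_KEY = "hbase.zookeeper.quorum"


def set_zookeeper_quorum(site_xml_path, zk_hosts):
    """Add or update hbase.zookeeper.quorum in an hbase-site.xml file.

    zk_hosts is joined into the comma-separated form HBase expects.
    ElementTree discards comments and formatting on rewrite.
    """
    tree = ET.parse(site_xml_path)
    root = tree.getroot()  # the <configuration> element
    value = ",".join(zk_hosts)
    for prop in root.findall("property"):
        if prop.findtext("name") == QUORUM_KEY:
            # Property already present: overwrite its value.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Property missing: append a new <property> block.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = QUORUM_KEY
        ET.SubElement(prop, "value").text = value
    tree.write(site_xml_path, encoding="utf-8", xml_declaration=True)
```

For example, `set_zookeeper_quorum("/etc/hbase/conf/hbase-site.xml", ["zk1.example.com", "zk2.example.com", "zk3.example.com"])` (hypothetical hosts). Then restart the Thrift server so that it rereads the file.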

If you start the Thrift server from the downloaded Apache distribution instead, the setup is the same, except that hbase-site.xml will probably sit in a different directory (typically conf/ inside the unpacked distribution).
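
A sketch like the following can show which hbase-site.xml a machine is likely to pick up and what quorum it contains. The search order is an assumption. It tries `$HBASE_CONF_DIR`, then `$HBASE_HOME/conf` (the unpacked Apache layout), then the CDH location /etc/hbase/conf. The function names are hypothetical. If the property is missing, HBase falls back to its default quorum of localhost, which is why a single-machine setup works without it.

```python
import os
import xml.etree.ElementTree as ET


def candidate_conf_dirs(extra_dirs=()):
    """Assumed search order for the directory holding hbase-site.xml."""
    dirs = []
    if os.environ.get("HBASE_CONF_DIR"):
        dirs.append(os.environ["HBASE_CONF_DIR"])
    if os.environ.get("HBASE_HOME"):
        # Unpacked Apache distribution: <install dir>/conf/hbase-site.xml
        dirs.append(os.path.join(os.environ["HBASE_HOME"], "conf"))
    dirs.extend(extra_dirs)
    dirs.append("/etc/hbase/conf")  # CDH package location
    return dirs


def find_hbase_site(extra_dirs=()):
    """Return the path of the first hbase-site.xml found, or None."""
    for conf_dir in candidate_conf_dirs(extra_dirs):
        path = os.path.join(conf_dir, "hbase-site.xml")
        if os.path.isfile(path):
            return path
    return None


def read_property(site_xml_path, name):
    """Return the <value> text of the named property, or None if absent."""
    root = ET.parse(site_xml_path).getroot()
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None
```

If `find_hbase_site()` returns a path, `read_property(path, "hbase.zookeeper.quorum")` shows the configured quorum. A Thrift server started with a different environment may still resolve a different directory.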

Scaling it up

One easy way to scale up right now is to keep a list of all the RegionServers in your Thrift client and pick one at random when you connect. Alternatively, create multiple connections and use a random one each time. Some language bindings (e.g. PHP) have a TSocketPool to which you can pass all your servers. Otherwise, you need to do some of this work manually.
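
For bindings without a TSocketPool, you can write the "pick one at random, try another if it fails" idea by hand. Treat this as a minimal sketch, not an existing Thrift API. `RandomServerPool` is a hypothetical class. `connect` is a function you supply: it opens a connection (with the Python bindings it would build the TSocket, transport and client) and raises an exception when a server cannot be reached.

```python
import random


class RandomServerPool:
    """Pick a random Thrift server per attempt and fall back to the others on failure.

    Hypothetical helper. connect(host, port) is caller-supplied and must
    raise an exception if the server is unreachable.
    """

    def __init__(self, servers, connect, rng=None):
        if not servers:
            raise ValueError("need at least one (host, port) pair")
        self.servers = list(servers)
        self.connect = connect
        self.rng = rng if rng is not None else random.Random()

    def get_connection(self):
        # Shuffle so the first server tried is random and the rest are fallbacks.
        order = self.servers[:]
        self.rng.shuffle(order)
        errors = []
        for host, port in order:
            try:
                return self.connect(host, port)
            except Exception as exc:  # Thrift transport errors are not OSError
                errors.append((host, port, exc))
        raise ConnectionError(f"no Thrift server reachable: {errors}")
```

Calling `get_connection()` once per request spreads the load across servers. Calling it once at startup gives the "pick one at random on connect" variant.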

With this technique, reads and writes should be spread more or less evenly across the Thrift servers in your cluster. However, the Thrift server still translates each read or write operation it receives into a Java API call. It then opens a network connection to the appropriate RegionServer(s) to perform the requested action.

That means you won't get performance as good as you would by using the Java API directly. It might help to cache region locations yourself and send each request to the Thrift server on the RegionServer that hosts the region. Even then, the Thrift server makes an additional Java API call, even if that call ends up on the local server. HBASE-4460 would help in this scenario, but it is not included in CDH3u4 or CDH4.
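
The region-location cache mentioned above can be sketched as follows. The sketch assumes a Thrift server is co-located on every RegionServer. `RegionLocationCache` is a hypothetical name. How you populate it is left open (for example, from the Thrift API's region listing, if your HBase version exposes server names there). The lookup relies on how HBase splits a table. Each region holds rows from its start key (inclusive) up to the next region's start key (exclusive), and the first region's start key is empty. The owning region is therefore the one with the greatest start key that is less than or equal to the row key.

```python
import bisect


class RegionLocationCache:
    """Map a row key to the server hosting its region, from cached start keys.

    Hypothetical helper; the cache goes stale when regions split or move,
    so rebuild it when a request to the cached server fails.
    """

    def __init__(self, regions):
        # regions: iterable of (start_key: bytes, server: str)
        ordered = sorted(regions)
        self.start_keys = [start for start, _ in ordered]
        self.servers = [server for _, server in ordered]
        if not self.start_keys or self.start_keys[0] != b"":
            raise ValueError("the first region must have an empty start key")

    def server_for(self, row_key: bytes) -> str:
        # bisect_right - 1 is the index of the greatest start key <= row_key.
        idx = bisect.bisect_right(self.start_keys, row_key) - 1
        return self.servers[idx]
```

You would then open (or reuse) a connection to the Thrift server on `cache.server_for(row)` for each request.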

* There is an issue, HBASE-4460, that embeds a Thrift server in the RegionServer itself.
