HBase Source Code Analysis: The Evolution of Client Connections

Date: 2021-03-31 08:37:30

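I had been looking for how HBase implements client-side pooling, but everything I found described older versions whose classes are now marked @deprecated.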

1. Earlier versions used the HTablePool class

/**
* A simple pool of HTable instances.
* Note: this is a pool of HTable instances, not a thread pool.
*
* Each HTablePool acts as a pool for all tables. To use, instantiate an HTablePool and use {@link #getTable(String)} to get an HTable from the pool.
* Note: call getTable to obtain an HTable instance.
*
* Once you are done with it, close your instance of {@link HTableInterface} by calling {@link HTableInterface#close()} rather than returning the tables to the pool with (deprecated) {@link #putTable(HTableInterface)}.
* Note: when finished, call close() on the instance instead of returning it to the pool.
*
* A pool can be created with a <i>maxSize</i> which defines the most HTable references that will ever be retained for each table. Otherwise the default is {@link Integer#MAX_VALUE}.
* Note: maxSize caps how many HTable references are retained per table; if it is not set, the default is Integer.MAX_VALUE.
*
* Pool will manage its own connections to the cluster. See {@link HConnectionManager}.
* Note: the pool manages its own connections to the cluster through HConnectionManager.
*
* @deprecated as of 0.98.1. See {@link HConnection#getTable(String)}.
* Note: deprecated as of 0.98.1; the recommended replacement is HConnection#getTable(String).
*/

@InterfaceAudience.Private
@Deprecated
public class HTablePool implements Closeable {
...
}
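The maxSize behavior described in the Javadoc above can be pictured with a generic bounded pool. What follows is only a minimal sketch using the JDK: the class BoundedPoolSketch and its methods borrow, giveBack, and idleCount are hypothetical names invented for illustration, and this is not how HTablePool is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical bounded pool (not HBase code): retains at most maxSize idle instances. */
public class BoundedPoolSketch<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final int maxSize;
    private final Supplier<T> factory;

    public BoundedPoolSketch(int maxSize, Supplier<T> factory) {
        this.maxSize = maxSize;
        this.factory = factory;
    }

    /** Hands out a retained idle instance if there is one, otherwise creates a new one. */
    public synchronized T borrow() {
        T item = idle.pollFirst();
        return item != null ? item : factory.get();
    }

    /** Retains the instance only while fewer than maxSize are idle; returns false if it was dropped. */
    public synchronized boolean giveBack(T item) {
        if (idle.size() < maxSize) {
            idle.addFirst(item);
            return true;
        }
        return false; // a real pool would release the instance's resources here
    }

    public synchronized int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        BoundedPoolSketch<StringBuilder> pool = new BoundedPoolSketch<>(1, StringBuilder::new);
        StringBuilder a = pool.borrow();
        StringBuilder b = pool.borrow();
        System.out.println(pool.giveBack(a)); // true: retained
        System.out.println(pool.giveBack(b)); // false: dropped because maxSize is 1
        System.out.println(pool.idleCount()); // 1
    }
}
```

The sketch shows why such a pool adds bookkeeping without much benefit once obtaining an instance is cheap, which is the direction the later APIs in this article take.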

Next I looked at HConnectionManager and found that ConnectionFactory is the recommended replacement.

/*
*...
* @deprecated Please use ConnectionFactory instead
* Note: ConnectionFactory is the recommended replacement.
*/
@InterfaceAudience.Public
@InterfaceStability.Evolving
@Deprecated
public class HConnectionManager extends ConnectionFactory {
...
}

2. From HBase 0.98.1 onward, the HConnection class was used

/**
* A cluster connection. Knows how to find the master, locate regions out on the cluster, keeps a cache of locations and then knows how to re-calibrate after they move. You need one of these to talk to your HBase cluster. {@link HConnectionManager} manages instances of this class. See it for how to get one of these.
* Note: a connection to the cluster that can find the master, locate regions, cache their locations, and re-locate them after they move. Instances are managed by HConnectionManager.
*
* <p>This is NOT a connection to a particular server but to ALL servers in the cluster. Individual connections are managed at a lower level.
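* Note: this is a connection to all servers in the cluster rather than to one particular server; the individual connections are managed at a lower level.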
*
* <p>HConnections are used by {@link HTable} mostly but also by {@link HBaseAdmin}, and {@link org.apache.hadoop.hbase.zookeeper.MetaTableLocator}. HConnection instances can be shared. Sharing is usually what you want because rather than each HConnection instance having to do its own discovery of regions out on the cluster, instead, all clients get to share the one cache of locations. {@link HConnectionManager} does the sharing for you if you go by it getting connections. Sharing makes cleanup of HConnections awkward. See {@link HConnectionManager} for cleanup discussion.
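* <p>Note: HConnections are mostly used by HTable, but also by HBaseAdmin and MetaTableLocator. Instances can be shared, so instead of each instance discovering regions on its own, all clients share one cache of locations. HConnectionManager handles the sharing when connections are obtained through it, although sharing makes cleaning up HConnections awkward.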
*
* @see HConnectionManager
* @deprecated in favor of {@link Connection} and {@link ConnectionFactory}
* Note: use Connection and ConnectionFactory instead of this class.
*/
@InterfaceAudience.Public
@InterfaceStability.Stable
@Deprecated
public interface HConnection extends Connection {
...
/**
* Note: this method returns an HTableInterface instance.
* Retrieve an HTableInterface implementation for access to a table.
* Note: the returned HTableInterface instance is used to operate on the table.
* The returned HTableInterface is not thread safe, a new instance should be created for each using thread.
* Note: the returned table is not thread-safe; create a new instance for each thread.
* This is a lightweight operation, pooling or caching of the returned HTableInterface is neither required nor desired.
* Note: this is a lightweight operation, so do not pool or cache the returned HTableInterface instances.
* @param tableName
* @return an HTable to use for interactions with this table
*/
@Override
public HTableInterface getTable(TableName tableName) throws IOException;

...
}

So I opened HTableInterface and found that it has been superseded by Table.

/**
* @since 0.21.0
* @deprecated use {@link org.apache.hadoop.hbase.client.Table} instead
*/

@Deprecated
public interface HTableInterface extends Table {
...
}

It was worth checking HTable as well, and sure enough, it has also been superseded by Table.

/**
* Creates an object to access a HBase table.
* @param conf Configuration object to use.
* @param tableName Name of the table.
* @throws IOException if a remote or network exception occurs
* @deprecated Constructing HTable objects manually has been deprecated. Please use
* {@link Connection} to instantiate a {@link Table} instead.
* Note: constructing HTable directly is replaced by obtaining a Table from a Connection.
*/

@Deprecated
public HTable(Configuration conf, final String tableName)
throws IOException {
this(conf, TableName.valueOf(tableName));
}
...

3. The current version uses ConnectionFactory to manage Connection

/**
* A cluster connection encapsulating lower level individual connections to actual servers and a connection to zookeeper. Connections are instantiated through the {@link ConnectionFactory} class. The lifecycle of the connection is managed by the caller, who has to {@link #close()} the connection to release the resources.
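* Note: a cluster connection that encapsulates the lower-level individual connections to the actual servers plus a connection to ZooKeeper. Connections are created through ConnectionFactory; the caller manages the lifecycle and must call close() to release the resources.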
*
* <p> The connection object contains logic to find the master, locate regions out on the cluster, keeps a cache of locations and then knows how to re-calibrate after they move.
* The individual connections to servers, meta cache, zookeeper connection, etc are all shared by the {@link Table} and {@link Admin} instances obtained from this connection.
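* Note: the individual server connections, the meta cache, and the ZooKeeper connection are shared by all Table and Admin instances obtained from this connection.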
*
* <p> Connection creation is a heavy-weight operation. Connection implementations are thread-safe, so that the client can create a connection once, and share it with different threads. {@link Table} and {@link Admin} instances, on the other hand, are light-weight and are not thread-safe. Typically, a single connection per client application is instantiated and every thread will obtain its own Table instance. Caching or pooling of {@link Table} and {@link Admin} is not recommended.
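* <p> Note: creating a connection is a heavyweight operation. Connection implementations are thread-safe, so a client can create one connection and share it across threads. Table and Admin instances are lightweight and not thread-safe. Typically an application creates a single connection and each thread obtains its own Table instance. Caching or pooling Table and Admin instances is not recommended.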
*
* @see ConnectionFactory
* @since 0.99.0
*/
@InterfaceAudience.Public
@InterfaceStability.Evolving
public interface Connection extends Abortable, Closeable {
...
}

So from now on, use ConnectionFactory to create a Connection, then obtain a Table instance from it to operate on tables.

Usage:

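import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

// Loads hbase-site.xml from the classpath; adjust if the configuration comes from elsewhere
Configuration config = HBaseConfiguration.create();
// try-with-resources closes the table first, then the connection, even if getTable throws
try (Connection connection = ConnectionFactory.createConnection(config);
     Table table = connection.getTable(TableName.valueOf("table1"))) {
  // Use the table as needed, for a single operation and a single thread
}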
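The Connection Javadoc quoted earlier recommends creating one connection, sharing it across threads, and letting each thread obtain and close its own Table. The sketch below illustrates only that shape and runs with the JDK alone: FakeConnection and FakeTable are hypothetical stand-ins invented here, not HBase classes. In real code they would presumably correspond to the Connection returned by ConnectionFactory.createConnection and the Table returned by connection.getTable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedConnectionSketch {

    /** Hypothetical stand-in for a heavyweight, thread-safe connection. */
    static final class FakeConnection implements AutoCloseable {
        final AtomicInteger tablesOpened = new AtomicInteger();
        final AtomicInteger tablesClosed = new AtomicInteger();

        FakeTable getTable(String name) {
            tablesOpened.incrementAndGet();
            return new FakeTable(name, this);
        }

        @Override
        public void close() {
            // A real connection would release sockets and caches here.
        }
    }

    /** Hypothetical stand-in for a lightweight handle that stays on one thread. */
    static final class FakeTable implements AutoCloseable {
        private final String name;
        private final FakeConnection owner;

        FakeTable(String name, FakeConnection owner) {
            this.name = name;
            this.owner = owner;
        }

        String get(String row) {
            return name + ":" + row;
        }

        @Override
        public void close() {
            owner.tablesClosed.incrementAndGet();
        }
    }

    /** Runs `workers` tasks that share one connection; returns {opened, closed} table counts. */
    static int[] run(int workers) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(workers);
        try (FakeConnection connection = new FakeConnection()) {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                final String row = "row" + i;
                results.add(executor.submit(() -> {
                    // Each task obtains its own table handle and closes it when done.
                    try (FakeTable table = connection.getTable("table1")) {
                        return table.get(row);
                    }
                }));
            }
            for (Future<String> result : results) {
                result.get(); // wait for every task before the connection is closed
            }
            return new int[] {connection.tablesOpened.get(), connection.tablesClosed.get()};
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] counts = run(4);
        System.out.println(counts[0] + " opened, " + counts[1] + " closed"); // 4 opened, 4 closed
    }
}
```

The point of the design is that the expensive object is created once and closed once by its owner, while the cheap per-thread handles are opened and closed around each unit of work, so no pool of handles is needed.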