I need to access Bigtable from one of the transforms in a streaming Dataflow job. As far as I know, there are two ways:
1) Create the connection to Bigtable in the startBundle method of the DoFn and read from Bigtable in the processElement method (see the sketch after this list). With this approach, the Dataflow SDK creates a new connection to Bigtable every time a new element comes in on the stream.
2) Create the Bigtable connection when the transform object is constructed and use it in the processElement method. However, the Dataflow SDK creates the object, serializes it, and recreates it on the worker node, so is the connection still active on the worker? And in streaming mode, is it good to keep a Bigtable connection open for a long period?
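Here is a minimal sketch of approach 1), assuming the Dataflow 1.x SDK and the HBase-flavored Bigtable client (bigtable-hbase). The class name, project/instance IDs, and table name are placeholders I made up for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import com.google.cloud.bigtable.hbase.BigtableConfiguration;
import com.google.cloud.dataflow.sdk.transforms.DoFn;

public class EnrichFromBigtableFn extends DoFn<String, String> {

  // Connections are not serializable, so the field must be transient
  // and re-created on the worker.
  private transient Connection connection;

  @Override
  public void startBundle(Context c) throws IOException {
    // Opened once per bundle; in streaming mode bundles can be very
    // small, so this may reconnect frequently.
    connection = BigtableConfiguration.connect("my-project", "my-instance");
  }

  @Override
  public void processElement(ProcessContext c) throws IOException {
    try (Table table = connection.getTable(TableName.valueOf("my-table"))) {
      Result row = table.get(new Get(Bytes.toBytes(c.element())));
      if (!row.isEmpty()) {
        c.output(Bytes.toString(row.value()));
      }
    }
  }

  @Override
  public void finishBundle(Context c) throws IOException {
    connection.close();
  }
}
```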
Or is there another, more efficient way to achieve this?
Thanks.
1 Answer
AbstractCloudBigtableTableDoFn maintains the connection in the most optimal way we could think of, which is essentially a singleton per VM. It has a getConnection() method which will allow you to access a Connection in a managed way.
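A sketch of what a subclass might look like, based on my reading of bigtable-hbase-dataflow; LookupFn and the table name are placeholders, and the exact constructor and type parameters may differ by version:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import com.google.cloud.bigtable.dataflow.AbstractCloudBigtableTableDoFn;
import com.google.cloud.bigtable.dataflow.CloudBigtableConfiguration;

public class LookupFn extends AbstractCloudBigtableTableDoFn<String, String> {

  public LookupFn(CloudBigtableConfiguration config) {
    super(config);
  }

  @Override
  public void processElement(ProcessContext c) throws IOException {
    // getConnection() hands back the managed, VM-wide Connection
    // singleton, so there is no per-bundle or per-element reconnect cost.
    try (Table table =
        getConnection().getTable(TableName.valueOf("my-table"))) {
      Result row = table.get(new Get(Bytes.toBytes(c.element())));
      if (!row.isEmpty()) {
        c.output(Bytes.toString(row.value()));
      }
    }
  }
}
```

You would construct the CloudBigtableConfiguration on the pipeline-building side and pass it to the DoFn; the configuration is serializable, while the underlying connection is created lazily on the worker.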
FWIW, the class is in the bigtable-hbase-dataflow project and not the Dataflow SDK.