Suppose my query is: "select b.* from sourav_test.test1 a inner join sourav_test.test2 b on a.id = b.id". What is the best and most efficient way to write this in Apache Beam?
1 solution
#1
In Apache Beam SDK 2.5, a good approach is to use the join library extension, which performs SQL-like joins. For an inner join, the signature is as follows:
Join.innerJoin(PCollection<KV<K, V1>> leftCollection, PCollection<KV<K, V2>> rightCollection)
In your case, the left and right collections are the two collections to be inner joined (test1 and test2). K is the type of the key shared by both collections (here, the id), and V1 and V2 are the value types of the left and right collections respectively.
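To make the roles of K, V1 and V2 concrete, here is a minimal plain-Java sketch that models what an inner join by key produces: one (key, (left value, right value)) entry per matching pair. It deliberately has no Beam dependency; the class name, the `kv` helper and the sample rows are hypothetical, and `Map.Entry` merely stands in for Beam's `KV`. It illustrates the semantics only, not how the join library is implemented.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of inner-join semantics; NOT Beam code and not the
// join library's implementation. All names here are hypothetical.
class InnerJoinSketch {

    // Small helper to build a key/value pair (stands in for Beam's KV).
    static <K, V> Map.Entry<K, V> kv(K key, V value) {
        return new SimpleEntry<>(key, value);
    }

    // Emits one (key, (leftValue, rightValue)) entry per matching pair,
    // mirroring the KV<K, KV<V1, V2>> shape of a joined collection.
    static <K, V1, V2> List<Map.Entry<K, Map.Entry<V1, V2>>> innerJoin(
            List<Map.Entry<K, V1>> left, List<Map.Entry<K, V2>> right) {
        // Index the right side by key so each left row can look up its matches.
        Map<K, List<V2>> rightByKey = new HashMap<>();
        for (Map.Entry<K, V2> r : right) {
            rightByKey.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }

        List<Map.Entry<K, Map.Entry<V1, V2>>> joined = new ArrayList<>();
        for (Map.Entry<K, V1> l : left) {
            List<V2> matches = rightByKey.get(l.getKey());
            if (matches == null) {
                continue; // inner join: keys missing on the right are dropped
            }
            for (V2 rightValue : matches) {
                Map.Entry<V1, V2> pair = kv(l.getValue(), rightValue);
                joined.add(kv(l.getKey(), pair));
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical rows keyed by id, standing in for test1 (a) and test2 (b).
        List<Map.Entry<Integer, String>> a = Arrays.asList(kv(1, "a-row-1"), kv(2, "a-row-2"));
        List<Map.Entry<Integer, String>> b = Arrays.asList(kv(2, "b-row-2"), kv(3, "b-row-3"));

        // "select b.*" corresponds to keeping only the right-hand value of each pair.
        for (Map.Entry<Integer, Map.Entry<String, String>> row : innerJoin(a, b)) {
            System.out.println(row.getValue().getValue());
        }
    }
}
```

In Beam itself, the joined result has the type PCollection<KV<K, KV<V1, V2>>>, so the equivalent of "select b.*" is to map each element to its getValue().getValue(), just as the loop in main does here.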