如何在没有重复项的配置单元中执行LEFT OUTER JOIN以仅检查右表中的一个值?

时间:2022-06-23 15:34:27

I have two datasets: 1)

我有两个数据集:1)

Table A 
id  name
1   raju
2   ramu
2   ramu
3   rakesh

Table A

表A.

2)

2)

Table B 
id  Status
1   Y
1   Y
2   N
2   N
2   Y
3   N

Table B

表B.

I want to perform a left outer join (Table A Left outer join Table B) in hive in such a way that while joining the two datasets on column 'id', if the Status column in table B has 'Y' at least once, then the resultant dataset will have the Status as 'Yes' like below:

我想在hive中执行左外连接(表A左外连接表B),以便在列'id'上连接两个数据集时,如果表B中的Status列至少有一次'Y',那么结果数据集的状态为“是”,如下所示:

Final Result:       
id  name    Status
1   raju    Yes
2   ramu    Yes
2   ramu    Yes
3   rakesh  No

I do not want to increase the number of records in the result while performing the join. The result table should have only 4 records, not 5(increased records based on the matching join condition).

我不想在执行连接时增加结果中的记录数。结果表应该只有4条记录,而不是5条记录(根据匹配的连接条件增加记录)。

How do I achieve this?

我该如何实现这一目标?

1 个解决方案

#1


1  

To meet this requirement, you need to reduce the row count of your table B to either one row or zero rows per row of table A.

要满足此要求,您需要将表B的行数减少到表A的每行或每行零行。

Do that like this, getting one row for every id value with a status of Y.

这样做,为每个id值获取一行,状态为Y.

         SELECT DISTINCT id, Status
           FROM B
          WHERE Status = 'Y'

Then you can use the old LEFT JOIN ... IS NOT NULL trick to figure out which rows of A have matching rows of B.

然后你可以使用旧的LEFT JOIN ... IS NOT NULL技巧来确定A的哪些行具有匹配的B行。

This does it. (http://sqlfiddle.com/#!9/71d84b/1/0)

这样做。 (http://sqlfiddle.com/#!9/71d84b/1/0)

SELECT A.id, A.name, 
       CASE WHEN B.Status IS NOT NULL THEN 'Yes' ELSE 'No' END Status
  FROM A
  LEFT JOIN (
         SELECT DISTINCT id, Status
           FROM B
          WHERE Status = 'Y'
       ) B ON A.id = B.id

#1


1  

To meet this requirement, you need to reduce the row count of your table B to either one row or zero rows per row of table A.

要满足此要求,您需要将表B的行数减少到表A的每行或每行零行。

Do that like this, getting one row for every id value with a status of Y.

这样做,为每个id值获取一行,状态为Y.

         SELECT DISTINCT id, Status
           FROM B
          WHERE Status = 'Y'

Then you can use the old LEFT JOIN ... IS NOT NULL trick to figure out which rows of A have matching rows of B.

然后你可以使用旧的LEFT JOIN ... IS NOT NULL技巧来确定A的哪些行具有匹配的B行。

This does it. (http://sqlfiddle.com/#!9/71d84b/1/0)

这样做。 (http://sqlfiddle.com/#!9/71d84b/1/0)

SELECT A.id, A.name, 
       CASE WHEN B.Status IS NOT NULL THEN 'Yes' ELSE 'No' END Status
  FROM A
  LEFT JOIN (
         SELECT DISTINCT id, Status
           FROM B
          WHERE Status = 'Y'
       ) B ON A.id = B.id