I have two datasets: 1)
我有两个数据集:1)
Table A
id name
1 raju
2 ramu
2 ramu
3 rakesh
表A.
2)
2)
Table B
id Status
1 Y
1 Y
2 N
2 N
2 Y
3 N
表B.
I want to perform a left outer join (Table A Left outer join Table B) in hive in such a way that while joining the two datasets on column 'id', if the Status column in table B has 'Y' at least once, then the resultant dataset will have the Status as 'Yes' like below:
我想在hive中执行左外连接(表A左外连接表B),以便在列'id'上连接两个数据集时,如果表B中的Status列至少有一次'Y',那么结果数据集的状态为“是”,如下所示:
Final Result:
id name Status
1 raju Yes
2 ramu Yes
2 ramu Yes
3 rakesh No
I do not want to increase the number of records in the result while performing the join. The result table should have only 4 records, not 5(increased records based on the matching join condition).
我不想在执行连接时增加结果中的记录数。结果表应该只有4条记录,而不是5条记录(根据匹配的连接条件增加记录)。
How do I achieve this?
我该如何实现这一目标?
1 个解决方案
#1
1
To meet this requirement, you need to reduce the row count of your table B to either one row or zero rows per row of table A.
要满足此要求,您需要将表B的行数减少到表A的每行或每行零行。
Do that like this, getting one row for every id
value with a status of Y
.
这样做,为每个id值获取一行,状态为Y.
SELECT DISTINCT id, Status
FROM B
WHERE Status = 'Y'
Then you can use the old LEFT JOIN ... IS NOT NULL
trick to figure out which rows of A have matching rows of B.
然后你可以使用旧的LEFT JOIN ... IS NOT NULL技巧来确定A的哪些行具有匹配的B行。
This does it. (http://sqlfiddle.com/#!9/71d84b/1/0)
这样做。 (http://sqlfiddle.com/#!9/71d84b/1/0)
SELECT A.id, A.name,
CASE WHEN B.Status IS NOT NULL THEN 'Yes' ELSE 'No' END Status
FROM A
LEFT JOIN (
SELECT DISTINCT id, Status
FROM B
WHERE Status = 'Y'
) B ON A.id = B.id
#1
1
To meet this requirement, you need to reduce the row count of your table B to either one row or zero rows per row of table A.
要满足此要求,您需要将表B的行数减少到表A的每行或每行零行。
Do that like this, getting one row for every id
value with a status of Y
.
这样做,为每个id值获取一行,状态为Y.
SELECT DISTINCT id, Status
FROM B
WHERE Status = 'Y'
Then you can use the old LEFT JOIN ... IS NOT NULL
trick to figure out which rows of A have matching rows of B.
然后你可以使用旧的LEFT JOIN ... IS NOT NULL技巧来确定A的哪些行具有匹配的B行。
This does it. (http://sqlfiddle.com/#!9/71d84b/1/0)
这样做。 (http://sqlfiddle.com/#!9/71d84b/1/0)
SELECT A.id, A.name,
CASE WHEN B.Status IS NOT NULL THEN 'Yes' ELSE 'No' END Status
FROM A
LEFT JOIN (
SELECT DISTINCT id, Status
FROM B
WHERE Status = 'Y'
) B ON A.id = B.id