I need to return all values from colA that are not in colB from mytable. I am using:
SELECT DISTINCT(colA) FROM mytable WHERE colA NOT IN (SELECT colB FROM mytable)
It works, but the query takes an excessively long time to complete.
Is there a more efficient way to do this?
3 solutions
#1
13
In standard SQL there are no parentheses in DISTINCT colA. DISTINCT is not a function.
SELECT DISTINCT colA
FROM mytable
WHERE colA NOT IN (SELECT DISTINCT colB FROM mytable);
Added DISTINCT to the sub-select as well. If you have many duplicates, it could speed up the query.
A CTE might be faster, depending on your DBMS. I additionally demonstrate LEFT JOIN as an alternative way to exclude the values in colB, and GROUP BY as an alternative way to get distinct values:
WITH x AS (SELECT colB FROM mytable GROUP BY colB)
SELECT m.colA
FROM mytable m
LEFT JOIN x ON x.colB = m.colA
WHERE x.colB IS NULL
GROUP BY m.colA;
Or, simplified further, joining the table to itself directly instead of going through a CTE (probably fastest):
SELECT DISTINCT m.colA
FROM mytable m
LEFT JOIN mytable x ON x.colB = m.colA
WHERE x.colB IS NULL;
There are basically 4 techniques to exclude rows with keys present in another (or the same) table:
- Select rows which are not present in other table
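The list of four techniques is not reproduced in this thread. A common reading is NOT IN, NOT EXISTS, LEFT JOIN / IS NULL, and EXCEPT, but that is an assumption on my part. The sketch below runs all four against a small in-memory SQLite table through Python's sqlite3 module. The sample rows are made up, and neither column contains NULL, which matters because the variants can diverge once NULLs are involved.

```python
import sqlite3

# Hypothetical sample data: colA holds 1..4, colB holds only 2 and 4 (no NULLs).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (colA INTEGER, colB INTEGER)")
conn.executemany(
    "INSERT INTO mytable (colA, colB) VALUES (?, ?)",
    [(1, 2), (2, 2), (3, 4), (4, 4), (1, 4)],
)

queries = {
    "NOT IN": """
        SELECT DISTINCT colA FROM mytable
        WHERE colA NOT IN (SELECT colB FROM mytable)""",
    "NOT EXISTS": """
        SELECT DISTINCT m1.colA FROM mytable m1
        WHERE NOT EXISTS (SELECT 1 FROM mytable m2 WHERE m2.colB = m1.colA)""",
    "LEFT JOIN / IS NULL": """
        SELECT DISTINCT m.colA FROM mytable m
        LEFT JOIN mytable x ON x.colB = m.colA
        WHERE x.colB IS NULL""",
    # EXCEPT is a set operation, so it removes duplicates without DISTINCT.
    "EXCEPT": """
        SELECT colA FROM mytable
        EXCEPT
        SELECT colB FROM mytable""",
}

# Run every variant and collect the sorted colA values it returns.
results = {
    name: sorted(row[0] for row in conn.execute(sql))
    for name, sql in queries.items()
}
for name, values in results.items():
    print(name, values)
```

On this data all four variants agree. Which one is fastest depends on the DBMS and its planner, so it is worth timing them on the real table.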
The deciding factor for speed will be indexes. You need to have indexes on colA and colB for this query to be fast.
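As a rough illustration of the indexing advice, the sketch below creates one index per column and asks SQLite for its query plan. The index names are hypothetical, SQLite merely stands in for whatever DBMS is actually in use, and the plan output differs between engines and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (colA INTEGER, colB INTEGER)")

# Hypothetical index names: one single-column index per column in the anti-join.
conn.execute("CREATE INDEX mytable_cola_idx ON mytable (colA)")
conn.execute("CREATE INDEX mytable_colb_idx ON mytable (colB)")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT DISTINCT m1.colA FROM mytable m1
    WHERE NOT EXISTS (SELECT 1 FROM mytable m2 WHERE m2.colB = m1.colA)
""").fetchall()

# The last column of each plan row is a human-readable description of the step.
plan_text = [row[-1] for row in plan]
for line in plan_text:
    print(line)

# List the indexes that now exist on the table.
index_names = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'mytable'"
    )
)
print(index_names)
```

If the plan shows a full scan of the inner table rather than an index search, the index on colB is not being used and the query will likely stay slow on large tables.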
#2
6
You can use exists:
select distinct
colA
from
mytable m1
where
not exists (select 1 from mytable m2 where m2.colB = m1.colA)
exists does a semi-join to quickly match the values (with not exists, as here, an anti-join). not in builds the entire result set of the subquery and then, in effect, does an or comparison against each of its values. exists is typically faster for values in tables.
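Besides speed, the two forms differ in how they treat NULL, which is worth checking before swapping one for the other. The sketch below uses Python's sqlite3 module and made-up rows in which colB contains a single NULL. Under SQL's three-valued logic, the NOT IN predicate can then never evaluate to true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (colA INTEGER, colB INTEGER)")
# Hypothetical rows; note the single NULL in colB.
conn.executemany(
    "INSERT INTO mytable (colA, colB) VALUES (?, ?)",
    [(1, 2), (2, None), (3, 2)],
)

# colA NOT IN (2, NULL, 2) is either false (colA = 2) or NULL, never true.
not_in = conn.execute("""
    SELECT DISTINCT colA FROM mytable
    WHERE colA NOT IN (SELECT colB FROM mytable)
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL is ignored.
not_exists = conn.execute("""
    SELECT DISTINCT m1.colA FROM mytable m1
    WHERE NOT EXISTS (SELECT 1 FROM mytable m2 WHERE m2.colB = m1.colA)
    ORDER BY m1.colA
""").fetchall()

print("NOT IN:    ", not_in)      # []
print("NOT EXISTS:", not_exists)  # [(1,), (3,)]
```

If colB is nullable and the NOT IN form has to stay, adding WHERE colB IS NOT NULL to the subquery should bring the two results back in line.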