I have two databases on a local machine, connected to localhost
. They both have roughly two million rows a piece. I was doing the following very simple join and it took over a minute to complete.
我在本地计算机上有两个数据库,连接到localhost。它们都有大约两百万行。我正在做以下非常简单的加入,花了一分多钟才完成。
select distinct x.patid
from [i 3 sci study].dbo.clm_extract as x
left join [i 3 study].dbo.claims as y on y.patid=x.patid
where y.patid is null
When I looked at the execution plan I saw that the join showplan operator had this to say
当我查看执行计划时,我看到了联合showplan运营商有这个说法
Why is the actual number of rows so exorbitantly high compared to the actual number of rows in both tables?
为什么实际行数与两个表中的实际行数相比过高?
1 个解决方案
#1
3
The LEFT JOIN
will match each row on the left with each row on the right, and then filter. Assuming patid
is not unique in either table, the number of possible match combinations could get very high.
LEFT JOIN将匹配左侧的每一行与右侧的每一行,然后进行过滤。假设patid在两个表中都不是唯一的,那么可能的匹配组合的数量可能会非常高。
Try the following:
请尝试以下方法:
SET NOCOUNT ON;
GO
CREATE TABLE #t1 (Id INT NOT NULL);
CREATE TABLE #t2 (Id INT NOT NULL);
GO
INSERT #t1 (Id)
VALUES (1);
GO 100
INSERT #t2 (Id)
SELECT Id FROM #t1;
GO
Now look at the execution plan for the left join query form:
现在查看左连接查询表单的执行计划:
SELECT *
FROM #t1
LEFT OUTER JOIN #t2 ON #t1.Id = #t2.Id
WHERE #t2.Id IS NULL;
Looking at the execution plan, the hash join shows 10,000 actual rows (100 from #t1 x 100 from #t2). This shows the advantage of checking for existence (or a lack thereof) using any of the following T-SQL syntaxes:
查看执行计划,散列连接显示10,000个实际行(来自#t2的#t1 x 100中的100个)。这显示了使用以下任何T-SQL语法检查是否存在(或缺少)的优点:
SELECT #t1.Id
FROM #t1
WHERE NOT EXISTS (SELECT * FROM #t2 WHERE Id = #t1.Id);
-- #t2.Id must not contain any NULLs for this to be correct
SELECT #t1.Id
FROM #t1
WHERE Id NOT IN (SELECT #t2.Id FROM #t2);
-- Returns DISTINCT #t1 values
SELECT Id
FROM #t1
EXCEPT
SELECT Id
FROM #t2;
Checking for a lack of existence enables the engine to short circuit. This is due to the anti semi join. As soon as the first match is found, it moves on to the next record. For more details, see this blog post.
检查是否存在导致发动机短路。这是由于反半连接。一旦找到第一个匹配,它就会移动到下一个记录。有关更多详细信息,请参阅此博客文章。
#1
3
The LEFT JOIN
will match each row on the left with each row on the right, and then filter. Assuming patid
is not unique in either table, the number of possible match combinations could get very high.
LEFT JOIN将匹配左侧的每一行与右侧的每一行,然后进行过滤。假设patid在两个表中都不是唯一的,那么可能的匹配组合的数量可能会非常高。
Try the following:
请尝试以下方法:
SET NOCOUNT ON;
GO
CREATE TABLE #t1 (Id INT NOT NULL);
CREATE TABLE #t2 (Id INT NOT NULL);
GO
INSERT #t1 (Id)
VALUES (1);
GO 100
INSERT #t2 (Id)
SELECT Id FROM #t1;
GO
Now look at the execution plan for the left join query form:
现在查看左连接查询表单的执行计划:
SELECT *
FROM #t1
LEFT OUTER JOIN #t2 ON #t1.Id = #t2.Id
WHERE #t2.Id IS NULL;
Looking at the execution plan, the hash join shows 10,000 actual rows (100 from #t1 x 100 from #t2). This shows the advantage of checking for existence (or a lack thereof) using any of the following T-SQL syntaxes:
查看执行计划,散列连接显示10,000个实际行(来自#t2的#t1 x 100中的100个)。这显示了使用以下任何T-SQL语法检查是否存在(或缺少)的优点:
SELECT #t1.Id
FROM #t1
WHERE NOT EXISTS (SELECT * FROM #t2 WHERE Id = #t1.Id);
-- #t2.Id must not contain any NULLs for this to be correct
SELECT #t1.Id
FROM #t1
WHERE Id NOT IN (SELECT #t2.Id FROM #t2);
-- Returns DISTINCT #t1 values
SELECT Id
FROM #t1
EXCEPT
SELECT Id
FROM #t2;
Checking for a lack of existence enables the engine to short circuit. This is due to the anti semi join. As soon as the first match is found, it moves on to the next record. For more details, see this blog post.
检查是否存在导致发动机短路。这是由于反半连接。一旦找到第一个匹配,它就会移动到下一个记录。有关更多详细信息,请参阅此博客文章。