How can I improve the performance of a large left outer join?

Time: 2022-01-31 03:24:14

These are my tables:

Source_Artikelen - columns: article, description (1,438,171 records)

Source_LevArt - columns: article, manufacturer part number (1,751,801 records)

... and this is the query I'm running:

SELECT a.Artikel,a.Omschrijving, l.Artikel_Leverancier
  FROM Source_Artikelen AS a
       LEFT OUTER JOIN Source_LevArt AS l
    ON a.Artikel Like l.Artikel

This query ran for more than 20 hours last night before I cancelled it manually.

So what am I trying to do?

I want to list all articles from my table Source_Artikelen, and then see whether manufacturer part numbers are available for them in Source_LevArt.

  • not every article from Source_Artikelen is present in Source_LevArt
  • sometimes there are multiple manufacturer part numbers in Source_LevArt for one article

That's why I need to use a LEFT OUTER JOIN.

I've tried some things with indexes, but it's not really helping. Possibly I'm doing something wrong.

I could really use some help, as this is only the beginning of the query I'm writing. I will have to add 2 other (large) tables as left outer joins later...


UPDATE 19/12/2016 16:24: Hi piet.t

SELECT TOP(20) a.Artikel,a.Omschrijving, l.Artikel_Leverancier 
  FROM Source_Artikelen AS a 
       LEFT JOIN Source_LevArt AS l 
    ON a.Artikel LIKE l.Artikel 

This takes 9 seconds.

SELECT TOP(20) a.Artikel,a.Omschrijving, l.Artikel_Leverancier 
  FROM Source_Artikelen AS a 
       LEFT JOIN Source_LevArt AS l 
    ON a.Artikel = l.Artikel 

This takes 1 second!

I really didn't know there was a difference as I'm not using wildcards.
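
One reason the two predicates cannot be treated as the same thing: with LIKE, the right-hand column is interpreted as a pattern for every row, so any %, _ or [ that happens to be stored in the data acts as a wildcard. The following is only a small sketch with made-up sample values (the table variables @a and @l are hypothetical, not the real tables):

```sql
-- Hypothetical sample data, for illustration only.
DECLARE @a TABLE (Artikel varchar(20));
DECLARE @l TABLE (Artikel varchar(20));

INSERT INTO @a (Artikel) VALUES ('AB1'), ('AB2'), ('AB_');
INSERT INTO @l (Artikel) VALUES ('AB_');  -- the underscore is plain data here

-- Equality: only the literal value 'AB_' matches.
SELECT a.Artikel
  FROM @a AS a
       JOIN @l AS l
    ON a.Artikel = l.Artikel;

-- LIKE: l.Artikel is used as a pattern, so '_' matches any single
-- character and all three rows of @a match.
SELECT a.Artikel
  FROM @a AS a
       JOIN @l AS l
    ON a.Artikel LIKE l.Artikel;
```

So even without intentional wildcards, the two joins are only guaranteed to return the same rows if the pattern column never contains such characters.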

2 solutions

#1


0  

This is covered by Paul White here: Dynamic Seeks and Hidden Implicit Conversions

Using LIKE, even when the match is exact, tends to produce a dynamic seek, which means the range to seek on is worked out at execution time, not at compile time.

Below is how the seek values are derived for the tables in my example further down:

[Expr1005] = Scalar Operator(CONVERT_IMPLICIT(varchar(12),[Aegon_X].[Sales].[Orders].[custid] as [o].[custid],0)),
[Expr1006] = Scalar Operator(LikeRangeStart(CONVERT_IMPLICIT(varchar(12),[Aegon_X].[Sales].[Orders].[custid] as [o].[custid],0))),
[Expr1007] = Scalar Operator(LikeRangeEnd(CONVERT_IMPLICIT(varchar(12),[Aegon_X].[Sales].[Orders].[custid] as [o].[custid],0))),
[Expr1008] = Scalar Operator(LikeRangeInfo(CONVERT_IMPLICIT(varchar(12),[Aegon_X].[Sales].[Orders].[custid] as [o].[custid],0)))

Here is how Paul describes the way those are derived:

The upper tooltip shows that the Compute Scalar uses three internal functions, LikeRangeStart, LikeRangeEnd, and LikeRangeInfo.

The first two functions describe the range as an open interval. The third function returns a set of flags encoded in an integer, that are used internally to define certain seek properties for the Storage Engine. The lower tooltip shows the seek on the open interval described by the result of LikeRangeStart and LikeRangeEnd, and the application of the residual predicate ‘LIKE @Like’.

So in summary, when you use LIKE, SQL Server uses a dynamic seek and derives the seek properties at execution time, not at compile time.
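
The CONVERT_IMPLICIT in my expressions appears because LIKE works on character data and custid has to be converted first. To see whether something similar could apply to your tables, a query along these lines lists the declared types of the two join columns. This is only a sketch: it assumes both tables are in the current database and that the join column is named Artikel in both, as in the question.

```sql
-- List the declared type, length and collation of the join column
-- in both tables (table and column names taken from the question).
SELECT TABLE_NAME,
       COLUMN_NAME,
       DATA_TYPE,
       CHARACTER_MAXIMUM_LENGTH,
       COLLATION_NAME
  FROM INFORMATION_SCHEMA.COLUMNS
 WHERE TABLE_NAME IN ('Source_Artikelen', 'Source_LevArt')
   AND COLUMN_NAME = 'Artikel';
```

If the two rows differ in type or collation, an equality join may also need a conversion on one side, which is worth checking before judging the indexes.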

The examples below show the different plans.

Using LIKE:

select top 10* from sales.orders o
join
sales.customers c
on c.custid like o.custid

Plan: [execution plan screenshot]

Now, using an exact match:

 select top 10* from sales.orders o
    join
    sales.customers c
    on c.custid =o.custid   

[execution plan screenshot]

You can see that this produces a merge join plan.

#2


0  

Use = instead of LIKE.

These 2 indexes should give you the best performance for a Select.

CREATE INDEX idx ON Source_Artikelen(Artikel) INCLUDE(Omschrijving);

CREATE INDEX idx ON Source_LevArt(Artikel) INCLUDE(Artikel_Leverancier);

If you implement them and try your SELECT again, can you please upload a copy of your execution plan?
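
For reference, a sketch of what that retry could look like, using the table and column names from the question. The SET STATISTICS options are standard SQL Server session settings that print I/O and timing figures to the Messages tab; the actual execution plan can be captured in SSMS with "Include Actual Execution Plan" before running the query.

```sql
-- Report logical reads and CPU/elapsed time for the statements below.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Same query as in the question, with = instead of LIKE.
SELECT a.Artikel, a.Omschrijving, l.Artikel_Leverancier
  FROM Source_Artikelen AS a
       LEFT OUTER JOIN Source_LevArt AS l
    ON a.Artikel = l.Artikel;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```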
