Here's a basic query that relies on two non-clustered indexes:
下面是一个依赖于两个非聚集索引的基本查询:
SELECT cc.categoryid, count(*) from company c
INNER JOIN companycategory cc on cc.companyid = c.id
WHERE c.placeid like 'ca_%'
GROUP BY cc.categoryid order by count(*) desc
When the exact same database is hosted on SQL Server 2008, on virtually any hardware, this returns < 500 ms. Even with the cache buffers cleared:
当相同的数据库驻留在SQL Server 2008上时,在几乎任何硬件上,这将返回< 500 ms。即使缓存缓冲区已清除:
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
... this still returns in ~1 second on traditional SQL.
…在传统SQL上,这仍然会在1秒内返回。
On Azure, it takes approximately 3.5 seconds to return each time.
在Azure上,每次返回大约需要3.5秒。
Some articles out there seem to suggest that people are generally happy with query performance in SQL Azure. And yet here's a basic scenario where 'obvious' tuning has been exhausted and there's no network latency issues to speak of. It's just really slow when working w/ large tables (companycategroy has 1.2M records, places has 7.5K).
有些文章似乎表明,人们通常对SQL Azure的查询性能感到满意。然而,这里有一个基本的场景,即“明显”的调优已经耗尽,并且没有网络延迟问题。当使用大型表格时(company roy有120万记录,places有7.5万记录),这真的很慢。
The total database size is no more than 4GB. Selecting 'Web' edition vs. 'Enterprise' edition doesn't seem to make much of a difference either.
数据库的总大小不超过4GB。选择“网络”版vs。《企业版》似乎也没什么不同。
What am I missing?
我缺少什么?
This is only a basic example, it only gets worse with more sophisticated queries, all of have been reviewed, tuned, and perform well on-premise.
这只是一个基本的示例,它只会随着更复杂的查询而变得更糟,所有查询都经过了评审、调优,并且在内部执行得很好。
Here's the execution plan:
执行计划:
|--Sort(ORDER BY:([Expr1004] DESC))
|--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1007],0)))
|--Hash Match(Aggregate, HASH:([cc].[CategoryId]), RESIDUAL:([XX].[dbo].[CompanyCategory].[CategoryId] as [cc].[CategoryId] = [XX].[dbo].[CompanyCategory].[CategoryId] as [cc].[CategoryId]) DEFINE:([Expr1007]=COUNT(*)))
|--Hash Match(Inner Join, HASH:([c].[Id])=([cc].[CompanyId]))
|--Index Scan(OBJECT:([XX].[dbo].[Company].[IX_Company_PlaceId] AS [c]), WHERE:([XX].[dbo].[Company].[PlaceId] as [c].[PlaceId] like N'ca_%'))
|--Index Scan(OBJECT:([XX].[dbo].[CompanyCategory].[IX_CompanyCategory_CompanyId] AS [cc]))
And here are the stats:
以下是统计数据:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 14 ms, elapsed time = 14 ms.
(789 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CompanyCategory'. Scan count 1, logical reads 5183, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Company'. Scan count 1, logical reads 8710, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 3328 ms, elapsed time = 3299 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Index definitions are as follows:
索引定义如下:
CREATE NONCLUSTERED INDEX [IX_Company_PlaceId] ON [dbo].[Company]
(
[PlaceId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
CREATE NONCLUSTERED INDEX [IX_CompanyCategory_CompanyId] ON [dbo].[CompanyCategory]
(
[CompanyId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
ALTER TABLE [dbo].[Company] ADD CONSTRAINT [PK_Company_Id] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
3 个解决方案
#1
6
They seem to use one CPU
core for your query while on your machine the query probably parallelizes (all operations used by the query do parallelize).
它们似乎在您的机器上使用一个CPU核心,而在您的机器上查询可能是并行化(查询使用的所有操作都执行并行化)。
However, an index scan is used for the LIKE
predicate for some reason while an index seek could suffice.
但是,出于某种原因,LIKE谓词使用索引扫描,而索引查找就足够了。
Please try using this explicit condition instead of LIKE
:
请尝试使用这个明确的条件,而不是像:
c.placeid >= 'ca'
AND c.placeid < 'cb'
and see if it changes the plan to an Index Seek
on IX_CompanyPlaceId
.
并查看它是否将计划更改为IX_CompanyPlaceId的索引查找。
#2
1
Just a few things:
只是几件事:
- Are stats up to date on Azure? I'm a bit wary of that Hash Match for a 1.2M row table
- Azure上的统计数据是最新的吗?对于一个1.2M行表的散列匹配,我有点担心
- Does Azure have auto stats? If not, your local database might have a lot more information that SQL Azure can't use to pick an optimal query plan
- Azure有自动统计功能吗?如果不是,您的本地数据库可能有很多SQL Azure无法使用的信息来选择最优查询计划
- Index
c.placeid
for some statistics on it - 指数c。关于它的一些统计数据
- Why is
c.placeid
a string? Does this follow through tocompanyid
andc.id
? I think this is why you have the Hash Match - try joining on integer surrogate keys instead. - 为什么是c。placeid字符串?这跟公司id和c id有关吗?我认为这就是为什么您有哈希匹配——尝试加入整数代理键。
#3
0
I am posting this link on Azure SQL Database index maintenance since index maintenance still needs a helping hand.
我在Azure SQL数据库索引维护上发布了这个链接,因为索引维护仍然需要帮助。
https://blogs.msdn.microsoft.com/azuresqldbsupport/2016/07/03/how-to-maintain-azure-sql-indexes-and-statistics/
We use a runbook to execute across our 350+ databases on different elastic pools to perform the index maintenance. Hope others find the information as helpful as we did.
我们使用一个runbook在不同的弹性池上跨350+数据库执行索引维护。希望其他人也能像我们一样找到有用的信息。
#1
6
They seem to use one CPU
core for your query while on your machine the query probably parallelizes (all operations used by the query do parallelize).
它们似乎在您的机器上使用一个CPU核心,而在您的机器上查询可能是并行化(查询使用的所有操作都执行并行化)。
However, an index scan is used for the LIKE
predicate for some reason while an index seek could suffice.
但是,出于某种原因,LIKE谓词使用索引扫描,而索引查找就足够了。
Please try using this explicit condition instead of LIKE
:
请尝试使用这个明确的条件,而不是像:
c.placeid >= 'ca'
AND c.placeid < 'cb'
and see if it changes the plan to an Index Seek
on IX_CompanyPlaceId
.
并查看它是否将计划更改为IX_CompanyPlaceId的索引查找。
#2
1
Just a few things:
只是几件事:
- Are stats up to date on Azure? I'm a bit wary of that Hash Match for a 1.2M row table
- Azure上的统计数据是最新的吗?对于一个1.2M行表的散列匹配,我有点担心
- Does Azure have auto stats? If not, your local database might have a lot more information that SQL Azure can't use to pick an optimal query plan
- Azure有自动统计功能吗?如果不是,您的本地数据库可能有很多SQL Azure无法使用的信息来选择最优查询计划
- Index
c.placeid
for some statistics on it - 指数c。关于它的一些统计数据
- Why is
c.placeid
a string? Does this follow through tocompanyid
andc.id
? I think this is why you have the Hash Match - try joining on integer surrogate keys instead. - 为什么是c。placeid字符串?这跟公司id和c id有关吗?我认为这就是为什么您有哈希匹配——尝试加入整数代理键。
#3
0
I am posting this link on Azure SQL Database index maintenance since index maintenance still needs a helping hand.
我在Azure SQL数据库索引维护上发布了这个链接,因为索引维护仍然需要帮助。
https://blogs.msdn.microsoft.com/azuresqldbsupport/2016/07/03/how-to-maintain-azure-sql-indexes-and-statistics/
We use a runbook to execute across our 350+ databases on different elastic pools to perform the index maintenance. Hope others find the information as helpful as we did.
我们使用一个runbook在不同的弹性池上跨350+数据库执行索引维护。希望其他人也能像我们一样找到有用的信息。