I'm pretty much an idiot when it comes to databases. I can write a query to do what I want without too many problems, but when I hit a performance issue I really have no idea what to do, so any help would be gratefully received.
I have three tables:
Bill
- Bill_Id - BIGINT - Primary key
- BillDate - DATE
BillDetail
- BillDetail_Id - BIGINT - Primary key
- Bill_Id - BIGINT - Foreign key for Bill, indexed
- BillDetailType_Id - INT - Foreign key for BillDetailType, indexed
- Charge - MONEY
BillDetailType
- BillDetailType_Id - INT - Primary key
- TypeName - NVARCHAR(20)
Each Bill has multiple BillDetails, which are basically the individual items on a bill. Every BillDetail has a BillDetailType, which is what kind of bill item it is (e.g. electricity, internet, tax).
I have also created a view like this:
CREATE VIEW BillSubtotal
AS
SELECT b.*,
    (SELECT SUM(bd.Charge)
     FROM BillDetail AS bd
     INNER JOIN BillDetailType AS bdt ON bd.BillDetailType_Id = bdt.BillDetailType_Id
     WHERE (bdt.TypeName = 'Tax') AND (bd.Bill_Id = b.Bill_id)) AS Tax,
    (SELECT SUM(bd.Charge)
     FROM BillDetail AS bd
     INNER JOIN BillDetailType AS bdt ON bd.BillDetailType_Id = bdt.BillDetailType_Id
     WHERE (bdt.TypeName <> 'Tax') AND (bd.Bill_Id = b.Bill_id)) AS NonTaxTotal
FROM Bill AS b
Running that view takes about 14 seconds with the current dev database, which has about 60000 Bills and 700000 BillDetails. There are 26 different BillDetailTypes. I'd like to add some more subtotals once I get this working, but for now that's all I have.
Now I'm trying to do a join like this:
SELECT bs.BillDate, bs.Tax, bs.NonTaxTotal, bd.Charge, bdt.TypeName FROM
BillDetail bd
INNER JOIN BillSubtotal bs ON bs.Bill_Id = bd.Bill_Id
INNER JOIN BillDetailType bdt ON bdt.BillDetailType_Id = bd.BillDetailType_Id
I would like to calculate what percentage of a pre-tax Bill a particular BillDetail is and some other things, so I will eventually have something like bd.Charge/bs.NonTaxTotal*100, but at the moment this query takes 14 hours to run and I really don't understand why.
If I remove either of the INNER JOINs, the query speeds up dramatically:
SELECT bs.BillDate, bs.Tax, bs.NonTaxTotal, bd.Charge FROM
BillDetail bd
INNER JOIN BillSubtotal bs ON bs.Bill_Id = bd.Bill_Id
Takes about 1.5 minutes to run.
SELECT bd.Charge, bdt.TypeName FROM
BillDetail bd
INNER JOIN BillDetailType bdt ON bdt.BillDetailType_Id = bd.BillDetailType_Id
Takes about 12 seconds.
I don't understand why either of the joins by themselves runs in such a short time, but when I do the joins together it takes hours. Maybe it's something very obvious, but because I don't really understand how the queries are being evaluated I'm missing it. I looked at the execution plan, but I can't glean anything useful from it and I'm kind of at a dead end. I've tried various ways of switching things around, moving one of the joins to a subquery and other things I thought might help, but nothing I've done has changed the performance.
Thanks for any help.
2 Solutions
#1
3
I would suggest not using a view at all. I used a lot of views several years ago, but they became too difficult to manage over time. If you add a column to one of the tables, you have to update the view as well, and that gets laborious. That being said, you can add indexes to views.
I would also suggest using the Group By strategy. In my experience, this can be a whole lot faster. I've used it in several cases and found remarkable improvements in speed. Something like this:
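SELECT BillDetail.Bill_Id,
    SUM(BillDetail.Charge) AS Total,
    CASE
        WHEN BillDetailType.TypeName = 'Tax'
        THEN 'Tax'
        ELSE 'Not Tax'
    END AS TypeName
FROM BillDetail
INNER JOIN BillDetailType
    ON BillDetail.BillDetailType_Id = BillDetailType.BillDetailType_Id
-- Group on the CASE expression itself. SQL Server does not accept a SELECT alias
-- in GROUP BY, so a bare TypeName would group by the underlying column and
-- return one row per detail type instead of one 'Tax' and one 'Not Tax' row per bill.
GROUP BY BillDetail.Bill_Id,
    CASE
        WHEN BillDetailType.TypeName = 'Tax'
        THEN 'Tax'
        ELSE 'Not Tax'
    END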
You could just use this query and join to it rather than creating a view. This would leverage the indexes on the tables themselves.
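As a rough sketch of what "join to it" could look like for the query in the question: the grouped subtotals become a derived table, so BillDetail is aggregated once rather than once per subquery per row. This variant pivots the two subtotals into columns with SUM(CASE ...) instead of returning two rows per bill. The alias names (t, d, ty) and the PercentOfNonTaxTotal column are made up for illustration, and I haven't run this against the asker's data.

```sql
-- Sketch only: aliases t, d, ty and PercentOfNonTaxTotal are illustrative names.
SELECT b.BillDate,
       t.Tax,
       t.NonTaxTotal,
       bd.Charge,
       bdt.TypeName,
       -- Multiply before dividing so MONEY's four decimal places are not
       -- lost early; NULLIF avoids a divide-by-zero error.
       100 * bd.Charge / NULLIF(t.NonTaxTotal, 0) AS PercentOfNonTaxTotal
FROM BillDetail AS bd
INNER JOIN BillDetailType AS bdt
        ON bdt.BillDetailType_Id = bd.BillDetailType_Id
INNER JOIN Bill AS b
        ON b.Bill_Id = bd.Bill_Id
INNER JOIN (
    -- One pass over BillDetail produces both subtotals for every bill.
    SELECT d.Bill_Id,
           SUM(CASE WHEN ty.TypeName = 'Tax'  THEN d.Charge END) AS Tax,
           SUM(CASE WHEN ty.TypeName <> 'Tax' THEN d.Charge END) AS NonTaxTotal
    FROM BillDetail AS d
    INNER JOIN BillDetailType AS ty
            ON ty.BillDetailType_Id = d.BillDetailType_Id
    GROUP BY d.Bill_Id
) AS t
        ON t.Bill_Id = bd.Bill_Id;
```

As in the original view, Tax comes back NULL for a bill with no tax rows. Whether this is actually faster depends on the plan SQL Server picks, so compare it against the view-based query on the same data.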
Finally, you may want to try running whatever query you end up with through the SQL Server Profiler tool.
I have a blog post about SQL Query Optimization, which recounts various techniques I've learned over the past 7 years.
#2
0
It's hard to know without seeing the exact execution plan, but there's a very good chance you need to create some indexes on your view. The query optimizer won't necessarily use the indexes on the underlying tables; you may need to create indexes specifically on the view itself.
A screenshot of the execution plan would make it much easier to analyze.
From the MSDN article: It is possible to create a unique clustered index on a view, as well as nonclustered indexes, to improve data access performance on the most complex queries by precomputing and materializing the view. **This is often particularly effective for aggregate views** in decision support or data warehouse environments.
(emphasis mine).
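For what it's worth, the view in the question can't be indexed as written, because indexed views don't allow subqueries or SELECT *. A minimal sketch of an indexable variant might look like the following. It assumes the tables live in the dbo schema, and the names BillSubtotalIndexed, DetailCount and IX_BillSubtotalIndexed_Bill_Id are hypothetical. It also leaves out the Bill columns, so you would join back to Bill for BillDate.

```sql
-- Sketch only: assumes the dbo schema; the view and index names are hypothetical.
-- Indexed views need SCHEMABINDING, two-part table names, COUNT_BIG(*) when
-- GROUP BY is used, and SUM over a non-nullable expression (hence ISNULL).
-- ANSI_NULLS and QUOTED_IDENTIFIER must be ON when the view is created.
CREATE VIEW dbo.BillSubtotalIndexed
WITH SCHEMABINDING
AS
SELECT bd.Bill_Id,
       SUM(ISNULL(CASE WHEN bdt.TypeName = 'Tax'  THEN bd.Charge ELSE 0 END, 0)) AS Tax,
       SUM(ISNULL(CASE WHEN bdt.TypeName <> 'Tax' THEN bd.Charge ELSE 0 END, 0)) AS NonTaxTotal,
       COUNT_BIG(*) AS DetailCount
FROM dbo.BillDetail AS bd
INNER JOIN dbo.BillDetailType AS bdt
        ON bdt.BillDetailType_Id = bd.BillDetailType_Id
GROUP BY bd.Bill_Id;
GO  -- batch separator for SSMS/sqlcmd; CREATE VIEW must start its own batch

-- The unique clustered index is what materializes the view's rows.
CREATE UNIQUE CLUSTERED INDEX IX_BillSubtotalIndexed_Bill_Id
    ON dbo.BillSubtotalIndexed (Bill_Id);
```

Unlike the original view, Tax is 0 rather than NULL for a bill with no tax rows, and bills with no details don't appear at all. Depending on the SQL Server edition, you may need to query the view with the WITH (NOEXPAND) table hint for the optimizer to read the view's index directly.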