I'd like some suggestions for online resources (blogs, guides, etc - not forums) to help me become good at designing high performance SQL Server databases that operate with large amounts of data and have heavy loads in terms of data turnover and queries per minute.
Suggestions?
EDIT
The load I'm talking about is mainly in terms of data turnover. The main table has up to a million rows with about 30 fields of data of varying size. It gains roughly 30,000-40,000 new rows per day, and at least 200,000 existing rows are updated with new data every day. These updates happen on a continuing basis throughout the day. On top of this, all changes and updates need to be pulled from the database throughout the day to keep a large Lucene index up to date.
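One common pattern for that last requirement (an assumption on my part, not something from the question) is a `rowversion` column, which lets the Lucene sync job pull only the rows changed since its last run rather than rescanning the table:

```sql
-- Hypothetical sketch: table and column names are illustrative.
CREATE TABLE dbo.MainTable (
    Id     INT IDENTITY PRIMARY KEY,
    -- ...the ~30 data columns...
    RowVer ROWVERSION NOT NULL   -- bumped automatically on every insert/update
);

-- Sync job: fetch everything changed since the watermark captured last run.
DECLARE @LastVer BINARY(8) = 0x0000000000000000;  -- persisted by the sync job

SELECT Id /*, data columns to re-index */
FROM dbo.MainTable
WHERE RowVer > @LastVer;
```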
4 Answers
#1
4
Sounds like a fairly manageable load on a moderate server - you haven't said what kind of read operations are happening while these inserts and updates are going on (other than the extractions for Lucene) and the size (byte-wise/data type-wise) of the data (the cardinality you have given seems fine).
At this point, I would recommend just using regular SQL Server best practices - determine a schema which is appropriate (normalize, then denormalize only if necessary), review execution plans, use the index tuning wizard, use the DMVs to find the unused indexes and remove them, choose clustered indexes carefully to manage page splits, carefully choose data types and size and use referential integrity and constraints where possible to give the optimizer as much help as possible. Beyond that is performance counters and ensuring your hardware/software installation is tuned.
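For the unused-index step, the relevant DMV is `sys.dm_db_index_usage_stats`. A sketch (stats reset on every instance restart, so verify over a representative period before dropping anything):

```sql
-- Nonclustered indexes in the current database with no reads recorded:
-- candidates for removal, since every update still has to maintain them.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       s.user_updates           AS write_cost   -- maintenance overhead
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id   = i.object_id
      AND s.index_id    = i.index_id
      AND s.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND i.type_desc = 'NONCLUSTERED'
  AND COALESCE(s.user_seeks, 0) + COALESCE(s.user_scans, 0)
      + COALESCE(s.user_lookups, 0) = 0
ORDER BY s.user_updates DESC;
```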
In many/most cases, you'll never need to go beyond that to actually re-engineer your architecture.
However, even after all that, if the read load is heavy, the inserts and updates can cause locking issues between reads and writes, and then you are looking at architectural decisions for your application.
Also, the million rows and 200k updates a day wouldn't worry me - but you mention Lucene (i.e. full text indexing), so presumably some of the columns are rather large. Updating large columns and exporting them obviously takes far longer - and far more bandwidth and IO. 30 columns in a narrow million row table with traditional data type columns would be a completely different story. You might want to look at the update profile and see if you need to partition the table vertically to move some columns out of the row (if they are large, they will already be stored out of row) to improve the locking behavior.
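A vertical partition of that kind might look like the following sketch (table and column names are hypothetical, not from the question): the hot, frequently-updated columns stay in a narrow table, and the large text that Lucene indexes moves to a 1:1 companion table so updates to the narrow columns touch fewer pages.

```sql
-- Narrow, frequently-updated side.
CREATE TABLE dbo.Article (
    ArticleId INT           NOT NULL PRIMARY KEY,
    Title     NVARCHAR(200) NOT NULL,
    UpdatedAt DATETIME2     NOT NULL
    -- ...other narrow columns...
);

-- Wide side, 1:1 with dbo.Article, read mainly by the Lucene export.
CREATE TABLE dbo.ArticleBody (
    ArticleId INT           NOT NULL PRIMARY KEY
              REFERENCES dbo.Article (ArticleId),
    Body      NVARCHAR(MAX) NULL   -- the large full-text content
);
```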
So the key thing when you have heavy read load: Inserts and updates need to be as fast as possible, lock as little as possible (avoiding lock escalation), update as few indexes as can be afforded to support the read operation.
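One concrete lever for the reader/writer blocking (a suggestion of mine, not something the answer prescribes) is row-versioning isolation, which stops readers blocking writers and vice versa at the cost of some `tempdb` version-store overhead:

```sql
-- Switching this on needs a moment with no other active connections
-- to the database; YourDb is a placeholder name.
ALTER DATABASE YourDb SET READ_COMMITTED_SNAPSHOT ON;
```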
If the read load is so heavy (so that the inserts/updates start to conflict) but does not require 100% up to date information (say a 5 minute or 15 minute delay is not noticeable), you can have a read only version of the database which is maintained (either identical through replication, differently indexed for performance, denormalized or differently modeled - like a dimensional model). Perhaps your Lucene indexes can contain additional information so that the expensive read operations all stay in Lucene - i.e. Lucene becomes covering for many large read operations, thereby reducing your read load on the database to essential reads which support the inserts/updates (these are typically small reads) and the transactional part of your app (i.e. say a customer service information screen would use the regular database, while your hourly dashboard would use the secondary database).
#2
3
You might try the SQL Server samples on CodePlex or DatabaseAnswers.com.
#3
3
Here are some resources about troubleshooting and optimizing performance in SQL Server, that I've found really helpful:
http://updates.sqlservervideos.com/2009/09/power-up-with-sql-server-sql-server-performance.html
In particular, effective use of indexes can be a huge performance booster. I think that most web applications, in most circumstances, do a lot more reading than writing. Also, the sargability of an expression can have a serious impact on performance.
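A quick illustration of sargability (table and column names are made up for the example): wrapping an indexed column in a function forces a scan, while an equivalent range predicate on the bare column can use an index seek.

```sql
-- Non-sargable: YEAR() is applied to every row, so an index on
-- OrderDate cannot be seeked.
SELECT OrderId
FROM dbo.Orders
WHERE YEAR(OrderDate) = 2009;

-- Sargable rewrite: same result, but the optimizer can seek the
-- index on the bare OrderDate column.
SELECT OrderId
FROM dbo.Orders
WHERE OrderDate >= '20090101'
  AND OrderDate <  '20100101';
```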
#4
2
This subject is better explored first through books, as it is highly technical and complex.
I will point out that the people who created this website include several who work with very large databases. You can learn a lot from them. http://lessthandot.com/