What determines database query and insert speed?

Date: 2021-05-27 15:40:11

At my work we have a small database (as in two hundred tables and maybe a million rows or so in total).

I've always expected it to be quite fast, on the order of several tens of thousands of insertions per second, with queries taking milliseconds once the connection is established.

Quite the contrary, we are having performance problems: we only get a couple of hundred insertions per second, and queries, even the simplest ones, take forever.

I'm not entirely sure whether that's standard behavior/performance or we're doing something wrong. For example, 1500 queries that each join 4 tables on a single key column take around 10 seconds. It takes 3 minutes to load 300K of data in XML format into the database using simple inserts, without violating any constraints.

The database is SQL Server 2005 and has a rich relational dependency model, meaning a lot of relations and categorizations over the data as well as a full set of check constraints for the categorization codes and several other things.

Are those times right? If not, what could be affecting performance? (All queries are done on indexed columns)

5 Answers

#1


5  

For a rough comparison: the TPC-C benchmark record for SQL Server is around 1.2 million transactions per minute, and has been at that level for the last 4 years or so (capped by the 64-CPU OS limit). That is in the ballpark of ~16k transactions per second. This is on super-high-end machines: 64 CPUs, plenty of RAM, clients affinitized per NUMA node, and a severely short-stroked I/O system (only about 1-2% of each spindle is used). Bear in mind those are TPC-C transactions, so each one consists of several operations (I think 4-5 reads and 1-2 writes on average).

Now scale this top-of-the-line hardware down to your actual deployment, and you will get the ballpark in which to set your expectations for overall OLTP transaction processing.

For data upload, the current world record is about 1 TB in 30 minutes (if it is still current...). Several tens of thousands of inserts per second is quite ambitious, but achievable when properly done on serious hardware. The article in the link contains tips and tricks for high-throughput ETL (e.g. use multiple upload streams and affinitize them to NUMA nodes).

For your situation I would advise first and foremost to measure, so you find out the bottlenecks, and then ask specific questions about how to solve those specific bottlenecks. A good starting point is the Waits and Queues whitepaper.
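To start measuring, the wait-statistics DMV available since SQL Server 2005 gives a quick first picture. This is a generic starting query, not something taken from the whitepaper itself:

```sql
-- Aggregate waits since the last restart (or since the stats were cleared).
-- The dominant wait types hint at the bottleneck: PAGEIOLATCH_* suggests
-- data-file I/O, WRITELOG suggests log I/O, LCK_* suggests blocking.
SELECT TOP (10)
       wait_type,
       wait_time_ms,
       waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP',
                        'SQLTRACE_BUFFER_FLUSH', 'BROKER_TASK_STOP')
ORDER BY wait_time_ms DESC;
```

The excluded wait types are common idle/background waits; which ones are worth filtering out varies by workload.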

#2


5  

Indexing is a major factor here. When done properly, indexes can speed up SELECT statements quite well, but remember that an index will also bog down an insert, since the server updates not only the data but the indexes as well. The trick is:

1) Determine the queries that are truly speed-critical; those queries should have optimal indexes.

2) Fill factor is important here as well. It leaves empty space in an index page for rows inserted later. When an index page is full (enough rows have been inserted), a new page needs to be created by a page split, taking yet more time. The trade-off is that the empty space occupies disk.

My trick is this: for each application I set priorities as follows:

1) Speed of reads (SELECT, some UPDATEs, some DELETEs) - the higher this priority, the more indexes I create
2) Speed of writes (INSERT, some UPDATEs, some DELETEs) - the higher this priority, the fewer indexes I create
3) Disk-space efficiency - the higher this priority, the higher my fill factor
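For example (table and column names here are made up), the priorities above might translate into index definitions like:

```sql
-- Read-heavy lookup: more indexes, high fill factor (dense pages are fine
-- because the table sees few writes).
CREATE NONCLUSTERED INDEX IX_Products_CategoryCode
    ON dbo.Products (CategoryCode)
    INCLUDE (Name, Price)
    WITH (FILLFACTOR = 95);

-- Write-heavy log table: few indexes, lower fill factor so inserts find
-- free space on existing pages instead of causing page splits.
CREATE NONCLUSTERED INDEX IX_AuditLog_LoggedAt
    ON dbo.AuditLog (LoggedAt)
    WITH (FILLFACTOR = 70);
```

The specific fill-factor numbers are illustrative; the right values depend on your insert pattern.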

Note this knowledge generally applies to SQL Server, your mileage may vary on a different DBMS.

SQL statement evaluation can help here too, but it takes a real pro; careful WHERE and JOIN analysis can help determine bottlenecks and where your queries are suffering. Turn on SHOWPLAN and examine the query plans; evaluate what you see and plan accordingly.
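A minimal sketch of turning that on in T-SQL (the query and table names are hypothetical):

```sql
-- Show the estimated plan for a batch without executing it.
-- SET SHOWPLAN_ALL must be the only statement in its batch.
SET SHOWPLAN_ALL ON;
GO
SELECT o.OrderId, c.Name
FROM dbo.Orders AS o
JOIN dbo.Customers AS c ON c.CustomerId = o.CustomerId;
GO
SET SHOWPLAN_ALL OFF;
GO

-- Or execute the query and report actual page reads per table:
SET STATISTICS IO ON;
```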

Also look at SQL Server 2008, indexed Joins!

#3


2  

A "rich relational dependency" model is not conducive to fast insert speeds. Every constraint (primary key, value checks, and especially foreign keys) must be checked for every inserted record. That's a lot more work than a "simple insert".

And it doesn't matter that your inserts have no constraint violations; the time is probably going almost entirely into checking your foreign keys. Unless you also have triggers, because those are even worse.

Of course, it is possible that the only thing wrong is that your insert table is the parent in a "must-have-children" FK relation with another table, and someone forgot to add an index on the child-FK side of that relation (this is not automatic and is often forgotten). Of course, that's just hoping to get lucky. :-)
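A minimal sketch of the fix, with hypothetical table names where `Child.ParentId` references `Parent.Id`:

```sql
-- SQL Server does not index the referencing column automatically, so
-- without this index, deletes/updates on Parent that must verify the FK
-- have to scan Child.
CREATE NONCLUSTERED INDEX IX_Child_ParentId
    ON dbo.Child (ParentId);
```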

#4


1  

Constraints add a small performance penalty. The server also has to update indexes on every insert. And if you don't put multiple inserts into a single transaction, the database server has to execute every insert as a new, separate transaction, slowing it down further.
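A sketch of the batching point (the table here is made up): commit once per batch rather than once per row, so the transaction log is flushed once per batch instead of once per insert.

```sql
BEGIN TRANSACTION;

INSERT INTO dbo.Items (Id, Name) VALUES (1, N'first');
INSERT INTO dbo.Items (Id, Name) VALUES (2, N'second');
-- ... many more rows in the same transaction ...

COMMIT TRANSACTION;
```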

150 queries/second, each joining 4 tables, sounds normal, though I don't know much about your data.

#5


0  

"I've always expected it to be quite fast, on the order of several tens of thousands of insertions per second, with queries taking milliseconds once the connection is established."

(a) Database performance depends 99% on the amount of physical I/O (unless you are at some small site using an in-memory database, which can harmlessly afford to postpone all physical I/O until after the day is done). (b) Database I/O involves not only the actual physical I/O to the data files, but also the physical I/O to persist the journals/logs/... (and journaling has often even been done in dual mode, i.e. twice, for about two decades now). (c) How the "number of inserts" corresponds to the "amount of physical I/O" is completely determined by how many options the database designer has available for optimising the physical design. Only one thing can be said in general about this: SQL systems mostly fail to provide the options necessary to reduce "tens of thousands of inserts" to maybe just "a couple of hundred" physical I/Os. Meaning that "tens of thousands of inserts" usually also implies "thousands of physical I/Os", which usually implies "tens of seconds".

That said, your message seems to express an expectation that somehow "inserts are extremely fast" ("tens of thousands per second") while "queries are slower" ("milliseconds per query", implying "less than 1000 queries per second"). That expectation is absurd.
