Is it bad to SELECT all columns at once even though you probably don't need all of them? You might need them in another task, but you are too lazy to write a query for every task.
Should you only run queries that SELECT the columns you need, and run the query again if you need another column?
So basically the question is: does it have any effect on performance to SELECT one column vs. multiple columns?
The query is very simple (no functions, joins, etc.). For example:
SELECT
id, name, status, date
FROM user_table
WHERE user_id = :user_id
2 Answers
#1
13
The issue here isn't so much the database server as the network communication. By selecting all columns, you're telling the server to return all of them to you. Concerns over IO and the like are addressed nicely in the question and answer @Karamba linked in a comment: select * vs select column. But for most real-world applications (and I use "applications" in every sense), the main concern is network traffic and how long it takes to serialize, transmit, and then deserialize the data. Really, though, the answer is the same either way.
So pulling back all the columns is fine if you intend to use them all, but it can mean a lot of extra data transfer, particularly if you store, say, lengthy strings in your columns. In many cases (not all, but a significant majority) the difference will be undetectable and is mostly a matter of principle.
It's really just a trade-off between your aforementioned laziness now (and trust me, we all feel that way) and how important performance really is.
That said, if you do intend to use all the column values, you're much better off pulling them all back at once than you are issuing a bunch of separate queries.
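To make the "one query vs. a bunch of queries" point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in (the question is about a networked server, where each query would be a real round trip), and the sample row and the `run` helper are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_table ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT, status TEXT, date TEXT)"
)
conn.execute(
    "INSERT INTO user_table (user_id, name, status, date) "
    "VALUES (7, 'alice', 'active', '2020-01-01')"
)

query_count = 0


def run(sql, params):
    """Hypothetical helper: execute one query and count it as one round trip."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchone()


# Option A: one query that returns every column the task needs.
row = run(
    "SELECT id, name, status, date FROM user_table WHERE user_id = :user_id",
    {"user_id": 7},
)
one_query_trips = query_count

# Option B: a separate query per column.
query_count = 0
values = []
for column in ("id", "name", "status", "date"):
    # Column names come from a fixed tuple here, never from user input.
    sql = f"SELECT {column} FROM user_table WHERE user_id = :user_id"
    values.append(run(sql, {"user_id": 7})[0])
per_column_trips = query_count

print(one_query_trips, per_column_trips)
```

Both options end up with the same four values; the second one just pays for four round trips instead of one.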
Think of it like doing a web search: you do your search, you find your page, and you only need one detail. You could read the entire page and know everything about the subject, or you could just jump to the part you're looking for and be done. The latter is a lot faster if that's all you ever want, but if you're then going to have to learn about the other aspects, you'd be way better off reading them the first time than having to do your search again and find the site that talks about it.
If you aren't sure whether you'll need the other column values in the future, then it's your call as the developer to decide which case is more likely.
It all depends on what your application is, what your data is, how you're using it, and how important performance really is to you.
#2
8
Selecting a single column can have a large effect on the performance of certain queries. For example, it is more efficient for the query engine to read an index than to look up data in the original data pages. If a covering index is available (that is, an index that contains all the columns needed for a query), the query can be answered from the index alone and will run faster. For large tables that are too big for available memory, a covering index can be a big, big win. (Think orders-of-magnitude improvement in performance in some cases.)
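One way to see the covering-index effect is to ask the engine for its query plan. The sketch below uses SQLite through Python's sqlite3 module as a stand-in, not MySQL (there, EXPLAIN would be the place to look). The extra `bio` column and the index name `idx_user_status` are hypothetical, and the exact plan wording depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_table ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT, status TEXT, bio TEXT)"
)
# Hypothetical index: it holds user_id and status, nothing else.
conn.execute("CREATE INDEX idx_user_status ON user_table (user_id, status)")


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, {"user_id": 7}).fetchall()
    return " | ".join(row[-1] for row in rows)


# Only indexed columns are requested, so the index alone can answer the query.
narrow_plan = plan("SELECT status FROM user_table WHERE user_id = :user_id")

# All columns are requested, so the engine must also visit the table rows.
star_plan = plan("SELECT * FROM user_table WHERE user_id = :user_id")

print(narrow_plan)
print(star_plan)
```

In the first plan SQLite reports a covering index; in the second it still uses the index to find the rows but then has to fetch the remaining columns from the table itself.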
Another case where limiting the columns helps is when one or more of the columns are very large, such as a BLOB or TEXT column. These can grow to tens of thousands of bytes or even megabytes. Retrieving them can put a big load on the server.
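A rough way to get a feel for the large-column case is to count how much data each form of the query hands back. This is only a sketch: it again uses in-memory SQLite from Python, the `notes` column and its 100,000-character contents are invented, and counting characters is a crude proxy for what would actually travel over a network.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_table ("
    "id INTEGER PRIMARY KEY, name TEXT, status TEXT, notes TEXT)"
)

# Hypothetical data: 20 rows, each with a 100,000-character TEXT value.
big_notes = "x" * 100_000
conn.executemany(
    "INSERT INTO user_table (name, status, notes) VALUES (?, ?, ?)",
    [(f"user{i}", "active", big_notes) for i in range(20)],
)


def payload_chars(sql):
    """Total characters across every value the query returns."""
    return sum(len(str(value)) for row in conn.execute(sql) for value in row)


narrow = payload_chars("SELECT id, name, status FROM user_table")
everything = payload_chars("SELECT * FROM user_table")

print(narrow, everything)
```

The narrow query returns a few hundred characters in total, while the `*` version drags along 2,000,000 more that the caller may never look at.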
There is a danger in using * if you have prepared statements and the underlying structure of the table changes. The query itself could get out of date (I've had this problem on other databases, but not specifically on MySQL). The underlying change could be as simple as renaming a column. What would have been caught as a compile-time error is instead a run-time error that might be much more mysterious.
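The sketch below shows a related failure of the same kind rather than the stale prepared statement itself: client code that relies on the shape of a `*` result breaks at run time once the table changes, while the version with named columns keeps working. It uses SQLite from Python as a stand-in, and the table, the added `status` column, and the two reader functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO user_table (name) VALUES ('alice')")


def read_star():
    # Relies on the table having exactly two columns, in this order.
    user_id, name = conn.execute("SELECT * FROM user_table").fetchone()
    return user_id, name


def read_named():
    # States exactly which columns it expects.
    user_id, name = conn.execute("SELECT id, name FROM user_table").fetchone()
    return user_id, name


before = read_star()  # works with the original schema

# The underlying structure changes: someone adds a column.
conn.execute("ALTER TABLE user_table ADD COLUMN status TEXT")

try:
    read_star()
    star_error = None
except ValueError as exc:
    # Unpacking now sees three values instead of two.
    star_error = exc

after_named = read_named()  # unaffected by the new column

print(before, repr(star_error), after_named)
```

Nothing about the `*` query text changed, yet it started failing after the schema change, which is what makes this class of bug hard to trace.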
In general, the reasons given for avoiding * have more to do with network performance. In many cases, it is not going to make much difference. If you are returning 20 rows from a table where each row contains, on average, 100 or 200 bytes, then the difference between selecting all the columns and a subset of them will be minor in most hardware environments. The vast majority of the time spent on the query goes to compiling it, executing it in the engine, and reading the data pages. Returning 200 bytes versus 2,000 bytes probably won't make a big difference.
However, there are cases (such as the ones listed above) where it can make a big difference. So avoiding * is a good habit, but using it now and then probably isn't going to bring down your system.