I've seen a number of people claim that you should specifically name each column you want in your select query.
Assuming I'm going to use all of the columns anyway, why would I not use SELECT *?
Even considering the question *SQL query - Select * from view or Select col1, col2, … colN from view*, I don't think this is an exact duplicate as I'm approaching the issue from a slightly different perspective.
One of our principles is to not optimize before it's time. With that in mind, it seems like using SELECT * should be the preferred method until it is proven to be a resource issue or the schema is pretty much set in stone, which, as we know, won't occur until development is completely done.
That said, is there an overriding reason not to use SELECT *?
20 answers
#1
153
The essence of the quote about not optimizing prematurely is to go for simple and straightforward code and then use a profiler to point out the hot spots, which you can then optimize to be efficient.
When you use select * you make it impossible to profile, so you are not writing clear and straightforward code and you are going against the spirit of the quote. select * is an anti-pattern.
So selecting columns is not a premature optimization. A few things off the top of my head:
- If you specify columns in a SQL statement, the SQL execution engine will raise an error if one of those columns is removed from the table and the query is executed.
- You can more easily scan code to see where that column is being used.
- You should always write queries to bring back the least amount of information.
- As others mention, if you use ordinal column access you should never use select *.
- If your SQL statement joins tables, select * gives you all columns from all tables in the join.
The corollary is that when you use select * ...
- The columns used by the application are opaque
- DBAs and their query profilers are unable to help with your application's poor performance
- The code is more brittle when changes occur
- Your database and network suffer because they bring back too much data (I/O)
- Database engine optimizations are minimal because you're bringing back all data regardless (logical).
Writing correct SQL is just as easy as writing select *. So the truly lazy person writes proper SQL because they don't want to revisit the code and try to remember what they were doing when they wrote it. They don't want to explain every bit of code to the DBAs. They don't want to explain to their clients why the application runs like a dog.
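The first bullet above (an explicit column list fails loudly when a column disappears) can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for whatever engine you run, and the users table and its columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada', 'ada@example.com')")

# Simulate a schema change: rebuild the table without the email column.
conn.execute("DROP TABLE users")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")

# SELECT * keeps running and quietly returns a narrower row.
star_row = conn.execute("SELECT * FROM users").fetchone()

# The explicit column list fails at the query itself and names the column.
try:
    conn.execute("SELECT id, name, email FROM users").fetchone()
    explicit_error = None
except sqlite3.OperationalError as exc:
    explicit_error = str(exc)

print(star_row)
print(explicit_error)
```

With select *, the breakage would surface later, wherever the application first touches the missing field.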
#2
42
If your code depends on the columns being in a specific order, your code will break when there are changes to the table. Also, you may be fetching too much from the table when you select *, especially if there is a binary field in the table.
Just because you are using all the columns now, it doesn't mean someone else isn't going to add an extra column to the table.
It also adds overhead to execution plan caching, since the engine has to fetch the metadata about the table to know which columns are in *.
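A small sketch of the ordering problem, under the assumption that the application reads columns by position. It uses Python's sqlite3 module purely for illustration; the users table, the phone column and the EMAIL_POSITION constant are hypothetical.

```python
import sqlite3

def fetch_first(conn, sql):
    """Run a query and return its first row as a tuple."""
    return conn.execute(sql).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada', 'ada@example.com')")

EMAIL_POSITION = 2  # application code assumes email is the third column
before = fetch_first(conn, "SELECT * FROM users")[EMAIL_POSITION]

# Schema change: the table is rebuilt with a phone column ahead of email.
conn.execute("DROP TABLE users")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, phone TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada', '555-0100', 'ada@example.com')")

# Same positional read, two different queries.
after_star = fetch_first(conn, "SELECT * FROM users")[EMAIL_POSITION]
after_explicit = fetch_first(conn, "SELECT id, name, email FROM users")[EMAIL_POSITION]

print(before, after_star, after_explicit)
```

After the change, the select * version hands back the phone number where the code expects an email address, and nothing raises an error.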
#3
22
One major reason is that if you ever add or remove columns from your table, any query or procedure that makes a SELECT * call will now get more or fewer columns of data than expected.
#4
16
- In a roundabout way you are breaking the modularity rule about using strict typing wherever possible. Explicit is almost universally better.
- Even if you now need every column in the table, more could be added later, which will be pulled down every time you run the query and could hurt performance. It hurts performance because:
  - You are pulling more data over the wire; and
  - You might defeat the optimizer's ability to pull the data right out of the index (for queries on columns that are all part of an index) rather than doing a lookup in the table itself.
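The index point can be observed in a query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as an illustration only; other engines report this differently, and the orders table and its index are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, notes TEXT)"
)
conn.execute(
    "CREATE INDEX idx_orders_customer_status ON orders (customer_id, status)"
)

def plan(sql):
    """Return the human-readable detail text of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " / ".join(row[-1] for row in rows)  # last field is the detail

# Every requested column lives in the index, so the table is never read.
plan_explicit = plan("SELECT customer_id, status FROM orders WHERE customer_id = 42")

# notes is not in the index, so each match needs a lookup in the table.
plan_star = plan("SELECT * FROM orders WHERE customer_id = 42")

print(plan_explicit)
print(plan_star)
```

In SQLite the first plan mentions a covering index while the second does not, which is the extra table lookup described above.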
When TO use select *
When you explicitly NEED every column in the table, as opposed to needing every column in the table THAT EXISTED AT THE TIME YOU WROTE THE QUERY. For example, if you were writing a DB management app that needed to display the entire contents of the table (whatever they happened to be), you might use that approach.
#5
12
There are a few reasons:
- If the number of columns in a database changes and your application expects there to be a certain number...
- If the order of columns in a database changes and your application expects them to be in a certain order...
- Memory overhead. 8 unnecessary INTEGER columns would add 32 bytes of wasted memory. That doesn't sound like a lot, but this is for each query, and INTEGER is one of the small column types... the extra columns are more likely to be VARCHAR or TEXT columns, which add up more quickly.
- Network overhead. Related to memory overhead: if I issue 30,000 queries and have 8 unnecessary INTEGER columns, I've wasted 960kB of bandwidth. VARCHAR and TEXT columns are likely to be considerably larger.
Note: I chose INTEGER in the above example because they have a fixed size of 4 bytes.
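The arithmetic behind those two figures, written out as a quick check. It assumes each query returns a single row and ignores protocol overhead; both are simplifications made only for this sketch.

```python
INTEGER_BYTES = 4      # fixed size of one INTEGER column, as stated above
EXTRA_COLUMNS = 8      # unnecessary columns pulled in by SELECT *
QUERIES = 30_000       # queries issued, assuming one row each

wasted_per_row = INTEGER_BYTES * EXTRA_COLUMNS
wasted_total = wasted_per_row * QUERIES

print(wasted_per_row)        # 32 bytes per row
print(wasted_total / 1000)   # 960.0 kB in total
```

Queries that return many rows multiply the per-row figure accordingly.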
#6
7
If your application gets data with SELECT * and the table structure in the database is changed (say a column is removed), your application will fail in every place where you reference the missing field. If you instead list all the columns in your query, your application will break in the (hopefully) one place where you initially get the data, making the fix easier.
That being said, there are a number of situations in which SELECT * is desirable. One is a situation that I encounter all the time, where I need to replicate an entire table into another database (like SQL Server to DB2, for example). Another is an application written to display tables generically (i.e. without any knowledge of any particular table).
#7
3
I actually noticed a strange behaviour when I used select * in views in SQL Server 2005.
Run the following query and you will see what I mean.
-- Step 1: create the base table with columns A, B, C and load three rows.
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[starTest]') AND type in (N'U'))
DROP TABLE [dbo].[starTest]
CREATE TABLE [dbo].[starTest](
[id] [int] IDENTITY(1,1) NOT NULL,
[A] [varchar](50) NULL,
[B] [varchar](50) NULL,
[C] [varchar](50) NULL
) ON [PRIMARY]
GO
insert into dbo.starTest
select 'a1','b1','c1'
union all select 'a2','b2','c2'
union all select 'a3','b3','c3'
go
-- Step 2: one view defined with select *, one with an explicit column list.
IF EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[vStartest]'))
DROP VIEW [dbo].[vStartest]
go
create view dbo.vStartest as
select * from dbo.starTest
go
IF EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[vExplicittest]'))
DROP VIEW [dbo].[vExplicittest]
go
create view dbo.[vExplicittest] as
select a,b,c from dbo.starTest
go
-- Both views return the same rows at this point.
select a,b,c from dbo.vStartest
select a,b,c from dbo.vExplicittest
-- Step 3: recreate the table with a new column D placed before C.
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[starTest]') AND type in (N'U'))
DROP TABLE [dbo].[starTest]
CREATE TABLE [dbo].[starTest](
[id] [int] IDENTITY(1,1) NOT NULL,
[A] [varchar](50) NULL,
[B] [varchar](50) NULL,
[D] [varchar](50) NULL,
[C] [varchar](50) NULL
) ON [PRIMARY]
GO
insert into dbo.starTest
select 'a1','b1','d1','c1'
union all select 'a2','b2','d2','c2'
union all select 'a3','b3','d3','c3'
-- Step 4: query both views again without rebuilding them.
select a,b,c from dbo.vStartest
select a,b,c from dbo.vExplicittest
Compare the results of the last 2 select statements. I believe what you will see is a result of select * referencing columns by index instead of by name.
If you rebuild the view it will work fine again.
EDIT
I have added a separate question, *“select * from table” vs “select colA, colB, etc. from table” interesting behaviour in SQL Server 2005*, to look into that behaviour in more detail.
#8
2
You might join two tables and use column A from the second table. If you later add a column A to the first table (with the same name but possibly a different meaning), you'll most likely get the values from the first table and not from the second one as before. That won't happen if you explicitly specify the columns you want to select.
Of course, specifying the columns also sometimes causes bugs if you forget to add the new columns to every select clause. If the new column is not needed every time the query is executed, it may take some time before the bug gets noticed.
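Here is a rough sketch of that join scenario, using Python's sqlite3 module for illustration. The table names t_first and t_second and the value_of helper are hypothetical, and the helper looks a column up by its first matching name, which is only one way client code might resolve a duplicate.

```python
import sqlite3

def value_of(conn, sql, column):
    """Return the first row's value for the first column with this name."""
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    row = cur.fetchone()
    return row[names.index(column)]  # index() picks the first match

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_first (id INTEGER)")
conn.execute("CREATE TABLE t_second (first_id INTEGER, a TEXT)")
conn.execute("INSERT INTO t_first VALUES (1)")
conn.execute("INSERT INTO t_second VALUES (1, 'from second')")

STAR = "SELECT * FROM t_first JOIN t_second ON t_second.first_id = t_first.id"
EXPLICIT = (
    "SELECT t_second.a AS a "
    "FROM t_first JOIN t_second ON t_second.first_id = t_first.id"
)

before = value_of(conn, STAR, "a")

# Later, someone adds a column with the same name to the first table.
conn.execute("ALTER TABLE t_first ADD COLUMN a TEXT DEFAULT 'from first'")

after_star = value_of(conn, STAR, "a")
after_explicit = value_of(conn, EXPLICIT, "a")

print(before, "|", after_star, "|", after_explicit)
```

The select * lookup silently switches to the first table's column, while the qualified column list keeps returning the value it always did.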
#9
2
I understand where you're going regarding premature optimization, but that only goes so far. The intent is to avoid unnecessary optimization in the beginning. Are your tables unindexed? Would you use nvarchar(4000) to store a zip code?
As others have pointed out, there are other positives to specifying each column you intend to use in the query (such as maintainability).
#10
2
When you're specifying columns, you're also tying yourself to a specific set of columns and making yourself less flexible, making Feuerstein roll over in, well, wherever he is. Just a thought.
#11
2
SELECT * is not always evil. In my opinion, at least. I use it quite often for dynamic queries returning a whole table, plus some computed fields.
For instance, I want to compute geographical geometries from a "normal" table, that is, a table without any geometry field but with fields containing coordinates. I use PostgreSQL and its spatial extension PostGIS, but the principle applies to many other cases.
An example:
- A table of places, with coordinates stored in fields labeled x, y, z:
CREATE TABLE places (place_id integer, x numeric(10, 3), y numeric(10, 3), z numeric(10, 3), description varchar);
- Let's feed it with a few example values:
INSERT INTO places (place_id, x, y, z, description) VALUES
(1, 2.295, 48.863, 64, 'Paris, Place de l''Étoile'),
(2, 2.945, 48.858, 40, 'Paris, Tour Eiffel'),
(3, 0.373, 43.958, 90, 'Condom, Cathédrale St-Pierre');
- I want to be able to map the contents of this table using some GIS client. The normal way is to add a geometry field to the table and build the geometry from the coordinates. But I would prefer a dynamic query: this way, when I change coordinates (corrections, more accuracy, etc.), the mapped objects actually move, dynamically. So here is the query with the SELECT *:
CREATE OR REPLACE VIEW places_points AS
SELECT *,
GeomFromewkt('SRID=4326; POINT ('|| x || ' ' || y || ' ' || z || ')')
FROM places;
Refer to the PostGIS documentation for how to use the GeomFromewkt() function.
- Here is the result:
SELECT * FROM places_points;
 place_id |   x   |   y    |   z    |         description          |                            geomfromewkt
----------+-------+--------+--------+------------------------------+--------------------------------------------------------------------
        1 | 2.295 | 48.863 | 64.000 | Paris, Place de l'Étoile     | 01010000A0E61000005C8FC2F5285C02405839B4C8766E48400000000000005040
        2 | 2.945 | 48.858 | 40.000 | Paris, Tour Eiffel           | 01010000A0E61000008FC2F5285C8F0740E7FBA9F1D26D48400000000000004440
        3 | 0.373 | 43.958 | 90.000 | Condom, Cathédrale St-Pierre | 01010000A0E6100000AC1C5A643BDFD73FB4C876BE9FFA45400000000000805640
(3 rows)
The rightmost column can now be used by any GIS program to properly map the points.
- If, in the future, some fields get added to the table: no worries, I just have to run the same VIEW definition again.
I wish the definition of the VIEW could be kept "as is", with the *, but alas that is not the case. This is how it is stored internally by PostgreSQL:
SELECT places.place_id, places.x, places.y, places.z, places.description, geomfromewkt(((((('SRID=4326; POINT ('::text || places.x) || ' '::text) || places.y) || ' '::text) || places.z) || ')'::text) AS geomfromewkt FROM places;
#12
1
Even if you use every column, if you address the row array by numeric index you will have problems when another column is added later on.
So basically it is a question of maintainability! If you don't use the * selector, you will not have to worry about your queries.
#13
1
Selecting only the columns you need keeps the dataset in memory smaller and therefore keeps your application faster.
Also, a lot of tools (e.g. stored procedures) cache query execution plans. If you later add or remove a column (particularly easy if you're selecting from a view), the tool will often raise an error when it doesn't get back the results it expects.
#14
1
It makes your code more ambiguous and more difficult to maintain, because you're adding extra unused data to the domain, and it's not clear which columns you intended to use and which you did not. (It also suggests that you might not know, or care.)
#15
1
To answer your question directly: do not use "SELECT *" when it makes your code more fragile to changes in the underlying tables. Your code should break only when a change is made to the table that directly affects the requirements of your program.
Your application should take advantage of the abstraction layer that relational access provides.
#16
1
I don't use SELECT * simply because it is nice to see and know what fields I am retrieving.
#17
1
It is generally bad to use 'select *' inside views because you will be forced to recompile the view whenever a table column changes. If you change the underlying table columns of a view, you will get errors for non-existent columns until you go back and recompile.
#18
1
It's ok when you're doing exists(select * ...) since it never gets expanded. Otherwise it's really only useful when exploring tables with temporary select statements, or if you had a CTE defined above and you want every column without typing them all out again.
#19
1
Just to add one thing that no one else has mentioned: select * returns all the columns, and someone may later add a column that you don't necessarily want users to be able to see, such as who last updated the data, a timestamp, or notes that only managers (not all users) should see.
Further, when adding a column, the impact on existing code should be reviewed to see whether changes are needed based on what information is stored in the column. With select *, that review will often be skipped because the developer will assume that nothing will break. And in fact nothing may visibly break, but queries may now start returning the wrong thing. Just because nothing explicitly breaks doesn't mean the queries should not have been changed.
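A minimal sketch of the exposure problem, again using Python's sqlite3 module only as an illustration. The tickets table and the manager_notes column are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER, title TEXT)")
conn.execute("INSERT INTO tickets VALUES (1, 'Printer offline')")

def columns_returned(sql):
    """List the column names a query hands back to the caller."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description]

# Later, a column meant only for managers is added to the table.
conn.execute("ALTER TABLE tickets ADD COLUMN manager_notes TEXT")

star_columns = columns_returned("SELECT * FROM tickets")
explicit_columns = columns_returned("SELECT id, title FROM tickets")

print(star_columns)
print(explicit_columns)
```

The select * query starts returning manager_notes without any code change or error, so every screen or export built on it needs the review described above.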
#20
0
Because "select *" wastes memory when you don't need all the fields. But for SQL Server, the performance is the same.